From 52bc6f609a97a795b95ef60770b66a8ee3380d86 Mon Sep 17 00:00:00 2001
From: Mark Goddard
Date: Mon, 7 Oct 2019 12:17:35 +0100
Subject: [PATCH] Docs: add nova cells

Add documentation about deploying nova with multiple cells.

Change-Id: I89ee276917e5b9170746e07b7f644c7593b03da1
Depends-On: https://review.opendev.org/#/c/675659/
Related: blueprint bp/support-nova-cells
---
 doc/source/reference/compute/index.rst        |   1 +
 .../reference/compute/nova-cells-guide.rst    | 461 ++++++++++++++++++
 doc/source/reference/compute/nova-guide.rst   |   8 +
 doc/source/user/multinode.rst                 |  48 ++
 doc/source/user/operating-kolla.rst           |   5 +
 5 files changed, 523 insertions(+)
 create mode 100644 doc/source/reference/compute/nova-cells-guide.rst

diff --git a/doc/source/reference/compute/index.rst b/doc/source/reference/compute/index.rst
index 96b5e5c257..8f84634de3 100644
--- a/doc/source/reference/compute/index.rst
+++ b/doc/source/reference/compute/index.rst
@@ -11,6 +11,7 @@ compute services like HyperV, XenServer and so on.
    hyperv-guide
    libvirt-guide
    masakari-guide
+   nova-cells-guide
    nova-fake-driver
    nova-guide
    qinling-guide

diff --git a/doc/source/reference/compute/nova-cells-guide.rst b/doc/source/reference/compute/nova-cells-guide.rst
new file mode 100644
index 0000000000..443d3e1c2e
--- /dev/null
+++ b/doc/source/reference/compute/nova-cells-guide.rst
@@ -0,0 +1,461 @@

==========
Nova Cells
==========

Overview
========

Nova cells V2 is a feature that allows Nova deployments to be scaled out to
a larger size than would otherwise be possible. This is achieved through
sharding of the compute nodes into pools known as *cells*, with each cell
having a separate message queue and database.

Further information on cells can be found in the Nova documentation
:nova-doc:`here ` and :nova-doc:`here
`. This document assumes the reader is familiar with
the concepts of cells.

Cells: deployment perspective
=============================

From a deployment perspective, nova cell support involves separating the Nova
services into two sets - global services and per-cell services.

Global services:

* ``nova-api``
* ``nova-scheduler``
* ``nova-super-conductor`` (in multi-cell mode)

Per-cell control services:

* ``nova-compute-ironic`` (for Ironic cells)
* ``nova-conductor``
* ``nova-novncproxy``
* ``nova-serialproxy``
* ``nova-spicehtml5proxy``

Per-cell compute services:

* ``nova-compute``
* ``nova-libvirt``
* ``nova-ssh``

Other considerations are the database and message queue clusters that the
cells depend on. These are discussed later.

Service placement
-----------------

There are a number of ways to place services in a multi-cell environment.

Single cell topology
~~~~~~~~~~~~~~~~~~~~

The single cell topology is used by default, and is limited to a single cell::

    +----------------+
    |                ++
    |                |-+
    |  controllers   |-|
    |                |-|
    |                |-|
    +------------------|
      +-----------------|
        +----------------+

    +--------------+    +--------------+
    |              |    |              |
    |    cell 1    |    |    cell 1    |
    |   compute 1  |    |   compute 2  |
    |              |    |              |
    +--------------+    +--------------+

All control services run on the controllers, and there is no superconductor.

Dedicated cell controller topology
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this topology, each cell has a dedicated group of controllers to run cell
control services. The following diagram shows the topology for a cloud with
two cells::

    +----------------+
    |                ++
    |                |-+
    |  controllers   |-|
    |                |-|
    |                |-|
    +------------------|
      +-----------------|
        +----------------+

    +----------------+      +----------------+
    |                ++     |                ++
    |     cell 1     |-+    |     cell 2     |-+
    |  controllers   |-|    |  controllers   |-|
    |                |-|    |                |-|
    +------------------|    +------------------|
      +-----------------|     +-----------------|
        +----------------+      +----------------+

    +--------------+    +--------------+    +--------------+    +--------------+
    |              |    |              |    |              |    |              |
    |    cell 1    |    |    cell 1    |    |    cell 2    |    |    cell 2    |
    |   compute 1  |    |   compute 2  |    |   compute 1  |    |   compute 2  |
    |              |    |              |    |              |    |              |
    +--------------+    +--------------+    +--------------+    +--------------+

Shared cell controller topology
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note::

   This topology is not yet supported by Kolla Ansible.

An alternative configuration is to place the cell control services for
multiple cells on a single shared group of cell controllers. This might allow
for more efficient use of hardware where the control services for a single
cell do not fully consume the resources of a set of cell controllers::

    +----------------+
    |                ++
    |                |-+
    |  controllers   |-|
    |                |-|
    |                |-|
    +------------------|
      +-----------------|
        +----------------+

    +----------------+
    |                ++
    |  shared cell   |-+
    |  controllers   |-|
    |                |-|
    +------------------|
      +-----------------|
        +----------------+

    +--------------+    +--------------+    +--------------+    +--------------+
    |              |    |              |    |              |    |              |
    |    cell 1    |    |    cell 1    |    |    cell 2    |    |    cell 2    |
    |   compute 1  |    |   compute 2  |    |   compute 1  |    |   compute 2  |
    |              |    |              |    |              |    |              |
    +--------------+    +--------------+    +--------------+    +--------------+

Databases & message queues
--------------------------

The global services require access to the API and cell0 databases, in
addition to a message queue. Each cell requires its own database and message
queue instance. These could be separate database and message queue clusters,
or shared database and message queue clusters partitioned via database names
and virtual hosts. Currently Kolla Ansible supports deployment of a shared
database cluster and a shared message queue cluster.

Configuration
=============

.. seealso::

   Configuring Kolla Ansible for deployment of multiple cells typically
   requires use of :ref:`inventory host and group variables
   <multinode-host-and-group-variables>`.

Enabling multi-cell support
---------------------------

Support for deployment of multiple cells is disabled by default - nova is
deployed in single conductor mode.

Deployment of multiple cells may be enabled by setting ``enable_cells`` to
``yes`` in ``globals.yml``. This deploys nova in superconductor mode, with
separate conductors for each cell.
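
For example, a minimal ``globals.yml`` change enabling this mode might look
like the following sketch (everything else keeps its defaults):

.. code-block:: yaml

   # globals.yml
   enable_cells: "yes"
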
Naming cells
------------

By default, all cell services are deployed in a single unnamed cell. This
behaviour is backwards compatible with previous releases of Kolla Ansible.

To deploy hosts in a different cell, set the ``nova_cell_name`` variable for
the hosts in the cell. This can be done either using host variables or group
variables.
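
For example, assuming the inventory defines a ``cell1`` group containing all
hosts in that cell, a group variables file could name the cell (the group
name and file path here are illustrative):

.. code-block:: yaml

   # inventory/group_vars/cell1
   nova_cell_name: cell1
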
Groups
------

In a single cell deployment, the following Ansible groups are used to
determine the placement of services:

* ``compute``: ``nova-compute``, ``nova-libvirt``, ``nova-ssh``
* ``nova-compute-ironic``: ``nova-compute-ironic``
* ``nova-conductor``: ``nova-conductor``
* ``nova-novncproxy``: ``nova-novncproxy``
* ``nova-serialproxy``: ``nova-serialproxy``
* ``nova-spicehtml5proxy``: ``nova-spicehtml5proxy``

In a multi-cell deployment, this is still necessary - compute hosts must be in
the ``compute`` group. However, to provide further control over where cell
services are placed, the following variables are used:

* ``nova_cell_compute_group``
* ``nova_cell_compute_ironic_group``
* ``nova_cell_conductor_group``
* ``nova_cell_novncproxy_group``
* ``nova_cell_serialproxy_group``
* ``nova_cell_spicehtml5proxy_group``

For backwards compatibility, these are set by default to the original group
names. For a multi-cell deployment, they should be set to the name of a group
containing only the compute hosts in that cell.

Example
~~~~~~~

In the following example we have two cells, ``cell1`` and ``cell2``. Each cell
has two compute nodes and a cell controller.

Inventory:

.. code-block:: ini

   [compute:children]
   compute-cell1
   compute-cell2

   [nova-conductor:children]
   cell-control-cell1
   cell-control-cell2

   [nova-novncproxy:children]
   cell-control-cell1
   cell-control-cell2

   [nova-spicehtml5proxy:children]
   cell-control-cell1
   cell-control-cell2

   [nova-serialproxy:children]
   cell-control-cell1
   cell-control-cell2

   [cell1:children]
   compute-cell1
   cell-control-cell1

   [cell2:children]
   compute-cell2
   cell-control-cell2

   [compute-cell1]
   compute01
   compute02

   [compute-cell2]
   compute03
   compute04

   [cell-control-cell1]
   cell-control01

   [cell-control-cell2]
   cell-control02

Cell1 group variables (``group_vars/cell1``):

.. code-block:: yaml

   nova_cell_name: cell1
   nova_cell_compute_group: compute-cell1
   nova_cell_conductor_group: cell-control-cell1
   nova_cell_novncproxy_group: cell-control-cell1
   nova_cell_serialproxy_group: cell-control-cell1
   nova_cell_spicehtml5proxy_group: cell-control-cell1

Cell2 group variables (``group_vars/cell2``):

.. code-block:: yaml

   nova_cell_name: cell2
   nova_cell_compute_group: compute-cell2
   nova_cell_conductor_group: cell-control-cell2
   nova_cell_novncproxy_group: cell-control-cell2
   nova_cell_serialproxy_group: cell-control-cell2
   nova_cell_spicehtml5proxy_group: cell-control-cell2

Note that these example cell group variables specify groups for all console
proxy services for completeness. You will need to ensure that there are no
port collisions. For example, if both cell1 and cell2 use the default
``novncproxy`` console proxy, you could add ``nova_novncproxy_port: 6082``
to the cell2 group variables to prevent a collision with cell1.

Databases
---------

The database connection for each cell is configured via the following
variables:

* ``nova_cell_database_name``
* ``nova_cell_database_user``
* ``nova_cell_database_password``
* ``nova_cell_database_address``
* ``nova_cell_database_port``

By default the MariaDB cluster deployed by Kolla Ansible is used. For an
unnamed cell, the ``nova`` database is used for backwards compatibility. For a
named cell, the database is named ``nova_<cell name>``.
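
As a sketch only, a cell whose database is hosted on a separate MariaDB
cluster might override these variables in its group variables. All values
below are illustrative assumptions (note that Kolla Ansible currently deploys
a shared database cluster, so treat this purely as an illustration of the
variables), and the password should come from a secure store such as
``passwords.yml`` rather than being written in plain text:

.. code-block:: yaml

   # inventory/group_vars/cell1 (illustrative values)
   nova_cell_database_name: nova_cell1
   nova_cell_database_user: nova_cell1
   nova_cell_database_address: 192.0.2.10
   nova_cell_database_port: 3306
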
Message queues
--------------

The RPC message queue for each cell is configured via the following variables:

* ``nova_cell_rpc_user``
* ``nova_cell_rpc_password``
* ``nova_cell_rpc_port``
* ``nova_cell_rpc_group_name``
* ``nova_cell_rpc_transport``
* ``nova_cell_rpc_vhost``

And for notifications:

* ``nova_cell_notify_user``
* ``nova_cell_notify_password``
* ``nova_cell_notify_port``
* ``nova_cell_notify_group_name``
* ``nova_cell_notify_transport``
* ``nova_cell_notify_vhost``

By default the message queue cluster deployed by Kolla Ansible is used. For an
unnamed cell, the ``/`` virtual host used by all OpenStack services is used
for backwards compatibility. For a named cell, a virtual host named
``nova_<cell name>`` is used.

Conductor & API database
------------------------

By default the cell conductors are configured with access to the API database.
This is currently necessary for `some operations
`__
in Nova which require an *upcall*.

If those operations are not required, it is possible to prevent cell
conductors from accessing the API database by setting
``nova_cell_conductor_has_api_database`` to ``no``.

Console proxies
---------------

General information on configuring console access in Nova is available
:ref:`here <nova-consoles>`. For deployments with multiple cells, the console
proxies for each cell must be accessible via a unique endpoint. We achieve
this by adding an HAProxy frontend for each cell that forwards to the console
proxies for that cell. Each frontend must use a different port. The port may
be configured via the following variables:

* ``nova_novncproxy_port``
* ``nova_spicehtml5proxy_port``
* ``nova_serialproxy_port``

Ironic
------

Currently all Ironic-based instances are deployed in a single cell. The name
of that cell is configured via ``nova_cell_ironic_cell_name``, and defaults to
the unnamed cell. ``nova_cell_compute_ironic_group`` can be used to set the
group that the ``nova-compute-ironic`` services are deployed to.

Deployment
==========

Deployment in a multi-cell environment does not need to be done differently
from a single-cell environment - use the ``kolla-ansible deploy`` command.

Scaling out
-----------

A common operational task in large scale environments is to add new compute
resources to an existing deployment. In a multi-cell environment it is likely
that these will all be added to one or more new or existing cells. Ideally we
would not risk affecting other cells, or even the control hosts, when
deploying these new resources.

The Nova cells support in Kolla Ansible has been built such that it is
possible to add new cells or extend existing ones without affecting the rest
of the cloud. This is achieved via the ``--limit`` argument to
``kolla-ansible``. For example, if we are adding a new cell ``cell03`` to an
existing cloud, and all hosts for that cell (control and compute) are in a
``cell03`` group, we could use this as our limit:

.. code-block:: console

   kolla-ansible deploy --limit cell03

When adding a new cell, we also need to ensure that HAProxy is configured for
the console proxies in that cell:

.. code-block:: console

   kolla-ansible deploy --tags haproxy

Another benefit of this approach is that it should be faster to complete, as
the number of hosts Ansible manages is reduced.
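
Continuing the earlier two-cell example, the inventory additions for a new
``cell03`` might look like the following sketch (host and group names are
illustrative). The new groups must also be added as children of the top-level
``compute`` and cell control service groups, and a matching
``group_vars/cell03`` file should set ``nova_cell_name`` and the cell service
group variables, as was done for cell1 and cell2:

.. code-block:: ini

   [compute-cell03]
   compute05
   compute06

   [cell-control-cell03]
   cell-control03

   [cell03:children]
   compute-cell03
   cell-control-cell03
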
.. _nova-cells-upgrade:

Upgrades
========

Similar to deploys, upgrades in a multi-cell environment can be performed in
the same way as in single-cell environments, via ``kolla-ansible upgrade``.

Staged upgrades
---------------

.. note::

   Staged upgrades are not applicable when ``nova_safety_upgrade`` is ``yes``.

In large environments the risk involved with upgrading an entire site can be
significant, and the ability to upgrade one cell at a time is crucial. This
is very much an advanced procedure, and operators attempting this should be
familiar with the :nova-doc:`Nova upgrade documentation `.

Here we use Ansible tags and limits to control the upgrade process. We will
only consider the Nova upgrade here. It is assumed that all dependent services
have been upgraded (see ``ansible/site.yml`` for the correct ordering).

The first step, which may be performed in advance of the upgrade, is to
perform the database schema migrations.

.. code-block:: console

   kolla-ansible upgrade --tags nova-bootstrap

Next, we upgrade the global services.

.. code-block:: console

   kolla-ansible upgrade --tags nova-api-upgrade

Now the cell services can be upgraded. This can be performed in batches of
one or more cells at a time, using ``--limit``. For example, to upgrade
services in ``cell03``:

.. code-block:: console

   kolla-ansible upgrade --tags nova-cell-upgrade --limit cell03

At this stage, we might wish to perform testing of the new services, to check
that they are functioning correctly before proceeding to other cells.

Once all cells have been upgraded, we can reload the services to remove RPC
version pinning, and perform online data migrations.

.. code-block:: console

   kolla-ansible upgrade --tags nova-reload,nova-online-data-migrations

The nova upgrade is now complete, and upgrading of other services may
continue.

diff --git a/doc/source/reference/compute/nova-guide.rst b/doc/source/reference/compute/nova-guide.rst
index 953274680e..bd064b6ca0 100644
--- a/doc/source/reference/compute/nova-guide.rst
+++ b/doc/source/reference/compute/nova-guide.rst
@@ -52,6 +52,8 @@ The fake driver can be used for testing Nova's scaling properties without
 requiring access to a large amount of hardware resources. It is covered in
 :doc:`nova-fake-driver`.

+.. _nova-consoles:
+
 Consoles
 ========

@@ -59,3 +61,9 @@ The console driver may be selected via ``nova_console`` in ``globals.yml``.
 Valid options are ``none``, ``novnc``, ``spice``, or ``rdp``. Additionally,
 serial console support can be enabled by setting
 ``enable_nova_serialconsole_proxy`` to ``yes``.
+
+Cells
+=====
+
+Information on using Nova Cells V2 to scale out can be found in
+:doc:`nova-cells-guide`.

diff --git a/doc/source/user/multinode.rst b/doc/source/user/multinode.rst
index 64f4a50fbd..00639fff51 100644
--- a/doc/source/user/multinode.rst
+++ b/doc/source/user/multinode.rst
@@ -145,6 +145,54 @@ grouped together and changing these around can break your deployment:

    [haproxy:children]
    network

.. _multinode-host-and-group-variables:

Host and group variables
========================

Typically, Kolla Ansible configuration is stored in the ``globals.yml`` file.
Variables in this file apply to all hosts. In an environment with multiple
hosts, it may become necessary to set variables differently for different
hosts. A common example of this is network interface configuration, e.g.
``api_interface``.

Ansible's host and group variables can be assigned in a `variety of ways
`_.
The simplest is in the inventory file itself:

.. code-block:: ini

   # Host with a host variable.
   [control]
   control01 api_interface=eth3

   # Group with a group variable.
   [control:vars]
   api_interface=eth4

This can quickly become difficult to maintain, so it may be preferable to use
``host_vars`` or ``group_vars`` directories containing YAML files with host or
group variables:

.. code-block:: console

   inventory/
     group_vars/
       control
     host_vars/
       control01
     multinode
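
For example, the ``group_vars/control`` file in the listing above might carry
the same group variable as the inline inventory example, while
``host_vars/control01`` overrides it for a single host (interface names are
illustrative):

.. code-block:: yaml

   # inventory/group_vars/control
   api_interface: eth4

.. code-block:: yaml

   # inventory/host_vars/control01
   api_interface: eth3
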
`Ansible's variable precedence rules
`__
are quite complex, but it is worth becoming familiar with them if using host
and group variables. The playbook group variables in
``ansible/group_vars/all.yml`` define global defaults, and these take
precedence over variables defined in an inventory file and inventory
``group_vars/all``, but not over inventory ``group_vars/*``. Variables in
'extra' files (``globals.yml``) have the highest precedence, so any variables
which must differ between hosts must not be in ``globals.yml``.

Deploying Kolla
===============

diff --git a/doc/source/user/operating-kolla.rst b/doc/source/user/operating-kolla.rst
index 93e636f05f..f8e96ed44a 100644
--- a/doc/source/user/operating-kolla.rst
+++ b/doc/source/user/operating-kolla.rst
@@ -29,6 +29,11 @@ contained in the kolla-ansible package.

 Upgrade procedure
 ~~~~~~~~~~~~~~~~~

+.. note::
+
+   If you have set ``enable_cells`` to ``yes`` then you should read the
+   upgrade notes in the :ref:`Nova cells guide <nova-cells-upgrade>`.
+
 Kolla's strategy for upgrades is to never make a mess and to follow
 consistent patterns during deployment such that upgrades from one environment
 to the next are simple to automate.