Add Distributed Cloud GEO Redundancy docs (r9, dsr8MR3)
- Overview of the feature
- Procedure of the feature configuration

Story: 2010852
Task: 48493

Change-Id: If5fd6792adbb7e77ab2e92f29527c951be0134ee
Signed-off-by: Litao Gao <litao.gao@windriver.com>
Signed-off-by: Ngairangbam Mili <ngairangbam.mili@windriver.com>
Parent: a34b9d46c8
Commit: 2a75cb0a7a
@@ -23,6 +23,8 @@
|
||||
.. |os-prod-hor| replace:: OpenStack |prod-hor|
|
||||
.. |prod-img| replace:: https://mirror.starlingx.windriver.com/mirror/starlingx/
|
||||
.. |prod-abbr| replace:: StX
|
||||
.. |prod-dc-geo-red| replace:: Distributed Cloud Geo Redundancy
|
||||
.. |prod-dc-geo-red-long| replace:: Distributed Cloud System Controller Geographic Redundancy
|
||||
|
||||
.. Guide names; will be formatted in italics by default.
|
||||
.. |node-doc| replace:: :title:`StarlingX Node Configuration and Management`
|
||||
|
@@ -16,6 +16,12 @@ system data backup file has been generated on the subcloud, it will be
|
||||
transferred to the system controller and stored at a dedicated central location
|
||||
``/opt/dc-vault/backups/<subcloud-name>/<release-version>``.
|
||||
|
||||
.. note::
|
||||
|
||||
Enabling the GEO Redundancy function will affect some of the subcloud
|
||||
backup functions. For more information on GEO Redundancy and its
|
||||
restrictions, see :ref:`configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662`.
|
||||
|
||||
Backup data creation requires the subcloud to be online, managed, and in a
|
||||
healthy state.
|
||||
|
||||
|
@@ -0,0 +1,617 @@
|
||||
.. _configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662:
|
||||
|
||||
============================================================
|
||||
Configure Distributed Cloud System Controller GEO Redundancy
|
||||
============================================================
|
||||
|
||||
.. rubric:: |context|
|
||||
|
||||
You can configure Distributed Cloud System Controller GEO Redundancy
|
||||
using DC manager |CLI| commands.
|
||||
|
||||
System administrators can follow the procedures below to enable and
|
||||
disable the GEO Redundancy feature.
|
||||
|
||||
.. note::
|
||||
|
||||
In this release, the GEO Redundancy feature supports only two
|
||||
distributed clouds in one protection group.
|
||||
|
||||
.. contents::
|
||||
:local:
|
||||
:depth: 1
|
||||
|
||||
---------------------
|
||||
Enable GEO Redundancy
|
||||
---------------------
|
||||
|
||||
Set up a protection group for two distributed clouds, making these two
|
||||
distributed clouds operational in 1+1 active GEO Redundancy mode.
|
||||
|
||||
For example, let us assume we have two distributed clouds, site A and site B.
|
||||
When the operation is performed on site A, the local site is site A and the
|
||||
peer site is site B. When the operation is performed on site B, the local
|
||||
site is site B and the peer site is site A.
|
||||
|
||||
.. rubric:: |prereq|
|
||||
|
||||
The |OAM| networks of the peer system controllers are accessible to each other,
and each system controller can access the subclouds via both the |OAM| and
management networks.
|
||||
|
||||
To secure a production system, queries from the peer site must be
authenticated and protected. This requires an HTTPS-based system API, which in
turn requires a well-known and trusted |CA| to enable secure HTTPS
communication between peers. If you are using an internally trusted |CA|,
ensure that the system trusts the |CA| by installing its certificate with the
following command.
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
~(keystone_admin)]$ system certificate-install --mode ssl_ca <trusted-ca-bundle-pem-file>
|
||||
|
||||
where:
|
||||
|
||||
``<trusted-ca-bundle-pem-file>``
|
||||
is the path to the intermediate or Root |CA| certificate associated
|
||||
with the |prod| REST API's Intermediate or Root |CA|-signed certificate.
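
Before installing the bundle, it can help to confirm that the file actually
contains PEM certificates. The following is a minimal sketch and not part of
the product tooling; the function name ``count_pem_certs`` is hypothetical, and
it only counts PEM ``BEGIN CERTIFICATE`` markers without validating the
certificates themselves.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper (not a product command): count the PEM certificate
   # blocks in the bundle file given as the first argument.
   count_pem_certs() {
       # grep -c prints the number of matching lines. It exits non-zero when
       # the count is 0, so "|| true" keeps the function usable under "set -e".
       grep -c -- '-----BEGIN CERTIFICATE-----' "$1" || true
   }

   # Example use (the path is a placeholder):
   #   [ "$(count_pem_certs <trusted-ca-bundle-pem-file>)" -ge 1 ] || echo "no certificates found"

A count of ``0`` suggests that the file is empty or is not PEM encoded, in
which case the installation command above is unlikely to succeed.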
|
||||
|
||||
.. rubric:: |proc|
|
||||
|
||||
You can enable the GEO Redundancy feature between site A and site B from the
command line. In this procedure, the subclouds managed by site A are
configured to be managed by a GEO Redundancy protection group that consists of
site A and site B. If site A goes offline, an alarm notifies the
administrator, who can then initiate the group-based batch migration
to rehome the subclouds of site A to site B for centralized management.
|
||||
|
||||
Similarly, you can configure the subclouds managed by site B to be
taken over by site A when site B is offline by following the same procedure,
with site B as the local site and site A as the peer site.
|
||||
|
||||
#. Log in to the active controller node of site B and get the required
|
||||
information about site B to create a protection group.
|
||||
|
||||
* Unique |UUID| of the central cloud of the peer system controller
|
||||
* URI of Keystone endpoint of peer system controller
|
||||
* Gateway IP address of the management network of peer system controller
|
||||
|
||||
For example:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site B
|
||||
sysadmin@controller-0:~$ source /etc/platform/openrc
|
||||
~(keystone_admin)]$ system show | grep -i uuid
|
||||
| uuid | 223fcb30-909d-4edf-8c36-1aebc8e9bd4a |
|
||||
|
||||
~(keystone_admin)]$ openstack endpoint list --service keystone \
|
||||
--interface public --region RegionOne -c URL
|
||||
+-----------------------------+
|
||||
| URL |
|
||||
+-----------------------------+
|
||||
| http://10.10.10.2:5000 |
|
||||
+-----------------------------+
|
||||
|
||||
~(keystone_admin)]$ system host-route-list controller-0 | awk '{print $10}' | grep -v "^$"
|
||||
gateway
|
||||
10.10.27.1
|
||||
|
||||
#. Log in to the active controller node of the central cloud of site A. Create
|
||||
a System Peer instance of site B on site A so that site A can access information of
|
||||
site B.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site A
|
||||
~(keystone_admin)]$ dcmanager system-peer add \
|
||||
--peer-uuid 223fcb30-909d-4edf-8c36-1aebc8e9bd4a \
|
||||
--peer-name siteB \
|
||||
--manager-endpoint http://10.10.10.2:5000 \
|
||||
--peer-controller-gateway-address 10.10.27.1
|
||||
Enter the admin password for the system peer:
|
||||
Re-enter admin password to confirm:
|
||||
|
||||
+----+--------------------------------------+-----------+-----------------------------+----------------------------+
|
||||
| id | peer uuid | peer name | manager endpoint | controller gateway address |
|
||||
+----+--------------------------------------+-----------+-----------------------------+----------------------------+
|
||||
| 2 | 223fcb30-909d-4edf-8c36-1aebc8e9bd4a | siteB | http://10.10.10.2:5000 | 10.10.27.1 |
|
||||
+----+--------------------------------------+-----------+-----------------------------+----------------------------+
|
||||
|
||||
#. Collect the information from site A.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site A
|
||||
sysadmin@controller-0:~$ source /etc/platform/openrc
|
||||
~(keystone_admin)]$ system show | grep -i uuid
|
||||
~(keystone_admin)]$ openstack endpoint list --service keystone --interface public --region RegionOne -c URL
|
||||
~(keystone_admin)]$ system host-route-list controller-0 | awk '{print $10}' | grep -v "^$"
|
||||
|
||||
#. Log in to the active controller node of the central cloud of site B. Create
|
||||
a System Peer instance of site A on site B so that site B has information about site A.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site B
|
||||
~(keystone_admin)]$ dcmanager system-peer add \
|
||||
--peer-uuid 3963cb21-c01a-49cc-85dd-ebc1d142a41d \
|
||||
--peer-name siteA \
|
||||
--manager-endpoint http://10.10.11.2:5000 \
|
||||
--peer-controller-gateway-address 10.10.25.1
|
||||
Enter the admin password for the system peer:
|
||||
Re-enter admin password to confirm:
|
||||
|
||||
#. Create a |SPG| for site A.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site A
|
||||
~(keystone_admin)]$ dcmanager subcloud-peer-group add --peer-group-name group1
|
||||
|
||||
#. Add the subclouds needed for redundancy protection on site A.
|
||||
|
||||
Ensure that the bootstrap data of the subclouds is up to date. The bootstrap data is
|
||||
the data used to bootstrap the subcloud, which includes the |OAM| and
|
||||
management network information, system controller gateway information, and docker
|
||||
registry information to pull necessary images to bootstrap the system.
|
||||
|
||||
For an example of a typical bootstrap file, see :ref:`installing-and-provisioning-a-subcloud`.
|
||||
|
||||
#. Update the subcloud information with the bootstrap values.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
~(keystone_admin)]$ dcmanager subcloud update subcloud1 \
|
||||
--bootstrap-address <Subcloud_OAM_IP_Address> \
|
||||
--bootstrap-values <Path_of_Bootstrap-Value-File>
|
||||
|
||||
#. Update the subcloud information with the |SPG| created locally.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
~(keystone_admin)]$ dcmanager subcloud update <SiteA-Subcloud1-Name> \
|
||||
--peer-group <SiteA-Subcloud-Peer-Group-ID-or-Name>
|
||||
|
||||
For example:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
~(keystone_admin)]$ dcmanager subcloud update subcloud1 --peer-group group1
|
||||
|
||||
#. If you want to remove one subcloud from the |SPG|, run the
|
||||
following command:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
~(keystone_admin)]$ dcmanager subcloud update <SiteA-Subcloud-Name> --peer-group none
|
||||
|
||||
For example:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
~(keystone_admin)]$ dcmanager subcloud update subcloud1 --peer-group none
|
||||
|
||||
#. Check the subclouds that are under the |SPG|.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
~(keystone_admin)]$ dcmanager subcloud-peer-group list-subclouds <SiteA-Subcloud-Peer-Group-ID-or-Name>
|
||||
|
||||
#. Create an association between the System Peer and |SPG|.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site A
|
||||
~(keystone_admin)]$ dcmanager peer-group-association add \
|
||||
--system-peer-id <SiteB-System-Peer-ID> \
|
||||
--peer-group-id <SiteA-System-Peer-Group1> \
|
||||
--peer-group-priority <priority>
|
||||
|
||||
The ``peer-group-priority`` parameter accepts an integer value greater
than 0. It sets the priority of the |SPG| that is created on the peer site,
using the peer site's dcmanager API, during association synchronization.
|
||||
|
||||
* The default priority in the |SPG| is 0 when it is created
|
||||
in the local site.
|
||||
|
||||
* The smallest integer has the highest priority.
|
||||
|
||||
During association creation, the |SPG| in the association and the subclouds
belonging to that |SPG| are synchronized from the local site to the peer site.
|
||||
|
||||
Confirm that the local |SPG| and its subclouds have been synchronized
|
||||
into site B with the same name.
|
||||
|
||||
* Show the association information just created in site A and ensure that
|
||||
``sync_status`` is ``in-sync``.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site A
|
||||
~(keystone_admin)]$ dcmanager peer-group-association show <Association-ID>
|
||||
|
||||
+----+---------------+----------------+---------+-----------------+---------------------+
|
||||
| id | peer_group_id | system_peer_id | type | sync_status | peer_group_priority |
|
||||
+----+---------------+----------------+---------+-----------------+---------------------+
|
||||
| 1 | 1 | 2 | primary | in-sync | 2 |
|
||||
+----+---------------+----------------+---------+-----------------+---------------------+
|
||||
|
||||
* Show ``subcloud-peer-group`` in site B and ensure that it has been created.
|
||||
|
||||
* List the subclouds in ``subcloud-peer-group`` in site B and ensure that all
|
||||
the subclouds have been synchronized as secondary subclouds.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site B
|
||||
~(keystone_admin)]$ dcmanager subcloud-peer-group show <SiteA-Subcloud-Peer-Group-Name>
|
||||
~(keystone_admin)]$ dcmanager subcloud-peer-group list-subclouds <SiteA-Subcloud-Peer-Group-Name>
|
||||
|
||||
When you create the primary association on site A, a non-primary association
|
||||
on site B will automatically be created to associate the synchronized |SPG|
|
||||
from site A and the system peer pointing to site A.
|
||||
|
||||
You can check the association list to confirm if the non-primary association
|
||||
was created on site B.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site B
|
||||
~(keystone_admin)]$ dcmanager peer-group-association list
|
||||
+----+---------------+----------------+-------------+-------------+---------------------+
|
||||
| id | peer_group_id | system_peer_id | type | sync_status | peer_group_priority |
|
||||
+----+---------------+----------------+-------------+-------------+---------------------+
|
||||
| 2 | 26 | 1 | non-primary | in-sync | None |
|
||||
+----+---------------+----------------+-------------+-------------+---------------------+
|
||||
|
||||
#. (Optional) Update the protection group related configuration.
|
||||
|
||||
After the peer group association has been created, you can still update the
|
||||
related resources configured in the protection group:
|
||||
|
||||
* Update subcloud with bootstrap values
|
||||
* Add subcloud(s) into the |SPG|
|
||||
* Remove subcloud(s) from the |SPG|
|
||||
|
||||
After any of the above operations, ``sync_status`` is changed to ``out-of-sync``.
|
||||
|
||||
After the update has been completed, use the :command:`sync`
command to push the |SPG| changes to the peer site so that the |SPG| stays
in the same state on both sites.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site A
|
||||
~(keystone_admin)]$ dcmanager peer-group-association sync <SiteA-Peer-Group-Association1-ID>
|
||||
|
||||
.. warning::
|
||||
|
||||
The :command:`dcmanager peer-group-association sync` command must be run
|
||||
after any of the following changes:
|
||||
|
||||
- Subcloud is removed from the |SPG| for the subcloud name change.
|
||||
|
||||
- Subcloud is removed from the |SPG| for the subcloud management network
|
||||
reconfiguration.
|
||||
|
||||
- One or both of these subcloud parameters are updated:
``--bootstrap-address``, ``--bootstrap-values``.
|
||||
|
||||
Similarly, confirm that the information has been synchronized by
showing the association information in site A and ensuring that
``sync_status`` is ``in-sync``.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site A
|
||||
~(keystone_admin)]$ dcmanager peer-group-association show <Association-ID>
|
||||
|
||||
+----+---------------+----------------+---------+-----------------+---------------------+
|
||||
| id | peer_group_id | system_peer_id | type | sync_status | peer_group_priority |
|
||||
+----+---------------+----------------+---------+-----------------+---------------------+
|
||||
| 1 | 1 | 2 | primary | in-sync | 2 |
|
||||
+----+---------------+----------------+---------+-----------------+---------------------+
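
The ``sync_status`` checks in the steps above can also be scripted. The
following is a minimal sketch that assumes the table layout shown in the
example outputs, with output lines starting at the first column. The function
name ``association_sync_status`` is hypothetical and is not a dcmanager
command; it only parses text that is piped into it.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: read a peer-group-association table on stdin and
   # print the sync_status value of the first data row.
   association_sync_status() {
       awk -F'|' '
           # Header row: remember which column is named "sync_status".
           /sync_status/ {
               for (i = 1; i <= NF; i++) {
                   name = $i
                   gsub(/^[ \t]+|[ \t]+$/, "", name)
                   if (name == "sync_status") col = i
               }
               next
           }
           # First data row after the header (border rows start with "+").
           col && $0 ~ /^[|]/ {
               value = $col
               gsub(/^[ \t]+|[ \t]+$/, "", value)
               print value
               exit
           }
       '
   }

   # Example use, on site A:
   #   dcmanager peer-group-association show <Association-ID> | association_sync_status

Looking up the column by its header name, rather than by position, keeps the
sketch working if the column order differs from the examples in this section.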
|
||||
|
||||
.. rubric:: |result|
|
||||
|
||||
You have configured a GEO Redundancy protection group between site A and site B.
|
||||
If site A is offline, the subclouds configured in the |SPG| can be
manually migrated in batch to site B for centralized management.
|
||||
|
||||
----------------------------
|
||||
Health Monitor and Migration
|
||||
----------------------------
|
||||
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Peer monitoring and alarming
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
After the peer protection group is formed, if site B cannot connect to
site A, an alarm is raised on site B.
|
||||
|
||||
For example:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site B
|
||||
~(keystone_admin)]$ fm alarm-list
|
||||
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
|
||||
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
|
||||
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
|
||||
| 280.004 | Peer siteA is in disconnected state. Following subcloud peer groups are impacted: group1. | peer=223fcb30-909d-4edf- | major | 2023-08-18T10:25:29. |
|
||||
| | | 8c36-1aebc8e9bd4a | | 670977 |
|
||||
| | | | | |
|
||||
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
|
||||
|
||||
The administrator can suppress the alarm with the following command:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site B
|
||||
~(keystone_admin)]$ fm event-suppress --alarm_id 280.004
|
||||
+----------+------------+
|
||||
| Event ID | Status |
|
||||
+----------+------------+
|
||||
| 280.004 | suppressed |
|
||||
+----------+------------+
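
To check for the peer alarm from a script before deciding how to react, the
alarm table can be filtered by alarm ID. This is a minimal sketch that assumes
the ``fm alarm-list`` table layout shown above, with the alarm ID in the first
column. The function name ``alarm_is_active`` is hypothetical.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: succeed (exit 0) if the alarm ID given as the first
   # argument appears in the first column of a table read from stdin.
   alarm_is_active() {
       awk -F'|' -v id="$1" '
           {
               first = $2
               gsub(/^[ \t]+|[ \t]+$/, "", first)
               # Concatenating "" forces a string comparison, not a numeric one.
               if ((first "") == (id "")) found = 1
           }
           END { exit !found }
       '
   }

   # Example use, on site B:
   #   fm alarm-list | alarm_is_active 280.004 && echo "peer site is disconnected"

The helper only reads text from standard input, so it can also be applied to
saved command output.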
|
||||
|
||||
---------
|
||||
Migration
|
||||
---------
|
||||
|
||||
If site A is down, the administrator, after receiving the alarm message,
|
||||
can choose to perform the migration on site B, which will migrate the
|
||||
subclouds under the |SPG| from site A to site B.
|
||||
|
||||
.. note::
|
||||
|
||||
Before initiating the migration operation, ensure that ``sync_status`` of the
|
||||
peer group association is ``in-sync`` so that the latest updates from site A
|
||||
have been successfully synchronized to site B. If ``sync_status`` is not
|
||||
``in-sync``, the migration may fail.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site B
|
||||
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate <Subcloud-Peer-Group-ID-or-Name>
|
||||
|
||||
# For example:
|
||||
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate group1
|
||||
|
||||
During the batch migration, you can check the status of the migration of each
|
||||
subcloud in the |SPG| by showing the details of the |SPG| being migrated.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site B
|
||||
~(keystone_admin)]$ dcmanager subcloud-peer-group status <Subcloud-Peer-Group-ID-or-Name>
|
||||
|
||||
After successful migration, the subcloud(s) should be in
|
||||
``managed/online/complete`` status on site B.
|
||||
|
||||
For example:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site B
|
||||
~(keystone_admin)]$ dcmanager subcloud list
|
||||
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
|
||||
| id | name | management | availability | deploy status | sync | backup status | backup datetime |
|
||||
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
|
||||
| 45 | subcloud3-node2 | managed | online | complete | in-sync | None | None |
|
||||
| 46 | subcloud1-node6 | managed | online | complete | in-sync | None | None |
|
||||
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
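
The post-migration check can be automated by listing any subcloud that has not
yet reached ``managed/online/complete``. The following is a minimal sketch that
assumes the column order of the example output above (id, name, management,
availability, deploy status). The function name ``subclouds_not_migrated`` is
hypothetical.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: read "dcmanager subcloud list" output on stdin and
   # print the name of every subcloud that is not managed/online/complete.
   subclouds_not_migrated() {
       awk -F'|' '
           # Only data rows start with "|"; border rows start with "+".
           $0 !~ /^[|]/ { next }
           {
               # Fields 3..6 are name, management, availability, deploy status.
               for (i = 3; i <= 6; i++) {
                   f[i] = $i
                   gsub(/^[ \t]+|[ \t]+$/, "", f[i])
               }
               if (f[3] == "name") next    # skip the header row
               if (f[4] != "managed" || f[5] != "online" || f[6] != "complete")
                   print f[3]
           }
       '
   }

   # Example use, on site B:
   #   dcmanager subcloud list | subclouds_not_migrated

Empty output means that every listed subcloud is in the expected state.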
|
||||
|
||||
--------------
|
||||
Post Migration
|
||||
--------------
|
||||
|
||||
When site A is restored, the subcloud(s) should be adjusted to
``unmanaged/secondary`` status in site A. The administrator receives an
alarm on site A indicating that the |SPG| is managed by a peer site (site
B), because this |SPG| has the higher priority on site A.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
~(keystone_admin)]$ fm alarm-list
|
||||
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
|
||||
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
|
||||
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
|
||||
| 280.005 | Subcloud peer group (peer_group_name=group1) is managed by remote system | subcloud_peer_group=7 | warning | 2023-09-04T04:51:58. |
|
||||
| | (peer_uuid=223fcb30-909d-4edf-8c36-1aebc8e9bd4a) with lower priority. | | | 435539 |
|
||||
| | | | | |
|
||||
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
|
||||
|
||||
Then, the administrator can decide if and when to migrate the subcloud(s) back.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site A
|
||||
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate <Subcloud-Peer-Group-ID-or-Name>
|
||||
|
||||
# For example:
|
||||
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate group1
|
||||
|
||||
After successful migration, the subcloud status should be back to the
|
||||
``managed/online/complete`` status.
|
||||
|
||||
For example:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# On site A
~(keystone_admin)]$ dcmanager subcloud list
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
|
||||
| id | name | management | availability | deploy status | sync | backup status | backup datetime |
|
||||
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
|
||||
| 33 | subcloud3-node2 | managed | online | complete | in-sync | None | None |
|
||||
| 34 | subcloud1-node6 | managed | online | complete | in-sync | None | None |
|
||||
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
|
||||
|
||||
Also, the alarm mentioned above will be cleared after migrating back.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
~(keystone_admin)]$ fm alarm-list
|
||||
|
||||
----------------------
|
||||
Disable GEO Redundancy
|
||||
----------------------
|
||||
|
||||
You can disable the GEO Redundancy feature from the command line.
|
||||
|
||||
Before disabling the GEO Redundancy feature, ensure that the environment is
stable and that the subclouds are managed by the expected site.
|
||||
|
||||
.. rubric:: |proc|
|
||||
|
||||
#. Delete the primary association on both sites.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# site A
|
||||
~(keystone_admin)]$ dcmanager peer-group-association delete <SiteA-Peer-Group-Association1-ID>
|
||||
|
||||
#. Delete the |SPG|.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# site A
|
||||
~(keystone_admin)]$ dcmanager subcloud-peer-group delete group1
|
||||
|
||||
#. Delete the system peer.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# site A
|
||||
~(keystone_admin)]$ dcmanager system-peer delete siteB
|
||||
# site B
|
||||
~(keystone_admin)]$ dcmanager system-peer delete siteA
|
||||
|
||||
.. rubric:: |result|
|
||||
|
||||
You have torn down the protection group between site A and site B.
|
||||
|
||||
---------------------------
|
||||
Backup and Restore Subcloud
|
||||
---------------------------
|
||||
|
||||
You can back up and restore a subcloud in a distributed cloud environment.
|
||||
However, GEO Redundancy does not support the replication of subcloud backup
|
||||
files from one site to another.
|
||||
|
||||
A subcloud backup is valid only for the current system controller. When a
|
||||
subcloud is migrated from site A to site B, the existing backup becomes
|
||||
unavailable. In this case, you can create a new backup of that subcloud on site
|
||||
B. Subsequently, you can restore the subcloud from this newly created backup
|
||||
when it is managed under site B.
|
||||
|
||||
For information on how to back up and restore a subcloud, see
|
||||
:ref:`backup-a-subcloud-group-of-subclouds-using-dcmanager-cli-f12020a8fc42`
|
||||
and :ref:`restore-a-subcloud-group-of-subclouds-from-backup-data-using-dcmanager-cli-f10c1b63a95e`.
|
||||
|
||||
-------------------------------------------
|
||||
Operations Performed on Protected Subclouds
|
||||
-------------------------------------------
|
||||
|
||||
The table below lists the operations that can/cannot be performed on the protected subclouds.
|
||||
|
||||
**Primary site**: The site where the |SPG| was created.
|
||||
|
||||
**Secondary site**: The peer site where the subclouds in the |SPG| can be migrated to.
|
||||
|
||||
**Protected subcloud**: The subcloud that belongs to a |SPG|.
|
||||
|
||||
**Local/Unprotected subcloud**: The subcloud that does not belong to any |SPG|.
|
||||
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Operation | Allow (Y/N/Maybe) | Note |
|
||||
+==========================================+==================================+=================================================================================================+
|
||||
| Unmanage | N | Subcloud must be removed from the |SPG| before it can be manually unmanaged. |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Manage | N | Subcloud must be removed from the |SPG| before it can be manually managed. |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Delete | N | Subcloud must be removed from the |SPG| before it can be manually unmanaged |
|
||||
| | | and deleted. |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Update | Maybe | Subcloud can only be updated while it is managed in the primary site because the sync command |
|
||||
| | | can only be issued from the system controller where the |SPG| was created. |
|
||||
| | | |
|
||||
| | | .. warning:: |
|
||||
| | | |
|
||||
| | | The subcloud network cannot be reconfigured while it is being managed by the secondary |
|
||||
| | | site. If this operation is necessary, perform the following steps: |
|
||||
| | | |
|
||||
| | | #. Remove the subcloud from the |SPG| to make it a local/unprotected |
|
||||
| | | subcloud. |
|
||||
| | | #. Update the subcloud. |
|
||||
| | | #. (Optional) Manually rehome the subcloud to the primary site after it is restored. |
|
||||
| | | #. (Optional) Re-add the subcloud to the |SPG|. |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Rename | Y | - If the subcloud in the primary site is already a part of |SPG|, we need to remove it from the |
|
||||
| | | |SPG| and then unmanage, rename, and manage the subcloud, and add it back to |SPG| and perform |
|
||||
| | | the sync operation. |
|
||||
| | | |
|
||||
| | | - If the subcloud is in the secondary site, perform the following steps: |
|
||||
| | | |
|
||||
| | | #. Remove the subcloud from the |SPG| to make it a local/unprotected subcloud. |
|
||||
| | | |
|
||||
| | | #. Unmanage the subcloud. |
|
||||
| | | |
|
||||
| | | #. Rename the subcloud. |
|
||||
| | | |
|
||||
| | | #. (Optional) Manually rehome the subcloud to the primary site after it is restored. |
|
||||
| | | |
|
||||
| | | #. (Optional) Re-add the subcloud to the |SPG|. |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Patch | Y | .. warning:: |
|
||||
| | | |
|
||||
| | | There may be a patch out-of-sync alarm when the subcloud is migrated to another site. |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Upgrade | Y | All the system controllers in the protection group must be upgraded first before upgrading |
|
||||
| | | any of the subclouds. |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Rehome | N | Subcloud cannot be manually rehomed while being part of the |SPG| |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Backup | Y | |
|
||||
| | | |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Restore | Maybe | - If the subcloud in the primary site is already a part of |SPG|, we need to remove it from the |
|
||||
| | | |SPG| and then unmanage and restore the subcloud, and add it back to |SPG| and perform |
|
||||
| | | the sync operation. |
|
||||
| | | |
|
||||
| | | - If the subcloud is in the secondary site, perform the following steps: |
|
||||
| | | |
|
||||
| | | #. Remove the subcloud from the |SPG| to make it a local/unprotected subcloud. |
|
||||
| | | |
|
||||
| | | #. Unmanage the subcloud. |
|
||||
| | | |
|
||||
| | | #. Restore the subcloud from the backup. |
|
||||
| | | |
|
||||
| | | #. (Optional) Manually rehome the subcloud to the primary site after it is restored. |
|
||||
| | | |
|
||||
| | | #. (Optional) Re-add the subcloud to the |SPG|. |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Prestage | Y | .. warning:: |
|
||||
| | | |
|
||||
| | | The prestage data will get overwritten because it is not guaranteed that both the system |
|
||||
| | | controllers always run on the same patch level (ostree repo) and/or have the same images |
|
||||
| | | list. |
|
||||
| | | |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Reinstall | Y | |
|
||||
| | | |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Remove from |SPG| | Maybe | Subcloud can be removed from the |SPG| in the primary site. Subcloud can |
|
||||
| | | only be removed from the |SPG| in the secondary site if the primary site is |
|
||||
| | | currently down. |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
| Add to |SPG| | Maybe | Subcloud can only be added to the |SPG| in the primary site as manual sync is required. |
|
||||
| | | |
|
||||
| | | |
|
||||
+------------------------------------------+----------------------------------+-------------------------------------------------------------------------------------------------+
|
||||
|
||||
|
||||
|
||||
|
||||
|
BIN
doc/source/dist_cloud/kubernetes/figures/dcg1695034653874.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 34 KiB |
@ -175,6 +175,16 @@ Upgrade Orchestration for Distributed Cloud SubClouds
|
||||
failure-prior-to-the-installation-of-n-plus-1-load-on-a-subcloud
|
||||
failure-during-the-installation-or-data-migration-of-n-plus-1-load-on-a-subcloud
|
||||
|
||||
--------------------------------------------------
|
||||
Distributed Cloud System Controller GEO Redundancy
|
||||
--------------------------------------------------
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
overview-of-distributed-cloud-geo-redundancy
|
||||
configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662
|
||||
|
||||
--------
|
||||
Appendix
|
||||
--------
|
||||
|
@ -0,0 +1,118 @@
|
||||
|
||||
.. eho1558617205547
|
||||
.. _overview-of-distributed-cloud-geo-redundancy:
|
||||
|
||||
============================================
|
||||
Overview of Distributed Cloud GEO Redundancy
|
||||
============================================
|
||||
|
||||
|prod-long| |prod-dc-geo-red| configuration supports the ability to recover from
a catastrophic event that requires subclouds to be rehomed away from the failed
system controller site to the available site(s) which have enough spare capacity.
This way, even if the failed site cannot be restored in a short time, the subclouds
can still be rehomed to available peer system controller(s) for centralized
management.
|
||||
|
||||
In this configuration, the following items are addressed:
|
||||
|
||||
* 1+1 GEO redundancy

  - Active-Active redundancy model

  - Total number of subclouds should not exceed 1K

* Automated operations

  - Synchronization and liveness check between peer systems

  - Alarm generation if the peer system controller is down

* Manual operations

  - Batch rehoming from the alive peer system controller
|
||||
|
||||
---------------------------------------------
|
||||
Distributed Cloud GEO Redundancy Architecture
|
||||
---------------------------------------------
|
||||
|
||||
The 1+1 Distributed Cloud GEO Redundancy architecture consists of two local high
availability Distributed Cloud clusters. They are mutual peers that form a
protection group, as illustrated in the figure below:
|
||||
|
||||
.. image:: figures/dcg1695034653874.png
|
||||
|
||||
The architecture features a synchronized distributed control plane for
geographic redundancy, where a system peer instance is created in each local
Distributed Cloud cluster. The instances point to each other via keystone
endpoints to form a system protection group.
|
||||
|
||||
If the administrator wants the peer site to take over the subclouds when the local
system controller is in a failure state, an |SPG| needs to be created and subclouds
need to be assigned to it. Then, a Peer Group Association needs to be created
to link the system peer and the |SPG| together. The |SPG| information and the
subclouds in it will be synchronized to the peer site via the endpoint information
stored in the system peer instance.
|
||||
|
||||
The peer sites perform health checks via the endpoint information stored in the
system peer instance. If the local site detects that the peer site is not reachable,
it will raise an alarm to alert the administrator.
|
||||
|
||||
If the failed site cannot be restored quickly, the administrator needs to
|
||||
initiate batch subcloud migration by performing migration on the |SPG| from the
|
||||
healthy peer of the failed site.
|
||||
|
||||
When the failed site has been restored and is ready for service, the administrator
can initiate the batch subcloud migration from the restored site to migrate
all the subclouds in the |SPG| back for geographic proximity.
|
||||
|
||||
**Protection Group**
A group of peer sites, which are configured to monitor each other and decide
how to take over the subclouds (based on the predefined |SPG|) if any peer in
the group fails.
|
||||
|
||||
**System Peer**
|
||||
A logical entity that is created in a system controller site. The system controller
site uses the information (keystone endpoint, credentials) stored in the system
peer for health checks and data synchronization.
|
||||
|
||||
**Subcloud Secondary Deploy State**
|
||||
This is a newly introduced state for a subcloud. If a subcloud is in the secondary
deploy state, the subcloud instance is only a placeholder holding the configuration
parameters, which can be used to migrate the corresponding subcloud from the peer
site. After rehoming, the subcloud's state changes from secondary to complete,
and the subcloud is managed by the local site. The subcloud instance on the peer
site is changed to secondary.
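
The state change described above can be sketched as follows. This is a minimal
illustration of the documented behaviour only, not the dcmanager
implementation: the ``rehome`` function, the site names, and the dictionary
used to hold deploy states are all hypothetical.

.. code-block:: python

   SECONDARY = "secondary"
   COMPLETE = "complete"


   def rehome(deploy_state, subcloud, local_site, peer_site):
       """Swap deploy states when local_site takes over a subcloud.

       deploy_state maps (site, subcloud) to a state string. This layout
       is an assumption made for the example.
       """
       # Only a secondary placeholder can be migrated to the local site.
       if deploy_state.get((local_site, subcloud)) != SECONDARY:
           raise ValueError(f"{subcloud} is not secondary on {local_site}")
       # The local instance becomes the managed one ...
       deploy_state[(local_site, subcloud)] = COMPLETE
       # ... and the instance on the peer site becomes the placeholder.
       deploy_state[(peer_site, subcloud)] = SECONDARY
       return deploy_state


   states = {
       ("site-a", "subcloud1"): COMPLETE,
       ("site-b", "subcloud1"): SECONDARY,
   }
   rehome(states, "subcloud1", local_site="site-b", peer_site="site-a")

After the call, ``subcloud1`` is ``complete`` on ``site-b`` and ``secondary``
on ``site-a``, which mirrors the swap described in the paragraph above.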
|
||||
|
||||
**Subcloud Peer Group**
|
||||
A group of locally managed subclouds, which is duplicated into a
peer site as secondary subclouds. The |SPG| instance is also created in the
peer site and contains all the duplicated secondary subclouds.
|
||||
|
||||
Multiple |SPGs| are supported and the membership of each |SPG| is decided by the
administrator. This way, the administrator can divide local subclouds into
different groups.
|
||||
|
||||
An |SPG| can be used to initiate subcloud batch migration. For example, when the
peer site has been detected to be down, and the local site is supposed to take
over the management of the subclouds in the failed peer site, the administrator can
perform |SPG| migration to migrate all the subclouds in the |SPG| to the local
site for centralized management.
|
||||
|
||||
**Subcloud Peer Group Priority**
|
||||
The priority is an attribute of the |SPG| instance, and the |SPG| is designed to be
synchronized to each peer site in the protection group with a different priority
value.
|
||||
|
||||
In a Protection Group, there can be multiple System Peers. The site which owns
|
||||
the |SPG| with the highest priority (smallest value) is the
|
||||
leader site, which needs to initiate the batch migration to take over the
|
||||
subclouds grouped by the |SPG|.
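
The priority rule above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the dcmanager implementation: the
``PeerGroupCopy`` class, the ``leader_site`` function, and the site and group
names are hypothetical. The only behaviour taken from the text is that the
smallest priority value marks the leader site.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass(frozen=True)
   class PeerGroupCopy:
       """One site's copy of a subcloud peer group (hypothetical model)."""

       site: str
       group_name: str
       priority: int  # a smaller value means a higher priority


   def leader_site(copies):
       """Return the site whose copy has the smallest priority value."""
       if not copies:
           return None
       # min() keeps the first entry on a tie; tie handling is an assumption.
       return min(copies, key=lambda copy: copy.priority).site


   copies = [
       PeerGroupCopy(site="site-b", group_name="spg-east", priority=1),
       PeerGroupCopy(site="site-a", group_name="spg-east", priority=0),
   ]
   leader = leader_site(copies)

Here ``site-a`` is selected because its copy of the group carries the smallest
priority value.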
|
||||
|
||||
**Subcloud Peer Group and System Peer Association**
|
||||
Association refers to the binding relationship between an |SPG| and a system peer.
When the association between an |SPG| and a system peer is created on the local site,
the |SPG| and the subclouds in the group will be duplicated to the peer site to
which the system peer in this association is pointing. This way, when the local
site is down, the peer site has enough information to initiate the |SPG|-based batch
migration to take over the centralized management of subclouds previously
managed by the failed site.
|
||||
|
||||
One system peer can be associated with multiple |SPGs|. One |SPG| can be associated
with multiple system peers, with a priority specified. This priority is used to
decide which |SPG| has the higher priority to take over the subclouds when batch
migration should be performed.
|
@ -17,6 +17,12 @@ controller using the rehoming playbook.
|
||||
The rehoming playbook does not work with freshly installed/bootstrapped
|
||||
subclouds.
|
||||
|
||||
.. note::
|
||||
|
||||
   Manual rehoming is not possible if a subcloud is included in an |SPG|.
   Use the :command:`dcmanager subcloud-peer-group migrate` command for automatic
   rehoming. For more information, see :ref:`configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662`.
|
||||
|
||||
.. note::
|
||||
|
||||
The system time should be accurately configured on the system controllers
|
||||
@ -27,7 +33,7 @@ controller using the rehoming playbook.
|
||||
Do not rehome a subcloud if the RECONCILED status on the system resource or
|
||||
any host resource of the subcloud is FALSE. To check the RECONCILED status,
|
||||
run the :command:`kubectl -n deployment get system` and :command:`kubectl -n deployment get hosts` commands.
|
||||
|
||||
|
||||
Use the following procedure to enable subcloud rehoming and to update the new
|
||||
subcloud configuration (networking parameters, passwords, etc.) to be
|
||||
compatible with the new system controller.
|
||||
|