Configure Distributed Cloud System Controller GEO Redundancy
You can configure a distributed cloud System Controller GEO Redundancy using DC manager commands.
System administrators can follow the procedures below to enable and disable the GEO Redundancy feature.
Note
In this release, the GEO Redundancy feature supports only two distributed clouds in one protection group.
Enable GEO Redundancy
Set up a protection group for two distributed clouds, making these two distributed clouds operational in 1+1 active GEO Redundancy mode.
For example, let us assume we have two distributed clouds, site A and site B. When the operation is performed on site A, the local site is site A and the peer site is site B. When the operation is performed on site B, the local site is site B and the peer site is site A.
The peer system controllers' OAM networks must be accessible to each other, and each system controller must be able to access the subclouds via both the OAM and management networks.
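As a quick sanity check, you can verify from each site that the peer's Keystone endpoint is reachable over the OAM network; this is only a sketch, and the addresses shown are the example values used later in this procedure.
# On site A, confirm that site B's public Keystone endpoint responds (example address)
~(keystone_admin)]$ curl -sS http://10.10.10.2:5000/v3
# On site B, repeat the check toward site A's endpoint (for example, http://10.10.11.2:5000/v3)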
For the security of a production system, it is important to ensure the safety and identification of peer site queries. To meet this objective, it is essential to have an HTTPS-based system API in place. This requires a well-known and trusted CA to enable secure HTTPS communication between peers. If you are using an internally trusted CA, ensure that the system trusts that CA by installing its certificate with the following command.
~(keystone_admin)]$ system certificate-install --mode ssl_ca <trusted-ca-bundle-pem-file>
where:
<trusted-ca-bundle-pem-file>
is the path to the Intermediate or Root CA certificate associated with the REST API's Intermediate or Root CA-signed certificate.
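To confirm that the CA certificate has been installed, you can list the installed platform certificates; this is only a quick check, and the output columns vary by release.
# Look for the ssl_ca entry corresponding to the installed trusted CA bundle
~(keystone_admin)]$ system certificate-list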
You can enable the GEO Redundancy feature between site A and site B from the command line. In this procedure, the subclouds managed by site A are configured to be managed by a GEO Redundancy protection group that consists of site A and site B. When site A is offline for some reason, an alarm notifies the administrator, who initiates a group-based batch migration to rehome the subclouds of site A to site B for centralized management.
Similarly, you can configure the subclouds managed by site B to be taken over by site A when site B is offline by following the same procedure, with site B as the local site and site A as the peer site.
Log in to the active controller node of site B and collect the information about site B required to create a protection group.
- Unique UUID of the central cloud of the peer system controller
- URI of Keystone endpoint of peer system controller
- Gateway IP address of the management network of peer system controller
For example:
# On site B
sysadmin@controller-0:~$ source /etc/platform/openrc
~(keystone_admin)]$ system show | grep -i uuid
| uuid | 223fcb30-909d-4edf-8c36-1aebc8e9bd4a |
~(keystone_admin)]$ openstack endpoint list --service keystone \
--interface public --region RegionOne -c URL
+------------------------+
| URL                    |
+------------------------+
| http://10.10.10.2:5000 |
+------------------------+
~(keystone_admin)]$ system host-route-list controller-0 | awk '{print $10}' | grep -v "^$"
gateway
10.10.27.1
Log in to the active controller node of the central cloud of site A. Create a System Peer instance of site B on site A so that site A can access information of site B.
# On site A
~(keystone_admin)]$ dcmanager system-peer add \
--peer-uuid 223fcb30-909d-4edf-8c36-1aebc8e9bd4a \
--peer-name siteB \
--manager-endpoint http://10.10.10.2:5000 \
--peer-controller-gateway-address 10.10.27.1
Enter the admin password for the system peer:
Re-enter admin password to confirm:
+----+--------------------------------------+-----------+------------------------+----------------------------+
| id | peer uuid                            | peer name | manager endpoint       | controller gateway address |
+----+--------------------------------------+-----------+------------------------+----------------------------+
| 2  | 223fcb30-909d-4edf-8c36-1aebc8e9bd4a | siteB     | http://10.10.10.2:5000 | 10.10.27.1                 |
+----+--------------------------------------+-----------+------------------------+----------------------------+
Collect the information from site A.
# On site A
sysadmin@controller-0:~$ source /etc/platform/openrc
~(keystone_admin)]$ system show | grep -i uuid
~(keystone_admin)]$ openstack endpoint list --service keystone --interface public --region RegionOne -c URL
~(keystone_admin)]$ system host-route-list controller-0 | awk '{print $10}' | grep -v "^$"
Log in to the active controller node of the central cloud of site B. Create a System Peer instance of site A on site B so that site B has information about site A.
# On site B
~(keystone_admin)]$ dcmanager system-peer add \
--peer-uuid 3963cb21-c01a-49cc-85dd-ebc1d142a41d \
--peer-name siteA \
--manager-endpoint http://10.10.11.2:5000 \
--peer-controller-gateway-address 10.10.25.1
Enter the admin password for the system peer:
Re-enter admin password to confirm:
Create a subcloud peer group for site A.
# On site A
~(keystone_admin)]$ dcmanager subcloud-peer-group add --peer-group-name group1
Add the subclouds needed for redundancy protection on site A.
Ensure that the subcloud's bootstrap data is up to date. The bootstrap data is the data used to bootstrap the subcloud, which includes the OAM and management network information, the system controller gateway information, and the Docker registry information needed to pull the images required to bootstrap the system.
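For illustration only, a minimal bootstrap values file might look like the following; the key names follow typical subcloud bootstrap files, but the exact keys, addresses, and registry entries depend on your release and deployment (see the reference below).
# Hypothetical bootstrap values file for subcloud1 (all values are examples only)
sysadmin@controller-0:~$ cat subcloud1-bootstrap-values.yaml
name: subcloud1
description: "subcloud1 of site A"
management_subnet: 192.168.101.0/24
management_start_address: 192.168.101.2
management_end_address: 192.168.101.50
management_gateway_address: 192.168.101.1
external_oam_subnet: 10.10.26.0/24
external_oam_gateway_address: 10.10.26.1
external_oam_floating_address: 10.10.26.10
systemcontroller_gateway_address: 10.10.25.1
docker_registries:
  k8s.gcr.io:
    url: registry.central:9001/k8s.gcr.io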
For an example of a typical bootstrap file, see installing-and-provisioning-a-subcloud.
Update the subcloud information with the bootstrap values.
~(keystone_admin)]$ dcmanager subcloud update subcloud1 \
--bootstrap-address <Subcloud_OAM_IP_Address> \
--bootstrap-values <Path_of_Bootstrap-Value-File>
Update the subcloud information with the subcloud peer group created locally.
~(keystone_admin)]$ dcmanager subcloud update <SiteA-Subcloud1-Name> \
--peer-group <SiteA-Subcloud-Peer-Group-ID-or-Name>
For example,
~(keystone_admin)]$ dcmanager subcloud update subcloud1 --peer-group group1
If you want to remove a subcloud from the subcloud peer group, run the following command:
~(keystone_admin)]$ dcmanager subcloud update <SiteA-Subcloud-Name> --peer-group none
For example,
~(keystone_admin)]$ dcmanager subcloud update subcloud1 --peer-group none
Check the subclouds that are in the subcloud peer group.
~(keystone_admin)]$ dcmanager subcloud-peer-group list-subclouds <SiteA-Subcloud-Peer-Group-ID-or-Name>
Create an association between the System Peer and the subcloud peer group.
# On site A
~(keystone_admin)]$ dcmanager peer-group-association add \
--system-peer-id <SiteB-System-Peer-ID> \
--peer-group-id <SiteA-System-Peer-Group1> \
--peer-group-priority <priority>
The peer-group-priority parameter accepts an integer value greater than 0. It sets the priority of the subcloud peer group that is created in the peer site, using the peer site's dcmanager API, during association synchronization.
- The default priority of the subcloud peer group is 0 when it is created in the local site.
- The smallest integer has the highest priority.
During the association creation, the subcloud peer group in the association is synchronized from the local site to the peer site, along with the subclouds that belong to it.
Confirm that the local subcloud peer group and its subclouds have been synchronized into site B with the same names.
Show the association information just created in site A and ensure that sync_status is in-sync.
# On site A
~(keystone_admin)]$ dcmanager peer-group-association show <Association-ID>
+----+---------------+----------------+---------+-------------+---------------------+
| id | peer_group_id | system_peer_id | type    | sync_status | peer_group_priority |
+----+---------------+----------------+---------+-------------+---------------------+
| 1  | 1             | 2              | primary | in-sync     | 2                   |
+----+---------------+----------------+---------+-------------+---------------------+
Show the subcloud-peer-group in site B and ensure that it has been created.
List the subclouds in the subcloud-peer-group in site B and ensure that all the subclouds have been synchronized as secondary subclouds.
# On site B
~(keystone_admin)]$ dcmanager subcloud-peer-group show <SiteA-Subcloud-Peer-Group-Name>
~(keystone_admin)]$ dcmanager subcloud-peer-group list-subclouds <SiteA-Subcloud-Peer-Group-Name>
When you create the primary association on site A, a non-primary association is automatically created on site B to associate the subcloud peer group synchronized from site A with the system peer pointing to site A.
You can check the association list to confirm if the non-primary association was created on site B.
# On site B
~(keystone_admin)]$ dcmanager peer-group-association list
+----+---------------+----------------+-------------+-------------+---------------------+
| id | peer_group_id | system_peer_id | type        | sync_status | peer_group_priority |
+----+---------------+----------------+-------------+-------------+---------------------+
| 2  | 26            | 1              | non-primary | in-sync     | None                |
+----+---------------+----------------+-------------+-------------+---------------------+
(Optional) Update the protection group related configuration.
After the peer group association has been created, you can still update the related resources configured in the protection group:
- Update subcloud with bootstrap values
- Add subcloud(s) to the subcloud peer group
- Remove subcloud(s) from the subcloud peer group
After any of the above operations, sync_status changes to out-of-sync.
After the update is complete, use the sync command to push the changes to the peer site so that the subcloud peer group stays in the same state on both sites.
# On site A
~(keystone_admin)]$ dcmanager peer-group-association sync <SiteA-Peer-Group-Association1-ID>
Warning
The dcmanager peer-group-association sync command must be run after any of the following changes:
- A subcloud is removed from the subcloud peer group for a subcloud name change.
- A subcloud is removed from the subcloud peer group for a subcloud management network reconfiguration.
- A subcloud updates one or both of the --bootstrap-address and --bootstrap-values parameters.
Similarly, verify that the information has been synchronized by showing the association information in site A and ensuring that sync_status is in-sync.
# On site A
~(keystone_admin)]$ dcmanager peer-group-association show <Association-ID>
+----+---------------+----------------+---------+-------------+---------------------+
| id | peer_group_id | system_peer_id | type    | sync_status | peer_group_priority |
+----+---------------+----------------+---------+-------------+---------------------+
| 1  | 1             | 2              | primary | in-sync     | 2                   |
+----+---------------+----------------+---------+-------------+---------------------+
You have configured a GEO Redundancy protection group between site A and site B. If site A goes offline, the subclouds configured in the subcloud peer group can be manually migrated in a batch to site B for centralized management.
Health Monitor and Migration
Peer monitoring and alarming
After the peer protection group is formed, if site A cannot be reached from site B, an alarm is raised on site B.
For example:
# On site B
~(keystone_admin)]$ fm alarm-list
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
| 280.004 | Peer siteA is in disconnected state. Following subcloud peer groups are impacted: group1. | peer=223fcb30-909d-4edf- | major | 2023-08-18T10:25:29. |
| | | 8c36-1aebc8e9bd4a | | 670977 |
| | | | | |
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
The administrator can suppress the alarm with the following command:
# On site B
~(keystone_admin)]$ fm event-suppress --alarm_id 280.004
+----------+------------+
| Event ID | Status |
+----------+------------+
| 280.004 | suppressed |
+----------+------------+
Migration
If site A is down, after receiving the alarm message the administrator can choose to perform the migration on site B, which migrates the subclouds in the subcloud peer group from site A to site B.
Note
Before initiating the migration operation, ensure that the sync_status of the peer group association is in-sync, so that the latest updates from site A have been successfully synchronized to site B. If sync_status is not in-sync, the migration may fail.
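For example, you can re-check the association from site B before starting the migration, using the same command shown earlier:
# On site B, confirm that sync_status shows in-sync before migrating
~(keystone_admin)]$ dcmanager peer-group-association list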
# On site B
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate <Subcloud-Peer-Group-ID-or-Name>
# For example:
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate group1
During the batch migration, you can check the migration status of each subcloud in the subcloud peer group by showing the details of the subcloud peer group being migrated.
# On site B
~(keystone_admin)]$ dcmanager subcloud-peer-group status <Subcloud-Peer-Group-ID-or-Name>
After successful migration, the subcloud(s) should be in managed/online/complete status on site B.
For example:
# On site B
~(keystone_admin)]$ dcmanager subcloud list
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
| id | name | management | availability | deploy status | sync | backup status | backup datetime |
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
| 45 | subcloud3-node2 | managed | online | complete | in-sync | None | None |
| 46 | subcloud1-node6 | managed | online | complete | in-sync | None | None |
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
Post Migration
If site A is restored, the subcloud(s) should be adjusted to unmanaged/secondary status in site A. The administrator receives an alarm on site A notifying that the subcloud peer group is managed by a peer site (site B), because this subcloud peer group on site A has the higher priority.
~(keystone_admin)]$ fm alarm-list
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
| 280.005 | Subcloud peer group (peer_group_name=group1) is managed by remote system | subcloud_peer_group=7 | warning | 2023-09-04T04:51:58. |
| | (peer_uuid=223fcb30-909d-4edf-8c36-1aebc8e9bd4a) with lower priority. | | | 435539 |
| | | | | |
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
Then, the administrator can decide if and when to migrate the subcloud(s) back.
# On site A
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate <Subcloud-Peer-Group-ID-or-Name>
# For example:
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate group1
After successful migration, the subclouds should be back in the managed/online/complete status.
For example:
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
| id | name | management | availability | deploy status | sync | backup status | backup datetime |
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
| 33 | subcloud3-node2 | managed | online | complete | in-sync | None | None |
| 34 | subcloud1-node6 | managed | online | complete | in-sync | None | None |
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
Also, the alarm mentioned above will be cleared after migrating back.
~(keystone_admin)]$ fm alarm-list
Disable GEO Redundancy
You can disable the GEO Redundancy feature from the command line.
Ensure that the environment is stable before disabling the GEO Redundancy feature, and that the subclouds are managed by the expected site.
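For example, you can confirm on each site which subclouds it currently manages before tearing down the protection group:
# Run on both site A and site B to confirm subcloud management status
~(keystone_admin)]$ dcmanager subcloud list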
Delete the primary association on both sites.
# site A
~(keystone_admin)]$ dcmanager peer-group-association delete <SiteA-Peer-Group-Association1-ID>
Delete the subcloud peer group.
# site A
~(keystone_admin)]$ dcmanager subcloud-peer-group delete group1
Delete the system peer.
# site A
~(keystone_admin)]$ dcmanager system-peer delete siteB
# site B
~(keystone_admin)]$ dcmanager system-peer delete siteA
You have torn down the protection group between site A and site B.
Backup and Restore Subcloud
You can back up and restore a subcloud in a distributed cloud environment. However, GEO Redundancy does not support replicating subcloud backup files from one site to another.
A subcloud backup is valid only for the current system controller. When a subcloud is migrated from site A to site B, the existing backup becomes unavailable. In this case, you can create a new backup of that subcloud on site B. Subsequently, you can restore the subcloud from this newly created backup when it is managed under site B.
For information on how to back up and restore a subcloud, see backup-a-subcloud-group-of-subclouds-using-dcmanager-cli-f12020a8fc42 and restore-a-subcloud-group-of-subclouds-from-backup-data-using-dcmanager-cli-f10c1b63a95e.
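As a rough sketch only, creating a new backup of a migrated subcloud on site B and later restoring it could look like the following; the subcloud name is illustrative, and additional options (for example, passwords or backup values files) may be required as described in the referenced procedures.
# On site B, back up the subcloud after it has been migrated and is managed/online
~(keystone_admin)]$ dcmanager subcloud-backup create --subcloud subcloud1-node6
# Later, restore the subcloud from the new backup while it is managed under site B
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud subcloud1-node6 \
--restore-values <Path_of_Restore-Values-File>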
Operations Performed by Protected Subclouds
The table below lists the operations that can/cannot be performed on the protected subclouds.
Primary site: The site where the subcloud peer group was created.
Secondary site: The peer site to which the subclouds in the subcloud peer group can be migrated.
Protected subcloud: A subcloud that belongs to a subcloud peer group.
Local/Unprotected subcloud: A subcloud that does not belong to any subcloud peer group.
Operation | Allow (Y/N/Maybe) | Note |
---|---|---|
Unmanage | | Subcloud must be removed from the subcloud peer group before it can be manually unmanaged. |
Manage | | Subcloud must be removed from the subcloud peer group before it can be manually managed. |
Delete | | Subcloud must be removed from the subcloud peer group before it can be manually unmanaged and deleted. |
Update | | Subcloud can only be updated while it is managed in the primary site, because the sync command can only be issued from the system controller where the subcloud peer group was created. Warning: The subcloud network cannot be reconfigured while the subcloud is being managed by the secondary site. If this operation is necessary, perform the following steps: |
Rename | | |
Patch | | Warning: There may be a patch out-of-sync alarm when the subcloud is migrated to another site. |
Upgrade | | All the system controllers in the protection group must be upgraded before upgrading any of the subclouds. |
Rehome | | Subcloud cannot be manually rehomed while it is part of the subcloud peer group. |
Backup | | |
Restore | | |
Prestage | | Warning: The prestage data will get overwritten because it is not guaranteed that both system controllers always run on the same patch level (ostree repo) and/or have the same image list. |
Reinstall | | If the subcloud in the primary site is already part of a subcloud peer group, you need to remove it from the subcloud peer group, unmanage and reinstall the subcloud, then add it back to the subcloud peer group and perform the sync operation. If the subcloud is in the secondary site, perform the following steps: |
Remove from subcloud peer group | | Subcloud can be removed from the subcloud peer group in the primary site. Subcloud can only be removed from the subcloud peer group in the secondary site if the primary site is currently down. |
Add to subcloud peer group | | Subcloud can only be added to the subcloud peer group in the primary site, as a manual sync is required. |
Note
After migrating the subcloud, kube-rootca_sync_status may become out-of-sync if it is not synchronized with the new system controller. To update the root certificate of the subcloud, run the dcmanager kube-rootca-update-strategy command and pass the kube-root cert from the new system controller. However, if you update the certificate and migrate the subcloud back to the primary site, the certificate needs to be updated again.
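The following is an illustrative sketch only; the dcmanager kube-rootca-update-strategy workflow and the exact option used to supply the certificate (shown here as a hypothetical --cert-file placeholder) vary by release, so consult the orchestrated Kubernetes root CA update documentation for the authoritative syntax.
# On the new system controller, create, apply, and monitor a kube root CA update strategy;
# --cert-file is a hypothetical placeholder for the option that supplies the kube-root cert.
~(keystone_admin)]$ dcmanager kube-rootca-update-strategy create --cert-file <Path_to_Kube_RootCA_Cert>
~(keystone_admin)]$ dcmanager kube-rootca-update-strategy apply
~(keystone_admin)]$ dcmanager kube-rootca-update-strategy show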