Configure Distributed Cloud System Controller GEO Redundancy
You can configure a distributed cloud System Controller GEO Redundancy using DC manager commands.
System administrators can follow the procedures below to enable and disable the GEO Redundancy feature.
Note
In this release, the GEO Redundancy feature supports only two distributed clouds in one protection group.
Enable GEO Redundancy
Set up a protection group for two distributed clouds, making these two distributed clouds operational in 1+1 active GEO Redundancy mode.
For example, let us assume we have two distributed clouds, site A and site B. When the operation is performed on site A, the local site is site A and the peer site is site B. When the operation is performed on site B, the local site is site B and the peer site is site A.
The peer system controllers' OAM networks must be accessible to each other, and each system controller must be able to access the subclouds via both the OAM and management networks.
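As a quick sanity check, you can verify from each site that the peer's Keystone endpoint is reachable over the OAM network; this is only a sketch, and the addresses shown are the example values used later in this procedure.
# On site A, confirm that site B's public Keystone endpoint responds (example address)
~(keystone_admin)]$ curl -sS http://10.10.10.2:5000/v3
# On site B, repeat the check toward site A's endpoint (for example, http://10.10.11.2:5000/v3)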
For the security of a production system, it is important to ensure the safety and identification of peer site queries. To meet this objective, it is essential to have an HTTPS-based system API in place. This requires a well-known and trusted CA to enable secure HTTPS communication between peers. If you are using an internally trusted CA, ensure that the system trusts that CA by installing its certificate with the following command.
~(keystone_admin)]$ system certificate-install --mode ssl_ca <trusted-ca-bundle-pem-file>
where:
<trusted-ca-bundle-pem-file>
is the path to the Intermediate or Root CA certificate associated with the REST API's Intermediate or Root CA-signed certificate.
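To confirm that the CA certificate has been installed, you can list the installed platform certificates; this is only a quick check, and the output columns vary by release.
# Look for the ssl_ca entry corresponding to the installed trusted CA bundle
~(keystone_admin)]$ system certificate-list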
You can enable the GEO Redundancy feature between site A and site B from the command line. In this procedure, the subclouds managed by site A are configured to be managed by a GEO Redundancy protection group that consists of site A and site B. When site A is offline for some reason, an alarm notifies the administrator, who initiates a group-based batch migration to rehome the subclouds of site A to site B for centralized management.
Similarly, you can configure the subclouds managed by site B to be taken over by site A when site B is offline by following the same procedure, with site B as the local site and site A as the peer site.
Log in to the active controller node of site B and collect the information about site B required to create a protection group.
- Unique UUID of the central cloud of the peer system controller
- URI of Keystone endpoint of peer system controller
- Gateway IP address of the management network of peer system controller
For example:
# On site B
sysadmin@controller-0:~$ source /etc/platform/openrc
~(keystone_admin)]$ system show | grep -i uuid
| uuid | 223fcb30-909d-4edf-8c36-1aebc8e9bd4a |
~(keystone_admin)]$ openstack endpoint list --service keystone \
--interface public --region RegionOne -c URL
+------------------------+
| URL                    |
+------------------------+
| http://10.10.10.2:5000 |
+------------------------+
~(keystone_admin)]$ system host-route-list controller-0 | awk '{print $10}' | grep -v "^$"
gateway
10.10.27.1
Log in to the active controller node of the central cloud of site A. Create a System Peer instance of site B on site A so that site A can access information of site B.
# On site A
~(keystone_admin)]$ dcmanager system-peer add \
--peer-uuid 223fcb30-909d-4edf-8c36-1aebc8e9bd4a \
--peer-name siteB \
--manager-endpoint http://10.10.10.2:5000 \
--peer-controller-gateway-address 10.10.27.1
Enter the admin password for the system peer:
Re-enter admin password to confirm:
+----+--------------------------------------+-----------+------------------------+----------------------------+
| id | peer uuid                            | peer name | manager endpoint       | controller gateway address |
+----+--------------------------------------+-----------+------------------------+----------------------------+
| 2  | 223fcb30-909d-4edf-8c36-1aebc8e9bd4a | siteB     | http://10.10.10.2:5000 | 10.10.27.1                 |
+----+--------------------------------------+-----------+------------------------+----------------------------+
Collect the information from site A.
# On site A
sysadmin@controller-0:~$ source /etc/platform/openrc
~(keystone_admin)]$ system show | grep -i uuid
~(keystone_admin)]$ openstack endpoint list --service keystone --interface public --region RegionOne -c URL
~(keystone_admin)]$ system host-route-list controller-0 | awk '{print $10}' | grep -v "^$"
Log in to the active controller node of the central cloud of site B. Create a System Peer instance of site A on site B so that site B has information about site A.
# On site B
~(keystone_admin)]$ dcmanager system-peer add \
--peer-uuid 3963cb21-c01a-49cc-85dd-ebc1d142a41d \
--peer-name siteA \
--manager-endpoint http://10.10.11.2:5000 \
--peer-controller-gateway-address 10.10.25.1
Enter the admin password for the system peer:
Re-enter admin password to confirm:
Create a subcloud peer group for site A.
# On site A
~(keystone_admin)]$ dcmanager subcloud-peer-group add --peer-group-name group1
Add the subclouds needed for redundancy protection on site A.
Ensure that the subcloud's bootstrap data is up to date. The bootstrap data is the data used to bootstrap the subcloud, which includes the OAM and management network information, the system controller gateway information, and the Docker registry information needed to pull the images required to bootstrap the system.
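For illustration only, a minimal bootstrap values file might look like the following; the key names follow typical subcloud bootstrap files, but the exact keys, addresses, and registry entries depend on your release and deployment (see the reference below).
# Hypothetical bootstrap values file for subcloud1 (all values are examples only)
sysadmin@controller-0:~$ cat subcloud1-bootstrap-values.yaml
name: subcloud1
description: "subcloud1 of site A"
management_subnet: 192.168.101.0/24
management_start_address: 192.168.101.2
management_end_address: 192.168.101.50
management_gateway_address: 192.168.101.1
external_oam_subnet: 10.10.26.0/24
external_oam_gateway_address: 10.10.26.1
external_oam_floating_address: 10.10.26.10
systemcontroller_gateway_address: 10.10.25.1
docker_registries:
  k8s.gcr.io:
    url: registry.central:9001/k8s.gcr.io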
For an example of a typical bootstrap file, see installing-and-provisioning-a-subcloud.
Update the subcloud information with the bootstrap values.
~(keystone_admin)]$ dcmanager subcloud update subcloud1 \
--bootstrap-address <Subcloud_OAM_IP_Address> \
--bootstrap-values <Path_of_Bootstrap-Value-File>
Update the subcloud information with the subcloud peer group created locally.
~(keystone_admin)]$ dcmanager subcloud update <SiteA-Subcloud1-Name> \
--peer-group <SiteA-Subcloud-Peer-Group-ID-or-Name>
For example,
~(keystone_admin)]$ dcmanager subcloud update subcloud1 --peer-group group1
If you want to remove a subcloud from the subcloud peer group, run the following command:
~(keystone_admin)]$ dcmanager subcloud update <SiteA-Subcloud-Name> --peer-group none
For example,
~(keystone_admin)]$ dcmanager subcloud update subcloud1 --peer-group none
Check the subclouds that are in the subcloud peer group.
~(keystone_admin)]$ dcmanager subcloud-peer-group list-subclouds <SiteA-Subcloud-Peer-Group-ID-or-Name>
Create an association between the System Peer and the subcloud peer group.
# On site A
~(keystone_admin)]$ dcmanager peer-group-association add \
--system-peer-id <SiteB-System-Peer-ID> \
--peer-group-id <SiteA-System-Peer-Group1> \
--peer-group-priority <priority>
The peer-group-priority parameter accepts an integer value greater than 0. It sets the priority of the subcloud peer group that is created in the peer site, using the peer site's dcmanager API, during association synchronization.
- The default priority of the subcloud peer group is 0 when it is created in the local site.
- The smallest integer has the highest priority.
During the association creation, the subcloud peer group in the association is synchronized from the local site to the peer site, along with the subclouds that belong to it.
Confirm that the local subcloud peer group and its subclouds have been synchronized into site B with the same names.
Show the association information just created in site A and ensure that sync_status is in-sync.
# On site A
~(keystone_admin)]$ dcmanager peer-group-association show <Association-ID>
+----+---------------+----------------+---------+-------------+---------------------+
| id | peer_group_id | system_peer_id | type    | sync_status | peer_group_priority |
+----+---------------+----------------+---------+-------------+---------------------+
| 1  | 1             | 2              | primary | in-sync     | 2                   |
+----+---------------+----------------+---------+-------------+---------------------+
Show the subcloud-peer-group in site B and ensure that it has been created.
List the subclouds in the subcloud-peer-group in site B and ensure that all the subclouds have been synchronized as secondary subclouds.
# On site B
~(keystone_admin)]$ dcmanager subcloud-peer-group show <SiteA-Subcloud-Peer-Group-Name>
~(keystone_admin)]$ dcmanager subcloud-peer-group list-subclouds <SiteA-Subcloud-Peer-Group-Name>
When you create the primary association on site A, a non-primary association is automatically created on site B to associate the subcloud peer group synchronized from site A with the system peer pointing to site A.
You can check the association list to confirm if the non-primary association was created on site B.
# On site B
~(keystone_admin)]$ dcmanager peer-group-association list
+----+---------------+----------------+-------------+-------------+---------------------+
| id | peer_group_id | system_peer_id | type        | sync_status | peer_group_priority |
+----+---------------+----------------+-------------+-------------+---------------------+
| 2  | 26            | 1              | non-primary | in-sync     | None                |
+----+---------------+----------------+-------------+-------------+---------------------+
(Optional) Update the protection group related configuration.
After the peer group association has been created, you can still update the related resources configured in the protection group:
- Update subcloud with bootstrap values
- Add subcloud(s) to the subcloud peer group
- Remove subcloud(s) from the subcloud peer group
After any of the above operations, sync_status changes to out-of-sync.
After the update is complete, use the sync command to push the changes to the peer site so that the subcloud peer group stays in the same state on both sites.
# On site A
~(keystone_admin)]$ dcmanager peer-group-association sync <SiteA-Peer-Group-Association1-ID>
Warning
The dcmanager peer-group-association sync command must be run after any of the following changes:
- A subcloud is removed from the subcloud peer group for a subcloud name change.
- A subcloud is removed from the subcloud peer group for a subcloud management network reconfiguration.
- A subcloud updates one or both of the --bootstrap-address and --bootstrap-values parameters.
Similarly, verify that the information has been synchronized by showing the association information in site A and ensuring that sync_status is in-sync.
# On site A
~(keystone_admin)]$ dcmanager peer-group-association show <Association-ID>
+----+---------------+----------------+---------+-------------+---------------------+
| id | peer_group_id | system_peer_id | type    | sync_status | peer_group_priority |
+----+---------------+----------------+---------+-------------+---------------------+
| 1  | 1             | 2              | primary | in-sync     | 2                   |
+----+---------------+----------------+---------+-------------+---------------------+
You have configured a GEO Redundancy protection group between site A and site B. If site A goes offline, the subclouds configured in the subcloud peer group can be manually migrated in a batch to site B for centralized management.
Health Monitor and Migration
Peer monitoring and alarming
After the peer protection group is formed, if site A cannot be reached from site B, an alarm is raised on site B.
For example:
# On site B
~(keystone_admin)]$ fm alarm-list
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
| 280.004 | Peer siteA is in disconnected state. Following subcloud peer groups are impacted: group1. | peer=223fcb30-909d-4edf- | major | 2023-08-18T10:25:29. |
| | | 8c36-1aebc8e9bd4a | | 670977 |
| | | | | |
+----------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+----------+--------------------------+
The administrator can suppress the alarm with the following command:
# On site B
~(keystone_admin)]$ fm event-suppress --alarm_id 280.004
+----------+------------+
| Event ID | Status |
+----------+------------+
| 280.004 | suppressed |
+----------+------------+
Migration
If site A is down, after receiving the alarm message the administrator can choose to perform the migration on site B, which migrates the subclouds in the subcloud peer group from site A to site B.
Note
Before initiating the migration operation, ensure that the sync_status of the peer group association is in-sync, so that the latest updates from site A have been successfully synchronized to site B. If sync_status is not in-sync, the migration may fail.
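For example, you can re-check the association from site B before starting the migration, using the same command shown earlier:
# On site B, confirm that sync_status shows in-sync before migrating
~(keystone_admin)]$ dcmanager peer-group-association list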
# On site B
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate <Subcloud-Peer-Group-ID-or-Name>
# For example:
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate group1
During the batch migration, you can check the migration status of each subcloud in the subcloud peer group by showing the details of the subcloud peer group being migrated.
# On site B
~(keystone_admin)]$ dcmanager subcloud-peer-group status <Subcloud-Peer-Group-ID-or-Name>
After successful migration, the subcloud(s) should be in managed/online/complete status on site B.
For example:
# On site B
~(keystone_admin)]$ dcmanager subcloud list
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
| id | name | management | availability | deploy status | sync | backup status | backup datetime |
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
| 45 | subcloud3-node2 | managed | online | complete | in-sync | None | None |
| 46 | subcloud1-node6 | managed | online | complete | in-sync | None | None |
+----+---------------------------------+------------+--------------+---------------+-------------+---------------+-----------------+
Post Migration
If site A is restored, the subcloud(s) should be adjusted to unmanaged/secondary status in site A. The administrator receives an alarm on site A notifying that the subcloud peer group is managed by a peer site (site B), because this subcloud peer group on site A has the higher priority.
~(keystone_admin)]$ fm alarm-list
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
| 280.005 | Subcloud peer group (peer_group_name=group1) is managed by remote system | subcloud_peer_group=7 | warning | 2023-09-04T04:51:58. |
| | (peer_uuid=223fcb30-909d-4edf-8c36-1aebc8e9bd4a) with lower priority. | | | 435539 |
| | | | | |
+----------+-------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------+-----------------------+
Then, the administrator can decide if and when to migrate the subcloud(s) back.
# On site A
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate <Subcloud-Peer-Group-ID-or-Name>
# For example:
~(keystone_admin)]$ dcmanager subcloud-peer-group migrate group1
After successful migration, the subclouds should be back in the managed/online/complete status.
For example:
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
| id | name | management | availability | deploy status | sync | backup status | backup datetime |
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
| 33 | subcloud3-node2 | managed | online | complete | in-sync | None | None |
| 34 | subcloud1-node6 | managed | online | complete | in-sync | None | None |
+----+---------------------------------+------------+--------------+---------------+---------+---------------+-----------------+
Also, the alarm mentioned above will be cleared after migrating back.
~(keystone_admin)]$ fm alarm-list
Disable GEO Redundancy
You can disable the GEO Redundancy feature from the command line.
Ensure that the environment is stable before disabling the GEO Redundancy feature, and that the subclouds are managed by the expected site.
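For example, you can confirm on each site which subclouds it currently manages before tearing down the protection group:
# Run on both site A and site B to confirm subcloud management status
~(keystone_admin)]$ dcmanager subcloud list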
Delete the primary association on both sites.
# site A
~(keystone_admin)]$ dcmanager peer-group-association delete <SiteA-Peer-Group-Association1-ID>
Delete the subcloud peer group.
# site A
~(keystone_admin)]$ dcmanager subcloud-peer-group delete group1
Delete the system peer.
# site A
~(keystone_admin)]$ dcmanager system-peer delete siteB
# site B
~(keystone_admin)]$ dcmanager system-peer delete siteA
You have torn down the protection group between site A and site B.
Backup and Restore Subcloud
You can back up and restore a subcloud in a distributed cloud environment. However, GEO Redundancy does not support replicating subcloud backup files from one site to another.
A subcloud backup is valid only for the current system controller. When a subcloud is migrated from site A to site B, the existing backup becomes unavailable. In this case, you can create a new backup of that subcloud on site B. Subsequently, you can restore the subcloud from this newly created backup when it is managed under site B.
For information on how to back up and restore a subcloud, see backup-a-subcloud-group-of-subclouds-using-dcmanager-cli-f12020a8fc42 and restore-a-subcloud-group-of-subclouds-from-backup-data-using-dcmanager-cli-f10c1b63a95e.
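As a rough sketch only, creating a new backup of a migrated subcloud on site B and later restoring it could look like the following; the subcloud name is illustrative, and additional options (for example, passwords or backup values files) may be required as described in the referenced procedures.
# On site B, back up the subcloud after it has been migrated and is managed/online
~(keystone_admin)]$ dcmanager subcloud-backup create --subcloud subcloud1-node6
# Later, restore the subcloud from the new backup while it is managed under site B
~(keystone_admin)]$ dcmanager subcloud-backup restore --subcloud subcloud1-node6 \
--restore-values <Path_of_Restore-Values-File>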
Operations Performed by Protected Subclouds
The table below lists the operations that can/cannot be performed on the protected subclouds.
Primary site: The site where the subcloud peer group was created.
Secondary site: The peer site to which the subclouds in the subcloud peer group can be migrated.
Protected subcloud: A subcloud that belongs to a subcloud peer group.
Local/Unprotected subcloud: A subcloud that does not belong to any subcloud peer group.
Operation | Allow (Y/N/Maybe) | Note |
---|---|---|
Unmanage | | Subcloud must be removed from the subcloud peer group before it can be manually unmanaged. |
Manage | | Subcloud must be removed from the subcloud peer group before it can be manually managed. |
Delete | | Subcloud must be removed from the subcloud peer group before it can be manually unmanaged and deleted. |
Update | | Subcloud can only be updated while it is managed in the primary site, because the sync command can only be issued from the system controller where the subcloud peer group was created. Warning: The subcloud network cannot be reconfigured while the subcloud is being managed by the secondary site. If this operation is necessary, perform the following steps: |
Rename | | |
Patch | | Warning: There may be a patch out-of-sync alarm when the subcloud is migrated to another site. |
Upgrade | | All the system controllers in the protection group must be upgraded before upgrading any of the subclouds. |
Rehome | | Subcloud cannot be manually rehomed while it is part of the subcloud peer group. |
Backup | | |
Restore | | |
Prestage | | Warning: The prestage data will get overwritten because it is not guaranteed that both system controllers always run on the same patch level (ostree repo) and/or have the same image list. |
Reinstall | | If the subcloud in the primary site is already part of a subcloud peer group, you need to remove it from the subcloud peer group, unmanage and reinstall the subcloud, then add it back to the subcloud peer group and perform the sync operation. If the subcloud is in the secondary site, perform the following steps: |
Remove from subcloud peer group | | Subcloud can be removed from the subcloud peer group in the primary site. Subcloud can only be removed from the subcloud peer group in the secondary site if the primary site is currently down. |
Add to subcloud peer group | | Subcloud can only be added to the subcloud peer group in the primary site, as a manual sync is required. |
Note
After migrating the subcloud, kube-rootca_sync_status may become out-of-sync if it is not synchronized with the new system controller. To update the root certificate of the subcloud, run the dcmanager kube-rootca-update-strategy command and pass the kube-root cert from the new system controller. However, if you update the certificate and migrate the subcloud back to the primary site, the certificate needs to be updated again.
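The following is an illustrative sketch only; the dcmanager kube-rootca-update-strategy workflow and the exact option used to supply the certificate (shown here as a hypothetical --cert-file placeholder) vary by release, so consult the orchestrated Kubernetes root CA update documentation for the authoritative syntax.
# On the new system controller, create, apply, and monitor a kube root CA update strategy;
# --cert-file is a hypothetical placeholder for the option that supplies the kube-root cert.
~(keystone_admin)]$ dcmanager kube-rootca-update-strategy create --cert-file <Path_to_Kube_RootCA_Cert>
~(keystone_admin)]$ dcmanager kube-rootca-update-strategy apply
~(keystone_admin)]$ dcmanager kube-rootca-update-strategy show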