Merge "Rehoming a Subcloud"

Zuul 2021-06-09 13:43:22 +00:00 committed by Gerrit Code Review
commit 37ef706610
10 changed files with 382 additions and 116 deletions

doc/.vscode/settings.json

@ -0,0 +1,3 @@
{
    "restructuredtext.confPath": ""
}


@ -3,3 +3,6 @@
.. begin-prereqs
.. end-prereqs
.. begin-ref-1
.. end-ref-1


@ -0,0 +1,3 @@
.. rehoming-begin
.. rehoming-end


@ -28,6 +28,7 @@ Installation
installing-and-provisioning-a-subcloud
installing-a-subcloud-using-redfish-platform-management-service
installing-a-subcloud-without-redfish-platform-management-service
reinstalling-a-subcloud-with-redfish-platform-management-service
---------
Operation
@ -50,6 +51,7 @@ Operation
updating-docker-registry-credentials-on-a-subcloud
migrate-an-aiosx-subcloud-to-an-aiodx-subcloud
restoring-subclouds-from-backupdata-using-dcmanager
rehoming-a-subcloud
----------------------
Manage Subcloud Groups


@ -28,6 +28,7 @@ subcloud, the subcloud installation has these phases:
.. note::
After a successful remote installation of a subcloud in a Distributed Cloud
system, a subsequent remote reinstallation fails because of an existing ssh
key entry in the /root/.ssh/known\_hosts on the System Controller. In this
@ -50,6 +51,7 @@ subcloud, the subcloud installation has these phases:
command\).
.. note::
This is required only once and does not have to be done for every
subcloud install.
@ -72,10 +74,8 @@ subcloud, the subcloud installation has these phases:
#. At the subcloud location, physically install the servers and network
connectivity required for the subcloud.
.. See |inst-doc|: :ref:`Preparing Servers <preparing-servers>` for more
information.
.. note::
Do not power off the servers. The host portion of the server can be
powered off, but the |BMC| portion of the server must be powered and
accessible from the System Controller.
@ -83,16 +83,22 @@ subcloud, the subcloud installation has these phases:
There is no need to wipe the disks.
.. note::
The servers require connectivity to a gateway router that provides IP
routing between the subcloud management subnet and the System Controller
management subnet, and between the subcloud |OAM| subnet and the
System Controller subnet.
.. include:: ../_includes/installing-a-subcloud-using-redfish-platform-management-service.rest
:start-after: begin-ref-1
:end-before: end-ref-1
#. Create the install-values.yaml file and use the content to pass the file
into the :command:`dcmanager subcloud add` command, using the
:command:`--install-values` command option.
.. note::
If your controller is on a ZTSystems Triton server that requires a
longer timeout value, you can now use the rd.net.timeout.ipv6dad dracut
parameter to specify an increased timeout value for dracut to wait for


@ -30,6 +30,7 @@ subcloud, the subcloud installation process has two phases:
.. note::
After a successful remote installation of a subcloud in a Distributed Cloud
system, a subsequent remote reinstallation fails because of an existing ssh
key entry in the /root/.ssh/known\_hosts on the System Controller. In this
@ -51,14 +52,17 @@ subcloud, the subcloud installation process has two phases:
#. At the subcloud location, physically install the servers and network
connectivity required for the subcloud.
.. See |inst-doc|: :ref:`Preparing Servers <preparing-servers>`.
.. note::
The servers require connectivity to a gateway router that provides IP
routing between the subcloud management subnet and the System
Controller management subnet, and between the subcloud OAM subnet and
the System Controller subnet.
.. include:: ../_includes/installing-a-subcloud-without-redfish-platform-management-service.rest
:start-after: begin-ref-1
:end-before: end-ref-1
#. Update the ISO image to modify installation boot parameters \(if
required\), automatically select boot menu options and add a kickstart file
to automatically perform configurations such as configuring the initial IP
@ -132,13 +136,15 @@ subcloud, the subcloud installation process has two phases:
#. At the subcloud location, install the |prod| software from a USB
device or a |PXE| Boot Server on the server designated as controller-0.
See |inst-doc| instructions for preparing servers.
.. include:: ../_includes/installing-a-subcloud-without-redfish-platform-management-service.rest
:start-after: begin-ref-1
:end-before: end-ref-1
#. At the subcloud location, verify that the |OAM| interface on the subcloud
controller has been properly configured by the kickstart file added to the
ISO.
Log in to the subcloud's controller-0 and ping the Central Cloud's floating
|OAM| IP Address.
#. Log in to the subcloud's controller-0 and ping the Central Cloud's floating
|OAM| IP Address.
#. At the System Controller, create a
@ -203,6 +209,7 @@ subcloud, the subcloud installation process has two phases:
password: <your_wrs-aws.io_password>
.. note::
If you have a reason not to use the Central Cloud's local registry you
can pull the images from another local private docker registry.


@ -26,111 +26,3 @@ Platform Management Service.
.. include:: ../_includes/installing-and-provisioning-a-subcloud.rest
:start-after: begin-shared-nic
:end-before: end-shared-nic
-------------------------------------------------------------
Reinstall a Subcloud with Redfish Platform Management Service
-------------------------------------------------------------
For subclouds with servers that support Redfish Virtual Media Service
\(version 1.2 or higher\), you can use the Central Cloud's CLI to re-install
the ISO and bootstrap subclouds from the Central Cloud.
.. caution::
All applications and data on the subcloud will be lost after re-installation.
.. rubric:: |context|
The subcloud reinstallation has two phases:
Executing the dcmanager subcloud reinstall command in the Central Cloud:
- Uses Redfish Virtual Media Service to remote install the ISO on controller-0
in the subcloud.
- Uses Ansible to bootstrap |prod| on controller-0.
.. rubric:: |prereq|
- The install values are required for subcloud reinstallation. By default,
install values are stored in the database after a subcloud installation or
upgrade, and the reinstallation will re-use the install values. If you want
to update the install values, use the following CLI command in the Central
Cloud.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud update subcloud1 --install-values install-values.yaml --bmc-password <password>
For more information on install-values.yaml file, see :ref:`Installing a Subcloud Using Redfish Platform Management Service
<installing-a-subcloud-using-redfish-platform-management-service>`.
You can only reinstall the subcloud with the same software version as the
Central Cloud.
- Check the subcloud's availability in the Central Cloud, for example,
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+---------+
| id | name      | management | availability | deploy status | sync    |
+----+-----------+------------+--------------+---------------+---------+
| 1  | subcloud1 | unmanaged  | offline      | complete      | unknown |
+----+-----------+------------+--------------+---------------+---------+
As the reinstall will cause data and application loss, it is not necessary
and not recommended to reinstall a healthy subcloud. The dcmanager rejects
the reinstallation of a managed or online subcloud.
.. rubric:: |proc|
#. Execute the reinstall using the CLI. For example,
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud reinstall subcloud1
#. Confirm the reinstall of the subcloud.
You are prompted to enter ``reinstall`` to confirm the reinstallation.
.. warning::
This will reinstall the subcloud. All applications and data on the
subcloud will be lost.
Please type ``reinstall`` to confirm: reinstall
Any other input will abort the reinstallation.
#. At the Central Cloud, monitor the progress of the subcloud installation
and bootstrapping by using the deploy status field of the dcmanager
subcloud list command, for example,
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+---------+
| id | name      | management | availability | deploy status | sync    |
+----+-----------+------------+--------------+---------------+---------+
| 1  | subcloud1 | unmanaged  | offline      | installing    | unknown |
+----+-----------+------------+--------------+---------------+---------+
For more information on the deploy status field, see :ref:`Installing a Subcloud Using Redfish Platform Management Service
<installing-a-subcloud-using-redfish-platform-management-service>`.
You can also monitor detailed logging of the subcloud installation and
bootstrapping by monitoring the following log files on the active
controller in the Central Cloud.
- /var/log/dcmanager/subcloud_name_install_date_stamp.log
- /var/log/dcmanager/subcloud_name_bootstrap_date_stamp.log
#. After the subcloud is successfully reinstalled and bootstrapped, use the
following command to reconfigure the subcloud, **subcloud reconfig**.
For more information, see :ref:`Managing Subclouds Using the CLI
<managing-subclouds-using-the-cli>`.


@ -0,0 +1,235 @@
.. _rehoming-a-subcloud:
=================
Rehome a Subcloud
=================
|release-caveat|
When the System Controller needs to be reinstalled, or when the subclouds from
multiple System Controllers are being consolidated into a single System
Controller, you can add already deployed subclouds to a different System
Controller using the rehoming playbook.
.. note::
The rehoming playbook does not work with freshly installed/bootstrapped
subclouds.
Use the following procedure to enable subcloud rehoming and to update the new
subcloud configuration \(networking parameters, passwords, etc.\) to be
compatible with the new System Controller.
.. rubric:: |context|
Rehoming a subcloud has six phases:
#. Unmanage the subcloud from the previous System Controller.
.. note::
You can skip this phase if the previous System Controller is no longer
running or is unable to connect to the subcloud.
#. Update the admin password on the subcloud to match the new System
Controller, if required.
#. Run the :command:`subcloud add` command with the ``--migrate`` option on
the new System Controller. This will update the System Controller and
connect to the subcloud to update the appropriate configuration parameters.
#. On the subcloud, lock/unlock the subcloud controller(s) to enable the new
configuration.
#. Use the :command:`dcmanager subcloud list` command to check the status
of the subcloud, and ensure the subcloud is online and complete before
managing the subcloud.
#. On the new System Controller, set the subcloud to "managed" and wait for it
to sync.
.. rubric:: |prereq|
- Ensure that the subcloud management subnet, oam_floating_address,
oam_node_0_address and oam_node_1_address \(if applicable\) do not overlap
addresses already being used by the new System Controller or any of its
subclouds.
- Ensure that the subcloud has been backed up, in case something goes wrong
and a subcloud system recovery is required.
- Transfer the yaml file that was used to bootstrap the subcloud prior to
rehoming, to the new System Controller. This data is required for rehoming.
- If the subcloud can be remotely installed via Redfish Virtual Media service,
transfer the yaml file that contains the install data for this subcloud,
and use this install data in the new System Controller, via the
``--install-values`` option, when running the remote subcloud reinstall,
upgrade or restore commands.
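The address overlap prerequisite above can be checked with a short script before rehoming. The following is a minimal sketch using Python's standard ``ipaddress`` module. The function name ``find_overlaps``, its arguments, and any addresses passed to it are hypothetical placeholders; the list of subnets already in use must be collected from your own System Controller and its subclouds.

```python
import ipaddress


def find_overlaps(candidate_subnet, candidate_addresses, used_subnets):
    """Return a list of conflict descriptions; an empty list means no overlap.

    candidate_subnet: subcloud management subnet in CIDR form (assumed input).
    candidate_addresses: dict mapping a label (for example
        "oam_floating_address") to an IP address string (assumed input).
    used_subnets: CIDR strings already in use on the new System Controller
        or its subclouds (assumed input, gathered by the operator).
    """
    conflicts = []
    subnet = ipaddress.ip_network(candidate_subnet)
    used = [ipaddress.ip_network(s) for s in used_subnets]

    # Compare the candidate subnet against every subnet already in use.
    for net in used:
        if subnet.version == net.version and subnet.overlaps(net):
            conflicts.append(f"subnet {subnet} overlaps {net}")

    # Check each individual address against every subnet already in use.
    for name, addr in candidate_addresses.items():
        ip = ipaddress.ip_address(addr)
        for net in used:
            if ip.version == net.version and ip in net:
                conflicts.append(f"{name} {ip} falls inside {net}")

    return conflicts
```

An empty result only means that no overlap was found among the values supplied; it does not replace a review of the actual network plan.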
.. rubric:: |proc|
#. If the previous System Controller is running, use the following command to
ensure that it does not try to change subcloud configuration while you are
modifying it to be compatible with the new System Controller.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud unmanage <subcloud_name>
#. Ensure that the subcloud's bootstrap values file is available on the new
System Controller. If required, in the subcloud's bootstrap values file
update the **systemcontroller_gateway_address** entry to point to the
appropriate network gateway for the new System Controller to communicate
with the subcloud.
#. If the admin password of the subcloud does not match the admin password of
the new System Controller, use the following command to change the subcloud
admin password. This step is done on the subcloud that is being migrated.
.. code-block:: none
~(keystone_admin)]$ openstack user password set
.. note::
You will need to specify the old and the new password.
#. For an |AIO-DX| subcloud, ensure that the active controller is
controller-0. If controller-1 is currently active, perform a host-swact of
controller-1 to make controller-0 active.
.. code-block:: none
~(keystone_admin)]$ system host-swact controller-1
#. Lock controller-1 of the subcloud.
.. code-block:: none
~(keystone_admin)]$ system host-lock controller-1
#. On the new System Controller, use the following command to start the
rehoming process.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud add --migrate --bootstrap-address <subcloud-controller-0-oam-address> --bootstrap-values <bootstrap_values_file> [--install-values <install_values_file>]
The subcloud deploy status will change to "pre-rehome" and if the
preliminary steps complete successfully it will change to "rehoming".
At this point an Ansible playbook will run and update the appropriate
configuration data in the subcloud. You can query the status by running
the :command:`dcmanager subcloud show` command. Once the subcloud has been
updated, the subcloud deploy status will change to "complete".
.. note::
The ``--install-values`` option is not mandatory for subcloud
rehoming. However, you can opt to save these values in the new System
Controller as part of the rehoming process so that future operations
that involve remote reinstallation of the subcloud (e.g. reinstall,
upgrade, restore) can be performed for a rehomed subcloud similar to
other existing Redfish capable subclouds in the Distributed Cloud.
**Delete the "image:" line from the install-values file, if it exists, so
that the image is correctly located based on the new System Controller
configuration**.
#. For an |AIO-SX| subcloud, use the following commands to lock/unlock the
controller \(wait for the lock to complete before unlocking the controller\).
.. code-block:: none
~(keystone_admin)]$ system host-lock controller-0
~(keystone_admin)]$ system host-unlock controller-0
For an |AIO-DX| subcloud, first, use the following command to unlock
controller-1.
.. code-block:: none
~(keystone_admin)]$ system host-unlock controller-1
#. Wait until controller-1 is unlocked/online/available, then use the
following command to switch activity to controller-1.
.. code-block:: none
~(keystone_admin)]$ system host-swact controller-0
#. After the swact is complete, use the following commands to lock/unlock
controller-0.
.. code-block:: none
~(keystone_admin)]$ system host-lock controller-0
~(keystone_admin)]$ system host-unlock controller-0
#. Wait until controller-0 is unlocked/online/available, then switch
activity back to controller-0.
#. Perform a swact on controller-1.
.. code-block:: none
~(keystone_admin)]$ system host-swact controller-1
Wait until the swact is complete.
#. Use the :command:`dcmanager subcloud list` command to display the status of
the subcloud, and ensure the subcloud is online and complete before
managing the subcloud.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+---------+
| id | name      | management | availability | deploy status | sync    |
+----+-----------+------------+--------------+---------------+---------+
| 1  | subcloud1 | unmanaged  | online       | complete      | unknown |
+----+-----------+------------+--------------+---------------+---------+
#. Use the following command to "manage" the subcloud. Run this command on
the new System Controller.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud manage <subcloud-name>
#. The new System Controller will audit the subcloud and determine whether it
is in-sync with the System Controller.
.. only:: partner
.. include:: /_includes/rehoming-a-subcloud.rest
:start-after: rehoming-begin
:end-before: rehoming-end
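The check that a subcloud is online and complete before it is managed can be scripted against the table printed by :command:`dcmanager subcloud list`. The following is a minimal sketch that assumes the output has the ASCII table layout shown in the procedure above; the function names ``parse_subcloud_table`` and ``ready_to_manage`` are hypothetical and are not part of dcmanager.

```python
def parse_subcloud_table(text):
    """Parse an ASCII table (as printed by `dcmanager subcloud list`)
    into a list of dicts keyed by the header cells."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines, shell prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data line is the column header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def ready_to_manage(rows, name):
    """Return True only if the named subcloud is online and complete."""
    for row in rows:
        if row.get("name") == name:
            return (
                row.get("availability") == "online"
                and row.get("deploy status") == "complete"
            )
    return False  # subcloud not listed
```

Feeding the captured command output to ``parse_subcloud_table`` and then calling ``ready_to_manage(rows, "subcloud1")`` gives a simple gate before running :command:`dcmanager subcloud manage`.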
**Error Recovery**
If the subcloud rehoming process begins successfully (the status changes to
"rehoming") but a transient fault prevents step 5 from completing
successfully, manual error recovery is required.
The first stage of error recovery is to delete the subcloud from
the new System Controller and re-attempt rehoming using the following commands:
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud delete <subcloud-name>
~(keystone_admin)]$ dcmanager subcloud add --migrate --bootstrap-address <subcloud-controller-0-oam-address> --bootstrap-values <bootstrap_values_file> [--install-values <install_values_file>]
If the second attempt fails, contact Wind River Customer
Support at https://www.windriver.com/support.
If all attempts fail, restore the subcloud from backups, which reverts the
subcloud to its original state prior to rehoming.
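The instruction in the procedure to delete the "image:" line from the install-values file before passing it to ``--install-values`` can be done by hand or with a small helper. The following is a minimal sketch; it assumes the file is plain YAML text in which ``image:`` appears as a single top-level line, and the function name ``drop_image_line`` is hypothetical.

```python
def drop_image_line(yaml_text):
    """Return the install-values text without any top-level `image:` line.

    Assumption: the entry occupies exactly one line that starts at
    column 0; multi-line or indented values are not handled.
    """
    kept = [
        line
        for line in yaml_text.splitlines()
        if not line.startswith("image:")
    ]
    return "\n".join(kept) + "\n"
```

Review the resulting file before use, because a text filter like this does not validate the YAML.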


@ -0,0 +1,111 @@
.. _reinstalling-a-subcloud-with-redfish-platform-management-service:
=============================================================
Reinstall a Subcloud with Redfish Platform Management Service
=============================================================
For subclouds with servers that support Redfish Virtual Media Service
\(version 1.2 or higher\), you can use the Central Cloud's CLI to re-install
the ISO and bootstrap subclouds from the Central Cloud.
.. caution::
All applications and data on the subcloud will be lost after re-installation.
.. rubric:: |context|
The subcloud reinstallation has two phases:
Executing the dcmanager subcloud reinstall command in the Central Cloud:
- Uses Redfish Virtual Media Service to remote install the ISO on controller-0
in the subcloud.
- Uses Ansible to bootstrap |prod| on controller-0.
.. rubric:: |prereq|
- The install values are required for subcloud reinstallation. By default,
install values are stored in the database after a subcloud installation or
upgrade, and the reinstallation will re-use the install values. If you want
to update the install values, use the following CLI command in the Central
Cloud.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud update subcloud1 --install-values install-values.yaml --bmc-password <password>
For more information on install-values.yaml file, see :ref:`Installing a Subcloud Using Redfish Platform Management Service
<installing-a-subcloud-using-redfish-platform-management-service>`.
You can only reinstall the subcloud with the same software version as the
Central Cloud.
- Check the subcloud's availability in the Central Cloud, for example,
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+---------+
| id | name      | management | availability | deploy status | sync    |
+----+-----------+------------+--------------+---------------+---------+
| 1  | subcloud1 | unmanaged  | offline      | complete      | unknown |
+----+-----------+------------+--------------+---------------+---------+
As the reinstall will cause data and application loss, it is not necessary
and not recommended to reinstall a healthy subcloud. The dcmanager rejects
the reinstallation of a managed or online subcloud.
.. rubric:: |proc|
#. Execute the reinstall using the CLI. For example,
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud reinstall subcloud1
#. Confirm the reinstall of the subcloud.
You are prompted to enter ``reinstall`` to confirm the reinstallation.
.. warning::
This will reinstall the subcloud. All applications and data on the
subcloud will be lost.
Please type ``reinstall`` to confirm: reinstall
Any other input will abort the reinstallation.
#. At the Central Cloud, monitor the progress of the subcloud installation
and bootstrapping by using the deploy status field of the dcmanager
subcloud list command, for example,
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+---------+
| id | name      | management | availability | deploy status | sync    |
+----+-----------+------------+--------------+---------------+---------+
| 1  | subcloud1 | unmanaged  | offline      | installing    | unknown |
+----+-----------+------------+--------------+---------------+---------+
For more information on the deploy status field, see :ref:`Installing a Subcloud Using Redfish Platform Management Service
<installing-a-subcloud-using-redfish-platform-management-service>`.
You can also monitor detailed logging of the subcloud installation and
bootstrapping by monitoring the following log files on the active
controller in the Central Cloud.
- /var/log/dcmanager/subcloud_name_install_date_stamp.log
- /var/log/dcmanager/subcloud_name_bootstrap_date_stamp.log
#. After the subcloud is successfully reinstalled and bootstrapped, use the
following command to reconfigure the subcloud, **subcloud reconfig**.
For more information, see :ref:`Managing Subclouds Using the CLI
<managing-subclouds-using-the-cli>`.
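Monitoring the deploy status field during reinstallation, as described in the procedure above, amounts to polling until the value becomes "complete". The following is a minimal sketch of such a loop. The function name ``wait_for_deploy_status`` and the caller-supplied ``fetch_status`` callable are hypothetical; how the status is obtained (for example, by parsing :command:`dcmanager subcloud list` output) is left to the caller, and the interval and attempt count are arbitrary defaults.

```python
import time


def wait_for_deploy_status(fetch_status, wanted="complete",
                           interval=30, attempts=60, sleep=time.sleep):
    """Poll fetch_status() until it returns `wanted`.

    fetch_status: caller-supplied function returning the current deploy
        status string (hypothetical hook, not a dcmanager API).
    Returns True if the wanted status was seen, False if attempts ran out.
    """
    for _ in range(attempts):
        if fetch_status() == wanted:
            return True
        sleep(interval)  # wait before checking again
    return False
```

The loop does not distinguish a slow installation from a failed one, so a False result should be followed by a look at the dcmanager log files listed above.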