Merge "Review Upgrade the System Controller Using the CLI page (dsr10)"

This commit is contained in:
Zuul 2025-01-24 17:19:31 +00:00 committed by Gerrit Code Review
commit 0b3356194f
3 changed files with 244 additions and 216 deletions

@ -0,0 +1,10 @@
.. software-upload-begin
.. software-upload-end
.. software-load-begin
.. software-load-end
.. software-upload-precheck-begin
.. software-upload-precheck-end

@ -14,3 +14,9 @@
.. deploymentmanager-begin
.. deploymentmanager-end
.. manualupgrade1-begin
.. manualupgrade1-end
.. manualupgrade2-begin
.. manualupgrade2-end

@ -5,10 +5,17 @@
Upgrade the System Controller Using the CLI
===========================================
You can upload and apply upgrades to the system controller in order to upgrade
the central repository, from the CLI. The system controller can be upgraded
using either a manual software upgrade procedure or by using the
non-distributed systems :command:`sw-manager` orchestration procedure.
You can upload and apply a software upgrade (deploy a major release or patched
major release) to the system controller using the CLI. The software upgrade
not only upgrades software of the system controller but also updates software
in the system controller's |prod-dc| vault and the central container image
repository, in support of subsequent subcloud upgrades.
The system controller can be upgraded using either a :ref:`manual software
upgrade <manual-host-software-deployment-ee17ec6f71a4>` or the
standalone cloud :ref:`orchestrated software upgrade procedure
<orchestrated-deployment-host-software-deployment-d234754c7d20>` with
:command:`sw-manager`.
.. rubric:: |context|
@ -16,9 +23,54 @@ Follow the steps below to manually upgrade the system controller:
.. rubric:: |prereq|
- Validate the list of new images with the target release. If you are using a
private registry for installs/upgrades, you must populate your private
registry with the new images prior to bootstrap and/or patch application.
.. only:: starlingx
- Transfer the ISO and signature files for the new major release (or new
patched major release) from the |prod-long| mirror
https://mirror.starlingx.cengn.ca/mirror/starlingx/release/latest_release/debian/monolithic/outputs/iso/
to controller-0 (active controller).
- Upgrade to a patched major release (patched ISO).
.. only:: partner
.. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
:start-after: manualupgrade1-begin
:end-before: manualupgrade1-end
.. only:: starlingx
- If you are using a private registry (see the ``docker / *-registry``
sections of :command:`system service-parameter-list`), transfer the container
image versions associated with the new major release (or new patched
major release) using the list from the |prod-long| mirror
https://mirror.starlingx.cengn.ca/mirror/starlingx/release/latest_release/debian/monolithic/outputs/docker-images/
from docker.io to the private registry.
.. only:: partner
.. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
:start-after: manualupgrade2-begin
:end-before: manualupgrade2-end
- The platform issuer (system-local-ca) is required to have an RSA
certificate/private key pair before upgrading. If ``system-local-ca`` was
configured with a different type of certificate/private key, the deploy
precheck will fail with an informative message. In this case, the
:ref:`migrate-platform-certificates-to-use-cert-manager-c0b1727e4e5d`
procedure needs to be executed to reconfigure ``system-local-ca`` with the
RSA certificate/private key targeting the ``SystemController`` and all
subclouds.
- If there are software updates for your current |prod| software release that
are required in order to upgrade to the new software release, these
patches/updates should be applied in a separate software deploy of the
patch release(s) (see :ref:`manual-host-software-deployment-ee17ec6f71a4`)
on the system controller. These patches/updates should also be applied in
an orchestrated software deploy of the subclouds (see
:ref:`orchestrated-deployment-host-software-deployment-d234754c7d20`) so
that all systems are patch current before starting the upgrade
to the new major release on the |prod-dc| system.
.. rubric:: |proc|
@ -37,39 +89,35 @@ Follow the steps below to manually upgrade the system controller:
:start-after: license-begin
:end-before: license-end
#. Transfer iso and signature files to controller-0 (active controller) and import the load.
#. Upload the load.
.. code-block:: none
.. only:: starlingx
~(keystone_admin)]$ software --os-region-name SystemController upload --local <bootimage>.iso <bootimage>.sig
+-------------------------------+-------------------+
| Uploaded File | Release |
+-------------------------------+-------------------+
| starlingx-intel-x86-64-cd.iso | starlingx-24.09.0 |
+-------------------------------+-------------------+
.. parsed-literal::
~(keystone_admin)]$ software upload --local /full_path/<bootimage>.iso /full_path/<bootimage>.sig
+-------------------------------+--------------------------+
| Uploaded File | Release |
+-------------------------------+--------------------------+
| starlingx-intel-x86-64-cd.iso | stx-10.0.0 |
+-------------------------------+--------------------------+
.. only:: partner
.. include:: /_includes/software-upload-output.rest
:start-after: software-upload-begin
:end-before: software-upload-end
.. note::
Do not use the ``--os-region-name SystemController`` proxy for subcloud
deployment at this point. That step is performed once the system
controller deploy is complete.
.. note::
If you encounter any issues while importing the load, examine the error
messages in ``/var/log/software.log``.
.. note::
This can take several minutes. After the system controller is successfully
upgraded, the old load (which is in imported state) should not be deleted
from load list otherwise the subcloud upgrade orchestration will fail
with an error.
#. Apply any required software updates. After the update is installed ensure
controller-0 is active.
The system controller as well as the subclouds must be 'patch current'. All
software updates related to your current |prod| software release must be
uploaded, applied, and installed.
All software updates to the new |prod| release, only need to be uploaded
and applied. The install of these software updates will occur automatically
during the software upgrade procedure as the hosts are reset to load the
new release of software.
.. only:: partner
.. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
@ -81,8 +129,7 @@ Follow the steps below to manually upgrade the system controller:
Check the current system health status, resolve any alarms and other issues
reported by the :command:`software deploy precheck <release-id>` command
then recheck the system health status to confirm that all **System Health**
fields are set to **OK**. "If the upgrade health query fails 'Boot Device
and Root file system Device' check as seen below:"
fields are set to **OK**.
.. code-block:: none
@ -97,32 +144,29 @@ Follow the steps below to manually upgrade the system controller:
All kubernetes control plane pods are ready: [OK]
All kubernetes applications are in a valid state: [OK]
All hosts are patch current: [OK]
Active kubernetes version [vX.XX.X] is a valid supported version: [OK]
Active controller is controller-0: [OK]
Installed license is valid: [OK]
Valid upgrade path from release 22.12 to 24.09: [OK]
Required patches are applied: [OK]
Where ``<release-id>`` is ``starlingx-24.09.0`` for above software upload
example, or it can be found out by running :command:`software list`.
The platform issuer (system-local-ca) is required to have an RSA
certificate/private key pair before upgrading. If ``system-local-ca`` was
configured with a different type of certificate/private key, the upgrade
pre check will fail with an informative message. In this case, the
:ref:`migrate-platform-certificates-to-use-cert-manager-c0b1727e4e5d` procedure
needs to be executed to reconfigure ``system-local-ca`` with the RSA
certificate/private key targeting the ``SystemController`` and all subclouds.
.. only:: starlingx
By default, the upgrade process cannot run and is not recommended to run
Where ``<release-id>`` is ``stx-10.0.0`` in the software upload example
above. You can also find it by running :command:`software list`.
.. only:: partner
.. include:: /_includes/software-upload-output.rest
:start-after: software-upload-precheck-begin
:end-before: software-upload-precheck-end
By default, the deploy process cannot run and is not recommended to run
with active alarms present. It is strongly recommended that you clear your
system of all alarms before doing an upgrade.
system of all alarms before doing a deploy.
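The precheck output shown earlier can also be scanned by a script rather than read by eye. The following is a minimal sketch, assuming each health line has the form ``<check>: [<status>]`` as in the sample output, and that any status other than ``OK`` counts as a failure. The ``[Fail]`` text in the sample is a hypothetical value used only for illustration, and ``failed_checks`` is not part of any |prod| tool.

```python
import re
from typing import List, Tuple

# One health line looks like "All hosts are patch current: [OK]".
# The greedy first group keeps any earlier brackets (for example a
# Kubernetes version) inside the check name.
CHECK_LINE = re.compile(r"^(?P<check>.+):\s*\[(?P<status>[^\]]+)\]\s*$")


def failed_checks(output: str) -> List[Tuple[str, str]]:
    """Return (check, status) pairs for every line whose status is not OK."""
    failures = []
    for line in output.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match and match.group("status") != "OK":
            failures.append((match.group("check"), match.group("status")))
    return failures


# Sample modeled on the precheck output above; "[Fail]" is an assumed value.
SAMPLE = """\
All kubernetes applications are in a valid state: [OK]
All hosts are patch current: [Fail]
Active kubernetes version [vX.XX.X] is a valid supported version: [OK]
Installed license is valid: [OK]
"""

for check, status in failed_checks(SAMPLE):
    print(f"{check} -> {status}")
```

An empty result means that every recognized line reported ``OK``. Lines that do not match the pattern are ignored, so this is a convenience check and does not replace reviewing the full precheck output.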
.. note::
Use the command :command:`system upgrade-start --force` to force the
upgrade process to start and ignore non-management-affecting alarms.
This should ONLY be done if these alarms do not cause an issue for the
upgrades process.
#. Start the upgrade from controller-0.
#. Begin the deploy from controller-0.
Make sure that controller-0 is the active controller, and you are logged
into controller-0 as **sysadmin** and your present working directory is
@ -134,54 +178,34 @@ Follow the steps below to manually upgrade the system controller:
+--------------+------------+------+--------------+
| From Release | To Release | RR | State |
+--------------+------------+------+--------------+
| 22.12.0 | 24.09.0 | True | deploy-start |
| 22.12.0 | 24.09.100 | True | deploy-start |
+--------------+------------+------+--------------+
When ``deploy start`` is complete:
.. code-block:: none
+--------------+------------+------+-------------------+
| From Release | To Release | RR | State |
+--------------+------------+------+-------------------+
| 22.12.0 | 24.09.0 | True | deploy-start-done |
+--------------+------------+------+-------------------+
This will make a copy of the system data to be used in the upgrade.
Configuration changes must not be made after this point, until the
upgrade is completed.
The following upgrade state applies once this command is executed. Run the
:command:`system upgrade-show` command to verify the status of the upgrade.
- started:
- State entered after :command:`system upgrade-start` completes.
- Release <nn.nn> system data (for example, postgres databases) has
been exported to be used in the upgrade.
As part of the upgrade, the upgrade process checks the health of the system
and validates that the system is ready for an upgrade.
The upgrade process checks that no alarms are active before starting an
upgrade.
.. note::
Use the command :command:`system upgrade-start --force` to force the
upgrades process to start and to ignore management affecting alarms.
This should only be done if these alarms do not cause an issue for the
upgrades process.
It is recommended to run the :command:`software deploy precheck`
command before running :command:`software deploy start`. However, the
:command:`software deploy start` command will automatically run
the precheck command even if the precheck command has not been run
before.
The ``fm alarm-list --mgmt_affecting`` option provides specific alarms
which may be blocking an orchestrated upgrade.
Wait for :command:`software deploy start <release-id>` to complete by monitoring the
status of the deploy.
On systems with Ceph storage, it also checks that the Ceph cluster is
healthy.
.. code-block:: none
#. Upgrade controller-1.
~(keystone_admin)]$ software deploy show
+--------------+------------+------+-------------------+
| From Release | To Release | RR | State |
+--------------+------------+------+-------------------+
| 22.12.0 | 24.09.100 | True | deploy-start-done |
+--------------+------------+------+-------------------+
:command:`software deploy start <release-id>` will migrate configuration
data to the new release's data model. Configuration must not be changed
after this point, until the deploy is completed.
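Monitoring the deploy state can be scripted. The following is a minimal sketch, assuming the output of :command:`software deploy show` is the ASCII table shown above with a ``State`` column. The helper names (``parse_table``, ``deploy_state``, ``wait_for_state``) and the default interval and timeout values are illustrative assumptions, not part of the |prod| CLI.

```python
import time
from typing import Callable, Dict, List, Optional


def parse_table(output: str) -> List[Dict[str, str]]:
    """Turn a '|'-delimited ASCII table into a list of row dictionaries."""
    header: Optional[List[str]] = None
    rows: List[Dict[str, str]] = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts, and plain messages
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def deploy_state(output: str) -> Optional[str]:
    """Return the State of the first deploy row, or None if no table is present."""
    rows = parse_table(output)
    return rows[0]["State"] if rows else None


def wait_for_state(run_show: Callable[[], str], target: str,
                   interval: float = 30.0, timeout: float = 3600.0) -> bool:
    """Poll run_show() until the deploy reaches target or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if deploy_state(run_show()) == target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


SAMPLE = """\
+--------------+------------+------+-------------------+
| From Release | To Release | RR   | State             |
+--------------+------------+------+-------------------+
| 22.12.0      | 24.09.100  | True | deploy-start-done |
+--------------+------------+------+-------------------+
"""

print(deploy_state(SAMPLE))
```

In practice, ``run_show`` would be a function that runs :command:`software deploy show` on the active controller and returns its standard output. It is passed in as a callable here so that the parsing logic can be exercised without a live system.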
#. Software deploy controller-1.
#. Lock controller-1.
@ -190,10 +214,7 @@ Follow the steps below to manually upgrade the system controller:
~(keystone_admin)]$ system host-lock controller-1
#. Start the upgrade on controller-1.
Controller-1 installs the update and reboots, then performs data
migration.
#. Begin the deploy on controller-1.
.. code-block:: none
@ -211,29 +232,31 @@ Follow the steps below to manually upgrade the system controller:
the DRBD sync **400.001** Services-related alarm has been raised and then
cleared.
The **upgrading-controllers** state applies when this command is
run. This state is entered after controller-1 has been upgraded to
release nn.nn and data migration is successfully completed.
When the first :command:`software deploy host <hostname>` command is
issued after the deploy state becomes ``deploy-start-done``, the state
reported by :command:`software deploy show` changes to ``deploy-host``.
When the software has been deployed to all hosts, that is, when
:command:`software deploy host <hostname>` completes successfully
against the last host, the state changes to
``deploy-host-done``.
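The state progression described on this page can be captured in a small lookup. The following is a sketch only: the ordering is assembled from the states that appear in the examples on this page, it assumes there are no other intermediate or failure states, and the function names are hypothetical.

```python
# Deploy states in the order they appear in this procedure (assumed complete).
DEPLOY_STATES = [
    "deploy-start",
    "deploy-start-done",
    "deploy-host",
    "deploy-host-done",
    "deploy-activate",
    "deploy-activate-done",
    "deploy-completed",
]


def has_reached(current: str, target: str) -> bool:
    """True if current is at or past target; raises ValueError on unknown states."""
    return DEPLOY_STATES.index(current) >= DEPLOY_STATES.index(target)


def ready_for_host_deploy(current: str) -> bool:
    """Hosts are deployed between deploy-start-done and deploy-host-done."""
    return current in ("deploy-start-done", "deploy-host")


print(has_reached("deploy-host", "deploy-start-done"))
print(ready_for_host_deploy("deploy-start"))
```

A check such as ``ready_for_host_deploy`` can guard a script against issuing :command:`software deploy host <hostname>` before ``deploy start`` has finished.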
where *nn.nn* in the update file name is the |prod| release number.
If it transitions to **unlocked-disabled-failed**, check the issue
before proceeding to the next step. The alarms may indicate a
configuration error. Check the result of the configuration logs on
controller-1, (for example, Error logs in
controller1:``/var/log/puppet``).
If the :command:`software deploy show` state transitions to
**unlocked-disabled-failed**, investigate the issue before proceeding to
the next step. The alarms may indicate a configuration error. Check the
configuration logs on controller-1 (for example, the error logs in
``/var/log/puppet`` on controller-1).
#. Run the :command:`system application-list` and :command:`software deploy host-list`
commands to view the current progress.
After controller-1 is unlocked/enabled/available, insert step to check
After controller-1 is unlocked/enabled/available, run the following command to check that
controller-1 is running the new release:
.. code-block:: none
~(keystone_admin)]$ system host-show controller-1
#. Set controller-1 as the active controller. Swact to controller-1.
#. Set controller-1 as the active controller. Swact away from controller-0.
.. code-block:: none
@ -243,12 +266,7 @@ Follow the steps below to manually upgrade the system controller:
proceeding to the next step. When all services on controller-1 are
enabled-active, the swact is complete.
.. note::
Continue the remaining steps below to manually upgrade or use upgrade
orchestration to upgrade the remaining nodes.
#. Upgrade controller-0.
#. Software deploy controller-0.
For more information, see
:ref:`introduction-platform-software-updates-upgrades-06d6de90bbd0`.
@ -259,57 +277,28 @@ Follow the steps below to manually upgrade the system controller:
~(keystone_admin)]$ system host-lock controller-0
#. Upgrade controller-0.
#. Begin the deploy on controller-0.
.. code-block:: none
~(keystone_admin)]$ software deploy host controller-0
.. note::
controller-0 must pxe-boot over the management network and its load
must be served from controller-1, and not from any external
pxe-boot server attached to the |OAM| network. To ensure this,
check that the network boot list/order of BIOS |NIC| is correct.
Running major release deployment, major_release=24.09, force=False, async_req=False, commit_id=<commit-id>
#. Unlock controller-0.
.. code-block:: none
.. code-block:: none
~(keystone_admin)]$ system host-unlock controller-0
.. code-block:: none
~(keystone_admin)]$ software deploy host controller-0
You may encounter the following error message:
.. code-block:: none
Expecting number of interface sriov_numvfs=16. Please wait a few
minutes for inventory update and retry host-unlock.
If you see this error message, you need to retry after 5 minutes.
Wait until the DRBD sync **400.001** Services-related alarm has been raised
and then cleared before proceeding to the next step.
- upgrading-hosts:
- State entered when both controllers are running release <nn.nn>
software.
#. Check the system health to ensure that there are no unexpected alarms.
.. code-block:: none
~(keystone_admin)]$ fm alarm-list
Clear all alarms unrelated to the upgrade process.
Clear all alarms unrelated to the deploy process.
#. If using Ceph storage backend, upgrade the storage nodes one at a time.
#. If using Ceph storage backend, deploy the storage nodes one at a time.
The storage node must be locked and all |OSDs| must be down in order to do
the upgrade.
@ -323,16 +312,32 @@ Follow the steps below to manually upgrade the system controller:
#. Verify that the |OSDs| are down after the storage node is locked.
In the Horizon interface, navigate to **Admin** \> **Platform** \>
**Storage Overview** to view the status of the |OSDs|.
.. code-block:: none
#. Upgrade storage-0.
~(keystone_admin)]$ ceph osd tree
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| ID | CLASS | WEIGHT | TYPE | NAME | STATUS | REWEIGHT | PRI-AFF |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| -1 | | 0.01700 | root | storage-tier | | | |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| -2 | | 0.01700 | chassis | group-0 | | | |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| -4 | | 0.00850 | host | controller-0 | | | |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| 0 | hdd | 0.00850 | | osd.0 | up | 1.00000 | 1.00000 |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| -3 | | 0.00850 | host | controller-1 | | | |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| 1 | hdd | 0.00850 | | osd.1 | down | 1.00000 | 1.00000 |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
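The |OSD| status check can be automated against the table above. The following is a minimal sketch, assuming the :command:`ceph osd tree` output is formatted as the bordered table shown in this step, with ``TYPE``, ``NAME``, and ``STATUS`` columns, and that each ``osd.N`` row belongs to the most recent ``host`` row above it. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def osds_by_host(output: str) -> Dict[str, Dict[str, str]]:
    """Map each host name to a {osd name: status} dictionary."""
    header: Optional[List[str]] = None
    host: Optional[str] = None
    result: Dict[str, Dict[str, str]] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders and the shell prompt line
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row holds the column names
            continue
        row = dict(zip(header, cells))
        if row.get("TYPE") == "host":
            host = row["NAME"]
            result.setdefault(host, {})
        elif row.get("NAME", "").startswith("osd.") and host is not None:
            result[host][row["NAME"]] = row["STATUS"]
    return result


def all_osds_down(output: str, hostname: str) -> bool:
    """True only if the host has at least one OSD and every one is down."""
    osds = osds_by_host(output).get(hostname, {})
    return bool(osds) and all(status == "down" for status in osds.values())


SAMPLE = """\
| ID | CLASS | WEIGHT  | TYPE    | NAME         | STATUS | REWEIGHT | PRI-AFF |
+----+-------+---------+---------+--------------+--------+----------+---------+
| -1 |       | 0.01700 | root    | storage-tier |        |          |         |
| -2 |       | 0.01700 | chassis | group-0      |        |          |         |
| -4 |       | 0.00850 | host    | controller-0 |        |          |         |
| 0  | hdd   | 0.00850 |         | osd.0        | up     | 1.00000  | 1.00000 |
| -3 |       | 0.00850 | host    | controller-1 |        |          |         |
| 1  | hdd   | 0.00850 |         | osd.1        | down   | 1.00000  | 1.00000 |
"""

print(all_osds_down(SAMPLE, "controller-1"))
```

The same call with the name of the locked storage node indicates whether it is safe to continue with the deploy on that node.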
#. Begin the deploy on storage-0.
.. code-block:: none
~(keystone_admin)]$ software deploy host storage-0
The upgrade is complete when the node comes online, and at that point,
The deploy is complete when the node comes online, and at that point,
you can safely unlock the node.
After upgrading a storage node, but before unlocking, there are Ceph
@ -341,7 +346,7 @@ Follow the steps below to manually upgrade the system controller:
(since the infrastructure network interface configuration has not been
applied to the storage node yet, as it has not been unlocked).
Unlock the node as soon as the upgraded storage node comes online.
Unlock the node as soon as the deployed storage node comes online.
#. Unlock storage-0.
@ -350,17 +355,17 @@ Follow the steps below to manually upgrade the system controller:
~(keystone_admin)]$ system host-unlock storage-0
Wait for all alarms to clear after the unlock before proceeding to
upgrade the next storage host.
deploy the next storage host.
#. Repeat the above steps for each storage host.
.. note::
After upgrading the first storage node you can expect alarm
After deploying the first storage node you can expect alarm
**800.003**. The alarm is cleared after all storage nodes are
upgraded.
deployed.
#. If worker nodes are present, upgrade worker hosts, serially or in parallel,
#. If worker nodes are present, deploy worker hosts, serially or in parallel,
if any.
@ -370,7 +375,7 @@ Follow the steps below to manually upgrade the system controller:
~(keystone_admin)]$ system host-lock worker-0
#. Upgrade worker-0.
#. Deploy worker-0.
.. code-block:: none
@ -391,7 +396,7 @@ Follow the steps below to manually upgrade the system controller:
#. Repeat the above steps for each worker host.
#. Set controller-0 as the active controller. Swact to controller-0.
#. Set controller-0 as the active controller. Swact away from controller-1.
.. code-block:: none
@ -401,7 +406,7 @@ Follow the steps below to manually upgrade the system controller:
proceeding to the next step. When all services on controller-0 are
enabled-active, the swact is complete.
#. Activate the upgrade.
#. Activate the deploy.
.. code-block:: none
@ -410,30 +415,32 @@ Follow the steps below to manually upgrade the system controller:
Check deploy state:
.. code-block:: none
.. code-block:: none
~(keystone_admin)]$ software deploy show
+--------------+------------+------+-----------------+
| From Release | To Release | RR | State |
+--------------+------------+------+-----------------+
| 22.12.0 | 24.09.0 | True | deploy-activate |
| 22.12.0 | 24.09.100 | True | deploy-activate |
+--------------+------------+------+-----------------+
When activate is complete:
Wait for :command:`software deploy activate` to complete by monitoring the
status of the deploy.
.. code-block:: none
.. code-block:: none
~(keystone_admin)]$ software deploy show
+--------------+------------+------+----------------------+
| From Release | To Release | RR | State |
+--------------+------------+------+----------------------+
| 22.12.0 | 24.09.0 | True | deploy-activate-done |
| 22.12.0 | 24.09.100 | True | deploy-activate-done |
+--------------+------------+------+----------------------+
During the running of the :command:`upgrade-activate` command, new
During the running of the :command:`software deploy activate` command, new
configurations are applied to the controller. 250.001 (**hostname
Configuration is out-of-date**) alarms are raised and are cleared as the
configuration is applied. The upgrade state goes from **activating** to
**activation-complete** once this is done.
configuration is applied. The deploy state goes from ``deploy-activate`` to
``deploy-activate-done`` once this is done.
.. only:: partner
@ -443,43 +450,19 @@ Follow the steps below to manually upgrade the system controller:
The following states apply when this command is executed.
**activation-requested**
State entered when :command:`system upgrade-activate` is executed.
**deploy-activate**
State entered when deploy is being activated.
**activating**
State entered when we have started activating the upgrade by
applying new configurations to the controller and compute hosts.
**activating-hosts**
State entered when applying host-specific configurations. This state is
entered only if needed.
**activation-complete**
State entered when new configurations have been applied to all
controller and compute hosts.
#. Check the status of the upgrade again to see it has reached
**activation-complete**, for example.
.. code-block:: none
~(keystone_admin)]$ system upgrade-show
+--------------+--------------------------------------+
| Property | Value |
+--------------+--------------------------------------+
| uuid | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
| state | activation-complete |
| from_release | nn.nn |
| to_release | nn.nn |
+--------------+--------------------------------------+
**deploy-activate-done**
State entered when the deploy-activate completes successfully.
.. note::
This can take more than half an hour to complete.
This can take more than 15 minutes to complete.
.. note::
Alarms are generated as the subcloud load sync_status is "out-of-sync".
Alarms are generated as the subcloud software sync_status is "out-of-sync".
#. Complete the upgrade.
@ -492,33 +475,62 @@ Follow the steps below to manually upgrade the system controller:
.. code-block:: none
~(keystone_admin)]$ software deploy show,
+--------------+------------+------+------------------+
| From Release | To Release | RR | State |
+--------------+------------+------+------------------+
| 22.12.0 | 24.09.0 | True | deploy-completed |
+--------------+------------+------+------------------+
~(keystone_admin)]$ software deploy show
+--------------+------------+------+-----------------------+
| From Release | To Release | RR | State |
+--------------+------------+------+-----------------------+
| 22.12.0 | 24.09.100 | True | deploy-completed |
+--------------+------------+------+-----------------------+
Run the :command:`system upgrade-show` command, and the status will display
"no upgrade in progress". The subclouds will be out-of-sync.
#. Upgrade Kubernetes after the platform deploy is completed. To upgrade
Kubernetes on a standalone system, see :ref:`index-updates-kub-03d4d10fa0be`.
#. Upgrade Kubernetes, after deploy is completed. When Kubernetes upgrade
completes, conclude the deploy by deleting it.
#. When the Kubernetes upgrade completes, conclude the platform deploy by deleting
it.
.. code-block:: none
~(keystone_admin)]$ software deploy delete, output
~(keystone_admin)]$ software deploy delete
Deploy deleted with success
Verify deploy state:
.. code-block:: none
~(keystone_admin)]$ software deploy show, output
.. code-block:: none
~(keystone_admin)]$ software deploy show
No deploy in progress
#. Upload the load for subcloud deployment.
.. only:: starlingx
.. parsed-literal::
~(keystone_admin)]$ software --os-region-name SystemController upload --local /full_path/<bootimage>.iso /full_path/<bootimage>.sig
+-------------------------------+--------------------------+
| Uploaded File | Release |
+-------------------------------+--------------------------+
| starlingx-intel-x86-64-cd.iso | stx-10.0.0 |
+-------------------------------+--------------------------+
.. only:: partner
.. include:: /_includes/software-upload-output.rest
:start-after: software-load-begin
:end-before: software-load-end
.. note::
This can take a few minutes. After the system controller is successfully
deployed, do not delete the old load (which is in the imported state)
from the load list, because this load is required for managing the
subclouds that are still running the previous load.
.. only:: partner
.. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
:start-after: DMupgrades-begin
:end-before: DMupgrades-end
.. rubric:: |postreq|
Separately apply the patches after the upgrade to the major release.