docs/doc/source/dist_cloud/kubernetes/upgrading-the-systemcontroller-using-the-cli.rst
Ngairangbam Mili 02e9e8b05d Review Upgrade the System Controller Using the CLI page (dsr10)

Change-Id: I979c0c3387d7e7fd90f588a0e74c6f95c4eb2ff5
Signed-off-by: Ngairangbam Mili <ngairangbam.mili@windriver.com>
2025-01-24 16:46:12 +00:00


.. vco1593176327490
.. _upgrading-the-systemcontroller-using-the-cli:
===========================================
Upgrade the System Controller Using the CLI
===========================================
You can upload and apply a software upgrade (deploy a major release or a
patched major release) to the system controller using the CLI. The software
upgrade upgrades the system controller's own software and also updates the
software in the system controller's |prod-dc| vault and the central container
image repository, in support of subsequent subcloud upgrades.
The system controller can be upgraded using either a :ref:`manual software
upgrade <manual-host-software-deployment-ee17ec6f71a4>` or the
standalone cloud :ref:`orchestrated software upgrade procedure
<orchestrated-deployment-host-software-deployment-d234754c7d20>` with
:command:`sw-manager`.
.. rubric:: |context|
Follow the steps below to manually upgrade the system controller:
.. rubric:: |prereq|
.. only:: starlingx
- Transfer the ISO and signature files for the new major release (or new
patched major release) from the |prod-long| mirror
https://mirror.starlingx.cengn.ca/mirror/starlingx/release/latest_release/debian/monolithic/outputs/iso/
to controller-0 (active controller).
- Upgrade to a patched major release (patched ISO).
.. only:: partner
.. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
:start-after: manualupgrade1-begin
:end-before: manualupgrade1-end
.. only:: starlingx
- If you are using a private registry (see the ``docker / *-registry``
sections of :command:`system service-parameter-list`), transfer the container
images associated with the new major release (or new patched major release)
from docker.io to the private registry. Use the image list on the
|prod-long| mirror:
https://mirror.starlingx.cengn.ca/mirror/starlingx/release/latest_release/debian/monolithic/outputs/docker-images/
.. only:: partner
.. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
:start-after: manualupgrade2-begin
:end-before: manualupgrade2-end
- The platform issuer (``system-local-ca``) must have an RSA
certificate/private key pair before upgrading. If ``system-local-ca`` was
configured with a different type of certificate/private key, the deploy
precheck fails with an informative message. In this case, run the
:ref:`migrate-platform-certificates-to-use-cert-manager-c0b1727e4e5d`
procedure to reconfigure ``system-local-ca`` with an
RSA certificate/private key, targeting the ``SystemController`` and all
subclouds.
- If software updates for your current |prod| software release are required
in order to upgrade to the new software release, apply these
patches/updates in a separate software deploy of the
patch release(s) (see :ref:`manual-host-software-deployment-ee17ec6f71a4`)
on the system controller. Also apply them in
an orchestrated software deploy of the subclouds (see
:ref:`orchestrated-deployment-host-software-deployment-d234754c7d20`) so
that all systems are patch current before starting the upgrade
to the new major release on the |prod-dc| system.
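For the private registry prerequisite above, the image transfer can be
sketched in Bash. This is illustrative only: the function names, the example
prefix ``registry.example.com:5000``, and the image name ``starlingx/foo`` are
hypothetical. The sketch assumes that the image list has one fully qualified
image reference per line, that the private registry keeps the same repository
path after the registry host, and that you have already run
:command:`docker login` against both registries. Adjust it to your registry
layout.

.. code-block:: bash

   # Rewrite "docker.io/starlingx/foo:tag" to "<prefix>/starlingx/foo:tag".
   # $1: private registry prefix (hypothetical), $2: source image reference.
   retarget_image() {
       local prefix=$1 image=$2
       # Drop everything up to the first "/" (the source registry host).
       printf '%s/%s\n' "$prefix" "${image#*/}"
   }

   # Pull, retag, and push every image named in a list file.
   # $1: private registry prefix, $2: path to the image list file.
   mirror_images() {
       local prefix=$1 list=$2 image target
       while IFS= read -r image || [ -n "$image" ]; do
           # Skip blank lines and comment lines.
           case $image in ''|'#'*) continue ;; esac
           target=$(retarget_image "$prefix" "$image")
           if ! { docker pull "$image" &&
                  docker tag "$image" "$target" &&
                  docker push "$target"; }; then
               echo "Failed to mirror $image" >&2
               return 1
           fi
       done < "$list"
   }

For example, ``mirror_images registry.example.com:5000 images.lst`` would
process a locally saved copy of the image list, where ``images.lst`` is a
placeholder file name.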
.. rubric:: |proc|
.. _upgrading-the-systemcontroller-using-the-cli-steps-oq4-dgm-cmb:
#. Source the platform environment.
.. code-block:: none
$ source /etc/platform/openrc
~(keystone_admin)]$
.. only:: partner
.. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
:start-after: license-begin
:end-before: license-end
#. Upload the load.
.. only:: starlingx
.. parsed-literal::
~(keystone_admin)]$ software upload --local /full_path/<bootimage>.iso /full_path/<bootimage>.sig
+-------------------------------+--------------------------+
| Uploaded File | Release |
+-------------------------------+--------------------------+
| starlingx-intel-x86-64-cd.iso | stx-10.0.0 |
+-------------------------------+--------------------------+
.. only:: partner
.. include:: /_includes/software-upload-output.rest
:start-after: software-upload-begin
:end-before: software-upload-end
.. note::
Do not use the ``--os-region-name SystemController`` option at this point.
It uploads the load for subcloud deployment, and that upload is performed
after the system controller deploy is complete.
.. note::
If you encounter any issues while uploading the load, examine the error
messages in ``/var/log/software.log``.
.. only:: partner
.. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
:start-after: wrsbegin
:end-before: wrsend
#. Confirm that the system is healthy.
Check the current system health status, resolve any alarms and other issues
reported by the :command:`software deploy precheck <release-id>` command,
and then recheck the system health status to confirm that all
**System Health** fields are set to **OK**.
.. code-block:: none
~(keystone_admin)]$ software deploy precheck <release-id>
System Health:
All hosts are provisioned: [OK]
All hosts are unlocked/enabled: [OK]
All hosts have current configurations: [OK]
Ceph Storage Healthy: [OK]
No alarms: [OK]
All kubernetes nodes are ready: [OK]
All kubernetes control plane pods are ready: [OK]
All kubernetes applications are in a valid state: [OK]
All hosts are patch current: [OK]
Active kubernetes version [vX.XX.X] is a valid supported version: [OK]
Active controller is controller-0: [OK]
Installed license is valid: [OK]
Valid upgrade path from release 22.12 to 24.09: [OK]
Required patches are applied: [OK]
.. only:: starlingx
Where ``<release-id>`` is ``stx-10.0.0`` in the software upload example
above. You can also find it by running :command:`software list`.
.. only:: partner
.. include:: /_includes/software-upload-output.rest
:start-after: software-upload-precheck-begin
:end-before: software-upload-precheck-end
By default, the deploy process cannot run with active alarms present, and
running it with active alarms is not recommended. It is strongly
recommended that you clear all alarms from your system before starting a
deploy.
#. Begin the deploy from controller-0.
Make sure that controller-0 is the active controller, that you are logged
into controller-0 as **sysadmin**, and that your present working directory
is your home directory.
.. code-block:: none
~(keystone_admin)]$ software deploy start <release-id>
+--------------+------------+------+--------------+
| From Release | To Release | RR | State |
+--------------+------------+------+--------------+
| 22.12.0 | 24.09.100 | True | deploy-start |
+--------------+------------+------+--------------+
.. note::
It is recommended to run the :command:`software deploy precheck`
command before running :command:`software deploy start`. However, the
:command:`software deploy start` command will automatically run
the precheck command even if the precheck command has not been run
before.
Wait for :command:`software deploy start <release-id>` to complete by monitoring the
status of the deploy.
.. code-block:: none
~(keystone_admin)]$ software deploy show
+--------------+------------+------+-------------------+
| From Release | To Release | RR | State |
+--------------+------------+------+-------------------+
| 22.12.0 | 24.09.100 | True | deploy-start-done |
+--------------+------------+------+-------------------+
:command:`software deploy start <release-id>` will migrate configuration
data to the new release's data model. Configuration must not be changed
after this point, until the deploy is completed.
#. Deploy the software to controller-1.
#. Lock controller-1.
.. code-block:: none
~(keystone_admin)]$ system host-lock controller-1
#. Begin the deploy on controller-1.
.. code-block:: none
~(keystone_admin)]$ software deploy host controller-1
Running major release deployment, major_release=24.09, force=False, async_req=False, commit_id=<commit-id>
Host installation was successful on controller-1
#. Unlock controller-1.
.. code-block:: none
~(keystone_admin)]$ system host-unlock controller-1
Wait for controller-1 to enter the ``unlocked-enabled`` state. Wait until
the DRBD sync **400.001** Services-related alarm has been raised and then
cleared.
When the first :command:`software deploy host <hostname>` command is
issued after the deploy state becomes ``deploy-start-done``, the state
reported by :command:`software deploy show` changes to ``deploy-host``.
When the software is deployed to all the hosts, that is, when
:command:`software deploy host <hostname>` completes successfully
against the last host, the state changes to ``deploy-host-done``.
If controller-1 transitions to the **unlocked-disabled-failed** state,
investigate the issue before proceeding to the next step. The alarms may
indicate a configuration error. Check the configuration logs on
controller-1 (for example, error logs in
controller-1:``/var/log/puppet``).
#. Run the :command:`system application-list` and :command:`software deploy host-list`
commands to view the current progress.
After controller-1 is unlocked/enabled/available, run the following command
to verify that controller-1 is running the new release:
.. code-block:: none
~(keystone_admin)]$ system host-show controller-1
#. Set controller-1 as the active controller. Swact away from controller-0.
.. code-block:: none
~(keystone_admin)]$ system host-swact controller-0
Wait until services have gone active on the new active controller-1 before
proceeding to the next step. When all services on controller-1 are
enabled-active, the swact is complete.
#. Deploy the software to controller-0.
For more information, see
:ref:`introduction-platform-software-updates-upgrades-06d6de90bbd0`.
#. Lock controller-0.
.. code-block:: none
~(keystone_admin)]$ system host-lock controller-0
#. Begin the deploy on controller-0.
.. code-block:: none
~(keystone_admin)]$ software deploy host controller-0
Running major release deployment, major_release=24.09, force=False, async_req=False, commit_id=<commit-id>
#. Unlock controller-0.
.. code-block:: none
~(keystone_admin)]$ system host-unlock controller-0
#. Check the system health to ensure that there are no unexpected alarms.
.. code-block:: none
~(keystone_admin)]$ fm alarm-list
Clear all alarms unrelated to the deploy process.
#. If you are using a Ceph storage backend, deploy the storage nodes one at a
time. Each storage node must be locked and all of its |OSDs| must be down
before it can be upgraded.
#. Lock storage-0.
.. code-block:: none
~(keystone_admin)]$ system host-lock storage-0
#. Verify that the |OSDs| are down after the storage node is locked.
.. code-block:: none
~(keystone_admin)]$ ceph osd tree
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| ID | CLASS | WEIGHT | TYPE | NAME | STATUS | REWEIGHT | PRI-AFF |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| -1 | | 0.01700 | root | storage-tier | | | |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| -2 | | 0.01700 | chassis | group-0 | | | |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| -4 | | 0.00850 | host | controller-0 | | | |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| 0 | hdd | 0.00850 | | osd.0 | up | 1.00000 | 1.00000 |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| -3 | | 0.00850 | host | controller-1 | | | |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| 1 | hdd | 0.00850 | | osd.1 | down | 1.00000 | 1.00000 |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
#. Begin the deploy on storage-0.
.. code-block:: none
~(keystone_admin)]$ software deploy host storage-0
The deploy is complete when the node comes online, and at that point,
you can safely unlock the node.
After a storage node is upgraded, but before it is unlocked, Ceph
synchronization alarms are present (they appear to be making progress in
syncing), along with infrastructure network interface alarms (because
the infrastructure network interface configuration is not applied to the
storage node until it is unlocked).
Unlock the node as soon as the deployed storage node comes online.
#. Unlock storage-0.
.. code-block:: none
~(keystone_admin)]$ system host-unlock storage-0
Wait for all alarms to clear after the unlock before proceeding to
deploy the next storage host.
#. Repeat the above steps for each storage host.
.. note::
After deploying the first storage node you can expect alarm
**800.003**. The alarm is cleared after all storage nodes are
deployed.
#. If worker nodes are present, deploy the worker hosts, serially or in
parallel.
#. Lock worker-0.
.. code-block:: none
~(keystone_admin)]$ system host-lock worker-0
#. Deploy worker-0.
.. code-block:: none
~(keystone_admin)]$ software deploy host worker-0
Wait for the host to run the installer, reboot, and go online before
unlocking it in the next step.
#. Unlock worker-0.
.. code-block:: none
~(keystone_admin)]$ system host-unlock worker-0
Wait for all alarms to clear after the unlock before proceeding to the
next worker host.
#. Repeat the above steps for each worker host.
#. Set controller-0 as the active controller. Swact away from controller-1.
.. code-block:: none
~(keystone_admin)]$ system host-swact controller-1
Wait until services have gone active on the active controller-0 before
proceeding to the next step. When all services on controller-0 are
enabled-active, the swact is complete.
#. Activate the deploy.
.. code-block:: none
~(keystone_admin)]$ software deploy activate
Deploy activate has started
Check deploy state:
.. code-block:: none
~(keystone_admin)]$ software deploy show
+--------------+------------+------+-----------------+
| From Release | To Release | RR | State |
+--------------+------------+------+-----------------+
| 22.12.0 | 24.09.100 | True | deploy-activate |
+--------------+------------+------+-----------------+
Wait for :command:`software deploy activate` to complete by monitoring the
status of the deploy.
.. code-block:: none
~(keystone_admin)]$ software deploy show
+--------------+------------+------+----------------------+
| From Release | To Release | RR | State |
+--------------+------------+------+----------------------+
| 22.12.0 | 24.09.100 | True | deploy-activate-done |
+--------------+------------+------+----------------------+
While the :command:`software deploy activate` command runs, new
configurations are applied to the controller. 250.001 (**hostname
Configuration is out-of-date**) alarms are raised and then cleared as the
configuration is applied. The deploy state goes from ``deploy-activate`` to
``deploy-activate-done`` once this is done.
.. only:: partner
.. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
:start-after: deploymentmanager-begin
:end-before: deploymentmanager-end
The following states apply when this command is executed.
**deploy-activate**
State entered when deploy is being activated.
**deploy-activate-done**
State entered when the deploy-activate completes successfully.
.. note::
This can take more than 15 minutes to complete.
.. note::
Alarms are raised because the subcloud software ``sync_status`` is
``out-of-sync``.
#. Complete the upgrade.
.. code-block:: none
~(keystone_admin)]$ software deploy complete
Deployment has been completed
Verify deploy state:
.. code-block:: none
~(keystone_admin)]$ software deploy show
+--------------+------------+------+-----------------------+
| From Release | To Release | RR | State |
+--------------+------------+------+-----------------------+
| 22.12.0 | 24.09.100 | True | deploy-completed |
+--------------+------------+------+-----------------------+
#. Upgrade Kubernetes after the platform deploy is completed. To upgrade
Kubernetes on a standalone system, see :ref:`index-updates-kub-03d4d10fa0be`.
#. When the Kubernetes upgrade completes, conclude the platform deploy by deleting
it.
.. code-block:: none
~(keystone_admin)]$ software deploy delete
Deploy deleted with success
Verify deploy state:
.. code-block:: none
~(keystone_admin)]$ software deploy show
No deploy in progress
#. Upload the load for subcloud deployment.
.. only:: starlingx
.. parsed-literal::
~(keystone_admin)]$ software --os-region-name SystemController upload --local /full_path/<bootimage>.iso /full_path/<bootimage>.sig
+-------------------------------+--------------------------+
| Uploaded File | Release |
+-------------------------------+--------------------------+
| starlingx-intel-x86-64-cd.iso | stx-10.0.0 |
+-------------------------------+--------------------------+
.. only:: partner
.. include:: /_includes/software-upload-output.rest
:start-after: software-load-begin
:end-before: software-load-end
.. note::
This can take a few minutes. After the system controller is successfully
deployed, do not delete the old load (which is in the imported state)
from the load list. This load is required for managing the subclouds that
are still running the previous load.
.. only:: partner
.. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
:start-after: DMupgrades-begin
:end-before: DMupgrades-end
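The state and health checks in the procedure above can be scripted. The
following Bash sketch is illustrative only: the function names
``deploy_state``, ``failed_checks``, and ``wait_for_deploy_state``, as well as
the default timeout and polling interval, are assumptions and not part of the
product CLI. It relies only on the :command:`software deploy show` and
:command:`software deploy precheck` output layouts shown in the steps above.

.. code-block:: bash

   # Print the State column of the first data row of a
   # "software deploy show" table read from stdin.
   # Prints nothing when there is no table (no deploy in progress).
   deploy_state() {
       awk -F'|' '
           /^[ \t]*[|]/ && $2 !~ /From Release/ {
               gsub(/^[ \t]+|[ \t]+$/, "", $5)   # trim padding around the State cell
               print $5
               exit
           }'
   }

   # Print every "<check>: [<result>]" line from precheck output on stdin
   # whose result is anything other than [OK].
   failed_checks() {
       awk '/: \[[^]]*\][ \t]*$/ && !/: \[OK\][ \t]*$/'
   }

   # Poll "software deploy show" until the wanted state is reported.
   # $1: wanted state, $2: timeout in seconds (assumed default 1800),
   # $3: polling interval in seconds (assumed default 30).
   wait_for_deploy_state() {
       local wanted=$1 timeout=${2:-1800} interval=${3:-30} waited=0 state
       while :; do
           state=$(software deploy show | deploy_state)
           [ "$state" = "$wanted" ] && return 0
           [ "$waited" -ge "$timeout" ] && break
           sleep "$interval"
           waited=$((waited + interval))
       done
       echo "Timed out waiting for $wanted (last state: ${state:-none})" >&2
       return 1
   }

For example, ``wait_for_deploy_state deploy-start-done`` could follow
:command:`software deploy start <release-id>`, and
``software deploy precheck <release-id> | failed_checks`` prints nothing when
every check reports ``[OK]``.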
.. rubric:: |postreq|
After the upgrade to the major release, apply any patches separately.