Introduction Updates and Upgrades (USM)

Add new introduction section.
Fix conflict.

Depends on https://review.opendev.org/c/starlingx/docs/+/923472

Story: 2010676
Task: 50141

Change-Id: I3125493f37666a7a2db5b3c013e82dc246309542
Signed-off-by: Elisamara Aoki Goncalves <elisamaraaoki.goncalves@windriver.com>
Elisamara Aoki Goncalves 2024-07-09 13:04:53 +00:00
parent 44a05d9e9d
commit 70a2e95a03
7 changed files with 189 additions and 329 deletions

@ -542,13 +542,11 @@
.. |firmware-update-orchestration-using-the-cli| replace:: :ref:`Firmware Update Orchestration Using the CLI <firmware-update-orchestration-using-the-cli>`
.. |configure-firmware-update-orchestration| replace:: :ref:`Configure Firmware Update Orchestration <configure-firmware-update-orchestration>`
.. |configuring-kubernetes-update-orchestration| replace:: :ref:`Create Kubernetes Version Upgrade Cloud Orchestration Strategy <configuring-kubernetes-update-orchestration>`
.. |software-upgrades| replace:: :ref:`Software Upgrades <software-upgrades>`
.. |configuring-kubernetes-multi-version-upgrade-orchestration-aio-b0b59a346466| replace:: :ref:`Configure Kubernetes Multi-Version Upgrade Cloud Orchestration for AIO-SX <configuring-kubernetes-multi-version-upgrade-orchestration-aio-b0b59a346466>`
.. |manual-kubernetes-components-upgrade| replace:: :ref:`Manual Kubernetes Version Upgrade <manual-kubernetes-components-upgrade>`
.. |overview-of-firmware-update-orchestration| replace:: :ref:`Overview <overview-of-firmware-update-orchestration>`
.. |handling-kubernetes-update-orchestration-failures| replace:: :ref:`Handle Kubernetes Version Upgrade Orchestration Failures <handling-kubernetes-update-orchestration-failures>`
.. |the-firmware-update-orchestration-process| replace:: :ref:`The Firmware Update Orchestration Process <the-firmware-update-orchestration-process>`
.. |software-updates-and-upgrades-software-updates| replace:: :ref:`Software Updates <software-updates-and-upgrades-software-updates>`
.. |manual-kubernetes-multi-version-upgrade-in-aio-sx-13e05ba19840| replace:: :ref:`Manual Kubernetes Multi-Version Upgrade in AIO-SX <manual-kubernetes-multi-version-upgrade-in-aio-sx-13e05ba19840>`
.. |contribute| replace:: :ref:`Contributor Guides <contribute>`
.. |configure-an-optional-cinder-file-system| replace:: :ref:`Configure the Optional Image Conversion File System <configure-an-optional-cinder-file-system>`

@ -43,15 +43,16 @@ Follow the steps below to manually upgrade the system controller:
.. code-block:: none
~(keystone_admin)]$ system --os-region-name SystemController load-import --local <bootimage>.iso <bootimage>.sig
.. note::
Move the iso to ``/opt/backups`` before importing the load to avoid any
disk space issues during the load import.
~(keystone_admin)]$ software --os-region-name SystemController upload --local <bootimage>.iso <bootimage>.sig
+-------------------------------+-------------------+
| Uploaded File | Release |
+-------------------------------+-------------------+
| starlingx-intel-x86-64-cd.iso | starlingx-24.09.0 |
+-------------------------------+-------------------+
.. note::
If you face any issue while importing the load, go to
``/var/log/load-import.log`` and examine the error messages.
``/var/log/software.log`` and examine the error messages.
.. note::
This can take several minutes. After the system controller is successfully
@ -80,42 +81,29 @@ Follow the steps below to manually upgrade the system controller:
#. Confirm that the system is healthy.
Check the current system health status, resolve any alarms and other issues
reported by the :command:`system health-query-upgrade` command then recheck
the system health status to confirm that all **System Health** fields are
set to **OK**. "If the upgrade health query fails 'Boot Device and Root file
system Device' check as seen below:"
reported by the :command:`software deploy precheck <release-id>` command
then recheck the system health status to confirm that all **System Health**
fields are set to **OK**. "If the upgrade health query fails 'Boot Device
and Root file system Device' check as seen below:"
.. code-block:: none
~(keystone_admin)]$ system health-query-upgrade
~(keystone_admin)]$ software deploy precheck <release-id>
System Health:
All hosts are provisioned: [OK]
All hosts are unlocked/enabled: [OK]
All hosts have current configurations: [OK]
All hosts are patch current: [OK]
Ceph Storage Healthy: [OK]
No alarms: [OK]
All kubernetes nodes are ready: [OK]
All kubernetes control plane pods are ready: [OK]
All PodSecurityPolicies are removed: [OK]
Required patches are applied: [OK]
License valid for upgrade: [OK]
No instances running on controller-1: [OK]
All kubernetes applications are in a valid state: [OK]
Active controller is controller-0: [OK]
Disk space requirement: [OK]
Boot Device and Root file system Device: [Fail]
boot_device (/dev/sde) for controller-0 does not match root disk /dev/sda
rootfs_device (/dev/disk/by-path/pci-0000:00:1f.2-ata-1.0) for controller-0 does not match root disk /dev/sda
All hosts are patch current: [OK]
Valid upgrade path from release 22.12 to 24.09: [OK]
Required patches are applied: [OK]
Use the following commands to correct the boot_device and/or rootfs_device
settings if you encounter an error:
.. code-block:: none
~(keystone_admin)]$ system host-lock <hostname_or_id>
~(keystone_admin)]$ system host-update <hostname_or_id> boot_device=<boot_device> rootfs_device=<rootfs_device>
~(keystone_admin)]$ system host-unlock <hostname_or_id>
Where ``<release-id>`` is ``starlingx-24.09.0`` in the software upload
example above. You can also find it by running :command:`software list`.
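The precheck output above lists one health check per line, each ending in a bracketed status. As a minimal sketch, assuming the output keeps that ``<check name>: [<status>]`` layout, the failed checks could be picked out as follows. The function name ``failed_checks`` and the sample text are illustrative only and are not part of the |prod| tooling.

```python
import re

# Matches lines such as "No alarms: [OK]" or "Disk space requirement: [Fail]".
# Assumption: every check is printed as "<name>: [<status>]" on its own line.
CHECK_LINE = re.compile(r"^\s*(?P<name>.+?):\s*\[(?P<status>[^\]]+)\]\s*$")


def failed_checks(precheck_output):
    """Return the names of health checks whose status is not OK."""
    failed = []
    for line in precheck_output.splitlines():
        match = CHECK_LINE.match(line)
        if match and match.group("status").strip().upper() != "OK":
            failed.append(match.group("name").strip())
    return failed


# Hypothetical sample, shortened from the health output shown above.
sample = """System Health:
All hosts are provisioned: [OK]
No alarms: [OK]
Boot Device and Root file system Device: [Fail]
Required patches are applied: [OK]
"""

print(failed_checks(sample))
```

An empty list means every reported check was ``[OK]``. Detail lines that follow a failed check are ignored because they carry no bracketed status.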
By default, the upgrade process cannot run and is not recommended to run
with active alarms present. It is strongly recommended that you clear your
@ -136,15 +124,22 @@ Follow the steps below to manually upgrade the system controller:
.. code-block:: none
~(keystone_admin)]$ system upgrade-start
+--------------+--------------------------------------+
| Property | Value |
+--------------+--------------------------------------+
| uuid | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
| state | starting |
| from_release | nn.nn |
| to_release | nn.nn |
+--------------+--------------------------------------+
~(keystone_admin)]$ software deploy start <release-id>
+--------------+------------+------+--------------+
| From Release | To Release | RR | State |
+--------------+------------+------+--------------+
| 22.12.0 | 24.09.0 | True | deploy-start |
+--------------+------------+------+--------------+
When ``deploy start`` is complete:
.. code-block:: none
+--------------+------------+------+-------------------+
| From Release | To Release | RR | State |
+--------------+------------+------+-------------------+
| 22.12.0 | 24.09.0 | True | deploy-start-done |
+--------------+------------+------+-------------------+
This will make a copy of the system data to be used in the upgrade.
Configuration changes must not be made after this point, until the
@ -196,63 +191,9 @@ Follow the steps below to manually upgrade the system controller:
.. code-block:: none
~(keystone_admin)]$ system host-upgrade controller-1
Wait for controller-1 to reinstall with the load N+1 and enter the
``locked-disabled-online`` state.
controller-1 must pxe-boot over the management network and its load
must be served from controller-0, and not from any external
pxe-boot server attached to the |OAM| network. To ensure this,
check that the network boot list/order of BIOS |NIC| is correct.
The following data migration states apply when this command is executed.
- data-migration:
- State entered when :command:`system host-upgrade controller-1`
is executed.
- System data is being migrated from release N to release N+1.
- data-migration-complete or upgrading-controllers:
- State entered when controller-1 upgrade is complete.
- System data has been successfully migrated from release <nn.nn>
to release <nn.nn>.
where *nn.nn* in the update file name is the |prod| release number.
- data-migration-failed:
- State entered if data migration on controller-1 fails.
- Upgrade must be aborted.
#. Check the upgrade state.
.. code-block:: none
~(keystone_admin)]$ system upgrade-show
+--------------+--------------------------------------+
| Property | Value |
+--------------+--------------------------------------+
| uuid | e7c8f6bc-518c-46d4-ab81-7a59f8f8e64b |
| state | data-migration-complete |
| from_release | nn.nn |
| to_release | nn.nn |
+--------------+--------------------------------------+
If the :command:`upgrade-show` status indicates
'data-migration-failed', then there is an issue with the data
migration. Check the issue before proceeding to the next step.
.. note::
Do not unlock controller-1, before running the :command:`system upgrade-show`
command to display the upgrade status **data-migration-complete** or **upgrading-controllers**.
~(keystone_admin)]$ software deploy host controller-1
Running major release deployment, major_release=24.09, force=False, async_req=False, commit_id=<commit-id>
Host installation was successful on controller-1
#. Unlock controller-1.
@ -276,9 +217,16 @@ Follow the steps below to manually upgrade the system controller:
controller-1, (for example, Error logs in
controller1:``/var/log/puppet``).
#. Run the :command:`system application-list` and :command:`system host-upgrade-list`
#. Run the :command:`system application-list` and :command:`software deploy host-list`
commands to view the current progress.
After controller-1 is unlocked/enabled/available, check that
controller-1 is running the new release:
.. code-block:: none
~(keystone_admin)]$ system host-show controller-1
#. Set controller-1 as the active controller. Swact to controller-1.
.. code-block:: none
@ -296,8 +244,8 @@ Follow the steps below to manually upgrade the system controller:
#. Upgrade controller-0.
For more information, see :ref:`Updates and Upgrades
<software-updates-and-upgrades-software-updates>`.
For more information, see
:ref:`introduction-platform-software-updates-upgrades-06d6de90bbd0`.
#. Lock controller-0.
@ -309,10 +257,10 @@ Follow the steps below to manually upgrade the system controller:
.. code-block:: none
~(keystone_admin)]$ system host-upgrade controller-0
~(keystone_admin)]$ software deploy host controller-0
.. note::
controller-0 must pxe-boot over the management network and its load
must be served from controller-1, and not from any external
pxe-boot server attached to the |OAM| network. To ensure this,
@ -320,10 +268,14 @@ Follow the steps below to manually upgrade the system controller:
#. Unlock controller-0.
.. code-block:: none
.. code-block:: none
~(keystone_admin)]$ system host-unlock controller-0
.. code-block:: none
~(keystone_admin)]$ software deploy host controller-0
You may encounter the following error message:
.. code-block:: none
@ -372,7 +324,7 @@ Follow the steps below to manually upgrade the system controller:
.. code-block:: none
~(keystone_admin)]$ system host-upgrade storage-0
~(keystone_admin)]$ software deploy host storage-0
The upgrade is complete when the node comes online, and at that point,
you can safely unlock the node.
@ -412,12 +364,11 @@ Follow the steps below to manually upgrade the system controller:
~(keystone_admin)]$ system host-lock worker-0
#. Upgrade worker-0.
.. code-block:: none
~(keystone_admin)]$ system host-upgrade worker-0
~(keystone_admin)]$ software deploy host worker-0
Wait for the host to run the installer, reboot, and go online before
unlocking it in the next step.
@ -448,15 +399,29 @@ Follow the steps below to manually upgrade the system controller:
.. code-block:: none
~(keystone_admin)]$ system upgrade-activate
+--------------+--------------------------------------+
| Property | Value |
+--------------+--------------------------------------+
| uuid | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
| state | activating |
| from_release | nn.nn |
| to_release | nn.nn |
+--------------+--------------------------------------+
~(keystone_admin)]$ software deploy activate
Deploy activate has started
Check deploy state:
.. code-block:: none
~(keystone_admin)]$ software deploy show
+--------------+------------+------+-----------------+
| From Release | To Release | RR | State |
+--------------+------------+------+-----------------+
| 22.12.0 | 24.09.0 | True | deploy-activate |
+--------------+------------+------+-----------------+
When activate is complete:
.. code-block:: none
+--------------+------------+------+----------------------+
| From Release | To Release | RR | State |
+--------------+------------+------+----------------------+
| 22.12.0 | 24.09.0 | True | deploy-activate-done |
+--------------+------------+------+----------------------+
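The deploy state appears in the ``State`` column of the tables shown above. A minimal sketch, assuming the table keeps the bordered layout shown in this procedure, of reading that column from captured :command:`software deploy show` output. The helper names ``parse_cli_table`` and ``deploy_state`` are hypothetical.

```python
def parse_cli_table(text):
    """Parse a '+---+' bordered CLI table into a list of row dictionaries."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts, and blank lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def deploy_state(show_output):
    """Return the State of the first deployment row, or None if no table."""
    table = parse_cli_table(show_output)
    return table[0].get("State") if table else None


# Sample copied from the "activate is complete" table above.
sample = """
+--------------+------------+------+----------------------+
| From Release | To Release | RR   | State                |
+--------------+------------+------+----------------------+
| 22.12.0      | 24.09.0    | True | deploy-activate-done |
+--------------+------------+------+----------------------+
"""

print(deploy_state(sample))
```

When the output contains no table, for example ``No deploy in progress``, ``deploy_state`` returns ``None``.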
During the running of the :command:`upgrade-activate` command, new
configurations are applied to the controller. 250.001 (**hostname
@ -514,19 +479,38 @@ Follow the steps below to manually upgrade the system controller:
.. code-block:: none
~(keystone_admin)]$ system upgrade-complete
+--------------+--------------------------------------+
| Property | Value |
+--------------+--------------------------------------+
| uuid | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
| state | completing |
| from_release | nn.nn |
| to_release | nn.nn |
+--------------+--------------------------------------+
~(keystone_admin)]$ software deploy complete
Deployment has been completed
Verify deploy state:
.. code-block:: none
~(keystone_admin)]$ software deploy show
+--------------+------------+------+------------------+
| From Release | To Release | RR | State |
+--------------+------------+------+------------------+
| 22.12.0 | 24.09.0 | True | deploy-completed |
+--------------+------------+------+------------------+
Run the :command:`system upgrade-show` command, and the status will display
"no upgrade in progress". The subclouds will be out-of-sync.
#. After the deploy is completed, upgrade Kubernetes. When the Kubernetes
upgrade completes, conclude the deploy by deleting it.
.. code-block:: none
~(keystone_admin)]$ software deploy delete
Deploy deleted with success
Verify deploy state:
.. code-block:: none
~(keystone_admin)]$ software deploy show
No deploy in progress
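As a recap, the :command:`software` commands used in this procedure can be listed in order. The sketch below only assembles the command strings that appear above; it does not run them, and it leaves out the lock, unlock, and swact steps as well as the Kubernetes upgrade that precedes the final delete. The function name ``deploy_commands`` and its parameters are illustrative.

```python
def deploy_commands(release_id, hosts,
                    iso="<bootimage>.iso", sig="<bootimage>.sig"):
    """Return the software CLI calls of the procedure, in order."""
    region = "--os-region-name SystemController"
    commands = [
        f"software {region} upload --local {iso} {sig}",
        f"software deploy precheck {release_id}",
        f"software deploy start {release_id}",
    ]
    # One "deploy host" call per host, in the order the hosts are given.
    commands += [f"software deploy host {host}" for host in hosts]
    commands += [
        "software deploy activate",
        "software deploy complete",
        # Run the delete only after the Kubernetes upgrade has completed.
        "software deploy delete",
    ]
    return commands


for command in deploy_commands(
        "starlingx-24.09.0",
        ["controller-1", "controller-0", "storage-0", "worker-0"]):
    print(command)
```

Each host still has to be locked before its ``deploy host`` call and unlocked afterwards, as described in the steps above.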
.. only:: partner
.. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest

@ -18,7 +18,7 @@ however the specific procedure for incrementally uploading and applying one or
more patches for the SystemController is provided below.
For standard |prod| updating procedures, see the |updates-doc|:
:ref:`software-updates-and-upgrades-software-updates` guide.
:ref:`introduction-platform-software-updates-upgrades-06d6de90bbd0` guide.
For SystemController of |prod-dc| (and the central update repository), you
must include the additional |CLI| parameter ``--os-region-name`` with the value

@ -16,8 +16,7 @@ Introduction
.. toctree::
:maxdepth: 1
software-updates-and-upgrades-software-updates
software-upgrades
introduction-platform-software-updates-upgrades-06d6de90bbd0
------------------------
Host software deployment

@ -0,0 +1,80 @@
.. WARNING: Add no lines of text between the label immediately following
.. and the title.
.. _introduction-platform-software-updates-upgrades-06d6de90bbd0:
============
Introduction
============
|prod| software management enables you to upversion your |prod| software to a
new Patch Release or a new Major Release.
**Major Releases**
- deliver new and enhanced feature content,
- are packaged and delivered as Install ISOs containing all software packages.
**Patch Releases**
- deliver fixes for known bugs and CVE vulnerabilities,
- are packaged and delivered as patch archive files,
- containing only new and changed software packages,
- with metadata to indicate dependencies on previously released Patch
Releases or the associated Major Release.
Both Major Releases and Patch Releases are **cryptographically signed** to
ensure integrity and authenticity, and the StarlingX REST APIs, CLIs, and GUI
validate the signature of software releases before loading them into the
system.
Both Major Releases and Patch Releases can be deployed using either:
- :ref:`manual-host-software-deployment-ee17ec6f71a4`
or
- :ref:`orchestrated-deployment-host-software-deployment-d234754c7d20`
Both manual and orchestrated procedures use a '**rolling deployment**'
procedure for deploying the software of the new release. |prod| hosts are
updated/upgraded one (or more) at a time such that |prod| can continue to
provide hosting services to its hosted applications on other |prod| hosts.
Specifically:
- Controllers are updated/upgraded one at a time,
- then Storage hosts are updated/upgraded one (or more) at a time,
respecting Storage host redundancy,
- then Worker hosts are updated/upgraded one (or more) at a time.
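The ordering above can be sketched as a small batching function. This is an illustration of the rolling order only, not the orchestration implementation: the function name, the ``(name, personality)`` host tuples, and the batch-size parameters are assumptions, and the caller is expected to choose a storage batch size that respects Storage host redundancy.

```python
def rolling_batches(hosts, storage_batch=1, worker_batch=1):
    """Group hosts into deployment batches, controllers first.

    hosts is a list of (name, personality) tuples; input order is kept
    within each personality.
    """
    batch_sizes = {
        "controller": 1,  # controllers are always handled one at a time
        "storage": storage_batch,
        "worker": worker_batch,
    }
    batches = []
    for personality in ("controller", "storage", "worker"):
        names = [name for name, kind in hosts if kind == personality]
        size = batch_sizes[personality]
        for start in range(0, len(names), size):
            batches.append(names[start:start + size])
    return batches


# Hypothetical system with two controllers, two storage and three workers.
hosts = [
    ("controller-1", "controller"),
    ("controller-0", "controller"),
    ("storage-0", "storage"),
    ("storage-1", "storage"),
    ("worker-0", "worker"),
    ("worker-1", "worker"),
    ("worker-2", "worker"),
]

for batch in rolling_batches(hosts, worker_batch=2):
    print(batch)
```

Hosts outside the current batch stay in service, which is what lets |prod| keep hosting applications during the deployment.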
For a Major Release deployment, upgrading to the new Major Release results in
a **reboot of each host** as it is upgraded, so that the host boots into the
new software's root filesystem.
For a Patch Release deployment, depending on the type of software changes in
the Patch Release, one of two deployment modes will be used:
In-Service
in this mode, the upversioning to a new Patch Release will only result in
the install of new software and the restart of the required services, as
each host is upversioned with the new Patch Release.
Reboot-Required (RR)
in this mode, the upversioning to a new Patch Release will result in a
reboot of each host, as the host is upversioned to the new Patch Release.
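The effect on each host described above can be summarized as a small decision function. This is a sketch of the rule as stated in this section; the function name, the ``"major"``/``"patch"`` labels, and the returned strings are illustrative and do not correspond to any |prod| API.

```python
def host_effect(release_kind, reboot_required=False):
    """Return what happens to each host when a release is deployed.

    release_kind is "major" or "patch"; reboot_required only applies to
    Patch Releases (the Reboot-Required mode).
    """
    if release_kind == "major":
        return "reboot"  # a Major Release always reboots each host
    if release_kind == "patch":
        return "reboot" if reboot_required else "restart-services"
    raise ValueError(f"unknown release kind: {release_kind!r}")


print(host_effect("major"))
print(host_effect("patch", reboot_required=True))
print(host_effect("patch"))
```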
For a Major Release only, an **Abort and Rollback** of the active software
deployment is supported. The deployment of a Major Release can be aborted and
rolled back at any step of the deployment process, as long as the active
deployment has not been both completed and deleted.
For a Patch Release only, a **removal or un-deployment of a release** is
supported. One or more Patch Releases can be removed/un-deployed by deploying a
previous Patch Release.

@ -1,82 +0,0 @@
.. lei1552920487053
.. _software-updates-and-upgrades-software-updates:
================
Software Updates
================
|prod-long| software updates (also known as patches) must be applied to the
system in order to keep your system updated with feature enhancements, free of
known bugs, and security vulnerabilities.
|org| provides software updates that are cryptographically signed to ensure
integrity and authenticity. The |prod-long| REST APIs, CLIs and GUI validate
the signature of software updates before loading it into the system.
An update typically modifies a small portion of your system to address the
following items:
.. _software-updates-and-upgrades-software-updates-ul-gcd-smn-xw:
- bugs
- security vulnerabilities
- feature enhancements
Software updates can be installed manually or by the Update Orchestrator, which
automates a rolling install of an update across all of the |prod-long| hosts.
.. warning::
Do NOT use the |updates-doc| guide for |prod-dc| orchestrated
software updates. The |prod-dc| Update Orchestrator automates a
recursive rolling install of an update across all subclouds and all hosts
within the subclouds.
.. xbooklink For more information, see, |distcloud-doc|: :ref:`Update Management for
Distributed Cloud <update-management-for-distributed-cloud>`.
|prod| handles multiple updates being applied and removed at once. Software
updates can modify and update any area of |prod| software, including the kernel
itself.
.. For information on populating, installing and removing software
.. updates, see :ref:`Manage Software Updates <managing-software-updates>`.
There are two different kinds of Software updates that you can use to update
the |prod| software:
.. _software-updates-and-upgrades-software-updates-ol-kxm-wgv-njb:
#. **Software Updates**
These software updates deliver |prod| software updates containing ostree
commits for updating the |prod| software running directly on the hosts.
Software updates can be installed manually or by the Update Orchestrator
which automates a rolling install of an update across all of the
|prod-long| hosts.
.. For information on populating, installing and removing software updates, see :ref:`Manage Software Updates <managing-software-updates>`.
.. note::
A 10 GB internal management network is required for reboot-required
software update operations.
#. **Application Software Updates**
These software updates apply to software being managed through the
StarlingX Application Package Manager, that is, :command:`system
application-upload/apply/remove/delete`. |prod| delivers some software
through this mechanism, for example, ``platform-integ-apps``.
For software updates for these applications, download the updated
application tarball, containing the updated FluxCD manifest, and updated
Helm charts for the application, and apply the updates using the
:command:`system application-update` command.
.. xbooklink For more information, see,
:ref:`Cloud Platform Kubernetes Admin Tutorials
<about-the-admin-tutorials>`: :ref:`StarlingX Application Package Manager
<kubernetes-admin-tutorials-tarlingx-application-package-manager>`.

@ -1,119 +0,0 @@
.. upe1593016272562
.. _software-upgrades:
=================
Software Upgrades
=================
|prod-long| upgrades enable you to move |prod| software from one release of
|prod| to the next release of |prod|.
.. contents:: |minitoc|
:local:
:depth: 1
|prod| software upgrade is a multi-step rolling-upgrade process, where |prod|
hosts are upgraded one at time while continuing to provide its hosting services
to its hosted applications. An upgrade can be performed manually or using
Upgrade Orchestration, which automates much of the upgrade procedure, leaving a
few manual steps to prevent operator oversight.
.. For more information on manual upgrades, see :ref:`Manual Platform Components Upgrade <manual-upgrade-overview>`.
.. For more information on upgrade orchestration, see :ref:`Orchestrated Platform Component Upgrade <orchestration-upgrade-overview>`.
.. warning::
Do NOT use information in the |updates-doc| guide for |prod-dc|
orchestrated software upgrades. If information in this document is used for
a |prod-dc| orchestrated upgrade, the upgrade will fail, resulting
in an outage. The |prod-dc| Upgrade Orchestrator automates a
recursive rolling upgrade of all subclouds and all hosts within the
subclouds.
.. xbooklink For more information on the |prod-dc| Upgrade Orchestrator, see,
|distcloud-doc|: :ref:`Upgrade Orchestration for Distributed Cloud
Subclouds Using CLI
<upgrade-orchestration-for-distributed-cloud-subclouds-using-the-cli>`.
Before starting the upgrades process:
.. _software-upgrades-ul-ant-vgq-gmb:
- The system must be 'patch current'.
- There must be no management-affecting alarms present on the system.
- Ensure that any certificates managed by cert manager will not be renewed
during the upgrade process.
- The new software load must be imported.
- A valid license file for the new software release must be installed.
The upgrade process starts by upgrading the controllers. The standby controller
is upgraded first and involves loading the standby controller with the new
release of software and migrating all the controller services' databases for the
new release of software. Activity is switched to the upgraded controller,
running in a 'compatibility' mode where all inter-node messages are using
message formats from the old release of software. Prior to upgrading the second
controller, you reach a "point-of-no-return for an in-service abort" of the
upgrades process. The second controller is loaded with the new release of
software and becomes the new Standby controller.
.. For more information on manual upgrades, see :ref:`Manual Platform Components Upgrade <manual-upgrade-overview>` .
If present, storage nodes are locked, upgraded and unlocked one at a time in
order to respect the redundancy model of |prod| storage nodes. Storage nodes
can be upgraded in parallel if using upgrade orchestration.
Worker nodes are then upgraded. Worker nodes are tainted when locked, such that
Kubernetes shuts down any pods on this worker node and restarts the pods on
another worker node. When upgrading the worker node, the worker node network
boots/installs the new software from the active controller. After unlocking the
worker node, the worker services are running in a 'compatibility' mode where all
inter-node messages are using message formats from the old release of software.
Note that the worker nodes can only be upgraded in parallel if using upgrade
orchestration.
The final step of the upgrade process is to activate and complete the upgrade.
This involves disabling 'compatibility' modes on all hosts and clearing the
Upgrade Alarm.
.. only:: partner
.. include:: /_includes/software-upgrades.rest
:start-after: software-upgrade-begin
:end-before: software-upgrade-end
.. _software-upgrades-section-N1002F-N1001F-N10001:
----------------------------------
Rolling Back / Aborting an Upgrade
----------------------------------
In general, any issues encountered during an upgrade should be addressed during
the upgrade with the intention of completing the upgrade after the issues are
resolved. Issues specific to a storage or worker host can be addressed by
temporarily downgrading the host, addressing the issues and then upgrading the
host again, or in some cases by replacing the node.
In extremely rare cases, it may be necessary to abort an upgrade. This is a last
resort and should only be done if there is no other way to address the issue
within the context of the upgrade. There are two scenarios for doing such an
abort:
.. _software-upgrades-ul-dqp-brt-cx:
- Before controller-0 has been upgraded (that is, only controller-1 has been
upgraded): In this case the upgrade can be aborted and the system will
remain in service during the abort.
.. See, :ref:`Rolling Back a Software Upgrade Before the Second Controller Upgrade <rolling-back-a-software-upgrade-before-the-second-controller-upgrade>`.
- After controller-0 has been upgraded (that is, both controllers have been
upgraded): In this case the upgrade can only be aborted with a complete
outage and a reinstall of all hosts. This would only be done as a last
resort, if there was absolutely no other way to recover the system.
.. See, :ref:`Rolling Back a Software Upgrade After the Second Controller Upgrade <rolling-back-a-software-upgrade-after-the-second-controller-upgrade>`.