diff --git a/doc/source/_includes/installing-software-updates-before-initial-commissioning.rest b/doc/source/_includes/installing-software-updates-before-initial-commissioning.rest new file mode 100644 index 000000000..982215f4b --- /dev/null +++ b/doc/source/_includes/installing-software-updates-before-initial-commissioning.rest @@ -0,0 +1 @@ +#. Install, configure and unlock nodes. \ No newline at end of file diff --git a/doc/source/_vendor/vendor_strings.txt b/doc/source/_vendor/vendor_strings.txt index c6444cc1e..115ab76bd 100755 --- a/doc/source/_vendor/vendor_strings.txt +++ b/doc/source/_vendor/vendor_strings.txt @@ -33,6 +33,7 @@ .. |admintasks-doc| replace:: :title:`StarlingX Administrator Tasks` .. |datanet-doc| replace:: :title:`StarlingX Data Networks` .. |os-intro-doc| replace:: :title:`OpenStack Introduction` +.. |updates-doc| replace:: :title:`StarlingX Updates and Upgrades` .. Name of downloads location @@ -54,4 +55,12 @@ .. |max-workers| replace:: 99 -.. |release-caveat| replace:: This is a pre-release feature and may not function as described in |prod| 5. +.. Product name used in patch file names + +.. |pn| replace:: STLX + +.. Product version used in patch file names + +.. |pvr| replace:: 00004 + +.. |release-caveat| replace:: This is a pre-release feature and may not function as described in |prod| 5 documentation. diff --git a/doc/source/updates/kubernetes/aborting-simplex-system-upgrades.rst b/doc/source/updates/kubernetes/aborting-simplex-system-upgrades.rst new file mode 100644 index 000000000..adb86d04e --- /dev/null +++ b/doc/source/updates/kubernetes/aborting-simplex-system-upgrades.rst @@ -0,0 +1,97 @@ + +.. syj1592947192958 +.. _aborting-simplex-system-upgrades: + +============================= +Abort Simplex System Upgrades +============================= + +You can abort a Simplex System upgrade before or after upgrading controller-0. + +.. _aborting-simplex-system-upgrades-section-N10025-N1001B-N10001: + +.. contents:: |minitoc| + :local: + :depth: 1 + +----------------------------- +Before upgrading controller-0 +----------------------------- + +.. _aborting-simplex-system-upgrades-ol-nlw-zbp-xdb: + +#. Abort the upgrade with the upgrade-abort command. + + .. code-block:: none + + $ system upgrade-abort + + The upgrade state is set to aborting. Once this is executed, there is no + canceling; the upgrade must be completely aborted. + +#. Complete the upgrade. + + .. code-block:: none + + $ system upgrade-complete + + At this time any upgrade data generated as part of the upgrade-start + command will be deleted. This includes the upgrade data in + /opt/platform-backup. + +.. _aborting-simplex-system-upgrades-section-N10063-N1001B-N10001: + +---------------------------- +After upgrading controller-0 +---------------------------- + +After controller-0 has been upgraded it is possible to roll back the software +upgrade. This involves performing a system restore with the previous release. + +.. _aborting-simplex-system-upgrades-ol-jmw-kcp-xdb: + +#. Abort the upgrade with the :command:`upgrade-abort` command. + + .. code-block:: none + + $ system upgrade-abort + + The upgrade state is set to aborting. Once this is executed, there is no + canceling; the upgrade must be completely aborted. + +#. Lock and downgrade controller-0 + + .. code-block:: none + + $ system host-lock controller-0 + $ system host-downgrade controller-0 + + The data is stored in /opt/platform-backup. Ensure the data is present,and + preserved through the downgrade. + +#. 
Install the previous release of |prod-long| Simplex software via network or + USB. + +#. Restore the system data. The restore is preserved in /opt/platform-backup. + + For more information, see, :ref:`Upgrading All-in-One Simplex + `. + +#. Abort the upgrade with the :command:`upgrade-abort` command. + + .. code-block:: none + + $ system upgrade-abort + + The system will be restored to the state when the :command:`upgrade-start` + command was issued. The :command:`upgrade-abort` command must be issued at + this time. + + The upgrade state is set to aborting. Once this is executed, there is no + canceling; the upgrade must be completely aborted. + +#. Complete the upgrade. + + .. code-block:: none + + $ system upgrade-complete diff --git a/doc/source/updates/kubernetes/configuring-update-orchestration.rst b/doc/source/updates/kubernetes/configuring-update-orchestration.rst new file mode 100644 index 000000000..bb0817acb --- /dev/null +++ b/doc/source/updates/kubernetes/configuring-update-orchestration.rst @@ -0,0 +1,192 @@ + +.. gep1552920534437 +.. _configuring-update-orchestration: + +============================== +Configure Update Orchestration +============================== + +You can configure update orchestration using the Horizon Web interface. + +.. rubric:: |context| + +The update orchestration interface is found in Horizon on the Patch +Orchestration tab, available from **Admin** \> **Platform** \> **Software +Management** in the left-hand pane. + +.. note:: + Management-affecting alarms cannot be ignored at the indicated severity + level or higher by using relaxed alarm rules during an orchestrated update + operation. For a list of management-affecting alarms, see |fault-doc|: + :ref:`Alarm Messages <100-series-alarm-messages>`. To display + management-affecting active alarms, use the following command: + + .. code-block:: none + + ~(keystone_admin)]$ fm alarm-list --mgmt_affecting + + During an orchestrated update operation, the following alarms are ignored + even when strict restrictions are selected: + + - 200.001, Maintenance host lock alarm + + - 900.001, Patch in progress + + - 900.005, Upgrade in progress + + - 900.101, Software patch auto apply in progress + +.. _configuring-update-orchestration-ul-qhy-q1p-v1b: + +.. rubric:: |prereq| + +You cannot successfully create an update \(patch\) strategy if any hosts show +**Patch Current** = **Pending**, indicating that the update status of these +hosts has not yet been updated. The creation attempt fails, and you must try +again. You can use :command:`sw-patch query-hosts` to review the current update +status before creating a update strategy. + +.. rubric:: |proc| + +#. Upload and apply your updates as described in :ref:`Manage Software Updates + ` \(do not lock any hosts or use + :command:`host-install` to install the updates on any hosts\). + +#. Select **Platform** \> **Software Management**, then select the **Patch + Orchestration** tab. + +#. Click the **Create Strategy** button. + + The Create Strategy dialog appears. + + .. image:: figures/zcj1567178380908.png + +#. Create a update strategy by specifying settings for the parameters in the + Create Strategy dialog box. + + **Description** field + Provides information about current alarms, including whether an alarm + is Management Affecting. 
   **Controller Apply Type**
      - Serial \(default\): controllers will be updated one at a time
        \(standby controller first\)

      - Ignore: controllers will not be updated

   **Storage Apply Type**
      - Serial \(default\): storage hosts will be updated one at a time

      - Parallel: storage hosts will be updated in parallel, ensuring that
        only one storage node in each replication group is updated at a
        time.

      - Ignore: storage hosts will not be updated

   **Worker Apply Type**
      - Serial \(default\): worker hosts will be updated one at a time

      - Parallel: worker hosts will be updated in parallel

        - At most, the specified maximum number of parallel worker hosts
          will be updated at the same time.

        - For a reboot parallel update only, worker hosts with no pods
          are updated before worker hosts with pods.

      - Parallel: specify the maximum number of worker hosts to update in
        parallel \(minimum: 2, maximum: 100\)

      - Ignore: worker hosts will not be updated

   **Default Instance Action**
      This parameter only applies to systems with the stx-openstack
      application.

      - Stop-Start \(default\): hosted application VMs will be stopped
        before a host is updated \(applies to reboot updates only\)

      - Migrate: hosted application VMs will be migrated off a host before
        it is updated \(applies to reboot updates only\)

   **Alarm Restrictions**
      This option lets you specify how update orchestration behaves when
      alarms are present.

      You can use the CLI command :command:`fm alarm-list --mgmt_affecting`
      to view the alarms that are management affecting.

      **Strict**
         The default Strict option causes update orchestration to fail if
         any alarms are present in the system \(except for a small list of
         alarms\).

      **Relaxed**
         This option allows orchestration to proceed if alarms are present,
         as long as none of these alarms are management affecting.

#. Click **Create Strategy** to save the update orchestration strategy.

   .. note::
      The update orchestration process ensures that no hosts are reported as
      **Patch Current** = **Pending**. If any hosts have this status, the
      creation attempt fails with an error message. Wait a few minutes and
      try again. You can also use :command:`sw-patch query-hosts` to review
      the current update status.

   Examine the update strategy. Pay careful attention to:

   - The sets of hosts that will be updated together in each stage.

   - The sets of hosted application pods that will be impacted in each stage.

   The update strategy has one or more stages, with each stage consisting of
   one or more hosts to be updated at the same time. Each stage is split into
   steps \(for example, :command:`query-alarms`, :command:`lock-hosts`,
   :command:`sw-patch-hosts`\). Note the following about stages:

   .. note::

      - Controller hosts are updated first, followed by storage hosts and
        then worker hosts.

      - Worker hosts with no hosted application pods are updated before
        worker hosts with hosted application pods.

      - The final step in each stage is "system-stabilize," which waits for
        a period of time \(up to several minutes\) and ensures that the
        system is free of alarms. This ensures that the update orchestrator
        does not continue updating hosts if applying the update has caused
        an issue resulting in an alarm.

#. Click the **Apply Strategy** button to apply the update strategy. You can
   optionally apply a single stage at a time by clicking the **Apply Stage**
   button.
   When applying a single stage, you can only apply the next stage; you
   cannot skip stages.

#. To abort the update, click the **Abort Strategy** button.

   - While an update strategy is being applied, it can be aborted. This
     results in:

     - The current step being allowed to complete.

     - If necessary, an abort phase will be created and applied, which will
       attempt to unlock any hosts that were locked.

   .. note::
      If an update strategy is aborted after hosts were locked, but before
      they were updated, the hosts will not be unlocked, as this would result
      in the updates being installed. You must either install the updates on
      the hosts or remove the updates before unlocking the hosts.

#. Delete the update strategy.

   After an update strategy has been applied \(or aborted\), it must be
   deleted before another update strategy can be created. If an update
   strategy application fails, you must address the issue that caused the
   failure, then delete and re-create the strategy before attempting to
   apply it again.

diff --git a/doc/source/updates/kubernetes/figures/ekn1453233538504.png b/doc/source/updates/kubernetes/figures/ekn1453233538504.png
new file mode 100644
index 000000000..439e7e8be
Binary files /dev/null and b/doc/source/updates/kubernetes/figures/ekn1453233538504.png differ
diff --git a/doc/source/updates/kubernetes/figures/zcj1567178380908.png b/doc/source/updates/kubernetes/figures/zcj1567178380908.png
new file mode 100644
index 000000000..dfddb0fb8
Binary files /dev/null and b/doc/source/updates/kubernetes/figures/zcj1567178380908.png differ
diff --git a/doc/source/updates/kubernetes/identifying-the-software-version-and-update-level-using-horizon.rst b/doc/source/updates/kubernetes/identifying-the-software-version-and-update-level-using-horizon.rst
new file mode 100644
index 000000000..ca630177d
--- /dev/null
+++ b/doc/source/updates/kubernetes/identifying-the-software-version-and-update-level-using-horizon.rst
@@ -0,0 +1,35 @@

.. kiv1552920729184
.. _identifying-the-software-version-and-update-level-using-horizon:

============================================================
Identify the Software Version and Update Level Using Horizon
============================================================

You can view the current software version and update level from the Horizon
Web interface. The system type is also shown.

.. rubric:: |proc|

#. In the |prod| Horizon interface, open the System Configuration page.

   The System Configuration page is available from **Admin** \> **Platform**
   \> **System Configuration** in the left-hand pane.

#. Select the **Systems** tab to view the software version.

   The software version is shown in the **Version** field.

   The type of system selected at installation \(Standard or All-in-one\) is
   shown in the **System Type** field. The mode \(**simplex**, **duplex**, or
   **standard**\) is shown in the **System Mode** field.

#. In the |prod| Horizon interface, open the Software Management page.

   The Software Management page is available from **Admin** \> **Platform** \>
   **Software Management** in the left-hand pane.

#. Select the **Patches** tab to view update information.

   The **Patches** tab shows the Patch ID, a summary description, the status
   of the patch, and an **Actions** button for selecting an appropriate
   action.
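The same information is available from the CLI; a minimal sketch for scripting
the check, assuming the Keystone admin credentials have been sourced \(the
commands are described in :ref:`Identify the Software Version and Update Level
Using the CLI
<identifying-the-software-version-and-update-level-using-the-cli>`\):

.. code-block:: none

   # Show the software version, system type, and system mode
   ~(keystone_admin)]$ system show | grep -E 'software_version|system_type|system_mode'

   # List the software updates (patches) known to the system and their states
   ~(keystone_admin)]$ sudo sw-patch query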
diff --git a/doc/source/updates/kubernetes/identifying-the-software-version-and-update-level-using-the-cli.rst b/doc/source/updates/kubernetes/identifying-the-software-version-and-update-level-using-the-cli.rst new file mode 100644 index 000000000..8acae841c --- /dev/null +++ b/doc/source/updates/kubernetes/identifying-the-software-version-and-update-level-using-the-cli.rst @@ -0,0 +1,58 @@ + +.. lob1552920716157 +.. _identifying-the-software-version-and-update-level-using-the-cli: + +============================================================ +Identify the Software Version and Update Level Using the CLI +============================================================ + +You can view the current software version and update level from the CLI. The +system type is also shown. + +.. rubric:: |context| + +For more about working with software updates, see :ref:`Manage Software Updates +` + +.. rubric:: |proc| + +.. _identifying-the-software-version-and-update-level-using-the-cli-steps-smg-b4r-hkb: + +- To find the software version from the CLI, use the :command:`system show` + command. + + .. code-block:: none + + ~(keystone_admin)]$ system show + +----------------------+----------------------------------------------------+ + | Property | Value | + +----------------------+----------------------------------------------------+ + | contact | None | + | created_at | 2020-02-27T15:29:26.140606+00:00 | + | description | yow-cgcs-ironpass-1_4 | + | https_enabled | False | + | location | None | + | name | yow-cgcs-ironpass-1-4 | + | region_name | RegionOne | + | sdn_enabled | False | + | security_feature | spectre_meltdown_v1 | + | service_project_name | services | + | software_version | 20.06 | + | system_mode | duplex | + | system_type | Standard | + | timezone | UTC | + | updated_at | 2020-02-28T16:19:56.987581+00:00 | + | uuid | 90212c98-7e27-4a14-8981-b8f5b777b26b | + | vswitch_type | none | + +----------------------+----------------------------------------------------+ + + .. note:: + The **system\_mode** field is shown only for a |prod| Simplex or Duplex + system. + +- To list applied software updates from the CLI, use the :command:`sw-patch + query` command. + + .. code-block:: none + + ~(keystone_admin)]$ sudo sw-patch query diff --git a/doc/source/updates/kubernetes/in-service-versus-reboot-required-software-updates.rst b/doc/source/updates/kubernetes/in-service-versus-reboot-required-software-updates.rst new file mode 100644 index 000000000..299647044 --- /dev/null +++ b/doc/source/updates/kubernetes/in-service-versus-reboot-required-software-updates.rst @@ -0,0 +1,24 @@ + +.. gwe1552920505159 +.. _in-service-versus-reboot-required-software-updates: + +================================================== +In-Service Versus Reboot-Required Software Updates +================================================== + +In-Service \(Reboot-not-Required\) and a Reboot-Required software updates are +available depending on the nature of the update to be performed. + +In-Service software updates provides a mechanism to issue updates that do not +require a reboot, allowing the update to be installed on in-service nodes and +restarting affected processes as needed. + +Depending on the area of software being updated and the type of software +change, installation of the update may or may not require the |prod| hosts to +be rebooted. For example, a software update to the kernel would require the +host to be rebooted in order to apply the update. 
Software updates are +classified as reboot-required or reboot-not-required \(also referred to as +in-service\) type updates to indicate this. For reboot-required updates, the +hosted application pods are automatically relocated to an alternate host as +part of the update procedure, prior to applying the update and rebooting the +host. diff --git a/doc/source/updates/kubernetes/index.rst b/doc/source/updates/kubernetes/index.rst index 94af1d888..294d8951a 100644 --- a/doc/source/updates/kubernetes/index.rst +++ b/doc/source/updates/kubernetes/index.rst @@ -3,6 +3,49 @@ Kubernetes ========== +------------ +Introduction +------------ + +.. toctree:: + :maxdepth: 1 + + software-updates-and-upgrades-software-updates + software-upgrades + +----------------------- +Manual software updates +----------------------- + +.. toctree:: + :maxdepth: 1 + + managing-software-updates + in-service-versus-reboot-required-software-updates + identifying-the-software-version-and-update-level-using-horizon + identifying-the-software-version-and-update-level-using-the-cli + populating-the-storage-area + update-status-and-lifecycle + installing-software-updates-before-initial-commissioning + installing-reboot-required-software-updates-using-horizon + installing-reboot-required-software-updates-using-the-cli + installing-in-service-software-update-using-horizon + installing-in-service-software-updates-using-the-cli + removing-reboot-required-software-updates + software-update-space-reclamation + reclaiming-disk-space + +---------------------------- +Orchestrated Software Update +---------------------------- + +.. toctree:: + :maxdepth: 1 + + update-orchestration-overview + configuring-update-orchestration + update-orchestration-cli + --------------------------------- Manual Kubernetes Version Upgrade --------------------------------- @@ -27,3 +70,52 @@ Kubernetes Version Upgrade Cloud Orchestration configuring-kubernetes-update-orchestration handling-kubernetes-update-orchestration-failures +---------------------------------- +Manual Platform components upgrade +---------------------------------- + +.. toctree:: + :maxdepth: 1 + + manual-upgrade-overview + +****************** +All-in-one Simplex +****************** + +.. toctree:: + :maxdepth: 1 + + upgrading-all-in-one-simplex + aborting-simplex-system-upgrades + +****************** +All-in-one Duplex +****************** + +.. toctree:: + :maxdepth: 1 + + upgrading-all-in-one-duplex-or-standard + overview-of-upgrade-abort-procedure + +****************** +Roll back upgrades +****************** + +.. toctree:: + :maxdepth: 1 + + rolling-back-a-software-upgrade-before-the-second-controller-upgrade + rolling-back-a-software-upgrade-after-the-second-controller-upgrade + +--------------------------------------- +Orchestrated Platform component upgrade +--------------------------------------- + +.. toctree:: + :maxdepth: 1 + + orchestration-upgrade-overview + performing-an-orchestrated-upgrade + performing-an-orchestrated-upgrade-using-the-cli diff --git a/doc/source/updates/kubernetes/installing-in-service-software-update-using-horizon.rst b/doc/source/updates/kubernetes/installing-in-service-software-update-using-horizon.rst new file mode 100644 index 000000000..ee0ba687b --- /dev/null +++ b/doc/source/updates/kubernetes/installing-in-service-software-update-using-horizon.rst @@ -0,0 +1,81 @@ + +.. jfc1552920636790 +.. 
_installing-in-service-software-update-using-horizon: + +================================================ +Install In-Service Software Update Using Horizon +================================================ + +The procedure for applying an in-service update is similar to that of a +reboot-required update, except that the host does not need to be locked and +unlocked as part of applying the update. + +.. rubric:: |proc| + +.. _installing-in-service-software-update-using-horizon-steps-x1b-qnv-vw: + +#. Log in to the Horizon Web interface as the **admin** user. + +#. In |prod| Horizon, open the Software Management page. + + The Software Management page is available from **Admin** \> **Platform** \> + **Software Management** in the left-hand pane. + +#. Select the Patches tab to see the current update status. + + The Patches page shows the current status of all updates uploaded to the + system. If there are no updates, an empty Patch Table is displayed. + +#. Upload the update \(patch\) file to the update storage area. + + Click the **Upload Patch** button to display an upload window from which + you can browse your workstation's file system to select the update file. + Click the **Upload Patch** button once the selection is done. + + The update file is transferred to the Active Controller and is copied to + the update storage area, but it has yet to be applied to the cluster. This + is reflected in the Patches page. + +#. Apply the update. + + Click the **Apply Patch** button associated with the update. Alternatively, + select the update first using the selection boxes on the left, and then + click the **Apply Patches** button at the top. You can use this selection + process to apply all updates, or a selected subset, in a single operation. + + The Patches page is updated to report the update to be in the + *Partial-Apply* state. + +#. Install the update on **controller-0**. + + #. Select the **Hosts** tab. + + The **Hosts** tab on the Host Inventory page reflects the new status of + the hosts with respect to the new update state. In this example, the + update only applies to controller software, as can be seen by the + worker host's status field being empty, indicating that it is 'patch + current'. + + .. image:: figures/ekn1453233538504.png + + #. Next, select the Install Patches option from the **Edit Host** button + associated with **controller-0** to install the update. + + A confirmation window is presented giving you a last opportunity to + cancel the operation before proceeding. + +#. Repeat the steps 6 a,b, above with **controller-1** to install the update + on **controller-1**. + +#. Repeat the steps 6 a,b above for the worker and/or storage hosts \(if + present\). + + This step does not apply for |prod| Simplex or Duplex systems. + +#. Verify the state of the update. + + Visit the Patches page again. The update is now in the *Applied* state. + +.. rubric:: |result| + +The update is now applied, and all affected hosts have been updated. diff --git a/doc/source/updates/kubernetes/installing-in-service-software-updates-using-the-cli.rst b/doc/source/updates/kubernetes/installing-in-service-software-updates-using-the-cli.rst new file mode 100644 index 000000000..3f62cede5 --- /dev/null +++ b/doc/source/updates/kubernetes/installing-in-service-software-updates-using-the-cli.rst @@ -0,0 +1,131 @@ + +.. hfj1552920618138 +.. 
_installing-in-service-software-updates-using-the-cli: + +================================================= +Install In-Service Software Updates Using the CLI +================================================= + +The procedure for applying an in-service update is similar to that of a +reboot-required update, except that the host does not need to be locked and +unlocked as part of applying the update. + +.. rubric:: |proc| + +#. Upload the update \(patch\). + + .. code-block:: none + + $ sudo sw-patch upload INSVC_HORIZON_SYSINV.patch + INSVC_HORIZON_SYSINV is now available + +#. Confirm that the update is available. + + .. code-block:: none + + $ sudo sw-patch query + Patch ID RR Release Patch State + ==================== == ======= =========== + INSVC_HORIZON_SYSINV N 20.04 Available + +#. Check the status of the hosts. + + .. code-block:: none + + $ sudo sw-patch query-hosts + Hostname IP Address Patch Current Reboot Required Release State + ============ ============== ============= =============== ======= ===== + worker-0 192.168.204.24 Yes No 20.01 idle + controller-0 192.168.204.3 Yes No 20.01 idle + controller-1 192.168.204.4 Yes No 20.01 idle + +#. Ensure the original update files have been deleted from the root drive. + + After they are uploaded to the storage area, the original files are no + longer required. You must use the command-line interface to delete them, in + order to ensure enough disk space to complete the installation. + + .. code-block:: none + + $ rm + + .. caution:: + If the original files are not deleted before the updates are applied, + the installation may fail due to a full disk. + +#. Apply the update \(patch\). + + .. code-block:: none + + $ sudo sw-patch apply INSVC_HORIZON_SYSINV + INSVC_HORIZON_SYSINV is now in the repo + + The update state transitions to Partial-Apply: + + .. code-block:: none + + $ sudo sw-patch query + Patch ID RR Release Patch State + ==================== == ======= ============= + INSVC_HORIZON_SYSINV N 20.04 Partial-Apply + + As it is an in-service update, the hosts report that they are not 'patch + current', but they do not require a reboot. + + .. code-block:: none + + $ sudo sw-patch query-hosts + Hostname IP Address Patch Current Reboot Required Release State + ============ ============== ============= =============== ======= ===== + worker-0 192.168.204.24 No No 20.04 idle + controller-0 192.168.204.3 No No 20.04 idle + controller-1 192.168.204.4 No No 20.04 idle + + +#. Install the update on controller-0. + + .. code-block:: none + + $ sudo sw-patch host-install controller-0 + ............. + Installation was successful. + +#. Query the hosts to check status. + + .. code-block:: none + + $ sudo sw-patch query-hosts + Hostname IP Address Patch Current Reboot Required Release State + ============ ============== ============= =============== ======= ===== + worker-0 192.168.204.24 No No 20.01 idle + controller-0 192.168.204.3 Yes No 20.01 idle + controller-1 192.168.204.4 No No 20.01 idle + + The controller-1 host reports it is now 'patch current' and does not + require a reboot, without having been locked or rebooted + +#. Install the update on worker-0 \(and other worker nodes and storage nodes, + if present\) + + .. code-block:: none + + $ sudo sw-patch host-install worker-0 + .... + Installation was successful. + + You can query the hosts to confirm that all nodes are now 'patch current', + and that the update has transitioned to the Applied state. + + .. 
code-block:: none + + $ sudo sw-patch query-hosts + Hostname IP Address Patch Current Reboot Required Release State + ============ ============== ============= =============== ======= ===== + worker-0 192.168.204.24 Yes No 20.04 idle + controller-0 192.168.204.3 Yes No 20.04 idle + controller-1 192.168.204.4 Yes No 20.04 idle + + $ sudo sw-patch query + Patch ID RR Release Patch State + ==================== == ======= =========== + INSVC_HORIZON_SYSINV N 20.04 Applied diff --git a/doc/source/updates/kubernetes/installing-reboot-required-software-updates-using-horizon.rst b/doc/source/updates/kubernetes/installing-reboot-required-software-updates-using-horizon.rst new file mode 100644 index 000000000..6e1fc2aea --- /dev/null +++ b/doc/source/updates/kubernetes/installing-reboot-required-software-updates-using-horizon.rst @@ -0,0 +1,126 @@ + +.. phg1552920664442 +.. _installing-reboot-required-software-updates-using-horizon: + +====================================================== +Install Reboot-Required Software Updates Using Horizon +====================================================== + +You can use the Horizon Web interface to upload, delete, apply, and remove +software updates. + +.. rubric:: |context| + +This section presents an example of a software update workflow using a single +update. The main steps of the procedure are: + + +.. _installing-reboot-required-software-updates-using-horizon-ul-mbr-wsr-s5: + +- Upload the updates. + +- Lock the host\(s\). + +- Install updates; any unlocked nodes will reject the request. + +- Unlock the host\(s\). Unlocking the host\(s\) automatically triggers a + reboot. + +.. rubric:: |proc| + +.. _installing-reboot-required-software-updates-using-horizon-steps-lnt-14y-hjb: + +#. Log in to the Horizon Web interface interface as the **admin** user. + +#. In Horizon, open the Software Management page. + + The Software Management page is available from **Admin** \> **Platform** \> + **Software Management** in the left-hand pane. + +#. Select the Patches tab to see the current status. + + The Patches page shows the current status of all updates uploaded to the + system. If there are no updates, an empty Patch Table is displayed. + +#. Upload the update \(patch\) file to the update storage area. + + Click the **Upload Patches** button to display an upload window from which + you can browse your workstation's file system to select the update file. + Click the **Upload Patches** button once the selection is done. + + The update file is transferred to the Active Controller and is copied to + the storage area, but it has yet to be applied to the cluster. This is + reflected in the Patches page. + +#. Apply the update. + + Click the **Apply Patch** button associated with the update. Alternatively, + select the update first using the selection boxes on the left, and then + click the **Apply Patches** button at the top. You can use this selection + process to apply all updates, or a selected subset, in a single operation. + + The Patches page is updated to report the update to be in the + *Partial-Apply* state. + +#. Install the update on **controller-0**. + + .. _installing-reboot-required-software-updates-using-horizon-step-N10107-N10028-N1001C-N10001: + + #. Select the **Hosts** tab. + + The **Hosts** tab on the Host Inventory page reflects the new status of + the hosts with respect to the new update state. As shown below, both + controllers are now reported as not 'patch current' and requiring + reboot. + + .. image:: figures/ekn1453233538504.png + + #. 
Transfer active services to the standby controller by selecting the + **Swact Host** option from the **Edit Host** button associated with the + active controller host. + + .. note:: + Access to Horizon may be lost briefly during the active controller + transition. You may have to log in again. + + #. Select the Lock Host option from the **Edit Host** button associated + with **controller-0**. + + #. Select the Install Patches option from the **Edit Host** button + associated with **controller-0** to install the update. + + A confirmation window is presented giving you a last opportunity to + cancel the operation before proceeding. + + Wait for the update install to complete. + + #. Select the Unlock Host option from the **Edit Host** button associated + with controller-0. + +#. Repeat steps :ref:`6 + ` + a to e, with **controller-1** to install the update on **controller-1**. + + .. note:: + For |prod| Simplex systems, this step does not apply. + +#. Repeat steps :ref:`6 + ` + a to e, for the worker and/or storage hosts. + + .. note:: + For |prod| Simplex or Duplex systems, this step does not apply. + +#. Verify the state of the update. + + Visit the Patches page. The update is now in the Applied state. + + +.. rubric:: |result| + +The update is applied now, and all affected hosts have been updated. + +Updates can be removed using the **Remove Patches** button from the Patches +page. The workflow is similar to the one presented in this section, with the +exception that updates are being removed from each host instead of being +applied. diff --git a/doc/source/updates/kubernetes/installing-reboot-required-software-updates-using-the-cli.rst b/doc/source/updates/kubernetes/installing-reboot-required-software-updates-using-the-cli.rst new file mode 100644 index 000000000..f9494e5dd --- /dev/null +++ b/doc/source/updates/kubernetes/installing-reboot-required-software-updates-using-the-cli.rst @@ -0,0 +1,295 @@ + +.. ffh1552920650754 +.. _installing-reboot-required-software-updates-using-the-cli: + +====================================================== +Install Reboot-Required Software Updates Using the CLI +====================================================== + +You can install reboot-required software updates using the CLI. + +.. rubric:: |proc| + + +.. _installing-reboot-required-software-updates-using-the-cli-steps-v1q-vlv-vw: + +#. Log in as user **sysadmin** to the active controller and source the script + /etc/platform/openrc to obtain administrative privileges. + +#. Verify that the updates are available using the :command:`sw-patch query` + command. + + .. parsed-literal:: + + ~(keystone_admin)]$ sudo sw-patch query + + Patch ID Patch State + ===================== =========== + |pn|-nn.nn_PATCH_0001 Available + |pn|-nn.nn_PATCH_0002 Available + |pn|-nn.nn_PATCH_0003 Available + + where *nn.nn* in the update \(patch\) filename is the |prod| release number. + +#. Ensure the original update files have been deleted from the root drive. + + After the updates are uploaded to the storage area, the original files are + no longer required. You must use the command-line interface to delete them, + in order to ensure enough disk space to complete the installation. + + .. code-block:: none + + $ rm + + .. caution:: + If the original files are not deleted before the updates are applied, + the installation may fail due to a full disk. + +#. Apply the update. + + .. 
parsed-literal:: + + ~(keystone_admin)]$ sudo sw-patch apply |pn|-nn.nn_PATCH_0001 + |pn|-nn.nn_PATCH_0001 is now in the repo + + where nn.nn in the update filename is the |prod-long| release number. + + The update is now in the Partial-Apply state, ready for installation from + the software updates repository on the impacted hosts. + +#. Apply all available updates in a single operation, for example: + + .. parsed-literal:: + + ~(keystone_admin)]$ sudo sw-patch apply --all + |pn|-|pvr|-PATCH_0001 is now in the repo + |pn|-|pvr|-PATCH_0002 is now in the repo + |pn|-|pvr|-PATCH_0003 is now in the repo + + In this example, there are three updates ready for installation from the + software updates repository. + +#. Query the updating status of all hosts in the cluster. + + You can query the updating status of all hosts at any time as illustrated + below. + + .. note:: + The reported status is the accumulated result of all applied and + removed updates in the software updates repository, and not just the + status due to a particular update. + + .. code-block:: none + + ~(keystone_admin)]$ sudo sw-patch query-hosts + + Hostname IP Address Patch Current Reboot Required Release State + ============ ============== ============= =============== ======= ===== + worker-0 192.168.204.12 Yes No 20.04 idle + controller-0 192.168.204.3 Yes Yes 20.04 idle + controller-1 192.168.204.4 Yes Yes 20.04 idle + + + For each host in the cluster, the following status fields are displayed: + + **Patch Current** + Indicates whether there are updates pending for installation or removal + on the host or not. If *Yes*, then all relevant updates in the software + updates repository have been installed on, or removed from, the host + already. If *No*, then there is at least one update in either the + Partial-Apply or Partial-Remove state that has not been applied to the + host. + + The **Patch Current** field of the :command:`query-hosts` command will + briefly report “Pending” after you apply or remove an update, until + that host has checked against the repository to see if it is impacted + by the patching operation. + + **Reboot Required** + Indicates whether the host must be rebooted or not as a result of one + or more updates that have been either applied or removed, or because it + is not 'patch current'. + + **Release** + Indicates the running software release version. + + **State** + There are four possible states: + + **idle** + In a wait state. + + **installing** + Installing \(or removing\) updates. + + **install-failed** + The operation failed, either due to an update error or something + killed the process. Check the patching.log on the node in question. + + **install-rejected** + The node is unlocked, therefore the request to install has been + rejected. This state persists until there is another install + request, or the node is reset. + + Once the state has gone back to idle, the install operation is complete + and you can safely unlock the node. + + In this example, **worker-0** is up to date, no updates need to be + installed and no reboot is required. By contrast, the controllers are not + 'patch current', and therefore a reboot is required as part of installing + the update. + +#. Install all pending updates on **controller-0**. + + + #. Switch the active controller services. + + .. code-block:: none + + ~(keystone_admin)]$ system host-swact controller-0 + + Before updating a controller node, you must transfer any active + services running on the host to the other controller. 
Only then it is + safe to lock the host. + + #. Lock the host. + + You must lock the target host, controller, worker, or storage, before + installing updates. + + .. code-block:: none + + ~(keystone_admin)]$ system host-lock controller-0 + + #. Install the update. + + .. code-block:: none + + ~(keystone_admin)]$ sudo sw-patch host-install + + .. note:: + You can use the :command:`sudo sw-patch host-install-async` + command if you are launching multiple installs in + parallel. + + #. Unlock the host. + + .. code-block:: none + + ~(keystone_admin)]$ system host-unlock controller-0 + + Unlocking the host forces a reset of the host followed by a reboot. + This ensures that the host is restarted in a known state. + + All updates are now installed on **controller-0**. Querying the current + update status displays the following information: + + .. code-block:: none + + ~(keystone_admin)]$ sudo sw-patch query-hosts + + Hostname IP Address Patch Current Reboot Required Release State + ============ ============== ============= =============== ======= ===== + compute-0 192.168.204.95 Yes No 20.04 idle + compute-1 192.168.204.63 Yes No 20.04 idle + compute-2 192.168.204.99 Yes No 20.04 idle + compute-3 192.168.204.49 Yes No 20.04 idle + controller-0 192.168.204.3 Yes No 20.04 idle + controller-1 192.168.204.4 Yes No 20.04 idle + storage-0 192.168.204.37 Yes No 20.04 idle + storage-1 192.168.204.90 Yes No 20.04 idle + +#. Install all pending updates on **controller-1**. + + .. note:: + For |prod| Simplex systems, this step does not apply. + + Repeat the previous step targeting **controller-1**. + + All updates are now installed on **controller-1** as well. Querying the + current updating status displays the following information: + + .. code-block:: none + + ~(keystone_admin)]$ sudo sw-patch query-hosts + + Hostname IP Address Patch Current Reboot Required Release State + ============ ============== ============= =============== ======= ===== + compute-0 192.168.204.95 Yes No 20.04 idle + compute-1 192.168.204.63 Yes No 20.04 idle + compute-2 192.168.204.99 Yes No 20.04 idle + compute-3 192.168.204.49 Yes No 20.04 idle + controller-0 192.168.204.3 Yes No 20.04 idle + controller-1 192.168.204.4 Yes No 20.04 idle + storage-0 192.168.204.37 Yes No 20.04 idle + storage-1 192.168.204.90 Yes No 20.04 idle + +#. Install any pending updates for the worker or storage hosts. + + .. note:: + For |prod| Simplex or Duplex systems, this step does not apply. + + All hosted application pods currently running on a worker host are + re-located to another host. + + If the **Patch Current** status for a worker or storage host is **No**, + apply the pending updates using the following commands: + + .. code-block:: none + + ~(keystone_admin)]$ system host-lock + + .. code-block:: none + + ~(keystone_admin)]$ sudo sw-patch host-install-async + + .. code-block:: none + + ~(keystone_admin)]$ system host-unlock + + where is the name of the host \(for example, **worker-0**\). + + .. note:: + Update installations can be triggered in parallel. + + The :command:`sw-patch host-install-async` command \(**install + patches** on the Horizon Web interface\) can be run on all locked + nodes, without waiting for one node to complete the install before + triggering the install on the next. If you can lock the nodes at the + same time, without impacting hosted application services, you can + update them at the same time. + + Likewise, you can install an update to the standby controller and a + worker node at the same time. 
The only restrictions are those of the + lock: You cannot lock both controllers, and you cannot lock a worker + node if you do not have enough free resources to relocate the hosted + applications from it. Also, in a Ceph configuration \(with storage + nodes\), you cannot lock more than one of + controller-0/controller-1/storage-0 at the same time, as these nodes + are running Ceph monitors and you must have at least two in service at + all times. + +#. Confirm that all updates are installed and the |prod| is up-to-date. + + Use the :command:`sw-patch query` command to verify that all updates are + **Applied**. + + .. parsed-literal:: + + ~(keystone_admin)]$ sudo sw-patch query + + Patch ID Patch State + ========================= =========== + |pn|-nn.nn_PATCH_0001 Applied + + where *nn.nn* in the update filename is the |prod| release number. + + If the **Patch State** for any update is still shown as **Available** or + **Partial-Apply**, use the **sw-patch query-hosts** command to identify + which hosts are not **Patch Current**, and then apply updates to them as + described in the preceding steps. + + +.. rubric:: |result| + +The |prod| is up to date now. All updates are installed. diff --git a/doc/source/updates/kubernetes/installing-software-updates-before-initial-commissioning.rst b/doc/source/updates/kubernetes/installing-software-updates-before-initial-commissioning.rst new file mode 100644 index 000000000..5c1879ead --- /dev/null +++ b/doc/source/updates/kubernetes/installing-software-updates-before-initial-commissioning.rst @@ -0,0 +1,105 @@ + +.. tla1552920677022 +.. _installing-software-updates-before-initial-commissioning: + +===================================================== +Install Software Updates Before Initial Commissioning +===================================================== + +This section describes installing software updates before you can commission +|prod-long|. + +.. rubric:: |context| + +This procedure assumes that the software updates to install are available on a +USB flash drive, or from a server reachable by **controller-0**. + +.. rubric:: |prereq| + +When initially installing the |prod-long| software, it is required that you +install the latest available updates on **controller-0** before running Ansible +Bootstrap Playbook, and before installing the software on other hosts. This +ensures that: + +.. _installing-software-updates-before-initial-commissioning-ul-gsq-1ht-vp: + +- The software on **controller-0**, and all other hosts, is up to date when + the cluster comes alive. + +- You reduce installation time by avoiding updating the system right after an + out-of-date software installation is complete. + +.. rubric:: |proc| + +#. Install software on **controller-0**. + + Use the |prod-long| bootable ISO image to initialize **controller-0**. + + This step takes you to the point where you use the console port to log in + to **controller-0** as user **sysadmin**. + +#. Populate the storage area. + + Upload the updates from the USB flash drive using the command + :command:`sw-patch upload` or :command:`sw-patch upload-dir` as described + in :ref:`Populating the Storage Area `. + +#. Delete the update files from the root drive. + + After the updates are uploaded to the storage area, the original files are + no longer required. You must delete them to ensure enough disk space to + complete the installation. + + .. caution:: + If the original files are not deleted before the updates are applied, + the installation may fail due to a full disk. + +#. 
Apply the updates. + + Apply the updates using the command :command:`sw-patch apply --all`. + + The updates are now in the repository, ready to be installed. + +#. Install the updates on the controller. + + .. code-block:: none + + $ sudo sw-patch install-local + Patch installation is complete. + Please reboot before continuing with configuration. + + This command installs all applied updates on **controller-0**. + +#. Reboot **controller-0**. + + You must reboot the controller to ensure that it is running with the + software fully updated. + + .. code-block:: none + + $ sudo reboot + +#. Bootstrap system on controller-0. + + #. Configure an IP interface. + + .. note:: + The |prod| software will automatically enable all interfaces and + send out a |DHCP| request, so this may happen automatically if a + |DHCP| Server is present on the network. Otherwise, you must + manually configure an IP interface. + + #. Run the Ansible Bootstrap Playbook. This can be run remotely or locally + on controller-0. + +.. include:: /_includes/installing-software-updates-before-initial-commissioning.rest + +.. rubric:: |result| + +Once all hosts in the cluster are initialized and they are all running fully +updated software. The |prod-long| cluster is up to date. + + +.. xbooklink From step 1 + For details, see :ref:`Install Software on controller-0 + ` for your system. \ No newline at end of file diff --git a/doc/source/updates/kubernetes/managing-software-updates.rst b/doc/source/updates/kubernetes/managing-software-updates.rst new file mode 100644 index 000000000..b5e7b1a20 --- /dev/null +++ b/doc/source/updates/kubernetes/managing-software-updates.rst @@ -0,0 +1,108 @@ + +.. kol1552920779041 +.. _managing-software-updates: + +======================= +Manage Software Updates +======================= + +Updates \(also known as patches\) to the system software become available as +needed to address issues associated with a current |prod-long| software +release. Software updates must be uploaded to the active controller and applied +to all required hosts in the cluster. + +.. note:: + Updating |prod-dc| is distinct from updating other |prod| configurations. + +.. xbooklink For information on updating |prod-dc|, see |distcloud-doc|: :ref:`Update + Management for Distributed Cloud + `. + +The following elements form part of the software update environment: + +**Reboot-Required Software Updates** + Reboot-required updates are typically major updates that require hosts to + be locked during the update process and rebooted to complete the process. + + .. note:: + When a |prod| host is locked and rebooted for updates, the hosted + application pods are re-located to an alternate host in order to + minimize the impact to the hosted application service. + +**In-Service Software Updates** + In-service \(reboot-not-required\), software updates are updates that do + not require the locking and rebooting of hosts. The required |prod| + software is updated and any required |prod| processes are re-started. + Hosted applications pods and services are completely unaffected. + +**Software Update Commands** + The :command:`sw-patch` command is available on both active controllers. It + must be run as root using :command:`sudo`. It provides the user interface + to process the updates, including querying the state of an update, listing + affected hosts, and applying, installing, and removing updates. + +**Software Update Storage Area** + A central storage area maintained by the update controller. 
Software + updates are initially uploaded to the storage area and remains there until + they are deleted. + +**Software Update Repository** + A central repository of software updates associated with any updates + applied to the system. This repository is used by all hosts in the cluster + to identify the software updates and rollbacks required on each host. + +**Software Update Logs** + The following logs are used to record software update activity: + + **patching.log** + This records software update agent activity on each host. + + **patching-api.log** + This records user actions that involve software updates, performed + using either the CLI or the REST API. + +The overall flow for installing a software update from the command line +interface on a working |prod| cluster is the following: + +.. _managing-software-updates-ol-vgf-yzz-jp: + +#. Consult the |org| support personnel for details on the availability of new + software updates. + +#. Download the software update from the |org| servers to a workstation that + can reach the active controller through the |OAM| network. + +#. Copy the software update to the active controller using the cluster's |OAM| + floating IP address as the destination point. + + You can use a command such as :command:`scp` to copy the software update. + The software update workflows presented in this document assume that this + step is complete already, that is, they assume that the software update is + already available on the file system of the active controller. + +#. Upload the new software update to the storage area. + + This step makes the new software update available within the system, but + does not install it to the cluster yet. For all purposes, the software + update is dormant. + +#. Apply the software update. + + This step adds the updates to the repository, making it visible to all + hosts in the cluster. + +#. Install the software updates on each of the affected hosts in the cluster. + This can be done manually or by using upgrade orchestration. For more + information, see :ref:`Update Orchestration Overview + `. + +Updating software in the system can be done using the Horizon Web interface or +the command line interface on the active controller. When using Horizon you +upload the software update directly from your workstation using a file browser +window provided by the software update upload facility. + +A special case occurs during the initial provisioning of a cluster when you +want to update **controller-0** before the system software is configured. This +can only be done from the command line interface. See :ref:`Install Software +Updates Before Initial Commissioning +` for details. diff --git a/doc/source/updates/kubernetes/manual-upgrade-overview.rst b/doc/source/updates/kubernetes/manual-upgrade-overview.rst new file mode 100644 index 000000000..13188dec6 --- /dev/null +++ b/doc/source/updates/kubernetes/manual-upgrade-overview.rst @@ -0,0 +1,39 @@ + +.. mzg1592854560344 +.. _manual-upgrade-overview: + +======================= +Manual Upgrade Overview +======================= + +|prod-long| enables you to upgrade the software across your Simplex, Duplex, +Standard, |prod-dc|, and subcloud deployments. + +.. note:: + Upgrading |prod-dc| is distinct from upgrading other |prod| configurations. + +.. xbooklink For information on updating |prod-dc|, see |distcloud-doc|: :ref:`Upgrade + Management `. 
+ +An upgrade can be performed manually or by the Upgrade Orchestrator which +automates a rolling install of an update across all of the |prod-long| hosts. +This section describes the manual upgrade procedures. + +.. xbooklink For the orchestrated + procedure, see |distcloud-doc|: :ref:`Orchestration Upgrade Overview + `. + +Before starting the upgrades process, the system must be “patch current,” there +must be no management-affecting alarms present on the system, the new software +load must be imported, and a valid license file for the upgrade must be +installed. + +The upgrade procedure is different for the All-in-One Simplex configuration +versus the All-in-One Duplex, and Standard configurations. For more +information, see: + +.. _manual-upgrade-overview-ul-bcp-ght-cmb: + +- :ref:`Upgrading All-in-One Simplex ` + +- :ref:`Upgrading All-in-One Duplex / Standard ` diff --git a/doc/source/updates/kubernetes/orchestration-upgrade-overview.rst b/doc/source/updates/kubernetes/orchestration-upgrade-overview.rst new file mode 100644 index 000000000..77e01e62f --- /dev/null +++ b/doc/source/updates/kubernetes/orchestration-upgrade-overview.rst @@ -0,0 +1,135 @@ + +.. bla1593031188931 +.. _orchestration-upgrade-overview: + +============================== +Upgrade Orchestration Overview +============================== + +Upgrade Orchestration automates much of the upgrade procedure, leaving a few +manual steps for operator oversight. + +.. contents:: |minitoc| + :local: + :depth: 1 + +.. note:: + Upgrading of |prod-dc| is distinct from upgrading other |prod| + configurations. + +.. xbooklink For information on updating |prod-dc|, see |distcloud-doc|: + :ref:`Upgrade Management `. + +.. note:: + The upgrade orchestration CLI is :command:`sw-manager`.To use upgrade + orchestration commands, you need administrator privileges. You must log in + to the active controller as user **sysadmin** and source the + /etc/platform/openrc script to obtain administrator privileges. Do not use + **sudo**. + +.. code-block:: none + + ~(keystone_admin)]$ sw-manager upgrade-strategy --help + usage: sw-manager upgrade-strategy [-h] ... + + optional arguments: + -h, --help show this help message and exit + + Software Upgrade Commands: + + create Create a strategy + delete Delete a strategy + apply Apply a strategy + abort Abort a strategy + show Show a strategy + +.. _orchestration-upgrade-overview-section-N10029-N10026-N10001: + +---------------------------------- +Upgrade Orchestration Requirements +---------------------------------- + +Upgrade orchestration can only be done on a system that meets the following +conditions: + +.. _orchestration-upgrade-overview-ul-blp-gcx-ry: + +- The system is clear of alarms \(with the exception of the alarm upgrade in + progress\). + +- All hosts must be unlocked, enabled, and available. + +- The system is fully redundant \(two controller nodes available, at least + one complete storage replication group available for systems with Ceph + backend\). + +- An upgrade has been started, and controller-1 has been upgraded and is + active. + +- No update orchestration strategy exists. An upgrade cannot be orchestrated + while update orchestration is in progress. + +- Sufficient free capacity or unused worker resources must be available + across the cluster. A rough calculation is: Required spare capacity \( %\) + = \(Number of hosts to upgrade in parallel divided by the total number of + hosts\) times 100. + +.. 
_orchestration-upgrade-overview-section-N10081-N10026-N10001: + +--------------------------------- +The Upgrade Orchestration Process +--------------------------------- + +Upgrade orchestration can be initiated after the manual upgrade and stability +of the initial controller host. Upgrade orchestration automatically iterates +through the remaining hosts, installing the new software load on each one: +first the other controller host, then the storage hosts, and finally the worker +hosts. During worker host upgrades, pods are moved to alternate worker hosts +automatically. + +The user first creates an upgrade orchestration strategy, or plan, for the +automated upgrade procedure. This customizes the upgrade orchestration, using +parameters to specify: + +.. _orchestration-upgrade-overview-ul-eyw-fyr-31b: + +- the host types to be upgraded + +- whether to upgrade hosts serially or in parallel + +Based on these parameters, and the state of the hosts, upgrade orchestration +creates a number of stages for the overall upgrade strategy. Each stage +generally consists of moving pods, locking hosts, installing upgrades, and +unlocking hosts for a subset of the hosts on the system. + +After creating the upgrade orchestration strategy, the user can either apply +the entire strategy automatically, or apply individual stages to control and +monitor its progress manually. + +Update and upgrade orchestration are mutually exclusive; they perform +conflicting operations. Only a single strategy \(sw-patch or sw-upgrade\) is +allowed to exist at a time. If you need to update during an upgrade, you can +abort/delete the sw-upgrade strategy, and then create and apply a sw-patch +strategy before going back to the upgrade. + +Some stages of the upgrade could take a significant amount of time \(hours\). +For example, after upgrading a storage host, re-syncing the OSD data could take +30m per TB \(assuming 500MB/s sync rate, which is about half of a 10G +infrastructure link\). + +.. _orchestration-upgrade-overview-section-N10101-N10026-N10001: + +------------------------------ +Upgrade Orchestration Workflow +------------------------------ + +The Upgrade Orchestration procedure has several major parts: + +.. _orchestration-upgrade-overview-ul-r1k-wzj-wy: + +- Manually upgrade controller-1. + +- Orchestrate the automatic upgrade of the remaining controller, all the + storage nodes, and all the worker nodes. + +- Manually complete the upgrade. diff --git a/doc/source/updates/kubernetes/overview-of-upgrade-abort-procedure.rst b/doc/source/updates/kubernetes/overview-of-upgrade-abort-procedure.rst new file mode 100644 index 000000000..ef07a2d1f --- /dev/null +++ b/doc/source/updates/kubernetes/overview-of-upgrade-abort-procedure.rst @@ -0,0 +1,29 @@ + +.. yim1593277634652 +.. _overview-of-upgrade-abort-procedure: + +=================================== +Overview of Upgrade Abort Procedure +=================================== + +You can abort an upgrade procedure if necessary. + +There are two cases for aborting an upgrade: + + +.. _overview-of-upgrade-abort-procedure-ul-q5f-vmz-bx: + +- Before controller-0 has been upgraded \(that is, only controller-1 has been + upgraded\): In this case the upgrade can be aborted and the system will + remain in service during the abort. + +- After controller-0 has been upgraded \(that is, both controllers have been + upgraded\): In this case the upgrade can only be aborted with a complete + outage and a re-install of all hosts. 
This would only be done as a last + resort, if there was absolutely no other way to recover the system. + +- :ref:`Rolling Back a Software Upgrade Before the Second Controller Upgrade + ` + +- :ref:`Rolling Back a Software Upgrade After the Second Controller Upgrade + ` diff --git a/doc/source/updates/kubernetes/performing-an-orchestrated-upgrade-using-the-cli.rst b/doc/source/updates/kubernetes/performing-an-orchestrated-upgrade-using-the-cli.rst new file mode 100644 index 000000000..9ee760104 --- /dev/null +++ b/doc/source/updates/kubernetes/performing-an-orchestrated-upgrade-using-the-cli.rst @@ -0,0 +1,185 @@ + +.. kad1593196868935 +.. _performing-an-orchestrated-upgrade-using-the-cli: + +============================================= +Perform an Orchestrated Upgrade Using the CLI +============================================= + +The upgrade orchestration CLI is :command:`sw-manager`. + +.. rubric:: |context| + +.. note:: + To use upgrade orchestration commands, you need administrator privileges. + You must log in to the active controller as user **sysadmin** and source the + /etc/platform/openrc script to obtain administrator privileges. Do not use + **sudo**. + +The upgrade strategy options are shown in the following output: + +.. code-block:: none + + ~(keystone_admin)]$ sw-manager upgrade-strategy --help + usage: sw-manager upgrade-strategy [-h] ... + + optional arguments: + -h, --help show this help message and exit + + Software Upgrade Commands: + + create Create a strategy + delete Delete a strategy + apply Apply a strategy + abort Abort a strategy + show Show a strategy + +You can perform a partially orchestrated upgrade using the CLI. Upgrade and +stability of the initial controller node must be done manually before using +upgrade orchestration to orchestrate the remaining nodes of the |prod|. + +.. note:: + Management-affecting alarms cannot be ignored at the indicated severity + level or higher by using relaxed alarm rules during an orchestrated upgrade + operation. For a list of management-affecting alarms, see |fault-doc|: + :ref:`Alarm Messages `. To display + management-affecting active alarms, use the following command: + + .. code-block:: none + + ~(keystone_admin)]$ fm alarm-list --mgmt_affecting + + During an orchestrated upgrade, the following alarms are ignored even when + strict restrictions are selected: + + - 900.005, Upgrade in progress + + - 900.201, Software upgrade auto apply in progress + +.. _performing-an-orchestrated-upgrade-using-the-cli-ul-qhy-q1p-v1b: + +.. rubric:: |prereq| + +See :ref:`Upgrading All-in-One Duplex / Standard +` to manually upgrade the initial +controller node before doing the upgrade orchestration described below to +upgrade the remaining nodes of the |prod|. + +.. rubric:: |proc| + +.. _performing-an-orchestrated-upgrade-using-the-cli-steps-e45-kh5-sy: + +#. Create a update strategy using the :command:`sw-manager` upgrade-strategy + command. + + .. code-block:: none + + ~(keystone_admin)]$ sw-manager upgrade-strategy create + + Create an upgrade strategy, specifying the following parameters: + + + - storage-apply-type: + + + - serial \(default\): storage hosts will be upgraded one at a time + + - parallel: storage hosts will be upgraded in parallel, ensuring that + only one storage node in each replication group is patched at a + time. + + - ignore: storage hosts will not be upgraded + + - worker-apply-type: + + **serial** \(default\) + Worker hosts will be upgraded one at a time. + + **ignore** + Worker hosts will not be upgraded. 
+ + - Alarm Restrictions + + This option lets you determine how to handle alarm restrictions based + on the management affecting statuses of any existing alarms, which + takes into account the alarm type as well as the alarm's current + severity. If set to relaxed, orchestration will be allowed to proceed + if there are no management affecting alarms present. + + Performing management actions without specifically relaxing the alarm + checks will still fail if there are any alarms present in the system + \(except for a small list of basic alarms for the orchestration actions + such as an upgrade operation in progress alarm not impeding upgrade + orchestration\). + + You can use the CLI command :command:`fm alarm-list --mgmt_affecting` + to view the alarms that are management affecting. + + **Strict** + Maintains alarm restrictions. + + **Relaxed** + Relaxes the usual alarm restrictions and allows the action to + proceed if there are no alarms present in the system with a severity + equal to or greater than its management affecting severity. + + The upgrade strategy consists of one or more stages, which consist of one + or more hosts to be upgraded at the same time. Each stage will be split + into steps \(for example, query-alarms, lock-hosts, upgrade-hosts\). + Following are some notes about stages: + + - Controller-0 is upgraded first, followed by storage hosts and then + worker hosts. + + - Worker hosts with no instances are upgraded before worker hosts with + application pods. + + - Pods will be relocated off each worker host before it is upgraded. + + - The final step in each stage is one of: + + **system-stabilize** + This waits for a period of time \(up to several minutes\) and + ensures that the system is free of alarms. This ensures that we do + not continue to upgrade more hosts if the upgrade has caused an + issue resulting in an alarm. + + **wait-data-sync** + This waits for a period of time \(up to many hours\) and ensures + that data synchronization has completed after the upgrade of a + controller or storage node. + + Examine the upgrade strategy. Pay careful attention to: + + - The sets of hosts that will be upgraded together in each stage. + + - The sets of pods that will be impacted in each stage. + + .. note:: + It is likely that as each stage is applied, pods will be relocated + to worker hosts that have not yet been upgraded. That means that + later stages will be relocating more pods than those originally + listed in the upgrade strategy. The upgrade strategy is NOT + updated, but any additional pods on each worker host will be + relocated before it is upgraded. + +#. Apply the upgrade-strategy. You can optionally apply a single stage at a time. + + .. code-block:: none + + ~(keystone_admin)]$ sw-manager upgrade-strategy apply + + While an upgrade-strategy is being applied, it can be aborted. This results + in: + + + - The current step will be allowed to complete. + + - If necessary an abort phase will be created and applied, which will + attempt to unlock any hosts that were locked. + + After an upgrade-strategy has been applied \(or aborted\) it must be + deleted before another upgrade-strategy can be created. If an + upgrade-strategy application fails, you must address the issue that caused + the failure, then delete/re-create the strategy before attempting to apply + it again. 
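
As a quick reference, an end-to-end orchestrated upgrade session might look
like the following sketch. This is illustrative only; it assumes that the
strategy parameters described above map to command-line options of the same
names on :command:`sw-manager upgrade-strategy create`, so verify the exact
option syntax on your system using the :command:`--help` output shown earlier.

.. code-block:: none

    ~(keystone_admin)]$ sw-manager upgrade-strategy create \
        --storage-apply-type parallel --worker-apply-type serial \
        --alarm-restrictions relaxed
    ~(keystone_admin)]$ sw-manager upgrade-strategy show
    ~(keystone_admin)]$ sw-manager upgrade-strategy apply
    ~(keystone_admin)]$ sw-manager upgrade-strategy show

The option names above are illustrative; the overall flow \(create, review,
apply, review\) is the part to take away. 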
diff --git a/doc/source/updates/kubernetes/performing-an-orchestrated-upgrade.rst b/doc/source/updates/kubernetes/performing-an-orchestrated-upgrade.rst new file mode 100644 index 000000000..e0abe7afe --- /dev/null +++ b/doc/source/updates/kubernetes/performing-an-orchestrated-upgrade.rst @@ -0,0 +1,169 @@ + +.. sab1593196680415 +.. _performing-an-orchestrated-upgrade: + +=============================== +Perform an Orchestrated Upgrade +=============================== + +You can perform a partially-Orchestrated Upgrade of a |prod| system using the CLI and Horizon Web interface. Upgrade and stability of the initial controller node must be done manually before using upgrade orchestration to orchestrate the remaining nodes of the |prod|. + +.. rubric:: |context| + +.. note:: + Management-affecting alarms cannot be ignored at the indicated severity + level or higher by using relaxed alarm rules during an orchestrated upgrade + operation. For a list of management-affecting alarms, see |fault-doc|: + :ref:`Alarm Messages `. To display + management-affecting active alarms, use the following command: + + .. code-block:: none + + ~(keystone_admin)]$ fm alarm-list --mgmt_affecting + + During an orchestrated upgrade, the following alarms are ignored even when + strict restrictions are selected: + + - 750.006, Automatic application re-apply is pending + + - 900.005, Upgrade in progress + + - 900.201, Software upgrade auto apply in progress + +.. _performing-an-orchestrated-upgrade-ul-qhy-q1p-v1b: + +.. rubric:: |prereq| + +See :ref:`Upgrading All-in-One Duplex / Standard +`, to manually upgrade the initial +controller node before doing the upgrade orchestration described below to +upgrade the remaining nodes of the |prod| system. + +.. rubric:: |proc| + +.. _performing-an-orchestrated-upgrade-steps-e45-kh5-sy: + +#. Select **Platform** \> **Software Management**, then select the **Upgrade + Orchestration** tab. + +#. Click the **Create Strategy** button. + + The Create Strategy dialog appears. + +#. Create an upgrade strategy by specifying settings for the parameters in the + Create Strategy dialog box. + + Create an upgrade strategy, specifying the following parameters: + + - storage-apply-type: + + **serial** \(default\) + Storage hosts will be upgraded one at a time. + + **parallel** + Storage hosts will be upgraded in parallel, ensuring that only one + storage node in each replication group is upgraded at a time. + + **ignore** + Storage hosts will not be upgraded. + + - worker-apply-type: + + **serial** \(default\): + Worker hosts will be upgraded one at a time. + + **parallel** + Worker hosts will be upgraded in parallel, ensuring that: + + - At most max-parallel-worker-hosts \(see below\) worker hosts + will be upgraded at the same time. + + - At most half of the hosts in a host aggregate will be upgraded + at the same time. + + - Worker hosts with no application pods are upgraded before + worker hosts with application pods. + + **ignore** + Worker hosts will not be upgraded. + + **max-parallel-worker-hosts** + Specify the maximum worker hosts to upgrade in parallel \(minimum: + 2, maximum: 10\). + + + **alarm-restrictions** + This option lets you specify how upgrade orchestration behaves when + alarms are present. + + You can use the CLI command :command:`fm alarm-list + --mgmt_affecting` to view the alarms that are management affecting. 
+ + **Strict** + The default strict option will result in upgrade orchestration + failing if there are any alarms present in the system \(except + for a small list of alarms\). + + **Relaxed** + This option allows orchestration to proceed if alarms are + present, as long as none of these alarms are management + affecting. + +#. Click **Create Strategy** to save the upgrade orchestration strategy. + + The upgrade strategy consists of one or more stages, which consist of one + or more hosts to be upgraded at the same time. Each stage will be split + into steps \(for example, query-alarms, lock-hosts, upgrade-hosts\). + Following are some notes about stages: + + - Controller-0 is upgraded first, followed by storage hosts and then + worker hosts. + + - Worker hosts with no application pods are upgraded before worker hosts + with application pods. + + - Pods will be moved off each worker host before it is upgraded. + + - The final step in each stage is one of: + + **system-stabilize** + This waits for a period of time \(up to several minutes\) and + ensures that the system is free of alarms. This ensures that we do + not continue to upgrade more hosts if the upgrade has caused an + issue resulting in an alarm. + + **wait-data-sync** + This waits for a period of time \(up to many hours\) and ensures + that data synchronization has completed after the upgrade of a + controller or storage node. + + Examine the upgrade strategy. Pay careful attention to: + + - The sets of hosts that will be upgraded together in each stage. + + - The sets of pods that will be impacted in each stage. + + .. note:: + It is likely that as each stage is applied, application pods will + be relocated to worker hosts that have not yet been upgraded. That + means that later stages will be migrating more pods than those + originally listed in the upgrade strategy. The upgrade strategy is + NOT updated, but any additional pods on each worker host will be + relocated before it is upgraded. + +#. Apply the upgrade-strategy. You can optionally apply a single stage at a + time. + + While an upgrade-strategy is being applied, it can be aborted. This results + in: + + - The current step will be allowed to complete. + + - If necessary an abort phase will be created and applied, which will + attempt to unlock any hosts that were locked. + + After an upgrade-strategy has been applied \(or aborted\) it must be + deleted before another upgrade-strategy can be created. If an + upgrade-strategy application fails, you must address the issue that caused + the failure, then delete/re-create the strategy before attempting to apply + it again. diff --git a/doc/source/updates/kubernetes/populating-the-storage-area.rst b/doc/source/updates/kubernetes/populating-the-storage-area.rst new file mode 100644 index 000000000..2d724c38d --- /dev/null +++ b/doc/source/updates/kubernetes/populating-the-storage-area.rst @@ -0,0 +1,74 @@ + +.. fek1552920702618 +.. _populating-the-storage-area: + +========================= +Populate the Storage Area +========================= + +Software updates \(patches\) have to be uploaded to the |prod| storage area +before they can be applied. + +.. rubric:: |proc| + +#. Log in as **sysadmin** to the active controller. + +#. Upload the update file to the storage area. + + .. parsed-literal:: + + $ sudo sw-patch upload /home/sysadmin/patches/|pn|-CONTROLLER__PATCH_0001.patch + Cloud_Platform__CONTROLLER_nn.nn_PATCH_0001 is now available + + where *nn.nn* in the update file name is the |prod| release number. 

    This example uploads a single update to the storage area. You can specify
    multiple update files on the same command, separating their names with
    spaces.

    Alternatively, you can upload all update files stored in a directory using
    a single command, as illustrated in the following example:

    .. code-block:: none

        $ sudo sw-patch upload-dir /home/sysadmin/patches

    The update is now available in the storage area, but has not been applied
    to the update repository or installed to the nodes in the cluster.

#. Verify the status of the update.

    .. code-block:: none

        $ sudo sw-patch query

    The update state is now *Available*, indicating that it is included in the
    storage area.

#. Delete the update files from the root drive.

    After the updates are uploaded to the storage area, the original files are
    no longer required. You must delete them to ensure there is enough disk
    space to complete the installation.

    .. code-block:: none

        $ rm /home/sysadmin/patches/*

    .. caution::
        If the original files are not deleted before the updates are applied,
        the installation may fail due to a full disk.

.. rubric:: |postreq|

When an update in the *Available* state is no longer required, you can delete
it using the following command:

.. parsed-literal::

    $ sudo sw-patch delete |pn|-|pvr|-PATCH_0001

The update to delete from the storage area is identified by the update
\(patch\) ID reported by the :command:`sw-patch query` command. You can provide
multiple patch IDs to the delete command, separating their names by spaces. diff --git a/doc/source/updates/kubernetes/reclaiming-disk-space.rst b/doc/source/updates/kubernetes/reclaiming-disk-space.rst new file mode 100644 index 000000000..a35f34052 --- /dev/null +++ b/doc/source/updates/kubernetes/reclaiming-disk-space.rst @@ -0,0 +1,95 @@ +
.. ngk1552920570137
.. _reclaiming-disk-space:

==================
Reclaim Disk Space
==================

You can free up and reclaim disk space taken by previous updates once a newer
version of an update has been committed to the system.

.. rubric:: |proc|

#. Run the :command:`query-dependencies` command to show a list of updates
    that are required by the specified update \(patch\), including itself.

    .. code-block:: none

        sw-patch query-dependencies <patch-id> [ --recursive ]

    The :command:`query-dependencies` command will show a list of updates that
    are required by the specified update \(including itself\). The
    **--recursive** option will crawl through those dependencies to return a
    list of all the updates in the specified update's dependency tree. This
    query is used by the :command:`sw-patch commit` command in calculating the
    set of updates to be committed. For example:

    .. parsed-literal::

        controller-0:/home/sysadmin# sw-patch query-dependencies |pn|-|pvr|-PATCH_0004
        |pn|-|pvr|-PATCH_0002
        |pn|-|pvr|-PATCH_0003
        |pn|-|pvr|-PATCH_0004

        controller-0:/home/sysadmin# sw-patch query-dependencies |pn|-|pvr|-PATCH_0004 --recursive
        |pn|-|pvr|-PATCH_0001
        |pn|-|pvr|-PATCH_0002
        |pn|-|pvr|-PATCH_0003
        |pn|-|pvr|-PATCH_0004

#. Run the :command:`sw-patch commit` command.

    .. code-block:: none

        sw-patch commit [ --dry-run ] [ --all ] [ --release ] [ … ]

    The :command:`sw-patch commit` command allows you to specify a set of
    updates to be committed. The commit set is calculated by querying the
    dependencies of each specified update. 
+ + The **--all** option, without the **--release** option, commits all updates + of the currently running release. When two releases are on the system use + the **--release** option to specify a particular release's updates if + committing all updates for the non-running release. The **--dry-run** + option shows the list of updates to be committed and how much disk space + will be freed up. This information is also shown without the **--dry-run** + option, before prompting to continue with the operation. An update can only + be committed once it has been fully applied to the system, and cannot be + removed after. + + Following are examples that show the command usage. + + The following command lists the status of all updates that are in an + APPLIED state. + + .. code-block:: none + + controller-0:/home/sysadmin# sw-patch query + + The following command commits the updates. + + .. parsed-literal:: + + controller-0:/home/sysadmin# sw-patch commit |pvr|-PATCH_0001 |pvr|-PATCH_0002 + The following patches will be committed: + |pvr|-PATCH_0001 + |pvr|-PATCH_0002 + + This commit operation would free 2186.31 MiB + + WARNING: Committing a patch is an irreversible operation. Committed patches + cannot be removed. + + Would you like to continue? [y/N]: y + The patches have been committed. + + The following command shows the updates now in the COMMITTED state. + + .. parsed-literal:: + + controller-0:/home/sysadmin# sw-patch query + Patch ID RR Release Patch State + ================ ===== ======== ========= + |pvr|-PATCH_0001 N |pvr| Committed + |pvr|-PATCH_0002 Y |pvr| Committed diff --git a/doc/source/updates/kubernetes/removing-reboot-required-software-updates.rst b/doc/source/updates/kubernetes/removing-reboot-required-software-updates.rst new file mode 100644 index 000000000..e5654187f --- /dev/null +++ b/doc/source/updates/kubernetes/removing-reboot-required-software-updates.rst @@ -0,0 +1,117 @@ + +.. scm1552920603294 +.. _removing-reboot-required-software-updates: + +======================================= +Remove Reboot-Required Software Updates +======================================= + +Updates in the *Applied* or *Partial-Apply* states can be removed if necessary, +for example, when they trigger undesired or unplanned effects on the cluster. + +.. rubric:: |context| + +Rolling back updates is conceptually identical to installing updates. A +roll-back operation can be commanded for an update in either the *Applied* or +the *Partial-Apply* states. As the update is removed, it goes through the +following state transitions: + +**Applied or Partial-Apply to Partial-Remove** + An update in the *Partial-Remove* state indicates that it has been removed + from zero or more, but not from all, the applicable hosts. + + Use the command :command:`sw-patch remove` to trigger this transition. + +**Partial-Remove to Available** + Use the command :command:`sudo sw-patch host-install-async` + repeatedly targeting each one of the applicable hosts in the cluster. The + transition to the *Available* state is complete when the update is removed + from all target hosts. The update remains in the update storage area as if + it had just been uploaded. + + .. note:: + The command :command:`sudo sw-patch host-install-async` both + installs and removes updates as necessary. + +The following example describes removing an update that applies only to the +controllers. Removing updates can be done using the Horizon Web interface, +also, as discussed in :ref:`Install Reboot-Required Software Updates Using +Horizon `. 
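
At a glance, the removal flow reduces to the following command sequence. This
is an illustrative sketch only; the patch ID and hostname are placeholders,
and the detailed procedure below adds the swact, lock, and unlock steps
required around each host installation:

.. code-block:: none

    ~(keystone_admin)]$ sudo sw-patch remove <patch-id>
    ~(keystone_admin)]$ sudo sw-patch query-hosts
    ~(keystone_admin)]$ sudo sw-patch host-install-async <hostname>
    ~(keystone_admin)]$ sudo sw-patch query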
+ +.. rubric:: |proc| + +#. Log in as Keystone user **admin** to the active controller. + +#. Verify the state of the update. + + .. parsed-literal:: + + ~(keystone_admin)]$ sudo sw-patch query + Patch ID Patch State + ========================= =========== + |pn|-|pvr|-PATCH_0001 Applied + + In this example the update is listed in the *Applied* state, but it could + be in the *Partial-Apply* state as well. + +#. Remove the update. + + .. parsed-literal:: + + ~(keystone_admin)]$ sudo sw-patch remove |pn|-|pvr|-PATCH_0001 + |pn|-|pvr|-PATCH_0001 has been removed from the repo + + The update is now in the *Partial-Remove* state, ready to be removed from + the impacted hosts where it was already installed. + +#. Query the updating status of all hosts in the cluster. + + .. code-block:: none + + ~(keystone_admin)]$ sudo sw-patch query-hosts + + Hostname IP Address Patch Current Reboot Required Release State + ============ =============== ============= =============== ======= ===== + compute-0 192.168.204.179 Yes No 20.04 idle + compute-1 192.168.204.173 Yes No 20.04 idle + controller-0 192.168.204.3 No No 20.04 idle + controller-1 192.168.204.4 No No 20.04 idle + storage-0 192.168.204.213 Yes No 20.04 idle + storage-1 192.168.204.181 Yes No 20.04 idle + + + In this example, the controllers have updates ready to be removed, and + therefore must be rebooted. + +#. Remove all pending-for-removal updates from **controller-0**. + + #. Swact controller services away from controller-0. + + #. Lock controller-0. + + #. Run the updating \(patching\) sequence. + + .. code-block:: none + + ~(keystone_admin)]$ sudo sw-patch host-install-async + + #. Unlock controller-0. + +#. Remove all pending-for-removal updates from controller-1. + + #. Swact controller services away from controller-1. + + #. Lock controller-1. + + #. Run the updating sequence. + + #. Unlock controller-1. + + .. code-block:: none + + ~(keystone_admin)]$ sudo sw-patch host-install-async + +.. rubric:: |result| + +The cluster is up to date now. All updates have been removed, and the update +|pn|-|pvr|-PATCH_0001 can be deleted from the storage area if necessary. diff --git a/doc/source/updates/kubernetes/rolling-back-a-software-upgrade-after-the-second-controller-upgrade.rst b/doc/source/updates/kubernetes/rolling-back-a-software-upgrade-after-the-second-controller-upgrade.rst new file mode 100644 index 000000000..6a5c14124 --- /dev/null +++ b/doc/source/updates/kubernetes/rolling-back-a-software-upgrade-after-the-second-controller-upgrade.rst @@ -0,0 +1,129 @@ + +.. eiu1593277809293 +.. _rolling-back-a-software-upgrade-after-the-second-controller-upgrade: + +================================================================ +Roll Back a Software Upgrade After the Second Controller Upgrade +================================================================ + +After the second controller is upgraded, you can still roll back a software +upgrade, however, the rollback will impact the hosting of applications. + +.. rubric:: |proc| + +#. Run the :command:`upgrade-abort` command to abort the upgrade. + + .. code-block:: none + + $ system upgrade-abort + + Once this is done there is no going back; the upgrade must be completely + aborted. + + The following state applies when you run this command. + + - aborting-reinstall: + + - State entered when :command:`system upgrade-abort` is executed + after upgrading controller-0. + + - Remain in this state until the abort is completed. + +#. Make controller-1 active. + + .. 
code-block:: none + + $ system host-swact controller-0 + +#. Lock controller-0. + + .. code-block:: none + + $ system host-lock controller-0 + +#. Wipe the disk and power down all storage \(if applicable\) and worker hosts. + + .. note:: + Skip this step if doing this procedure on a |prod| Duplex system. + + #. Execute :command:`wipedisk` from the shell on each storage or worker + host. + + #. Power down each host. + +#. Lock all storage \(if applicable\) and worker hosts. + + .. note:: + Skip this step if doing this procedure on a |prod| Duplex system. + + .. code-block:: none + + $ system host-lock + +#. Downgrade controller-0. + + .. code-block:: none + + $ system host-downgrade controller-0 + + The host is re-installed with the previous release load. + +#. Unlock controller-0. + + .. code-block:: none + + $ system host-unlock controller-0 + +#. Swact to controller-0. + + .. code-block:: none + + $ system host-swact controller-1 + + Swacting back to controller-0 will switch back to using the previous + release databases, which were frozen at the time of the swact to + controller-1. This is essentially the same result as a system restore. + +#. Lock and downgrade controller-1. + + .. code-block:: none + + $ system host-downgrade controller-1 + + The host is re-installed with the previous release load. + +#. Unlock controller-1. + + .. code-block:: none + + $ system host-unlock controller-1 + + +#. Power up and unlock the storage hosts one at a time \(if using a Ceph + storage backend\). The hosts are re-installed with the release N load. + + .. note:: + Skip this step if doing this procedure on a |prod| Duplex system. + +#. Power up and unlock the worker hosts one at a time. + + .. note:: + Skip this step if doing this procedure on a |prod| Duplex system. + + The hosts are re-installed with the previous release load. As each worker + host goes online, application pods will be automatically recovered by the + system. + +#. Complete the upgrade. + + .. code-block:: none + + $ system upgrade-complete + + This cleans up the upgrade release, configuration, databases, and so forth. + +#. Delete the upgrade release load. + + .. code-block:: none + + $ system load-delete diff --git a/doc/source/updates/kubernetes/rolling-back-a-software-upgrade-before-the-second-controller-upgrade.rst b/doc/source/updates/kubernetes/rolling-back-a-software-upgrade-before-the-second-controller-upgrade.rst new file mode 100644 index 000000000..6efb0a366 --- /dev/null +++ b/doc/source/updates/kubernetes/rolling-back-a-software-upgrade-before-the-second-controller-upgrade.rst @@ -0,0 +1,68 @@ + +.. wyr1593277734184 +.. _rolling-back-a-software-upgrade-before-the-second-controller-upgrade: + +================================================================= +Roll Back a Software Upgrade Before the Second Controller Upgrade +================================================================= + +You can perform an in-service abort of an upgrade before the second Controller +\(controller-0 in the examples of this procedure\) have been upgraded. + +.. rubric:: |proc| + +#. Abort the upgrade with the :command:`upgrade-abort` command. + + .. code-block:: none + + $ system upgrade-abort + + The upgrade state is set to aborting. Once this is executed, there is no + canceling; the upgrade must be completely aborted. + + The following states apply when you execute this command. + + - aborting: + + - State entered when :command:`system upgrade-abort` is executed + before upgrading controller-0. 

        - Remain in this state until the abort is completed.

#. Make controller-0 active.

    .. code-block:: none

        $ system host-swact controller-1

    If controller-1 was active with the new upgrade release, swacting back to
    controller-0 will switch back to using the previous release databases,
    which were frozen at the time of the swact to controller-1. Any changes to
    the system that were made while controller-1 was active will be lost.

#. Lock and downgrade controller-1.

    .. code-block:: none

        $ system host-lock controller-1
        $ system host-downgrade controller-1

    The host is re-installed with the previous release load.

#. Unlock controller-1.

    .. code-block:: none

        $ system host-unlock controller-1

#. Complete the upgrade.

    .. code-block:: none

        $ system upgrade-complete

#. Delete the newer upgrade release that has been aborted.

    .. code-block:: none

        $ system load-delete  diff --git a/doc/source/updates/kubernetes/software-update-space-reclamation.rst b/doc/source/updates/kubernetes/software-update-space-reclamation.rst new file mode 100644 index 000000000..c85f43126 --- /dev/null +++ b/doc/source/updates/kubernetes/software-update-space-reclamation.rst @@ -0,0 +1,19 @@ +
.. qbz1552920585263
.. _software-update-space-reclamation:

=================================
Software Update Space Reclamation
=================================

|prod-long| provides functionality for reclaiming disk space used by older
versions of software updates once newer versions have been committed.

The :command:`sw-patch commit` command allows you to "commit" a set of software
updates, which effectively locks down those updates and makes them unremovable.
In doing so, |prod-long| is then able to delete package files with lower
versions from the storage area and repo, keeping only the highest version of
each package in the committed software update set.

.. caution::
    This action is irreversible. diff --git a/doc/source/updates/kubernetes/software-updates-and-upgrades-software-updates.rst b/doc/source/updates/kubernetes/software-updates-and-upgrades-software-updates.rst new file mode 100644 index 000000000..4d393c60c --- /dev/null +++ b/doc/source/updates/kubernetes/software-updates-and-upgrades-software-updates.rst @@ -0,0 +1,88 @@ +
.. lei1552920487053
.. _software-updates-and-upgrades-software-updates:

================
Software Updates
================

|prod-long| software updates \(also known as patches\) must be applied to the
system in order to keep your system up to date with feature enhancements and
free of known bugs and security vulnerabilities.

|org| provides software updates that are cryptographically signed to ensure
integrity and authenticity. The |prod-long| REST APIs, CLIs and GUI validate
the signature of software updates before loading them into the system.

An update typically modifies a small portion of your system to address the
following items:

.. _software-updates-and-upgrades-software-updates-ul-gcd-smn-xw:

- bugs

- security vulnerabilities

- feature enhancements

Software updates can be installed manually or by the Update Orchestrator,
which automates a rolling install of an update across all of the |prod-long|
hosts. For more information on manual updates, see :ref:`Manage Software
Updates `. For more information on update orchestration, see
:ref:`Orchestrated Software Update `.

.. warning::
    Do NOT use the |updates-doc| guide for |prod-dc| orchestrated
    software updates. 
The |prod-dc| Update Orchestrator automates a + recursive rolling install of an update across all subclouds and all hosts + within the subclouds. + +.. xbooklink For more information, see, |distcloud-doc|: :ref:`Update Management for + Distributed Cloud `. + +The |prod| handles multiple updates being applied and removed at once. Software +updates can modify and update any area of |prod| software, including the kernel +itself. For information on populating, installing and removing software +updates, see :ref:`Manage Software Updates `. + +There are two different kinds of Software updates that you can use to update +the |prod| software: + +.. _software-updates-and-upgrades-software-updates-ol-kxm-wgv-njb: + +#. **RPM Software Updates** + + These software updates deliver |prod| software updates containing RPMs for + updating the |prod| software running directly on the hosts. + + Software updates can be installed manually or by the Update Orchestrator + which automates a rolling install of an update across all of the + |prod-long| hosts. + + The |prod| handles multiple updates being applied and removed at once. + Software updates can modify and update any area of |prod| software, + including the kernel itself. + + For information on populating, installing and removing software updates, + see :ref:`Manage Software Updates `. + + .. note:: + A 10 GB internal management network is required for reboot-required + software update operations. + +#. **Application Software Updates** + + These software updates apply to software being managed through the + StarlingX Application Package Manager, that is, ':command:`system + application-upload/apply/remove/delete`'. |prod| delivers some software + through this mechanism, for example, **platform-integ-apps**. + + For software updates for these applications, download the updated + application tarball, containing the updated Armada manifest, and updated + Helm charts for the application, and apply the updates using the + :command:`system application-update` command. + +.. xbooklink For more information, see, + :ref:`Cloud Platform Kubernetes Admin Tutorials + `: :ref:`StarlingX Application Package Manager + `. diff --git a/doc/source/updates/kubernetes/software-upgrades.rst b/doc/source/updates/kubernetes/software-upgrades.rst new file mode 100644 index 000000000..71e6df612 --- /dev/null +++ b/doc/source/updates/kubernetes/software-upgrades.rst @@ -0,0 +1,108 @@ + +.. upe1593016272562 +.. _software-upgrades: + +================= +Software Upgrades +================= + +|prod-long| upgrades enable you to move |prod| software from one release of +|prod| to the next release of |prod|. + +.. contents:: |minitoc| + :local: + :depth: 1 + +|prod| software upgrade is a multi-step rolling-upgrade process, where |prod| +hosts are upgraded one at time while continuing to provide its hosting services +to its hosted applications. An upgrade can be performed manually or using +Upgrade Orchestration, which automates much of the upgrade procedure, leaving a +few manual steps to prevent operator oversight. For more information on manual +upgrades, see :ref:`Manual |prod| Components Upgrade +`. For more information on upgrade orchestration, see +:ref:`Orchestrated |prod| Component Upgrade `. + +.. warning:: + Do NOT use information in the |updates-doc| guide for |prod-dc| + orchestrated software upgrades. If information in this document is used for + a |prod-dc| orchestrated upgrade, the upgrade will fail resulting + in an outage. 
The |prod-dc| Upgrade Orchestrator automates a
    recursive rolling upgrade of all subclouds and all hosts within the
    subclouds.

.. xbooklink For more information on the |prod-dc| Upgrade Orchestrator, see
    |distcloud-doc|: :ref:`Upgrade Orchestration for Distributed Cloud
    Subclouds Using CLI
    `.

Before starting the upgrade process:

.. _software-upgrades-ul-ant-vgq-gmb:

- the system must be "patch current"

- there must be no management-affecting alarms present on the system

- the new software load must be imported, and

- a valid license file for the new software release must be installed

The upgrade process starts by upgrading the controllers. The standby controller
is upgraded first; this involves loading the standby controller with the new
release of software and migrating all the controller services' databases for
the new release of software. Activity is then switched to the upgraded
controller, which runs in a 'compatibility' mode where all inter-node messages
use message formats from the old release of software. Upgrading the second
controller is the point of no return for an in-service abort of the upgrade
process. The second controller is loaded with the new release of software and
becomes the new standby controller. For more information on manual upgrades,
see :ref:`Manual |prod| Components Upgrade `.

If present, storage nodes are locked, upgraded and unlocked one at a time in
order to respect the redundancy model of |prod| storage nodes. Storage nodes
can be upgraded in parallel if using upgrade orchestration.

Upgrade of worker nodes is the next step in the process. When a worker node is
locked, the node is tainted so that Kubernetes shuts down any pods on that
worker node and restarts them on another worker node. When upgrading the
worker node, the worker node network boots/installs the new software from the
active controller. After unlocking the worker node, the worker services are
running in a 'compatibility' mode where all inter-node messages use message
formats from the old release of software. Note that the worker nodes can only
be upgraded in parallel if using upgrade orchestration.

The final step of the upgrade process is to activate and complete the upgrade.
This involves disabling 'compatibility' modes on all hosts and clearing the
Upgrade Alarm.

.. _software-upgrades-section-N1002F-N1001F-N10001:

----------------------------------
Rolling Back / Aborting an Upgrade
----------------------------------

In general, any issues encountered during an upgrade should be addressed during
the upgrade with the intention of completing the upgrade after the issues are
resolved. Issues specific to a storage or worker host can be addressed by
temporarily downgrading the host, addressing the issues and then upgrading the
host again, or in some cases by replacing the node.

In extremely rare cases, it may be necessary to abort an upgrade. This is a
last resort and should only be done if there is no other way to address the
issue within the context of the upgrade. There are two cases for doing such an
abort:

.. _software-upgrades-ul-dqp-brt-cx:

- Before controller-0 has been upgraded \(that is, only controller-1 has been
  upgraded\): In this case the upgrade can be aborted and the system will
  remain in service during the abort; see :ref:`Rolling Back a Software
  Upgrade Before the Second Controller Upgrade
  `. 
+ +- After controller-0 has been upgraded \(that is, both controllers have been + upgraded\): In this case the upgrade can only be aborted with a complete + outage and a reinstall of all hosts. This would only be done as a last + resort, if there was absolutely no other way to recover the system, see, + :ref:`Rolling Back a Software Upgrade After the Second Controller Upgrade + `. diff --git a/doc/source/updates/kubernetes/update-orchestration-cli.rst b/doc/source/updates/kubernetes/update-orchestration-cli.rst new file mode 100644 index 000000000..2a7b9c4e5 --- /dev/null +++ b/doc/source/updates/kubernetes/update-orchestration-cli.rst @@ -0,0 +1,69 @@ + +.. agv1552920520258 +.. _update-orchestration-cli: + +======================== +Update Orchestration CLI +======================== + +The update orchestration CLI is :command:`sw-manager`. Use this to create your +update strategy. + +The commands and options map directly to the parameter descriptions in the web +interface dialog, described in :ref:`Configuring Update Orchestration +`. + +.. note:: + To use update orchestration commands, you need administrator privileges. + You must log in to the active controller as user **sysadmin** and source + the /etc/platform/openrc script to obtain administrator privileges. Do not + use **sudo**. + +.. note:: + Management-affecting alarms cannot be ignored at the indicated severity + level or higher by using relaxed alarm rules during an orchestrated update + operation. For a list of management-affecting alarms, see |fault-doc|: + :ref:`Alarm Messages <100-series-alarm-messages>`. To display + management-affecting active alarms, use the following command: + + .. code-block:: none + + ~(keystone_admin)$ fm alarm-list --mgmt_affecting + + During an orchestrated update operation, the following alarms are ignored + even when strict restrictions are selected: + + - 200.001, Maintenance host lock alarm + + - 900.001, Patch in progress + + - 900.005, Upgrade in progress + + - 900.101, Software patch auto apply in progress + +.. _update-orchestration-cli-ul-qhy-q1p-v1b: + +Help is available for the overall command and also for each sub-command. For +example: + +.. code-block:: none + + ~(keystone_admin)]$ sw-manager patch-strategy --help + usage: sw-manager patch-strategy [-h] ... + + optional arguments: + -h, --help show this help message and exit + +Update orchestration commands include: + +.. _update-orchestration-cli-ul-cvv-gdd-nx: + +- :command:`create` - Create a strategy + +- :command:`delete` - Delete a strategy + +- :command:`apply` - Apply a strategy + +- :command:`abort` - Abort a strategy + +- :command:`show` - Show a strategy diff --git a/doc/source/updates/kubernetes/update-orchestration-overview.rst b/doc/source/updates/kubernetes/update-orchestration-overview.rst new file mode 100644 index 000000000..4155febd0 --- /dev/null +++ b/doc/source/updates/kubernetes/update-orchestration-overview.rst @@ -0,0 +1,95 @@ + +.. kzb1552920557323 +.. _update-orchestration-overview: + +============================= +Update Orchestration Overview +============================= + +Update orchestration allows an entire |prod| system to be updated with a single +operation. + +.. contents:: |minitoc| + :local: + :depth: 1 + +You can configure and run update orchestration using the CLI, the Horizon Web +interface, or the stx-nfv REST API. + +.. note:: + Updating of |prod-dc| is distinct from updating of other |prod| + configurations. + +.. 
xbooklink For information on updating |prod-dc|, see |distcloud-doc|: + :ref:`Update Management for Distributed Cloud + `. + +.. _update-orchestration-overview-section-N10031-N10023-N10001: + +--------------------------------- +Update Orchestration Requirements +--------------------------------- + +Update orchestration can only be done on a system that meets the following +conditions: + +.. _update-orchestration-overview-ul-e1y-t4c-nx: + +- The system is clear of alarms \(with the exception of alarms for locked + hosts, and update applications in progress\). + + .. note:: + When configuring update orchestration, you have the option to ignore + alarms with a severity less than management-affecting severity. For + more information, see :ref:`Configuring Update Orchestration + `. + +- All hosts must be unlocked-enabled-available. + +- Two controller hosts must be available. + +- All storage hosts must be available. + +- When installing reboot required updates, there must be spare worker + capacity to move hosted application pods off the worker host\(s\) being + updated such that hosted application services are not impacted. + +.. _update-orchestration-overview-section-N1009D-N10023-N10001: + +-------------------------------- +The Update Orchestration Process +-------------------------------- + +Update orchestration automatically iterates through all hosts on the system and +installs the applied updates to each host: first the controller hosts, then the +storage hosts, and finally the worker hosts. During the worker host updating, +hosted application pod re-locations are managed automatically. The controller +hosts are always updated serially. The storage hosts and worker hosts can be +configured to be updated in parallel in order to reduce the overall update +installation time. + +Update orchestration can install one or more applied updates at the same time. +It can also install reboot-required updates or in-service updates or both at +the same time. Update orchestration only locks and unlocks \(that is, reboots\) +a host to install an update if at least one reboot-required update has been +applied. + +The user first creates an update orchestration strategy, or plan, for the +automated updating procedure. This customizes the update orchestration, using +parameters to specify: + +.. _update-orchestration-overview-ul-eyw-fyr-31b: + +- the host types to be updated + +- whether to update hosts serially or in parallel + +Based on these parameters, and the state of the hosts, update orchestration +creates a number of stages for the overall update strategy. Each stage +generally consists of re-locating hosted application pods, locking hosts, +installing updates, and unlocking hosts for a subset of the hosts on the +system. + +After creating the update orchestration strategy, the user can either apply the +entire strategy automatically, or manually apply individual stages to control +and monitor the update progress. diff --git a/doc/source/updates/kubernetes/update-status-and-lifecycle.rst b/doc/source/updates/kubernetes/update-status-and-lifecycle.rst new file mode 100644 index 000000000..c761469e7 --- /dev/null +++ b/doc/source/updates/kubernetes/update-status-and-lifecycle.rst @@ -0,0 +1,76 @@ + +.. utq1552920689344 +.. _update-status-and-lifecycle: + +=========================== +Update Status and Lifecycle +=========================== + +|prod| software updates move through different status levels as the updates are +being applied. + +.. 
rubric:: |context| + +After adding an update \(patch\) to the storage area you must move it to the +repository, which manages distribution for the cluster. From there, you can +install the updates to the hosts that require them. + +Some of the available updates may be required on controller hosts only, while +others may be required on worker or storage hosts. Use :command:`sw-patch +query-hosts` to see which hosts are impacted by the newly applied \(or +removed\) updates. You can then use :command:`sw-patch host-install` to update +the software on individual hosts. + +To keep track of software update installation, you can use the +:command:`sw-patch query` command. + +.. parsed-literal:: + + ~(keystone_admin)]$ sudo sw-patch query + Patch ID Patch State + =========== ============ + |pvr|-nn.nn_PATCH_0001 Applied + +where *nn.nn* in the update filename is the |prod| release number. + +This shows the **Patch State** for each of the updates in the storage area: + +**Available** + An update in the *Available* state has been added to the storage area, but + is not currently in the repository or installed on the hosts. + +**Partial-Apply** + An update in the *Partial-Apply* state has been added to the software + updates repository using the :command:`sw-patch apply` command, but has not + been installed on all hosts that require it. It may have been installed on + some but not others, or it may not have been installed on any hosts. If any + reboot-required update is in a partial state \(Partial-Apply or + Partial-Remove\), you cannot update the software on any given host without + first locking it. If, for example, you had one reboot-required update and + one in-service update, both in a Partial-Apply state and both applicable to + node X, you cannot just install the non-reboot-required update to the + unlocked node X. + +**Applied** + An update in the *Applied* state has been installed on all hosts that + require it. + +You can use the :command:`sw-patch query-hosts` command to see which hosts are +fully updated \(**Patch Current**\). This also shows which hosts require +reboot, either because they are not fully updated, or because they are fully +updated but not yet rebooted. + +.. code-block:: none + + ~(keystone_admin)]$ sudo sw-patch query-hosts + + Hostname IP Address Patch Current Reboot Required Release State + ============ ============== ============= =============== ======= ===== + compute-0 192.168.204.95 Yes No 20.06 idle + compute-1 192.168.204.63 Yes No 20.06 idle + compute-2 192.168.204.99 Yes No 20.06 idle + compute-3 192.168.204.49 Yes No 20.06 idle + controller-0 192.168.204.3 Yes No 20.06 idle + controller-1 192.168.204.4 Yes No 20.06 idle + storage-0 192.168.204.37 Yes No 20.06 idle + storage-1 192.168.204.90 Yes No 20.06 idle diff --git a/doc/source/updates/kubernetes/upgrading-all-in-one-duplex-or-standard.rst b/doc/source/updates/kubernetes/upgrading-all-in-one-duplex-or-standard.rst new file mode 100644 index 000000000..0775d92ec --- /dev/null +++ b/doc/source/updates/kubernetes/upgrading-all-in-one-duplex-or-standard.rst @@ -0,0 +1,495 @@ + +.. btn1592861794542 +.. _upgrading-all-in-one-duplex-or-standard: + +====================================== +Upgrade All-in-One Duplex / Standard +====================================== + +You can upgrade the |prod| Duplex or Standard configurations with a new release +of |prod| software. + +.. rubric:: |prereq| + +.. _upgrading-all-in-one-duplex-or-standard-ul-ezb-b11-cx: + +- Perform a full backup to allow recovery. + + .. 
note:: + Back up files in the /home/sysadmin and /root directories prior + to doing an upgrade. Home directories are not preserved during backup or + restore operations, blade replacement, or upgrades. + +- The system must be "patch current". All updates available for the current + release running on the system must be applied. To find and download + applicable updates, visit the |dnload-loc|. + +- Transfer the new release software load to controller-0 \(or onto a USB + stick\); controller-0 must be active. + +- Transfer the new release software license file to controller-0, \(or onto a + USB stick\). + +- Transfer the new release software signature to controller-0 \(or onto a USB + stick\). + +- Unlock all hosts. + + - All nodes must be unlocked. The upgrade cannot be started when there + are locked nodes \(the health check prevents it\). + +.. note:: + The upgrade procedure includes steps to resolve system health issues. + +.. rubric:: |proc| + +#. Ensure that controller-0 is the active controller. + +#. Install the license file for the release you are upgrading to, for example, + 20.06. + + .. code-block:: none + + ~(keystone_admin)]$ system license-install + + For example, + + .. code-block:: none + + ~(keystone_admin)]$ system license-install license.lic + +#. Import the new release. + + + #. Run the :command:`load-import` command on **controller-0** to import + the new release. + + First, source /etc/platform/openrc. Also, you must specify an exact + path to the \*.iso bootimage file and to the \*.sig bootimage signature + file. + + .. code-block:: none + + $ source /etc/platform/openrc + ~(keystone_admin)]$ system load-import /home/sysadmin/.iso \ + .sig + +--------------------+-----------+ + | Property | Value | + +--------------------+-----------+ + | id | 2 | + | state | importing | + | software_version | 20.06 | + | compatible_version | 20.04 | + | required_patches | | + +--------------------+-----------+ + + The :command:`load-import` must be done on **controller-0** and accepts + relative paths. + + #. Check to ensure the load was successfully imported. + + .. code-block:: none + + ~(keystone_admin)]$ system load-list + + +----+----------+------------------+ + | id | state | software_version | + +----+----------+------------------+ + | 1 | active | 20.04 | + | 2 | imported | 20.06 | + +----+----------+------------------+ + + +#. Apply any required software updates. + + The system must be 'patch current'. All software updates related to your + current |prod| software release must be uploaded, applied, and installed. + + All software updates to the new |prod| release, only need to be uploaded + and applied. The install of these software updates will occur automatically + during the software upgrade procedure as the hosts are reset to load the + new release of software. + + To find and download applicable updates, visit the |dnload-loc|. + + For more information, see :ref:`Manage Software Updates + `. + +#. Confirm that the system is healthy. + + Check the current system health status, resolve any alarms and other issues + reported by the :command:`health-query-upgrade` command, then recheck the + system health status to confirm that all **System Health** fields are set + to **OK**. + + .. 
code-block:: none + + ~(keystone_admin)]$ system health-query-upgrade + System Health: + All hosts are provisioned: [OK] + All hosts are unlocked/enabled: [OK] + All hosts have current configurations: [OK] + All hosts are patch current: [OK] + Ceph Storage Healthy: [OK] + No alarms: [OK] + All kubernetes nodes are ready: [OK] + All kubernetes control plane pods are ready: [OK] + Required patches are applied: [OK] + License valid for upgrade: [OK] + + By default, the upgrade process cannot be run and is not recommended to be + run with Active Alarms present. However, management affecting alarms can be + ignored with the :command:`--force` option with the :command:`system + upgrade-start` command to force the upgrade process to start. + + .. note:: + It is strongly recommended that you clear your system of any and all + alarms before doing an upgrade. While the :command:`--force` option is + available to run the upgrade, it is a best practice to clear any + alarms. + +#. Start the upgrade from controller-0. + + Make sure that controller-0 is the active controller, and you are logged + into controller-0 as **sysadmin** and your present working directory is + your home directory. + + .. code-block:: none + + ~(keystone_admin)]$ system upgrade-start + +--------------+--------------------------------------+ + | Property | Value | + +--------------+--------------------------------------+ + | uuid | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 | + | state | starting | + | from_release | 20.04 | + | to_release | 20.06 | + +--------------+--------------------------------------+ + + This will make a copy of the system data to be used in the upgrade. + Configuration changes are not allowed after this point until the swact to + controller-1 is completed. + + The following upgrade state applies once this command is executed: + + - started: + + - State entered after :command:`system upgrade-start` completes. + + - Release 20.04 system data \(for example, postgres databases\) has + been exported to be used in the upgrade. + + - Configuration changes must not be made after this point, until the + upgrade is completed. + + As part of the upgrade, the upgrade process checks the health of the system + and validates that the system is ready for an upgrade. + + The upgrade process checks that no alarms are active before starting an + upgrade. + + .. note:: + Use the command :command:`system upgrade-start --force` to force the + upgrades process to start and to ignore management affecting alarms. + This should ONLY be done if you feel these alarms will not be an issue + over the upgrades process. + + On systems with Ceph storage, it also checks that the Ceph cluster is + healthy. + +#. Upgrade controller-1. + + #. Lock controller-1. + + .. code-block:: none + + ~(keystone_admin)]$ system host-lock controller-1 + + #. Upgrade controller-1. + + Controller-1 installs the update and reboots, then performs data + migration. + + .. code-block:: none + + ~(keystone_admin)]$ system host-upgrade controller-1 + + Wait for controller-1 to reinstall with the load N+1 and becomes + **locked-disabled-online** state. + + The following data migration states apply when this command is + executed. + + - data-migration: + + - State entered when :command:`system host-upgrade controller-1` + is executed. + + - System data is being migrated from release N to release N+1. + + - data-migration-complete: + + - State entered when controller-1 upgrade is complete. 
+ + - System data has been successfully migrated from release 20.04 + to release 20.06. + + - data-migration-failed: + + - State entered if data migration on controller-1 fails. + + - Upgrade must be aborted. + + #. Check the upgrade state. + + .. code-block:: none + + ~(keystone_admin)]$ system upgrade-show + + +--------------+--------------------------------------+ + | Property | Value | + +--------------+--------------------------------------+ + | uuid | e7c8f6bc-518c-46d4-ab81-7a59f8f8e64b | + | state | data-migration-complete | + | from_release | 20.04 | + | to_release | 20.06 | + +--------------+--------------------------------------+ + + If the :command:`upgrade-show` status indicates + 'data-migration-failed', then there is an issue with the data + migration. Check the issue before proceeding to the next step. + + #. Unlock controller-1. + + .. code-block:: none + + ~(keystone_admin)]$ system host-unlock controller-1 + + Wait for controller-1 to become **unlocked-enabled**. Wait for the DRBD + sync **400.001** Services-related alarm is raised and then cleared. + + The following states apply when this command is executed. + + - upgrading-controllers: + + - State entered when controller-1 has been unlocked and is + running release 20.06 software. + + If it transitions to **unlocked-disabled-failed**, check the issue + before proceeding to the next step. The alarms may indicate a + configuration error. Check the result of the configuration logs on + controller-1, \(for example, Error logs in + controller1:/var/log/puppet\). + +#. Set controller-1 as the active controller. Swact to controller-1. + + .. code-block:: none + + ~(keystone_admin)]$ system host-swact controller-0 + + Wait until services have gone active on the new active controller-1 before + proceeding to the next step. When all services on controller-1 are + enabled-active, the swact is complete. + +#. Upgrade **controller-0**. + + #. Lock **controller-0**. + + .. code-block:: none + + ~(keystone_admin)]$ system host-lock controller-0 + + #. Upgrade **controller-0**. + + .. code-block:: none + + ~(keystone_admin)]$ system host-upgrade controller-0 + + + #. Unlock **controller-0**. + + .. code-block:: none + + ~(keystone_admin)]$ system host-unlock controller-0 + + Wait until the DRBD sync **400.001** Services-related alarm is raised + and then cleared before proceeding to the next step. + + - upgrading-hosts: + + - State entered when both controllers are running release 20.06 + software. + +#. Check the system health to ensure that there are no unexpected alarms. + + .. code-block:: none + + ~(keystone_admin)]$ fm alarm-list + + Clear all alarms unrelated to the upgrade process. + +#. If using Ceph storage backend, upgrade the storage nodes one at a time. + + The storage node must be locked and all OSDs must be down in order to do + the upgrade. + + #. Lock storage-0. + + .. code-block:: none + + ~(keystone_admin)]$ system host-lock storage-0 + + #. Verify that the OSDs are down after the storage node is locked. + + In the Horizon interface, navigate to **Admin** \> **Platform** \> + **Storage Overview** to view the status of the OSDs. + + #. Upgrade storage-0. + + .. code-block:: none + + ~(keystone_admin)]$ system host-upgrade storage-0 + + The upgrade is complete when the node comes online, and at that point, + you can safely unlock the node. 
+
+   #. Upgrade storage-0.
+
+      .. code-block:: none
+
+         ~(keystone_admin)]$ system host-upgrade storage-0
+
+      The upgrade is complete when the node comes online, and at that point,
+      you can safely unlock the node.
+
+      After upgrading a storage node, but before unlocking it, Ceph
+      synchronization alarms are raised \(these show progress as the cluster
+      syncs\), along with infrastructure network interface alarms \(the
+      infrastructure network interface configuration has not yet been applied
+      to the storage node, because it has not been unlocked\).
+
+      Unlock the node as soon as the upgraded storage node comes online.
+
+   #. Unlock storage-0.
+
+      .. code-block:: none
+
+         ~(keystone_admin)]$ system host-unlock storage-0
+
+      Wait for all alarms to clear after the unlock before proceeding to
+      upgrade the next storage host.
+
+   #. Repeat the above steps for each storage host.
+
+      .. note::
+         After upgrading the first storage node, you can expect alarm
+         **800.003**. The alarm is cleared after all storage nodes are
+         upgraded.
+
+#. If the system includes worker hosts, upgrade them one at a time.
+
+   #. Lock worker-0.
+
+      .. code-block:: none
+
+         ~(keystone_admin)]$ system host-lock worker-0
+
+   #. Upgrade worker-0.
+
+      .. code-block:: none
+
+         ~(keystone_admin)]$ system host-upgrade worker-0
+
+      Wait for the host to run the installer, reboot, and go online before
+      unlocking it in the next step.
+
+   #. Unlock worker-0.
+
+      .. code-block:: none
+
+         ~(keystone_admin)]$ system host-unlock worker-0
+
+      Wait for all alarms to clear after the unlock before proceeding to the
+      next worker host.
+
+   #. Repeat the above steps for each worker host.
+
+#. Set controller-0 as the active controller. Swact to controller-0.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system host-swact controller-1
+
+   Wait until services have gone active on the newly active controller-0
+   before proceeding to the next step. When all services on controller-0 are
+   enabled-active, the swact is complete.
+
+#. Activate the upgrade.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system upgrade-activate
+      +--------------+--------------------------------------+
+      | Property     | Value                                |
+      +--------------+--------------------------------------+
+      | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
+      | state        | activating                           |
+      | from_release | 20.04                                |
+      | to_release   | 20.06                                |
+      +--------------+--------------------------------------+
+
+   During the running of the :command:`upgrade-activate` command, new
+   configurations are applied to the controller. 250.001 \(**hostname
+   Configuration is out-of-date**\) alarms are raised and are cleared as the
+   configuration is applied. The upgrade state goes from **activating** to
+   **activation-complete** once this is done.
+
+   The following states apply when this command is executed.
+
+   **activation-requested**
+      State entered when :command:`system upgrade-activate` is executed.
+
+   **activating**
+      State entered when activation of the upgrade has started and new
+      configurations are being applied to the controller and compute hosts.
+
+   **activation-complete**
+      State entered when new configurations have been applied to all
+      controller and compute hosts.
+
+   #. Check the status of the upgrade again to confirm that it has reached
+      **activation-complete**.
+
+      .. code-block:: none
+
+         ~(keystone_admin)]$ system upgrade-show
+         +--------------+--------------------------------------+
+         | Property     | Value                                |
+         +--------------+--------------------------------------+
+         | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
+         | state        | activation-complete                  |
+         | from_release | 20.04                                |
+         | to_release   | 20.06                                |
+         +--------------+--------------------------------------+
+
+#. Complete the upgrade.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system upgrade-complete
+      +--------------+--------------------------------------+
+      | Property     | Value                                |
+      +--------------+--------------------------------------+
+      | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
+      | state        | completing                           |
+      | from_release | 20.04                                |
+      | to_release   | 20.06                                |
+      +--------------+--------------------------------------+
+
+#. Delete the imported load.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system load-list
+      +----+----------+------------------+
+      | id | state    | software_version |
+      +----+----------+------------------+
+      | 1  | imported | 20.04            |
+      | 2  | active   | 20.06            |
+      +----+----------+------------------+
+
+      ~(keystone_admin)]$ system load-delete 1
+      Deleted load: load 1
+
diff --git a/doc/source/updates/kubernetes/upgrading-all-in-one-simplex.rst b/doc/source/updates/kubernetes/upgrading-all-in-one-simplex.rst
new file mode 100644
index 000000000..0dbf48cb7
--- /dev/null
+++ b/doc/source/updates/kubernetes/upgrading-all-in-one-simplex.rst
@@ -0,0 +1,377 @@
+
+.. nfq1592854955302
+.. _upgrading-all-in-one-simplex:
+
+==========================
+Upgrade All-in-One Simplex
+==========================
+
+You can upgrade a |prod| Simplex configuration with a new release of |prod|
+software.
+
+.. rubric:: |prereq|
+
+.. _upgrading-all-in-one-simplex-ul-ezb-b11-cx:
+
+- Perform a full backup to allow recovery.
+
+  .. note::
+     Back up files in the /home/sysadmin and /root directories prior to doing
+     an upgrade. Home directories are not preserved during backup or restore
+     operations, blade replacement, or upgrades.
+
+- The system must be 'patch current'. All updates available for the current
+  release running on the system must be applied. To find and download
+  applicable updates, visit the |dnload-loc| site.
+
+- Transfer the new release software load to controller-0 \(or onto a USB
+  stick\); controller-0 must be active.
+
+- Transfer the new release software license file to controller-0 \(or onto a
+  USB stick\).
+
+- Transfer the new release software signature to controller-0 \(or onto a USB
+  stick\).
+
+- Unlock all hosts.
+
+  - All nodes must be unlocked. The upgrade cannot be started when there
+    are locked nodes \(the health check prevents it\).
+
+.. note::
+   The upgrade procedure includes steps to resolve system health issues.
+
+.. rubric:: |proc|
+
+#. Source the platform environment.
+
+   .. code-block:: none
+
+      $ source /etc/platform/openrc
+      ~(keystone_admin)]$
+
+#. Install the license file for the release you are upgrading to, for example,
+   20.06.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system license-install
+
+   For example,
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system license-install license.lic
+
+#. Import the new release.
+
+   #. Run the :command:`load-import` command on **controller-0** to import
+      the new release.
+
+      First, source /etc/platform/openrc.
+
+      You must specify an exact path to the \*.iso bootimage file and to the
+      \*.sig bootimage signature file.
+
+      .. code-block:: none
+
+         $ source /etc/platform/openrc
+         ~(keystone_admin)]$ system load-import /home/sysadmin/.iso \
+         .sig
+         +--------------------+-----------+
+         | Property           | Value     |
+         +--------------------+-----------+
+         | id                 | 2         |
+         | state              | importing |
+         | software_version   | 20.06     |
+         | compatible_version | 20.04     |
+         | required_patches   |           |
+         +--------------------+-----------+
+
+      The :command:`load-import` must be done on **controller-0** and accepts
+      relative paths.
+
+   #. Check to ensure the load was successfully imported.
+
+      .. code-block:: none
+
+         ~(keystone_admin)]$ system load-list
+
+         +----+----------+------------------+
+         | id | state    | software_version |
+         +----+----------+------------------+
+         | 1  | active   | 20.04            |
+         | 2  | imported | 20.06            |
+         +----+----------+------------------+
+
+#. Apply any required software updates.
+
+   The system must be 'patch current'. All software updates related to your
+   current |prod| software release must be uploaded, applied, and installed.
+
+   All software updates for the new |prod| release only need to be uploaded
+   and applied. These updates are installed automatically during the software
+   upgrade procedure, as the hosts are reset to load the new release of
+   software.
+
+   To find and download applicable updates, visit the |dnload-loc|.
+
+   For more information, see :ref:`Manage Software Updates
+   `.
+
+#. Confirm that the system is healthy.
+
+   Check the current system health status, resolve any alarms and other issues
+   reported by the :command:`health-query-upgrade` command, then recheck the
+   system health status to confirm that all **System Health** fields are set
+   to **OK**.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system health-query-upgrade
+      System Health:
+      All hosts are provisioned: [OK]
+      All hosts are unlocked/enabled: [OK]
+      All hosts have current configurations: [OK]
+      All hosts are patch current: [OK]
+      Ceph Storage Healthy: [OK]
+      No alarms: [OK]
+      All kubernetes nodes are ready: [OK]
+      All kubernetes control plane pods are ready: [OK]
+      Required patches are applied: [OK]
+      License valid for upgrade: [OK]
+
+   By default, the upgrade cannot be started while active alarms are present,
+   and running an upgrade with active alarms is not recommended. However,
+   management-affecting alarms can be ignored by using the :command:`--force`
+   option with the :command:`system upgrade-start` command to force the
+   upgrade process to start.
+
+   .. note::
+      It is strongly recommended that you clear all alarms from the system
+      before starting an upgrade. Although the :command:`--force` option is
+      available to run the upgrade, clearing the alarms first is the best
+      practice.
+
+#. Start the upgrade.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system upgrade-start
+      +--------------+--------------------------------------+
+      | Property     | Value                                |
+      +--------------+--------------------------------------+
+      | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
+      | state        | starting                             |
+      | from_release | 20.04                                |
+      | to_release   | 20.06                                |
+      +--------------+--------------------------------------+
+
+   This command backs up the system data and images to /opt/platform-backup.
+   /opt/platform-backup is preserved when the host is reinstalled. For the
+   platform backup, the size of /home/sysadmin must be less than 2 GB.
+
+   This process may take several minutes.
+
+   When the upgrade state changes to **started**, the process is complete.
+
+   Any changes made to the system after this point will be lost when the data
+   is restored.
+
+   The following upgrade state applies once this command is executed:
+
+   - started:
+
+     - State entered after :command:`system upgrade-start` completes.
+
+     - Release 20.04 system data \(for example, postgres databases\) has
+       been exported to be used in the upgrade.
+
+     - Configuration changes must not be made after this point, until the
+       upgrade is completed.
+
+   As part of the upgrade, the upgrade process checks the health of the system
+   and validates that the system is ready for an upgrade.
+
+   The upgrade process checks that no alarms are active before starting an
+   upgrade.
+
+   .. note::
+      Use the :command:`system upgrade-start --force` command to force the
+      upgrade process to start and to ignore management-affecting alarms.
+      Do this ONLY if you are confident that these alarms will not interfere
+      with the upgrade process.
+
+#. Check the upgrade state.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system upgrade-show
+      +--------------+--------------------------------------+
+      | Property     | Value                                |
+      +--------------+--------------------------------------+
+      | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
+      | state        | started                              |
+      | from_release | 20.04                                |
+      | to_release   | 20.06                                |
+      +--------------+--------------------------------------+
+
+#. \(Optional\) Copy the upgrade data from the system to an alternate safe
+   location \(such as a USB drive or remote server\).
+
+   The upgrade data is located under /opt/platform-backup. An example listing
+   of that directory is:
+
+   **lost+found upgrade\_data\_2020-06-23T033950\_61e5fcd7-a38d-40b0-ab83-8be55b87fee2.tgz**
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ ls /opt/platform-backup/
+
+#. Lock controller-0.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system host-lock controller-0
+
+#. Start the upgrade of controller-0.
+
+   This is the point of no return. All data except /opt/platform-backup/ will
+   be erased from the system. This will wipe the **rootfs** and reboot the
+   host. The new release must then be manually installed \(via network or
+   USB\).
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system host-upgrade controller-0
+      WARNING: THIS OPERATION WILL COMPLETELY ERASE ALL DATA FROM THE SYSTEM.
+      Only proceed once the system data has been copied to another system.
+      Are you absolutely sure you want to continue? [yes/N]: yes
+
+#. Install the new release of |prod-long| Simplex software via network or USB.
+
+#. Restore the upgrade data.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/upgrade_platform.yml
+
+   Once the host has installed the new load, this playbook restores the
+   upgrade data and migrates it to the new load.
+
+   The playbook can be run locally or remotely and must be provided with the
+   following parameter:
+
+   ``ansible_become_pass``
+
+   The Ansible playbook checks /home/sysadmin/.yml for user
+   configuration override files for hosts; for example, if running Ansible
+   locally, /home/sysadmin/localhost.yml.
+
+   By default, the playbook searches for the upgrade data file under
+   /opt/platform-backup. If required, use the **upgrade\_data\_file**
+   parameter to specify the path to the **upgrade\_data** file.
+
+   .. note::
+      This playbook does not support replay.
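+
+   For illustration only, a local run that passes the password and an
+   explicit upgrade data file as extra variables might look like the
+   following \(the password and archive name are placeholders, not values
+   from this procedure\):
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/upgrade_platform.yml \
+         -e "ansible_become_pass=<sysadmin-password>" \
+         -e "upgrade_data_file=/opt/platform-backup/upgrade_data_<timestamp>_<uuid>.tgz"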
+
+   Once the data restoration is complete, the upgrade state is set to
+   **upgrading-hosts**.
+
+#. Check the status of the upgrade.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system upgrade-show
+      +--------------+--------------------------------------+
+      | Property     | Value                                |
+      +--------------+--------------------------------------+
+      | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
+      | state        | upgrading-hosts                      |
+      | from_release | 20.04                                |
+      | to_release   | 20.06                                |
+      +--------------+--------------------------------------+
+
+#. Unlock controller-0.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system host-unlock controller-0
+
+   This step is required only for Simplex systems that are not a subcloud.
+
+#. Activate the upgrade.
+
+   During the running of the :command:`upgrade-activate` command, new
+   configurations are applied to the controller. 250.001 \(**hostname
+   Configuration is out-of-date**\) alarms are raised and are cleared as the
+   configuration is applied. The upgrade state goes from **activating** to
+   **activation-complete** once this is done.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system upgrade-activate
+      +--------------+--------------------------------------+
+      | Property     | Value                                |
+      +--------------+--------------------------------------+
+      | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
+      | state        | activating                           |
+      | from_release | 20.04                                |
+      | to_release   | 20.06                                |
+      +--------------+--------------------------------------+
+
+   The following states apply when this command is executed.
+
+   **activation-requested**
+      State entered when :command:`system upgrade-activate` is executed.
+
+   **activating**
+      State entered when activation of the upgrade has started and new
+      configurations are being applied to the controller and compute hosts.
+
+   **activation-complete**
+      State entered when new configurations have been applied to all
+      controller and compute hosts.
+
+   #. Check the status of the upgrade again to confirm that it has reached
+      **activation-complete**.
+
+      .. code-block:: none
+
+         ~(keystone_admin)]$ system upgrade-show
+         +--------------+--------------------------------------+
+         | Property     | Value                                |
+         +--------------+--------------------------------------+
+         | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
+         | state        | activation-complete                  |
+         | from_release | 20.04                                |
+         | to_release   | 20.06                                |
+         +--------------+--------------------------------------+
+
+#. Complete the upgrade.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system upgrade-complete
+      +--------------+--------------------------------------+
+      | Property     | Value                                |
+      +--------------+--------------------------------------+
+      | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
+      | state        | completing                           |
+      | from_release | 20.04                                |
+      | to_release   | 20.06                                |
+      +--------------+--------------------------------------+
+
+#. Delete the imported load.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system load-list
+      +----+----------+------------------+
+      | id | state    | software_version |
+      +----+----------+------------------+
+      | 1  | imported | 20.04            |
+      | 2  | active   | 20.06            |
+      +----+----------+------------------+
+
+      ~(keystone_admin)]$ system load-delete 1
+      Deleted load: load 1
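+
+   To confirm the deletion, you can run :command:`system load-list` again;
+   only the active 20.06 load \(id 2\) should remain in the list.
+
+   .. code-block:: none
+
+      ~(keystone_admin)]$ system load-list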