.. WARNING: Add no lines of text between the label immediately following
.. and the title.
.. _orchestrated-deployment-host-software-deployment-d234754c7d20:
================================================
Orchestrated Deployment Host Software Deployment
================================================
Software deployment orchestration automates the process of upversioning the
|prod| software to a new major release or new patch release (In-Service or
Reboot Required (RR)). It automates the execution of all :command:`software deploy`
steps across all the hosts in a cluster, based on the configured policies.
.. note::
Software deployment orchestration also covers the orchestrated upversioning
to a new patched major release, that is, all the comments in this section
that are specific to major release also apply to a patched major release.
Software deployment orchestration supports all standalone configurations:
|AIO-SX|, |AIO-DX| and standard configuration.
.. note::
Orchestrating the software deployment of a |DC| system is different from
orchestrating the software deployment of standalone |prod| configurations.
Software deployment orchestration automatically iterates through all the hosts
and deploys the new software load on each host: first the controller hosts,
then the storage hosts, and lastly the worker hosts. During software deployment
on a worker host (and duplex |AIO| controllers), pods or |VMs| are automatically
moved to the alternate worker hosts. After software deployment orchestration has
deployed the new software on all hosts, it activates, completes, and deletes the
new software deployment.
.. note::
Software deployment orchestration completes and deletes the new software
deployment only when the user selects the ``--delete`` option when creating
the strategy. For a major release, a software deployment that has been
deleted can no longer be rolled back.
To perform a software deployment orchestration, first create an upgrade
orchestration strategy for the automated software deployment procedure. The
strategy defines the policies for the software deployment orchestration using
the following parameters:
- The host types to be software deployed.
- Whether to deploy the software to hosts serially or in parallel.
- The maximum number of hosts to deploy in parallel.
- Maintenance action (stop-start or migrate) for hosted OpenStack |VMs|
on a host that is about to have its software updated.
- Alarm restrictions, that is, options to specify how the orchestration behaves
when alarms occur.
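
As an illustration of how these parameters map to the options of
:command:`sw-manager sw-deploy-strategy create` described in the procedure
later in this section, the following Python sketch assembles the command line
and rejects values outside the documented choices. The helper name
``build_create_command`` and its keyword arguments are hypothetical and are
not part of |prod|; only the option names and allowed values come from this
section.

```python
import shlex

# Allowed values, copied from the option synopsis in the procedure section.
CHOICES = {
    "controller-apply-type": {"serial", "ignore"},
    "storage-apply-type": {"serial", "parallel", "ignore"},
    "worker-apply-type": {"serial", "parallel", "ignore"},
    "instance-action": {"stop-start", "migrate"},
    "alarm-restrictions": {"strict", "relaxed"},
}


def build_create_command(release_id, delete=False,
                         max_parallel_worker_hosts=None, **options):
    """Return the argv list for `sw-manager sw-deploy-strategy create`.

    Hypothetical helper: keyword names use underscores in place of the
    hyphens in the CLI option names (worker_apply_type -> --worker-apply-type).
    """
    argv = ["sw-manager", "sw-deploy-strategy", "create"]
    for key, value in options.items():
        name = key.replace("_", "-")
        if name not in CHOICES:
            raise ValueError(f"unknown option --{name}")
        if value not in CHOICES[name]:
            raise ValueError(f"--{name} must be one of {sorted(CHOICES[name])}")
        argv += [f"--{name}", value]
    if max_parallel_worker_hosts is not None:
        # The documented range is 2 through 10.
        if max_parallel_worker_hosts not in range(2, 11):
            raise ValueError("--max-parallel-worker-hosts must be between 2 and 10")
        argv += ["--max-parallel-worker-hosts", str(max_parallel_worker_hosts)]
    if delete:
        argv.append("--delete")
    argv.append(release_id)
    return argv


cmd = build_create_command(
    "<software-release-id>",
    worker_apply_type="parallel",
    max_parallel_worker_hosts=4,
    alarm_restrictions="relaxed",
)
# Print a shell-quoted form of the command for review before running it.
print(shlex.join(cmd))
```

Validating the values before invoking the |CLI| keeps typing mistakes from
reaching the orchestrator; the sketch does not run the command itself.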
Based on these parameters and the state of the hosts, software deployment
orchestration creates a number of stages for the overall software deployment
strategy. Each stage generally consists of deploying software on hosts for a
subset of the hosts on the system. In the case of a reboot required (RR)
software release, each stage consists of moving pods or |VMs|, locking hosts,
deploying software on hosts, and unlocking hosts for a subset of the hosts on
the system. After creating the software deployment orchestration strategy, you
can either apply the entire strategy automatically or apply individual stages
to control and monitor their progress manually.
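
To make the idea of stages more concrete, the following Python sketch groups
hosts into ordered batches: controllers, then storage hosts, then worker hosts,
with worker hosts batched according to the parallel limit. It is a simplified
illustration under stated assumptions and is not the algorithm used by the
orchestrator: the function ``plan_stages`` and the host dictionary are
hypothetical, controller and storage hosts are always handled one at a time,
and the ``ignore`` apply type, host state, and hosted workloads are not
modelled.

```python
def plan_stages(hosts, worker_apply_type="serial", max_parallel_worker_hosts=2):
    """Group hosts into ordered stages: controllers, storage, then workers.

    `hosts` maps a host name to its personality ("controller", "storage" or
    "worker"). Illustration only, not the orchestrator's real planning logic.
    """
    stages = []
    # Controllers and storage hosts: one host per stage (serial) in this sketch.
    for personality in ("controller", "storage"):
        for name in sorted(n for n, p in hosts.items() if p == personality):
            stages.append([name])
    # Workers: one per stage when serial, up to the limit when parallel.
    workers = sorted(n for n, p in hosts.items() if p == "worker")
    size = max_parallel_worker_hosts if worker_apply_type == "parallel" else 1
    for start in range(0, len(workers), size):
        stages.append(workers[start:start + size])
    return stages


example_hosts = {
    "controller-0": "controller", "controller-1": "controller",
    "storage-0": "storage", "storage-1": "storage",
    "worker-0": "worker", "worker-1": "worker", "worker-2": "worker",
}
for number, stage in enumerate(plan_stages(example_hosts, "parallel", 2)):
    print(number, stage)
```

With the example hosts, the three worker hosts end up in two stages, one with
two hosts and one with the remaining host.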
.. rubric:: |prereq|
- No other orchestration strategy exists. Firmware-upgrade,
kubernetes-version-upgrade, system-config-update-strategy, and
kube-rootca-update are other types of orchestration. A software deployment
cannot be orchestrated while another orchestration is in progress.
- You have the administrator role privileges.
- The system is clear of alarms except the software deployment in progress alarm.
- All the hosts are unlocked, enabled, and available.
- For Duplex systems, the system should be fully redundant: two controller
nodes should be available and, for systems with a Ceph backend, at least one
complete storage replication group should be available.
- Sufficient free capacity or unused worker resources must be available
across the cluster. A rough calculation is:
``Required spare capacity (%) = (<Number-of-hosts-to-upgrade-in-parallel> / <total-number-of-hosts>) * 100``
- For a major release deployment, the license for the new release has been installed using
:command:`system license-install <license-for-new-major-release>`.
- The software release to be deployed has been uploaded.
- For a major release:
.. code-block::
~(keystone_admin)]$ software upload [ --local ] <new-release>.iso <new-release>.sig
<new-release-id> is now uploaded
+-------------------------------+-------------------+
| Uploaded File | Release |
+-------------------------------+-------------------+
| <new-release>.iso | <new-release-id> |
+-------------------------------+-------------------+
This command may take 5-10 minutes depending on the hardware.
The ``--local`` option can be used when running this command in an |SSH|
session on the active controller to optimize performance. With this option,
the system reads files directly from the local disk rather than
transferring files over the REST APIs backing the |CLI|.
- For a patch release:
.. code-block::
~(keystone_admin)]$ software upload <new-release>.patch
<new-release-id> is now uploaded
+-------------------------------+-------------------+
| Uploaded File | Release |
+-------------------------------+-------------------+
| <new-release>.patch | <new-release-id> |
+-------------------------------+-------------------+
- Ensure that the new software release was successfully uploaded.
.. code-block::
~(keystone_admin)]$ software list
+--------------------------+-------+-----------+
| Release | RR | State |
+--------------------------+-------+-----------+
| starlingx-10.0.0 | True | deployed |
| <new-release-id> | True | available |
+--------------------------+-------+-----------+
- For a major release deployment, the platform issuer (``system-local-ca``) must be
configured beforehand with an RSA certificate/private key. If ``system-local-ca``
was configured with a different type of certificate/private key, use the
:ref:`migrate-platform-certificates-to-use-cert-manager-c0b1727e4e5d` procedure
to reconfigure it with an RSA certificate/private key.
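
The rough spare-capacity calculation given in the prerequisites above can be
worked through with a short Python function. This is only a sketch of the
stated formula; the function name ``required_spare_capacity_percent`` is
hypothetical, and the result is an estimate, not a guarantee that workloads
can be relocated.

```python
def required_spare_capacity_percent(hosts_in_parallel, total_hosts):
    """Return the rough spare capacity (in percent) needed across the cluster.

    Implements: (hosts upgraded in parallel / total hosts) * 100
    """
    if total_hosts <= 0:
        raise ValueError("total_hosts must be positive")
    if not 0 < hosts_in_parallel <= total_hosts:
        raise ValueError("hosts_in_parallel must be between 1 and total_hosts")
    # Multiply before dividing to keep the arithmetic exact for whole percents.
    return hosts_in_parallel * 100 / total_hosts


# Example: deploying to 2 worker hosts at a time in a 10-host cluster.
print(required_spare_capacity_percent(2, 10))
```

For 2 hosts in parallel out of 10, the estimate is 20 percent of the cluster
capacity.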
.. rubric:: |proc|
#. Create a software deployment orchestration strategy for a specified software
release with desired policies.
.. _orchestrated-deployment-host-software-deployment-d234754c7d20-step:
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy create [--controller-apply-type {serial,ignore}]
[--storage-apply-type {serial,parallel,ignore}]
[--worker-apply-type {serial,parallel,ignore}]
[--max-parallel-worker-hosts {2,3,4,5,6,7,8,9,10}]
[--instance-action {stop-start,migrate}]
[--alarm-restrictions {strict,relaxed}]
[--delete]
<software-release-id>
strategy-uuid: 5435e049-7002-4403-acfb-7886f6da14af
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: build
current-phase-completion: 0%
state: building
inprogress: true
where,
``<software-release-id>``
Specifies the specific software release to deploy. This can be a patch
release or a major release.
``[--controller-apply-type {serial,ignore}]``
(Optional) Specifies whether software should be deployed to controller
hosts in serial or ignored. By default, it is serial. ``ignore`` should
be used only when re-creating and applying a strategy after an abort or
failure.
``[--storage-apply-type {serial,parallel,ignore}]``
(Optional) Specifies whether software should be deployed to storage
hosts in serial, in parallel, or ignored. By default, it is serial.
In parallel mode, software is deployed at the same time to one storage
host from each storage redundancy group. ``ignore`` should be
used only when re-creating and applying a strategy after an abort or
failure.
.. note::
If parallel apply for storage is used, it will be automatically
replaced with the serial apply for ``--storage-apply-type``.
``[--worker-apply-type {serial,parallel,ignore}]``
(Optional) Specifies whether software should be deployed to worker hosts
in serial, in parallel or ignored. By default, it is serial. The number
of worker hosts that are software deployed in parallel is specified by
``[--max-parallel-worker-hosts {2,3,4,5,6,7,8,9,10}]``. The default is
2. ``ignore`` should be used only when re-creating and applying a
strategy after an abort or failure.
``[--max-parallel-worker-hosts {2,3,4,5,6,7,8,9,10}]``
(Optional) Specifies the maximum number of worker hosts to which software
is deployed in parallel when ``--worker-apply-type parallel`` is used.
The default is 2.
``[--instance-action {stop-start,migrate}]``
Applies only to OpenStack |VM| hosted guests. It specifies the action
performed on hosted OpenStack |VMs| on a worker host (or |AIO|
controller) prior to deploying the new software to the host. The default
is ``stop-start``.
- ``stop-start``
Before deploying the software release to the host, all the hosted
OpenStack |VMs| are stopped or shut down.
After deploying the software release to the host, all the hosted
OpenStack |VMs| are restarted.
- ``migrate``
Before deploying the software release to the host, all the hosted
OpenStack |VMs| are migrated to another host capable of hosting the
hosted OpenStack |VM| and that is not part of the current stage.
- Hosts whose software is already updated are preferred over the hosts
whose software has not been updated yet.
- Live migration is attempted first. If live migration is not
possible for the OpenStack |VM|, cold migration is performed.
``[--alarm-restrictions {strict,relaxed}]``
Lets you determine how to handle alarm restrictions based on the
management affecting statuses of any existing alarms, which takes into
account the alarm type as well as the alarm's current severity. Default
is strict. If set to relaxed, orchestration will be allowed to proceed
if there are no management affecting alarms present.
Performing management actions without specifically relaxing the alarm
checks will still fail if there are any alarms present in the system
(except for a small list of basic alarms for the orchestration actions,
such as an upgrade operation in progress alarm not impeding upgrade
orchestration). You can use the CLI command :command:`fm alarm-list
--mgmt_affecting` to view the alarms that are management affecting.
- ``strict`` maintains alarm restrictions.
- ``relaxed`` relaxes the usual alarm restrictions and allows the action to
proceed if there are no alarms present in the system with a severity equal
to or greater than its management affecting severity. That is, it will use
the ``-f`` (force) option on the precheck or start of the deployment.
``[--delete]``
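(Optional) Specifies that the software deployment is completed and deleted
after the new software has been deployed. If this option is not selected,
the software deployment is not deleted by the orchestration.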
#. Wait for the ``build`` phase of the software deployment orchestration
strategy create to be 100% complete and its state to be ``ready-to-apply``.
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy show
Strategy Software Deploy Strategy:
strategy-uuid: 6282f049-bb9e-46f0-9ca8-97bf626884e0
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: build
current-phase-completion: 100%
state: ready-to-apply
build-result: success
build-reason:
.. note::
If the build phase fails (``build-result: failed`` in the output of the
show command), determine the issue from the build error reason
(``build-reason: <Error information>`` in the output of the show
command) and/or from ``/var/log/nfv-vim*.log`` on the active controller,
address the issues, delete the strategy, and retry the create.
#. (Optional) Display the details (phases, stages, and steps) of the built
strategy using the ``--details`` option.
The software deploy strategy consists of one or more stages, which consist
of one or more hosts to have the new software deployed at the same time.
Each stage will be split into steps (for example, query-alarms, lock-hosts,
upgrade-hosts).
The new software is deployed on the controller hosts first, followed by the
storage hosts, and then the worker hosts.
The new software is deployed on the worker hosts with no hosted guests
(Kubernetes pods or OpenStack |VMs|) before the worker hosts with hosted guests
(Kubernetes pods or OpenStack |VMs|).
Hosted Kubernetes pods will be relocated off each worker host
(AIO-Controller) if another worker host capable of hosting the Kubernetes
pods is available before the new software is deployed to the worker host
(AIO-Controller).
Hosted OpenStack |VMs| will be managed according to the requested
``--instance-action`` on each worker host (AIO-Controller) before the new
software is deployed to the worker host (AIO-Controller).
The final step in each stage is one of the following:
``system-stabilize``
This waits for a period of time (up to several minutes) and ensures that the
system is free of alarms.
This ensures that we do not continue to deploy the new software to more
hosts if the software deployment has caused an issue resulting in an alarm.
``wait-data-sync``
This waits for a period of time (up to many hours) and ensures that data
synchronization has completed after the upgrade of a controller or storage
node.
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy show --details
Strategy Software Deploy Strategy:
strategy-uuid: 6282f049-bb9e-46f0-9ca8-97bf626884e0
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: build
current-phase-completion: 100%
state: ready-to-apply
build-phase:
...
stages:
...
steps:
...
apply-phase:
...
stages:
...
steps:
...
#. Apply and monitor the software deployment orchestration.
You can either apply the entire strategy automatically or apply the
individual stages to control and monitor their progress manually.
#. Apply the entire strategy automatically and monitor its progress:
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy apply
Strategy Software Deploy Strategy:
strategy-uuid: 52873771-fc1a-48cd-b322-ab921d34d01c
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: apply
current-phase-completion: 0%
state: applying
inprogress: true
Show high-level status of apply.
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy show
Strategy Software Deploy Strategy:
strategy-uuid: 35b48793-66f8-46be-8972-cc22117a93ff
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: apply
current-phase-completion: 7%
state: applying
inprogress: true
Show details of active stage or step of apply.
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy show --active
Strategy Software Deploy Strategy:
strategy-uuid: 52873771-fc1a-48cd-b322-ab921d34d01c
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: apply
current-phase-completion: 7%
state: applying
apply-phase:
total-stages: 3
current-stage: 0
stop-at-stage: 3
timeout: 12019 seconds
completion-percentage: 7%
start-date-time: 2024-06-11 12:19:51
inprogress: true
stages:
stage-id: 0
stage-name: sw-upgrade-start
total-steps: 3
current-step: 1
timeout: 1321 seconds
start-date-time: 2024-06-11 12:19:51
inprogress: true
steps:
step-id: 1
step-name: start-upgrade
timeout: 1200 seconds
start-date-time: 2024-06-11 12:19:51
result: wait
reason:
#. Apply individual stages.
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy apply --stage-id <STAGE-ID>
Strategy Software Deploy Strategy:
strategy-uuid: a0277e08-93cc-4964-ba39-ebab367a547c
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: apply
current-phase-completion: 0%
state: applying
inprogress: true
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy show
Strategy Software Deploy Strategy:
strategy-uuid: a0277e08-93cc-4964-ba39-ebab367a547c
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: apply
current-phase-completion: 7%
state: applying
inprogress: true
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy show --active
Strategy Software Deploy Strategy:
strategy-uuid: a0277e08-93cc-4964-ba39-ebab367a547c
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: apply
current-phase-completion: 7%
state: applying
apply-phase:
total-stages: 3
current-stage: 0
stop-at-stage: 1
timeout: 1322 seconds
completion-percentage: 7%
start-date-time: 2024-06-11 14:40:23
inprogress: true
stages:
stage-id: 0
stage-name: sw-upgrade-start
total-steps: 3
current-step: 1
timeout: 1321 seconds
start-date-time: 2024-06-11 14:40:23
inprogress: true
steps:
step-id: 1
step-name: start-upgrade
timeout: 1200 seconds
start-date-time: 2024-06-11 14:40:23
result: wait
reason:
#. While a software deployment orchestration strategy is being applied, it can be aborted.
The current step will be allowed to complete and if necessary, an abort
phase will be created and applied, which will attempt to unlock any hosts
that were locked.
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy abort
Strategy Software Deploy Strategy:
strategy-uuid: 63f48dfc-f833-479b-b597-d11f9219baf5
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: apply
current-phase-completion: 7%
state: aborting
inprogress: true
Wait for the abort to complete.
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy show
Strategy Software Deploy Strategy:
strategy-uuid: 63f48dfc-f833-479b-b597-d11f9219baf5
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: abort
current-phase-completion: 100%
state: aborted
apply-result: failed
apply-reason:
abort-result: success
abort-reason:
.. note::
To view detailed errors, run the following commands:
.. code-block::
[sysadmin@controller-0 ~(keystone_admin)]$ sw-manager sw-deploy-strategy show --error-details
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy show
Strategy Software Deploy Strategy:
strategy-uuid: <>
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: abort
current-phase-completion: 100%
state: aborted
apply-result: failed
apply-error-response:
abort-result: success
abort-reason:
abort-error-response:
.. note::
After a software deployment strategy has been applied (or aborted), it
must be deleted before another software deployment strategy can be
created.
#. Otherwise, wait for all the steps of all stages of the software deployment
orchestration strategy to complete.
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy show
Strategy Software Deploy Strategy:
strategy-uuid: 6282f049-bb9e-46f0-9ca8-97bf626884e0
release-id: <software-release-id>
controller-apply-type: serial
storage-apply-type: serial
worker-apply-type: serial
default-instance-action: stop-start
alarm-restrictions: strict
current-phase: applied
current-phase-completion: 100%
state: applied
apply-result: success
apply-reason:
If applying a software deployment strategy fails, you must address the issue
that caused the failure, then delete and re-create the strategy before
attempting to apply it again.
For additional details, run the :command:`sw-manager sw-deploy-strategy show --error-details` command.
#. Delete the completed software deployment strategy.
.. code-block::
~(keystone_admin)]$ sw-manager sw-deploy-strategy delete
Strategy deleted
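
The status checks in the procedure above all rely on the same ``key: value``
fields printed by :command:`sw-manager sw-deploy-strategy show`. The following
Python sketch parses that text and suggests the next action based only on the
states shown in this section. The function names ``parse_strategy`` and
``next_action`` and the returned action labels are hypothetical, the parser
keeps only the first occurrence of each key (so it is meant for the plain
``show`` output rather than ``--details`` or ``--active``), and states not
listed in this section are reported as ``unknown``.

```python
def parse_strategy(text):
    """Parse `key: value` lines of the show output into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        key = key.strip()
        # Keep the first occurrence of each key; skip lines without a colon.
        if sep and key and key not in fields:
            fields[key] = value.strip()
    return fields


def next_action(fields):
    """Suggest the next operator action for the states shown in this section."""
    state = fields.get("state")
    if state in ("building", "applying", "aborting"):
        return "wait"
    if state == "ready-to-apply":
        return "apply"
    if state in ("applied", "aborted"):
        # A strategy must be deleted before another one can be created.
        return "delete"
    if "failed" in (fields.get("build-result"), fields.get("apply-result")):
        return "investigate-then-delete"
    return "unknown"


sample = """\
Strategy Software Deploy Strategy:
strategy-uuid: 6282f049-bb9e-46f0-9ca8-97bf626884e0
release-id: <software-release-id>
current-phase: applied
current-phase-completion: 100%
state: applied
apply-result: success
apply-reason:
"""
status = parse_strategy(sample)
print(status["state"], status["current-phase-completion"], next_action(status))
```

A script that polls the strategy could call these helpers in a loop and stop
as soon as the suggested action is no longer ``wait``.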
.. rubric:: |postreq|
After a successful software deployment orchestration:
- The Kubernetes Version Upgrade procedure can be executed, if desired, to
upversion to a new Kubernetes version available in the new software release.
- You should also validate that the system and hosted applications are healthy.
- In the case of a major release software deployment:
- If you do not need to roll back the major release software deployment, then
delete the software deployment that was used by the software deployment
orchestration.
.. code-block::
~(keystone_admin)]$ software deploy delete
Deployment has been deleted
.. code-block::
~(keystone_admin)]$ software deploy show
No deploy in progress
- Remove the old major release to reclaim disk space.
.. code-block::
~(keystone_admin)]$ software list
+--------------------------+-------+-------------+
| Release | RR | State |
+--------------------------+-------+-------------+
| starlingx-10.0.0 | True | unavailable |
| <new-major-release-id> | True | deployed |
+--------------------------+-------+-------------+
.. code-block::
~(keystone_admin)]$ software delete starlingx-10.0.0
starlingx-10.0.0 has been deleted.
.. code-block::
~(keystone_admin)]$ software list
+--------------------------+-------+-------------+
| Release | RR | State |
+--------------------------+-------+-------------+
| <new-major-release-id> | True | deployed |
+--------------------------+-------+-------------+
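
When these checks are scripted, the table printed by :command:`software list`
can be parsed into rows. The following Python sketch assumes the three-column
layout (Release, RR, State) shown in this section; the function names
``parse_software_list`` and ``releases_in_state`` are hypothetical, and the
sample text is copied from the example output above.

```python
def parse_software_list(table):
    """Parse the ASCII table printed by `software list` into dictionaries."""
    rows = []
    for line in table.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +-----+
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "Release":
            continue  # skip the header row and unexpected layouts
        rows.append({"release": cells[0], "rr": cells[1] == "True", "state": cells[2]})
    return rows


def releases_in_state(rows, state):
    """Return the release IDs whose State column equals `state`."""
    return [row["release"] for row in rows if row["state"] == state]


sample = """\
+--------------------------+-------+-------------+
| Release                  | RR    | State       |
+--------------------------+-------+-------------+
| starlingx-10.0.0         | True  | unavailable |
| <new-major-release-id>   | True  | deployed    |
+--------------------------+-------+-------------+
"""
rows = parse_software_list(sample)
# Releases that are candidates for `software delete` after a major release.
print(releases_in_state(rows, "unavailable"))
```

The same helper can be used before creating a strategy to confirm that the
uploaded release is in the ``available`` state.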