DC Orchestration for AIO-DX & Standard Subclouds

Distributed Upgrade Orchestration Process Using the CLI - Modified this topic to include 2 prerequisites
Changing Case on file extensions.
Updated comments for Patchset 3
Fixed merge conflicts
Updated comments for Patchset 4
Fixed merge conflicts

Story: 2008055
Task: 42387

Signed-off-by: Juanita-Balaraj <juanita.balaraj@windriver.com>
Change-Id: Ia2c44812052c4f70f4742923fa847698cc0d6fa6
This commit is contained in:
Juanita-Balaraj 2021-05-04 18:38:24 -04:00
parent cebb02eb21
commit 94fd67c34a
17 changed files with 1392 additions and 4 deletions


@ -0,0 +1,3 @@
{
"restructuredtext.confPath": ""
}


@ -0,0 +1,21 @@
.. hil1593180554641
.. _aborting-the-distributed-upgrade-orchestration:
==============================================
Aborting the Distributed Upgrade Orchestration
==============================================
To abort the current upgrade orchestration operation, use the
:command:`dcmanager upgrade-strategy abort` command.
.. note::
The :command:`dcmanager upgrade-strategy abort` command completes the
current upgrading stage before aborting, to prevent hosts from being left
in a locked state requiring manual intervention.
.. code-block:: none
~(keystone_admin)]$ dcmanager upgrade-strategy abort


@ -0,0 +1,158 @@
.. jul1593180757282
.. _configuration-for-specific-subclouds:
====================================
Configuration for Specific Subclouds
====================================
To determine how upgrades are applied to the nodes on each subcloud, the
upgrade strategy refers to separate configuration settings.
The following settings are applied by default:
.. _configuration-for-specific-subclouds-ul-sgb-p34-gdb:
- storage apply type: parallel
- worker apply type: parallel
- max parallel workers: 10
- alarm restriction type: relaxed
- default instance action: migrate \(This parameter is only applicable to
hosted application |VMs| with the stx-openstack application.\)
To update the default values, use the :command:`dcmanager strategy-config
update` command. You can also use this command to configure custom behavior for
individual subclouds.
- To list the default upgrade strategy and any custom configurations for
individual subclouds, use the :command:`dcmanager strategy-config list`
command.
For example:
.. code-block:: none
~(keystone_admin)]$ dcmanager strategy-config list
+--------------------+--------------------+--------------------+-----------------------+------------------------+------------------+
| cloud | storage apply type | worker apply type | max parallel workers | alarm restriction type | default instance |
| | | | | | action |
+--------------------+--------------------+--------------------+-----------------------+------------------------+------------------+
| all clouds default | parallel | parallel | 10 | relaxed | migrate |
| subcloud-6 | parallel | parallel | 2 | relaxed | stop-start |
+--------------------+--------------------+--------------------+-----------------------+------------------------+------------------+
- To show the configuration settings applicable to all subclouds by default,
use the :command:`strategy-config show` command.
.. code-block:: none
~(keystone_admin)]$ dcmanager strategy-config show
+-------------------------+--------------------+
| Field | Value |
+-------------------------+--------------------+
| cloud | all clouds default |
| storage apply type | parallel |
| worker apply type | parallel |
| max parallel workers | 10 |
| alarm restriction type | relaxed |
| default instance action | migrate |
| created_at | None |
| updated_at | None |
+-------------------------+--------------------+
- To update the settings, or to create a custom configuration for a subcloud,
use the :command:`strategy-config update` command.
.. code-block:: none
~(keystone_admin)]$ dcmanager strategy-config update \
--storage-apply-type <type> \
--worker-apply-type <type> \
--max-parallel-workers <i> \
--alarm-restriction-type <level> \
--default-instance-action <action> \
[<subcloud_name>]
where
**storage apply type**
parallel or serial — determines whether storage nodes are upgraded in
parallel or serially.
**worker apply type**
parallel or serial — determines whether worker nodes are upgraded in
parallel or serially.
**max parallel workers**
Set the maximum number of worker nodes that can be upgraded in
parallel.
**alarm restriction type**
relaxed or strict — determines whether the orchestration is aborted for
alarms that are not management-affecting. For more information, refer
to the
.. xbooklink :ref:`|updates-doc| <software-updates-and-upgrades-software-updates>` guide.
**default instance action**
.. note::
This parameter is only applicable to hosted application |VMs| with
the stx-openstack application.
migrate or stop-start — determines whether hosted application |VMs| are
migrated or stopped and restarted when a worker host is upgraded.
**subcloud\_name**
The name of the subcloud to use the custom strategy. If this is omitted,
the default upgrade strategy is updated.
.. note::
You must specify all of the settings.
- To show the configuration settings for a subcloud, use the
:command:`strategy-config show` <subcloud> command.
.. code-block:: none
~(keystone_admin)]$ dcmanager strategy-config show [<name>]
For example:
.. code-block:: none
~(keystone_admin)]$ dcmanager strategy-config show subcloud-6
+-------------------------+----------------------------+
| Field | Value |
+-------------------------+----------------------------+
| cloud | subcloud-6 |
| storage apply type | parallel |
| worker apply type | parallel |
| max parallel workers | 2 |
| alarm restriction type | relaxed |
| default instance action | stop-start |
| created_at | 2020-03-12 20:08:48.917866 |
| updated_at | None |
+-------------------------+----------------------------+
If custom configuration settings have not been created for the subcloud,
the following message is displayed:
.. code-block:: none
ERROR (app) No options found for Subcloud with id 1, defaults will be
used.
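Because :command:`dcmanager strategy-config update` expects all of the settings to be specified, it can help to validate them before running the command. The following is a minimal shell sketch, not part of dcmanager: the function name ``build_strategy_config_update`` is hypothetical, and it only prints a command line built from the options documented above rather than running it.

```shell
# Hypothetical helper (not part of dcmanager): validate the five settings and
# print the resulting "dcmanager strategy-config update" command line.
# Usage: build_strategy_config_update STORAGE WORKER MAX_WORKERS ALARM ACTION [SUBCLOUD]
build_strategy_config_update() {
    if [ "$#" -lt 5 ] || [ "$#" -gt 6 ]; then
        echo "error: all five settings are required" >&2
        return 1
    fi
    storage=$1; worker=$2; max_workers=$3; alarm=$4; action=$5; subcloud=${6:-}

    case "$storage" in parallel|serial) ;; *)
        echo "error: storage apply type must be parallel or serial" >&2; return 1 ;;
    esac
    case "$worker" in parallel|serial) ;; *)
        echo "error: worker apply type must be parallel or serial" >&2; return 1 ;;
    esac
    case "$max_workers" in ''|*[!0-9]*)
        echo "error: max parallel workers must be a positive integer" >&2; return 1 ;;
    esac
    if [ "$max_workers" -le 0 ]; then
        echo "error: max parallel workers must be a positive integer" >&2; return 1
    fi
    case "$alarm" in relaxed|strict) ;; *)
        echo "error: alarm restriction type must be relaxed or strict" >&2; return 1 ;;
    esac
    case "$action" in migrate|stop-start) ;; *)
        echo "error: default instance action must be migrate or stop-start" >&2; return 1 ;;
    esac

    cmd="dcmanager strategy-config update"
    cmd="$cmd --storage-apply-type $storage --worker-apply-type $worker"
    cmd="$cmd --max-parallel-workers $max_workers"
    cmd="$cmd --alarm-restriction-type $alarm --default-instance-action $action"
    # The subcloud name is optional; without it the default strategy is updated.
    if [ -n "$subcloud" ]; then cmd="$cmd $subcloud"; fi
    printf '%s\n' "$cmd"
}
```

For example, ``build_strategy_config_update parallel parallel 2 relaxed stop-start subcloud-6`` prints a command matching the subcloud-6 settings shown in the example output above. Review the printed command before running it.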


@ -104,8 +104,8 @@ Deletes subcloud group details from the database.
+--+------+----+----+-------+-------+------+-----------+-----------+-------------+-----------+------------+------------+------+----------+----------+
|id|name |desc|loc.|sof.ver|mgmnt |avail |deploy_stat|mgmt_subnet|mgmt_start_ip|mgmt_end_ip|mgmt_gtwy_ip|sysctrl_gtwy|grp_id|created_at|updated_at|
+--+------+----+----+-------+-------+------+-----------+-----------+-------------+-----------+------------+------------+------+----------+----------+
-|3 |subcl1|None|None|20.06 |managed|online|complete |fd01:12::0.|fd01:12::2 |fd01:12::11|fd01:12::1 |fd01:11::1 | 2 |2021-01-09|2021-01-12|
-|4 |subcl2|None|None|20.06 |managed|online|complete |fd01:13::0.|fd01:13::2 |fd01:13::11|fd01:13::1 |fd01:11::1 | 2 |2021-01-09|2021-01-12|
+|3 |subcl1|None|None|nn.nn |managed|online|complete |fd01:12::0.|fd01:12::2 |fd01:12::11|fd01:12::1 |fd01:11::1 | 2 |2021-01-09|2021-01-12|
+|4 |subcl2|None|None|nn.nn |managed|online|complete |fd01:13::0.|fd01:13::2 |fd01:13::11|fd01:13::1 |fd01:11::1 | 2 |2021-01-09|2021-01-12|
+--+------+----+----+-------+-------+------+-----------+-----------+-------------+-----------+------------+------------+------+----------+----------+
- To show the details of a subcloud group, use the following command:


@ -0,0 +1,335 @@
.. pek1594745988225
.. _distributed-upgrade-orchestration-process-using-the-cli:
=======================================================
Distributed Upgrade Orchestration Process Using the CLI
=======================================================
Distributed upgrade orchestration can be initiated after the SystemController
cloud has been upgraded and is stable. Upgrade orchestration automatically
iterates through each of the subclouds, installing the new software load on
each one.
.. rubric:: |context|
The user first creates a distributed upgrade orchestration strategy, or plan,
for the automated upgrade procedure. This customizes the upgrade orchestration,
using parameters to specify:
.. _distributed-upgrade-orchestration-process-using-the-cli-ul-eyw-fyr-31b:
- whether to stop on failure of a subcloud upgrade or continue with the next
subcloud
- whether to upgrade hosts serially or in parallel
Based on these parameters, and the state of the subclouds, distributed upgrade
orchestration creates a number of stages for the overall upgrade strategy. All
the subclouds that are included in the same stage will be upgraded in parallel.
.. rubric:: |prereq|
Distributed upgrade orchestration can only be done on a system that meets the
following conditions:
.. _distributed-upgrade-orchestration-process-using-the-cli-ul-blp-gcx-ry:
- An |AIO-SX| subcloud must use the Redfish platform management service.
- Duplex \(|AIO-DX|/Standard\) upgrades are supported, and they do not
require remote install using Redfish.
- Redfish |BMC| is required for orchestrated subcloud upgrades. The install
values, and :command:`bmc\_password` for each |AIO-SX| subcloud controller
must be provided using the following |CLI| command on the SystemController:
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud update subcloud1 --install-values \
install-values.yaml --bmc-password <password>
For more information on the :command:`install-values.yaml` file, see
:ref:`Installing a Subcloud Using Redfish Platform Management Service
<installing-a-subcloud-using-redfish-platform-management-service>`.
- All subclouds are clear of alarms \(with the exception of the
upgrade-in-progress alarm\).
- All hosts of all subclouds must be unlocked, enabled, and available.
- No distributed update orchestration strategy exists. To verify, use the
:command:`dcmanager upgrade-strategy show` command. An upgrade cannot be
orchestrated while update orchestration is in progress.
- Verify the size and format of the platform-backup filesystem on each
subcloud. From the shell on each subcloud, use the following command to view
the details of the file system:
:command:`df -Th /opt/platform-backup`
The type must be ext4 and the size must be 9.5GB. For example, on
controller-0, run the following command:
.. code-block:: none
~(keystone_admin)]$ df -Th /opt/platform-backup/
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda2 ext4 9.5G 51M 9.0G 1% /opt/platform-backup
- **If a previous upgrade has been done on the subcloud**, from the shell on
each subcloud, use the following command to remove the previous upgrade
data:
:command:`sudo rm /opt/platform-backup/upgrade\_data\*`
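The platform-backup filesystem check above can be scripted. The following is a minimal sketch, assuming that :command:`df -Th` prints one header line followed by one data line, and that the size column reads exactly ``9.5G`` as in the example; the function name ``check_platform_backup_fs`` is hypothetical.

```shell
# Hypothetical helper: read "df -Th /opt/platform-backup" output on stdin and
# succeed only if the filesystem type is ext4 and the reported size is 9.5G.
# ASSUMPTION: the second line of input is the single data row.
check_platform_backup_fs() {
    awk '
        NR == 2 { type = $2; size = $3 }
        END {
            if (type == "ext4" && size == "9.5G") { print "ok"; exit 0 }
            printf "unexpected: type=%s size=%s\n", type, size
            exit 1
        }
    '
}
```

On a subcloud, this could be used as ``df -Th /opt/platform-backup | check_platform_backup_fs``. If the device name is long enough for :command:`df` to wrap the row onto two lines, the sketch reports a mismatch, so inspect the output manually in that case.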
.. rubric:: |proc|
.. _distributed-upgrade-orchestration-process-using-the-cli-steps-vcm-pq4-3mb:
#. Review the upgrade status for the subclouds.
After the SystemController upgrade is completed, wait for 10 minutes for
the **load\_sync\_status** of all subclouds to be updated.
To identify which subclouds are upgrade-current \(in-sync\), use the
:command:`dcmanager subcloud list` command. For example:
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+--------------+--------------------+-------------+
| id | name | management | availability | sync |
+----+-----------+--------------+--------------------+-------------+
| 1 | subcloud1 | managed | online | out-of-sync |
| 2 | subcloud2 | managed | online | out-of-sync |
| 3 | subcloud3 | managed | online | out-of-sync |
| 4 | subcloud4 | managed | online | out-of-sync |
+----+-----------+--------------+--------------------+-------------+
.. note::
The sync status is the rolled up sync status of platform, patching,
identity, etc.
To see synchronization details for a subcloud, use the following command:
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud show subcloud1
+-----------------------------+----------------------------+
| Field | Value |
+-----------------------------+----------------------------+
| id | 1 |
| name | subcloud1 |
| description | None |
| location | None |
| software_version | nn.nn |
| management | managed |
| availability | online |
| deploy_status | complete |
| management_subnet | fd01:82::0/64 |
| management_start_ip | fd01:82::2 |
| management_end_ip | fd01:82::11 |
| management_gateway_ip | fd01:82::1 |
| systemcontroller_gateway_ip | fd01:81::1 |
| group_id | 1 |
| created_at | 2020-07-15 19:23:50.966984 |
| updated_at | 2020-07-17 12:36:28.815655 |
| dc-cert_sync_status | in-sync |
| identity_sync_status | in-sync |
| load_sync_status | in-sync |
| patching_sync_status | in-sync |
| platform_sync_status | in-sync |
+-----------------------------+----------------------------+
#. To create an upgrade strategy, use the :command:`dcmanager upgrade-strategy create`
command.
The upgrade strategy for a |prod-dc| system controls how upgrades are
applied to subclouds.
.. code-block:: none
~(keystone_admin)]$ dcmanager upgrade-strategy create \
[--subcloud-apply-type <type>] \
[--max-parallel-subclouds <i>] \
[--stop-on-failure <level>] \
[--group <group>] \
[<subcloud>]
where:
**subcloud-apply-type**
**parallel** or **serial**— determines whether the subclouds are
upgraded in parallel, or serially.
If this is not specified using the CLI, the values for
:command:`subcloud\_update\_type` defined for each subcloud group will
be used by default.
**max-parallel-subclouds**
Sets the maximum number of subclouds that can be upgraded in parallel
\(default 20\).
If this is not specified using the CLI, the values for
:command:`max\_parallel\_subclouds` defined for each subcloud group
will be used by default.
**stop-on-failure**
**true** \(default\) or **false**: determines whether an upgrade
orchestration failure for a subcloud prevents the strategy from being
applied to subsequent subclouds.
**group**
Optionally pass the name or ID of a subcloud group to the
:command:`dcmanager upgrade-strategy create` command. This results in a
strategy that is applied only to the subclouds in the specified group.
The subcloud group values are used for subcloud apply type and max
parallel subclouds parameters.
For example:
.. code-block:: none
~(keystone_admin)]$ dcmanager upgrade-strategy create
+------------------------+----------------------------+
| Field | Value |
+------------------------+----------------------------+
| strategy type | upgrade |
| subcloud apply type | parallel |
| max parallel subclouds | 10 |
| stop on failure | False |
| state | initial |
| created_at | 2020-06-10T17:16:51.857207 |
| updated_at | None |
+------------------------+----------------------------+
#. To show the settings for the upgrade strategy, use the
:command:`dcmanager upgrade-strategy show` command.
For example:
.. code-block:: none
~(keystone_admin)]$ dcmanager upgrade-strategy show
+------------------------+----------------------------+
| Field | Value |
+------------------------+----------------------------+
| subcloud apply type | parallel |
| max parallel subclouds | 20 |
| stop on failure | False |
| state | initial |
| created_at | 2020-02-02T14:42:13.822499 |
| updated_at | None |
+------------------------+----------------------------+
.. note::
A value of **None** for :command:`subcloud apply type`, and
:command:`max parallel subclouds` indicates that subcloud group values
are being used.
#. Review the upgrade strategy for the subclouds.
To show the subclouds that will be upgraded when the upgrade strategy is
applied, use the :command:`dcmanager strategy-step list` command. For
example:
.. code-block:: none
~(keystone_admin)]$ dcmanager strategy-step list
+------------------+-------+---------+---------+------------+-------------+
| cloud | stage | state | details | started_at | finished_at |
+------------------+-------+---------+---------+------------+-------------+
| subcloud-1 | 1 | initial | | None | None |
| subcloud-4 | 1 | initial | | None | None |
| subcloud-5 | 2 | initial | | None | None |
| subcloud-6 | 2 | initial | | None | None |
+------------------+-------+---------+---------+------------+-------------+
.. note::
All the subclouds that are included in the same stage will be upgraded
in parallel.
#. To apply the upgrade strategy, use the :command:`dcmanager upgrade-strategy apply`
command.
.. code-block:: none
~(keystone_admin)]$ dcmanager upgrade-strategy apply
+------------------------+----------------------------+
| Field | Value |
+------------------------+----------------------------+
| subcloud apply type | parallel |
| max parallel subclouds | 20 |
| stop on failure | False |
| state | applying |
| created_at | 2020-02-02T14:42:13.822499 |
| updated_at | 2020-02-02T14:42:19.376688 |
+------------------------+----------------------------+
#. To show the step currently being performed on each of the subclouds, use
the :command:`dcmanager strategy-step list` command.
For example:
.. code-block:: none
~(keystone_admin)]$ dcmanager strategy-step list
+------------------+-------+-------------+-----------------------------+----------------------------+----------------------------+
| cloud | stage | state | details | started_at | finished_at |
+------------------+-------+-------------+-----------------------------+----------------------------+----------------------------+
| subcloud-1 | 2 | applying... | apply phase is 66% complete | 2020-03-13 14:12:12.262001 | 2020-03-13 14:15:52.450908 |
| subcloud-4 | 2 | applying... | apply phase is 83% complete | 2020-03-13 14:16:02.457588 | None |
| subcloud-5 | 2 | finishing | | 2020-03-13 14:16:02.463213 | None |
| subcloud-6 | 2 | applying... | apply phase is 66% complete | 2020-03-13 14:16:02.473669 | None |
+------------------+-------+-------------+-----------------------------+----------------------------+----------------------------+
#. To show the step currently being performed on a subcloud, use the
:command:`dcmanager strategy-step show` <subcloud> command.
.. code-block:: none
~(keystone_admin)]$ dcmanager strategy-step show <subcloud>
#. When the distributed upgrade orchestration is complete, delete the upgrade
strategy using the :command:`dcmanager upgrade-strategy delete` command.
.. code-block:: none
~(keystone_admin)]$ dcmanager upgrade-strategy delete
+------------------------+----------------------------+
| Field | Value |
+------------------------+----------------------------+
| subcloud apply type | parallel |
| max parallel subclouds | 20 |
| stop on failure | False |
| state | deleting |
| created_at | 2020-03-23T20:04:50.992444 |
| updated_at | 2020-03-23T20:05:14.157352 |
+------------------------+----------------------------+
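The review in the first step of the procedure above can be reduced to a list of subcloud names. The following is a minimal sketch, assuming the five-column table layout shown in the :command:`dcmanager subcloud list` example; the function name ``out_of_sync_subclouds`` is hypothetical.

```shell
# Hypothetical helper: read "dcmanager subcloud list" output on stdin and print
# the name of every subcloud whose rolled-up sync column is not "in-sync".
# ASSUMPTION: columns are id, name, management, availability, sync.
out_of_sync_subclouds() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF < 6 { next }               # border lines and the prompt have no "|" fields
        trim($2) == "id" { next }     # header row
        trim($6) != "in-sync" { print trim($3) }
    '
}
```

For example, ``dcmanager subcloud list | out_of_sync_subclouds`` would print one subcloud name per line for the subclouds that still need to be upgraded.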
.. rubric:: |postreq|
.. _distributed-upgrade-orchestration-process-using-the-cli-ul-lx1-zcv-3mb:
- Check and update docker registry credentials for **ALL** subclouds. For
each subcloud:
.. code-block:: none
REGISTRY="docker-registry"
SECRET_UUID=$(system service-parameter-list | fgrep $REGISTRY | fgrep auth-secret | awk '{print $10}')
SECRET_REF=$(openstack secret list | fgrep ${SECRET_UUID} | awk '{print $2}')
openstack secret get ${SECRET_REF} --payload -f value
The secret payload should be "username: sysinv password:<password>". If
the secret payload is "username: admin password:<password>", see
:ref:`Updating Docker Registry Credentials on a Subcloud
<updating-docker-registry-credentials-on-a-subcloud>` for more information.
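To compare the payload against the expected user without reading it by eye, the username can be extracted from the payload string. This is a minimal sketch, assuming the payload has the ``username: <user> password:<password>`` shape quoted above; the function name ``registry_secret_username`` is hypothetical.

```shell
# Hypothetical helper: print the username found in a registry secret payload.
# ASSUMPTION: the payload contains "username:" followed by optional whitespace
# and the user name, as in the payload format quoted above.
registry_secret_username() {
    printf '%s\n' "$1" |
        sed -n 's/^.*username:[[:space:]]*\([^[:space:]]*\).*$/\1/p'
}
```

A result of ``sysinv`` matches the expected payload. A result of ``admin`` indicates that the credentials need to be updated as described in the referenced topic.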
.. only:: partner
.. include:: ../_includes/distributed-upgrade-orchestration-process-using-the-cli.rest


@ -0,0 +1,140 @@
.. oeo1597292999568
.. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud:
===========================================================================
Failure During the Installation or Data Migration of N+1 Load on a Subcloud
===========================================================================
You may encounter some errors during installation or data migration of the
**N+1** load on a subcloud. This section explains the errors and the steps
required to fix these errors.
.. contents:: |minitoc|
:local:
:depth: 1
Errors can occur due to one of the following:
.. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud-ul-j5r-czs-qmb:
- One or more invalid install values
- A network error that results in the subcloud being temporarily unreachable
- An invalid docker registry certificate
**Failure Caused by Install Values**
If the subcloud install values contain an incorrect value, use the following
command to fix it.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud update <<subcloud-name>> --install-values <<subcloud-install-values-yaml>>
This type of failure is recoverable and you can rerun the upgrade strategy for
the failed subcloud\(s\) using the following procedure:
.. rubric:: |proc|
.. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud-ol-lc1-cyr-qmb:
#. Delete the failed upgrade strategy.
.. code-block:: none
~(keystone_admin)]$ dcmanager upgrade-strategy delete
#. Create a new upgrade strategy for the failed subcloud.
.. code-block:: none
~(keystone_admin)]$ dcmanager upgrade-strategy create <<subcloud-name>> --force <<additional options>>
.. note::
If the upgrade failed during the |AIO-SX| upgrade or data migration, the
subcloud availability status is displayed as 'offline'. Use the
:command:`--force` option when creating the new strategy.
#. Apply the new upgrade strategy.
.. code-block:: none
~(keystone_admin)]$ dcmanager upgrade-strategy apply
#. Verify the upgrade strategy status.
.. code-block:: none
~(keystone_admin)]$ dcmanager strategy-step list
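The recovery procedure above always runs the same four commands in the same order. The following is a minimal dry-run sketch that prints them for one subcloud so they can be reviewed before being run; the function name ``print_upgrade_retry_commands`` and its ``force`` argument are hypothetical.

```shell
# Hypothetical helper: print, without running them, the recovery commands from
# the procedure above for one failed subcloud. Pass "force" as the second
# argument to add --force, which the note above calls for when the subcloud
# availability status is displayed as offline.
print_upgrade_retry_commands() {
    subcloud=${1:-}
    force=""
    if [ -z "$subcloud" ]; then
        echo "usage: print_upgrade_retry_commands <subcloud-name> [force]" >&2
        return 1
    fi
    if [ "${2:-}" = "force" ]; then force=" --force"; fi
    printf '%s\n' \
        "dcmanager upgrade-strategy delete" \
        "dcmanager upgrade-strategy create ${subcloud}${force}" \
        "dcmanager upgrade-strategy apply" \
        "dcmanager strategy-step list"
}
```

For example, ``print_upgrade_retry_commands subcloud1 force`` prints the sequence with :command:`--force` added to the create step.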
.. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud-section-f5f-j1y-qmb:
-----------------------------------------------------
Failure Caused by Invalid Docker Registry Certificate
-----------------------------------------------------
If the docker registry certificate on the subcloud is invalid/expired prior to
an upgrade, the upgrade will fail during data migration.
.. warning::
This type of failure cannot be recovered. You will need to re-deploy the
subcloud, redo all configuration changes, and regenerate the data.
.. note::
Ensure that the docker registry certificate on all subclouds is upgraded
prior to performing an orchestrated upgrade.
To re-deploy the subcloud, use the following procedure:
.. rubric:: |proc|
.. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud-ol-dpp-bzr-qmb:
#. Unmanage the failed subcloud.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud unmanage <<subcloud-name>>
#. Delete the subcloud.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud delete <<subcloud-name>>
#. Re-deploy the failed subcloud.
.. code-block:: none
~(keystone_admin)]$ dcmanager subcloud add <<parameters>>
.. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud-section-lj4-1rr-qmb:
-----------------------------------------
Failure Post Data Migration on a Subcloud
-----------------------------------------
Once the data migration on the subcloud is completed, the upgrade is activated
and finalized. If a failure occurs:
.. rubric:: |proc|
.. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud-ul-ogc-cp5-qmb:
- Check specified log files
- Follow the recovery procedure. See :ref:`Failure Prior to the Installation
of N+1 Load on a Subcloud <failure-prior-to-the-installation-of-n+1-load-on-a-subcloud>`
.. only:: partner
.. include:: ../_includes/distributed-upgrade-orchestration-process-using-the-cli.rest


@ -0,0 +1,61 @@
.. uvp1597292940831
.. _failure-prior-to-the-installation-of-n+1-load-on-a-subcloud:
===========================================================
Failure Prior to the Installation of N+1 Load on a Subcloud
===========================================================
You may encounter some errors prior to installation of the **N+1** load on a
subcloud. This section explains the errors and the steps required to fix these
errors.
Errors can occur due to any one of the following:
.. _failure-prior-to-the-installation-of-n+1-load-on-a-subcloud-ul-onf-2vs-qmb:
- Insufficient disk space on scratch filesystems
- Missing subcloud install values
- Invalid license
- Invalid/corrupted load file
- The /home/sysadmin directory on the subcloud is too large
If you encounter any of the above errors, use the following procedure to fix
them:
.. rubric:: |proc|
#. Delete the failed upgrade strategy.
.. code-block:: none
~(keystone_admin)]$ dcmanager upgrade-strategy delete
#. Create a new upgrade strategy.
.. code-block:: none
~(keystone_admin)]$ dcmanager upgrade-strategy create <<additional options>>
.. note::
If only one subcloud fails the upgrade, specify the name of the
subcloud in the command.
#. Apply the new upgrade strategy.
.. code-block:: none
~(keystone_admin)]$ dcmanager upgrade-strategy apply
#. Verify the upgrade strategy status.
.. code-block:: none
~(keystone_admin)]$ dcmanager strategy-step list


@ -60,6 +60,30 @@ Kubernetes Version Upgrade Distributed Cloud Orchestration
the-kubernetes-distributed-cloud-update-orchestration-process
configuring-kubernetes-update-orchestration-on-distributed-cloud
------------------
Upgrade management
------------------
.. toctree::
:maxdepth: 1
upgrade-management-overview
upgrading-the-systemcontroller-using-the-cli
*******************************************************************
Upgrade Orchestration for Distributed Cloud SubClouds using the CLI
*******************************************************************
.. toctree::
:maxdepth: 1
distributed-upgrade-orchestration-process-using-the-cli
aborting-the-distributed-upgrade-orchestration
configuration-for-specific-subclouds
robust-error-handling-during-an-orchestrated-upgrade
failure-prior-to-the-installation-of-n+1-load-on-a-subcloud
failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud
--------
Appendix
--------


@ -109,7 +109,7 @@ subcloud, the subcloud installation has these phases:
.. code-block:: none
-# Specify the WRCP software version, for example '20.06' for the WRCP 20.06 release of software.
+# Specify the |pp| software version, for example 'nn.nn' for the |pp| nn.nn release of software.
software_version: <software_version>
bootstrap_interface: <bootstrap_interface_name> # e.g. eno1
bootstrap_address: <bootstrap_interface_ip_address> # e.g.128.224.151.183


@ -13,7 +13,7 @@ system.
The Central Cloud supports either
-- an |AIO|-Duplex deployment configuration
+- an |AIO-DX| deployment configuration
- a Standard with Dedicated Storage Nodes deployment Standard with Controller
Storage and one or more workers deployment configuration, or


@ -0,0 +1,41 @@
.. ziu1597089603252
.. _robust-error-handling-during-an-orchestrated-upgrade:
====================================================
Robust Error Handling During An Orchestrated Upgrade
====================================================
This section describes the errors you may encounter during an orchestrated
upgrade and the steps you can use to troubleshoot the errors.
.. rubric:: |prereq|
For a successful orchestrated upgrade, ensure the upgrade prerequisites,
procedure, and postrequisites are met.
If a failure occurs, use the following general steps:
.. _robust-error-handling-during-an-orchestrated-upgrade-ol-l5y-mby-qmb:
#. Allow the failed strategy to complete on its own.
#. Check the output using the :command:`dcmanager strategy-step list` command
for failures, if any.
#. Address the cause of the failure. For more information, see :ref:`Failure
During the Installation or Data Migration of N+1 Load on a Subcloud
<failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud>`.
#. Rerun the orchestrated upgrade. For more information, see :ref:`Distributed
Upgrade Orchestration Process Using the CLI
<distributed-upgrade-orchestration-process-using-the-cli>`.
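When checking the :command:`dcmanager strategy-step list` output for failures, it can help to filter the table by state. The following is a minimal sketch, assuming the six-column layout shown in the orchestration procedure (cloud, stage, state, details, started_at, finished_at). The function name ``steps_in_state`` is hypothetical, and the exact state string reported for a failed step is not given in this guide, so it is passed in as an argument.

```shell
# Hypothetical helper: read "dcmanager strategy-step list" output on stdin and
# print the cloud name of every step whose state column equals "$1".
# ASSUMPTION: columns are cloud, stage, state, details, started_at, finished_at.
steps_in_state() {
    awk -F'|' -v want="$1" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF < 7 { next }                  # border lines and the prompt
        trim($2) == "cloud" { next }     # header row
        trim($4) == want { print trim($2) }
    '
}
```

For example, ``dcmanager strategy-step list | steps_in_state failed`` would list the affected subclouds, assuming that ``failed`` is the state string your release reports.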
.. seealso::
:ref:`Failure Prior to the Installation of N+1 Load on a Subcloud
<failure-prior-to-the-installation-of-n+1-load-on-a-subcloud>`
:ref:`Failure During the Installation or Data Migration of N+1 Load on a
Subcloud <failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud>`


@ -0,0 +1,120 @@
.. gjf1592841770001
.. _upgrade-management-overview:
===========================
Upgrade Management Overview
===========================
You can upgrade the |prod-dc| SystemController and subclouds with a new
release of |prod| software.
.. rubric:: |context|
.. note::
Back up all YAML files that are updated using the Redfish Platform
Management service. For more information, see :ref:`Installing a Subcloud
Using Redfish Platform Management Service
<installing-a-subcloud-using-redfish-platform-management-service>`.
You can use the |CLI| to manage upgrades. The workflow for upgrades is as
follows:
.. _upgrade-management-overview-ol-uqv-p24-3mb:
#. To upgrade the |prod-dc| system, you must first upgrade the
SystemController. See :ref:`Upgrading the SystemController Using the CLI
<upgrading-the-systemcontroller-using-the-cli>`.
#. Use |prod-dc| Upgrade Orchestration to upgrade the subclouds. See
:ref:`Distributed Upgrade Orchestration Process Using the CLI <distributed-upgrade-orchestration-process-using-the-cli>`.
#. To handle errors during an orchestrated upgrade, see :ref:`Robust Error
Handling During An Orchestrated Upgrade
<robust-error-handling-during-an-orchestrated-upgrade>`.
.. rubric:: |prereq|
The following prerequisites apply to a |prod-dc| upgrade management service.
.. _upgrade-management-overview-ul-smx-y2m-cmb:
- **Configuration Verification**: Ensure that the following configurations
are verified before you proceed with the upgrade on the |prod-dc|
and subclouds:
- Run the :command:`system application-list` command to ensure that all
applications are running
- Run the :command:`system host-list` command to list the configured
hosts
- Run the :command:`dcmanager subcloud list` command to list the
subclouds
- Run the :command:`kubectl get pods --all-namespaces` command to test
that the authentication token validates correctly
- Run the :command:`fm alarm-list` command to check the system health to
ensure that there are no unexpected alarms
- Run the :command:`kubectl get host -n deployment` command to ensure that
all nodes in the cluster have reconciled and are set to 'true'
- Ensure **controller-0** is the active controller
- The subclouds must all be |AIO-DX|, and using the Redfish
platform management service.
- **Remove Non GA Applications**:
- Use the following commands to remove the analytics application on the
subclouds:
- :command:`system application-remove wra-analytics`
- :command:`system application-delete wra-analytics`
- Remove any non-GA applications such as Wind River Analytics, and
|prefix|-openstack, from the |prod-dc| system, if they exist.
- **Increase Scratch File System Size**:
- Check the size of the scratch partition on both the system controller and
subclouds using the :command:`system host-fs-list` command.
.. note::
The scratch file system size must also be increased on each
subcloud.
- All controller nodes and subclouds should have a scratch file system of at
least 16G. The process of importing a new load for upgrade temporarily uses
up to 11G of scratch disk space. Use the :command:`system
host-fs-modify` command to increase the scratch size on **each controller
node** and on the subcloud controllers as needed in preparation for the
software upgrade. For example, run the following command:
.. code-block:: none
~(keystone_admin)]$ system host-fs-modify controller-0 scratch=16
Run the :command:`fm alarm-list` command to check the system health to
ensure that there are no unexpected alarms
- For orchestrated subcloud upgrades, the install-values that were used to
deploy each subcloud must be saved and restored to the SystemController
after the SystemController upgrade.
- Run the :command:`kubectl -n kube-system get secret` command on the
SystemController before upgrading subclouds, because during an orchestrated
subcloud upgrade the docker **rvmc** image tries to copy the
:command:`kube-system default-registry-key`.
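The scratch size prerequisite above can be turned into a simple check. The following is a minimal sketch, not a supported tool: the helper names ``scratch_needs_resize`` and ``resize_command`` are hypothetical, the current size is assumed to be read by hand from the :command:`system host-fs-list` output, and the 8 GiB value is an invented example. The sketch only prints the :command:`system host-fs-modify` command; it does not run it.

```shell
# Minimum scratch size in GiB, taken from the prerequisite above.
MIN_SCRATCH_GIB=16

# scratch_needs_resize SIZE_GIB (hypothetical helper)
# Succeeds when the given scratch size is below the 16G minimum.
scratch_needs_resize() {
    [ "$1" -lt "$MIN_SCRATCH_GIB" ]
}

# resize_command HOST (hypothetical helper)
# Prints, but does not run, the host-fs-modify command for HOST.
resize_command() {
    echo "system host-fs-modify $1 scratch=$MIN_SCRATCH_GIB"
}

# Example: controller-0 reports an 8 GiB scratch file system (invented value).
if scratch_needs_resize 8; then
    resize_command controller-0
fi
```

Run the check for each controller node on the system controller and on every subcloud.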
.. only:: partner
.. include:: ../_includes/upgrade-management-overview.rest

View File

@ -0,0 +1,485 @@
.. vco1593176327490
.. _upgrading-the-systemcontroller-using-the-cli:
==========================================
Upgrade the SystemController Using the CLI
==========================================
From the CLI, you can upload and apply upgrades to the SystemController in
order to upgrade the central repository. The SystemController can be upgraded
using either a manual software upgrade procedure or the
non-distributed systems :command:`sw-manager` orchestration procedure.
.. rubric:: |context|
Follow the steps below to manually upgrade the SystemController:
.. rubric:: |proc|
.. _upgrading-the-systemcontroller-using-the-cli-steps-oq4-dgm-cmb:
#. Source the platform environment.
.. code-block:: none
$ source /etc/platform/openrc
~(keystone_admin)]$
.. only:: partner
.. include:: ../_includes/upgrading-the-systemcontroller-using-the-cli.rest
#. Copy the ISO file to controller-0 \(the active controller\), then import the software release load.
.. code-block:: none
~(keystone_admin)]$ system --os-region-name SystemController load-import <bootimage>.iso <bootimage>.sig
#. Apply any required software updates. After the updates are installed, ensure
that controller-0 is active.
The system must be 'patch current'. All software updates related to your
current |prod| software release must be uploaded, applied, and installed.
Software updates to the new |prod| release only need to be uploaded
and applied. These software updates are installed automatically
during the software upgrade procedure as the hosts are reset to load the
new release of software.
To find and download applicable updates, visit the `Wind River Support
Network <https://docs.windriver.com>`__.
.. xbooklink For more information, see |updates-doc|: :ref:`Managing Software Updates <managing-software-updates>`.
#. Confirm that the system is healthy.
Check the current system health status, resolve any alarms and other issues
reported by the :command:`health-query-upgrade` command, then recheck the
system health status to confirm that all **System Health** fields are set
to **OK**.
.. code-block:: none
~(keystone_admin)]$ system health-query-upgrade
System Health:
All hosts are provisioned: [OK]
All hosts are unlocked/enabled: [OK]
All hosts have current configurations: [OK]
All hosts are patch current: [OK]
Ceph Storage Healthy: [OK]
No alarms: [OK]
All kubernetes nodes are ready: [OK]
All kubernetes control plane pods are ready: [OK]
Required patches are applied: [OK]
License valid for upgrade: [OK]
By default, the upgrade process does not run while active alarms are
present, and running it with active alarms is not recommended. Clear all
alarms from your system before starting an upgrade.
.. note::
Use the command :command:`system upgrade-start --force` to force the
upgrade process to start and to ignore management-affecting alarms.
This should ONLY be done if these alarms do not cause an issue for the
upgrade process.
If there are alarms present during the upgrade, subcloud load sync\_status
will display "out-of-sync".
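The recheck described in this step can be scripted by filtering the health report for anything that is not ``[OK]``. This is a minimal sketch under stated assumptions: the function name ``failed_health_checks`` is hypothetical, and the ``[Fail]`` marker in the sample text is an assumed failure label, since only ``[OK]`` appears in the output above.

```shell
# failed_health_checks (hypothetical helper)
# Reads `system health-query-upgrade` output on stdin and prints every
# check line whose bracketed status is something other than [OK].
failed_health_checks() {
    awk '/\[[A-Za-z]+\][ \t]*$/ && !/\[OK\][ \t]*$/ { print }'
}

# Sample report; the [Fail] marker is an assumption for illustration.
sample_output='System Health:
All hosts are provisioned: [OK]
No alarms: [Fail]
License valid for upgrade: [OK]'

failures=$(printf '%s\n' "$sample_output" | failed_health_checks)
if [ -n "$failures" ]; then
    echo "Resolve before upgrading:"
    echo "$failures"
fi
```

On a live system the same filter would be fed from ``system health-query-upgrade | failed_health_checks``; an empty result means every reported field is ``[OK]``.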
#. Start the upgrade from controller-0.
Make sure that controller-0 is the active controller, that you are logged
into controller-0 as **sysadmin**, and that your present working directory
is your home directory.
.. code-block:: none
~(keystone_admin)]$ system upgrade-start
+--------------+--------------------------------------+
| Property | Value |
+--------------+--------------------------------------+
| uuid | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
| state | starting |
| from_release | nn.nn |
| to_release | nn.nn |
+--------------+--------------------------------------+
This will make a copy of the system data to be used in the upgrade.
Configuration changes are not allowed after this point until the swact to
controller-1 is completed.
The following upgrade state applies once this command is executed. Run the
:command:`system upgrade-show` command to verify the status of the upgrade.
- started:
- State entered after :command:`system upgrade-start` completes.
- Release 20.04 system data \(for example, postgres databases\) has
been exported to be used in the upgrade.
- Configuration changes must not be made after this point, until the
upgrade is completed.
As part of the upgrade, the upgrade process checks the health of the system
and validates that the system is ready for an upgrade.
The upgrade process checks that no alarms are active before starting an
upgrade.
.. note::
Use the command :command:`system upgrade-start --force` to force the
upgrade process to start and to ignore management-affecting alarms.
This should ONLY be done if these alarms do not cause an issue for the
upgrade process.
If there are alarms present during the upgrade, subcloud load
sync\_status will display "out-of-sync".
On systems with Ceph storage, it also checks that the Ceph cluster is
healthy.
#. Upgrade controller-1.
#. Lock controller-1.
.. code-block:: none
~(keystone_admin)]$ system host-lock controller-1
#. Start the upgrade on controller-1.
Controller-1 installs the update and reboots, then performs data
migration.
.. code-block:: none
~(keystone_admin)]$ system host-upgrade controller-1
Wait for controller-1 to reinstall with the N+1 load and reach the
**locked-disabled-online** state.
The following data migration states apply when this command is executed.
- data-migration:
- State entered when :command:`system host-upgrade controller-1`
is executed.
- System data is being migrated from release N to release N+1.
- data-migration-complete:
- State entered when controller-1 upgrade is complete.
- System data has been successfully migrated from release nn.nn
to release nn.nn.
where *nn.nn* in the update file name is the |prod| release number.
- data-migration-failed:
- State entered if data migration on controller-1 fails.
- Upgrade must be aborted.
#. Check the upgrade state.
.. code-block:: none
~(keystone_admin)]$ system upgrade-show
+--------------+--------------------------------------+
| Property | Value |
+--------------+--------------------------------------+
| uuid | e7c8f6bc-518c-46d4-ab81-7a59f8f8e64b |
| state | data-migration-complete |
| from_release | nn.nn |
| to_release | nn.nn |
+--------------+--------------------------------------+
If the :command:`upgrade-show` status indicates
'data-migration-failed', then there is an issue with the data
migration. Check the issue before proceeding to the next step.
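The state check in this step can be automated by extracting the ``state`` row from the table. The following is a rough sketch: ``upgrade_state`` is a hypothetical helper name, and the parsing assumes the table layout shown above. The sample table is pasted inline so that the sketch runs without a live system.

```shell
# upgrade_state (hypothetical helper)
# Reads `system upgrade-show` table output on stdin and prints the value
# of the "state" row with surrounding whitespace removed.
upgrade_state() {
    awk -F'|' '$2 ~ /^[ \t]*state[ \t]*$/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $3)
        print $3
    }'
}

# Sample table copied from the output above.
state=$(upgrade_state <<'EOF'
+--------------+--------------------------------------+
| Property     | Value                                |
+--------------+--------------------------------------+
| uuid         | e7c8f6bc-518c-46d4-ab81-7a59f8f8e64b |
| state        | data-migration-complete              |
| from_release | nn.nn                                |
| to_release   | nn.nn                                |
+--------------+--------------------------------------+
EOF
)

case "$state" in
    data-migration-complete) echo "safe to unlock controller-1" ;;
    data-migration-failed)   echo "data migration failed; investigate before continuing" ;;
    *)                       echo "still in state '$state'; wait and check again" ;;
esac
```

On a live system, the pipeline would be ``system upgrade-show | upgrade_state``.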
.. note::
Do not unlock controller-1 until the :command:`system
upgrade-show` command displays the upgrade status
"data-migration-complete".
#. Unlock controller-1.
.. code-block:: none
~(keystone_admin)]$ system host-unlock controller-1
Wait for controller-1 to become **unlocked-enabled**. Wait until the DRBD
sync **400.001** Services-related alarm is raised and then cleared.
The following states apply when this command is executed.
- upgrading-controllers:
- State entered when controller-1 has been unlocked and is
running release nn.nn software.
where *nn.nn* in the update file name is the |prod| release
number.
If controller-1 transitions to **unlocked-disabled-failed**, check the
issue before proceeding to the next step. The alarms may indicate a
configuration error. Check the configuration logs on
controller-1 \(for example, error logs in
controller-1:/var/log/puppet\).
#. Run the :command:`system application-list`, and :command:`system
host-upgrade-list` commands to view the current progress.
#. Set controller-1 as the active controller. Swact to controller-1.
.. code-block:: none
~(keystone_admin)]$ system host-swact controller-0
Wait until services have gone active on the new active controller-1 before
proceeding to the next step. When all services on controller-1 are
enabled-active, the swact is complete.
.. note::
Continue the remaining steps below to manually upgrade or use upgrade
orchestration to upgrade the remaining nodes.
#. Upgrade **controller-0**.
.. xbooklink :ref:`|updates-doc| <software-updates-and-upgrades-software-updates>`.
#. Lock **controller-0**.
.. code-block:: none
~(keystone_admin)]$ system host-lock controller-0
#. Upgrade **controller-0**.
.. code-block:: none
~(keystone_admin)]$ system host-upgrade controller-0
#. Unlock **controller-0**.
.. code-block:: none
~(keystone_admin)]$ system host-unlock controller-0
Wait until the DRBD sync **400.001** Services-related alarm is raised
and then cleared before proceeding to the next step.
- upgrading-hosts:
- State entered when both controllers are running release nn.nn
software.
#. Check the system health to ensure that there are no unexpected alarms.
.. code-block:: none
~(keystone_admin)]$ fm alarm-list
Clear all alarms unrelated to the upgrade process.
#. If using Ceph storage backend, upgrade the storage nodes one at a time.
The storage node must be locked and all OSDs must be down in order to do
the upgrade.
#. Lock storage-0.
.. code-block:: none
~(keystone_admin)]$ system host-lock storage-0
#. Verify that the OSDs are down after the storage node is locked.
In the Horizon interface, navigate to **Admin** \> **Platform** \>
**Storage Overview** to view the status of the OSDs.
#. Upgrade storage-0.
.. code-block:: none
~(keystone_admin)]$ system host-upgrade storage-0
The upgrade is complete when the node comes online, and at that point,
you can safely unlock the node.
After a storage node is upgraded, but before it is unlocked, Ceph
synchronization alarms are present \(they appear to be making progress in
syncing\), along with infrastructure network interface alarms
\(the infrastructure network interface configuration is not applied to the
storage node until it is unlocked\).
Unlock the node as soon as the upgraded storage node comes online.
#. Unlock storage-0.
.. code-block:: none
~(keystone_admin)]$ system host-unlock storage-0
Wait for all alarms to clear after the unlock before proceeding to
upgrade the next storage host.
#. Repeat the above steps for each storage host.
.. note::
After upgrading the first storage node you can expect alarm
**800.003**. The alarm is cleared after all storage nodes are
upgraded.
#. If worker nodes are present, upgrade the worker hosts, either serially or
in parallel.
#. Lock worker-0.
.. code-block:: none
~(keystone_admin)]$ system host-lock worker-0
#. Upgrade worker-0.
.. code-block:: none
~(keystone_admin)]$ system host-upgrade worker-0
Wait for the host to run the installer, reboot, and go online before
unlocking it in the next step.
#. Unlock worker-0.
.. code-block:: none
~(keystone_admin)]$ system host-unlock worker-0
Wait for all alarms to clear after the unlock before proceeding to the
next worker host.
#. Repeat the above steps for each worker host.
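The lock, upgrade, and unlock sequence repeats for every worker host, so it can be generated rather than typed. This is a dry-run sketch only: ``host_upgrade_commands`` is a hypothetical helper, and ``worker-1`` is an assumed second host name. It prints the commands from this step and leaves the waits between them (host online before unlock, alarms cleared before the next host) to the operator.

```shell
# host_upgrade_commands HOST (hypothetical helper)
# Prints, in order, the three commands this procedure runs for one host.
# Nothing is executed; review the output and run each command manually,
# observing the waits described in the steps above.
host_upgrade_commands() {
    echo "system host-lock $1"
    echo "system host-upgrade $1"
    echo "system host-unlock $1"
}

# worker-1 is an assumed host name used only for illustration.
for host in worker-0 worker-1; do
    host_upgrade_commands "$host"
done
```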
#. Set controller-0 as the active controller. Swact to controller-0.
.. code-block:: none
~(keystone_admin)]$ system host-swact controller-1
Wait until services have gone active on the newly active controller-0 before
proceeding to the next step. When all services on controller-0 are
enabled-active, the swact is complete.
#. Activate the upgrade.
.. code-block:: none
~(keystone_admin)]$ system upgrade-activate
+--------------+--------------------------------------+
| Property | Value |
+--------------+--------------------------------------+
| uuid | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
| state | activating |
| from_release | nn.nn |
| to_release | nn.nn |
+--------------+--------------------------------------+
During the running of the :command:`upgrade-activate` command, new
configurations are applied to the controller. 250.001 \(**hostname
Configuration is out-of-date**\) alarms are raised and are cleared as the
configuration is applied. The upgrade state goes from **activating** to
**activation-complete** once this is done.
The following states apply when this command is executed.
- activation-requested:
- State entered when :command:`system upgrade-activate` is executed.
- activating:
- State entered when we have started activating the upgrade by
applying new configurations to the controller and compute hosts.
- activation-complete:
- State entered when new configurations have been applied to all
controller and compute hosts.
#. Check the status of the upgrade again to confirm that it has reached
**activation-complete**. For example:
.. code-block:: none
~(keystone_admin)]$ system upgrade-show
+--------------+--------------------------------------+
| Property | Value |
+--------------+--------------------------------------+
| uuid | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
| state | activation-complete |
| from_release | nn.nn |
| to_release | nn.nn |
+--------------+--------------------------------------+
.. note::
Alarms are generated because the subcloud load sync\_status is "out-of-sync".
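Rechecking the status by hand can be replaced with a bounded polling loop. The sketch below is illustrative only: ``wait_for_state`` and ``fake_upgrade_state`` are hypothetical names, and the stub stands in for whatever command prints the current upgrade state on a real system, so that the example runs anywhere.

```shell
# wait_for_state TARGET MAX_TRIES DELAY_SECONDS COMMAND... (hypothetical helper)
# Runs COMMAND repeatedly until it prints TARGET. Returns 0 on a match,
# or 1 if MAX_TRIES attempts pass without one.
wait_for_state() {
    target=$1
    tries=$2
    delay=$3
    shift 3
    attempt=0
    while [ "$attempt" -lt "$tries" ]; do
        state=$("$@")
        if [ "$state" = "$target" ]; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Stub standing in for a real status command (assumption for illustration).
fake_upgrade_state() {
    echo "activation-complete"
}

if wait_for_state activation-complete 3 0 fake_upgrade_state; then
    echo "ready for: system upgrade-complete"
else
    echo "timed out waiting for activation-complete"
fi
```

Bounding the number of attempts keeps the loop from waiting forever if activation stalls.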
#. Complete the upgrade.
.. code-block:: none
~(keystone_admin)]$ system upgrade-complete
+--------------+--------------------------------------+
| Property | Value |
+--------------+--------------------------------------+
| uuid | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
| state | completing |
| from_release | nn.nn |
| to_release | nn.nn |
+--------------+--------------------------------------+
Run the :command:`system upgrade-show` command to confirm that the status
displays "no upgrade in progress". The subclouds will be out-of-sync.
.. rubric:: |postreq|
.. warning::
Do NOT delete the N load from the SystemController once the upgrade is
complete. If the load is deleted from the SystemController, you must
manually delete the N load from each subcloud.
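The upgrade states named throughout this procedure can be summarized as a lookup from the current state to the next operator action. This is a sketch of one reading of the steps above, not an authoritative state machine; ``next_action`` is a hypothetical helper and the action strings are paraphrases.

```shell
# next_action STATE (hypothetical helper)
# Prints the next step, as described in this procedure, for a given
# upgrade state. Returns 1 for a state the procedure does not mention.
next_action() {
    case "$1" in
        started)                 echo "lock controller-1, then run: system host-upgrade controller-1" ;;
        data-migration)          echo "wait for data migration on controller-1 to finish" ;;
        data-migration-complete) echo "unlock controller-1" ;;
        data-migration-failed)   echo "abort the upgrade" ;;
        upgrading-controllers)   echo "swact to controller-1, then upgrade controller-0" ;;
        upgrading-hosts)         echo "upgrade storage and worker hosts, swact to controller-0, then run: system upgrade-activate" ;;
        activation-requested|activating)
                                 echo "wait for activation to finish" ;;
        activation-complete)     echo "run: system upgrade-complete" ;;
        completing)              echo "wait until system upgrade-show reports no upgrade in progress" ;;
        *)                       echo "unknown state: $1"; return 1 ;;
    esac
}

next_action data-migration-complete
```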