DC Orchestration for AIO-DX & Standard Subclouds

Distributed Upgrade Orchestration Process Using the CLI

Modified this topic to include 2 prerequisites.
Changing Case on file extensions.
Updated comments for Patchset 3. Fixed merge conflicts.
Updated comments for Patchset 4. Fixed merge conflicts.

Story: 2008055
Task: 42387

Signed-off-by: Juanita-Balaraj <juanita.balaraj@windriver.com>
Change-Id: Ia2c44812052c4f70f4742923fa847698cc0d6fa6
2021-05-04 18:38:24 -04:00 · 2021-05-04 18:38:24 -04:00 · 94fd67c34a
commit 94fd67c34a
parent cebb02eb21
17 changed files with 1392 additions and 4 deletions
--- a/doc/source/_includes/distributed-upgrade-orchestration-process-using-the-cli.rest
+++ b/doc/source/_includes/distributed-upgrade-orchestration-process-using-the-cli.rest
--- a/doc/source/_includes/failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud.rest
+++ b/doc/source/_includes/failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud.rest
--- a/doc/source/_includes/upgrade-management-overview.rest
+++ b/doc/source/_includes/upgrade-management-overview.rest
--- a/doc/source/_includes/upgrading-the-systemcontroller-using-the-cli.rest
+++ b/doc/source/_includes/upgrading-the-systemcontroller-using-the-cli.rest
--- a/doc/source/dist_cloud/.vscode/settings.json
+++ b/doc/source/dist_cloud/.vscode/settings.json
@ -0,0 +1,3 @@
 {
    "restructuredtext.confPath": ""
 }
--- a/doc/source/dist_cloud/aborting-the-distributed-upgrade-orchestration.rst
+++ b/doc/source/dist_cloud/aborting-the-distributed-upgrade-orchestration.rst
@ -0,0 +1,21 @@
 .. hil1593180554641
 .. _aborting-the-distributed-upgrade-orchestration:
 ==============================================
 Aborting the Distributed Upgrade Orchestration
 ==============================================
 To abort the current upgrade orchestration operation, use the
 :command:`upgrade-strategy abort` command.
 .. note::
    The :command:`dcmanager upgrade-strategy abort` command completes the
    current upgrading stage before aborting, to prevent hosts from being left
    in a locked state requiring manual intervention.
 .. code-block:: none
    ~(keystone_admin)]$ dcmanager upgrade-strategy abort
--- a/doc/source/dist_cloud/configuration-for-specific-subclouds.rst
+++ b/doc/source/dist_cloud/configuration-for-specific-subclouds.rst
@ -0,0 +1,158 @@
 .. jul1593180757282
 .. _configuration-for-specific-subclouds:
 ====================================
 Configuration for Specific Subclouds
 ====================================
 To determine how upgrades are applied to the nodes on each subcloud, the
 upgrade strategy refers to separate configuration settings.
 The following settings are applied by default:
 .. _configuration-for-specific-subclouds-ul-sgb-p34-gdb:
 -   storage apply type: parallel
 -   worker apply type: parallel
 -   max parallel workers: 10
 -   alarm restriction type: relaxed
 -   default instance action: migrate \(This parameter is only applicable to
    hosted application |VMs| with the stx-openstack application.\)
 To update the default values, use the :command:`dcmanager strategy-config
 update` command. You can also use this command to configure custom behavior for
 individual subclouds.
 -   To list the default upgrade strategy and any custom configurations for
    individual subclouds, use the :command:`strategy-config list` command.
    For example:
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager strategy-config list
        +--------------------+--------------------+--------------------+-----------------------+------------------------+------------------+
        | cloud              | storage apply type | worker apply type  | max parallel workers  | alarm restriction type | default instance |
        |                    |                    |                    |                       |                        | action           |
        +--------------------+--------------------+--------------------+-----------------------+------------------------+------------------+
        | all clouds default | parallel           | parallel           |                    10 | relaxed                | migrate          |
        | subcloud-6         | parallel           | parallel           |                     2 | relaxed                | stop-start       |
        +--------------------+--------------------+--------------------+-----------------------+------------------------+------------------+
 -   To show the configuration settings applicable to all subclouds by default,
    use the :command:`strategy-config show` command.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager strategy-config show
        +-------------------------+--------------------+
        | Field                   | Value              |
        +-------------------------+--------------------+
        | cloud                   | all clouds default |
        | storage apply type      | parallel           |
        | worker apply type       | parallel           |
        | max parallel workers    | 10                 |
        | alarm restriction type  | relaxed            |
        | default instance action | migrate            |
        | created_at              | None               |
        | updated_at              | None               |
        +-------------------------+--------------------+
 -   To update the settings, or to create a custom configuration for a subcloud,
    use the :command:`strategy-config update` command.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager strategy-config update \
        --storage-apply-type <type> \
        --worker-apply-type <type> \
        --max-parallel-workers <i> \
        --alarm-restriction-type <level> \
        --default-instance-action <action> \
        [<subcloud_name>]
    where
    **storage apply type**
        parallel or serial — determines whether storage nodes are upgraded in
        parallel or serially.
    **worker apply type**
        parallel or serial — determines whether worker nodes are upgraded in
        parallel or serially.
    **max parallel workers**
        Set the maximum number of worker nodes that can be upgraded in
        parallel.
    **alarm restriction type**
        relaxed or strict — determines whether the orchestration is aborted for
        alarms that are not management-affecting. For more information, refer
        to the
 .. xbooklink :ref:`|updates-doc| <software-updates-and-upgrades-software-updates>` guide.
    **default instance action**
        .. note::
            This parameter is only applicable to hosted application |VMs| with
            the stx-openstack application.
        migrate or stop-start — determines whether hosted application |VMs| are
        migrated or stopped and restarted when a worker host is upgraded.
    **subcloud\_name**
        The name of the subcloud to use the custom strategy. If this is
        omitted, the default upgrade strategy is updated.
    .. note::
        You must specify all of the settings.
 -   To show the configuration settings for a subcloud, use the
    :command:`strategy-config show` <subcloud> command.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager strategy-config show [<name>]
    For example:
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager strategy-config show subcloud-6
        +-------------------------+----------------------------+
        | Field                   | Value                      |
        +-------------------------+----------------------------+
        | cloud                   | subcloud-6                 |
        | storage apply type      | parallel                   |
        | worker apply type       | parallel                   |
        | max parallel workers    | 2                          |
        | alarm restriction type  | relaxed                    |
        | default instance action | stop-start                 |
        | created_at              | 2020-03-12 20:08:48.917866 |
        | updated_at              | None                       |
        +-------------------------+----------------------------+
    If custom configuration settings have not been created for the subcloud,
    the following message is displayed:
    .. code-block:: none
        ERROR (app) No options found for Subcloud with id 1, defaults will be
        used.
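The parameter rules above can be sketched as a small argument builder. This is a minimal illustration, not part of the documented tooling: the helper name `build_strategy_config_update` is hypothetical, the positive-integer check on the worker count is an assumption, and the function only assembles the command line without running `dcmanager`.

```python
import shlex

# Allowed values as listed in the parameter descriptions above.
APPLY_TYPES = {"parallel", "serial"}
ALARM_RESTRICTIONS = {"relaxed", "strict"}
INSTANCE_ACTIONS = {"migrate", "stop-start"}


def build_strategy_config_update(storage_apply_type, worker_apply_type,
                                 max_parallel_workers, alarm_restriction_type,
                                 default_instance_action, subcloud_name=None):
    """Return the argv list for `dcmanager strategy-config update`.

    Hypothetical helper: validates values and assembles arguments only.
    All settings are required, matching the note in the topic.
    """
    if storage_apply_type not in APPLY_TYPES:
        raise ValueError("storage apply type must be parallel or serial")
    if worker_apply_type not in APPLY_TYPES:
        raise ValueError("worker apply type must be parallel or serial")
    if alarm_restriction_type not in ALARM_RESTRICTIONS:
        raise ValueError("alarm restriction type must be relaxed or strict")
    if default_instance_action not in INSTANCE_ACTIONS:
        raise ValueError("default instance action must be migrate or stop-start")
    # Assumption: the worker count must be a positive integer.
    if not isinstance(max_parallel_workers, int) or max_parallel_workers < 1:
        raise ValueError("max parallel workers must be a positive integer")

    argv = [
        "dcmanager", "strategy-config", "update",
        "--storage-apply-type", storage_apply_type,
        "--worker-apply-type", worker_apply_type,
        "--max-parallel-workers", str(max_parallel_workers),
        "--alarm-restriction-type", alarm_restriction_type,
        "--default-instance-action", default_instance_action,
    ]
    # Omitting the subcloud name updates the default strategy instead.
    if subcloud_name:
        argv.append(subcloud_name)
    return argv


# Reproduce the custom settings shown for subcloud-6 in the example output.
cmd = build_strategy_config_update(
    "parallel", "parallel", 2, "relaxed", "stop-start", "subcloud-6")
print(shlex.join(cmd))
```

Validating before calling the CLI is a design choice for this sketch; it catches a misspelled value locally rather than relying on the error returned by the command.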
--- a/doc/source/dist_cloud/creating-subcloud-groups.rst
+++ b/doc/source/dist_cloud/creating-subcloud-groups.rst
@ -104,8 +104,8 @@ Deletes subcloud group details from the database.
        +--+------+----+----+-------+-------+------+-----------+-----------+-------------+-----------+------------+------------+------+----------+----------+
        |id|name  |desc|loc.|sof.ver|mgmnt  |avail |deploy_stat|mgmt_subnet|mgmt_start_ip|mgmt_end_ip|mgmt_gtwy_ip|sysctrl_gtwy|grp_id|created_at|updated_at|
        +--+------+----+----+-------+-------+------+-----------+-----------+-------------+-----------+------------+------------+------+----------+----------+
-        |3 |subcl1|None|None|20.06  |managed|online|complete   |fd01:12::0.|fd01:12::2   |fd01:12::11|fd01:12::1  |fd01:11::1  | 2    |2021-01-09|2021-01-12|
+        |3 |subcl1|None|None|nn.nn  |managed|online|complete   |fd01:12::0.|fd01:12::2   |fd01:12::11|fd01:12::1  |fd01:11::1  | 2    |2021-01-09|2021-01-12|
-        |4 |subcl2|None|None|20.06  |managed|online|complete   |fd01:13::0.|fd01:13::2   |fd01:13::11|fd01:13::1  |fd01:11::1  | 2    |2021-01-09|2021-01-12|
+        |4 |subcl2|None|None|nn.nn  |managed|online|complete   |fd01:13::0.|fd01:13::2   |fd01:13::11|fd01:13::1  |fd01:11::1  | 2    |2021-01-09|2021-01-12|
        +--+------+----+----+-------+-------+------+-----------+-----------+-------------+-----------+------------+------------+------+----------+----------+
 -   To show the details of a subcloud group, use the following command:
--- a/doc/source/dist_cloud/distributed-upgrade-orchestration-process-using-the-cli.rst
+++ b/doc/source/dist_cloud/distributed-upgrade-orchestration-process-using-the-cli.rst
@ -0,0 +1,335 @@
 .. pek1594745988225
 .. _distributed-upgrade-orchestration-process-using-the-cli:
 =======================================================
 Distributed Upgrade Orchestration Process Using the CLI
 =======================================================
 Distributed upgrade orchestration can be initiated after the SystemController
 cloud has been upgraded and is stable. Upgrade orchestration automatically
 iterates through each of the subclouds, installing the new software load on
 each one.
 .. rubric:: |context|
 The user first creates a distributed upgrade orchestration strategy, or plan,
 for the automated upgrade procedure. This customizes the upgrade orchestration,
 using parameters to specify:
 .. _distributed-upgrade-orchestration-process-using-the-cli-ul-eyw-fyr-31b:
 -   whether to stop on failure of a subcloud upgrade or continue with the next
    subcloud
 -   whether to upgrade hosts serially or in parallel
 Based on these parameters, and the state of the subclouds, distributed upgrade
 orchestration creates a number of stages for the overall upgrade strategy. All
 the subclouds that are included in the same stage will be upgraded in parallel.
 .. rubric:: |prereq|
 Distributed upgrade orchestration can only be done on a system that meets the
 following conditions:
 .. _distributed-upgrade-orchestration-process-using-the-cli-ul-blp-gcx-ry:
 -   Each |AIO-SX| subcloud must use the Redfish platform management
    service.
 -   Duplex \(|AIO-DX|/Standard\) upgrades are supported, and they do not
    require remote install using Redfish.
 -   Redfish |BMC| is required for orchestrated subcloud upgrades. The install
    values and :command:`bmc\_password` for each |AIO-SX| subcloud controller
    must be provided using the following |CLI| command on the SystemController:
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager subcloud update subcloud1 --install-values \
        install-values.yaml --bmc-password <password>
    For more information on the :command:`install-values.yaml` file, see
    :ref:`Installing a Subcloud Using Redfish Platform Management Service
    <installing-a-subcloud-using-redfish-platform-management-service>`.
 -   All subclouds are clear of alarms \(with the exception of the
    upgrade-in-progress alarm\).
 -   All hosts of all subclouds must be unlocked, enabled, and available.
 -   No distributed update orchestration strategy exists. To verify, use the
    :command:`dcmanager upgrade-strategy show` command. An upgrade cannot be
    orchestrated while update orchestration is in progress.
 -   Verify the size and format of the platform-backup filesystem on each
    subcloud. From the shell on each subcloud, use the following command to view
    the details of the file system:
    :command:`df -Th /opt/platform-backup`
    The type must be ext4 and the size must be 9.5GB. For example, on
    controller-0, run the following command:
    .. code-block:: none
        ~(keystone_admin)]$ df -Th /opt/platform-backup/
        Filesystem     Type  Size  Used Avail Use% Mounted on
        /dev/sda2      ext4  9.5G   51M  9.0G   1% /opt/platform-backup
 -   **If a previous upgrade has been done on the subcloud**, from the shell on
    each subcloud, use the following command to remove the previous upgrade
    data:
    :command:`sudo rm /opt/platform-backup/upgrade\_data\*`
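The platform-backup prerequisite in the list above can be checked mechanically. The following is a minimal sketch, assuming the `df -Th` output has already been captured as text; the function name `check_platform_backup` is hypothetical, and the expected values (`ext4`, `9.5G`) are taken from the prerequisite as stated.

```python
def check_platform_backup(df_output, expected_type="ext4", expected_size="9.5G"):
    """Check captured `df -Th /opt/platform-backup` output.

    Returns True when the filesystem type and size on the last output
    line match the expected values. Hypothetical helper for illustration.
    """
    lines = [ln for ln in df_output.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and one filesystem line")
    # Columns: Filesystem, Type, Size, Used, Avail, Use%, Mounted on
    fields = lines[-1].split()
    if len(fields) < 7:
        raise ValueError("unexpected df output format")
    fs_type, size = fields[1], fields[2]
    return fs_type == expected_type and size == expected_size


# Sample text copied from the example in the prerequisite above.
sample = """\
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sda2      ext4  9.5G   51M  9.0G   1% /opt/platform-backup
"""
print(check_platform_backup(sample))
```

The comparison is done on the human-readable size string that `df -h` prints, which keeps the sketch simple but means a differently rounded value would be reported as a mismatch.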
 .. rubric:: |proc|
 .. _distributed-upgrade-orchestration-process-using-the-cli-steps-vcm-pq4-3mb:
 #.  Review the upgrade status for the subclouds.
    After the SystemController upgrade is completed, wait for 10 minutes for
    the **load\_sync\_status** of all subclouds to be updated.
    To identify which subclouds are upgrade-current \(in-sync\), use the
    :command:`subcloud list` command. For example:
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager subcloud list
        +----+-----------+--------------+--------------------+-------------+
        | id | name      | management   | availability       | sync        |
        +----+-----------+--------------+--------------------+-------------+
        |  1 | subcloud1 | managed      | online             | out-of-sync |
        |  2 | subcloud2 | managed      | online             | out-of-sync |
        |  3 | subcloud3 | managed      | online             | out-of-sync |
        |  4 | subcloud4 | managed      | online             | out-of-sync |
        +----+-----------+--------------+--------------------+-------------+
    .. note::
        The sync status is the rolled up sync status of platform, patching,
        identity, etc.
    To see synchronization details for a subcloud, use the following command:
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager subcloud show subcloud1
        +-----------------------------+----------------------------+
        | Field                       | Value                      |
        +-----------------------------+----------------------------+
        | id                          | 1                          |
        | name                        | subcloud1                  |
        | description                 | None                       |
        | location                    | None                       |
        | software_version            | nn.nn                      |
        | management                  | managed                    |
        | availability                | online                     |
        | deploy_status               | complete                   |
        | management_subnet           | fd01:82::0/64              |
        | management_start_ip         | fd01:82::2                 |
        | management_end_ip           | fd01:82::11                |
        | management_gateway_ip       | fd01:82::1                 |
        | systemcontroller_gateway_ip | fd01:81::1                 |
        | group_id                    | 1                          |
        | created_at                  | 2020-07-15 19:23:50.966984 |
        | updated_at                  | 2020-07-17 12:36:28.815655 |
        | dc-cert_sync_status         | in-sync                    |
        | identity_sync_status        | in-sync                    |
        | load_sync_status            | in-sync                    |
        | patching_sync_status        | in-sync                    |
        | platform_sync_status        | in-sync                    |
        +-----------------------------+----------------------------+
 #.  To create an upgrade strategy, use the :command:`dcmanager upgrade-strategy create`
    command.
    The upgrade strategy for a |prod-dc| system controls how upgrades are
    applied to subclouds.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager upgrade-strategy create \
        [--subcloud-apply-type <type>] \
        [--max-parallel-subclouds <i>] \
        [--stop-on-failure <level>] \
        [--group group] \
        [<subcloud>]
    where:
    **subcloud-apply-type**
        **parallel** or **serial**— determines whether the subclouds are
        upgraded in parallel, or serially.
        If this is not specified using the CLI, the values for
        :command:`subcloud\_update\_type` defined for each subcloud group will
        be used by default.
    **max-parallel-subclouds**
        Sets the maximum number of subclouds that can be upgraded in parallel
        \(default 20\).
        If this is not specified using the CLI, the values for
        :command:`max\_parallel\_subclouds` defined for each subcloud group
        will be used by default.
    **stop-on-failure**
        **true**\(default\) or **false**— determines whether upgrade
        orchestration failure for a subcloud prevents application to subsequent
        subclouds.
    **group**
        Optionally pass the name or ID of a subcloud group to the
        :command:`dcmanager upgrade-strategy create` command. This results in a
        strategy that is only applied to all subclouds in the specified group.
        The subcloud group values are used for subcloud apply type and max
        parallel subclouds parameters.
    For example:
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager upgrade-strategy create
        +------------------------+----------------------------+
        | Field                  | Value                      |
        +------------------------+----------------------------+
        | strategy type          | upgrade                    |
        | subcloud apply type    | parallel                   |
        | max parallel subclouds | 10                         |
        | stop on failure        | False                      |
        | state                  | initial                    |
        | created_at             | 2020-06-10T17:16:51.857207 |
        | updated_at             | None                       |
        +------------------------+----------------------------+
 #.  To show the settings for the upgrade strategy, use the
    :command:`dcmanager upgrade-strategy show` command.
    For example:
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager upgrade-strategy show
        +------------------------+----------------------------+
        | Field                  | Value                      |
        +------------------------+----------------------------+
        | subcloud apply type    | parallel                   |
        | max parallel subclouds | 20                         |
        | stop on failure        | False                      |
        | state                  | initial                    |
        | created_at             | 2020-02-02T14:42:13.822499 |
        | updated_at             | None                       |
        +------------------------+----------------------------+
    .. note::
        A value of **None** for :command:`subcloud apply type`, and
        :command:`max parallel subclouds` indicates that subcloud group values
        are being used.
 #.  Review the upgrade strategy for the subclouds.
    To show the subclouds that will be upgraded when the upgrade strategy is
    applied, use the :command:`dcmanager strategy-step list` command. For
    example:
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager strategy-step list
        +------------------+-------+---------+---------+------------+-------------+
        | cloud            | stage | state   | details | started_at | finished_at |
        +------------------+-------+---------+---------+------------+-------------+
        | subcloud-1       |     1 | initial |         | None       | None        |
        | subcloud-4       |     1 | initial |         | None       | None        |
        | subcloud-5       |     2 | initial |         | None       | None        |
        | subcloud-6       |     2 | initial |         | None       | None        |
        +------------------+-------+---------+---------+------------+-------------+
    .. note::
        All the subclouds that are included in the same stage will be upgraded
        in parallel.
 #.  To apply the upgrade strategy, use the :command:`dcmanager upgrade-strategy apply`
    command.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager upgrade-strategy apply
        +------------------------+----------------------------+
        | Field                  | Value                      |
        +------------------------+----------------------------+
        | subcloud apply type    | parallel                   |
        | max parallel subclouds | 20                         |
        | stop on failure        | False                      |
        | state                  | applying                   |
        | created_at             | 2020-02-02T14:42:13.822499 |
        | updated_at             | 2020-02-02T14:42:19.376688 |
        +------------------------+----------------------------+
 #.  To show the step currently being performed on each of the subclouds, use
    the :command:`dcmanager strategy-step list` command.
    For example:
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager strategy-step list
        +------------------+-------+-------------+-----------------------------+----------------------------+----------------------------+
        | cloud            | stage | state       | details                     | started_at                 | finished_at                |
        +------------------+-------+-------------+-----------------------------+----------------------------+----------------------------+
        | subcloud-1       |     2 | applying... | apply phase is 66% complete | 2020-03-13 14:12:12.262001 | 2020-03-13 14:15:52.450908 |
        | subcloud-4       |     2 | applying... | apply phase is 83% complete | 2020-03-13 14:16:02.457588 | None                       |
        | subcloud-5       |     2 | finishing   |                             | 2020-03-13 14:16:02.463213 | None                       |
        | subcloud-6       |     2 | applying... | apply phase is 66% complete | 2020-03-13 14:16:02.473669 | None                       |
        +------------------+-------+-------------+-----------------------------+----------------------------+----------------------------+
 #.  To show the step currently being performed on a subcloud, use the
    :command:`dcmanager strategy-step show` <subcloud> command.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager strategy-step show <subcloud>
 #.  When the distributed upgrade orchestration is complete, delete the upgrade
    strategy, using the :command:`dcmanager upgrade-strategy delete` command.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager upgrade-strategy delete
        +------------------------+----------------------------+
        | Field                  | Value                      |
        +------------------------+----------------------------+
        | subcloud apply type    | parallel                   |
        | max parallel subclouds | 20                         |
        | stop on failure        | False                      |
        | state                  | deleting                   |
        | created_at             | 2020-03-23T20:04:50.992444 |
        | updated_at             | 2020-03-23T20:05:14.157352 |
        +------------------------+----------------------------+
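Because subclouds in the same stage are upgraded in parallel, it can help to group the `dcmanager strategy-step list` output by stage before applying the strategy. The following is a minimal sketch that parses the table text shown in the procedure above; the function names `parse_table` and `subclouds_by_stage` are hypothetical, and the parser assumes single-line headers and rows.

```python
from collections import defaultdict


def parse_table(text):
    """Parse an ASCII table (as printed by the CLI) into a list of dicts.

    Border lines (+---+) and prompt lines are skipped. Assumes each
    header and each row occupies exactly one line.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def subclouds_by_stage(step_list_output):
    """Group subcloud names by stage number, in ascending stage order."""
    stages = defaultdict(list)
    for row in parse_table(step_list_output):
        stages[int(row["stage"])].append(row["cloud"])
    return dict(sorted(stages.items()))


# Sample text copied from the strategy-step list example above.
sample = """\
+------------------+-------+---------+---------+------------+-------------+
| cloud            | stage | state   | details | started_at | finished_at |
+------------------+-------+---------+---------+------------+-------------+
| subcloud-1       |     1 | initial |         | None       | None        |
| subcloud-4       |     1 | initial |         | None       | None        |
| subcloud-5       |     2 | initial |         | None       | None        |
| subcloud-6       |     2 | initial |         | None       | None        |
+------------------+-------+---------+---------+------------+-------------+
"""
for stage, clouds in subclouds_by_stage(sample).items():
    print(stage, clouds)
```

The same `parse_table` helper should also work on other single-line tables in this topic, such as the `dcmanager subcloud list` output, though tables with wrapped headers would need extra handling.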
 .. rubric:: |postreq|
 .. _distributed-upgrade-orchestration-process-using-the-cli-ul-lx1-zcv-3mb:
 -   Check and update docker registry credentials for **ALL** subclouds. For
    each subcloud:
    .. code-block:: none
        REGISTRY="docker-registry"
        SECRET_UUID=$(system service-parameter-list | fgrep $REGISTRY |
        fgrep auth-secret | awk '{print $10}')
        SECRET_REF=$(openstack secret list | fgrep ${SECRET_UUID} |
        awk '{print $2}')
        openstack secret get ${SECRET_REF} --payload -f value
    The secret payload should be "username: sysinv password:<password>". If
    the secret payload is "username: admin password:<password>", see
    :ref:`Updating Docker Registry Credentials on a Subcloud
    <updating-docker-registry-credentials-on-a-subcloud>` for more information.
 .. only:: partner
   .. include:: ../_includes/distributed-upgrade-orchestration-process-using-the-cli.rest
--- a/doc/source/dist_cloud/failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud.rst
+++ b/doc/source/dist_cloud/failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud.rst
@ -0,0 +1,140 @@
 .. oeo1597292999568
 .. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud:
 ===========================================================================
 Failure During the Installation or Data Migration of N+1 Load on a Subcloud
 ===========================================================================
 You may encounter errors during installation or data migration of the
 **N+1** load on a subcloud. This section explains the errors and the steps
 required to fix them.
 .. contents:: |minitoc|
    :local:
    :depth: 1
 Errors can occur due to one of the following:
 .. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud-ul-j5r-czs-qmb:
 -   One or more invalid install values
 -   A network error that results in the subcloud being temporarily unreachable
 -   An invalid docker registry certificate
 **Failure Caused by Install Values**
 If the subcloud install values contain an incorrect value, use the following
 command to fix it.
 .. code-block:: none
    ~(keystone_admin)]$ dcmanager subcloud update <<subcloud-name>> --install-values <<subcloud-install-values-yaml>>
 This type of failure is recoverable and you can rerun the upgrade strategy for
 the failed subcloud\(s\) using the following procedure:
 .. rubric:: |proc|
 .. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud-ol-lc1-cyr-qmb:
 #.  Delete the failed upgrade strategy.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager upgrade-strategy delete
 #.  Create a new upgrade strategy for the failed subcloud.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager upgrade-strategy create <<subcloud-name>> --force <<additional options>>
    .. note::
        If the upgrade failed during the |AIO-SX| upgrade or data migration, the
        subcloud availability status is displayed as 'offline'. Use the
        :command:`--force` option when creating the new strategy.
 #.  Apply the new upgrade strategy.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager upgrade-strategy apply
 #.  Verify the upgrade strategy status.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager strategy-step list
 .. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud-section-f5f-j1y-qmb:
 -----------------------------------------------------
 Failure Caused by Invalid Docker Registry Certificate
 -----------------------------------------------------
 If the docker registry certificate on the subcloud is invalid/expired prior to
 an upgrade, the upgrade will fail during data migration.
 .. warning::
    This type of failure cannot be recovered. You will need to re-deploy the
    subcloud, redo all configuration changes, and regenerate the data.
 .. note::
    Ensure that the docker registry certificate on every subcloud is upgraded
    before performing an orchestrated upgrade.
 To re-deploy the subcloud, use the following procedure:
 .. rubric:: |proc|
 .. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud-ol-dpp-bzr-qmb:
 #.  Unmanage the failed subcloud.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager subcloud unmanage <<subcloud-name>>
 #.  Delete the subcloud.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager subcloud delete <<subcloud-name>>
 #.  Re-deploy the failed subcloud.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager subcloud add <<parameters>>
 .. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud-section-lj4-1rr-qmb:
 -----------------------------------------
 Failure Post Data Migration on a Subcloud
 -----------------------------------------
 Once the data migration on the subcloud is completed, the upgrade is activated
 and finalized. If a failure occurs:
 .. rubric:: |proc|
 .. _failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud-ul-ogc-cp5-qmb:
 -   Check specified log files
 -   Follow the recovery procedure. See :ref:`Failure Prior to the Installation
    of N+1 Load on a Subcloud <failure-prior-to-the-installation-of-n+1-load-on-a-subcloud>`
 .. only:: partner
    .. include:: ../_includes/distributed-upgrade-orchestration-process-using-the-cli.rest
--- a/doc/source/dist_cloud/failure-prior-to-the-installation-of-n+1-load-on-a-subcloud.rst
+++ b/doc/source/dist_cloud/failure-prior-to-the-installation-of-n+1-load-on-a-subcloud.rst
@ -0,0 +1,61 @@
 .. uvp1597292940831
 .. _failure-prior-to-the-installation-of-n+1-load-on-a-subcloud:
 ===========================================================
 Failure Prior to the Installation of N+1 Load on a Subcloud
 ===========================================================
 You may encounter errors prior to installation of the **N+1** load on a
 subcloud. This section explains the errors and the steps required to fix
 them.
 Errors can occur due to any one of the following:
 .. _failure-prior-to-the-installation-of-n+1-load-on-a-subcloud-ul-onf-2vs-qmb:
 -   Insufficient disk space on scratch filesystems
 -   Missing subcloud install values
 -   Invalid license
 -   Invalid/corrupted load file
 -   The /home/sysadmin directory on the subcloud is too large
 If you encounter any of the above errors, use the following procedure to fix
 them:
 .. rubric:: |proc|
 #.  Delete the failed upgrade strategy.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager upgrade-strategy delete
 #.  Create a new upgrade strategy.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager upgrade-strategy create <<additional options>>
    .. note::
        If only one subcloud fails the upgrade, specify the name of the
        subcloud in the command.
 #.  Apply the new upgrade strategy.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager upgrade-strategy apply
 #.  Verify the upgrade strategy status.
    .. code-block:: none
        ~(keystone_admin)]$ dcmanager strategy-step list
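The recovery procedure above is a fixed sequence of four commands. As a minimal sketch, the sequence can be generated so that the optional subcloud name is handled consistently; the function name `recovery_commands` is hypothetical, and the code only builds the command lines in the documented order without executing them.

```python
import shlex


def recovery_commands(subcloud=None, extra_options=()):
    """Return the recovery command sequence as a list of argv lists.

    Order follows the procedure: delete, create, apply, then list steps.
    Pass a subcloud name when only one subcloud failed the upgrade.
    """
    create = ["dcmanager", "upgrade-strategy", "create", *extra_options]
    if subcloud:
        create.append(subcloud)
    return [
        ["dcmanager", "upgrade-strategy", "delete"],
        create,
        ["dcmanager", "upgrade-strategy", "apply"],
        ["dcmanager", "strategy-step", "list"],
    ]


# Print the sequence for a single failed subcloud (name is an example).
for argv in recovery_commands(subcloud="subcloud1"):
    print(shlex.join(argv))
```

Any options passed through `extra_options` are forwarded unchanged, so the caller remains responsible for choosing flags that the `upgrade-strategy create` command accepts.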
--- a/doc/source/dist_cloud/index.rst
+++ b/doc/source/dist_cloud/index.rst
@ -60,6 +60,30 @@ Kubernetes Version Upgrade Distributed Cloud Orchestration
    the-kubernetes-distributed-cloud-update-orchestration-process
    configuring-kubernetes-update-orchestration-on-distributed-cloud
 ------------------
 Upgrade management
 ------------------
 .. toctree::
    :maxdepth: 1
    upgrade-management-overview
    upgrading-the-systemcontroller-using-the-cli
 *******************************************************************
 Upgrade Orchestration for Distributed Cloud Subclouds using the CLI
 *******************************************************************
 .. toctree::
    :maxdepth: 1
    distributed-upgrade-orchestration-process-using-the-cli
    aborting-the-distributed-upgrade-orchestration
    configuration-for-specific-subclouds
    robust-error-handling-during-an-orchestrated-upgrade
    failure-prior-to-the-installation-of-n+1-load-on-a-subcloud
    failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud
 --------
 Appendix
 --------
--- a/doc/source/dist_cloud/installing-a-subcloud-using-redfish-platform-management-service.rst
+++ b/doc/source/dist_cloud/installing-a-subcloud-using-redfish-platform-management-service.rst
@ -109,7 +109,7 @@ subcloud, the subcloud installation has these phases:
    .. code-block:: none
-        # Specify the WRCP software version, for example '20.06' for the WRCP 20.06 release of software.
+        # Specify the |pp| software version, for example 'nn.nn' for the |pp| nn.nn release of software.
        software_version: <software_version>
        bootstrap_interface: <bootstrap_interface_name> # e.g. eno1
        bootstrap_address: <bootstrap_interface_ip_address> # e.g.128.224.151.183
--- a/doc/source/dist_cloud/installing-and-provisioning-the-central-cloud.rst
+++ b/doc/source/dist_cloud/installing-and-provisioning-the-central-cloud.rst
@ -13,7 +13,7 @@ system.
 The Central Cloud supports either
-  an |AIO|-Duplex deployment configuration
+-  an |AIO-DX| deployment configuration
 -  a Standard with Dedicated Storage Nodes deployment Standard with Controller
   Storage and one or more workers deployment configuration, or
--- a/doc/source/dist_cloud/robust-error-handling-during-an-orchestrated-upgrade.rst
+++ b/doc/source/dist_cloud/robust-error-handling-during-an-orchestrated-upgrade.rst
@ -0,0 +1,41 @@
 .. ziu1597089603252
 .. _robust-error-handling-during-an-orchestrated-upgrade:
 ====================================================
 Robust Error Handling During An Orchestrated Upgrade
 ====================================================
 This section describes the errors you may encounter during an orchestrated
 upgrade and the steps you can use to troubleshoot the errors.
 .. rubric:: |prereq|
 For a successful orchestrated upgrade, ensure the upgrade prerequisites,
 procedure, and postrequisites are met.
 If a failure occurs, use the following general steps:
 .. _robust-error-handling-during-an-orchestrated-upgrade-ol-l5y-mby-qmb:
 #.  Allow the failed strategy to complete on its own.
 #.  Check the output using the :command:`dcmanager strategy-step list` command
    for failures, if any.
 #.  Address the cause of the failure. For more information, see :ref:`Failure
    During the Installation or Data Migration of N+1 Load on a Subcloud
    <failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud>`.
 #.  Rerun the orchestrated upgrade. For more information, see :ref:`Distributed
    Upgrade Orchestration Process Using the CLI
    <distributed-upgrade-orchestration-process-using-the-cli>`.
 .. seealso::
    :ref:`Failure Prior to the Installation of N+1 Load on a Subcloud
    <failure-prior-to-the-installation-of-n+1-load-on-a-subcloud>`
    :ref:`Failure During the Installation or Data Migration of N+1 Load on a
    Subcloud <failure-during-the-installation-or-data-migration-of-n+1-load-on-a-subcloud>`
--- a/doc/source/dist_cloud/upgrade-management-overview.rst
+++ b/doc/source/dist_cloud/upgrade-management-overview.rst
@ -0,0 +1,120 @@
 .. gjf1592841770001
 .. _upgrade-management-overview:
 ===========================
 Upgrade Management Overview
 ===========================
 You can upgrade the |prod-dc| SystemController and subclouds with a new
 release of |prod| software.
 .. rubric:: |context|
 .. note::
    Back up all YAML files that are updated using the Redfish Platform
    Management service. For more information, see :ref:`Installing a Subcloud
    Using Redfish Platform Management Service
    <installing-a-subcloud-using-redfish-platform-management-service>`.
 You can use the |CLI| to manage upgrades. The workflow for upgrades is as
 follows:
 .. _upgrade-management-overview-ol-uqv-p24-3mb:
 #.  To upgrade the |prod-dc| system, you must first upgrade the
    SystemController. See :ref:`Upgrading the SystemController Using the CLI
    <upgrading-the-systemcontroller-using-the-cli>`.
 #.  Use |prod-dc| Upgrade Orchestration to upgrade the subclouds. See
    :ref:`Distributed Upgrade Orchestration Process Using the CLI <distributed-upgrade-orchestration-process-using-the-cli>`.
 #.  To handle errors during an orchestrated upgrade, see :ref:`Robust Error
    Handling During An Orchestrated Upgrade
    <robust-error-handling-during-an-orchestrated-upgrade>`.
 .. rubric:: |prereq|
 The following prerequisites apply to a |prod-dc| upgrade management service.
 .. _upgrade-management-overview-ul-smx-y2m-cmb:
 -   **Configuration Verification**: Ensure that the following configurations
    are verified before you proceed with the upgrade on the |prod-dc|
    and subclouds:
    -   Run the :command:`system application-list` command to ensure that all
        applications are running
    -   Run the :command:`system host-list` command to list the configured
        hosts
    -   Run the :command:`dcmanager subcloud list` command to list the
        subclouds
    -   Run the :command:`kubectl get pods --all-namespaces` command to test
        that the authentication token validates correctly
    -   Run the :command:`fm alarm-list` command to check the system health to
        ensure that there are no unexpected alarms
    -   Run the :command:`kubectl get host -n deployment` command to ensure that
        all nodes in the cluster have reconciled and are set to 'true'
    -   Ensure **controller-0** is the active controller
 -   All subclouds must be |AIO-DX| and must use the Redfish platform
    management service.
 -   **Remove Non GA Applications**:
    -   Use the following commands to remove the analytics application on the
        subclouds:
        -   :command:`system application-remove wra-analytics`
        -   :command:`system application-delete wra-analytics`
    -   Remove any non-GA applications, such as Wind River Analytics and
        |prefix|-openstack, from the |prod-dc| system, if they exist.
 -   **Increase Scratch File System Size**:
    -   Check the size of the scratch partition on both the system controller
        and subclouds using the :command:`system host-fs-list` command.
        .. note::
            The scratch filesystem size must also be increased on each
            subcloud.
    -   All controller nodes and subclouds should have a scratch file system of
        at least 16G. The process of importing a new load for upgrade
        temporarily uses up to 11G of scratch disk space. Use the :command:`system
        host-fs-modify` command to increase the scratch size on **each controller
        node** and on subcloud controllers as needed in preparation for the
        software upgrade. For example, run the following command:
        .. code-block:: none
            ~(keystone_admin)]$  system host-fs-modify controller-0 scratch=16
        Run the :command:`fm alarm-list` command to check the system health and
        ensure that there are no unexpected alarms.
 -   For orchestrated subcloud upgrades, the install-values that were used to
    deploy each subcloud must be saved, and then restored to the
    SystemController after the SystemController upgrade.
 -   Run the :command:`kubectl -n kube-system get secret` command on the
    SystemController before upgrading subclouds, because the Docker **rvmc**
    image used during an orchestrated subcloud upgrade tries to copy the
    :command:`kube-system default-registry-key`.
 .. only:: partner
    .. include:: ../_includes/upgrade-management-overview.rest
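The scratch file system prerequisite above can be checked with a short script. This is a minimal sketch under stated assumptions: the helper name ``scratch_resize_commands`` is hypothetical, and the current sizes are assumed to have been read manually from the :command:`system host-fs-list` output and entered in the same unit as the documented 16G minimum.

```python
MIN_SCRATCH_G = 16  # minimum scratch size stated in the prerequisites


def scratch_resize_commands(scratch_size_by_host):
    """Return 'system host-fs-modify' commands for undersized hosts.

    scratch_size_by_host maps a host name to its current scratch size,
    in the same unit as MIN_SCRATCH_G. Hypothetical helper: it only
    builds command strings, it does not run them.
    """
    commands = []
    for host, size in sorted(scratch_size_by_host.items()):
        if size < MIN_SCRATCH_G:
            commands.append(
                f"system host-fs-modify {host} scratch={MIN_SCRATCH_G}"
            )
    return commands


if __name__ == "__main__":
    # Example sizes are invented for illustration only.
    for command in scratch_resize_commands({"controller-0": 8, "controller-1": 16}):
        print(command)
```

Hosts that already meet the minimum produce no command, so the output can be reviewed and then run by hand on each controller and subcloud controller that needs it.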
--- a/doc/source/dist_cloud/upgrading-the-systemcontroller-using-the-cli.rst
+++ b/doc/source/dist_cloud/upgrading-the-systemcontroller-using-the-cli.rst
@ -0,0 +1,485 @@
 .. vco1593176327490
 .. _upgrading-the-systemcontroller-using-the-cli:
 ==========================================
 Upgrade the SystemController Using the CLI
 ==========================================
 You can upload and apply upgrades to the SystemController from the CLI in
 order to upgrade the central repository. The SystemController can be upgraded
 using either a manual software upgrade procedure or the non-distributed
 systems :command:`sw-manager` orchestration procedure.
 .. rubric:: |context|
 Follow the steps below to manually upgrade the SystemController:
 .. rubric:: |proc|
 .. _upgrading-the-systemcontroller-using-the-cli-steps-oq4-dgm-cmb:
 #.  Source the platform environment.
    .. code-block:: none
        $ source /etc/platform/openrc
        ~(keystone_admin)]$
    .. only:: partner
        .. include:: ../_includes/upgrading-the-systemcontroller-using-the-cli.rest
 #.  Import the software release load, and copy the iso file to controller-0 \(active controller\).
    .. code-block:: none
        ~(keystone_admin)]$ system --os-region-name SystemController load-import <bootimage>.iso <bootimage>.sig
 #.  Apply any required software updates. After the update is installed, ensure
    controller-0 is active.
    The system must be 'patch current'. All software updates related to your
    current |prod| software release must be uploaded, applied, and installed.
    All software updates to the new |prod| release only need to be uploaded
    and applied. The install of these software updates will occur automatically
    during the software upgrade procedure as the hosts are reset to load the
    new release of software.
    To find and download applicable updates, visit the `Wind River Support
    Network <https://docs.windriver.com>`__.
 .. xbooklink For more information, see |updates-doc|: :ref:`Managing Software Updates <managing-software-updates>`.
 #.  Confirm that the system is healthy.
    Check the current system health status, resolve any alarms and other issues
    reported by the :command:`health-query-upgrade` command, then recheck the
    system health status to confirm that all **System Health** fields are set
    to **OK**.
    .. code-block:: none
        ~(keystone_admin)]$ system health-query-upgrade
        System Health:
        All hosts are provisioned: [OK]
        All hosts are unlocked/enabled: [OK]
        All hosts have current configurations: [OK]
        All hosts are patch current: [OK]
        Ceph Storage Healthy: [OK]
        No alarms: [OK]
        All kubernetes nodes are ready: [OK]
        All kubernetes control plane pods are ready: [OK]
        Required patches are applied: [OK]
        License valid for upgrade: [OK]
    By default, the upgrade process does not run while active alarms are
    present, and running it with active alarms is not recommended. It is
    strongly recommended that you clear all alarms before doing an upgrade.
    .. note::
        Use the command :command:`system upgrade-start --force` to force the
        upgrade process to start and to ignore management-affecting alarms.
        This should ONLY be done if these alarms do not cause an issue for the
        upgrade process.
    If there are alarms present during the upgrade, subcloud load sync\_status
    will display "out-of-sync".
 #.  Start the upgrade from controller-0.
    Make sure that controller-0 is the active controller, and you are logged
    into controller-0 as **sysadmin** and your present working directory is
    your home directory.
    .. code-block:: none
        ~(keystone_admin)]$ system upgrade-start
        +--------------+--------------------------------------+
        | Property     | Value                                |
        +--------------+--------------------------------------+
        | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
        | state        | starting                             |
        | from_release | nn.nn                                |
        | to_release   | nn.nn                                |
        +--------------+--------------------------------------+
    This will make a copy of the system data to be used in the upgrade.
    Configuration changes are not allowed after this point until the swact to
    controller-1 is completed.
    The following upgrade state applies once this command is executed. Run the
    :command:`system upgrade-show` command to verify the status of the upgrade.
    -   started:
        -   State entered after :command:`system upgrade-start` completes.
        -   Release nn.nn system data \(for example, postgres databases\) has
            been exported to be used in the upgrade.
        -   Configuration changes must not be made after this point, until the
            upgrade is completed.
    As part of the upgrade, the upgrade process checks the health of the system
    and validates that the system is ready for an upgrade.
    The upgrade process checks that no alarms are active before starting an
    upgrade.
    .. note::
        Use the command :command:`system upgrade-start --force` to force the
        upgrade process to start and to ignore management-affecting alarms.
        This should ONLY be done if these alarms do not cause an issue for the
        upgrade process.
        If there are alarms present during the upgrade, subcloud load
        sync\_status will display "out-of-sync".
    On systems with Ceph storage, it also checks that the Ceph cluster is
    healthy.
 #.  Upgrade controller-1.
    #.  Lock controller-1.
        .. code-block:: none
            ~(keystone_admin)]$ system host-lock controller-1
    #.  Start the upgrade on controller-1.
        Controller-1 installs the update and reboots, then performs data
        migration.
        .. code-block:: none
            ~(keystone_admin)]$ system host-upgrade controller-1
        Wait for controller-1 to reinstall with the N+1 load and enter the
        **locked-disabled-online** state.
        The following data migration states apply when this command is executed.
        -   data-migration:
            -   State entered when :command:`system host-upgrade controller-1`
                is executed.
            -   System data is being migrated from release N to release N+1.
        -   data-migration-complete:
            -   State entered when controller-1 upgrade is complete.
            -   System data has been successfully migrated from release nn.nn
                to release nn.nn.
                where *nn.nn* in the update file name is the |prod| release number.
        -   data-migration-failed:
            -   State entered if data migration on controller-1 fails.
            -   Upgrade must be aborted.
    #.  Check the upgrade state.
        .. code-block:: none
            ~(keystone_admin)]$ system upgrade-show
            +--------------+--------------------------------------+
            | Property     | Value                                |
            +--------------+--------------------------------------+
            | uuid         | e7c8f6bc-518c-46d4-ab81-7a59f8f8e64b |
            | state        | data-migration-complete              |
            | from_release | nn.nn                                |
            | to_release   | nn.nn                                |
            +--------------+--------------------------------------+
        If the :command:`upgrade-show` status indicates
        'data-migration-failed', then there is an issue with the data
        migration. Check the issue before proceeding to the next step.
        .. note::
            Do not unlock controller-1 until :command:`system
            upgrade-show` displays the upgrade status
            "data-migration-complete".
    #.  Unlock controller-1.
        .. code-block:: none
            ~(keystone_admin)]$ system host-unlock controller-1
        Wait for controller-1 to become **unlocked-enabled**. Wait until the DRBD
        sync **400.001** Services-related alarm is raised and then cleared.
        The following states apply when this command is executed.
        -   upgrading-controllers:
            -   State entered when controller-1 has been unlocked and is
                running release nn.nn software.
                where *nn.nn* in the update file name is the |prod| release
                number.
        If it transitions to **unlocked-disabled-failed**, check the issue
        before proceeding to the next step. The alarms may indicate a
        configuration error. Check the result of the configuration logs on
        controller-1 \(for example, error logs in
        controller-1:/var/log/puppet\).
    #.  Run the :command:`system application-list` and :command:`system
        host-upgrade-list` commands to view the current progress.
 #.  Set controller-1 as the active controller. Swact to controller-1.
    .. code-block:: none
        ~(keystone_admin)]$ system host-swact controller-0
    Wait until services have gone active on the new active controller-1 before
    proceeding to the next step. When all services on controller-1 are
    enabled-active, the swact is complete.
    .. note::
        Continue the remaining steps below to manually upgrade or use upgrade
        orchestration to upgrade the remaining nodes.
 #.  Upgrade **controller-0**.
 .. xbooklink :ref:`|updates-doc| <software-updates-and-upgrades-software-updates>`.
    #.  Lock **controller-0**.
        .. code-block:: none
            ~(keystone_admin)]$ system host-lock controller-0
    #.  Upgrade **controller-0**.
        .. code-block:: none
            ~(keystone_admin)]$ system host-upgrade controller-0
    #.  Unlock **controller-0**.
        .. code-block:: none
            ~(keystone_admin)]$ system host-unlock controller-0
        Wait until the DRBD sync **400.001** Services-related alarm is raised
        and then cleared before proceeding to the next step.
        -   upgrading-hosts:
            -   State entered when both controllers are running release nn.nn
                software.
 #.  Check the system health to ensure that there are no unexpected alarms.
    .. code-block:: none
        ~(keystone_admin)]$ fm alarm-list
    Clear all alarms unrelated to the upgrade process.
 #.  If using a Ceph storage backend, upgrade the storage nodes one at a time.
    The storage node must be locked and all OSDs must be down in order to do
    the upgrade.
    #.  Lock storage-0.
        .. code-block:: none
            ~(keystone_admin)]$ system host-lock storage-0
    #.  Verify that the OSDs are down after the storage node is locked.
        In the Horizon interface, navigate to **Admin** \> **Platform** \>
        **Storage Overview** to view the status of the OSDs.
    #.  Upgrade storage-0.
        .. code-block:: none
            ~(keystone_admin)]$ system host-upgrade storage-0
        The upgrade is complete when the node comes online, and at that point,
        you can safely unlock the node.
        After a storage node is upgraded, but before it is unlocked, Ceph
        synchronization alarms are present \(they appear to be making progress
        in synching\), along with infrastructure network interface alarms
        \(because the infrastructure network interface configuration is not
        applied to the storage node until it is unlocked\).
        Unlock the node as soon as the upgraded storage node comes online.
    #.  Unlock storage-0.
        .. code-block:: none
            ~(keystone_admin)]$ system host-unlock storage-0
        Wait for all alarms to clear after the unlock before proceeding to
        upgrade the next storage host.
    #.  Repeat the above steps for each storage host.
        .. note::
            After upgrading the first storage node you can expect alarm
            **800.003**. The alarm is cleared after all storage nodes are
            upgraded.
 #.  If worker nodes are present, upgrade the worker hosts, serially or in
    parallel.
    #.  Lock worker-0.
        .. code-block:: none
            ~(keystone_admin)]$ system host-lock worker-0
    #.  Upgrade worker-0.
        .. code-block:: none
            ~(keystone_admin)]$ system host-upgrade worker-0
        Wait for the host to run the installer, reboot, and go online before
        unlocking it in the next step.
    #.  Unlock worker-0.
        .. code-block:: none
            ~(keystone_admin)]$ system host-unlock worker-0
        Wait for all alarms to clear after the unlock before proceeding to the
        next worker host.
    #.  Repeat the above steps for each worker host.
 #.  Set controller-0 as the active controller. Swact to controller-0.
    .. code-block:: none
        ~(keystone_admin)]$ system host-swact controller-1
    Wait until services have gone active on the active controller-0 before
    proceeding to the next step. When all services on controller-0 are
    enabled-active, the swact is complete.
 #.  Activate the upgrade.
    .. code-block:: none
        ~(keystone_admin)]$ system upgrade-activate
        +--------------+--------------------------------------+
        | Property     | Value                                |
        +--------------+--------------------------------------+
        | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
        | state        | activating                           |
        | from_release | nn.nn                                |
        | to_release   | nn.nn                                |
        +--------------+--------------------------------------+
    While the :command:`upgrade-activate` command is running, new
    configurations are applied to the controller. 250.001 \(**hostname
    Configuration is out-of-date**\) alarms are raised and are cleared as the
    configuration is applied. The upgrade state goes from **activating** to
    **activation-complete** once this is done.
    The following states apply when this command is executed.
    -   activation-requested:
        -   State entered when :command:`system upgrade-activate` is executed.
    -   activating:
        -   State entered when activation of the upgrade has started by
            applying new configurations to the controller and compute hosts.
    -   activation-complete:
        -   State entered when new configurations have been applied to all
            controller and compute hosts.
    #.  Check the status of the upgrade again to confirm that it has reached
        **activation-complete**. For example:
        .. code-block:: none
            ~(keystone_admin)]$ system upgrade-show
            +--------------+--------------------------------------+
            | Property     | Value                                |
            +--------------+--------------------------------------+
            | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
            | state        | activation-complete                  |
            | from_release | nn.nn                                |
            | to_release   | nn.nn                                |
            +--------------+--------------------------------------+
    .. note::
        Alarms are generated as the subcloud load sync\_status is "out-of-sync".
 #.  Complete the upgrade.
    .. code-block:: none
        ~(keystone_admin)]$ system upgrade-complete
        +--------------+--------------------------------------+
        | Property     | Value                                |
        +--------------+--------------------------------------+
        | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
        | state        | completing                           |
        | from_release | nn.nn                                |
        | to_release   | nn.nn                                |
        +--------------+--------------------------------------+
    Run the :command:`system upgrade-show` command, and the status will display
    "no upgrade in progress". The subclouds will be out-of-sync.
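The upgrade states reported by :command:`system upgrade-show` during the procedure above can also be read programmatically. The following is a minimal sketch, assuming the two-column table layout shown in the examples; the function names are hypothetical and are not part of the |prod| CLI.

```python
def parse_property_table(text):
    """Parse a two-column '| Property | Value |' table into a dict."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Property":
            properties[cells[0]] = cells[1]
    return properties


def ready_to_unlock_controller_1(upgrade_show_output):
    """Return True only when data migration has completed.

    Raises RuntimeError if the state is 'data-migration-failed', because
    the procedure says the upgrade must then be aborted.
    """
    state = parse_property_table(upgrade_show_output).get("state")
    if state == "data-migration-failed":
        raise RuntimeError("data migration failed; the upgrade must be aborted")
    return state == "data-migration-complete"


# Sample copied from the upgrade-show example in the procedure.
SAMPLE = """
+--------------+--------------------------------------+
| Property     | Value                                |
+--------------+--------------------------------------+
| uuid         | e7c8f6bc-518c-46d4-ab81-7a59f8f8e64b |
| state        | data-migration-complete              |
| from_release | nn.nn                                |
| to_release   | nn.nn                                |
+--------------+--------------------------------------+
"""

if __name__ == "__main__":
    print(ready_to_unlock_controller_1(SAMPLE))
```

In practice, the text could come from ``subprocess.run(["system", "upgrade-show"], capture_output=True, text=True).stdout``, and the check could be repeated until it returns ``True`` before controller-1 is unlocked.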
 .. rubric:: |postreq|
 .. warning::
    Do NOT delete the N load from the SystemController once the upgrade is
    complete. If the load is deleted from the SystemController, you must
    manually delete the N load from each subcloud.
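For the health check step of the procedure, the output of :command:`system health-query-upgrade` can be scanned for fields that are not **OK**. This is a minimal sketch, assuming each health field is printed as ``<name>: [<status>]`` as in the example output; the function name is hypothetical, and the ``[Fail]`` status in the usage example is an assumed value used only for illustration.

```python
def failed_health_checks(health_output):
    """Return the names of health fields whose status is not [OK]."""
    failed = []
    for line in health_output.splitlines():
        line = line.strip()
        if not line.endswith("]") or ": [" not in line:
            continue  # skip the 'System Health:' header, prompts, blank lines
        name, _, status = line.rpartition(": [")
        if status.rstrip("]") != "OK":
            failed.append(name)
    return failed


# Lines copied from the health-query-upgrade example in the procedure.
HEALTHY_SAMPLE = """
System Health:
All hosts are provisioned: [OK]
All hosts are unlocked/enabled: [OK]
All hosts have current configurations: [OK]
All hosts are patch current: [OK]
Ceph Storage Healthy: [OK]
No alarms: [OK]
All kubernetes nodes are ready: [OK]
All kubernetes control plane pods are ready: [OK]
Required patches are applied: [OK]
License valid for upgrade: [OK]
"""

if __name__ == "__main__":
    # '[Fail]' is an assumed status string, used only to exercise the check.
    unhealthy = HEALTHY_SAMPLE.replace("No alarms: [OK]", "No alarms: [Fail]")
    print(failed_health_checks(HEALTHY_SAMPLE))
    print(failed_health_checks(unhealthy))
```

An empty result corresponds to all **System Health** fields being set to **OK**; any returned names identify the issues to resolve before starting the upgrade.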