
Upgrade the System Controller Using the CLI

You can upload and apply a software upgrade (deploy a major release or a patched major release) to the system controller using the CLI. The software upgrade not only upgrades the software of the system controller but also updates the software in the system controller's vault and the central container image repository, in support of subsequent subcloud upgrades.

The system controller can be upgraded using either the manual software upgrade procedure <manual-host-software-deployment-ee17ec6f71a4> or the standalone cloud orchestrated software upgrade procedure <orchestrated-deployment-host-software-deployment-d234754c7d20> with sw-manager.

Follow the steps below to manually upgrade the system controller:


  • The platform issuer (system-local-ca) must have an RSA certificate/private key pair before upgrading. If system-local-ca was configured with a different type of certificate/private key, the deploy precheck will fail with an informative message. In this case, execute the migrate-platform-certificates-to-use-cert-manager-c0b1727e4e5d procedure to reconfigure system-local-ca with an RSA certificate/private key, targeting the SystemController and all subclouds.
  • If there are software updates for your current software release that are required in order to upgrade to the new software release, apply these patches/updates in a separate software deploy of the patch release(s) on the system controller (see manual-host-software-deployment-ee17ec6f71a4). Also apply these patches/updates in an orchestrated software deploy of the subclouds (see orchestrated-deployment-host-software-deployment-d234754c7d20) so that all systems are patch current before starting the upgrade to the new major release; an example check is shown after this list.
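For example, before starting, you can confirm which releases are already deployed on the system controller by listing them (a minimal illustrative check; the release names and exact table layout will differ on your system):

    ~(keystone_admin)]$ software list
    +--------------+-------+----------+
    | Release      | RR    | State    |
    +--------------+-------+----------+
    | stx-22.12.0  | True  | deployed |
    | stx-22.12.1  | False | deployed |
    +--------------+-------+----------+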

  1. Source the platform environment.

    $ source /etc/platform/openrc
    ~(keystone_admin)]$


  2. Upload the load.


    ~(keystone_admin)]$ software upload --local /full_path/<bootimage>.iso /full_path/<bootimage>.sig
    +-------------------------------+--------------------------+
    | Uploaded File                 | Release                  |
    +-------------------------------+--------------------------+
    | starlingx-intel-x86-64-cd.iso | stx-10.0.0               |
    +-------------------------------+--------------------------+


    Note

    Do not use the --os-region-name SystemController proxy at this point for subcloud deployment. That step is performed once the system controller deploy is complete.

    Note

    If you encounter any issues while importing the load, examine the error messages in /var/log/software.log, as shown in the example below.
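    For example, a quick way to inspect recent errors in the software log (plain shell; the exact messages vary by failure):

    ~(keystone_admin)]$ tail -n 50 /var/log/software.log
    ~(keystone_admin)]$ grep -iE 'error|traceback' /var/log/software.log | tail -n 20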


  3. Confirm that the system is healthy.

    Check the current system health status, resolve any alarms and other issues reported by the software deploy precheck <release-id> command, and then recheck the system health status to confirm that all System Health fields are set to OK.

    ~(keystone_admin)]$ software deploy precheck <release-id>
        System Health:
        All hosts are provisioned: [OK]
        All hosts are unlocked/enabled: [OK]
        All hosts have current configurations: [OK]
        Ceph Storage Healthy: [OK]
        No alarms: [OK]
        All kubernetes nodes are ready: [OK]
        All kubernetes control plane pods are ready: [OK]
        All kubernetes applications are in a valid state: [OK]
        All hosts are patch current: [OK]
        Active kubernetes version [vX.XX.X] is a valid supported version: [OK]
        Active controller is controller-0: [OK]
        Installed license is valid: [OK]
        Valid upgrade path from release 22.12 to 24.09: [OK]
        Required patches are applied: [OK]


    Where <release-id> is stx-10.0.0 for the software upload example above; it can also be found by running software list.


    By default, the deploy process does not run while active alarms are present, and running it with active alarms is not recommended. It is strongly recommended that you clear all alarms on your system before starting a deploy, as shown in the example below.
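    For example, to find the <release-id> for the precheck and review which alarms are still outstanding:

    ~(keystone_admin)]$ software list
    ~(keystone_admin)]$ fm alarm-list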

  4. Begin the deploy from controller-0.

    Make sure that controller-0 is the active controller, that you are logged in to controller-0 as sysadmin, and that your present working directory is your home directory.

    ~(keystone_admin)]$ software deploy start <release-id>
    +--------------+------------+------+--------------+
    | From Release | To Release | RR   | State        |
    +--------------+------------+------+--------------+
    | 22.12.0      | 24.09.100  | True | deploy-start |
    +--------------+------------+------+--------------+

    Note

    It is recommended to run the software deploy precheck command before running software deploy start. However, the software deploy start command automatically runs the precheck even if it has not been run beforehand.

    Wait for software deploy start <release-id> to complete by monitoring the status of the deploy.

    ~(keystone_admin)]$ software deploy show
    +--------------+------------+------+-------------------+
    | From Release | To Release | RR   | State             |
    +--------------+------------+------+-------------------+
    | 22.12.0      | 24.09.100  | True | deploy-start-done |
    +--------------+------------+------+-------------------+
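    If you prefer not to re-run software deploy show manually, a simple polling loop can wait for a terminal state (a plain-shell sketch, assuming the keystone_admin credentials are already sourced; deploy-start-failed is assumed here to be the failure state name):

    ~(keystone_admin)]$ until software deploy show | grep -qE 'deploy-start-(done|failed)'; do sleep 60; done
    ~(keystone_admin)]$ software deploy show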

    software deploy start <release-id> will migrate configuration data to the new release's data model. Configuration must not be changed after this point, until the deploy is completed.

  5. Software deploy controller-1.

    1. Lock controller-1.

      ~(keystone_admin)]$ system host-lock controller-1
    2. Begin the deploy on controller-1.

      ~(keystone_admin)]$ software deploy host controller-1
      Running major release deployment, major_release=24.09, force=False, async_req=False, commit_id=<commit-id>
      Host installation was successful on controller-1
    3. Unlock controller-1.

      ~(keystone_admin)]$ system host-unlock controller-1

      Wait for controller-1 to enter the unlocked-enabled state. Wait until the DRBD sync 400.001 Services-related alarm has been raised and then cleared.

      When the first software deploy host <hostname> command is issued after the deploy state becomes deploy-start-done, the software deploy show state is changed to deploy-host. When the software is deployed to all the hosts, that is, when the software deploy host <hostname> successfully completes against the last host, the software deploy show state changes to deploy-host-done.

      If controller-1 transitions to unlocked-disabled-failed, investigate the issue before proceeding to the next step. The alarms may indicate a configuration error. Check the configuration logs on controller-1 (for example, error logs in controller-1:/var/log/puppet).

    4. Run the system application-list and software deploy host-list commands to view the current progress.

      After controller-1 is unlocked/enabled/available, run the following command to check that controller-1 is running the new release:

      ~(keystone_admin)]$ system host-show controller-1
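      For example, one hedged way to confirm the running software version directly from the host-show output (the field name, sw_version here, may differ between releases), together with the per-host deploy states:

      ~(keystone_admin)]$ system host-show controller-1 | grep -i sw_version
      ~(keystone_admin)]$ software deploy host-list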
  6. Set controller-1 as the active controller. Swact away from controller-0.

    ~(keystone_admin)]$ system host-swact controller-0

    Wait until services have gone active on the new active controller-1 before proceeding to the next step. When all services on controller-1 are enabled-active, the swact is complete.
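    One way to confirm that the swact has completed is to check that all service groups report active on controller-1 (an illustrative check; command availability may vary by release):

    ~(keystone_admin)]$ system servicegroup-list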

  7. Software deploy controller-0.

    For more information, see introduction-platform-software-updates-upgrades-06d6de90bbd0.

    1. Lock controller-0.

      ~(keystone_admin)]$ system host-lock controller-0
    2. Begin the deploy on controller-0.

      ~(keystone_admin)]$ software deploy host controller-0
      Running major release deployment, major_release=24.09, force=False, async_req=False, commit_id=<commit-id>
    3. Unlock controller-0.

      ~(keystone_admin)]$ system host-unlock controller-0
  8. Check the system health to ensure that there are no unexpected alarms.

    ~(keystone_admin)]$ fm alarm-list

    Clear all alarms unrelated to the deploy process.
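    To examine an individual alarm in more detail before addressing its underlying condition, you can query it by UUID (<alarm-uuid> is a placeholder taken from the fm alarm-list output):

    ~(keystone_admin)]$ fm alarm-show <alarm-uuid>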

  9. If using Ceph storage backend, deploy the storage nodes one at a time.

    The storage node must be locked and all of its OSDs must be down in order to do the upgrade.

    1. Lock storage-0.

      ~(keystone_admin)]$ system host-lock storage-0
    2. Verify that the OSDs are down after the storage node is locked.

      ~(keystone_admin)]$ ceph osd tree
      +----+---------+------------+---------+-------------------+-------------+------------------+-------------+
      | ID | CLASS   | WEIGHT     | TYPE    |    NAME           | STATUS      | REWEIGHT         | PRI-AFF     |
      +----+---------+------------+---------+-------------------+-------------+------------------+-------------+
      | -1 |         | 0.01700    | root    |  storage-tier     |             |                  |             |
      +----+---------+------------+---------+-------------------+-------------+------------------+-------------+
      | -2 |         | 0.01700    | chassis |  group-0          |             |                  |             |
      +----+---------+------------+---------+-------------------+-------------+------------------+-------------+
      | -4 |         | 0.00850    | host    |  controller-0     |             |                  |             |
      +----+---------+------------+---------+-------------------+-------------+------------------+-------------+
      |  0 |   hdd   | 0.00850    |         |  osd.0            |  up         |  1.00000         | 1.00000     |
      +----+---------+------------+---------+-------------------+-------------+------------------+-------------+
      | -3 |         | 0.00850    | host    |  controller-1     |             |                  |             |
      +----+---------+------------+---------+-------------------+-------------+------------------+-------------+
      | 1  |   hdd   | 0.00850    |         |  osd.1            |  down       |  1.00000         | 1.00000     |
      +----+---------+------------+---------+-------------------+-------------+------------------+-------------+
    3. Begin the deploy on storage-0.

      ~(keystone_admin)]$ software deploy host storage-0

      The deploy is complete when the node comes online, and at that point, you can safely unlock the node.

      After upgrading a storage node, but before unlocking it, there are Ceph synchronization alarms (which appear to be making progress in syncing) and infrastructure network interface alarms (because the infrastructure network interface configuration has not yet been applied to the storage node, as it has not been unlocked).

      Unlock the node as soon as the deployed storage node comes online.

    4. Unlock storage-0.

      ~(keystone_admin)]$ system host-unlock storage-0

      Wait for all alarms to clear after the unlock before proceeding to deploy the next storage host.

    5. Repeat the above steps for each storage host.

      Note

      After deploying the first storage node, you can expect alarm 800.003. The alarm is cleared after all storage nodes are deployed.
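      While waiting for the storage-related alarms to clear, Ceph recovery progress can be followed directly with the standard Ceph status commands:

      ~(keystone_admin)]$ ceph -s
      ~(keystone_admin)]$ ceph health detail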

  10. If worker nodes are present, deploy the worker hosts, serially or in parallel.

    1. Lock worker-0.

      ~(keystone_admin)]$ system host-lock worker-0
    2. Deploy worker-0.

      ~(keystone_admin)]$ software deploy host worker-0

      Wait for the host to run the installer, reboot, and go online before unlocking it in the next step.

    3. Unlock worker-0.

      ~(keystone_admin)]$ system host-unlock worker-0

      Wait for all alarms to clear after the unlock before proceeding to the next worker host.

    4. Repeat the above steps for each worker host.
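    If there are many worker hosts, the lock/deploy/unlock sequence above can be wrapped in a small script. The sketch below is only illustrative and assumes the keystone_admin credentials are sourced; it pauses for the manual checks (host online, deploy successful, alarms cleared) that the steps above require:

    for host in worker-0 worker-1; do
        system host-lock "$host"
        read -rp "Wait until $host is locked and online, then press Enter to deploy: "
        software deploy host "$host"
        read -rp "Wait until the deploy completes on $host, then press Enter to unlock: "
        system host-unlock "$host"
        read -rp "Wait until $host is unlocked/enabled/available and alarms have cleared, then press Enter: "
    done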

  11. Set controller-0 as the active controller. Swact away from controller-1.

    ~(keystone_admin)]$ system host-swact controller-1

    Wait until services have gone active on the active controller-0 before proceeding to the next step. When all services on controller-0 are enabled-active, the swact is complete.

  12. Activate the deploy.

    ~(keystone_admin)]$ software deploy activate
    Deploy activate has started

    Check deploy state:

    ~(keystone_admin)]$ software deploy show
    +--------------+------------+------+-----------------+
    | From Release | To Release | RR   | State           |
    +--------------+------------+------+-----------------+
    | 22.12.0      | 24.09.100  | True | deploy-activate |
    +--------------+------------+------+-----------------+

    Wait for software deploy activate to complete by monitoring the status of the deploy.

    ~(keystone_admin)]$ software deploy show
    +--------------+------------+------+----------------------+
    | From Release | To Release | RR   | State                |
    +--------------+------------+------+----------------------+
    | 22.12.0      | 24.09.100  | True | deploy-activate-done |
    +--------------+------------+------+----------------------+

    During the running of the software deploy activate command, new configurations are applied to the controller. 250.001 (hostname Configuration is out-of-date) alarms are raised and are cleared as the configuration is applied. The deploy state goes from deploy-activate to deploy-activate-done once this is done.


    The following states apply when this command is executed:

    deploy-activate
        State entered when the deploy is being activated.

    deploy-activate-done
        State entered when the deploy activation completes successfully.

    Note

    This can take more than 15 minutes to complete.

    Note

    Alarms are generated because the subcloud software sync_status is "out-of-sync".

  13. Complete the upgrade.

    ~(keystone_admin)]$ software deploy complete
    Deployment has been completed

    Verify deploy state:

    ~(keystone_admin)]$ software deploy show
    +--------------+------------+------+-----------------------+
    | From Release | To Release | RR   | State                 |
    +--------------+------------+------+-----------------------+
    | 22.12.0      | 24.09.100  | True | deploy-completed      |
    +--------------+------------+------+-----------------------+
  14. Upgrade Kubernetes after the platform deploy is completed. To upgrade Kubernetes on a standalone system, see index-updates-kub-03d4d10fa0be.

  15. When the Kubernetes upgrade completes, conclude the platform deploy by deleting it.

    ~(keystone_admin)]$ software deploy delete
    Deploy deleted with success

    Verify deploy state:

    ~(keystone_admin)]$ software deploy show
    No deploy in progress
  16. Upload the load for subcloud deployment.


    ~(keystone_admin)]$ software --os-region-name SystemController upload --local /full_path/<bootimage>.iso /full_path/<bootimage>.sig
    +-------------------------------+--------------------------+
    | Uploaded File                 | Release                  |
    +-------------------------------+--------------------------+
    | starlingx-intel-x86-64-cd.iso | stx-10.0.0               |
    +-------------------------------+--------------------------+


Note

This can take a few minutes. After the system controller is successfully deployed, do not delete the old load (which is in the imported state) from the load list, as it is required for managing the subclouds that are still running the previous load.
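For example, you can confirm that both the previous load and the new release remain listed through the System Controller endpoint used for subcloud management (an illustrative check; the table layout may differ):

    ~(keystone_admin)]$ software --os-region-name SystemController list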


Separately apply the patches after the upgrade to the major release.