Elisa is working on the same "Upgrade the System Controller Using the CLI" page. Link: https://review.opendev.org/c/starlingx/docs/+/937735/28/doc/source/dist_cloud/kubernetes/upgrading-the-systemcontroller-using-the-cli.rst#b485 @all Please review the changes in the above review Change-Id: I979c0c3387d7e7fd90f588a0e74c6f95c4eb2ff5 Signed-off-by: Ngairangbam Mili <ngairangbam.mili@windriver.com>
Upgrade the System Controller Using the CLI
You can upload and apply a software upgrade (deploy a major release or a patched major release) to the system controller using the CLI. The software upgrade not only upgrades the system controller's software but also updates the software in the system controller's vault and the central container image repository, in support of subsequent subcloud upgrades.
The system controller can be upgraded using either a manual software upgrade <manual-host-software-deployment-ee17ec6f71a4> or the standalone cloud orchestrated software upgrade procedure <orchestrated-deployment-host-software-deployment-d234754c7d20> with sw-manager.
Follow the steps below to manually upgrade the system controller:
- Transfer the ISO and signature files for the new major release (or
new patched major release) from the mirror https://mirror.starlingx.cengn.ca/mirror/starlingx/release/latest_release/debian/monolithic/outputs/iso/
to controller-0 (active controller).
- Upgrade to a patched major release (patched ISO).
- If you are using a private registry (see the docker / *-registry sections of system service-parameter-list), use the list from the mirror https://mirror.starlingx.cengn.ca/mirror/starlingx/release/latest_release/debian/monolithic/outputs/docker-images/ to transfer the container image versions associated with the new major release (or new patched major release) from docker.io to the private registry.
- The platform issuer (system-local-ca) is required to have an RSA certificate/private key pair before upgrading. If system-local-ca was configured with a different type of certificate/private key, the deploy precheck will fail with an informative message. In this case, the migrate-platform-certificates-to-use-cert-manager-c0b1727e4e5d procedure needs to be executed to reconfigure system-local-ca with an RSA certificate/private key, targeting the SystemController and all subclouds.
- If there are software updates for your current software release that are required in order to upgrade to the new software release, apply these patches/updates in a separate software deploy of the patch release(s) on the system controller (see manual-host-software-deployment-ee17ec6f71a4). Also apply them in an orchestrated software deploy of the subclouds (see orchestrated-deployment-host-software-deployment-d234754c7d20) so that all systems are patch current before starting the upgrade to the new major release.
Source the platform environment.
$ source /etc/platform/openrc
~(keystone_admin)]$
Upload the load.
~(keystone_admin)]$ software upload --local /full_path/<bootimage>.iso /full_path/<bootimage>.sig
+-------------------------------+--------------------------+
| Uploaded File                 | Release                  |
+-------------------------------+--------------------------+
| starlingx-intel-x86-64-cd.iso | stx-10.0.0               |
+-------------------------------+--------------------------+
Note
Do not use the --os-region-name SystemController proxy at this moment for subcloud deployment. This step will be performed once the system controller deploy is complete.
Note
If you face any issue while importing the load, go to /var/log/software.log and examine the error messages.
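The release ID printed by the upload is needed in the later precheck and deploy start steps. As a minimal sketch, assuming the upload output keeps the two-column table layout shown above, the Release column can be extracted with awk. The function name release_ids_from_upload is hypothetical and not part of the software CLI.

```shell
# Sketch only: read "software upload" output on stdin and print the
# Release column of each data row. Assumes the bordered two-column
# table shown in this page; the function name is hypothetical.
release_ids_from_upload() {
    awk -F'|' '
        NF >= 4 {
            # With "|" as separator, $2 is "Uploaded File" and $3 is "Release".
            gsub(/^[ \t]+|[ \t]+$/, "", $3)
            if ($3 != "" && $3 != "Release") print $3
        }
    '
}
```

For example, saving the upload output to a file and running release_ids_from_upload on that file would print stx-10.0.0 for the table above.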
Confirm that the system is healthy.
Check the current system health status, resolve any alarms and other issues reported by the software deploy precheck <release-id> command, then recheck the system health status to confirm that all System Health fields are set to OK.
~(keystone_admin)]$ software deploy precheck <release-id>
System Health:
All hosts are provisioned: [OK]
All hosts are unlocked/enabled: [OK]
All hosts have current configurations: [OK]
Ceph Storage Healthy: [OK]
No alarms: [OK]
All kubernetes nodes are ready: [OK]
All kubernetes control plane pods are ready: [OK]
All kubernetes applications are in a valid state: [OK]
All hosts are patch current: [OK]
Active kubernetes version [vX.XX.X] is a valid supported version: [OK]
Active controller is controller-0: [OK]
Installed license is valid: [OK]
Valid upgrade path from release 22.12 to 24.09: [OK]
Required patches are applied: [OK]
Where <release-id> is stx-10.0.0 for the software upload example above; it can also be found by running software list.
By default, the deploy process will not run while active alarms are present, and running it with active alarms is not recommended. It is strongly recommended that you clear your system of all alarms before doing a deploy.
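Checking that every System Health field is OK can be scripted. The following is a minimal sketch, assuming each health line ends with a bracketed status word as in the example output above. The exact marker printed for a failed check is an assumption, so the filter reports any trailing bracketed status other than [OK]. The function name precheck_failures is hypothetical.

```shell
# Sketch only: read "software deploy precheck" output on stdin, print
# every health line whose trailing status is not [OK], and return 1 if
# any such line was found. The function name is hypothetical.
precheck_failures() {
    awk '
        # Match lines ending in a bracketed word, excluding those ending in [OK].
        /\[[A-Za-z]+\][ \t]*$/ && !/\[OK\][ \t]*$/ { print; bad = 1 }
        END { exit (bad + 0) }
    '
}
```

Piping the output of software deploy precheck <release-id> into precheck_failures would print nothing and return 0 for the healthy example above.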
Begin the deploy from controller-0.
Make sure that controller-0 is the active controller, that you are logged into controller-0 as sysadmin, and that your present working directory is your home directory.
~(keystone_admin)]$ software deploy start <release-id>
+--------------+------------+------+--------------+
| From Release | To Release | RR   | State        |
+--------------+------------+------+--------------+
| 22.12.0      | 24.09.100  | True | deploy-start |
+--------------+------------+------+--------------+
Note
It is recommended to run the software deploy precheck command before running software deploy start. However, the software deploy start command will automatically run the precheck command even if the precheck command has not been run before.
Wait for software deploy start <release-id> to complete by monitoring the status of the deploy.
~(keystone_admin)]$ software deploy show
+--------------+------------+------+-------------------+
| From Release | To Release | RR   | State             |
+--------------+------------+------+-------------------+
| 22.12.0      | 24.09.100  | True | deploy-start-done |
+--------------+------------+------+-------------------+
software deploy start <release-id> will migrate configuration data to the new release's data model. Configuration must not be changed after this point, until the deploy is completed.
Software deploy controller-1.
Lock controller-1.
~(keystone_admin)]$ system host-lock controller-1
Begin the deploy on controller-1.
~(keystone_admin)]$ software deploy host controller-1
Running major release deployment, major_release=24.09, force=False, async_req=False, commit_id=<commit-id>
Host installation was successful on controller-1
Unlock controller-1.
~(keystone_admin)]$ system host-unlock controller-1
Wait for controller-1 to enter the unlocked-enabled state. Wait until the DRBD sync 400.001 Services-related alarm has been raised and then cleared.
When the first software deploy host <hostname> command is issued after the deploy state becomes deploy-start-done, the software deploy show state changes to deploy-host. When the software is deployed to all the hosts, that is, when software deploy host <hostname> successfully completes against the last host, the software deploy show state changes to deploy-host-done.
If controller-1 transitions to unlocked-disabled-failed, check the issue before proceeding to the next step. The alarms may indicate a configuration error. Check the configuration logs on controller-1 (for example, the error logs in /var/log/puppet on controller-1).
Run the system application-list and software deploy host-list commands to view the current progress.
After controller-1 is unlocked/enabled/available, run the following command to check that controller-1 is running the new release:
~(keystone_admin)]$ system host-show controller-1
Set controller-1 as the active controller. Swact away from controller-0.
~(keystone_admin)]$ system host-swact controller-0
Wait until services have gone active on the new active controller-1 before proceeding to the next step. When all services on controller-1 are enabled-active, the swact is complete.
Software deploy controller-0.
For more information, see introduction-platform-software-updates-upgrades-06d6de90bbd0.
Lock controller-0.
~(keystone_admin)]$ system host-lock controller-0
Begin the deploy on controller-0.
~(keystone_admin)]$ software deploy host controller-0
Running major release deployment, major_release=24.09, force=False, async_req=False, commit_id=<commit-id>
Unlock controller-0.
~(keystone_admin)]$ system host-unlock controller-0
Check the system health to ensure that there are no unexpected alarms.
~(keystone_admin)]$ fm alarm-list
Clear all alarms unrelated to the deploy process.
If using Ceph storage backend, deploy the storage nodes one at a time.
The storage node must be locked and all OSDs must be down in order to do the upgrade.
Lock storage-0.
~(keystone_admin)]$ system host-lock storage-0
Verify that the OSDs are down after the storage node is locked.
~(keystone_admin)]$ ceph osd tree
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| ID | CLASS   | WEIGHT     | TYPE    | NAME              | STATUS      | REWEIGHT         | PRI-AFF     |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| -1 |         | 0.01700    | root    | storage-tier      |             |                  |             |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| -2 |         | 0.01700    | chassis | group-0           |             |                  |             |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| -4 |         | 0.00850    | host    | controller-0      |             |                  |             |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| 0  | hdd     | 0.00850    |         | osd.0             | up          | 1.00000          | 1.00000     |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| -3 |         | 0.00850    | host    | controller-1      |             |                  |             |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
| 1  | hdd     | 0.00850    |         | osd.1             | down        | 1.00000          | 1.00000     |
+----+---------+------------+---------+-------------------+-------------+------------------+-------------+
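To pick out which OSDs are down without reading the whole table, a small filter can help. This is a sketch under a stated assumption: it parses the bordered, pipe-separated layout shown in this page, which may differ from the whitespace-aligned output that ceph osd tree prints on a live system. The function name osds_with_status is hypothetical.

```shell
# Sketch only: read a pipe-separated OSD tree table on stdin and print
# the names of OSDs whose STATUS column equals the first argument.
# Assumes the eight-column layout shown in this page; the function
# name is hypothetical.
osds_with_status() {
    awk -F'|' -v want="$1" '
        NF >= 9 {
            # With "|" as separator, $6 is NAME and $7 is STATUS.
            name = $6; status = $7
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (name ~ /^osd\./ && status == want) print name
        }
    '
}
```

Run against the table above, osds_with_status down would print osd.1.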
Begin the deploy on storage-0.
~(keystone_admin)]$ software deploy host storage-0
The deploy is complete when the node comes online, and at that point, you can safely unlock the node.
After upgrading a storage node, but before unlocking it, there are Ceph synchronization alarms (which appear to be making progress in syncing) and infrastructure network interface alarms (because the infrastructure network interface configuration is not applied to the storage node until it is unlocked).
Unlock the node as soon as the deployed storage node comes online.
Unlock storage-0.
~(keystone_admin)]$ system host-unlock storage-0
Wait for all alarms to clear after the unlock before proceeding to deploy the next storage host.
Repeat the above steps for each storage host.
Note
After deploying the first storage node you can expect alarm 800.003. The alarm is cleared after all storage nodes are deployed.
If worker nodes are present, deploy the worker hosts, serially or in parallel.
Lock worker-0.
~(keystone_admin)]$ system host-lock worker-0
Deploy worker-0.
~(keystone_admin)]$ software deploy host worker-0
Wait for the host to run the installer, reboot, and go online before unlocking it in the next step.
Unlock worker-0.
~(keystone_admin)]$ system host-unlock worker-0
Wait for all alarms to clear after the unlock before proceeding to the next worker host.
Repeat the above steps for each worker host.
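The per-worker sequence above can be sketched as a dry-run helper that only prints the commands for each host in serial order, so that the plan can be reviewed before anything is run. The function name worker_deploy_plan is hypothetical. The waits described above (host online before unlock, alarms cleared before the next host) are not automated here and remain manual checks.

```shell
# Sketch only: print the lock / deploy / unlock commands for each worker
# host given as an argument, one host at a time. Nothing is executed.
# The function name is hypothetical.
worker_deploy_plan() {
    for host in "$@"; do
        echo "system host-lock $host"
        echo "software deploy host $host"
        # Wait for the host to run the installer, reboot, and go online
        # before running the unlock printed below.
        echo "system host-unlock $host"
    done
}
```

For example, worker_deploy_plan worker-0 worker-1 prints six commands, three per host, in the order they would be run serially.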
Set controller-0 as the active controller. Swact away from controller-1.
~(keystone_admin)]$ system host-swact controller-1
Wait until services have gone active on the active controller-0 before proceeding to the next step. When all services on controller-0 are enabled-active, the swact is complete.
Activate the deploy.
~(keystone_admin)]$ software deploy activate
Deploy activate has started
Check deploy state:
~(keystone_admin)]$ software deploy show
+--------------+------------+------+-----------------+
| From Release | To Release | RR   | State           |
+--------------+------------+------+-----------------+
| 22.12.0      | 24.09.100  | True | deploy-activate |
+--------------+------------+------+-----------------+
Wait for software deploy activate to complete by monitoring the status of the deploy.
~(keystone_admin)]$ software deploy show
+--------------+------------+------+----------------------+
| From Release | To Release | RR   | State                |
+--------------+------------+------+----------------------+
| 22.12.0      | 24.09.100  | True | deploy-activate-done |
+--------------+------------+------+----------------------+
During the running of the software deploy activate command, new configurations are applied to the controller. 250.001 (hostname Configuration is out-of-date) alarms are raised and are cleared as the configuration is applied. The deploy state goes from deploy-activate to deploy-activate-done once this is done.
The following states apply when this command is executed.
- deploy-activate: State entered when the deploy is being activated.
- deploy-activate-done: State entered when the deploy-activate completes successfully.
Note
This can take more than 15 minutes to complete.
Note
Alarms are generated as the subcloud software sync_status is "out-of-sync".
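Monitoring the deploy state, as in the start and activate steps, can be scripted. The sketch below assumes software deploy show keeps the four-column table layout shown in this page. The function names, the 30-second polling interval, and the default of 60 attempts are all assumptions chosen for illustration.

```shell
# Sketch only: print the State column from "software deploy show"
# table output read on stdin. The function name is hypothetical.
deploy_state() {
    awk -F'|' '
        NF >= 6 {
            # With "|" as separator, $5 is the State cell.
            gsub(/^[ \t]+|[ \t]+$/, "", $5)
            if ($5 != "" && $5 != "State") print $5
        }
    '
}

# Sketch only: poll until the deploy reaches the wanted state ($1).
# Gives up after $2 attempts (default 60, an assumed value) and returns 1.
wait_for_deploy_state() {
    wanted="$1"
    tries="${2:-60}"
    while [ "$tries" -gt 0 ]; do
        [ "$(software deploy show | deploy_state)" = "$wanted" ] && return 0
        tries=$((tries - 1))
        sleep 30   # assumed polling interval
    done
    return 1
}
```

For example, wait_for_deploy_state deploy-activate-done would return once the table reports that state. A bounded number of attempts is used so that a stalled deploy does not block the shell indefinitely.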
Complete the upgrade.
~(keystone_admin)]$ software deploy complete
Deployment has been completed
Verify deploy state:
~(keystone_admin)]$ software deploy show
+--------------+------------+------+-----------------------+
| From Release | To Release | RR   | State                 |
+--------------+------------+------+-----------------------+
| 22.12.0      | 24.09.100  | True | deploy-completed      |
+--------------+------------+------+-----------------------+
Upgrade Kubernetes after the platform deploy is completed. To upgrade Kubernetes on a standalone system, see index-updates-kub-03d4d10fa0be.
When the Kubernetes upgrade completes, conclude the platform deploy by deleting it.
~(keystone_admin)]$ software deploy delete
Deploy deleted with success
Verify deploy state:
~(keystone_admin)]$ software deploy show
No deploy in progress
Upload the load for subcloud deployment.
~(keystone_admin)]$ software --os-region-name SystemController upload --local /full_path/<bootimage>.iso /full_path/<bootimage>.sig
+-------------------------------+--------------------------+
| Uploaded File                 | Release                  |
+-------------------------------+--------------------------+
| starlingx-intel-x86-64-cd.iso | stx-10.0.0               |
+-------------------------------+--------------------------+
Note
This can take a few minutes. After the system controller is successfully deployed, the old load (which is in the imported state) should not be deleted from the load list, because this load is required for managing the subclouds that are still running the previous load.
Separately apply the patches after the upgrade to the major release.