.. Greg updates required for -High Security Vulnerability Document Updates

.. vco1593176327490

.. _upgrading-the-systemcontroller-using-the-cli:

===========================================
Upgrade the System Controller Using the CLI
===========================================
You can upload and apply upgrades to the system controller from the CLI in
order to upgrade the central repository. The system controller can be upgraded
using either a manual software upgrade procedure or the non-distributed
systems :command:`sw-manager` orchestration procedure.

.. rubric:: |context|

Follow the steps below to manually upgrade the system controller:

.. rubric:: |prereq|

-  Validate the list of new images against the target release. If you are
   using a private registry for installs/upgrades, you must populate your
   private registry with the new images prior to bootstrap and/or patch
   application.

.. rubric:: |proc|
.. _upgrading-the-systemcontroller-using-the-cli-steps-oq4-dgm-cmb:

#. Source the platform environment.

   .. code-block:: none

      $ source /etc/platform/openrc
      ~(keystone_admin)]$

   .. only:: partner

      .. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
         :start-after: license-begin
         :end-before: license-end
#. Transfer the ISO and signature files to controller-0 (the active
   controller) and import the load.

   .. code-block:: none

      ~(keystone_admin)]$ software --os-region-name SystemController upload --local <bootimage>.iso <bootimage>.sig
      +-------------------------------+-------------------+
      | Uploaded File                 | Release           |
      +-------------------------------+-------------------+
      | starlingx-intel-x86-64-cd.iso | starlingx-24.09.0 |
      +-------------------------------+-------------------+

   .. note::

      If you encounter any issues while importing the load, examine the error
      messages in ``/var/log/software.log``.

   .. note::

      This can take several minutes. After the system controller is
      successfully upgraded, the old load (which is in the imported state)
      must not be deleted from the load list; otherwise, subcloud upgrade
      orchestration will fail with an error.
#. Apply any required software updates. After the updates are installed,
   ensure controller-0 is the active controller.

   The system controller as well as the subclouds must be 'patch current'.
   All software updates related to your current |prod| software release must
   be uploaded, applied, and installed.

   Software updates for the new |prod| release only need to be uploaded and
   applied. These software updates are installed automatically during the
   software upgrade procedure as the hosts are reset to load the new release
   of software.

   .. only:: partner

      .. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
         :start-after: wrsbegin
         :end-before: wrsend
#. Confirm that the system is healthy.

   Check the current system health status, resolve any alarms and other
   issues reported by the :command:`software deploy precheck <release-id>`
   command, then recheck the system health status to confirm that all
   **System Health** fields are set to **OK**. For example:

   .. code-block:: none

      ~(keystone_admin)]$ software deploy precheck <release-id>
      System Health:
      All hosts are provisioned: [OK]
      All hosts are unlocked/enabled: [OK]
      All hosts have current configurations: [OK]
      Ceph Storage Healthy: [OK]
      No alarms: [OK]
      All kubernetes nodes are ready: [OK]
      All kubernetes control plane pods are ready: [OK]
      All kubernetes applications are in a valid state: [OK]
      All hosts are patch current: [OK]
      Valid upgrade path from release 22.12 to 24.09: [OK]
      Required patches are applied: [OK]

   Where ``<release-id>`` is ``starlingx-24.09.0`` for the software upload
   example above; it can also be found by running :command:`software list`.

   The platform issuer (system-local-ca) must have an RSA
   certificate/private key pair before upgrading. If ``system-local-ca`` was
   configured with a different type of certificate/private key, the upgrade
   precheck will fail with an informative message. In this case, execute the
   :ref:`migrate-platform-certificates-to-use-cert-manager-c0b1727e4e5d`
   procedure to reconfigure ``system-local-ca`` with an RSA
   certificate/private key targeting the ``SystemController`` and all
   subclouds.
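   You can check the key type ahead of time with :command:`openssl`. The
   snippet below is an illustrative sketch only, not part of the product
   tooling: it generates a throwaway key to show how openssl reports an RSA
   key; on a live system you would inspect the actual ``system-local-ca``
   private key instead.

   ```shell
   # Illustration only: distinguishing an RSA private key with openssl.
   # A throwaway 2048-bit key is generated here so the snippet is
   # self-contained; inspect the real system-local-ca key on a live system.
   key=$(openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 2>/dev/null)
   # For an RSA key, the first line of the text dump names the key size,
   # e.g. "Private-Key: (2048 bit, 2 primes)".
   printf '%s\n' "$key" | openssl pkey -noout -text | head -n 1
   ```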
   By default, the upgrade process cannot run, and is not recommended to run,
   with active alarms present. It is strongly recommended that you clear all
   alarms from your system before starting an upgrade.

   .. note::

      Use the :command:`system upgrade-start --force` command to force the
      upgrade process to start and ignore non-management-affecting alarms.
      Do this ONLY if these alarms do not cause an issue for the upgrade
      process.
#. Start the upgrade from controller-0.

   Make sure that controller-0 is the active controller, that you are logged
   into controller-0 as **sysadmin**, and that your present working directory
   is your home directory.

   .. code-block:: none

      ~(keystone_admin)]$ software deploy start <release-id>
      +--------------+------------+------+--------------+
      | From Release | To Release | RR   | State        |
      +--------------+------------+------+--------------+
      | 22.12.0      | 24.09.0    | True | deploy-start |
      +--------------+------------+------+--------------+

   When ``deploy start`` is complete:

   .. code-block:: none

      +--------------+------------+------+-------------------+
      | From Release | To Release | RR   | State             |
      +--------------+------------+------+-------------------+
      | 22.12.0      | 24.09.0    | True | deploy-start-done |
      +--------------+------------+------+-------------------+

   This makes a copy of the system data to be used in the upgrade.
   Configuration changes must not be made after this point, until the
   upgrade is completed.

   The following upgrade state applies once this command is executed. Run
   the :command:`system upgrade-show` command to verify the status of the
   upgrade.

   -  started:

      -  State entered after :command:`system upgrade-start` completes.

      -  Release <nn.nn> system data (for example, postgres databases) has
         been exported to be used in the upgrade.

   As part of the upgrade, the upgrade process checks the health of the
   system and validates that the system is ready for an upgrade.

   The upgrade process checks that no alarms are active before starting an
   upgrade.

   .. note::

      Use the :command:`system upgrade-start --force` command to force the
      upgrade process to start and ignore management-affecting alarms. Do
      this only if these alarms do not cause an issue for the upgrade
      process.

      The ``fm alarm-list --mgmt_affecting`` option lists the specific
      alarms that may be blocking an orchestrated upgrade.

   On systems with Ceph storage, the upgrade process also checks that the
   Ceph cluster is healthy.
#. Upgrade controller-1.

   #. Lock controller-1.

      .. code-block:: none

         ~(keystone_admin)]$ system host-lock controller-1

   #. Start the upgrade on controller-1.

      Controller-1 installs the update and reboots, then performs data
      migration.

      .. code-block:: none

         ~(keystone_admin)]$ software deploy host controller-1
         Running major release deployment, major_release=24.09, force=False, async_req=False, commit_id=<commit-id>
         Host installation was successful on controller-1

   #. Unlock controller-1.

      .. code-block:: none

         ~(keystone_admin)]$ system host-unlock controller-1

      Wait for controller-1 to enter the ``unlocked-enabled`` state. Wait
      until the DRBD sync **400.001** Services-related alarm has been raised
      and then cleared.

      The **upgrading-controllers** state applies when this command is run.
      This state is entered after controller-1 has been upgraded to release
      nn.nn and data migration has completed successfully, where *nn.nn* in
      the update file name is the |prod| release number.

      If controller-1 transitions to **unlocked-disabled-failed**,
      investigate the issue before proceeding to the next step. The alarms
      may indicate a configuration error. Check the configuration logs on
      controller-1 (for example, the error logs in
      controller-1:``/var/log/puppet``).

   #. Run the :command:`system application-list` and
      :command:`software deploy host-list` commands to view the current
      progress.

      After controller-1 is unlocked/enabled/available, check that
      controller-1 is running the new release:

      .. code-block:: none

         ~(keystone_admin)]$ system host-show controller-1
#. Set controller-1 as the active controller. Swact to controller-1.

   .. code-block:: none

      ~(keystone_admin)]$ system host-swact controller-0

   Wait until services have gone active on the newly active controller-1
   before proceeding to the next step. When all services on controller-1 are
   enabled-active, the swact is complete.

   .. note::

      Continue with the remaining steps below to upgrade the remaining nodes
      manually, or use upgrade orchestration instead.
#. Upgrade controller-0.

   For more information, see
   :ref:`introduction-platform-software-updates-upgrades-06d6de90bbd0`.

   #. Lock controller-0.

      .. code-block:: none

         ~(keystone_admin)]$ system host-lock controller-0

   #. Upgrade controller-0.

      .. code-block:: none

         ~(keystone_admin)]$ software deploy host controller-0

      .. note::

         controller-0 must PXE boot over the management network, and its
         load must be served from controller-1, not from any external PXE
         boot server attached to the |OAM| network. To ensure this, check
         that the network boot list/order in the BIOS |NIC| settings is
         correct.

   #. Unlock controller-0.

      .. code-block:: none

         ~(keystone_admin)]$ system host-unlock controller-0

      You may encounter the following error message:

      .. code-block:: none

         Expecting number of interface sriov_numvfs=16. Please wait a few
         minutes for inventory update and retry host-unlock.

      If you see this error message, retry after 5 minutes.

      Wait until the DRBD sync **400.001** Services-related alarm has been
      raised and then cleared before proceeding to the next step.
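      The "retry after 5 minutes" advice above amounts to a simple retry
      loop. The sketch below is illustrative only: the ``system`` CLI is
      stubbed with a shell function (succeeding on the third attempt) so the
      control flow can be read and run standalone; on a real controller you
      would call the actual CLI and wait 5 minutes between attempts.

      ```shell
      # Sketch only: retry host-unlock until it succeeds. The `system` CLI
      # is stubbed here so the loop is runnable standalone; replace the
      # stub with the real CLI on an actual controller.
      attempts=0
      system() {                    # stub for: system host-unlock controller-0
          attempts=$((attempts + 1))
          [ "$attempts" -ge 3 ]     # pretend the inventory updates by attempt 3
      }
      until system host-unlock controller-0; do
          sleep 0                   # on a real system: sleep 300  (5 minutes)
      done
      echo "host-unlock succeeded after $attempts attempts"
      ```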
   -  upgrading-hosts:

      -  State entered when both controllers are running release <nn.nn>
         software.
#. Check the system health to ensure that there are no unexpected alarms.

   .. code-block:: none

      ~(keystone_admin)]$ fm alarm-list

   Clear all alarms unrelated to the upgrade process.
#. If using the Ceph storage backend, upgrade the storage nodes one at a
   time.

   The storage node must be locked and all |OSDs| must be down in order to
   do the upgrade.

   #. Lock storage-0.

      .. code-block:: none

         ~(keystone_admin)]$ system host-lock storage-0

   #. Verify that the |OSDs| are down after the storage node is locked.

      In the Horizon interface, navigate to **Admin** \> **Platform** \>
      **Storage Overview** to view the status of the |OSDs|.

   #. Upgrade storage-0.

      .. code-block:: none

         ~(keystone_admin)]$ software deploy host storage-0

      The upgrade is complete when the node comes online; at that point you
      can safely unlock the node.

      After upgrading a storage node, but before unlocking, expect Ceph
      synchronization alarms (which appear to be making progress in syncing)
      and infrastructure network interface alarms (because the
      infrastructure network interface configuration has not yet been
      applied to the storage node, as it has not been unlocked).

      Unlock the node as soon as the upgraded storage node comes online.

   #. Unlock storage-0.

      .. code-block:: none

         ~(keystone_admin)]$ system host-unlock storage-0

      Wait for all alarms to clear after the unlock before proceeding to
      upgrade the next storage host.

   #. Repeat the above steps for each storage host.

      .. note::

         After upgrading the first storage node, you can expect alarm
         **800.003**. The alarm is cleared after all storage nodes are
         upgraded.
#. If worker nodes are present, upgrade them, serially or in parallel.

   #. Lock worker-0.

      .. code-block:: none

         ~(keystone_admin)]$ system host-lock worker-0

   #. Upgrade worker-0.

      .. code-block:: none

         ~(keystone_admin)]$ software deploy host worker-0

      Wait for the host to run the installer, reboot, and go online before
      unlocking it in the next step.

   #. Unlock worker-0.

      .. code-block:: none

         ~(keystone_admin)]$ system host-unlock worker-0

      Wait for all alarms to clear after the unlock before proceeding to the
      next worker host.

   #. Repeat the above steps for each worker host.
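   The lock/upgrade/unlock cycle above can be sketched as a loop over the
   worker hosts. This is an illustrative outline only: the ``system`` and
   ``software`` CLIs are stubbed as shell functions so the sequence is
   runnable standalone, the wait-for-online and wait-for-alarms steps are
   reduced to comments, and the host names are assumptions.

   ```shell
   # Outline only: serially upgrade each worker host. The StarlingX CLIs
   # are stubbed with echo so the sequence can be read (and run) standalone;
   # the polling steps (host online, alarms clear) are elided as comments.
   system()   { echo "system $*"; }
   software() { echo "software $*"; }

   for host in worker-0 worker-1; do
       system host-lock "$host"
       software deploy host "$host"
       # ...wait for the host to install, reboot, and go online...
       system host-unlock "$host"
       # ...wait for all alarms to clear before the next host...
   done
   ```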
#. Set controller-0 as the active controller. Swact to controller-0.

   .. code-block:: none

      ~(keystone_admin)]$ system host-swact controller-1

   Wait until services have gone active on the active controller-0 before
   proceeding to the next step. When all services on controller-0 are
   enabled-active, the swact is complete.
#. Activate the upgrade.

   .. code-block:: none

      ~(keystone_admin)]$ software deploy activate
      Deploy activate has started

   Check the deploy state:

   .. code-block:: none

      ~(keystone_admin)]$ software deploy show
      +--------------+------------+------+-----------------+
      | From Release | To Release | RR   | State           |
      +--------------+------------+------+-----------------+
      | 22.12.0      | 24.09.0    | True | deploy-activate |
      +--------------+------------+------+-----------------+

   When activation is complete:

   .. code-block:: none

      +--------------+------------+------+----------------------+
      | From Release | To Release | RR   | State                |
      +--------------+------------+------+----------------------+
      | 22.12.0      | 24.09.0    | True | deploy-activate-done |
      +--------------+------------+------+----------------------+

   While the :command:`upgrade-activate` command runs, new configurations
   are applied to the controller. 250.001 (**hostname Configuration is
   out-of-date**) alarms are raised and are cleared as the configuration is
   applied. The upgrade state goes from **activating** to
   **activation-complete** once this is done.

   .. only:: partner

      .. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
         :start-after: deploymentmanager-begin
         :end-before: deploymentmanager-end

   The following states apply when this command is executed.

   **activation-requested**
      State entered when :command:`system upgrade-activate` is executed.

   **activating**
      State entered when activation of the upgrade has started, by applying
      new configurations to the controller and compute hosts.

   **activating-hosts**
      State entered when applying host-specific configurations. This state
      is entered only if needed.

   **activation-complete**
      State entered when new configurations have been applied to all
      controller and compute hosts.
#. Check the status of the upgrade again to verify that it has reached
   **activation-complete**. For example:

   .. code-block:: none

      ~(keystone_admin)]$ system upgrade-show
      +--------------+--------------------------------------+
      | Property     | Value                                |
      +--------------+--------------------------------------+
      | uuid         | 61e5fcd7-a38d-40b0-ab83-8be55b87fee2 |
      | state        | activation-complete                  |
      | from_release | nn.nn                                |
      | to_release   | nn.nn                                |
      +--------------+--------------------------------------+

   .. note::

      This can take more than half an hour to complete.

   .. note::

      Alarms are generated while the subcloud load sync_status is
      "out-of-sync".
#. Complete the upgrade.

   .. code-block:: none

      ~(keystone_admin)]$ software deploy complete
      Deployment has been completed

   Verify the deploy state:

   .. code-block:: none

      ~(keystone_admin)]$ software deploy show
      +--------------+------------+------+------------------+
      | From Release | To Release | RR   | State            |
      +--------------+------------+------+------------------+
      | 22.12.0      | 24.09.0    | True | deploy-completed |
      +--------------+------------+------+------------------+

   Run the :command:`system upgrade-show` command; the status will display
   "no upgrade in progress". The subclouds will be out-of-sync.
#. Upgrade Kubernetes after the deploy is completed. When the Kubernetes
   upgrade completes, conclude the deploy by deleting it.

   .. code-block:: none

      ~(keystone_admin)]$ software deploy delete
      Deploy deleted with success

   Verify the deploy state:

   .. code-block:: none

      ~(keystone_admin)]$ software deploy show
      No deploy in progress

.. only:: partner

   .. include:: /_includes/upgrading-the-systemcontroller-using-the-cli.rest
      :start-after: DMupgrades-begin
      :end-before: DMupgrades-end