From 5d0c0ecb11bdf1e3ab0ba1222dc9474df0deffbb Mon Sep 17 00:00:00 2001
From: Suzana Fernandes
Date: Wed, 9 Oct 2024 19:17:20 +0000
Subject: [PATCH] Update Cloud shutdown procedure (dsr8mr3, dsr8mr2+)

Closes-bug: 2084146
Change-Id: I9174b1afdf36f6d49c98e05cfafa10036a72968d
Signed-off-by: Suzana Fernandes
---
 .../shutting-down-starlingx.rst | 136 +++++++++++++++---
 .../starting-starlingx.rst      |  25 +++-
 2 files changed, 137 insertions(+), 24 deletions(-)

diff --git a/doc/source/node_management/kubernetes/node_inventory_tasks/shutting-down-starlingx.rst b/doc/source/node_management/kubernetes/node_inventory_tasks/shutting-down-starlingx.rst
index b3414272f..fa1037a24 100644
--- a/doc/source/node_management/kubernetes/node_inventory_tasks/shutting-down-starlingx.rst
+++ b/doc/source/node_management/kubernetes/node_inventory_tasks/shutting-down-starlingx.rst
@@ -13,11 +13,23 @@ hardware.

For information on restarting the cluster, see :ref:`Start the System `.

+.. note::
+   There are two applicable cases for shutting down the system:
+
+   * System with dedicated storage nodes
+   * System with controller storage
+
+   Determine which case applies to your system before you begin, so that you
+   follow the correct shutdown procedure and avoid errors.
+
+Shut Down the System with Dedicated Storage Nodes
+-------------------------------------------------
+
.. rubric:: |prereq|

On a system that contains storage nodes, a local console or a |BMC| console
-connected to **storage-0** is required so that you can issue a shutdown
-command in the final step of this procedure.
+connected to the storage node that runs a Ceph monitor (**storage-0**) is
+required so that you can issue a shutdown command in the final step of this
+procedure.

.. rubric:: |proc|

@@ -36,33 +48,43 @@ command in the final step of this procedure.

   .. code-block:: none

-      # sudo shutdown -hP now
+      $ sudo shutdown -hP now

   Wait until the node is completely shut down before proceeding to the
   next step.

-#. Lock and shut down each storage node except for **storage-0**.
+#. Lock and shut down each storage node except for the storage node that
+   runs a Ceph monitor (**storage-0**).

-   **Storage-0** is required as part of the Ceph monitor quorum. Do not
-   shut it down until the controllers have been shut down.
+   Use the following commands in a terminal on the active controller to
+   check which storage node runs a Ceph monitor:
+
+   .. code-block:: none
+
+      $ source /etc/platform/openrc
+      $ system ceph-mon-list
+
+   The storage node that runs a Ceph monitor (**storage-0**) is required as
+   part of the Ceph monitor quorum. Do not shut it down until the
+   controllers have been shut down.

   .. note::
-      This step applies to Ceph-backed systems
-      (systems with storage nodes) only.
+      This step applies to Ceph-backed systems
+      (systems with dedicated storage nodes) only.

   #. From the **Admin** \> **Platform** \> **Host Inventory** page, on the
      Hosts tab, select **Edit Host** \> **Lock Host**.

-   #. From the terminal of **storage-1**, issue a :command:`shutdown`
+   #. From the terminal of the storage node, issue a :command:`shutdown`
      command.

      .. code-block:: none

-         # sudo shutdown -hP now
+         $ sudo shutdown -hP now

      Wait for several minutes to ensure Ceph has detected and reacted to the
      missing storage node. You can use :command:`ceph -s` to verify that the
-     OSDs on storage-1 are down.
+     OSDs on the storage node are down.

#. Lock and shut down **controller-1**.

@@ -74,7 +96,7 @@ command in the final step of this procedure.

      .. code-block:: none

-         # sudo shutdown -hP now
+         $ sudo shutdown -hP now

      Wait until the node is completely shut down before proceeding to the
      next step.
@@ -86,20 +108,100 @@ command in the final step of this procedure.

   .. code-block:: none

-      # sudo shutdown -hP now
+      $ sudo shutdown -hP now

   Wait until the node is completely shut down before proceeding to the
   next step.

-#. Shut down **storage-0**.
+#. Shut down the storage node that runs a Ceph monitor (**storage-0**).

   .. note::
-      This step applies to Ceph-backed systems (systems with storage nodes)
-      only.
+      This step applies to Ceph-backed systems (systems with dedicated
+      storage nodes) only.

   You must use a local console or a |BMC| console to issue the shutdown
   command.

   .. code-block:: none

-      # sudo shutdown -hP now
+      $ sudo shutdown -hP now
+
+Shut Down the System with Controller Storage
+--------------------------------------------
+
+.. rubric:: |proc|
+
+#. Swact to **controller-0**.
+
+   From the **Admin** \> **Platform** \> **Host Inventory** page, on the
+   Hosts tab, select **Edit Host** \> **Swact Host** for **controller-0**.
+
+#. Lock and shut down each worker node except for the worker node that runs
+   a Ceph monitor.
+
+   Use the following commands in a terminal on the active controller to
+   check which worker node runs a Ceph monitor:
+
+   .. code-block:: none
+
+      $ source /etc/platform/openrc
+      $ system ceph-mon-list
+
+   The worker node that runs a Ceph monitor is required as part of the Ceph
+   monitor quorum. Do not shut it down until **controller-1** has been shut
+   down.
+
+   .. note::
+      This step applies to Ceph-backed systems
+      (systems with controller storage) only.
+
+   #. From the **Admin** \> **Platform** \> **Host Inventory** page, on the
+      Hosts tab, select **Edit Host** \> **Lock Host**.
+
+   #. From the terminal of the worker node, issue a :command:`shutdown`
+      command.
+
+      .. code-block:: none
+
+         $ sudo shutdown -hP now
+
+      Wait until the node is completely shut down before proceeding to the
+      next step.
+
+#. Lock and shut down **controller-1**.
+
+   #. From the **Admin** \> **Platform** \> **Host Inventory** page, on the
+      Hosts tab, select **Edit Host** \> **Lock Host**.
+
+   #. From the terminal of **controller-1**, issue a :command:`shutdown`
+      command.
+
+      .. code-block:: none
+
+         $ sudo shutdown -hP now
+
+      Wait until the node is completely shut down before proceeding to the
+      next step.
+
+#. Shut down the worker node that runs a Ceph monitor.
+
+   .. note::
+      This step applies to Ceph-backed systems (systems with controller
+      storage) only.
+
+   You must use a local console or a |BMC| console to issue the shutdown
+   command.
+
+   .. code-block:: none
+
+      $ sudo shutdown -hP now
+
+   Wait until the node is completely shut down before proceeding to the
+   next step.
+
+#. Shut down **controller-0**.
+
+   You cannot lock this controller node, as it is the last remaining
+   controller node.
+
+   .. code-block:: none
+
+      $ sudo shutdown -hP now
diff --git a/doc/source/node_management/kubernetes/node_inventory_tasks/starting-starlingx.rst b/doc/source/node_management/kubernetes/node_inventory_tasks/starting-starlingx.rst
index b8f78ef96..3dcd53d51 100644
--- a/doc/source/node_management/kubernetes/node_inventory_tasks/starting-starlingx.rst
+++ b/doc/source/node_management/kubernetes/node_inventory_tasks/starting-starlingx.rst
@@ -34,11 +34,11 @@ shut down and physically moved.

   #. Use the :command:`system host-list` command to ensure that the host is
      fully booted before proceeding.

-#. Power on **storage-0**.
+#. Power on the storage node that runs a Ceph monitor (**storage-0**).

   .. note::
-      This step applies to Ceph-backed systems (systems with storage
-      nodes) only.
+      This step applies to Ceph-backed systems (systems with dedicated
+      storage nodes) only.

   #. Apply power to the system.

@@ -47,11 +47,11 @@ shut down and physically moved.

   #. Use the :command:`system host-list` command to ensure that the host is
      fully booted before proceeding.

-#. Power on and unlock **storage-1**.
+#. Power on and unlock the remaining storage nodes.

   .. note::
-      This step applies to Ceph-backed systems (systems with storage nodes)
-      only.
+      This step applies to Ceph-backed systems (systems with dedicated
+      storage nodes) only.

   #. Apply power to the system.

@@ -62,7 +62,18 @@ shut down and physically moved.

   #. Use the :command:`system host-list` command to ensure that the host is
      fully booted before proceeding.

-#. Power on and unlock each worker node.
+#. Power on the worker node that runs a Ceph monitor.
+
+   .. note::
+      This step applies to Ceph-backed systems (systems with controller
+      storage) only.
+
+   #. Apply power to the system.
+
+   #. Use the :command:`system host-list` command to ensure that the
+      host is fully booted before proceeding.
+
+#. Power on and unlock the remaining worker nodes.

   #. Follow the instructions for the node's hardware to power it up.
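
For the dedicated storage case above, the Horizon lock step has a CLI
equivalent, and the Ceph checks can be run from the same active-controller
session. The following is a minimal sketch of that sequence, not part of the
patch; ``storage-1`` is only an example hostname, so substitute a node that
does not appear in the :command:`system ceph-mon-list` output.

.. code-block:: none

   # Load platform credentials on the active controller.
   $ source /etc/platform/openrc

   # Identify which hosts run a Ceph monitor; leave those for last.
   $ system ceph-mon-list

   # Lock a storage node that does not run a monitor (example hostname).
   $ system host-lock storage-1

   # After issuing "sudo shutdown -hP now" on that node, confirm that Ceph
   # has detected the missing OSDs before moving to the next node.
   $ ceph -s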
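
For the controller storage case, the swact in the first step can also be done
from the CLI. A rough sketch, assuming **controller-1** is currently the
active controller and services should move to **controller-0**:

.. code-block:: none

   # Run from the currently active controller (assumed to be controller-1).
   $ source /etc/platform/openrc

   # Move active services away from controller-1 so that controller-0
   # becomes the active controller.
   $ system host-swact controller-1

   # Confirm which hosts run a Ceph monitor before locking any worker node.
   $ system ceph-mon-list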
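
On the startup side, the per-node checks in ``starting-starlingx.rst`` map to
the following CLI sketch; ``storage-1`` is again an example hostname for a
node that was locked during the shutdown procedure.

.. code-block:: none

   $ source /etc/platform/openrc

   # Confirm the powered-on host has finished booting before unlocking it.
   $ system host-list

   # Unlock a node that was locked during the shutdown procedure.
   $ system host-unlock storage-1

   # Once all nodes are back, verify that Ceph reports a healthy cluster.
   $ ceph -s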