Update Cloud shutdown procedure (dsr8mr3, dsr8mr2+)

Closes-bug: 2084146

Change-Id: I9174b1afdf36f6d49c98e05cfafa10036a72968d
Signed-off-by: Suzana Fernandes <Suzana.Fernandes@windriver.com>
Suzana Fernandes 2024-10-09 19:17:20 +00:00 committed by Juanita-Balaraj
parent 7dc8f326fe
commit 5d0c0ecb11
2 changed files with 137 additions and 24 deletions


@@ -13,11 +13,23 @@ hardware.
For information on restarting the cluster, see :ref:`Start the System
<starting-starlingx>`.
+.. note::
+
+   There are two applicable cases for the shutdown:
+
+   * System with dedicated storage nodes
+   * System with controller storage
+
+   Verify which of the two cases applies before performing the shutdown
+   procedure, to avoid errors.
+Shut Down the System with Dedicated Storage Nodes
+-------------------------------------------------
.. rubric:: |prereq|
On a system that contains storage nodes, a local console or a |BMC| console
-connected to **storage-0** is required so that you can issue a shutdown
-command in the final step of this procedure.
+connected to the storage node that hosts a Ceph monitor (**storage-0**) is
+required so that you can issue a shutdown command in the final step of this
+procedure.
.. rubric:: |proc|
@@ -36,33 +48,43 @@ command in the final step of this procedure.
.. code-block:: none
-   # sudo shutdown -hP now
+   $ sudo shutdown -hP now
Wait until the node is completely shut down before proceeding to
the next step.
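For reference, completion of the shutdown can be confirmed from the active
controller by watching the node's availability state; a minimal sketch, with
a hypothetical node name:

.. code-block:: none

   # Poll until the node reports offline (worker-0 is illustrative)
   $ source /etc/platform/openrc
   $ watch -n 10 "system host-list | grep worker-0"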
-#. Lock and shut down each storage node except for **storage-0**.
+#. Lock and shut down each storage node except for the storage node that
+   hosts a Ceph monitor (**storage-0**).
-   **Storage-0** is required as part of the Ceph monitor quorum. Do not
-   shut it down until the controllers have been shut down.
+Use the following commands in the active controller terminal to check
+which storage node hosts the monitor:
+
+.. code-block:: none
+
+   $ source /etc/platform/openrc
+   $ system ceph-mon-list
+
+The storage node with the monitor (**storage-0**) is required as part of
+the Ceph monitor quorum. Do not shut it down until the controllers have
+been shut down.
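If the monitor host needs to be captured for scripting, the node name can be
filtered out of the table output; a minimal sketch, assuming node names
follow the standard ``<personality>-<N>`` pattern:

.. code-block:: none

   # Extract the monitor host names from the ceph-mon-list table
   $ system ceph-mon-list | grep -oE '(controller|storage|worker)-[0-9]+'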
.. note::
-   This step applies to Ceph-backed systems
-   (systems with storage nodes) only.
+   This step applies to Ceph-backed systems
+   (systems with dedicated storage nodes) only.
#. From the **Admin** \> **Platform** \> **Host Inventory** page, on the
Hosts tab, select **Edit Host** \> **Lock Host**.
-#. From the terminal of **storage-1**, issue a :command:`shutdown`
+#. From the terminal of the storage node, issue a :command:`shutdown`
command.
.. code-block:: none
-   # sudo shutdown -hP now
+   $ sudo shutdown -hP now
Wait for several minutes to ensure Ceph has detected and reacted to the
missing storage node. You can use :command:`ceph -s` to verify that the
-OSDs on storage-1 are down.
+OSDs on the storage node are down.
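Both the cluster summary and the per-OSD tree can be used for this check; a
minimal sketch, assuming the Ceph CLI is available on the active controller:

.. code-block:: none

   # Cluster health summary, including the count of OSDs that are up
   $ ceph -s
   # Per-OSD up/down status, grouped by host
   $ ceph osd tree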
#. Lock and shut down **controller-1**.
@@ -74,7 +96,7 @@ command in the final step of this procedure.
.. code-block:: none
-   # sudo shutdown -hP now
+   $ sudo shutdown -hP now
Wait until the node is completely shut down before proceeding to the
next step.
@@ -86,20 +108,100 @@ command in the final step of this procedure.
.. code-block:: none
-   # sudo shutdown -hP now
+   $ sudo shutdown -hP now
Wait until the node is completely shut down before proceeding to the
next step.
-#. Shut down **storage-0**.
+#. Shut down the storage node that hosts the Ceph monitor (**storage-0**).
.. note::
-   This step applies to Ceph-backed systems (systems with storage nodes)
-   only.
+   This step applies to Ceph-backed systems (systems with dedicated
+   storage nodes) only.
You must use a local console or a |BMC| console to issue the shutdown
command.
.. code-block:: none
-   # sudo shutdown -hP now
+   $ sudo shutdown -hP now
+Shut Down the System with Controller Storage
+--------------------------------------------
+
+.. rubric:: |proc|
+
+#. Swact to **controller-0**.
+
+From the **Admin** \> **Platform** \> **Host Inventory** page, on the
+Hosts tab, select **Edit Host** \> **Swact Host** for **controller-0**.
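The swact can also be performed from the CLI; a minimal sketch, assuming
**controller-1** is currently the active controller:

.. code-block:: none

   # Swact services away from the active controller so controller-0 becomes active
   $ source /etc/platform/openrc
   $ system host-swact controller-1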
+#. Lock and shut down each worker node except for the worker node that
+   hosts a Ceph monitor.
+
+Use the following commands in the active controller terminal to check
+which worker node hosts the monitor:
+
+.. code-block:: none
+
+   $ source /etc/platform/openrc
+   $ system ceph-mon-list
+
+The worker node with the monitor is required as part of the Ceph monitor
+quorum. Do not shut it down until the controllers have been shut down.
+
+.. note::
+
+   This step applies to Ceph-backed systems
+   (systems with controller storage) only.
+#. From the **Admin** \> **Platform** \> **Host Inventory** page, on the
+   Hosts tab, select **Edit Host** \> **Lock Host**.
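Locking can also be done from the CLI; a minimal sketch, with a hypothetical
node name:

.. code-block:: none

   # Lock the worker node before shutting it down (worker-1 is illustrative)
   $ system host-lock worker-1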
+#. From the terminal of the worker node, issue a :command:`shutdown`
+   command.
+
+.. code-block:: none
+
+   $ sudo shutdown -hP now
+
+Wait until the node is completely shut down before proceeding to the
+next step.
+#. Lock and shut down **controller-1**.
+
+#. From the **Admin** \> **Platform** \> **Host Inventory** page, on the
+   Hosts tab, select **Edit Host** \> **Lock Host**.
+
+#. From the terminal of **controller-1**, issue a :command:`shutdown`
+   command.
+
+.. code-block:: none
+
+   $ sudo shutdown -hP now
+
+Wait until the node is completely shut down before proceeding to the
+next step.
+#. Shut down the worker node that hosts the Ceph monitor.
+
+.. note::
+
+   This step applies to Ceph-backed systems (systems with controller
+   storage) only.
+
+You must use a local console or a |BMC| console to issue the shutdown
+command.
+
+.. code-block:: none
+
+   $ sudo shutdown -hP now
+
+Wait until the node is completely shut down before proceeding to the
+next step.
+#. Shut down **controller-0**.
+
+You cannot lock this controller node, as it is the last remaining
+controller node.
+
+.. code-block:: none
+
+   $ sudo shutdown -hP now
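Where several worker nodes must be locked and shut down one after another,
the per-node steps can be scripted; a minimal sketch, assuming SSH access
with non-interactive sudo privileges, and with hypothetical node names:

.. code-block:: none

   # Lock each non-monitor worker, then power it off over SSH (names illustrative)
   $ for node in worker-1 worker-2; do
   >     system host-lock $node
   >     ssh sysadmin@$node 'sudo shutdown -hP now'
   > done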


@@ -34,11 +34,11 @@ shut down and physically moved.
#. Use the :command:`system host-list` command to ensure that the host
is fully booted before proceeding.
-#. Power on **storage-0**.
+#. Power on the storage node that hosts the Ceph monitor (**storage-0**).
.. note::
-   This step applies to Ceph-backed systems (systems with storage
-   nodes) only.
+   This step applies to Ceph-backed systems (systems with dedicated
+   storage nodes) only.
#. Apply power to the system.
@@ -47,11 +47,11 @@ shut down and physically moved.
#. Use the :command:`system host-list` command to ensure that the
host is fully booted before proceeding.
-#. Power on and unlock **storage-1**.
+#. Power on and unlock the remaining storage nodes.
.. note::
-   This step applies to Ceph-backed systems (systems with storage nodes)
-   only.
+   This step applies to Ceph-backed systems (systems with dedicated
+   storage nodes) only.
#. Apply power to the system.
@@ -62,7 +62,18 @@ shut down and physically moved.
#. Use the :command:`system host-list` command to ensure that the
host is fully booted before proceeding.
-#. Power on and unlock each worker node.
+#. Power on the worker node that hosts the Ceph monitor.
+.. note::
+
+   This step applies to Ceph-backed systems (systems with controller
+   storage) only.
+
+#. Apply power to the system.
+
+#. Use the :command:`system host-list` command to ensure that the
+   host is fully booted before proceeding.
+#. Power on and unlock the remaining worker nodes.
#. Follow the instructions for the node's hardware to power it up.
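Unlocking can also be done from the CLI once a node has powered on; a
minimal sketch, with a hypothetical node name:

.. code-block:: none

   # Unlock the node, then confirm it reaches unlocked/enabled/available
   $ system host-unlock worker-1
   $ system host-list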