Document how to quiesce and delete a cephstorage node

Adds a chapter in post_deployment describing how to cleanly remove a
cephstorage node, and all the OSDs it hosted, from the Ceph cluster.

Change-Id: I6690bb54e4724d1042ecc3fa812cc390de6ce49d
Giulio Fidente 2016-11-29 22:49:16 +01:00
parent d927b28a10
commit 39b88b1ff4
3 changed files with 60 additions and 2 deletions


@@ -18,8 +18,9 @@ IDs (which represent nodes) to be deleted.
changes to the overcloud.
.. note::
Before deleting a compute node please make sure that the node is quiesced,
see :ref:`quiesce_compute`.
Before deleting a compute node or a cephstorage node, please make sure that
the node is quiesced, see :ref:`quiesce_compute` or
:ref:`quiesce_cephstorage`.
.. note::
A list of nova instance IDs can be obtained with the following command::


@@ -10,6 +10,7 @@ In this chapter you will find advanced management of various |project| areas.
scale_roles
delete_nodes
quiesce_compute
quiesce_cephstorage
vm_snapshot
package_update
upgrade


@@ -0,0 +1,56 @@
.. _quiesce_cephstorage:

Quiescing a CephStorage Node
============================

Quiescing a cephstorage node means informing the Ceph cluster that one or
more OSDs will be permanently removed, so that the node can be shut down
without affecting data availability.
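
Before reweighting anything, it helps to confirm which OSDs the node actually
hosts. The following is a minimal sketch, assuming the ``ceph`` CLI is
configured on the node and that the OSD data directories live under
``/var/lib/ceph/osd``, as the commands in this chapter also assume::

    # Show the CRUSH tree with the hosts and the OSDs they carry
    ceph osd tree

    # List the IDs of the OSDs hosted on this node, one per line
    ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }'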

Take the OSDs out of the cluster
--------------------------------

Before you remove an OSD, you need to take it out of the cluster so that Ceph
can begin rebalancing and copying its data to other OSDs. Running the following
commands on a given cephstorage node will take all data out of the OSDs hosted
on it::

    OSD_IDS=$(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }')
    for OSD_ID in $OSD_IDS; do ceph osd crush reweight osd.$OSD_ID 0.0; done

Ceph will begin rebalancing the cluster by migrating placement groups out of
the OSDs. You can observe this process with the ``ceph`` tool::

    ceph -w

You should see the placement group states change from ``active+clean`` to
``active, some degraded objects`` and finally back to ``active+clean`` when
the migration completes.
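
If you prefer a non-interactive check over watching ``ceph -w``, a rough
sketch like the following polls the cluster until no placement groups are
reported as degraded, misplaced or backfilling any more; the status strings
grepped for are an assumption and may vary between Ceph releases::

    # Run after the reweight has been issued; poll the cluster status
    # until rebalancing appears to be finished
    while ceph status | grep -Eq 'degraded|misplaced|recovering|backfill'; do
        echo "Rebalancing still in progress, waiting..."
        sleep 30
    done
    ceph status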

Removing the OSDs
-----------------

After the rebalancing, the OSDs will still be running. Running the following on
that same cephstorage node will stop all OSDs hosted on it, remove them from
the CRUSH map and the OSD map, and delete their authentication keys::

    OSD_IDS=$(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }')
    for OSD_ID in $OSD_IDS; do
        ceph osd out $OSD_ID
        systemctl stop ceph-osd@$OSD_ID
        ceph osd crush remove osd.$OSD_ID
        ceph auth del osd.$OSD_ID
        ceph osd rm $OSD_ID
    done

.. admonition:: Mitaka
   :class: mitaka

   TripleO/Mitaka supports Ceph Hammer, not Jewel. Hammer does not use systemd
   but SysV init scripts, so on Mitaka the ``systemctl`` command above which
   stops the OSD should be replaced by::

      service ceph stop osd.$OSD_ID
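
Before powering the node off it is worth double-checking that the cluster no
longer references its OSDs. A quick sanity check, again assuming a configured
``ceph`` CLI, could look like this::

    # The removed OSD IDs should no longer appear in the CRUSH tree
    ceph osd tree

    # Overall OSD count and cluster health after the removal
    ceph osd stat
    ceph health detail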

You are now free to reboot or shut down the node (using the Ironic API), or
even remove it from the overcloud altogether by scaling down the overcloud
deployment; see :ref:`delete_nodes`.
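
For example, powering the node off from the undercloud with the Ironic CLI
might look like the following, where ``$NODE_UUID`` is a placeholder for the
Ironic node ID and the exact client command depends on the release in use::

    # $NODE_UUID is hypothetical; find the real one with "ironic node-list"
    ironic node-set-power-state $NODE_UUID off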