From c1c9f5b26e34f0af112af4e6365bbc8e581cd446 Mon Sep 17 00:00:00 2001
From: MCamp859
Date: Tue, 16 Jun 2020 21:44:08 -0400
Subject: [PATCH] Editorial updates on Ceph Cluster Migration guide

Follow on to review 723291.
Text edits to align with format of other STX guides.
Also alphasorted Operations index list.

Change-Id: I7311c696070ab95a11e6396f109d310d74495948
Signed-off-by: MCamp859
Signed-off-by: Martin, Chen
---
 .../operations/ceph_cluster_migration.rst | 545 ++++++++++++------
 doc/source/operations/index.rst           |   2 +-
 2 files changed, 378 insertions(+), 169 deletions(-)

diff --git a/doc/source/operations/ceph_cluster_migration.rst b/doc/source/operations/ceph_cluster_migration.rst
index 892c0786d..bc6fe67de 100644
--- a/doc/source/operations/ceph_cluster_migration.rst
+++ b/doc/source/operations/ceph_cluster_migration.rst
@@ -1,9 +1,9 @@
 ======================
-Ceph cluster migration
+Ceph Cluster Migration
 ======================
 
-This guide contains step by step instructions for migrating StarlingX R3.0
-with standard controller storage Ceph cluster to the containerized Ceph
+This guide contains step-by-step instructions for manually migrating a StarlingX
+deployment with a standard dedicated storage Ceph cluster to a containerized Ceph
 cluster deployed by Rook.
 
 .. contents::
@@ -14,122 +14,123 @@ cluster deployed by Rook.
 Introduction
 ------------
 
-In StarlingX 3.0 or lower versions, Ceph as the backend storage cluster
-solution was deployed directly on the host platform. The intent is that
-starting from 4.0 Ceph cluster will be containerized and managed by Rook,
-for the sake of operation and maintenance efficiency. Therefore, in the
-context of StarlingX upgrade from 3.0 to 4.0, we are here introducing a
-method to migrate the original Ceph cluster deployed by users at 3.0
-provisioning stage to the newly containerized Ceph cluster at 4.0, while
-keeping user data (in OSDs) uncorrupted.
+In early releases of StarlingX, the backend storage cluster solution (Ceph) was
+deployed directly on the host platform. In an upcoming release of StarlingX,
+the Ceph cluster will be containerized and managed by Rook to improve operation
+and maintenance efficiency.
+
+This guide describes a method to migrate a Ceph cluster deployed with an early
+StarlingX release to the newly containerized Ceph cluster of an upcoming
+StarlingX release, while maintaining user data in
+:abbr:`OSDs (Object Store Devices)`.
 
 ---------------------
 Prepare for migration
 ---------------------
 
-StarlingX has some HA mechanisms for critical service monitoring and
-recovering. To migrate Ceph monitor(s) and Ceph OSD(s), the first step is to
-disable the monitoring and recovering for Ceph services, otherwise the migration
-procedure might be interfered due to the continuous service restarting.
+StarlingX uses some :abbr:`HA (High Availability)` mechanisms for critical
+service monitoring and recovery. To migrate Ceph monitor(s) and Ceph OSD(s),
+the first step is to disable monitoring and recovery for Ceph services. This
+avoids interrupting the migration procedure with service restarts.
 
 *************************************
-Disable StarlingX HA for ceph service
+Disable StarlingX HA for Ceph service
 *************************************
 
-Disable monitoring and recovering for Ceph service by pmon and service manager.
+Disable monitoring and recovery of Ceph services by pmon and the service
+manager.
 
-#. Disable pmon's monitoring for Ceph mon and Ceph osd on every host.
+#. Disable pmon monitoring for Ceph mon and Ceph osd on every host.
   ::
 
-      $ sudo rm -f /etc/pmon.d/ceph.conf
-      $ sudo /usr/local/sbin/pmon-restart pmon_cmd_port
+      sudo rm -f /etc/pmon.d/ceph.conf
+      sudo /usr/local/sbin/pmon-restart pmon_cmd_port
 
-#. Disable service manager's monitoring for Ceph manager on controller host.
+#. Disable the service manager's monitoring of Ceph manager on the controller
+   host.
 
    ::
 
-      $ sudo sm-unmanage service mgr-restful-plugin
+      sudo sm-unmanage service mgr-restful-plugin
       Service (mgr-restful-plugin) is no longer being managed.
-      $ sudo sm-unmanage service ceph-manager
+      sudo sm-unmanage service ceph-manager
       Service (ceph-manager) is no longer being managed.
+      sudo sm-deprovision service-group-member storage-monitoring-services ceph-manager
+      sudo sm-deprovision service-group-member storage-services mgr-restful-plugin
 
 **********************************
-Enable ceph service authentication
+Enable Ceph service authentication
 **********************************
 
-StarlingX disables Ceph authentication, but authentication is must for rook.
-Before migration, enable authentication for each of daemons.
+StarlingX disables Ceph authentication, but authentication is required for Rook.
+Before migration, enable authentication for each daemon.
 
-#. Enabled authentication for Ceph mon and osd service.
+#. Enable authentication for the Ceph mon and osd services.
 
    ::
 
-      $ ceph config set mon.storage-0 auth_cluster_required cephx
-      $ ceph config set mon.storage-0 auth_supported cephx
-      $ ceph config set mon.storage-0 auth_service_required cephx
-      $ ceph config set mon.storage-0 auth_client_required cephx
-      $ ceph config set mon.controller-0 auth_cluster_required cephx
-      $ ceph config set mon.controller-0 auth_supported cephx
-      $ ceph config set mon.controller-0 auth_service_required cephx
-      $ ceph config set mon.controller-0 auth_client_required cephx
-      $ ceph config set mon.controller-1 auth_cluster_required cephx
-      $ ceph config set mon.controller-1 auth_supported cephx
-      $ ceph config set mon.controller-1 auth_service_required cephx
-      $ ceph config set mon.controller-1 auth_client_required cephx
-      $ ceph config set mgr.controller-0 auth_supported cephx
-      $ ceph config set mgr.controller-0 auth_cluster_required cephx
-      $ ceph config set mgr.controller-0 auth_client_required cephx
-      $ ceph config set mgr.controller-0 auth_service_required cephx
-      $ ceph config set mgr.controller-1 auth_supported cephx
-      $ ceph config set mgr.controller-1 auth_cluster_required cephx
-      $ ceph config set mgr.controller-1 auth_client_required cephx
-      $ ceph config set mgr.controller-1 auth_service_required cephx
-      $ ceph config set osd.0 auth_supported cephx
-      $ ceph config set osd.0 auth_cluster_required cephx
-      $ ceph config set osd.0 auth_service_required cephx
-      $ ceph config set osd.0 auth_client_required cephx
-      $ ceph config set osd.1 auth_supported cephx
-      $ ceph config set osd.1 auth_cluster_required cephx
-      $ ceph config set osd.1 auth_service_required cephx
-      $ ceph config set osd.1 auth_client_required cephx
+      ceph config set mon.storage-0 auth_cluster_required cephx
+      ceph config set mon.storage-0 auth_supported cephx
+      ceph config set mon.storage-0 auth_service_required cephx
+      ceph config set mon.storage-0 auth_client_required cephx
+      ceph config set mon.controller-0 auth_cluster_required cephx
+      ceph config set mon.controller-0 auth_supported cephx
+      ceph config set mon.controller-0 auth_service_required cephx
+      ceph config set mon.controller-0 auth_client_required cephx
+      ceph config set mon.controller-1 auth_cluster_required cephx
+      ceph config set mon.controller-1 auth_supported cephx
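+      # Each daemon gets the same four cephx settings. Equivalently, the
+      # commands above and below can be generated with a shell loop (a
+      # sketch, assuming the daemon names of this example cluster):
+      #
+      #   for d in mon.storage-0 mon.controller-0 mon.controller-1 \
+      #            mgr.controller-0 mgr.controller-1 osd.0 osd.1; do
+      #     for o in auth_supported auth_cluster_required \
+      #              auth_service_required auth_client_required; do
+      #       ceph config set "$d" "$o" cephx
+      #     done
+      #   done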
+      ceph config set mon.controller-1 auth_service_required cephx
+      ceph config set mon.controller-1 auth_client_required cephx
+      ceph config set mgr.controller-0 auth_supported cephx
+      ceph config set mgr.controller-0 auth_cluster_required cephx
+      ceph config set mgr.controller-0 auth_client_required cephx
+      ceph config set mgr.controller-0 auth_service_required cephx
+      ceph config set mgr.controller-1 auth_supported cephx
+      ceph config set mgr.controller-1 auth_cluster_required cephx
+      ceph config set mgr.controller-1 auth_client_required cephx
+      ceph config set mgr.controller-1 auth_service_required cephx
+      ceph config set osd.0 auth_supported cephx
+      ceph config set osd.0 auth_cluster_required cephx
+      ceph config set osd.0 auth_service_required cephx
+      ceph config set osd.0 auth_client_required cephx
+      ceph config set osd.1 auth_supported cephx
+      ceph config set osd.1 auth_cluster_required cephx
+      ceph config set osd.1 auth_service_required cephx
+      ceph config set osd.1 auth_client_required cephx
 
-#. Generate client.admin key.
+#. Generate the ``client.admin`` key.
 
    ::
 
-      $ ADMIN_KEY=$(ceph auth get-or-create-key client.admin mon 'allow *' osd 'allow *' mgr 'allow *' mds 'allow *')
-      $ echo $ADMIN_KEY
+      ADMIN_KEY=$(ceph auth get-or-create-key client.admin mon 'allow *' osd 'allow *' mgr 'allow *' mds 'allow *')
+      echo $ADMIN_KEY
       AQDRGqFea0cYERAAwYdhhle5zEbLLkYHWF+sDw==
-      $ MON_KEY=$(ceph auth get-or-create-key mon. mon 'allow *')
-      $ echo $MON_KEY
+      MON_KEY=$(ceph auth get-or-create-key mon. mon 'allow *')
+      echo $MON_KEY
       AQBbs79eM4/FMRAAbu4jwdBFVS1hOmlCdoCacQ==
 
 ***********************************************
-Create configmap and secret for rook deployment
+Create configmap and secret for Rook deployment
 ***********************************************
 
-Rook will read secret rook-ceph-mon and configmap rook-ceph-mon-endpoint
-to get cluster info in deployment.
+Rook uses a configmap, ``rook-ceph-mon-endpoints``, and a secret,
+``rook-ceph-mon``, to get cluster info. Create the configmap and secret with
+the commands below.
 
-#. Create configmap and secret for rook deployment.
+::
 
-   ::
+   export NAMESPACE=kube-system
+   export ROOK_EXTERNAL_CEPH_MON_DATA=a=192.188.204.3:6789
+   export ROOK_EXTERNAL_FSID=$(ceph fsid)
+   export ROOK_EXTERNAL_CLUSTER_NAME=$NAMESPACE
+   export ROOK_EXTERNAL_MAX_MON_ID=0
 
-      $ export NAMESPACE=kube-system
-      $ export ROOK_EXTERNAL_CEPH_MON_DATA=a=192.188.204.3:6789
-      $ export ROOK_EXTERNAL_FSID=$(ceph fsid)
-      $ export ROOK_EXTERNAL_CLUSTER_NAME=$NAMESPACE
-      $ export ROOK_EXTERNAL_MAX_MON_ID=0
-
-      $ kubectl -n "$NAMESPACE" create secret generic rook-ceph-mon \
+   kubectl -n "$NAMESPACE" create secret generic rook-ceph-mon \
    > --from-literal=cluster-name="$ROOK_EXTERNAL_CLUSTER_NAME" \
    > --from-literal=fsid="$ROOK_EXTERNAL_FSID" \
    > --from-literal=admin-secret="$ADMIN_KEY" \
    > --from-literal=mon-secret="$MON_KEY"
    secret/rook-ceph-mon created
 
-      $ kubectl -n "$NAMESPACE" create configmap rook-ceph-mon-endpoints \
+   kubectl -n "$NAMESPACE" create configmap rook-ceph-mon-endpoints \
    > --from-literal=data="$ROOK_EXTERNAL_CEPH_MON_DATA" \
    > --from-literal=mapping="$ROOK_EXTERNAL_MAPPING" \
    > --from-literal=maxMonId="$ROOK_EXTERNAL_MAX_MON_ID"
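+
+To confirm that both objects were created, read them back (an optional check):
+
+::
+
+   kubectl -n "$NAMESPACE" get secret rook-ceph-mon
+   kubectl -n "$NAMESPACE" get configmap rook-ceph-mon-endpoints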
 
@@ -139,16 +140,16 @@
 Remove rbd-provisioner
 **********************
 
-Application platform-integ-apps deploys the helm chart rbd-provisioner. This
-chart will be unncesssary after rook deployed, remove before rook deployment.
+The ``platform-integ-apps`` application deploys the helm chart
+``rbd-provisioner``. This chart is unnecessary after Rook is deployed; remove
+it with the commands below.
 
-#. remove rbd-provisioner.
+::
 
-   ::
+   sudo rm -rf /opt/platform/sysinv/20.01/.crushmap_applied
+   source /etc/platform/openrc
+   system application-remove platform-integ-apps
 
-      $ sudo rm -rf /opt/platform/sysinv/20.01/.crushmap_applied
-      $ source /etc/platform/openrc
-      $ system application-remove platform-integ-apps
    +---------------+----------------------------------+
    | Property      | Value                            |
    +---------------+----------------------------------+
@@ -167,14 +168,20 @@
 Disable ceph osd on all storage hosts and create configmap for migration
 ************************************************************************
 
-Login to storage host with osd provisioned, disable Ceph osd service and create
-journal file
+#. Login to the controller host and run ``ceph-preshutdown.sh`` first.
 
-#. Disable Ceph osd service.
+   ::
+
+      sudo ceph-preshutdown.sh
+
+#. Login to the storage host with a provisioned OSD to disable the Ceph osd
+   service and create a journal file, as described in the following steps.
+
+#. Disable the Ceph osd service.
 
    ::
 
-      $ sudo service ceph -a stop osd.1
+      sudo service ceph -a stop osd.1
       === osd.1 ===
       Stopping Ceph osd.1 on storage-0...kill 213077...
      done
      2020-04-26 23:36:56.988 7f1d647bb1c0 -1 journal do_read_entry(585007104): bad header magic
      2020-04-26 23:36:56.994 7f1d647bb1c0 -1 flushed journal /var/lib/ceph/osd/ceph-1/journal for object store /var/lib/ceph/osd/ceph-1
 
-#. Remove journal link and create a blank journal file
+#. Remove the journal link and create a blank journal file.
 
    ::
 
-      $ sudo rm -f /var/lib/ceph/osd/ceph-1/journal
-      $ sudo touch /var/lib/ceph/osd/ceph-1/journal
-      $ sudo dd if=/dev/zero of=/var/lib/ceph/osd/ceph-1/journal bs=1M count=1024
-      $ sudo ceph-osd --id 1 --mkjournal --no-mon-config
-      $ sudo umount /dev/sdc1
+      sudo rm -f /var/lib/ceph/osd/ceph-1/journal
+      sudo touch /var/lib/ceph/osd/ceph-1/journal
+      sudo dd if=/dev/zero of=/var/lib/ceph/osd/ceph-1/journal bs=1M count=1024
+      sudo ceph-osd --id 1 --mkjournal --no-mon-config
+      sudo umount /dev/sdc1
 
-#. Mount to host patch /var/lib/ceph/osd, which could be access by rook's osd pod
+#. Mount to the host path ``/var/lib/ceph/osd``, which can be accessed by the
+   Rook osd pod.
 
    ::
 
-      $ sudo mkdir -p /var/lib/ceph/ceph-1/osd1
-      $ sudo mount /dev/sdc1 /var/lib/ceph/ceph-1/osd1
-      $ sudo ls /var/lib/ceph/ceph-1/osd1 -l
+      sudo mkdir -p /var/lib/ceph/ceph-1/osd1
+      sudo mount /dev/sdc1 /var/lib/ceph/ceph-1/osd1
+      sudo ls /var/lib/ceph/ceph-1/osd1 -l
       total 1048640
       -rw-r--r-- 1 root root 3 Apr 26 12:57 active
       -rw-r--r-- 1 root root 37 Apr 26 12:57 ceph_fsid
       -rw-r--r-- 1 root root 2 Apr 26 12:57 wanttobe
       -rw-r--r-- 1 root root 2 Apr 26 12:57 whoami
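+
+The steps above migrate a single OSD (``osd.1`` on ``/dev/sdc1``); repeat them
+for every OSD on the host. The same sequence, parameterized as a sketch
+(``OSD_ID`` and ``DEV`` are placeholders; take their values from
+``system host-stor-list``):
+
+::
+
+   # Placeholders: set these for each OSD on the host.
+   OSD_ID=1
+   DEV=/dev/sdc1
+
+   sudo service ceph -a stop osd.$OSD_ID
+   sudo rm -f /var/lib/ceph/osd/ceph-$OSD_ID/journal
+   sudo touch /var/lib/ceph/osd/ceph-$OSD_ID/journal
+   sudo dd if=/dev/zero of=/var/lib/ceph/osd/ceph-$OSD_ID/journal bs=1M count=1024
+   sudo ceph-osd --id $OSD_ID --mkjournal --no-mon-config
+   sudo umount $DEV
+   sudo mkdir -p /var/lib/ceph/ceph-$OSD_ID/osd$OSD_ID
+   sudo mount $DEV /var/lib/ceph/ceph-$OSD_ID/osd$OSD_ID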
 
-For every host with osd device, create a configmap. Configmap name is
-rook-ceph-osd--config. In configmap, it specified osd data folder.
-For example, this data will info rook osd0 data path is /var/lib/ceph/osd0
+For every host with an OSD device, create a configmap with the name
+``rook-ceph-osd-<hostname>-config``. In the configmap, specify the OSD data
+folder. In the example below, the Rook osd0 data path is ``/var/lib/ceph/osd0``.
 
-   ::
+::
 
    osd-dirs: '{"/var/lib/ceph/ceph-0/":0}'
 
-   $ system host-stor-list storage-0
+   system host-stor-list storage-0
   +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+
   | uuid                                 | function | osdid | state      | idisk_uuid                           | journal_path                | journal_no | journal_size | tier_name |
   |                                      |          |       |            |                                      |                             | de         | _gib         |           |
   +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+
   |                                      |          |       |            |                                      |                             |            |              |           |
   +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+
 
-   $ system host-stor-list storage-1
+   system host-stor-list storage-1
   +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+
   | uuid                                 | function | osdid | state      | idisk_uuid                           | journal_path                | journal_no | journal_size | tier_name |
   |                                      |          |       |            |                                      |                             | de         | _gib         |           |
   +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+
   |                                      |          |       |            |                                      |                             |            |              |           |
   +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+
 
-#. Sample osd-configmap.yaml
+#. Sample ``osd-configmap.yaml`` file.
 
    ::
 
       apiVersion: v1
       kind: ConfigMap
       metadata:
         name: rook-ceph-osd-storage-0-config
         namespace: kube-system
       data:
         osd-dirs: '{"/var/lib/ceph/ceph-0":0,"/var/lib/ceph/ceph-1":1}'
       ---
       apiVersion: v1
       kind: ConfigMap
       metadata:
         name: rook-ceph-osd-storage-1-config
         namespace: kube-system
       data:
         osd-dirs: '{"/var/lib/ceph/ceph-2":2,"/var/lib/ceph/ceph-3":3}'
 
-#. Apply yaml file for configmap
+#. Apply the yaml file to create the configmaps.
 
    ::
 
-      $ kubectl apply -f osd-configmap.yaml
+      kubectl apply -f osd-configmap.yaml
       configmap/rook-ceph-osd-storage-0-config created
       configmap/rook-ceph-osd-storage-1-config created
 
 **************************
 Ceph monitor data movement
 **************************
 
-For Ceph monitor migration, Rook deployed monitor pod will read monitor data
-for host path /var/lib/ceph/mon-/data. For example, if only deployed one
-monitor pod, a monitor process named "mon.a" in monitor pod will be created
-and monitor data in host path /var/lib/ceph/mon-a/data. So before migration,
-one monitor service should be disable and launch another monitor which will
-be specified with parameter "--mon-data /var/lib/ceph/mon-a/data" to make
-monitor data migrating to /var/lib/ceph/mon-a/data.
+For Ceph monitor migration, the Rook deployed monitor pod reads monitor data
+from the host path ``/var/lib/ceph/mon-<id>/data``. For example, if only one
+monitor pod is deployed, a monitor process named ``mon.a`` will be created in
+the monitor pod, with its monitor data in the host path
+``/var/lib/ceph/mon-a/data``.
+
+Before migration, disable one monitor service and launch another monitor
+specified with the ``--mon-data /var/lib/ceph/mon-a/data`` parameter. This
+migrates the monitor data to ``/var/lib/ceph/mon-a/data``.
 
-#. Login host controller-0, disable service monitor.controller-0.
+#. Login to host controller-0 and disable service monitor.controller-0.
 
    ::
 
-      $ sudo service ceph -a stop mon.controller-0
+      sudo service ceph -a stop mon.controller-0
       === mon.controller-0 ===
       Stopping Ceph mon.controller-0 on controller-0...kill 291101...done
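+
+   Optionally, confirm that no monitor process remains on the host before
+   continuing:
+
+   ::
+
+      ps -ef | grep [c]eph-mon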
 
-#. Login host controller-1, disable service monitor.controller-1.
+#. Login to host controller-1 and disable service monitor.controller-1.
 
    ::
 
-      $ sudo service ceph -a stop mon.controller-1
+      sudo service ceph -a stop mon.controller-1
       === mon.controller-1 ===
       Stopping Ceph mon.controller-1 on controller-1...kill 385107... done
 
-#. Login host storage-0, disable service monitor.storage-0.
+#. Login to host storage-0 and disable service monitor.storage-0.
 
    ::
 
-      $ sudo service ceph -a stop mon.storage-0
+      sudo service ceph -a stop mon.storage-0
       === mon.storage-0 ===
       Stopping Ceph mon.storage-0 on storage-0...kill 31394... done
 
-#. Copy mon data to folder /var/lib/ceph/mon-a/data.
+#. Copy the mon data to the ``/var/lib/ceph/mon-a/data`` folder.
 
    ::
 
-      $ sudo mkdir -p /var/lib/ceph/mon-a/data/
-      $ sudo ceph-monstore-tool /var/lib/ceph/mon/ceph-controller-0/ store-copy /var/lib/ceph/mon-a/data/
+      sudo mkdir -p /var/lib/ceph/mon-a/data/
+      sudo ceph-monstore-tool /var/lib/ceph/mon/ceph-controller-0/ store-copy /var/lib/ceph/mon-a/data/
 
-#. Update monmap in this copy of monitor data, update monitor info.
+#. Update the monmap in this copy of the monitor data and update the monitor
+   info.
 
    ::
 
-      $ sudo ceph-mon --extract-monmap monmap --mon-data /var/lib/ceph/mon-a/data/
+      sudo ceph-mon --extract-monmap monmap --mon-data /var/lib/ceph/mon-a/data/
       2020-05-21 06:01:39.477 7f69d63b2140 -1 wrote monmap to monmap
 
-      $ monmaptool --print monmap
+      monmaptool --print monmap
       monmaptool: monmap file monmap
       epoch 2
       fsid 6c9e9e4b-599e-4a4f-931e-2c09bec74a2a
       0: 192.188.204.3:6789/0 mon.controller-0
       1: 192.188.204.4:6789/0 mon.controller-1
       2: 192.188.204.41:6789/0 mon.storage-0
 
-      $ sudo monmaptool --rm controller-0 monmap
+      sudo monmaptool --rm controller-0 monmap
       monmaptool: monmap file monmap
       monmaptool: removing controller-0
       monmaptool: writing epoch 2 to monmap (2 monitors)
 
-      $ sudo monmaptool --rm controller-1 monmap
+      sudo monmaptool --rm controller-1 monmap
       monmaptool: monmap file monmap
       monmaptool: removing controller-1
       monmaptool: writing epoch 2 to monmap (1 monitors)
 
-      $ sudo monmaptool --rm storage-0 monmap
+      sudo monmaptool --rm storage-0 monmap
       monmaptool: monmap file monmap
       monmaptool: removing storage-0
       monmaptool: writing epoch 2 to monmap (0 monitors)
 
-      $ sudo monmaptool --add a 192.188.204.3 monmap
+      sudo monmaptool --add a 192.188.204.3 monmap
       monmaptool: monmap file monmap
       monmaptool: writing epoch 2 to monmap (1 monitors)
 
-      $ monmaptool --print monmap
+      monmaptool --print monmap
       monmaptool: monmap file monmap
       epoch 2
       fsid 6c9e9e4b-599e-4a4f-931e-2c09bec74a2a
       last_changed 2020-05-21 05:33:42.178658
       created 2020-05-21 03:50:51.893155
       0: 192.188.204.3:6789/0 mon.a
 
-      $ sudo ceph-mon --inject-monmap monmap --mon-data /var/lib/ceph/mon-a/data/
+      sudo ceph-mon --inject-monmap monmap --mon-data /var/lib/ceph/mon-a/data/
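+
+   To verify the result, extract and print the injected map once more (an
+   optional check; ``/tmp/monmap.check`` is just a scratch file name):
+
+   ::
+
+      sudo ceph-mon --extract-monmap /tmp/monmap.check --mon-data /var/lib/ceph/mon-a/data/
+      monmaptool --print /tmp/monmap.check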
 
 ----------------------
 Deploy Rook helm chart
 ----------------------
 
-StarlingX already creates a application for Rook deployment, after finish the
-above preparation, apply the application to deploy rook. To make live migration
-and keep Ceph service always readiness, Ceph service should migrate in turn.
-First Ceph monitor, which is mon.a, exits and launch rook cluster with one monitor
-pod. At this time, 2 monitor daemons and 1 monitor pod is running and then migrate
-osd one by one. At last, migrate 2 monitor daemon and migration is done.
+StarlingX creates an application for Rook deployment. After finishing the
+preparation steps above, run the application to deploy Rook. To complete live
+migration and keep Ceph services available, migrate the Ceph services in the
+following order:
+
+* Exit the first Ceph monitor, ``mon.a``, and launch the Rook cluster with one
+  monitor pod. At this point, 2 monitor daemons and 1 monitor pod are running.
+* Migrate the OSDs one by one.
+* Finally, migrate the 2 monitor daemons to complete the migration.
 
 **************************************
 Disable Ceph monitors and Ceph manager
 **************************************
 
-Disable Ceph manager on host controller-0 and controller-1
+Disable Ceph manager on hosts controller-0 and controller-1.
 
-#. Disable Ceph manager
+::
 
-   ::
-
-      $ ps -aux | grep mgr
+   ps -aux | grep mgr
    root 97971 0.0 0.0 241336 18488 ? S< 03:54 0:02 /usr/bin/python /etc/init.d/mgr-restful-plugin start
    root 97990 0.5 0.0 241468 18916 ? S< 03:54 0:38 /usr/bin/python /etc/init.d/mgr-restful-plugin start
    root 186145 1.2 0.3 716488 111328 ? S