=========================================
Ceph Cluster Migration for simplex system
=========================================

This guide contains step-by-step instructions for manually migrating a StarlingX
deployment with an all-in-one simplex Ceph cluster to a containerized Ceph
cluster deployed by Rook.

.. contents::
   :local:
   :depth: 1

------------
Introduction
------------

In early releases of StarlingX, the backend storage cluster solution (Ceph) was
deployed directly on the host platform. In an upcoming release of StarlingX,
the Ceph cluster will be containerized and managed by Rook to improve operation
and maintenance efficiency.

This guide describes a method to migrate the host-based Ceph cluster deployed
with early StarlingX releases to the containerized Ceph cluster of an upcoming
StarlingX release, while preserving user data in
:abbr:`OSDs (Object Store Devices)`.

The migration procedure preserves the Ceph OSDs and the data stored on them.
However, hosted applications experience several minutes of service outage,
because access to PVCs or virtual disks is lost while the Ceph service is
temporarily unavailable.

---------------------
Prepare for migration
---------------------

StarlingX uses some :abbr:`HA (High Availability)` mechanisms to monitor and
recover critical services. To migrate the Ceph monitor(s) and Ceph OSD(s), the
first step is to disable monitoring and recovery for the Ceph services. This
prevents service restarts from interrupting the migration procedure.

*************************************
Disable StarlingX HA for Ceph service
*************************************

#. Disable the service manager's monitoring of Ceph-related services on both
   controllers.

   ::

     sudo rm -f /etc/pmon.d/ceph.conf
     sudo /usr/local/sbin/pmon-restart pmon_cmd_port
     sudo sm-unmanage service mgr-restful-plugin
     Service (mgr-restful-plugin) is no longer being managed.
     sudo sm-unmanage service ceph-manager
     Service (ceph-manager) is no longer being managed.
     sudo sm-deprovision service-group-member storage-monitoring-services ceph-manager
     sudo sm-deprovision service-group-member storage-monitoring-services mgr-restful-plugin

**********************************
Enable Ceph service authentication
**********************************

StarlingX disables Ceph authentication, but Rook requires it. Before migration,
enable authentication for each daemon.

#. Enable authentication for the Ceph mon, mgr, and osd services.

   ::

     ceph config set mon.controller-0 auth_cluster_required cephx
     ceph config set mon.controller-0 auth_supported cephx
     ceph config set mon.controller-0 auth_service_required cephx
     ceph config set mon.controller-0 auth_client_required cephx
     ceph config set mgr.controller-0 auth_supported cephx
     ceph config set mgr.controller-0 auth_cluster_required cephx
     ceph config set mgr.controller-0 auth_client_required cephx
     ceph config set mgr.controller-0 auth_service_required cephx
     ceph config set osd.0 auth_supported cephx
     ceph config set osd.0 auth_cluster_required cephx
     ceph config set osd.0 auth_service_required cephx
     ceph config set osd.0 auth_client_required cephx

#. Generate the ``client.admin`` and ``mon.`` keys.

   ::

     ADMIN_KEY=$(ceph auth get-or-create-key client.admin mon 'allow *' osd 'allow *' mgr 'allow *' mds 'allow *')
     echo $ADMIN_KEY
     AQDRGqFea0cYERAAwYdhhle5zEbLLkYHWF+sDw==
     MON_KEY=$(ceph auth get-or-create-key mon. mon 'allow *')
     echo $MON_KEY
     AQBbs79eM4/FMRAAbu4jwdBFVS1hOmlCdoCacQ==

***********************************************
Create configmap and secret for Rook deployment
***********************************************

Rook uses a configmap, ``rook-ceph-mon-endpoints``, and a secret,
``rook-ceph-mon``, to get cluster info. Create the configmap and secret with
the commands below.
::

  export NAMESPACE=kube-system
  export ROOK_EXTERNAL_CEPH_MON_DATA=a=192.188.204.3:6789
  export ROOK_EXTERNAL_FSID=$(ceph fsid)
  export ROOK_EXTERNAL_CLUSTER_NAME=$NAMESPACE
  export ROOK_EXTERNAL_MAX_MON_ID=0

  kubectl -n "$NAMESPACE" create secret generic rook-ceph-mon \
    --from-literal=cluster-name="$ROOK_EXTERNAL_CLUSTER_NAME" \
    --from-literal=fsid="$ROOK_EXTERNAL_FSID" \
    --from-literal=admin-secret="$ADMIN_KEY" \
    --from-literal=mon-secret="$MON_KEY"
  secret/rook-ceph-mon created

  kubectl -n "$NAMESPACE" create configmap rook-ceph-mon-endpoints \
    --from-literal=data="$ROOK_EXTERNAL_CEPH_MON_DATA" \
    --from-literal=mapping="$ROOK_EXTERNAL_MAPPING" \
    --from-literal=maxMonId="$ROOK_EXTERNAL_MAX_MON_ID"
  configmap/rook-ceph-mon-endpoints created

**********************
Remove rbd-provisioner
**********************

The ``platform-integ-apps`` application deploys the helm chart
``rbd-provisioner``. This chart is unnecessary after Rook is deployed. Remove
it with the commands below.

::

  sudo rm -rf /opt/platform/sysinv/20.01/.crushmap_applied
  source /etc/platform/openrc
  system application-remove platform-integ-apps
  +---------------+----------------------------------+
  | Property      | Value                            |
  +---------------+----------------------------------+
  | active        | True                             |
  | app_version   | 1.0-8                            |
  | created_at    | 2020-04-22T14:56:19.148562+00:00 |
  | manifest_file | manifest.yaml                    |
  | manifest_name | platform-integration-manifest    |
  | name          | platform-integ-apps              |
  | progress      | None                             |
  | status        | removing                         |
  | updated_at    | 2020-04-22T15:46:24.018090+00:00 |
  +---------------+----------------------------------+

----------------------------------------------
Remove storage backend ceph-store and clean up
----------------------------------------------

After migration, remove the default storage backend ceph-store.
::

  system storage-backend-list
  +--------------------------------------+------------+---------+------------+------+----------+------------------------------------------------------------------------+
  | uuid                                 | name       | backend | state      | task | services | capabilities                                                           |
  +--------------------------------------+------------+---------+------------+------+----------+------------------------------------------------------------------------+
  | 3fd0a407-dd8b-4a5c-9dec-8754d76956f4 | ceph-store | ceph    | configured | None | None     | min_replication: 1 replication: 2                                      |
  |                                      |            |         |            |      |          |                                                                        |
  +--------------------------------------+------------+---------+------------+------+----------+------------------------------------------------------------------------+
  system storage-backend-delete 3fd0a407-dd8b-4a5c-9dec-8754d76956f4 --force

Update the puppet system config.

::

  sudo sysinv-puppet create-system-config

Remove the ``ceph.sh`` script on both controllers.

::

  sudo rm -rf /etc/services.d/controller/ceph.sh
  sudo rm -rf /etc/services.d/worker/ceph.sh
  sudo rm -rf /etc/services.d/storage/ceph.sh

************************************************************************
Disable ceph osd on all storage hosts and create configmap for migration
************************************************************************

#. Log in to the controller host and run ``ceph-preshutdown.sh`` first.

   ::

     sudo ceph-preshutdown.sh

Log in to controller-0, disable the Ceph osd service, and create a journal
file.

#. Disable the Ceph osd service.

   ::

     sudo service ceph -a stop osd.0
     === osd.0 ===
     Stopping Ceph osd.0 on controller-0...kill  213077...
     done
     2020-04-26 23:36:56.988 7f1d647bb1c0 -1 journal do_read_entry(585007104): bad header magic
     2020-04-26 23:36:56.988 7f1d647bb1c0 -1 journal do_read_entry(585007104): bad header magic
     2020-04-26 23:36:56.994 7f1d647bb1c0 -1 flushed journal /var/lib/ceph/osd/ceph-0/journal for object store /var/lib/ceph/osd/ceph-0

#. Remove the journal link and create a blank journal file.

   ::

     sudo rm -f /var/lib/ceph/osd/ceph-0/journal
     sudo touch /var/lib/ceph/osd/ceph-0/journal
     sudo dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/journal bs=1M count=1024
     sudo ceph-osd --id 0 --mkjournal --no-mon-config
     sudo umount /dev/sdc1

#. Mount the OSD data partition to the host path
   ``/var/lib/ceph/ceph-0/osd0``, which can be accessed by the Rook osd pod.

   ::

     sudo mkdir -p /var/lib/ceph/ceph-0/osd0
     sudo mount /dev/sdc1 /var/lib/ceph/ceph-0/osd0
     sudo ls /var/lib/ceph/ceph-0/osd0 -l
     total 1048640
     -rw-r--r--   1 root root          3 Apr 26 12:57 active
     -rw-r--r--   1 root root         37 Apr 26 12:57 ceph_fsid
     drwxr-xr-x 388 root root      12288 Apr 27 00:01 current
     -rw-r--r--   1 root root         37 Apr 26 12:57 fsid
     -rw-r--r--   1 root root 1073741824 Apr 27 00:49 journal
     -rw-r--r--   1 root root         37 Apr 26 12:57 journal_uuid
     -rw-------   1 root root         56 Apr 26 12:57 keyring
     -rw-r--r--   1 root root         21 Apr 26 12:57 magic
     -rw-r--r--   1 root root          6 Apr 26 12:57 ready
     -rw-r--r--   1 root root          4 Apr 26 12:57 store_version
     -rw-r--r--   1 root root         53 Apr 26 12:57 superblock
     -rw-r--r--   1 root root          0 Apr 26 12:57 sysvinit
     -rw-r--r--   1 root root         10 Apr 26 12:57 type
     -rw-r--r--   1 root root          2 Apr 26 12:57 wanttobe
     -rw-r--r--   1 root root          2 Apr 26 12:57 whoami

Create a configmap with the name ``rook-ceph-osd-controller-0-config``. In the
configmap, specify the OSD data folder. In the example below, the Rook osd0
data path is ``/var/lib/ceph/ceph-0/osd0``.
::

  osd-dirs: '{"/var/lib/ceph/ceph-0/":0}'

  system host-stor-list controller-0
  +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+
  | uuid                                 | function | osdid | state      | idisk_uuid                           | journal_path                | journal_no | journal_size | tier_name |
  |                                      |          |       |            |                                      |                             | de         | _gib         |           |
  +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+
  | 21a90d60-2f1e-4f46-badc-afa7d9117622 | osd      | 0     | configured | a13c6ac9-9d59-4063-88dc-2847e8aded85 | /dev/disk/by-path/pci-0000: | /dev/sdc2  | 1            | storage   |
  |                                      |          |       |            |                                      | 00:03.0-ata-3.0-part2       |            |              |           |
  |                                      |          |       |            |                                      |                             |            |              |           |
  +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+

#. Sample ``osd-configmap.yaml`` file.

   ::

     apiVersion: v1
     kind: ConfigMap
     metadata:
       name: rook-ceph-osd-controller-0-config
       namespace: kube-system
     data:
       osd-dirs: '{"/var/lib/ceph/ceph-0":0}'

#. Apply the yaml file to create the configmap.

   ::

     kubectl apply -f osd-configmap.yaml
     configmap/rook-ceph-osd-controller-0-config created

**************************
Ceph monitor data movement
**************************

For Ceph monitor migration, the monitor pod deployed by Rook reads monitor data
from the host path ``/var/lib/ceph/mon-/data``. For example, if only one
monitor pod is deployed, a monitor process named ``mon.a`` is created in the
monitor pod and its monitor data is in the host path
``/var/lib/ceph/mon-a/data``.

Before migration, disable the existing monitor service and copy its data to
``/var/lib/ceph/mon-a/data``, so that a monitor launched with the
``--mon-data /var/lib/ceph/mon-a/data`` parameter uses the migrated data.

#. Log in to host controller-0 and disable the monitor service
   ``mon.controller``.

   ::

     sudo service ceph -a stop mon.controller
     === mon.controller-0 ===
     Stopping Ceph mon.controller on controller-0...kill 291101...done

#. Copy the mon data to the ``/var/lib/ceph/mon-a/data`` folder.

   ::

     sudo mkdir -p /var/lib/ceph/mon-a/data/
     sudo ceph-monstore-tool /var/lib/ceph/mon/ceph-controller/ store-copy /var/lib/ceph/mon-a/data/

#. Update the monmap in this copy of the monitor data and update the monitor
   info.

   ::

     sudo ceph-mon --extract-monmap monmap --mon-data /var/lib/ceph/mon-a/data/
     2020-05-21 06:01:39.477 7f69d63b2140 -1 wrote monmap to monmap

     monmaptool --print monmap
     monmaptool: monmap file monmap
     epoch 2
     fsid 6c9e9e4b-599e-4a4f-931e-2c09bec74a2a
     last_changed 2020-05-21 04:29:59.164965
     created 2020-05-21 03:50:51.893155
     0: 192.188.204.3:6789/0 mon.controller

     sudo monmaptool --rm controller monmap
     monmaptool: monmap file monmap
     monmaptool: removing controller
     monmaptool: writing epoch 2 to monmap (2 monitors)

     sudo monmaptool --add a 192.188.204.3 monmap
     monmaptool: monmap file monmap
     monmaptool: writing epoch 2 to monmap (1 monitors)

     monmaptool --print monmap
     monmaptool: monmap file monmap
     epoch 2
     fsid 6c9e9e4b-599e-4a4f-931e-2c09bec74a2a
     last_changed 2020-05-21 04:29:59.164965
     created 2020-05-21 03:50:51.893155
     0: 192.188.204.3:6789/0 mon.a

     sudo ceph-mon --inject-monmap monmap --mon-data /var/lib/ceph/mon-a/data/

**************************************
Disable Ceph monitors and Ceph manager
**************************************

Disable Ceph manager on host controller-0 and controller-1.

::

  ps -aux | grep mgr
  root      97971  0.0  0.0 241336  18488 ?        S<   03:54   0:02 /usr/bin/python /etc/init.d/mgr-restful-plugin start
  root      97990  0.5  0.0 241468  18916 ?        S<   03:54   0:38 /usr/bin/python /etc/init.d/mgr-restful-plugin start
  root     186145  1.2  0.3 716488 111328 ?        S