diff --git a/doc/source/operations/ceph_cluster_aio_duplex_migration.rst b/doc/source/operations/ceph_cluster_aio_duplex_migration.rst new file mode 100644 index 000000000..18f78d576 --- /dev/null +++ b/doc/source/operations/ceph_cluster_aio_duplex_migration.rst @@ -0,0 +1,1130 @@ +======================================== +Ceph Cluster Migration for duplex system +======================================== + +This guide contains step by step instructions for manually migrating a StarlingX +deployment with an all-in-one duplex Ceph Cluster to a containerized Ceph cluster +deployed by Rook. + +.. contents:: + :local: + :depth: 1 + +------------ +Introduction +------------ + +In early releases of StarlingX, the backend storage cluster solution (Ceph) was +deployed directly on the host platform. In an upcoming release of StarlingX, +Ceph cluster will be containerized and managed by Rook, to improve operation +and maintenance efficiency. + +This guide describes a method to migrate the host-based Ceph cluster deployed with +StarlingX early releses to the newly containerized Ceph clusters using an upcoming +StarlingX release, while maintaining user data in :abbr:`OSDs (Object Store Devices)`. + +The migration procedure maintains CEPH OSDs and data on OSDs. Although the procedure +does result in hosted applications experiencing several minutes of service outage due +to temporary loss of access to PVCs or virtual disks, due to the temporary loss of +the ceph service. + +--------------------- +Prepare for migration +--------------------- + +StarlingX uses some :abbr:`HA (High Availability)` mechanisms for critical +service monitoring and recovering. To migrate Ceph monitor(s) and Ceph OSD(s), +the first step is to disable monitoring and recovery for Ceph services. This +avoids interrupting the migration procedure with service restarts. + +************************************* +Disable StarlingX HA for Ceph service +************************************* + +#. Disable service manager's monitoring of Ceph related service on both two controllers. + + :: + + sudo sm-unmanage service mgr-restful-plugin + Service (mgr-restful-plugin) is no longer being managed. + sudo sm-unmanage service ceph-manager + Service (ceph-manager) is no longer being managed. + sudo sm-unmanage service ceph-mon + Service (ceph-mon) is no longer being managed. + sudo sm-unmanage service cephmon-fs + Service (cephmon-fs) is no longer being managed. + sudo sm-unmanage service drbd-cephmon + Service (drbd-cephmon) is no longer being managed. + sudo sm-unmanage service ceph-osd + Service (ceph-osd) is no longer being managed. + sudo sm-deprovision service-group-member storage-monitoring-services ceph-manager + sudo sm-deprovision service-group-member storage-services mgr-restful-plugin + sudo sm-deprovision service-group-member storage-services ceph-osd + sudo sm-deprovision service-group-member controller-services ceph-mon + sudo sm-deprovision service-group-member controller-services cephmon-fs + sudo sm-deprovision service-group-member controller-services drbd-cephmon + +********************************** +Enable Ceph service authentication +********************************** + +StarlingX disables Ceph authentication, but authentication is required for Rook. +Before migration, enable authentication for each daemon. + +#. Enable authentication for Ceph mon and osd service. 
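+
+   The same four authentication options are set for every daemon, so the
+   per-daemon commands below can also be scripted. A minimal sketch, assuming
+   the daemon names used in this example (adjust them to match ``ceph -s`` on
+   your cluster):
+
+   ::
+
+     # Apply the four cephx options to each mon/mgr/osd daemon in one pass.
+     TARGETS="mon.controller mgr.controller-0 mgr.controller-1 osd.0 osd.1"
+     for target in $TARGETS; do
+       for option in auth_supported auth_cluster_required auth_service_required auth_client_required; do
+         ceph config set "$target" "$option" cephx
+       done
+     done
+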
+ + :: + + ceph config set mon.controller auth_cluster_required cephx + ceph config set mon.controller auth_supported cephx + ceph config set mon.controller auth_service_required cephx + ceph config set mon.controller auth_client_required cephx + ceph config set mgr.controller-0 auth_supported cephx + ceph config set mgr.controller-0 auth_cluster_required cephx + ceph config set mgr.controller-0 auth_client_required cephx + ceph config set mgr.controller-0 auth_service_required cephx + ceph config set mgr.controller-1 auth_supported cephx + ceph config set mgr.controller-1 auth_cluster_required cephx + ceph config set mgr.controller-1 auth_client_required cephx + ceph config set mgr.controller-1 auth_service_required cephx + ceph config set osd.0 auth_supported cephx + ceph config set osd.0 auth_cluster_required cephx + ceph config set osd.0 auth_service_required cephx + ceph config set osd.0 auth_client_required cephx + ceph config set osd.1 auth_supported cephx + ceph config set osd.1 auth_cluster_required cephx + ceph config set osd.1 auth_service_required cephx + ceph config set osd.1 auth_client_required cephx + +#. Generate ``client.admin`` key. + + :: + + ADMIN_KEY=$(ceph auth get-or-create-key client.admin mon 'allow *' osd 'allow *' mgr 'allow *' mds 'allow *') + echo $ADMIN_KEY + AQDRGqFea0cYERAAwYdhhle5zEbLLkYHWF+sDw== + MON_KEY=$(ceph auth get-or-create-key mon. mon 'allow *') + echo $MON_KEY + AQBbs79eM4/FMRAAbu4jwdBFVS1hOmlCdoCacQ== + +*********************************************** +Create configmap and secret for Rook deployment +*********************************************** + +Rook uses a configmap, ``rook-ceph-mon-endpoint``, and a secret, +``rook-ceph-mon``, to get cluster info. Create the configmap and secret with +the commands below. + +:: + + export NAMESPACE=kube-system + export ROOK_EXTERNAL_CEPH_MON_DATA=a=192.188.204.3:6789 + export ROOK_EXTERNAL_FSID=$(ceph fsid) + export ROOK_EXTERNAL_CLUSTER_NAME=$NAMESPACE + export ROOK_EXTERNAL_MAX_MON_ID=0 + + kubectl -n "$NAMESPACE" create secret generic rook-ceph-mon \ + > --from-literal=cluster-name="$ROOK_EXTERNAL_CLUSTER_NAME" \ + > --from-literal=fsid="$ROOK_EXTERNAL_FSID" \ + > --from-literal=admin-secret="$ADMIN_KEY" \ + > --from-literal=mon-secret="$MON_KEY" + secret/rook-ceph-mon created + + kubectl -n "$NAMESPACE" create configmap rook-ceph-mon-endpoints \ + > --from-literal=data="$ROOK_EXTERNAL_CEPH_MON_DATA" \ + > --from-literal=mapping="$ROOK_EXTERNAL_MAPPING" \ + > --from-literal=maxMonId="$ROOK_EXTERNAL_MAX_MON_ID" + configmap/rook-ceph-mon-endpoint created + +********************** +Remove rbd-provisioner +********************** + +The ``platform-integ-apps`` application deploys the helm chart +``rbd-provisioner``. This chart is unnecessary after Rook is deployed, remove +it with the command below. 
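+
+Before removing ``rbd-provisioner``, it is worth confirming that the secret and
+configmap created above hold the expected values. A quick check, assuming the
+``kube-system`` namespace used in this guide:
+
+::
+
+  # The decoded fsid should match "ceph fsid", and the endpoint data should
+  # list the monitor address exported above.
+  kubectl -n kube-system get secret rook-ceph-mon -o jsonpath='{.data.fsid}' | base64 -d; echo
+  kubectl -n kube-system get configmap rook-ceph-mon-endpoints -o jsonpath='{.data.data}'; echo
+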
+ +:: + + sudo rm -rf /opt/platform/sysinv/20.01/.crushmap_applied + source /etc/platform/openrc + system application-remove platform-integ-apps + + +---------------+----------------------------------+ + | Property | Value | + +---------------+----------------------------------+ + | active | True | + | app_version | 1.0-8 | + | created_at | 2020-04-22T14:56:19.148562+00:00 | + | manifest_file | manifest.yaml | + | manifest_name | platform-integration-manifest | + | name | platform-integ-apps | + | progress | None | + | status | removing | + | updated_at | 2020-04-22T15:46:24.018090+00:00 | + +---------------+----------------------------------+ + +---------------------------------------------- +Remove storage backend ceph-store and clean up +---------------------------------------------- + +After migration, remove the default storage backend ceph-store. + +:: + + system storage-backend-list + +--------------------------------------+------------+---------+------------+------+----------+------------------------------------------------------------------------+ + | uuid | name | backend | state | task | services | capabilities | + +--------------------------------------+------------+---------+------------+------+----------+------------------------------------------------------------------------+ + | 3fd0a407-dd8b-4a5c-9dec-8754d76956f4 | ceph-store | ceph | configured | None | None | min_replication: 1 replication: 2 | + | | | | | | | | + +--------------------------------------+------------+---------+------------+------+----------+------------------------------------------------------------------------+ + system storage-backend-delete 3fd0a407-dd8b-4a5c-9dec-8754d76956f4 --force + +Update puppet system config. + +:: + + sudo sysinv-puppet create-system-config + +Remove script ceph.sh on both controllers. + +:: + + sudo rm -rf /etc/services.d/controller/ceph.sh + sudo rm -rf /etc/services.d/worker/ceph.sh + sudo rm -rf /etc/services.d/storage/ceph.sh + +************************************************************************ +Disable ceph osd on all storage hosts and create configmap for migration +************************************************************************ + +#. Login to controller host and run ``ceph-preshutdown.sh`` firstly. + + :: + + sudo ceph-preshutdown.sh + +Login to the both controllers, disable the Ceph osd service, and create a +journal file. + +#. Disable the Ceph osd service. + + :: + + sudo service ceph -a stop osd.1 + === osd.1 === + Stopping Ceph osd.1 on controller-1...kill 213077... + done + 2020-04-26 23:36:56.988 7f1d647bb1c0 -1 journal do_read_entry(585007104): bad header magic + 2020-04-26 23:36:56.988 7f1d647bb1c0 -1 journal do_read_entry(585007104): bad header magic + 2020-04-26 23:36:56.994 7f1d647bb1c0 -1 flushed journal /var/lib/ceph/osd/ceph-1/journal for object store /var/lib/ceph/osd/ceph-1 + +#. Remove the journal link and create a blank journal file. + + :: + + sudo rm -f /var/lib/ceph/osd/ceph-1/journal + sudo touch /var/lib/ceph/osd/ceph-1/journal + sudo dd if=/dev/zero of=/var/lib/ceph/osd/ceph-1/journal bs=1M count=1024 + sudo ceph-osd --id 1 --mkjournal --no-mon-config + sudo umount /dev/sdc1 + +#. Mount to host patch /var/lib/ceph/osd, which can be accessed by the Rook + osd pod. 
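+
+   Before remounting, a quick sanity check that the OSD daemon is stopped and
+   its data partition is free; a sketch assuming the ``osd.1`` and ``/dev/sdc1``
+   names from this example:
+
+   ::
+
+     # No ceph-osd process should remain and the data partition must be unmounted.
+     pgrep -af ceph-osd || echo "no ceph-osd processes running"
+     findmnt /dev/sdc1 || echo "/dev/sdc1 is not mounted"
+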
+ + :: + + sudo mkdir -p /var/lib/ceph/ceph-1/osd1 + sudo mount /dev/sdc1 /var/lib/ceph/ceph-1/osd1 + sudo ls /var/lib/ceph/ceph-1/osd1 -l + total 1048640 + -rw-r--r-- 1 root root 3 Apr 26 12:57 active + -rw-r--r-- 1 root root 37 Apr 26 12:57 ceph_fsid + drwxr-xr-x 388 root root 12288 Apr 27 00:01 current + -rw-r--r-- 1 root root 37 Apr 26 12:57 fsid + -rw-r--r-- 1 root root 1073741824 Apr 27 00:49 journal + -rw-r--r-- 1 root root 37 Apr 26 12:57 journal_uuid + -rw------- 1 root root 56 Apr 26 12:57 keyring + -rw-r--r-- 1 root root 21 Apr 26 12:57 magic + -rw-r--r-- 1 root root 6 Apr 26 12:57 ready + -rw-r--r-- 1 root root 4 Apr 26 12:57 store_version + -rw-r--r-- 1 root root 53 Apr 26 12:57 superblock + -rw-r--r-- 1 root root 0 Apr 26 12:57 sysvinit + -rw-r--r-- 1 root root 10 Apr 26 12:57 type + -rw-r--r-- 1 root root 2 Apr 26 12:57 wanttobe + -rw-r--r-- 1 root root 2 Apr 26 12:57 whoami + +For every host with an OSD device, create a configmap with the name +``rook-ceph-osd--config``. In the configmap, specify the OSD data +folder. In the example below, the Rook osd0 data path is ``/var/lib/ceph/osd0``. + +:: + + osd-dirs: '{"/var/lib/ceph/ceph-0/":0}' + + system host-stor-list controller-0 + +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+ + | uuid | function | osdid | state | idisk_uuid | journal_path | journal_no | journal_size | tier_name | + | | | | | | | de | _gib | | + +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+ + | 21a90d60-2f1e-4f46-badc-afa7d9117622 | osd | 0 | configured | a13c6ac9-9d59-4063-88dc-2847e8aded85 | /dev/disk/by-path/pci-0000: | /dev/sdc2 | 1 | storage | + | | | | | | 00:03.0-ata-3.0-part2 | | | | + | | | | | | | | | | + +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+ + + system host-stor-list controller-1 + +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+ + | uuid | function | osdid | state | idisk_uuid | journal_path | journal_no | journal_size | tier_name | + | | | | | | | de | _gib | | + +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+ + | 17f2db8e-c80e-4df7-9525-1f0cb5b54cd3 | osd | 1 | configured | 89637c7d-f959-4c54-bfe1-626b5c630d96 | /dev/disk/by-path/pci-0000: | /dev/sdc2 | 1 | storage | + | | | | | | 00:03.0-ata-3.0-part2 | | | | + | | | | | | | | | | + +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+ + + +#. Sample ``osd-configmap.yaml`` file. + :: + + apiVersion: v1 + kind: ConfigMap + metadata: + name: rook-ceph-osd-controller-0-config + namespace: kube-system + data: + osd-dirs: '{"/var/lib/ceph/ceph-0":0}' + --- + apiVersion: v1 + kind: ConfigMap + metadata: + name: rook-ceph-osd-controller-1-config + namespace: kube-system + data: + osd-dirs: '{"/var/lib/ceph/ceph-1":1}' + +#. Apply yaml file for configmap. 
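+
+   Optionally validate the manifest before applying it; a sketch (older
+   ``kubectl`` releases use ``--dry-run`` instead of ``--dry-run=client``):
+
+   ::
+
+     # Client-side validation only; nothing is created on the cluster.
+     kubectl apply --dry-run=client -f osd-configmap.yaml
+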
+ + :: + + kubectl apply -f osd-configmap.yaml + configmap/rook-ceph-osd-controller-0-config created + configmap/rook-ceph-osd-controller-1-config created + +************************** +Ceph monitor data movement +************************** + +For Ceph monitor migration, the Rook deployed monitor pod will read monitor data +for host path ``/var/lib/ceph/mon-/data``. For example, if only one monitor +pod is deployed, a monitor process named ``mon.a`` in the monitor pod will be +created and monitor data will be in the host path ``/var/lib/ceph/mon-a/data``. + +Before migration, disable one monitor service and launch another monitor +specified with the ``--mon-data /var/lib/ceph/mon-a/data`` parameter. This will +migrate the monitor data to ``/var/lib/ceph/mon-a/data``. + +#. Login to host controller-0 and disable service monitor.controller. + + :: + + sudo service ceph -a stop mon.controller + === mon.controller-0 === + Stopping Ceph mon.controller on controller-0...kill 291101...done + +#. Copy mon data to the ``/var/lib/ceph/mon-a/data`` folder. + + :: + + sudo mkdir -p /var/lib/ceph/mon-a/data/ + sudo ceph-monstore-tool /var/lib/ceph/mon/ceph-controller/ store-copy /var/lib/ceph/mon-a/data/ + +#. Update monmap in this copy of monitor data and update monitor info. + + :: + + sudo ceph-mon --extract-monmap monmap --mon-data /var/lib/ceph/mon-a/data/ + 2020-05-21 06:01:39.477 7f69d63b2140 -1 wrote monmap to monmap + + monmaptool --print monmap + monmaptool: monmap file monmap + epoch 2 + fsid 6c9e9e4b-599e-4a4f-931e-2c09bec74a2a + last_changed 2020-05-21 04:29:59.164965 + created 2020-05-21 03:50:51.893155 + 0: 192.188.204.3:6789/0 mon.controller + + sudo monmaptool --rm controller monmap + monmaptool: monmap file monmap + monmaptool: removing controller + monmaptool: writing epoch 2 to monmap (2 monitors) + + sudo monmaptool --add a 192.188.204.3 monmap + monmaptool: monmap file monmap + monmaptool: writing epoch 2 to monmap (1 monitors) + + monmaptool --print monmap + monmaptool: monmap file monmap + epoch 2 + fsid 6c9e9e4b-599e-4a4f-931e-2c09bec74a2a + last_changed 2020-05-21 04:29:59.164965 + created 2020-05-21 03:50:51.893155 + 0: 192.188.204.3:6789/0 mon.a + + sudo ceph-mon --inject-monmap monmap --mon-data /var/lib/ceph/mon-a/data/ + +************************************** +Disable Ceph monitors and Ceph manager +************************************** + +Disable Ceph manager on host controller-0 and controller-1. + +:: + + ps -aux | grep mgr + root 97971 0.0 0.0 241336 18488 ? S< 03:54 0:02 /usr/bin/python /etc/init.d/mgr-restful-plugin start + root 97990 0.5 0.0 241468 18916 ? S< 03:54 0:38 /usr/bin/python /etc/init.d/mgr-restful-plugin start + root 186145 1.2 0.3 716488 111328 ? S 8080/TCP,8081/TCP 4h23m + csi-rbdplugin-metrics ClusterIP 10.102.41.27 8080/TCP,8081/TCP 4h23m + ic-nginx-ingress-controller ClusterIP 10.111.161.197 80/TCP,443/TCP 21h + ic-nginx-ingress-default-backend ClusterIP 10.104.104.150 80/TCP 21h + kube-dns ClusterIP 10.96.0.10 53/UDP,53/TCP,9153/TCP 21h + rook-ceph-mgr ClusterIP 10.108.43.251 9283/TCP 4h14m + rook-ceph-mgr-dashboard ClusterIP 10.98.157.27 8443/TCP 4h20m + rook-ceph-mon-a ClusterIP 10.104.152.151 6789/TCP,3300/TCP 3h56m + +#. 
Apply deployment mon-data-edit + + :: + + mon-data-edit.yaml + apiVersion: apps/v1 + kind: Deployment + metadata: + name: mon-data-edit + namespace: kube-system + labels: + app: mon-data-edit + spec: + replicas: 1 + selector: + matchLabels: + app: mon-data-edit + template: + metadata: + labels: + app: mon-data-edit + spec: + dnsPolicy: ClusterFirstWithHostNet + containers: + - name: mon-data-edit + image: registry.local:9001/docker.io/rook/ceph:v1.2.7 + command: ["/tini"] + args: ["-g", "--", "/usr/local/bin/toolbox.sh"] + imagePullPolicy: IfNotPresent + env: + - name: ROOK_ADMIN_SECRET + valueFrom: + secretKeyRef: + name: rook-ceph-mon + key: admin-secret + volumeMounts: + - mountPath: /etc/ceph + name: ceph-config + - name: mon-endpoint-volume + mountPath: /etc/rook + - name: rook-data + mountPath: /var/lib/ceph + volumes: + - name: mon-endpoint-volume + configMap: + name: rook-ceph-mon-endpoints + items: + - key: data + path: mon-endpoints + - name: ceph-config + emptyDir: {} + - name: rook-data + hostPath: + path: /var/lib/ceph + tolerations: + - key: "node.kubernetes.io/unreachable" + operator: "Exists" + effect: "NoExecute" + tolerationSeconds: 5 + nodeName: controller-0 + + $ kubectl apply -f mon-data-edit.yaml + deployment.apps/mon-data-edit created + +#. In mon-data-edit pod, edit monmap + + :: + + $ kubectl exec -it mon-data-edit-d65546cdb-b5vkr -n kube-system -- bash + bash: warning: setlocale: LC_CTYPE: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_COLLATE: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_MESSAGES: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_TIME: cannot change locale (en_US.UTF-8): No such file or directory + [root@mon-data-edit-d65546cdb-b5vkr /]# + + [root@mon-data-edit-d65546cdb-b5vkr /]# ceph-mon --extract-monmap monmap --mon-data /var/lib/ceph/mon-a/data/ + 2020-09-17 04:59:49.308 7f5b047d2040 -1 wrote monmap to monmap + + [root@mon-data-edit-d65546cdb-b5vkr /]# monmaptool --print monmap + monmaptool: monmap file monmap + epoch 2 + fsid ceface4e-9957-480e-96f4-f91fc1cb7fc9 + last_changed 2020-09-17 02:08:02.446136 + created 2020-09-16 07:58:20.615682 + min_mon_release 14 (nautilus) + 0: [v2:192.168.204.3:3300/0,v1:192.168.204.3:6789/0] mon.a + + [root@mon-data-edit-d65546cdb-b5vkr /]# monmaptool --rm a monmap + monmaptool: monmap file monmap + monmaptool: removing a + monmaptool: writing epoch 2 to monmap (0 monitors) + + [root@mon-data-edit-d65546cdb-b5vkr /]# monmaptool --add a 10.104.152.151 monmap + monmaptool: monmap file monmap + monmaptool: writing epoch 2 to monmap (1 monitors) + + [root@mon-data-edit-d65546cdb-b5vkr /]# ceph-mon --inject-monmap monmap --mon-data /var/lib/ceph/mon-a/data/ + [root@mon-data-edit-d65546cdb-b5vkr /]# exit + +#. Edit CephCluster, change host network to false + + :: + + $ kubectl edit CephCluster -n kube-system + + network: + hostNetwork: false + +#. Delete deployment rook-ceph-mgr-a, rook-ceph-osd-0 and rook-ceph-osd-1 mon-data-edit + + :: + + $ kubectl delete deployments.apps -n kube-system rook-ceph-mgr-a rook-ceph-osd-0 rook-ceph-osd-1 mon-data-edit + deployment.apps "rook-ceph-mgr-a" deleted + deployment.apps "rook-ceph-osd-0" deleted + deployment.apps "rook-ceph-osd-1" deleted + deployment.apps "mon-data-edit" deleted + +#. 
Delete configmap rook-ceph-mon-endpoints and rook-ceph-csi-config + + :: + + $ kubectl delete configmap -n kube-system rook-ceph-mon-endpoints rook-ceph-csi-config + configmap "rook-ceph-mon-endpoints" deleted + configmap "rook-ceph-csi-config" deleted + +#. Delete pod rook-ceph-operator and rook-ceph-tools + + :: + + $ kubectl delete po rook-ceph-operator-79fb8559-grgz8 -n kube-system + pod "rook-ceph-operator-79fb8559-grgz8" deleted + $ kubectl delete po rook-ceph-tools-5778d7f6c-cj947 -n kube-system + pod "rook-ceph-tools-5778d7f6c-cj947" deleted + + +#. Wait for Ceph cluster launch. + + :: + + $ kubectl get pods -n kube-system + NAME READY STATUS RESTARTS AGE + calico-kube-controllers-5cd4695574-q8dkc 1/1 Running 0 14h + calico-node-jwth4 1/1 Running 8 21h + calico-node-pk4pp 1/1 Running 6 20h + coredns-78d9fd7cb9-78kw7 1/1 Running 0 15h + coredns-78d9fd7cb9-lsd2s 1/1 Running 0 14h + csi-cephfsplugin-provisioner-55995dd4f6-6ts8x 5/5 Running 0 4h33m + csi-cephfsplugin-provisioner-55995dd4f6-tn4cb 5/5 Running 0 4h33m + csi-cephfsplugin-wx5fn 3/3 Running 0 4h33m + csi-cephfsplugin-z8l22 3/3 Running 0 4h33m + csi-rbdplugin-9m7dq 3/3 Running 0 4h33m + csi-rbdplugin-hn6kx 3/3 Running 0 4h33m + csi-rbdplugin-provisioner-57974d4b9c-k47nc 6/6 Running 0 4h33m + csi-rbdplugin-provisioner-57974d4b9c-pvrxq 6/6 Running 0 4h33m + ic-nginx-ingress-controller-7k6lq 1/1 Running 0 14h + ic-nginx-ingress-controller-cjdmw 1/1 Running 0 14h + ic-nginx-ingress-default-backend-5ffcfd7744-76dv2 1/1 Running 0 14h + kube-apiserver-controller-0 1/1 Running 10 21h + kube-apiserver-controller-1 1/1 Running 5 20h + kube-controller-manager-controller-0 1/1 Running 7 21h + kube-controller-manager-controller-1 1/1 Running 5 20h + kube-multus-ds-amd64-6jcpj 1/1 Running 0 14h + kube-multus-ds-amd64-g5twh 1/1 Running 0 14h + kube-proxy-hrxpk 1/1 Running 3 21h + kube-proxy-m8fs9 1/1 Running 3 20h + kube-scheduler-controller-0 1/1 Running 7 21h + kube-scheduler-controller-1 1/1 Running 5 20h + kube-sriov-cni-ds-amd64-bdwfr 1/1 Running 0 14h + kube-sriov-cni-ds-amd64-r2rsf 1/1 Running 0 14h + rook-ceph-crashcollector-controller-0-57c5fdc6d6-72ftn 1/1 Running 0 173m + rook-ceph-crashcollector-controller-1-67877489b7-4hvvq 1/1 Running 0 173m + rook-ceph-mgr-a-8d656f86c-n67vg 1/1 Running 0 179m + rook-ceph-mon-a-85f9db5c6-mz4br 1/1 Running 0 3h + rook-ceph-operator-79fb8559-grgz8 1/1 Running 0 3h1m + rook-ceph-osd-0-64b9b74788-ws89m 1/1 Running 0 173m + rook-ceph-osd-1-5b789485c6-qt8xr 1/1 Running 0 173m + rook-ceph-osd-prepare-controller-0-7nhbj 0/1 Completed 0 167m + rook-ceph-osd-prepare-controller-1-qjmvj 0/1 Completed 0 167m + rook-ceph-tools-5778d7f6c-cj947 1/1 Running 0 169m + rook-discover-c8kbn 1/1 Running 0 4h33m + rook-discover-rk2rp 1/1 Running 0 4h33m + storage-init-rook-ceph-provisioner-n6zgj 0/1 Completed 0 4h15m + +#. Assign ``ceph-mon-placement`` and ``ceph-mgr-placement`` labels. 
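+
+   These labels tell Rook which hosts may schedule additional monitor and
+   manager pods. After running the commands below, you can confirm the labels
+   reached the Kubernetes node; a sketch assuming controller-1:
+
+   ::
+
+     # Both placement labels should appear on the node.
+     kubectl get node controller-1 --show-labels | tr ',' '\n' | grep ceph-
+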
+ + :: + + system host-label-assign controller-1 ceph-mon-placement=enabled + +-------------+--------------------------------------+ + | Property | Value | + +-------------+--------------------------------------+ + | uuid | df9aff88-8863-49ca-aea8-85f8a0e1dc71 | + | host_uuid | 8cdb45bc-1fd7-4811-9459-bfd9301e93cf | + | label_key | ceph-mon-placement | + | label_value | enabled | + +-------------+--------------------------------------+ + + [sysadmin@controller-0 ~(keystone_admin)]$ system host-label-assign controller-1 ceph-mgr-placement=enabled + +-------------+--------------------------------------+ + | Property | Value | + +-------------+--------------------------------------+ + | uuid | badde75f-4d4f-4c42-8cb8-7e9c69729935 | + | host_uuid | 8cdb45bc-1fd7-4811-9459-bfd9301e93cf | + | label_key | ceph-mgr-placement | + | label_value | enabled | + +-------------+--------------------------------------+ + +#. Edit CephCluster and change mon number to 3 and allowMultiplePerNode to true. + + :: + + mgr: {} + mon: + count: 3 + allowMultiplePerNode: true + +#. Wait for two other monitor pods to launch. + + :: + + $ kubectl get pods -n kube-system + NAME READY STATUS RESTARTS AGE + calico-kube-controllers-5cd4695574-q8dkc 1/1 Running 0 14h + calico-node-jwth4 1/1 Running 8 21h + calico-node-pk4pp 1/1 Running 6 20h + coredns-78d9fd7cb9-78kw7 1/1 Running 0 15h + coredns-78d9fd7cb9-lsd2s 1/1 Running 0 14h + csi-cephfsplugin-provisioner-55995dd4f6-6ts8x 5/5 Running 0 4h33m + csi-cephfsplugin-provisioner-55995dd4f6-tn4cb 5/5 Running 0 4h33m + csi-cephfsplugin-wx5fn 3/3 Running 0 4h33m + csi-cephfsplugin-z8l22 3/3 Running 0 4h33m + csi-rbdplugin-9m7dq 3/3 Running 0 4h33m + csi-rbdplugin-hn6kx 3/3 Running 0 4h33m + csi-rbdplugin-provisioner-57974d4b9c-k47nc 6/6 Running 0 4h33m + csi-rbdplugin-provisioner-57974d4b9c-pvrxq 6/6 Running 0 4h33m + ic-nginx-ingress-controller-7k6lq 1/1 Running 0 14h + ic-nginx-ingress-controller-cjdmw 1/1 Running 0 14h + ic-nginx-ingress-default-backend-5ffcfd7744-76dv2 1/1 Running 0 14h + kube-apiserver-controller-0 1/1 Running 10 21h + kube-apiserver-controller-1 1/1 Running 5 20h + kube-controller-manager-controller-0 1/1 Running 7 21h + kube-controller-manager-controller-1 1/1 Running 5 20h + kube-multus-ds-amd64-6jcpj 1/1 Running 0 14h + kube-multus-ds-amd64-g5twh 1/1 Running 0 14h + kube-proxy-hrxpk 1/1 Running 3 21h + kube-proxy-m8fs9 1/1 Running 3 20h + kube-scheduler-controller-0 1/1 Running 7 21h + kube-scheduler-controller-1 1/1 Running 5 20h + kube-sriov-cni-ds-amd64-bdwfr 1/1 Running 0 14h + kube-sriov-cni-ds-amd64-r2rsf 1/1 Running 0 14h + rook-ceph-crashcollector-controller-0-57c5fdc6d6-72ftn 1/1 Running 0 173m + rook-ceph-crashcollector-controller-1-67877489b7-4hvvq 1/1 Running 0 173m + rook-ceph-mgr-a-8d656f86c-n67vg 1/1 Running 0 179m + rook-ceph-mon-a-85f9db5c6-mz4br 1/1 Running 0 3h + rook-ceph-mon-b-55f4bb467d-jm25b 1/1 Running 0 169m + rook-ceph-mon-c-84c75b988-jjq68 1/1 Running 0 168m + rook-ceph-operator-79fb8559-grgz8 1/1 Running 0 3h1m + rook-ceph-osd-0-64b9b74788-ws89m 1/1 Running 0 173m + rook-ceph-osd-1-5b789485c6-qt8xr 1/1 Running 0 173m + rook-ceph-osd-prepare-controller-0-7nhbj 0/1 Completed 0 167m + rook-ceph-osd-prepare-controller-1-qjmvj 0/1 Completed 0 167m + rook-ceph-tools-5778d7f6c-cj947 1/1 Running 0 169m + rook-discover-c8kbn 1/1 Running 0 4h33m + rook-discover-rk2rp 1/1 Running 0 4h33m + storage-init-rook-ceph-provisioner-n6zgj 0/1 Completed 0 4h15m + +#. Check the cluster status in the pod rook-tool. 
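+
+   The toolbox pod name changes on every restart; instead of copying it from
+   ``kubectl get pods`` you can look it up by label. A sketch, assuming the
+   standard ``app=rook-ceph-tools`` label on the toolbox deployment:
+
+   ::
+
+     # Run "ceph -s" non-interactively in the toolbox pod.
+     TOOLS_POD=$(kubectl -n kube-system get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
+     kubectl -n kube-system exec "$TOOLS_POD" -- ceph -s
+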
+ + :: + + kubectl exec -it rook-ceph-tools-5778d7f6c-cj947 -- bash -n kube-system + bash: warning: setlocale: LC_CTYPE: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_COLLATE: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_MESSAGES: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_TIME: cannot change locale (en_US.UTF-8): No such file or directory + [root@compute-1 /]# ceph -s + cluster: + id: 6c9e9e4b-599e-4a4f-931e-2c09bec74a2a + health: HEALTH_OK + + services: + mon: 3 daemons, quorum a,b,c + mgr: a(active) + osd: 2 osds: 2 up, 2 in + + data: + pools: 1 pools, 64 pgs + objects: 0 objects, 0 B + usage: 4.4 GiB used, 391 GiB / 396 GiB avail + pgs: 64 active+clean + +----------------------------- +Migrate openstack application +----------------------------- + +Login to pod rook-ceph-tools, get generated key for client.admin and ceph.conf in container. + +:: + + [sysadmin@controller-0 script(keystone_admin)]$ kubectl exec -it rook-ceph-tools-84c7fff88c-kgwbn bash -n kube-system + kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead. + bash: warning: setlocale: LC_CTYPE: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_COLLATE: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_MESSAGES: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_TIME: cannot change locale (en_US.UTF-8): No such file or directory + [root@storage-1 /]# ceph -s + cluster: + id: d4c0400e-06ed-4f97-ab8e-ed1bef427039 + health: HEALTH_OK + + services: + mon: 3 daemons, quorum a,b,d + mgr: a(active) + osd: 4 osds: 4 up, 4 in + + data: + pools: 5 pools, 600 pgs + objects: 252 objects, 743 MiB + usage: 5.8 GiB used, 390 GiB / 396 GiB avail + pgs: 600 active+clean + + [root@storage-1 /]# cat /etc/ceph/ceph.conf + [global] + mon_host = 10.109.143.37:6789,10.100.141.25:6789,10.106.83.145:6789 + + [client.admin] + keyring = /etc/ceph/keyring + [root@storage-1 /]# cat /etc/ceph/keyring + [client.admin] + key = AQDabgVf7CFhFxAAqY1L4X9XnUONzMWWJnxBFA== + [root@storage-1 /]# exit + +On host controller-0 and controller-1 replace /etc/ceph/ceph.conf and /etc/ceph/keyring +with content got from pod rook-ceph-tools. + +Update configmap ceph-etc, with data field, with new mon ip + +:: + + data: + ceph.conf: | + [global] + mon_host = 10.109.143.37:6789,10.100.141.25:6789,10.106.83.145:6789 + +Calculate the base64 of key and write to secret ceph-admin. + +:: + + [sysadmin@controller-0 script(keystone_admin)]$ echo "AQDabgVf7CFhFxAAqY1L4X9XnUONzMWWJnxBFA==" | base64 + QVFEYWJnVmY3Q0ZoRnhBQXFZMUw0WDlYblVPTnpNV1dKbnhCRkE9PQo= + + [sysadmin@controller-0 script(keystone_admin)]$ kubectl edit secret ceph-admin -n kube-system + secret/ceph-admin edited + + apiVersion: v1 + data: + key: QVFEYWJnVmY3Q0ZoRnhBQXFZMUw0WDlYblVPTnpNV1dKbnhCRkE9PQo= + kind: Secret + +Create crush rule "kube-rbd" in pod rook-ceph-tools. 
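+
+The rule created below uses the ``storage-tier`` CRUSH root; it is worth
+confirming that root exists before creating the rule and verifying the result
+afterwards. A sketch, run inside the tools pod:
+
+::
+
+  # Confirm the CRUSH root used by the rule, then verify the new rule exists.
+  ceph osd crush tree
+  ceph osd crush rule ls
+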
+ +:: + + [sysadmin@controller-0 script(keystone_admin)]$ kubectl exec -it rook-ceph-tools-84c7fff88c-kgwbn bash -n kube-system + kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead. + bash: warning: setlocale: LC_CTYPE: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_COLLATE: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_MESSAGES: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.UTF-8): No such file or directory + bash: warning: setlocale: LC_TIME: cannot change locale (en_US.UTF-8): No such file or directory + [root@storage-1 /]# + [root@storage-1 /]# + [root@storage-1 /]# ceph osd crush rule create-replicated kube-rbd storage-tier host + +Update every mariadb and rabbitmq pv and pvc provisioner from ceph.com/rbd +to kube-system.rbd.csi.ceph.com in annotation. + +:: + + [sysadmin@controller-0 script(keystone_admin)]$ kubectl get pv + NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE + pvc-0a5a97ba-b78c-4909-89c5-f3703e3a7b39 1Gi RWO Delete Bound openstack/rabbitmq-data-osh-openstack-rabbitmq-rabbitmq-1 general 144m + pvc-65adf629-a07f-46ab-a891-5e35a12413b8 5Gi RWO Delete Bound openstack/mysql-data-mariadb-server-1 general 147m + pvc-7bec7ab2-793f-405b-96f9-a3d2b855de17 5Gi RWO Delete Bound openstack/mysql-data-mariadb-server-0 general 147m + pvc-7c96fb7a-65dc-483f-94bc-aadefc685580 1Gi RWO Delete Bound openstack/rabbitmq-data-osh-openstack-rabbitmq-rabbitmq-0 general 144m + [sysadmin@controller-0 script(keystone_admin)]$ kubectl edit pv pvc-65adf629-a07f-46ab-a891-5e35a12413b8 + persistentvolume/pvc-65adf629-a07f-46ab-a891-5e35a12413b8 edited + + [sysadmin@controller-0 script(keystone_admin)]$ kubectl edit pv pvc-0a5a97ba-b78c-4909-89c5-f3703e3a7b39 + persistentvolume/pvc-0a5a97ba-b78c-4909-89c5-f3703e3a7b39 edited + + [sysadmin@controller-0 script(keystone_admin)]$ kubectl edit pv pvc-7bec7ab2-793f-405b-96f9-a3d2b855de17 + persistentvolume/pvc-7bec7ab2-793f-405b-96f9-a3d2b855de17 edited + + [sysadmin@controller-0 script(keystone_admin)]$ kubectl edit pv pvc-7c96fb7a-65dc-483f-94bc-aadefc685580 + persistentvolume/pvc-7c96fb7a-65dc-483f-94bc-aadefc685580 edited + + apiVersion: v1 + kind: PersistentVolume + metadata: + annotations: + pv.kubernetes.io/provisioned-by: kube-system.rbd.csi.ceph.com + rbdProvisionerIdentity: kube-system.rbd.csi.ceph.com + + [sysadmin@controller-0 script(keystone_admin)]$ kubectl get pvc -n openstack + NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE + mysql-data-mariadb-server-0 Bound pvc-7bec7ab2-793f-405b-96f9-a3d2b855de17 5Gi RWO general 153m + mysql-data-mariadb-server-1 Bound pvc-65adf629-a07f-46ab-a891-5e35a12413b8 5Gi RWO general 153m + rabbitmq-data-osh-openstack-rabbitmq-rabbitmq-0 Bound pvc-7c96fb7a-65dc-483f-94bc-aadefc685580 1Gi RWO general 150m + rabbitmq-data-osh-openstack-rabbitmq-rabbitmq-1 Bound pvc-0a5a97ba-b78c-4909-89c5-f3703e3a7b39 1Gi RWO general 150m + + [sysadmin@controller-0 script(keystone_admin)]$ kubectl edit pvc -n openstack mysql-data-mariadb-server-0 + persistentvolumeclaim/mysql-data-mariadb-server-0 edited + + [sysadmin@controller-0 script(keystone_admin)]$ kubectl edit pvc -n openstack mysql-data-mariadb-server-1 + persistentvolumeclaim/mysql-data-mariadb-server-1 edited + + [sysadmin@controller-0 script(keystone_admin)]$ kubectl edit 
pvc -n openstack rabbitmq-data-osh-openstack-rabbitmq-rabbitmq-0 + persistentvolumeclaim/rabbitmq-data-osh-openstack-rabbitmq-rabbitmq-0 edited + + [sysadmin@controller-0 script(keystone_admin)]$ kubectl edit pvc -n openstack rabbitmq-data-osh-openstack-rabbitmq-rabbitmq-1 + persistentvolumeclaim/rabbitmq-data-osh-openstack-rabbitmq-rabbitmq-1 edited + + apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + annotations: + pv.kubernetes.io/bind-completed: "yes" + pv.kubernetes.io/bound-by-controller: "yes" + volume.beta.kubernetes.io/storage-provisioner: kube-system.rbd.csi.ceph.com + +Edit these PersistentVolume one by one for mariadb and rabbitmq + +:: + + $ kubectl get pv + NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE + pvc-36fd174d-a058-45b6-99ba-a9e362cb72f1 1Gi RWO Delete Bound openstack/rabbitmq-data-osh-openstack-rabbitmq-rabbitmq-0 general 43h + pvc-933a4e66-3687-472f-be11-71befc7780e1 5Gi RWO Delete Bound openstack/mysql-data-mariadb-server-1 general 55m + pvc-b2ba8ad2-fa26-475c-9287-14b1f0525ee5 1Gi RWO Delete Bound openstack/rabbitmq-data-osh-openstack-rabbitmq-rabbitmq-1 general 43h + pvc-f1ac8d8f-c718-4793-9165-8cbc01f0109c 5Gi RWO Delete Bound openstack/mysql-data-mariadb-server-0 general 176m + + $ kubectl get pv pvc-36fd174d-a058-45b6-99ba-a9e362cb72f1 -o yaml > rabbitmq0.yaml + +Edit yaml to update monitor ip + +:: + + persistentVolumeReclaimPolicy: Delete + rbd: + image: kubernetes-dynamic-pvc-be1c74e7-ff0e-11ea-b88c-d24b7b64770e + keyring: /etc/ceph/keyring + monitors: + - 10.98.241.108:6789 + - 10.99.168.50:6789 + - 10.96.65.38:6789 + pool: kube-rbd + + $ kubectl replace --cascade=false --force -f rabbitmq0.yaml & + [1] 2500885 + $ persistentvolume "pvc-36fd174d-a058-45b6-99ba-a9e362cb72f1" deleted + +Delete field "finalizers" in the persistentvolume + +:: + + $ kubectl patch pv pvc-b2ba8ad2-fa26-475c-9287-14b1f0525ee5 --type merge -p '{"metadata":{"finalizers": [null]}}' + +You also can use "kubectl edit pv " , to delete these two lines + +:: + + finalizers: + - kubernetes.io/pv-protection + +Delete pod mariadb-server-0 mariadb-server-1 osh-openstack-rabbitmq-rabbitmq-0 osh-openstack-rabbitmq-rabbitmq-1 + +:: + + $ kubectl delete po mariadb-server-0 mariadb-server-1 osh-openstack-rabbitmq-rabbitmq-0 osh-openstack-rabbitmq-rabbitmq-1 -n openstack + pod "mariadb-server-0" deleted + pod "mariadb-server-1" deleted + pod "osh-openstack-rabbitmq-rabbitmq-0" deleted + pod "osh-openstack-rabbitmq-rabbitmq-1" deleted + +Update override for cinder helm chart. 
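+
+Before moving on to the cinder override, confirm that the recreated pods come
+back to ``Running`` and that their volumes are still ``Bound``; a quick check:
+
+::
+
+  # MariaDB and RabbitMQ pods should restart cleanly against the migrated PVs.
+  kubectl -n openstack get pods | grep -E 'mariadb-server|rabbitmq-rabbitmq'
+  kubectl -n openstack get pvc
+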
+ +:: + + $ system helm-override-update stx-openstack cinder openstack --reuse-value --values cinder_override.yaml + + $ controller-0:~$ cat cinder_override.yaml + conf: + backends: + ceph-store: + image_volume_cache_enabled: "True" + rbd_ceph_conf: /etc/ceph/ceph.conf + rbd_pool: cinder-volumes + rbd_user: cinder + volume_backend_name: ceph-store + volume_driver: cinder.volume.drivers.rbd.RBDDriver + rbd1: + volume_driver: "" + +Apply application stx-openstack again + +:: + + [sysadmin@controller-0 script(keystone_admin)]$ system application-apply stx-openstack + +---------------+----------------------------------+ + | Property | Value | + +---------------+----------------------------------+ + | active | True | + | app_version | 1.0-45 | + | created_at | 2020-07-08T05:47:24.019723+00:00 | + | manifest_file | stx-openstack.yaml | + | manifest_name | armada-manifest | + | name | stx-openstack | + | progress | None | + | status | applying | + | updated_at | 2020-07-08T06:27:08.836258+00:00 | + +---------------+----------------------------------+ + Please use 'system application-list' or 'system application-show stx-openstack' to view the current progress. + [sysadmin@controller-0 script(keystone_admin)]$ + +Check application apply successfully and all pods work well without error. + +Reboot both of the controllers and wait for the host to be available. diff --git a/doc/source/operations/ceph_cluster_aio_simplex_migration.rst b/doc/source/operations/ceph_cluster_aio_simplex_migration.rst new file mode 100644 index 000000000..e8732a571 --- /dev/null +++ b/doc/source/operations/ceph_cluster_aio_simplex_migration.rst @@ -0,0 +1,664 @@ +========================================= +Ceph Cluster Migration for simplex system +========================================= + +This guide contains step by step instructions for manually migrating a StarlingX +deployment with an all-in-one simplex Ceph Cluster to a containerized Ceph cluster +deployed by Rook. + +.. contents:: + :local: + :depth: 1 + +------------ +Introduction +------------ + +In early releases of StarlingX, the backend storage cluster solution (Ceph) was +deployed directly on the host platform. In an upcoming release of StarlingX, +Ceph cluster will be containerized and managed by Rook, to improve operation +and maintenance efficiency. + +This guide describes a method to migrate the host-based Ceph cluster deployed with +StarlingX early releses to the newly containerized Ceph clusters using an upcoming +StarlingX release, while maintaining user data in :abbr:`OSDs (Object Store Devices)`. + +The migration procedure maintains CEPH OSDs and data on OSDs. Although the procedure +does result in hosted applications experiencing several minutes of service outage due +to temporary loss of access to PVCs or virtual disks, due to the temporary loss of +the ceph service. + +--------------------- +Prepare for migration +--------------------- + +StarlingX uses some :abbr:`HA (High Availability)` mechanisms for critical +service monitoring and recovering. To migrate Ceph monitor(s) and Ceph OSD(s), +the first step is to disable monitoring and recovery for Ceph services. This +avoids interrupting the migration procedure with service restarts. + +************************************* +Disable StarlingX HA for Ceph service +************************************* + +#. Disable service manager's monitoring of Ceph related service on both two controllers. 
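+
+   After running the commands below, confirm that the service manager no
+   longer tracks the Ceph services; a sketch using the StarlingX
+   service-management state dump:
+
+   ::
+
+     # No ceph-manager or mgr-restful-plugin entries should remain managed.
+     sudo sm-dump | grep -iE 'ceph|mgr-restful' || echo "no Ceph services managed by SM"
+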
+ + :: + + sudo rm -f /etc/pmon.d/ceph.conf + sudo /usr/local/sbin/pmon-restart pmon_cmd_port + sudo sm-unmanage service mgr-restful-plugin + Service (mgr-restful-plugin) is no longer being managed. + sudo sm-unmanage service ceph-manager + Service (ceph-manager) is no longer being managed. + sudo sm-deprovision service-group-member storage-monitoring-services ceph-manager + sudo sm-deprovision service-group-member storage-monitoring-services mgr-restful-plugin + +********************************** +Enable Ceph service authentication +********************************** + +StarlingX disables Ceph authentication, but authentication is required for Rook. +Before migration, enable authentication for each daemon. + +#. Enable authentication for Ceph mon and osd service. + + :: + + ceph config set mon.controller-0 auth_cluster_required cephx + ceph config set mon.controller-0 auth_supported cephx + ceph config set mon.controller-0 auth_service_required cephx + ceph config set mon.controller-0 auth_client_required cephx + ceph config set mgr.controller-0 auth_supported cephx + ceph config set mgr.controller-0 auth_cluster_required cephx + ceph config set mgr.controller-0 auth_client_required cephx + ceph config set mgr.controller-0 auth_service_required cephx + ceph config set osd.0 auth_supported cephx + ceph config set osd.0 auth_cluster_required cephx + ceph config set osd.0 auth_service_required cephx + ceph config set osd.0 auth_client_required cephx + +#. Generate ``client.admin`` key. + + :: + + ADMIN_KEY=$(ceph auth get-or-create-key client.admin mon 'allow *' osd 'allow *' mgr 'allow *' mds 'allow *') + echo $ADMIN_KEY + AQDRGqFea0cYERAAwYdhhle5zEbLLkYHWF+sDw== + MON_KEY=$(ceph auth get-or-create-key mon. mon 'allow *') + echo $MON_KEY + AQBbs79eM4/FMRAAbu4jwdBFVS1hOmlCdoCacQ== + +*********************************************** +Create configmap and secret for Rook deployment +*********************************************** + +Rook uses a configmap, ``rook-ceph-mon-endpoint``, and a secret, +``rook-ceph-mon``, to get cluster info. Create the configmap and secret with +the commands below. + +:: + + export NAMESPACE=kube-system + export ROOK_EXTERNAL_CEPH_MON_DATA=a=192.188.204.3:6789 + export ROOK_EXTERNAL_FSID=$(ceph fsid) + export ROOK_EXTERNAL_CLUSTER_NAME=$NAMESPACE + export ROOK_EXTERNAL_MAX_MON_ID=0 + + kubectl -n "$NAMESPACE" create secret generic rook-ceph-mon \ + > --from-literal=cluster-name="$ROOK_EXTERNAL_CLUSTER_NAME" \ + > --from-literal=fsid="$ROOK_EXTERNAL_FSID" \ + > --from-literal=admin-secret="$ADMIN_KEY" \ + > --from-literal=mon-secret="$MON_KEY" + secret/rook-ceph-mon created + + kubectl -n "$NAMESPACE" create configmap rook-ceph-mon-endpoints \ + > --from-literal=data="$ROOK_EXTERNAL_CEPH_MON_DATA" \ + > --from-literal=mapping="$ROOK_EXTERNAL_MAPPING" \ + > --from-literal=maxMonId="$ROOK_EXTERNAL_MAX_MON_ID" + configmap/rook-ceph-mon-endpoint created + +********************** +Remove rbd-provisioner +********************** + +The ``platform-integ-apps`` application deploys the helm chart +``rbd-provisioner``. This chart is unnecessary after Rook is deployed, remove +it with the command below. 
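+
+The removal issued below runs asynchronously; poll the application status and
+wait until it is no longer ``removing`` before continuing. A sketch:
+
+::
+
+  # Repeat until platform-integ-apps has left the "removing" state.
+  source /etc/platform/openrc
+  system application-list | grep platform-integ-apps
+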
+ +:: + + sudo rm -rf /opt/platform/sysinv/20.01/.crushmap_applied + source /etc/platform/openrc + system application-remove platform-integ-apps + + +---------------+----------------------------------+ + | Property | Value | + +---------------+----------------------------------+ + | active | True | + | app_version | 1.0-8 | + | created_at | 2020-04-22T14:56:19.148562+00:00 | + | manifest_file | manifest.yaml | + | manifest_name | platform-integration-manifest | + | name | platform-integ-apps | + | progress | None | + | status | removing | + | updated_at | 2020-04-22T15:46:24.018090+00:00 | + +---------------+----------------------------------+ + +---------------------------------------------- +Remove storage backend ceph-store and clean up +---------------------------------------------- + +After migration, remove the default storage backend ceph-store. + +:: + + system storage-backend-list + +--------------------------------------+------------+---------+------------+------+----------+------------------------------------------------------------------------+ + | uuid | name | backend | state | task | services | capabilities | + +--------------------------------------+------------+---------+------------+------+----------+------------------------------------------------------------------------+ + | 3fd0a407-dd8b-4a5c-9dec-8754d76956f4 | ceph-store | ceph | configured | None | None | min_replication: 1 replication: 2 | + | | | | | | | | + +--------------------------------------+------------+---------+------------+------+----------+------------------------------------------------------------------------+ + system storage-backend-delete 3fd0a407-dd8b-4a5c-9dec-8754d76956f4 --force + +Update puppet system config. + +:: + + sudo sysinv-puppet create-system-config + +Remove script ceph.sh on both controllers. + +:: + + sudo rm -rf /etc/services.d/controller/ceph.sh + sudo rm -rf /etc/services.d/worker/ceph.sh + sudo rm -rf /etc/services.d/storage/ceph.sh + +************************************************************************ +Disable ceph osd on all storage hosts and create configmap for migration +************************************************************************ + +#. Login to controller host and run ``ceph-preshutdown.sh`` firstly. + + :: + + sudo ceph-preshutdown.sh + +Login to controller-0, disable the Ceph osd service, and create a journal +file. + +#. Disable the Ceph osd service. + + :: + + sudo service ceph -a stop osd.0 + === osd.0 === + Stopping Ceph osd.0 on controller-0...kill 213077... + done + 2020-04-26 23:36:56.988 7f1d647bb1c0 -1 journal do_read_entry(585007104): bad header magic + 2020-04-26 23:36:56.988 7f1d647bb1c0 -1 journal do_read_entry(585007104): bad header magic + 2020-04-26 23:36:56.994 7f1d647bb1c0 -1 flushed journal /var/lib/ceph/osd/ceph-0/journal for object store /var/lib/ceph/osd/ceph-0 + +#. Remove the journal link and create a blank journal file. + + :: + + sudo rm -f /var/lib/ceph/osd/ceph-0/journal + sudo touch /var/lib/ceph/osd/ceph-0/journal + sudo dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/journal bs=1M count=1024 + sudo ceph-osd --id 0 --mkjournal --no-mon-config + sudo umount /dev/sdc1 + +#. Mount to host patch /var/lib/ceph/osd, which can be accessed by the Rook + osd pod. 
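+
+   Before remounting for Rook, it can help to record the partition layout of
+   the OSD disk so the correct data partition is used later; a sketch assuming
+   the ``/dev/sdc`` disk from this example:
+
+   ::
+
+     # Show the OSD data/journal partitions and their current mount state.
+     lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/sdc
+     sudo blkid /dev/sdc1
+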
+ + :: + + sudo mkdir -p /var/lib/ceph/ceph-0/osd0 + sudo mount /dev/sdc1 /var/lib/ceph/ceph-0/osd0 + sudo ls /var/lib/ceph/ceph-1/osd0 -l + total 1048640 + -rw-r--r-- 1 root root 3 Apr 26 12:57 active + -rw-r--r-- 1 root root 37 Apr 26 12:57 ceph_fsid + drwxr-xr-x 388 root root 12288 Apr 27 00:01 current + -rw-r--r-- 1 root root 37 Apr 26 12:57 fsid + -rw-r--r-- 1 root root 1073741824 Apr 27 00:49 journal + -rw-r--r-- 1 root root 37 Apr 26 12:57 journal_uuid + -rw------- 1 root root 56 Apr 26 12:57 keyring + -rw-r--r-- 1 root root 21 Apr 26 12:57 magic + -rw-r--r-- 1 root root 6 Apr 26 12:57 ready + -rw-r--r-- 1 root root 4 Apr 26 12:57 store_version + -rw-r--r-- 1 root root 53 Apr 26 12:57 superblock + -rw-r--r-- 1 root root 0 Apr 26 12:57 sysvinit + -rw-r--r-- 1 root root 10 Apr 26 12:57 type + -rw-r--r-- 1 root root 2 Apr 26 12:57 wanttobe + -rw-r--r-- 1 root root 2 Apr 26 12:57 whoami + +Create a configmap with the name ``rook-ceph-osd-controller-0-config``. In the +configmap, specify the OSD data folder. In the example below, the Rook osd0 data +path is ``/var/lib/ceph/osd0``. + +:: + + osd-dirs: '{"/var/lib/ceph/ceph-0/":0}' + + system host-stor-list controller-0 + +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+ + | uuid | function | osdid | state | idisk_uuid | journal_path | journal_no | journal_size | tier_name | + | | | | | | | de | _gib | | + +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+ + | 21a90d60-2f1e-4f46-badc-afa7d9117622 | osd | 0 | configured | a13c6ac9-9d59-4063-88dc-2847e8aded85 | /dev/disk/by-path/pci-0000: | /dev/sdc2 | 1 | storage | + | | | | | | 00:03.0-ata-3.0-part2 | | | | + | | | | | | | | | | + +--------------------------------------+----------+-------+------------+--------------------------------------+-----------------------------+------------+--------------+-----------+ + +#. Sample ``osd-configmap.yaml`` file. + :: + + apiVersion: v1 + kind: ConfigMap + metadata: + name: rook-ceph-osd-controller-0-config + namespace: kube-system + data: + osd-dirs: '{"/var/lib/ceph/ceph-0":0}' + +#. Apply yaml file for configmap. + + :: + + kubectl apply -f osd-configmap.yaml + configmap/rook-ceph-osd-controller-0-config created + configmap/rook-ceph-osd-controller-1-config created + +************************** +Ceph monitor data movement +************************** + +For Ceph monitor migration, the Rook deployed monitor pod will read monitor data +for host path ``/var/lib/ceph/mon-/data``. For example, if only one monitor +pod is deployed, a monitor process named ``mon.a`` in the monitor pod will be +created and monitor data will be in the host path ``/var/lib/ceph/mon-a/data``. + +Before migration, disable one monitor service and launch another monitor +specified with the ``--mon-data /var/lib/ceph/mon-a/data`` parameter. This will +migrate the monitor data to ``/var/lib/ceph/mon-a/data``. + +#. Login to host controller-0 and disable service monitor.controller. + + :: + + sudo service ceph -a stop mon.controller + === mon.controller-0 === + Stopping Ceph mon.controller on controller-0...kill 291101...done + +#. Copy mon data to the ``/var/lib/ceph/mon-a/data`` folder. 
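+
+   After the copy below completes, a rough sanity check is to compare the size
+   of the original and copied monitor stores; a sketch using the paths from
+   this example:
+
+   ::
+
+     # The copied store should be roughly the same size as the original.
+     sudo du -sh /var/lib/ceph/mon/ceph-controller/ /var/lib/ceph/mon-a/data/
+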
+ + :: + + sudo mkdir -p /var/lib/ceph/mon-a/data/ + sudo ceph-monstore-tool /var/lib/ceph/mon/ceph-controller/ store-copy /var/lib/ceph/mon-a/data/ + +#. Update monmap in this copy of monitor data and update monitor info. + + :: + + sudo ceph-mon --extract-monmap monmap --mon-data /var/lib/ceph/mon-a/data/ + 2020-05-21 06:01:39.477 7f69d63b2140 -1 wrote monmap to monmap + + monmaptool --print monmap + monmaptool: monmap file monmap + epoch 2 + fsid 6c9e9e4b-599e-4a4f-931e-2c09bec74a2a + last_changed 2020-05-21 04:29:59.164965 + created 2020-05-21 03:50:51.893155 + 0: 192.188.204.3:6789/0 mon.controller + + sudo monmaptool --rm controller monmap + monmaptool: monmap file monmap + monmaptool: removing controller + monmaptool: writing epoch 2 to monmap (2 monitors) + + sudo monmaptool --add a 192.188.204.3 monmap + monmaptool: monmap file monmap + monmaptool: writing epoch 2 to monmap (1 monitors) + + monmaptool --print monmap + monmaptool: monmap file monmap + epoch 2 + fsid 6c9e9e4b-599e-4a4f-931e-2c09bec74a2a + last_changed 2020-05-21 04:29:59.164965 + created 2020-05-21 03:50:51.893155 + 0: 192.188.204.3:6789/0 mon.a + + sudo ceph-mon --inject-monmap monmap --mon-data /var/lib/ceph/mon-a/data/ + +************************************** +Disable Ceph monitors and Ceph manager +************************************** + +Disable Ceph manager on host controller-0 and controller-1. + +:: + + ps -aux | grep mgr + root 97971 0.0 0.0 241336 18488 ? S< 03:54 0:02 /usr/bin/python /etc/init.d/mgr-restful-plugin start + root 97990 0.5 0.0 241468 18916 ? S< 03:54 0:38 /usr/bin/python /etc/init.d/mgr-restful-plugin start + root 186145 1.2 0.3 716488 111328 ? S-config" + to get every host's osd info. In this configmap 'osd-dirs: '{"":}'' + means, osd with this id, mount to this folder /osd. + +#. Exit all ceph daemon + +#. Assign label for rook-ceph + + Assign label "ceph-mon-placement=enabled" and "ceph-mgr-placement=enabled" + to controller-0. + +#. Copy mon data to /var/lib/ceph/mon-a/data folder on controller-0 + + Rook-ceph will launch mon pod on host with label + "ceph-mon-placement=enabled". And the mon pod will use folder + /var/lib/ceph/mon-/data, so use ceph-monstore-tool copy mon-data to + this folder. As rook launch mon with id a,b,c..., the first mon pod must + use folder /var/lib/ceph/mon-a/data. + +#. Edit monmap + + Get monmap for /var/lib/ceph/mon-a/data, edit to update mon item with + mon.a and controller-0 ip, and inject to mon-a data folder. + +#. Apply application rook-ceph-apps + + Rook-ceph will register CRD CephCluster and launch a containerized ceph + cluster. By read above setting, this containerized ceph cluster will use + original mon data and OSD data. + +#. Launch more mon pods + + After containerized ceph cluster launch, for duplex and multi, assign label + to other 2 hosts, and change mon count to 3, with "kubectl edit CephCluster". + +#. Edit pv and pvc provisioner + + For already created pv and pvc in openstack, if stx-openstack application + is applied before migration, change the provsioner to kube-system.rbd.csi.ceph.com + from rbd/ceph.com, as csi pod will provide csi service to K8s. + +#. Update keyring and conf file on controller-0, controler-1 + + For k8s pv, it will use host /etc/ceph/keyring for ceph cluster access, update + folder /etc/ceph/ with containerized ceph cluster. + +#. Update helm override value for application stx-openstack + + Update helm override value for cinder, to change provisoner from rbd/ceph.com + to kube-system.rbd.csi.ceph.com. + +#. 
Edit secret ceph-admin to update keyring + + Update keyring in ceph-admin + +#. Re-apply application stx-openstack + +------------------ +Guide Step by Step +------------------ + +.. toctree:: + :maxdepth: 1 + + ceph_cluster_aio_duplex_migration + ceph_cluster_aio_simplex_migration + ceph_cluster_migration
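+
+Whichever guide above matches your configuration, a final sanity check once the
+migration is complete is to confirm cluster health from the Rook toolbox; a
+sketch, assuming the toolbox deployment is named ``rook-ceph-tools``:
+
+::
+
+  # All monitors should be in quorum and every OSD up and in.
+  kubectl -n kube-system exec deploy/rook-ceph-tools -- ceph -s
+  kubectl -n kube-system exec deploy/rook-ceph-tools -- ceph osd tree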