Merge "[rook-ceph] Document the procedure for migrating Ceph to Rook"
This commit is contained in:
commit
1409a039e0
@ -12,6 +12,7 @@ issues with the following:
|
|||||||
persistent-storage
|
persistent-storage
|
||||||
ubuntu-hwe-kernel
|
ubuntu-hwe-kernel
|
||||||
ceph
|
ceph
|
||||||
|
migrate-ceph-to-rook
|
||||||
|
|
||||||
Getting help
|
Getting help
|
||||||
============
|
============
|
||||||
|
99
doc/source/troubleshooting/migrate-ceph-to-rook.rst
Normal file
99
doc/source/troubleshooting/migrate-ceph-to-rook.rst
Normal file
@ -0,0 +1,99 @@
|
|||||||
|
Migrating Ceph to Rook
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
It may be necessary or desired to migrate an existing `Ceph`_ cluster that was
|
||||||
|
originally deployed using the Ceph charts in `openstack-helm-infra`_ to be
|
||||||
|
managed by the Rook operator moving forward. This operation is not a supported
|
||||||
|
`Rook`_ feature, but it is possible to achieve.
|
||||||
|
|
||||||
|
The procedure must completely stop and start all of the Ceph pods a few times
|
||||||
|
during the migration process and is therefore disruptive to Ceph operations,
|
||||||
|
but the result is a Ceph cluster deployed and managed by the Rook operator that
|
||||||
|
maintains the same cluster FSID as the original with all OSD data preserved.
|
||||||
|
|
||||||
|
The steps involved in migrating a legacy openstack-helm Ceph cluster to Rook
|
||||||
|
are based on the Rook `Troubleshooting`_ documentation and are outlined below.
|
||||||
|
|
||||||
|
#. Retrieve the cluster FSID, the name and IP address of an existing ceph-mon
|
||||||
|
host that contains a healthy monitor store, and the numbers of existing
|
||||||
|
ceph-mon and ceph-mgr pods from the existing Ceph cluster and save
|
||||||
|
this information for later.
|
||||||
|
#. Rename CephFS pools so that they use the naming convention expected by Rook.
|
||||||
|
CephFS pools names are not customizable with Rook, so new metadata and data
|
||||||
|
pools will be created for any filesystems deployed by Rook if the existing
|
||||||
|
pools are not named as expected. Pools must be named
|
||||||
|
"<filesystem name>-metadata" and "<filesystem name>-data" in order for Rook to
|
||||||
|
use existing CephFS pools.
|
||||||
|
#. Delete Ceph resources deployed via the openstack-helm-infra charts,
|
||||||
|
uninstall the charts, and remove Ceph node labels.
|
||||||
|
#. Add the Rook Helm repository and deploy the Rook operator and a minimal Ceph
|
||||||
|
cluster using the Rook Helm charts. The cluster will have a new FSID and will
|
||||||
|
not include any OSDs because the OSD disks are all initialized in the old Ceph
|
||||||
|
cluster and that will be recognized.
|
||||||
|
#. Save the Ceph monitor keyring, host, and IP address, as well as the IP
|
||||||
|
address of the new monitor itself, for later use.
|
||||||
|
#. Stop the Rook operator by scaling the rook-ceph-operator deployment in the
|
||||||
|
Rook operator namespace to zero replicas and destroy the newly-deployed Ceph
|
||||||
|
cluster by deleting all deployments in the Rook Ceph cluster namespace except
|
||||||
|
for rook-ceph-tools.
|
||||||
|
#. Copy the store from an old monitor saved in the first step to the new
|
||||||
|
monitor and update its keyring with the key saved from the new monitor.
|
||||||
|
#. Edit the monmap in the copied monitor store using monmaptool to remove all
|
||||||
|
of the old monitors from the monmap and add a single new one using the new
|
||||||
|
monitor's IP address.
|
||||||
|
#. Edit the rook-ceph-mon secret in the Rook Ceph cluster namespace and
|
||||||
|
overwrite the base64-encoded FSID with the base64 encoding of the FSID from the
|
||||||
|
old Ceph cluster. Make sure the encoding includes the FSID only, no whitespace
|
||||||
|
or newline characters.
|
||||||
|
#. Edit the rook-config-override configmap in the Rook Ceph cluster namespace
|
||||||
|
and set auth_supported, auth_client_required, auth_service_required, and
|
||||||
|
auth_cluster_required all to none in the global config section to disable
|
||||||
|
authentication in the Ceph cluster.
|
||||||
|
#. Scale the rook-ceph-operator deployment back to one replica to deploy the
|
||||||
|
Ceph cluster again. This time, the cluster will have the old cluster's FSID
|
||||||
|
and the OSD pods will come up in the new cluster with their data intact.
|
||||||
|
#. Using 'ceph auth import' from the rook-ceph-tools pod, import the
|
||||||
|
[client.admin] portion of the key saved from the new monitor from its initial
|
||||||
|
Rook deployment.
|
||||||
|
#. Edit the rook-config-override configmap again and remove the settings
|
||||||
|
previously added to disable authentication. This will re-enable authentication
|
||||||
|
when Ceph daemons are restarted.
|
||||||
|
#. Scale the rook-ceph-operator deployment to zero replicas and delete the Ceph
|
||||||
|
cluster deployments again to destroy the Ceph cluster one more time.
|
||||||
|
#. When everything has terminated, immediately scale the rook-ceph-operator
|
||||||
|
deployment back to one to deploy the Ceph cluster one final time.
|
||||||
|
#. After the Ceph cluster has been deployed again and all of the daemons are
|
||||||
|
running (with authentication enabled now), edit the deployed cephcluster
|
||||||
|
resource in the Rook Ceph cluster namespace to set the mon and mgr counts to
|
||||||
|
their original values saved in the first step.
|
||||||
|
#. Now you have a Ceph cluster, deployed and managed by Rook, with the original
|
||||||
|
FSID and all of its data intact, with the same number of monitors and managers
|
||||||
|
that existed in the previous deployment.
|
||||||
|
|
||||||
|
There is a rudimentary `script`_ provided that automates this process. It
|
||||||
|
isn't meant to be a complete solution and isn't supported as such. It is simply
|
||||||
|
an example that is known to work for some test implementations.
|
||||||
|
|
||||||
|
The script makes use of environment variables to tell it which Rook release and
|
||||||
|
Ceph release to deploy, in which Kubernetes namespaces to deploy the Rook
|
||||||
|
operator and Ceph cluster, and paths to YAML files that contain the necessary
|
||||||
|
definitions for the Rook operator and Rook Ceph cluster. All of these have
|
||||||
|
default values and are not required to be set, but it will likely be necessary
|
||||||
|
at least to define paths to the YAML files required to deploy the Rook operator
|
||||||
|
and Ceph cluster. Please refer to the comments near the top of the script for
|
||||||
|
more information about utilizing these environment variables.
|
||||||
|
|
||||||
|
The Ceph cluster definition provided to Rook should match the existing Ceph
|
||||||
|
cluster as closely as possible. Otherwise, the migration may not migrate Ceph
|
||||||
|
cluster resources as expected. The migration of deployed Ceph resources is
|
||||||
|
unique to each deployment, so sample definitions are not provided here.
|
||||||
|
|
||||||
|
Migrations using this procedure and/or script are very delicate and will
|
||||||
|
require a lot of testing prior to being implemented in production. This is a
|
||||||
|
risky operation even with testing and should be performed very carefully.
|
||||||
|
|
||||||
|
.. _Ceph: https://ceph.io
|
||||||
|
.. _openstack-helm-infra: https://opendev.org/openstack/openstack-helm-infra
|
||||||
|
.. _Rook: https://rook.io
|
||||||
|
.. _Troubleshooting: https://rook.io/docs/rook/latest-release/Troubleshooting/disaster-recovery/#adopt-an-existing-rook-ceph-cluster-into-a-new-kubernetes-cluster
|
||||||
|
.. _script: https://opendev.org/openstack/openstack-helm-infra/src/tools/deployment/ceph/migrate-to-rook-ceph.sh
|
Loading…
Reference in New Issue
Block a user