Ceph Maintenance

This MOP covers maintenance activities related to Ceph.

Table of Contents

  1. Generic Commands
  2. Replace failed OSD

1. Generic Commands

Check OSD Status

To check the current status of OSDs, execute the following:

utilscli osd-maintenance check_osd_status

OSD Removal

To purge OSDs in down state, execute the following:

utilscli osd-maintenance osd_remove

OSD Removal By OSD ID

To purge an OSD in down state by its OSD ID, execute the following:

utilscli osd-maintenance remove_osd_by_id --osd-id <OSDID>
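
For example, if the OSD tree shows osd.7 as down (the ID 7 here is purely illustrative), the call would be:

utilscli osd-maintenance remove_osd_by_id --osd-id 7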

Reweight OSDs

To adjust an OSD's CRUSH weight in the CRUSH map of a running cluster, execute the following:

utilscli osd-maintenance reweight_osds

2. Replace failed OSD

In the event of a failed drive, follow the procedure below.

Prevent the OSD pod on the affected host from being rescheduled. First, label all nodes as outside the maintenance window:

kubectl label nodes --all ceph_maintenance_window=inactive

Replace <NODE> with the name of the node where the failed OSD pod exists:

kubectl label nodes <NODE> --overwrite ceph_maintenance_window=active
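
As a purely illustrative example, if the failed OSD pod runs on a node named worker-03 (a hypothetical node name), the command and an optional label check would be:

kubectl label nodes worker-03 --overwrite ceph_maintenance_window=active
kubectl get nodes -L ceph_maintenance_window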

Patch the OSD DaemonSet so that it only schedules where the maintenance window is inactive. Replace <POD_NAME> with the name of the DaemonSet that manages the failed OSD pod:

kubectl patch -n ceph ds <POD_NAME> -p='{"spec":{"template":{"spec":{"nodeSelector":{"ceph-osd":"enabled","ceph_maintenance_window":"inactive"}}}}}'
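
To confirm the patch took effect, the DaemonSet's nodeSelector can be inspected; this is an optional sanity check, not part of the original procedure:

kubectl get ds <POD_NAME> -n ceph -o jsonpath='{.spec.template.spec.nodeSelector}'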

The following commands should be run from the utility container.

Capture the failed OSD ID by checking for OSDs with status down:

utilscli ceph osd tree
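
If the tree output is long, filtering it through grep is one way to spot OSDs in the down state (an optional convenience, not part of the original procedure):

utilscli ceph osd tree | grep down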

Remove the OSD from the cluster. Replace <OSD_ID> with the failed OSD ID captured above:

utilscli osd-maintenance osd_remove_by_id --osd-id <OSD_ID>

Remove the failed drive and replace it with a new one without bringing down the node.

Once the new drive is in place, change the label back and delete the affected OSD pod that is in Error or CrashLoopBackOff state. Replace <POD_NAME> with the failed OSD pod name:

kubectl label nodes <NODE> --overwrite ceph_maintenance_window=inactive
kubectl delete pod <POD_NAME> -n ceph
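
As an optional check, the pods on the node can be listed to watch the replacement pod come up; <NODE> is the same node name used above:

kubectl get pods -n ceph -o wide | grep <NODE>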

Once the pod is deleted, Kubernetes will re-spin a new pod for the OSD. Once the pod is up, the OSD is added to the Ceph cluster with a weight of 0, so it must be re-weighted:

utilscli osd-maintenance reweight_osds
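
Afterwards, the OSD tree can be checked again to confirm that the replaced OSD is up and has a non-zero CRUSH weight:

utilscli ceph osd tree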