
Ceph Maintenance

This MOP (Method of Procedure) covers maintenance activities related to Ceph.

Table of Contents

  • Table of Contents
      1. Generic Commands
      2. Replace failed OSD

1. Generic Commands

Check OSD Status

To check the current status of OSDs, execute the following:

utilscli osd-maintenance check_osd_status
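
The same information is also available from the native Ceph CLI, which the utility container exposes (see the utilscli ceph osd tree usage later in this MOP). For example, the standard Ceph commands below summarize OSD counts and placement; this assumes utilscli forwards the arguments unchanged to ceph:

utilscli ceph osd stat
utilscli ceph osd tree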

OSD Removal

To purge OSDs that are in the down state, execute the following:

utilscli osd-maintenance osd_remove
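
Before purging, it can be useful to confirm which OSDs are currently down, for example with the status check from the previous subsection:

utilscli osd-maintenance check_osd_status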

OSD Removal By OSD ID

To purge a specific OSD in the down state by its OSD ID, execute the following:

utilscli osd-maintenance remove_osd_by_id --osd-id <OSDID>
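
For example, to purge the OSD whose ID is 7 (an illustrative ID only), you would run:

utilscli osd-maintenance remove_osd_by_id --osd-id 7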

Reweight OSDs

To adjust an OSD's CRUSH weight in the CRUSH map of a running cluster, execute the following:

utilscli osd-maintenance reweight_osds
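
If a single OSD needs a specific weight rather than a bulk reweight, the native CRUSH reweight command can be used instead; the OSD name and target weight below are illustrative, and this again assumes utilscli passes the arguments through to ceph:

utilscli ceph osd crush reweight osd.7 1.00000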

2. Replace failed OSD

In the event of a failed drive, follow the procedure below. All of the following commands should be run from the utility container.

Capture the failed OSD ID by listing the OSD tree and checking for OSDs with status down:

utilscli ceph osd tree
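
The output resembles the listing below (values shown are examples only); the failed OSD is the one whose STATUS column reads down, here osd.1 with ID 1:

ID CLASS WEIGHT  TYPE NAME       STATUS REWEIGHT PRI-AFF
-1       0.09760 root default
-2       0.04880     host node1
 0   hdd 0.04880         osd.0       up  1.00000 1.00000
 1   hdd 0.04880         osd.1     down        0 1.00000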

Remove the OSD from the cluster, replacing <OSD_ID> with the failed OSD ID captured above:

utilscli osd-maintenance remove_osd_by_id --osd-id <OSD_ID>
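
Once the purge completes, the failed OSD should no longer appear in the cluster; re-run the tree listing from the first step to confirm:

utilscli ceph osd tree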

Remove the failed drive and replace it with a new one without bringing down the node.

Once the new drive is in place, delete the affected OSD pod, which will be in Error or CrashLoopBackOff state. Replace <pod_name> with the failed OSD pod's name:

kubectl delete pod <pod_name> -n ceph
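
If the failed pod's name is not known, the OSD pods in the ceph namespace can be listed and filtered; the exact pod naming depends on the deployment, so the grep pattern below is illustrative:

kubectl get pods -n ceph | grep -i osd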

Once the pod is deleted, Kubernetes will re-spin a new pod for the OSD. Once the pod is up, the OSD is added to the Ceph cluster with a weight of 0, so it must be re-weighted:

utilscli osd-maintenance reweight_osds
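
After the reweight completes, verify that the new OSD carries a non-zero CRUSH weight and that the cluster returns to a healthy state; the commands below reuse the native Ceph CLI through the utility container, as in the earlier steps:

utilscli ceph osd tree
utilscli ceph -s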