Replace OSDs on an AIO-DX System

On systems that use a Ceph backend for persistent storage, you can replace storage disks while the system is running, even if the storage resources are in active use.

Note

All storage alarms must be cleared before starting this procedure.

You can replace OSDs in an AIO-DX system to increase capacity, or replace faulty disks on the host without reinstalling it.

  1. Ensure that the controller with the disk to be replaced is the standby controller.

    For example, if the disk to be replaced is on controller-1 and controller-1 is currently the active controller, use the following commands to check its role and swact activity to controller-0:

    ~(keystone_admin)$ system host-show controller-1 | fgrep capabilities
    ~(keystone_admin)$ system host-swact controller-1

    After the swact, reconnect via SSH to the <oam-floating-ip> to reach the newly active controller-0.
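
    The capabilities field reported by host-show indicates each controller's role; an illustrative (not verbatim) line for a standby controller:

    | capabilities | {'Personality': 'Controller-Standby'} |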

  2. Determine the osdid of the disk that is to be replaced.

    ~(keystone_admin)$ system host-stor-list controller-1
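
    The osdid appears as a column in the output; a trimmed, illustrative listing:

    +--------------------------------------+----------+-------+------------+
    | uuid                                 | function | osdid | state      |
    +--------------------------------------+----------+-------+------------+
    | c7cc08e6-...                         | osd      | 1     | configured |
    +--------------------------------------+----------+-------+------------+
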
  3. Lock the standby controller-1 to make the changes.

    ~(keystone_admin)$ system host-lock controller-1
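
    Locking can take a few moments; you can confirm that the host is locked before proceeding, for example:

    ~(keystone_admin)$ system host-show controller-1 | fgrep administrative
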
  4. Run the ceph osd destroy osd.<id> --yes-i-really-mean-it command. This marks the OSD as destroyed but keeps its ID intact, so the replacement disk can reuse it.

    ~(keystone_admin)$ ceph osd destroy osd.<id> --yes-i-really-mean-it
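
    Before powering down, you can verify that the OSD is now flagged as destroyed; a quick check:

    ~(keystone_admin)$ ceph osd tree

    A destroyed OSD is listed with STATUS destroyed in the tree output.
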
  5. Power down controller-1.
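
    If the host's board management controller (BMC) is provisioned, you can power it off from the CLI instead of using the physical power switch; a sketch, assuming a provisioned BMC:

    ~(keystone_admin)$ system host-power-off controller-1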

  6. Replace the storage disk.

  7. Power on controller-1.
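
    As with power-off, a provisioned BMC lets you power the host back on from the CLI; a sketch:

    ~(keystone_admin)$ system host-power-on controller-1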

  8. Unlock controller-1.

    ~(keystone_admin)$ system host-unlock controller-1
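
    You can monitor the host while it reboots and rejoins the cluster, for example:

    ~(keystone_admin)$ system host-list

    Wait until controller-1 reports unlocked/enabled/available before checking Ceph recovery.
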
  9. Wait for the recovery process in the Ceph cluster to complete.

    ~(keystone_admin)$ ceph -s
    
    cluster:
      id:     50ce952f-bd16-4864-9487-6c7e959be95e
      health: HEALTH_WARN
      Degraded data redundancy: 13/50 objects degraded (26.000%), 10 pgs degraded
    
    services:
      mon: 1 daemons, quorum controller (age 68m)
      mgr: controller-0(active, since 66m)
      mds: kube-cephfs:1 {0=controller-0=up:active} 1 up:standby
      osd: 2 osds: 2 up (since 9s), 2 in (since 9s)
    
    data:
      pools:   3 pools, 192 pgs
      objects: 25 objects, 300 MiB
      usage:   655 MiB used, 15 GiB / 16 GiB avail
      pgs:     13/50 objects degraded (26.000%)
               182 active+clean
               8   active+recovery_wait+degraded
               2   active+recovering+degraded
    
    io:
      recovery: 24 B/s, 1 keys/s, 1 objects/s
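
    Repeat the command until the degraded object count drops to zero and the recovery line disappears from the io section.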
  10. Ensure that the Ceph cluster is healthy.

    ~(keystone_admin)$ ceph -s
    
    cluster:
      id:     50ce952f-bd16-4864-9487-6c7e959be95e
      health: HEALTH_OK
    
    services:
      mon: 1 daemons, quorum controller (age 68m)
      mgr: controller-0(active, since 66m), standbys: controller-1
      mds: kube-cephfs:1 {0=controller-0=up:active} 1 up:standby
      osd: 2 osds: 2 up (since 36s), 2 in (since 36s)
    
    data:
      pools:   3 pools, 192 pgs
      objects: 25 objects, 300 MiB
      usage:   815 MiB used, 15 GiB / 16 GiB avail
      pgs:     192 active+clean