docs/doc/source/storage/kubernetes/replace-osds-on-a-standard-system-f3b1e376304c.rst

Replace OSDs on a Standard System

You can replace OSDs on a standard system to increase capacity, or to replace faulty disks, without reinstalling the host.

For standard systems with controller storage, ensure that the controller with the OSD to be replaced is the standby controller.

For example, if the disk replacement has to be done on controller-1 and controller-1 is the active controller, use the following command to swact services to controller-0:

~(keystone_admin)]$ system host-swact controller-1

After the swact, you must reconnect via SSH to the <oam-floating-ip>, which now connects you to the newly active controller-0.
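
For example, a minimal sketch of reconnecting after the swact, assuming the default sysadmin account and the standard /etc/platform/openrc credentials file:

# Reconnect to the OAM floating IP, which now reaches the active controller-0.
$ ssh sysadmin@<oam-floating-ip>

# Re-acquire Keystone admin credentials before running further system commands.
$ source /etc/platform/openrc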

Standard systems with controller storage
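
Before locking the host, you may want to confirm which OSD ID corresponds to the disk being replaced. The following is a minimal sketch using the ceph osd tree and system host-stor-list commands; controller-1 is used here as an example host:

# List OSDs by host; note the ID (osd.<ID>) of the OSD on the disk to replace.
~(keystone_admin)]$ ceph osd tree

# List the storage devices configured on the host, including their OSD IDs.
~(keystone_admin)]$ system host-stor-list controller-1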

  1. If controller-1 has the OSD to be replaced, lock it.

    ~(keystone_admin)]$ system host-lock controller-1
  2. Destroy the OSD to be replaced, where <ID> is the ID of the OSD on the disk being replaced.

    ~(keystone_admin)]$ ceph osd destroy osd.<ID> --yes-i-really-mean-it
  3. Power down controller-1.

  4. Replace the storage disk.

  5. Power on controller-1.

  6. Unlock controller-1.

    ~(keystone_admin)]$ system host-unlock controller-1
  7. Wait for the recovery process in the Ceph cluster to start and finish. (To monitor recovery continuously, see the sketch after this procedure.)

    ~(keystone_admin)]$ ceph -s
    
    cluster:
      id:     50ce952f-bd16-4864-9487-6c7e959be95e
      health: HEALTH_WARN
      Degraded data redundancy: 13/50 objects degraded (26.000%), 10 pgs degraded
    
    services:
      mon: 1 daemons, quorum controller (age 68m)
      mgr: controller-0(active, since 66m)
      mds: kube-cephfs:1 {0=controller-0=up:active} 1 up:standby
      osd: 2 osds: 2 up (since 9s), 2 in (since 9s)
    
    data:
      pools:   3 pools, 192 pgs
      objects: 25 objects, 300 MiB
      usage:   655 MiB used, 15 GiB / 16 GiB avail
      pgs:     13/50 objects degraded (26.000%)
               182 active+clean
               8   active+recovery_wait+degraded
               2   active+recovering+degraded
    
    io:
      recovery: 24 B/s, 1 keys/s, 1 objects/s
  8. Ensure that the Ceph cluster is healthy.

    ~(keystone_admin)]$ ceph -s
    
    cluster:
      id:     50ce952f-bd16-4864-9487-6c7e959be95e
      health: HEALTH_OK
    
    services:
      mon: 1 daemons, quorum controller (age 68m)
      mgr: controller-0(active, since 66m), standbys: controller-1
      mds: kube-cephfs:1 {0=controller-0=up:active} 1 up:standby
      osd: 2 osds: 2 up (since 36s), 2 in (since 36s)
    
    data:
      pools:   3 pools, 192 pgs
      objects: 25 objects, 300 MiB
      usage:   815 MiB used, 15 GiB / 16 GiB avail
      pgs:     192 active+clean
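
For step 7 of either procedure, instead of re-running ceph -s manually, you can monitor the recovery continuously. A minimal sketch using standard Ceph commands:

# Stream cluster status and health events as recovery proceeds (Ctrl-C to stop).
~(keystone_admin)]$ ceph -w

# Alternatively, refresh the status summary every five seconds.
~(keystone_admin)]$ watch -n 5 ceph -s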

Standard systems with dedicated storage nodes
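
As with controller storage, you can first confirm which OSD ID corresponds to the disk being replaced; a minimal sketch, using storage-1 as an example host:

# List OSDs by host, and the storage devices configured on storage-1.
~(keystone_admin)]$ ceph osd tree
~(keystone_admin)]$ system host-stor-list storage-1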

  1. If storage-1 has the OSD to be replaced, lock it.

    ~(keystone_admin)]$ system host-lock storage-1
  2. Destroy the OSD to be replaced, where <ID> is the ID of the OSD on the disk being replaced.

    ~(keystone_admin)]$ ceph osd destroy osd.<ID> --yes-i-really-mean-it
  3. Power down storage-1.

  4. Replace the storage disk.

  5. Power on storage-1.

  6. Unlock storage-1.

    ~(keystone_admin)]$ system host-unlock storage-1
  7. Wait for the recovery process in the Ceph cluster to start and finish. (The monitoring sketch shown before this procedure applies here as well.)

    ~(keystone_admin)]$ ceph -s
    
    cluster:
      id:     50ce952f-bd16-4864-9487-6c7e959be95e
      health: HEALTH_WARN
      Degraded data redundancy: 13/50 objects degraded (26.000%), 10 pgs degraded
    
    services:
      mon: 1 daemons, quorum controller (age 68m)
      mgr: controller-0(active, since 66m)
      mds: kube-cephfs:1 {0=controller-0=up:active} 1 up:standby
      osd: 2 osds: 2 up (since 9s), 2 in (since 9s)
    
    data:
      pools:   3 pools, 192 pgs
      objects: 25 objects, 300 MiB
      usage:   655 MiB used, 15 GiB / 16 GiB avail
      pgs:     13/50 objects degraded (26.000%)
               182 active+clean
               8   active+recovery_wait+degraded
               2   active+recovering+degraded
    
    io:
      recovery: 24 B/s, 1 keys/s, 1 objects/s
  8. Ensure that the Ceph cluster is healthy.

    ~(keystone_admin)]$ ceph -s
    
    cluster:
      id:     50ce952f-bd16-4864-9487-6c7e959be95e
      health: HEALTH_OK
    
    services:
      mon: 1 daemons, quorum controller (age 68m)
      mgr: controller-0(active, since 66m), standbys: controller-1
      mds: kube-cephfs:1 {0=controller-0=up:active} 1 up:standby
      osd: 2 osds: 2 up (since 36s), 2 in (since 36s)
    
    data:
      pools:   3 pools, 192 pgs
      objects: 25 objects, 300 MiB
      usage:   815 MiB used, 15 GiB / 16 GiB avail
      pgs:     192 active+clean
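
Once the cluster reports HEALTH_OK, you can optionally confirm that the replaced OSD is back up and in, for example:

# The replaced OSD should appear under its host with status "up" and be "in" the cluster.
~(keystone_admin)]$ ceph osd tree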