openstack-helm/doc/source/testing/ceph-resiliency/monitor-failure.rst

===============
Monitor Failure
===============

Test Environment
================

- Cluster size: 4 host machines
- Number of disks: 24 (= 6 disks per host * 4 hosts)
- Kubernetes version: 1.9.3
- Ceph version: 12.2.3
- OpenStack-Helm commit: 28734352741bae228a4ea4f40bcacc33764221eb

We have 3 Monitors in this Ceph cluster, one on each of the 3 Monitor
hosts.

Case: 1 out of 3 Monitor Processes is Down
==========================================

This is to test a scenario when 1 out of 3 Monitor processes is down.

To bring down 1 Monitor process (out of 3), we identify a Monitor
process and kill it from the monitor host (not a pod).

.. code-block:: console

  $ ps -ef | grep ceph-mon
  ceph     16112 16095  1 14:58 ?        00:00:03 /usr/bin/ceph-mon --cluster ceph --setuser ceph --setgroup ceph -d -i voyager2 --mon-data /var/lib/ceph/mon/ceph-voyager2 --public-addr 135.207.240.42:6789
  $ sudo kill -9 16112

In the mean time, we monitored the status of Ceph and noted that it
takes about 24 seconds for the killed Monitor process to recover from
``down`` to ``up``. The reason is that Kubernetes automatically
restarts pods whenever they are killed.

.. code-block:: console

  (mon-pod):/# ceph -s
    cluster:
      id:     fd366aef-b356-4fe7-9ca5-1c313fe2e324
      health: HEALTH_WARN
              mon voyager1 is low on available space
              1/3 mons down, quorum voyager1,voyager3

    services:
      mon: 3 daemons, quorum voyager1,voyager3, out of quorum: voyager2
      mgr: voyager4(active)
      osd: 24 osds: 24 up, 24 in

.. code-block:: console

  (mon-pod):/# ceph -s
    cluster:
      id:     fd366aef-b356-4fe7-9ca5-1c313fe2e324
      health: HEALTH_WARN
              mon voyager1 is low on available space
              1/3 mons down, quorum voyager1,voyager2

    services:
      mon: 3 daemons, quorum voyager1,voyager2,voyager3
      mgr: voyager4(active)
      osd: 24 osds: 24 up, 24 in

We also monitored the status of the Monitor pod through ``kubectl get
pods -n ceph``, and the status of the pod (where a Monitor process is
killed) changed as follows: ``Running`` -> ``Error`` -> ``Running``
and this recovery process takes about 24 seconds.

Case: 2 out of 3 Monitor Processes are Down
===========================================

This is to test a scenario when 2 out of 3 Monitor processes are down.
To bring down 2 Monitor processes (out of 3), we identify two Monitor
processes and kill them from the 2 monitor hosts (not a pod).

We monitored the status of Ceph when the Monitor processes are killed
and noted that the symptoms are similar to when 1 Monitor process is
killed:

- It takes longer (about 1 minute) for the killed Monitor processes to
  recover from ``down`` to ``up``.

- The status of the pods (where the two Monitor processes are killed)
  changed as follows: ``Running`` -> ``Error`` -> ``CrashLoopBackOff``
  -> ``Running`` and this recovery process takes about 1 minute.


Case: 3 out of 3 Monitor Processes are Down
===========================================

This is to test a scenario when 3 out of 3 Monitor processes are down.
To bring down 3 Monitor processes (out of 3), we identify all 3
Monitor processes and kill them from the 3 monitor hosts (not pods).

We monitored the status of Ceph Monitor pods and noted that the
symptoms are similar to when 1 or 2 Monitor processes are killed:

.. code-block:: console

  $ kubectl get pods -n ceph -o wide | grep ceph-mon
  NAME                                       READY     STATUS    RESTARTS   AGE
  ceph-mon-8tml7                             0/1       Error     4          10d
  ceph-mon-kstf8                             0/1       Error     4          10d
  ceph-mon-z4sl9                             0/1       Error     7          10d

.. code-block:: console

  $ kubectl get pods -n ceph -o wide | grep ceph-mon
  NAME                                       READY     STATUS               RESTARTS   AGE
  ceph-mon-8tml7                             0/1       CrashLoopBackOff     4          10d
  ceph-mon-kstf8                             0/1       Error                4          10d
  ceph-mon-z4sl9                             0/1       CrashLoopBackOff     7          10d


.. code-block:: console

  $ kubectl get pods -n ceph -o wide | grep ceph-mon
  NAME                                       READY     STATUS    RESTARTS   AGE
  ceph-mon-8tml7                             1/1       Running   5          10d
  ceph-mon-kstf8                             1/1       Running   5          10d
  ceph-mon-z4sl9                             1/1       Running   8          10d

The status of the pods (where the three Monitor processes are killed)
changed as follows: ``Running`` -> ``Error`` -> ``CrashLoopBackOff``
-> ``Running`` and this recovery process takes about 1 minute.

Case: Monitor database is destroyed
===================================

We intentionlly destroy a Monitor database by removing
``/var/lib/openstack-helm/ceph/mon/mon/ceph-voyager3/store.db``.

Symptom:
--------

A Ceph Monitor running on voyager3 (whose Monitor database is destroyed) becomes out of quorum,
and the mon-pod's status stays in ``Running`` -> ``Error`` -> ``CrashLoopBackOff`` while keeps restarting.

.. code-block:: console

  (mon-pod):/# ceph -s
    cluster:
      id:     9d4d8c61-cf87-4129-9cef-8fbf301210ad
      health: HEALTH_WARN
              too few PGs per OSD (22 < min 30)
              mon voyager1 is low on available space
              1/3 mons down, quorum voyager1,voyager2

    services:
      mon: 3 daemons, quorum voyager1,voyager2, out of quorum: voyager3
      mgr: voyager1(active), standbys: voyager3
      mds: cephfs-1/1/1 up  {0=mds-ceph-mds-65bb45dffc-cslr6=up:active}, 1 up:standby
      osd: 24 osds: 24 up, 24 in
      rgw: 2 daemons active

    data:
      pools:   18 pools, 182 pgs
      objects: 240 objects, 3359 bytes
      usage:   2675 MB used, 44675 GB / 44678 GB avail
      pgs:     182 active+clean

.. code-block:: console

  $ kubectl get pods -n ceph -o wide|grep ceph-mon
  ceph-mon-4gzzw                             1/1       Running            0          6d        135.207.240.42    voyager2
  ceph-mon-6bbs6                             0/1       CrashLoopBackOff   5          6d        135.207.240.43    voyager3
  ceph-mon-qgc7p                             1/1       Running            0          6d        135.207.240.41    voyager1

The logs of the failed mon-pod shows the ceph-mon process cannot run as ``/var/lib/ceph/mon/ceph-voyager3/store.db`` does not exist.

.. code-block:: console

  $ kubectl logs ceph-mon-6bbs6 -n ceph
  + ceph-mon --setuser ceph --setgroup ceph --cluster ceph -i voyager3 --inject-monmap /etc/ceph/monmap-ceph --keyring /etc/ceph/ceph.mon.keyring --mon-data /var/lib/ceph/mon/ceph-voyager3
  2018-07-10 18:30:04.546200 7f4ca9ed4f00 -1 rocksdb: Invalid argument: /var/lib/ceph/mon/ceph-voyager3/store.db: does not exist (create_if_missing is false)
  2018-07-10 18:30:04.546214 7f4ca9ed4f00 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-voyager3': (22) Invalid argument

Recovery:
---------

Remove the entire ceph-mon directory on voyager3, and then Ceph will automatically
recreate the database by using the other ceph-mons' database.

.. code-block:: console

  $ sudo rm -rf /var/lib/openstack-helm/ceph/mon/mon/ceph-voyager3

.. code-block:: console

  (mon-pod):/# ceph -s
    cluster:
      id:     9d4d8c61-cf87-4129-9cef-8fbf301210ad
      health: HEALTH_WARN
              too few PGs per OSD (22 < min 30)
              mon voyager1 is low on available space

    services:
      mon: 3 daemons, quorum voyager1,voyager2,voyager3
      mgr: voyager1(active), standbys: voyager3
      mds: cephfs-1/1/1 up  {0=mds-ceph-mds-65bb45dffc-cslr6=up:active}, 1 up:standby
      osd: 24 osds: 24 up, 24 in
      rgw: 2 daemons active

    data:
      pools:   18 pools, 182 pgs
      objects: 240 objects, 3359 bytes
      usage:   2675 MB used, 44675 GB / 44678 GB avail
      pgs:     182 active+clean