openstack-helm/ceph/templates/bin/_ceph-osd-liveness-readiness.sh.tpl
dave kormann 5f3f13cc0a Ceph liveness scripts
Replace socket-based liveness checks with scripts

The current TCP socket-based liveness/readiness check for Ceph
doesn't accurately reflect when daemons are live, doesn't handle
multiple OSDs on a host, and doesn't work when hostNetworking is
in use and the Ceph network is different from the one associated
with the hostname.  This change adds new scripts for checking
Ceph monitor and OSD liveness/readiness that query the Ceph Unix
domain sockets to get daemon status and exit 0 iff all sockets
report that their daemons are in an "active" state.

This isn't perfect: we don't know how many daemons SHOULD be
active, so if only a subset is live and the others have no
sockets (yet?), we'll still claim the pod is ready.  The scripts
also don't distinguish between liveness and readiness for OSDs.

Change-Id: I5d370b4bc4025fece2e640355c3a29167afca871
2017-12-01 13:45:41 +00:00
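
The caveat in the commit message (the pod is reported ready even when only some of the expected daemons have sockets) could in principle be narrowed by comparing the number of sockets found against an expected count. The following is only a sketch under stated assumptions, not part of this change: EXPECTED_OSDS is a hypothetical environment variable, and count_osd_sockets and enough_sockets are illustrative helper names.

```shell
#!/bin/sh
# Sketch only: EXPECTED_OSDS and both helper functions are hypothetical
# and do not exist in the script from this commit.

# count_osd_sockets DIR BASE SUFFIX
# Print how many Unix sockets match DIR/BASE.*.SUFFIX (0 if none).
count_osd_sockets() {
  n=0
  for s in "$1/$2".*."$3"; do
    # an unmatched glob stays literal and fails the -S test
    if [ -S "$s" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# enough_sockets FOUND EXPECTED
# Succeed only if at least EXPECTED sockets were found.
enough_sockets() {
  [ "$1" -ge "$2" ]
}

found=$(count_osd_sockets "${CEPH_SOCKET_DIR:-/run/ceph}" \
  "${CEPH_OSD_SOCKET_BASE:-ceph-osd}" "${CEPH_SOCKET_SUFFIX:-asok}")
if enough_sockets "$found" "${EXPECTED_OSDS:-1}"; then
  echo "socket count ok ($found)"
else
  echo "only $found socket(s) present"
fi
```

A probe could run such a count before the per-socket state check and fail early when too few sockets exist. How the expected count would be supplied to the pod is left open here.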

#!/bin/sh
# Copyright 2017 The Openstack-Helm Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# A liveness/readiness check for Ceph OSDs: exit 0 iff at least one OSD
# admin socket exists on this host and every socket found reports the
# "active" state.

# All paths and names can be overridden through the environment.
CEPH=${CEPH_CMD:-/usr/bin/ceph}
SOCKDIR=${CEPH_SOCKET_DIR:-/run/ceph}
SBASE=${CEPH_OSD_SOCKET_BASE:-ceph-osd}
SSUFFIX=${CEPH_SOCKET_SUFFIX:-asok}

# default: no sockets, not live
cond=1
for sock in "$SOCKDIR"/"$SBASE".*."$SSUFFIX"; do
  if [ -S "$sock" ]; then
    # the OSD id is the second dot-separated field of the socket file name;
    # basename keeps a dot in $SOCKDIR from shifting the field
    osdid=$(basename "$sock" | awk -F. '{print $2}')
    state=$(${CEPH} -f json-pretty --connect-timeout 1 --admin-daemon "${sock}" status | grep state | sed 's/.*://;s/[^a-z]//g')
    echo "OSD $osdid $state"
    # this might be a stricter check than we actually want.  what are the
    # other values for the "state" field?
    if [ "x${state}x" = 'xactivex' ]; then
      cond=0
    else
      # one's not ready, so the whole pod's not ready.
      exit 1
    fi
  else
    # reached when the glob matched nothing (or matched a non-socket file)
    echo "No daemon sockets found in $SOCKDIR"
  fi
done

exit $cond
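
The state-extraction step in the loop above (grep followed by sed) can be tried without a running daemon. This is a minimal sketch: extract_state is an illustrative wrapper name, and the sample JSON is an assumed, abbreviated stand-in for what the admin socket "status" command returns, not captured output.

```shell
#!/bin/sh
# extract_state: read "status" JSON on stdin and print the bare state word,
# using the same grep/sed pipeline as the liveness script.
extract_state() {
  grep state | sed 's/.*://;s/[^a-z]//g'
}

# Assumed sample payload for illustration only.
sample='{
    "whoami": 0,
    "state": "active",
    "num_pgs": 12
}'

printf '%s\n' "$sample" | extract_state   # prints: active
```

Because the sed expression drops every character outside a-z, a state value containing underscores or digits would be collapsed (a hypothetical "some_state" would come out as "somestate"). The comparison against "active" is unaffected, but the echoed state may differ from what the daemon literally reported.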