Nagios: Updated the alert for Ceph OSD Down

Earlier the Nagios alert monitor was percent based as in when the percent of OSD
down is greater than 80, it will send alert.
>check_prom_alert!ceph_osd_down_pct_high!CRITICAL- CEPH OSDs down is
more than 80 percent!OK- CEPH OSDs down is less than 80 percent

Updated the code in nagios values.yaml to send alert when even 1 OSD is
down:
>check_prom_alert!ceph_osd_down!CRITICAL- One or more CEPH OSDs are down
>for more than 5 minutes!OK- All the CEPH OSDs are up

Change-Id: Id24c4a0cca64674890dae3599edc0c90d9534e90
This commit is contained in:
Pai, Radhika (rp592h) 2019-07-11 20:16:07 +00:00
parent b191d4ae99
commit 47565d2d19

View File

@ -944,7 +944,7 @@ conf:
}
define service {
check_command check_prom_alert!ceph_osd_down_pct_high!CRITICAL- CEPH OSDs down are more than 80 percent!OK- CEPH OSDs down is less than 80 percent
check_command check_prom_alert!ceph_osd_down!CRITICAL- One or more CEPH OSDs are down for more than 5 minutes!OK- All the CEPH OSDs are up
check_interval 60
hostgroup_name prometheus-hosts
service_description CEPH_OSDs-down