800 Series Alarm Messages

The system inventory and maintenance service reports system changes with different degrees of severity. Use the reported alarms to monitor the overall health of the system.

Alarm ID: 800.001

Storage Alarm Condition:

1 mons down, quorum 1,2 controller-1,storage-0

Entity Instance: cluster=<dist-fs-uuid>
Degrade Affecting Severity: None
Severity: C/M*
Proposed Repair Action: If problem persists, contact next level of support.

Alarm ID: 800.003

Storage Alarm Condition:

Quota/Space mismatch for the <tiername> tier. The sum of Ceph pool quotas does not match the tier size.

Entity Instance: cluster=<dist-fs-uuid>.tier=<tiername>
Degrade Affecting Severity: None
Severity: m
Proposed Repair Action: Update Ceph storage pool quotas to use all available tier space.
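The condition behind alarm 800.003 is simple arithmetic: the pool quotas on a tier should add up to the tier size. The following is a minimal Python sketch of that comparison, not the platform's actual check. The function name `tier_quota_gap`, the pool names, and the sizes are all hypothetical values chosen for illustration.

```python
def tier_quota_gap(pool_quotas_gib, tier_size_gib):
    """Return tier size minus the sum of pool quotas (0 means they match)."""
    return tier_size_gib - sum(pool_quotas_gib.values())


# Hypothetical pool names and sizes, for illustration only.
quotas = {"pool-a": 400, "pool-b": 300, "pool-c": 200}
gap = tier_quota_gap(quotas, tier_size_gib=1000)
print(gap)  # 100
```

A positive result means tier space is left unallocated, a negative result means the quotas exceed the tier size, and zero means the quotas use all available tier space, which is the state the repair action asks for.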

Alarm ID: 800.010

Storage Alarm Condition:

Potential data loss. No available OSDs in storage replication group.

Entity Instance: cluster=<dist-fs-uuid>.peergroup=<group-x>
Degrade Affecting Severity: None
Severity: C*
Proposed Repair Action: Ensure storage hosts from replication group are unlocked and available. Check if OSDs of each storage host are up and running. If problem persists, contact next level of support.

Alarm ID: 800.011

Storage Alarm Condition:

Loss of replication in peergroup.

Entity Instance: cluster=<dist-fs-uuid>.peergroup=<group-x>
Degrade Affecting Severity: None
Severity: M*
Proposed Repair Action: Ensure storage hosts from replication group are unlocked and available. Check if OSDs of each storage host are up and running. If problem persists, contact next level of support.
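The Entity Instance values in the storage alarms above are dot-separated `key=value` pairs. When filtering or grouping alarms in a script, it can help to split them into fields. The following is a minimal Python sketch, assuming the identifiers themselves contain no dots. The helper name `parse_entity_instance` and the sample value are hypothetical, and storing a bare segment under `host` is an assumption made for illustration.

```python
def parse_entity_instance(entity):
    """Split a dot-separated entity instance string into a dict."""
    fields = {}
    for segment in entity.split("."):
        key, sep, value = segment.partition("=")
        if sep:
            fields[key] = value
        else:
            # Assumption: a bare segment with no '=' is a hostname.
            fields["host"] = segment
    return fields


# Made-up sample value in the cluster/peergroup form shown above.
print(parse_entity_instance("cluster=example-uuid.peergroup=group-0"))
# {'cluster': 'example-uuid', 'peergroup': 'group-0'}
```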

Alarm ID: 800.102

Storage Alarm Condition:

PV configuration <error/failed to apply> on <hostname>. Reason: <detailed reason>.

Entity Instance: pv=<pv_uuid>
Degrade Affecting Severity: None
Severity: C/M*
Proposed Repair Action: Remove the failed PV and associated storage device, then recreate them.

Alarm ID: 800.103

Storage Alarm Condition:

[ Metadata usage for LVM thin pool <VG name>/<Pool name> exceeded threshold and automatic extension failed.

Metadata usage for LVM thin pool <VG name>/<Pool name> exceeded threshold ]; threshold x%, actual y%.

Entity Instance: <hostname>.lvmthinpool=<VG name>/<Pool name>
Degrade Affecting Severity: None
Severity: C*
Proposed Repair Action: Increase Storage Space Allotment for Cinder on the 'lvm' backend. Consult the user documentation for more details. If problem persists, contact next level of support.
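The 800.103 condition text ends with the configured threshold and the measured usage. The following is a minimal Python sketch for pulling those two percentages out of a message, assuming the text follows the `threshold x%, actual y%` form shown above. The helper name `parse_usage`, the sample message, and its values are hypothetical.

```python
import re

# Matches the trailing "threshold x%, actual y%" part of the message.
USAGE_RE = re.compile(r"threshold\s+(\d+(?:\.\d+)?)%,\s*actual\s+(\d+(?:\.\d+)?)%")


def parse_usage(message):
    """Return (threshold, actual) as floats, or None if the text does not match."""
    match = USAGE_RE.search(message)
    if match is None:
        return None
    threshold, actual = (float(group) for group in match.groups())
    return threshold, actual


# Made-up message in the documented form.
sample = ("Metadata usage for LVM thin pool vg0/pool0 exceeded threshold; "
          "threshold 80%, actual 91.5%.")
print(parse_usage(sample))  # (80.0, 91.5)
```

The difference between the two values indicates how far past the threshold the thin pool metadata has grown.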

Alarm ID: 800.104

Storage Alarm Condition:

<storage-backend-name> configuration failed to apply on host: <host-uuid>.

Degrade Affecting Severity: None
Severity: C*
Proposed Repair Action: Update backend setting to reapply configuration. Consult the user documentation for more details. If problem persists, contact next level of support.
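The Severity fields in this section use short codes such as `C/M*`, `C*`, and `m`. The following is a minimal Python sketch for expanding them in a script. The letter-to-name mapping (C for critical, M for major, m for minor, W for warning) is an assumption based on common alarm conventions and is not defined in this section. The sketch reports the trailing asterisk as a flag without interpreting it. The helper name `parse_severity` is hypothetical.

```python
# Assumed mapping; this section does not define the abbreviations.
SEVERITY_NAMES = {"C": "critical", "M": "major", "m": "minor", "W": "warning"}


def parse_severity(code):
    """Return (list of severity names, whether the code carries an asterisk)."""
    starred = code.endswith("*")
    letters = code.rstrip("*").split("/")
    return [SEVERITY_NAMES[letter] for letter in letters], starred


print(parse_severity("C/M*"))  # (['critical', 'major'], True)
print(parse_severity("m"))     # (['minor'], False)
```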