Added examples to create event based alarm
- Aodh event based alarm using single and multiple traits
- New sub-section in 'Debug alarm'
- Correct path to aodh.conf.sample file
- Separated example sections into
  - Threshold based alarm
  - Composite alarm
  - Event based alarm

Change-Id: I0b025a7401ad7feaf05f7e31e77cd3f92c9761a0
Signed-off-by: Sachin Patil <psachin@redhat.com>
parent 2c795471fe
commit c875113961
@ -144,18 +144,27 @@ Using alarms
|
|||||||
Alarm creation
|
Alarm creation
|
||||||
--------------
|
--------------
|
||||||
|
|
||||||
|
Threshold based alarm
|
||||||
|
`````````````````````
|
||||||
|
|
||||||
An example of creating a Gnocchi threshold-oriented alarm, based on an upper
|
An example of creating a Gnocchi threshold-oriented alarm, based on an upper
|
||||||
bound on the CPU utilization for a particular instance:
|
bound on the CPU utilization for a particular instance:
|
||||||
|
|
||||||
.. code-block:: console
|
.. code-block:: console
|
||||||
|
|
||||||
$ aodh alarm create --name cpu_hi \
|
$ aodh alarm create \
|
||||||
|
--name cpu_hi \
|
||||||
--type gnocchi_resources_threshold \
|
--type gnocchi_resources_threshold \
|
||||||
--description 'instance running hot' \
|
--description 'instance running hot' \
|
||||||
--metric cpu_util --threshold 70.0 \
|
--metric cpu_util \
|
||||||
--comparison-operator gt --aggregation_method avg \
|
--threshold 70.0 \
|
||||||
--granularity 600 --evaluation-periods 3 \
|
--comparison-operator gt \
|
||||||
--alarm-action 'log://' --resource_id INSTANCE_ID
|
--aggregation-method mean \
|
||||||
|
--granularity 600 \
|
||||||
|
--evaluation-periods 3 \
|
||||||
|
--alarm-action 'log://' \
|
||||||
|
--resource-id INSTANCE_ID \
|
||||||
|
--resource-type instance
|
||||||
|
|
||||||
This creates an alarm that will fire when the average CPU utilization
|
This creates an alarm that will fire when the average CPU utilization
|
||||||
for an individual instance exceeds 70% for three consecutive 10
|
for an individual instance exceeds 70% for three consecutive 10
|
||||||
@ -220,17 +229,21 @@ time-constraint
|
|||||||
day or days of the week (expressed as ``cron`` expression with an
|
day or days of the week (expressed as ``cron`` expression with an
|
||||||
optional timezone).
|
optional timezone).
|
||||||
|
|
||||||
|
Composite alarm
|
||||||
|
```````````````
|
||||||
|
|
||||||
An example of creating a combination alarm, based on the combined
|
An example of creating a combination alarm, based on the combined
|
||||||
state of two underlying alarms:
|
state of two underlying alarms:
|
||||||
|
|
||||||
.. code-block:: console
|
.. code-block:: console
|
||||||
|
|
||||||
$ aodh alarm create --name meta --type composite \
|
$ aodh alarm create --name meta --type composite \
|
||||||
--composite-rule '{"or":[{"threshold": 0.8,"metric": "cpu_util", "type": \
|
--composite-rule '{"or": [{"threshold": 0.8,"metric": "cpu_util", \
|
||||||
"gnocchi_resources_threshold", "resource_id": INSTANCE_ID, \
|
"type": "gnocchi_resources_threshold", "resource_id": INSTANCE_ID1, \
|
||||||
"aggregation-method": "last"},{"threshold": 0.8,"metric": "cpu_util", \
|
"aggregation_method": "last", "resource_type": "instance"}, {"threshold": \
|
||||||
"type": "gnocchi_resources_threshold", "resource_id": INSTANCE_ID2, \
|
0.8, "metric": "cpu_util", "type": "gnocchi_resources_threshold", \
|
||||||
"aggregation-method": "last"}]}' \
|
"resource_id": INSTANCE_ID2, "aggregation_method": "last", \
|
||||||
|
"resource_type": "instance"}]}' \
|
||||||
--alarm-action 'http://example.org/notify'
|
--alarm-action 'http://example.org/notify'
|
||||||
|
|
||||||
This creates an alarm that will fire when either one of two underlying
|
This creates an alarm that will fire when either one of two underlying
|
||||||
@ -239,6 +252,12 @@ is a webhook call. Any number of underlying alarms can be combined in
|
|||||||
this way, using either ``and`` or ``or``. Additionally, combinations
|
this way, using either ``and`` or ``or``. Additionally, combinations
|
||||||
can contain nested conditions:
|
can contain nested conditions:
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
Note the *underscores* in ``resource_id`` and ``resource_type`` in the
|
||||||
|
composite rule, as opposed to the hyphens in the ``--resource-id`` and
|
||||||
|
``--resource-type`` CLI arguments.
|
||||||
|
|
||||||
.. code-block:: console
|
.. code-block:: console
|
||||||
|
|
||||||
$ aodh alarm create --name meta --type composite \
|
$ aodh alarm create --name meta --type composite \
|
||||||
@ -246,6 +265,76 @@ can contain nested conditions:
|
|||||||
--alarm-action 'http://example.org/notify'
|
--alarm-action 'http://example.org/notify'
|
||||||
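Because the composite rule is written inline as a JSON-style string, quoting mistakes are easy to make. The following is a minimal shell sketch, not taken from the Aodh documentation, that assembles the ``or`` rule from the first composite example for two instance IDs. The helper names ``threshold_rule`` and ``composite_rule`` are hypothetical, and the sketch assumes the rule is parsed as JSON, so each resource ID is emitted as a quoted string:

```shell
# Hypothetical helper: emit one gnocchi_resources_threshold sub-rule
# for the instance ID given as $1. Keys use underscores, not hyphens.
threshold_rule() {
  printf '{"threshold": 0.8, "metric": "cpu_util", '
  printf '"type": "gnocchi_resources_threshold", '
  printf '"resource_id": "%s", "resource_type": "instance", ' "$1"
  printf '"aggregation_method": "last"}'
}

# Hypothetical helper: combine two sub-rules under a single "or".
composite_rule() {
  printf '{"or": [%s, %s]}' "$(threshold_rule "$1")" "$(threshold_rule "$2")"
}

# Usage (assumes a configured aodh client):
#   aodh alarm create --name meta --type composite \
#     --composite-rule "$(composite_rule INSTANCE_ID1 INSTANCE_ID2)" \
#     --alarm-action 'http://example.org/notify'
```

Generating the rule this way keeps the shell quoting in one place, which may help when the same rule shape is reused for many instances.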
|
|
||||||
|
|
||||||
|
Event based alarm
|
||||||
|
`````````````````
|
||||||
|
|
||||||
|
An example of creating an event alarm based on the power state of an
|
||||||
|
instance:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
$ aodh alarm create \
|
||||||
|
--type event \
|
||||||
|
--name instance_off \
|
||||||
|
--description 'Instance powered OFF' \
|
||||||
|
--event-type "compute.instance.power_off.*" \
|
||||||
|
--enable True \
|
||||||
|
--query "traits.instance_id=string::INSTANCE_ID" \
|
||||||
|
--alarm-action 'log://' \
|
||||||
|
--ok-action 'log://' \
|
||||||
|
--insufficient-data-action 'log://'
|
||||||
|
|
||||||
|
A list of valid ``event-type`` values and ``traits`` can be found in the
|
||||||
|
``event_definitions.yaml`` file. ``--query`` may also combine several
|
||||||
|
traits, for example to create an alarm when an instance is powered on but
|
||||||
|
goes into an error state:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
$ aodh alarm create \
|
||||||
|
--type event \
|
||||||
|
--name instance_on_state_err \
|
||||||
|
--description 'Instance powered ON but in error state' \
|
||||||
|
--event-type "compute.instance.power_on.*" \
|
||||||
|
--enable True \
|
||||||
|
--query "traits.instance_id=string::INSTANCE_ID;traits.state=string::error" \
|
||||||
|
--alarm-action 'log://' \
|
||||||
|
--ok-action 'log://' \
|
||||||
|
--insufficient-data-action 'log://'
|
||||||
|
|
||||||
|
Sample output of alarm type **event**:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
+---------------------------+---------------------------------------------------------------+
|
||||||
|
| Field | Value |
|
||||||
|
+---------------------------+---------------------------------------------------------------+
|
||||||
|
| alarm_actions | [u'log://'] |
|
||||||
|
| alarm_id | 15c0da26-524d-40ad-8fba-3e55ee0ddc91 |
|
||||||
|
| description | Instance powered ON but in error state |
|
||||||
|
| enabled | True |
|
||||||
|
| event_type | compute.instance.power_on.* |
|
||||||
|
| insufficient_data_actions | [u'log://'] |
|
||||||
|
| name | instance_on_state_err |
|
||||||
|
| ok_actions | [u'log://'] |
|
||||||
|
| project_id | 9ee200732f4c4d10a6530bac746f1b6e |
|
||||||
|
| query | traits.instance_id = bb912729-fa51-443b-bac6-bf4c795f081d AND |
|
||||||
|
| | traits.state = error |
|
||||||
|
| repeat_actions | False |
|
||||||
|
| severity | low |
|
||||||
|
| state | insufficient data |
|
||||||
|
| state_timestamp | 2017-07-15T02:28:31.114455 |
|
||||||
|
| time_constraints | [] |
|
||||||
|
| timestamp | 2017-07-15T02:28:31.114455 |
|
||||||
|
| type | event |
|
||||||
|
| user_id | 89b4e48bcbdb4816add7800502bd5122 |
|
||||||
|
+---------------------------+---------------------------------------------------------------+
|
||||||
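The ``--query`` values in the examples above follow a ``field=type::value`` pattern, with multiple terms joined by ``;``. As a small sketch under that assumption, a shell function can assemble the string. ``event_query`` is a hypothetical helper name, not part of the aodh CLI:

```shell
# Hypothetical helper: build an event-alarm query that matches an
# instance ID ($1) and, optionally, an instance state ($2).
event_query() {
  query="traits.instance_id=string::$1"
  if [ -n "${2:-}" ]; then
    query="${query};traits.state=string::$2"
  fi
  printf '%s\n' "$query"
}

# prints: traits.instance_id=string::INSTANCE_ID;traits.state=string::error
event_query INSTANCE_ID error
```

The result can then be passed as ``--query "$(event_query INSTANCE_ID error)"`` to ``aodh alarm create``.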
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
To enable event alarms, refer to `Configuration
|
||||||
|
<https://docs.openstack.org/aodh/latest/contributor/event-alarm.html#configuration>`_
|
||||||
|
|
||||||
Alarm retrieval
|
Alarm retrieval
|
||||||
---------------
|
---------------
|
||||||
|
|
||||||
@ -341,3 +430,54 @@ or even deleted permanently (an irreversible step):
|
|||||||
.. code-block:: console
|
.. code-block:: console
|
||||||
|
|
||||||
$ aodh alarm delete ALARM_ID
|
$ aodh alarm delete ALARM_ID
|
||||||
|
|
||||||
|
Debug alarms
|
||||||
|
------------
|
||||||
|
|
||||||
|
A good place to start is to add the ``--debug`` flag when creating or
|
||||||
|
updating an alarm. For example:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
$ aodh --debug alarm create <OTHER_PARAMS>
|
||||||
|
|
||||||
|
Look for the state transition when the event is triggered in the
|
||||||
|
``/var/log/aodh/listener.log`` file. For example, the log entry below shows
|
||||||
|
the state transition of the alarm with ID
|
||||||
|
``85a2942f-a2ec-4310-baea-d58f9db98654`` triggered by event ID
|
||||||
|
``abe437a3-b75b-40b4-a3cb-26022a919f5e``:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
2017-07-15 07:03:20.149 2866 INFO aodh.evaluator [-] alarm 85a2942f-a2ec-4310-baea-d58f9db98654 transitioning to alarm because Event <id=abe437a3-b75b-40b4-a3cb-26022a919f5e,event_type=compute.instance.power_off.start> hits the query <query=[{"field": "traits.instance_id", "op": "eq", "type": "string", "value": "bb912729-fa51-443b-bac6-bf4c795f081d"}]>.
|
||||||
|
|
||||||
|
|
||||||
|
The entry below from ``/var/log/aodh/notifier.log`` also confirms that
|
||||||
|
event ID ``abe437a3-b75b-40b4-a3cb-26022a919f5e`` hits the query
|
||||||
|
matching instance ID ``bb912729-fa51-443b-bac6-bf4c795f081d``:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
2017-07-15 07:03:24.071 2863 INFO aodh.notifier.log [-] Notifying alarm instance_off 85a2942f-a2ec-4310-baea-d58f9db98654 of low priority from insufficient data to alarm with action log: because Event <id=abe437a3-b75b-40b4-a3cb-26022a919f5e,event_type=compute.instance.power_off.start> hits the query <query=[{"field": "traits.instance_id", "op": "eq", "type": "string", "value": "bb912729-fa51-443b-bac6-bf4c795f081d"}]>
|
||||||
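To pull such entries out of a busy log, a plain ``grep`` on the alarm ID is often enough. The sketch below assumes the listener log line format shown earlier (``alarm <ID> transitioning to ...``) and the ``/var/log/aodh/listener.log`` path. ``alarm_transitions`` is a hypothetical helper name:

```shell
# Hypothetical helper: print the state-transition lines for one alarm.
#   $1: alarm ID
#   $2: log file (defaults to the listener log path used in this section)
alarm_transitions() {
  alarm_id="$1"
  log_file="${2:-/var/log/aodh/listener.log}"
  # -F treats the pattern as a fixed string, so the ID is matched literally.
  grep -F "alarm ${alarm_id} transitioning to" "$log_file"
}

# Usage:
#   alarm_transitions 85a2942f-a2ec-4310-baea-d58f9db98654
```

If the function prints nothing, the event may not have reached the listener, or the query may not have matched.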
|
|
||||||
|
|
||||||
|
As mentioned earlier, ``aodh alarm-history`` also displays the
|
||||||
|
transition:
|
||||||
|
|
||||||
|
.. code-block:: console
|
||||||
|
|
||||||
|
$ aodh alarm-history show 85a2942f-a2ec-4310-baea-d58f9db98654
|
||||||
|
+----------------------------+------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+
|
||||||
|
| timestamp | type | detail | event_id |
|
||||||
|
+----------------------------+------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+
|
||||||
|
| 2017-07-15T01:33:20.390623 | state transition | {"transition_reason": "Event <id=abe437a3-b75b-40b4-a3cb-26022a919f5e,event_type=compute.instance.power_off.start> hits | c5ca92ae-584b-4da6-a12c-b7a00dd39fef |
|
||||||
|
| | | the query <query=[{\"field\": \"traits.instance_id\", \"op\": \"eq\", \"type\": \"string\", \"value\": \"bb912729-fa51 | |
|
||||||
|
| | | -443b-bac6-bf4c795f081d\"}]>.", "state": "alarm"} | |
|
||||||
|
| 2017-07-15T01:31:14.516188 | creation | {"alarm_actions": ["log://"], "user_id": "89b4e48bcbdb4816add7800502bd5122", "name": "instance_off", "state": | fb31f4c2-e357-44c3-9b6a-bd2aaaa4ae68 |
|
||||||
|
| | | "insufficient data", "timestamp": "2017-07-15T01:31:14.516188", "description": "event_instance_power_off", "enabled": | |
|
||||||
|
| | | true, "state_timestamp": "2017-07-15T01:31:14.516188", "rule": {"query": [{"field": "traits.instance_id", "type": | |
|
||||||
|
| | | "string", "value": "bb912729-fa51-443b-bac6-bf4c795f081d", "op": "eq"}], "event_type": "compute.instance.power_off.*"}, | |
|
||||||
|
| | | "alarm_id": "85a2942f-a2ec-4310-baea-d58f9db98654", "time_constraints": [], "insufficient_data_actions": ["log://"], | |
|
||||||
|
| | | "repeat_actions": false, "ok_actions": ["log://"], "project_id": "9ee200732f4c4d10a6530bac746f1b6e", "type": "event", | |
|
||||||
|
| | | "severity": "low"} | |
|
||||||
|
+----------------------------+------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------+
|
||||||
|
@ -33,4 +33,4 @@ The following is a sample Aodh configuration for adaptation and use. It is
|
|||||||
auto-generated from Aodh when this documentation is built, and can also be
|
auto-generated from Aodh when this documentation is built, and can also be
|
||||||
viewed in `file form <_static/aodh.conf.sample>`_.
|
viewed in `file form <_static/aodh.conf.sample>`_.
|
||||||
|
|
||||||
.. literalinclude:: _static/aodh.conf.sample
|
.. literalinclude:: ../_static/aodh.conf.sample
|
||||||
|
@ -63,13 +63,14 @@ The following is an example of event alarm rule::
|
|||||||
Configuration
|
Configuration
|
||||||
=============
|
=============
|
||||||
|
|
||||||
To enable this functionality, config the Ceilometer to be able to publish
|
To enable this functionality, configure Ceilometer to be able to
|
||||||
events to the queue the aodh-listener service listen on. The
|
publish events to the queue the aodh-listener service listens on. The
|
||||||
*event_alarm_topic* config option of Aodh identify which messaging topic the
|
*event_alarm_topic* config option of Aodh identifies which messaging
|
||||||
aodh-listener on, the default value is "alarm.all". In Ceilometer side,
|
topic the aodh-listener listens on; the default value is "alarm.all". On
|
||||||
a publisher of notifier type need to be configured in the event pipeline config
|
the Ceilometer side, a publisher of notifier type needs to be configured in
|
||||||
file(event_pipeline.yaml as default), the notifier should be with a messaging
|
the event pipeline config file (``event_pipeline.yaml`` by default);
|
||||||
topic same as the *event_alarm_topic* option defined. For an example::
|
the notifier should use the same messaging topic as the
|
||||||
|
*event_alarm_topic* option. For example::
|
||||||
|
|
||||||
---
|
---
|
||||||
sources:
|
sources:
|
||||||
|