Alarm monitoring framework
This document describes how to use the alarm-based monitoring driver in Apmec.
Sample TOSCA with monitoring policy
The following example shows a monitoring policy defined in a TOSCA template. The target of the monitoring policy in this example (VDU1) must first be described in the template, as in any other TOSCA template in Apmec.
policies:
  - vdu1_cpu_usage_monitoring_policy:
      type: tosca.policies.apmec.Alarming
      triggers:
          resize_compute:
              event_type:
                  type: tosca.events.resource.utilization
                  implementation: ceilometer
              metrics: cpu_util
              condition:
                  threshold: 50
                  constraint: utilization greater_than 50%
                  period: 65
                  evaluations: 1
                  method: avg
                  comparison_operator: gt
              actions: [respawn]
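To make the meaning of the condition fields concrete, the following minimal Python sketch shows how threshold, method, and comparison_operator could combine into a single decision. It is an illustration only and not how Ceilometer evaluates alarms internally; the function name condition_met and the sample values are hypothetical, and period and evaluations are ignored.

```python
import operator
from statistics import mean

# Only the two operators that appear in this document are mapped here.
OPERATORS = {
    "gt": operator.gt,
    "lt": operator.lt,
}


def condition_met(condition, samples):
    """Return True if the aggregated samples satisfy the condition.

    `samples` stands for the cpu_util values collected during one period.
    This sketch handles only the "avg" method and ignores `period` and
    `evaluations`.
    """
    if condition["method"] != "avg":
        raise ValueError("only 'avg' is handled in this sketch")
    compare = OPERATORS[condition["comparison_operator"]]
    return compare(mean(samples), condition["threshold"])


# Condition copied from the resize_compute trigger above.
condition = {
    "threshold": 50,
    "period": 65,
    "evaluations": 1,
    "method": "avg",
    "comparison_operator": "gt",
}

print(condition_met(condition, [40.0, 70.0, 75.0]))  # average above 50 -> True
print(condition_met(condition, [10.0, 20.0, 30.0]))  # average is 20 -> False
```

With the first list of samples the average exceeds the threshold, so the respawn action would be expected to run; with the second it would not.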
The alarm framework already supports some default backend actions, such as scaling, respawn, log, and log_and_kill.
Apmec users can change the desired action as shown in the example above. A backend action can also point to a specific policy that is described in the same TOSCA template, such as a scaling policy. The alarm monitor in Apmec supports this integration between alarm monitoring and scaling:
tosca_definitions_version: tosca_simple_profile_for_mec_1_0_0
description: Demo example
metadata:
  template_name: sample-tosca-mead
topology_template:
  node_templates:
    VDU1:
      type: tosca.nodes.mec.VDU.Apmec
      capabilities:
        mec_compute:
          properties:
            disk_size: 1 GB
            mem_size: 512 MB
            num_cpus: 2
      properties:
        image: cirros-0.3.5-x86_64-disk
        mgmt_driver: noop
        availability_zone: nova
        metadata: {metering.mea: SG1}
    CP1:
      type: tosca.nodes.mec.CP.Apmec
      properties:
        management: true
        anti_spoofing_protection: false
      requirements:
        - virtualLink:
            node: VL1
        - virtualBinding:
            node: VDU1
    VDU2:
      type: tosca.nodes.mec.VDU.Apmec
      capabilities:
        mec_compute:
          properties:
            disk_size: 1 GB
            mem_size: 512 MB
            num_cpus: 2
      properties:
        image: cirros-0.3.5-x86_64-disk
        mgmt_driver: noop
        availability_zone: nova
        metadata: {metering.mea: SG1}
    CP2:
      type: tosca.nodes.mec.CP.Apmec
      properties:
        management: true
        anti_spoofing_protection: false
      requirements:
        - virtualLink:
            node: VL1
        - virtualBinding:
            node: VDU2
    VL1:
      type: tosca.nodes.mec.VL
      properties:
        network_name: net_mgmt
        vendor: Apmec
  policies:
    - SP1:
        type: tosca.policies.apmec.Scaling
        properties:
          increment: 1
          cooldown: 120
          min_instances: 1
          max_instances: 3
          default_instances: 2
          targets: [VDU1,VDU2]
    - vdu_cpu_usage_monitoring_policy:
        type: tosca.policies.apmec.Alarming
        triggers:
            vdu_hcpu_usage_scaling_out:
                event_type:
                    type: tosca.events.resource.utilization
                    implementation: ceilometer
                metrics: cpu_util
                condition:
                    threshold: 50
                    constraint: utilization greater_than 50%
                    period: 600
                    evaluations: 1
                    method: avg
                    comparison_operator: gt
                metadata: SG1
                actions: [SP1]
            vdu_lcpu_usage_scaling_in:
                targets: [VDU1, VDU2]
                event_type:
                    type: tosca.events.resource.utilization
                    implementation: ceilometer
                metrics: cpu_util
                condition:
                    threshold: 10
                    constraint: utilization less_than 10%
                    period: 600
                    evaluations: 1
                    method: avg
                    comparison_operator: lt
                metadata: SG1
                actions: [SP1]
NOTE: The metadata defined in the VDU properties must match the metadata in the monitoring policy.
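The matching rule in the note can be checked before a template is uploaded. The sketch below is a hypothetical helper, not part of Apmec: it assumes the template has already been loaded into a Python dict with policies nested under topology_template, and it reports every trigger whose metadata value is not declared as metering.mea on any VDU. The value SG2 in the sample data is a deliberate mistake added for illustration.

```python
def check_metering_metadata(template):
    """Return the names of triggers whose metadata matches no VDU."""
    topology = template["topology_template"]

    # Collect every metering.mea value declared in node properties.
    vdu_groups = set()
    for node in topology["node_templates"].values():
        metadata = node.get("properties", {}).get("metadata", {})
        if "metering.mea" in metadata:
            vdu_groups.add(metadata["metering.mea"])

    # Compare each trigger's metadata against the collected values.
    mismatched = []
    for policy in topology.get("policies", []):
        for body in policy.values():
            for name, trigger in body.get("triggers", {}).items():
                if "metadata" in trigger and trigger["metadata"] not in vdu_groups:
                    mismatched.append(name)
    return mismatched


# Trimmed-down version of the template above; SG2 is an intentional error.
template = {
    "topology_template": {
        "node_templates": {
            "VDU1": {
                "type": "tosca.nodes.mec.VDU.Apmec",
                "properties": {"metadata": {"metering.mea": "SG1"}},
            },
            "VL1": {
                "type": "tosca.nodes.mec.VL",
                "properties": {"network_name": "net_mgmt"},
            },
        },
        "policies": [
            {"SP1": {"type": "tosca.policies.apmec.Scaling",
                     "properties": {"increment": 1}}},
            {"vdu_cpu_usage_monitoring_policy": {
                "type": "tosca.policies.apmec.Alarming",
                "triggers": {
                    "vdu_hcpu_usage_scaling_out": {"metadata": "SG1", "actions": ["SP1"]},
                    "vdu_lcpu_usage_scaling_in": {"metadata": "SG2", "actions": ["SP1"]},
                },
            }},
        ],
    }
}

print(check_metering_metadata(template))  # ['vdu_lcpu_usage_scaling_in']
```

An empty list means every trigger refers to a metadata value that at least one VDU declares.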
How to set up the environment
If OpenStack DevStack is used to test alarm monitoring in Apmec, the OpenStack Ceilometer and Aodh plugins need to be enabled in local.conf:
enable_plugin ceilometer https://git.openstack.org/openstack/ceilometer master
enable_plugin aodh https://git.openstack.org/openstack/aodh master
How to monitor MEAs via alarm triggers
How to set up alarm configuration
First, the mead and mea need to be created successfully using a pre-defined TOSCA template for alarm monitoring. Then, to check whether the alarm configuration defined in Apmec was successfully passed to Ceilometer, Apmec users can use the CLI:
$ aodh alarm list
+--------------------------------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------+-------------------+----------+---------+
| alarm_id | type | name | state | severity | enabled |
+--------------------------------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------+-------------------+----------+---------+
| 6f2336b9-e0a2-4e33-88be-bc036192b42b | threshold | apmec.mem.infra_drivers.openstack.openstack_OpenStack-a0f60b00-ad3d-4769-92ef-e8d9518da2c8-vdu_lcpu_scaling_in-smgctfnc3ql5 | insufficient data | low | True |
| e049f0d3-09a8-46c0-9b88-e61f1f524aab | threshold | apmec.mem.infra_drivers.openstack.openstack_OpenStack-a0f60b00-ad3d-4769-92ef-e8d9518da2c8-vdu_hcpu_usage_scaling_out-lubylov5g6xb | insufficient data | low | True |
+--------------------------------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------+-------------------+----------+---------+
$ aodh alarm show 6f2336b9-e0a2-4e33-88be-bc036192b42b
+---------------------------+-------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+---------------------------+-------------------------------------------------------------------------------------------------------------------------------+
| alarm_actions | [u'http://pinedcn:9896/v1.0/meas/a0f60b00-ad3d-4769-92ef-e8d9518da2c8/vdu_lcpu_scaling_in/SP1-in/yl7kh5qd'] |
| alarm_id | 6f2336b9-e0a2-4e33-88be-bc036192b42b |
| comparison_operator | lt |
| description | utilization less_than 10% |
| enabled | True |
| evaluation_periods | 1 |
| exclude_outliers | False |
| insufficient_data_actions | None |
| meter_name | cpu_util |
| name | apmec.mem.infra_drivers.openstack.openstack_OpenStack-a0f60b00-ad3d-4769-92ef-e8d9518da2c8-vdu_lcpu_scaling_in-smgctfnc3ql5 |
| ok_actions | None |
| period | 600 |
| project_id | 3db801789c9e4b61b14ce448c9e7fb6d |
| query | metadata.user_metadata.mea_id = a0f60b00-ad3d-4769-92ef-e8d9518da2c8 |
| repeat_actions | True |
| severity | low |
| state | insufficient data |
| state_timestamp | 2016-11-16T18:39:30.134954 |
| statistic | avg |
| threshold | 10.0 |
| time_constraints | [] |
| timestamp | 2016-11-16T18:39:30.134954 |
| type | threshold |
| user_id | a783e8a94768484fb9a43af03c6426cb |
+---------------------------+-------------------------------------------------------------------------------------------------------------------------------+
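The alarm_actions value in the output above holds the url that the alarm calls back into Apmec. As a rough sketch, the value can be split into its parts in Python. The path layout v1.0/meas/<mea_id>/<trigger>/<action>/<key> is an assumption inferred from this single sample, and parse_alarm_action is a hypothetical helper name.

```python
import ast
from urllib.parse import urlparse


def parse_alarm_action(field_value):
    """Split the first alarm_actions url into its path components."""
    # The CLI prints a Python list literal such as "[u'http://...']".
    urls = ast.literal_eval(field_value)
    path_parts = urlparse(urls[0]).path.strip("/").split("/")
    # Assumed layout: v1.0/meas/<mea_id>/<trigger>/<action>/<key>
    _version, _resource, mea_id, trigger, action, key = path_parts
    return {
        "url": urls[0],
        "mea_id": mea_id,
        "trigger": trigger,
        "action": action,
        "key": key,
    }


value = (
    "[u'http://pinedcn:9896/v1.0/meas/a0f60b00-ad3d-4769-92ef-e8d9518da2c8"
    "/vdu_lcpu_scaling_in/SP1-in/yl7kh5qd']"
)
info = parse_alarm_action(value)
print(info["mea_id"])  # a0f60b00-ad3d-4769-92ef-e8d9518da2c8
print(info["key"])     # yl7kh5qd
```

The mea_id extracted this way is the same identifier that appears in the query field of the alarm.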
How to trigger alarms
As shown in the output of the command above, the alarm state is "insufficient data". The alarm is triggered by Ceilometer once the alarm state changes to "alarm". To make the MEA instance reach the pre-defined threshold, some simple scripts can be used.
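One possible form of such a script is a busy loop that keeps a CPU core loaded for a given time. The sketch below assumes a Python interpreter is available inside the MEA instance, which may not hold for minimal images such as cirros; burn_cpu is a hypothetical name. Because the sample VDUs have two vCPUs, one process per vCPU may be needed to push the average utilization above the threshold.

```python
import time


def burn_cpu(seconds):
    """Keep one CPU core busy for roughly `seconds` seconds.

    Returns the number of loop iterations performed.
    """
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1  # busy work; no sleeping, so the core stays loaded
    return iterations
```

Calling burn_cpu with a duration longer than the alarm period, for example burn_cpu(700) for a 600 s period, keeps the load up across at least one full evaluation window.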
Note: The Ceilometer pipeline sets the default interval to 600 s (10 minutes). To reduce this interval, users can edit the "interval" value in the /etc/ceilometer/pipeline.yaml file and then restart the Ceilometer service.
Another way to check whether the backend action is handled correctly in Apmec is to send the alarm notification manually:
curl -H "Content-Type: application/json" -X POST -d '{"alarm_id": "35a80852-e24f-46ed-bd34-e2f831d00172", "current": "alarm"}' http://pinedcn:9896/v1.0/meas/a0f60b00-ad3d-4769-92ef-e8d9518da2c8/vdu_lcpu_scaling_in/SP1-in/yl7kh5qd
Then, users can check Horizon to see whether the mea was respawned. Note that the url used in the command above can be taken from the alarm_actions field in the output of the "aodh alarm show" command shown earlier. The "key" attribute of the request also needs to be taken from that url, because the key is authenticated so that the url can be requested only once.
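The same request can be prepared from Python using only the standard library. This is a minimal sketch that mirrors the curl command above; build_alarm_request is a hypothetical helper name, and the actual send is left commented out because it needs a reachable Apmec server and consumes the one-time key.

```python
import json
import urllib.request


def build_alarm_request(url, alarm_id, state="alarm"):
    """Build the POST request that reports an alarm state to Apmec."""
    body = json.dumps({"alarm_id": alarm_id, "current": state}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Values copied from the curl example above.
request = build_alarm_request(
    "http://pinedcn:9896/v1.0/meas/a0f60b00-ad3d-4769-92ef-e8d9518da2c8"
    "/vdu_lcpu_scaling_in/SP1-in/yl7kh5qd",
    "35a80852-e24f-46ed-bd34-e2f831d00172",
)

# Uncomment to send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes it possible to inspect the url and body first, which matters here because the url is accepted only once.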