Add LMA test results.

- usage testing
- reliability testing

Co-Authored-By: Swann Croiset <scroiset@mirantis.com>
Change-Id: I66de27d4a9911efd86032cf6c46572e596e48989
@@ -24,3 +24,4 @@ Test Results

   reliability/index
   control_plane/main
   controlplane_density/index
   monitoring/index

doc/source/test_results/monitoring/index.rst (new file, 12 lines)
@@ -0,0 +1,12 @@
.. raw:: pdf

   PageBreak oneColumn

================================
Monitoring systems test results
================================

.. toctree::
   :maxdepth: 3

   lma/index

doc/source/test_results/monitoring/lma/index.rst (new file, 640 lines)
@@ -0,0 +1,640 @@

.. _LMA_test_results:

****************
LMA Test Results
****************

:Abstract:

  This document presents the results of measuring how many resources the LMA
  services need when monitoring a large environment (~200 nodes), together
  with the results of reliability testing of the `LMA`_ services.


Environment description
=======================

Hardware configuration of each server
-------------------------------------

.. table:: Description of servers hardware

  +-------+----------------+------------------------+------------------------+
  |       |role            |OpenStackController     |OpenStackCompute and LMA|
  +-------+----------------+------------------------+------------------------+
  |CPU    |core_count (+HT)|40                      |12                      |
  |       +----------------+------------------------+------------------------+
  |       |frequency_MHz   |2300                    |2100                    |
  +-------+----------------+------------------------+------------------------+
  |RAM    |amount_MB       |262144                  |32768                   |
  +-------+----------------+------------------------+------------------------+
  |Disk1  |amount_GB       |111.8                   |75                      |
  |       +----------------+------------------------+------------------------+
  |       |SSD/HDD         |SSD                     |SSD                     |
  +-------+----------------+------------------------+------------------------+
  |Disk2  |amount_GB       |111.8                   |1000                    |
  |       +----------------+------------------------+------------------------+
  |       |SSD/HDD         |SSD                     |HDD                     |
  +-------+----------------+------------------------+------------------------+
  |Disk3  |amount_GB       |1800                    |-                       |
  |       +----------------+------------------------+------------------------+
  |       |SSD/HDD         |HDD                     |-                       |
  +-------+----------------+------------------------+------------------------+
  |Disk4  |amount_GB       |1800                    |-                       |
  |       +----------------+------------------------+------------------------+
  |       |SSD/HDD         |HDD                     |-                       |
  +-------+----------------+------------------------+------------------------+

Software configuration of the services
--------------------------------------

Installation of OpenStack and LMA plugins
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

OpenStack has been installed using Fuel 8.0 and the LMA Fuel plugins:
3 controllers, 193 computes (20 of them also hosting Ceph OSD),
3 Elasticsearch nodes, 3 InfluxDB nodes and 1 Nagios node.

.. table:: Versions of some software

  +--------------------------------+------------+
  |Software                        |Version     |
  +================================+============+
  |Fuel                            |8.0         |
  +--------------------------------+------------+
  |fuel-plugin-lma-collector       |0.9         |
  +--------------------------------+------------+
  |fuel-plugin-elasticsearch-kibana|0.9         |
  +--------------------------------+------------+
  |fuel-plugin-influxdb-grafana    |0.9         |
  +--------------------------------+------------+
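
For reference, the plugins listed above were installed on the Fuel master node
with the standard Fuel plugin workflow. A minimal sketch is shown below; the
RPM file names are illustrative and have to be replaced with the actual 0.9
plugin builds.

.. code::

  # On the Fuel master node: register the three LMA plugins
  # (file names are illustrative, use the actual plugin RPMs)
  fuel plugins --install fuel-plugin-lma-collector-0.9.rpm
  fuel plugins --install fuel-plugin-elasticsearch-kibana-0.9.rpm
  fuel plugins --install fuel-plugin-influxdb-grafana-0.9.rpm

  # Verify that the plugins are registered before creating the environment
  fuel plugins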

Testing process
===============

1. Fuel 8.0, the LMA plugins and OpenStack have been installed.

2. Rally tests have been run two times. The results are available here:
   :download:`rally_report_1.html <./rally_report_1.html>`
   :download:`rally_report_2.html <./rally_report_2.html>`

3. Metrics (CPU, memory, I/O) have been collected using collectd.

4. Disable the InfluxDB backends in HAProxy to prevent Heka from sending
   metrics to InfluxDB (see the sketch after this list). The outage time
   should be equal to 3 hours.

5. Enable the InfluxDB backends in HAProxy and measure how many resources and
   how much time InfluxDB needs to ingest all the metrics buffered by Heka
   during the outage.

6. Disable the Elasticsearch backends in HAProxy to prevent Heka from sending
   logs to Elasticsearch. The outage time should be equal to 3 hours.

7. Enable the Elasticsearch backends in HAProxy and measure how many resources
   and how much time Elasticsearch needs to ingest all the logs buffered by
   Heka during the outage.
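
The backend outages in steps 4-7 were simulated at the HAProxy level; a minimal
sketch of how this can be done through the HAProxy admin socket is shown below.
The socket path and the backend/server names (``influxdb``, ``node-17``, ...)
are assumptions and must be adjusted to the actual HAProxy configuration of the
environment.

.. code::

  # On the controllers running HAProxy: disable every server of the
  # "influxdb" backend to start the outage (steps 4 and 5)
  for srv in node-17 node-18 node-19; do
      echo "disable server influxdb/$srv" | socat stdio /var/lib/haproxy/stats
  done

  # ... wait for the planned outage time (3 hours) ...

  # Re-enable the servers to end the outage and start measuring the recovery
  for srv in node-17 node-18 node-19; do
      echo "enable server influxdb/$srv" | socat stdio /var/lib/haproxy/stats
  done

  # The same commands against the "elasticsearch" backend cover steps 6 and 7.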

Usage Results
=============

Collector: Hekad / collectd
---------------------------

The following table describes how many resources were used by Hekad and
collectd during the test, depending on the OpenStack role:

.. table:: CPU, Memory and Disk consumption depending on the OpenStack role

  +------------------------+----------------+----------------+----------------+
  | role                   |CPU             |Memory          |I/O per second  |
  |                        |(hekad/collectd)|(hekad/collectd)|(hekad/collectd)|
  +========================+================+================+================+
  | controller             | 0.7 cpu        | 223 MB         |730 KB write    |
  |                        |                |                |                |
  |                        | 0.13 cpu       | 45 MB          |730 KB read     |
  |                        |                |                |                |
  |                        |                |                |0 KB write      |
  |                        |                |                |                |
  |                        |                |                |250 KB read     |
  +------------------------+----------------+----------------+----------------+
  || Controller without    | 0.4 cpu        |no impact       |220 KB write    |
  || RabbitMQ queues       |                |                |                |
  || metrics (~4500 queues)|                |                |                |
  || `1549721`_            | 0.06 cpu       |                |280 KB read     |
  |                        |                |                |                |
  |                        |                |                |0 KB write      |
  |                        |                |                |                |
  |                        |                |                |250 KB read     |
  +------------------------+----------------+----------------+----------------+
  | aggregator             | 0.9 cpu        | 285 MB         |830 KB write    |
  |                        |                |                |                |
  |                        | 0.13 cpu       | 50 MB          |830 KB read     |
  |                        |                |                |                |
  |                        |                |                |0 KB write      |
  |                        |                |                |                |
  |                        |                |                |247 KB read     |
  +------------------------+----------------+----------------+----------------+
  | compute                | 0.2 cpu        | 145 MB         |15 KB write     |
  |                        |                |                |                |
  |                        | 0.02 cpu       | 6.1 MB         |40 KB read      |
  |                        |                |                |                |
  |                        |                |                |0 KB write      |
  |                        |                |                |                |
  |                        |                |                |22 KB read      |
  +------------------------+----------------+----------------+----------------+
  | compute/osd            | 0.25 cpu       | 154 MB         |15 KB write     |
  |                        |                |                |                |
  |                        | 0.02 cpu       | 13 MB          |40 KB read      |
  |                        |                |                |                |
  |                        |                |                |0 KB write      |
  |                        |                |                |                |
  |                        |                |                |23 KB read      |
  +------------------------+----------------+----------------+----------------+

InfluxDB
--------

InfluxDB consumes a manageable amount of CPU (more information in the table
below). The compaction operation is performed regularly and produces spikes of
resource consumption (every ~6 minutes with the actual load of
200 nodes / 1000 VMs):

|image0|

The average write operation duration is 3 ms (SSD drive).

+-------------------------+-----------------+--------+-------+-----------------+
| Conditions              | write/s         | cpu    | memory| I/O             |
|                         |                 |(normal |(normal|(normal/         |
|                         |                 |/spike) |/spike)|spike)           |
+=========================+=================+========+=======+=================+
| normal                  |111 HTTP writes/s|0.38 cpu|1.2GB  |1.3MB(r)/1.7MB(w)|
|                         |                 |        |       |                 |
|                         |(37 w/s per node)|2 cpu   |2.3GB  |1.5MB(r)/7.3MB(w)|
+-------------------------+-----------------+--------+-------+-----------------+
|| Controller without     |75 HTTP writes/s |0.3 cpu |1.2GB  |930KB(r)/1MB(w)  |
|| RabbitMQ queues        |(25 w/s per node)|        |       |                 |
|| metrics (~4500 queues) |                 |        |       |                 |
|| `1549721`_             |(-30% w/o        |1.9 cpu |2.2GB  |1.5MB(r)/7.3MB(w)|
||                        |rabbitmq queues) |        |       |                 |
+-------------------------+-----------------+--------+-------+-----------------+
| w/o rabbitMQ            | 93 HTTP writes/s|0.5 cpu |1.5 GB |1MB(r)/1.4MB(w)  |
|                         |(31 w/s per node)|        |       |                 |
|                         |                 |        |       |                 |
| and 1000 VMs            | (0.018 w/s/vm)  |2.5 cpu |2 GB   |1.2MB(r)/6.6MB(w)|
+-------------------------+-----------------+--------+-------+-----------------+

Disk space usage evolution with 1000 VMs:

~125 MB / hour

~3 GB / day

~90 GB / month

|image1|

Elasticsearch
-------------

The bulk operations take ~80 ms (mean) on a SATA disk (this is the mean
response time from the HAProxy log).

The CPU usage depends on the REST API activity (see the extra load in
the graph below) and also seems to depend on the current index size
(CPU utilization increases proportionally while the load is constant):

|image2|

|image3|

Disk space usage evolution with a constant API solicitation (e.g.
``while true; do nova list; cinder list; neutron list; done``) and 1000 VMs
spawned:

~670 MB / hour

~16 GB / day

~500 GB / month

|image4|

All RabbitMQ queues collection impact
-------------------------------------

The collection of all RabbitMQ queue metrics has a significant impact
on the Heka and collectd CPU utilization and, obviously, on the InfluxDB load
(HTTP requests per second).

Heka

|image5|

Collectd

|image6|

InfluxDB

|image7|

Reliability Results
===================

Backends outage for 2 hours
---------------------------

InfluxDB
~~~~~~~~

After a complete InfluxDB cluster downtime (simulated by an HAProxy shutdown),
the cluster is capable of taking over all the metrics accumulated by the Heka
instances in less than 10 minutes. Here is the spike of resource consumption
per node:

+-------------------+------------------------------+--------+-------+---------+
|Conditions         |write/s                       |cpu     |memory | I/O     |
+===================+==============================+========+=======+=========+
|| take over 3 hours|| ~900 w/s                    || 6.1cpu|| 4.8GB|| 22MB(r)|
|| of metrics       || total of 2700 HTTP writes/s ||       ||      || 25MB(w)|
+-------------------+------------------------------+--------+-------+---------+

|image8|\ fuel nodes

|image9|

|image10|

|image11|

Data loss
^^^^^^^^^

A window of less than 40 minutes of metrics is lost on controllers.

Other node roles have no data loss because far fewer metrics are collected on
them than on controllers. On controllers, the Heka buffer (1 GB) for the
InfluxDB queue is filled within ~1h20.

This retention period can be increased drastically by not collecting all the
RabbitMQ queue metrics.

The following examples show the CPU metric for both a controller and a
compute/osd node. The first two annotations indicate the downtime (InfluxDB
and Elasticsearch) while the last two annotations indicate the recovery
status.

On the controller node, the CPU metric is lost from 18h52 to 19h29 while the
InfluxDB outage ran from ~17h30 to 19h30:

|image12|

A node with the compute/osd roles didn't lose any metrics:

|image13|

Elasticsearch
~~~~~~~~~~~~~

After a complete Elasticsearch cluster downtime (simulated by an HAProxy
shutdown), the cluster is capable of taking over all the logs accumulated by
the Hekad instances in less than 10 minutes. Here is the spike of resource
consumption per node:

+-------------------+-----------+-------+-----------------------+------------+
|Conditions         |HTTP bulk  |cpu    |memory                 |I/O         |
|                   |request/s  |       |                       |            |
|                   |           |       |(normal/spike)         |normal/spike|
+===================+===========+=======+=======================+============+
|| take over 3 hours|| 680 req/s|| 4 cpu|| 16GB (jvm fixed size)|| 26 MB (r) |
|| of logs          ||          ||      ||                      || 25 MB (w) |
+-------------------+-----------+-------+-----------------------+------------+

CPU utilization:

|image14|

I/O:

|image15|

Data loss
^^^^^^^^^

Some logs (and possibly notifications) were lost, since the Heka log contains
a bunch of "queue is full" messages.

Apache2/Nagios3
~~~~~~~~~~~~~~~

Apache is flooded and never recovers from the load.

Elasticsearch failover/recovery
-------------------------------

One ES node down
~~~~~~~~~~~~~~~~

The cluster is reported as WARNING (it cannot honor the number of replicas),
but no downtime is observed and no data is lost since the cluster still
accepts data.

.. code::

  root@node-47:~# curl 192.168.0.4:9200/_cluster/health?pretty
  {
    "cluster_name" : "lma",
    "status" : "yellow",
    "timed_out" : false,
    "number_of_nodes" : 2,
    "number_of_data_nodes" : 2,
    "active_primary_shards" : 25,
    "active_shards" : 50,
    "relocating_shards" : 0,
    "initializing_shards" : 0,
    "unassigned_shards" : 20,
    "delayed_unassigned_shards" : 0,
    "number_of_pending_tasks" : 0,
    "number_of_in_flight_fetch" : 0
  }

  root@node-47:~# curl 192.168.0.4:9200/_cat/indices?v
  health status index                   pri rep docs.count docs.deleted store.size pri.store.size
  green  open   kibana-int                5   1          2            0     52.1kb         26.1kb
  yellow open   log-2016.03.08            5   2    5457994            0      2.1gb            1gb
  yellow open   log-2016.03.07            5   2   10176926            0      3.7gb          1.8gb
  yellow open   notification-2016.03.08   5   2       1786            0      3.5mb          1.9mb
  yellow open   notification-2016.03.07   5   2       2103            0      3.7mb          1.8mb

|image16|

|image17|

|image18|

|image19|

|image20|
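
Once the stopped node rejoins the cluster, the reassignment of the unassigned
shards can be followed with the Elasticsearch cat API. This check was not part
of the original test run; the VIP address is the one used in the example above.

.. code::

  # Watch the cluster status go back to green and unassigned shards drop to 0
  watch -n 10 "curl -s 192.168.0.4:9200/_cat/health?v"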

2 ES nodes down
~~~~~~~~~~~~~~~

The cluster is unavailable; all Heka instances buffer data until recovery.

.. code::

  root@node-47:~# curl 192.168.0.4:9200/_cluster/health?pretty
  {
    "error" : "MasterNotDiscoveredException[waited for [30s]]",
    "status" : 503
  }

*ES logs:*

.. code::

  [2016-03-08 09:48:10,758][INFO ][cluster.service          ]
  [node-47.domain.tld_es-01] removed
  {[node-153.domain.tld_es-01][bIVAau9SRc-K3lomVAe1_A][node-153.domain.tld][inet[/192.168.0.163:9300]]{master=true},},
  reason: zen-disco-receive(from master
  [[node-204.domain.tld_es-01][SLMBNAvcRt6DWQdNvFE4Yw][node-204.domain.tld][inet[/192.168.0.138:9300]]{master=true}])

  [2016-03-08 09:48:12,375][INFO ][discovery.zen            ]
  [node-47.domain.tld_es-01] master_left
  [[node-204.domain.tld_es-01][SLMBNAvcRt6DWQdNvFE4Yw][node-204.domain.tld][inet[/192.168.0.138:9300]]{master=true}],
  reason [transport disconnected]

  [2016-03-08 09:48:12,375][WARN ][discovery.zen            ]
  [node-47.domain.tld_es-01] master left (reason = transport disconnected),
  current nodes:
  {[node-47.domain.tld_es-01][l-UXgVBgSze7gtwc6Lt_yw][node-47.domain.tld][inet[/192.168.0.108:9300]]{master=true},}

  [2016-03-08 09:48:12,375][INFO ][cluster.service          ]
  [node-47.domain.tld_es-01] removed
  {[node-204.domain.tld_es-01][SLMBNAvcRt6DWQdNvFE4Yw][node-204.domain.tld][inet[/192.168.0.138:9300]]{master=true},},
  reason: zen-disco-master_failed
  ([node-204.domain.tld_es-01][SLMBNAvcRt6DWQdNvFE4Yw][node-204.domain.tld][inet[/192.168.0.138:9300]]{master=true})

  [2016-03-08 09:48:21,385][DEBUG][action.admin.cluster.health]
  [node-47.domain.tld_es-01] no known master node, scheduling a retry

  [2016-03-08 09:48:32,482][DEBUG][action.admin.indices.get ]
  [node-47.domain.tld_es-01] no known master node, scheduling a retry

*LMA collector logs:*

.. code::

  2016/03/08 09:54:00 Plugin 'elasticsearch_output' error: HTTP response
  error. Status: 503 Service Unavailable. Body:
  {"error":"ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/2/no
  master];]","status":503}

InfluxDB failover/recovery
--------------------------

1 InfluxDB node is down
~~~~~~~~~~~~~~~~~~~~~~~

No downtime.

2/3 nodes are down
~~~~~~~~~~~~~~~~~~

One node is in a bad shape (missing data during and after the outage!).

This is not supported.

Apache2 overloaded
------------------

.. note::
  The issue described in this section has been resolved in version 0.10. You
  can read more here:
  https://blueprints.launchpad.net/lma-toolchain/+spec/scalable-nagios-api

All nodes push the AFD status to Nagios through the CGI script. This
represents 110 requests/s.

The server cannot handle the load:

100% CPU (12 cores), load average 190, 125 process forks/s.

The CGI script is definitely not scalable.

|image21|

When increasing the AFD interval from 10 to 20 seconds on all nodes and
purging the Heka output queue buffer, the load becomes manageable by the node
(90 forks / second):

|image22|

|image23|

Outcomes
========

InfluxDB
--------

InfluxDB worked correctly only with SSD drives. With SATA drives, it was
unable to cope with the data generated by 200 nodes.

Supported scale-up operations: 1 node -> 3 nodes.

Failover mode: a cluster of 3 nodes supports the loss of 1 node.

Deployment size <= 200 nodes
  - 4 CPU
  - 4 GB RAM
  - SSD drive
  - 100 GB is required for a retention of 30 days

Elasticsearch
-------------

Elasticsearch can handle the load with a dedicated SATA disk; using SSD drives
is obviously a better choice but not mandatory.

Supported scale-up operations: 1 node -> 3 nodes.

Failover mode: a cluster of 3 nodes survives the loss of 1 node. It can also
support the loss of 2 nodes, but with downtime (when using the default
number_of_replicas configuration).

.. note::
  When the OpenStack services are configured with the DEBUG log level, a
  relatively high load on the cluster (several API calls over some period of
  time) can fill up the Heka buffers.
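
A quick way to see whether the Heka buffers are filling up on a node is to
watch the size of the on-disk output queues. The path below is the one used
elsewhere in this report for the Nagios output queue; the exact directory
layout depends on the deployment.

.. code::

  # Growing directories indicate that a backend cannot keep up with the load
  du -sh /var/log/lma_collector/*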

Sizing guide
------------

These guidelines apply to an environment configured to log at the INFO level.
They take into account a high rate of API calls. Using the DEBUG log level
implies much more resource consumption in terms of disk space (~ x5) and
CPU/memory (~ x2).

Deployment size <= 200 nodes
  - 4 CPU
  - 8 GB RAM
  - SSD or SATA drive
  - 500 GB is required for a retention of 30 days
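
The retention figures above and in the InfluxDB outcomes follow directly from
the growth rates measured in the Usage Results section; a quick
back-of-the-envelope check:

.. code::

  # InfluxDB: ~3 GB/day with 200 nodes / 1000 VMs
  echo $((3 * 30))    # 90 GB for 30 days, ~100 GB with headroom

  # Elasticsearch: ~16 GB/day at INFO level with a high API call rate
  echo $((16 * 30))   # 480 GB for 30 days, ~500 GB with headroom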

Apache2/Nagios3
---------------

.. note::
  The following issue has been resolved in version 0.10, therefore you don't
  need to apply the workaround described below.

The default configuration cannot handle the load of 200 nodes: the CGI script
introduces a bottleneck. The recommendation for 0.9.0 is not to deploy the
lma_infrastructure_alerting plugin for an environment with more than 50 nodes.
With 200 nodes, at least 7 cores were required to handle the incoming
requests.

In the current state, the recommendation to be able to handle 200 nodes is to
perform the following operations after the initial deployment (a sketch of the
corresponding commands is shown after this list):

- increase all AFD filter intervals from 10s to 20s

- decrease all Nagios output buffering sizes to 500KB, to limit the flooding
  at startup time

- stop lma_collector on all nodes

- remove the Heka queue buffer (rm -rf /var/log/lma_collector/nagios_output)

- restart lma_collector on all nodes
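
A minimal sketch of the last three steps, to be run on every node, is shown
below. The service name and the buffer path are the ones mentioned in the list
above; on controllers the collector may be managed by Pacemaker, in which case
the equivalent crm/pcs commands have to be used instead of the service calls.

.. code::

  # Stop the collector, drop the buffered Nagios output queue and restart
  service lma_collector stop
  rm -rf /var/log/lma_collector/nagios_output
  service lma_collector start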

Issues which have been found during the tests
=============================================

.. table:: Issues which have been found during the tests

  +---------------------------------------------------------------+------------+
  |Issue description                                              | Link       |
  +===============================================================+============+
  || Kibana dashboards unavailable after an ElasticSearch scale up| `1552258`_ |
  || from 1 to 3 nodes                                            |            |
  +---------------------------------------------------------------+------------+
  || Reduce the monitoring scope of Rabbitmq queues               | `1549721`_ |
  +---------------------------------------------------------------+------------+
  || Nova collectd plugin timeout with a lot of instances         | `1554502`_ |
  +---------------------------------------------------------------+------------+
  || Apache doesn't handle the load to process passive checks with| `1552772`_ |
  || 200 nodes                                                    |            |
  +---------------------------------------------------------------+------------+
  || InfluxDB crash while scaling up from 1 to 2 nodes            | `1552191`_ |
  +---------------------------------------------------------------+------------+

.. references:

.. _LMA: http://fuel-plugin-lma-collector.readthedocs.io/en/latest/intro.html
.. _1549721: https://bugs.launchpad.net/lma-toolchain/+bug/1549721
.. _1552258: https://bugs.launchpad.net/lma-toolchain/+bug/1552258
.. _1554502: https://bugs.launchpad.net/lma-toolchain/+bug/1554502
.. _1552772: https://bugs.launchpad.net/lma-toolchain/+bug/1552772
.. _1552191: https://bugs.launchpad.net/lma-toolchain/+bug/1552191

.. |image0| image:: media/image25.png
   :scale: 50
.. |image1| image:: media/image16.png
   :scale: 40
.. |image2| image:: media/image39.png
   :scale: 40
.. |image3| image:: media/image30.png
   :scale: 40
.. |image4| image:: media/image10.png
   :scale: 40
.. |image5| image:: media/image41.png
   :scale: 40
.. |image6| image:: media/image13.png
   :scale: 40
.. |image7| image:: media/image20.png
   :scale: 40
.. |image8| image:: media/image46.png
   :scale: 40
.. |image9| image:: media/image45.png
   :scale: 40
.. |image10| image:: media/image38.png
   :scale: 40
.. |image11| image:: media/image21.png
   :scale: 40
.. |image12| image:: media/image19.png
   :scale: 40
.. |image13| image:: media/image47.png
   :scale: 40
.. |image14| image:: media/image40.png
   :scale: 40
.. |image15| image:: media/image27.png
   :scale: 40
.. |image16| image:: media/image42.png
   :scale: 40
.. |image17| image:: media/image44.png
   :scale: 40
.. |image18| image:: media/image14.png
   :scale: 40
.. |image19| image:: media/image37.png
   :scale: 40
.. |image20| image:: media/image02.png
   :scale: 50
.. |image21| image:: media/image43.png
   :scale: 40
.. |image22| image:: media/image23.png
   :scale: 40
.. |image23| image:: media/image17.png
   :scale: 40
24 new binary files added under doc/source/test_results/monitoring/lma/media/:
image02.png, image10.png, image13.png, image14.png, image16.png, image17.png,
image19.png, image20.png, image21.png, image23.png, image25.png, image27.png,
image30.png, image37.png, image38.png, image39.png, image40.png, image41.png,
image42.png, image43.png, image44.png, image45.png, image46.png, image47.png