Adding first Telemetry test results with CephStorage
Change-Id: I80286a8a595663ebb09d439239363462a4c802ee
@ -4,7 +4,7 @@
|
|||||||
Telemetry Services resource consumption/scalability testing
|
Telemetry Services resource consumption/scalability testing
|
||||||
===========================================================
|
===========================================================
|
||||||
|
|
||||||
:status: **draft**
|
:status: **ready**
|
||||||
:version: 1.0
|
:version: 1.0
|
||||||
|
|
||||||
:Abstract:
|
:Abstract:
|
||||||
@ -182,3 +182,9 @@ Performance
|
|||||||
Failure Conditions
|
Failure Conditions
|
||||||
|
|
||||||
- Errors in log files (Gnocchi, Ceilometer, Nova, Swift, ...)
|
- Errors in log files (Gnocchi, Ceilometer, Nova, Swift, ...)
|
||||||
|
|
||||||
|
Reports
|
||||||
|
=======
|
||||||
|
|
||||||
|
Test plan execution reports:
|
||||||
|
* :ref:`telemetry_gnocchi_with_ceph_report_1k_instances`
|
||||||
|
315
doc/source/test_results/telemetry_gnocchi_with_ceph/index.rst
Normal file
@ -0,0 +1,315 @@
.. _telemetry_gnocchi_with_ceph_report_1k_instances:

===================================================================================================
Telemetry Services resource consumption/scalability test results on Gnocchi with CephStorage driver
===================================================================================================

This report is generated for :ref:`telemetry_scale` test plan.

Test Environment
----------------

Environment description
^^^^^^^^^^^^^^^^^^^^^^^
The environment description includes hardware specs, software versions, tunings
and configuration of the OpenStack Cloud under test.

Hardware
~~~~~~~~
Deployment node (Undercloud) (1 Machine)

+-----------+------------------------------------------------------------+
| Parameter | Value                                                      |
+-----------+------------------------------------------------------------+
| model     | Dell PowerEdge r620                                        |
+-----------+------------------------------------------------------------+
| CPU       | 2xIntel(R) Xeon(R) E5-2620v2 @ 2.10GHz (12Cores/24Threads) |
+-----------+------------------------------------------------------------+
| Memory    | 8x8192MiB - 64GiB (@1333MHz)                               |
+-----------+------------------------------------------------------------+
| RAID Cont | PERC H710 Mini Embedded (512MB Cache)                      |
+-----------+------------------------------------------------------------+
| Disk      | 2 x 1TB 7.2K SATA Drive in RAID 1                          |
+-----------+------------------------------------------------------------+
| Network   | 2x1Gb/s Integrated(Offline), 2x10Gb/s Intel x520           |
+-----------+------------------------------------------------------------+

Controller (3 Machines)

+-----------+------------------------------------------------------------+
| Parameter | Value                                                      |
+-----------+------------------------------------------------------------+
| model     | Dell PowerEdge r620                                        |
+-----------+------------------------------------------------------------+
| CPU       | 2xIntel(R) Xeon(R) E5-2620 @ 2.00GHz (12Cores/24Threads)   |
+-----------+------------------------------------------------------------+
| Memory    | 8x8192MiB - 64GiB (@1333MHz)                               |
+-----------+------------------------------------------------------------+
| RAID Cont | PERC H710 Mini Embedded (1024MB Cache)                     |
+-----------+------------------------------------------------------------+
| Disk      | 2 x 1TB 7.2K SATA Drive in RAID 1                          |
+-----------+------------------------------------------------------------+
| Network   | 2x1Gb/s Integrated(Offline), 2x10Gb/s Intel x520           |
+-----------+------------------------------------------------------------+

Compute (10 Machines)

+-----------+------------------------------------------------------------+
| Parameter | Value                                                      |
+-----------+------------------------------------------------------------+
| model     | Dell PowerEdge r620                                        |
+-----------+------------------------------------------------------------+
| CPU       | 2xIntel(R) Xeon(R) E5-2620 @ 2.00GHz (12Cores/24Threads)   |
+-----------+------------------------------------------------------------+
| Memory    | 8x8192MiB - 64GiB (@1333MHz)                               |
+-----------+------------------------------------------------------------+
| RAID Cont | PERC H710 Mini Embedded (1024MB Cache)                     |
+-----------+------------------------------------------------------------+
| Disk      | 2 x 1TB 7.2K SATA Drive in RAID 1                          |
+-----------+------------------------------------------------------------+
| Network   | 2x1Gb/s Integrated(Offline), 2x10Gb/s Intel x520           |
+-----------+------------------------------------------------------------+

CephStorage (4 Machines)

+-----------+------------------------------------------------------------+
| Parameter | Value                                                      |
+-----------+------------------------------------------------------------+
| model     | Dell PowerEdge r930                                        |
+-----------+------------------------------------------------------------+
| CPU       | 4xIntel(R) Xeon(R) E7-4830v3 @ 2.10GHz (48Cores/96Threads) |
+-----------+------------------------------------------------------------+
| Memory    | 32x16384MiB - 512GiB (@1333MHz)                            |
+-----------+------------------------------------------------------------+
| RAID Cont | PERC H730P Adapter (2048MB Cache)                          |
+-----------+------------------------------------------------------------+
| Disk      | 9 x 300GB 10K SAS Drives each in Single RAID 0 (JBOD)      |
|           | 1xOS Disk, 7xOSDs with co-hosted journal                   |
+-----------+------------------------------------------------------------+
| Network   | 2x1Gb/s Integrated(Offline), 2x10Gb/s Intel x520 Integrated|
|           | 2x10Gb/s Intel x520 Adapter                                |
+-----------+------------------------------------------------------------+

Additional Hardware for testing/monitoring/results

- Performance Monitoring Host (Carbon/Graphite/Grafana)

Software
~~~~~~~~

Versions:

- ceph-base-10.2.3-13.el7cp.x86_64
- ceph-common-10.2.3-13.el7cp.x86_64
- ceph-mon-10.2.3-13.el7cp.x86_64
- ceph-osd-10.2.3-13.el7cp.x86_64
- ceph-radosgw-10.2.3-13.el7cp.x86_64
- ceph-selinux-10.2.3-13.el7cp.x86_64
- libcephfs1-10.2.3-13.el7cp.x86_64
- openstack-ceilometer-api-7.0.0-4.el7ost.noarch
- openstack-ceilometer-central-7.0.0-4.el7ost.noarch
- openstack-ceilometer-collector-7.0.0-4.el7ost.noarch
- openstack-ceilometer-common-7.0.0-4.el7ost.noarch
- openstack-ceilometer-compute-7.0.0-4.el7ost.noarch
- openstack-ceilometer-notification-7.0.0-4.el7ost.noarch
- openstack-ceilometer-polling-7.0.0-4.el7ost.noarch
- openstack-cinder-9.0.0-13.el7ost.noarch
- openstack-glance-13.0.0-1.el7ost.noarch
- openstack-gnocchi-api-3.0.2-1.el7ost.noarch
- openstack-gnocchi-carbonara-3.0.2-1.el7ost.noarch
- openstack-gnocchi-common-3.0.2-1.el7ost.noarch
- openstack-gnocchi-indexer-sqlalchemy-3.0.2-1.el7ost.noarch
- openstack-gnocchi-metricd-3.0.2-1.el7ost.noarch
- openstack-gnocchi-statsd-3.0.2-1.el7ost.noarch
- openstack-heat-api-7.0.0-7.el7ost.noarch
- openstack-heat-api-cfn-7.0.0-7.el7ost.noarch
- openstack-heat-api-cloudwatch-7.0.0-7.el7ost.noarch
- openstack-heat-common-7.0.0-7.el7ost.noarch
- openstack-heat-engine-7.0.0-7.el7ost.noarch
- openstack-keystone-10.0.0-3.el7ost.noarch
- openstack-neutron-9.1.0-7.el7ost.noarch
- openstack-neutron-bigswitch-agent-9.40.0-1.1.el7ost.noarch
- openstack-neutron-bigswitch-lldp-9.40.0-1.1.el7ost.noarch
- openstack-neutron-common-9.1.0-7.el7ost.noarch
- openstack-neutron-lbaas-9.1.0-1.el7ost.noarch
- openstack-neutron-metering-agent-9.1.0-7.el7ost.noarch
- openstack-neutron-ml2-9.1.0-7.el7ost.noarch
- openstack-neutron-openvswitch-9.1.0-7.el7ost.noarch
- openstack-neutron-sriov-nic-agent-9.1.0-7.el7ost.noarch
- openstack-nova-api-14.0.2-7.el7ost.noarch
- openstack-nova-cert-14.0.2-7.el7ost.noarch
- openstack-nova-common-14.0.2-7.el7ost.noarch
- openstack-nova-compute-14.0.2-7.el7ost.noarch
- openstack-nova-conductor-14.0.2-7.el7ost.noarch
- openstack-nova-console-14.0.2-7.el7ost.noarch
- openstack-nova-novncproxy-14.0.2-7.el7ost.noarch
- openstack-nova-scheduler-14.0.2-7.el7ost.noarch
- openstack-swift-plugin-swift3-1.11.0-2.el7ost.noarch
- openstack-swift-proxy-2.10.0-6.el7ost.noarch
- openstack-swift-container-2.10.0-6.el7ost.noarch
- openstack-swift-object-2.10.0-6.el7ost.noarch
- openstack-swift-account-2.10.0-6.el7ost.noarch

Tuning/Configuration
~~~~~~~~~~~~~~~~~~~~

+---------------------------+--------------------------------------------+
| Parameter                 | Value                                      |
+---------------------------+--------------------------------------------+
| Gnocchi Metricd Processes | 6 (Default)                                |
+---------------------------+--------------------------------------------+
| Gnocchi API Deployment    | Deployed in httpd                          |
+---------------------------+--------------------------------------------+
| Gnocchi API Process Count | 6 processes (Default)                      |
+---------------------------+--------------------------------------------+
| Gnocchi API Thread Count  | 1 thread per process (Default)             |
+---------------------------+--------------------------------------------+
| Gnocchi Storage Driver    | Ceph                                       |
+---------------------------+--------------------------------------------+
| Gnocchi Other             | metric_processing_delay = 60 (Default)     |
|                           | aggregation_workers_number = 1 (Default)   |
+---------------------------+--------------------------------------------+
| Ceilometer Dispatcher     | meter_dispatchers=gnocchi                  |
|                           | archive_policy=low                         |
+---------------------------+--------------------------------------------+
| Ceilometer Polling        | 600s (Default)                             |
+---------------------------+--------------------------------------------+
| Ceilometer Processes      | 1 ceilometer-agent-notification            |
|                           | 1 ceilometer-collector                     |
|                           | 1 ceilometer-polling                       |
+---------------------------+--------------------------------------------+
| Keystone Processes        | 24 admin, 24 main (Single thread/process)  |
+---------------------------+--------------------------------------------+
| Nova Processes            | 48 api, 24 conductor, 1 scheduler          |
+---------------------------+--------------------------------------------+
| Neutron Processes         | 24 api, 24 rpc, 24 metadata, 1 l3, 1 dhcp  |
+---------------------------+--------------------------------------------+
| Cinder Processes          | 24 api, 1 scheduler, 1 volume              |
+---------------------------+--------------------------------------------+
| Heat Processes            | 24 api, 24 cfn, 24 cloudwatch, 24 engine   |
+---------------------------+--------------------------------------------+
| Glance Processes          | 24 api, 24 registry                        |
+---------------------------+--------------------------------------------+
| Swift Processes           | 1 proxy, account, container, object server |
|                           | 1 for all other swift processes            |
+---------------------------+--------------------------------------------+
| Aodh Processes            | 1 evaluator, 1 listener, 1 notifier        |
+---------------------------+--------------------------------------------+

System Performance Monitoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
System performance metrics were recorded into a separate metrics
collection/storage/analysis system. Carbon, Graphite, and Grafana, with
dashboards for monitoring system resource utilization, were provided via
Browbeat. Gnocchi's backlog was monitored using a collectd plugin that
queries Gnocchi's status API. The plugin is available at
https://github.com/akrzos/gnocchi-status-collectd
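
As a rough illustration of what a backlog probe of this kind has to do, the
sketch below pulls the two backlog counters out of a status response. It
assumes the JSON layout returned by Gnocchi's ``/v1/status`` endpoint (a
``storage.summary`` object with ``metrics`` and ``measures`` keys). The sample
payload and its numbers are invented, the function name is hypothetical, and
this is not the code of the plugin linked above.

```python
import json

# Invented sample payload; the layout is an assumption about /v1/status.
SAMPLE_STATUS = '{"storage": {"summary": {"metrics": 120, "measures": 4500}}}'


def backlog_from_status(body):
    """Return (metrics_with_backlog, measures_to_process) from a status body."""
    summary = json.loads(body)["storage"]["summary"]
    return summary["metrics"], summary["measures"]


metrics, measures = backlog_from_status(SAMPLE_STATUS)
print("metrics with unprocessed measures:", metrics)
print("measures waiting to be processed:", measures)
```

A collector would call such a function on every polling interval and forward
both values to the metrics store so that backlog growth shows up as a graph.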

Test Diagram
~~~~~~~~~~~~
.. image:: content/newton_network_diagram_cephstorage.png
   :width: 600px

Test Case 1
-----------

Description
^^^^^^^^^^^
Boot 100 persistent instances every 1800 seconds until 1000 instances are
booted and running in the OpenStack cloud.

Parameters

#. Number of instances to boot per period: 100 (5 booted concurrently)
#. Time to wait between boot periods: 1800s
#. Maximum number of instances: 1000
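
The parameters above imply a fixed ramp, which can be sketched as follows.
This is a simplified model, not the test harness itself: it assumes each boot
period starts exactly 1800 seconds after the previous one and ignores the time
needed to boot a batch, so real elapsed times would be somewhat longer. The
function name is hypothetical.

```python
def boot_schedule(batch_size=100, interval_s=1800, max_instances=1000):
    """Return a list of (start_offset_seconds, cumulative_instances)."""
    schedule = []
    booted = 0
    period = 0
    while booted < max_instances:
        # Never exceed the configured maximum in the final period.
        booted = min(booted + batch_size, max_instances)
        schedule.append((period * interval_s, booted))
        period += 1
    return schedule


schedule = boot_schedule()
print("boot periods:", len(schedule))
print("last period (offset_s, instances):", schedule[-1])
```

Under these assumptions the ramp consists of ten boot periods, with the last
batch starting 16200 seconds (4.5 hours) after the first.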

Stopping/Failure Conditions

- Max number of instances achieved
- Failure to boot instances
- Failure of Telemetry Services to consume metrics
- Other service failures/errors
- System out of resources (e.g. CPU 100% utilized)

Setup
^^^^^^^^

#. Deploy OpenStack Cloud
#. Install testing and monitoring tooling
#. Gather metadata on Cloud
#. Run test
#. Tune if necessary/possible

Analysis/Results
^^^^^^^^^^^^^^^^

Gnocchi with the Ceph storage driver sustained collecting and processing
metrics for 1000 instances after several parameters were tuned:

- Gnocchi - 48 metricd workers on each controller
- Gnocchi - metric_processing_delay = 30
- Ceph - 512 pgs for metrics pool (32 OSDs)
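
The two Gnocchi tunings listed above could be expressed as a configuration
fragment along these lines. This is a sketch only: the placement of
``workers`` under ``[metricd]`` and of ``metric_processing_delay`` under
``[storage]`` is an assumption about the Gnocchi 3.0 option layout and should
be checked against the deployed release; the function name is hypothetical.

```python
import configparser
import io


def render_gnocchi_overrides(workers=48, processing_delay=30):
    """Render an INI fragment with the tuned values (section names assumed)."""
    cfg = configparser.ConfigParser()
    cfg["metricd"] = {"workers": str(workers)}
    cfg["storage"] = {"metric_processing_delay": str(processing_delay)}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(render_gnocchi_overrides())
```

Generating the fragment from one place makes it easier to apply the same
values on every controller at deployment time.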

Through several experiments, the Gnocchi metricd worker count was tuned at
deployment/installation rather than during or after workload initiation. At the
top end of the workload, slightly more processing capacity was needed, so
metric_processing_delay was decreased from 60s to 30s. This improved the
capacity and the backlog dropped quickly. Ceph's pgs were also tuned during
the deployment timeframe, and PGcalc was used to determine the number of pgs
for the given number of OSDs. PGcalc is available at http://ceph.com/pgcalc/
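
For readers unfamiliar with this kind of sizing, the sketch below shows a
simplified form of the rule of thumb such calculators apply: scale a target
number of pgs per OSD by the pool's share of the data, divide by the replica
count, and round to a power of two. The inputs used in the example call (100
pgs per OSD, 3 replicas, a 50% data share) are illustrative assumptions and
are not taken from this report, which does not state the values entered into
PGcalc. The function name is hypothetical.

```python
import math


def suggested_pg_count(osds, target_pgs_per_osd, replica_size, data_fraction):
    """Simplified pg sizing: scale, divide by replicas, round to a power of 2."""
    raw = max(osds * target_pgs_per_osd * data_fraction / replica_size, 1)
    lower = 2 ** math.floor(math.log2(raw))
    # Keep the lower power of two unless it falls more than 25% below raw.
    if lower >= raw * 0.75:
        return lower
    return lower * 2


# Illustrative inputs only; none of these values come from the report.
print(suggested_pg_count(osds=32, target_pgs_per_osd=100,
                         replica_size=3, data_fraction=0.5))
```

The rounding step matters because pg counts are conventionally kept at powers
of two so that data is spread evenly across placement groups.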

Resource Utilization Graphs

Instances

.. image:: content/instances_booted.png
   :width: 600px

Gnocchi Status

.. image:: content/gnocchi_status.png
   :width: 600px

CPU Utilization on Controllers

.. image:: content/controllers_cpu.png
   :width: 600px

Memory Utilization on Controllers

.. image:: content/controllers_memory.png
   :width: 600px

Disk IO Utilization on Controllers

.. image:: content/controllers_disk.png
   :width: 600px

Error Logs on Controllers

.. image:: content/controllers_errors.png
   :width: 600px

CPU Utilization on CephStorage Nodes

.. image:: content/cephstorage_cpu.png
   :width: 600px

Memory Utilization on CephStorage Nodes

.. image:: content/cephstorage_memory.png
   :width: 600px

Disk IO Utilization on CephStorage Nodes

.. image:: content/cephstorage_disk.png
   :width: 600px

After the test run, it was found that at exactly 00:00 UTC Gnocchi performs
additional work, which caused its backlog to grow again. During this timeframe
there was less CPU utilization on the controllers and less disk IO utilization
on the Ceph storage nodes. Gnocchi eventually catches back up, but more
analysis is needed to determine exactly what Gnocchi was doing that caused its
backlog to grow at that specific time.