Testing of major OpenStack services with 1000 compute node containers.

Change-Id: Ifeab8cd5422c92a682ef8867691b53e2fe781edc

parent 856e181f9f
commit 90e73f3c1f
204  doc/source/test_plans/1000_nodes/plan.rst  Normal file

@@ -0,0 +1,204 @@
.. _1000_nodes:

============================================================
1000 Compute nodes resource consumption/scalability testing
============================================================

:status: **ready**
:version: 1

:Abstract:

  This document describes a test plan for measuring the resource consumption of
  OpenStack services along with their scalability potential. It also provides
  results which can be used to find bottlenecks and/or potential pain points
  when scaling standalone OpenStack services and the OpenStack cloud itself.

Test Plan
=========

Most current OpenStack users wonder how it will behave at scale with a large
number of compute nodes. This is a valid concern because OpenStack consists of
many services with different load and resource consumption patterns.
Most cloud operations come down to two things: workload placement and simple
control-plane/data-plane management for those workloads.
So the main idea of this test plan is to create simple workloads (10-30k
VMs) and observe how the core services work with them and what the resource
consumption is during active workload placement and for some time afterwards.

Test Environment
----------------

The test assumes that each and every service will be monitored separately for
resource consumption using known techniques such as atop/nagios/containerization
or any other toolkit/solution which makes it possible to:

1. Measure the CPU/RAM consumption of a process or a set of processes.
2. Separate services and provide each of them with as many of the available
   resources as possible to fulfill their needs.

List of mandatory services for OpenStack testing:

- nova-api
- nova-scheduler
- nova-conductor
- nova-compute
- glance-api
- glance-registry
- neutron-server
- keystone-all

List of replaceable but still mandatory services:

- neutron-dhcp-agent
- neutron-ovs-agent
- rabbitmq
- libvirtd
- mysqld
- openvswitch-vswitch

List of optional services which may be omitted at the cost of a performance
decrease:

- memcached

List of optional services which may be omitted:

- horizon

Rally fits here as a pretty stable and reliable load runner. Monitoring can be
done by any suitable software which is able to provide results in a form that
allows building graphs / visualizing resource consumption, so that they can be
analyzed manually or automatically.
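
As an illustration of the monitoring requirements above, below is a minimal
sketch of a per-process CPU/RAM sampler built on the ``psutil`` library. The
service name patterns, the 5-second interval and the plain-text output format
are assumptions for illustration only; any atop/nagios/containerization-based
tooling that produces equivalent per-service data is equally valid.

.. code-block:: python

    # Minimal per-process CPU/RAM sampler (illustrative sketch, not part of
    # the test tooling). Requires the psutil library.
    import time

    import psutil

    SERVICES = ("nova-api", "nova-scheduler", "nova-conductor", "glance-api",
                "glance-registry", "neutron-server", "keystone", "rabbitmq",
                "mysqld")


    def sample(interval=5):
        """Print one CPU/RSS sample per matching process every `interval` seconds."""
        while True:
            for proc in psutil.process_iter(["pid", "name", "cmdline"]):
                cmd = " ".join(proc.info["cmdline"] or [proc.info["name"] or ""])
                if not any(svc in cmd for svc in SERVICES):
                    continue
                try:
                    cpu = proc.cpu_percent(interval=None)   # percent of one core
                    rss = proc.memory_info().rss            # resident bytes
                except psutil.NoSuchProcess:
                    continue
                print("%d pid=%d cpu=%.1f%% rss=%d %s"
                      % (time.time(), proc.info["pid"], cpu, rss, cmd[:60]))
            time.sleep(interval)


    if __name__ == "__main__":
        sample()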

Preparation
^^^^^^^^^^^

**Common preparation steps**

To begin testing, the environment should have all the OpenStack services up and
running. Of course, they should be configured according to the settings
recommended for the release and/or for your specific environment or use case.
To get real-world RPS/TPS/etc. metrics, all the services (including the compute
nodes) should be placed on separate physical servers, but again this depends on
the setup and requirements. For simplicity, and when testing only the control
plane, the fake compute driver can be used.

Environment description
^^^^^^^^^^^^^^^^^^^^^^^

The environment description includes the hardware specification of the servers,
the network parameters, the operating system and the OpenStack deployment
characteristics.

Hardware
~~~~~~~~

This section contains a list of all types of hardware nodes.

+-----------+-------+----------------------------------------------------+
| Parameter | Value | Comments                                           |
+-----------+-------+----------------------------------------------------+
| model     |       | e.g. Supermicro X9SRD-F                            |
+-----------+-------+----------------------------------------------------+
| CPU       |       | e.g. 6 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz |
+-----------+-------+----------------------------------------------------+

Network
~~~~~~~

This section contains a list of interfaces and network parameters.
For complicated cases this section may include a topology diagram and switch
parameters.

+------------------+-------+-------------------------+
| Parameter        | Value | Comments                |
+------------------+-------+-------------------------+
| card model       |       | e.g. Intel              |
+------------------+-------+-------------------------+
| driver           |       | e.g. ixgbe              |
+------------------+-------+-------------------------+
| speed            |       | e.g. 10G or 1G          |
+------------------+-------+-------------------------+

Software
~~~~~~~~

This section describes installed software.

+-------------------+--------+---------------------------+
| Parameter         | Value  | Comments                  |
+-------------------+--------+---------------------------+
| OS                |        | e.g. Ubuntu 14.04.3       |
+-------------------+--------+---------------------------+
| DB                |        | e.g. MySQL 5.6            |
+-------------------+--------+---------------------------+
| MQ broker         |        | e.g. RabbitMQ v3.4.25     |
+-------------------+--------+---------------------------+
| OpenStack release |        | e.g. Liberty              |
+-------------------+--------+---------------------------+

Configuration
~~~~~~~~~~~~~

This section describes the configuration of OpenStack and core services.

+-------------------+-------------------------------+
| Parameter         | File                          |
+-------------------+-------------------------------+
| Keystone          | ./results/keystone.conf       |
+-------------------+-------------------------------+
| Nova-api          | ./results/nova-api.conf       |
+-------------------+-------------------------------+
| ...               |                               |
+-------------------+-------------------------------+

Test Case 1: Resources consumption under severe load
-----------------------------------------------------

Description
^^^^^^^^^^^

This test should spawn a number of instances in n parallel threads and, along
with that, record all CPU/RAM metrics from all the OpenStack and core services
such as the MQ broker and the DB server. As the test itself is pretty long,
there is no need for a very high sampling resolution: one measurement every
5 seconds should be more than enough.

A Rally scenario that creates a load of 50 parallel threads spawning VMs and
listing them can be found in the test plan folder and can be used for testing
purposes. It can be modified to fit specific deployment needs.

Parameters
^^^^^^^^^^

============================ ====================================================
Parameter name               Value
============================ ====================================================
OpenStack release            Liberty, Mitaka
Compute nodes amount         50, 100, 200, 500, 1000, 2000, 5000, 10000
Services configurations      Configuration for each OpenStack and core service
============================ ====================================================

List of performance metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The test case result is presented as a weighted tree structure, with operations
as nodes and the time spent on them as node weights, for every control plane
operation under test. This information is automatically gathered in Ceilometer
and can be gracefully transformed into a human-friendly report via OSprofiler.

======== =============== ================= =================================
Priority Value           Measurement Units Description
======== =============== ================= =================================
1        CPU load        MHz               CPU load for each OpenStack
                                           service
2        RAM consumption GB                RAM consumption for each
                                           OpenStack service
3        Instances amnt  Amount            Max number of instances spawned
4        Operation time  milliseconds      Time spent for every instance
                                           spawn
======== =============== ================= =================================
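
In addition to the OSprofiler report, the raw Rally task results can be
post-processed into the per-operation timings listed above. Below is a minimal
sketch that assumes the JSON layout produced by ``rally task results`` (a list
of scenario entries whose ``result`` items carry ``duration`` and
``atomic_actions`` fields); the file name is only an example.

.. code-block:: python

    # Aggregate per-operation timings from exported Rally results
    # (illustrative sketch; the exact JSON layout depends on the Rally version).
    import json
    from collections import defaultdict

    with open("rally_results.json") as f:          # example path
        scenarios = json.load(f)

    totals = defaultdict(float)
    counts = defaultdict(int)
    for scenario in scenarios:
        for iteration in scenario["result"]:
            for action, seconds in iteration["atomic_actions"].items():
                totals[action] += seconds
                counts[action] += 1

    for action in sorted(totals):
        avg_ms = 1000.0 * totals[action] / counts[action]
        print("%-40s avg %8.0f ms over %d runs" % (action, avg_ms, counts[action]))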

58  doc/source/test_plans/1000_nodes/rallytest.json  Normal file

@@ -0,0 +1,58 @@
{
    "NovaServers.boot_and_list_server": [
        {
            "runner": {
                "type": "constant",
                "concurrency": 50,
                "times": 20000
            },
            "args": {
                "detailed": true,
                "flavor": {
                    "name": "m1.tiny"
                },
                "image": {
                    "name": "cirros"
                }
            },
            "sla": {
                "failure_rate": {
                    "max": 0
                }
            },
            "context": {
                "users": {
                    "project_domain": "default",
                    "users_per_tenant": 2,
                    "tenants": 200,
                    "resource_management_workers": 30,
                    "user_domain": "default"
                },
                "quotas": {
                    "nova": {
                        "ram": -1,
                        "floating_ips": -1,
                        "security_group_rules": -1,
                        "instances": -1,
                        "cores": -1,
                        "security_groups": -1
                    },
                    "neutron": {
                        "subnet": -1,
                        "network": -1,
                        "port": -1
                    }
                },
                "network": {
                    "network_create_args": {
                        "tenant_id": "d51f243eba4d48d09a853e23aeb68774",
                        "name": "c_rally_b7d5d2f5_OqPRUMD8"
                    },
                    "subnets_per_network": 1,
                    "start_cidr": "1.0.0.0/21",
                    "networks_per_tenant": 1
                }
            }
        }
    ]
}

@@ -19,4 +19,5 @@ Test Plans
    container_cluster_systems/plan
    neutron_features/l3_ha/test_plan
    hardware_features/index
+   1000_nodes/plan

141  doc/source/test_results/1000_nodes/index.rst  Normal file

@@ -0,0 +1,141 @@
Testing at the scale of 1000 compute hosts
===========================================

Environment setup
-----------------

Each and every OpenStack service was placed into a container. The containers
were spread mostly across 17 nodes. "Mostly" means that some of them were placed
on separate nodes so that they could get more resources if needed, without
limitations from other services. After some initial assumptions these privileges
were given to the following containers: rabbitmq, mysql, keystone, nova-api and
neutron. Later, after some observations, only rabbitmq, mysql and keystone kept
these privileges. All other containers were placed with somewhat higher
priorities, but without dedicating additional hosts to them.

List of OpenStack and core services which were used in the testing environment
(the number in parentheses represents the number of instances/containers):

- nova-api (1)
- nova-scheduler (8)
- nova-conductor (1)
- nova-compute (1000)
- glance-api (1)
- glance-registry (1)
- neutron-server (1)
- neutron-dhcp-agent (1)
- neutron-ovs-agent (1000)
- keystone-all (1)
- rabbitmq (1)
- mysqld (1)
- memcached (1)
- horizon (1)
- libvirtd (1000)
- openvswitch-vswitch (1000)

Additional information
----------------------

We ran 8 instances of nova-scheduler because it is known not to scale within a
single instance of the service (there are no workers/threads inside
nova-scheduler). All other OpenStack services were run in a quantity of 1.
Each and every "Compute node" container has neutron-ovs-agent, libvirtd,
nova-compute and openvswitch-vswitch inside, so we have 1000 "Compute"
containers across ~13 nodes.

The prime aim of this simple testing is to check the scalability of the
OpenStack control and data plane services. Because of that, RabbitMQ and MySQL
were run in single-node mode, just to verify the "essential load" and confirm
that there are no issues with standalone nodes. Later we will run tests with a
Galera cluster and clustered RabbitMQ.

We used the official Mirantis MOS 8.0 (OpenStack Liberty release) repository
for creating the containers with OpenStack services.

There is a set of tests run with the fake compute driver for preliminary checks
and overall load and placement verification. Later we modified the original
libvirt driver to skip only the actual VM boot (the spawn of the qemu-kvm
process). Everything else related to instance spawning is actually done.
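
The driver patch itself is not included in this report. Purely as a hypothetical
illustration of the approach (not the actual code used), skipping the guest
launch while keeping the rest of the spawn workflow could look roughly like a
subclass of the stock libvirt driver that short-circuits the step which defines
and starts the qemu-kvm domain:

.. code-block:: python

    # Hypothetical illustration only -- not the patch used in this test.
    # Idea: subclass the stock libvirt driver and short-circuit the step
    # that defines/launches the qemu-kvm guest, so image preparation,
    # network plumbing and database updates still happen.
    from nova.virt.libvirt import driver as libvirt_driver


    class NoBootLibvirtDriver(libvirt_driver.LibvirtDriver):
        """Libvirt driver that skips only the actual guest launch."""

        def _create_domain_and_network(self, *args, **kwargs):
            # Returning without creating the domain means no qemu-kvm process
            # is spawned; callers that expect a domain object may need further
            # adjustments in a real patch.
            return None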

Glance was used with local file storage as a backend. CirrOS images (~13 MB)
were used for booting the VMs. The local disks of the nodes/containers were
used as storage for the VMs.

Methodology
-----------

For simplicity we chose the "boot and list VM" scenario in Rally with the
following important parameters (these totals follow from the Rally runner and
context values in ``rallytest.json``; see the sanity-check sketch after this
list):

- Total number of instances: 20000
- Total number of workers: 50
- Total number of networks: 200
- Total number of tenants: 200
- Total number of users: 400
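
A quick sanity check of how these totals derive from the scenario file (values
copied from ``doc/source/test_plans/1000_nodes/rallytest.json``):

.. code-block:: python

    # Sanity check: derive the totals above from the Rally runner and context.
    runner = {"times": 20000, "concurrency": 50}
    users = {"tenants": 200, "users_per_tenant": 2}
    network = {"networks_per_tenant": 1}

    print("instances:", runner["times"])                                   # 20000
    print("workers:", runner["concurrency"])                               # 50
    print("networks:", users["tenants"] * network["networks_per_tenant"])  # 200
    print("tenants:", users["tenants"])                                    # 200
    print("users:", users["tenants"] * users["users_per_tenant"])          # 400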

In a 2-3 year period, the probability of 1000 compute hosts being added at the
same moment (all of them within 10-15 seconds) is close to 0%; therefore it is
necessary to start all the Compute containers and wait for ~5-10 minutes to give
Neutron DVR enough time to update all the agents so that they know about each
other.

After that we start the Rally test scenario. Because of the nature of the
changes in the nova-compute driver, the start of a VM is considered successful
before the security groups get applied to it (similar to
vif_plugging_is_fatal=False). This leads to an increased load on the Neutron
server and to the possibility that not all the rules are applied by the end of
the testing. In our case it creates a bigger load on Neutron, which makes this
test much heavier.
We plan to repeat this test later with this particular behavior excluded and to
compare the results.

In the folder with this report you'll find additional files with the test
scenario, the results and observations of the usage patterns.

Here we would like to point out some findings about the resource consumption of
each and every service, which could help with server capacity planning. All
servers had 2x Intel Xeon E5-2680v3.
The table below shows the top watermarks of the different services under the
test load described above.

Table 1. Services top watermarks

+-----------------+---------+----------+
| Service         | CPU     | RAM      |
+=================+=========+==========+
| nova-api        | 13 GHz  | 12.4 GB  |
+-----------------+---------+----------+
| nova-scheduler* | 1 GHz   | 1.1 GB   |
+-----------------+---------+----------+
| nova-conductor  | 30 GHz  | 4.8 GB   |
+-----------------+---------+----------+
| glance-api      | 160 MHz | 1.8 GB   |
+-----------------+---------+----------+
| glance-registry | 300 MHz | 1.8 GB   |
+-----------------+---------+----------+
| neutron-server  | 30 GHz  | 20 GB    |
+-----------------+---------+----------+
| keystone-all    | 14 GHz  | 2.7 GB   |
+-----------------+---------+----------+
| rabbitmq        | 21 GHz  | 17 GB    |
+-----------------+---------+----------+
| mysqld          | 1.9 GHz | 3.5 GB   |
+-----------------+---------+----------+
| memcached       | 10 MHz  | 27 MB    |
+-----------------+---------+----------+

\* each of the eight nova-scheduler processes.

A very first assumption at the scale of 1000 nodes would be the following: it
would be good to have 2 dedicated servers per component for each of the
following components: nova-conductor, nova-api, neutron-server and keystone.
The RabbitMQ and MySQL servers worked in standalone mode, so clustering overhead
will be added and they will consume considerably more resources than we have
already metered.

Graphs:

.. image:: stats1.png
   :width: 1300px

.. image:: stats2.png
   :width: 1300px
BIN  doc/source/test_results/1000_nodes/stats1.png  Normal file  (binary file not shown; 1.3 MiB)
BIN  doc/source/test_results/1000_nodes/stats2.png  Normal file  (binary file not shown; 1.4 MiB)

@@ -19,3 +19,4 @@ Test Results
    neutron_features/index
    hardware_features/index
    provisioning/index
+   1000_nodes/index
BIN  raw_results/1000_nodes/report.html.gz  Normal file  (binary file not shown)