OpenStack Networking (Neutron) control plane performance report for 400 nodes
Abstract
This document presents OpenStack Networking (aka Neutron) control plane performance test results against two OpenStack environments: 200 nodes and 378 nodes. All tests have been performed according to the
openstack_neutron_control_plane_performance_test_plan
Environment description
Lab A (200 nodes)
3 controllers, 196 computes, 1 node for Grafana/Prometheus
Hardware configuration of each server
Component | Specification |
---|---|
server | Supermicro MBD-X10DRI |
CPU | Intel Xeon E5-2650v3, 2 processors x 10 cores, 2300 MHz |
RAM | 8x Samsung M393A2G40DB0-CPB, 2097152 MB |
NIC | Intel I350 Dual Port, 1G; Intel 82599ES Dual Port, 10G |
disk | Intel SSD DC S3500 Series, SSD, 240GB; 2x WD WD5003AZEX, HDD, 500GB |

Component | Specification |
---|---|
server | SUPERMICRO 5037MR-H8TRF |
CPU | Intel Xeon Ivy Bridge 6C E5-2620, 1 processor x 6 cores, 2100 MHz |
RAM | 4x Samsung DDRIII 8GB DDR3-1866, 32768 MB |
NIC | AOC-STGN-i2S 2-port, 10G |
disk | Intel SSD DC S3500 Series, SSD, 80GB; 1x WD Scorpio Black BP WD7500BPKT, HDD, 750GB |
Lab B (378 nodes)
The environment contains 4 types of servers:
- rally node
- controller node
- compute-osd node
- compute node
Role | Servers count | Type |
---|---|---|
rally | 1 | 1 or 2 |
controller | 3 | 1 or 2 |
compute | 291 | 1 or 2 |
compute-osd | 34 | 3 |
compute-osd | 49 | 1 |
Hardware configuration of each server
All servers have one of three configuration types, described in the table below.
Component | Specification |
---|---|
server | Dell PowerEdge R630 |
CPU | Intel E5-2680 v3, 2 processors x 12 cores, 2500 MHz |
RAM | Samsung M393A2G40DB0-CPB, 262144 MB |
NIC | eno1, eno2: Intel X710 Dual Port, 10G; enp3s0f0, enp3s0f1: Intel X710 Dual Port, 10G |
disk | /dev/sda: RAID1 (Dell PERC H730P Mini) of 2x Intel S3610, SSD, 3.6TB |
Network configuration of each server
All servers have the same network configuration.
Software configuration on servers with controller, compute and compute-osd roles
Role | Service name |
---|---|
controller | horizon keystone nova-api nova-scheduler nova-cert nova-conductor nova-consoleauth nova-consoleproxy cinder-api cinder-backup cinder-scheduler cinder-volume glance-api glance-glare glance-registry neutron-dhcp-agent neutron-l3-agent neutron-metadata-agent neutron-openvswitch-agent neutron-server heat-api heat-api-cfn heat-api-cloudwatch ceph-mon rados-gw memcached rabbitmq_server mysqld galera corosync pacemaker haproxy |
compute | nova-compute neutron-l3-agent neutron-metadata-agent neutron-openvswitch-agent |
compute-osd | nova-compute neutron-l3-agent neutron-metadata-agent neutron-openvswitch-agent ceph-osd |
Software | Version |
---|---|
OpenStack | Mitaka |
Ceph | Hammer |
Ubuntu | Ubuntu 14.04.3 LTS |
You can find the outputs of some commands and the /etc folder in the following archives:
controller-1.tar.gz <configs/controller-1.tar.gz>
controller-2.tar.gz <configs/controller-2.tar.gz>
controller-3.tar.gz <configs/controller-3.tar.gz>
compute-1.tar.gz <configs/compute-1.tar.gz>
compute-osd-1.tar.gz <configs/compute-osd-1.tar.gz>
Software configuration on servers with Rally role
Rally should be installed on this server; installation instructions can be found in the Rally installation documentation. A minimal driver sketch is shown after the version table below.
Software | Version |
---|---|
Rally | 0.5.0 |
Ubuntu | Ubuntu 14.04.3 LTS |
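As a rough illustration only, the sketch below drives the usual Rally CLI workflow (deployment registration from OS_* environment variables, task start, HTML report generation) from Python. The task file name is a placeholder, and the exact commands and task definitions used for this report may differ.

```python
# Minimal sketch (not the exact commands used for this report) of preparing a
# Rally deployment and running a task via the Rally CLI from Python.
# Assumes the OS_* credential variables are already exported and that a task
# file named stress_neutron.json exists (placeholder name).
import subprocess

def run(cmd):
    """Run a Rally CLI command and fail loudly on a non-zero exit code."""
    print("$", " ".join(cmd))
    subprocess.check_call(cmd)

# Register the existing cloud from the OS_* environment variables.
run(["rally", "deployment", "create", "--fromenv", "--name", "existing"])
# Start a benchmark task and render an HTML report like the ones attached below.
run(["rally", "task", "start", "stress_neutron.json"])
run(["rally", "task", "report", "--out", "stress_neutron.html"])
```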
Test results
Test Case 1: Basic Neutron test suite
The following tests were run with the default configuration against Lab A (200 nodes); a minimal sketch of the Neutron API pattern these scenarios exercise follows the list:
- create-and-list-floating-ips
- create-and-list-networks
- create-and-list-ports
- create-and-list-routers
- create-and-list-security-groups
- create-and-list-subnets
- create-and-delete-floating-ips
- create-and-delete-networks
- create-and-delete-ports
- create-and-delete-routers
- create-and-delete-security-groups
- create-and-delete-subnets
- create-and-update-networks
- create-and-update-ports
- create-and-update-routers
- create-and-update-security-groups
- create-and-update-subnets
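For reference, the following sketch shows the create/list/update/delete pattern that these scenarios exercise against the Neutron API, assuming python-neutronclient and keystoneauth1 are available; the credential values are placeholders, and the real scenarios are implemented inside Rally itself.

```python
# Minimal sketch of the network CRUD pattern exercised by the scenarios above;
# credential values below are placeholders, not the lab's actual settings.
from keystoneauth1 import loading, session
from neutronclient.v2_0 import client

auth = loading.get_plugin_loader("password").load_from_options(
    auth_url="http://keystone.example:5000/v3",
    username="admin", password="secret", project_name="admin",
    user_domain_name="Default", project_domain_name="Default")
neutron = client.Client(session=session.Session(auth=auth))

net = neutron.create_network({"network": {"name": "rally_net"}})["network"]
neutron.list_networks()                                                    # create-and-list-networks
neutron.update_network(net["id"], {"network": {"name": "rally_net_upd"}})  # create-and-update-networks
neutron.delete_network(net["id"])                                          # create-and-delete-networks
```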
The time needed for each scenario can be compared using the chart below.
For extended information, please download the following report: basic_neutron.html <reports/basic_neutron.html>
Test Case 2: Stressful Neutron test suite
The following tests were run against both Lab A (200 nodes) and Lab B (378 nodes); a minimal sketch of the server-side pattern is shown after the list:
- create-and-list-networks
- create-and-list-ports
- create-and-list-routers
- create-and-list-security-groups
- create-and-list-subnets
- boot-and-list-server
- boot-and-delete-server-with-secgroups
- boot-runcommand-delete
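The server scenarios follow a similar pattern through the Nova API; a rough sketch with python-novaclient is shown below (credentials, image, flavor and network IDs are placeholders).

```python
# Minimal sketch of the boot-and-list-server pattern; IDs are placeholders.
from keystoneauth1 import loading, session
from novaclient import client as nova_client

sess = session.Session(auth=loading.get_plugin_loader("password").load_from_options(
    auth_url="http://keystone.example:5000/v3", username="admin", password="secret",
    project_name="admin", user_domain_name="Default", project_domain_name="Default"))
nova = nova_client.Client("2", session=sess)

server = nova.servers.create(name="s_rally_example", image="<image-id>",
                             flavor="<flavor-id>", nics=[{"net-id": "<network-id>"}])
nova.servers.list()   # boot-and-list-server
server.delete()       # cleanup step of the boot-and-delete-* scenarios
```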
Here is a short summary of the collected results:
Iterations/concurrency | Time, sec | Errors |
---|---|---|
3000/50 | avg 2.375, max 11.669 | 1 (Internal server error while processing your request) |
1000/50 | avg 123.97, max 270.84 | 1 |
2000/50 | avg 15.59, max 19.398 | 0 |
50/1 | avg 210.706, max 169.315 | 0 |
2000/50 | avg 25.973, max 50.415 | 1 |
4975/50 | avg 21.445, max 25.21 | 0 |
4975/200 | avg 190.772, max 95.651 | 394 |
2000/15 | avg 28.39, max 85.659 | 34 (Resource <Server: s_rally_b58e9bde_Y369JdPf> has ERROR status. Deadlock found when trying to ...) |
During the Rally runs, bugs affecting the boot-and-delete-server-with-secgroups and boot-runcommand-delete scenarios were filed and fixed on Lab A:
- Bug LP #1610303 l2pop mech fails to update_port_postcommit on a loaded cluster , fix - https://review.openstack.org/353835
- Bug LP #1614452 Port create time grows at scale due to dvr arp update , fix - https://review.openstack.org/357052
With these fixes applied on Lab B, the mentioned Rally scenarios passed successfully.
Other bugs that were encountered:
Observed trends
- Create and list networks: the total time spent on each iteration grows linearly.
- Create and list routers: router list operation time gradually grows from 0.12 to 1.5 sec over 2000 iterations.
- Create and list routers: total load duration remains linear.
- Create and list subnets: subnet list operation time increases sharply after ~1750 iterations (from 4.5 sec at the 1700th iteration to 10.48 sec at the 1800th).
- Create and list subnets: subnet creation time shows peaks after 1750 iterations.
- Create and list security groups: the security group list operation shows the most rapid growth, with time increasing from 0.548 sec in the first iteration to over 10 sec in the last iterations.
More details can be found in original Rally report: stress_neutron.html <reports/stress_neutron.html>
Test case 3: Neutron scalability test with many networks
In our tests, 100 networks (each with a subnet, a router and a VM) were created in each iteration.
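A minimal sketch of what a single iteration does, assuming python-neutronclient and python-novaclient; credentials, image and flavor IDs are placeholders, and the actual scenario is implemented inside Rally:

```python
# Minimal sketch of one iteration: 100 networks, each with a subnet, a router
# and a VM. Credentials, image and flavor IDs below are placeholders.
from keystoneauth1 import loading, session
from neutronclient.v2_0 import client as neutron_client
from novaclient import client as nova_client

sess = session.Session(auth=loading.get_plugin_loader("password").load_from_options(
    auth_url="http://keystone.example:5000/v3", username="admin", password="secret",
    project_name="admin", user_domain_name="Default", project_domain_name="Default"))
neutron = neutron_client.Client(session=sess)
nova = nova_client.Client("2", session=sess)

for i in range(100):
    net = neutron.create_network({"network": {"name": "net_%d" % i}})["network"]
    subnet = neutron.create_subnet({"subnet": {"network_id": net["id"],
                                               "ip_version": 4,
                                               "cidr": "10.%d.0.0/24" % i}})["subnet"]
    router = neutron.create_router({"router": {"name": "router_%d" % i}})["router"]
    neutron.add_interface_router(router["id"], {"subnet_id": subnet["id"]})
    nova.servers.create(name="vm_%d" % i, image="<image-id>", flavor="<flavor-id>",
                        nics=[{"net-id": net["id"]}])
```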
Iterations/concurrency | Avg time, sec | Max time, sec | Errors |
---|---|---|---|
10/1 | 1237.389 | 1294.549 | 0 |
20/3 | 1298.611 | 1425.878 | 1 (HTTPConnectionPool read timeout) |
Load graph for run with 20 iterations/concurrency 3:
More details can be found in original Rally report: scale_neutron_networks.html <reports/scale_neutron_networks.html>
Test case 4: Neutron scalability test with many servers
During each iteration this test creates a large number of VMs (100 in our case) on a single network, which makes it possible to check the case of many ports per subnet.
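A minimal sketch of one iteration under the same placeholder assumptions as the earlier sketches: a single network and subnet, 100 VMs booted on it, and a port count per subnet at the end.

```python
# Minimal sketch of one iteration: one network/subnet, many VMs, many ports.
from keystoneauth1 import loading, session
from neutronclient.v2_0 import client as neutron_client
from novaclient import client as nova_client

sess = session.Session(auth=loading.get_plugin_loader("password").load_from_options(
    auth_url="http://keystone.example:5000/v3", username="admin", password="secret",
    project_name="admin", user_domain_name="Default", project_domain_name="Default"))
neutron = neutron_client.Client(session=sess)
nova = nova_client.Client("2", session=sess)

net = neutron.create_network({"network": {"name": "many_servers_net"}})["network"]
neutron.create_subnet({"subnet": {"network_id": net["id"],
                                  "ip_version": 4, "cidr": "10.0.0.0/16"}})
for i in range(100):
    nova.servers.create(name="vm_%d" % i, image="<image-id>", flavor="<flavor-id>",
                        nics=[{"net-id": net["id"]}])

# Every VM gets at least one port on the subnet, so the ports-per-subnet case
# can be inspected directly:
print(len(neutron.list_ports(network_id=net["id"])["ports"]))
```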
Iterations/concurrency | Avg time, sec | Max time, sec | Errors |
---|---|---|---|
10/1 | 100.422 | 104.315 | 0 |
20/3 | 119.767 | 147.107 | 0 |
Load graph for run with 20 iterations/concurrency 3:
More details can be found in original Rally report: scale_neutron_servers.html <reports/scale_neutron_servers.html>