diff --git a/doc/source/test_plans/neutron_features/index.rst b/doc/source/test_plans/neutron_features/index.rst
index c410cca..4ef16c2 100644
--- a/doc/source/test_plans/neutron_features/index.rst
+++ b/doc/source/test_plans/neutron_features/index.rst
@@ -7,8 +7,7 @@ Neutron features test plans
 ===========================

 .. toctree::
+   :glob:
    :maxdepth: 3

-   l3_ha/plan
-   resource_density/plan
-   agent_restart/plan
+   */plan
diff --git a/doc/source/test_plans/neutron_features/vm_density/deployment.svg b/doc/source/test_plans/neutron_features/vm_density/deployment.svg
new file mode 100644
index 0000000..2aafd1c
--- /dev/null
+++ b/doc/source/test_plans/neutron_features/vm_density/deployment.svg
@@ -0,0 +1,564 @@
+[SVG markup omitted: deployment diagram showing the monitoring server outside
+of the OpenStack cloud, and compute nodes inside the cloud whose instances
+are wired to a virtual router]
diff --git a/doc/source/test_plans/neutron_features/vm_density/integrity_images/integrity1.png b/doc/source/test_plans/neutron_features/vm_density/integrity_images/integrity1.png
new file mode 100644
index 0000000..62599cb
Binary files /dev/null and b/doc/source/test_plans/neutron_features/vm_density/integrity_images/integrity1.png differ
diff --git a/doc/source/test_plans/neutron_features/vm_density/integrity_images/integrity2.png b/doc/source/test_plans/neutron_features/vm_density/integrity_images/integrity2.png
new file mode 100644
index 0000000..987d4f9
Binary files /dev/null and b/doc/source/test_plans/neutron_features/vm_density/integrity_images/integrity2.png differ
diff --git a/doc/source/test_plans/neutron_features/vm_density/integrity_images/route1.png b/doc/source/test_plans/neutron_features/vm_density/integrity_images/route1.png
new file mode 100644
index 0000000..35f69bb
Binary files /dev/null and b/doc/source/test_plans/neutron_features/vm_density/integrity_images/route1.png differ
diff --git a/doc/source/test_plans/neutron_features/vm_density/integrity_images/route2.png b/doc/source/test_plans/neutron_features/vm_density/integrity_images/route2.png
new file mode 100644
index 0000000..efe4f3e
Binary files /dev/null and b/doc/source/test_plans/neutron_features/vm_density/integrity_images/route2.png differ
diff --git a/doc/source/test_plans/neutron_features/vm_density/integrity_images/route3.png b/doc/source/test_plans/neutron_features/vm_density/integrity_images/route3.png
new file mode 100644
index 0000000..5221a2d
Binary files /dev/null and b/doc/source/test_plans/neutron_features/vm_density/integrity_images/route3.png differ
diff --git a/doc/source/test_plans/neutron_features/vm_density/integrity_images/route4.png b/doc/source/test_plans/neutron_features/vm_density/integrity_images/route4.png
new file mode 100644
index 0000000..125ae8e
Binary files /dev/null and b/doc/source/test_plans/neutron_features/vm_density/integrity_images/route4.png differ
diff --git a/doc/source/test_plans/neutron_features/vm_density/integrity_images/route5.png b/doc/source/test_plans/neutron_features/vm_density/integrity_images/route5.png
new file mode 100644
index 0000000..36ed191
Binary files /dev/null and b/doc/source/test_plans/neutron_features/vm_density/integrity_images/route5.png differ
diff --git a/doc/source/test_plans/neutron_features/vm_density/plan.rst b/doc/source/test_plans/neutron_features/vm_density/plan.rst
new file mode 100644
index 0000000..9127849
--- /dev/null
+++ b/doc/source/test_plans/neutron_features/vm_density/plan.rst
@@ -0,0 +1,304 @@
+.. _neutron_vm_density_test_plan:
+
+=================================
+OpenStack Neutron Density Testing
+=================================
+
+:status: **ready**
+:version: 1.0
+
+:Abstract:
+
+    With density testing we are able to launch many instances on a single
+    OpenStack cluster. But beyond obtaining high numbers, we also want to
+    ensure that all instances are properly wired into the network and, more
+    importantly, have connectivity with the public network.
+
+
+Test Plan
+=========
+
+The goal of this test is to launch as many instances as possible in the
+OpenStack cloud and verify that all of them have correct connectivity with
+the public network. Upon start, each instance reports itself to the external
+monitoring server. On success, the server logs contain each instance's IP and
+the number of attempts it took to get the IP from metadata and send it to the
+server.
+
+
+Test Environment
+----------------
+
+Preparation
+^^^^^^^^^^^
+
+This test plan is performed against an existing OpenStack cloud. The
+monitoring server is deployed on a machine outside of the cloud.
+
+.. image:: deployment.svg
+
+During each iteration instances are created and connected to the same Neutron
+network, which is plugged into a Neutron router that connects OpenStack with
+the external network. The case where multiple Neutron networks are used may
+also be considered.
+
+
+Environment description
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The environment description includes the hardware specification of servers,
+network parameters, operating system and OpenStack deployment
+characteristics.
+
+Hardware
+~~~~~~~~
+
+This section contains a list of all types of hardware nodes.
+
++-----------+-------+----------------------------------------------------+
+| Parameter | Value | Comments                                           |
++-----------+-------+----------------------------------------------------+
+| model     |       | e.g. Supermicro X9SRD-F                            |
++-----------+-------+----------------------------------------------------+
+| CPU       |       | e.g. 6 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz |
++-----------+-------+----------------------------------------------------+
+| role      |       | e.g. compute or network                            |
++-----------+-------+----------------------------------------------------+
+
+Network
+~~~~~~~
+
+This section contains a list of interfaces and network parameters.
+For complicated cases this section may include a topology diagram and switch
+parameters.
+
++------------------+-------+-------------------------+
+| Parameter        | Value | Comments                |
++------------------+-------+-------------------------+
+| network role     |       | e.g. provider or public |
++------------------+-------+-------------------------+
+| card model       |       | e.g. Intel              |
++------------------+-------+-------------------------+
+| driver           |       | e.g. ixgbe              |
++------------------+-------+-------------------------+
+| speed            |       | e.g. 10G or 1G          |
++------------------+-------+-------------------------+
+| MTU              |       | e.g. 9000               |
++------------------+-------+-------------------------+
+| offloading modes |       | e.g. default            |
++------------------+-------+-------------------------+
+
+Software
+~~~~~~~~
+
+This section describes installed software.
+
++-----------------+-------+---------------------------+
+| Parameter       | Value | Comments                  |
++-----------------+-------+---------------------------+
+| OS              |       | e.g. Ubuntu 14.04.3       |
++-----------------+-------+---------------------------+
+| OpenStack       |       | e.g. Mitaka               |
++-----------------+-------+---------------------------+
+| Hypervisor      |       | e.g. KVM                  |
++-----------------+-------+---------------------------+
+| Neutron plugin  |       | e.g. ML2 + OVS            |
++-----------------+-------+---------------------------+
+| L2 segmentation |       | e.g. VxLAN                |
++-----------------+-------+---------------------------+
+| virtual routers |       | e.g. DVR                  |
++-----------------+-------+---------------------------+
+
+Test Case 1: VM density check
+-----------------------------
+
+Description
+^^^^^^^^^^^
+
+The goal of this test is to launch as many instances as possible in the
+OpenStack cloud and verify that all of them have correct connectivity with
+the public network. Instances can be launched in batches (i.e. via Heat).
+When an instance starts, it sends its IP to the monitoring server located
+outside of the cloud.
+
+The test is treated as successful if all instances report their status. As an
+extension of this test plan, the reverse test case may also be taken into
+account: an external resource tries to connect to each VM using its floating
+IP address. This should be treated as a separate test case.
+
+
+List of performance metrics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+======== =============== ====================== ======================================
+Priority Value           Measurement Units      Description
+======== =============== ====================== ======================================
+1        Total           count                  Total number of instances
+1        Density         count per compute node Max instances per compute node
+======== =============== ====================== ======================================
+
+
+Tools
+-----
+
+To execute the test plan:
+
+1. Disable quotas in the OpenStack cloud (since we are going far beyond the
+   limits that are usually allowed):
+
+   .. literalinclude:: scripts/unset_quotas.sh
+      :language: bash
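+
+   For example, assuming the usual OpenStack credential variables
+   (``OS_TENANT_NAME`` and friends) are already exported, e.g. via an openrc
+   file (an illustrative assumption, not part of the script):
+
+   .. code-block:: shell
+
+      source openrc        # or export OS_* credentials by other means
+      ./unset_quotas.sh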
+
+2. Configure a machine for the monitoring server. Copy the script onto it:
+
+   .. literalinclude:: scripts/server.py
+      :language: python
+
+   Copy the Heat template:
+
+   .. literalinclude:: scripts/instance_metadata.hot
+      :language: yaml
+
+
+3. Start the server:
+
+   .. code-block:: shell
+
+       python server.py -p <port> -l <log_dir>
+
+   The server writes logs about incoming connections into the
+   ``/tmp/instances_<timestamp>.txt`` file. Each line contains an instance's
+   IP, identifying which instance sent the report.
+
+   .. note:: If the server is restarted, it will create a new
+             "instances_<timestamp>.txt" file with a new timestamp.
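+
+   To verify the setup, an instance report can be simulated with ``curl``
+   from any machine (the JSON fields follow the user data script in the Heat
+   template above; the server address and IP are examples):
+
+   .. code-block:: shell
+
+      curl -X POST http://<server_address>:4242/ \
+          -d '{"instance_ip": "10.0.0.5", "retry_get": 1, "retry_send": 1}'
+
+   On success the server replies with "Hello!" and appends a line with the
+   client address and the posted data to the log file.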
+
+4. Provision VM instances:
+
+   #. Define the number of compute nodes you have in the cluster. Let's say
+      this number is ``NUM_COMPUTES``.
+   #. Make sure that ``IMAGE_ID`` and ``FLAVOR`` exist.
+   #. Put the address of the monitoring server into ``SERVER_ADDRESS``.
+   #. Choose a CIDR for the private subnet and put it into ``CIDR`` (the
+      template defines no default for this parameter).
+   #. Run a Heat stack using the template from above:
+
+      .. code-block:: shell
+
+          heat stack-create -f instance_metadata.hot \
+              -P "image=IMAGE_ID;flavor=FLAVOR;instance_count=NUM_COMPUTES;\
+              cidr=CIDR;server_endpoint=SERVER_ADDRESS" STACK_NAME
+
+   #. Repeat the previous step as many times as you need.
+   #. At each step, monitor ``instances_<timestamp>.txt`` using ``wc -l`` to
+      validate that all instances are booted and connected to the HTTP
+      server, as shown below.
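+
+   A quick way to compare the number of received reports with the expected
+   instance count (``NUM_STACKS`` here stands for the number of stacks
+   created so far; the file name is the one reported by the server on start):
+
+   .. code-block:: shell
+
+      wc -l < /tmp/instances_<timestamp>.txt   # reports received so far
+      echo $((NUM_STACKS * NUM_COMPUTES))      # expected number of instances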
+
+Test Case 2: Additional integrity check
+---------------------------------------
+
+As an option, one more test can be run between the density
+`Test Case 1: VM density check`_ and other experiments on the environment (or
+between multiple density tests if they are run against the same OpenStack
+environment). The idea of this test is to create a group of resources and
+verify that it stays persistent no matter what other operations are performed
+on the environment (resource creation/deletion, heavy workloads, etc.).
+
+Testing workflow
+^^^^^^^^^^^^^^^^
+
+Create 20 instances in two server groups, `server-group-floating` and
+`server-group-non-floating`, in proportion 10:10, with each server group
+having the anti-affinity policy. Instances from different server groups are
+located in different subnets plugged into a router. Instances from
+`server-group-floating` have floating IPs assigned, while instances from
+`server-group-non-floating` have only fixed IPs.
+
+.. image:: integrity_images/integrity1.png
+   :width: 550px
+
+For each of the instances the following connectivity checks are made:
+
+1. SSH into an instance.
+2. Ping an external resource (e.g. 8.8.8.8).
+3. Ping other VMs (by fixed or floating IPs).
+
+.. image:: integrity_images/integrity2.png
+   :alt: Traffic flow during connectivity check
+   :width: 650px
+
+Lists of IPs to ping from each VM are formed in a way that checks all
+possible combinations with minimum redundancy. Having VMs from different
+subnets, with and without floating IPs, ping each other and an external
+resource (8.8.8.8) verifies that all possible traffic routes are working,
+i.e.:
+
+From fixed IP to fixed IP in the same subnet:
+
+.. image:: integrity_images/route1.png
+   :alt: From fixed IP to fixed IP in the same subnet
+   :width: 550px
+
+From fixed IP to fixed IP in different subnets:
+
+.. image:: integrity_images/route2.png
+   :alt: From fixed IP to fixed IP in different subnets
+   :width: 550px
+
+From floating IP to fixed IP (same path as in 2):
+
+.. image:: integrity_images/route3.png
+   :alt: From floating IP to fixed IP (same path as in 2)
+   :width: 550px
+
+From floating IP to floating IP:
+
+.. image:: integrity_images/route4.png
+   :alt: From floating IP to floating IP
+   :width: 550px
+
+From fixed IP to floating IP:
+
+.. image:: integrity_images/route5.png
+   :alt: From fixed IP to floating IP
+   :width: 550px
+
+Test steps
+^^^^^^^^^^
+
+* Create the integrity stack using the following Heat template:
+
+  .. literalinclude:: scripts/integrity_vm.hot
+     :language: yaml
+
+  Use this command to create the Heat stack:
+
+  .. code:: bash
+
+     heat stack-create -f integrity_vm.hot -P "image=IMAGE_ID;flavor=m1.micro;instance_count_floating=10;instance_count_non_floating=10" integrity_stack
+
+* Assign floating IPs to instances:
+
+  .. code:: bash
+
+     assign_floatingips --sg-floating nova_server_group_floating
+
+* Run the connectivity check:
+
+  .. code:: bash
+
+     connectivity_check -s ~/ips.json
+
+.. note:: ``~/ips.json`` is the path to the file used to store instances'
+          IPs.
+
+.. note:: Make sure to run this check only on the controller that hosts the
+          qdhcp namespace of ``integrity_network``.
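+
+``assign_floatingips`` and ``connectivity_check`` are helper tools. A minimal
+hand-rolled equivalent of the per-VM check might look as follows (assuming
+key-based SSH access as the ``cirros`` user and a plain-text file with one
+floating IP per line; both the file and ``<peer_ip>`` are illustrative):
+
+.. code:: bash
+
+   while read fip; do
+       # SSH into the instance, ping an external resource and a peer VM
+       ssh cirros@"$fip" "ping -c 3 8.8.8.8 && ping -c 3 <peer_ip>" \
+           && echo "$fip OK" || echo "$fip FAIL"
+   done < floating_ips.txt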
+
+* At the very end of the testing, please clean up the integrity stack:
+
+  .. code:: bash
+
+     assign_floatingips --sg-floating nova_server_group_floating --cleanup
+     heat stack-delete integrity_stack
+     rm ~/ips.json
+
+Reports
+=======
+
+Test plan execution reports:
+ * :ref:`neutron_vm_density_test_report`
diff --git a/doc/source/test_plans/neutron_features/vm_density/scripts/instance_metadata.hot b/doc/source/test_plans/neutron_features/vm_density/scripts/instance_metadata.hot
new file mode 100644
index 0000000..cb2509a
--- /dev/null
+++ b/doc/source/test_plans/neutron_features/vm_density/scripts/instance_metadata.hot
@@ -0,0 +1,109 @@
+heat_template_version: 2013-05-23
+description: Template to create multiple instances.
+
+parameters:
+  image:
+    type: string
+    description: Image used for servers
+  flavor:
+    type: string
+    description: Flavor used by the servers
+    default: m1.micro
+    constraints:
+      - custom_constraint: nova.flavor
+  public_network:
+    type: string
+    label: Public network name or ID
+    description: Public network with floating IP addresses.
+    default: admin_floating_net
+  instance_count:
+    type: number
+    description: Number of instances to create
+    default: 1
+  server_endpoint:
+    type: string
+    description: Server endpoint address
+  cidr:
+    type: string
+    description: Private subnet CIDR
+
+resources:
+
+  private_network:
+    type: OS::Neutron::Net
+
+  private_subnet:
+    type: OS::Neutron::Subnet
+    properties:
+      network_id: { get_resource: private_network }
+      cidr: { get_param: cidr }
+      dns_nameservers:
+        - 8.8.8.8
+
+  router:
+    type: OS::Neutron::Router
+    properties:
+      external_gateway_info:
+        network: { get_param: public_network }
+
+  router-interface:
+    type: OS::Neutron::RouterInterface
+    properties:
+      router_id: { get_resource: router }
+      subnet: { get_resource: private_subnet }
+
+  server_security_group:
+    type: OS::Neutron::SecurityGroup
+    properties:
+      rules: [
+        {remote_ip_prefix: 0.0.0.0/0,
+         protocol: tcp,
+         port_range_min: 1,
+         port_range_max: 65535},
+        {remote_ip_prefix: 0.0.0.0/0,
+         protocol: udp,
+         port_range_min: 1,
+         port_range_max: 65535},
+        {remote_ip_prefix: 0.0.0.0/0,
+         protocol: icmp}]
+
+  policy_group:
+    type: OS::Nova::ServerGroup
+    properties:
+      name: nova-server-group
+      policies: [anti-affinity]
+
+  server_group:
+    type: OS::Heat::ResourceGroup
+    properties:
+      count: { get_param: instance_count }
+      resource_def:
+        type: OS::Nova::Server
+        properties:
+          image: { get_param: image }
+          flavor: { get_param: flavor }
+          networks:
+            - subnet: { get_resource: private_subnet }
+          scheduler_hints: { group: { get_resource: policy_group } }
+          security_groups: [{ get_resource: server_security_group }]
+          user_data_format: RAW
+          user_data:
+            str_replace:
+              template: |
+                #!/bin/sh -x
+                RETRY_COUNT=${RETRY_COUNT:-10}
+                RETRY_DELAY=${RETRY_DELAY:-3}
+                for i in `seq 1 $RETRY_COUNT`; do
+                    instance_ip=`curl http://169.254.169.254/latest/meta-data/local-ipv4`
+                    [ -n "$instance_ip" ] && break
+                    echo "Retry get_instance_ip $i"
+                    sleep $RETRY_DELAY
+                done
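+                # The loop above fetched the instance's fixed IP from the
+                # metadata service, retrying on failure; the loop below
+                # reports that IP to the monitoring server together with the
+                # number of attempts each step took ($i and $j)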
+                for j in `seq 1 $RETRY_COUNT`; do
+                    curl -vX POST http://$SERVER_ENDPOINT:4242/ -d "{\"instance_ip\": \"$instance_ip\", \"retry_get\": $i, \"retry_send\": $j}"
+                    [ $? = 0 ] && break
+                    echo "Retry send_instance_ip $j"
+                    sleep $RETRY_DELAY
+                done
+              params:
+                "$SERVER_ENDPOINT": { get_param: server_endpoint }
diff --git a/doc/source/test_plans/neutron_features/vm_density/scripts/integrity_vm.hot b/doc/source/test_plans/neutron_features/vm_density/scripts/integrity_vm.hot
new file mode 100644
index 0000000..157a2ef
--- /dev/null
+++ b/doc/source/test_plans/neutron_features/vm_density/scripts/integrity_vm.hot
@@ -0,0 +1,126 @@
+heat_template_version: 2013-05-23
+description: Template to create multiple instances.
+
+parameters:
+  image:
+    type: string
+    description: Image used for servers
+  flavor:
+    type: string
+    description: Flavor used by the servers
+    default: m1.micro
+    constraints:
+      - custom_constraint: nova.flavor
+  public_network:
+    type: string
+    label: Public network name or ID
+    description: Public network with floating IP addresses.
+    default: admin_floating_net
+  instance_count_floating:
+    type: number
+    description: Number of instances to create
+    default: 1
+  instance_count_non_floating:
+    type: number
+    description: Number of instances to create
+    default: 1
+
+
+resources:
+
+  private_network:
+    type: OS::Neutron::Net
+    properties:
+      name: integrity_network
+
+  private_subnet_floating:
+    type: OS::Neutron::Subnet
+    properties:
+      name: integrity_floating_subnet
+      network_id: { get_resource: private_network }
+      cidr: 10.10.10.0/24
+      dns_nameservers:
+        - 8.8.8.8
+
+  private_subnet_non_floating:
+    type: OS::Neutron::Subnet
+    properties:
+      name: integrity_non_floating_subnet
+      network_id: { get_resource: private_network }
+      cidr: 20.20.20.0/24
+      dns_nameservers:
+        - 8.8.8.8
+
+  router:
+    type: OS::Neutron::Router
+    properties:
+      name: integrity_router
+      external_gateway_info:
+        network: { get_param: public_network }
+
+  router_interface_floating:
+    type: OS::Neutron::RouterInterface
+    properties:
+      router_id: { get_resource: router }
+      subnet: { get_resource: private_subnet_floating }
+
+  router_interface_non_floating:
+    type: OS::Neutron::RouterInterface
+    properties:
+      router_id: { get_resource: router }
+      subnet: { get_resource: private_subnet_non_floating }
+
+  server_security_group:
+    type: OS::Neutron::SecurityGroup
+    properties:
+      rules: [
+        {remote_ip_prefix: 0.0.0.0/0,
+         protocol: tcp,
+         port_range_min: 1,
+         port_range_max: 65535},
+        {remote_ip_prefix: 0.0.0.0/0,
+         protocol: udp,
+         port_range_min: 1,
+         port_range_max: 65535},
+        {remote_ip_prefix: 0.0.0.0/0,
+         protocol: icmp}]
+
+  policy_group_floating:
+    type: OS::Nova::ServerGroup
+    properties:
+      name: nova_server_group_floating
+      policies: [anti-affinity]
+
+  policy_group_non_floating:
+    type: OS::Nova::ServerGroup
+    properties:
+      name: nova_server_group_non_floating
+      policies: [anti-affinity]
+
+  server_group_floating:
+    type: OS::Heat::ResourceGroup
+    properties:
+      count: { get_param: instance_count_floating }
+      resource_def:
+        type: OS::Nova::Server
+        properties:
+          image: { get_param: image }
+          flavor: { get_param: flavor }
+          networks:
+            - subnet: { get_resource: private_subnet_floating }
+          scheduler_hints: { group: { get_resource: policy_group_floating } }
+          security_groups: [{ get_resource: server_security_group }]
+
+  server_group_non_floating:
+    type: OS::Heat::ResourceGroup
+    properties:
+      count: { get_param: instance_count_non_floating }
+      resource_def:
+        type: OS::Nova::Server
+        properties:
+          image: { get_param: image }
+          flavor: { get_param: flavor }
+          networks:
+            - subnet: { get_resource: private_subnet_non_floating }
+          scheduler_hints: { group: { get_resource: policy_group_non_floating } }
+          security_groups: [{ get_resource: server_security_group }]
diff --git a/doc/source/test_plans/neutron_features/vm_density/scripts/server.py b/doc/source/test_plans/neutron_features/vm_density/scripts/server.py
new file mode 100644
index 0000000..28adc61
--- /dev/null
+++ b/doc/source/test_plans/neutron_features/vm_density/scripts/server.py
@@ -0,0 +1,81 @@
+#!/usr/bin/python
+# This script sets up a simple HTTP server that listens on a given port.
+# When the server is started it creates a log file with a name in the format
+# "instances_<timestamp>.txt". The save directory is also configurable
+# (defaults to /tmp).
+# When an instance's POST request comes in, the server logs it
+# to the log file.
+
+import argparse
+from BaseHTTPServer import BaseHTTPRequestHandler
+from BaseHTTPServer import HTTPServer
+from datetime import datetime
+import json
+import logging
+import os
+import sys
+
+LOG = logging.getLogger(__name__)
+FILE_NAME = "instances_{:%Y_%m_%d_%H:%M:%S}.txt".format(datetime.now())
+
+
+class PostHandler(BaseHTTPRequestHandler):
+    def do_POST(self):
+        try:
+            data = self._receive_data()
+        except Exception as err:
+            LOG.exception("Failed to process request: %s", err)
+            raise
+        else:
+            LOG.info("Incoming connection: ip=%(ip)s, %(data)s",
+                     {"ip": self.client_address[0], "data": data})
+
+    def _receive_data(self):
+        length = int(self.headers.getheader('content-length'))
+        data = json.loads(self.rfile.read(length))
+
+        # Begin the response
+        self.send_response(200)
+        self.end_headers()
+        self.wfile.write("Hello!\n")
+        return data
+
+
+def get_parser():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-l', '--log-dir', default="/tmp")
+    parser.add_argument('-p', '--port', required=True)
+    return parser
+
+
+def main():
+    # Parse CLI arguments
+    args = get_parser().parse_args()
+    file_name = os.path.join(args.log_dir, FILE_NAME)
+
+    # Set up logging
+    logging.basicConfig(format='%(asctime)s %(levelname)s:%(message)s',
+                        level=logging.INFO,
+                        filename=file_name)
+    console = logging.StreamHandler(stream=sys.stdout)
+    console.setLevel(logging.INFO)
+    formatter = logging.Formatter('%(asctime)s %(levelname)s:%(message)s')
+    console.setFormatter(formatter)
+    logging.getLogger('').addHandler(console)
+
+    # Initialize and start the server
+    server = HTTPServer(('0.0.0.0', int(args.port)), PostHandler)
+    LOG.info("Starting server on %s:%s, use <Ctrl-C> to stop",
+             server.server_address[0], args.port)
+    try:
+        server.serve_forever()
+    except KeyboardInterrupt:
+        LOG.info("Server terminated")
+    except Exception as err:
+        LOG.exception("Server terminated unexpectedly: %s", err)
+        raise
+    finally:
+        logging.shutdown()
+
+if __name__ == '__main__':
+    main()
diff --git a/doc/source/test_plans/neutron_features/vm_density/scripts/unset_quotas.sh b/doc/source/test_plans/neutron_features/vm_density/scripts/unset_quotas.sh
new file mode 100755
index 0000000..340cc07
--- /dev/null
+++ b/doc/source/test_plans/neutron_features/vm_density/scripts/unset_quotas.sh
@@ -0,0 +1,32 @@
+#!/usr/bin/env bash
+#==========================================================================
+# Unset quotas for the main Nova and Neutron resources for the tenant
+# with name $OS_TENANT_NAME.
+# Neutron quotas: floatingip, network, port, router, security-group,
+# security-group-rule, subnet.
+# Nova quotas: cores, instances, ram, server-groups, server-group-members.
+#
+# Usage: unset_quotas.sh
+#==========================================================================
+
+set -e
+
+NEUTRON_QUOTAS=(floatingip network port router security-group security-group-rule subnet)
+NOVA_QUOTAS=(cores instances ram server-groups server-group-members)
+
+OS_TENANT_ID=$(openstack project show $OS_TENANT_NAME -c id -f value)
+
+echo "Unsetting Neutron quotas: ${NEUTRON_QUOTAS[@]}"
+for net_quota in ${NEUTRON_QUOTAS[@]}
+do
+    neutron quota-update --"$net_quota" -1 $OS_TENANT_ID
+done
+
+echo "Unsetting Nova quotas: ${NOVA_QUOTAS[@]}"
+for nova_quota in ${NOVA_QUOTAS[@]}
+do
+    nova quota-update --"$nova_quota" -1 $OS_TENANT_ID
+done
+
+echo "Successfully unset all quotas"
+openstack quota show $OS_TENANT_ID
diff --git a/doc/source/test_results/neutron_features/index.rst b/doc/source/test_results/neutron_features/index.rst
index ba4d893..fbd00c8 100644
--- a/doc/source/test_results/neutron_features/index.rst
+++ b/doc/source/test_results/neutron_features/index.rst
@@ -13,3 +13,4 @@ Neutron features scale testing
    l3_ha/test_results_mitaka
    resource_density/index
    agent_restart/index
+   vm_density/results
diff --git a/doc/source/test_results/neutron_features/vm_density/reports/grafana.png b/doc/source/test_results/neutron_features/vm_density/reports/grafana.png
new file mode 100644
index 0000000..cd28d50
Binary files /dev/null and b/doc/source/test_results/neutron_features/vm_density/reports/grafana.png differ
diff --git a/doc/source/test_results/neutron_features/vm_density/reports/iteration1.png b/doc/source/test_results/neutron_features/vm_density/reports/iteration1.png
new file mode 100644
index 0000000..f68bd16
Binary files /dev/null and b/doc/source/test_results/neutron_features/vm_density/reports/iteration1.png differ
diff --git a/doc/source/test_results/neutron_features/vm_density/reports/iterationi.png b/doc/source/test_results/neutron_features/vm_density/reports/iterationi.png
new file mode 100644
index 0000000..9ed2965
Binary files /dev/null and b/doc/source/test_results/neutron_features/vm_density/reports/iterationi.png differ
diff --git a/doc/source/test_results/neutron_features/vm_density/results.rst b/doc/source/test_results/neutron_features/vm_density/results.rst
new file mode 100644
index 0000000..13f84ae
--- /dev/null
+++ b/doc/source/test_results/neutron_features/vm_density/results.rst
@@ -0,0 +1,221 @@
+.. _`neutron_vm_density_test_report`:
+
+========================================
+OpenStack Neutron Density Testing report
+========================================
+
+:Abstract:
+
+    This document presents OpenStack Networking (aka Neutron) density test
+    results against a 200-node OpenStack environment. All tests have been
+    performed according to :ref:`neutron_vm_density_test_plan`.
+
+Environment description
+=======================
+
+Lab A (200 nodes)
+-----------------
+
+3 controllers, 196 computes, 1 node for Grafana/Prometheus
+
+Hardware configuration of each server
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. table:: Description of controller servers
+
+   +-------+----------------+---------------------------------+
+   |server |vendor,model    |Supermicro MBD-X10DRI            |
+   +-------+----------------+---------------------------------+
+   |CPU    |vendor,model    |Intel Xeon E5-2650v3             |
+   |       +----------------+---------------------------------+
+   |       |processor_count |2                                |
+   |       +----------------+---------------------------------+
+   |       |core_count      |10                               |
+   |       +----------------+---------------------------------+
+   |       |frequency_MHz   |2300                             |
+   +-------+----------------+---------------------------------+
+   |RAM    |vendor,model    |8x Samsung M393A2G40DB0-CPB      |
+   |       +----------------+---------------------------------+
+   |       |amount_MB       |2097152                          |
+   +-------+----------------+---------------------------------+
+   |NETWORK|vendor,model    |Intel,I350 Dual Port             |
+   |       +----------------+---------------------------------+
+   |       |bandwidth       |1G                               |
+   |       +----------------+---------------------------------+
+   |       |vendor,model    |Intel,82599ES Dual Port          |
+   |       +----------------+---------------------------------+
+   |       |bandwidth       |10G                              |
+   +-------+----------------+---------------------------------+
+   |STORAGE|vendor,model    |Intel SSD DC S3500 Series        |
+   |       +----------------+---------------------------------+
+   |       |SSD/HDD         |SSD                              |
+   |       +----------------+---------------------------------+
+   |       |size            |240GB                            |
+   |       +----------------+---------------------------------+
+   |       |vendor,model    |2x WD WD5003AZEX                 |
+   |       +----------------+---------------------------------+
+   |       |SSD/HDD         |HDD                              |
+   |       +----------------+---------------------------------+
+   |       |size            |500GB                            |
+   +-------+----------------+---------------------------------+
+
+.. table:: Description of compute servers
+
+   +-------+----------------+---------------------------------+
+   |server |vendor,model    |SUPERMICRO 5037MR-H8TRF          |
+   +-------+----------------+---------------------------------+
+   |CPU    |vendor,model    |INTEL XEON Ivy Bridge 6C E5-2620 |
+   |       +----------------+---------------------------------+
+   |       |processor_count |1                                |
+   |       +----------------+---------------------------------+
+   |       |core_count      |6                                |
+   |       +----------------+---------------------------------+
+   |       |frequency_MHz   |2100                             |
+   +-------+----------------+---------------------------------+
+   |RAM    |vendor,model    |4x Samsung DDRIII 8GB DDR3-1866  |
+   |       +----------------+---------------------------------+
+   |       |amount_MB       |32768                            |
+   +-------+----------------+---------------------------------+
+   |NETWORK|vendor,model    |AOC-STGN-i2S - 2-port            |
+   |       +----------------+---------------------------------+
+   |       |bandwidth       |10G                              |
+   +-------+----------------+---------------------------------+
+   |STORAGE|vendor,model    |Intel SSD DC S3500 Series        |
+   |       +----------------+---------------------------------+
+   |       |SSD/HDD         |SSD                              |
+   |       +----------------+---------------------------------+
+   |       |size            |80GB                             |
+   |       +----------------+---------------------------------+
+   |       |vendor,model    |1x WD Scorpio Black BP WD7500BPKT|
+   |       +----------------+---------------------------------+
+   |       |SSD/HDD         |HDD                              |
+   |       +----------------+---------------------------------+
+   |       |size            |750GB                            |
+   +-------+----------------+---------------------------------+
+
+Test results
+============
+
+Test Case: VM density check
+---------------------------
+
+The idea was to boot as many VMs as possible (in batches of 200-1000 VMs) and
+make sure they are properly wired and have access to the external network.
+The test measures the maximum number of VMs that can be deployed without
+compromising cloud operability.
+
+External access was checked via the external server to which VMs connect
+upon spawning. The server logs incoming connections from provisioned VMs,
+which send their instance IPs to the server via POST requests. Instances also
+report the number of attempts it took to get an IP address from the metadata
+service and to connect to the HTTP server, respectively.
+
+A Heat template was used to create 1 network with a subnet, 1 DVR router, and
+a VM per compute node. Heat stacks were created in batches of 1 to 5 (5 most
+of the time), so 1 iteration effectively means 5 new networks/routers and
+196 * 5 VMs. During the execution of the test we constantly monitored the
+lab's status using a Grafana dashboard and checked the agents' status.
+
+**As a result we were able to successfully create 125 Heat stacks, which
+gives a total of 24500 VMs.**
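+
+As a sanity check on the headline number: each stack creates one VM per
+compute node, so
+
+.. code-block:: shell
+
+   echo $((125 * 196))   # stacks * computes per stack = 24500 VMs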
+
+Iteration 1:
+
+.. image:: reports/iteration1.png
+   :width: 650px
+
+Iteration i:
+
+.. image:: reports/iterationi.png
+   :width: 650px
+
+Example of the Grafana dashboard during the density test:
+
+.. image:: reports/grafana.png
+   :width: 650px
+
+Observed issues
+---------------
+
+Issues faced during testing:
+
+* `LP #1614452 Port create time grows at scale due to dvr arp update`_
+
+  * Patch: https://review.openstack.org/360732
+
+* `LP #1606844 L3 agent constantly resyncing deleted router`_
+
+  * Patch: https://review.openstack.org/353010
+
+* `LP #1609741 oslo.messaging does not redeclare exchange if it is missing`_
+
+  * Patch: https://review.openstack.org/351162
+
+* `LP #1549311 Unexpected SNAT behavior between instances with DVR+floating ip`_
+
+  * Patch: https://review.openstack.org/349549/
+  * Patch: https://review.openstack.org/349884/
+
+* `LP #1610303 l2pop mech fails to update_port_postcommit on a loaded cluster`_
+
+  * Patch: https://review.openstack.org/365051/
+
+* `LP #1606827 Agents might be reported as down for 10 minutes after all controllers restart`_
+
+  * Patch: https://review.openstack.org/349038
+
+* `LP #1528895 Timeouts in update_device_list (too slow with large # of VIFs)`_
+
+  * Patch: https://review.openstack.org/277279/
+
+* `LP #1606825 nova-compute hangs while executing a blocking call to librbd`_
+
+  * Patch: https://review.openstack.org/348492
+
+During testing we also needed to tune the node configuration to keep up with
+the growing number of VMs per node (see the sketch after this list):
+
+* Increase the ARP table size on compute nodes and controllers
+* Raise cpu_allocation_ratio from 8.0 to 12.0 in nova.conf to prevent hitting
+  Nova's vCPU limit on computes
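+
+A sketch of the corresponding knobs (the values shown are illustrative, not
+the exact ones used on the lab; the right values depend on the scale):
+
+.. code-block:: shell
+
+   # Kernel neighbor (ARP) table limits on compute nodes and controllers
+   sysctl -w net.ipv4.neigh.default.gc_thresh1=4096
+   sysctl -w net.ipv4.neigh.default.gc_thresh2=8192
+   sysctl -w net.ipv4.neigh.default.gc_thresh3=16384
+
+   # In nova.conf on compute nodes:
+   # [DEFAULT]
+   # cpu_allocation_ratio = 12.0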
+
+At ~16000 VMs we reached the ARP table size limit on compute nodes, so Heat
+stack creation started to fail. Having increased the maximum table size, we
+decided to clean up the failed stacks; in attempting to do so we ran into the
+following Nova issue
+(`LP #1606825 nova-compute hangs while executing a blocking call to librbd`_):
+on VM deletion `nova-compute` may hang for a while executing a call to
+`librbd` and eventually go down in the Nova service-list output. This issue
+was fixed with the help of the Mirantis Nova team and the fix was applied on
+the lab as a patch.
+
+After launching ~20000 VMs the cluster started experiencing problems with
+RabbitMQ and Ceph. When the number of VMs reached 24500, control plane
+services and agents started to massively go down: the initial failure might
+have been caused by the lack of allowed PIDs per OSD node
+(https://bugs.launchpad.net/fuel/+bug/1536271). The Ceph failure affected all
+services, e.g. MySQL errors in the Neutron server led to agents going down
+and massive resource rescheduling/resyncing. After the Ceph failure the
+control plane cluster could not be recovered, and because of that the density
+test had to be stopped before the capacity of the compute nodes was
+exhausted.
+
+The Ceph team commented that 3 Ceph monitors are not enough for over 20000
+VMs (each having 2 drives) and recommended having at least 1 monitor per
+~1000 client connections, or moving the monitors to dedicated nodes.
+
+.. note:: The connectivity check of the integrity test passed 100% even when
+          the control plane cluster was failing. That is a good illustration
+          of control plane failures not affecting the data plane.
+
+.. note:: Final result - 24500 VMs on a cluster.
+
+.. references:
+
+.. _LP #1614452 Port create time grows at scale due to dvr arp update: https://bugs.launchpad.net/neutron/+bug/1614452
+.. _LP #1606844 L3 agent constantly resyncing deleted router: https://bugs.launchpad.net/neutron/+bug/1606844
+.. _LP #1609741 oslo.messaging does not redeclare exchange if it is missing: https://bugs.launchpad.net/neutron/+bug/1609741
+.. _LP #1549311 Unexpected SNAT behavior between instances with DVR+floating ip: https://bugs.launchpad.net/neutron/+bug/1549311
+.. _LP #1610303 l2pop mech fails to update_port_postcommit on a loaded cluster: https://bugs.launchpad.net/neutron/+bug/1610303
+.. _LP #1606827 Agents might be reported as down for 10 minutes after all controllers restart: https://bugs.launchpad.net/neutron/+bug/1606827
+.. _LP #1528895 Timeouts in update_device_list (too slow with large # of VIFs): https://bugs.launchpad.net/neutron/+bug/1528895
+.. _LP #1606825 nova-compute hangs while executing a blocking call to librbd: https://bugs.launchpad.net/neutron/+bug/1606825