diff --git a/doc/source/test_plans/index.rst b/doc/source/test_plans/index.rst index f242f47..6427ecf 100644 --- a/doc/source/test_plans/index.rst +++ b/doc/source/test_plans/index.rst @@ -17,4 +17,5 @@ Test Plans container_repositories/plan keystone/plan container_cluster_systems/plan + neutron_features/l3_ha/test_plan diff --git a/doc/source/test_plans/neutron_features/l3_ha/L3HA.png b/doc/source/test_plans/neutron_features/l3_ha/L3HA.png new file mode 100644 index 0000000..af4983d Binary files /dev/null and b/doc/source/test_plans/neutron_features/l3_ha/L3HA.png differ diff --git a/doc/source/test_plans/neutron_features/l3_ha/iperf_addresses.png b/doc/source/test_plans/neutron_features/l3_ha/iperf_addresses.png new file mode 100644 index 0000000..1a664ae Binary files /dev/null and b/doc/source/test_plans/neutron_features/l3_ha/iperf_addresses.png differ diff --git a/doc/source/test_plans/neutron_features/l3_ha/ping.png b/doc/source/test_plans/neutron_features/l3_ha/ping.png new file mode 100644 index 0000000..2cc0130 Binary files /dev/null and b/doc/source/test_plans/neutron_features/l3_ha/ping.png differ diff --git a/doc/source/test_plans/neutron_features/l3_ha/ping_external.png b/doc/source/test_plans/neutron_features/l3_ha/ping_external.png new file mode 100644 index 0000000..41dafe4 Binary files /dev/null and b/doc/source/test_plans/neutron_features/l3_ha/ping_external.png differ diff --git a/doc/source/test_plans/neutron_features/l3_ha/test_plan.rst b/doc/source/test_plans/neutron_features/l3_ha/test_plan.rst new file mode 100644 index 0000000..f0fd522 --- /dev/null +++ b/doc/source/test_plans/neutron_features/l3_ha/test_plan.rst @@ -0,0 +1,321 @@ +.. _neutron_l3_ha_test_plan: + +================================= +OpenStack Neutron L3 HA Test Plan +================================= + +:status: **draft** +:version: 1.0 + +:Abstract: + + We are able to spawn many L3 agents, however each L3 agent is a SPOF. 
+ If an L3 agent fails, all virtual routers scheduled to this agent will be lost, + and consequently all VMs connected to these virtual routers will be isolated + from external networks and possibly from other tenant networks. + + The main purpose of L3 HA is to address this issue by adding a new type of + router (HA router), which will be spawned twice on two different agents. + One agent will be in charge of the master version of this router, and another + L3 agent will be in charge of the slave router. + + L3 HA functionality in Neutron was implemented in Juno; however, detailed + testing of it at scale was not performed. The purpose of this document is to + describe the scenarios for its testing. + + .. image:: L3HA.png + :width: 650px + +:Conventions: + + - **VRRP** - Virtual Router Redundancy Protocol + - **Keepalived** - Routing software based on the VRRP protocol + - **Rally** - Benchmarking tool for OpenStack + - **Shaker** - Data plane performance testing tool + - **iperf** - Commonly used network testing tool + + +Test Plan +========= + +The purpose of this section is to describe scenarios for testing L3 HA. +The most important aspect is the number of packets that will be lost during +restart of the L3 agent or controller as a whole. The second aspect is the +number of routers that can move from one agent to another without +falling into an unmanaged state. + +Test Environment +---------------- + +Preparation +^^^^^^^^^^^ + +This test plan is performed against an existing OpenStack cloud. + +Environment description +^^^^^^^^^^^^^^^^^^^^^^^ + +The environment description includes hardware specification of servers, +network parameters, operating system and OpenStack deployment characteristics. + +Hardware +~~~~~~~~ + +This section contains a list of all types of hardware nodes.
+ ++-----------+-------+----------------------------------------------------+ +| Parameter | Value | Comments | ++-----------+-------+----------------------------------------------------+ +| model | | e.g. Supermicro X9SRD-F | ++-----------+-------+----------------------------------------------------+ +| CPU | | e.g. 6 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz | ++-----------+-------+----------------------------------------------------+ +| role | | e.g. compute or network | ++-----------+-------+----------------------------------------------------+ + +Network +~~~~~~~ + +This section contains list of interfaces and network parameters. +For complicated cases this section may include topology diagram and switch +parameters. + ++------------------+-------+-------------------------+ +| Parameter | Value | Comments | ++------------------+-------+-------------------------+ +| network role | | e.g. provider or public | ++------------------+-------+-------------------------+ +| card model | | e.g. Intel | ++------------------+-------+-------------------------+ +| driver | | e.g. ixgbe | ++------------------+-------+-------------------------+ +| speed | | e.g. 10G or 1G | ++------------------+-------+-------------------------+ +| MTU | | e.g. 9000 | ++------------------+-------+-------------------------+ +| offloading modes | | e.g. default | ++------------------+-------+-------------------------+ + +Software +~~~~~~~~ + +This section describes installed software. + ++-----------------+-------+---------------------------+ +| Parameter | Value | Comments | ++-----------------+-------+---------------------------+ +| OS | | e.g. Ubuntu 14.04.3 | ++-----------------+-------+---------------------------+ +| OpenStack | | e.g. Liberty | ++-----------------+-------+---------------------------+ +| Hypervisor | | e.g. KVM | ++-----------------+-------+---------------------------+ +| Neutron plugin | | e.g. 
ML2 + OVS | ++-----------------+-------+---------------------------+ +| L2 segmentation | | e.g. VLAN or VXLAN or GRE | ++-----------------+-------+---------------------------+ +| virtual routers | | HA | ++-----------------+-------+---------------------------+ + +Test Case 1: Comparative analysis of metrics with and without L3 agents restart +------------------------------------------------------------------------------- + +Description +^^^^^^^^^^^ + +`Shaker `__ is +able to deploy OpenStack instances and networks in different topologies. +For L3 HA, the most important scenarios are those that check connectivity +between VMs in different networks (`L3 east-west +`__) +and connectivity via floating IP (`L3 north-south +`__). + +The following tests should be executed: + +1. OpenStack L3 East-West + + - This scenario launches pairs of VMs in different networks + connected to one router (L3 east-west). + +2. OpenStack L3 East-West Performance + + - This scenario launches 1 pair of VMs in different networks + connected to one router (L3 east-west). VMs are hosted on + different compute nodes. + +3. OpenStack L3 North-South + + - This scenario launches pairs of VMs on different compute nodes. + VMs are in different networks connected via different + routers; the master accesses the slave by floating IP. + +4. OpenStack L3 North-South UDP + +5. OpenStack L3 North-South Performance + +6. OpenStack L3 North-South Dense + + - This scenario launches pairs of VMs on one compute node. VMs are + in different networks connected via different routers; + the master accesses the slave by floating IP. + +For scenarios 1, 2, 3 and 6, results were also collected for L3 agent restart +with the L3 HA option disabled and standard router rescheduling enabled. + +While the Shaker tests were running, the scripts restart.sh and restart_not_ha.sh were executed. + + +List of performance metrics +^^^^^^^^^^^^^^^^^^^^^^^^^^^ +..
table:: Shaker metrics +======== =============== ================= ====================================== +Priority Value Measurement Units Description +======== =============== ================= ====================================== +1 Latency ms The network latency +1 TCP bandwidth Mbits/s TCP network bandwidth +2 UDP bandwidth packets per sec Number of 32-byte UDP packets +2 TCP retransmits packets per sec Number of retransmitted TCP packets +======== =============== ================= ====================================== + +Test Case 2: Rally tests execution +---------------------------------- + +Description +^^^^^^^^^^^ +Rally makes it possible to check the ability of OpenStack to perform simple +operations like create-delete, create-update, etc. at scale. + +L3 HA has a restriction of 255 routers per HA network per tenant. At the moment +we do not have the ability to create a new HA network per tenant if the number of +VIPs exceeds this limit. Based on this, for some tests, the number of tenants +was increased (NeutronNetworks.create_and_list_routers). +The most important results are provided by the test_create_delete_routers test, +as it makes it possible to catch race conditions during creation/deletion of +HA routers, HA networks and HA interfaces. There are already several known bugs +related to this which have been fixed upstream. +To find more possible issues, test_create_delete_routers has been run multiple +times with different concurrency. + +List of performance metrics +^^^^^^^^^^^^^^^^^^^^^^^^^^^ +..
table:: Rally metrics + +======== ====================== ======================================================== +Priority Measurement Units Description +======== ====================== ======================================================== +1 Number of failed tests Number of tests that failed during Rally tests execution +2 Concurrency Number of tests executed in parallel +======== ====================== ======================================================== + + +Test Case 3: Manual destruction test: Ping to external network from VM during reset of primary(non-primary) controller +---------------------------------------------------------------------------------------------------------------------- + +Description +^^^^^^^^^^^ +.. image:: ping_external.png + :width: 650px + +Scenario steps: + +1. Create a router + ``neutron router-create routerHA --ha True`` +2. Set the gateway for the external network and add an interface + ``neutron router-gateway-set routerHA `` + ``neutron router-interface-add routerHA `` +3. Boot an instance in the private network + ``nova boot --image --flavor --nic net_id= vm1`` +4. Log in to the VM using ssh or the VNC console +5. Start ping 8.8.8.8 and check that packets are not lost +6. Check which agent is active with + ``neutron l3-agent-list-hosting-router `` +7. Restart the node on which the L3 agent is active + ``sudo shutdown -r now`` or ``sudo reboot`` +8. Wait until another agent becomes active and the restarted node recovers + ``neutron l3-agent-list-hosting-router `` +9. Stop ping and check the number of packets that were lost. +10.
Increase the number of routers and repeat steps 5-9 + +List of performance metrics +^^^^^^^^^^^^^^^^^^^^^^^^^^^ +======== ======================= ========================================================= +Priority Measurement Units Description +======== ======================= ========================================================= +1 Number of lost packets Number of packets lost during restart of the node +2 Number of routers Number of existing routers in the environment +======== ======================= ========================================================= + + +Test Case 4: Manual destruction test: Ping from one VM to another VM in different network during ban L3 agent +------------------------------------------------------------------------------------------------------------- + +Description +^^^^^^^^^^^ +.. image:: ping.png + :width: 650px + +Scenario steps: + +1. Create a router + ``neutron router-create routerHA --ha True`` +2. Add an interface for each of the two internal networks + ``neutron router-interface-add routerHA `` + ``neutron router-interface-add routerHA `` +3. Boot an instance in each of the private networks net1 and net2 + ``nova boot --image --flavor --nic net_id= vm1`` +4. Log in to VM1 using ssh or the VNC console +5. Start ping vm2_ip and check that packets are not lost +6. Check which agent is active with + ``neutron l3-agent-list-hosting-router `` +7. Ban the active L3 agent by running: + ``pcs resource ban p_neutron-l3-agent node-`` +8. Wait until another agent becomes active in the output of neutron l3-agent-list-hosting-router +9. Clear the banned agent + ``pcs resource clear p_neutron-l3-agent node-`` +10. Stop ping and check the number of packets that were lost. +11.
Increase the number of routers and repeat steps 5-10 + + +List of performance metrics +^^^^^^^^^^^^^^^^^^^^^^^^^^^ +======== ======================= =========================================================== +Priority Measurement Units Description +======== ======================= =========================================================== +1 Number of lost packets Number of packets lost while the L3 agent was banned +2 Number of routers Number of existing routers in the environment +======== ======================= =========================================================== + +Test Case 5: Manual destruction test: Iperf UDP testing between VMs in different networks ban L3 agent +------------------------------------------------------------------------------------------------------ + +Description +^^^^^^^^^^^ +.. image:: iperf_addresses.png + :width: 650px + +Scenario steps: + +1. Create VMs. +2. Log in to VM1 using ssh or the VNC console and run + ``iperf -s -u`` +3. Log in to VM2 using ssh or the VNC console and run + ``iperf -c vm1_ip -p 5001 -t 60 -i 10 --bandwidth 30M --len 64 -u`` +4. Check that loss is less than 1% +5. Check which agent is active with + ``neutron l3-agent-list-hosting-router `` +6. Run the command from step 3 again +7. Ban the active L3 agent by running: + ``pcs resource ban p_neutron-l3-agent node-`` +8. Check the results of the iperf command and clear the banned L3 agent. + ``pcs resource clear p_neutron-l3-agent node-`` +9.
Increase the number of routers and repeat steps 3-8 + +List of performance metrics +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +======== =============== ================= ==================================== +Priority Value Measurement Units Description +======== =============== ================= ==================================== +1 UDP packet loss % Loss of 64-byte UDP packets +======== =============== ================= ==================================== \ No newline at end of file diff --git a/doc/source/test_results/index.rst b/doc/source/test_results/index.rst index 8e2bc72..177af59 100644 --- a/doc/source/test_results/index.rst +++ b/doc/source/test_results/index.rst @@ -15,4 +15,5 @@ Test Results db/index keystone/index container_platforms/index + neutron_features/index diff --git a/doc/source/test_results/neutron_features/index.rst b/doc/source/test_results/neutron_features/index.rst new file mode 100644 index 0000000..696f00d --- /dev/null +++ b/doc/source/test_results/neutron_features/index.rst @@ -0,0 +1,12 @@ +.. raw:: pdf + + PageBreak oneColumn + +============================== +Neutron features scale testing +============================== + +..
toctree:: + :maxdepth: 3 + + l3_ha/test_results diff --git a/doc/source/test_results/neutron_features/l3_ha/iperf_addresses.png b/doc/source/test_results/neutron_features/l3_ha/iperf_addresses.png new file mode 100644 index 0000000..1a664ae Binary files /dev/null and b/doc/source/test_results/neutron_features/l3_ha/iperf_addresses.png differ diff --git a/doc/source/test_results/neutron_features/l3_ha/ping.png b/doc/source/test_results/neutron_features/l3_ha/ping.png new file mode 100644 index 0000000..2cc0130 Binary files /dev/null and b/doc/source/test_results/neutron_features/l3_ha/ping.png differ diff --git a/doc/source/test_results/neutron_features/l3_ha/ping_external.png b/doc/source/test_results/neutron_features/l3_ha/ping_external.png new file mode 100644 index 0000000..41dafe4 Binary files /dev/null and b/doc/source/test_results/neutron_features/l3_ha/ping_external.png differ diff --git a/doc/source/test_results/neutron_features/l3_ha/test_results.rst b/doc/source/test_results/neutron_features/l3_ha/test_results.rst new file mode 100644 index 0000000..0d355c7 --- /dev/null +++ b/doc/source/test_results/neutron_features/l3_ha/test_results.rst @@ -0,0 +1,531 @@ +Neutron L3 HA test results +-------------------------- + +Environment description +======================= + +Cluster description +~~~~~~~~~~~~~~~~~~~ +* 3 controllers +* 46 compute nodes + +Software versions +~~~~~~~~~~~~~~~~~ +MOS 8.0 + +Hardware configuration of each server +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Description of servers hardware + +**Compute Vendor**: + 1x SUPERMICRO SUPERSERVER 5037MR-H8TRF MICRO-CLOUD ``_ +**CPU** + 1x INTEL XEON Ivy Bridge 6C E5-2620 V2 2.1G 15M 7.2GT/s QPI 80w SOCKET 2011R 1600 ``_ +**RAM**: + 4x Samsung DDRIII 8GB DDR3-1866 1Rx4 ECC REG RoHS M393B1G70QH0-CMA +**NIC** + 1x AOC-STGN-i2S - 2-port 10 Gigabit Ethernet SFP+ + + + +Rally test results +================== + +L3 HA has a restriction of 255 routers per HA network per tenant. 
At this moment +we do not have the ability to create new HA network per tenant if the number of +VIPs exceed this limit. Based on this, for some tests, the number of tenants +was increased (NeutronNetworks.create_and_list_router). + +The most important results are provided by test_create_delete_routers test, +as it allows to catch possible race conditions during creation/deletion of HA +routers, HA networks and HA interfaces. There are already several known bugs +related to this which have been fixed in upstream. To find out more possible +issues test_create_delete_routers has been run multiple times with different +concurrency. + +.. list-table:: Results of test_create_delete_routers + :header-rows: 1 + + * + - Times + - Concurrency + - Number of errors + - Link for rally report + * + - 92 + - 20 + - 0 + - `rally report `_ + * + - 92 + - 40 + - 0 + - `rally report `_ + * + - 150 + - 50 + - 1 + - `rally report `_ + * + - 150 + - 50 + - 0 + - `rally report `_ + * + - 200 + - 60 + - 1 + - `rally report `_ + * + - 200 + - 60 + - 1 + - `rally report `_ + * + - 200 + - 70 + - 2 + - `rally report `_ + * + - 200 + - 70 + - 0 + - `rally report `_ + * + - 200 + - 75 + - 1 + - `rally report `_ + * + - 200 + - 75 + - 1 + - `rally report `_ + * + - 300 + - 100 + - 1 + - `rally report `_ + * + - 300 + - 100 + - 0 + - `rally report `_ + * + - 400 + - 100 + - 1 + - `rally report `_ + * + - 400 + - 100 + - 0 + - `rally report `_ + + +Multiple scenarios: + + ++------------------------------+-------------------+-------+-------------+------------------+--------------------------------------------------------------------------------------------------------------+ +| Test | Number of tenants | Times | Concurrency | Number of errors | Link for rally report | ++------------------------------+-------------------+-------+-------------+------------------+--------------------------------------------------------------------------------------------------------------+ 
+|``create_and_delete_routers`` | 1 |92 |10 | 0 |`rally report `_ | ++------------------------------+-------------------+-------+-------------+------------------+ | +|``create_and_list_routers`` | 2 |368 |10 | 272 | | ++------------------------------+-------------------+-------+-------------+------------------+ | +|``create_and_update_routers`` |1 |92 |10 |0 | | ++------------------------------+-------------------+-------+-------------+------------------+--------------------------------------------------------------------------------------------------------------+ +|``create_and_delete_routers`` |1 |92 |10 |0 | | ++------------------------------+-------------------+-------+-------------+------------------+ | +|``create_and_list_routers`` |2 |100 |10 |6 |`rally report `_ | ++------------------------------+-------------------+-------+-------------+------------------+ | +|``create_and_update_routers`` |1 |92 |10 |0 | | ++------------------------------+-------------------+-------+-------------+------------------+--------------------------------------------------------------------------------------------------------------+ +|``create_and_delete_routers`` |1 |92 |10 |0 | | ++------------------------------+-------------------+-------+-------------+------------------+ | +|``create_and_list_routers`` |10 |368 |10 |0 |`rally report `_ | ++------------------------------+-------------------+-------+-------------+------------------+ | +|``create_and_update_routers`` |1 | 92 |10 |0 | | ++------------------------------+-------------------+-------+-------------+------------------+--------------------------------------------------------------------------------------------------------------+ +|``create_and_delete_routers`` |1 |300 |50 |1 | | ++------------------------------+-------------------+-------+-------------+------------------+ | +|``create_and_list_routers`` |10 |368 |50 |0 |`rally report `_ | 
++------------------------------+-------------------+-------+-------------+------------------+ | +|``create_and_update_routers`` |1 |300 |50 |0 | | ++------------------------------+-------------------+-------+-------------+------------------+--------------------------------------------------------------------------------------------------------------+ +|``create_and_delete_routers`` |1 |300 |50 |1 | | ++------------------------------+-------------------+-------+-------------+------------------+ | +|``create_and_list_routers`` |10 |368 |50 |0 |`rally report `_ | ++------------------------------+-------------------+-------+-------------+------------------+ | +|``create_and_update_routers`` |1 |300 |50 |0 | | ++------------------------------+-------------------+-------+-------------+------------------+--------------------------------------------------------------------------------------------------------------+ + + +The errors discovered have been classified as the following bugs: + +.. list-table:: Bugs + :header-rows: 1 + + * + - Short description + - Trace + - Upstream bug + - Status + * + - IpAddressGenerationFailure No more IP addresses available on network + - `trace `_ + - `bug/1562887 `_ + - Open (Affects Neutron without L3 HA enabled, probably Rally bug) + * + - Device "tap-" does not exist. + - `trace `_ + - `bug/1562887 `_ + - Open + * + - Session rollback + - `trace `_ + - `bug/1550886 `_ + - In progress + * + - SubnetInUse: Unable to complete operation on subnet + - `trace `_ + - `bug/1562878 `_ + - Open + * + - MessagingTimeout: Timed out waiting for a reply to message + - `trace `_ + - `bug/1555670 `_ + - Open + * + - DBDeadlock: ipallocationpools + - `trace `_ + - `bug/1562876 `_ + - Open + * + - Not all HA networks deleted + - `not a trace `_ + - `bug/1562892 `_ + - Open + +Summary: +~~~~~~~~ + +1. 
The number of failed tests is less than 1% (exception ``test_create_list_routers``, + but with increased number of tenants the problem was fixed; automatic creation of new HA + network after the previous one ran out of virtual ips is more + like a feature request). + +2. All bugs found are Medium or Low priority. + +Shaker test results +=================== + ++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ +| L3 HA | L3 HA during L3 agents restart | Router rescheduling (Non L3 HA) during L3 agent restart | ++========================================+==================================+=================================================================================================================+========+==========+=========================================================================================================================+========+===========+=============================================================================================================================+ +| Lost | Errors | Link for report | Lost | Errors | Link for report | Lost | Errors | Link for report | 
++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| OpenStack L3 East-West | ++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| 0 | 0 | `report `__ | 0 | 0 | `report `__ | 50 | 5 | `report `__ | ++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| OpenStack L3 East-West Performance | 
++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| 1 | 0 | `report `__ | 0 | 0 | `report `__ | 0 | 1 (all) | `report `__ | ++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| OpenStack L3 North-South | ++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| 0 | 0 | `report `__ | 8 | 0 | `report `__ | 95 | 3 | `report `__ | 
++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| OpenStack L3 North-South UDP | ++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| 10 | 1 | `report `__ | 14 | 0 | `report `__ | | | | ++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| OpenStack L3 North-South Performance | +| | +| (concurrency 2) | 
++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| 0 | 0 | `report `__ | 0 | 0 | `report `__ | | | | ++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| OpenStack L3 North-South Performance | +| | +| (concurrency 5) | ++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| 0 | 0 | `report `__ | 1 | 0 | `report `__ | | | | 
++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| OpenStack L3 North-South Dense | ++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ +| 0 | 0 | `report `__ | 41 | 0 | `report `__ | 81 | 1 | `report `__ | ++----------------------------------------+----------------------------------+-----------------------------------------------------------------------------------------------------------------+--------+----------+-------------------------------------------------------------------------------------------------------------------------+--------+-----------+-----------------------------------------------------------------------------------------------------------------------------+ + +Shaker provides statistics about maximum, minimum and mean values of +different connection measurements. For each test was found the maximum +among all maximum values, minimum among all minimum values and counts +the mean value from all mean values. In the table below, these values +are presented. 
+ ++-----------------+---------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+-----------------------------------------------------------+ +| type | L3 HA | L3 HA during l3 agents restart | Router rescheduling (Non L3 HA) during l3 agent restart | ++=================+========================================+==================================+===========================================================+================+=================+================+====================+===========+==========================+ +| | min | mean | max | min | mean | max | min | mean | max | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+--------------------+-----------+--------------------------+ +| OpenStack L3 East-West | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+--------------------+-----------+--------------------------+ +| ping\_icmp, | 0.05 | 2.45 | 12.39 | **0.07** | **7.39** | **18.03** | 0.41 | 32.84 | 2583.93 | +| | | | | | | | | | | +| ms | | | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+--------------------+-----------+--------------------------+ +| tcp\_download | 0.02 | 874.04 | 5820.88 | **0.11** | **957.66** | **5883.96** | 77.41 | 896.96 | 3703.83 | +| | | | | | | | | | | +| Mbits/s | | | | | | | | | | 
++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+--------------------+-----------+--------------------------+ +| tcp\_upload | 0.02 | 884.25 | 5649.94 | **0.13** | **897.11** | **5963.02** | 64.11 | 1268.74 | 5111.02 | +| | | | | | | | | | | +| Mbits/s | | | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+--------------------+-----------+--------------------------+ +| OpenStack L3 East-West Performance | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| ping\_icmp | 0.64 | 0.81 | 1.45 | **0.57** | **0.82** | **1.79** | **No statistic** | +| ms | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+ | +| Bandwidth | 839.84 | 1876.83 | 3880.01 | **630.0** | **1497.19** | **3020.0** | | +| Mbit/s | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+ | +| Packets | 101680.0 | 129664.2 | 136880.0 | **89660.0** | **129515.33** | **367930.0** | | +| pps | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+ | +| retransmits | 
0.0 | 0.67 | 25.0 | **0.0** | **2.5** | **72.0** | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| OpenStack L3 North-South | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+--------------------+-----------+--------------------------+ +| ping\_icmp, | 0.08 | 9.83 | 27.61 | **0.06** | **7.11** | **25.73** | 0.33 | 0.62 | 2.45 | +| | | | | | | | | | | +| ms | | | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+--------------------+-----------+--------------------------+ +| tcp\_download | 65.28 | 902.35 | 4454.43 | **72.7** | **769.61** | **4494.97** | 741.95 | 1647.07 | 2776.53 | +| | | | | | | | | | | +| Mbits/s | | | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+--------------------+-----------+--------------------------+ +| tcp\_upload | 0.13 | 815.02 | 4345.86 | **0.13** | **867.68** | **4289.98** | **No statistic** | +| | | | | | | | | +| Mbits/s | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| OpenStack L3 North-South UDP | 
++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| Packets | 31218.0 | 123452.06 | 476254.0 | **39196.0** | **122214.76** | **431108.0** | | +| pps | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| OpenStack L3 North-South Performance | +| | +| (concurrency 2) | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| ping\_icmp | 0.9 | 1.22 | 2.36 | **0.67** | **0.93** | **2.34** | | +| ms | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| Bandwidth | 439.91 | 449.94 | 525.5 | **0.0** | **2000.8** | **3400.5** | | +| Mbit/s | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| Packets | 126360.0 | 129349.33 | 135150.0 | **131700.0** | **135319.33** | **140550.0** | | +| pps | | | | | | | | 
++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| retransmits | 0.0 | 1.0 | 83.0 | **0.0** | **3.0** | **205.0** | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| OpenStack L3 North-South Performance | +| | +| (concurrency 5) | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| ping\_icmp | 0.74 | 0.97 | 1.72 | **0.2** | **1.02** | **3.01** | | +| ms | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| Bandwidth | 41.99 | 181.01 | 386.43 | **0.0** | **1720.71** | **3519.77** | | +| Mbit/s | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| Packets | 122140.0 | 131601.17 | 138220.0 | **103510.0** | **129021.6** | **138860.0** | | +| pps | | | | | | | | 
++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| retransmits | 0.0 | 1.0 | 49.0 | **0.0** | **3.17** | **231.0** | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+-----------------------------------------------------------+ +| OpenStack L3 North-South Dense | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+--------------------+-----------+--------------------------+ +| ping\_icmp, | 0.56 | 18.18 | 96.42 | **0.38** | **4.07** | **56.35** | 0.45 | 9.79 | 106.52 | +| | | | | | | | | | | +| ms | | | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+--------------------+-----------+--------------------------+ +| tcp\_download | 1.72 | 210.2 | 862.02 | **322.24** | **1634.48** | **4656.44** | 11.61 | 407.69 | 2235.84 | +| | | | | | | | | | | +| Mbits/s | | | | | | | | | | ++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+--------------------+-----------+--------------------------+ +| tcp\_upload | 18.88 | 209.49 | 781.86 | **49.96** | **1590.83** | **4667.82** | 18.77 | 1955.41 | 4333.32 | +| | | | | | | | | | | +| Mbits/s | | | | | | | | | | 
++-----------------+----------------------------------------+----------------------------------+-----------------------------------------------------------+----------------+-----------------+----------------+--------------------+-----------+--------------------------+
+
+These results show no significant difference between normal test
+execution and execution during multiple L3 agent restarts.
+
+The next table presents the difference between the values without and
+with restart (the value without restart minus the value with restart),
+averaged over all tests:
+
++--------+---------------+-----------------+---------------+-------------+-----------+---------------+
+|        | ping\_icmp,   | tcp\_download   | tcp\_upload   | Bandwidth   | Packets   | retransmits   |
+|        |               |                 |               | Mbit/s      | pps       |               |
+|        | ms            | Mbits/s         | Mbits/s       |             |           |               |
++========+===============+=================+===============+=============+===========+===============+
+| min    | 0.17          | -103.34         | -10.39        | 230.58      | 4333      | 0             |
++--------+---------------+-----------------+---------------+-------------+-----------+---------------+
+| mean   | 2.02          | -458.39         | -482.39       | -903.64     | -501.07   | -2            |
++--------+---------------+-----------------+---------------+-------------+-----------+---------------+
+| max    | 5.78          | -1299.35        | -1381.05      | -1717.11    | -47986    | -117          |
++--------+---------------+-----------------+---------------+-------------+-----------+---------------+
+
+Summary:
+~~~~~~~~
+
+1. Comparison of L3 HA with standard router rescheduling shows that
+   L3 HA lets the tests run uninterrupted, without a large loss of
+   statistics, during L3 agent restarts.
+
+2. Comparison of L3 HA results with and without restart shows that
+   bandwidth and speed do not decrease during agent restart.
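The averaged differences in the table above appear to be computed as the value without restart minus the value with restart, averaged over the test scenarios. The sketch below checks this reading for ``ping_icmp`` using the values from the results table; the dictionary and function names are illustrative only.

```python
from statistics import mean

# ping_icmp (min, mean, max) in ms per scenario, copied from the results table.
PING_WITHOUT_RESTART = {
    "East-West": (0.05, 2.45, 12.39),
    "East-West Performance": (0.64, 0.81, 1.45),
    "North-South": (0.08, 9.83, 27.61),
    "North-South Performance (concurrency 2)": (0.9, 1.22, 2.36),
    "North-South Performance (concurrency 5)": (0.74, 0.97, 1.72),
    "North-South Dense": (0.56, 18.18, 96.42),
}
PING_WITH_RESTART = {
    "East-West": (0.07, 7.39, 18.03),
    "East-West Performance": (0.57, 0.82, 1.79),
    "North-South": (0.06, 7.11, 25.73),
    "North-South Performance (concurrency 2)": (0.67, 0.93, 2.34),
    "North-South Performance (concurrency 5)": (0.2, 1.02, 3.01),
    "North-South Dense": (0.38, 4.07, 56.35),
}


def average_difference(without, with_restart):
    """Average (without - with) over scenarios for min, mean and max."""
    scenarios = sorted(without.keys() & with_restart.keys())
    return tuple(
        round(mean(without[s][i] - with_restart[s][i] for s in scenarios), 2)
        for i in range(3)
    )


print(average_difference(PING_WITHOUT_RESTART, PING_WITH_RESTART))
```

For ``ping_icmp`` this reproduces the 0.17, 2.02 and 5.78 ms entries of the table, so a positive value means the measurement was higher without restart.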
+
+
+Manual tests execution
+======================
+
+During manual testing, the following scenarios were tested:
+
+- Ping to external network from VM during reset of primary (non-primary)
+  controller
+
+- Ping from one VM to another VM in a different network during L3
+  agent ban
+
+- Iperf UDP testing between VMs in different networks during L3
+  agent ban
+
+All tests were performed with a large number of routers.
+
+Ping to external network from VM during reset of primary (non-primary) controller
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. image:: ping_external.png
+   :width: 650px
+
++-------------+---------------------+----------------+---------------------------+
+| Iteration   | Number of routers   | Command        | Number of lost packets    |
++=============+=====================+================+===========================+
+| 1           | 1                   |                | 3                         |
++-------------+---------------------+----------------+---------------------------+
+| 2           | 25                  |                | 3                         |
++-------------+---------------------+----------------+---------------------------+
+| 3           | 50                  |                | 3                         |
++-------------+---------------------+----------------+---------------------------+
+| 4           | 100                 |                | 3                         |
++-------------+---------------------+----------------+---------------------------+
+| 5           | 150                 |                | 3                         |
++-------------+---------------------+----------------+---------------------------+
+| 6           | 170                 | ping 8.8.8.8   | 3                         |
++-------------+---------------------+----------------+---------------------------+
+| 7           | 175                 |                | 89                        |
++-------------+---------------------+----------------+---------------------------+
+| 8           | 175                 |                | 116                       |
++-------------+---------------------+----------------+---------------------------+
+| 9           | 175                 |                | 52                        |
++-------------+---------------------+----------------+---------------------------+
+| 10          | 200                 |                | 51                        |
++-------------+---------------------+----------------+---------------------------+
+| 11          | 200                 |                | 3                         |
++-------------+---------------------+----------------+---------------------------+
+
+The current result looks unstable and does not depend directly on the
+number of routers. The large packet loss on iterations 7-10 happened
+because the agent on the recovered controller became “active” (master)
+while another L3 agent was already active. After some time it became
+the only “active” L3 agent for the router.
+
+This issue needs special attention and will be investigated as
+`bug/1563298 `__.
+
+Ping from one VM to another VM in a different network during L3 agent ban
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. image:: ping.png
+   :width: 650px
+
++-------------+---------------------+-----------------+---------------------------+
+| Iteration   | Number of routers   | Command         | Number of lost packets    |
++=============+=====================+=================+===========================+
+| 1           | 100                 |                 | 4                         |
++-------------+---------------------+-----------------+---------------------------+
+| 2           |                     |                 | 4                         |
++-------------+---------------------+-----------------+---------------------------+
+| 3           |                     |                 | 3                         |
++-------------+---------------------+-----------------+---------------------------+
+| 4           | 200                 |                 | 3                         |
++-------------+---------------------+-----------------+---------------------------+
+| 5           |                     |                 | 3                         |
++-------------+---------------------+-----------------+---------------------------+
+| 6           |                     | ping 10.0.1.6   | 103                       |
++-------------+---------------------+-----------------+---------------------------+
+| 7           |                     |                 | 26                        |
++-------------+---------------------+-----------------+---------------------------+
+| 8           |                     |                 | 3                         |
++-------------+---------------------+-----------------+---------------------------+
+| 9           | 250                 |                 | 3                         |
++-------------+---------------------+-----------------+---------------------------+
+| 10          |                     |                 | 4                         |
++-------------+---------------------+-----------------+---------------------------+
+
+The packet loss on iterations 6-7
happened for a reason similar to that in
+the previous manual scenario. The L3 agent `status
+flapped `__ during the loss.
+
+With 250 routers, the L3 agents started to fail with `unmanaged
+state `__.
+
+Iperf UDP testing between VMs in different networks during L3 agent ban
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. image:: iperf_addresses.png
+   :width: 650px
+
++---------------------+---------------------------------------------------------------------+------------+
+| Number of routers   | Command                                                             | Loss (%)   |
++=====================+=====================================================================+============+
+| 10                  |                                                                     | 0.14       |
++---------------------+---------------------------------------------------------------------+------------+
+|                     |                                                                     | 4.9        |
++---------------------+---------------------------------------------------------------------+------------+
+|                     |                                                                     | 1.3        |
++---------------------+---------------------------------------------------------------------+------------+
+|                     |                                                                     | 5.3        |
++---------------------+---------------------------------------------------------------------+------------+
+| 24                  |                                                                     | 1.3        |
++---------------------+---------------------------------------------------------------------+------------+
+|                     | iperf -c 10.0.3.4 -p 5001 -t 60 -i 10 --bandwidth 30M --len 64 -u   | 8.9        |
++---------------------+---------------------------------------------------------------------+------------+
+|                     |                                                                     | 6.1        |
++---------------------+---------------------------------------------------------------------+------------+
+|                     |                                                                     | 2.4        |
++---------------------+---------------------------------------------------------------------+------------+
+| 50                  |                                                                     | 1.7        |
++---------------------+---------------------------------------------------------------------+------------+
+|                     |                                                                     | 10         |
++---------------------+---------------------------------------------------------------------+------------+
+|                     |                                                                     | 40         |
++---------------------+---------------------------------------------------------------------+------------+
+|                     |                                                                     | 18         |
++---------------------+---------------------------------------------------------------------+------------+
+
+Summary:
+~~~~~~~~
+
+1. A `bug `__ was filed for the unstable behaviour of L3 HA.
+
+2. With fewer than 170 routers, the network can be classified as
+   stable under failures.
+
+3. With more than 240 routers, agent recovery leads to the agents
+   falling into unmanaged state.
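As a rough cross-check of the stability threshold in the summary, the sketch below takes the router counts and lost-packet counts from the external-network ping table and finds the largest router count up to which the loss never exceeded the baseline of 3 packets. The helper names and the choice of 3 as the baseline are assumptions made for illustration.

```python
# (number of routers, lost packets) per iteration, copied from the
# "ping to external network" table.
EXTERNAL_PING_RUNS = [
    (1, 3), (25, 3), (50, 3), (100, 3), (150, 3), (170, 3),
    (175, 89), (175, 116), (175, 52), (200, 51), (200, 3),
]

# Assumption: 3 lost packets is treated as the normal failover cost,
# because every run with up to 170 routers lost exactly 3 packets.
BASELINE_LOSS = 3


def worst_loss_by_router_count(runs):
    """Return the highest packet loss seen for each router count."""
    worst = {}
    for routers, lost in runs:
        worst[routers] = max(lost, worst.get(routers, 0))
    return worst


def largest_stable_router_count(runs, baseline):
    """Return the largest router count reached before any run exceeds baseline."""
    worst = worst_loss_by_router_count(runs)
    stable = None
    for routers in sorted(worst):
        if worst[routers] > baseline:
            break
        stable = routers
    return stable


print(largest_stable_router_count(EXTERNAL_PING_RUNS, BASELINE_LOSS))
```

With the table data this yields 170, the last router count at which only the baseline loss was observed; runs at 175 routers and above include much larger losses.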