Merge "Neutron control plane performance and agent restart"

Jenkins 2016-10-10 18:30:56 +00:00 committed by Gerrit Code Review
commit ce68881e60
16 changed files with 14776 additions and 0 deletions


@ -0,0 +1,145 @@
.. _neutron_agent_restart_test_plan:
=============================================================
OpenStack Neutron Control Plane Performance and Agent Restart
=============================================================
:status: **draft**
:version: 1.0
Test Plan
=========
Neutron Server is the core of the Neutron control plane. It processes requests
from the public API and the internal RPC API. The latter is used to communicate
with agents. Normally RPC is used to notify agents about configuration updates.
However, after an agent restart or a communication failure the agent requests
all data from the server, and the amount of data may be significant.

The goal of this test plan is to measure how restarting a large number of
agents affects the performance of the Neutron control plane.
Test Environment
----------------
Preparation
^^^^^^^^^^^
This test plan is performed against an existing OpenStack cloud.
Environment description
^^^^^^^^^^^^^^^^^^^^^^^
The environment description includes the hardware specification of servers,
network parameters, operating system and OpenStack deployment characteristics.
Hardware
~~~~~~~~
This section contains a list of all types of hardware nodes.

+-----------+-------+----------------------------------------------------+
| Parameter | Value | Comments                                           |
+-----------+-------+----------------------------------------------------+
| model     |       | e.g. Supermicro X9SRD-F                            |
+-----------+-------+----------------------------------------------------+
| CPU       |       | e.g. 6 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz |
+-----------+-------+----------------------------------------------------+
| role      |       | e.g. compute or network                            |
+-----------+-------+----------------------------------------------------+
Network
~~~~~~~
This section contains a list of interfaces and network parameters.
For complicated cases this section may include a topology diagram and switch
parameters.

+------------------+-------+-------------------------+
| Parameter        | Value | Comments                |
+------------------+-------+-------------------------+
| network role     |       | e.g. provider or public |
+------------------+-------+-------------------------+
| card model       |       | e.g. Intel              |
+------------------+-------+-------------------------+
| driver           |       | e.g. ixgbe              |
+------------------+-------+-------------------------+
| speed            |       | e.g. 10G or 1G          |
+------------------+-------+-------------------------+
| MTU              |       | e.g. 9000               |
+------------------+-------+-------------------------+
| offloading modes |       | e.g. default            |
+------------------+-------+-------------------------+
Software
~~~~~~~~
This section describes the installed software.

+-----------------+-------+---------------------------+
| Parameter       | Value | Comments                  |
+-----------------+-------+---------------------------+
| OS              |       | e.g. Ubuntu 14.04.3       |
+-----------------+-------+---------------------------+
| OpenStack       |       | e.g. Liberty              |
+-----------------+-------+---------------------------+
| Hypervisor      |       | e.g. KVM                  |
+-----------------+-------+---------------------------+
| Neutron plugin  |       | e.g. ML2 + OVS            |
+-----------------+-------+---------------------------+
| L2 segmentation |       | e.g. VLAN or VxLAN or GRE |
+-----------------+-------+---------------------------+
| virtual routers |       | HA                        |
+-----------------+-------+---------------------------+
Test Case: mass restart of agents
---------------------------------
Description
^^^^^^^^^^^
Measurements can be performed using the methodology described in
:ref:`reliability_testing_version_2`. The following metrics need to be
collected:

.. list-table::
   :header-rows: 1

   *
     - Priority
     - Value
     - Measurement Unit
     - Description
   *
     - 1
     - Service downtime
     - sec
     - How long the service was not available and operations were in error
       state.
   *
     - 1
     - MTTR
     - sec
     - How long it takes to recover service performance after the failure.
   *
     - 1
     - Operation Degradation
     - sec
     - The mean of the difference between operation performance during the
       recovery period and operation performance when the service operates
       normally.
   *
     - 1
     - Operation Degradation Ratio
     - ratio
     - The ratio between operation performance during the recovery period and
       operation performance when the service operates normally.
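
As a rough illustration of how these four metrics relate to each other, the
sketch below derives them from a list of operation samples. It is not the
implementation used by the referenced methodology: the function name, the
``(start, duration, ok)`` sample layout, the ``threshold`` parameter and the
rule that recovery ends when the last failed or slow operation finishes are
all assumptions made for this example.

```python
from statistics import mean


def restart_metrics(samples, fault_time, baseline_mean, threshold):
    """Estimate restart metrics from (start, duration, ok) samples.

    Hypothetical helper: all names and the recovery rule are assumptions.
    An operation counts as degraded when it fails or when its duration
    exceeds `threshold` seconds.
    """
    after = [s for s in samples if s[0] >= fault_time]

    # Service downtime: time span covered by failed operations.
    failed = [s for s in after if not s[2]]
    downtime = 0.0
    if failed:
        downtime = max(s[0] + s[1] for s in failed) - min(s[0] for s in failed)

    # Recovery ends when the last failed or slow operation finishes.
    degraded = [s for s in after if not s[2] or s[1] > threshold]
    if not degraded:
        return {"downtime": 0.0, "mttr": 0.0,
                "degradation": 0.0, "degradation_ratio": 1.0}
    recovered_at = max(s[0] + s[1] for s in degraded)

    # Successful operations that started during the recovery period.
    recovery = [s[1] for s in after if s[2] and s[0] < recovered_at]
    recovery_mean = mean(recovery) if recovery else None
    return {
        "downtime": downtime,
        "mttr": recovered_at - fault_time,
        "degradation": (None if recovery_mean is None
                        else recovery_mean - baseline_mean),
        "degradation_ratio": (None if recovery_mean is None
                              else recovery_mean / baseline_mean),
    }
```

With a baseline mean of 0.4 s, a run in which no operation fails or exceeds
the threshold after the fault yields zero downtime, zero MTTR and a
degradation ratio of 1.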
Reports
=======
Test plan execution reports:
* :ref:`neutron_agent_restart_test_report`


@ -11,3 +11,4 @@ Neutron features test plans
l3_ha/plan
resource_density/plan
agent_restart/plan


@ -0,0 +1,77 @@
Networks operations and L3-agent restart
========================================
In this scenario we restart all L3 agents while Neutron creates and deletes
networks.
This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
      NeutronNetworks.create_and_delete_networks:
        -
          args:
            network_create_args: {}
          runner:
            type: "constant_for_duration"
            duration: 120
            concurrency: 4
          context:
            users:
              tenants: 1
              users_per_tenant: 1
            quotas:
              neutron:
                network: -1
          hooks:
            -
              name: fault_injection
              args:
                action: restart neutron-l3-agent service
              trigger:
                name: event
                args:
                  unit: iteration
                  at: [100]
Summary
-------
No errors or performance degradation were observed.
Details
-------
This section contains individual data for particular scenario runs.
Run #1
^^^^^^
.. image:: plot_1.svg
Baseline
~~~~~~~~
Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 85        | 0.36        | 0.4       | 0.068     | 0.52                |
+-----------+-------------+-----------+-----------+---------------------+
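
The columns of a baseline table like this one can be reproduced from raw
samples roughly as follows. This is only a sketch: the helper name, the
``(start, duration)`` sample layout, the nearest-rank percentile and the use
of the sample (rather than population) standard deviation are assumptions,
and the tool that produced the report may compute them differently.

```python
from statistics import mean, median, stdev


def baseline_stats(samples, fault_time):
    """Summarize durations of operations started before `fault_time`.

    Hypothetical helper: `samples` holds (start, duration) pairs in seconds.
    Needs at least two baseline samples because of the sample std dev.
    """
    durations = sorted(d for start, d in samples if start < fault_time)
    n = len(durations)
    # Nearest-rank 95th percentile, using integer maths for ceil(0.95 * n).
    rank = (95 * n + 99) // 100
    return {
        "samples": n,
        "median": median(durations),
        "mean": mean(durations),
        "std_dev": stdev(durations),
        "p95": durations[rank - 1],
    }
```

Only operations that start before the fault are counted, which matches the
description of baseline samples given in this section.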


@ -0,0 +1,76 @@
Networks operations and OVS agent restart
=========================================
In this scenario we restart all OVS agents while Neutron creates and deletes
networks.
This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
      NeutronNetworks.create_and_delete_networks:
        -
          args:
            network_create_args: {}
          runner:
            type: "constant_for_duration"
            duration: 120
            concurrency: 4
          context:
            users:
              tenants: 1
              users_per_tenant: 1
            quotas:
              neutron:
                network: -1
          hooks:
            -
              name: fault_injection
              args:
                action: restart neutron-openvswitch-agent service
              trigger:
                name: event
                args:
                  unit: iteration
                  at: [100]
Summary
-------
No errors or performance degradation were observed.
Details
-------
This section contains individual data for particular scenario runs.
Run #1
^^^^^^
.. image:: plot_1.svg
Baseline
~~~~~~~~
Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 86        | 0.38        | 0.4       | 0.063     | 0.5                 |
+-----------+-------------+-----------+-----------+---------------------+


@ -0,0 +1,79 @@
Ports operations and L3-agent restart
=====================================
In this scenario we restart all L3 agents while Neutron creates and deletes
ports.
This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
      NeutronNetworks.create_and_delete_ports:
        -
          args:
            network_create_args: {}
            port_create_args: {}
            ports_per_network: 10
          runner:
            type: "constant_for_duration"
            duration: 300
            concurrency: 6
          context:
            users:
              tenants: 1
              users_per_tenant: 1
            quotas:
              neutron:
                network: -1
                port: -1
          hooks:
            -
              name: fault_injection
              args:
                action: restart neutron-l3-agent service
              trigger:
                name: event
                args:
                  unit: iteration
                  at: [80]
Summary
-------
No errors or performance degradation were observed.
Details
-------
This section contains individual data for particular scenario runs.
Run #1
^^^^^^
.. image:: plot_1.svg
Baseline
~~~~~~~~
Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 63        | 8.5         | 8.5       | 0.4       | 9.3                 |
+-----------+-------------+-----------+-----------+---------------------+


@ -0,0 +1,79 @@
Ports operations and OVS agent restart
======================================
In this scenario we restart all OVS agents while Neutron creates and deletes
ports.
This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
      NeutronNetworks.create_and_delete_ports:
        -
          args:
            network_create_args: {}
            port_create_args: {}
            ports_per_network: 10
          runner:
            type: "constant_for_duration"
            duration: 300
            concurrency: 4
          context:
            users:
              tenants: 1
              users_per_tenant: 1
            quotas:
              neutron:
                network: -1
                port: -1
          hooks:
            -
              name: fault_injection
              args:
                action: restart neutron-openvswitch-agent service
              trigger:
                name: event
                args:
                  unit: iteration
                  at: [80]
Summary
-------
No errors or performance degradation were observed.
Details
-------
This section contains individual data for particular scenario runs.
Run #1
^^^^^^
.. image:: plot_1.svg
Baseline
~~~~~~~~
Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 65        | 8.7         | 8.8       | 0.31      | 9.3                 |
+-----------+-------------+-----------+-----------+---------------------+


@ -0,0 +1,80 @@
Subnets operations and L3-agent restart
=======================================
In this scenario we restart all L3 agents while Neutron creates and deletes
subnets.
This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
      NeutronNetworks.create_and_delete_subnets:
        -
          args:
            network_create_args: {}
            subnet_create_args: {}
            subnet_cidr_start: "1.1.0.0/28"
            subnets_per_network: 2
          runner:
            type: "constant_for_duration"
            duration: 120
            concurrency: 4
          context:
            users:
              tenants: 1
              users_per_tenant: 1
            quotas:
              neutron:
                network: -1
                subnet: -1
          hooks:
            -
              name: fault_injection
              args:
                action: restart neutron-l3-agent service
              trigger:
                name: event
                args:
                  unit: iteration
                  at: [100]
Summary
-------
No errors or performance degradation were observed.
Details
-------
This section contains individual data for particular scenario runs.
Run #1
^^^^^^
.. image:: plot_1.svg
Baseline
~~~~~~~~
Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 85        | 2           | 2         | 0.16      | 2.4                 |
+-----------+-------------+-----------+-----------+---------------------+


@ -0,0 +1,80 @@
Subnets operations and OVS-agent restart
========================================
In this scenario we restart all OVS agents while Neutron creates and deletes
subnets.
This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
      NeutronNetworks.create_and_delete_subnets:
        -
          args:
            network_create_args: {}
            subnet_create_args: {}
            subnet_cidr_start: "1.1.0.0/28"
            subnets_per_network: 2
          runner:
            type: "constant_for_duration"
            duration: 120
            concurrency: 4
          context:
            users:
              tenants: 1
              users_per_tenant: 1
            quotas:
              neutron:
                network: -1
                subnet: -1
          hooks:
            -
              name: fault_injection
              args:
                action: restart neutron-openvswitch-agent service
              trigger:
                name: event
                args:
                  unit: iteration
                  at: [100]
Summary
-------
No errors or performance degradation were observed.
Details
-------
This section contains individual data for particular scenario runs.
Run #1
^^^^^^
.. image:: plot_1.svg
Baseline
~~~~~~~~
Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 85        | 1.3         | 1.4       | 0.14      | 1.6                 |
+-----------+-------------+-----------+-----------+---------------------+


@ -0,0 +1,50 @@
.. _neutron_agent_restart_test_report:
=========================================================================
OpenStack Neutron Control Plane Performance and Agent Restart Test Report
=========================================================================
This report is generated for :ref:`neutron_agent_restart_test_plan`.
Environment description
=======================
Cluster description
-------------------
* 3 controllers
* 3 compute nodes
Software versions
-----------------
**OpenStack/System**:
  Fuel/MOS 9.0, Ubuntu 14.04, Linux kernel 3.13, OVS 2.4.1

**Networking**:
  Neutron ML2 + OVS plugin, DVR, L2pop, MTU 1500
Hardware configuration of each server
-------------------------------------
Description of server hardware

**Compute Vendor**:
  HP ProLiant DL380 Gen9
**CPU**:
  2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (48 cores)
**RAM**:
  256 GB
**NIC**:
  2 x Intel Corporation Ethernet 10G 2P X710
Reports
=======
Reports were collected on an OpenStack cloud with 100 instances, 100 routers
and 100 networks.

.. toctree::
    :glob:
    :maxdepth: 1

    */index


@ -12,3 +12,4 @@ Neutron features scale testing
l3_ha/test_results_liberty
l3_ha/test_results_mitaka
resource_density/index
agent_restart/index