Add test plan for OpenStack API performance metrics

The test plan defines a set of metrics that can be collected for OpenStack API operations.

Change-Id: I20899adff29578253be7a1d9440d937797a9f400
2016-09-14 16:03:15 +03:00 · 2016-09-14 16:03:15 +03:00 · 20d28676ec
commit 20d28676ec
parent 20b0943204
1 changed files with 250 additions and 0 deletions
--- a/doc/source/test_plans/openstack_api_metrics/plan.rst
+++ b/doc/source/test_plans/openstack_api_metrics/plan.rst
@@ -0,0 +1,250 @@
+.. _openstack_api_performance_metrics_test_plan:
+
+=================================
+OpenStack API Performance Metrics
+=================================
+
+:status: **draft**
+:version: 1.0
+
+:Abstract:
+
+  This test plan defines performance metrics for the OpenStack API and
+  describes how to measure them.
+
+:Conventions:
+  - **Operation Duration** - how long it takes to perform a single
+    operation.
+  - **Operation Throughput** - how many operations can be completed per second
+    on average.
+  - **Concurrency** - how many operations run in parallel at the point where
+    operation throughput reaches its maximum.
+  - **Scale Impact** - comparison of operation metrics when the number of
+    objects is high versus low.
+
+
+Test Plan
+=========
+
+This test plan defines a set of performance metrics for the OpenStack API.
+These metrics can be used to compare different cloud implementations and for
+performance tuning.
+
+This test plan can be used to answer the following questions:
+ * How long does it take to perform a particular operation? (*e.g. duration of
+   the Neutron net_create operation*)
+ * How many operations can be run in parallel without degradation?
+   (*e.g. can one do 10 Neutron net_create operations in parallel, or is it
+   better to do them one-by-one*)
+ * How many operations of a particular type can an OpenStack cloud process per
+   second? (*e.g. find out whether one can do 100 Neutron net_create ops per
+   second or not*)
+ * What is the impact of having many objects in the cloud? How does the
+   performance degrade? (*e.g. will the cloud be slower when there are
+   thousands of objects, and how much slower will it be*)
+
+Test Environment
+----------------
+
+Preparation
+^^^^^^^^^^^
+
+This test plan is executed against an existing OpenStack cloud.
+
+Measurements can be done with any tool that can:
+ * report duration of single operations;
+ * execute operations one-by-one and in a configurable number of concurrent
+   threads.
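
The two tool requirements above can be illustrated with a minimal Python sketch. It is not taken from any particular tool: ``timed``, ``run_load`` and ``fake_net_create`` are hypothetical names, and the fake operation only sleeps in place of a real API call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(operation):
    """Run one operation and return its duration in milliseconds."""
    start = time.perf_counter()
    operation()
    return (time.perf_counter() - start) * 1000.0


def run_load(operation, times, concurrency):
    """Run `operation` `times` times using `concurrency` worker threads.

    Returns a tuple (per-operation durations in ms, wall-clock seconds).
    With concurrency=1 the operations are executed one-by-one.
    """
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        durations = list(pool.map(lambda _: timed(operation), range(times)))
    wall_seconds = time.perf_counter() - wall_start
    return durations, wall_seconds


def fake_net_create():
    # Hypothetical stand-in for a real call such as Neutron net_create.
    time.sleep(0.01)


if __name__ == "__main__":
    durations, wall = run_load(fake_net_create, times=20, concurrency=4)
    print(len(durations), "operations in", round(wall, 3), "seconds")
```

A real measurement would replace ``fake_net_create`` with a call to the cloud under test; the harness itself stays the same.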
+
+Environment description
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The environment description includes the hardware specification of servers,
+network parameters, operating system and OpenStack deployment characteristics.
+
+Hardware
+~~~~~~~~
+
+This section contains a list of all types of hardware nodes.
+
+-----------+-------+----------------------------------------------------+
+| Parameter | Value | Comments                                           |
+-----------+-------+----------------------------------------------------+
+| model     |       | e.g. Supermicro X9SRD-F                            |
+-----------+-------+----------------------------------------------------+
+| CPU       |       | e.g. 6 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz |
+-----------+-------+----------------------------------------------------+
+| role      |       | e.g. compute or network                            |
+-----------+-------+----------------------------------------------------+
+
+Network
+~~~~~~~
+
+This section contains a list of interfaces and network parameters.
+For complicated cases this section may include a topology diagram and switch
+parameters.
+
+------------------+-------+-------------------------+
+| Parameter        | Value | Comments                |
+------------------+-------+-------------------------+
+| network role     |       | e.g. provider or public |
+------------------+-------+-------------------------+
+| card model       |       | e.g. Intel              |
+------------------+-------+-------------------------+
+| driver           |       | e.g. ixgbe              |
+------------------+-------+-------------------------+
+| speed            |       | e.g. 10G or 1G          |
+------------------+-------+-------------------------+
+| MTU              |       | e.g. 9000               |
+------------------+-------+-------------------------+
+| offloading modes |       | e.g. default            |
+------------------+-------+-------------------------+
+
+Software
+~~~~~~~~
+
+This section describes installed software.
+
+-----------------+-------+---------------------------+
+| Parameter       | Value | Comments                  |
+-----------------+-------+---------------------------+
+| OS              |       | e.g. Ubuntu 14.04.3       |
+-----------------+-------+---------------------------+
+| OpenStack       |       | e.g. Liberty              |
+-----------------+-------+---------------------------+
+| Hypervisor      |       | e.g. KVM                  |
+-----------------+-------+---------------------------+
+| Neutron plugin  |       | e.g. ML2 + OVS            |
+-----------------+-------+---------------------------+
+| L2 segmentation |       | e.g. VLAN or VxLAN or GRE |
+-----------------+-------+---------------------------+
+| virtual routers |       | e.g. legacy or HA or DVR  |
+-----------------+-------+---------------------------+
+
+
+Test Case: Operation Performance Measurements
+---------------------------------------------
+
+Description
+^^^^^^^^^^^
+
+The test case is performed by running a specific OpenStack operation. Every
+operation is executed several times to collect more reliable statistical data.
+
+
+Parameters
+^^^^^^^^^^
+
+The only parameter is the operation being tested.
+
+List of performance metrics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. list-table::
+   :header-rows: 1
+
+   *
+     - Priority
+     - Value
+     - Measurement Unit
+     - Description
+   *
+     - 1
+     - Duration median
+     - ms
+     - Median of operation durations measured when operations are performed
+       one-by-one in 1 thread
+   *
+     - 1
+     - Duration 95th percentile
+     - ms
+     - 95th percentile of operation durations measured when operations are
+       performed one-by-one in 1 thread
+   *
+     - 2
+     - Duration 99th percentile
+     - ms
+     - 99th percentile of operation durations measured when operations are
+       performed one-by-one in 1 thread
+   *
+     - 1
+     - Concurrency
+     - count
+     - How many operations can be processed in parallel without significant
+       degradation of duration
+   *
+     - 1
+     - Throughput
+     - operations per second
+     - How many operations can be processed in one second
+   *
+     - 1
+     - Scale impact
+     - %
+     - Performance degradation measured as the ratio of operation duration
+       when the number of objects is 1k to the duration when the number of
+       objects is low.
+
+
+Tools
+=====
+
+Rally
+-----
+
+This test plan can be executed with the `Rally`_ tool. Rally can report the
+duration of individual operations and can be configured to perform operations
+in multiple parallel threads.
+
+Rally scenario execution also involves creating and deleting additional
+objects (like tenants and users) and cleaning up resources created by the
+scenario. All this consumes extra time, so it makes sense to run measurements
+grouped by resource type rather than one operation at a time. For example,
+instead of having 4 separate scenarios for the create, get, list and delete
+operations, use 1 scenario that calls these operations sequentially.
+
+Scenarios
+^^^^^^^^^
+
+To perform measurements we need 2 types of scenarios:
+ * **cyclic** - a sequence of ``create``, ``get``, ``list`` and ``delete``
+   operations; the total number of objects does not increase.
+ * **accumulative** - a sequence of ``create``, ``get`` and ``list``
+   operations; the total number of objects keeps increasing.
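
The difference between the two scenario types can be sketched in Python. This is an illustration only and not Rally plugin code: ``FakeNetworkClient`` is a hypothetical in-memory stand-in for an OpenStack API client, and the function names are assumptions.

```python
class FakeNetworkClient:
    """Hypothetical in-memory stand-in for an OpenStack API client."""

    def __init__(self):
        self._networks = {}
        self._next_id = 0

    def create(self):
        self._next_id += 1
        self._networks[self._next_id] = {"id": self._next_id}
        return self._next_id

    def get(self, net_id):
        return self._networks[net_id]

    def list(self):
        return list(self._networks.values())

    def delete(self, net_id):
        del self._networks[net_id]


def cyclic_iteration(client):
    """create, get, list, delete: the object count is unchanged afterwards."""
    net_id = client.create()
    client.get(net_id)
    client.list()
    client.delete(net_id)


def accumulative_iteration(client):
    """create, get, list: every iteration leaves one more object behind."""
    net_id = client.create()
    client.get(net_id)
    client.list()


if __name__ == "__main__":
    client = FakeNetworkClient()
    for _ in range(5):
        cyclic_iteration(client)
    print("after cyclic:", len(client.list()))
    for _ in range(5):
        accumulative_iteration(client)
    print("after accumulative:", len(client.list()))
```

In a real run each call inside an iteration would be timed separately, so that one scenario yields a duration sample for every operation it contains.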
+
+Duration metrics
+^^^^^^^^^^^^^^^^
+
+Duration metrics are collected with the help of the cyclic scenario.
+
+Actions:
+ #. Set concurrency to 1 thread.
+ #. Run the scenario N times, where N is large enough to make a good sample.
+    Collect the list of operation durations.
+ #. For every operation calculate the median and percentiles.
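
The last step can be sketched with the Python standard library. The plan does not say which percentile definition to use; the nearest-rank method below is an assumption, and ``percentile`` and ``duration_metrics`` are hypothetical helper names.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def duration_metrics(durations_ms):
    """Return the three duration metrics for one operation."""
    return {
        "median": statistics.median(durations_ms),
        "p95": percentile(durations_ms, 95),
        "p99": percentile(durations_ms, 99),
    }


if __name__ == "__main__":
    # Synthetic sample: durations of 1, 2, ..., 100 ms.
    print(duration_metrics(list(range(1, 101))))
```

Other percentile definitions (for example with linear interpolation) give slightly different values on small samples, so results are comparable only when the same definition is used for every run.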
+
+
+Concurrency and throughput metrics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These metrics are collected with the help of cyclic scenarios.
+
+Actions:
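+ #. Start with a concurrency of 1 thread.
+ #. Run the scenario N times, where N is large enough to make a good sample.
+    Collect the list of operation durations.
+ #. Calculate throughput (divide the number of operations by the total
+    duration).
+ #. Increase concurrency and repeat the previous two steps until throughput
+    stops growing. The maximum throughput is the throughput metric, and the
+    concurrency at which it is reached is the concurrency metric.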
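
A minimal sketch of the throughput calculation and of picking the concurrency at which throughput peaks, following the Concurrency definition in the conventions. The 5% tolerance used to decide that throughput has stopped growing is an assumption, as are the function names and the sample numbers.

```python
def throughput(operations, wall_seconds):
    """Operations per second: number of operations divided by total duration."""
    return operations / wall_seconds


def find_saturation(results, tolerance=0.05):
    """Find the point where throughput reaches its maximum.

    `results` is a list of (concurrency, throughput) pairs ordered by
    increasing concurrency. Returns (concurrency, peak_throughput), where
    concurrency is the lowest level whose throughput is within `tolerance`
    of the peak. The tolerance is an assumed noise margin.
    """
    peak = max(tp for _, tp in results)
    for concurrency, tp in results:
        if tp >= peak * (1.0 - tolerance):
            return concurrency, peak


if __name__ == "__main__":
    # Made-up measurements for illustration only.
    sweep = [(1, 10.0), (2, 19.0), (4, 35.0), (8, 36.0), (16, 34.0)]
    print(find_saturation(sweep))
```

With the made-up sweep above, throughput at concurrency 4 is already within 5% of the peak, so 4 is reported as the concurrency and 36.0 operations per second as the throughput.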
+
+Scale impact metrics
+^^^^^^^^^^^^^^^^^^^^
+
+These metrics are collected with the help of accumulative scenarios.
+
+Actions:
+ #. Set concurrency to 1 thread.
+ #. Run the scenario until the desired number of objects is reached (e.g.
+    1 thousand).
+ #. Calculate the mean operation duration for the first 50 objects and for
+    the last 50.
+ #. Calculate the ratio between the two means.
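
The last two steps can be sketched as follows. The metric is reported in percent, so the sketch assumes the ratio is the mean of the last 50 durations divided by the mean of the first 50, multiplied by 100; ``scale_impact`` is a hypothetical helper name and the sample data is synthetic.

```python
import statistics


def scale_impact(durations_ms, window=50):
    """Ratio (in percent) of the mean duration over the last `window`
    operations to the mean duration over the first `window` operations."""
    if len(durations_ms) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    first = statistics.mean(durations_ms[:window])
    last = statistics.mean(durations_ms[-window:])
    return last / first * 100.0


if __name__ == "__main__":
    # Synthetic data: 1000 create operations that slow down over time.
    durations = [10.0] * 50 + [12.0] * 900 + [15.0] * 50
    print(scale_impact(durations))
```

A value of 100 means no degradation; with the synthetic data above the result is 150, i.e. the operation is 1.5 times slower at 1 thousand objects than on a nearly empty cloud.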
+
+.. references:
+
+.. _Rally: http://rally.readthedocs.io/