Merge "Add test plan for OpenStack API performance metrics"

2016-09-21 22:16:36 +00:00 · 2016-09-21 22:16:36 +00:00 · e12762db42
commit e12762db42
parent a6e3543717 20d28676ec
1 changed files with 250 additions and 0 deletions
--- a/doc/source/test_plans/openstack_api_metrics/plan.rst
+++ b/doc/source/test_plans/openstack_api_metrics/plan.rst
@@ -0,0 +1,250 @@
 .. _openstack_api_performance_metrics_test_plan:
 =================================
 OpenStack API Performance Metrics
 =================================
 :status: **draft**
 :version: 1.0
 :Abstract:
  This test plan defines performance metrics for the OpenStack API and the
  way to measure them.
 :Conventions:
  - **Operation Duration** - how long it takes to perform a single
    operation.
  - **Operation Throughput** - how many operations can be done in one second
    on average.
  - **Concurrency** - how many operations can run in parallel when operation
    throughput reaches its maximum.
  - **Scale Impact** - comparison of operation metrics when the number of
    objects is high versus low.
 Test Plan
 =========
 This test plan defines a set of performance metrics for the OpenStack API.
 These metrics can be used to compare different cloud implementations and for
 performance tuning.
 This test plan can be used to answer the following questions:
 * How long does it take to perform a particular operation? (*e.g. duration of
   Neutron net_create operation*)
 * How many operations can be run in parallel without degradation?
   (*e.g. can one run 10 Neutron net_create operations in parallel, or is it
   better to run them one-by-one*)
 * How many operations of a particular type can an OpenStack cloud process per
   second? (*e.g. find out whether one can do 100 Neutron net_create ops per
   second or not*)
 * What is the impact of having many objects in the cloud? How does the
   performance degrade? (*e.g. will the cloud be slower when there are
   thousands of objects, and how much slower will it be*)
 Test Environment
 ----------------
 Preparation
 ^^^^^^^^^^^
 This test plan is executed against an existing OpenStack cloud.
 Measurements can be done with any tool that can:
 * report the duration of single operations;
 * execute operations one-by-one and in a configurable number of concurrent
   threads.
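
 A minimal sketch of such a tool in Python, assuming each operation is wrapped
 in a zero-argument callable (for example, a hypothetical function that calls
 Neutron ``net_create`` through a client library). The names ``timed_call``
 and ``run_operations`` are illustrative and not part of any existing tool.

 .. code-block:: python

     import time
     from concurrent.futures import ThreadPoolExecutor


     def timed_call(operation):
         """Run one operation and return its duration in milliseconds."""
         start = time.perf_counter()
         operation()
         return (time.perf_counter() - start) * 1000.0


     def run_operations(operation, times, concurrency=1):
         """Run ``operation`` ``times`` times in ``concurrency`` threads.

         Returns the per-operation durations (ms) in submission order and the
         wall-clock duration of the whole run (ms). With concurrency=1 the
         operations are executed strictly one-by-one.
         """
         start = time.perf_counter()
         with ThreadPoolExecutor(max_workers=concurrency) as pool:
             durations = list(pool.map(lambda _: timed_call(operation), range(times)))
         total_ms = (time.perf_counter() - start) * 1000.0
         return durations, total_ms

 For example, ``run_operations(create_network, times=100, concurrency=10)``
 would run a hypothetical ``create_network`` callable 100 times in 10 threads.
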
 Environment description
 ^^^^^^^^^^^^^^^^^^^^^^^
 The environment description includes the hardware specification of servers,
 network parameters, operating system and OpenStack deployment characteristics.
 Hardware
 ~~~~~~~~
 This section contains a list of all types of hardware nodes.
 +-----------+-------+----------------------------------------------------+
 | Parameter | Value | Comments                                           |
 +-----------+-------+----------------------------------------------------+
 | model     |       | e.g. Supermicro X9SRD-F                            |
 +-----------+-------+----------------------------------------------------+
 | CPU       |       | e.g. 6 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz |
 +-----------+-------+----------------------------------------------------+
 | role      |       | e.g. compute or network                            |
 +-----------+-------+----------------------------------------------------+
 Network
 ~~~~~~~
 This section contains a list of interfaces and network parameters.
 For complicated cases this section may include a topology diagram and switch
 parameters.
 +------------------+-------+-------------------------+
 | Parameter        | Value | Comments                |
 +------------------+-------+-------------------------+
 | network role     |       | e.g. provider or public |
 +------------------+-------+-------------------------+
 | card model       |       | e.g. Intel              |
 +------------------+-------+-------------------------+
 | driver           |       | e.g. ixgbe              |
 +------------------+-------+-------------------------+
 | speed            |       | e.g. 10G or 1G          |
 +------------------+-------+-------------------------+
 | MTU              |       | e.g. 9000               |
 +------------------+-------+-------------------------+
 | offloading modes |       | e.g. default            |
 +------------------+-------+-------------------------+
 Software
 ~~~~~~~~
 This section describes installed software.
 +-----------------+-------+---------------------------+
 | Parameter       | Value | Comments                  |
 +-----------------+-------+---------------------------+
 | OS              |       | e.g. Ubuntu 14.04.3       |
 +-----------------+-------+---------------------------+
 | OpenStack       |       | e.g. Liberty              |
 +-----------------+-------+---------------------------+
 | Hypervisor      |       | e.g. KVM                  |
 +-----------------+-------+---------------------------+
 | Neutron plugin  |       | e.g. ML2 + OVS            |
 +-----------------+-------+---------------------------+
 | L2 segmentation |       | e.g. VLAN or VxLAN or GRE |
 +-----------------+-------+---------------------------+
 | virtual routers |       | e.g. legacy or HA or DVR  |
 +-----------------+-------+---------------------------+
 Test Case: Operation Performance Measurements
 ---------------------------------------------
 Description
 ^^^^^^^^^^^
 The test case is performed by running a specific OpenStack operation. Every
 operation is executed several times to collect more reliable statistical data.
 Parameters
 ^^^^^^^^^^
 The only parameter is the operation being tested.
 List of performance metrics
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 .. list-table::
   :header-rows: 1
   *
     - Priority
     - Value
     - Measurement Unit
     - Description
   *
     - 1
     - Duration median
     - ms
     - Median of operation durations measured when operations are performed
       one-by-one in 1 thread
   *
     - 1
     - Duration 95th percentile
     - ms
     - 95th percentile of operation durations measured when operations are
       performed one-by-one in 1 thread
   *
     - 2
     - Duration 99th percentile
     - ms
     - 99th percentile of operation durations measured when operations are
       performed one-by-one in 1 thread
   *
     - 1
     - Concurrency
     - count
     - How many operations can be processed in parallel without significant
       degradation of duration
   *
     - 1
     - Throughput
     - operations per second
     - How many operations can be processed in one second
   *
     - 1
     - Scale impact
     - %
     - Performance degradation measured as the ratio of operation duration
       when the number of objects is 1k versus when it is low.
 Tools
 =====
 Rally
 -----
 This test plan can be executed with the `Rally`_ tool. Rally can report the
 duration of individual operations and can be configured to perform operations
 in multiple parallel threads.
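
 As an illustration, a Rally task that runs one Neutron scenario one-by-one
 could be generated as below. This is a sketch assuming Rally's JSON task
 format with a ``constant`` runner; the scenario name is one of Rally's Neutron
 scenarios, while ``TIMES`` and ``CONCURRENCY`` are hypothetical values to
 adjust. Check the Rally documentation of the release in use before relying
 on it.

 .. code-block:: python

     import json

     TIMES = 100       # hypothetical sample size: how many times to run
     CONCURRENCY = 1   # number of parallel threads; 1 means one-by-one

     task = {
         "NeutronNetworks.create_and_list_networks": [
             {
                 "args": {"network_create_args": {}},
                 # constant runner: run the scenario TIMES times in total,
                 # keeping CONCURRENCY iterations in flight at once
                 "runner": {
                     "type": "constant",
                     "times": TIMES,
                     "concurrency": CONCURRENCY,
                 },
                 # Rally creates (and later cleans up) a tenant and a user
                 "context": {
                     "users": {"tenants": 1, "users_per_tenant": 1},
                 },
             }
         ]
     }

     print(json.dumps(task, indent=4))

 Saved to a file, the output can be run with ``rally task start task.json``.
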
 Rally scenario execution also involves the creation/deletion of additional
 objects (like tenants and users) and the cleanup of resources created by the
 scenario. All this consumes extra time, so it makes sense to group
 measurements by resource type rather than run them one-by-one. E.g. instead of
 having 4 separate scenarios for the create, get, list and delete operations,
 have 1 scenario that calls these operations sequentially.
 Scenarios
 ^^^^^^^^^
 To perform measurements we will need 2 types of scenarios:
 * **cyclic** - sequence of ``create``, ``get``, ``list`` and ``delete``
   operations; the total number of objects does not grow.
 * **accumulative** - sequence of ``create``, ``get`` and ``list`` operations;
   the total number of objects grows with every iteration.
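
 The difference between the two scenario types can be sketched in Python
 against a hypothetical in-memory stand-in for a resource API
 (``InMemoryStore`` is not an OpenStack client; in a real run each method would
 call the corresponding API). Each operation's duration is recorded separately
 under its own name, so one grouped scenario still yields per-operation data.

 .. code-block:: python

     import time
     import uuid


     class InMemoryStore:
         """Hypothetical stand-in for an OpenStack resource API (e.g. networks)."""

         def __init__(self):
             self.objects = {}

         def create(self):
             obj_id = str(uuid.uuid4())
             self.objects[obj_id] = {"id": obj_id}
             return obj_id

         def get(self, obj_id):
             return self.objects[obj_id]

         def list_all(self):
             return list(self.objects.values())

         def delete(self, obj_id):
             del self.objects[obj_id]


     def timed(durations, name, func, *args):
         """Call func(*args) and append its duration (ms) to durations[name]."""
         start = time.perf_counter()
         result = func(*args)
         elapsed_ms = (time.perf_counter() - start) * 1000.0
         durations.setdefault(name, []).append(elapsed_ms)
         return result


     def cyclic_iteration(store, durations):
         # create -> get -> list -> delete: the object count is unchanged afterwards
         obj_id = timed(durations, "create", store.create)
         timed(durations, "get", store.get, obj_id)
         timed(durations, "list", store.list_all)
         timed(durations, "delete", store.delete, obj_id)


     def accumulative_iteration(store, durations):
         # create -> get -> list: one more object remains after every iteration
         obj_id = timed(durations, "create", store.create)
         timed(durations, "get", store.get, obj_id)
         timed(durations, "list", store.list_all)


     store, durations = InMemoryStore(), {}
     for _ in range(10):
         cyclic_iteration(store, durations)
     print(len(store.objects))  # 0
     for _ in range(10):
         accumulative_iteration(store, durations)
     print(len(store.objects))  # 10
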
 Duration metrics
 ^^^^^^^^^^^^^^^^
 Duration metrics are collected with the help of the cyclic scenario.
 Actions:
 #. Set concurrency to 1 thread.
 #. Run the scenario N times, where N is large enough to make a good sample.
    Collect the list of operation durations.
 #. For every operation, calculate the median and the 95th and 99th
    percentiles.
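
 The last step can be sketched in Python. ``statistics.median`` comes from the
 standard library; ``percentile`` is an illustrative helper that interpolates
 linearly between closest ranks, which matches the default method of
 ``numpy.percentile``. Other percentile definitions give slightly different
 values on small samples, so it is worth reporting which one was used.

 .. code-block:: python

     import statistics


     def percentile(values, p):
         """Return the p-th percentile (0-100) with linear interpolation."""
         ordered = sorted(values)
         if not ordered:
             raise ValueError("no values")
         # fractional rank between 0 and len - 1
         k = (len(ordered) - 1) * p / 100.0
         lower = int(k)
         upper = min(lower + 1, len(ordered) - 1)
         return ordered[lower] + (ordered[upper] - ordered[lower]) * (k - lower)


     def duration_metrics(durations_ms):
         """Median, 95th and 99th percentiles of one operation's durations."""
         return {
             "median": statistics.median(durations_ms),
             "p95": percentile(durations_ms, 95),
             "p99": percentile(durations_ms, 99),
         }


     # Hypothetical sample: durations of 1, 2, ..., 100 ms
     print(duration_metrics([float(x) for x in range(1, 101)]))
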
 Concurrency and throughput metrics
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
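 These metrics are collected with the help of cyclic scenarios.
 Actions:
 #. Start with concurrency set to 1 thread.
 #. Run the scenario N times, where N is large enough to make a good sample.
    Collect the list of operation durations.
 #. Calculate throughput (divide the number of operations by the total
    duration).
 #. Increase concurrency and repeat the previous steps until throughput stops
    growing or operation duration degrades significantly. The concurrency at
    which throughput reaches its maximum is the Concurrency metric.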
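
 A sketch of this measurement in Python, assuming the operation is a
 zero-argument callable. It repeats the throughput calculation at increasing
 concurrency levels, following the definition of Concurrency in the
 conventions: the number of parallel operations at which throughput reaches
 its maximum. The levels are an arbitrary assumption, and
 ``measure_throughput`` and ``find_max_throughput`` are illustrative names. A
 real measurement would also check that operation durations do not degrade
 significantly at the chosen level.

 .. code-block:: python

     import time
     from concurrent.futures import ThreadPoolExecutor


     def measure_throughput(operation, times, concurrency):
         """Run ``operation`` ``times`` times in ``concurrency`` threads.

         Returns throughput in operations per second: the number of operations
         divided by the total wall-clock duration of the run.
         """
         start = time.perf_counter()
         with ThreadPoolExecutor(max_workers=concurrency) as pool:
             # list() waits for every operation's result before the clock stops
             list(pool.map(lambda _: operation(), range(times)))
         total_seconds = time.perf_counter() - start
         return times / total_seconds


     def find_max_throughput(operation, times, levels=(1, 2, 4, 8, 16)):
         """Measure throughput at each concurrency level and pick the best.

         Returns (concurrency, throughput, all_results).
         """
         results = {c: measure_throughput(operation, times, c) for c in levels}
         best = max(results, key=results.get)
         return best, results[best], results


     # Hypothetical I/O-bound operation standing in for an API call
     best, ops_per_sec, results = find_max_throughput(
         lambda: time.sleep(0.005), times=32)
     print(best, round(ops_per_sec, 1))
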
 Scale impact metrics
 ^^^^^^^^^^^^^^^^^^^^
 These metrics are collected with the help of accumulative scenarios.
 Actions:
 #. Set concurrency to 1 thread.
 #. Run the scenario until the desired number of objects is reached (e.g.
    1 thousand).
 #. Calculate the mean operation duration for the first 50 objects and for the
    last 50.
 #. Calculate the ratio between the two means.
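
 A sketch of this calculation in Python, assuming ``durations_ms`` holds the
 durations of one operation type (for example ``create``) in execution order,
 so that the first and last 50 entries correspond to the first and last 50
 objects. The result is expressed in percent, matching the unit in the metrics
 table; a value above 100% means the operation became slower as objects
 accumulated. ``scale_impact`` is an illustrative name.

 .. code-block:: python

     def scale_impact(durations_ms, window=50):
         """Return mean(last window) / mean(first window) * 100, in percent."""
         if len(durations_ms) < 2 * window:
             raise ValueError("need at least 2 * window measurements")
         mean_first = sum(durations_ms[:window]) / window
         mean_last = sum(durations_ms[-window:]) / window
         return mean_last / mean_first * 100.0


     # Hypothetical run of 1000 creates that slow down from 10 ms to 20 ms
     sample = [10.0] * 50 + [15.0] * 900 + [20.0] * 50
     print(scale_impact(sample))  # 200.0
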
 .. references:
 .. _Rally: http://rally.readthedocs.io/