OpenStack API metrics test plan update
Change-Id: I717a681b8fd6a151824a41f5c13ec24b083c7363
parent f9482e714a
commit 4af3806924
New binary image files:
  doc/source/test_plans/openstack_api_metrics/content/objects.png (101 KiB)
  doc/source/test_plans/openstack_api_metrics/content/rps.png (95 KiB)
  plus two further images (75 KiB and 66 KiB) whose paths are not preserved in this rendering
@@ -0,0 +1,17 @@
{
    "Authenticate.keystone": [
        {
            "runner": {
                "type": "constant",
                "times": {{times}},
                "concurrency": {{concurrency}}
            },
            "context": {
                "users": {
                    "tenants": 2,
                    "users_per_tenant": 3
                }
            }
        }
    ]
}
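
The '{{times}}' and '{{concurrency}}' placeholders are Jinja2-style template
variables; Rally can substitute them itself when a task is started with task
arguments, but a minimal standalone sketch of the substitution (the concrete
values below are only illustrative) could look like this:

.. code-block:: python

    import json

    from jinja2 import Template

    # Illustrative values; the test plan leaves the actual range configurable.
    task_args = {"times": 500, "concurrency": 5}

    with open("scenario.json") as f:          # the template shown above
        rendered = Template(f.read()).render(**task_args)

    task = json.loads(rendered)               # placeholders replaced, valid JSON
    print(task["Authenticate.keystone"][0]["runner"])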

@@ -135,7 +135,13 @@ operation is executed several times to collect more reliable statistical data.
Parameters
^^^^^^^^^^

The operation being measured is the primary parameter; the following
parameters are configurable:

#. Operation being measured
#. Concurrency range for concurrency-dependent metrics
#. Upper bound on the number of objects for object-count-dependent metrics
#. Degradation coefficient for each metric
#. Sample size

List of performance metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -188,8 +194,8 @@ List of performance metrics
Tools
=====

Rally + Metrics2
----------------

This test plan can be executed with the `Rally`_ tool. Rally can report the
duration of individual operations and can be configured to perform operations
@@ -202,6 +208,29 @@ grouped by resource type. E.g. instead of having 4 separate scenarios for
create, get, list and delete operations have 1 that calls these operations
sequentially.

To make report generation simple there is the `Metrics2`_ tool. It is a
combination of a Python script that triggers Rally task execution and a Jupyter
notebook that takes the reports generated by Rally, calculates the metrics and
draws plots based on them. Before executing the metric measurements, the script
also runs a dummy scenario to 'prepare' the deployment.
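
For illustration, such a driver can be sketched on top of the classic Rally
CLI. The result layout assumed below (a 'result' list of iterations with
'load_duration' alongside it) varies between Rally versions, so treat those
key names as assumptions:

.. code-block:: python

    import json
    import subprocess

    task_args = {"times": 500, "concurrency": 5}

    # Start the task; Rally renders the {{times}}/{{concurrency}} placeholders
    # from the task arguments.
    subprocess.check_call(["rally", "task", "start", "scenario.json",
                           "--task-args", json.dumps(task_args)])

    # Fetch the raw results as JSON (a task UUID can be passed explicitly).
    results = json.loads(subprocess.check_output(["rally", "task", "results"]))
    iterations = results[0]["result"]          # assumed layout, see note above
    print(len(iterations), "iterations,",
          results[0].get("load_duration"), "s load duration")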

Rally atomic action measurement augmentation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some Rally scenarios use polling to check whether the object of the operation
is ready/active. This creates additional load on the OpenStack deployment, so
it was decided to fork Rally and use AMQP events to obtain more reliable atomic
action measurements. This was done using `Pika`_; however, the results have
shown that in most cases the atomic action time measured via AMQP is within the
variance range of the time measured with polling, and the difference is
noticeable only when the degradation of the operation itself is very high. The
plot below shows that difference; vertical lines mark the 1.5 degradation
point.

.. image:: content/boot_server.png
   :width: 650px
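
As a reference for the AMQP-based measurement, a minimal Pika consumer could
look like the sketch below (pika 1.x API). The broker address, credentials,
exchange and routing key are assumptions that depend on how the deployment
emits notifications, not something fixed by this plan:

.. code-block:: python

    import json

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(
        host="rabbitmq.example.net",
        credentials=pika.PlainCredentials("guest", "guest")))
    channel = connection.channel()

    # Bind a throwaway queue to the (assumed) notification exchange/topic.
    queue = channel.queue_declare(queue="", exclusive=True).method.queue
    channel.queue_bind(exchange="nova", queue=queue,
                       routing_key="notifications.info")

    def on_event(ch, method, properties, body):
        message = json.loads(body)
        # Depending on the oslo.messaging version the payload may be wrapped
        # in an envelope: {"oslo.message": "<json string>", ...}.
        if "oslo.message" in message:
            message = json.loads(message["oslo.message"])
        # Timestamps of *.start/*.end events give the server-side duration of
        # an atomic action without polling.
        print(message.get("event_type"), message.get("timestamp"))

    channel.basic_consume(queue=queue, on_message_callback=on_event,
                          auto_ack=True)
    channel.start_consuming()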

Scenarios
^^^^^^^^^

@@ -211,28 +240,67 @@ To perform measurements we will need 2 types of scenarios:
* **accumulative** - a sequence of `create`, `get` and `list` operations;
  the total number of objects keeps increasing.

Scenarios should be prepared in the following format:

#. Constant runner with the times field set to '{{times}}' and concurrency
   set to '{{concurrency}}'.
#. It is better to turn off SLA so that it does not interfere with the
   `Metrics2`_ report compilation later.

Example:

.. literalinclude:: content/scenario.json

Duration metrics
^^^^^^^^^^^^^^^^

Duration metrics are collected with the help of a cyclic scenario. They are
effectively part of the concurrency metrics, with concurrency set to 1: run the
scenario N times, where N is large enough to make a good sample, collect the
list of operation durations, and calculate the median and percentiles for every
operation.

Concurrency metrics
^^^^^^^^^^^^^^^^^^^

These metrics are collected with the help of cyclic scenarios.

Actions:

#. For each concurrency value x from the defined concurrency range (starting
   from 1): run the scenario N times, where N is large enough to make a good
   sample, and collect the list of operation durations.
#. Calculate the duration mean and variance of each atomic action.
#. For each atomic action, find the first concurrency value where the duration
   mean exceeds the base value times the degradation coefficient.
#. Generate a plot.

Note that this metric is effectively a monotonic function of concurrency, so
the degradation point can be found with a binary search, for example.

Example report:

.. image:: content/concurrency.png
   :width: 650px
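
A minimal sketch of the degradation-point calculation (not the Metrics2
implementation itself); ``samples`` is hypothetical data mapping concurrency to
the measured durations of one atomic action:

.. code-block:: python

    from statistics import mean

    def degradation_concurrency(samples, coefficient=1.5):
        """Return the first concurrency whose mean duration exceeds the
        baseline (concurrency 1) times the degradation coefficient."""
        baseline = mean(samples[1])
        # Linear scan; since the curve is monotonic, a binary search over the
        # concurrency range would work as well.
        for concurrency in sorted(samples):
            if mean(samples[concurrency]) > coefficient * baseline:
                return concurrency
        return None

    samples = {1: [1.0, 1.1], 2: [1.2, 1.3], 4: [1.4, 1.5],
               8: [1.9, 2.1], 16: [3.0, 3.2]}
    print(degradation_concurrency(samples))    # -> 8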

Throughput metrics
^^^^^^^^^^^^^^^^^^

These metrics are collected with the help of cyclic scenarios.

Actions:

#. For each concurrency value x from the defined concurrency range: run the
   scenario N times, where N is large enough to make a good sample, and
   collect the list of operation durations.
#. Calculate requests per second (RPS) using the 'load_duration' value from
   the Rally report: divide the number of operations by the load duration.
#. Find the concurrency value where RPS reaches its peak.
#. Generate a plot.

This metric usually behaves like a bell curve, but not always, so it must be
calculated in linear time across all points in the concurrency range.

Example report:

.. image:: content/rps.png
   :width: 650px
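
Correspondingly, a sketch of locating the RPS peak; ``runs`` is hypothetical
data mapping concurrency to the number of operations and the reported
'load_duration':

.. code-block:: python

    def rps_peak(runs):
        """Return (concurrency, rps) at the throughput peak."""
        best = max(runs, key=lambda c: runs[c][0] / runs[c][1])
        times, load_duration = runs[best]
        return best, times / load_duration

    runs = {1: (100, 120.0), 2: (100, 65.0), 4: (100, 40.0),
            8: (100, 38.0), 16: (100, 55.0)}
    print(rps_peak(runs))    # -> (8, ~2.6 requests per second)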

Scale impact metrics
^^^^^^^^^^^^^^^^^^^^

@@ -240,11 +308,20 @@ Scale impact metrics
These metrics are collected with the help of accumulative scenarios.

Actions:

#. Set concurrency to a low value, such as 3; this does not affect the metric
   measurement, but speeds up the report generation process.
#. Run the scenario until the upper bound on the number of objects is reached
   (e.g. one thousand).
#. Find the first object count at which the degradation of the operation
   duration exceeds the defined value (degradation can be measured, for
   example, as the ratio between the mean duration over the last 50 objects
   and the mean over the first 50).
#. Generate a plot.

Example report:

.. image:: content/objects.png
   :width: 650px
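
A minimal sketch of locating that point from the per-iteration durations of an
accumulative run; the window size and coefficient below are illustrative, not
prescribed by the plan:

.. code-block:: python

    from statistics import mean

    def degradation_object_count(durations, coefficient=1.5, window=50):
        """Return the iteration (= object count) at which the rolling mean
        duration first exceeds the baseline mean times the coefficient."""
        if len(durations) < 2 * window:
            return None
        baseline = mean(durations[:window])        # e.g. the first 50 objects
        for i in range(2 * window, len(durations) + 1):
            if mean(durations[i - window:i]) > coefficient * baseline:
                return i
        return None

    # Durations indexed by iteration, e.g. extracted from a Rally report.
    print(degradation_object_count([0.5] * 300 + [0.9] * 300))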

.. references:

.. _Rally: http://rally.readthedocs.io/
.. _Metrics2: https://github.com/dudkamaster/metrics2
.. _Pika: http://pika.readthedocs.io/