Publish reports for reliability testing 2.0
Change-Id: Ibe31d2674dfde70c0c2d349154c1f94c8e4fc86e
@@ -1,4 +1,4 @@
-.. _reliability_testing:
+.. _reliability_testing_version_2:

 ==========================================
 OpenStack reliability testing. Version 2.0
@@ -18,11 +18,13 @@ OpenStack reliability testing. Version 2.0

 - **MTTR** - mean time to recover service performance after the fault.

-- **Service Downtime** - the time when service was not available and number
-  of errors is more than defined by SLA.
+- **Service Downtime** - the time when the service was not available.

-- **Operation Degradation** - the difference in operation performance
-  compared with performance when service operates normally.
+- **Absolute performance degradation** - the absolute difference between the
+  mean operation duration during the recovery period and the baseline mean.
+
+- **Relative performance degradation** - the ratio of the mean operation
+  duration during the recovery period to the baseline mean.

 - **Fault injection** - the function that emulates failure in software or
   hardware.
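The two degradation metrics defined in this hunk reduce to simple arithmetic on operation durations. The Python sketch below assumes that the durations (in seconds) of the baseline operations and of the operations inside the recovery period are already available as lists. The function names are hypothetical and are not part of Rally or RallyRunners.

```python
from statistics import mean


def absolute_degradation(baseline, recovery):
    """Absolute performance degradation, in seconds.

    Difference between the mean operation duration during the recovery
    period and the baseline mean (hypothetical helper, for illustration).
    """
    return mean(recovery) - mean(baseline)


def relative_degradation(baseline, recovery):
    """Relative performance degradation, as a ratio.

    Mean operation duration during the recovery period divided by the
    baseline mean (hypothetical helper, for illustration).
    """
    return mean(recovery) / mean(baseline)


if __name__ == "__main__":
    baseline = [0.12, 0.13, 0.14]  # hypothetical durations, seconds
    recovery = [1.0, 1.2, 1.4]     # hypothetical durations, seconds
    print(round(absolute_degradation(baseline, recovery), 2))
    print(round(relative_degradation(baseline, recovery), 1))
```

The published data provides a cross-check. Run #1 of the Keystone API restart report lists a baseline mean of 0.077 s and an absolute degradation of 0.51 s. That puts the recovery-period mean at roughly 0.587 s, and 0.587 / 0.077 ≈ 7.6 is the relative degradation reported for that run.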
@@ -201,14 +203,14 @@ Overall the following metrics need to be collected:
      - How long does it takes to recover service performance after the failure.
    *
      - 1
-     - Operation Degradation
+     - Absolute performance degradation
      - sec
      - the mean of difference in operation performance during recovery period
        and operation performance when service operates normally.
    *
      - 1
-     - Operation Degradation Ratio
-     - sec
+     - Relative performance degradation
+     - ratio
      - the ratio between operation performance during recovery period and
        operation performance when service operates normally.

@@ -252,13 +254,45 @@ succeed operation.
 To find the recovery period we first calculate the mean duration of
 consequent operations with sliding window. The period is treated as
 `Recovery period` when mean operation duration is significantly more than
-the mean operation duration in the baseline. `Operation degradation` is
-calculated as difference between mean of operation duration during Recovery
-period and the baseline's. `Operation ratio` is the ratio between mean of
-operation duration during Recovery period and the baseline's.
+the mean operation duration in the baseline. The average duration of the
+recovery period is the `MTTR` value. `Absolute performance degradation` is
+the difference between the mean operation duration during the recovery
+period and the baseline mean. `Relative performance degradation` is the ratio
+of the mean operation duration during the recovery period to the baseline mean.

+
+How to run
+^^^^^^^^^^
+
+Prerequisites:
+
+* Install the `Rally`_ tool and configure deployment parameters.
+
+* Verify that Rally is properly installed by running ``rally show flavors``.
+
+* Install the `os-faults`_ library: ``pip install os-faults``.
+
+* Configure cloud and power management parameters; refer to `os-faults-cfg`_.
+* Verify the parameters by running ``os-inject-fault -v``.
+
+* Install the `RallyRunners`_ tool: ``pip install rally-runners``.
+
+Run scenarios:
+``rally-reliability -s SCENARIO -o OUTPUT -b BOOK``
+
+To show the full list of scenarios:
+``rally-reliability -h``
+
+
+Reports
+=======
+
+Test plan execution reports:
+
+* :ref:`reliability_test_results_version_2`
+

 .. references:

 .. _Rally: https://rally.readthedocs.io/
 .. _os-faults: https://os-faults.readthedocs.io/
+.. _os-faults-cfg: http://os-faults.readthedocs.io/en/latest/readme.html#usage
+.. _RallyRunners: https://github.com/shakhat/rally-runners
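The recovery-period search described in the last hunk can be sketched in Python. The test plan only says that the windowed mean has to be "significantly more" than the baseline mean. This sketch therefore assumes a hypothetical threshold of the baseline mean plus `k` baseline standard deviations, and a hypothetical window of 5 operations. Neither value, nor the function name, comes from RallyRunners.

```python
from statistics import mean, stdev


def find_recovery_period(durations, baseline, window=5, k=3.0):
    """Return (first, last) indexes of the degraded operations, or None.

    Assumptions, not taken from the test plan: a window of operations is
    degraded when its mean duration exceeds mean(baseline) + k * stdev(baseline),
    and the recovery period is the first contiguous run of degraded windows.
    """
    threshold = mean(baseline) + k * stdev(baseline)
    start = end = None
    for i in range(len(durations) - window + 1):
        if mean(durations[i:i + window]) > threshold:
            if start is None:
                start = i
            end = i + window - 1
        elif start is not None:
            break  # the windowed mean is back below the threshold
    if start is None:
        return None
    # The window smears the boundaries, so trim operations at both ends that
    # are not slow on their own. A degraded window always contains at least
    # one operation above the threshold, so both loops terminate.
    while durations[start] <= threshold:
        start += 1
    while durations[end] <= threshold:
        end -= 1
    return start, end


if __name__ == "__main__":
    durations = [0.1, 0.11, 0.09, 0.1, 0.1, 0.1, 1.5, 1.4, 1.2,
                 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
    print(find_recovery_period(durations, durations[:5]))  # (6, 8)
```

Turning the returned indexes into a time to recover requires the start and end timestamps of those operations, which the sketch leaves out. Trimming the flagged span to the first and last individually slow operation is a design choice made here to undo the smearing introduced by the window.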
doc/source/test_results/reliability/version_2/index.rst
@@ -0,0 +1,42 @@
.. _reliability_test_results_version_2:

========================================
OpenStack reliability testing. Version 2
========================================

Test results
============

Environment description
^^^^^^^^^^^^^^^^^^^^^^^

This report contains results for the :ref:`reliability_testing_version_2`
test plan. The data was collected in the
:ref:`intel_mirantis_performance_lab`.


Software
~~~~~~~~

This section describes the installed software.

+-----------------+--------------------------------------------+
| Parameter       | Value                                      |
+=================+============================================+
| OS              | Ubuntu 14.04.3                             |
+-----------------+--------------------------------------------+
| OpenStack       | Fuel 9.0 (Mitaka)                          |
+-----------------+--------------------------------------------+
| Networking      | Neutron OVS ML2 plugin with VxLAN and DVR  |
+-----------------+--------------------------------------------+


Reports
^^^^^^^

.. toctree::
   :maxdepth: 1
   :glob:

   reports/*/*/index

Reports are calculated from the
:download:`raw Rally data <raw/raw_data.tar.xz>`.
@@ -0,0 +1,296 @@
Keystone authentication with kill of Keystone on one node
=========================================================

This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
    {% set repeat = repeat|default(5) %}
    Authenticate.keystone:
    {% for iteration in range(repeat) %}
      -
        runner:
          type: "constant_for_duration"
          duration: 30
          concurrency: 20
        context:
          users:
            tenants: 1
            users_per_tenant: 1
        hooks:
          -
            name: fault_injection
            args:
              action: kill keystone service on one node
            trigger:
              name: event
              args:
                unit: iteration
                at: [100]
    {% endfor %}


Summary
-------

In the Fuel architecture, Keystone is deployed behind Apache2, which in turn
sits behind an NGINX front end. In this scenario we kill the Keystone processes
running on one of the controller nodes.

+---------------------+------------+-------------------------------------+-----------------------------------------+
| Service downtime, s | MTTR, s    | Absolute performance degradation, s | Relative performance degradation, ratio |
+=====================+============+=====================================+=========================================+
| 0.038 ±0.081        | 2.28 ±0.23 | 1.21 ±0.35                          | 9.1 ±2.3                                |
+---------------------+------------+-------------------------------------+-----------------------------------------+

Metrics:

* `Service downtime` is the time interval between the first and
  the last error.
* `MTTR` is the mean time to recover service performance after
  the fault.
* `Absolute performance degradation` is the absolute difference between the
  mean operation duration during the recovery period and the baseline mean.
* `Relative performance degradation` is the ratio of the mean operation
  duration during the recovery period to the baseline mean.

Details
-------

This section contains data for the individual scenario runs.


Run #1
^^^^^^

.. image:: plot_1.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+------------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev, s | 95th percentile, s  |
+===========+=============+===========+============+=====================+
| 78        | 0.12        | 0.13      | 0.041      | 0.23                |
+-----------+-------------+-----------+------------+---------------------+


Service downtime
~~~~~~~~~~~~~~~~

The tested service is not available during the following time period(s).

+-----+----------------+
| #   | Downtime, s    |
+=====+================+
| 1   | 0.0034 ±0.0034 |
+-----+----------------+
| 2   | 0.0282 ±0.0014 |
+-----+----------------+


Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 2.711 ±0.023         | 1.30 ±0.39                | 10.8 ±3.0              |
+-----+----------------------+---------------------------+------------------------+


Run #2
^^^^^^

.. image:: plot_2.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+------------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev, s | 95th percentile, s  |
+===========+=============+===========+============+=====================+
| 70        | 0.14        | 0.15      | 0.048      | 0.24                |
+-----------+-------------+-----------+------------+---------------------+


Service downtime
~~~~~~~~~~~~~~~~

The tested service is not available during the following time period(s).

+-----+----------------+
| #   | Downtime, s    |
+=====+================+
| 1   | 0.0047 ±0.0047 |
+-----+----------------+


Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 2.722 ±0.026         | 1.66 ±0.43                | 11.9 ±2.9              |
+-----+----------------------+---------------------------+------------------------+


Run #3
^^^^^^

.. image:: plot_3.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+------------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev, s | 95th percentile, s  |
+===========+=============+===========+============+=====================+
| 84        | 0.15        | 0.16      | 0.058      | 0.27                |
+-----------+-------------+-----------+------------+---------------------+


Service downtime
~~~~~~~~~~~~~~~~

The tested service is not available during the following time period(s).

+-----+----------------+
| #   | Downtime, s    |
+=====+================+
| 1   | 0.1147 ±0.0067 |
+-----+----------------+


Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 2.317 ±0.019         | 1.07 ±0.35                | 7.5 ±2.1               |
+-----+----------------------+---------------------------+------------------------+


Run #4
^^^^^^

.. image:: plot_4.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+------------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev, s | 95th percentile, s  |
+===========+=============+===========+============+=====================+
| 87        | 0.14        | 0.16      | 0.051      | 0.25                |
+-----------+-------------+-----------+------------+---------------------+


Service downtime
~~~~~~~~~~~~~~~~

The tested service is not available during the following time period(s).

+-----+----------------+
| #   | Downtime, s    |
+=====+================+
| 1   | 0.0057 ±0.0057 |
+-----+----------------+


Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 1.695 ±0.015         | 1.11 ±0.29                | 8.0 ±1.8               |
+-----+----------------------+---------------------------+------------------------+


Run #5
^^^^^^

.. image:: plot_5.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+------------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev, s | 95th percentile, s  |
+===========+=============+===========+============+=====================+
| 87        | 0.14        | 0.15      | 0.051      | 0.26                |
+-----------+-------------+-----------+------------+---------------------+


Service downtime
~~~~~~~~~~~~~~~~

The tested service is not available during the following time period(s).

+-----+----------------+
| #   | Downtime, s    |
+=====+================+
| 1   | 0.0166 ±0.0044 |
+-----+----------------+
| 2   | 0.0162 ±0.0044 |
+-----+----------------+


Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 1.976 ±0.015         | 0.93 ±0.29                | 7.1 ±1.9               |
+-----+----------------------+---------------------------+------------------------+
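The Summary table of this report appears to be derived from the per-run Details. Each Summary value matches the mean over the five runs, with the downtime periods of each run added together first (run #1, for example, has two periods: 0.0034 s and 0.0282 s). The Python sketch below reproduces the Summary means from the Details tables under that inferred rule. It does not attempt the ± spreads, because the report does not describe how they are computed.

```python
from statistics import mean

# Per-run values copied from the Details section of the report above.
# "downtime" is a list because a run can have several downtime periods.
RUNS = [
    {"downtime": [0.0034, 0.0282], "ttr": 2.711, "abs": 1.30, "rel": 10.8},
    {"downtime": [0.0047], "ttr": 2.722, "abs": 1.66, "rel": 11.9},
    {"downtime": [0.1147], "ttr": 2.317, "abs": 1.07, "rel": 7.5},
    {"downtime": [0.0057], "ttr": 1.695, "abs": 1.11, "rel": 8.0},
    {"downtime": [0.0166, 0.0162], "ttr": 1.976, "abs": 0.93, "rel": 7.1},
]


def summarize(runs):
    """Average each metric over runs; downtime periods are summed per run first.

    This aggregation rule is inferred from the published numbers, not taken
    from the report generator.
    """
    return {
        "downtime": mean(sum(run["downtime"]) for run in runs),
        "mttr": mean(run["ttr"] for run in runs),
        "abs": mean(run["abs"] for run in runs),
        "rel": mean(run["rel"] for run in runs),
    }


summary = summarize(RUNS)
print({name: round(value, 3) for name, value in summary.items()})
```

The computed means (about 0.038 s, 2.28 s, 1.21 s and 9.1) round to the values in the Summary table.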
@@ -0,0 +1,98 @@
Keystone authentication with kill of MySQL on one node
======================================================

This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
    Authenticate.keystone:
      -
        runner:
          type: "constant_for_duration"
          duration: 60
          concurrency: 5
        context:
          users:
            tenants: 1
            users_per_tenant: 1
        hooks:
          -
            name: fault_injection
            args:
              action: kill mysql service on one node
            trigger:
              name: event
              args:
                unit: iteration
                at: [150]


Summary
-------

In this scenario we kill one of the MySQL servers while working with the
Keystone API. In the Fuel architecture, MySQL is deployed with Galera in
active-active mode; however, Keystone loses its connection to the DB with
the following trace::

    (_mysql_exceptions.OperationalError) (2013, "Lost connection to MySQL
    server at 'reading initial communication packet', system error: 0")

+---------------------+------------+-------------------------------------+-----------------------------------------+
| Service downtime, s | MTTR, s    | Absolute performance degradation, s | Relative performance degradation, ratio |
+=====================+============+=====================================+=========================================+
| 14.7 ±1.4           | N/A        | N/A                                 | N/A                                     |
+---------------------+------------+-------------------------------------+-----------------------------------------+

Metrics:

* `Service downtime` is the time interval between the first and
  the last error.
* `MTTR` is the mean time to recover service performance after
  the fault.
* `Absolute performance degradation` is the absolute difference between the
  mean operation duration during the recovery period and the baseline mean.
* `Relative performance degradation` is the ratio of the mean operation
  duration during the recovery period to the baseline mean.


Details
-------

This section contains data for the individual scenario runs.


Run #1
^^^^^^

.. image:: plot_1.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+------------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev, s | 95th percentile, s  |
+===========+=============+===========+============+=====================+
| 135       | 0.071       | 0.074     | 0.012      | 0.09                |
+-----------+-------------+-----------+------------+---------------------+


Service downtime
~~~~~~~~~~~~~~~~

The tested service is not available during the following time period(s).

+-----+---------------+
| #   | Downtime, s   |
+=====+===============+
| 1   | 14.7 ±2.0     |
+-----+---------------+
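Taken literally, the downtime definition used in these reports (the interval between the first and the last error) gives one value per burst of errors, and a run can have several bursts (run #1 of the Keystone kill report lists two). The Python sketch below assumes that each Rally iteration is available as a hypothetical `(timestamp, succeeded)` pair sorted by time. It treats every run of consecutive failed iterations as one downtime period. This grouping rule is an assumption, not the documented behaviour of the report generator.

```python
def downtime_periods(iterations):
    """Split (timestamp, succeeded) pairs into downtime periods.

    A period is a maximal run of consecutive failed iterations; its length
    is the time between its first and last error, so a lone error yields a
    zero-length period. The input must be sorted by timestamp.
    """
    periods = []
    first = last = None
    for timestamp, succeeded in iterations:
        if not succeeded:
            if first is None:
                first = timestamp
            last = timestamp
        elif first is not None:
            periods.append(last - first)
            first = last = None
    if first is not None:  # the run ended while the service was still failing
        periods.append(last - first)
    return periods


if __name__ == "__main__":
    samples = [(0.0, True), (1.0, False), (1.5, False), (2.0, True),
               (3.0, False), (3.25, False), (4.0, True)]
    print(downtime_periods(samples))  # [0.5, 0.25]
```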
@ -0,0 +1,292 @@
|
|||||||
|
Keystone authentication with Keystone API restart on one node
|
||||||
|
=============================================================
|
||||||
|
|
||||||
|
This report is generated on results collected by execution of the following
|
||||||
|
Rally scenario:
|
||||||
|
|
||||||
|
.. code-block:: yaml
|
||||||
|
|
||||||
|
---
|
||||||
|
{% set repeat = repeat|default(5) %}
|
||||||
|
Authenticate.keystone:
|
||||||
|
{% for iteration in range(repeat) %}
|
||||||
|
-
|
||||||
|
runner:
|
||||||
|
type: "constant_for_duration"
|
||||||
|
duration: 30
|
||||||
|
concurrency: 5
|
||||||
|
context:
|
||||||
|
users:
|
||||||
|
tenants: 1
|
||||||
|
users_per_tenant: 1
|
||||||
|
hooks:
|
||||||
|
-
|
||||||
|
name: fault_injection
|
||||||
|
args:
|
||||||
|
action: restart keystone service on one node
|
||||||
|
trigger:
|
||||||
|
name: event
|
||||||
|
args:
|
||||||
|
unit: iteration
|
||||||
|
at: [100]
|
||||||
|
{% endfor %}
|
||||||
|
|
||||||
|
|
||||||
|
Summary
|
||||||
|
-------
|
||||||
|
|
||||||
|
In Fuel architecture Keystone is deployed behind Apache2, which in turn are
|
||||||
|
behind NGINX front-end. In this scenario we restart Apache2 service, as result
|
||||||
|
Keystone becomes unavailable on one of controller nodes.
|
||||||
|
|
||||||
|
+-----------------------+------------+---------------------------------------+-------------------------------------------+
|
||||||
|
| Service downtime, s | MTTR, s | Absolute performance degradation, s | Relative performance degradation, ratio |
|
||||||
|
+=======================+============+=======================================+===========================================+
|
||||||
|
| 1.07 ±0.76 | 5.44 ±0.47 | 0.41 ±0.22 | 4.7 ±2.0 |
|
||||||
|
+-----------------------+------------+---------------------------------------+-------------------------------------------+
|
||||||
|
|
||||||
|
Metrics:
|
||||||
|
* `Service downtime` is the time interval between the first and
|
||||||
|
the last errors.
|
||||||
|
* `MTTR` is the mean time to recover service performance after
|
||||||
|
the fault.
|
||||||
|
* `Absolute performance degradation` is an absolute difference between
|
||||||
|
the mean of operation duration during recovery period and the baseline's.
|
||||||
|
* `Relative performance degradation` is the ratio between the mean
|
||||||
|
of operation duration during recovery period and the baseline's.
|
||||||
|
|
||||||
|
Details
|
||||||
|
-------
|
||||||
|
|
||||||
|
This section contains individual data for particular scenario runs.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Run #1
|
||||||
|
^^^^^^
|
||||||
|
|
||||||
|
.. image:: plot_1.svg
|
||||||
|
|
||||||
|
Baseline
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Baseline samples are collected before the start of fault injection. They are
|
||||||
|
used to estimate service performance degradation after the fault.
|
||||||
|
|
||||||
|
+-----------+-------------+-----------+-----------+---------------------+
|
||||||
|
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||||
|
+===========+=============+===========+===========+=====================+
|
||||||
|
| 84 | 0.071 | 0.077 | 0.017 | 0.13 |
|
||||||
|
+-----------+-------------+-----------+-----------+---------------------+
|
||||||
|
|
||||||
|
|
||||||
|
Service downtime
|
||||||
|
~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The tested service is not available during the following time period(s).
|
||||||
|
|
||||||
|
+-----+---------------+
|
||||||
|
| # | Downtime, s |
|
||||||
|
+=====+===============+
|
||||||
|
| 1 | 0.88 ±0.75 |
|
||||||
|
+-----+---------------+
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Service performance degradation
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The tested service has measurable performance degradation during the
|
||||||
|
following time period(s).
|
||||||
|
|
||||||
|
+-----+----------------------+---------------------------+------------------------+
|
||||||
|
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||||
|
+=====+======================+===========================+========================+
|
||||||
|
| 1 | 3.549 ±0.034 | 0.51 ±0.25 | 7.6 ±3.3 |
|
||||||
|
+-----+----------------------+---------------------------+------------------------+
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Run #2
|
||||||
|
^^^^^^
|
||||||
|
|
||||||
|
.. image:: plot_2.svg
|
||||||
|
|
||||||
|
Baseline
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Baseline samples are collected before the start of fault injection. They are
|
||||||
|
used to estimate service performance degradation after the fault.
|
||||||
|
|
||||||
|
+-----------+-------------+-----------+-----------+---------------------+
|
||||||
|
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||||
|
+===========+=============+===========+===========+=====================+
|
||||||
|
| 84 | 0.13 | 0.13 | 0.0086 | 0.14 |
|
||||||
|
+-----------+-------------+-----------+-----------+---------------------+
|
||||||
|
|
||||||
|
|
||||||
|
Service downtime
|
||||||
|
~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The tested service is not available during the following time period(s).
|
||||||
|
|
||||||
|
+-----+---------------+
|
||||||
|
| # | Downtime, s |
|
||||||
|
+=====+===============+
|
||||||
|
| 1 | 1.00 ±0.87 |
|
||||||
|
+-----+---------------+
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Service performance degradation
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The tested service has measurable performance degradation during the
|
||||||
|
following time period(s).
|
||||||
|
|
||||||
|
+-----+----------------------+---------------------------+------------------------+
|
||||||
|
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||||
|
+=====+======================+===========================+========================+
|
||||||
|
| 1 | 6.038 ±0.034 | 0.35 ±0.17 | 3.7 ±1.3 |
|
||||||
|
+-----+----------------------+---------------------------+------------------------+
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Run #3
|
||||||
|
^^^^^^
|
||||||
|
|
||||||
|
.. image:: plot_3.svg
|
||||||
|
|
||||||
|
Baseline
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Baseline samples are collected before the start of fault injection. They are
|
||||||
|
used to estimate service performance degradation after the fault.
|
||||||
|
|
||||||
|
+-----------+-------------+-----------+-----------+---------------------+
|
||||||
|
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||||
|
+===========+=============+===========+===========+=====================+
|
||||||
|
| 84 | 0.13 | 0.12 | 0.0077 | 0.14 |
|
||||||
|
+-----------+-------------+-----------+-----------+---------------------+
|
||||||
|
|
||||||
|
|
||||||
|
Service downtime
|
||||||
|
~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The tested service is not available during the following time period(s).
|
||||||
|
|
||||||
|
+-----+---------------+
|
||||||
|
| # | Downtime, s |
|
||||||
|
+=====+===============+
|
||||||
|
| 1 | 0.26 ±0.12 |
|
||||||
|
+-----+---------------+
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Service performance degradation
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The tested service has measurable performance degradation during the
|
||||||
|
following time period(s).
|
||||||
|
|
||||||
|
+-----+----------------------+---------------------------+------------------------+
|
||||||
|
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||||
|
+=====+======================+===========================+========================+
|
||||||
|
| 1 | 6.123 ±0.037 | 0.43 ±0.25 | 4.4 ±2.0 |
|
||||||
|
+-----+----------------------+---------------------------+------------------------+
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Run #4
|
||||||
|
^^^^^^
|
||||||
|
|
||||||
|
.. image:: plot_4.svg
|
||||||
|
|
||||||
|
Baseline
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Baseline samples are collected before the start of fault injection. They are
|
||||||
|
used to estimate service performance degradation after the fault.
|
||||||
|
|
||||||
|
+-----------+-------------+-----------+-----------+---------------------+
|
||||||
|
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||||
|
+===========+=============+===========+===========+=====================+
|
||||||
|
| 84 | 0.13 | 0.13 | 0.0089 | 0.14 |
|
||||||
|
+-----------+-------------+-----------+-----------+---------------------+
|
||||||
|
|
||||||
|
|
||||||
|
Service downtime
|
||||||
|
~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The tested service is not available during the following time period(s).
|
||||||
|
|
||||||
|
+-----+---------------+
|
||||||
|
| # | Downtime, s |
|
||||||
|
+=====+===============+
|
||||||
|
| 1 | 1.02 ±0.73 |
|
||||||
|
+-----+---------------+
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Service performance degradation
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The tested service has measurable performance degradation during the
|
||||||
|
following time period(s).
|
||||||
|
|
||||||
|
+-----+----------------------+---------------------------+------------------------+
|
||||||
|
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||||
|
+=====+======================+===========================+========================+
|
||||||
|
| 1 | 5.860 ±0.027 | 0.25 ±0.13 | 2.9 ±1.1 |
|
||||||
|
+-----+----------------------+---------------------------+------------------------+

Run #5
^^^^^^

.. image:: plot_5.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 87        | 0.13        | 0.13      | 0.019     | 0.14                |
+-----------+-------------+-----------+-----------+---------------------+

Service downtime
~~~~~~~~~~~~~~~~

The tested service is not available during the following time period(s).

+-----+---------------+
| #   | Downtime, s   |
+=====+===============+
| 1   | 2.173 ±0.067  |
+-----+---------------+

Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 5.630 ±0.048         | 0.52 ±0.30                | 5.0 ±2.3               |
+-----+----------------------+---------------------------+------------------------+
@ -0,0 +1,201 @@
Keystone authentication with memcached restart on one node
==========================================================

This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
    {% set repeat = repeat|default(5) %}
    Authenticate.keystone:
    {% for iteration in range(repeat) %}
      -
        runner:
          type: "constant_for_duration"
          duration: 30
          concurrency: 5
        context:
          users:
            tenants: 1
            users_per_tenant: 1
        hooks:
          -
            name: fault_injection
            args:
              action: restart memcached service on one node
            trigger:
              name: event
              args:
                unit: iteration
                at: [100]
    {% endfor %}
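
The Jinja ``{% for %}`` loop emits the same workload ``repeat`` times (5 by
default). This matches the five runs listed under Details. As a sketch of what
the template renders to, the same task can be written as plain Python data.
``keystone_task`` is a hypothetical helper used only for illustration, and its
layout simply mirrors the YAML above.

.. code-block:: python

    import copy
    import json


    def keystone_task(repeat=5):
        """Return the task data the Jinja template above expands to."""
        workload = {
            "runner": {"type": "constant_for_duration",
                       "duration": 30, "concurrency": 5},
            "context": {"users": {"tenants": 1, "users_per_tenant": 1}},
            "hooks": [{
                "name": "fault_injection",
                "args": {"action": "restart memcached service on one node"},
                "trigger": {"name": "event",
                            "args": {"unit": "iteration", "at": [100]}},
            }],
        }
        # The {% for %} loop emits one identical list entry per iteration.
        return {"Authenticate.keystone":
                [copy.deepcopy(workload) for _ in range(repeat)]}


    print(json.dumps(keystone_task(repeat=1), indent=2))
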

Summary
-------

In this scenario we restart the Memcached service on one of the controller
nodes. Memcached is used as a caching backend for Keystone, so Keystone
performance is likely to degrade.

+-----------------------+--------------+---------------------------------------+-------------------------------------------+
| Service downtime, s   | MTTR, s      | Absolute performance degradation, s   | Relative performance degradation, ratio   |
+=======================+==============+=======================================+===========================================+
| N/A                   | 0.458 ±0.068 | 0.057 ±0.034                          | 1.46 ±0.27                                |
+-----------------------+--------------+---------------------------------------+-------------------------------------------+

Metrics:

* `Service downtime` is the time interval between the first and the last
  error.
* `MTTR` is the mean time to recover service performance after the fault.
* `Absolute performance degradation` is the difference, in seconds, between
  the mean operation duration during the recovery period and the baseline
  mean.
* `Relative performance degradation` is the ratio of the mean operation
  duration during the recovery period to the baseline mean.
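
The summary row appears to be the mean of the per-run values over the runs
that show measurable degradation. Averaging Run #2 and Run #5 from the Details
section reproduces the reported MTTR and the absolute and relative
degradation. The check below assumes that aggregation rule. The report does
not state how the ± uncertainties are combined, so they are not reproduced
here.

.. code-block:: python

    import statistics

    # Per-run values copied from Run #2 and Run #5 in the Details section, the
    # runs with measurable degradation; the uncertainties are omitted.
    runs = [
        {"ttr": 0.4059, "abs": 0.069, "rel": 1.57},  # Run #2
        {"ttr": 0.5110, "abs": 0.045, "rel": 1.35},  # Run #5
    ]


    def summarize(runs):
        """Average each metric over the runs that showed degradation."""
        return {key: statistics.mean(run[key] for run in runs)
                for key in ("ttr", "abs", "rel")}


    summary = summarize(runs)
    print(round(summary["ttr"], 3), round(summary["abs"], 3),
          round(summary["rel"], 2))  # 0.458 0.057 1.46
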

Details
-------

This section contains the data for the individual scenario runs.

Run #1
^^^^^^

.. image:: plot_1.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 88        | 0.12        | 0.12      | 0.014     | 0.13                |
+-----------+-------------+-----------+-----------+---------------------+

Run #2
^^^^^^

.. image:: plot_2.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 84        | 0.12        | 0.12      | 0.0078    | 0.13                |
+-----------+-------------+-----------+-----------+---------------------+

Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 0.4059 ±0.0027       | 0.069 ±0.030              | 1.57 ±0.25             |
+-----+----------------------+---------------------------+------------------------+

Run #3
^^^^^^

.. image:: plot_3.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 88        | 0.12        | 0.13      | 0.017     | 0.15                |
+-----------+-------------+-----------+-----------+---------------------+

Run #4
^^^^^^

.. image:: plot_4.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 84        | 0.12        | 0.12      | 0.01      | 0.14                |
+-----------+-------------+-----------+-----------+---------------------+

Run #5
^^^^^^

.. image:: plot_5.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 84        | 0.13        | 0.13      | 0.0086    | 0.14                |
+-----------+-------------+-----------+-----------+---------------------+

Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 0.5110 ±0.0037       | 0.045 ±0.037              | 1.35 ±0.29             |
+-----+----------------------+---------------------------+------------------------+
@ -0,0 +1,160 @@
Create and list networks with kill of one of MySQL servers
==========================================================

This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
    {% set repeat = repeat|default(3) %}
    NeutronNetworks.create_and_list_networks:
    {% for iteration in range(repeat) %}
      -
        args:
          network_create_args: {}
        runner:
          type: "constant_for_duration"
          duration: 60
          concurrency: 4
        context:
          users:
            tenants: 1
            users_per_tenant: 1
          quotas:
            neutron:
              network: -1
        hooks:
          -
            name: fault_injection
            args:
              action: kill mysql service on one node
            trigger:
              name: event
              args:
                unit: iteration
                at: [100]
    {% endfor %}

Summary
-------

In this scenario we kill one of the MySQL servers while working with the
Neutron API. In the Fuel architecture MySQL is deployed with Galera in
active-active mode, so no dramatic impact is expected.

+-----------------------+------------+---------------------------------------+-------------------------------------------+
| Service downtime, s   | MTTR, s    | Absolute performance degradation, s   | Relative performance degradation, ratio   |
+=======================+============+=======================================+===========================================+
| N/A                   | 7.73 ±0.72 | 1.4 ±1.1                              | 3.8 ±2.3                                  |
+-----------------------+------------+---------------------------------------+-------------------------------------------+

Metrics:

* `Service downtime` is the time interval between the first and the last
  error.
* `MTTR` is the mean time to recover service performance after the fault.
* `Absolute performance degradation` is the difference, in seconds, between
  the mean operation duration during the recovery period and the baseline
  mean.
* `Relative performance degradation` is the ratio of the mean operation
  duration during the recovery period to the baseline mean.
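
The report does not describe how the end of the recovery period is detected.
The sketch below shows one hypothetical approach. It scans the operations that
start after the fault and finds the first window of consecutive operations
whose mean duration falls back to a "normal" threshold. One possible
threshold is the baseline 95th percentile. The function name, the window size
and the choice of threshold are all assumptions and do not describe the
report generator's method.

.. code-block:: python

    import statistics


    def time_to_recover(samples, fault_time, threshold, window=5):
        """Hypothetical estimate of the time to recover after a fault.

        samples: (start_time, duration) pairs in seconds, sorted by start time.
        threshold: duration treated as normal, e.g. a baseline percentile.
        Returns the seconds from fault_time to the start of the first run of
        `window` consecutive operations whose mean duration is at or below
        threshold, or None if no such window exists.
        """
        after = [(t, d) for t, d in samples if t >= fault_time]
        for i in range(len(after) - window + 1):
            chunk = after[i:i + window]
            if statistics.mean(d for _, d in chunk) <= threshold:
                return chunk[0][0] - fault_time
        return None


    samples = [(0, 0.5), (1, 0.5), (2, 3.0), (3, 2.5), (4, 0.6),
               (5, 0.5), (6, 0.5), (7, 0.5)]
    print(time_to_recover(samples, fault_time=2, threshold=0.7, window=3))  # 2

Under this assumption, MTTR would be the mean of such per-run values.
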

Details
-------

This section contains the data for the individual scenario runs.

Run #1
^^^^^^

.. image:: plot_1.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 86        | 0.48        | 0.8       | 0.49      | 1.6                 |
+-----------+-------------+-----------+-----------+---------------------+

Run #2
^^^^^^

.. image:: plot_2.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 85        | 0.46        | 0.5       | 0.12      | 0.7                 |
+-----------+-------------+-----------+-----------+---------------------+

Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 6.824 ±0.093         | 1.5 ±1.2                  | 4.1 ±2.5               |
+-----+----------------------+---------------------------+------------------------+

Run #3
^^^^^^

.. image:: plot_3.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 85        | 0.45        | 0.47      | 0.065     | 0.61                |
+-----------+-------------+-----------+-----------+---------------------+

Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 8.63 ±0.12           | 1.18 ±1.00                | 3.5 ±2.1               |
+-----+----------------------+---------------------------+------------------------+
@ -0,0 +1,119 @@
Boot and delete VM with disabling management network on one of controllers
==========================================================================

This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
    NovaServers.boot_and_delete_server:
      -
        args:
          flavor:
            name: "m1.micro"
          image:
            name: "(^cirros.*uec$|TestVM)"
          force_delete: false
        runner:
          type: "constant_for_duration"
          duration: 600
          concurrency: 4
        context:
          users:
            tenants: 1
            users_per_tenant: 1
        hooks:
          -
            name: fault_injection
            args:
              action: disconnect management network on one node with nova-scheduler service
            trigger:
              name: event
              args:
                unit: iteration
                at: [50]
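
Roughly, the ``runner`` and ``trigger`` settings above ask Rally to keep 4
iterations running concurrently for 600 s and to inject the fault when
iteration 50 is reached. The thread-based model below is only a rough sketch
of those semantics, written for illustration; it is not Rally's
implementation. The exact moment at which Rally fires an iteration-based hook
(at the start or at the end of the iteration) is an assumption here.
``constant_for_duration`` is a hypothetical function that merely borrows the
runner's name.

.. code-block:: python

    import itertools
    import threading
    import time


    def constant_for_duration(operation, duration, concurrency,
                              hook_at=(), hook=None):
        """Rough model of a constant-for-duration runner with an event hook.

        `concurrency` worker threads keep calling operation() until `duration`
        seconds have elapsed. Iterations are numbered from 1; when an iteration
        number is listed in `hook_at`, hook(number) runs before that iteration.
        Returns the sorted numbers of all completed iterations.
        """
        counter = itertools.count(1)
        lock = threading.Lock()
        deadline = time.monotonic() + duration
        completed = []

        def worker():
            while time.monotonic() < deadline:
                with lock:
                    number = next(counter)
                if hook is not None and number in hook_at:
                    hook(number)  # e.g. the fault injection action
                operation()
                with lock:
                    completed.append(number)

        threads = [threading.Thread(target=worker) for _ in range(concurrency)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        return sorted(completed)


    fired = []
    done = constant_for_duration(lambda: time.sleep(0.001), duration=0.5,
                                 concurrency=4, hook_at=(50,), hook=fired.append)
    print(len(done), "iterations; hook fired at", fired)

The iteration-based trigger ties the fault to the amount of work done rather
than to wall-clock time, so the fault happens after a comparable baseline in
every run.
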

Summary
-------

In this scenario we disable the management network interface on one of the
controllers (in the Fuel architecture a controller runs the DB, MQ and API
services and the scheduler). This emulates a network outage, such as a failed
network port on the machine or on the switch.

The outage makes all services unreachable from outside. Moreover, the cluster
remains broken even 10 minutes after the fault.

+-----------------------+------------+---------------------------------------+-------------------------------------------+
| Service downtime, s   | MTTR, s    | Absolute performance degradation, s   | Relative performance degradation, ratio   |
+=======================+============+=======================================+===========================================+
| 358.0 ±2.7            | 149.0 ±2.1 | 24 ±17                                | 5.7 ±3.4                                  |
+-----------------------+------------+---------------------------------------+-------------------------------------------+

Metrics:

* `Service downtime` is the time interval between the first and the last
  error.
* `MTTR` is the mean time to recover service performance after the fault.
* `Absolute performance degradation` is the difference, in seconds, between
  the mean operation duration during the recovery period and the baseline
  mean.
* `Relative performance degradation` is the ratio of the mean operation
  duration during the recovery period to the baseline mean.

Details
-------

This section contains the data for the individual scenario runs.

Run #1
^^^^^^

.. image:: plot_1.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 36        | 5.5         | 5.2       | 0.6       | 6                   |
+-----------+-------------+-----------+-----------+---------------------+

Service downtime
~~~~~~~~~~~~~~~~

The tested service is not available during the following time period(s).

+-----+---------------+
| #   | Downtime, s   |
+=====+===============+
| 1   | 126.32 ±0.82  |
+-----+---------------+
| 2   | 231.7 ±6.5    |
+-----+---------------+

Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 149.0 ±4.6           | 24 ±17                    | 5.7 ±3.4               |
+-----+----------------------+---------------------------+------------------------+
@ -0,0 +1,81 @@
Boot and delete VM with kill of RabbitMQ on one of nodes
========================================================

This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
    NovaServers.boot_and_delete_server:
      -
        args:
          flavor:
            name: "m1.micro"
          image:
            name: "(^cirros.*uec$|TestVM)"
          force_delete: false
        runner:
          type: "constant_for_duration"
          duration: 240
          concurrency: 4
        context:
          users:
            tenants: 1
            users_per_tenant: 1
        hooks:
          -
            name: fault_injection
            args:
              action: kill rabbitmq service on one node
            trigger:
              name: event
              args:
                unit: iteration
                at: [60]

Summary
-------

In this scenario we kill one of the running RabbitMQ servers. Once killed,
RabbitMQ is restarted automatically by Pacemaker.

The cloud stays stable: neither errors nor significant performance degradation
are observed. The oslo.messaging library handles the loss of connection to
RabbitMQ and automatically reconnects to one of the other servers::

    AMQP server on 10.43.0.3:5673 is unreachable: timed out. Trying again in
    1 seconds.
    ...
    Reconnected to AMQP server on 10.43.0.6:5673 via [amqp] client
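
The log excerpt shows client-side failover. After a timeout on one server, the
client waits and then connects to another server. The sketch below is a
simplified illustration of that pattern. It is not the oslo.messaging
implementation or API. ``connect_with_failover`` and its ``connect`` callable
are hypothetical, and ``connect`` is assumed to raise ``OSError`` (which
includes ``TimeoutError``) when a server is unreachable.

.. code-block:: python

    import itertools
    import time


    def connect_with_failover(hosts, connect, retry_delay=1.0, max_attempts=10,
                              sleep=time.sleep):
        """Try hosts round-robin until connect(host) succeeds.

        Returns (host, connection). Raises ConnectionError after max_attempts
        failed attempts.
        """
        last_error = None
        for _, host in zip(range(max_attempts), itertools.cycle(hosts)):
            try:
                return host, connect(host)
            except OSError as exc:
                last_error = exc
                print(f"AMQP server on {host} is unreachable: {exc}. "
                      f"Trying again in {retry_delay:g} seconds.")
                sleep(retry_delay)
        raise ConnectionError("no AMQP server is reachable") from last_error


    def fake_connect(host):
        # Stand-in for a real AMQP connection attempt.
        if host == "10.43.0.3:5673":
            raise TimeoutError("timed out")
        return f"connection to {host}"


    host, conn = connect_with_failover(["10.43.0.3:5673", "10.43.0.6:5673"],
                                       fake_connect, sleep=lambda seconds: None)
    print("Reconnected to AMQP server on", host)

Passing ``sleep`` as a parameter lets the example skip real waiting.
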

Details
-------

This section contains the data for the individual scenario runs.

Run #1
^^^^^^

.. image:: plot_1.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 45        | 5.8         | 5.8       | 0.3       | 6.1                 |
+-----------+-------------+-----------+-----------+---------------------+
@ -0,0 +1,113 @@
Boot and delete VM with reboot of one of controllers
====================================================

This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
    NovaServers.boot_and_delete_server:
      -
        args:
          flavor:
            name: "m1.micro"
          image:
            name: "(^cirros.*uec$|TestVM)"
          force_delete: false
        runner:
          type: "constant_for_duration"
          duration: 600
          concurrency: 4
        context:
          users:
            tenants: 1
            users_per_tenant: 1
        hooks:
          -
            name: fault_injection
            args:
              action: reboot one node with rabbitmq service
            trigger:
              name: event
              args:
                unit: iteration
                at: [50]

Summary
-------

In this scenario we reboot one of the controllers (in the Fuel architecture a
controller runs the DB, MQ and API services and the scheduler). The observed
recovery period corresponds to the time needed for the node to reboot, start
its services and get back into a synchronized state.

+-----------------------+--------------+---------------------------------------+-------------------------------------------+
| Service downtime, s   | MTTR, s      | Absolute performance degradation, s   | Relative performance degradation, ratio   |
+=======================+==============+=======================================+===========================================+
| 8.7 ±1.6              | 286.89 ±0.87 | 14.7 ±4.7                             | 3.85 ±0.91                                |
+-----------------------+--------------+---------------------------------------+-------------------------------------------+

Metrics:

* `Service downtime` is the time interval between the first and the last
  error.
* `MTTR` is the mean time to recover service performance after the fault.
* `Absolute performance degradation` is the difference, in seconds, between
  the mean operation duration during the recovery period and the baseline
  mean.
* `Relative performance degradation` is the ratio of the mean operation
  duration during the recovery period to the baseline mean.

Details
-------

This section contains the data for the individual scenario runs.

Run #1
^^^^^^

.. image:: plot_1.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 36        | 5.1         | 5.2       | 0.63      | 6.1                 |
+-----------+-------------+-----------+-----------+---------------------+

Service downtime
~~~~~~~~~~~~~~~~

The tested service is not available during the following time period(s).

+-----+---------------+
| #   | Downtime, s   |
+=====+===============+
| 1   | 8.7 ±2.5      |
+-----+---------------+

Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 286.89 ±0.76         | 14.7 ±4.7                 | 3.85 ±0.91             |
+-----+----------------------+---------------------------+------------------------+