Add reliability test results
This commit adds part of the reliability testing results. The scope of this
commit is testing the Nova API under several factors.

Change-Id: Id3cb644ccf4bd315846399e6ac40a446297787f3

parent 92816644f7
commit f84ec2ce07
@@ -14,43 +14,7 @@ OpenStack reliability testing
 
 :Conventions:
 
-- **OpenStack cluster:** consists of server nodes with deployed and fully
-  operational OpenStack environment in high-availability configuration.
-
-- **Fault-injection operation:** represents common types of failures which can
-  occur in production environment: service-hang, service-crash,
-  network-partition, network-flapping, and node-crash.
-
-  - **Service-hang:** faults are injected into specified OpenStack service by
-    sending -SIGSTOP and -SIGCONT POSIX signals.
-
-  - **Service-crash:** faults are injected by sending -SIGKILL signal into
-    specified OpenStack service.
-
-  - **Node-crash:** faults are injected to an OpenStack cluster by rebooting
-    or shutting down a server node.
-
-  - **Network-partition:** faults are injected by inserting iptables rules to
-    OpenStack cluster nodes to a corresponding service that should be
-    network-partitioned.
-
-  - **Network-flapping:** faults are injected into OpenStack cluster nodes by
-    inserting/deleting iptables rules on the fly which will affect
-    corresponding service that should be tested.
-
-- **Factor:** consists of a set of atomic fault-injection operations. For
-  example: reboot-random-controller, reboot-random-rabbitmq.
-
-- **Test plan:** contains two elements: test scenario
-  execution graph and fault-injection factors.
-
-- **SLA**: Service-level agreement
-
-- **Testing-cycles**: number of test cycles of each factor
-
-- **Inf**: assumes infinite time to auto-healing of cluster
-  after fault-factor injection.
-
+.. include:: plan_conventions.rst
 
 Test Plan
 =========
36
doc/source/test_plans/reliability/plan_conventions.rst
Normal file
@@ -0,0 +1,36 @@
- **OpenStack cluster:** consists of server nodes with a deployed and fully
  operational OpenStack environment in a high-availability configuration.

- **Fault-injection operation:** represents common types of failures that can
  occur in a production environment: service-hang, service-crash,
  network-partition, network-flapping, and node-crash.

  - **Service-hang:** faults are injected into the specified OpenStack service
    by sending the SIGSTOP and SIGCONT POSIX signals.

  - **Service-crash:** faults are injected by sending the SIGKILL signal to
    the specified OpenStack service.

  - **Node-crash:** faults are injected into an OpenStack cluster by rebooting
    or shutting down a server node.

  - **Network-partition:** faults are injected by inserting iptables rules on
    OpenStack cluster nodes to isolate the service that should be
    network-partitioned.

  - **Network-flapping:** faults are injected into OpenStack cluster nodes by
    inserting and deleting iptables rules on the fly, which affects the
    service under test.

- **Factor:** consists of a set of atomic fault-injection operations. For
  example: reboot-random-controller, reboot-random-rabbitmq.

- **Test plan:** consists of two elements: a test scenario
  execution graph and fault-injection factors.

- **SLA:** service-level agreement.

- **Testing-cycles:** number of test cycles for each factor.

- **Inf:** infinite time to auto-healing, that is, the cluster does not
  recover on its own after fault-factor injection.
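
The network-partition and network-flapping operations defined above can be illustrated with a minimal shell sketch. It assumes that the target service listens on a single TCP port and that dropping inbound traffic to that port on a node is an acceptable partition model. The function names and the ``DRY_RUN`` switch are hypothetical and are not part of the scrappy framework.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of scrappy): network-partition and
# network-flapping fault injection with iptables. Assumes the target service
# listens on one TCP port. Running the commands for real requires root;
# set DRY_RUN=1 to print them instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Network-partition: drop inbound TCP traffic to PORT on this node.
partition_port() {
    run iptables -I INPUT -p tcp --dport "$1" -j DROP
}

# Undo partition_port by deleting the same rule.
heal_port() {
    run iptables -D INPUT -p tcp --dport "$1" -j DROP
}

# Network-flapping: insert and delete the rule CYCLES times,
# waiting INTERVAL seconds after each change.
flap_port() {
    local port="$1" cycles="$2" interval="$3" i
    for ((i = 0; i < cycles; i++)); do
        partition_port "$port"
        sleep "$interval"
        heal_port "$port"
        sleep "$interval"
    done
}
```

For example, ``DRY_RUN=1 flap_port 5672 3 10`` prints the commands that would cut off the default AMQP port three times, with 10-second pauses between changes.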

@@ -20,3 +20,4 @@ Test Results
    hardware_features/index
    provisioning/index
    1000_nodes/index
+   reliability/index

BIN
doc/source/test_results/reliability/images/Network_Scheme.png
Normal file
Binary file not shown. Size: 20 KiB
650
doc/source/test_results/reliability/index.rst
Normal file
@@ -0,0 +1,650 @@
.. _reliability_testing_results:

=============================
OpenStack reliability testing
=============================

:status: draft
:version: 0

:Abstract:

  This document describes an abstract methodology for OpenStack cluster
  high-availability testing and analysis. OpenStack data plane testing
  is currently out of scope and will be described in the future.

:Conventions:

.. include:: ../../test_plans/reliability/plan_conventions.rst


Test results
============

Test environment
----------------


Software configuration on servers with OpenStack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: **Basic cluster configuration**

   +-------------------------+---------------------------------------------+
   |Name                     | Build-9.0.0-451                             |
   +-------------------------+---------------------------------------------+
   |OpenStack release        | Mitaka on Ubuntu 14.04                      |
   +-------------------------+---------------------------------------------+
   |Total nodes              | 6 nodes                                     |
   +-------------------------+---------------------------------------------+
   |Controller               | 3 nodes                                     |
   +-------------------------+---------------------------------------------+
   |Compute, Ceph OSD        | 3 nodes with KVM hypervisor                 |
   +-------------------------+---------------------------------------------+
   |Network                  | Neutron with tunneling segmentation         |
   +-------------------------+---------------------------------------------+
   |Storage back ends        | | Ceph RBD for volumes (Cinder)             |
   |                         | | Ceph RadosGW for objects (Swift API)      |
   |                         | | Ceph RBD for ephemeral volumes (Nova)     |
   |                         | | Ceph RBD for images (Glance)              |
   +-------------------------+---------------------------------------------+


Software configuration on servers with Rally role
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before you start configuring a server with the Rally role, verify that Rally
is installed. For more information, see `Rally installation documentation`_.

.. table:: **Software version of Rally server**

   +------------+-------------------+
   |Software    |Version            |
   +============+===================+
   |Rally       |0.4.0              |
   +------------+-------------------+
   |Ubuntu      |14.04.3 LTS        |
   +------------+-------------------+


Environment description
^^^^^^^^^^^^^^^^^^^^^^^

Hardware
~~~~~~~~

.. table:: **Description of server hardware**

   +--------+-----------------+------------------------+------------------------+
   |SERVER  |name             | | 728997-comp-disk-228 | 729017-comp-disk-255   |
   |        |                 | | 728998-comp-disk-227 |                        |
   |        |                 | | 728999-comp-disk-226 |                        |
   |        |                 | | 729000-comp-disk-225 |                        |
   |        |                 | | 729001-comp-disk-224 |                        |
   |        |                 | | 729002-comp-disk-223 |                        |
   |        +-----------------+------------------------+------------------------+
   |        |role             | | controller           | Rally                  |
   |        |                 | | controller           |                        |
   |        |                 | | controller           |                        |
   |        |                 | | compute, ceph-osd    |                        |
   |        |                 | | compute, ceph-osd    |                        |
   |        |                 | | compute, ceph-osd    |                        |
   |        +-----------------+------------------------+------------------------+
   |        |vendor, model    |HP, DL380 Gen9          |HP, DL380 Gen9          |
   |        +-----------------+------------------------+------------------------+
   |        |operating_system | | 3.13.0-87-generic    | | 3.13.0-87-generic    |
   |        |                 | | Ubuntu-trusty        | | Ubuntu-trusty        |
   |        |                 | | x86_64               | | x86_64               |
   +--------+-----------------+------------------------+------------------------+
   |CPU     |vendor, model    |Intel, E5-2680 v3       |Intel, E5-2680 v3       |
   |        +-----------------+------------------------+------------------------+
   |        |processor_count  |2                       |2                       |
   |        +-----------------+------------------------+------------------------+
   |        |core_count       |12                      |12                      |
   |        +-----------------+------------------------+------------------------+
   |        |frequency_MHz    |2500                    |2500                    |
   +--------+-----------------+------------------------+------------------------+
   |RAM     |vendor, model    |HP, 752369-081          |HP, 752369-081          |
   |        +-----------------+------------------------+------------------------+
   |        |amount_MB        |262144                  |262144                  |
   +--------+-----------------+------------------------+------------------------+
   |NETWORK |interface_name   |p1p1                    |p1p1                    |
   |        +-----------------+------------------------+------------------------+
   |        |vendor, model    |Intel, X710 Dual Port   |Intel, X710 Dual Port   |
   |        +-----------------+------------------------+------------------------+
   |        |bandwidth        |10 Gbit                 |10 Gbit                 |
   +--------+-----------------+------------------------+------------------------+
   |STORAGE |dev_name         |/dev/sda                |/dev/sda                |
   |        +-----------------+------------------------+------------------------+
   |        |vendor, model    | | raid10 - HP P840     | | raid10 - HP P840     |
   |        |                 | | 12 disks EH0600JEDHE | | 12 disks EH0600JEDHE |
   |        +-----------------+------------------------+------------------------+
   |        |SSD/HDD          |HDD                     |HDD                     |
   |        +-----------------+------------------------+------------------------+
   |        |size             |3.6 TB                  |3.6 TB                  |
   +--------+-----------------+------------------------+------------------------+


Software
~~~~~~~~

.. table:: **Services on servers by role**

   +------------+----------------------------+
   |Role        |Service name                |
   +============+============================+
   |controller  || horizon                   |
   |            || keystone                  |
   |            || nova-api                  |
   |            || nova-scheduler            |
   |            || nova-cert                 |
   |            || nova-conductor            |
   |            || nova-consoleauth          |
   |            || nova-consoleproxy         |
   |            || cinder-api                |
   |            || cinder-backup             |
   |            || cinder-scheduler          |
   |            || cinder-volume             |
   |            || glance-api                |
   |            || glance-glare              |
   |            || glance-registry           |
   |            || neutron-dhcp-agent        |
   |            || neutron-l3-agent          |
   |            || neutron-metadata-agent    |
   |            || neutron-openvswitch-agent |
   |            || neutron-server            |
   |            || heat-api                  |
   |            || heat-api-cfn              |
   |            || heat-api-cloudwatch       |
   |            || ceph-mon                  |
   |            || rados-gw                  |
   |            || heat-engine               |
   |            || memcached                 |
   |            || rabbitmq_server           |
   |            || mysqld                    |
   |            || galera                    |
   |            || corosync                  |
   |            || pacemaker                 |
   |            || haproxy                   |
   +------------+----------------------------+
   |compute-osd || nova-compute              |
   |            || neutron-l3-agent          |
   |            || neutron-metadata-agent    |
   |            || neutron-openvswitch-agent |
   |            || ceph-osd                  |
   +------------+----------------------------+


High availability cluster architecture
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Controller nodes:

.. image:: https://docs.mirantis.com/openstack/fuel/fuel-8.0/_images/logical-diagram-controller.svg
   :height: 700px
   :width: 600px
   :alt: Mirantis reference HA architecture

Compute nodes:

.. image:: https://docs.mirantis.com/openstack/fuel/fuel-8.0/_images/logical-diagram-compute.svg
   :height: 250px
   :width: 350px
   :alt: Mirantis reference HA architecture


Networking
~~~~~~~~~~

All servers have a similar network configuration:

.. image:: images/Network_Scheme.png
   :alt: Network Scheme of the environment

The following example shows part of the switch configuration for each switch
port that is connected to the ens1f0 interface of a server:

.. code:: text

   switchport mode trunk
   switchport trunk native vlan 600
   switchport trunk allowed vlan 600-602,630-649
   spanning-tree port type edge trunk
   spanning-tree bpduguard enable
   no snmp trap link-status


Factors description
-------------------

- **reboot-random-controller:** consists of a node-crash fault injection on a
  random OpenStack controller node.

- **sigkill-random-rabbitmq:** consists of a service-crash fault injection on
  a random slave RabbitMQ messaging node.

- **sigkill-random-mysql:** consists of a service-crash fault injection on a
  random MySQL node.

- **freeze-random-nova-api:** consists of a service-hang fault injection into
  all nova-api processes on a random controller node for 150 seconds.

- **freeze-random-memcached:** consists of a service-hang fault injection into
  the memcached service on a random controller node for 150 seconds.

- **freeze-random-keystone:** consists of a service-hang fault injection into
  the keystone service (public and admin endpoints) on a random controller
  node for 150 seconds.
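
The signal-based factors above come down to a few ``kill`` calls. The following minimal sketch assumes that the PIDs of the target service on the selected node have already been resolved, for example with ``pgrep -f nova-api``. Random node selection and remote execution are out of scope, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: signal-based fault injection for already known PIDs.

# Service-hang: stop the processes, keep them frozen for DURATION seconds,
# then resume them (the freeze-random-* factors use a 150-second period).
freeze_pids() {
    local duration="$1"
    shift
    kill -STOP "$@"
    sleep "$duration"
    kill -CONT "$@"
}

# Service-crash: kill the processes immediately with SIGKILL.
kill_pids() {
    kill -KILL "$@"
}
```

For example, ``freeze_pids 150 $(pgrep -f nova-api)`` approximates one freeze-random-nova-api injection on the node where it runs.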


Testing process
===============

Use the following VM parameters for testing purposes:

.. table:: **Test parameters**

   +--------------------------------+--------+
   |Name                            |Value   |
   +================================+========+
   |Flavor to create VM from        |m1.tiny |
   +--------------------------------+--------+
   |Image name to create VM from    |cirros  |
   +--------------------------------+--------+


#. Create a work directory on a server with the Rally role. In this
   documentation, this directory is called ``WORK_DIR``, for example,
   ``/data/rally``.

#. Create a ``plugins`` directory in ``WORK_DIR`` and copy the
   :download:`scrappy.py <rally_plugins/scrappy.py>` plugin into that directory.

#. Download the bash framework :download:`scrappy.sh <rally_plugins/scrappy.sh>`
   and :download:`scrappy.conf <rally_plugins/scrappy.conf>` to
   ``WORK_DIR/plugins``.

#. Modify the ``scrappy.conf`` file with appropriate values. For example:

   .. literalinclude:: rally_plugins/scrappy.conf
      :language: bash

#. Create a ``scenarios`` directory in ``WORK_DIR`` and copy to it all Rally
   scenarios with the factors that you plan to test. For example:
   :download:`random_controller_reboot_factor.json
   <rally_scenarios/NovaServers/boot_and_delete_server/random_controller_reboot_factor.json>`.

#. Create a ``deployment.json`` file in ``WORK_DIR`` and fill it with
   your OpenStack environment information. It should look like this:

   .. code:: json

      {
        "admin": {
          "password": "password",
          "tenant_name": "tenant",
          "username": "user"
        },
        "auth_url": "http://1.2.3.4:5000/v2.0",
        "region_name": "RegionOne",
        "type": "ExistingCloud",
        "endpoint_type": "internal",
        "admin_port": 35357,
        "https_insecure": true
      }

#. Prepare for tests:

   .. code:: bash

      # Abort early if WORK_DIR is not set.
      : "${WORK_DIR:?WORK_DIR must point to the work directory}"
      DEPLOYMENT_NAME="$(uuidgen)"
      DEPLOYMENT_CONFIG="${WORK_DIR}/deployment.json"
      rally deployment create --filename "${DEPLOYMENT_CONFIG}" --name "${DEPLOYMENT_NAME}"

#. Create a ``/root/scrappy`` directory on every node in your OpenStack
   environment and copy :download:`scrappy_host.sh <rally_plugins/scrappy_host.sh>`
   to that directory.

#. Perform tests:

   .. code:: bash

      PLUGINS_PATH="${WORK_DIR}/plugins"
      SCENARIOS="random_controller_reboot_factor.json"
      # Run each factor scenario as a separate Rally task tagged with its file name.
      for scenario in ${SCENARIOS}; do
          rally --plugin-paths "${PLUGINS_PATH}" task start --tag "${scenario}" \
              "${WORK_DIR}/scenarios/${scenario}"
      done
      # Build one HTML report from all tasks; task_list is left unquoted so that
      # each UUID becomes a separate argument.
      task_list="$(rally task list --uuids-only)"
      rally task report --tasks ${task_list} --out="${WORK_DIR}/rally_report.html"

Once these steps are done, you get an HTML report with the Rally test results
in ``WORK_DIR/rally_report.html``.
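
Copying ``scrappy_host.sh`` to every node can be scripted. The following minimal sketch assumes passwordless SSH access as ``root`` to each node and that the script was downloaded to ``WORK_DIR/plugins``. The function name and the ``SSH``/``SCP`` overrides are hypothetical; the overrides exist only to make a dry run possible.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create /root/scrappy on each node and copy
# scrappy_host.sh there. Assumes root SSH access and that the script was
# downloaded to ${WORK_DIR}/plugins. Set SSH=echo SCP=echo for a dry run.
copy_scrappy_host() {
    local script="${WORK_DIR:?WORK_DIR is not set}/plugins/scrappy_host.sh"
    local node
    for node in "$@"; do
        "${SSH:-ssh}" "root@${node}" mkdir -p /root/scrappy
        "${SCP:-scp}" "${script}" "root@${node}:/root/scrappy/"
    done
}
```

For example, ``copy_scrappy_host node-1 node-2 node-3``, where the node names are placeholders for the hosts of your environment.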


Test case 1: NovaServers.boot_and_delete_server
-----------------------------------------------

**Description**

This Rally scenario boots and deletes virtual instances through the OpenStack
Nova API while fault factors are injected.

**Service-level agreement**

=================== ========
Parameter           Value
=================== ========
MTTR (sec)          <=240
Failure rate (%)    <=95
Auto-healing        Yes
=================== ========

**Parameters**

=================== ========
Parameter           Value
=================== ========
Runner              constant
Concurrency         5
Times               100
Injection-iteration 20
Testing-cycles      5
=================== ========

**List of reliability metrics**

======== ============== ================= =================================================
Priority Value          Measurement Units Description
======== ============== ================= =================================================
1        SLA            Boolean           Service-level agreement result
2        Auto-healing   Boolean           Cluster auto-heals after fault injection
3        Failure rate   Percent           Ratio of failed test iterations
4        MTTR (auto)    Seconds           Automatic mean time to repair
5        MTTR (manual)  Seconds           Manual mean time to repair, if auto MTTR is Inf
======== ============== ================= =================================================
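
The failure rate and the SLA check can be illustrated with a short sketch. It assumes that the failure rate of a cycle is the share of failed iterations among the ``Times`` iterations of that cycle, and that a cycle meets the SLA when the cluster auto-heals and both thresholds from the SLA table hold. These are assumptions; the function names are hypothetical and the logic is not taken from Rally or scrappy.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the thresholds are copied from the SLA table above.
SLA_MTTR_MAX=240    # seconds
SLA_FAILURE_MAX=95  # percent

# failure_rate FAILED TOTAL: percentage of failed iterations, two decimals.
failure_rate() {
    awk -v failed="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * failed / total }'
}

# check_sla MTTR FAILURE_RATE AUTO_HEALING: print "pass" or "fail".
# An MTTR of "Inf" or an auto-healing value other than "Yes" always fails.
check_sla() {
    local mttr="$1" rate="$2" healed="$3"
    if [ "$mttr" = "Inf" ] || [ "$healed" != "Yes" ]; then
        echo fail
        return
    fi
    awk -v m="$mttr" -v r="$rate" -v mmax="$SLA_MTTR_MAX" -v rmax="$SLA_FAILURE_MAX" \
        'BEGIN { print ((m + 0 <= mmax + 0 && r + 0 <= rmax + 0) ? "pass" : "fail") }'
}
```

For example, ``failure_rate 2 100`` prints ``2.00``, and ``check_sla 4.31 2 Yes`` prints ``pass``.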

Test case 1 results
-------------------

reboot-random-controller
~~~~~~~~~~~~~~~~~~~~~~~~

**Rally scenario used during factor testing:**

.. literalinclude:: rally_scenarios/NovaServers/boot_and_delete_server/random_controller_reboot_factor.json
   :language: json

**Factor testing results:**

.. table:: **Full description of cyclic execution results**

   +--------+-----------+-----------------+--------------+-------------------------+
   | Cycles | MTTR(sec) | Failure rate(%) | Auto-healing | Performance degradation |
   +========+===========+=================+==============+=========================+
   | 1      | 4.31      | 2               | Yes          | Yes, up to 148.52 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 2      | 19.88     | 14              | Yes          | Yes, up to 150.946 sec. |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 3      | 7.31      | 8               | Yes          | Yes, up to 124.593 sec. |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 4      | 95.07     | 9               | Yes          | Yes, up to 240.893 sec. |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 5      | Inf       | 80              | No           | Inf                     |
   +--------+-----------+-----------------+--------------+-------------------------+

**Rally report:** :download:`reboot_random_controller.html <../../../../raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/reboot_random_controller.html>`

.. table:: **Testing results summary**

   +--------+-----------+-----------------+
   | Value  | MTTR(sec) | Failure rate(%) |
   +========+===========+=================+
   | Min    | 4.31      | 2               |
   +--------+-----------+-----------------+
   | Max    | 95.07     | 80              |
   +--------+-----------+-----------------+
   | SLA    | Yes       | No              |
   +--------+-----------+-----------------+
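
The summary table can be derived from the per-cycle table. The following sketch assumes, as the reported maximum MTTR of 95.07 seconds suggests, that cycles with an ``Inf`` MTTR are excluded from the MTTR minimum and maximum. The function name and the input format (cycle, MTTR, and failure rate on each line) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize per-cycle results given on stdin as
# "CYCLE MTTR FAILURE_RATE" lines. "Inf" MTTR values are skipped for the
# MTTR minimum/maximum and mark the factor as not auto-healed.
summarize_cycles() {
    awk '
        NR == 1 { fr_min = fr_max = $3 + 0 }
        {
            fr = $3 + 0
            if (fr < fr_min) fr_min = fr
            if (fr > fr_max) fr_max = fr
        }
        $2 == "Inf" { not_healed = 1; next }
        {
            m = $2 + 0
            if (!seen++) mttr_min = mttr_max = m
            if (m < mttr_min) mttr_min = m
            if (m > mttr_max) mttr_max = m
        }
        END {
            printf "mttr_min=%s mttr_max=%s fr_min=%s fr_max=%s auto_healing=%s\n",
                mttr_min, mttr_max, fr_min, fr_max, (not_healed ? "No" : "Yes")
        }
    '
}

# Per-cycle results of reboot-random-controller, copied from the table above.
summarize_cycles <<'EOF'
1 4.31 2
2 19.88 14
3 7.31 8
4 95.07 9
5 Inf 80
EOF
```

Running it prints ``mttr_min=4.31 mttr_max=95.07 fr_min=2 fr_max=80 auto_healing=No``, which matches the Min and Max rows of the summary table.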

**Detailed results description**

This factor affects OpenStack cluster operation on every run.
Auto-healing works but may take a long time: in the fifth testing cycle, the
cluster recovered only after Rally had completed testing with an error status.
As a result, performance degrades significantly while the cluster recovers.

sigkill-random-rabbitmq
~~~~~~~~~~~~~~~~~~~~~~~

**Rally scenario used during factor testing:**

.. literalinclude:: rally_scenarios/NovaServers/boot_and_delete_server/random_controller_kill_rabbitmq.json
   :language: json

**Factor testing results:**

.. table:: **Full description of cyclic execution results**

   +--------+-----------+-----------------+--------------+-------------------------+
   | Cycles | MTTR(sec) | Failure rate(%) | Auto-healing | Performance degradation |
   +========+===========+=================+==============+=========================+
   | 1      | 0         | 0               | Yes          | Yes, up to 12.266 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 2      | 0         | 0               | Yes          | Yes, up to 15.775 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 3      | 98.52     | 1               | Yes          | Yes, up to 145.115 sec. |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 4      | 0         | 0               | Yes          | No                      |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 5      | 0         | 0               | Yes          | Yes, up to 65.926 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+

**Rally report:** :download:`random_controller_kill_rabbitmq.html <../../../../raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_kill_rabbitmq.html>`

.. table:: **Testing results summary**

   +--------+-----------+-----------------+
   | Value  | MTTR(sec) | Failure rate(%) |
   +========+===========+=================+
   | Min    | 0         | 0               |
   +--------+-----------+-----------------+
   | Max    | 98.52     | 1               |
   +--------+-----------+-----------------+
   | SLA    | Yes       | Yes             |
   +--------+-----------+-----------------+

**Detailed results description**

This factor may affect OpenStack cluster operation.
Auto-healing works fine.
Performance degradation is significant during cluster recovery.


sigkill-random-mysql
~~~~~~~~~~~~~~~~~~~~

**Rally scenario used during factor testing:**

.. literalinclude:: rally_scenarios/NovaServers/boot_and_delete_server/random_controller_kill_mysqld.json
   :language: json

**Factor testing results:**

.. table:: **Full description of cyclic execution results**

   +--------+-----------+-----------------+--------------+-------------------------+
   | Cycles | MTTR(sec) | Failure rate(%) | Auto-healing | Performance degradation |
   +========+===========+=================+==============+=========================+
   | 1      | 2.31      | 0               | Yes          | Yes, up to 12.928 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 2      | 0         | 0               | Yes          | Yes, up to 11.156 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 3      | 0         | 1               | Yes          | Yes, up to 13.592 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 4      | 0         | 0               | Yes          | Yes, up to 11.864 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 5      | 0         | 0               | Yes          | Yes, up to 12.715 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+

**Rally report:** :download:`random_controller_kill_mysqld.html <../../../../raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_kill_mysqld.html>`

.. table:: **Testing results summary**

   +--------+-----------+-----------------+
   | Value  | MTTR(sec) | Failure rate(%) |
   +========+===========+=================+
   | Min    | 0         | 0               |
   +--------+-----------+-----------------+
   | Max    | 2.31      | 1               |
   +--------+-----------+-----------------+
   | SLA    | Yes       | Yes             |
   +--------+-----------+-----------------+

**Detailed results description**

This factor may affect OpenStack cluster operation.
Auto-healing works fine.
Performance degradation is not significant.


freeze-random-nova-api
~~~~~~~~~~~~~~~~~~~~~~

**Rally scenario used during factor testing:**

.. literalinclude:: rally_scenarios/NovaServers/boot_and_delete_server/random_controller_freeze_nova-api_150_sec.json
   :language: json

**Factor testing results:**

.. table:: **Full description of cyclic execution results**

   +--------+-----------+-----------------+--------------+-------------------------+
   | Cycles | MTTR(sec) | Failure rate(%) | Auto-healing | Performance degradation |
   +========+===========+=================+==============+=========================+
   | 1      | 0         | 0               | Yes          | Yes, up to 156.935 sec. |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 2      | 0         | 0               | Yes          | Yes, up to 155.085 sec. |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 3      | 0         | 0               | Yes          | Yes, up to 156.93 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 4      | 0         | 0               | Yes          | Yes, up to 156.782 sec. |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 5      | 150.55    | 1               | Yes          | Yes, up to 154.741 sec. |
   +--------+-----------+-----------------+--------------+-------------------------+

**Rally report:** :download:`random_controller_freeze_nova_api_150_sec.html <../../../../raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_freeze_nova_api_150_sec.html>`

.. table:: **Testing results summary**

   +--------+-----------+-----------------+
   | Value  | MTTR(sec) | Failure rate(%) |
   +========+===========+=================+
   | Min    | 0         | 0               |
   +--------+-----------+-----------------+
   | Max    | 150.55    | 1               |
   +--------+-----------+-----------------+
   | SLA    | Yes       | Yes             |
   +--------+-----------+-----------------+

**Detailed results description**

This factor affects OpenStack cluster operation.
Auto-healing does not work: cluster operation recovered only after the SIGCONT
POSIX signal was sent to all frozen nova-api processes. Performance degradation
is determined by the factor duration. This behaviour is not expected for an HA
OpenStack configuration and should be investigated.

freeze-random-memcached
~~~~~~~~~~~~~~~~~~~~~~~

**Rally scenario used during factor testing:**

.. literalinclude:: rally_scenarios/NovaServers/boot_and_delete_server/random_controller_freeze_memcached_150_sec.json
   :language: json

**Factor testing results:**

.. table:: **Full description of cyclic execution results**

   +--------+-----------+-----------------+--------------+-------------------------+
   | Cycles | MTTR(sec) | Failure rate(%) | Auto-healing | Performance degradation |
   +========+===========+=================+==============+=========================+
   | 1      | 0         | 0               | Yes          | Yes, up to 26.679 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 2      | 0         | 0               | Yes          | Yes, up to 23.726 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 3      | 0         | 0               | Yes          | Yes, up to 21.893 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 4      | 0         | 0               | Yes          | Yes, up to 22.796 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+
   | 5      | 0         | 0               | Yes          | Yes, up to 27.737 sec.  |
   +--------+-----------+-----------------+--------------+-------------------------+

**Rally report:** :download:`random_controller_freeze_memcached_150_sec.html <../../../../raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_freeze_memcached_150_sec.html>`

.. table:: **Testing results summary**

   +--------+-----------+-----------------+
   | Value  | MTTR(sec) | Failure rate(%) |
   +========+===========+=================+
   | Min    | 0         | 0               |
   +--------+-----------+-----------------+
   | Max    | 0         | 0               |
   +--------+-----------+-----------------+
   | SLA    | Yes       | Yes             |
   +--------+-----------+-----------------+

**Detailed results description**

This factor does not affect OpenStack cluster operation.
During the factor testing, a small performance degradation was observed.

|
||||||
|
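The per-cycle peak degradation figures can be summarized in a few lines of Python. This is only an illustrative sketch. The numbers are copied from the freeze-random-memcached table, and `summarize_degradation` is a hypothetical helper, not part of Rally or scrappy.

```python
# Peak performance degradation (seconds) per cycle, copied from the
# freeze-random-memcached "Full description of cyclic execution results" table.
PEAK_DEGRADATION_SEC = {1: 26.679, 2: 23.726, 3: 21.893, 4: 22.796, 5: 27.737}


def summarize_degradation(peaks):
    """Return (min, max, mean) of the per-cycle peak degradation values."""
    values = list(peaks.values())
    mean = sum(values) / len(values)
    return min(values), max(values), round(mean, 3)


if __name__ == "__main__":
    low, high, mean = summarize_degradation(PEAK_DEGRADATION_SEC)
    print("peak degradation: min %.3f s, max %.3f s, mean %.3f s"
          % (low, high, mean))
```

Across the five cycles, the peak degradation stays between roughly 22 and 28 seconds, while MTTR and failure rate remain 0.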
freeze-random-keystone
~~~~~~~~~~~~~~~~~~~~~~

**Rally scenario used during factor testing:**

.. literalinclude:: rally_scenarios/NovaServers/boot_and_delete_server/random_controller_freeze_keystone_150_sec.json
   :language: bash

**Factor testing results:**

.. table:: **Full description of cyclic execution results**

   +--------+-------------+-----------------+--------------+-------------------------+
   | Cycles | MTTR(sec)   | Failure rate(%) | Auto-healing | Performance degradation |
   +--------+-------------+-----------------+--------------+-------------------------+
   | 1      | 97.19       | 7               | Yes          | No                      |
   +--------+-------------+-----------------+--------------+-------------------------+
   | 2      | 93.87       | 6               | Yes          | No                      |
   +--------+-------------+-----------------+--------------+-------------------------+
   | 3      | 92.12       | 8               | Yes          | No                      |
   +--------+-------------+-----------------+--------------+-------------------------+
   | 4      | 94.51       | 6               | Yes          | No                      |
   +--------+-------------+-----------------+--------------+-------------------------+
   | 5      | 98.37       | 7               | Yes          | No                      |
   +--------+-------------+-----------------+--------------+-------------------------+

**Rally report:** :download:`random_controller_freeze_keystone_150_sec.html <../../../../raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_freeze_keystone_150_sec.html>`

.. table:: **Testing results summary**

   +--------+-----------+--------------+
   | Value  | MTTR(sec) | Failure rate |
   +--------+-----------+--------------+
   | Min    | 92.12     | 6            |
   +--------+-----------+--------------+
   | Max    | 98.37     | 8            |
   +--------+-----------+--------------+
   | SLA    | Yes       | No           |
   +--------+-----------+--------------+

**Detailed results description**

This factor affects OpenStack cluster operation.
After the keystone processes freeze on a random controller, the HA
logic needs approximately 95 seconds to recover service operation.
After recovery, no performance degradation is observed. This was
checked only at a small concurrency (5 parallel iterations in the Rally
scenario). This behaviour is not normal for an HA OpenStack
configuration and should be investigated in the future.

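The summary values and the "approximately 95 seconds" quoted above both follow from the per-cycle keystone table. The mean MTTR over the five cycles is 95.21 seconds. The minimal sketch below recomputes these values. It assumes that the "Failure rate" SLA column uses the same 5 % threshold that the scrappy plugin hard-codes (`self.error_rate <= 5`). This document does not state how the MTTR SLA column is decided, so the sketch does not model it. `summarize` is a hypothetical helper.

```python
# Per-cycle results for freeze-random-keystone, copied from the
# "Full description of cyclic execution results" table:
# (cycle, MTTR in seconds, failure rate in %).
CYCLES = [
    (1, 97.19, 7),
    (2, 93.87, 6),
    (3, 92.12, 8),
    (4, 94.51, 6),
    (5, 98.37, 7),
]

# Assumption: same threshold as the scrappy SLA plugin (error_rate <= 5).
FAILURE_RATE_THRESHOLD = 5


def summarize(cycles):
    """Recompute the summary values from the per-cycle results."""
    mttrs = [mttr for _, mttr, _ in cycles]
    rates = [rate for _, _, rate in cycles]
    return {
        "mttr_min": min(mttrs),
        "mttr_max": max(mttrs),
        "mttr_mean": round(sum(mttrs) / len(mttrs), 2),
        "failure_rate_min": min(rates),
        "failure_rate_max": max(rates),
        # Every cycle must stay at or below the threshold to pass.
        "failure_rate_sla": max(rates) <= FAILURE_RATE_THRESHOLD,
    }


if __name__ == "__main__":
    for key, value in summarize(CYCLES).items():
        print(key, value)
```

Every cycle lost between 6 % and 8 % of its iterations. Each cycle therefore exceeds the 5 % threshold on its own, which is consistent with the "No" in the SLA row.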
.. references:
.. _Rally installation documentation: https://rally.readthedocs.io/en/latest/install.html
@ -0,0 +1,16 @@
# SSH credentials
SSH_LOGIN="root"
SSH_PASS="r00tme"

# Controller nodes
CONTROLLERS[0]="10.44.0.7"
CONTROLLERS[1]="10.44.0.6"
CONTROLLERS[2]="10.44.0.5"

# Compute nodes
COMPUTES[0]="10.44.0.3"
COMPUTES[1]="10.44.0.4"
COMPUTES[2]="10.44.0.8"

# Scrappy base path
SCRAPPY_BASE="/root/scrappy"
115
doc/source/test_results/reliability/rally_plugins/scrappy.py
Normal file
@ -0,0 +1,115 @@
# Copyright 2014: Mirantis Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

"""Rally scrappy SLA plugin.

This plugin was designed for OpenStack reliability testing.
"""

import os

from rally.common.i18n import _
from rally.common import logging
from rally.common import streaming_algorithms as streaming
from rally import consts
from rally.task import sla


LOG = logging.getLogger(__name__)


class MttrCalculation(object):
    """Track the time span covered by failed iterations (MTTR)."""

    def __init__(self):
        self.min_timestamp = streaming.MinComputation()
        self.max_timestamp = streaming.MaxComputation()
        self.mttr = 0
        self.last_error_duration = 0
        self.last_iteration = None

    def add(self, iteration):
        if iteration["error"]:
            # Store the duration of the latest failed iteration
            latest = self.max_timestamp.result()
            if latest is None or latest < iteration["timestamp"]:
                self.last_error_duration = iteration["duration"]

            self.min_timestamp.add(iteration["timestamp"])
            self.max_timestamp.add(iteration["timestamp"])
            LOG.info("TIMESTAMP: %s", iteration["timestamp"])

        self.last_iteration = iteration

    def result(self):
        if self.min_timestamp.result() is None:
            # No failed iterations, so there was nothing to recover from
            self.mttr = 0
            return self.mttr
        self.mttr = round(self.max_timestamp.result() -
                          self.min_timestamp.result() +
                          self.last_error_duration, 2)
        # The SLA context has no information about the iterations count,
        # so assume that if the last iteration completed with an error,
        # the cluster was not auto-healed.
        if self.last_iteration["error"]:
            self.mttr = "Inf."
        return self.mttr


@sla.configure(name="scrappy")
class Scrappy(sla.SLA):
    """Scrappy events: run a fault-injection command and track MTTR."""

    CONFIG_SCHEMA = {
        "type": "object",
        "$schema": consts.JSON_SCHEMA,
        "properties": {
            "on_iter": {"type": "number"},
            "execute": {"type": "string"},
            "cycle": {"type": "number"}
        }
    }

    def __init__(self, criterion_value):
        super(Scrappy, self).__init__(criterion_value)
        self.on_iter = self.criterion_value.get("on_iter", None)
        self.execute = self.criterion_value.get("execute", None)
        self.cycle = self.criterion_value.get("cycle", 0)
        self.errors = 0
        self.total = 0
        self.error_rate = 0.0
        self.mttr = MttrCalculation()

    def add_iteration(self, iteration):
        self.total += 1
        if iteration["error"]:
            self.errors += 1

        self.mttr.add(iteration)

        # Start the fault-injection event once "on_iter" results are in
        if self.execute and self.on_iter == self.total:
            LOG.info("Scrappy testing cycle: ITER: %s", self.cycle)
            LOG.info("Scrappy executing: %s", self.execute)
            os.system(self.execute)

        self.error_rate = self.errors * 100.0 / self.total
        self.success = self.error_rate <= 5
        return self.success

    def merge(self, other):
        self.total += other.total
        self.errors += other.errors
        if self.total:
            self.error_rate = self.errors * 100.0 / self.total
            self.success = self.error_rate <= 5
        return self.success

    def details(self):
        return (_("Scrappy failure rate %.2f%% MTTR %s seconds - %s") %
                (self.error_rate, self.mttr.result(), self.status()))
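The arithmetic inside `MttrCalculation` and `Scrappy` is easier to follow without the Rally plumbing. The standalone sketch below re-implements it over a plain list of iteration dicts with the `timestamp`, `duration` and `error` keys that the plugin reads. It is an illustration, not the plugin itself. The function names and the sample run are hypothetical. Returning 0 for a run without failures is an assumption, chosen to match the MTTR of 0 reported for factors that caused no errors.

```python
"""Standalone sketch of the scrappy MTTR and failure-rate arithmetic.

Hypothetical helpers for illustration; no Rally dependency.
"""


def mttr(iterations):
    """Last failed start - first failed start + duration of the last failure.

    Returns 0 when nothing failed and "Inf." when the final iteration
    failed, i.e. the cluster is assumed not to have auto-healed.
    """
    failed = [it for it in iterations if it["error"]]
    if not failed:
        return 0
    if iterations[-1]["error"]:
        return "Inf."
    first = min(it["timestamp"] for it in failed)
    last = max(failed, key=lambda it: it["timestamp"])
    return round(last["timestamp"] - first + last["duration"], 2)


def failure_rate(iterations):
    """Percentage of iterations that finished with an error."""
    failed = sum(1 for it in iterations if it["error"])
    return failed * 100.0 / len(iterations)


def sla_passed(iterations, threshold=5):
    """Mirror of the plugin's check: success while failure rate <= 5 %."""
    return failure_rate(iterations) <= threshold


def sample_run():
    """Hypothetical run: 20 iterations 10 s apart, iterations 5-7 fail."""
    run = [{"timestamp": 100.0 + 10 * i, "duration": 8.0, "error": []}
           for i in range(20)]
    for i in (5, 6, 7):
        run[i]["error"] = ["ConnectionError"]
    return run


if __name__ == "__main__":
    run = sample_run()
    # Failed iterations start at 150, 160 and 170 s: 170 - 150 + 8 = 28 s.
    print(mttr(run), failure_rate(run), sla_passed(run))
```

In the sample run, 3 of 20 iterations fail (15 %), so the SLA check fails even though the recovery window is only 28 seconds.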
116
doc/source/test_results/reliability/rally_plugins/scrappy.sh
Executable file
@ -0,0 +1,116 @@
#!/bin/bash -xe
# Copyright 2014: Mirantis Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

# Source credentials and node lists
if [ -f /data/rally/rally_plugins/scrappy/scrappy.conf ];
then
    . /data/rally/rally_plugins/scrappy/scrappy.conf
else
    exit 1
fi

#
# Execute a command on a remote node over SSH.
# Login and password are stored in scrappy.conf.
#
function ssh_exec() {
    local ssh_node=$1
    local ssh_cmd=$2
    local ssh_options='-oConnectTimeout=5 -oStrictHostKeyChecking=no -oCheckHostIP=no -oUserKnownHostsFile=/dev/null -oRSAAuthentication=no'
    echo "sshpass -p ${SSH_PASS} ssh ${ssh_options} ${SSH_LOGIN}@${ssh_node} ${ssh_cmd}"
    local ssh_result=$(sshpass -p ${SSH_PASS} ssh ${ssh_options} ${SSH_LOGIN}@${ssh_node} ${ssh_cmd})
    echo "$ssh_result"
}

#
# Return a random controller node of the Fuel cluster
#
function get_random_controller() {
    local random_controller=${CONTROLLERS[$RANDOM % ${#CONTROLLERS[@]}]}
    echo "$random_controller"
}

#
# Return a random compute node of the Fuel cluster
#
function get_random_compute() {
    local random_compute=${COMPUTES[$RANDOM % ${#COMPUTES[@]}]}
    echo "$random_compute"
}

#
# Factors
#
function random_controller_kill_rabbitmq() {
    local action=$1
    local controller_node=$(get_random_controller)
    local result=$(ssh_exec ${controller_node} "${SCRAPPY_BASE}/scrappy_host.sh send_signal rabbitmq_server -KILL")
    echo "$result"
}

function random_controller_freeze_process_random_interval() {
    local process_name=$1
    local interval=$2
    local controller_node=$(get_random_controller)
    local result=$(ssh_exec ${controller_node} "${SCRAPPY_BASE}/scrappy_host.sh freeze_process_random_interval ${process_name} ${interval}")
    echo "$result"
}

function random_controller_freeze_process_fixed_interval() {
    local process_name=$1
    local interval=$2
    local controller_node=$(get_random_controller)
    local result=$(ssh_exec ${controller_node} "${SCRAPPY_BASE}/scrappy_host.sh freeze_process_fixed_interval ${process_name} ${interval}")
    echo "$result"
}

function random_controller_reboot() {
    local controller_node=$(get_random_controller)
    local result=$(ssh_exec ${controller_node} "${SCRAPPY_BASE}/scrappy_host.sh reboot_node")
    echo "$result"
}

function usage() {
    echo "scrappy usage:"
    echo -e "\t random_controller_kill_rabbitmq"
    echo -e "\t random_controller_freeze_process_random_interval process max_interval"
    echo -e "\t random_controller_freeze_process_fixed_interval process interval"
    echo -e "\t random_controller_reboot"
}

#
# Main
#
function main() {
    local factor=$1
    case ${factor} in
        random_controller_kill_rabbitmq)
            random_controller_kill_rabbitmq $2
            ;;
        random_controller_freeze_process_random_interval)
            random_controller_freeze_process_random_interval $2 $3
            ;;
        random_controller_freeze_process_fixed_interval)
            random_controller_freeze_process_fixed_interval $2 $3
            ;;
        random_controller_reboot)
            random_controller_reboot
            ;;
        *)
            usage
            ;;
    esac
}

main "$@"
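Each factor in scrappy.sh comes down to two steps: pick a node, then run a scrappy_host.sh command on it over SSH. The Python sketch below reproduces that composition so a factor can be dry-run without touching the cluster. `CONTROLLERS` and `SCRAPPY_BASE` are taken from scrappy.conf. The function names are hypothetical, and `random.choice` stands in for `$RANDOM % ${#CONTROLLERS[@]}`. The sketch returns the command instead of executing it with `sshpass`/`ssh`.

```python
import random
import shlex

# Values taken from scrappy.conf in this commit.
CONTROLLERS = ["10.44.0.7", "10.44.0.6", "10.44.0.5"]
SCRAPPY_BASE = "/root/scrappy"


def get_random_controller(rng=random):
    """Counterpart of get_random_controller() in scrappy.sh."""
    return rng.choice(CONTROLLERS)


def host_command(command, *args):
    """Build the scrappy_host.sh command line that is run on the node."""
    parts = [SCRAPPY_BASE + "/scrappy_host.sh", command]
    parts += [str(arg) for arg in args]
    return " ".join(shlex.quote(part) for part in parts)


def freeze_fixed_interval_plan(process_name, interval, rng=random):
    """Dry-run counterpart of random_controller_freeze_process_fixed_interval.

    Returns (node, command) instead of executing the command over SSH.
    """
    node = get_random_controller(rng)
    return node, host_command("freeze_process_fixed_interval",
                              process_name, interval)


if __name__ == "__main__":
    node, command = freeze_fixed_interval_plan("keystone", 150)
    print("ssh root@%s %s" % (node, command))
```

Quoting each argument with `shlex.quote` is a deliberate difference from scrappy.sh, which interpolates arguments unquoted into the remote command string.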
132
doc/source/test_results/reliability/rally_plugins/scrappy_host.sh
Executable file
@ -0,0 +1,132 @@
#!/bin/bash -xe
# Copyright 2014: Mirantis Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

LOG_FILE="/var/log/scrappy.log"

#
# Logging function
#
function log() {
    echo "$(date -u) scrappy_host: $1" >> "${LOG_FILE}"
}

#
# Send the given signal to all processes whose "ps -ef" line
# contains the given name
#
function send_signal() {
    local process_name=$1
    local signal=$2
    local pids=$(ps -ef | grep "${process_name}" | grep -v grep | grep -v scrappy_host | awk '{print $2}')
    for each_pid in ${pids};
    do
        log "sending signal: ${signal} to ${process_name} with pid: ${each_pid}"
        kill ${signal} ${each_pid}
    done
}

#
# Control a system service
#
function service_control() {
    local service_name=$1
    local service_action=$2
    log "service control: ${service_name} action: ${service_action}"
    service ${service_name} ${service_action}
}

#
# Reboot the node
#
function reboot_node() {
    log "reboot"
    shutdown -r now
}

#
# Freeze the specified process for a random interval
# of 1..max_interval seconds
#
function freeze_process_random_interval {
    local process_name=$1
    local max_interval=$2
    local interval=$(( (RANDOM % max_interval) + 1 ))
    log "freeze_process_random_interval: freezing process ${process_name} freeze interval ${interval}"
    send_signal ${process_name} '-STOP'
    sleep ${interval}
    log "freeze_process_random_interval: unfreezing process ${process_name}"
    send_signal ${process_name} '-CONT'
}

#
# Freeze the specified process for a fixed interval in seconds
#
function freeze_process_fixed_interval {
    local process_name=$1
    local interval=$2
    log "freeze_process_fixed_interval: freezing process ${process_name} freeze interval ${interval}"
    send_signal ${process_name} '-STOP'
    sleep ${interval}
    log "freeze_process_fixed_interval: unfreezing process ${process_name}"
    send_signal ${process_name} '-CONT'
}

#
# Show usage
#
function usage() {
    echo "scrappy_host usage:"
    echo "scrappy_host commands:"
    echo -e "\t send_signal process_name signal"
    echo -e "\t service_control service_name action"
    echo -e "\t freeze_process_random_interval process max_interval"
    echo -e "\t freeze_process_fixed_interval process interval"
    echo -e "\t reboot_node"
}

#
# Main
#
function main() {
    local command=$1
    case ${command} in
        send_signal)
            send_signal $2 $3
            ;;
        service_control)
            service_control $2 $3
            ;;
        reboot_node)
            reboot_node
            ;;
        freeze_process_random_interval)
            # Run the freeze in the background without -x/-e
            set +xe
            freeze_process_random_interval $2 $3 &
            set -xe
            ;;
        freeze_process_fixed_interval)
            # Run the freeze in the background without -x/-e
            set +xe
            freeze_process_fixed_interval $2 $3 &
            set -xe
            ;;
        *)
            usage
            exit 1
            ;;
    esac
}

main "$@"
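The two primitives of scrappy_host.sh can be reproduced in Python for experimentation: selecting PIDs with the `ps -ef | grep` pipeline, and freezing them with SIGSTOP/SIGCONT. This is a sketch, not part of scrappy. `matching_pids`, `freeze` and the sample `ps` output are hypothetical, and the code only runs on POSIX systems. Like `grep`, the selection is a plain substring match on the whole `ps` line, so a name such as `nova-api` also matches `nova-api-metadata`.

```python
import os
import signal
import time

# Hypothetical `ps -ef` output used to illustrate the PID selection.
SAMPLE_PS = """\
UID        PID  PPID  C STIME TTY          TIME CMD
nova      1201     1  0 10:00 ?        00:00:05 /usr/bin/python /usr/bin/nova-api
nova      1202  1201  0 10:00 ?        00:00:01 /usr/bin/python /usr/bin/nova-api
nova      1300     1  0 10:00 ?        00:00:01 /usr/bin/python /usr/bin/nova-api-metadata
root      1400  1390  0 10:05 ?        00:00:00 grep nova-api
root      1401  1390  0 10:05 ?        00:00:00 /bin/bash /root/scrappy/scrappy_host.sh send_signal nova-api -STOP
"""


def matching_pids(ps_output, process_name):
    """Same selection as send_signal(): substring match, minus grep/scrappy_host."""
    pids = []
    for line in ps_output.splitlines():
        if process_name not in line:
            continue
        if "grep" in line or "scrappy_host" in line:
            continue
        pids.append(int(line.split()[1]))  # awk '{print $2}'
    return pids


def freeze(pids, interval):
    """SIGSTOP every PID, sleep `interval` seconds, then SIGCONT them."""
    for pid in pids:
        os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(interval)
    finally:
        # Always resume, even if the sleep is interrupted.
        for pid in pids:
            os.kill(pid, signal.SIGCONT)


if __name__ == "__main__":
    # nova-api-metadata (PID 1300) is selected too.
    print(matching_pids(SAMPLE_PS, "nova-api"))
```

Unlike the shell version, the sketch resumes the processes in a `finally` block, so an interrupted sleep does not leave them frozen.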
@ -0,0 +1,37 @@
{% set flavor_name = flavor_name or "m1.tiny" %}
{% set image_name = image_name or "^(cirros.*uec|TestVM)$" %}
{
    "NovaServers.boot_and_delete_server": [
        {% for i in range(0, 5, 1) %}
        {
            "args": {
                "flavor": {
                    "name": "{{flavor_name}}"
                },
                "image": {
                    "name": "{{image_name}}"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 100,
                "concurrency": 5
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                }
            },
            "sla": {
                "scrappy": {
                    "on_iter": 20,
                    "execute": "/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh random_controller_freeze_process_fixed_interval keystone 150",
                    "cycle": {{i}}
                }
            }
        }{% if not loop.last %},{% endif %}
        {% endfor %}
    ]
}
@ -0,0 +1,37 @@
{% set flavor_name = flavor_name or "m1.tiny" %}
{% set image_name = image_name or "^(cirros.*uec|TestVM)$" %}
{
    "NovaServers.boot_and_delete_server": [
        {% for i in range(0, 5, 1) %}
        {
            "args": {
                "flavor": {
                    "name": "{{flavor_name}}"
                },
                "image": {
                    "name": "{{image_name}}"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 100,
                "concurrency": 5
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                }
            },
            "sla": {
                "scrappy": {
                    "on_iter": 20,
                    "execute": "/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh random_controller_freeze_process_fixed_interval memcached 150",
                    "cycle": {{i}}
                }
            }
        }{% if not loop.last %},{% endif %}
        {% endfor %}
    ]
}
@ -0,0 +1,37 @@
{% set flavor_name = flavor_name or "m1.tiny" %}
{% set image_name = image_name or "^(cirros.*uec|TestVM)$" %}
{
    "NovaServers.boot_and_delete_server": [
        {% for i in range(0, 5, 1) %}
        {
            "args": {
                "flavor": {
                    "name": "{{flavor_name}}"
                },
                "image": {
                    "name": "{{image_name}}"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 100,
                "concurrency": 15
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                }
            },
            "sla": {
                "scrappy": {
                    "on_iter": 20,
                    "execute": "/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh random_controller_freeze_process_fixed_interval nova-api 150",
                    "cycle": {{i}}
                }
            }
        }{% if not loop.last %},{% endif %}
        {% endfor %}
    ]
}
@ -0,0 +1,38 @@
{% set flavor_name = flavor_name or "m1.tiny" %}
{% set image_name = image_name or "^(cirros.*uec|TestVM)$" %}
{
    "NovaServers.boot_and_delete_server": [
        {% for i in range(0, 5, 1) %}
        {
            "args": {
                "flavor": {
                    "name": "{{flavor_name}}"
                },
                "image": {
                    "name": "{{image_name}}"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 100,
                "concurrency": 5
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                }
            },
            "sla": {
                "scrappy": {
                    "on_iter": 20,
                    "execute": "/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh send_signal /usr/sbin/mysqld -KILL",
                    "cycle": {{i}}
                }
            }
        }{% if not loop.last %},{% endif %}
        {% endfor %}
    ]
}
@ -0,0 +1,38 @@
{% set flavor_name = flavor_name or "m1.tiny" %}
{% set image_name = image_name or "^(cirros.*uec|TestVM)$" %}
{
    "NovaServers.boot_and_delete_server": [
        {% for i in range(0, 5, 1) %}
        {
            "args": {
                "flavor": {
                    "name": "{{flavor_name}}"
                },
                "image": {
                    "name": "{{image_name}}"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 100,
                "concurrency": 5
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                }
            },
            "sla": {
                "scrappy": {
                    "on_iter": 20,
                    "execute": "/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh random_controller_kill_rabbitmq",
                    "cycle": {{i}}
                }
            }
        }{% if not loop.last %},{% endif %}
        {% endfor %}
    ]
}
@ -0,0 +1,38 @@
{% set flavor_name = flavor_name or "m1.tiny" %}
{% set image_name = image_name or "^(cirros.*uec|TestVM)$" %}
{
    "NovaServers.boot_and_delete_server": [
        {% for i in range(0, 5, 1) %}
        {
            "args": {
                "flavor": {
                    "name": "{{flavor_name}}"
                },
                "image": {
                    "name": "{{image_name}}"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 100,
                "concurrency": 5
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                }
            },
            "sla": {
                "scrappy": {
                    "on_iter": 20,
                    "execute": "/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh random_controller_reboot",
                    "cycle": {{i}}
                }
            }
        }{% if not loop.last %},{% endif %}
        {% endfor %}
    ]
}
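All the scenario templates above share one shape. The Jinja `for` loop emits five copies of the same `NovaServers.boot_and_delete_server` workload, which differ only in `cycle` (0 to 4). In each copy, the runner performs 100 iterations. The scrappy SLA hook runs the `execute` command once, when the 20th iteration result is recorded (`self.on_iter == self.total` in scrappy.py). The Python sketch below builds the equivalent structure without Rally or Jinja2, which makes it easy to inspect or diff the rendered task. `build_task` is a hypothetical helper. Its defaults mirror the keystone template; the nova-api template uses a concurrency of 15 instead.

```python
import json

# "execute" command of the keystone template above.
EXECUTE = ("/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh "
           "random_controller_freeze_process_fixed_interval keystone 150")


def build_task(execute, cycles=5, concurrency=5, times=100, on_iter=20,
               flavor_name="m1.tiny", image_name="^(cirros.*uec|TestVM)$"):
    """Return the task structure the Jinja template renders to."""
    workloads = []
    for i in range(cycles):  # {% for i in range(0, 5, 1) %}
        workloads.append({
            "args": {"flavor": {"name": flavor_name},
                     "image": {"name": image_name},
                     "force_delete": False},
            "runner": {"type": "constant", "times": times,
                       "concurrency": concurrency},
            "context": {"users": {"tenants": 1, "users_per_tenant": 1}},
            "sla": {"scrappy": {"on_iter": on_iter, "execute": execute,
                                "cycle": i}},
        })
    return {"NovaServers.boot_and_delete_server": workloads}


if __name__ == "__main__":
    print(json.dumps(build_task(EXECUTE), indent=4))
```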
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long