Add tools section

Note: diagrams and pictures will be added in a separate commit, since some
of the tools have had new releases recently and some of the documentation
diagrams are currently out of date.

Change-Id: Ie69808be13e932d2c1b72ea35e8f54ff7644aa50
Dina Belova 2015-12-24 14:38:24 +03:00
parent 2203c45448
commit 70bf823515
2 changed files with 260 additions and 0 deletions


@ -9,4 +9,5 @@ Methodologies
.. toctree::
   :maxdepth: 2

   tools
   hyper-scale


@ -0,0 +1,259 @@
=====
Tools
=====

In distributed computing systems, three basic components of the underlying
telecommunications architecture can be identified: the **control plane**, the
**data plane** and the **management plane**, the last of which is commonly
considered a subset of the control plane.

Classically, the control plane is defined as the part of a network that
carries signaling traffic and is responsible for routing; its functions
include system configuration and management. The data plane in this
definition is the part of a network that carries user traffic and, served by
the control plane, enables data transfer to and from system clients.

For our very specific use case - the testing of OpenStack clouds - these
terms can be treated in a slightly different light.
For cloud technologies and infrastructures (and OpenStack in particular),
the control plane can be defined as the part of the cloud architecture that
is responsible for cloud resource management, including on-demand
availability, scheduling, creation, modification and deletion of resources.
These operations are covered by various distributed cloud services that
communicate via their unified APIs. The data plane, in general, can be
considered as all data operations performed by the workloads running on top
of the OpenStack infrastructure. The most common parts are network
operations, which transfer data over network protocols between VMs running
in the cloud or between a VM and external networks, including the Internet.
Other parts of the data plane layer are related to storage operations, which
the workloads use to store data on persistent storage.

All tools listed below are oriented towards these cloud-specific definitions
of the control plane and data plane layers.

Control Plane (API) testing
===========================

To evaluate cloud control plane performance, it is vital to understand the
performance of the resource management operations served by the cloud APIs.
Therefore, this section lists API testing tools.

JMeter
------

The leader of the pack in awareness is probably **Apache JMeter**. This is an
open-source Java application whose key feature is a powerful and complete GUI
used to create test plans. A test plan is composed of test components that
define every piece of the test, such as:

* Multiple threads to generate load
* Parametrized HTTP requests
* Flexible output of results via the listeners interface

This tool is very well established and is probably one of the best tools for
functional load testing. It allows modeling complex flows using conditions,
creating custom assertions to validate behavior, and simulating non-trivial
HTTP scenarios such as logging in before the actual HTTP call to a specific
URL or performing file uploads. **JMeter** has a wide and well-established
community which produces various plugins to modify and extend the built-in
behaviors. **JMeter** allows testing not only HTTP-based APIs but also
supports various other protocols, including:

* Web - HTTP and HTTPS
* SOAP and REST API protocols (over HTTP)
* FTP
* Databases via JDBC Java DB interfaces
* LDAP
* Message-oriented middleware (MOM) via JMS
* Mail - SMTP, POP3 and IMAP
* MongoDB
* TCP over IP

Last, but not least, **JMeter** is open source and free. As with every tool,
it has its own limitations and problems. **JMeter** comes with a GUI which
has a steep learning curve: it is overloaded with options and concepts that
one should learn before being able to use this tool efficiently. The GUI
consumes a lot of compute and memory resources, so, in order to reduce the
performance impact, it can be switched off and tests can be executed in
non-GUI mode. This requires saving the test scenario in an XML-formatted
file and using the CLI tool to start the test execution. The desired request
throughput is controlled by several parameters of the test scenario and
requires fine-tuning before test execution.

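As a sketch of the non-GUI mode described above, a saved test plan can be
executed directly from the command line (the file names here are
hypothetical examples):

```shell
# -n: non-GUI mode, -t: test plan to execute, -l: file to log results to
jmeter -n -t openstack_api_plan.jmx -l results.jtl
```

The thread counts, ramp-up times and target throughput are configured inside
the ``.jmx`` plan itself rather than on the command line.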
Gatling
-------

**Gatling** is a highly capable load testing tool. It is designed for ease of
use, maintainability and high performance.

Out of the box, **Gatling** comes with excellent support for the HTTP
protocol, which makes it a tool of choice for load testing any HTTP server
(including OpenStack controllers, as all cloud management processes running
on them communicate via HTTP APIs). As the core engine is actually
protocol-agnostic, it is perfectly possible to implement support for other
protocols; for example, **Gatling** currently also ships with JMS support.
Gatling's architecture is asynchronous as long as the underlying protocol,
such as HTTP, can be implemented in a non-blocking way. This kind of
architecture lets virtual users be implemented as messages instead of
dedicated threads, making them very cheap in terms of resources. Thus,
running thousands of concurrent virtual users is not an issue. **Gatling**
uses its own DSL for the test scenarios.

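To illustrate, a scenario written in the Gatling DSL is compiled into a
simulation class and launched through the startup script shipped in the
Gatling bundle (the simulation class name below is a hypothetical example):

```shell
# Run a user-written simulation class from the Gatling bundle
# (TokenSimulation is a hypothetical scenario name):
./bin/gatling.sh -s TokenSimulation
```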
Wrk and Apache AB
-----------------

**Wrk** and **Apache ab** are command-line tools for testing HTTP-based
resources. In these tools everything is configured via command-line
parameters. They have a few powerful settings essential for generating HTTP
load, and as a result of this simplicity both tools are capable of
generating high loads. The tools can also be extended via plugins, and
currently there are plugins for Kafka and RabbitMQ tests.

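For a sense of how little configuration these tools need, minimal
invocations might look as follows (the endpoint URL is a hypothetical
example):

```shell
# wrk: 4 threads, 100 open connections, 30-second run
wrk -t4 -c100 -d30s http://controller:8774/

# ab: 1000 requests in total, 10 of them concurrent
ab -n 1000 -c 10 http://controller:8774/
```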
Rally
-----

**Rally** is a benchmarking tool designed specifically for OpenStack API
testing. To make this possible, **Rally** automates and unifies multi-node
OpenStack deployment, cloud verification, benchmarking and profiling.
**Rally** does this in a generic way, making it possible to check whether
OpenStack is going to work well on, say, a 1k-server installation under high
load. The actual **Rally** core consists of four main components, listed
below in the order in which they go into action:

* *Server Providers* provide a unified interface for interaction with
  different virtualization technologies (*LXC*, *Virsh* etc.) and cloud
  suppliers (like *Amazon*); they do so via SSH access and within one L3
  network
* *Deploy Engines* either deploy some OpenStack distribution (like
  *DevStack*, *Fuel* or others) or use an existing one before any
  benchmarking procedures take place, using servers retrieved from *Server
  Providers*
* *Verification* runs *Tempest* (or another subunit-based tool) against the
  deployed cloud to check that it works correctly, collects the results and
  presents them in a human-readable form
* *Benchmark Engine* allows writing parameterized benchmark tasks and
  running them against the cloud (in fact, it is the most powerful part of
  this framework, as it allows simulating any kind of usual OpenStack
  control plane load)

*Rally* is written in Python and can easily be extended with plugins written
in Python.

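A benchmark task for the *Benchmark Engine* is described in a JSON (or YAML)
file and launched with ``rally task start <file>``; below is a minimal
sketch of such a task, booting and deleting Nova servers (the flavor and
image names are examples):

```json
{
    "NovaServers.boot_and_delete_server": [
        {
            "args": {
                "flavor": {"name": "m1.tiny"},
                "image": {"name": "cirros-0.3.4-x86_64"}
            },
            "runner": {"type": "constant", "times": 10, "concurrency": 2}
        }
    ]
}
```

The ``runner`` section controls how the scenario is repeated: here, 10
iterations with at most 2 running concurrently.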
Data Plane testing
==================

For now there is no 100% ready data plane testing tool for OpenStack that
can cover all possible data workloads. Still, there are options that can be
used and improved to run complex data workloads against the cloud.

VMTP
----

VMTP_ is a data path performance measurement tool built specifically for
OpenStack clouds. It was written to provide a quick, simple and automated way
to get VM-level or host-level single-flow throughput and latency numbers from
any OpenStack cloud, taking into account various Neutron topologies. *VMTP*
is a small Python application that will automatically perform ping
connectivity checks, round-trip time (latency) measurement and TCP/UDP
throughput measurement for the following East/West flows on any OpenStack
deployment:

* VM to VM same network (private fixed IP)
* VM to VM different network using fixed IP (same as intra-tenant L3 fixed IP)
* VM to VM different network using floating IP and NAT (same as floating IP
  inter-tenant L3)

Optionally, when an external Linux host is available for testing North/South
flows:

* External host/VM download and upload throughput/latency (L3/floating IP)

If SSH login to any Linux host (native or virtual) is available, *VMTP* can
also collect the following data:

* Host-to-host process-level throughput/latency (intra-node and inter-node)

Also, *VMTP* can automatically extract the CPU usage from all native hosts
in the cloud during the throughput tests, provided the Ganglia monitoring
service (gmond) is installed and enabled on those hosts.

For VM-related flows, *VMTP* will automatically create the necessary
OpenStack resources (router, networks, subnets, key pairs, security groups,
test VMs) using the public OpenStack API, install the test tools,
orchestrate them to gather the throughput measurements and then clean up all
related resources before exiting.

.. _VMTP: https://github.com/openstack/vmtp

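A typical run only needs the OpenStack credentials file; for example (the
file names and password below are hypothetical):

```shell
# Read credentials from an openrc file (-r), pass the password with -p,
# and store the measurement results in a JSON file:
python vmtp.py -r admin-openrc.sh -p secret --json results.json
```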
Shaker
------

The Shaker_ tool is used and developed by Mirantis to understand the data
plane capabilities of an OpenStack deployment. Data plane testing helps
cloud administrators understand their deployment from the perspective of the
applications that are using the environment. This tool can be used for
deployment planning, environment verification and troubleshooting.

Today, *Shaker* focuses on network-based tests, using *iperf* to drive load
across the network; there are plans for *Shaker* to also evaluate I/O and
CPU in the future.

*Shaker* utilizes Heat (OpenStack Orchestration) templates to deploy and
execute data plane tests. It deploys a number of agents across compute nodes
that all report back to a centralized *Shaker* server. The server is started
by the *shaker* command and is responsible for the deployment of instances,
the execution of the tests specified in the scenario, results processing and
report generation. The agent is lightweight: it polls tasks from the server
and replies with the results. Agents have connectivity to the server, but
the server does not need connectivity to the agents (so it is easy to keep
agents behind NAT).

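A run of the server described above might look as follows (the endpoint
address, scenario path and report name are hypothetical examples):

```shell
# The agents connect back to the given server endpoint; Shaker deploys
# them via Heat, executes the scenario and renders an HTML report:
shaker --server-endpoint 10.0.0.10:5999 \
       --scenario scenarios/networking/full_l2.yaml \
       --report report.html
```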

*Shaker* runs three types of network tests with many different options
(including TCP and UDP). Below is a summary of the tests and their
characteristics:

* type of the test:

  * VMs in the same network (L2)
  * VMs in different networks (L3 East/West)
  * VMs hitting external IP addresses (L3 North/South)

* communication: either floating IPs or SNAT/internal
* number of VMs: from 1 to *N/2*, where *N* is the number of compute nodes
  available
* external hosts to use: statically hard-coded
* VM placement:

  * one VM per compute node
  * two VMs per compute node
  * two VMs per compute node (different networks)

Shaker L2 Segment Topology
~~~~~~~~~~~~~~~~~~~~~~~~~~

With VMs in the same network (L2 network test), *Shaker* deploys two VMs in
the same network using Heat templates and runs *iperf* between them,
measuring the single-stream network performance between the two.

Shaker L3 East-West Topology
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With VMs in different networks (L3 East/West), *Shaker* deploys two VMs in
different networks using Heat templates and runs *iperf* between them,
measuring the single-stream network performance. This involves routing and
tests the performance of the deployed SDN overlay.

Shaker L3 North-South Topology
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The last case covers VMs hitting external IP addresses (L3 North/South).
*Shaker* deploys one of the VMs with an external (floating) IP address and
runs *iperf* between a given external node and that VM.

.. _Shaker: https://github.com/openstack/shaker

Rally
-----

Although right now *Rally* is used for control plane testing, there is an
approved blueprint_ for it to support testing of various workloads, which
means that in the future it will be possible to use *Rally* for data plane
testing as well.

.. _blueprint: https://blueprints.launchpad.net/rally/+spec/vm-workloads-framework