Test plan of provisioning systems.
Add a test plan which describes how to measure the performance of provisioning systems.

* Add table titles
* Delete configuration management template
* Delete author info
* Fix abstract and conventions sections
* Improve the layout of the hardware info table
* Fix the reference to the script

Change-Id: I8cb3524dbd12bcd67502e3c6fd9c003b495f18bb
This commit is contained in:
parent
8d0ce71e72
commit
a101c13368
@ -11,4 +11,3 @@ Performance Documentation
.. raw:: pdf

    PageBreak oneColumn

@ -10,3 +10,4 @@ Test Plans
    :maxdepth: 2

    mq/index
    provisioning/main
345
doc/source/test_plans/provisioning/main.rst
Normal file
@ -0,0 +1,345 @@
.. _Measuring_performance_of_provisioning_systems:

=============================================
Measuring performance of provisioning systems
=============================================

:status: draft is in progress
:version: 0

:Abstract:

  This document describes a test plan for quantifying the performance of
  provisioning systems as a function of the number of nodes to be provisioned. The
  plan includes the collection of several resource utilization metrics, which will
  be used to analyze and understand the overall performance of each system. In
  particular, resource bottlenecks will either be fixed, or best practices
  developed for system configuration and hardware requirements.

:Conventions:

  - **Provisioning:** is the entire process of installing and configuring an
    operating system.

  - **Provisioning system:** is a service or a set of services which enables the
    installation of an operating system and performs basic operations such as
    configuring network interfaces and partitioning disks. A preliminary
    `list of provisioning systems`_ can be found below in `Applications`_.
    The provisioning system
    can include configuration management systems like Puppet or Chef, but
    this feature will not be considered in this document. The test plan for
    configuration management systems is described in the
    "Measuring_performance_of_configuration_management_systems" document.

  - **Performance of a provisioning system:** is a set of metrics which
    describes how many nodes can be provisioned at the same time and the
    hardware resources required to do so.

  - **Nodes:** are servers which will be provisioned.

List of performance metrics
---------------------------
The table below shows the list of test metrics to be collected. The priority
is the relative ranking of the importance of each metric in evaluating the
performance of the system.

.. table:: List of performance metrics

  +--------+------------------------+------------------------------------------+
  |Priority| Parameter              | Description                              |
  +========+========================+==========================================+
  |        |                        | | The elapsed time to provision all      |
  |   1    |PROVISIONING_TIME(NODES)| | nodes, as a function of the number of  |
  |        |                        | | nodes                                  |
  +--------+------------------------+------------------------------------------+
  |        |                        | | Incoming network bandwidth usage as a  |
  |   2    |INGRESS_NET(NODES)      | | function of the number of nodes.       |
  |        |                        | | Average during provisioning on the host|
  |        |                        | | where the provisioning system is       |
  |        |                        | | installed.                             |
  +--------+------------------------+------------------------------------------+
  |        |                        | | Outgoing network bandwidth usage as a  |
  |   2    | EGRESS_NET(NODES)      | | function of the number of nodes.       |
  |        |                        | | Average during provisioning on the host|
  |        |                        | | where the provisioning system is       |
  |        |                        | | installed.                             |
  +--------+------------------------+------------------------------------------+
  |        |                        | | CPU utilization as a function of the   |
  |   3    | CPU(NODES)             | | number of nodes. Average during        |
  |        |                        | | provisioning on the host where the     |
  |        |                        | | provisioning system is installed.      |
  +--------+------------------------+------------------------------------------+
  |        |                        | | Active memory usage as a function of   |
  |   3    | RAM(NODES)             | | the number of nodes. Average during    |
  |        |                        | | provisioning on the host where the     |
  |        |                        | | provisioning system is installed.      |
  +--------+------------------------+------------------------------------------+
  |        |                        | | Storage write IO bandwidth as a        |
  |   3    | WRITE_IO(NODES)        | | function of the number of nodes.       |
  |        |                        | | Average during provisioning on the host|
  |        |                        | | where the provisioning system is       |
  |        |                        | | installed.                             |
  +--------+------------------------+------------------------------------------+
  |        |                        | | Storage read IO bandwidth as a         |
  |   3    | READ_IO(NODES)         | | function of the number of nodes.       |
  |        |                        | | Average during provisioning on the host|
  |        |                        | | where the provisioning system is       |
  |        |                        | | installed.                             |
  +--------+------------------------+------------------------------------------+
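
As a minimal sketch of how the averaged metrics above could be computed, assume the per-second samples have already been saved to a CSV file with a header row (for example, a file with columns such as ``cpu_usage`` and ``ram_usage``). The ``column_average`` helper and the column names are illustrative assumptions, not part of the test plan:

```shell
#!/bin/bash

# Print the arithmetic mean of one named column of a CSV file.
# Usage: column_average FILE COLUMN_NAME
# Assumes the first line of FILE is a header row with column names.
column_average() {
    awk -F ',' -v name="$2" '
        NR == 1 {
            # Locate the requested column in the header row.
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) { print "no such column: " name > "/dev/stderr"; exit 1 }
            next
        }
        { sum += $col; n++ }                     # accumulate every sample
        END { if (n) printf "%.2f\n", sum / n }  # mean with two decimals
    ' "$1"
}
```

For example, ``column_average /var/log/10_nodes.csv cpu_usage`` would print the mean CPU utilization over the whole file, provided that file has a ``cpu_usage`` column.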

Test Plan
---------

The above performance metrics will be measured for various numbers
of provisioned nodes. The result will be a table that shows the
dependence of these metrics on the number of nodes.

Environment description
^^^^^^^^^^^^^^^^^^^^^^^
Test results MUST include a description of the environment used. The following items
should be included:

- **Hardware configuration of each server.** If virtual machines are used then both
  physical and virtual hardware should be fully documented.
  An example format is given below:

  .. table:: Description of server hardware

    +-------+----------------+-------+-------+
    |server |name            |       |       |
    |       +----------------+-------+-------+
    |       |role            |       |       |
    |       +----------------+-------+-------+
    |       |vendor,model    |       |       |
    |       +----------------+-------+-------+
    |       |operating_system|       |       |
    +-------+----------------+-------+-------+
    |CPU    |vendor,model    |       |       |
    |       +----------------+-------+-------+
    |       |processor_count |       |       |
    |       +----------------+-------+-------+
    |       |core_count      |       |       |
    |       +----------------+-------+-------+
    |       |frequency_MHz   |       |       |
    +-------+----------------+-------+-------+
    |RAM    |vendor,model    |       |       |
    |       +----------------+-------+-------+
    |       |amount_MB       |       |       |
    +-------+----------------+-------+-------+
    |NETWORK|interface_name  |       |       |
    |       +----------------+-------+-------+
    |       |vendor,model    |       |       |
    |       +----------------+-------+-------+
    |       |bandwidth       |       |       |
    +-------+----------------+-------+-------+
    |STORAGE|dev_name        |       |       |
    |       +----------------+-------+-------+
    |       |vendor,model    |       |       |
    |       +----------------+-------+-------+
    |       |SSD/HDD         |       |       |
    |       +----------------+-------+-------+
    |       |size            |       |       |
    +-------+----------------+-------+-------+

- **Configuration of hardware network switches.** The configuration file from the
  switch can be downloaded and attached.

- **Configuration of virtual machines and virtual networks (if they are used).**
  The configuration files can be attached, along with the mapping of virtual
  machines to host machines.

- **Network scheme.** The plan should show how all hardware is connected and
  how the components communicate. All Ethernet/Fibre Channel and VLAN channels
  should be included. Each interface of every hardware component should be
  matched with the corresponding L2 channel and IP address.

- **Software configuration of the provisioning system.** `sysctl.conf` and any
  other kernel file that is changed from the default should be attached.
  A list of installed packages should be attached. Specifications of the
  operating system, network interface configuration, and disk partitioning
  configuration should be included. If distributed provisioning systems are
  to be tested then the parts that are distributed need to be described.

- **Desired software configuration of the provisioned nodes.**
  The operating system, disk partitioning scheme, network interface
  configuration, installed packages and other components of the nodes
  affect the amount of work to be performed by the provisioning system
  and thus its performance.

Preparation
^^^^^^^^^^^
1.
  The following package needs to be installed on the provisioning system
  servers to collect performance metrics.

  .. table:: Software to be installed

    +--------------+---------+-----------------------------------+
    | package name | version | source                            |
    +==============+=========+===================================+
    | `dstat`_     | 0.7.2   | Ubuntu trusty universe repository |
    +--------------+---------+-----------------------------------+

Measuring performance values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The script
`Full script for collecting performance metrics`_
can be used for the first five of the following steps.

.. note::
  If a distributed provisioning system is used, the values need to be
  measured on each provisioning system instance.

1.
  Start the collection of CPU, memory, network, and storage metrics during the
  provisioning process. Use the dstat program, which can collect all of these
  metrics in CSV format into a log file.
2.
  Start the provisioning process for the first node and record the wall time.
3.
  Wait until the provisioning process has finished (when all nodes are reachable
  via ssh)
  and record the wall time.
4.
  Stop the dstat program.
5.
  Prepare collected data for analysis. dstat provides a large amount of
  information, which can be pruned by saving only the following:

  * "system"[time]. Save as given.

  * 100-"total cpu usage"[idl]. dstat provides only the idle CPU value. CPU
    utilization is calculated by subtracting the idle value from 100%.

  * "memory usage"[used]. dstat provides this value in Bytes.
    It is converted to Megabytes by dividing by 1024*1024=1048576.

  * "net/eth0"[recv]. The receive bandwidth on the NIC. It is converted to
    Megabits per second by dividing by 1024*1024/8=131072.

  * "net/eth0"[send]. The send bandwidth on the NIC. It is converted to
    Megabits per second by dividing by 1024*1024/8=131072.

  * "net/eth0"[recv]+"net/eth0"[send]. The total receive and transmit bandwidth
    on the NIC. dstat provides these values in Bytes per second. They are
    converted to Megabits per second by dividing by 1024*1024/8=131072.

  * "io/total"[read]. The storage read IO bandwidth.

  * "io/total"[writ]. The storage write IO bandwidth.

  * "io/total"[read]+"io/total"[writ]. The total read and write storage IO
    bandwidth.

  These values will be graphed and maximum values reported.

6.
  Repeat steps 1-5, provisioning each of the following numbers of nodes
  at the same time:

  * 10 nodes
  * 20 nodes
  * 40 nodes
  * 80 nodes
  * 160 nodes
  * 320 nodes
  * 640 nodes
  * 1280 nodes
  * 2000 nodes

  Additional tests will be performed if some anomalous behaviour is found.
  These may require the collection of additional performance metrics.

7.
  The result of this part of the test will be:

  * to provide the following graphs, one for each number of provisioned nodes:

    #) Three dependencies on one graph.

       * INGRESS_NET(TIME) Dependence on time of incoming network bandwidth usage.
       * EGRESS_NET(TIME) Dependence on time of outgoing network bandwidth usage.
       * ALL_NET(TIME) Dependence on time of total network bandwidth usage.

    #) One dependence on one graph.

       * CPU(TIME) Dependence on time of CPU utilization.

    #) One dependence on one graph.

       * RAM(TIME) Dependence on time of active memory usage.

    #) Three dependencies on one graph.

       * WRITE_IO(TIME) Dependence on time of storage write IO bandwidth.
       * READ_IO(TIME) Dependence on time of storage read IO bandwidth.
       * ALL_IO(TIME) Dependence on time of total storage IO bandwidth.

    .. note::
      If a distributed provisioning system is used, the above graphs should be
      provided for each provisioning system instance.

  * to fill in the following table for maximum values:

  The resource metrics are obtained from the maxima of the corresponding graphs
  above. The provisioning time is the elapsed time for all nodes to be
  provisioned. One set of metrics will be given for each number of provisioned
  nodes.

  .. table:: Maximum values of performance metrics

    +-------+--------------+---------+---------+---------+---------+
    || nodes|| provisioning|| maximum|| maximum|| maximum|| maximum|
    || count|| time        || CPU    || RAM    || NET    || IO     |
    |       |              || usage  || usage  || usage  || usage  |
    +=======+==============+=========+=========+=========+=========+
    | 10    |              |         |         |         |         |
    +-------+--------------+---------+---------+---------+---------+
    | 20    |              |         |         |         |         |
    +-------+--------------+---------+---------+---------+---------+
    | 40    |              |         |         |         |         |
    +-------+--------------+---------+---------+---------+---------+
    | 80    |              |         |         |         |         |
    +-------+--------------+---------+---------+---------+---------+
    | 160   |              |         |         |         |         |
    +-------+--------------+---------+---------+---------+---------+
    | 320   |              |         |         |         |         |
    +-------+--------------+---------+---------+---------+---------+
    | 640   |              |         |         |         |         |
    +-------+--------------+---------+---------+---------+---------+
    | 1280  |              |         |         |         |         |
    +-------+--------------+---------+---------+---------+---------+
    | 2000  |              |         |         |         |         |
    +-------+--------------+---------+---------+---------+---------+
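
The maxima for one row of the table above could be extracted as sketched below. This assumes a pruned CSV file with the column order ``time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all`` (the order written by ``measure.sh``). The ``max_values`` helper name is hypothetical, and the provisioning time still has to be taken from the recorded wall times:

```shell
#!/bin/bash

# Print "max_cpu,max_ram,max_net,max_io" for one pruned metrics file.
# Usage: max_values FILE
# Assumed columns: 2 = cpu_usage, 3 = ram_usage, 6 = net_all, 9 = dsk_all.
max_values() {
    awk -F ',' '
        NR == 1 { next }                              # skip the header row
        # The first data row initialises each maximum; later rows raise it.
        NR == 2 || $2 + 0 > cpu { cpu = $2 + 0 }
        NR == 2 || $3 + 0 > ram { ram = $3 + 0 }
        NR == 2 || $6 + 0 > net { net = $6 + 0 }
        NR == 2 || $9 + 0 > io  { io  = $9 + 0 }
        END { if (NR > 1) print cpu "," ram "," net "," io }
    ' "$1"
}
```

Running ``max_values /var/log/10_nodes.csv`` would then give the four resource maxima for the 10-node run in one line, ready to be copied into the table.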

Applications
------------

list of provisioning systems
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. table:: list of provisioning systems

  +-----------------------------+---------+
  | Name of provisioning system | Version |
  +=============================+=========+
  | `Cobbler`_                  | 2.4     |
  +-----------------------------+---------+
  | `Razor`_                    | 0.13    |
  +-----------------------------+---------+
  | Image based provisioning    |         |
  | via downloading images with | -       |
  | bittorrent protocol         |         |
  +-----------------------------+---------+

Full script for collecting performance metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: measure.sh
    :language: bash

.. references:

.. _dstat: http://dag.wiee.rs/home-made/dstat/
.. _Cobbler: http://cobbler.github.io/
.. _Razor: https://github.com/puppetlabs/razor-server
86
doc/source/test_plans/provisioning/measure.sh
Normal file
@ -0,0 +1,86 @@
#!/bin/bash

# Install the required packages on the provisioning system servers:
if ! dpkg -l | grep dstat | grep -q ^ii
then
    apt-get -y install dstat
fi

# Start collecting the CPU, RAM, NET and IO loads once per second on the
# provisioning system server. Set the INTERFACE variable to the interface that
# is connected to the nodes and used to communicate with them during the
# provisioning process. The command below leaves the dstat program running in
# the background, writing the needed parameters in CSV format into the
# ${OUTPUT_FILE} file:
INTERFACE=eth0
OUTPUT_FILE=/var/log/dstat.csv
dstat --nocolor --time --cpu --mem --net -N ${INTERFACE} --io --output ${OUTPUT_FILE} > /dev/null &

# Start the provisioning process and record the time when provisioning started.
# The end time is recorded below, once all nodes are reachable via ssh. Only
# the results collected during this time window are analyzed. To get the start
# time, run the "date" command before the API call or CLI command and write its
# output to a log file. Here is an example for cobbler:
ENV_NAME=env-1
start_time=`date +%s.%N`
echo "Provisioning started at "`date` > /var/log/provisioning.log
for SYSTEM in `cobbler system find --comment=${ENV_NAME}`
do
    cobbler system reboot --name=${SYSTEM} &
done

# To get the end time, use the loop below. It tries to reach the nodes via ssh
# and the script then writes "Provisioning finished at <date/time>" into the
# /var/log/provisioning.log file. You'll need to provide the IP addresses of
# the nodes (in the file nodes_ips.list, one IP per line) and the
# credentials (SSH_PASSWORD and SSH_USER variables):
SSH_OPTIONS="StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
SSH_PASSWORD="r00tme"
SSH_USER="root"
NODE_IPS=(`cat nodes_ips.list`)
TIMER=0
TIMEOUT=20
while ((TIMER < TIMEOUT))
do
    # Nodes that are still unreachable after this pass
    PENDING_IPS=()
    for NODE_IP in ${NODE_IPS[@]}
    do
        SSH_CMD="sshpass -p ${SSH_PASSWORD} ssh -o ${SSH_OPTIONS} ${SSH_USER}@${NODE_IP}"
        if ${SSH_CMD} "hostname"
        then
            echo "Node with ip ${NODE_IP} is reachable via ssh"
        else
            echo "Node with ip ${NODE_IP} is still unreachable via ssh"
            PENDING_IPS+=(${NODE_IP})
        fi
    done
    # Keep only the unreachable nodes. Removing an IP by pattern substitution
    # would also damage addresses that contain it (10.0.0.1 and 10.0.0.10).
    NODE_IPS=(${PENDING_IPS[@]})
    # Stop waiting as soon as every node has answered
    if ((${#NODE_IPS[@]} == 0))
    then
        break
    fi
    TIMER=$((TIMER + 1))
    if ((TIMER == TIMEOUT))
    then
        echo "The following ${#NODE_IPS[@]} nodes are unreachable"
        echo ${NODE_IPS[@]}
        exit 1
    fi
    # Check that the nodes are reachable once per second
    sleep 1
done
# Append to the log so that the "Provisioning started" line is not overwritten
echo "Provisioning finished at "`date` >> /var/log/provisioning.log

end_time=`date +%s.%N`
elapsed_time=$(echo "$end_time - $start_time" | bc -l)
echo "Total elapsed time for provisioning: $elapsed_time seconds" >> /var/log/provisioning.log

# Stop the dstat command
killall dstat

# Delete excess values and convert the rest to the needed metrics. The seven
# getline calls skip the dstat header lines. The result has the following CSV
# format:
# time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all
awk -F "," 'BEGIN {getline;getline;getline;getline;getline;getline;getline;
            print "time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all"}
            {print $1","100-$4","$8/1048576","$12/131072","$13/131072","($12+$13)/131072","$14","$15","$14+$15}' \
    $OUTPUT_FILE > /var/log/10_nodes.csv