From a101c1336885f1c1ee41269a1911d4cbc440cfbb Mon Sep 17 00:00:00 2001
From: Leontii Istomin
Date: Tue, 8 Dec 2015 10:57:49 +0000
Subject: [PATCH] Test plan of provisioning systems.

Add a test plan which describes how to measure the performance of
provisioning systems.

* Add table titles
* Delete configuration management template
* Delete author info
* Fix abstract and conventions sections
* Improve the appearance of the hardware info table
* Fix the reference to the script

Change-Id: I8cb3524dbd12bcd67502e3c6fd9c003b495f18bb
---
 doc/source/index.rst                          |   1 -
 doc/source/test_plans/index.rst               |   1 +
 doc/source/test_plans/provisioning/main.rst   | 345 ++++++++++++++++++
 doc/source/test_plans/provisioning/measure.sh |  86 +++++
 4 files changed, 432 insertions(+), 1 deletion(-)
 create mode 100644 doc/source/test_plans/provisioning/main.rst
 create mode 100644 doc/source/test_plans/provisioning/measure.sh

diff --git a/doc/source/index.rst b/doc/source/index.rst
index 91a794a..359a7ca 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -11,4 +11,3 @@ Performance Documentation
 .. raw:: pdf
 
     PageBreak oneColumn
-
diff --git a/doc/source/test_plans/index.rst b/doc/source/test_plans/index.rst
index b36329e..6ecce67 100644
--- a/doc/source/test_plans/index.rst
+++ b/doc/source/test_plans/index.rst
@@ -10,3 +10,4 @@ Test Plans
     :maxdepth: 2
 
     mq/index
+    provisioning/main
diff --git a/doc/source/test_plans/provisioning/main.rst b/doc/source/test_plans/provisioning/main.rst
new file mode 100644
index 0000000..957db60
--- /dev/null
+++ b/doc/source/test_plans/provisioning/main.rst
@@ -0,0 +1,345 @@
+.. _Measuring_performance_of_provisioning_systems:
+
+=============================================
+Measuring performance of provisioning systems
+=============================================
+
+:status: draft is in progress
+:version: 0
+
+:Abstract:
+
+  This document describes a test plan for quantifying the performance of
+  provisioning systems as a function of the number of nodes to be provisioned. The
+  plan includes the collection of several resource utilization metrics, which will
+  be used to analyze and understand the overall performance of each system. In
+  particular, resource bottlenecks will either be fixed, or best practices
+  developed for system configuration and hardware requirements.
+
+:Conventions:
+
+  - **Provisioning:** is the entire process of installing and configuring an
+    operating system.
+
+  - **Provisioning system:** is a service or a set of services which enables the
+    installation of an operating system and performs basic operations such as
+    configuring network interfaces and partitioning disks. A preliminary
+    `list of provisioning systems`_ can be found below in `Applications`_.
+    The provisioning system
+    can include configuration management systems like Puppet or Chef, but
+    this feature will not be considered in this document. The test plan for
+    configuration management systems is described in the
+    "Measuring_performance_of_configuration_management_systems" document.
+
+  - **Performance of a provisioning system:** is a set of metrics which
+    describes how many nodes can be provisioned at the same time and the
+    hardware resources required to do so.
+
+  - **Nodes:** are servers which will be provisioned.
+
+List of performance metrics
+---------------------------
+The table below shows the list of test metrics to be collected. The priority
+is the relative ranking of the importance of each metric in evaluating the
+performance of the system.
+
+.. table:: List of performance metrics
+
+  +--------+------------------------+------------------------------------------+
+  |Priority|       Parameter        |               Description                |
+  +========+========================+==========================================+
+  |        |                        | | The elapsed time to provision all      |
+  |   1    |PROVISIONING_TIME(NODES)| | nodes, as a function of the number of  |
+  |        |                        | | nodes                                  |
+  +--------+------------------------+------------------------------------------+
+  |        |                        | | Incoming network bandwidth usage as a  |
+  |   2    |INGRESS_NET(NODES)      | | function of the number of nodes.       |
+  |        |                        | | Average during provisioning on the host|
+  |        |                        | | where the provisioning system is       |
+  |        |                        | | installed.                             |
+  +--------+------------------------+------------------------------------------+
+  |        |                        | | Outgoing network bandwidth usage as a  |
+  |   2    |   EGRESS_NET(NODES)    | | function of the number of nodes.       |
+  |        |                        | | Average during provisioning on the host|
+  |        |                        | | where the provisioning system is       |
+  |        |                        | | installed.                             |
+  +--------+------------------------+------------------------------------------+
+  |        |                        | | CPU utilization as a function of the   |
+  |   3    |       CPU(NODES)       | | number of nodes. Average during        |
+  |        |                        | | provisioning on the host where the     |
+  |        |                        | | provisioning system is installed.      |
+  +--------+------------------------+------------------------------------------+
+  |        |                        | | Active memory usage as a function of   |
+  |   3    |       RAM(NODES)       | | the number of nodes. Average during    |
+  |        |                        | | provisioning on the host where the     |
+  |        |                        | | provisioning system is installed.      |
+  +--------+------------------------+------------------------------------------+
+  |        |                        | | Storage write IO bandwidth as a        |
+  |   3    |    WRITE_IO(NODES)     | | function of the number of nodes.       |
+  |        |                        | | Average during provisioning on the host|
+  |        |                        | | where the provisioning system is       |
+  |        |                        | | installed.                             |
+  +--------+------------------------+------------------------------------------+
+  |        |                        | | Storage read IO bandwidth as a         |
+  |   3    |     READ_IO(NODES)     | | function of the number of nodes.       |
+  |        |                        | | Average during provisioning on the host|
+  |        |                        | | where the provisioning system is       |
+  |        |                        | | installed.                             |
+  +--------+------------------------+------------------------------------------+
+
+Test Plan
+---------
+
+The above performance metrics will be measured for various numbers
+of provisioned nodes. The result will be a table that shows the
+dependence of these metrics on the number of nodes.
+
+Environment description
+^^^^^^^^^^^^^^^^^^^^^^^
+Test results MUST include a description of the environment used. The following items
+should be included:
+
+- **Hardware configuration of each server.** If virtual machines are used then both
+  physical and virtual hardware should be fully documented.
+  An example format is given below:
+
+.. table:: Description of server hardware
+
+  +-------+----------------+-------+-------+
+  |server |name            |       |       |
+  |       +----------------+-------+-------+
+  |       |role            |       |       |
+  |       +----------------+-------+-------+
+  |       |vendor,model    |       |       |
+  |       +----------------+-------+-------+
+  |       |operating_system|       |       |
+  +-------+----------------+-------+-------+
+  |CPU    |vendor,model    |       |       |
+  |       +----------------+-------+-------+
+  |       |processor_count |       |       |
+  |       +----------------+-------+-------+
+  |       |core_count      |       |       |
+  |       +----------------+-------+-------+
+  |       |frequency_MHz   |       |       |
+  +-------+----------------+-------+-------+
+  |RAM    |vendor,model    |       |       |
+  |       +----------------+-------+-------+
+  |       |amount_MB       |       |       |
+  +-------+----------------+-------+-------+
+  |NETWORK|interface_name  |       |       |
+  |       +----------------+-------+-------+
+  |       |vendor,model    |       |       |
+  |       +----------------+-------+-------+
+  |       |bandwidth       |       |       |
+  +-------+----------------+-------+-------+
+  |STORAGE|dev_name        |       |       |
+  |       +----------------+-------+-------+
+  |       |vendor,model    |       |       |
+  |       +----------------+-------+-------+
+  |       |SSD/HDD         |       |       |
+  |       +----------------+-------+-------+
+  |       |size            |       |       |
+  +-------+----------------+-------+-------+
+
+- **Configuration of hardware network switches.** The configuration file from the
+  switch can be downloaded and attached.
+
+- **Configuration of virtual machines and virtual networks (if they are used).**
+  The configuration files can be attached, along with the mapping of virtual
+  machines to host machines.
+
+- **Network scheme.** The plan should show how all hardware is connected and
+  how the components communicate. All ethernet/fibrechannel and VLAN channels
+  should be included. Each interface of every hardware component should be
+  matched with the corresponding L2 channel and IP address.
+
+- **Software configuration of the provisioning system.** `sysctl.conf` and any
+  other kernel file that is changed from the default should be attached.
+  A list of installed packages should be attached. Specifications of the
+  operating system, network interfaces configuration, and disk partitioning
+  configuration should be included. If distributed provisioning systems are
+  to be tested then the parts that are distributed need to be described.
+
+- **Desired software configuration of the provisioned nodes.**
+  The operating system, disk partitioning scheme, network interface
+  configuration, installed packages and other components of the nodes
+  affect the amount of work to be performed by the provisioning system
+  and thus its performance.
+
+Preparation
+^^^^^^^^^^^
+1.
+  The following package needs to be installed on the provisioning system
+  servers to collect performance metrics.
+
+.. table:: Software to be installed
+
+  +--------------+---------+-----------------------------------+
+  | package name | version | source                            |
+  +==============+=========+===================================+
+  | `dstat`_     | 0.7.2   | Ubuntu trusty universe repository |
+  +--------------+---------+-----------------------------------+
+
+Measuring performance values
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The script
+`Full script for collecting performance metrics`_
+can be used for the first five of the following steps.
+
+.. note::
+  If a distributed provisioning system is used, the values need to be
+  measured on each provisioning system instance.
+
+1.
+  Start the collection of CPU, memory, network, and storage metrics during the
+  provisioning process. Use the dstat program, which can collect all of these
+  metrics in CSV format into a log file.
+2.
+  Start the provisioning process for the first node and record the wall time.
+3.
+  Wait until the provisioning process has finished (when all nodes are reachable
+  via ssh)
+  and record the wall time.
+4.
+  Stop the dstat program.
+5.
+  Prepare collected data for analysis. dstat provides a large amount of
+  information, which can be pruned by saving only the following:
+
+  * "system"[time]. Save as given.
+
+  * 100-"total cpu usage"[idl]. dstat provides only the idle CPU value. CPU
+    utilization is calculated by subtracting the idle value from 100%.
+
+  * "memory usage"[used]. dstat provides this value in Bytes.
+    It is converted to Megabytes by dividing by 1024*1024=1048576.
+
+  * "net/eth0"[recv] receive bandwidth on the NIC. It is converted to Megabits
+    per second by dividing by 1024*1024/8=131072.
+
+  * "net/eth0"[send] send bandwidth on the NIC. It is converted to Megabits
+    per second by dividing by 1024*1024/8=131072.
+
+  * "net/eth0"[recv]+"net/eth0"[send]. The total receive and transmit bandwidth
+    on the NIC. dstat provides these values in Bytes per second. They are
+    converted to Megabits per second by dividing by 1024*1024/8=131072.
+
+  * "io/total"[read] storage read IO bandwidth.
+
+  * "io/total"[writ] storage write IO bandwidth.
+
+  * "io/total"[read]+"io/total"[writ]. The total read and write storage IO
+    bandwidth.
+
+  These values will be graphed and maximum values reported.
+
+6.
+  Repeat steps 1-5, provisioning each of the following numbers of nodes
+  at the same time:
+
+  * 10 nodes
+  * 20 nodes
+  * 40 nodes
+  * 80 nodes
+  * 160 nodes
+  * 320 nodes
+  * 640 nodes
+  * 1280 nodes
+  * 2000 nodes
+
+  Additional tests will be performed if some anomalous behaviour is found.
+  These may require the collection of additional performance metrics.
+
+7.
+  The results of this part of the test will be:
+
+* to provide the following graphs, one for each number of provisioned nodes:
+
+  #) Three dependencies on one graph.
+
+     * INGRESS_NET(TIME) Dependence on time of incoming network bandwidth usage.
+     * EGRESS_NET(TIME) Dependence on time of outgoing network bandwidth usage.
+     * ALL_NET(TIME) Dependence on time of total network bandwidth usage.
+
+  #) One dependence on one graph.
+
+     * CPU(TIME) Dependence on time of CPU utilization.
+
+  #) One dependence on one graph.
+
+     * RAM(TIME) Dependence on time of active memory usage.
+
+  #) Three dependencies on one graph.
+
+     * WRITE_IO(TIME) Dependence on time of storage write IO bandwidth.
+     * READ_IO(TIME) Dependence on time of storage read IO bandwidth.
+     * ALL_IO(TIME) Dependence on time of total storage IO bandwidth.
+
+.. note::
+  If a distributed provisioning system is used, the above graphs should be
+  provided for each provisioning system instance.
+
+* to fill in the following table for maximum values:
+
+The resource metrics are obtained from the maxima of the corresponding graphs
+above. The provisioning time is the elapsed time for all nodes to be
+provisioned. One set of metrics will be given for each number of provisioned
+nodes.
+
+.. table:: Maximum values of performance metrics
+
+  +-------+--------------+---------+---------+---------+---------+
+  || nodes|| provisioning|| maximum|| maximum|| maximum|| maximum|
+  || count|| time        || CPU    || RAM    || NET    || IO     |
+  |       |              || usage  || usage  || usage  || usage  |
+  +=======+==============+=========+=========+=========+=========+
+  | 10    |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 20    |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 40    |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 80    |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 160   |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 320   |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 640   |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 1280  |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 2000  |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+
+Applications
+------------
+
+list of provisioning systems
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. table:: list of provisioning systems
+
+  +-----------------------------+---------+
+  | Name of provisioning system | Version |
+  +=============================+=========+
+  | `Cobbler`_                  | 2.4     |
+  +-----------------------------+---------+
+  | `Razor`_                    | 0.13    |
+  +-----------------------------+---------+
+  | Image based provisioning    |         |
+  | via downloading images with |    -    |
+  | bittorrent protocol         |         |
+  +-----------------------------+---------+
+
+Full script for collecting performance metrics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. literalinclude:: measure.sh
+    :language: bash
+
+.. references:
+
+.. _dstat: http://dag.wiee.rs/home-made/dstat/
+.. _Cobbler: http://cobbler.github.io/
+.. _Razor: https://github.com/puppetlabs/razor-server
diff --git a/doc/source/test_plans/provisioning/measure.sh b/doc/source/test_plans/provisioning/measure.sh
new file mode 100644
index 0000000..423f8e9
--- /dev/null
+++ b/doc/source/test_plans/provisioning/measure.sh
@@ -0,0 +1,86 @@
+#!/bin/bash
+
+# Install the required packages on the provisioning system servers:
+if (("`dpkg -l | grep dstat | grep ^ii > /dev/null; echo $?` == 1"))
+then
+  apt-get -y install dstat
+fi
+
+# Run the following commands on the provisioning system server to collect
+# CPU, RAM, NET and IO load values once per second. Set the "INTERFACE"
+# variable to the interface that is connected to the nodes and used to
+# communicate with them during provisioning. As a result, dstat runs in the
+# background and writes the required parameters in CSV format to the
+# /var/log/dstat.csv file:
+INTERFACE=eth0
+OUTPUT_FILE=/var/log/dstat.csv
+dstat --nocolor --time --cpu --mem --net -N ${INTERFACE} --io --output ${OUTPUT_FILE} > /dev/null &
+
+# Start the provisioning process and record when provisioning started and when
+# it ended (when all nodes are reachable via ssh). The results collected during
+# this time window are analyzed later. To get the start time, run the "date"
+# command before the API call or CLI command and redirect its output to a log
+# file. Here is an example for cobbler:
+ENV_NAME=env-1
+start_time=`date +%s.%N`
+echo "Provisioning started at "`date` > /var/log/provisioning.log
+for SYSTEM in `cobbler system find --comment=${ENV_NAME}`
+do
+  cobbler system reboot --name=${SYSTEM} &
+done
+
+# To get the end time, use the loop below. It tries to reach the nodes via ssh
+# and then writes "Provisioning finished at " to the
+# /var/log/provisioning.log file. You need to provide the IP addresses of the
+# nodes (in the nodes_ips.list file, one IP per line) and the
+# credentials (SSH_PASSWORD and SSH_USER variables):
+SSH_OPTIONS="StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
+SSH_PASSWORD="r00tme"
+SSH_USER="root"
+NODE_IPS=(`cat nodes_ips.list`)
+TIMER=0
+TIMEOUT=20
+while (("${TIMER}" < "${TIMEOUT}"))
+do
+  for NODE_IP in ${NODE_IPS[@]}
+  do
+    SSH_CMD="sshpass -p ${SSH_PASSWORD} ssh -o ${SSH_OPTIONS} ${SSH_USER}@${NODE_IP}"
+    ${SSH_CMD} "hostname" && UNHAPPY_SSH=0 || UNHAPPY_SSH=1
+    if (("${UNHAPPY_SSH}" == "0"))
+    then
+      echo "Node with ip "${NODE_IP}" is reachable via ssh"
+      NODE_IPS=($(printf '%s\n' "${NODE_IPS[@]}" | grep -vxF "${NODE_IP}"))
+    else
+      echo "Node with ip "${NODE_IP}" is still unreachable via ssh"
+    fi
+  done
+  TIMER=$((${TIMER} + 1))
+  if (("${TIMER}" == "${TIMEOUT}"))
+  then
+    echo "The following "${#NODE_IPS[@]}" nodes are unreachable"
+    echo ${NODE_IPS[@]}
+    exit 1
+  fi
+  if ((${#NODE_IPS[@]} == 0 ))
+  then
+    break
+  fi
+  # Check that nodes are reachable once per second
+  sleep 1
+done
+echo "Provisioning finished at "`date` >> /var/log/provisioning.log
+
+end_time=`date +%s.%N`
+elapsed_time=$(echo "$end_time - $start_time" | bc -l)
+echo "Total elapsed time for provisioning: $elapsed_time seconds" >> /var/log/provisioning.log
+
+# Stop dstat command
+killall dstat
+
+# Drop the unneeded values and convert the rest to the required metrics. The
+# resulting CSV format is:
+# time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all
+awk -F "," 'BEGIN {getline;getline;getline;getline;getline;getline;getline;
+            print "time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all"}
+           {print $1","100-$4","$8/1048576","$12/131072","$13/131072","($12+$13)/131072","$14","$15","$14+$15}' \
+$OUTPUT_FILE > /var/log/10_nodes.csv