Kubernetes density testing

Change-Id: I350fa5797c15880290a9ff31b322e67e55c8b9b5
Ilya Shakhat 2016-12-20 16:11:51 +04:00
parent 09b76b1148
commit 72de796900
23 changed files with 29405 additions and 1 deletion

@@ -0,0 +1,77 @@
.. _Kubernetes_density_test_plan:

**************************
Kubernetes density testing
**************************

:status: **ready**
:version: 1.0

:Abstract:

  This test plan covers scenarios of density testing of Kubernetes.

Test Plan
=========

Test Environment
----------------

Preparation
^^^^^^^^^^^

The test plan is executed against Kubernetes deployed on bare-metal nodes.

Environment description
^^^^^^^^^^^^^^^^^^^^^^^

The environment description includes the hardware specification of the
servers, network parameters, operating system and Kubernetes deployment
characteristics.
Test Case #1: Maximum pods per node
-----------------------------------

Description
^^^^^^^^^^^

Kubernetes by default limits the number of pods running on a node. The value
is chosen by the community and since version 1.4 equals 110 (k8s_max_pods_).
The goal of this test is to investigate system behavior at the default limit
and to find out whether the limit can be increased. In particular we are
interested in the following metrics: pod startup time during mass start
(e.g. when a replication controller is scaled up) and the node's average
load.

Manual experiments show that a pod starts functioning before the Kubernetes
API reports it as being in the running state. In this test case we are also
interested in how long it takes for Kubernetes to update the pod status.
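
The limit itself is a kubelet setting. A minimal sketch of raising it for
this experiment, assuming direct access to the kubelet command line (the
value 200 is an arbitrary example and the remaining flags are
deployment-specific):

.. code-block:: console

    # the default since 1.4 is 110; raise it to an example value of 200
    $ kubelet --max-pods=200 <deployment-specific flags>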
List of performance metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. table:: List of test metrics to be collected during this test

   +-------------------------+------------------------------------------------+
   | Parameter               | Description                                    |
   +=========================+================================================+
   | POD_COUNT               | Number of pods                                 |
   +-------------------------+------------------------------------------------+
   | POD_FIRST_REPORT        | Time taken by a pod to start and report, s     |
   +-------------------------+------------------------------------------------+
   | KUBECTL_RUN             | Time for all pods to be reported as running, s |
   +-------------------------+------------------------------------------------+
   | KUBECTL_TERMINATE       | Time to terminate all pods, s                  |
   +-------------------------+------------------------------------------------+
Reports
=======

Test plan execution reports:

* :ref:`Kubernetes_density_test_report`

.. references:

.. _k8s_max_pods: https://github.com/kubernetes/kubernetes/blob/v1.5.0/pkg/apis/componentconfig/v1alpha1/defaults.go#L290

@@ -0,0 +1,183 @@
.. _Kubernetes_density_test_report:

******************************
Kubernetes density test report
******************************

:Abstract:

  This document is the report for :ref:`Kubernetes_density_test_plan`.

Environment description
=======================

This report is collected on the hardware described in
:ref:`intel_mirantis_performance_lab_1`.

Software
~~~~~~~~

Kubernetes is installed using the Kargo_ deployment tool on Ubuntu 16.04.1.

Node roles:

- node1: minion+master+etcd
- node2: minion+master+etcd
- node3: minion+etcd
- node4: minion

Software versions:

- OS: Ubuntu 16.04.1 LTS (Xenial Xerus)
- Kernel: 4.4.0-47
- Docker: 1.12.1
- Kubernetes: 1.4.3
Reports
=======

Test Case #1: Maximum pods per node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pod startup time is measured with the help of the
`MMM(MySQL/Master/Minions) testing suite`_. To schedule all pods on a single
node, the original replication controller for minions is updated with a
scheduler hint, by adding the following lines into the template's spec
section:

.. code-block:: yaml

    nodeSelector:
      kubernetes.io/hostname: node4

Pod status from the Kubernetes point of view is retrieved with the kubectl
tool. The process is automated with
:download:`kubectl_mon.py <kubectl-mon/kubectl_mon.py>`, which produces
output in CSV format. Charts are created by the
:download:`pod_stats.py <kubectl-mon/pod_stats.py>` script.

Every measurement starts with an empty namespace. Then a Kubernetes
replication controller is created with the specified number of pods. We
collect every pod's report time and kubectl stats. The summary data is
presented below.
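
A sketch of a single measurement round, assuming the replication controller
manifest from this change is saved as ``minion-rc.yaml`` and the monitor
output goes to ``pods.csv`` (both file names are hypothetical):

.. code-block:: console

    $ kubectl create namespace minions
    $ python kubectl_mon.py > pods.csv &
    $ kubectl create -f minion-rc.yaml
    $ kubectl --namespace minions scale rc minion --replicas=50
    # ... wait until all pods are reported, then terminate:
    $ kubectl --namespace minions delete rc minion
    $ python pod_stats.py pods.csv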
.. image:: summary.png

.. list-table:: POD stats
   :header-rows: 1

   * - POD_COUNT
     - POD_FIRST_REPORT, s
     - KUBECTL_RUN, s
     - KUBECTL_TERMINATE, s
   * - 50
     - 12
     - 44
     - 30
   * - 100
     - 27
     - 131
     - 87
   * - 200
     - 61
     - 450
     - 153
   * - 400
     - 208
     - ∞ (not finished)
     - 390
Detailed Stats
--------------

50 pods
^^^^^^^

Start replication controller with 50 pods:

.. image:: 50-start.svg
   :width: 100%

Terminate replication controller with 50 pods:

.. image:: 50-term.svg
   :width: 100%

100 pods
^^^^^^^^

Start replication controller with 100 pods:

.. image:: 100-start.svg
   :width: 100%

Terminate replication controller with 100 pods:

.. image:: 100-term.svg
   :width: 100%

200 pods
^^^^^^^^

Start replication controller with 200 pods:

.. image:: 200-start.svg
   :width: 100%

Terminate replication controller with 200 pods:

.. image:: 200-term.svg
   :width: 100%
400 pods
^^^^^^^^

Start replication controller with 400 pods.

Note: in this experiment all pods successfully reported, however from the
Kubernetes API point of view fewer than 60 pods were in the running state.
The number of pods reported as running slowly increased over time, but far
too slowly to treat the process as successful.

.. image:: 400-start.svg
   :width: 100%

Terminate replication controller with 400 pods.

.. image:: 400-term.svg
   :width: 100%
Scale in steps of 100 pods
^^^^^^^^^^^^^^^^^^^^^^^^^^

In this experiment we scale the replication controller up in steps of
100 pods. The scaling process is invoked after all pods are reported as
running. On step 3 (201-300 pods) the process became significantly slower,
so we started scaling the replication controller down. The full cycle is
visualized below.
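
The steps themselves can be driven with kubectl; a minimal sketch of the
scale-up loop (the wait between steps is a placeholder for polling the
monitoring output):

.. code-block:: console

    $ for n in 100 200 300; do
    >     kubectl --namespace minions scale rc minion --replicas=$n
    >     # placeholder: block here until $n pods are reported as Running
    > done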
.. image:: N-start-term.svg
   :width: 100%

System metrics from the API nodes and the minion are shown below.

.. image:: N-cpu-user.png

.. image:: N-cpu-system.png

.. image:: N-mem-used.png

.. image:: N-disk-io.png

Full `Kubernetes stats`_ are available online.
.. references:

.. _Kargo: https://github.com/kubespray/kargo
.. _MMM(MySQL/Master/Minions) testing suite: https://github.com/AleksandrNull/MMM
.. _Kubernetes stats: https://snapshot.raintank.io/dashboard/snapshot/YCtAh7jHhYpmWk8nsfda0EAIRRnG4TV9

@@ -0,0 +1,27 @@
#!/usr/bin/env python
import re
import subprocess
import time

KUBECTL_CMD = 'kubectl --namespace minions get pods -l k8s-app=minion'


def main():
    # print the CSV header once, not on every poll
    print('time,name,status')
    while True:
        start = time.time()
        stdout = subprocess.Popen(KUBECTL_CMD, shell=True,
                                  stdout=subprocess.PIPE).stdout.read()
        # skip the kubectl header line; emit one CSV row per pod
        for line in stdout.split('\n')[1:]:
            if line:
                tokens = re.split(r'\s+', line)
                name = tokens[0]
                status = tokens[2]
                print('%f,%s,%s' % (start, name, status))
        # poll roughly once per second, even if kubectl was slow
        time.sleep(max(0, 1 - (time.time() - start)))


if __name__ == '__main__':
    main()

@@ -0,0 +1,85 @@
#!/usr/bin/env python
import argparse
import itertools
import operator

import numpy as np
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker

# colors for pod states in the stacked chart
COLORS = {
    'Pending': '#ffb624',
    'ContainerCreating': '#ebeb00',
    'Running': '#50c878',
    'Terminating': '#a366ff',
    'Error': '#cc0000',
}


def main():
    parser = argparse.ArgumentParser(prog='pod-stats')
    parser.add_argument('data', nargs='+')
    args = parser.parse_args()

    # first argument: CSV produced by kubectl_mon.py (time,name,status)
    source = args.data[0]
    data = np.genfromtxt(source, dtype=None, delimiter=',',
                         skip_header=1, skip_footer=0,
                         names=['time', 'name', 'status'])

    categories = sorted(set(x[2] for x in data))

    # group rows by poll timestamp and count pods per status
    processed = []
    t = []
    base_time = data[0][0]
    for k, g in itertools.groupby(data, key=operator.itemgetter(0)):
        r = dict((c, 0) for c in categories)
        for x in g:
            r[x[2]] += 1
        processed.append([r[c] for c in categories])
        t.append(k - base_time)

    figure = plt.figure()
    plot = figure.add_subplot(111)
    colors = [COLORS[c] for c in categories]
    plot.stackplot(t, np.transpose(processed), colors=colors)

    # optional second argument: pod first-report timestamps in
    # milliseconds, one per line
    if len(args.data) > 1:
        x = []
        y = []
        with open(args.data[1]) as fd:
            cnt = fd.read()
        for i, p in enumerate(cnt.strip().split('\n')):
            x.append(int(p) / 1000.0 - base_time)
            y.append(i)
        plot.plot(x, y, 'b.')

    plot.grid(True)
    plot.set_xlabel('time, s')
    plot.set_ylabel('pods')

    ax = figure.gca()
    ax.yaxis.set_major_locator(mticker.MaxNLocator(integer=True))

    # add legend
    patches = [mpatches.Patch(color=c) for c in colors]
    texts = list(categories)
    if len(args.data) > 1:
        patches.append(mpatches.Patch(color='blue'))
        texts.append('Pod report')
    legend = plot.legend(patches, texts, loc='right', shadow=True)
    for label in legend.get_texts():
        label.set_fontsize('small')

    plt.show()
    figure.savefig('figure.svg')


if __name__ == '__main__':
    main()
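
Assuming the CSV produced by kubectl_mon.py and, optionally, a file with one
pod first-report timestamp (in milliseconds) per line, the script would be
invoked as follows (file names are hypothetical):

    python pod_stats.py pods.csv reports.txt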

@@ -0,0 +1,10 @@
FROM debian:latest
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get -y upgrade
RUN apt-get -y --no-install-recommends install python
ADD minion.py /opt/minion/minion
RUN chmod 0777 /opt/minion/minion
ENTRYPOINT ["/opt/minion/minion"]
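
The replication controller below pulls the image from a cluster-local
registry; a sketch of building and publishing it under that name, assuming
such a registry is reachable at 127.0.0.1:31500:

    docker build -t 127.0.0.1:31500/qa/minion:latest .
    docker push 127.0.0.1:31500/qa/minion:latest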

@@ -0,0 +1,35 @@
#!/usr/bin/python2
import httplib
import signal
import sys
import time


class GracefulKiller(object):
    """Exit promptly on SIGINT/SIGTERM so pod termination is fast."""
    kill_now = False

    def __init__(self):
        signal.signal(signal.SIGINT, self.exit_gracefully)
        signal.signal(signal.SIGTERM, self.exit_gracefully)

    def exit_gracefully(self, signum, frame):
        print('Signal caught')
        self.kill_now = True
        sys.exit(0)


if __name__ == '__main__':
    killer = GracefulKiller()

    # report the pod's start time (in milliseconds) to the master;
    # the master address is hard-coded for this experiment
    t = int(time.time() * (10 ** 3))
    c = httplib.HTTPConnection('172.20.9.7:8000')
    q = '/%s' % t  # send the timestamp as the request path
    print(q)
    c.request('GET', q)
    r = c.getresponse()
    print(r.status)

    # keep the container alive until it is terminated
    time.sleep(2 << 20)
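
The master side receiving these reports is not part of this change; a
minimal Python 2 collector sketch, assuming the millisecond timestamp
arrives as the request path (as sent above) and that pod_stats.py expects
one timestamp per line:

    # hypothetical collector, not part of the commit
    import BaseHTTPServer

    class Collector(BaseHTTPServer.BaseHTTPRequestHandler):
        def do_GET(self):
            # the minion sends its start time (ms) as the request path
            with open('reports.txt', 'a') as fd:
                fd.write(self.path.lstrip('/') + '\n')
            self.send_response(200)
            self.end_headers()

    BaseHTTPServer.HTTPServer(('', 8000), Collector).serve_forever()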

@@ -0,0 +1,25 @@
apiVersion: v1
kind: ReplicationController
metadata:
  name: minion
  namespace: minions
spec:
  replicas: 1
  selector:
    k8s-app: minion
    version: v0
  template:
    metadata:
      labels:
        k8s-app: minion
        version: v0
    spec:
      containers:
      - name: minion
        image: 127.0.0.1:31500/qa/minion:latest
        env:
        - name: MINION_RC
          value: "1"
      nodeSelector:
        kubernetes.io/hostname: node4

@@ -6,3 +6,4 @@ Kubernetes system performance
   :maxdepth: 4
   API_testing/index
   density/index