Document issues in scale testing of fuel-ccp
This document provides details of issues found in scale testing of fuel-ccp tool and OpenStack containerized control plane installed by it. It is split into two sections, one dedicated to Kubernetes issues, and the other to OpenStack-specific problems. Change-Id: Ided3c47f7425d5e8b5fe2bdea9e794fbb4d550ec
This commit is contained in:
parent
b37458301a
commit
fb8a2f1e35
@ -24,6 +24,7 @@ Contents
|
|||||||
labs/index.rst
|
labs/index.rst
|
||||||
test_plans/index
|
test_plans/index
|
||||||
test_results/index
|
test_results/index
|
||||||
|
issues/index
|
||||||
|
|
||||||
.. raw:: pdf
|
.. raw:: pdf
|
||||||
|
|
||||||
|
12
doc/source/issues/index.rst
Normal file
12
doc/source/issues/index.rst
Normal file
@ -0,0 +1,12 @@
|
|||||||
|
.. raw:: pdf
|
||||||
|
|
||||||
|
PageBreak oneColumn
|
||||||
|
|
||||||
|
=======================
|
||||||
|
Issues Analysis
|
||||||
|
=======================
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 2
|
||||||
|
|
||||||
|
scale_testing_issues
|
881
doc/source/issues/scale_testing_issues.rst
Normal file
881
doc/source/issues/scale_testing_issues.rst
Normal file
@ -0,0 +1,881 @@
|
|||||||
|
.. _scale_testing_issues:
|
||||||
|
|
||||||
|
======================================
|
||||||
|
Kubernetes Issues At Scale 900 Minions
|
||||||
|
======================================
|
||||||
|
|
||||||
|
Glossary
|
||||||
|
========
|
||||||
|
|
||||||
|
- **Kubernetes** is an open-source system for automating deployment,
|
||||||
|
scaling, and management of containerized applications.
|
||||||
|
|
||||||
|
- **fuel-ccp**: CCP stands for “Containerized Control Plane”. The goal
|
||||||
|
of this project is to make building, running and managing
|
||||||
|
production-ready OpenStack containers on top of Kubernetes an
|
||||||
|
easy task for operators.
|
||||||
|
|
||||||
|
- **OpenStack** is a cloud operating system that controls large pools
|
||||||
|
of compute, storage, and networking resources throughout a
|
||||||
|
datacenter, all managed through a dashboard that gives
|
||||||
|
administrators control while empowering their users to provision
|
||||||
|
resources through a web interface.
|
||||||
|
|
||||||
|
- **Heat** is an OpenStack service to orchestrate composite cloud
|
||||||
|
applications using a declarative template format through an
|
||||||
|
OpenStack-native REST API.
|
||||||
|
|
||||||
|
- **Slice** is a set of 6 VMs:
|
||||||
|
|
||||||
|
- 1x Yahoo! Benchmark (ycsb)
|
||||||
|
|
||||||
|
- 1x Cassandra
|
||||||
|
|
||||||
|
- 1x Magento
|
||||||
|
|
||||||
|
- 1x Wordpress
|
||||||
|
|
||||||
|
- 2x Idle VM
|
||||||
|
|
||||||
|
|
||||||
|
Setup
|
||||||
|
=====
|
||||||
|
|
||||||
|
We had about 181 bare metal machines, 3 of them were used for Kubernetes
|
||||||
|
control plane services placement (API servers, ETCD, Kubernetes
|
||||||
|
scheduler, etc.), others had 5 virtual machines on each node, where
|
||||||
|
every VM was used as a Kubernetes minion node.
|
||||||
|
|
||||||
|
Each bare metal node has the following specifications:
|
||||||
|
|
||||||
|
- HP ProLiant DL380 Gen9
|
||||||
|
|
||||||
|
- **CPU** - 2x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
|
||||||
|
|
||||||
|
- **RAM** - 264G
|
||||||
|
|
||||||
|
- **Storage** - 3.0T on RAID on HP Smart Array P840 Controller, HDD -
|
||||||
|
12 x HP EH0600JDYTL
|
||||||
|
|
||||||
|
- **Network** - 2x Intel Corporation Ethernet 10G 2P X710
|
||||||
|
|
||||||
|
Running OpenStack cluster (from Kubernetes point of view) is represented
|
||||||
|
with the following numbers:
|
||||||
|
|
||||||
|
1. OpenStack control plane services are running within ~80 pods on 6
|
||||||
|
nodes
|
||||||
|
|
||||||
|
2. ~4500 pods are spread across all remaining nodes, 5 pods on each.
|
||||||
|
|
||||||
|
Kubernetes architecture analysis obstacles
|
||||||
|
==========================================
|
||||||
|
|
||||||
|
During the 900 nodes tests we used `Prometheus <https://prometheus.io/>`__
|
||||||
|
monitoring tool for the
|
||||||
|
verification of the resources consumption and the load put on core
|
||||||
|
system, Kubernetes and OpenStack levels services. During one of the
|
||||||
|
Prometheus configuration optimisations old data from the Prometheus
|
||||||
|
storage was deleted to improve Prometheus API speed, and this old data
|
||||||
|
included 900 nodes cluster information, therefore we have only partial
|
||||||
|
data being available for the post run investigation. This fact,
|
||||||
|
although, does not influence overall reference architecture analysis, as
|
||||||
|
all issues, that were observed during the containerized OpenStack setup
|
||||||
|
testing, were thoughtfully documented and debugged.
|
||||||
|
|
||||||
|
To prevent monitoring data loss in future (Q1 2017 timeframe and
|
||||||
|
further) we need to proceed with the following improvements of the
|
||||||
|
monitoring setup:
|
||||||
|
|
||||||
|
1. Prometheus by default is more optimized to be used as real time
|
||||||
|
monitoring / alerting system, and there is an official
|
||||||
|
recommendation from Prometheus developers team to keep monitoring
|
||||||
|
data retention for about 15 days to keep tool working in quick
|
||||||
|
and responsive manner. To keep old data for the post-usage
|
||||||
|
analytics purposes external store requires to be configured.
|
||||||
|
|
||||||
|
2. We need to reconfigure monitoring tool (Prometheus) to include data
|
||||||
|
backup to one of the persistent time series databases (e.g.
|
||||||
|
InfluxDB / Cassandra / OpenTSDB) that’s supported as an external
|
||||||
|
persistent data store by Prometheus. This will allow us to store
|
||||||
|
old data for extended amount of time for post-processing needs.
|
||||||
|
|
||||||
|
Observed issues
|
||||||
|
===============
|
||||||
|
|
||||||
|
Huge load on kube-apiserver
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Both API servers, running in Kubernetes cluster, were utilising up to
|
||||||
|
2000% of CPU (up to 45% of total node compute performance capacity)
|
||||||
|
after we migrated them to hardware nodes. Initial setup with all nodes
|
||||||
|
(including Kubernetes control plane nodes) running on virtualized
|
||||||
|
environment was showing not workable API servers at all.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
All services that are placed not on Kubernetes masters (``kubelet``,
|
||||||
|
``kube-proxy`` on all minions) access ``kube-apiserver`` via local
|
||||||
|
``ngnix`` proxy.
|
||||||
|
|
||||||
|
Most of those requests are watch requests that stay mostly idle after
|
||||||
|
they are initiated (most timeouts on them are defined to be about 5-10
|
||||||
|
minutes). ``nginx`` was configured to cut idle connections in 3 seconds,
|
||||||
|
which makes all clients to reconnect and (the worst) restart aborted SSL
|
||||||
|
session. On the server side it makes ``kube-apiserver`` consume up to 2000%
|
||||||
|
CPU resources and other requests become very slow.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Set ``proxy_timeout`` parameter to 10 minutes in ``nginx.conf`` config
|
||||||
|
file, which should be more than enough not to cut SSL connections before
|
||||||
|
requests time out by themselves. After this fix was applied, one
|
||||||
|
api-server became to consume 100% of CPU (about 2% of total node compute
|
||||||
|
performance capacity), the second one about 200% (about 4% of total node
|
||||||
|
compute performance capacity) of CPU (with average response time 200-400
|
||||||
|
ms).
|
||||||
|
|
||||||
|
Upstream issue (fixed)
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Make Kargo deployment tool set ``proxy_timeout`` to 10 minutes:
|
||||||
|
`issue <https://github.com/kubernetes-incubator/kargo/issues/655>`__
|
||||||
|
fixed with `pull request <https://github.com/kubernetes-incubator/kargo/pull/656>`__
|
||||||
|
by Fuel CCP team.
|
||||||
|
|
||||||
|
KubeDNS cannot handle big cluster load with default settings
|
||||||
|
------------------------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
When deploying OpenStack cluster on this scale, ``kubedns`` becomes
|
||||||
|
unresponsive because of high load. This end up with very often error
|
||||||
|
appearing in logs of ``dnsmasq`` container in ``kubedns`` pod::
|
||||||
|
|
||||||
|
Maximum number of concurrent DNS queries reached.
|
||||||
|
|
||||||
|
Also ``dnsmasq`` containers sometimes get restarted due to hitting high
|
||||||
|
memory limit.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
First of all, ``kubedns`` seems to fail often on high load (or even without
|
||||||
|
load), during the experiment we observed continuous kubedns container
|
||||||
|
restarts even on empty (but big enough) Kubernetes cluster. Restarts
|
||||||
|
are caused by liveness check failing, although nothing notable is
|
||||||
|
observed in any logs.
|
||||||
|
|
||||||
|
Second, ``dnsmasq`` should have taken load off ``kubedns``, but it needs some
|
||||||
|
tuning to behave as expected for big load, otherwise it is useless.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
This requires several levels of fixing:
|
||||||
|
|
||||||
|
1. Set higher limits for ``dnsmasq`` containers: they take on most of the
|
||||||
|
load.
|
||||||
|
|
||||||
|
2. Add more replicas to ``kubedns`` replication controller (we decided to
|
||||||
|
stop on 6 replicas, as it solved the observed issue - for bigger
|
||||||
|
clusters it might be needed to increase this number even more).
|
||||||
|
|
||||||
|
3. Increase number of parallel connections ``dnsmasq`` should handle (we
|
||||||
|
used ``--dns-forward-max=1000`` which is recommendaed parameter setup
|
||||||
|
in ``dnsmasq`` manuals)
|
||||||
|
|
||||||
|
4. Increase size of cache in ``dnsmasq``: it has hard limit of 10000 cache
|
||||||
|
entries which seems to be reasonable amount.
|
||||||
|
|
||||||
|
5. Fix ``kubedns`` to handle this behaviour in proper way.
|
||||||
|
|
||||||
|
Upstream issues (partially fixed)
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
#1 and #2 are fixed by making them configurable in Kargo by Kubernetes
|
||||||
|
team:
|
||||||
|
`issue <https://github.com/kubernetes-incubator/kargo/issues/643>`__,
|
||||||
|
`pull request <https://github.com/kubernetes-incubator/kargo/pull/652>`__.
|
||||||
|
|
||||||
|
Other fixes are still being implemented as of time of this publication.
|
||||||
|
|
||||||
|
Kubernetes scheduler is ineffective with pod antiaffinity
|
||||||
|
---------------------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
It takes significant amount of time for scheduler to process pods with
|
||||||
|
pod antiaffinity rules specified on them. It is spending about **2-3
|
||||||
|
seconds** on each pod which makes time needed to deploy OpenStack
|
||||||
|
cluster on 900 nodes unexpectedly long (about 3h for just scheduling).
|
||||||
|
Antiaffinity rules are required to be used for OpenStack deployment to
|
||||||
|
prevent several OpenStack compute nodes to be mixed and messed to one
|
||||||
|
Kubernetes Minion node.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
According to profiling results, most of the time is spent on creating
|
||||||
|
new Selectors to match existing pods against them, which triggers
|
||||||
|
validation step. Basically we have O(N^2) unnecessary validation steps
|
||||||
|
(N - number of pods), even if we have just 5 deployments entities
|
||||||
|
covering most of the nodes.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Specific optimization that speeds up scheduling time up to about 300
|
||||||
|
ms/pod was required in this case. It’s still slow in terms of common
|
||||||
|
sense (about 30m spent just on pods scheduling for 900 nodes OpenStack
|
||||||
|
cluster), but is close to be reasonable. This solution lowers number of
|
||||||
|
very expensive operations to O(N), which is better, but still depends on
|
||||||
|
number of pods instead of deployments, so there is space for future
|
||||||
|
improvement.
|
||||||
|
|
||||||
|
Upstream issues
|
||||||
|
~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Optimization merged into master: `pull
|
||||||
|
request <https://github.com/kubernetes/kubernetes/pull/37691>`__;
|
||||||
|
backported to 1.5 branch (will release in 1.5.2 release): `pull
|
||||||
|
request <https://github.com/kubernetes/kubernetes/pull/38693>`__.
|
||||||
|
|
||||||
|
Kubernetes scheduler needs to be deployed on separate node
|
||||||
|
----------------------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
During huge OpenStack cluster deployment against pre-deployed
|
||||||
|
Kubernetes ``scheduler``, ``controller-manager`` and ``apiserver`` start
|
||||||
|
competing for CPU cycles as all of them get big load. Scheduler is more
|
||||||
|
resource-hungry (see next problem), so we need a way to deploy it
|
||||||
|
separately.
|
||||||
|
|
||||||
|
Root Cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
The same problem with Kubsernetes scheduler efficiency at scale of about
|
||||||
|
1000 nodes as in the issue above.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Kubernetes scheduler was moved to a separate node manually, all other
|
||||||
|
schedulers were manually killed to prevent them from moving to other
|
||||||
|
nodes.
|
||||||
|
|
||||||
|
Upstream issues
|
||||||
|
~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
`Issue <https://github.com/kubernetes-incubator/kargo/issues/834>`__
|
||||||
|
created in Kargo installer Github repository.
|
||||||
|
|
||||||
|
kube-apiserver have low default rate limit
|
||||||
|
------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Different services start receiving “429 Rate Limit Exceeded” HTTP error
|
||||||
|
even though ``kube-apiservers`` can take more load. It is linked to a
|
||||||
|
scheduler bug (see below).
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Raise rate limit for ``kube-apiserver process`` via ``--max-requests-inflight``
|
||||||
|
option. It defaults to 400, in our case it became workable at 2000. This
|
||||||
|
number should be configurable in Kargo deployment tool, as for bigger
|
||||||
|
deployments it might be required to increase it accordingly.
|
||||||
|
|
||||||
|
Upstream issues
|
||||||
|
~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Upstream issue or pull request was not created for this issue.
|
||||||
|
|
||||||
|
Kubernetes scheduler can schedule wrongly
|
||||||
|
-----------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
When many pods are being created (~4500 in our case of OpenStack
|
||||||
|
deployment) and faced with 429 error from ``kube-apiserver`` (see above),
|
||||||
|
the scheduler can schedule several pods of the same deployment on one node
|
||||||
|
in violation of pod antiaffinity rule on them.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
This issue arises due to scheduler cache being evicted before the pod
|
||||||
|
actually processed.
|
||||||
|
|
||||||
|
Upstream issues
|
||||||
|
~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
`Pull
|
||||||
|
request <https://github.com/kubernetes/kubernetes/pull/38503>`__ accepted
|
||||||
|
in Kubernetes upstream.
|
||||||
|
|
||||||
|
Docker become unresponsive at random
|
||||||
|
------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Docker process sometimes hangs on several nodes, which results in
|
||||||
|
timeouts in ``kubelet`` logs and pods cannot be spawned or terminated
|
||||||
|
successfully on the affected minion node. Although bunch of similar
|
||||||
|
issues has been fixed in Docker since 1.11, we still are observing those
|
||||||
|
symptoms.
|
||||||
|
|
||||||
|
Workaround
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
Docker daemon logs does not contain any notable information, so we had
|
||||||
|
to restart docker service on the affected node (during those experiments
|
||||||
|
we used Docker 1.12.3, but we have observed similar symptoms in 1.13
|
||||||
|
as well).
|
||||||
|
|
||||||
|
Calico start up time is too long
|
||||||
|
--------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
If we have to kill a Kubernetes node, Calico requires ~5 minutes to
|
||||||
|
reestablish all mesh connections.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
Calico uses BGP, so without route reflector it has to do full-mesh
|
||||||
|
between all nodes in cluster.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
We need to switch to using route reflectors in our clusters. Then every
|
||||||
|
node needs only to establish connections to all reflectors.
|
||||||
|
|
||||||
|
Upstream Issues
|
||||||
|
~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
None. For production use, architecutre of Calico network should be
|
||||||
|
adjusted to use route reflectors set up on selected nodes or on
|
||||||
|
switching fabric hardware. This will reduce the number of BGP
|
||||||
|
connections per node and speed up the Calico startup.
|
||||||
|
|
||||||
|
===========================================
|
||||||
|
OpenStack Issues At Scale 200 Compute Nodes
|
||||||
|
===========================================
|
||||||
|
|
||||||
|
Workloads testing approach
|
||||||
|
==========================
|
||||||
|
|
||||||
|
The goal of this test was to investigate OpenStack and Kubernetes
|
||||||
|
behavior under load. This load needs to emulate end-user traffic running
|
||||||
|
on OpenStack servers (guest systems) with different types of
|
||||||
|
applications running against cloud system. We were interested in the
|
||||||
|
following metrics: CPU usage, Disk statistics, Network load, Memory used
|
||||||
|
and IO stats on each controller nodes and chosen set of the compute
|
||||||
|
nodes.
|
||||||
|
|
||||||
|
Testing preparation steps
|
||||||
|
-------------------------
|
||||||
|
|
||||||
|
To preare for testing, several steps should be made:
|
||||||
|
|
||||||
|
- [0] Pre-deploy Kubernetes + OpenStack Newton environment (Kargo +
|
||||||
|
fuel-ccp tools)
|
||||||
|
|
||||||
|
- [1] Establish advanced monitoring of the testing environment (on
|
||||||
|
all three layers - system, Kubernetes and OpenStack)
|
||||||
|
|
||||||
|
- [2] Prepare to automatically deploy slices against OpenStack and
|
||||||
|
configure applications running within them
|
||||||
|
|
||||||
|
- [3] Prepare to run workloads against applications running within
|
||||||
|
slices
|
||||||
|
|
||||||
|
[1] was achieved through configuring Prometheus monitoring and alerting
|
||||||
|
tool with Collectd and Telegraf collectors. Separated monitoring
|
||||||
|
document is under preparation to present significant effort spent on
|
||||||
|
this task.
|
||||||
|
|
||||||
|
[2] was automated by using Heat OpenStack orchestration service
|
||||||
|
`templates <https://github.com/ayasakov/hot-ansible-templates>`__.
|
||||||
|
|
||||||
|
[3] was automated mostly through generating HTTP trafic native for the
|
||||||
|
applications under test using Locust.IO tool
|
||||||
|
`templates <https://github.com/ayasakov/locustio-workloads>`__.
|
||||||
|
Cassandra VM workload was automated through Yahoo! Benchmark (ycsb)
|
||||||
|
running on neighbour VM of the same slice.
|
||||||
|
|
||||||
|
Step [1] was tested against 900 nodes Kubernetes cluster described in
|
||||||
|
section above and later against all test environments we had, steps [2] and
|
||||||
|
[3] were verified against small (20 nodes) testing environment in
|
||||||
|
parallel with finalizing step [1] workability. During this verification
|
||||||
|
several issues with recently introduced Heat support to fuel-ccp were
|
||||||
|
observed and fixed. Later all of those steps were assumed to be run
|
||||||
|
against 200 bare metal nodes lab, but bunch of issues was observed
|
||||||
|
during step [2] that blocked finishing the testing. All of those issues
|
||||||
|
(including those found during small environment verification) are listed
|
||||||
|
below.
|
||||||
|
|
||||||
|
Heat/fuel-ccp integration issues
|
||||||
|
================================
|
||||||
|
|
||||||
|
Lack of Heat domain configuration
|
||||||
|
---------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Authentication errors during Heat stacks (representing workloads testing
|
||||||
|
slices on each compute node) creation.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
During OpenStack Newton timeframe Orchestration has started to require
|
||||||
|
additional information in the Identity service to manage stacks - in
|
||||||
|
particular, configuration of heat domain that would contain stacks
|
||||||
|
projects and users, creating of ``heat_domain_admin`` user to manage
|
||||||
|
projects and users in the heat domain and adding the admin role to the
|
||||||
|
``heat_domain_admin`` user in the heat domain to enable administrative
|
||||||
|
stack management privileges by the ``heat_domain_admin`` user. Please take
|
||||||
|
a look on `OpenStack Newton configuration
|
||||||
|
guide <http://docs.openstack.org/project-install-guide/orchestration/newton/install-ubuntu.html>`__
|
||||||
|
for more information.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Set up needed configuration steps in ``fuel-ccp``:
|
||||||
|
|
||||||
|
- `Patch #1 <https://review.openstack.org/#/c/400846/>`__
|
||||||
|
|
||||||
|
- `Patch #2 <https://review.openstack.org/#/c/402045/>`__
|
||||||
|
|
||||||
|
Lack of heat-api-cfn service configuration in fuel-ccp
|
||||||
|
------------------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Applications configured in Heat templates are not receiving their
|
||||||
|
configurations and data required for the succeeded applications
|
||||||
|
workability.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
Initial Heat support in ``fuel-ccp`` did not include ``heat-api-cfn``
|
||||||
|
service, which is used by Heat for some config transports. This service is
|
||||||
|
necessary to support default ``user_data_format`` (``HEAT_CFNTOOLS``), used
|
||||||
|
for most applications-related Heat templates.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Add new ``heat-api-cfn`` image, which will be used to create new service and
|
||||||
|
configure all necessary endpoints for it in ``fuel-ccp`` tool. Patches to
|
||||||
|
``fuel-ccp``:
|
||||||
|
|
||||||
|
- `Patch #1 <https://review.openstack.org/#/c/401138/>`__
|
||||||
|
|
||||||
|
- `Patch #2 <https://review.openstack.org/#/c/401174/>`__
|
||||||
|
|
||||||
|
Heat endpoint not reachable from virtual machines
|
||||||
|
-------------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Heat API and Heat API CFN are not reachable from OpenStack VM, even
|
||||||
|
though some of the applications configurations require such access.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
Fuel CCP deploys Heat services with default configuration and changes
|
||||||
|
``endpoint_type`` from ``publicURL`` to ``internalURL``. However,
|
||||||
|
such configuration in Kubernetes cluster is not enough for several types
|
||||||
|
of Heat stack resources like ``OS::Heat::Waitcondition`` and
|
||||||
|
``OS::Heat::SoftwareDeployment``, which require callbacks to Heat API
|
||||||
|
or Heat API CFN. Due to Kubernetes architecture, it's not possible to
|
||||||
|
do such callback on the default port value (for ``heat-api`` it's - 8004
|
||||||
|
and 8000 for ``heat-api-cfn``).
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
There are two ways to fix described above issues:
|
||||||
|
|
||||||
|
- Out of the box, which requires just adding some data to .ccp.yaml
|
||||||
|
configuration file. This workaround can be used prior OpenStack
|
||||||
|
cluster deployment during future OpenStack cluster description.
|
||||||
|
|
||||||
|
- Second which requires some manual actions and can be processed when
|
||||||
|
Openstack is already deployed and cloud administrator can change
|
||||||
|
only one component configuration.
|
||||||
|
|
||||||
|
Both of those solutions are described in the `patch to Fuel CCP
|
||||||
|
Documentation <https://review.openstack.org/#/c/404114>`__. Please
|
||||||
|
notice that additionally `patch to Fuel CCP
|
||||||
|
codebase <https://review.openstack.org/#/c/405263/>`__ need to be
|
||||||
|
applied to make some of the Kubernetes options configurable.
|
||||||
|
|
||||||
|
Glance+Heat authentication misconfiguration
|
||||||
|
-------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
During Heat stacks (representing workloads testing slices) creation,
|
||||||
|
random Glance authentication errors were observed.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
Service-specific users should have admin role in "service" project and
|
||||||
|
should not belong to user-facing admin project. Initially Fuel-CCP
|
||||||
|
contained Glance and Heat services misconfigured and several patches
|
||||||
|
were required to fix it.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
- `Patch #1 <https://review.openstack.org/#/c/409033/>`__
|
||||||
|
|
||||||
|
- `Patch #2 <https://review.openstack.org/#/c/409037/>`__
|
||||||
|
|
||||||
|
Workloads Testing Issues
|
||||||
|
========================
|
||||||
|
|
||||||
|
Random loss of connection to MySQL
|
||||||
|
----------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
During Heat stacks (representing slices) creation some of the stacks are
|
||||||
|
moved to ERROR state with the following traceback being found in the
|
||||||
|
Heat logs::
|
||||||
|
|
||||||
|
2016-12-11 16:59:22.220 1165 ERROR nova.api.openstack.extensions
|
||||||
|
DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't
|
||||||
|
connect to MySQL server on 'galera.ccp.svc.cluster.local' ([Errno
|
||||||
|
-2] Name or service not known)")
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
This turned out to be exactly the same issue with KubeDNS being unable
|
||||||
|
to handle high loads as described in Kubernetes Issues section above.
|
||||||
|
|
||||||
|
First of all, ``kubedns`` seems to fail often on high load (or even without
|
||||||
|
load), during the experiment we observed continuous kubedns container
|
||||||
|
restarts even on empty (but big enough) Kubernetes cluster. Restarts
|
||||||
|
are caused by liveness check failing, although nothing notable is
|
||||||
|
observed in any logs.
|
||||||
|
|
||||||
|
Second, ``dnsmasq`` should have taken load off ``kubedns``, but it needs some
|
||||||
|
tuning to behave as expected for big load, otherwise it is useless.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
See above in Kubernetes Issues section.
|
||||||
|
|
||||||
|
Slow Heat API requests (`Bug 1653088 <https://bugs.launchpad.net/fuel-ccp/+bug/1653088>`__)
|
||||||
|
-----------------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Requests to the Heat API for regular requests took more than a minute of
|
||||||
|
time. Example of such request can be presented with `this
|
||||||
|
traceback <http://paste.openstack.org/show/593132/>`__ showing time
|
||||||
|
needed for listing details of the specific stack
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
Fuel-ccp team proposed that it might be a hidden race condition between
|
||||||
|
multiple Heat workers, and it's not yet clear where is this race
|
||||||
|
happening. This is still under debug
|
||||||
|
|
||||||
|
Workaround
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
Set up number of Heat workers to 1 until real cause of this issue will
|
||||||
|
be found.
|
||||||
|
|
||||||
|
OpenStack VMs cannot fetch required metadata
|
||||||
|
--------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
OpenStack servers do not receive metadata with applications-specific
|
||||||
|
information. There is a following
|
||||||
|
`trace <http://paste.openstack.org/show/593337/>`__ in Heat logs.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
Prior pushing metadata to OpenStack VMs, Heat is storing information
|
||||||
|
about stack under creation to its own database. Starting with OpenStack
|
||||||
|
Newton Heat works in so-called “convergence” mode by default, making
|
||||||
|
sure that Heat engine process several requests at one time. This
|
||||||
|
parallel task processing might end up with race conditions during DB
|
||||||
|
access.
|
||||||
|
|
||||||
|
Workaround
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
Turn off Heat convergence engine::
|
||||||
|
|
||||||
|
convergence_engine=False
|
||||||
|
|
||||||
|
RPC timeouts during Heat resources validation
|
||||||
|
---------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
During Heat resources validation, this process is failed with the
|
||||||
|
following `trace <http://paste.openstack.org/show/593112/>`__ being
|
||||||
|
caught.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
Initial assumption was that there might be too small
|
||||||
|
``rpc_response_timeout`` parameter being set up in Heat configuration
|
||||||
|
file. This parameter was set up to 10 minutes exactly like that was used
|
||||||
|
in MOS::
|
||||||
|
|
||||||
|
rpc_response_timeout = 600
|
||||||
|
|
||||||
|
After that was done, no more RPC timeouts were observed.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Set up ``rpc_response_timeout = 600`` as that’s tested value for
|
||||||
|
generations of MOS releases.
|
||||||
|
|
||||||
|
Overloaded Memcache service (`Bug 1653089 <https://bugs.launchpad.net/fuel-ccp/+bug/1653089>`__)
|
||||||
|
----------------------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
On 200 nodes with workloads deploying OpenStack is actively using cache
|
||||||
|
(``memcached``). At some point of time requests to ``memcached`` begin to be
|
||||||
|
processed really slow.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
Memcache size is 256 MB by default in Fuel CCP, which is really small
|
||||||
|
size. That was a reason for great amount of retransmissions being
|
||||||
|
processed.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Increase cache size up to 30G in ``fuel-ccp`` configuration file for huge or
|
||||||
|
loaded deployments::
|
||||||
|
|
||||||
|
configs:
|
||||||
|
memcached:
|
||||||
|
address: 0.0.0.0
|
||||||
|
port:
|
||||||
|
cont: 11211
|
||||||
|
ram: 30720
|
||||||
|
|
||||||
|
Incorrect status information for Neutron agents
|
||||||
|
-----------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
While asking for Neutron agents status through OpenStack client, all of
|
||||||
|
them are displayed in down state, although due to the environment
|
||||||
|
behaviour it does not seem to be true.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
The root cause of this issue is hidden in OpenStackSDK refactoring that
|
||||||
|
caused various OSC networking commands to fail. The full discussion
|
||||||
|
regarding this issue can be found in `upstream
|
||||||
|
bug <https://bugs.launchpad.net/python-openstackclient/+bug/1652317>`__.
|
||||||
|
|
||||||
|
Workaround
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
Use Neutron client directly to gather Neutron services statuses until
|
||||||
|
`bug <https://bugs.launchpad.net/python-openstackclient/+bug/1652317>`__
|
||||||
|
will be fixed.
|
||||||
|
|
||||||
|
Nova client not working (`Bug 1653075 <https://bugs.launchpad.net/fuel-ccp/+bug/1653075>`__)
|
||||||
|
------------------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
All commands running through Nova client end up with the `stack
|
||||||
|
trace <http://paste.openstack.org/show/593531/>`__.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
In debug mode it’s seen that client tries to use HTTP protocol for the
|
||||||
|
Nova requests, e.g. http://compute.ccp.external:8443/v2.1/, although
|
||||||
|
HTTPS protocol is required for this client-server conversation.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Add the following lines to
|
||||||
|
``ccp-repos/fuel-ccp-nova/service/files/nova.conf.j2`` file::
|
||||||
|
|
||||||
|
[DEFAULT]
|
||||||
|
|
||||||
|
secure_proxy_ssl_header = HTTP_X_FORWARDED_PROTO
|
||||||
|
|
||||||
|
Neutron server timeouts (`Bug 1653077 <https://bugs.launchpad.net/fuel-ccp/+bug/1653077>`__)
|
||||||
|
------------------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Neutron server is not able to process requests to it with the following
|
||||||
|
error being caught in the Neutron logs:
|
||||||
|
`trace <http://paste.openstack.org/show/593483/>`__
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
Default values (that are used in ``fuel-ccp``) for the Neutron database pool
|
||||||
|
size, as well as max overflow parameter are not enough for the
|
||||||
|
environment with big enough load on Neutron API.
|
||||||
|
|
||||||
|
Solution
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Modify ``ccp-repos/fuel-ccp-neutron/service/files/neutron.conf.j2`` file to
|
||||||
|
contain the following configuration parameters::
|
||||||
|
|
||||||
|
[database]
|
||||||
|
|
||||||
|
max_pool_size = 30
|
||||||
|
|
||||||
|
max_overflow = 60
|
||||||
|
|
||||||
|
No access to OpenStack VM from tenant network
|
||||||
|
---------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
Some of the VMs representing the slices are not reachable via tenant
|
||||||
|
network. For example::
|
||||||
|
|
||||||
|
| 93b95c73-f849-4ffb-9108-63cf262d3a9f | cassandra_vm_0 |
|
||||||
|
ACTIVE | slice0-node162-net=11.62.0.8, 10.144.1.35 |
|
||||||
|
ubuntu-software-config-last |
|
||||||
|
|
||||||
|
root@node1:~# ssh -i .ssh/slace ubuntu@10.144.1.35
|
||||||
|
Connection closed by 10.144.1.35 port 22
|
||||||
|
|
||||||
|
It is unreachable from tenant network as well. For example from instance
|
||||||
|
``b1946719-b401-447d-8103-cc43b03b1481`` which has been spawned by the same
|
||||||
|
Heat stack on the same compute node (``node162``):
|
||||||
|
`http://paste.openstack.org/show/593486/ <http://paste.openstack.org/show/593486/>`__
|
||||||
|
|
||||||
|
Root cause and solution
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Still under investigation. Root cause not clear yet. **This issue is blocking
|
||||||
|
running workloads against deployed slices.**
|
||||||
|
|
||||||
|
OpenStack services don’t handle PXC pseudo-deadlocks
|
||||||
|
----------------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
When run in parallel, create operations of lots of resources were
|
||||||
|
failing with DBError saying that Percona Xtradb Cluster identified a
|
||||||
|
deadlock and transaction should be restarted.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
oslo.db is responsible for wrapping errors received from DB into proper
|
||||||
|
classes so that services can restart transactions if similar errors
|
||||||
|
occur, but it didn’t expect error in format that is being sent by
|
||||||
|
Percona. After we fixed this, we still experienced similar errors
|
||||||
|
because not all transactions that could be restarted were properly
|
||||||
|
decorated in Nova code.
|
||||||
|
|
||||||
|
Upstream issues
|
||||||
|
~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
`Bug <https://bugs.launchpad.net/oslo.db/+bug/1648818>`__ has been
|
||||||
|
fixed by Roman Podolyaka’s
|
||||||
|
`CR <https://review.openstack.org/409194>`__ and
|
||||||
|
`backported <https://review.openstack.org/409679>`__ to Newton. It
|
||||||
|
fixes Percona deadlock error detection, but there’s at least one place
|
||||||
|
in Nova to be fixed (TBD)
|
||||||
|
|
||||||
|
Live migration failed with live_migration_uri configuration
|
||||||
|
-------------------------------------------------------------
|
||||||
|
|
||||||
|
Symptoms
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
With ``live_migration_uri`` configuration, live migrations fails because
|
||||||
|
one compute host can’t connect to a libvirt on another host.
|
||||||
|
|
||||||
|
Root cause
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
We can’t specify which IP address to use in template in
|
||||||
|
``live_migration_uri``, so it was trying to use address from first
|
||||||
|
interface which happened to be in PXE network while libvirt listens in
|
||||||
|
private network. We couldn’t use ``live_migration_inbound_addr`` which
|
||||||
|
would solve this problem because of a problem in upstream Nova.
|
||||||
|
|
||||||
|
Upstream issues
|
||||||
|
~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
A `bug <https://bugs.launchpad.net/nova/+bug/1638625>`__ in Nova has
|
||||||
|
been `fixed <https://review.openstack.org/398956>`__ and
|
||||||
|
`backported <https://review.openstack.org/404810>`__ to Newton. We
|
||||||
|
`switched <https://review.openstack.org/407708>`__ to using
|
||||||
|
``live_migration_inbound_addr`` after that.
|
||||||
|
|
||||||
|
Contributors
|
||||||
|
============
|
||||||
|
|
||||||
|
The following people have credits for contributing to this
|
||||||
|
document:
|
||||||
|
|
||||||
|
* Dina Belova <dbelova@mirantis.com>
|
||||||
|
|
||||||
|
* Yuriy Taraday <ytaraday@mirantis.com>
|
Loading…
Reference in New Issue
Block a user