::

 Copyright (c) 2014 Hewlett-Packard Development Company, L.P.

 This work is licensed under a Creative Commons Attribution 3.0
 Unported License.

 http://creativecommons.org/licenses/by/3.0/legalcode

..
 This template should be in ReSTructured text. Please do not delete
 any of the sections in this template. If you have nothing to say
 for a whole section, just write: "None". For help with syntax, see
 http://sphinx-doc.org/rest.html To test out your formatting, see
 http://www.tele3.cz/jbar/rest/rest.html

==================================
Nodepool launch and delete workers
==================================

Story: https://storyboard.openstack.org/#!/story/2000075

Split the node launch and delete operations into separate workers for
scalability and flexibility.

Problem Description
===================

When nodepool launches or deletes a node, it creates a thread for the
operation. As nodepool scales up the number of nodes it manages, it
may have a very large number of concurrent threads; launching 1,000
nodes consumes an additional 1,000 threads. Much of this time is
spent waiting (sleeping or performing network I/O outside of the
global interpreter lock), so despite Python's threading limitations,
this is generally not a significant performance problem.

However, we have recently seen that seemingly small amounts of
additional computation can starve important threads in nodepool, such
as the main loop or the gearman I/O threads. It would be better if we
could limit the impact of thread contention on critical paths of the
program while still preserving our ability to launch more than 1,000
nodes at once.

Proposed Change
===============

Create a new worker called 'nodepool-launcher' (an independent process
which may run on either the main nodepool host or on one or more new
servers) that performs node launch and delete tasks. All of the
interaction with providers related to launching and deleting servers
(including IP allocation, initial SSH sessions, etc.) will be done by
this worker.

The nodepool-launcher worker would read a configuration file with the
same syntax as the main nodepool server in order to obtain cloud
credentials. The worker should be told which providers it should
handle via command-line arguments (the default should be all
providers).

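A minimal sketch of what the command-line interface might look like,
assuming an argparse-based entry point; the flag names and defaults
here are illustrative only and not part of this proposal::

    # Hypothetical CLI parsing for nodepool-launcher; names are illustrative.
    import argparse

    def parse_arguments(args=None):
        parser = argparse.ArgumentParser(
            description='Nodepool launch/delete worker')
        parser.add_argument('-c', dest='config',
                            default='/etc/nodepool/nodepool.yaml',
                            help='path to the nodepool config file')
        parser.add_argument('--provider', dest='providers', action='append',
                            default=None,
                            help='provider to handle (repeatable); '
                                 'default is all providers in the config')
        return parser.parse_args(args)
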
It will register functions with gearman in the form
"node-launch:<provider>" and "node-delete:<provider>" for each of the
providers it handles. Generally a single worker should expect to have
exclusive control of a given provider, as that allows the rate
limiting performed by the provider manager to be effective. There
should be no technical limitation that enforces this, only a
recommendation to the operator to avoid having more than one
nodepool-launcher working with any given provider.

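A minimal sketch of the per-provider function registration, assuming
the Python 'gear' library's Worker API; the provider list, client id,
and server address are placeholders::

    # Hypothetical registration of per-provider gearman functions.
    import gear

    def register_functions(providers, server='localhost'):
        worker = gear.Worker('nodepool-launcher')
        worker.addServer(server)
        worker.waitForServer()
        for provider in providers:
            worker.registerFunction('node-launch:%s' % provider)
            worker.registerFunction('node-delete:%s' % provider)
        return worker
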
The worker will launch threads for each of the jobs in much the same
way that the current nodepool server does. The worker may handle as
many simultaneous jobs as desired. This may be unlimited as it is
currently, or it could be a configurable limit so that, say, it does
not have more than 100 simultaneous server launches running. It is
not expected that the launcher would suffer the same starvation issues
that we have seen in the main nodepool server (due to its more limited
functionality), but if it does, this control could be used to mitigate
it.

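One possible way to implement such a limit is a semaphore bounding the
number of in-flight launches; this is only an illustration of the
idea, not a committed design::

    # Hypothetical cap on simultaneous launches using a semaphore.
    import threading

    class LaunchLimiter(object):
        def __init__(self, max_launches=100):
            self.semaphore = threading.Semaphore(max_launches)

        def run(self, job, launch_node):
            # Block until a launch slot is free, then run the launch in
            # its own thread, releasing the slot when it finishes.
            self.semaphore.acquire()

            def _launch():
                try:
                    launch_node(job)
                finally:
                    self.semaphore.release()

            t = threading.Thread(target=_launch)
            t.start()
            return t
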
The main nodepool server will then largely consist of the main loop
and associated actions. Anywhere that it currently spawns a thread to
launch or delete a node should be converted into a gearman function
call to launch or delete. The main loop will still create the
database entry for the initial node launch (so that its calculations
may proceed as they do now) and should simply pass the node id as an
argument to the launch gearman function. Similarly, it should mark
nodes as deleted when the ZMQ message arrives, and then submit a
delete function call with the node id.

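A rough sketch of how the main server might submit these jobs,
assuming the 'gear' library's client API; the function naming follows
the scheme above and everything else is illustrative::

    # Hypothetical submission of a launch job keyed by node id.
    import gear

    def submit_launch(client, provider, node_id):
        job = gear.Job('node-launch:%s' % provider,
                       str(node_id).encode('utf-8'))
        client.submitJob(job)
        return job

    client = gear.Client()
    client.addServer('localhost')
    client.waitForServer()
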
The main loop currently keeps track of ongoing delete threads so that
the periodic cleanup task does not launch more than one. Similarly,
with this change it should keep track of delete *jobs* and not launch
more than one simultaneously. It should additionally keep track of
launch jobs, and if the launch is unsuccessful (or the worker
disconnects -- this also returns WORK_FAIL) it should mark the node
for deletion and submit a delete job. This will maintain the current
behavior where, if nodepool is stopped (in this case, if a
nodepool-launcher worker is stopped), building nodes are deleted
rather than being orphaned.

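A sketch of the bookkeeping this implies, with the job-status checks
left as placeholders since the exact completion and failure signals
(including WORK_FAIL on worker disconnect) depend on the gearman
client in use::

    # Hypothetical tracking of in-flight launch and delete jobs.
    class JobTracker(object):
        def __init__(self):
            self.launch_jobs = {}   # node_id -> job
            self.delete_jobs = {}   # node_id -> job

        def should_submit_delete(self, node_id):
            # Mirror the current behavior: never more than one
            # simultaneous delete per node.
            return node_id not in self.delete_jobs

        def reap_launches(self, is_complete, is_failure, mark_for_deletion):
            # is_complete/is_failure/mark_for_deletion are callables
            # supplied by the caller; they stand in for real job-state
            # checks and database updates.
            for node_id, job in list(self.launch_jobs.items()):
                if not is_complete(job):
                    continue
                del self.launch_jobs[node_id]
                if is_failure(job):
                    # Failed launch or disconnected worker: delete the
                    # node rather than orphaning it.
                    mark_for_deletion(node_id)
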
Alternatives
------------

Nodepool could be made into a more single-threaded application;
however, we would need to devise a state machine covering all of the
points at which we wait for something to complete during the launch
cycle, and those points are numerous and change frequently. That
would be very complex, whereas threading is actually an ideal paradigm
for this workload.

Implementation
==============

Assignee(s)
-----------

Primary assignee: unknown

Work Items
----------

* Create nodepool-launcher class and command
* Change main server to launch gearman jobs instead of threads
* Stress test

Repositories
------------

This affects nodepool and system-config.

Servers
-------

No new servers are required, though they may optionally be added
later. The initial implementation should be colocated on the current
nodepool server (it has underutilized virtual CPUs).

DNS Entries
-----------

None.

Documentation
-------------

The infra/system-config nodepool documentation should be updated to
describe the new system.

Security
--------

The gearman protocol is cleartext and unauthenticated. IP-based
access control is currently used; certificate support with
authentication is planned and work on it is in progress. No sensitive
information will be sent over the wire (workers will read cloud
provider credentials from a local file).

Testing
-------

This should be testable privately and locally before deployment in
production.

Dependencies
============

None. This proposal is similar in spirit to
https://review.openstack.org/#/c/127673/ but does not require it.