use case for load balancer member respawn

Change-Id: I3362292106d104df1a379182502c8ab8a603641e
Story: 2005830
Task: 33587
Eric K 2019-06-18 19:17:03 -07:00 committed by Adam Spiers
parent f887448771
commit d313c13496

@ -0,0 +1,132 @@
..
 This template is intended to encourage a certain level of
 consistency between different use cases. Adherence to the structure
 of this template is recommended but not strictly required.
 This template should be in reStructuredText. For help with syntax,
 see <http://sphinx-doc.org/rest.html>. To test out your formatting,
 see <http://www.tele3.cz/jbar/rest/rest.html>.
===============================
Load Balancer Member Respawning
===============================
..
 Please fill in the blanks in this use case statement, or rephrase
 as appropriate.
As a cloud operator, whenever a load balancer member node fails, I want the
load balancer to stop directing traffic to the failed member and for a new
member to be spawned.
Fault class
===========
..
 Please choose which of these classes are relevant and delete the
 others. If you can think of a new class which should be listed
 here, please update the template.
* Hardware failure
* Software error
* Network failure
OpenStack projects used
=======================
..
 Please provide a list of projects (OpenStack and otherwise) which
 may be used in order to implement this use case. If no
 implementation exists yet, suggestions are sufficient here.
* OpenStack Aodh (telemetry alarm service)
* OpenStack Heat (orchestration)
* OpenStack Octavia (load balancer as a service)
Remediation class
=================
..
 Please choose which of these classes are relevant and delete the
 others. If you can think of a new class which should be listed
 here, please update the template.
* Reactive
Fault detection
===============
From the `Octavia admin guide
<https://docs.openstack.org/octavia/latest/admin/guides/operator-maintenance.html#monitoring-pool-members>`_:

    Octavia will use the health information from the underlying load
    balancing application to determine the health of members. This
    information will be streamed to the Octavia database and made
    available via the status tree or other API methods.
In addition, an Aodh alarm is defined to detect load balancer member
node failure and trigger the alarm action to notify Heat. This
``loadbalancer_member_health`` type alarm rule was `added to Aodh in
April 2019 <https://review.opendev.org/#/c/654221/>`_, and at the time
of writing a patch is under review to `add a Heat resource for
creating this alarm type automatically via Heat templates
<https://review.opendev.org/#/c/662381/>`_. It is intended to update
this document later with sample Heat templates.
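
As a rough illustration of reading member health from the status tree
mentioned above, the following Python sketch walks a status-tree
document and collects the members whose ``operating_status`` is
``ERROR``. The nesting (``statuses``, ``loadbalancer``, ``listeners``,
``pools``, ``members``) and the field names are assumptions about the
shape of the Octavia v2 status tree response and should be checked
against the API reference. The sample data and the ``failed_members``
helper are hypothetical and are not part of Octavia, Aodh, or Heat.

```python
from typing import Any, Dict, List


def failed_members(status_tree: Dict[str, Any]) -> List[str]:
    """Return the IDs of members reported with operating_status ERROR.

    The key names used here are assumed, not taken from this document.
    """
    failed: List[str] = []
    seen = set()
    lb = status_tree.get("statuses", {}).get("loadbalancer", {})
    for listener in lb.get("listeners", []):
        for pool in listener.get("pools", []):
            for member in pool.get("members", []):
                member_id = member["id"]
                # A pool may appear under several listeners, so skip
                # members that have already been recorded.
                if member_id in seen:
                    continue
                seen.add(member_id)
                if member.get("operating_status") == "ERROR":
                    failed.append(member_id)
    return failed


# Hypothetical sample data: one listener, one pool, two members.
sample_tree = {
    "statuses": {
        "loadbalancer": {
            "id": "lb-1",
            "listeners": [
                {
                    "id": "listener-1",
                    "pools": [
                        {
                            "id": "pool-1",
                            "members": [
                                {"id": "member-1", "operating_status": "ONLINE"},
                                {"id": "member-2", "operating_status": "ERROR"},
                            ],
                        }
                    ],
                }
            ],
        }
    }
}

print(failed_members(sample_tree))  # prints ['member-2']
```

In this use case the detection itself is delegated to the Aodh alarm
rather than to a polling script. The sketch only shows what kind of
information the status tree is assumed to expose.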
Inputs, decision-making, and remediation
========================================
..
 Describe how decisions about the remediation action are taken. In
 particular list any other components or inputs which may provide
 additional context to help determine appropriate remediation of the
 fault.
* Octavia's built-in behavior automatically stops directing traffic to
  the unresponsive member node.
* Heat receives the Aodh alarm regarding the unresponsive member node
  and, according to the behavior defined in the stack template, spawns
  a new instance to replace the unresponsive member node.
* Octavia detects when the new member node is operational and begins
  directing some traffic to the new node.
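
The sequence above can be sketched as a small, self-contained Python
simulation. It is only a toy model of the three steps and makes no
calls to Octavia, Aodh, or Heat. The ``Member`` and ``Pool`` classes,
the helper functions, and the member names are all hypothetical and
exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Member:
    name: str
    operational: bool = True


@dataclass
class Pool:
    members: List[Member] = field(default_factory=list)

    def routable(self) -> List[str]:
        """Names of the members that currently receive traffic."""
        return [m.name for m in self.members if m.operational]


def mark_failed(pool: Pool, name: str) -> None:
    # Step 1: the load balancer stops routing to the failed member.
    for member in pool.members:
        if member.name == name:
            member.operational = False


def respawn(pool: Pool, failed: str, replacement: str) -> None:
    # Step 2: the orchestrator replaces the failed member. The new
    # member is not operational until it passes its health checks.
    pool.members = [m for m in pool.members if m.name != failed]
    pool.members.append(Member(replacement, operational=False))


def mark_operational(pool: Pool, name: str) -> None:
    # Step 3: the load balancer sees the new member as healthy and
    # starts routing traffic to it.
    for member in pool.members:
        if member.name == name:
            member.operational = True


def demo() -> List[List[str]]:
    """Run the three steps and record the routable set after each."""
    pool = Pool([Member("web-1"), Member("web-2")])
    states = []
    mark_failed(pool, "web-2")
    states.append(pool.routable())
    respawn(pool, "web-2", "web-3")
    states.append(pool.routable())
    mark_operational(pool, "web-3")
    states.append(pool.routable())
    return states


print(demo())  # prints [['web-1'], ['web-1'], ['web-1', 'web-3']]
```

The model keeps the replacement out of the routable set until it is
marked operational, mirroring the last step in the list, where traffic
reaches the new node only after Octavia detects that it is up.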
Existing implementation(s)
==========================
..
 If there are one or more existing implementations of this use case,
 please give as many details as possible, in order that operators can
 re-implement the use case in their own clouds. However, any
 information is better than no information! Linking to external
 documents is perfectly acceptable.
A demo video is available
`here <https://www.youtube.com/watch?v=dXsGnbr7DfM>`_.
Future work
===========
..
 Please link from here to any relevant specs. If a cross-project
 spec is required, it can be placed under ../specs/ in this
 repository.
Dependencies
============
..
 - Include specific references to specs and/or blueprints in
   self-healing-sig, or in other projects, that this one either depends
   on or is related to.

 - Does this feature require any new library dependencies or code
   otherwise not included in OpenStack? Or does it depend on a specific
   version of a library?