cluster-maintenance-strategy

Implements: blueprint cluster-maintaining
Change-Id: I942b793a3182d81981bb817a6e8c970459a16e43
2017-05-15 18:11:16 +08:00 · 2017-05-15 18:11:16 +08:00 · aea4693153
commit aea4693153
parent 3c538d49fc
1 changed files with 146 additions and 0 deletions
--- a/specs/queens/approved/cluster-maintenance-strategy.rst
+++ b/specs/queens/approved/cluster-maintenance-strategy.rst
@@ -0,0 +1,146 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+==========================
+Host maintenance strategy
+==========================
+
+https://blueprints.launchpad.net/watcher/+spec/cluster-maintaining
+
+Problem description
+===================
+
+Operators sometimes need to maintain compute nodes, for example to update
+hardware or software, without interrupting users' applications.
+
+Use Cases
+---------
+
+As an OpenStack operator, I sometimes want to maintain one compute node
+without interrupting users' applications.
+
+
+Proposed change
+===============
+There will be a new goal and strategy for cluster-maintenance.
+
+* Add one new goal - "Cluster Maintenance"
+* Add one new strategy for this goal - "Host Maintenance"
+
+The new strategy executes as follows:
+
+* First, get the compute node which needs maintenance. This input parameter
+  is provided by the administrator. Call the change_nova_service_state
+  action to put the node into the "maintaining" state (disabled, with
+  disable_reason 'watcher_maintaining').
+* Then, call the migrate action to move all instances off the maintaining
+  node. Active instances use "live-migrate" and all others use
+  "cold-migrate". The free CPUs, memory and disk of each candidate node are
+  calculated to determine whether it can host one instance or all instances
+  from the maintaining node.
+  This strategy only considers how to migrate all instances off the
+  maintaining node; further optimization relies on other strategies.
+  There are two methods to migrate the instances of the maintaining node:
+  Method No.1 migrates all instances on the maintaining node together to
+  one unused host. An "unused" host means, for Watcher, a node that is
+  disabled but not powered off. If there is more than one "unused" host,
+  one of them is chosen at random.
+  (This method does not trigger further VM migrations among other hosts.)
+  Method No.2 spreads the instances on the maintaining node across the
+  other nodes.
+  Method No.1 takes priority. Method No.2 executes only if Method No.1
+  fails. If both methods fail, the audit fails and raises an exception,
+  and no solution is produced.
+
+After the maintenance has finished, the administrator needs to re-enable
+the node manually with the CLI command 'nova service-enable', changing its
+state from "maintaining" to "enabled", so that the compute node rejoins
+the pool of compute resources.
+
+Alternatives
+------------
+
+None
+
+Data model impact
+-----------------
+
+None
+
+REST API impact
+---------------
+
+None
+
+Security impact
+---------------
+None
+
+Notifications impact
+--------------------
+
+None
+
+Other end user impact
+---------------------
+
+None
+
+Performance Impact
+------------------
+
+None
+
+Other deployer impact
+---------------------
+
+None
+
+Developer impact
+----------------
+
+None
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee: sue
+
+Work Items
+----------
+
+ * Add strategy and goal for cluster_maintenance
+ * Update the change_nova_service_state action so that it can be used to
+   maintain one compute node.
+
+Dependencies
+============
+
+https://blueprints.launchpad.net/watcher/+spec/extend-node-status
+
+Testing
+=======
+
+Unit tests
+
+Documentation Impact
+====================
+
+Documentation explaining how to use this new optimization strategy.
+
+References
+==========
+
+None
+
+History
+=======
+
+None
+
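
The migration planning described under "Proposed change" can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not Watcher's implementation: the `Instance` and `Node` classes, the `plan_host_maintenance` function, and the first-fit placement used for Method No.2 are hypothetical, since the spec does not say how instances are distributed when they are spread across nodes. It also assumes the candidate list excludes the maintaining node itself and that an unused host must have room for all instances.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    name: str
    state: str        # e.g. "active" or "stopped"
    vcpus: int
    memory_mb: int
    disk_gb: int


@dataclass
class Node:
    name: str
    enabled: bool     # False means the nova-compute service is disabled
    powered_on: bool
    free_vcpus: int
    free_memory_mb: int
    free_disk_gb: int


def migration_type(instance: Instance) -> str:
    # Active instances are live-migrated, all others cold-migrated.
    return "live-migrate" if instance.state == "active" else "cold-migrate"


def plan_host_maintenance(instances: List[Instance],
                          candidates: List[Node],
                          rng=random) -> List[Tuple[str, str, str]]:
    """Return (instance, destination, migration type) tuples.

    `candidates` is assumed to exclude the maintaining node itself.
    """
    need = (sum(i.vcpus for i in instances),
            sum(i.memory_mb for i in instances),
            sum(i.disk_gb for i in instances))

    # Method No.1: one "unused" host (disabled but powered on) that can
    # take every instance; pick one at random if several qualify.
    unused = [n for n in candidates
              if not n.enabled and n.powered_on
              and n.free_vcpus >= need[0]
              and n.free_memory_mb >= need[1]
              and n.free_disk_gb >= need[2]]
    if unused:
        target = rng.choice(unused)
        return [(i.name, target.name, migration_type(i)) for i in instances]

    # Method No.2: spread instances over enabled nodes. First-fit is an
    # assumption made for this sketch.
    free = {n.name: [n.free_vcpus, n.free_memory_mb, n.free_disk_gb]
            for n in candidates if n.enabled}
    plan = []
    for inst in instances:
        wanted = (inst.vcpus, inst.memory_mb, inst.disk_gb)
        for name, cap in free.items():
            if all(c >= w for c, w in zip(cap, wanted)):
                free[name] = [c - w for c, w in zip(cap, wanted)]
                plan.append((inst.name, name, migration_type(inst)))
                break
        else:
            # Neither method found a destination: no solution is produced.
            raise RuntimeError("no solution for instance %s" % inst.name)
    return plan
```

Calling `plan_host_maintenance(instances, candidates)` with the instances of the maintaining node and the remaining compute nodes either returns a complete plan or raises, which mirrors the "audit fails with no solution produced" behaviour in the spec.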