Implements:blueprint cluster-maintaining Change-Id: I942b793a3182d81981bb817a6e8c970459a16e43
Host maintenance strategy
https://blueprints.launchpad.net/watcher/+spec/cluster-maintaining
Problem description
Operators sometimes need to take a compute node out of service for maintenance, such as a hardware or software update, without interrupting users' applications.
Use Cases
As an OpenStack operator, I sometimes want to maintain one compute node without interrupting users' applications.
Proposed change
A new goal and a new strategy will be added for cluster maintenance.
- Add one new goal - "Cluster Maintenance"
- Add one new strategy for this goal - "Host Maintenance"
The new strategy executes as follows:
- First, get the compute node that needs maintenance. The administrator provides it as an input parameter. Call the change_nova_service_state action to put the node into the "maintaining" state (disabled with disable_reason 'watcher_maintaining').
- Then, call the migrate action to move all instances on the maintaining node to other nodes. Active instances use "live-migrate" and all others use "cold-migrate". The free CPUs, memory, and disk of each candidate node are calculated to determine whether it can accept one instance or all instances from the maintaining node. This strategy only considers how to migrate all instances off the maintaining node; further optimization relies on other strategies. There are two methods for migrating the instances:
  - Method No.1: migrate all instances on the maintaining node together to one unused host. For Watcher, an "unused" host is a node that is disabled but not powered off. If there is more than one unused host, one of them is chosen at random. (This method does not cause further VM migrations among the other hosts.)
  - Method No.2: migrate all instances on the maintaining node dispersedly to other nodes.
  Method No.1 takes priority. Method No.2 executes only if Method No.1 fails. If both methods fail, the audit fails and raises an exception, with no solution produced.
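The planning logic described in the steps above can be illustrated with a minimal, self-contained sketch. It is not the Watcher implementation: the `Instance` and `Node` classes, the dictionary keys used for actions, and the first-fit placement used for Method No.2 are all assumptions made for illustration. The sketch also assumes that Method No.1 only considers unused hosts with enough free capacity for every instance, and that Method No.2 only considers enabled nodes. Only the action names, the two methods, and the live/cold rule come from the text.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Resources = Tuple[int, int, int]  # (vcpus, memory_mb, disk_gb)


@dataclass
class Instance:  # hypothetical model, not a Watcher class
    name: str
    needs: Resources
    active: bool = True


@dataclass
class Node:  # hypothetical model, not a Watcher class
    name: str
    free: Resources
    unused: bool = False  # disabled but still powered on


class NoSolution(Exception):
    """Raised when neither method can place every instance."""


def fits(needs: Resources, free: Resources) -> bool:
    return all(n <= f for n, f in zip(needs, free))


def migrate_action(inst: Instance, target: str) -> dict:
    # Active instances are live-migrated, all others cold-migrated.
    kind = "live" if inst.active else "cold"
    return {"action": "migrate", "instance": inst.name,
            "destination": target, "migration_type": kind}


def plan_to_unused_host(instances, nodes, rng=random) -> Optional[List[dict]]:
    """Method No.1: move everything to one randomly chosen unused host."""
    total = tuple(sum(col) for col in zip(*(i.needs for i in instances)))
    candidates = [n for n in nodes if n.unused and fits(total, n.free)]
    if not candidates:
        return None
    target = rng.choice(candidates)
    return [migrate_action(i, target.name) for i in instances]


def plan_dispersed(instances, nodes) -> Optional[List[dict]]:
    """Method No.2: spread instances over enabled nodes (first fit, assumed)."""
    free = {n.name: n.free for n in nodes if not n.unused}
    plan = []
    for inst in instances:
        target = next((name for name, cap in free.items()
                       if fits(inst.needs, cap)), None)
        if target is None:
            return None
        # Track remaining capacity without mutating the input nodes.
        free[target] = tuple(c - n for c, n in zip(free[target], inst.needs))
        plan.append(migrate_action(inst, target))
    return plan


def build_plan(maintenance_node: str, instances, nodes, rng=random) -> List[dict]:
    others = [n for n in nodes if n.name != maintenance_node]
    migrations = plan_to_unused_host(instances, others, rng)
    if migrations is None:
        migrations = plan_dispersed(instances, others)
    if migrations is None:
        raise NoSolution("no placement found for all instances")
    # The node is disabled first, then its instances are migrated away.
    disable = {"action": "change_nova_service_state", "node": maintenance_node,
               "state": "disabled", "reason": "watcher_maintaining"}
    return [disable] + migrations
```

Calling `build_plan("node-m", instances, nodes)` returns the disable action followed by one migrate action per instance, or raises `NoSolution` when neither method can place every instance.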
After the maintenance is finished, the administrator must manually re-enable the node with the CLI command 'nova service-enable'. This changes the node's state from "maintaining" to "enabled" and returns the compute node to the pool of compute resources.
Alternatives
None
Data model impact
None
REST API impact
None
Security impact
None
Notifications impact
None
Other end user impact
None
Performance Impact
None
Other deployer impact
None
Developer impact
None
Implementation
Assignee(s)
Primary assignee: sue
Work Items
- Add strategy and goal for cluster_maintenance
- Update the change_nova_service_state action so that it can be used to put one compute node into maintenance.
Dependencies
https://blueprints.launchpad.net/watcher/+spec/extend-node-status
Testing
Unit tests
Documentation Impact
Documentation explaining how to use this new optimization strategy.
References
None
History
None