
After releasing the rc-final of Watcher for Mitaka, we need to clean-up specs folder before moving to Newton. Change-Id: I8de45ebae47c0b9c8c25e17285d553c9870deea0
4.4 KiB
Workload Based Migration Strategy
https://blueprints.launchpad.net/watcher/+spec/workload-balance-migration-strategy
This spec proposes a new Watcher migration strategy based on the VM workloads of hypervisors. This strategy makes decisions to migrate workloads to make the total VM workloads of each hypervisor balanced, when the total VM workloads of hypervisor reaches threshold.
- Note: * VM workloads means how much CPUs the VM instance fully used, eg a VM
-
instance has 4 CPUs, the VM workload range is [0.00 , 4.00].
- This strategy is based on that all hosts have the same CPUs.
Problem description
In current Data Center, VM workloads on each hypervisor may not balance, some are extremely high, some are idle, which will reduce the cooling efficiency, this strategy will balance the workloads when the CPU utilization of hypervisor reaches threshold.
Use Cases
As an administrator, I want to be able to trigger an audit that controls the CPU utilization below a certain threshold.
In order to:
- balance the workloads on each hypervisor, make it close to average workload value of all hypervisors.
Project Priority
Not relevant because Watcher is not in the big tent so far.
Proposed change
Watcher already has its decision framework, so this strategy should be a new class which extend the base strategy class.
- Set the threshold in 2 steps: hard coded first, then through the template. see: https://blueprints.launchpad.net/watcher/+spec/optimization-threshold Threshold is the trigger value to start workload balancing. It can be the percentage of CPU utilization of hypervisor.
- Create a new Python class to extend the "BaseStrategy" class.
- Use the Nova objects framework to get free CPU/Memory/Disk of hypervisors.
- Use the Ceilometer client to get VM "cpu_util" to calculate the workloads. The time window is 5 minutes here, will be configurable as threshold. It uses ceilometer aggregation API to get the average value of "cpu_util".
- An algorithm to detect if the threshold of workloads has been reached, it will figure out the suitable VM to be moved, and it will filter the viable targets according to the free resource information of hypervisors from previous step and choose the one with lowest workloads. It will use select-destinations-filter when it is ready. It will also do some check to avoid the corner case, such as "ping pong".
Alternatives
No alternative
Data model impact
None
REST API impact
None
Security impact
None
Notifications impact
None
Other end user impact
None
Performance Impact
There used to be some performance issues regarding the query of metrics from the Ceilometer database. This is one of the reasons why it was rarely used in production environment. These issues may now be solved thanks to an abstraction layer which enables anybody to change the underlying metrics storage backend easily.
There is a performance issue when you query the Nova DB to get CPU usage metrics.
Other deployer impact
None
Developer impact
None
Implementation
Assignee(s)
- Primary assignee:
-
<edwin-zhai>
Work Items
- function to calculate the total VM workloads of hypervisors.
- function to filter hypervisors by Nova basic metrics(free CPU/Memory/Disk).
- Rewrite execute function to add the algorithm to detect the threshold and to pick up the suitable VM to be moved and choose the target hypervisors, generate the solution.
Dependencies
- https://blueprints.launchpad.net/watcher/+spec/optimization-threshold
- https://blueprints.launchpad.net/watcher/+spec/select-destinations-filter
- http://docs.openstack.org/developer/python-novaclient/api.html
- https://blueprints.launchpad.net/watcher/+spec/get-goal-from-strategy
Testing
Unit tests and functional test, will use a fake metrics set for running functional test.
Documentation Impact
A documentation explaining how to use this new optimization strategy.
References
None
History
None