Zone migration strategy
https://blueprints.launchpad.net/watcher/+spec/zone-migration-strategy
There are thousands of physical servers and storage systems running various kinds of workloads in a cloud system. Operators have to migrate instances and volumes for hardware maintenance once a quarter or so. This requires the operator to watch workloads, choose instances and volumes, and migrate them efficiently with minimum downtime. Watcher can be used to do this task automatically.
Problem description
It is hard for an operator to migrate many instances and volumes efficiently and with minimum downtime for hardware maintenance.
Use Cases
- As an operator, I want Watcher to automatically migrate many instances and volumes efficiently and with minimum downtime.
Proposed change
Add the goal "Hardware maintenance". The goal is to migrate instances and volumes off a set (called a "zone" in this spec) of compute nodes and storage.
Add eight efficacy indicators.
LiveInstanceMigrateCount
The number of instances that should be live migrated
PlannedLiveInstanceMigrateCount
The number of instances planned to be live migrated
ColdInstanceMigrateCount
The number of instances that should be cold migrated
PlannedColdInstanceMigrateCount
The number of instances planned to be cold migrated
VolumeMigrateCount
The number of detached volumes that should be migrated
PlannedVolumeMigrateCount
The number of detached volumes planned to be migrated
VolumeUpdateCount
The number of attached volumes that should be updated
PlannedVolumeUpdateCount
The number of attached volumes planned to be updated
Add Efficacy Specification associated with the goal. The efficacy specification has four global efficacy indicators.
live_instance_migrate_ratio
Ratio of the number of instances planned to be live migrated to the number of instances that should be live migrated, i.e. PlannedLiveInstanceMigrateCount / LiveInstanceMigrateCount
cold_instance_migrate_ratio
Ratio of the number of instances planned to be cold migrated to the number of instances that should be cold migrated, i.e. PlannedColdInstanceMigrateCount / ColdInstanceMigrateCount
volume_migrate_ratio
Ratio of the number of detached volumes planned to be migrated to the number of detached volumes that should be migrated, i.e. PlannedVolumeMigrateCount / VolumeMigrateCount
volume_update_ratio
Ratio of the number of attached volumes planned to be updated to the number of attached volumes that should be updated, i.e. PlannedVolumeUpdateCount / VolumeUpdateCount
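As a minimal sketch (not the actual Watcher implementation), the four global ratios could be computed from the eight indicators as follows, assuming the counts are plain integers and that a ratio defaults to 1.0 when nothing needs to be migrated or updated:

# Hypothetical helper showing how the global efficacy ratios could be
# derived from the eight indicator values; all names are illustrative.
def global_efficacy(indicators):
    def ratio(planned, total):
        # Assume a ratio of 1.0 when there is nothing to migrate or update.
        return round(planned / total, 2) if total else 1.0

    return {
        "live_instance_migrate_ratio": ratio(
            indicators["PlannedLiveInstanceMigrateCount"],
            indicators["LiveInstanceMigrateCount"]),
        "cold_instance_migrate_ratio": ratio(
            indicators["PlannedColdInstanceMigrateCount"],
            indicators["ColdInstanceMigrateCount"]),
        "volume_migrate_ratio": ratio(
            indicators["PlannedVolumeMigrateCount"],
            indicators["VolumeMigrateCount"]),
        "volume_update_ratio": ratio(
            indicators["PlannedVolumeUpdateCount"],
            indicators["VolumeUpdateCount"]),
    }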
Add the strategy "Zone migration". The strategy takes compute node and storage pool names from the input parameters and gets the instances and volumes on them from the cluster data model. After that, the strategy prioritizes instances and volumes by the following keys, also given by the input parameters (a sketch of this ordering appears after the example below):
project
compute_node
storage_pool
compute
The strategy chooses instances in descending order of one or more of the following:
- vcpu_num
- mem_size
- disk_size
- created_at
storage
The strategy chooses volumes in descending order of one or more of the following:
- size
- created_at
For example, if the following input parameters are given:
"priority": {
"project": ["pj1"],
"compute_node": ["compute1", "compute2"],
"compute": ["cpu_num"],
"storage_pool": ["pool1", "pool2"],
"storage": ["size"]
}
and we have the following lists of instances and volumes:
instances:
{"project": "pj1", "node":"compute2", "cpu_num": 1, "memory_size": 4}
{"project": "pj2", "node":"compute1", "cpu_num": 1, "memory_size": 2}
{"project": "pj1", "node":"compute1", "cpu_num": 2, "memory_size": 3}
{"project": "pj2", "node":"compute1", "cpu_num": 1, "memory_size": 1}
volumes:
{"project": "pj1", "node":"pool1", "size": 3, "created_at": 2017-02-25}
{"project": "pj1", "node":"pool1", "size": 3, "created_at": 2017-02-26}
{"project": "pj2", "node":"pool1", "size": 5, "created_at": 2017-02-25}
{"project": "pj1", "node":"pool2", "size": 3, "created_at": 2017-02-25}
then instances are prioritized as follows:
{"project": "pj1", "node":"compute1", "cpu_num": 2, "memory_size": 3}
{"project": "pj1", "node":"compute2", "cpu_num": 1, "memory_size": 4}
{"project": "pj2", "node":"compute1", "cpu_num": 1, "memory_size": 2}
{"project": "pj2", "node":"compute1", "cpu_num": 1, "memory_size": 1}
and volumes are prioritized as follows:
{"project": "pj1", "node":"pool1", "size": 3, "created_at": "2017-02-25"}
{"project": "pj1", "node":"pool1", "size": 3, "created_at": "2017-02-26"}
{"project": "pj1", "node":"pool2", "size": 3, "created_at": "2017-02-25"}
{"project": "pj2", "node":"pool1", "size": 5, "created_at": "2017-02-25"}
The strategy first migrates all chosen volumes with the volume_migrate action, then migrates all chosen instances with the migrate action. Destinations can be given by the input parameters.
If a volume is attached to an instance, the instance can be migrated just after the attached volume is migrated, because the volume and the instance should be placed close to each other for performance reasons. This behavior is configurable via the with_attached_volume input parameter described below.
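As an illustration of this ordering, a simplified sketch of how the action list could be assembled (the volume_migrate and migrate action names follow this spec; the attachments mapping and the item structure are hypothetical):

def build_actions(volumes, instances, attachments, with_attached_volume=False):
    """Return actions in execution order: volumes first, then instances.

    ``attachments`` maps a volume id to the id of the instance it is
    attached to (a hypothetical structure used only for this sketch).
    """
    actions = []
    migrated = set()
    for volume in volumes:
        actions.append({"action_type": "volume_migrate",
                        "resource_id": volume["id"]})
        instance_id = attachments.get(volume["id"])
        if with_attached_volume and instance_id:
            # Migrate the instance right after its attached volume so the
            # two end up close to each other again as soon as possible.
            actions.append({"action_type": "migrate",
                            "resource_id": instance_id})
            migrated.add(instance_id)
    for instance in instances:
        if instance["id"] not in migrated:
            actions.append({"action_type": "migrate",
                            "resource_id": instance["id"]})
    return actions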
The strategy uses the weights planner, which is the default planner and limits the number of actions to be run in parallel on a per-action-type basis. In addition, the strategy limits the number of actions to be run in parallel per node or pool; these numbers are given by the input parameters. The strategy takes volumes and instances from the prioritized lists and migrates them, limited both by the per-node and per-pool parallelization numbers and by the per-action parallelization number of the weights planner.
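A minimal sketch of the per-node and per-pool cap, assuming each prioritized item records its source under the "node" key as in the earlier example (one possible reading of the limit, not the strategy's actual code):

from collections import defaultdict

def limit_per_source(prioritized_items, limit):
    """Pick at most ``limit`` items per source node or pool for one round."""
    picked, counts = [], defaultdict(int)
    for item in prioritized_items:
        if counts[item["node"]] < limit:
            counts[item["node"]] += 1
            picked.append(item)
    return picked

With parallel_per_node set to 2, for example, at most two instances per compute node are selected for a round of migrations; the weights planner then applies its own per-action-type limit when building the action plan.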
The input parameters are the following:
compute_nodes": [
{
"src_node": "cen-cmp02",
"dst_node": "cen-cmp01"
},
......
],
"storage_pools": [
{
"src_pool": "cen-cmp02@lvm#afa",
"dst_pool": "cen-cmp01@lvm#afa",
"src_volume_type": "afa",
"dst_volume_type": "afa"
},
........
],
"parallel_per_node": 2,
"parallel_per_pool": 2,
"priority": {
"project": ["pj1", "pj2"],
"compute_node": ["compute1", "compute2"],
"storage_pool": ["pool1", "pool2"],
"compute": ["cpu_num", "memory_size"],
"storage": ["size"]
}
"with_attached_volume": false
Alternatives
The operator migrates instances and volumes manually, one by one.
Data model impact
None
REST API impact
None
Security impact
None
Notifications impact
None
Other end user impact
None
Performance Impact
None
Other deployer impact
None
Developer impact
None
Implementation
Assignee(s)
- Primary assignee:
  <nakamura-h>
- Other contributors:
  <adi-sky17>
Work Items
- Implement goal "Hardware maintenance"
- Implement efficacy indicators
- Implement efficacy specification
- Implement strategy "Zone migration"
- Filters for prioritizing volumes and instances to be migrated
- Controller for the number of parallel actions per node and pool
- Logic for migrating volumes and instances via actions
Dependencies
- https://blueprints.launchpad.net/watcher/+spec/volume-migrate-action
- https://blueprints.launchpad.net/watcher/+spec/cinder-model-integration
- https://blueprints.launchpad.net/watcher/+spec/multiple-global-efficacy-indicator
Testing
- Unit tests and tempest tests will be added.
Documentation Impact
Strategy documentation will be added.
References
History
None