Merge "Volume migration improvement for Liberty version"
This commit is contained in:
commit
0982dec286
354
specs/liberty/volume-migration-improvement.rst
Normal file
354
specs/liberty/volume-migration-improvement.rst
Normal file
@ -0,0 +1,354 @@
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
============================
|
||||
Volume Migration Improvement
|
||||
============================
|
||||
|
||||
https://blueprints.launchpad.net/cinder/+spec/migration-improvement
|
||||
|
||||
This specification proposes to improve the current volume migration in terms
|
||||
of implementing a better way to manage the volume migration status,
|
||||
adding the migration progress indication, enriching the notification system via
|
||||
reporting to Ceilometer, guaranteeing the migration quality via tempest tests
|
||||
and CI support, etc. It targets to resolve the current issues for the available
|
||||
volumes only, since we need to wait until the multiple attached volume
|
||||
functionality lands in Nova to resolve the issues related to the attached volumes.
|
||||
There is going to be another specification designed to cover the issues regarding
|
||||
the attached volumes.
|
||||
|
||||
There are three cases for volume migration. The scope of this spec is for the
|
||||
available volumes only and targets to resolve the issues within the following
|
||||
migration Case 1 and 2:
|
||||
|
||||
Within the scope of this spec
|
||||
1) Available volume migration using the "dd" command.
|
||||
For example, migration from LVM to LVM, between LVM and vendor driver, and
|
||||
between different vendor drivers.
|
||||
|
||||
2) Available volume migration between two pools from the same vendor driver using
|
||||
driver-specific way. Storwize is taken as the reference example for this spec.
|
||||
|
||||
Out of the scope of the spec
|
||||
3) In-use(attached) volume migration using Cinder generic migration.
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
Currently, there are quite some straightforward issues about the volume migration.
|
||||
1. Whether the migration succeeds or fails is not saved anywhere, which is very
|
||||
confusing for the administrator. The volume status is still available or in-use,
|
||||
even if the administrator mentions --force as a flag for cinder migrate command.
|
||||
2. From the API perspective, the administrator is unable to check the status of the
|
||||
migration. The only way to check if the migration fails or succeeds is
|
||||
to check the database.
|
||||
3. During the migration, there are two volumes appearing in the database record
|
||||
via the check by "cinder list" command. One is the source volume and one is the
|
||||
destination volume. The latter is actually useless to show and leads to confusion
|
||||
for the administrator.
|
||||
4. When executing the command "cinder migrate", most of the time there is
|
||||
nothing returned to the administrator from the terminal, which is unfriendly and needs to
|
||||
be improved.
|
||||
5. It is probable that the migration process takes a long time to finish. Currently
|
||||
the administrator gets nothing from the log about the progress of the migration.
|
||||
6. Nothing reports to the Ceilometer.
|
||||
7. There are no tempest test cases and no CI support to make sure the migration
|
||||
truly works for any kind of drivers.
|
||||
|
||||
We propose to add the management of the migration status to resolve
|
||||
issues 1 to 4, add the migration progress indication to cover Issue 5, add
|
||||
the notification to solve Issue 6 and add tempest tests and CI support to tackle
|
||||
Issue 7.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
At the beginning, we declare that all the changes and test cases are dedicated to
|
||||
available volumes. For the attached volumes, we will wait until the multiple
|
||||
attached volume functionality get merged in Nova.
|
||||
|
||||
* Management of the volume migration status:
|
||||
|
||||
If the migration fails, the migration status is set to "error". If the migration
|
||||
succeeds, the migration status is set to "success". If no migration is ever done
|
||||
for the volume, the migration status is set to None. The migration status is used
|
||||
to record the result of the previous migration. The migration status can be seen
|
||||
by the administrator only.
|
||||
|
||||
The administrator has several ways to check the migration status:
|
||||
1) The administrator can do a regular "volume list" with a filter
|
||||
"migration_status=<expected volume migration status>" to find all the volumes
|
||||
with the specified migration status. If no filter is specified, all the volumes
|
||||
will list the migration status.
|
||||
2) The administrator can issue a "get" command for a certain volume and the
|
||||
migration status can be found in the field 'os-vol-mig-status-attr:migstat'.
|
||||
|
||||
If the administrator issues the migrate command with the --force flag, the volume
|
||||
status will be changed to 'maintenance'. Attach or detach will not be allowed
|
||||
during migration. If the administrator issues the migrate command without the
|
||||
--force flag, the volume status will remain unchanged. Attach or detach action issued
|
||||
during migration will abort the migration. The status 'maintenance' can be extended
|
||||
to use in any other situation, in which the volume service is not available due to
|
||||
any kinds of reasons.
|
||||
|
||||
We plan to provide more information when the administrator is running "cinder migrate"
|
||||
command. If the migration is able to start, we return a message "Your migration request
|
||||
has been received. Please check migration status and the server log to see more
|
||||
information." If the migration is rejected by the API, we shall return messages
|
||||
like "Your migration request failed to process due to some reasons".
|
||||
|
||||
We plan to remove the redundant information for the dummy destination volume. If
|
||||
Cinder Internal Tenant(https://review.openstack.org/#/c/186232/) is successfully
|
||||
implemented, we will apply that patch to hide the destination volume.
|
||||
|
||||
* Migration progress indication:
|
||||
|
||||
We would like to introduce a poll mechanism to check the migration progress in
|
||||
a certain interval as the implementation for the migration progress indication.
|
||||
The poll mechanism can be realized in a loop, and the migration progress will
|
||||
be checked in a certain interval, which is configurable in cinder.conf. This
|
||||
mechanism can be running in parallel to the volume migration.
|
||||
|
||||
If the volume copy starts, another thread for the migration progress check will
|
||||
start as well. If the volume copy ends, the thread for the migration progress
|
||||
check ends.
|
||||
|
||||
A driver capability named migration_progress_report can be added to each driver.
|
||||
It is either True or False. This is for the case that volumes are migrated
|
||||
from one pool to another within the same storage back-end. If it is True, the
|
||||
loop for the poll mechanism will start. Otherwise, no poll mechanism will start.
|
||||
|
||||
A configuration option named migration_progress_report_interval can be added into
|
||||
cinder.conf, specifying how frequent the migration progress needs to be checked.
|
||||
For example, if migration_progress_report_interval is set to 30 seconds, the code will
|
||||
check the migration progress and report it every 30 seconds.
|
||||
|
||||
If the volume is migrated using dd command, e.g. volume migration from LVM to
|
||||
LVM, from LVM to vendor driver, from one back-end to another via blockcopy, etc,
|
||||
the migration progress can be checked via the position indication of
|
||||
/proc/<PID>/fdinfo/1.
|
||||
|
||||
For the volume is migrated using file I/O, the current file pointer is able to
|
||||
report the position of the transferred data. The migration progress can be checked
|
||||
via this position relative to EOF.
|
||||
|
||||
If the volume is migrated within different pools of one back-end, we would like to
|
||||
implement this feature by checking the stats of the storage back-end. Storwize
|
||||
V7000 is taken as the reference implementation about reporting the migration
|
||||
progress. It is possible that some drivers support the progress report and some
|
||||
do not. A new key "migration_progress_report" will be added into the driver
|
||||
to report the capability. If the back-end driver supports to report the migration
|
||||
progress, this key is set to True. Otherwise, this key is set to False and the
|
||||
progress report becomes unsupported in this case.
|
||||
|
||||
The migration progress can be checked by the administrator only. Since the progress
|
||||
is not stored, each time the progress is queried from the API, the request will be
|
||||
scheduled to the cinder-volume service, which can get the updated migration
|
||||
progress for a specified volume and reports back.
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
We can definitely use a hidden flag to indicate if a database row is displayed or
|
||||
hidden. However, cinder needs a consistent way to resolve other issues like image
|
||||
cache, backup, etc, we reach an agreement that cinder internal tenant is the approach
|
||||
to go.
|
||||
|
||||
The purpose that we plan to have a better management of the volume migration status,
|
||||
add migration progress indication, report the stats to Ceilometer and provide tempest
|
||||
tests and CI, is simply to guarantee the migration works in a more robust and stable
|
||||
way.
|
||||
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
None
|
||||
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
The REST API should be able to provide the migration status and the migration
|
||||
progress information for the volumes. For the migration status, it can be
|
||||
retrieved from the database. For the the migration progress, the API request
|
||||
will be scheduled to the cinder volume service, where the volume is located,
|
||||
and cinder volume service reports the updated progress back.
|
||||
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
None
|
||||
|
||||
|
||||
Notifications impact
|
||||
--------------------
|
||||
|
||||
The volume migration should send notification to Ceilometer about the start, and
|
||||
the progress and the finish.
|
||||
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
None
|
||||
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
None
|
||||
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
|
||||
If the back-end driver supports the migration progress indication, a new
|
||||
configuration option migration_progress_report_interval can be added. The administrator
|
||||
can decide how frequent the cinder volume service to report the migration
|
||||
progress. For example, if migration_progress_report_interval is set to 30 seconds,
|
||||
the cinder volume service will provide the progress information every 30 seconds.
|
||||
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
The driver maintainer or developer should be aware that they need to add a new
|
||||
capability to indicate whether their driver support the progress report. If yes,
|
||||
they need to implement the related method, to be provided in the implementation of
|
||||
this specification.
|
||||
|
||||
If their drivers have implemented volume migration, integration tests and driver CI
|
||||
are important to ensure the quality. This is something they need to pay attention
|
||||
and implement for their drivers as well.
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
Vincent Hou (sbhou@cn.ibm.com)
|
||||
|
||||
Other contributors:
|
||||
Jay Bryant
|
||||
Jon Bernard
|
||||
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Management of the volume migration status:
|
||||
|
||||
1) Change the migration_status to "error" if the migration fails; Change the
|
||||
migration_status to "success" if the migration succeeds.
|
||||
2) Change the volume status to "maintenance" if the administrator executes
|
||||
the migration command with --force flag. No attach or detach is allowed during
|
||||
this migration. If the administrator executes the migration command without
|
||||
--force flag, the volume status will stay unchanged. Attach or detach during
|
||||
migration will terminate the migration to ensure the availability of the volume.
|
||||
3) Enrich cinderclient with friendly messages returned for cinder migrate and
|
||||
retype command.
|
||||
4) Hide the redundant dummy destination volume during migration.
|
||||
|
||||
* Migration progress indication:
|
||||
|
||||
Add a loop to wrap the implementation of the poll mechanism.
|
||||
|
||||
The driver, which supports the migration progress report, will set
|
||||
migration_progress_report to True. Otherwise, set it to False.
|
||||
|
||||
The option migration_progress_report_interval will be used to specify the time
|
||||
interval, in which the migration progress is checked.
|
||||
|
||||
1) If the volume is migrated between LVM back-ends, or one back-end to another,
|
||||
the position indication of /proc/<PID>/fdinfo/1 can be checked to get the
|
||||
progress of the blockcopy.
|
||||
|
||||
2) If the volume is migrated within different pools of one back-end, we would like
|
||||
to check the progress report of the back-end storage in a certain time interval.
|
||||
|
||||
The migration percentage will be logged and reported to Ceilometer.
|
||||
|
||||
* Notification:
|
||||
|
||||
Add the code to send the start, progress and end to Ceilometer during migration.
|
||||
|
||||
* Tempest tests and CI support:
|
||||
|
||||
This work item is planned to finish in two steps. The first step is called manual
|
||||
mode, in which the tempest tests are ready and people need to configure the
|
||||
OpenStack environment manually to meet the requirements of the tests.
|
||||
|
||||
The second step is called automatic mode, in which the tempest tests can run
|
||||
automatically in the gate. With the current state of OpenStack infrastructure, it
|
||||
is only possible for us to do the manual mode. The automatic mode needs to
|
||||
collaboration with OpenStack-infra team and there is going to be a blueprint for it.
|
||||
|
||||
The following cases will be added:
|
||||
1) From LVM(thin) to LVM(thin)
|
||||
2) From LVM(thin) to Storwize
|
||||
3) From Storwize to LVM(thin)
|
||||
4) From Storwize Pool 1 to Storwize Pool 2
|
||||
|
||||
Besides, RBD driver is also going to provide the similar test cases from (2) to (4)
|
||||
as above.
|
||||
|
||||
We are sure that other drivers can get involved into the tests. This specification
|
||||
targets to add the test cases for LVM, Storwize and RBD drivers as the initiative. We
|
||||
hope other drivers can take the implementation of LVM, Storwize and RBD as a
|
||||
reference in future.
|
||||
|
||||
* Documentation:
|
||||
|
||||
Update the manual for the administrators, and the development reference for
|
||||
the driver developers and maintainers.
|
||||
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
Cinder Internal Tenant: https://review.openstack.org/#/c/186232/
|
||||
Add support for file I/O volume migration: https://review.openstack.org/#/c/187270/
|
||||
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
Depending on ability to parse the required information for the LVM driver, the
|
||||
following scenarios for available volumes will taken into account:
|
||||
1) Migration using Cinder generic migration with LVM(thin) to LVM(thin).
|
||||
2) Migration using Cinder generic migration with LVM(thin) to vendor driver.
|
||||
3) Migration using Cinder generic migration with vendor driver to LVM(thin).
|
||||
4) Migration between two pools from the same vendor driver using driver-specific way.
|
||||
|
||||
There are some other scenarios, but for this release we plan to consider the above.
|
||||
For scenarios 1 to 3, we plan to put tests cases into Tempest.
|
||||
For Scenario 4, we plan to put the test into CI.
|
||||
The reference case for Scenario 2 is migration from LVM to Storwize V7000.
|
||||
The reference case for Scenario 3 is migration from Storwize V7000 to LVM.
|
||||
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
Documentation should be updated to tell the administrator how to use the migrate
|
||||
and retype command. Describe what commands work for what kind of use cases, how
|
||||
to check the migration status, how to configure and check the migration indication,
|
||||
etc.
|
||||
|
||||
Reference will be updated to tell the driver maintainers or developers how to
|
||||
change their drivers to adapt this migration improvement via the link
|
||||
http://docs.openstack.org/developer/cinder/devref/index.html.
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
* https://blueprints.launchpad.net/cinder/+spec/migration-improvement
|
||||
* https://etherpad.openstack.org/p/volume-migration-improvement
|
Loading…
Reference in New Issue
Block a user