Merge "Fix the volume replication spec"

Jenkins 2015-05-08 19:32:40 +00:00 committed by Gerrit Code Review
commit ca54b61129


@@ -15,15 +15,20 @@ Volume replication is a key storage feature and a requirement for
features such as high-availability and disaster recovery of applications
running on top of OpenStack clouds.
This blueprint is an attempt to add initial support for volume replication
in Cinder, and is considered a first take which will include support for:
* Replicate volumes (primary to secondary approach)
* Promote a secondary to primary (and stop replication)
* Synchronize replication with direction
* Re-enable replication
* Test that replication is running properly
It is important to note that this is a first pass at volume replication.
The process of implementing replication for drivers has uncovered a
number of challenges that will be addressed in a future revision of
replication, including the ability to have different replication
types and the ability to replicate across multiple backends.
While this blueprint focuses on volume replication, a related blueprint
focuses on consistency groups, and replication will be extended to
support it.
Use Cases
@@ -60,7 +65,7 @@ Assumptions:
* Replication should be transparent to the end-user, failover, failback
and test will be executed by the cloud admin.
However, to test that the application is working, the end-user may be
involved, as they will be required to verify that their application is
working with the volume replica.
* The storage admin will provide the setup and configuration to enable the
@@ -78,95 +83,236 @@ Assumptions:
the cloud admin).
* Quota management: quotas are consumed at 2x, as two volumes are
created and the consumed space is doubled.
We can re-examine this mechanism after we get comments from deployers.
Proposed change
===============
Introduction:
The proposed design provides just a framework in Cinder for backend volume
drivers to implement volume replication using the facilities in the storage
backend. As such, this spec provides guidance as to how volume replication
should be implemented but the actual implementation will vary depending
upon the backend in question.
The key to enabling replication starts with adding an extra spec to the
volume type to indicate that replication is desired. That extra spec is
then used in the volume driver to enable the set-up and control of
replication on the storage backend in each of the different functions
documented below.
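For illustration, the extra spec might be set on a volume type through the
python-cinderclient library as in the sketch below; the type name
'replicated-vt' and the credentials are placeholders, not something defined
by this spec::

    from cinderclient.v2 import client

    # Placeholder credentials; any usual Cinder v2 client setup works.
    cinder = client.Client('admin', 'password', 'demo',
                           'http://keystone:5000/v2.0')

    # Create a volume type carrying the replication extra spec described
    # above. Backends that support replication can then replicate volumes
    # of this type.
    vtype = cinder.volume_types.create('replicated-vt')
    vtype.set_keys({'capabilities:replication': '<is> True'})

    # Volumes created with this type are candidates for replication.
    vol = cinder.volumes.create(10, volume_type='replicated-vt')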
Since Cinder is just providing the framework for backend volume drivers
to implement replication, details of replication implementation are left
to the backend to implement. The backend driver developer will need to
decide for their storage backend the best way to enable replication. For
instance one storage provider may feel that implementing synchronous
replication is the best choice while another storage provider may choose
asynchronous. A provider could also choose to make it a configurable
option. Implementing volume replication in Cinder in this manner allows
the greatest flexibility to the backend developer to implement replication.
It is also important to note that the developer documentation must provide
examples of how this is implemented in the Storwize driver, as it is not
currently possible to demonstrate volume replication in Cinder's reference
implementation, LVM. Therefore the developer documentation will have to
serve as the reference.
Add extra-specs in the volume type to indicate replication:
* capabilities:replication <is> True - if True, the volume is to be replicated,
if supported, by the backend driver. If the option is not specified or
False, then replication is not enabled. This option is required to enable
replication.
Create volume with replication enabled:
* Backend drivers that wish to enable replication will need to update their
create_volume() function to check for the
'capabilities:replication <is> True' extra spec. It is up to the backend
driver developers to implement replication in a manner that is compatible
with their storage backend.
When a replicated volume is created, it is expected that the volume dictionary
will be populated as follows (a driver-side sketch appears after the list):
** volume['replication_status'] = 'copying'
** volume['replication_extended_status'] = <driver specific value>
** volume['driver_data'] = <driver specific value>
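A minimal driver-side sketch of the behaviour described above; the
_backend_* helpers and _replication_requested() are illustrative stand-ins
for vendor-specific code, not Cinder APIs::

    def create_volume(self, volume):
        """Create the primary volume and, if requested, its replica."""
        self._backend_create(volume)  # vendor-specific primary create

        # Check the volume type for 'capabilities:replication <is> True'.
        if not self._replication_requested(volume):
            return None

        # Create the replica on the backend and start replication.
        replica = self._backend_create_replica(volume)
        self._backend_start_replication(volume, replica)

        # Populate the volume dictionary as described above.
        return {'replication_status': 'copying',
                'replication_extended_status': replica['status_details'],
                'driver_data': replica['id']}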
The replica volume is hidden from the end user as the end user will
never need to directly interact with the replica volume. Any interaction
with the replica happens through the primary volume.
Further details around the dictionary fields above may be seen in the
"Data Model Impact" section below.
Create Volume from Snapshot:
If the volume type extra specs include 'capabilities:replication <is> True'
for the new volume, the driver needs to create a volume replica at volume
creation time and set up replication between the newly created volume and its
associated replica. The volume dictionary should be populated in the same
manner as create volume.
Create Cloned Volume:
If the volume type extra specs include 'capabilities:replication <is> True'
for the new volume, the driver needs to create a volume replica at clone
creation time and set up replication between the newly created volume and its
associated replica. The volume dictionary should be populated in the same
manner as create volume.
Create Replica Test Volume:
Create a clone of the replica (secondary) volume. This clone can then be
used for testing replication to ensure that fail-over can be executed when
necessary. It is important to note that this doesn't actually execute the
promote path as the intention is not to promote the replica, but it gives
a method to ensure that the replica contains data and would be useful if
it had to be promoted.
The administrator is able to access this functionality using the
--source-replica option when creating a volume.
Delete volume:
For volumes with replication enabled the replica needs to be deleted
along with the primary copy. So, if a volume type has
'capabilities:replication <is> True' set, the driver will need to do the
additional deletion.
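A corresponding sketch for delete, reusing the illustrative helpers from the
create sketch above::

    def delete_volume(self, volume):
        """Delete the volume and, when replication was enabled, its replica."""
        if self._replication_requested(volume):
            self._backend_stop_replication(volume)
            self._backend_delete_replica(volume)
        self._backend_delete(volume)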
Get Volume Stats:
If the storage backend driver supports replication, the following should be
reported (see the sketch below):
* replication = True (None or False disables replication)
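A sketch of how a driver might report this in get_volume_stats(); the
backend name and capacity numbers are placeholders::

    def get_volume_stats(self, refresh=False):
        """Report backend capabilities, including replication support."""
        return {
            'volume_backend_name': 'example-backend',   # placeholder
            'vendor_name': 'Example Vendor',             # placeholder
            'driver_version': '1.0',
            'storage_protocol': 'iSCSI',
            'total_capacity_gb': 1000,                   # placeholder
            'free_capacity_gb': 800,                     # placeholder
            # None or False disables replication for this backend.
            'replication': True,
        }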
Re-type volume:
Changing volume-type is the mechanism an admin can use to make an existing
volume replicated, or to disable replication for a volume. Changing the
volume-type of a volume to a volume-type that includes
'capabilities:replication <is> True' (and didn't have it before) should
result in adding a secondary copy to the volume. Changing the volume-type of
a volume to a volume-type that no longer includes
'capabilities:replication <is> True' should result in removing the secondary
copy while preserving the primary copy.
The driver's retype function returns either:
* A boolean indicating whether the retype occurred, or
* A tuple (retyped, model_update), where retyped is a boolean
indicating if the retype occurred, and model_update includes
changes for the volume db.
The steps to implement this would look as follows (a sketch follows the list):
* Do a diff['extra_specs'] and see if 'replication' is included.
* If replication was enabled for the original volume_type but is not
enabled for the new volume_type, then replication should be disabled.
* The replica should be deleted.
* The volume dictionary should be updated as follows:
** volume['replication_status'] = 'disabled'
** volume['replication_extended_status'] = None
** volume['driver_data'] = None
* If replication was not enabled for the original volume_type but is
enabled for the new volume_type, then replication should be enabled.
* A volume replica should be created and the replication should
be set up between the volume and the newly created replica.
* The volume dictionary should be updated as follows:
** volume['replication_status'] = 'copying'
** volume['replication_extended_status'] = <driver specific value>
** volume['driver_data'] = <driver specific value>
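The list above could translate into a retype implementation roughly like the
following sketch; only the replication handling is shown and the _backend_*
helpers are illustrative::

    def retype(self, context, volume, new_type, diff, host):
        """Enable or disable replication when the volume type changes."""
        old, new = diff.get('extra_specs', {}).get(
            'capabilities:replication', (None, None))

        if old == '<is> True' and new != '<is> True':
            # Replication disabled: drop the replica.
            self._backend_stop_replication(volume)
            self._backend_delete_replica(volume)
            model_update = {'replication_status': 'disabled',
                            'replication_extended_status': None,
                            'driver_data': None}
        elif old != '<is> True' and new == '<is> True':
            # Replication enabled: create a replica and start replication.
            replica = self._backend_create_replica(volume)
            self._backend_start_replication(volume, replica)
            model_update = {'replication_status': 'copying',
                            'replication_extended_status':
                                replica['status_details'],
                            'driver_data': replica['id']}
        else:
            model_update = None

        return True, model_update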
Get Replication Status:
This will be used to update the status of replication between the primary and
secondary volume.
This function is called by the "_update_replication_relationship_status"
function in 'manager.py' and is the mechanism to update the status of
replication between the primary and secondary copies.
The actual state of the replication, as the storage backend is aware of it,
should be returned and the Cinder database should be updated to reflect the
status reported from the storage backend.
It is expected that the following model update for the volume will
happen:
* volume['replication_status'] = <error | copying | active | active-stopped |
inactive>
** 'error' if an error occurred with replication.
** 'copying' replication copying data to secondary (inconsistent)
** 'active' replication copying data to secondary (consistent)
** 'active-stopped' replication data copy on hold (consistent)
** 'inactive' if replication data copy is stopped (inconsistent)
* volume['replication_extended_status'] = <driver specific value>
* volume['driver_data'] = <driver specific value>
Note that for get replication status, the replication_extended_status and
driver_data fields may not need to be updated.
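A sketch of the corresponding driver hook; _backend_replication_state() is
an illustrative helper and the mapping onto the statuses above is backend
specific::

    def get_replication_status(self, context, volume):
        """Return a model update reflecting the backend's replication state."""
        state = self._backend_replication_state(volume)  # vendor-specific

        # Map the vendor-specific state onto the statuses listed above.
        status_map = {'syncing': 'copying',
                      'in_sync': 'active',
                      'paused': 'active-stopped',
                      'stopped': 'inactive'}

        return {'replication_status': status_map.get(state, 'error'),
                'replication_extended_status': state}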
Promote replica:
Promotion of a replica means that the secondary volume will take over
for the primary volume. This can be thought of as a 'fail over' operation.
Once promotion has happened, replication between the two volumes should be
stopped at the storage level, the replica should be available to be
attached, and the replication status should be changed to 'inactive' if the
change is successful; otherwise it should be 'error'.
A model update for the volume is returned.
As with the functions above, the volume driver is expected to update the
volume dictionary as follows:
* volume['replication_status'] = <error | inactive>
** 'error' if an error occurred with replication.
** 'inactive' if the replication data copy is stopped (inconsistent)
* volume['replication_extended_status'] = <driver specific value>
* volume['driver_data'] = <driver specific value>
Re-enable replication:
Re-enabling replication would be used to fix the replication between
the primary and secondary volumes. Replication would need to be
re-enabled as part of the fail-back process to make the promoted
volume and the old primary volume consistent again.
The volume driver returns a model update to reflect the actions taken.
The backend driver is expected to update the following volume dictionary
entries:
* volume['replication_status'] = <error | copying | active | active-stopped |
inactive>
** 'error' if an error occurred with replication.
** 'copying' replication copying data to secondary (inconsistent)
** 'active' replication copying data to secondary (consistent)
** 'active-stopped' replication data copy on hold (consistent)
** 'inactive' if replication data copy is stopped (inconsistent)
* volume['replication_extended_status'] = <driver specific value>
* volume['driver_data'] = <driver specific value>
The replication_extended_status should be used to store information that
the backend driver will need to track replication status. For instance,
the Storwize driver will use the replication_extended_status to track
the primary copy status and synchronization status for the primary volume
and the copy status, synchronization status and synchronization progress for
the replica (secondary) volume.
The driver_data field may optionally be used to contain any additional data
that the backend driver may require. Some backend drivers may not need to
use the driver_data field.
Driver API:
* promote: Promotes a replica that is in active or active-stopped state to
be the primary.
* reenable: Reenables replication on a volume that is in inactive,
active-stopped or error status.
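Both entry points boil down to stopping or re-establishing replication on
the backend and returning a model update. The sketch below uses illustrative
helper names and assumes driver methods called promote_replica and
reenable_replication; the exact names are an implementation detail::

    def promote_replica(self, context, volume):
        """Fail over: stop replication and make the replica attachable."""
        try:
            self._backend_stop_replication(volume)
            self._backend_expose_replica(volume)
            return {'replication_status': 'inactive'}
        except Exception:
            return {'replication_status': 'error'}

    def reenable_replication(self, context, volume):
        """Re-establish replication, e.g. as part of fail-back."""
        try:
            self._backend_resync(volume)
            return {'replication_status': 'copying'}
        except Exception:
            return {'replication_status': 'error'}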
Alternatives
------------
@@ -179,31 +325,30 @@ Also all recovery actions (failover, failback) will require both the
the storage and cloud admins to work together.
While replication in Cinder reduces the role of the storage admin to
only the setup phase, the cloud admin is responsible for failover
and failback with (typically) no need for intervention from the storage
admin.
Data model impact
-----------------
* The volumes table will be updated (a migration sketch follows the list):
** Add replication_status column (string) for indicating the status of
replication for a given volume. Possible values are:
*** 'copying' - Data is being copied between volumes, the secondary is
inconsistent.
*** 'disabled' - Volume replication is disabled.
*** 'error' - Replication is in error state.
*** 'active' - Data is being copied to the secondary and the secondary is
consistent.
*** 'active-stopped' - Data is not being copied to the secondary (on hold),
the secondary volume is consistent.
*** 'inactive' - Data is not being copied to the secondary, the secondary
copy is inconsistent.
** Add replication_extended_status column to contain details with regards
to replication status of the primary and secondary volumes.
** Add replication_driver_data column to contain additional details that
may be needed by a vendor's driver to implement replication on a backend.
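A sketch of the corresponding database migration, in the sqlalchemy-migrate
style Cinder used at the time; column sizes are illustrative::

    from sqlalchemy import Column, MetaData, String, Table

    def upgrade(migrate_engine):
        """Add the replication columns to the volumes table."""
        meta = MetaData(bind=migrate_engine)
        volumes = Table('volumes', meta, autoload=True)

        volumes.create_column(Column('replication_status', String(255)))
        volumes.create_column(Column('replication_extended_status',
                                     String(255)))
        volumes.create_column(Column('replication_driver_data', String(255)))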
State diagram for replication (status)
@@ -211,171 +356,76 @@ State diagram for replication (status)
[ASCII state diagram, not reproduced here: creating a volume with replication
enabled moves it to 'copying'; the periodic status check moves it between
'active' and 'active-stopped'; any error condition moves it to 'error' until
the storage admin fixes the problem and a status check updates the state;
'promote to primary' moves it to 'inactive'; 're-enable' returns it to
'copying'.]
REST API impact
---------------
Create volume API will have "source-replica" added:
{
"volume":
{
"source-replica": "Volume uuid of primary to clone",
}
}
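For illustration, the call might look as follows at the REST level; the
endpoint, token, size and volume id are placeholders::

    import requests

    token = 'PLACEHOLDER_AUTH_TOKEN'
    endpoint = 'http://cinder-api:8776/v2/PLACEHOLDER_TENANT_ID'

    # Create a test volume cloned from the replica of the given primary.
    body = {'volume': {'size': 10,
                       'source-replica': 'PRIMARY-VOLUME-UUID'}}
    resp = requests.post(endpoint + '/volumes', json=body,
                         headers={'X-Auth-Token': token})
    print(resp.status_code, resp.json())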
* Promote volume to be the primary volume
* Promote the secondary copy to be primary. The primary will become
secondary and replication should become inactive.
* Method type: POST
* Normal Response Code: 202
* Expected error http response code(s)
* 500: Replication is not enabled for volume
* 500: Replication status for volume must be active or active-stopped,
but current status is: <status>
* 500: Volume status for volume must be available, but current status
is: <status>
* /v2/<tenant id>/volumes/os-promote-replica/<volume uuid>
* This API has no body
* Re-enable the replication between the primary and secondary volume.
Typically follows a promote operation on the replication.
* Method type: POST
* Normal Response Code: 202
* Expected error http response code(s)
* 500: Replication is not enabled
* 500: Replication status for volume must be inactive, active-stopped,
or error, but current status is: <status>
* /v2/<tenant id>/volumes/os-reenable-replica/<volume uuid>
* This API has no body
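The two admin actions above are then plain POSTs with empty bodies; the
endpoint, token and volume id below are placeholders::

    import requests

    token = 'PLACEHOLDER_AUTH_TOKEN'
    endpoint = 'http://cinder-api:8776/v2/PLACEHOLDER_TENANT_ID'
    volume_id = 'VOLUME-UUID'
    headers = {'X-Auth-Token': token}

    # Fail over: promote the replica of the given volume to primary (202).
    requests.post('%s/volumes/os-promote-replica/%s' % (endpoint, volume_id),
                  headers=headers)

    # Fail back (after promotion): re-enable replication for the volume (202).
    requests.post('%s/volumes/os-reenable-replica/%s' % (endpoint, volume_id),
                  headers=headers)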
Security impact
---------------
@@ -406,26 +456,21 @@ Security impact
Notifications impact
--------------------
Will add notifications for promoting and re-enabling replication for
volumes.
Other end user impact
---------------------
* End-user to use volume types to enable replication.
* Cloud admin to use the *replication-promote*, *replication-reenable* and
*create --source-replica* commands in the python-cinderclient to execute
failover, failback and test.
Performance Impact
------------------
* The periodic task will query the driver and back-end for status
of all replicated volumes - running on the primary and secondary.
* Extra db calls identifying if replication exists are added to retype,
snapshot operations, etc., and will add a small latency to these functions.
@@ -443,10 +488,10 @@ Other deployer impact
Developer impact
----------------
* Change to the driver API is noted above. Third party backends that wish
to enable replication will need to add replication support to their driver.
* The API will expand to include consistency groups following the merge of
consistency group support to Cinder.
@@ -460,20 +505,17 @@ Primary assignee:
ronenkat
Other contributors:
Jay Bryant - E-Mail: jsbryant@us.ibm.com IRC: jungleboyj
Work Items
----------
* Cinder public (admin) APIs for replication
* DB schema updates for replication
* Cinder driver API additions for replication
* Cinder manager update for replication
* Testing
Note: Code is based on https://review.openstack.org/#/c/64026/ which was
submitted in the Icehouse development cycle.
Dependencies
============
@@ -502,10 +544,11 @@ Documentation Impact
* Public (admin) API changes.
* Details how replication is used by leveraging volume types.
* Driver docs explaining how replication is set up for each driver.
* Provide examples of volume replication implementation for
the Storwize backend.
References
==========
* Volume replication design session
https://etherpad.openstack.org/p/juno-cinder-volume-replication
Etherpad on improvements needed in documentation:
https://etherpad.openstack.org/p/cinder-replication-redoc