..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
Volume Replication
==========================================

https://blueprints.launchpad.net/cinder/+spec/volume-replication

Volume replication is a key storage feature and a requirement for
features such as high availability and disaster recovery of applications
running on top of OpenStack clouds.
This blueprint is an attempt to add initial support for volume
replication in Cinder. As a first take, it will include support for:

* Replicating volumes (primary to secondary approach)
* Promoting a secondary to primary (and stopping replication)
* Synchronizing replication with direction

This will be further enhanced in the future.

While this blueprint focuses on volume replication, a related blueprint
focuses on consistency groups, and replication will be extended to
support them.

Problem description
===================

The main use of volume replication is resiliency in the presence of
failures. Examples of possible failures are:

* Storage system failure
* Rack-level failure
* Datacenter-level failure

Here we specifically exclude failures such as media failures, disk
failures, etc. Such failures are typically addressed by local resiliency
schemes.

Replication can be implemented in the following ways:

* Host-based - requires Nova integration

* Storage-based

  - Typical block-based approach - replication is specified between two
    existing volumes (or groups of volumes) on the controllers.
  - Typical file-system-based approach - a file (in the Cinder context,
    the file representing a block device) placed in a directory (or
    group, fileset, etc.) is automatically copied to a specified remote
    location.

Assumptions:

* Replication should be transparent to the end-user; failover, failback
  and test will be executed by the cloud admin.
  However, to test that the application is working, the end-user may be
  involved, as they will be required to verify that their application is
  working with the volume replica.

* The storage admin will provide the setup and configuration to enable
  the actual replication between the storage systems. This could be
  performed at the storage back-end or storage driver level depending on
  the storage back-end. Specifically, storage drivers are expected to
  determine with whom they can replicate and report this to the
  scheduler.

* The cloud admin will enable the replication feature through the use of
  volume types.

* The end-user will not be directly exposed to the replication feature.
  Selecting a volume type will determine if the volume will be
  replicated, based on the actual extra-spec definition of the volume
  type (defined by the cloud admin).

* Quota management: quotas are consumed at 2x, as two volumes are
  created and the consumed space is doubled.
  We can re-examine this mechanism after we get comments from deployers.

Proposed change
===============

Each Cinder host will report replication capabilities:

* replication_support: indicates if replication is enabled for this
  driver instance
* replication_unit_id: device-specific id used for replication
* replication_partners: list of device-specific ids that this node can
  replicate with
* replication_rpo_range: RPO range <min, max> supported by this driver
  instance
* replication_supported_methods: list of methods supported by the
  back-end

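
As a sketch, a back-end's capability report could carry these fields as
follows (the helper name and all values are illustrative; the actual
reporting goes through the driver's existing stats mechanism):

```python
# Sketch of the replication fields a back-end could include in its
# capability report. All values below are illustrative.
def build_replication_capabilities():
    return {
        'replication_support': True,          # replication enabled here
        'replication_unit_id': 'array-0042',  # device-specific id
        'replication_partners': ['array-0043', 'array-0051'],
        'replication_rpo_range': (0, 60),     # supported RPO <min, max>
        'replication_supported_methods': ['async', 'sync'],
    }

# The scheduler can pair hosts whose unit ids appear in each other's
# partner lists.
caps = build_replication_capabilities()
```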
Add extra specs in the volume type to indicate replication:

* replication_enabled - if present as an extra spec and True, the volume
  is to be replicated. If the option is not specified or is False,
  replication is not enabled. This option is required to enable
  replication.
* replica_same_az - (optional) indicate if the replica should be in the
  same AZ
* replica_volume_backend_name - (optional) specify the back-end to be
  used as the target
* replication_target_rpo - (optional) requested RPO (numeric, minutes)
  for the volume

Create volume with replication enabled:

* Scheduler selects two hosts for volume placement and sets up the
  replication DB entry
* Manager on primary creates the primary volume (as is done today)
* Manager on secondary creates the replica volume
* Manager on primary sets up the replication

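
The two-host selection step above can be sketched as follows, assuming
each host reports the capability fields listed earlier (the host records
and the pairing helper are illustrative; real scheduling also applies
the usual filters and weighers):

```python
# Sketch: pick a (primary, secondary) pair from host capability reports.
def pick_replication_pair(hosts):
    # Index replication-capable hosts by their device-specific unit id.
    by_unit = {h['replication_unit_id']: h for h in hosts
               if h.get('replication_support')}
    # Pair the first host with a partner that is also scheduled here.
    for primary in by_unit.values():
        for partner_id in primary.get('replication_partners', []):
            if partner_id in by_unit:
                return primary, by_unit[partner_id]
    return None

hosts = [
    {'host': 'host1', 'replication_support': True,
     'replication_unit_id': 'u1', 'replication_partners': ['u2']},
    {'host': 'host2', 'replication_support': True,
     'replication_unit_id': 'u2', 'replication_partners': ['u1']},
]
pair = pick_replication_pair(hosts)
```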
Re-type volume:

* replication_enabled True -> False:
  drop the replication and continue with the regular retype logic.
* replication_enabled False -> True:
  the retype logic selects the back-ends (scheduler) and replication is
  then enabled.

Promote to primary:

* Manager on secondary stops the replication.
* Switch between the volume ids of primary and secondary
  (user sees no change in volume ids)

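
The id switch can be sketched as a plain swap of the two volume entries'
ids, so the user-visible id stays with the new primary (the record shape
and helper are hypothetical):

```python
# Sketch of the promote-time id switch. Stopping the replication happens
# first (a driver call, omitted here); then the ids of the two volume
# entries are switched so the user keeps seeing the same volume id.
def promote(primary, secondary):
    primary['id'], secondary['id'] = secondary['id'], primary['id']
    primary['role'], secondary['role'] = 'secondary', 'primary'
    return secondary  # now carries the original user-facing id

vol_a = {'id': 'uuid-user-visible', 'role': 'primary'}
vol_b = {'id': 'uuid-replica', 'role': 'secondary'}
new_primary = promote(vol_a, vol_b)
```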
Sync replication:

* Manager on primary restarts the replication

Test:

* Create a clone of the secondary volume.

Delete volume:

* Disable the replication
* Delete secondary volume
* Delete primary volume (as is done today)

Cloning a volume:

* Since the replica is added after the primary is created, if we
  clone a volume and keep the volume type, it will be replicated.

Snapshots:

* Snapshot of the primary volume works as it does today, and creates
  a snapshot on the primary. No snapshot is made for the replica.
* Snapshot of the replica (secondary) volume will fail.

Notes:

* Manager acts via the driver for back-end replication-specific
  functions.
* Failover is "promote to primary" as described above.
* Failback is "sync replication" + "promote to primary".

Driver API:

* create_replica: to be run on the secondary to create the volume
* enable_replica: to be run on the primary to start the replication
* disable_replica: to be run on the primary, stops the replication
* delete_replica: to be run on the secondary, deletes the replica target
  volume
* replication_status_check: to be run on all hosts, updating the
  replication status as observed from the back-end perspective
* promote_replica: to be run on the secondary, makes the secondary the
  primary

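
A rough sketch of this driver API as a Python interface, with a minimal
fake implementation (the method names follow the list above; the
signatures are assumptions, not the final driver contract):

```python
import abc

class ReplicationDriverMixin(abc.ABC):
    """Sketch of the replication functions a driver would provide."""

    @abc.abstractmethod
    def create_replica(self, context, volume):
        """Run on the secondary: create the replica target volume."""

    @abc.abstractmethod
    def enable_replica(self, context, volume):
        """Run on the primary: start the replication."""

    @abc.abstractmethod
    def disable_replica(self, context, volume):
        """Run on the primary: stop the replication."""

    @abc.abstractmethod
    def delete_replica(self, context, volume):
        """Run on the secondary: delete the replica target volume."""

    @abc.abstractmethod
    def replication_status_check(self, context, volumes):
        """Run on all hosts: report status from the back-end's view."""

    @abc.abstractmethod
    def promote_replica(self, context, volume):
        """Run on the secondary: make the secondary the primary."""


class FakeReplicationDriver(ReplicationDriverMixin):
    """Minimal stand-in showing the expected call sites."""

    def create_replica(self, context, volume):
        return dict(volume, role='secondary')

    def enable_replica(self, context, volume):
        return 'enabled'

    def disable_replica(self, context, volume):
        return 'disabled'

    def delete_replica(self, context, volume):
        return 'deleted'

    def replication_status_check(self, context, volumes):
        return [{'id': v['id'], 'status': 'active'} for v in volumes]

    def promote_replica(self, context, volume):
        return dict(volume, role='primary')
```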
Alternatives
------------

Replication can be performed outside of Cinder, and OpenStack can be
unaware of it. However, this requires vendor-specific scripts and is not
visible to the cloud admin, as only the storage system admin will see
the replica and the state of the replication. In addition, all recovery
actions (failover, failback) will require the storage and cloud admins
to work together.
Replication in Cinder, by contrast, reduces the role of the storage
admin to the setup phase only; the cloud admin is responsible for
failover and failback, typically with no need for intervention from the
storage admin.

Data model impact
-----------------

* A new replication relationship table will be created
  (with its database migration support).

* On promote to primary, the ids of the primary and secondary volume
  entries will change (switch).

Replication relationship db table:

* id = Column(String(36), primary_key=True)
* deleted = Column(Boolean, default=False)
* primary_id = Column(String(36), ForeignKey('volumes.id'),
  nullable=False)
* secondary_id = Column(String(36), ForeignKey('volumes.id'),
  nullable=False)
* primary_replication_unit_id = Column(String(255))
* secondary_replication_unit_id = Column(String(255))
* status = Column(Enum('error', 'creating', 'copying', 'active',
  'active-stopped', 'stopping', 'deleting', 'deleted', 'inactive',
  name='replicationrelationship_status'))
* extended_status = Column(String(255))
* driver_data = Column(String(255))

State diagram for replication (status)::

 <start>
   create replica                 -> creating
   creating (any error condition) -> error
   error (storage admin to fix; status check will update) -> any state
   creating (enable replication)  -> copying
   copying (status check)         -> active
   active (status check)          -> active-stopped
   active-stopped (status check)  -> active
   active (promote to primary)    -> inactive
   inactive (sync)                -> copying
 <end>

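
The same transitions can be captured as a table, which the manager could
use to guard DB status updates (a sketch; the transition set is
transcribed from the diagram above, and the guard helper is
hypothetical):

```python
# Allowed status transitions, transcribed from the state diagram.
# 'error' is repaired by the storage admin, after which a status check
# moves the relationship to whatever state the back-end reports.
TRANSITIONS = {
    None:             {'creating'},             # create replica
    'creating':       {'copying', 'error'},     # enable replication
    'copying':        {'active', 'error'},      # status check
    'active':         {'active-stopped', 'inactive', 'error'},
    'active-stopped': {'active', 'error'},      # status check
    'inactive':       {'copying', 'deleting'},  # sync / delete
}

def check_transition(old, new):
    # Hypothetical guard the manager could apply before a DB update.
    if new not in TRANSITIONS.get(old, set()):
        raise ValueError('invalid transition %s -> %s' % (old, new))
    return new
```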
REST API impact
---------------

* Show replication relationship

  * Show information about a volume replication relationship.
  * Method type: GET
  * Normal Response Code: 200
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>
  * JSON schema definition for the response data::

        {
            'relationship':
            {
                'id': 'relationship id',
                'primary_id': 'primary volume uuid',
                'status': 'status of relationship',
                'links': '{ ... }'
            }
        }

* Show replication relationship with details

  * Show detailed information about a volume replication relationship.
  * Method type: GET
  * Normal Response Code: 200
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>/detail
  * JSON schema definition for the response data::

        {
            'relationship':
            {
                'id': 'relationship id',
                'primary_id': 'primary volume uuid',
                'secondary_id': 'secondary volume uuid',
                'status': 'status of relationship',
                'extended_status': 'extended status',
                'links': { ... }
            }
        }

* List replication relationships with details

  * List detailed information about volume replication relationships.
  * Method type: GET
  * Normal Response Code: 200
  * Expected error http response code(s)

    * TBD

  * /v2/<tenant id>/os-volume-replication/detail
  * Parameters:

    *status*
        Filter by replication relationship status
    *primary_id*
        Filter by primary volume id
    *secondary_id*
        Filter by secondary volume id

  * JSON schema definition for the response data::

        {
            'relationship':
            {
                'id': 'relationship id',
                'primary_id': 'primary volume uuid',
                'secondary_id': 'secondary volume uuid',
                'status': 'status of relationship',
                'extended_status': 'extended status',
                'links': { ... }
            }
        }

* Promote volume to be the primary volume

  * Switch between the uuids of the primary and secondary volumes, and
    make the secondary volume the primary volume.
  * Method type: PUT
  * Normal Response Code: 202
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>
  * JSON schema definition for the body data::

        {
            'relationship':
            {
                'promote': None
            }
        }

* Sync between the primary and secondary volume

  * Resync the replication between the primary and secondary volume.
    Typically follows a promote operation on the replication.
  * Method type: PUT
  * Normal Response Code: 202
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>
  * JSON schema definition for the body data::

        {
            'relationship':
            {
                'sync': None
            }
        }

* Test replication by making a copy of the secondary volume available

  * Test the volume replication. Create a clone of the secondary volume
    and make it accessible, so the promote process can be tested.
  * Method type: POST
  * Normal Response Code: 202
  * Expected error http response code(s)

    * 404: replication relationship not found

  * /v2/<tenant id>/os-volume-replication/<replication uuid>/test
  * JSON schema definition for the response data::

        {
            'relationship':
            {
                'volume_id': 'volume id of the cloned secondary'
            }
        }

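
The admin calls above can be sketched as simple request builders (the
paths and bodies follow the spec; the functions and the 'tenant' and
'rel_id' placeholders are illustrative, not python-cinderclient code):

```python
import json

# Sketch: build (method, path, body) tuples for the admin replication
# calls described above. Actual wiring through python-cinderclient is
# omitted.
BASE = '/v2/%(tenant)s/os-volume-replication'

def show(tenant, rel_id):
    return ('GET', BASE % {'tenant': tenant} + '/' + rel_id, None)

def promote(tenant, rel_id):
    body = json.dumps({'relationship': {'promote': None}})
    return ('PUT', BASE % {'tenant': tenant} + '/' + rel_id, body)

def sync(tenant, rel_id):
    body = json.dumps({'relationship': {'sync': None}})
    return ('PUT', BASE % {'tenant': tenant} + '/' + rel_id, body)

def request_test_copy(tenant, rel_id):
    return ('POST', BASE % {'tenant': tenant} + '/' + rel_id + '/test',
            None)

method, path, body = promote('t1', 'r1')
```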
Security impact
---------------

* Does this change touch sensitive data such as tokens, keys, or user
  data?
  *No*.

* Does this change alter the API in a way that may impact security, such
  as a new way to access sensitive information or a new way to login?
  *No*.

* Does this change involve cryptography or hashing?
  *No*.

* Does this change require the use of sudo or any elevated privileges?
  *No*.

* Does this change involve using or parsing user-provided data? This
  could be directly at the API level or indirectly such as changes to a
  cache layer.
  *No*.

* Can this change enable a resource exhaustion attack, such as allowing
  a single API interaction to consume significant server resources? Some
  examples of this include launching subprocesses for each connection,
  or entity expansion attacks in XML.
  *Yes*, enabling replication consumes cloud and storage resources.

Notifications impact
--------------------

Notifications will be added for enabling replication, promoting, syncing
and dropping replication.

Other end user impact
---------------------

* The end-user uses volume types to enable/disable replication.

* The cloud admin uses the *promote*, *sync* and *test* commands in the
  python-cinderclient to execute failover, failback and test.

Performance Impact
------------------

* The scheduler now needs to choose two hosts instead of one, based on
  additional input from the driver and volume type.

* The periodic task will query the driver and back-end for the status of
  all replicated volumes, running on both the primary and secondary.

* Extra db calls identifying if replication exists are added to retype,
  snapshot operations, etc., which will add a small latency to these
  functions.

Other deployer impact
---------------------

* Added options for volume types (see above).

* New driver capabilities are added; these need to be supported by the
  volume drivers, which may imply changes to the driver configuration
  options.

* This change will require explicit enablement by the cloud
  administrator before it can be used by users.

Developer impact
----------------

* Changes to the driver API are noted above. Basically, new functions
  are needed to support replication.

* The API will expand to include consistency groups once consistency
  group support is merged into Cinder.


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  ronenkat

Other contributors:
  None

Work Items
----------

* Cinder public (admin) APIs for replication
* DB schema for replication
* Cinder scheduler support for replication
* Cinder driver API additions for replication
* Cinder manager update for replication
* Testing

Note: Code is based on https://review.openstack.org/#/c/64026/ which was
submitted in the Icehouse development cycle.

Dependencies
============

* Related blueprints: Consistency groups
  https://blueprints.launchpad.net/cinder/+spec/consistency-groups

* LVM to support replication using DRBD, in a separate contribution.

Testing
=======

* Testing in the gate is not supported due to the following
  considerations:

  * LVM has no replication support; this is to be addressed using DRBD
    in a separate contribution.
  * It requires setting up at least two nodes using DRBD.

  * This should be discussed/addressed as support for LVM is added.

* 3rd party driver CI will be expected to test replication.

Documentation Impact
====================

* Public (admin) API changes.
* Details on how replication is used by leveraging volume types.
* Driver docs explaining how replication is set up for each driver.

References
==========

* Volume replication design session
  https://etherpad.openstack.org/p/juno-cinder-volume-replication