diff --git a/specs/juno/volume-replication.rst b/specs/juno/volume-replication.rst
new file mode 100644
index 00000000..4e539ced
--- /dev/null
+++ b/specs/juno/volume-replication.rst
@@ -0,0 +1,505 @@
+
+..
+  This work is licensed under a Creative Commons Attribution 3.0 Unported
+  License.
+
+  http://creativecommons.org/licenses/by/3.0/legalcode
+
+==========================================
+Volume Replication
+==========================================
+
+https://blueprints.launchpad.net/cinder/+spec/volume-replication
+
+Volume replication is a key storage feature and a requirement for
+features such as high availability and disaster recovery of applications
+running on top of OpenStack clouds.
+This blueprint adds initial support for volume replication in Cinder.
+It is considered a first take and will include support for:
+
+* Replicating volumes (primary to secondary approach)
+* Promoting a secondary to primary (and stopping replication)
+* Synchronizing replication (with direction)
+
+This functionality will be enhanced further in the future.
+
+While this blueprint focuses on volume replication, a related blueprint
+focuses on consistency groups, and replication will be extended to
+support them.
+
+Problem description
+===================
+
+The main use of volume replication is resiliency in the presence of failures.
+Examples of possible failures are:
+
+* Storage system failure
+* Rack-level failure
+* Datacenter-level failure
+
+Here we specifically exclude failures such as media or disk failures,
+which are typically addressed by local resiliency schemes.
+
+Replication can be implemented in the following ways:
+
+* Host-based - requires Nova integration
+
+* Storage-based
+
+  - Typical block-based approach - replication is specified between two
+    existing volumes (or groups of volumes) on the controllers.
+  - Typical file-system-based approach - a file
+    (in the Cinder context, the file representing a block device) placed in a
+    directory (or group, fileset, etc.) is automatically copied to a
+    specified remote location.
+
+Assumptions:
+
+* Replication should be transparent to the end-user; failover, failback
+  and test will be executed by the cloud admin.
+  However, to test that the application is working, the end-user may be
+  involved, as they will be required to verify that their application is
+  working with the volume replica.
+
+* The storage admin will provide the setup and configuration to enable the
+  actual replication between the storage systems. This could be performed
+  at the storage back-end or storage driver level, depending on the storage
+  back-end. Specifically, storage drivers are expected to report to the
+  scheduler which back-ends they can replicate with.
+
+* The cloud admin will enable the replication feature through the use of
+  volume types, as illustrated below.
+
+* The end-user will not be directly exposed to the replication feature.
+  Selecting a volume type will determine whether the volume is replicated,
+  based on the extra-spec definition of the volume type (defined by
+  the cloud admin).
+
+* Quota management: quotas are consumed at 2x, since two volumes are
+  created and the consumed space is doubled.
+  We can re-examine this mechanism after we get comments from deployers.
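+
+As an illustration of the volume-type assumption above, the following is a
+minimal sketch of how a cloud admin might define a replicated volume type
+with python-cinderclient. The extra-spec key used here is the one proposed
+in the "Proposed change" section below; the credentials and endpoint are
+placeholders::
+
+    from cinderclient.v2 import client as cinder_client
+
+    # Placeholder credentials/endpoint - replace with real admin values.
+    cinder = cinder_client.Client('admin', 'secret', 'admin',
+                                  'http://keystone:5000/v2.0')
+
+    # The cloud admin defines a volume type carrying the replication
+    # extra spec; end-users simply select this type when creating volumes.
+    replicated_type = cinder.volume_types.create('replicated')
+    replicated_type.set_keys({'replication_enabled': 'True'})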
+
+Proposed change
+===============
+
+Each Cinder host will report replication capabilities:
+
+* replication_support - indicates whether replication is enabled for this
+  driver instance
+* replication_unit_id - device-specific id used for replication
+* replication_partners - list of device-specific ids that this node can
+  replicate with
+* replication_rpo_range - RPO range supported by this driver instance
+* replication_supported_methods - list of methods supported by the back-end
+
+Add extra specs in the volume type to indicate replication:
+
+* replication_enabled - if set to True, the volume will be replicated.
+  If the option is not specified or is False, replication is not enabled.
+  This option is required to enable replication.
+* replica_same_az - (optional) indicate whether the replica should be in the
+  same AZ
+* replica_volume_backend_name - (optional) specify the back-end to be used as
+  the target
+* replication_target_rpo - (optional) requested RPO (numeric, minutes) for
+  the volume
+
+Create volume with replication enabled:
+
+* Scheduler selects two hosts for volume placement and sets up the replication
+  DB entry
+* Manager on the primary creates the primary volume (as is done today)
+* Manager on the secondary creates the replica volume
+* Manager on the primary sets up the replication
+
+Re-type volume:
+
+* replication_enabled True->False:
+  drop the replication and continue with the regular retype logic.
+* replication_enabled False->True:
+  run the regular retype logic, let the scheduler select back-ends and then
+  enable replication.
+
+Promote to primary:
+
+* Manager on the secondary stops the replication.
+* Switch the volume ids of the primary and secondary
+  (the user sees no change in volume ids).
+
+Sync replication:
+
+* Manager on the primary restarts the replication.
+
+Test:
+
+* Create a clone of the secondary volume.
+
+Delete volume:
+
+* Disable the replication
+* Delete the secondary volume
+* Delete the primary volume (as is done today)
+
+Cloning a volume:
+
+* Since the replica is added after the primary is created, if we
+  clone a volume and keep the volume type, the clone will be replicated.
+
+Snapshots:
+
+* A snapshot of the primary volume works as it does today and creates
+  a snapshot on the primary. No snapshot is taken of the replica.
+* A snapshot of the replica (secondary) volume will fail.
+
+Notes:
+
+* The manager acts via the driver for back-end-specific replication functions.
+* Failover is "promote to primary" as described above.
+* Failback is "sync replication" + "promote to primary".
+
+Driver API:
+
+* create_replica: to be run on the secondary to create the replica volume
+* enable_replica: to be run on the primary to start replication
+* disable_replica: to be run on the primary, stops the replication
+* delete_replica: to be run on the secondary, deletes the replica target
+  volume
+* replication_status_check: to be run on all hosts, updating the replication
+  status as observed from the back-end perspective
+* promote_replica: to be run on the secondary, makes the secondary the primary
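+
+To make the driver additions concrete, here is a minimal sketch of the new
+driver interface. The method names follow the list above; the signatures are
+illustrative assumptions only and will be finalized during implementation::
+
+    class ReplicationDriverMixin(object):
+        """Illustrative mixin for drivers that support replication."""
+
+        def create_replica(self, context, primary_volume, replica_volume):
+            """Run on the secondary back-end to create the replica volume."""
+            raise NotImplementedError()
+
+        def enable_replica(self, context, volume, replica):
+            """Run on the primary back-end to start replication."""
+            raise NotImplementedError()
+
+        def disable_replica(self, context, volume, replica):
+            """Run on the primary back-end to stop replication."""
+            raise NotImplementedError()
+
+        def delete_replica(self, context, replica):
+            """Run on the secondary back-end to delete the replica volume."""
+            raise NotImplementedError()
+
+        def replication_status_check(self, context, relationships):
+            """Run on all hosts; report status as seen by the back-end."""
+            raise NotImplementedError()
+
+        def promote_replica(self, context, replica):
+            """Run on the secondary back-end to make it the primary."""
+            raise NotImplementedError()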
+
+Alternatives
+------------
+
+Replication can be performed outside of Cinder, with OpenStack
+unaware of it. However, this requires vendor-specific scripts, and
+is not visible to the cloud admin, as only the storage system admin
+will see the replica and the state of the replication.
+In addition, all recovery actions (failover, failback) will require
+the storage and cloud admins to work together.
+
+Replication in Cinder, by contrast, reduces the role of the storage admin
+to the setup phase only; the cloud admin is responsible for failover
+and failback, typically with no need for intervention from the storage
+admin.
+
+Data model impact
+-----------------
+
+* A new replication relationship table will be created
+  (with its database migration support).
+
+* On promote to primary, the ids of the primary and secondary volume entries
+  will be switched.
+
+Replication relationship db table:
+
+* id = Column(String(36), primary_key=True)
+* deleted = Column(Boolean, default=False)
+* primary_id = Column(String(36), ForeignKey('volumes.id'), nullable=False)
+* secondary_id = Column(String(36), ForeignKey('volumes.id'), nullable=False)
+* primary_replication_unit_id = Column(String(255))
+* secondary_replication_unit_id = Column(String(255))
+* status = Column(Enum('error', 'creating', 'copying', 'active',
+  'active-stopped', 'stopping', 'deleting', 'deleted', 'inactive',
+  name='replicationrelationship_status'))
+* extended_status = Column(String(255))
+* driver_data = Column(String(255))
+
+State diagram for replication (status)::
+
+                      create replica
+                            |
+                            v
+                      +----------+    any error      +-------+
+                      | creating | ----------------> | error |
+                      +----------+    condition      +-------+
+                            |                            |
+                            | enable replication         |  storage admin to
+                            |                            |  fix; status check
+                            v                            v  will update
+      sync            +----------+                   any state
+   +----------------> | copying  |
+   |                  +----------+
+   |                        |
+   |                        | status check
+   |                        v
+   |                  +----------+    status check   +----------------+
+   |                  |  active  | <---------------> | active-stopped |
+   |                  +----------+                   +----------------+
+   |                        |
+   |                        | promote to primary
+   |                        v
+   |                  +----------+
+   +------------------| inactive |
+                      +----------+
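+
+A sketch of the corresponding SQLAlchemy model follows. It is illustrative
+only: the class and table names are assumptions, and the real model would
+derive from Cinder's common model base rather than a bare declarative base::
+
+    from sqlalchemy import Boolean, Column, Enum, ForeignKey, String
+    from sqlalchemy.ext.declarative import declarative_base
+
+    BASE = declarative_base()
+
+
+    class ReplicationRelationship(BASE):
+        """Relationship between a primary volume and its replica."""
+
+        __tablename__ = 'replication_relationships'
+
+        id = Column(String(36), primary_key=True)
+        deleted = Column(Boolean, default=False)
+        # Both ends reference the existing volumes table.
+        primary_id = Column(String(36), ForeignKey('volumes.id'),
+                            nullable=False)
+        secondary_id = Column(String(36), ForeignKey('volumes.id'),
+                              nullable=False)
+        primary_replication_unit_id = Column(String(255))
+        secondary_replication_unit_id = Column(String(255))
+        status = Column(Enum('error', 'creating', 'copying', 'active',
+                             'active-stopped', 'stopping', 'deleting',
+                             'deleted', 'inactive',
+                             name='replicationrelationship_status'))
+        extended_status = Column(String(255))
+        driver_data = Column(String(255))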
+
+REST API impact
+---------------
+
+* Show replication relationship
+
+  * Show information about a volume replication relationship.
+  * Method type: GET
+  * Normal Response Code: 200
+  * Expected error http response code(s)
+
+    * 404: replication relationship not found
+
+  * /v2/<tenant_id>/os-volume-replication/<relationship_id>
+  * JSON schema definition for the response data::
+
+      {
+          'relationship':
+          {
+              'id': 'relationship id',
+              'primary_id': 'primary volume uuid',
+              'status': 'status of relationship',
+              'links': { ... }
+          }
+      }
+
+* Show replication relationship with details
+
+  * Show detailed information about a volume replication relationship.
+  * Method type: GET
+  * Normal Response Code: 200
+  * Expected error http response code(s)
+
+    * 404: replication relationship not found
+
+  * /v2/<tenant_id>/os-volume-replication/<relationship_id>/detail
+  * JSON schema definition for the response data::
+
+      {
+          'relationship':
+          {
+              'id': 'relationship id',
+              'primary_id': 'primary volume uuid',
+              'secondary_id': 'secondary volume uuid',
+              'status': 'status of relationship',
+              'extended_status': 'extended status',
+              'links': { ... }
+          }
+      }
+
+* List replication relationships with details
+
+  * List detailed information about volume replication relationships.
+  * Method type: GET
+  * Normal Response Code: 200
+  * Expected error http response code(s)
+
+    * TBD
+
+  * /v2/<tenant_id>/os-volume-replication/detail
+  * Parameters:
+
+    *status*
+      Filter by replication relationship status
+    *primary_id*
+      Filter by primary volume id
+    *secondary_id*
+      Filter by secondary volume id
+
+  * JSON schema definition for the response data::
+
+      {
+          'relationship':
+          {
+              'id': 'relationship id',
+              'primary_id': 'primary volume uuid',
+              'secondary_id': 'secondary volume uuid',
+              'status': 'status of relationship',
+              'extended_status': 'extended status',
+              'links': { ... }
+          }
+      }
+
+* Promote volume to be the primary volume
+
+  * Switch the uuids of the primary and secondary volumes, making
+    the secondary volume the primary volume.
+  * Method type: PUT
+  * Normal Response Code: 202
+  * Expected error http response code(s)
+
+    * 404: replication relationship not found
+
+  * /v2/<tenant_id>/os-volume-replication/<relationship_id>
+  * JSON schema definition for the body data::
+
+      {
+          'relationship':
+          {
+              'promote': None
+          }
+      }
+
+* Sync the primary and secondary volumes
+
+  * Resync the replication between the primary and secondary volumes.
+    Typically follows a promote operation on the relationship.
+  * Method type: PUT
+  * Normal Response Code: 202
+  * Expected error http response code(s)
+
+    * 404: replication relationship not found
+
+  * /v2/<tenant_id>/os-volume-replication/<relationship_id>
+  * JSON schema definition for the body data::
+
+      {
+          'relationship':
+          {
+              'sync': None
+          }
+      }
+
+* Test replication by making a copy of the secondary volume available
+
+  * Test the volume replication. Create a clone of the secondary volume
+    and make it accessible, so the promote process can be tested.
+  * Method type: POST
+  * Normal Response Code: 202
+  * Expected error http response code(s)
+
+    * 404: replication relationship not found
+
+  * /v2/<tenant_id>/os-volume-replication/<relationship_id>/test
+  * JSON schema definition for the response data::
+
+      {
+          'relationship':
+          {
+              'volume_id': 'volume id of the cloned secondary'
+          }
+      }
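+
+As a usage illustration of the API proposed above, the following sketch
+issues the promote (failover) and sync (failback) actions directly over
+HTTP. The endpoint, tenant id, relationship id and token are placeholders,
+and the request bodies follow the schemas defined above::
+
+    import json
+
+    import requests
+
+    BASE = 'http://cinder-api:8776/v2/TENANT_ID'
+    HEADERS = {'X-Auth-Token': 'ADMIN_TOKEN',
+               'Content-Type': 'application/json'}
+
+    def _action(relationship_id, body):
+        url = '%s/os-volume-replication/%s' % (BASE, relationship_id)
+        return requests.put(url, headers=HEADERS, data=json.dumps(body))
+
+    # Failover: make the secondary volume the primary.
+    _action('RELATIONSHIP_ID', {'relationship': {'promote': None}})
+
+    # Failback step: re-sync the relationship after a promote.
+    _action('RELATIONSHIP_ID', {'relationship': {'sync': None}})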
+
+Security impact
+---------------
+
+* Does this change touch sensitive data such as tokens, keys, or user data?
+  *No*.
+
+* Does this change alter the API in a way that may impact security, such as
+  a new way to access sensitive information or a new way to login?
+  *No*.
+
+* Does this change involve cryptography or hashing?
+  *No*.
+
+* Does this change require the use of sudo or any elevated privileges?
+  *No*.
+
+* Does this change involve using or parsing user-provided data? This could
+  be directly at the API level or indirectly such as changes to a cache layer.
+  *No*.
+
+* Can this change enable a resource exhaustion attack, such as allowing a
+  single API interaction to consume significant server resources? Some
+  examples of this include launching subprocesses for each connection, or
+  entity expansion attacks in XML.
+  *Yes*, enabling replication consumes cloud and storage resources.
+
+Notifications impact
+--------------------
+
+Notifications will be added for enabling replication, promoting, syncing and
+dropping replication.
+
+Other end user impact
+---------------------
+
+* The end-user will use volume types to enable/disable replication.
+
+* The cloud admin will use the *promote*, *sync* and *test* commands
+  in python-cinderclient to execute failover, failback and test.
+
+Performance Impact
+------------------
+
+* The scheduler now needs to choose two hosts instead of one, based on
+  additional input from the driver and the volume type.
+
+* The periodic task will query the driver and back-end for the status
+  of all replicated volumes - running on both the primary and secondary.
+
+* Extra db calls to identify whether replication exists are added to retype,
+  snapshot and other operations, adding a small latency to these functions.
+
+Other deployer impact
+---------------------
+
+* Added options for volume types (see above).
+
+* New driver capabilities need to be supported by the volume drivers,
+  which may imply changes to the driver configuration options.
+
+* This change will require explicit enablement (to be used by users)
+  by the cloud administrator.
+
+Developer impact
+----------------
+
+* Changes to the driver API are noted above. Basically, new functions are
+  needed to support replication.
+
+* The API will be expanded to include consistency groups once consistency
+  group support is merged into Cinder.
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  ronenkat
+
+Other contributors:
+  None
+
+Work Items
+----------
+
+* Cinder public (admin) APIs for replication
+* DB schema for replication
+* Cinder scheduler support for replication
+* Cinder driver API additions for replication
+* Cinder manager update for replication
+* Testing
+
+Note: Code is based on https://review.openstack.org/#/c/64026/ which was
+submitted in the Icehouse development cycle.
+
+Dependencies
+============
+
+* Related blueprint: Consistency groups
+  https://blueprints.launchpad.net/cinder/+spec/consistency-groups
+
+* LVM to support replication using DRBD, in a separate contribution.
+
+Testing
+=======
+
+* Testing in the gate is not supported, due to the following considerations:
+
+  * LVM has no replication support; this is to be addressed using DRBD in a
+    separate contribution.
+  * It requires setting up at least two nodes using DRBD.
+
+* This should be discussed/addressed as support for LVM is added.
+
+* 3rd party driver CI will be expected to test replication.
+
+Documentation Impact
+====================
+
+* Public (admin) API changes.
+* Details of how replication is used by leveraging volume types.
+* Driver docs explaining how replication is set up for each driver.
+
+References
+==========
+
+* Volume replication design session
+  https://etherpad.openstack.org/p/juno-cinder-volume-replication