Active/Active Replication v2.1 specs

Replication v2.1 works only for Active/Passive configurations and it
needs some changes to support Active/Active configurations as well.

This patch adds a spec to add Active/Active support to Cinder's
replication mechanism.

Change-Id: I6ae6e74bdabf656c327f9e0955149a6037676631

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=================================================
Cinder Volume Active/Active support - Replication
=================================================
https://blueprints.launchpad.net/cinder/+spec/cinder-volume-active-active-support

Understandably, replication v2.1 only works in the deployment configurations
that were available and supported in Cinder at the time of its design and
implementation. Now that Active-Active configurations are also supported,
replication does not work properly in this newly supported configuration.

This spec extends replication v2.1 functionality to support Active-Active
configurations while preserving backward compatibility for non-clustered
configurations.
Problem description
===================
In replication v2.1, failover is requested on a per-backend basis: when the
API receives a failover request it redirects it to a specific volume service
via an asynchronous RPC call on that service's topic message queue. The same
happens for the freeze and thaw operations.

This works when there is a one-to-one relationship between volume services and
storage backends, but not when the relationship is many-to-one, because the
failover RPC call will be received by only one of the services that form the
cluster for the storage backend; the others will be oblivious to the change
and will keep using the replication site they had been using before.

This will result in some operations succeeding, those handled by the service
that performed the failover, and some operations failing, since they are sent
to the site that is no longer available.
While that's the primary issue, it's not the only one, since we also have to
track the replication status at the cluster level.
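
The following minimal sketch illustrates the problem using oslo.messaging
directly rather than Cinder's actual RPC layer; the topic name, version and
secondary id are assumptions made purely for illustration.

.. code:: python

    from oslo_config import cfg
    import oslo_messaging as messaging

    # The failover request is cast on a single backend topic, so exactly one
    # of the clustered volume services will consume it.
    transport = messaging.get_transport(cfg.CONF)
    target = messaging.Target(topic='cinder-volume.node1@backend1',
                              version='3.0')
    client = messaging.RPCClient(transport, target)

    # The other services sharing this backend never see this message and keep
    # talking to the primary replication site.
    client.cast({}, 'failover_host', secondary_backend_id='secondary')
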
Use Cases
=========
Users want to have highly available cinder services with disaster recovery
using replication.
It is not enough for each feature to be available on its own, as users will
want both at the same time; being able to use either Active-Active
configurations without replication, or replication when not deployed as
Active-Active, is insufficient.

They could probably make it work by stopping all but one volume service in the
cluster, issuing the failover request, and bringing the other services back up
once it has completed, but this would not be a clean approach to the problem.
Proposed change
===============
At its core, the proposed change divides the driver failover operation into
two individual operations: one that handles the storage backend side of
things, for example force-promoting volumes to primary on the secondary site,
and another that makes the driver start performing all its operations against
the secondary storage device.
As mentioned before, only one volume service will receive the failover
request. By splitting the operation, the manager can ask the local driver to
do the first part of the failover and, once that is done, signal all the
volume services in the cluster handling that backend that the failover has
completed and that they should start pointing to the failed-over secondary
site, thus solving the problem of some services not knowing that a new site
should be used.

This will also require two RPC calls in the volume manager, named after the
drivers' new methods: ``failover`` and ``failover_completed``.
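
The sketch below illustrates this manager flow; the ``self.rpcapi`` and
``self.cluster`` attributes and the helper that gathers the replicated volumes
are assumptions used only for illustration, while the driver calls follow the
method signatures proposed in this spec.

.. code:: python

    class VolumeManager(object):
        """Illustrative manager-side flow for the split failover."""

        def failover(self, context, secondary_id=None):
            # Received by a single volume service of the cluster.
            volumes = self._get_replicated_volumes(context)  # assumed helper

            # Step 1: only this service touches the storage backend, e.g.
            # force-promoting volumes to primary on the secondary site.
            self.driver.failover(volumes, secondary_id)

            # Step 2: tell every service handling this backend, including
            # this one, that the failover is done and they must switch sites.
            self.rpcapi.failover_completed(context, self.cluster,
                                           secondary_id)

        def failover_completed(self, context, secondary_id=None):
            # Received by all the services in the cluster for this backend.
            self.driver.failover_completed(secondary_id)
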
We will also add the replication information to the ``clusters`` table to track
replication at the cluster level for clustered services.
Given the current use of the freeze and thaw operations, there doesn't seem to
be a reason to do the same split, so these operations will be left as they are
and will only be performed by one volume service when requested.
This change will require vendors to update their drivers to support
replication in Active-Active configurations, so to avoid surprises the volume
service will refuse to start in an Active-Active configuration with
replication enabled if the driver doesn't support the Active-Active
mechanism.
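
A hypothetical sketch of that startup guard is shown below; the function and
attribute names are assumptions and do not reflect the actual Cinder code.

.. code:: python

    def ensure_replication_supported(driver, clustered):
        """Refuse to run clustered replication on a driver lacking support."""
        replication_enabled = getattr(driver, 'replication_enabled', False)
        # The driver supports the new mechanism if it implements the two
        # methods proposed by this spec.
        supports_split_failover = (hasattr(driver, 'failover') and
                                   hasattr(driver, 'failover_completed'))
        if clustered and replication_enabled and not supports_split_failover:
            raise RuntimeError('Driver %s does not support replication in '
                               'Active-Active deployments.'
                               % type(driver).__name__)
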
Alternatives
------------
The splitting mechanism for the ``failover_host`` method is fairly
straightforward; the only alternative to the proposed change would be to split
the thaw and freeze operations as well.
Data model impact
-----------------
Three new fields related to the replication will be added to the ``clusters``
table. These will be the same fields we currently have in the ``services``
table and will hold the same meaning:
- ``replication_status``: String storing the replication status for the whole
  cluster.
- ``active_backend_id``: String storing which one of the replication sites is
  currently active.
- ``frozen``: Boolean reflecting whether the cluster is frozen or not.
These fields will be kept in sync between the ``clusters`` table and the
``services`` table for consistency.
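
As a rough illustration, the database migration adding these columns could
look like the following sketch, written in the sqlalchemy-migrate style used
by Cinder; the column types, sizes and defaults are assumptions mirroring the
existing ``services`` columns rather than a final definition.

.. code:: python

    from sqlalchemy import Boolean, Column, MetaData, String, Table


    def upgrade(migrate_engine):
        meta = MetaData(bind=migrate_engine)
        clusters = Table('clusters', meta, autoload=True)

        # Mirror the replication fields already present in the services table.
        clusters.create_column(Column('replication_status', String(36),
                                      default='not-capable'))
        clusters.create_column(Column('active_backend_id', String(255)))
        clusters.create_column(Column('frozen', Boolean, default=False))
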
REST API impact
---------------
- A new action called ``failover``, equivalent to the existing
  ``failover_host``, will be added; it will support a new ``cluster`` parameter
  in addition to the ``host`` field already available in ``failover_host``.
- Cluster listing will accept ``replication_status``, ``frozen`` and
  ``active_backend_id`` as filters.
- Cluster listing will return the additional ``replication_status``, ``frozen``
  and ``active_backend_id`` fields (see the example after this list).
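
As an example, listing clusters with the new filters could look like the
following sketch; the endpoint path, port, token and microversion value are
assumptions, while the filter and field names come from this spec.

.. code:: python

    import requests

    base_url = 'http://controller:8776/v3/<project_id>'
    headers = {'X-Auth-Token': '<token>',
               'OpenStack-API-Version': 'volume <new microversion>'}

    # Only clusters that are replicating and not frozen.
    resp = requests.get(base_url + '/clusters/detail',
                        headers=headers,
                        params={'replication_status': 'enabled',
                                'frozen': False})
    for cluster in resp.json()['clusters']:
        print(cluster['name'], cluster['replication_status'],
              cluster['active_backend_id'], cluster['frozen'])
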
Security impact
---------------
None.
Notifications impact
--------------------
None.
Other end user impact
---------------------
The client will return the new fields when listing clusters with the new
microversion, and the new filters will also be available.
Failover requests made with this microversion will accept the ``cluster``
parameter.
Performance Impact
------------------
The new code should have no performance impact on existing deployments since it
will only affect new Active-Active deployments.
Other deployer impact
---------------------
None.
Developer impact
----------------
Drivers that wish to support replication on Active-Active deployments will
have to implement the ``failover`` and ``failover_completed`` methods as well
as keep the current ``failover_host`` method, since it is used for backward
compatibility with the base replication v2.1.
The easiest way to support this with minimum code would be to implement
``failover`` and ``failover_completed`` and then create ``failover_host`` based
on those:
.. code:: python

    def failover_host(self, volumes, secondary_id):
        # Flip the storage backend over to the secondary site...
        self.failover(volumes, secondary_id)
        # ...and make this driver immediately start using that site.
        self.failover_completed(secondary_id)

Implementation
==============
Assignee(s)
-----------
Primary assignee:
Gorka Eguileor (geguileo)
Other contributors:
None
Work Items
----------
- Change the service start code to use ``active_backend_id`` from the cluster
  or the service.
- Add the new ``failover`` REST API action.
- Update the list REST API method to accept the new filtering fields and update
  the view to return the new information.
- Update the DB model and create the migration.
- Update the ``Cluster`` Versioned Object (see the sketch after this list).
- Modify the manager to support the new RPC calls.
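
The ``Cluster`` object change could look roughly like the sketch below;
Cinder's real base classes and the object's existing fields are omitted, and
the exact version bump is an assumption.

.. code:: python

    from oslo_versionedobjects import base as ovo_base
    from oslo_versionedobjects import fields


    @ovo_base.VersionedObjectRegistry.register
    class Cluster(ovo_base.VersionedObject):
        # 1.1: Added replication_status, frozen and active_backend_id
        VERSION = '1.1'

        fields = {
            'replication_status': fields.StringField(nullable=True),
            'frozen': fields.BooleanField(default=False),
            'active_backend_id': fields.StringField(nullable=True),
        }
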
Dependencies
============
This work has no additional dependencies besides the basic Active-Active
mechanism being in place, which it already is.
Testing
=======
Only unit tests will be implemented, since there is no reference driver that
implements replication and can be used at the gate.
We also lack a mechanism to verify that the replication is actually working.
Documentation Impact
====================
From a documentation perspective there won't be much to document besides the
API changes.
References
==========
- `Replication v2.1`__
__ https://specs.openstack.org/openstack/cinder-specs/specs/mitaka/cheesecake.html