Active/Active Replication v2.1 specs
Replication v2.1 works only for Active/Passive configurations and needs some changes to also support Active/Active configurations. This patch adds a spec to extend Cinder's replication mechanism with Active/Active support.

Change-Id: I6ae6e74bdabf656c327f9e0955149a6037676631
parent 6dac354228
commit 369da4edc6
231
specs/ocata/ha-aa-replication.rst
Normal file
@@ -0,0 +1,231 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=================================================
Cinder Volume Active/Active support - Replication
=================================================

https://blueprints.launchpad.net/cinder/+spec/cinder-volume-active-active-support

As one would expect, replication v2.1 only works in the deployment
configurations that were available and supported in Cinder at the time of its
design and implementation.

Now that Active-Active configurations are also supported, this means that
replication does not work properly in this newly supported configuration.

This spec extends replication v2.1 to support Active-Active configurations
while preserving backward compatibility for non-clustered configurations.

Problem description
===================

In replication v2.1, failover is requested on a per backend basis: when a
failover request is received by the API it is redirected to a specific volume
service via an asynchronous RPC call on that service's topic message queue.
The same happens for the freeze and thaw operations.

This works when there is a one-to-one relation between volume services and
storage backends, but not when there is a many-to-one relationship, because
the failover RPC call will be received by only one of the services that form
the cluster for the storage backend, and the others will be oblivious to the
change and will continue using the replication site they had been using
before. As a result some operations will succeed, those going to the service
that performed the failover, and some will fail, since they go to the site
that is no longer available.

While that is the primary issue, it is not the only one: we also have to track
the replication status at the cluster level.
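
The following is a minimal, simplified sketch, not Cinder's actual RPC code
and with an illustrative topic name, of why a per-backend topic cast reaches
only one of the clustered services:

.. code:: python

    # Simplified sketch: a cast on a shared per-backend topic is delivered to
    # a single consumer, so only one clustered volume service handles the
    # failover while its peers keep using the old replication site.
    import oslo_messaging as messaging
    from oslo_config import cfg

    transport = messaging.get_rpc_transport(cfg.CONF)
    # Every volume service in the cluster consumes this same topic.
    target = messaging.Target(topic='cinder-volume.backend1')
    client = messaging.RPCClient(transport, target)

    # Asynchronous request: exactly one of the topic's consumers receives it.
    client.cast({}, 'failover_host', secondary_backend_id='secondary')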

Use Cases
=========

Users want highly available Cinder services with disaster recovery using
replication.

It is not enough for each feature to be available on its own, as users will
want both at the same time; being able to use either Active-Active
configurations without replication, or replication only when not deployed as
Active-Active, is insufficient.

Users could probably make failover work by stopping all but one of the volume
services in the cluster, issuing the failover request, and bringing the other
services back up once it has completed, but this would not be a clean approach
to the problem.

Proposed change
===============

At its core, the proposed change divides the driver's failover operation into
two individual operations: one that performs the storage backend side of
things, for example force promoting volumes to primary on the secondary site,
and another that makes the driver start performing all operations against the
secondary storage device.

As mentioned before, only one volume service will receive the failover
request, so by splitting the operation the manager will be able to request the
local driver to do the first part of the failover and, once that is done, send
all volume services in the cluster handling that backend the signal that the
failover has been completed and that they should start pointing to the failed
over secondary site, thus solving the problem of some services not knowing
that a new site should be used.

This will also require two homonymous RPC calls in the volume manager for the
drivers' new methods: ``failover`` and ``failover_completed``.
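
A minimal sketch of the intended flow is shown below; the helper names
(``rpcapi.failover_completed``, ``_get_replicated_volumes``) are hypothetical
and only illustrate the split, they are not the final manager API:

.. code:: python

    class VolumeManager(object):

        def failover(self, context, secondary_backend_id=None):
            # Step 1: only the service that received the request talks to the
            # storage backend, e.g. force promoting the replicated volumes on
            # the secondary site.  The volume gathering helper is an
            # illustrative placeholder.
            volumes = self._get_replicated_volumes(context)
            self.driver.failover(volumes, secondary_backend_id)

            # Step 2: tell every volume service in the cluster, this one
            # included, that the failover is done so they all switch sites.
            self.rpcapi.failover_completed(context, self.cluster,
                                           secondary_backend_id)

        def failover_completed(self, context, secondary_backend_id=None):
            # Runs on every clustered service for this backend: from here on
            # the driver points all operations at the failed over site.
            self.driver.failover_completed(secondary_backend_id)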

We will also add the replication information to the ``clusters`` table to
track replication at the cluster level for clustered services.

Given the current use of the freeze and thaw operations there doesn't seem to
be a reason to apply the same split to them, so these operations will be left
as they are and will only be performed by one volume service when requested.

This change will require vendors to update their drivers to support
replication in Active-Active configurations, so to avoid surprises we will
prevent the volume service from starting in Active-Active configurations with
replication enabled on drivers that don't support the Active-Active mechanism.
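
A rough sketch of that start-up guard follows; the capability flag name and
the exception used are assumptions, not part of this spec:

.. code:: python

    def _check_aa_replication_support(driver, cluster_name,
                                      replication_enabled):
        # Hypothetical capability flag; how the driver actually advertises
        # support is left to the implementation.
        supports_aa = getattr(driver, 'SUPPORTS_ACTIVE_ACTIVE_REPLICATION',
                              False)
        if cluster_name and replication_enabled and not supports_aa:
            raise RuntimeError('Replication is enabled but the driver does '
                               'not support it in Active-Active (clustered) '
                               'deployments.')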

Alternatives
------------

The splitting mechanism for the ``failover_host`` method is pretty
straightforward; the only alternative to the proposed change would be to split
the freeze and thaw operations as well.

Data model impact
-----------------

Three new replication related fields will be added to the ``clusters`` table.
These will be the same fields we currently have in the ``services`` table and
will hold the same meaning:

- ``replication_status``: String storing the replication status for the whole
  cluster.
- ``active_backend_id``: String storing which one of the replication sites is
  currently active.
- ``frozen``: Boolean reflecting whether the cluster is frozen or not.

These fields will be kept in sync between the ``clusters`` table and the
``services`` table for consistency.
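
For illustration, the migration could look roughly like the sketch below,
assuming the sqlalchemy-migrate style Cinder uses; the column sizes and
defaults are assumptions that mirror the existing ``services`` columns:

.. code:: python

    from sqlalchemy import Boolean, Column, MetaData, String, Table


    def upgrade(migrate_engine):
        meta = MetaData()
        meta.bind = migrate_engine
        clusters = Table('clusters', meta, autoload=True)

        # Same three fields the services table already has.
        clusters.create_column(Column('replication_status', String(36),
                                      default='not-capable'))
        clusters.create_column(Column('active_backend_id', String(255)))
        clusters.create_column(Column('frozen', Boolean, default=False))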

REST API impact
---------------

- A new action called ``failover``, equivalent to the existing
  ``failover_host``, will be added; it will support a new ``cluster``
  parameter in addition to the ``host`` field already available in
  ``failover_host`` (an illustrative request body follows this list).

- Cluster listing will accept ``replication_status``, ``frozen`` and
  ``active_backend_id`` as filters.

- Cluster listing will return the additional ``replication_status``,
  ``frozen`` and ``active_backend_id`` fields.
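
Purely for illustration, a request to the new action could carry parameters
like the ones below; ``backend_id`` is an assumption that mirrors the existing
``failover_host`` action, and the exact endpoint and microversion are not
fixed by this spec:

.. code:: python

    # Illustrative only: what a client would send when failing over a whole
    # cluster instead of a single host.
    failover_request_body = {
        'cluster': 'mycluster@rbd',         # new parameter added by this spec
        'backend_id': 'secondary_site_id',  # assumed to mirror failover_host
    }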

Security impact
---------------

None.

Notifications impact
--------------------

None.

Other end user impact
---------------------

The client will return the new fields when listing clusters using the new
microversion, and the new filters will also be available.

Failover requests on this microversion will accept the ``cluster`` parameter.

Performance Impact
------------------

The new code should have no performance impact on existing deployments since
it will only affect new Active-Active deployments.

Other deployer impact
---------------------

None.

Developer impact
----------------

Drivers that wish to support replication in Active-Active deployments will
have to implement the ``failover`` and ``failover_completed`` methods as well
as the current ``failover_host`` method, since the latter is kept for backward
compatibility with base replication v2.1.

The easiest way to support this with minimal code would be to implement
``failover`` and ``failover_completed`` and then build ``failover_host`` on
top of them:

.. code:: python

    def failover_host(self, volumes, secondary_id):
        # Do the backend part of the failover and then make this driver
        # instance point all further operations at the secondary site.
        self.failover(volumes, secondary_id)
        self.failover_completed(secondary_id)

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  Gorka Eguileor (geguileo)

Other contributors:
  None

Work Items
----------

- Change service start to use ``active_backend_id`` from the cluster or the
  service (a sketch of this decision follows this list).

- Add the new ``failover`` REST API.

- Update the list REST API method to accept the new filtering fields and
  update the view to return the new information.

- Update the DB model and create the migration.

- Update the ``Cluster`` versioned object.

- Modify the manager to support the new RPC calls.
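
As referenced in the first work item, a minimal sketch of the start-up
decision; the attribute access is hypothetical since the exact objects are not
defined here:

.. code:: python

    def _get_active_backend_id(service):
        # Prefer the cluster-wide value when the service is clustered
        # (Active-Active); otherwise keep the per-service behaviour.
        if service.cluster is not None:
            return service.cluster.active_backend_id
        return service.active_backend_id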

Dependencies
============

This work has no additional dependency besides the basic Active-Active
mechanism being in place, which it already is.

Testing
=======

Only unit tests will be implemented, since there is no reference driver that
implements replication and can be used at the gate.

We also lack a mechanism to actually verify that replication is working.

Documentation Impact
====================

From a documentation perspective there won't be much to document besides the
API changes.

References
==========

- `Replication v2.1`__

__ https://specs.openstack.org/openstack/cinder-specs/specs/mitaka/cheesecake.html