Cinder Volume Active/Active support - Replication
https://blueprints.launchpad.net/cinder/+spec/cinder-volume-active-active-support
Replication v2.1 was designed and implemented against the deployment configurations that Cinder supported at the time, so it only works correctly in those configurations.
Now that Active-Active configurations are also supported, this means replication does not work properly in this new configuration.
This spec extends replication v2.1 functionality to support Active-Active configurations while preserving backward compatibility for non-clustered configurations.
Problem description
In replication v2.1, failover is requested on a per-backend basis: when the API receives a failover request it redirects it to a specific volume service via an asynchronous RPC call on that service's topic message queue. The same happens for the freeze and thaw operations.
This works when there is a one-to-one relationship between volume services and storage backends, but not when the relationship is many-to-one: the failover RPC call is received by only one of the services that form the cluster for the storage backend, while the others remain oblivious to the change and keep using the replication site they had been using before. As a result some operations succeed, those handled by the service that performed the failover, and some fail, because they are directed to the site that is no longer available.
While that's the primary issue, it's not the only one, since we also have to track the replication status at the cluster level.
Use Cases
Users want to have highly available cinder services with disaster recovery using replication.
Having each feature available on its own is not enough, since users will want both at the same time; being able to use either Active-Active configurations without replication, or replication only when not deployed as Active-Active, is insufficient.
They could probably make it work by stopping all but one volume service in the cluster, issuing the failover request, and bringing the other services back up once it completed, but that would not be a clean approach to the problem.
Proposed change
At its core, the proposed change is to divide the driver's failover operation into two individual operations: one that handles the storage backend side of the failover, for example force-promoting volumes to primary on the secondary site, and another that makes the driver start performing all operations against the secondary storage device.
As mentioned before, only one volume service will receive the failover request. By splitting the operation, the manager can ask its local driver to perform the first part of the failover and, once that is done, signal all volume services in the cluster handling that backend that the failover has completed and that they should start pointing to the failed-over secondary site. This solves the problem of some services not knowing that a new site should be used.
This will also require two new RPC calls in the volume manager, homonymous with the drivers' new methods: failover and failover_completed.
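To make the split concrete, the following self-contained sketch mimics the intended flow: one manager receives the failover request, performs the backend-side work through its driver, and then broadcasts failover_completed to every service in the cluster. All names here are hypothetical stand-ins for Cinder's driver, manager and RPC layer, not the final implementation:

    class FakeReplicatedDriver(object):
        """Hypothetical stand-in for a replication-capable volume driver."""

        def __init__(self):
            self.active_backend_id = 'default'

        def failover(self, volumes, secondary_id):
            # Backend-side half: e.g. force promote the replicated volumes to
            # primary on the secondary site.  Run by one service in the cluster.
            print('Promoting %d volumes on %s' % (len(volumes), secondary_id))

        def failover_completed(self, secondary_id):
            # Local half: start directing every operation to the secondary
            # device.  Run by every service in the cluster.
            self.active_backend_id = secondary_id


    class FakeVolumeManager(object):
        """Hypothetical stand-in for the volume manager."""

        def __init__(self, driver, cluster_cast):
            self.driver = driver
            # cluster_cast(method_name, *args) stands in for an RPC cast that
            # fans out to every service in the cluster, including this one.
            self.cluster_cast = cluster_cast

        def failover(self, volumes, secondary_id):
            # Only the service that received the request runs this part.
            self.driver.failover(volumes, secondary_id)
            self.cluster_cast('failover_completed', secondary_id)

        def failover_completed(self, secondary_id):
            # Every service in the cluster runs this on its own driver.
            self.driver.failover_completed(secondary_id)

With this arrangement the storage-side promotion happens exactly once, while every driver instance in the cluster ends up pointing at the secondary site.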
We will also add the replication information to the clusters table to track replication at the cluster level for clustered services.
Given the current use of the freeze and thaw operations there doesn't seem to be a reason to apply the same split to them, so they will be left as they are and will only be performed by one volume service when requested.
This change requires vendors to update their drivers to support replication in Active-Active configurations, so to avoid surprises the volume service will refuse to start in an Active-Active configuration with replication enabled if the driver does not support the Active-Active mechanism.
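A minimal sketch of such a start-up guard, assuming a hypothetical supports_aa_replication driver attribute (the real capability flag is left to the implementation):

    class ServiceStartupError(Exception):
        pass


    def check_replication_support(driver, clustered, replication_enabled):
        # Refuse to start a clustered service whose driver cannot perform the
        # split failover / failover_completed sequence.
        supported = getattr(driver, 'supports_aa_replication', False)
        if clustered and replication_enabled and not supported:
            raise ServiceStartupError(
                'Driver %s does not support replication in Active-Active '
                'deployments' % type(driver).__name__)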
Alternatives
The splitting mechanism for the failover_host method is pretty straightforward; the only alternative to the proposed change would be to split the thaw and freeze operations as well.
Data model impact
Three new fields related to replication will be added to the clusters table. These will be the same fields we currently have in the services table and will hold the same meaning:
- replication_status: String storing the replication status for the whole cluster.
- active_backend_id: String storing which one of the replication sites is currently active.
- frozen: Boolean reflecting whether the cluster is frozen or not.
These fields will be kept in sync between the clusters table and the services table for consistency.
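As an illustration, a database migration adding these columns could look roughly like the sketch below. It assumes the sqlalchemy-migrate style upgrade functions used by Cinder's migrations and mirrors the column types of the existing services table; the exact types and defaults are left to the implementation:

    from sqlalchemy import Boolean, Column, MetaData, String, Table


    def upgrade(migrate_engine):
        meta = MetaData()
        meta.bind = migrate_engine
        clusters = Table('clusters', meta, autoload=True)

        # Same three replication fields that already exist in the services
        # table.
        clusters.create_column(Column('replication_status', String(36)))
        clusters.create_column(Column('active_backend_id', String(255)))
        clusters.create_column(Column('frozen', Boolean, default=False))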
REST API impact
- A new action called failover, equivalent to the existing failover_host, will be added; it will support a new cluster parameter in addition to the host field already available in failover_host (see the sketch after this list).
- Cluster listing will accept replication_status, frozen and active_backend_id as filters.
- Cluster listing will return additional replication_status, frozen and active_backend_id fields.
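For illustration only, a request using the new action might look like the following sketch; the endpoint path, microversion header and example values are assumptions here rather than part of this spec:

    import requests

    # Hypothetical values; replace with a real endpoint, project and token.
    CINDER_URL = 'http://controller:8776/v3/<project_id>'
    HEADERS = {
        'X-Auth-Token': '<token>',
        # The cluster parameter would only be exposed by a new microversion.
        'OpenStack-API-Version': 'volume 3.26',
    }

    # Fail over every service handling the backend of the given cluster.
    body = {'failover': {'cluster': 'mycluster@rbd', 'backend_id': 'secondary'}}
    resp = requests.post(CINDER_URL + '/os-services/failover',
                         json=body, headers=HEADERS)
    resp.raise_for_status()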
Security impact
None.
Notifications impact
None.
Other end user impact
The client will return the new fields when listing clusters with the new microversion, and the new filters will also be available.
Failover requests on this microversion will accept the cluster parameter.
Performance Impact
The new code should have no performance impact on existing deployments since it will only affect new Active-Active deployments.
Other deployer impact
None.
Developer impact
Drivers that wish to support replication on Active-Active deployments will have to implement the failover and failover_completed methods as well as the current failover_host method, since the latter is still used for backward compatibility with the base replication v2.1.
The easiest way to support this with minimal code is to implement failover and failover_completed and then build failover_host on top of them:
def failover_host(self, context, volumes, secondary_id=None):
    # Fail over the backend, then immediately switch this driver to the
    # secondary site, and return the resulting model updates.
    updates = self.failover(context, volumes, secondary_id)
    self.failover_completed(context, secondary_id)
    return updates
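With this wrapper a non-clustered deployment keeps behaving exactly as in replication v2.1, since the two halves simply run back to back on the single service.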
Implementation
Assignee(s)
- Primary assignee: Gorka Eguileor (geguileo)
- Other contributors: None
Work Items
- Change service start to use
active_backend_id
from the cluster or the service. - Add new
failover
REST API - Update list REST API method to accept new filtering fields and update the view to return new information.
- Update the DB model and create migration
- Update
Cluster
Versioned Object - Make modifications to the manager to support the new RPC calls.
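As a rough illustration of the first work item, service start-up could pick the active backend along these lines; the helper is hypothetical and only shows the precedence of the cluster value over the per-service one:

    def get_init_active_backend_id(service, cluster):
        # Clustered services must all agree on the site in use, so the
        # cluster-level value takes precedence over the per-service one.
        if cluster is not None and cluster.active_backend_id:
            return cluster.active_backend_id
        return service.active_backend_id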
Dependencies
This work has no additional dependency besides the basic Active-Active mechanism being in place, which it already is.
Testing
Only unit tests will be implemented, since there is no reference driver that implements replication and can be used at the gate.
We also lack a mechanism to verify that replication is actually working.
Documentation Impact
From a documentation perspective there won't be much to document besides the API changes.