Trove Replication V2
Specification outlining the design of Replication Features to be added
to the Trove Kilo Release.

Change-Id: If0a14416eaecc1ed5e78b3518ee4ed3fe6422a65
Implements: blueprint replication-v2

parent db9d439bb9, commit 7b08a1b449

specs/replication-v2.rst (new file, 619 lines)

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

 Sections of this template were taken directly from the Nova spec
 template at:
 https://github.com/openstack/nova-specs/blob/master/specs/template.rst
..

=======================
Trove Replication V2
=======================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/trove/bp/replication-v2

The Juno release of Trove laid the foundation of Trove Replication
support. The V1 version of replication focused on providing read-only
slave replication in MySQL 5.5. For the V2 replication release for
Kilo, replication will be extended to provide support for manual
failover in MySQL replication leveraging the latest replication
features of MySQL 5.6.

Problem description
===================

For the Kilo release of OpenStack, Trove replication support will be
extended to support manual failover when a replication master fails.
Specifically, this means that a user can instruct Trove to demote a
replication master and promote a slave to be the new master. For V2,
manual promotion means that the user will be required to execute an
action to cause failover; a component to detect failure and cause the
failover to occur is not within the scope of V2.


Proposed change
===============

Supported Features:

* manual failover
* master/slaves in different availability zones
* automatic slave generation to replace slaves promoted to master
* automatically generated slave will be created in the same AZ as the
  slave that was promoted to master
* public IPs assigned to deleted/demoted master will be transferred to
  new master
* public IPs of promoted slave will be transferred to new slave
* GTIDs will be used to facilitate master promotion (Note: this limits
  feature set to MySQL 5.6 and later)
* if a master site is reachable, a chosen slave may be promoted to
  master and the old master will be demoted to a slave. This
  operation will be done in such a way as to prevent the loss of data.
  This operation would be useful for resizing a master without
  downtime.
* a master site may be deleted, in which case Trove will pick a slave
  to be promoted to master (see MASTER_PROMOTION_STRATEGY below) and a
  new slave will be generated to replace the promoted slave. If the
  master site is not reachable, it will be forcefully removed from
  Trove/Nova; this is how an unreachable master would be "failed
  over".
* the selection of the new master on delete is controlled by the
  MASTER_PROMOTION_STRATEGY configuration option. MOST_RECENT: the
  slave with the most recent updates is chosen as new master.
  PROXIMATE_AZ: the slave in the master's AZ with the most recent
  updates is chosen as new master. PROXIMATE_REGION: the slave in the
  master's region with the most recent updates is chosen as new
  master. PROXIMATE_REGION will be the default (though for now
  equivalent to MOST_RECENT) and may be the only implemented option
  for V2.
* replication from existing backup and incremental snapshot will be
  implemented
* a replica_count option will be added to create-instance to allow N
  slaves to be spun up from a given snapshot. All replicas from the
  given snapshot will have the same "create-instance" options.

Features Not Supported:

* automatic failover
* region support
* writable slaves
* features related to the promotion of slaves to masters will not be
  supported by MySQL versions prior to 5.6
* replication_strategy per datastore - this could be implemented in
  Kilo via an independent blueprint
* GTID based replication for MariaDB (binlog replication will not be
  tested for MariaDB, but should be compatible with MySQL)
* host affinity/anti-affinity
* dealing with "error transactions" created when updates are executed
  directly on slaves in conflict with changes on the master.
  Performing updates directly on slaves is not supported by Trove and
  slave sites will be put into "read only" mode.

Replication V2 Components
-------------------------

The V2 Replication feature will consist of several components:

- Implement a new replication strategy to support GTID Based
  Replication in addition to Bin Log replication.
- Manual failover from replication master
- Replication configuration using incremental snapshots based on
  existing backups.
- Creation of multiple slaves from master in single call

Upgrade from Binlog Replication to GTID Based Replication
*********************************************************

MySQL 5.6 introduced a new type of replication which is based on
Global Transaction IDs (GTID). By assigning a GTID to each
transaction, MySQL is able to simplify transaction coordination
between masters and slaves, allowing for simpler and more reliable
failover to a new master.

This feature requires that the trove-integration project upgrade to
use Ubuntu 14.04 and MySQL 5.6.

A new Replication Strategy named "MysqlGTIDReplicationStrategy" will
be created to support the new GTID based replication with MySQL 5.6
and later, and the existing Replication Strategy named
"MysqlBinlogReplication" will continue to be supported for MySQL 5.5
but without support for the new features listed in this document.


Manual Failover from Replication Master
***************************************

It will be possible for a user to cause a slave to become the new
master for replication by executing a trove command. For the V2
release of replication, no facility for detecting a master failure
condition will be provided.

To assist the user in minimizing data loss, there will be two
different ways for the user to cause a slave to be promoted to master.
If the user wishes to promote a slave to replace a master which is
healthy and reachable, they will execute a new
"promote_to_replica_source" function against a slave to promote it in
place of the existing master; this function will coordinate with the
master site to ensure that no data is lost. If a master site is
unreachable, the user will use the "eject_replica_source" function to
remove that instance from the replication set and the replication
strategy will choose the slave with the most recent updates to promote
to master; this operation may result in the loss of any transactions
that were committed at the master site but not replicated to any of
the slaves. Trove will not allow a reachable master site to be
deleted as that would unnecessarily result in lost data.

There will be no accommodation made to allow users or operators to
"fix" slaves which have gotten out of sync with the master site.
Instead, every effort will be made to configure replication so that
the slave will not fall out of sync with the master. The following
MySQL options will be set to ensure safe replication:

*Master Options*

* Binary logs will be configured for MIXED mode logging. This will
  allow statement based replication where it is safe to do so, and row
  based replication will be used where necessary.
* The enforce_gtid_consistency option will be used to prevent
  statements which will conflict with the use of GTID replication.
* When the Percona database is being used, the Percona
  enforce_storage_engine option will be used to restrict replication
  to the InnoDB storage engine. This is to prevent the use of MyISAM
  tables which could be corrupted during a crash recovery.

*Slave Options*

* Slaves will execute in READ_ONLY mode to avoid transaction conflicts
  between master and slave. By default, users are not given root
  access to the database; if they choose to enable root access, they
  are assumed to be sufficiently advanced as to not execute operations
  on a slave which will disturb replication.
* Each slave's relay log position will be stored in a table in the
  database (relay_log_info_repository set to TABLE) to provide
  transactional consistency between the statements executed against
  the database and the recording of the slave's position in executing
  the relay log.
* relay_log_recovery will be enabled so that relay log recovery is
  performed during MySQL startup. relay_log_purge will be enabled in
  support of relay_log_recovery.
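
The options listed above could be collected and rendered into a MySQL
configuration fragment roughly as follows. This is a minimal sketch, not
the guest agent's actual code: the option names are existing MySQL 5.6 and
Percona Server settings, but the dictionaries and the ``render_cnf`` helper
are hypothetical, and ``gtid_mode`` and ``log_slave_updates`` are
assumptions added because MySQL 5.6 requires them for GTID replication;
the spec itself does not list them.

```python
# Options named in the spec, plus two labeled assumptions.
MASTER_OPTIONS = {
    "binlog_format": "MIXED",
    "enforce_gtid_consistency": "ON",
    "gtid_mode": "ON",           # assumption: not listed in the spec
    "log_slave_updates": "ON",   # assumption: required by gtid_mode in 5.6
}

# Only applied when the datastore is Percona Server.
PERCONA_MASTER_OPTIONS = {
    "enforce_storage_engine": "InnoDB",
}

SLAVE_OPTIONS = {
    "read_only": "ON",
    "relay_log_info_repository": "TABLE",
    "relay_log_recovery": "ON",
    "relay_log_purge": "ON",
}


def render_cnf(options, section="mysqld"):
    """Render a dict of options as a my.cnf style section (hypothetical helper)."""
    lines = ["[%s]" % section]
    lines.extend("%s = %s" % (name, value) for name, value in options.items())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_cnf(SLAVE_OPTIONS))
```

Keeping master and slave options in separate dictionaries makes it easy to
swap the two sets when an instance changes role during failover.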

Promotion of Slave to Master
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The user may select a slave to be promoted to be the new master of a
replication set. This operation would consist of the following steps:

#. Contact each slave, abort operation if any not reachable
#. Make the old master read-only
#. Detach old master's public IP
#. Detach master candidate's public IP
#. Record latest GTID of master
#. For each slave (including master candidate)

   * Wait for slave to receive/apply master's latest GTID

#. Set master candidate as replication master site
#. For each remaining slave

   * Make instance slave of new master

#. Make old master be slave of new master
#. Assign master candidate's IP to old master (which is now slave)
#. Make new master writable
#. Assign old master's public IP to new master
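
The ordering of the promotion steps above can be sketched against a small
in-memory model. Everything here is hypothetical: the ``Node`` class, its
fields, and the ``promote`` function only illustrate the sequence and do
not correspond to Trove's taskmanager or guest agent code. GTIDs are
simplified to a single integer, and "waiting" for a slave is simulated by
assignment.

```python
class Node:
    """Hypothetical in-memory stand-in for a database instance."""

    def __init__(self, name, public_ip, gtid=0, reachable=True):
        self.name = name
        self.public_ip = public_ip
        self.gtid = gtid            # simplified: last applied transaction number
        self.reachable = reachable
        self.read_only = True
        self.master = None          # None means this node is the master


def promote(old_master, candidate, slaves):
    """Swap the roles of old_master and candidate; slaves includes candidate."""
    # Step 1: abort before changing anything if a slave is unreachable.
    if not all(slave.reachable for slave in slaves):
        raise RuntimeError("unreachable slave; promotion aborted")
    # Step 2: stop writes on the old master.
    old_master.read_only = True
    # Steps 3-4: detach both public IPs.
    master_ip, old_master.public_ip = old_master.public_ip, None
    candidate_ip, candidate.public_ip = candidate.public_ip, None
    # Steps 5-6: record the master's latest GTID and let every slave catch up.
    target = old_master.gtid
    for slave in slaves:
        slave.gtid = max(slave.gtid, target)
    # Step 7: the candidate becomes the replication master.
    candidate.master = None
    # Steps 8-9: repoint the remaining slaves and the old master.
    for slave in slaves:
        if slave is not candidate:
            slave.master = candidate
    old_master.master = candidate
    # Step 10: the old master (now a slave) takes the candidate's IP.
    old_master.public_ip = candidate_ip
    # Steps 11-12: open the new master for writes and give it the master IP.
    candidate.read_only = False
    candidate.public_ip = master_ip
    return candidate
```

Making the old master read-only before recording its GTID is what lets the
slaves reach a fixed target, which is how the operation avoids data loss.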

*Promote to Master API*

To replace a healthy master site, the promote_to_replica_source API
call will be added to the client and taskmanager APIs.

Ejection of Master Site
^^^^^^^^^^^^^^^^^^^^^^^

If a replication master site is out of service, the user may choose to
"eject" the instance from the replication set. Ejecting an
unreachable instance which is a master for replication would result in
one of its slaves being chosen to be promoted to be the new master
site, and a new slave generated to fill out the replication set. The
ejected master will be available for examination, but will no longer
participate in replication. This operation would consist of the
following steps:

#. Abort operation if the master site can be contacted
#. Contact each slave, abort operation if any not reachable
#. Detach master's public IP
#. Record master's Region/Zone
#. Select master candidate (see Master Candidate Selection)
#. Switch the master candidate from slave to master
#. For each remaining slave

   * Connect slave to new master instance

#. Mark new master as writable
#. Attach master's public IP to new master
#. Create new slave in same Region/Zone as old master
#. Assign master candidate's public IP to new slave
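
The ejection steps above can be sketched in the same spirit. The
``Instance`` dataclass and the ``eject`` function are hypothetical and only
model the sequence; candidate selection is reduced to "highest applied
transaction count", and creation of the replacement slave is simulated by
constructing a new object.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Instance:
    """Hypothetical stand-in for a Trove instance in a replication set."""
    name: str
    zone: str
    public_ip: Optional[str]
    gtid: int = 0                      # simplified: applied transaction count
    reachable: bool = True
    read_only: bool = True
    master: Optional["Instance"] = None


def eject(master: Instance, slaves: List[Instance]) -> Instance:
    """Eject an unreachable master and return the newly promoted master."""
    # Steps 1-2: only an unreachable master may be ejected, and every
    # slave must be reachable.
    if master.reachable:
        raise RuntimeError("master is reachable; ejection refused")
    if not all(slave.reachable for slave in slaves):
        raise RuntimeError("unreachable slave; ejection aborted")
    # Steps 3-4: detach the master's public IP and remember its zone.
    master_ip, master.public_ip = master.public_ip, None
    zone = master.zone
    # Steps 5-6: pick the most up-to-date slave and make it the master.
    candidate = max(slaves, key=lambda slave: slave.gtid)
    candidate.master = None
    # Step 7: connect the remaining slaves to the new master.
    for slave in slaves:
        if slave is not candidate:
            slave.master = candidate
    # Steps 8-9: make the new master writable and move the master IP to it.
    candidate.read_only = False
    candidate_ip, candidate.public_ip = candidate.public_ip, master_ip
    # Steps 10-11: a replacement slave in the old master's zone takes the
    # candidate's former IP.
    replacement = Instance("replacement", zone, candidate_ip, master=candidate)
    slaves.remove(candidate)
    slaves.append(replacement)
    return candidate
```

Unlike promotion, nothing here waits on the old master, so any transaction
it committed but never shipped to a slave is lost.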

*Master Candidate Selection*

When selecting a slave to be promoted to master to replace an
unreachable master site, the algorithm for choosing the master
candidate will be determined by the value of the
MASTER_PROMOTION_STRATEGY configuration option of the Taskmanager
Config (not datastore specific). The possible values for this option
are outlined below:

================ =================================================
Strategy         Description
================ =================================================
MOST_RECENT      The slave with the highest GTID is chosen as the
                 master candidate
PROXIMATE_AZ     The slave with the highest GTID in the same
                 Availability Zone as the old master is chosen
PROXIMATE_REGION The slave with the highest GTID in the same
                 Region as the old master is chosen
================ =================================================

The PROXIMATE_REGION setting will be the default as this will ensure
that the new master site will be in the same region as the old master;
for the Kilo release, this will be equivalent to the MOST_RECENT
option (and may be implemented as such) as Region support is not
implemented in Trove.
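
A minimal sketch of the selection rule in the table above, under stated
assumptions: slaves are plain dictionaries with hypothetical ``name``,
``az`` and ``gtid_executed`` keys, "highest GTID" is approximated by
counting the transactions in a MySQL GTID set string, and
PROXIMATE_REGION falls back to MOST_RECENT as described for Kilo. Raising
an error when no slave qualifies is also an assumption; the spec does not
say what should happen in that case.

```python
def gtid_count(gtid_set):
    """Count transactions in a GTID set such as 'uuid:1-5:7,uuid2:1-3'."""
    total = 0
    for part in gtid_set.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        # The server UUID contains hyphens but no colons, so the first
        # field is the UUID and the rest are intervals.
        _uuid, *intervals = part.split(":")
        for interval in intervals:
            start, _, end = interval.partition("-")
            total += int(end or start) - int(start) + 1
    return total


def select_master_candidate(slaves, strategy="PROXIMATE_REGION", master_az=None):
    """Pick the slave to promote according to the configured strategy."""
    if strategy == "PROXIMATE_AZ":
        pool = [slave for slave in slaves if slave["az"] == master_az]
    elif strategy in ("MOST_RECENT", "PROXIMATE_REGION"):
        # Region support is not implemented, so both consider every slave.
        pool = list(slaves)
    else:
        raise ValueError("unknown strategy: %s" % strategy)
    if not pool:
        raise LookupError("no eligible slave for strategy %s" % strategy)
    return max(pool, key=lambda slave: gtid_count(slave["gtid_executed"]))
```

Counting transactions is only a proxy for "most recent"; it is adequate
when all slaves replicate from the same single master, which is the
topology this spec describes.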


Incremental Snapshots
*********************

To improve the performance of slave creation, the default action will
be to take the most recent backup (full or incremental) and create an
incremental backup to be used for the replication snapshot. If no
previous backup can be found, a full backup will be created to include
in the replication snapshot. Should the "backup" option be specified
in addition to the "replica_of" option, an incremental backup will be
performed from the indicated backup.
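
The decision described above can be written as a small function. This is a
sketch only: ``plan_snapshot_backup`` and the ``id`` / ``created`` keys on
each backup record are hypothetical names, and timestamps are assumed to be
ISO 8601 strings so that they sort chronologically.

```python
def plan_snapshot_backup(existing_backups, requested_backup_id=None):
    """Return (backup_type, parent_backup_id) for a replication snapshot."""
    # An explicitly requested backup always becomes the parent.
    if requested_backup_id is not None:
        return ("incremental", requested_backup_id)
    # Otherwise build on the most recent backup, full or incremental.
    if existing_backups:
        latest = max(existing_backups, key=lambda backup: backup["created"])
        return ("incremental", latest["id"])
    # With no prior backup there is nothing to increment from.
    return ("full", None)
```

For example, a master with backups taken on 2014-11-01 and 2014-11-20
would get an incremental snapshot whose parent is the later backup.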


Multiple Slave Creation
***********************

A replica_count option will be added to support the creation of multiple
slaves from a single replication snapshot.

* a replica_count option will be added to the ``trove create`` command
* a replica_count parameter will be added to the create_instance
  taskmanager ReST API
* the taskmanager FreshInstanceTasks.create_instance method will
  iteratively create the specified number of slaves from a single
  replication snapshot (the implementor is free to implement slave
  creation in parallel if time permits, and should investigate doing
  so, but it is not a requirement for V2)
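
The iterative behaviour described in the last bullet might look like the
sketch below. The function name and the two callables it receives are
hypothetical placeholders for whatever the taskmanager uses to take a
snapshot and build a slave from it; the point illustrated is that the
snapshot is taken once and reused for every replica.

```python
def create_replicas(master_id, replica_count, take_snapshot, create_from_snapshot):
    """Create replica_count slaves from one snapshot of the master."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    # The snapshot is taken once, regardless of how many slaves are built.
    snapshot = take_snapshot(master_id)
    # Slaves are created one after another; parallel creation is optional
    # for V2 according to the spec.
    return [create_from_snapshot(snapshot, index) for index in range(replica_count)]
```

Because every slave is built from the same snapshot, all of them start
replicating from the same position on the master.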

Configuration
-------------

A MysqlGTIDReplicationStrategy value will be added to the
ReplicationStrategy option for MySQL configuration.

A new configuration option, master_promotion_strategy, will be added to
the MySQL configuration with values as above.

Database
--------

No database impacts are envisioned.


Public API
----------

*Promote to Replica Source*

A new action will be added to the Trove REST API to allow a replica to
be promoted to be the master of its replication set::

    POST http://127.0.0.1:8779/v1.0/<tenant id>/instances/<instance id>/action
    {
      *"promote_to_replica_source": null*
    }

    RESP: [200]
    {
      'date': '<date>',
      'content-length': '<RESP BODY len>',
      'content-type': 'application/json'
    }
    RESP BODY:
    {
      "instance": {
        *"status": "PROMOTE",*
        "updated": "2014-11-25T21:25:11",
        "name": "m",
        "links": [
          {
            "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...",
            "rel": "self"
          },
          {
            "href": "https:\/\/10.40.10.178:8779\/instances\/...",
            "rel": "bookmark"
          }
        ],
        "created": "2014-11-25T21:25:06",
        "ip": [
          "10.0.0.2"
        ],
        "replicas": [
          {
            "id": "8e5710df-ef39-4201-a059-764d9091f079",
            "links": [
              {
                "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...",
                "rel": "self"
              },
              {
                "href": "https:\/\/10.40.10.178:8779\/instances\/...",
                "rel": "bookmark"
              }
            ]
          }
        ],
        "id": "fff6d8c5-9d05-4a00-ab58-d8954ec945a3",
        "volume": {
          "used": 0.13,
          "size": 1
        },
        "flavor": {
          "id": "7",
          "links": [
            {
              "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/flavors\/7",
              "rel": "self"
            },
            {
              "href": "https:\/\/10.40.10.178:8779\/flavors\/7",
              "rel": "bookmark"
            }
          ]
        },
        "datastore": {
          "version": "5.5",
          "type": "mysql"
        }
      }
    }

A new CLI command will be added to invoke the
promote_to_replica_source API::

    trove promote-to-replica-source <replica id>

*Eject Replica Source*

A new action will be added to the Trove REST API to allow a replica
source to be ejected from a replication set::

    POST http://127.0.0.1:8779/v1.0/<tenant id>/instances/<instance id>/action
    {
      *"eject_replica_source": null*
    }

    RESP: [200]
    {
      'date': '<date>',
      'content-length': '<RESP BODY len>',
      'content-type': 'application/json'
    }
    RESP BODY:
    {
      "instance": {
        *"status": "EJECT",*
        "updated": "2014-11-25T21:25:11",
        "name": "m",
        "links": [
          {
            "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...",
            "rel": "self"
          },
          {
            "href": "https:\/\/10.40.10.178:8779\/instances\/...",
            "rel": "bookmark"
          }
        ],
        "created": "2014-11-25T21:25:06",
        "ip": [
          "10.0.0.2"
        ],
        "replicas": [
          {
            "id": "8e5710df-ef39-4201-a059-764d9091f079",
            "links": [
              {
                "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...",
                "rel": "self"
              },
              {
                "href": "https:\/\/10.40.10.178:8779\/instances\/...",
                "rel": "bookmark"
              }
            ]
          }
        ],
        "id": "fff6d8c5-9d05-4a00-ab58-d8954ec945a3",
        "volume": {
          "used": 0.13,
          "size": 1
        },
        "flavor": {
          "id": "7",
          "links": [
            {
              "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/flavors\/7",
              "rel": "self"
            },
            {
              "href": "https:\/\/10.40.10.178:8779\/flavors\/7",
              "rel": "bookmark"
            }
          ]
        },
        "datastore": {
          "version": "5.5",
          "type": "mysql"
        }
      }
    }

A new CLI command will be added to invoke the eject_replica_source
API::

    trove eject-replica-source <replica source id>
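
Both new actions share the same request shape, so a client could build
them with one helper. The sketch below only constructs the request and
does not send it; ``build_action_request`` is a hypothetical name, the
path assumes the plural ``instances`` collection used by the create call,
and authentication headers (such as a Keystone token) are left out.

```python
import json

SUPPORTED_ACTIONS = ("promote_to_replica_source", "eject_replica_source")


def build_action_request(endpoint, tenant_id, instance_id, action):
    """Return (method, url, headers, body) for a replication action."""
    if action not in SUPPORTED_ACTIONS:
        raise ValueError("unsupported action: %s" % action)
    url = "%s/v1.0/%s/instances/%s/action" % (
        endpoint.rstrip("/"), tenant_id, instance_id)
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    # The action name is the only key; its value is JSON null.
    body = json.dumps({action: None})
    return "POST", url, headers, body
```

Promotion is issued against the replica to be promoted, while ejection is
issued against the unreachable replica source itself.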


*Trove Create Replica Count*

The Trove REST API for the create instance operation will be augmented
with a new field *replica_count* to specify the number of replicas to
be created from the indicated instance::

    POST http://127.0.0.1:8779/v1.0/<tenant id>/instances
    {
      "instance": {
        "volume": {"size": 1},
        "flavorRef": "7",
        "name": "s",
        "replica_of": "<master id>",
        *"replica_count": "<n>"*
      }
    }

    RESP *unchanged*

An option will be added to the "trove create" CLI command to specify
the new replica count option::

    trove create <name> <flavor id> --replica_count=<count> ...
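
A request body for creating several replicas could be assembled as in the
sketch below. ``build_create_replicas_body`` is a hypothetical helper; the
field names come from the example above, but sending ``replica_count`` as
an integer and rejecting values below 1 are assumptions, since the spec
shows only a ``<n>`` placeholder.

```python
import json


def build_create_replicas_body(name, flavor_ref, volume_size, replica_of,
                               replica_count=1):
    """Return the JSON body for a create call that spawns replicas."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    instance = {
        "name": name,
        "flavorRef": flavor_ref,
        "volume": {"size": volume_size},
        "replica_of": replica_of,
        "replica_count": replica_count,  # assumption: sent as an integer
    }
    return json.dumps({"instance": instance})
```

All replicas created by one such call share the same name, flavor and
volume settings, matching the statement that they have the same
"create-instance" options.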


Internal API
------------

promote_to_replica_source method added to taskmanager API.
eject_replica_source method added to taskmanager API.

Guest Agent
-----------

The implementation of this feature set will result in many additions
to the MySQL guest agent. There should be minimal impact to
pre-existing code, and there is not expected to be any impact on
backward compatibility of the APIs.

Alternatives
------------

None.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  vgnbkr

Secondary assignee:
  peterstac

Milestones
----------

Target Milestone for completion:
  Kilo-2

Work Items
----------

====================== ========= =================
Work Item              Assignee  Scheduled Release
====================== ========= =================
GTID Support           Morgan    Kilo-3
Failover               Morgan    Kilo-3
Slave Count            Peter     Kilo-3
Incremental Snapshots  Peter     Kilo-3
====================== ========= =================


Dependencies
============

n/a

Testing
=======

The existing int-tests are believed to be sufficient for testing the
GTID replication changes, as there are no functionality changes, just
implementation changes.

New Int-Tests:

Promote to Master Positive

Create a new replication set of two sites. Attach floating IP
addresses to each instance. Execute the promote_to_replica_source
API call and verify that the master/slave relationships are
correctly changed, and that the floating IP addresses maintain
their affinity to master and slave.

Promote to Master Negative

Create a new replication set of two sites. Execute "service mysql
stop" on the master site. Verify that promote_to_replica_source
cannot be executed against the slave site.

Delete Master Positive

Create a new replication set of two sites. Attach floating IP
addresses to each instance. Execute "service mysql stop" on the
master to simulate the master site crashing. Execute the delete
API call against the master site. Ensure that the slave has been
promoted to master, a new slave has been added, and that the
floating IP addresses have been moved appropriately.

Replica Count

No int-test will be done for this feature due to the resource
requirements.

Incremental Snapshots

No int-test will be done for this feature as there is no way to
verify that the restore was actually done from an incremental
backup rather than a full backup.


Documentation Impact
====================

User Guide
----------

* add section explaining manual failover, both via
  promote-to-replica-source and via deletion of a failed master
* section on replication should be updated to document replica_count
  option to "trove create"

CLI Reference
-------------

* add promote-to-replica-source command
* add eject-replica-source command
* update create command with replica_count


References
==========

- https://etherpad.openstack.org/p/kilo-summit-trove-replication-v2