7b08a1b449
Specification outlining the design of Replication Features to be added to the Trove Kilo Release. Change-Id: If0a14416eaecc1ed5e78b3518ee4ed3fe6422a65 Implements: blueprint replication-v2
620 lines
21 KiB
ReStructuredText
620 lines
21 KiB
ReStructuredText
..
|
|
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
|
License.
|
|
|
|
http://creativecommons.org/licenses/by/3.0/legalcode
|
|
|
|
Sections of this template were taken directly from the Nova spec
|
|
template at:
|
|
https://github.com/openstack/nova-specs/blob/master/specs/template.rst
|
|
..
|
|
|
|
=======================
|
|
Trove Replication V2
|
|
=======================
|
|
|
|
Include the URL of your launchpad blueprint:
|
|
|
|
https://blueprints.launchpad.net/trove/bp/replication-v2
|
|
|
|
The Juno release of Trove laid the foundation of Trove Replication
|
|
support. The V1 version of replication focused on providing read-only
|
|
slave replication in MySQL 5.5. For the V2 replication release for
|
|
Kilo, replication will be extended to provide support for manual
|
|
failover in MySQL replication leveraging the latest replication
|
|
features of MySQL 5.6.
|
|
|
|
Problem description
|
|
===================
|
|
|
|
For the Kilo release of OpenStack, trove replication support will be
|
|
extended to support manual failover when a replication master fails.
|
|
Specifically, this means that a user can instruct Trove to demote a
|
|
replication master and promote a slave to be the new master. For V2,
|
|
manual promotion means that the user will be required to execute an
|
|
action to cause failover - a component to detect failure and cause the
|
|
failover to occur will not be within the scope of V2.
|
|
|
|
|
|
Proposed change
|
|
===============
|
|
|
|
Supported Features:
|
|
|
|
* manual failover
|
|
* master/slaves in different availability zones
|
|
* automatic slave generation to replace slaves promoted to master
|
|
* automatically generated slave will be created in the same az as the
|
|
slave that was promoted to master
|
|
* public ips assigned to deleted/demoted master will be transferred to
|
|
new master
|
|
* public ips of promoted slave will be transferred to new slave
|
|
* GTIDs will be used to facilitate master promotion (Note: this limits
|
|
feature set to MySQL 5.6 and later)
|
|
* if a master site is reachable, a chosen slave may be promoted to
|
|
master and the old master will be demoted to a slave. This
|
|
operation will be done in such a way as to prevent the loss of data.
|
|
This operation would be useful for resizing a master without
|
|
downtime.
|
|
* a master site may be deleted, in which case Trove will pick a slave
|
|
to be promoted to master (see MASTER_PROMOTION_STRATEGY below) and a
|
|
new slave will be generated to replace the promoted slave. If the
|
|
master site in not reachable, it will be forcefully removed from
|
|
Trove/Nova; this is how an unreachable master would be "failed
|
|
over".
|
|
* new master selection process on delete has following
|
|
MASTER_PROMOTION_STRATEGY (CONF) switch: MOST_RECENT: the slave with
|
|
the most recent updates is chosen as new master, PROXIMATE_AZ: slave
|
|
IN MASTER's AZ with most recent updates is chosen as new master,
|
|
PROXIMATE_REGION: slave IN MASTER's REGION with most recent update
|
|
is chosen as new master. PROXIMATE_REGION will be the default
|
|
(though for now equivalent to MOST_RECENT) and may be the only
|
|
implemented option for V2.
|
|
* replication from existing backup and incremental snapshot will be
|
|
implemented
|
|
* replica_count option will be added to create-instance to allow N
|
|
slaves to be spun up from a given snapshot. All replicas from the
|
|
given snapshot will have the same "create-instance" options.
|
|
|
|
Features Not Supported:
|
|
|
|
* automatic failover
|
|
* region support
|
|
* writable slaves
|
|
* features related to the promotion of slaves to masters will not be
|
|
supported by MySQL versions prior to 5.6
|
|
* replication_strategy per datastore - this could be implemented in
|
|
Kilo via an independent blueprint
|
|
* GTID based replication for MariaDB (binlog replication will not be
|
|
tested for MariaDB, but should be compatible with MySQL)
|
|
* host affinity/anti-affinity
|
|
* dealing with "error transactions" created when updates are executed
|
|
directly on slaves in conflict with changes on the master.
|
|
Performing updates directly on slaves is not supported by Trove and
|
|
slave sites will be put into "read only" mode.
|
|
|
|
Replication V2 Components
|
|
-------------------------
|
|
|
|
The V2 Replication feature will consist of several components:
|
|
|
|
- Implement a new replication strategy to support GTID Based
|
|
Replication in addition to Bin Log replication.
|
|
- Manual failover from replication master
|
|
- Replication configuration using incremental snapshots based on
|
|
existing backups.
|
|
- Creation of multiple slaves from master in single call
|
|
|
|
Upgrade from Binlog Replication to GTID Based Replication
|
|
*********************************************************
|
|
|
|
MySQL 5.6 introduced a new type of replication which is based on
|
|
Global Transaction IDs (GTID). By assigning a GTID to each
|
|
transaction, MySQL is able to simplify transaction coordination
|
|
between masters and slaves, allowing for simpler and more reliable
|
|
failover to a new master.
|
|
|
|
This feature requires that the trove-integration project upgrade to
|
|
use Ubuntu 14.04 and MySQL 5.6.
|
|
|
|
A new Replication Strategy named "MysqlGTIDReplicationStrategy" will
|
|
be created to support the new GTID based replication with MySQL 5.6
|
|
and later, and the existing Replication Strategy named
|
|
"MysqlBinlogReplication" will continue to be supported for MySQL 5.5
|
|
but without support for the new features listed in this document.
|
|
|
|
|
|
Manual Failover from Replication Master
|
|
***************************************
|
|
|
|
It will be possible for a user to cause a slave to become the new
|
|
master for replication by executing a trove command. For the V2
|
|
release of replication, no facility for detecting a master failure
|
|
condition will be provided.
|
|
|
|
To assist the user in minimizing data loss, there will be two
|
|
different ways for the user to cause a slave to be promoted to master.
|
|
If the user wishes to promote a slave to replace a master which is
|
|
healthy and reachable, they will execute a new
|
|
"promote_to_replica_source" function against a slave to promote it in
|
|
place of the existing master; this function will coordinate with the
|
|
master site to ensure that no data is lost. If a master site is
|
|
unreachable, the user will use the "eject_replica_source" function to
|
|
remove that instance from the replication set and the replication
|
|
strategy will choose the slave with the most recent updates to promote
|
|
to master; this operation may result in the loss of any transactions
|
|
that were committed at the master site but not replicated to any of
|
|
the slaves. Trove will not allow a reachable master site to be
|
|
deleted as that would unnecessarily result in lost data.
|
|
|
|
There will be no accomodation made to allow users or operators to
|
|
"fix" slaves which have gotten out of sync with the master site.
|
|
Instead, every effort will be made to configure replication so that
|
|
the slave will not fall out of sync with the master. The following
|
|
MySQL options will be set to ensure safe replication:
|
|
|
|
*Master Options*
|
|
|
|
* Binary logs will be configured for MIXED mode logging. This will
|
|
allow statement based replication where it is safe to do so, and row
|
|
based replication will be used where necessary.
|
|
* The enforce_gtid_consistency option will be used to prevent
|
|
statements which will conflict with the use of GTID replication.
|
|
* When the Percona database is being used, the Percona
|
|
enforce_storage_engine option will be used to restrict replication
|
|
to the InnoDB storage engine. This is to prevent the use of MyISAM
|
|
tables which could be corrupted during a crash recovery.
|
|
|
|
*Slave Options*
|
|
|
|
* Slave will execute in READ_ONLY mode to avoid transaction conflicts
|
|
between master and slave. By default, users are not given root
|
|
access to the database; if they choose to enable root access, they
|
|
are assumed to be sufficiently advanced as to not execute operations
|
|
on a slave which will disturb replication.
|
|
* The slaves' relay log will be stored in a table in the database to
|
|
provide transactional consistency between the statements executed
|
|
against the database and the recording of the slave's position in
|
|
executing the relay log.
|
|
* Relay log recovery will be turned on to cause relay log recovery
|
|
during mysql startup. relay_log_purge will be enabled in support
|
|
for relay_log_recovery.
|
|
|
|
Promotion of Slave to Master
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The user may select a slave to be promoted to be the new master of a
|
|
replication set. This operation would consist of the following steps:
|
|
|
|
#. Contact each slave, abort operation if any not reachable
|
|
#. Make the old master read-only
|
|
#. Detach old master's public IP
|
|
#. Detach master candidate's public IP
|
|
#. Record latest GTID of master
|
|
#. For each slave (including master candidate)
|
|
|
|
* Wait for slave to receive/apply master's latest GTID
|
|
#. Set master candidate as replication master site
|
|
#. For each remaining slave
|
|
|
|
* Make instance slave of new master
|
|
#. Make old master be slave of new master
|
|
#. Assign master candidate's IP to old master (which is now slave)
|
|
#. Make new master writable
|
|
#. Assign old master's public IP to new master
|
|
|
|
*Promote to Master API*
|
|
|
|
To replace a healthy master site, the promote_to_replica_source API
|
|
call will be added to the client and taskmanager APIs.
|
|
|
|
Ejection of Master Site
|
|
^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
If a replication master site is out of service, the user may choose to
|
|
"eject" the instance from the replication set. Ejecting an
|
|
unreachable instance which is a master for replication would result in
|
|
one of its slaves being chosen to be promoted to be the new master
|
|
site, and a new slave generated to fill out the replication set. The
|
|
ejected master will be available for examination, but will no longer
|
|
participate in replication. This operation would consist of the
|
|
following steps:
|
|
|
|
#. Abort operation if the master site can be contacted
|
|
#. Contact each slave, abort operation if any not reachable
|
|
#. Detach master's public IP
|
|
#. Record master's Region/Zone
|
|
#. Select master candidate (see Master Candidate Selection)
|
|
#. Switch the master candidate from slave to master
|
|
#. For each remaining slave
|
|
|
|
* Connect slave to new master instance
|
|
#. Mark new master as writable
|
|
#. Attach master's public IP to new master
|
|
#. Create new slave in same Region/Zone as old master
|
|
#. Assign master candidate's public IP to new slave
|
|
|
|
*Master Candidate Selection*
|
|
|
|
When selecting a slave to be promoted to master to replace an
|
|
unreachable master site, the algorithm for choosing the master
|
|
candidate will be determined by the value of the
|
|
MASTER_PROMOTION_STRATEGY configuration option of the Taskmanager
|
|
Config (not datastore specific). The possible values for this option
|
|
are outlined below:
|
|
|
|
================ =================================================
|
|
Strategy Description
|
|
================ =================================================
|
|
MOST_RECENT The slave with the highest GTID is chosen as the
|
|
master candidate
|
|
PROXIMATE_AZ The slave with the highest GTID in the same
|
|
Availability Zone as the old master is chosen
|
|
PROXIMATE_REGION The slave with the highest GTID in the same
|
|
Region as the old master is chosen
|
|
================ =================================================
|
|
|
|
The PROXIMATE_REGION setting will be the default as this will ensure
|
|
that the new master site will be in the same region as the old master;
|
|
for the Kilo release, this will be equivalent to the MOST_RECENT
|
|
option (and may be implemented as such) as Region support is not
|
|
implemented in Trove.
|
|
|
|
|
|
Incremental Snapshots
|
|
*********************
|
|
|
|
To improve the performance of slave creation, the default action will
|
|
be to take the most recent backup (full or incremental) and create an
|
|
incremental backup to be used for the replication snapshot. If no
|
|
previous backup can be found, a full backup will be created to include
|
|
in the replication snapshot. Should the "backup" option be specified
|
|
in addition to the "replica_of" option, an incremental backup will be
|
|
performed from the indicated backup.
|
|
|
|
|
|
Multiple Slave Creation
|
|
***********************
|
|
|
|
A replica_count option will be added to support the creation of multiple
|
|
slaves from a single replication snapshot.
|
|
|
|
* a replica_count option will be added to the ``trove create`` command
|
|
* a replica_count parameter will be added to the create_instance
|
|
taskmanager ReST API
|
|
* the taskmanager FreshInstanceTasks.create_instance method will
|
|
iteratively create the specified number of slaves from a single
|
|
replication snapshot (the implementor is free to implement slave
|
|
creation in parallel if time permits, and should investigate doing
|
|
so, but it is not a requirement for V2)
|
|
|
|
Configuration
|
|
-------------
|
|
|
|
MysqlGTIDReplicationStrategy value added to ReplicationStategy option
|
|
for MySQL configuration.
|
|
|
|
New configuration option master_promotion_strategy added to MySQL
|
|
configuration with values as above.
|
|
|
|
Database
|
|
--------
|
|
|
|
No database impacts are envisioned.
|
|
|
|
|
|
Public API
|
|
----------
|
|
|
|
*Promote to Replica Source*
|
|
|
|
A new action will be added to the Trove REST API to allow a replica to
|
|
be promoted to be the master of its replication set::
|
|
|
|
POST http://127.0.0.1:8779/v1.0/<tenant id>/instance/<instance id>/action
|
|
{
|
|
*"promote_to_replica_source": null*
|
|
}
|
|
|
|
RESP: [200]
|
|
{
|
|
'date': '<date>',
|
|
'content-length': '<RESP BODY len>',
|
|
'content-type': 'application/json'
|
|
}
|
|
RESP BODY:
|
|
{
|
|
"instance": {
|
|
*"status": "PROMOTE",*
|
|
"updated": "2014-11-25T21:25:11",
|
|
"name": "m",
|
|
"links": [
|
|
{
|
|
"href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...",
|
|
"rel": "self"
|
|
},
|
|
{
|
|
"href": "https:\/\/10.40.10.178:8779\/instances\/...",
|
|
"rel": "bookmark"
|
|
}
|
|
],
|
|
"created": "2014-11-25T21:25:06",
|
|
"ip": [
|
|
"10.0.0.2"
|
|
],
|
|
"replicas": [
|
|
{
|
|
"id": "8e5710df-ef39-4201-a059-764d9091f079",
|
|
"links": [
|
|
{
|
|
"href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...",
|
|
"rel": "self"
|
|
},
|
|
{
|
|
"href": "https:\/\/10.40.10.178:8779\/instances\/...",
|
|
"rel": "bookmark"
|
|
}
|
|
]
|
|
}
|
|
],
|
|
"id": "fff6d8c5-9d05-4a00-ab58-d8954ec945a3",
|
|
"volume": {
|
|
"used": 0.13,
|
|
"size": 1
|
|
},
|
|
"flavor": {
|
|
"id": "7",
|
|
"links": [
|
|
{
|
|
"href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/flavors\/7",
|
|
"rel": "self"
|
|
},
|
|
{
|
|
"href": "https:\/\/10.40.10.178:8779\/flavors\/7",
|
|
"rel": "bookmark"
|
|
}
|
|
]
|
|
},
|
|
"datastore": {
|
|
"version": "5.5",
|
|
"type": "mysql"
|
|
}
|
|
}
|
|
}
|
|
|
|
A new CLI command will be added to invoke the
|
|
promote_to_replica_source API::
|
|
|
|
trove promote-to-replica-source <replica id>
|
|
|
|
*Eject Replica Source*
|
|
|
|
A new action will be added to the Trove REST API to allow a replica
|
|
source to be ejected from a replication set::
|
|
|
|
POST http://127.0.0.1:8779/v1.0/<tenant id>/instance/<instance id>/action
|
|
{
|
|
*"eject_replica_source": null*
|
|
}
|
|
|
|
RESP: [200]
|
|
{
|
|
'date': '<date>',
|
|
'content-length': '<RESP BODY len>',
|
|
'content-type': 'application/json'
|
|
}
|
|
RESP BODY:
|
|
{
|
|
"instance": {
|
|
*"status": "EJECT",*
|
|
"updated": "2014-11-25T21:25:11",
|
|
"name": "m",
|
|
"links": [
|
|
{
|
|
"href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...",
|
|
"rel": "self"
|
|
},
|
|
{
|
|
"href": "https:\/\/10.40.10.178:8779\/instances\/...",
|
|
"rel": "bookmark"
|
|
}
|
|
],
|
|
"created": "2014-11-25T21:25:06",
|
|
"ip": [
|
|
"10.0.0.2"
|
|
],
|
|
"replicas": [
|
|
{
|
|
"id": "8e5710df-ef39-4201-a059-764d9091f079",
|
|
"links": [
|
|
{
|
|
"href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...",
|
|
"rel": "self"
|
|
},
|
|
{
|
|
"href": "https:\/\/10.40.10.178:8779\/instances\/...",
|
|
"rel": "bookmark"
|
|
}
|
|
]
|
|
}
|
|
],
|
|
"id": "fff6d8c5-9d05-4a00-ab58-d8954ec945a3",
|
|
"volume": {
|
|
"used": 0.13,
|
|
"size": 1
|
|
},
|
|
"flavor": {
|
|
"id": "7",
|
|
"links": [
|
|
{
|
|
"href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/flavors\/7",
|
|
"rel": "self"
|
|
},
|
|
{
|
|
"href": "https:\/\/10.40.10.178:8779\/flavors\/7",
|
|
"rel": "bookmark"
|
|
}
|
|
]
|
|
},
|
|
"datastore": {
|
|
"version": "5.5",
|
|
"type": "mysql"
|
|
}
|
|
}
|
|
}
|
|
|
|
A new CLI command will be added to invoke the eject_replica_source
|
|
API::
|
|
|
|
trove eject-replica-source <replica source id>
|
|
|
|
|
|
*Trove Create Replica Count*
|
|
|
|
The Trove REST API for the create instance operation will be augmented
|
|
with a new field *replica_count* to specify the number of replicas to
|
|
be created from the indicated instance::
|
|
|
|
POST http://127.0.0.1:8779/v1.0/<tenant id>/instances
|
|
{
|
|
"instance": {
|
|
"volume": {"size": 1},
|
|
"flavorRef": "7",
|
|
"name": "s",
|
|
"replica_of": "<master id>",
|
|
*"replica_count": "<n>"*
|
|
}
|
|
}
|
|
|
|
RESP *unchanged*
|
|
|
|
An option will be added to the "trove create" CLI command to specify
|
|
the new replica count option::
|
|
|
|
trove create <name> <flavor id> --replica_count=<count> ...
|
|
|
|
|
|
Internal API
|
|
------------
|
|
|
|
promote_to_replica_source method added to taskmanager API.
|
|
eject_replica_source method added to taskmanager API.
|
|
|
|
Guest Agent
|
|
-----------
|
|
|
|
The implementation of this feature set will result in many additions
|
|
to the MySQL guest agent. There should be minimal impact to
|
|
pre-existing code, and there is not expected to be any impact on
|
|
backward compatibility of the APIs.
|
|
|
|
Alternatives
|
|
------------
|
|
|
|
None.
|
|
|
|
Implementation
|
|
==============
|
|
|
|
Assignee(s)
|
|
-----------
|
|
|
|
Primary assignee:
|
|
vgnbkr
|
|
|
|
Secondary assignee:
|
|
peterstac
|
|
|
|
Milestones
|
|
----------
|
|
|
|
Target Milestone for completion:
|
|
Kilo-2
|
|
|
|
Work Items
|
|
----------
|
|
|
|
====================== ========= =================
|
|
Work Item Assignee Scheduled Release
|
|
====================== ========= =================
|
|
GTID Support Morgan Kilo-3
|
|
Failover Morgan Kilo-3
|
|
Slave Count Peter Kilo-3
|
|
Incremental Snapshots Peter Kilo-3
|
|
====================== ========= =================
|
|
|
|
|
|
Dependencies
|
|
============
|
|
|
|
n/a
|
|
|
|
Testing
|
|
=======
|
|
|
|
The existing int-tests are believed to be sufficient for testing the
|
|
GTID replication changes, as there are no functionality changes, just
|
|
implementation changes.
|
|
|
|
New Int-Tests:
|
|
|
|
Promote to Master Positive
|
|
|
|
Create a new replication set of two sites. Attach floating ip
|
|
addresses to each instance. Execute the promote_to_replica_source
|
|
API call and verify that the master/slave relationships are
|
|
correctly changed, and that the floating ip addresses maintain
|
|
their affinity to master and slave.
|
|
|
|
Promote to Master Negative
|
|
|
|
Create a new replication set of two sites. Execute "service mysql
|
|
stop" on the master site. Verify that promote_to_replica_source
|
|
cannot be executed against the slave site.
|
|
|
|
Delete Master Positive
|
|
|
|
Create a new replication set of two sites. Attach floating ip
|
|
addresses to each instance. Execute "service mysql stop" on the
|
|
master to simulate the master site crashing. Execute the delete
|
|
API call against the master site. Ensure that the slave has been
|
|
promoted to master, a new slave has been added, and that the
|
|
floating ip addresses have been moved appropriately.
|
|
|
|
Replica Count
|
|
|
|
No int-test will be done for this feature due to the resource
|
|
requirements
|
|
|
|
Incremental Snapshots
|
|
|
|
No int-test will be done for this feature as there is no way to
|
|
verify that the restore was actually done from an incremental
|
|
backup rather than a full backup
|
|
|
|
|
|
Documentation Impact
|
|
====================
|
|
|
|
User Guide
|
|
----------
|
|
|
|
* add section explaining manual failover, both via
|
|
promote-to-replica-source and via deletion of a failed master
|
|
* section on replication should be updated to document replica_count
|
|
option to "trove create"
|
|
|
|
CLI Reference
|
|
-------------
|
|
|
|
* add promote-to-replica-source command
|
|
* add eject-replica-source command
|
|
* update create command with replica_count
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
- https://etherpad.openstack.org/p/kilo-summit-trove-replication-v2
|
|
|