From 7b08a1b4491dbf15462a443298e80f3d1e685d42 Mon Sep 17 00:00:00 2001
From: Morgan Jones
Date: Fri, 30 Jan 2015 13:17:45 -0800
Subject: [PATCH] Trove Replication V2

Specification outlining the design of Replication Features to be added
to the Trove Kilo Release.

Change-Id: If0a14416eaecc1ed5e78b3518ee4ed3fe6422a65
Implements: blueprint replication-v2
---
 specs/replication-v2.rst | 619 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 619 insertions(+)
 create mode 100644 specs/replication-v2.rst

diff --git a/specs/replication-v2.rst b/specs/replication-v2.rst
new file mode 100644
index 0000000..957dc92
--- /dev/null
+++ b/specs/replication-v2.rst
@@ -0,0 +1,619 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+ Sections of this template were taken directly from the Nova spec
+ template at:
+ https://github.com/openstack/nova-specs/blob/master/specs/template.rst
+..
+
+=======================
+Trove Replication V2
+=======================
+
+Include the URL of your launchpad blueprint:
+
+https://blueprints.launchpad.net/trove/bp/replication-v2
+
+The Juno release of Trove laid the foundation of Trove Replication
+support. The V1 version of replication focused on providing read-only
+slave replication in MySQL 5.5. For the V2 replication release for
+Kilo, replication will be extended to provide support for manual
+failover in MySQL replication leveraging the latest replication
+features of MySQL 5.6.
+
+Problem description
+===================
+
+For the Kilo release of OpenStack, trove replication support will be
+extended to support manual failover when a replication master fails.
+Specifically, this means that a user can instruct Trove to demote a
+replication master and promote a slave to be the new master.
+For V2, manual promotion means that the user will be required to execute an
+action to cause failover - a component to detect failure and cause the
+failover to occur will not be within the scope of V2.
+
+
+Proposed change
+===============
+
+Supported Features:
+
+* manual failover
+* master/slaves in different availability zones
+* automatic slave generation to replace slaves promoted to master
+* automatically generated slave will be created in the same AZ as the
+  slave that was promoted to master
+* public IPs assigned to deleted/demoted master will be transferred to
+  new master
+* public IPs of promoted slave will be transferred to new slave
+* GTIDs will be used to facilitate master promotion (Note: this limits
+  feature set to MySQL 5.6 and later)
+* if a master site is reachable, a chosen slave may be promoted to
+  master and the old master will be demoted to a slave. This
+  operation will be done in such a way as to prevent the loss of data.
+  This operation would be useful for resizing a master without
+  downtime.
+* a master site may be deleted, in which case Trove will pick a slave
+  to be promoted to master (see MASTER_PROMOTION_STRATEGY below) and a
+  new slave will be generated to replace the promoted slave. If the
+  master site is not reachable, it will be forcefully removed from
+  Trove/Nova; this is how an unreachable master would be "failed
+  over".
+* new master selection process on delete has the following
+  MASTER_PROMOTION_STRATEGY (CONF) switch: MOST_RECENT: the slave with
+  the most recent updates is chosen as new master, PROXIMATE_AZ: slave
+  IN MASTER's AZ with most recent updates is chosen as new master,
+  PROXIMATE_REGION: slave IN MASTER's REGION with most recent update
+  is chosen as new master. PROXIMATE_REGION will be the default
+  (though for now equivalent to MOST_RECENT) and may be the only
+  implemented option for V2.
+* replication from existing backup and incremental snapshot will be
+  implemented
+* replica_count option will be added to create-instance to allow N
+  slaves to be spun up from a given snapshot. All replicas from the
+  given snapshot will have the same "create-instance" options.
+
+Features Not Supported:
+
+* automatic failover
+* region support
+* writable slaves
+* features related to the promotion of slaves to masters will not be
+  supported by MySQL versions prior to 5.6
+* replication_strategy per datastore - this could be implemented in
+  Kilo via an independent blueprint
+* GTID based replication for MariaDB (binlog replication will not be
+  tested for MariaDB, but should be compatible with MySQL)
+* host affinity/anti-affinity
+* dealing with "error transactions" created when updates are executed
+  directly on slaves in conflict with changes on the master.
+  Performing updates directly on slaves is not supported by Trove and
+  slave sites will be put into "read only" mode.
+
+Replication V2 Components
+-------------------------
+
+The V2 Replication feature will consist of several components:
+
+- Implement a new replication strategy to support GTID Based
+  Replication in addition to Bin Log replication.
+- Manual failover from replication master
+- Replication configuration using incremental snapshots based on
+  existing backups.
+- Creation of multiple slaves from master in single call
+
+Upgrade from Binlog Replication to GTID Based Replication
+*********************************************************
+
+MySQL 5.6 introduced a new type of replication which is based on
+Global Transaction IDs (GTID). By assigning a GTID to each
+transaction, MySQL is able to simplify transaction coordination
+between masters and slaves, allowing for simpler and more reliable
+failover to a new master.
+
+This feature requires that the trove-integration project upgrade to
+use Ubuntu 14.04 and MySQL 5.6.
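
As an illustration of why GTIDs simplify failover, the sketch below
contrasts the statement a slave would issue to attach to a master under
binlog replication (explicit log file and offset, which differ on every
server) with the GTID form, where MySQL 5.6's ``MASTER_AUTO_POSITION``
lets the slave work out its own starting point. The function name, its
parameters and the sample host and credentials are hypothetical and are
not part of Trove; this is a minimal sketch, not the proposed
implementation.

```python
def change_master_sql(host, port, user, password,
                      binlog_file=None, binlog_pos=None):
    """Build a CHANGE MASTER TO statement for a slave (illustrative only).

    Values are interpolated directly for readability; real code should
    escape them or pass them as query parameters.
    """
    parts = [
        "MASTER_HOST='%s'" % host,
        "MASTER_PORT=%d" % port,
        "MASTER_USER='%s'" % user,
        "MASTER_PASSWORD='%s'" % password,
    ]
    if binlog_file is None:
        # GTID mode: the slave negotiates its own starting point.
        parts.append("MASTER_AUTO_POSITION=1")
    else:
        # Binlog mode: coordinates are specific to one master's log files.
        parts.append("MASTER_LOG_FILE='%s'" % binlog_file)
        parts.append("MASTER_LOG_POS=%d" % binlog_pos)
    return "CHANGE MASTER TO " + ", ".join(parts)


# Hypothetical host and credentials, GTID form.
print(change_master_sql("10.0.0.2", 3306, "repl", "secret"))
```

Because the GTID form carries no per-master coordinates, the same
statement can be reused when a slave is re-pointed at a newly promoted
master.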
+
+A new Replication Strategy named "MysqlGTIDReplicationStrategy" will
+be created to support the new GTID based replication with MySQL 5.6
+and later, and the existing Replication Strategy named
+"MysqlBinlogReplication" will continue to be supported for MySQL 5.5
+but without support for the new features listed in this document.
+
+
+Manual Failover from Replication Master
+***************************************
+
+It will be possible for a user to cause a slave to become the new
+master for replication by executing a trove command. For the V2
+release of replication, no facility for detecting a master failure
+condition will be provided.
+
+To assist the user in minimizing data loss, there will be two
+different ways for the user to cause a slave to be promoted to master.
+If the user wishes to promote a slave to replace a master which is
+healthy and reachable, they will execute a new
+"promote_to_replica_source" function against a slave to promote it in
+place of the existing master; this function will coordinate with the
+master site to ensure that no data is lost. If a master site is
+unreachable, the user will use the "eject_replica_source" function to
+remove that instance from the replication set and the replication
+strategy will choose the slave with the most recent updates to promote
+to master; this operation may result in the loss of any transactions
+that were committed at the master site but not replicated to any of
+the slaves. Trove will not allow a reachable master site to be
+deleted as that would unnecessarily result in lost data.
+
+There will be no accommodation made to allow users or operators to
+"fix" slaves which have gotten out of sync with the master site.
+Instead, every effort will be made to configure replication so that
+the slave will not fall out of sync with the master. The following
+MySQL options will be set to ensure safe replication:
+
+*Master Options*
+
+* Binary logs will be configured for MIXED mode logging.
+  This will allow statement based replication where it is safe to do so,
+  and row based replication will be used where necessary.
+* The enforce_gtid_consistency option will be used to prevent
+  statements which will conflict with the use of GTID replication.
+* When the Percona database is being used, the Percona
+  enforce_storage_engine option will be used to restrict replication
+  to the InnoDB storage engine. This is to prevent the use of MyISAM
+  tables which could be corrupted during a crash recovery.
+
+*Slave Options*
+
+* Slave will execute in READ_ONLY mode to avoid transaction conflicts
+  between master and slave. By default, users are not given root
+  access to the database; if they choose to enable root access, they
+  are assumed to be sufficiently advanced as to not execute operations
+  on a slave which will disturb replication.
+* The slave's relay log position information will be stored in a table
+  in the database to provide transactional consistency between the
+  statements executed against the database and the recording of the
+  slave's position in executing the relay log.
+* Relay log recovery will be turned on so that the relay logs are
+  recovered during MySQL startup. relay_log_purge will be enabled in
+  support of relay_log_recovery.
+
+Promotion of Slave to Master
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The user may select a slave to be promoted to be the new master of a
+replication set. This operation would consist of the following steps:
+
+#. Contact each slave, abort operation if any not reachable
+#. Make the old master read-only
+#. Detach old master's public IP
+#. Detach master candidate's public IP
+#. Record latest GTID of master
+#. For each slave (including master candidate)
+
+   * Wait for slave to receive/apply master's latest GTID
+#. Set master candidate as replication master site
+#. For each remaining slave
+
+   * Make instance slave of new master
+#. Make old master be slave of new master
+#.
+   Assign master candidate's IP to old master (which is now slave)
+#. Make new master writable
+#. Assign old master's public IP to new master
+
+*Promote to Master API*
+
+To replace a healthy master site, the promote_to_replica_source API
+call will be added to the client and taskmanager APIs.
+
+Ejection of Master Site
+^^^^^^^^^^^^^^^^^^^^^^^
+
+If a replication master site is out of service, the user may choose to
+"eject" the instance from the replication set. Ejecting an
+unreachable instance which is a master for replication would result in
+one of its slaves being chosen to be promoted to be the new master
+site, and a new slave generated to fill out the replication set. The
+ejected master will be available for examination, but will no longer
+participate in replication. This operation would consist of the
+following steps:
+
+#. Abort operation if the master site can be contacted
+#. Contact each slave, abort operation if any not reachable
+#. Detach master's public IP
+#. Record master's Region/Zone
+#. Select master candidate (see Master Candidate Selection)
+#. Switch the master candidate from slave to master
+#. For each remaining slave
+
+   * Connect slave to new master instance
+#. Mark new master as writable
+#. Attach master's public IP to new master
+#. Create new slave in same Region/Zone as old master
+#. Assign master candidate's public IP to new slave
+
+*Master Candidate Selection*
+
+When selecting a slave to be promoted to master to replace an
+unreachable master site, the algorithm for choosing the master
+candidate will be determined by the value of the
+MASTER_PROMOTION_STRATEGY configuration option of the Taskmanager
+Config (not datastore specific).
+The possible values for this option are outlined below:
+
+================ =================================================
+Strategy         Description
+================ =================================================
+MOST_RECENT      The slave with the highest GTID is chosen as the
+                 master candidate
+PROXIMATE_AZ     The slave with the highest GTID in the same
+                 Availability Zone as the old master is chosen
+PROXIMATE_REGION The slave with the highest GTID in the same
+                 Region as the old master is chosen
+================ =================================================
+
+The PROXIMATE_REGION setting will be the default as this will ensure
+that the new master site will be in the same region as the old master;
+for the Kilo release, this will be equivalent to the MOST_RECENT
+option (and may be implemented as such) as Region support is not
+implemented in Trove.
+
+
+Incremental Snapshots
+*********************
+
+To improve the performance of slave creation, the default action will
+be to take the most recent backup (full or incremental) and create an
+incremental backup to be used for the replication snapshot. If no
+previous backup can be found, a full backup will be created to include
+in the replication snapshot. Should the "backup" option be specified
+in addition to the "replica_of" option, an incremental backup will be
+performed from the indicated backup.
+
+
+Multiple Slave Creation
+***********************
+
+A replica_count option will be added to support the creation of multiple
+slaves from a single replication snapshot.
+
+* a replica_count option will be added to the ``trove create`` command
+* a replica_count parameter will be added to the create_instance
+  taskmanager REST API
+* the taskmanager FreshInstanceTasks.create_instance method will
+  iteratively create the specified number of slaves from a single
+  replication snapshot (the implementor is free to implement slave
+  creation in parallel if time permits, and should investigate doing
+  so, but it is not a requirement for V2)
+
+Configuration
+-------------
+
+MysqlGTIDReplicationStrategy value added to ReplicationStrategy option
+for MySQL configuration.
+
+New configuration option master_promotion_strategy added to MySQL
+configuration with values as above.
+
+Database
+--------
+
+No database impacts are envisioned.
+
+
+Public API
+----------
+
+*Promote to Replica Source*
+
+A new action will be added to the Trove REST API to allow a replica to
+be promoted to be the master of its replication set::
+
+    POST http://127.0.0.1:8779/v1.0//instance//action
+    {
+      *"promote_to_replica_source": null*
+    }
+
+    RESP: [200]
+    {
+      'date': '',
+      'content-length': '',
+      'content-type': 'application/json'
+    }
+    RESP BODY:
+    {
+      "instance": {
+        *"status": "PROMOTE",*
+        "updated": "2014-11-25T21:25:11",
+        "name": "m",
+        "links": [
+          {
+            "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...",
+            "rel": "self"
+          },
+          {
+            "href": "https:\/\/10.40.10.178:8779\/instances\/...",
+            "rel": "bookmark"
+          }
+        ],
+        "created": "2014-11-25T21:25:06",
+        "ip": [
+          "10.0.0.2"
+        ],
+        "replicas": [
+          {
+            "id": "8e5710df-ef39-4201-a059-764d9091f079",
+            "links": [
+              {
+                "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...",
+                "rel": "self"
+              },
+              {
+                "href": "https:\/\/10.40.10.178:8779\/instances\/...",
+                "rel": "bookmark"
+              }
+            ]
+          }
+        ],
+        "id": "fff6d8c5-9d05-4a00-ab58-d8954ec945a3",
+        "volume": {
+          "used": 0.13,
+          "size": 1
+        },
+        "flavor": {
+          "id": "7",
+          "links": [
+            {
+              "href":
+                "https:\/\/10.40.10.178:8779\/v1.0\/...\/flavors\/7",
+              "rel": "self"
+            },
+            {
+              "href": "https:\/\/10.40.10.178:8779\/flavors\/7",
+              "rel": "bookmark"
+            }
+          ]
+        },
+        "datastore": {
+          "version": "5.5",
+          "type": "mysql"
+        }
+      }
+    }
+
+A new CLI command will be added to invoke the
+promote_to_replica_source API::
+
+    trove promote-to-replica-source
+
+*Eject Replica Source*
+
+A new action will be added to the Trove REST API to allow a replica
+source to be ejected from a replication set::
+
+    POST http://127.0.0.1:8779/v1.0//instance//action
+    {
+      *"eject_replica_source": null*
+    }
+
+    RESP: [200]
+    {
+      'date': '',
+      'content-length': '',
+      'content-type': 'application/json'
+    }
+    RESP BODY:
+    {
+      "instance": {
+        *"status": "EJECT",*
+        "updated": "2014-11-25T21:25:11",
+        "name": "m",
+        "links": [
+          {
+            "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...",
+            "rel": "self"
+          },
+          {
+            "href": "https:\/\/10.40.10.178:8779\/instances\/...",
+            "rel": "bookmark"
+          }
+        ],
+        "created": "2014-11-25T21:25:06",
+        "ip": [
+          "10.0.0.2"
+        ],
+        "replicas": [
+          {
+            "id": "8e5710df-ef39-4201-a059-764d9091f079",
+            "links": [
+              {
+                "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...",
+                "rel": "self"
+              },
+              {
+                "href": "https:\/\/10.40.10.178:8779\/instances\/...",
+                "rel": "bookmark"
+              }
+            ]
+          }
+        ],
+        "id": "fff6d8c5-9d05-4a00-ab58-d8954ec945a3",
+        "volume": {
+          "used": 0.13,
+          "size": 1
+        },
+        "flavor": {
+          "id": "7",
+          "links": [
+            {
+              "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/flavors\/7",
+              "rel": "self"
+            },
+            {
+              "href": "https:\/\/10.40.10.178:8779\/flavors\/7",
+              "rel": "bookmark"
+            }
+          ]
+        },
+        "datastore": {
+          "version": "5.5",
+          "type": "mysql"
+        }
+      }
+    }
+
+A new CLI command will be added to invoke the eject_replica_source
+API::
+
+    trove eject-replica-source
+
+
+*Trove Create Replica Count*
+
+The Trove REST API for the create instance operation will be augmented
+with a new field *replica_count* to specify the number of replicas to
+be
+created from the indicated instance::
+
+    POST http://127.0.0.1:8779/v1.0//instances
+    {
+      "instance": {
+        "volume": {"size": 1},
+        "flavorRef": "7",
+        "name": "s",
+        "replica_of": "",
+        *"replica_count": ""*
+      }
+    }
+
+    RESP *unchanged*
+
+An option will be added to the "trove create" CLI command to specify
+the new replica count option::
+
+    trove create --replica_count= ...
+
+
+Internal API
+------------
+
+promote_to_replica_source method added to taskmanager API.
+eject_replica_source method added to taskmanager API.
+
+Guest Agent
+-----------
+
+The implementation of this feature set will result in many additions
+to the MySQL guest agent. There should be minimal impact to
+pre-existing code, and there is not expected to be any impact on
+backward compatibility of the APIs.
+
+Alternatives
+------------
+
+None.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  vgnbkr
+
+Secondary assignee:
+  peterstac
+
+Milestones
+----------
+
+Target Milestone for completion:
+  Kilo-2
+
+Work Items
+----------
+
+====================== ========= =================
+Work Item              Assignee  Scheduled Release
+====================== ========= =================
+GTID Support           Morgan    Kilo-3
+Failover               Morgan    Kilo-3
+Slave Count            Peter     Kilo-3
+Incremental Snapshots  Peter     Kilo-3
+====================== ========= =================
+
+
+Dependencies
+============
+
+n/a
+
+Testing
+=======
+
+The existing int-tests are believed to be sufficient for testing the
+GTID replication changes, as there are no functionality changes, just
+implementation changes.
+
+New Int-Tests:
+
+Promote to Master Positive
+
+    Create a new replication set of two sites. Attach floating ip
+    addresses to each instance. Execute the promote_to_replica_source
+    API call and verify that the master/slave relationships are
+    correctly changed, and that the floating ip addresses maintain
+    their affinity to master and slave.
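
The outcome this positive test is expected to verify can be sketched
with a small in-memory model. The ``Instance`` class, the ``promote``
helper and the sample addresses below are hypothetical stand-ins and are
not Trove code; the sketch only models the role and public IP swaps from
the promotion steps and omits the GTID wait and the re-pointing of the
remaining slaves.

```python
class Instance(object):
    """Hypothetical stand-in for a Trove instance in a replication set."""

    def __init__(self, name, role, public_ip):
        self.name = name
        self.role = role                      # "master" or "slave"
        self.public_ip = public_ip
        self.read_only = (role == "slave")    # slaves run read-only


def promote(old_master, candidate):
    """Swap roles and public IPs between the master and the candidate."""
    old_master.read_only = True               # make the old master read-only
    master_ip = old_master.public_ip          # detach old master's public IP
    old_master.public_ip = None
    candidate_ip = candidate.public_ip        # detach candidate's public IP
    candidate.public_ip = None
    candidate.role = "master"                 # candidate becomes the master
    old_master.role = "slave"                 # old master becomes its slave
    old_master.public_ip = candidate_ip       # slave IP follows the slave role
    candidate.read_only = False               # make the new master writable
    candidate.public_ip = master_ip           # master IP follows the master role


m = Instance("m", "master", "172.24.4.10")
s = Instance("s", "slave", "172.24.4.11")
promote(m, s)
print(s.role, s.public_ip, m.role, m.public_ip)
```

The point of the model is that each floating IP stays attached to a
role rather than to an instance, which is what the test checks after
promotion.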
+
+Promote to Master Negative
+
+    Create a new replication set of two sites. Execute "service mysql
+    stop" on the master site. Verify that promote_to_replica_source
+    cannot be executed against the slave site.
+
+Delete Master Positive
+
+    Create a new replication set of two sites. Attach floating ip
+    addresses to each instance. Execute "service mysql stop" on the
+    master to simulate the master site crashing. Execute the delete
+    API call against the master site. Ensure that the slave has been
+    promoted to master, a new slave has been added, and that the
+    floating ip addresses have been moved appropriately.
+
+Replica Count
+
+    No int-test will be done for this feature due to the resource
+    requirements
+
+Incremental Snapshots
+
+    No int-test will be done for this feature as there is no way to
+    verify that the restore was actually done from an incremental
+    backup rather than a full backup
+
+
+Documentation Impact
+====================
+
+User Guide
+----------
+
+* add section explaining manual failover, both via
+  promote-to-replica-source and via deletion of a failed master
+* section on replication should be updated to document replica_count
+  option to "trove create"
+
+CLI Reference
+-------------
+
+* add promote-to-replica-source command
+* add eject-replica-source command
+* update create command with replica_count
+
+
+References
+==========
+
+- https://etherpad.openstack.org/p/kilo-summit-trove-replication-v2
+