.. This work is licensed under a Creative Commons Attribution 3.0 Unported License. http://creativecommons.org/licenses/by/3.0/legalcode Sections of this template were taken directly from the Nova spec template at: https://github.com/openstack/nova-specs/blob/master/specs/template.rst .. ======================= Trove Replication V2 ======================= Include the URL of your launchpad blueprint: https://blueprints.launchpad.net/trove/bp/replication-v2 The Juno release of Trove laid the foundation of Trove Replication support. The V1 version of replication focused on providing read-only slave replication in MySQL 5.5. For the V2 replication release for Kilo, replication will be extended to provide support for manual failover in MySQL replication leveraging the latest replication features of MySQL 5.6. Problem description =================== For the Kilo release of OpenStack, trove replication support will be extended to support manual failover when a replication master fails. Specifically, this means that a user can instruct Trove to demote a replication master and promote a slave to be the new master. For V2, manual promotion means that the user will be required to execute an action to cause failover - a component to detect failure and cause the failover to occur will not be within the scope of V2. Proposed change =============== Supported Features: * manual failover * master/slaves in different availability zones * automatic slave generation to replace slaves promoted to master * automatically generated slave will be created in the same az as the slave that was promoted to master * public ips assigned to deleted/demoted master will be transferred to new master * public ips of promoted slave will be transferred to new slave * GTIDs will be used to facilitate master promotion (Note: this limits feature set to MySQL 5.6 and later) * if a master site is reachable, a chosen slave may be promoted to master and the old master will be demoted to a slave. This operation will be done in such a way as to prevent the loss of data. This operation would be useful for resizing a master without downtime. * a master site may be deleted, in which case Trove will pick a slave to be promoted to master (see MASTER_PROMOTION_STRATEGY below) and a new slave will be generated to replace the promoted slave. If the master site in not reachable, it will be forcefully removed from Trove/Nova; this is how an unreachable master would be "failed over". * new master selection process on delete has following MASTER_PROMOTION_STRATEGY (CONF) switch: MOST_RECENT: the slave with the most recent updates is chosen as new master, PROXIMATE_AZ: slave IN MASTER's AZ with most recent updates is chosen as new master, PROXIMATE_REGION: slave IN MASTER's REGION with most recent update is chosen as new master. PROXIMATE_REGION will be the default (though for now equivalent to MOST_RECENT) and may be the only implemented option for V2. * replication from existing backup and incremental snapshot will be implemented * replica_count option will be added to create-instance to allow N slaves to be spun up from a given snapshot. All replicas from the given snapshot will have the same "create-instance" options. Features Not Supported: * automatic failover * region support * writable slaves * features related to the promotion of slaves to masters will not be supported by MySQL versions prior to 5.6 * replication_strategy per datastore - this could be implemented in Kilo via an independent blueprint * GTID based replication for MariaDB (binlog replication will not be tested for MariaDB, but should be compatible with MySQL) * host affinity/anti-affinity * dealing with "error transactions" created when updates are executed directly on slaves in conflict with changes on the master. Performing updates directly on slaves is not supported by Trove and slave sites will be put into "read only" mode. Replication V2 Components ------------------------- The V2 Replication feature will consist of several components: - Implement a new replication strategy to support GTID Based Replication in addition to Bin Log replication. - Manual failover from replication master - Replication configuration using incremental snapshots based on existing backups. - Creation of multiple slaves from master in single call Upgrade from Binlog Replication to GTID Based Replication ********************************************************* MySQL 5.6 introduced a new type of replication which is based on Global Transaction IDs (GTID). By assigning a GTID to each transaction, MySQL is able to simplify transaction coordination between masters and slaves, allowing for simpler and more reliable failover to a new master. This feature requires that the trove-integration project upgrade to use Ubuntu 14.04 and MySQL 5.6. A new Replication Strategy named "MysqlGTIDReplicationStrategy" will be created to support the new GTID based replication with MySQL 5.6 and later, and the existing Replication Strategy named "MysqlBinlogReplication" will continue to be supported for MySQL 5.5 but without support for the new features listed in this document. Manual Failover from Replication Master *************************************** It will be possible for a user to cause a slave to become the new master for replication by executing a trove command. For the V2 release of replication, no facility for detecting a master failure condition will be provided. To assist the user in minimizing data loss, there will be two different ways for the user to cause a slave to be promoted to master. If the user wishes to promote a slave to replace a master which is healthy and reachable, they will execute a new "promote_to_replica_source" function against a slave to promote it in place of the existing master; this function will coordinate with the master site to ensure that no data is lost. If a master site is unreachable, the user will use the "eject_replica_source" function to remove that instance from the replication set and the replication strategy will choose the slave with the most recent updates to promote to master; this operation may result in the loss of any transactions that were committed at the master site but not replicated to any of the slaves. Trove will not allow a reachable master site to be deleted as that would unnecessarily result in lost data. There will be no accomodation made to allow users or operators to "fix" slaves which have gotten out of sync with the master site. Instead, every effort will be made to configure replication so that the slave will not fall out of sync with the master. The following MySQL options will be set to ensure safe replication: *Master Options* * Binary logs will be configured for MIXED mode logging. This will allow statement based replication where it is safe to do so, and row based replication will be used where necessary. * The enforce_gtid_consistency option will be used to prevent statements which will conflict with the use of GTID replication. * When the Percona database is being used, the Percona enforce_storage_engine option will be used to restrict replication to the InnoDB storage engine. This is to prevent the use of MyISAM tables which could be corrupted during a crash recovery. *Slave Options* * Slave will execute in READ_ONLY mode to avoid transaction conflicts between master and slave. By default, users are not given root access to the database; if they choose to enable root access, they are assumed to be sufficiently advanced as to not execute operations on a slave which will disturb replication. * The slaves' relay log will be stored in a table in the database to provide transactional consistency between the statements executed against the database and the recording of the slave's position in executing the relay log. * Relay log recovery will be turned on to cause relay log recovery during mysql startup. relay_log_purge will be enabled in support for relay_log_recovery. Promotion of Slave to Master ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The user may select a slave to be promoted to be the new master of a replication set. This operation would consist of the following steps: #. Contact each slave, abort operation if any not reachable #. Make the old master read-only #. Detach old master's public IP #. Detach master candidate's public IP #. Record latest GTID of master #. For each slave (including master candidate) * Wait for slave to receive/apply master's latest GTID #. Set master candidate as replication master site #. For each remaining slave * Make instance slave of new master #. Make old master be slave of new master #. Assign master candidate's IP to old master (which is now slave) #. Make new master writable #. Assign old master's public IP to new master *Promote to Master API* To replace a healthy master site, the promote_to_replica_source API call will be added to the client and taskmanager APIs. Ejection of Master Site ^^^^^^^^^^^^^^^^^^^^^^^ If a replication master site is out of service, the user may choose to "eject" the instance from the replication set. Ejecting an unreachable instance which is a master for replication would result in one of its slaves being chosen to be promoted to be the new master site, and a new slave generated to fill out the replication set. The ejected master will be available for examination, but will no longer participate in replication. This operation would consist of the following steps: #. Abort operation if the master site can be contacted #. Contact each slave, abort operation if any not reachable #. Detach master's public IP #. Record master's Region/Zone #. Select master candidate (see Master Candidate Selection) #. Switch the master candidate from slave to master #. For each remaining slave * Connect slave to new master instance #. Mark new master as writable #. Attach master's public IP to new master #. Create new slave in same Region/Zone as old master #. Assign master candidate's public IP to new slave *Master Candidate Selection* When selecting a slave to be promoted to master to replace an unreachable master site, the algorithm for choosing the master candidate will be determined by the value of the MASTER_PROMOTION_STRATEGY configuration option of the Taskmanager Config (not datastore specific). The possible values for this option are outlined below: ================ ================================================= Strategy Description ================ ================================================= MOST_RECENT The slave with the highest GTID is chosen as the master candidate PROXIMATE_AZ The slave with the highest GTID in the same Availability Zone as the old master is chosen PROXIMATE_REGION The slave with the highest GTID in the same Region as the old master is chosen ================ ================================================= The PROXIMATE_REGION setting will be the default as this will ensure that the new master site will be in the same region as the old master; for the Kilo release, this will be equivalent to the MOST_RECENT option (and may be implemented as such) as Region support is not implemented in Trove. Incremental Snapshots ********************* To improve the performance of slave creation, the default action will be to take the most recent backup (full or incremental) and create an incremental backup to be used for the replication snapshot. If no previous backup can be found, a full backup will be created to include in the replication snapshot. Should the "backup" option be specified in addition to the "replica_of" option, an incremental backup will be performed from the indicated backup. Multiple Slave Creation *********************** A replica_count option will be added to support the creation of multiple slaves from a single replication snapshot. * a replica_count option will be added to the ``trove create`` command * a replica_count parameter will be added to the create_instance taskmanager ReST API * the taskmanager FreshInstanceTasks.create_instance method will iteratively create the specified number of slaves from a single replication snapshot (the implementor is free to implement slave creation in parallel if time permits, and should investigate doing so, but it is not a requirement for V2) Configuration ------------- MysqlGTIDReplicationStrategy value added to ReplicationStategy option for MySQL configuration. New configuration option master_promotion_strategy added to MySQL configuration with values as above. Database -------- No database impacts are envisioned. Public API ---------- *Promote to Replica Source* A new action will be added to the Trove REST API to allow a replica to be promoted to be the master of its replication set:: POST http://127.0.0.1:8779/v1.0//instance//action { *"promote_to_replica_source": null* } RESP: [200] { 'date': '', 'content-length': '', 'content-type': 'application/json' } RESP BODY: { "instance": { *"status": "PROMOTE",* "updated": "2014-11-25T21:25:11", "name": "m", "links": [ { "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...", "rel": "self" }, { "href": "https:\/\/10.40.10.178:8779\/instances\/...", "rel": "bookmark" } ], "created": "2014-11-25T21:25:06", "ip": [ "10.0.0.2" ], "replicas": [ { "id": "8e5710df-ef39-4201-a059-764d9091f079", "links": [ { "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...", "rel": "self" }, { "href": "https:\/\/10.40.10.178:8779\/instances\/...", "rel": "bookmark" } ] } ], "id": "fff6d8c5-9d05-4a00-ab58-d8954ec945a3", "volume": { "used": 0.13, "size": 1 }, "flavor": { "id": "7", "links": [ { "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/flavors\/7", "rel": "self" }, { "href": "https:\/\/10.40.10.178:8779\/flavors\/7", "rel": "bookmark" } ] }, "datastore": { "version": "5.5", "type": "mysql" } } } A new CLI command will be added to invoke the promote_to_replica_source API:: trove promote-to-replica-source *Eject Replica Source* A new action will be added to the Trove REST API to allow a replica source to be ejected from a replication set:: POST http://127.0.0.1:8779/v1.0//instance//action { *"eject_replica_source": null* } RESP: [200] { 'date': '', 'content-length': '', 'content-type': 'application/json' } RESP BODY: { "instance": { *"status": "EJECT",* "updated": "2014-11-25T21:25:11", "name": "m", "links": [ { "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...", "rel": "self" }, { "href": "https:\/\/10.40.10.178:8779\/instances\/...", "rel": "bookmark" } ], "created": "2014-11-25T21:25:06", "ip": [ "10.0.0.2" ], "replicas": [ { "id": "8e5710df-ef39-4201-a059-764d9091f079", "links": [ { "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/instances\/...", "rel": "self" }, { "href": "https:\/\/10.40.10.178:8779\/instances\/...", "rel": "bookmark" } ] } ], "id": "fff6d8c5-9d05-4a00-ab58-d8954ec945a3", "volume": { "used": 0.13, "size": 1 }, "flavor": { "id": "7", "links": [ { "href": "https:\/\/10.40.10.178:8779\/v1.0\/...\/flavors\/7", "rel": "self" }, { "href": "https:\/\/10.40.10.178:8779\/flavors\/7", "rel": "bookmark" } ] }, "datastore": { "version": "5.5", "type": "mysql" } } } A new CLI command will be added to invoke the eject_replica_source API:: trove eject-replica-source *Trove Create Replica Count* The Trove REST API for the create instance operation will be augmented with a new field *replica_count* to specify the number of replicas to be created from the indicated instance:: POST http://127.0.0.1:8779/v1.0//instances { "instance": { "volume": {"size": 1}, "flavorRef": "7", "name": "s", "replica_of": "", *"replica_count": ""* } } RESP *unchanged* An option will be added to the "trove create" CLI command to specify the new replica count option:: trove create --replica_count= ... Internal API ------------ promote_to_replica_source method added to taskmanager API. eject_replica_source method added to taskmanager API. Guest Agent ----------- The implementation of this feature set will result in many additions to the MySQL guest agent. There should be minimal impact to pre-existing code, and there is not expected to be any impact on backward compatibility of the APIs. Alternatives ------------ None. Implementation ============== Assignee(s) ----------- Primary assignee: vgnbkr Secondary assignee: peterstac Milestones ---------- Target Milestone for completion: Kilo-2 Work Items ---------- ====================== ========= ================= Work Item Assignee Scheduled Release ====================== ========= ================= GTID Support Morgan Kilo-3 Failover Morgan Kilo-3 Slave Count Peter Kilo-3 Incremental Snapshots Peter Kilo-3 ====================== ========= ================= Dependencies ============ n/a Testing ======= The existing int-tests are believed to be sufficient for testing the GTID replication changes, as there are no functionality changes, just implementation changes. New Int-Tests: Promote to Master Positive Create a new replication set of two sites. Attach floating ip addresses to each instance. Execute the promote_to_replica_source API call and verify that the master/slave relationships are correctly changed, and that the floating ip addresses maintain their affinity to master and slave. Promote to Master Negative Create a new replication set of two sites. Execute "service mysql stop" on the master site. Verify that promote_to_replica_source cannot be executed against the slave site. Delete Master Positive Create a new replication set of two sites. Attach floating ip addresses to each instance. Execute "service mysql stop" on the master to simulate the master site crashing. Execute the delete API call against the master site. Ensure that the slave has been promoted to master, a new slave has been added, and that the floating ip addresses have been moved appropriately. Replica Count No int-test will be done for this feature due to the resource requirements Incremental Snapshots No int-test will be done for this feature as there is no way to verify that the restore was actually done from an incremental backup rather than a full backup Documentation Impact ==================== User Guide ---------- * add section explaining manual failover, both via promote-to-replica-source and via deletion of a failed master * section on replication should be updated to document replica_count option to "trove create" CLI Reference ------------- * add promote-to-replica-source command * add eject-replica-source command * update create command with replica_count References ========== - https://etherpad.openstack.org/p/kilo-summit-trove-replication-v2