trove-specs/specs/newton/instance-upgrade.rst
Morgan Jones b4c9872358 Instance Upgrade
A proposal for upgrading running Trove instances to a new image.
This would support upgrading the Trove guestagent, the database,
and/or the underlying OS.

APIImpact
DOCImpact

Change-Id: If1ca9b9bf944c3ce3b0aaddd28f32f76a9a9500f
2016-06-06 22:10:57 +00:00

340 lines
11 KiB
ReStructuredText

..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
Sections of this template were taken directly from the Nova spec
template at:
https://github.com/openstack/nova-specs/blob/master/specs/template.rst
..
================
Instance Upgrade
================
Trove needs to address how guest instances will update their database
software, the Trove guest agent, and the underlying operating system.
This blueprint proposes a technique which will allow guest instances
to be updated by migrating the data volume to a new instance based on
an updated guest image.
Launchpad Blueprint:
https://blueprints.launchpad.net/trove/+spec/instance-upgrade
Problem Description
===================
Trove creates a database instance by creating a nova instance from a
provided operating system image and attaching a volume to the instance
to hold the user's data. By using a known image with the database
software preconfigured, an operator can ensure that the user will
receive a known version of database software, configured with an
operating system certified to work with that version of database
software. This technique eliminates many of the ambiguities of
provisioning applications, allowing an operator to provide a published
level of service to the database user.
Unfortunately, running database instances aren't static over their
lifetimes. Over time, it may be necessary for an instance to be
upgraded with security patches, new database software, or updates to
the Trove guest agent. The cloud operator and the end user must be
provided with a way to migrate their databases to new operating
environments.
In summary, there are a number of changes that a database instance
might be subjected to over time, and this spec addresses applying
these types of changes. These include (but are not necessarily
limited to):
* operating system security patches
* database security patches
* database version upgrades
* Trove guest agent upgrades
* OpenStack version upgrades
* operating system upgrades
Proposed Change
===============
The traditional approach to system upgrades is to simply execute
system utilities such as "apt-get upgrade" to upgrade all or parts of a
running system. For large installations, this technique could be
automated by tools such as Ansible or Salt to apply the changes
to many systems. This approach isn't ideal for
database-as-a-service as there is little control over which components
are upgraded, and to which version; an update could leave a database
instance in a state where the database does not operate properly, or
could even cause the data itself to become corrupted.
It is crucial that Trove develop a method to update instances which
provides a high probability of leaving the database in a safe,
consistent, and usable state.
To ensure a safe, consistent state, this blueprint proposes a
technique for upgrading an instance which transitions it from one
known image to a new known image running updated software. The full
state of each instance state will be known. At its core, this
technique will involve detaching the volume containing the
database data and attaching it to a new instance based on a different
(updated) image.
The bulk of the "variable" data on a Trove instance exists on the
attached volume where the database stores its data. There is also
some operating system state which must be preserved during this
transition, consisting primarily of networking data (the instance name
and ip address), some Trove configuration data (guest_info.conf and
trove-guestagent.conf), and the database configuration data (such as
/etc/mysql).
The core of this technique will be based on the rebuild API provided
by Nova. The Nova rebuild function upgrades a running instance to a
new image, ensuring that the networking configuration is applied to
the new instance. The newly created instance will come up with the
same network configuration (ip, hostname, and Neutron configuration)
as the instance from which it was created. Trove guest functionality
will be added to allow the guest to capture additional state about the
instance (primarily database configuration), and re-apply that state
to the system after rebuild. The result will be a running,
connectable database provisioned on a new instance running updated
software.
The selection of image to upgrade to will be based on the current
datastore_version mechanism. Essentially, the user will be allowed to
upgrade an instance to a newer datastore_version of its datastore.
Safeguards will be put in place to ensure that the user is selecting
an appropriate datastore_version to upgrade to; for example, the user
would only be allowed to upgrade a mysql instance from
datastore_version 5.5 to 5.6, not to 5.0 or 6.7.
Configuration
-------------
No configuration changes are planned.
Database
--------
A future blueprint will outline the changes necessary to support the
upgrade constraints on datastore_versions; this specification will
be limited to implementing the upgrade functionality without the
constraints.
Public API
----------
A new REST API call will be added in support of this functionality.
REST API: PATCH /instances/<instance id>
REST body:
.. code-block:: json
{
"instance": {
"datastore_version": "<datastore_version_uuid>"
}
}
REST result:
.. code-block:: json
{}
REST return codes:
202 - Accepted.
400 - BadRequest. Server could not understand request.
404 - Not Found. <datastore_version_id> not found.
Public API Security
-------------------
There are no envisioned security implications.
Python API
----------
A new method will be implemented in the trove API. This method will
upgrade a instance to the image specified by the provided
datastore_version.
.. code-block:: python
upgrade(instance, datastore_version)
:instance: the instance to upgrade
:datastore_version: the datastore version, or its id, to which the
trove instance will be upgraded
CLI (python-troveclient)
------------------------
A new CLI call will be implemented. This new call will upgrade a
instance to the image specified by the provided datastore_version.
.. code-block:: bash
trove upgrade <instance> <datastore_version>
:instance: the instance to upgrade
:datastore_version: the datastore version to which the instance will
be upgraded
Internal API
------------
A new method will be added in support of this functionality.
.. code-block:: python
def upgrade(self, instance_id, datastore_version_id):
LOG.debug("Making async call to upgrade guest to %s "
% datastore_version_id)
cctxt = self.client.prepare(version=self.version_cap)
cctxt.cast(self.context, "upgrade", instance_id=instance_id,
datastore_version_id=datastore_version_id)
Guest Agent
-----------
Two new operations will be implemented in the guest agent API. It is
expected that each datastore will (optionally) override these methods
to implement any needed functionality before and after the image
upgrade proceeds. Mysql, for example, would use these methods to copy
its configuration data from /etc/mysql to the data volume before the
image upgrade, copying them back and restarting the mysql server after
the image upgrade.
It is expected that the pre_upgrade method will validate that it is
possible to perform the requested upgrade; for example, there may be a
configuration override specified for an instance which is not
compatible with the new datastore_version. In the event that an
upgrade cannot be performed, the pre_upgrade method will raise an
exception - any exception will cause the taskmanager to abort the
upgrade process for that instance.
.. code-block:: python
def pre_upgrade(self):
"""Prepare the guest for upgrade."""
LOG.debug("Sending the call to prepare the guest for upgrade.")
return self._call("pre_upgrade", AGENT_HIGH_TIMEOUT, self.version_cap)
def post_upgrade(self, upgrade_info):
"""Recover the guest after upgrading the guest's image."""
LOG.debug("Recover the guest after upgrading the guest's image.")
self._call("post_upgrade", AGENT_HIGH_TIMEOUT, self.version_cap)
Two mechanisms are available for the pre_upgrade to communicate
information to the post_upgrade. First, the value returned from the
pre_upgrade step is passed to the post_upgrade step as the
*upgrade_info* parameter; the guest is free to use this value as it
sees fit, though it would typically be used to exchange a dictionary
of configuration data. The second mechanism the guest will use to
pass information beween phases is by storing configuration data on the
data volume; this would typically be used for large configuration
files.
It is expected that if the Nova rebuild command does not return an
exception before beginning the rebuild process, the rebuild will
succeed. Should the rebuild fail, the instance will be moved to an
ERROR state and no attempt will be made to recover the instance;
operator intervention will be required. Should the post_upgrade
return an exception, the instance will be transitioned to an ERROR
state.
Alternatives
------------
The alternative to upgrading images is to upgrade via apt-get or yum
in the running instance. While this is the normal procedure for
upgrading instances, it has undesired implications for Trove. Trove
aims to provide a known service level, but upgrading a running
instance has the potential to leave an instance in an unknown state.
There could also be issues around installations which don't allow
trove instances to access the internet directly as trove would have to
provide some mechanism for delivering all of the deb/rpm packages
required to update an instance.
Dashboard Impact (UX)
=====================
TBD
Implementation
==============
Assignee(s)
-----------
Primary assignee:
6-morgan
Milestones
----------
Target Milestone for completion:
Newton
Work Items
----------
This feature has been already been prototyped. The work required to
bring the prototype in line with this spec is:
* The prototype uses image id rather than datastore_version
* add trove.upgrade.start/end/error notifications
* implement unit tests
* investigate if int-tests need to be updated for this feature
* document the upgrade procedure
Upgrade Implications
====================
Dependencies
============
n/a
Testing
=======
Unit tests will be added as appropriate.
An int-test will be added that tests upgrading an instance to the
image that it is already running.
Documentation Impact
====================
The "trove upgrade" command, and its corresponding python API, will
need to be documented.
References
==========
n/a
Appendix
========
n/a