trove-specs/specs/mitaka/pgsql-incremental-backup.rst
Peter Stachowski f200cf24df Add 'Dashboard Impact' to default spec
Added a new Dashboard Impact section to the Trove spec default
document to track the UX changes for new features.

Updated existing specs to have the section (so they pass py27).

Change-Id: I8981aa8c577b46fea894dd531a98d283880ea9fc
2016-01-06 19:11:11 +00:00

308 lines
8.6 KiB
ReStructuredText

..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
Sections of this template were taken directly from the Nova spec
template at:
https://github.com/openstack/nova-specs/blob/master/specs/template.rst
..
=========================================
PostgreSQL Incremental Backup and Restore
=========================================
.. If section numbers are desired, unindent this
.. sectnum::
.. If a TOC is desired, unindent this
.. contents::
Trove currently only supports full backup and restore with the PostgreSQL guest
agent.
Launchpad Blueprint:
https://blueprints.launchpad.net/trove/+spec/postgresql-incremental-backup
Problem Description
===================
Currently, backups in PostgreSQL are done with the ``pg_dump`` and
``pg_restore`` tools, which produce efficient, logical backups. Unfortunately,
they cannot be used as the basis for incremental backups or point-in-time
recovery.
Proposed Change
===============
PostgreSQL Logging
------------------
In contrast to MySQL, PostgreSQL uses the same foundation for recovery,
replication and incremental backups. Central to this is the write-ahead log
(WAL) [1]_. With each update to the database, a WAL entry is created. The WAL
contents are periodically checkpointed to the main data files in the database
directory and are eventually purged depending on the system configuration.
In the event of a system crash, recovery proceeds by replaying the contents of
the WAL since the last checkpoint. This same mechanism is used, in a
"continuous" sense, for replication, and up until a specific WAL entry, for
point-in-time recovery [2]_.
So in order to support incremental backup, two parts are required:
* A base backup, a snapshot of the filesystem contents of the pgsql data
directory
* The WAL files written since the previous backup up to the desired point in
time
Base Backups
~~~~~~~~~~~~
Base backups are taken with the ``pg_basebackup`` filesystem-level backup tool
[3]_. ``pg_basebackup`` creates a copy of the binary files in the PostgreSQL
cluster data directory and enough WAL segments to allow the database to be
brought back to a consistent state. Associated with each backup is a log
location, normally indicated by the WAL file name and the position inside the
file.
Point-in-time Recovery and WAL Archiving
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PostgreSQL can be made to support Trove's notion of incremental backup and
restore by bootstrapping the recovery system. To restore an incremental backup
from a previous backup, in PostgreSQL, is to replay the WAL entries to a
designated point in time -- all that is required is the most recent base
backup, and all WAL files that were written since.
During normal operation of the database, as WAL data is written and then
checkpointed to the data files, unneeded files are purged. PostgreSQL can also
be configured to automatically archive WAL files, which can then be used for
the purposes of point-in-time recovery.
PostgreSQL has two important configuration parameters that manage WAL archiving
and WAL retrieval for recovery: ``archive_command`` and ``recovery_command``.
The parameters allow a great deal of flexibility in how WAL files are handled
-- all that is required is that the command returns 0 only in the event that
the file was successfully transferred. A simple example::
archive_command = 'test ! -f /mnt/arch/%f && cp %p /mnt/arch/%f'
The above command checks for the existence of the file in the archive directory
/mnt/arch (to avoid overwritting an existing file) and then copies it in.
Similarly, for recovery::
restore_command = 'cp /mnt/arch/%f "%p"'
The above commands assume that ``/mnt/arch`` is accessible and, for recovery,
contains the appropriate WAL files.
In principle ``archive_command`` can be anything, with a few important caveats:
* The speed of the archive_command must be fast enough to keep up with the
on-going generation of WAL files
* The restore command must be able to reverse whatever operation is applied
to the WAL file on archive
Assuming that an appropriate mechanism to archive and retrieve the WAL files is
in place, incremental backup and restore become simple operations: an
incremental "backup" is done by creating a restore point, using
``pg_start_backup('my_restore_pt')``, and the equivalent restore is done by
restarting the server with a ``recovery_target_name = 'my_restore_pt'`` in the
recovery.conf file.
Incremental Collection of WAL Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider the following series of WAL files that may be found in the PostgreSQL
WAL archive directory::
$ ls -lh /mnt/wal_arch
-rw------- 1 postgres postgres 16M Oct 27 19:11 000000010000000000000016
-rw------- 1 postgres postgres 16M Oct 27 19:11 000000010000000000000017
-rw------- 1 postgres postgres 16M Oct 27 19:12 000000010000000000000018
-rw------- 1 postgres postgres 16M Oct 27 19:12 000000010000000000000019
-rw------- 1 postgres postgres 16M Oct 27 19:12 00000001000000000000001A
The WAL entry stream can be visualized as::
--------------------------------------------------------
| 16 | 17 | 18 | 19 | 1A | ...
--------------------------------------------------------
^ ^ ^
B1 I1 I2
Suppose that base backup B1 was taken at file 16 position 48, incremental
backup I1 at file 18 position 30, and incremental backup I2 at file 1A
position 20.
I1 would consist of WAL files 16 through 18: the entries after position 48 in
file 16, where B1 was taken, would be needed, along with the contents of file
18 up until position 30.
I2 would consist of WAL files 18 through 1A.
This approach has the benefit of being consistent with the current paradigm
used by Trove for incremental backup and restore. The main downside is that
WAL data must be archived in two stages: once by PostgreSQL, local to the
instances, and a second time by Trove, to object storage. This introduces more
complexity and increases storage requirements on the instance.
Configuration
-------------
The new incremental backup and restore strategies will need to be added.
Database
--------
None.
Public API
----------
None.
Public API Security
-------------------
None.
Python API
----------
None.
CLI (python-troveclient)
------------------------
After implementation, the following CLI commands will work::
trove backup-create <inst> <inc_backup> --parent <backup_id>
trove create <inst> .. --backup <inc_backup_id>
Internal API
------------
None.
Guest Agent
-----------
A new backup and restore strategy based on pg_basebackup will be added.
Alternatives
------------
Tighter Postgresql - Swift Integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A simpler approach would be to bypass the two-stage archive process, and have
PostgreSQL automatically manage the WAL archiving process to and from object
storage. For example, the archive command could run the WAL file through a
fast compressor such as Snappy or LZOP, encrypt and then ship the file to
Swift directly, with an equivalent reverse procedure for recovery.
This has the benefit of being simpler to implement, as most log handling is
pushed down to PostgreSQL, but has the significant side-effect of introducing
a relatively continuous stream of WAL data from the PostgreSQL guest to Swift,
something potentially unexpected and not consistent with the approach to
incremental backups on other datastores.
Dashboard Impact (UX)
=====================
TBD (section added after approval)
Implementation
==============
Assignee(s)
-----------
Primary assignee:
atomic77
Milestones
----------
mitaka-1
Work Items
----------
* pg_basebackup full backup
* incremental backup
* integration tests
Upgrade Implications
====================
Backups taken with the old PgDump strategy will not be compatible with this new
strategy.
Dependencies
============
Ability to create pgsql instances in the generic int-test framework.
Testing
=======
Int-tests for incremental backup currently do not exist in the new generic
int-test framework and will be added. Unit tests will be added as necessary.
Documentation Impact
====================
The documentation will need to be updated to reflect the new backup and restore
strategy.
References
==========
.. [1] http://www.postgresql.org/docs/9.4/static/wal-configuration.html
.. [2] http://www.postgresql.org/docs/9.4/static/continuous-archiving.html
.. [3] http://www.postgresql.org/docs/current/static/app-pgbasebackup.html
Appendix
========
N/A