trove


tests	Changes names of some quota values	2015-10-21 12:42:22 +00:00
trove	Add backup & restore for Cassandra	2016-02-13 03:29:28 +00:00

Petr Malik e722342ce7 Add backup & restore for Cassandra

Implement backup and restore functionality for Cassandra datastore.

We implement a full backup strategy using the Nodetool
(http://goo.gl/QtXVsM) utility.

Snapshots:

Nodetool can take a snapshot of one or more keyspace(s).
Snapshot(s) will be stored in the data directory tree:
'<data dir>/<keyspace>/<table>/snapshots/<snapshot name>'
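As a minimal sketch of the layout above (not the code from this
change), the snapshot command line and the per-table snapshot
directory could be built as follows. The helper names are
hypothetical; only 'nodetool snapshot -t <name>' and the directory
layout come from Nodetool itself.

```python
import os


def snapshot_command(snapshot_name, keyspaces=()):
    """Build the argument list for taking a named snapshot.

    When no keyspace is listed, nodetool snapshots all keyspaces.
    """
    return ['nodetool', 'snapshot', '-t', snapshot_name] + list(keyspaces)


def snapshot_dir(data_dir, keyspace, table, snapshot_name):
    """Return the directory that holds one table's snapshot files."""
    return os.path.join(data_dir, keyspace, table, 'snapshots', snapshot_name)
```

The argument list can be passed to subprocess.check_call() on a host
where Nodetool is installed.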

A snapshot can be restored by moving all *.db files from a snapshot
directory to the respective keyspace overwriting any existing files.

NOTE: It is recommended to include the system keyspace in the backup.
      Keeping the system keyspace will reduce the restore time
      by avoiding the need to rebuild indexes.

The Backup Procedure:

1. Clear existing snapshots.

2. Take a snapshot of all keyspaces.

3. Collect all *.db files from the snapshot directories and package them
into a single TAR archive.

Transform the paths such that the backup can be restored simply by
extracting the archive directly into an existing data directory
(i.e. place the root into the <data dir> and
remove the 'snapshots/<snapshot name>' portion of the path).
The data directory itself is not included in the backup archive
(i.e. the archive is rooted inside the data directory).
This is to make sure we can always restore an old backup
even if the standard guest agent data directory changes.

Attempt to preserve access modifiers on the archived files.

Assert the backup is not empty as there should always be
at least the system keyspace. Fail if there is nothing to back up.

4. Compress and/or encrypt the archive as required.

5. Stream the archive to the storage location.
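The path transformation and the non-empty check from step 3 can be
sketched with the standard 'tarfile' module. This is an illustration
under stated assumptions, not the code from this change: the function
names are hypothetical, the archive is written uncompressed, and
files in nested directories below a snapshot are ignored.

```python
import os
import tarfile


def archive_name(db_file, data_dir, snapshot_name):
    """Map a snapshot file path to its path inside the archive.

    '<data dir>/<keyspace>/<table>/snapshots/<name>/f.db'
    becomes '<keyspace>/<table>/f.db'.
    """
    parts = os.path.relpath(db_file, data_dir).split(os.sep)
    # Expect: keyspace, table, 'snapshots', snapshot name, file name.
    if len(parts) != 5 or parts[2] != 'snapshots' or parts[3] != snapshot_name:
        raise ValueError('not a file of snapshot %r: %s'
                         % (snapshot_name, db_file))
    return os.path.join(parts[0], parts[1], parts[4])


def find_snapshot_files(data_dir, snapshot_name):
    """Yield all *.db files stored directly in the named snapshot dirs."""
    for root, _dirs, files in os.walk(data_dir):
        parent = os.path.basename(os.path.dirname(root))
        if os.path.basename(root) == snapshot_name and parent == 'snapshots':
            for name in sorted(files):
                if name.endswith('.db'):
                    yield os.path.join(root, name)


def build_archive(data_dir, snapshot_name, fileobj):
    """Write the snapshot files to 'fileobj' as a TAR stream."""
    count = 0
    with tarfile.open(fileobj=fileobj, mode='w|') as tar:
        for path in find_snapshot_files(data_dir, snapshot_name):
            # add() records the file's mode bits in the archive entry.
            tar.add(path, arcname=archive_name(path, data_dir, snapshot_name))
            count += 1
    if count == 0:
        raise RuntimeError('no snapshot files found, nothing to back up')
    return count
```

Because the archive entries are relative to the data directory, the
same archive can be unpacked into any data directory later on.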

The Restore Procedure:

1. Create a new data directory, as it does not exist yet.

2. Unpack the backup to that directory.

3. Update ownership of the restored files to the Cassandra user.
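The three restore steps might look roughly like the sketch below. It
assumes an uncompressed TAR stream from a trusted source; the function
name and the 'owner' parameter are hypothetical, and changing
ownership requires sufficient privileges.

```python
import os
import shutil
import tarfile


def restore_archive(fileobj, data_dir, owner=None):
    """Unpack a TAR stream into 'data_dir' and hand the files to 'owner'."""
    # 1. Create the data directory; tolerate one that already exists.
    os.makedirs(data_dir, exist_ok=True)
    # 2. Unpack the backup; archive paths are relative to the data directory.
    with tarfile.open(fileobj=fileobj, mode='r|') as tar:
        tar.extractall(path=data_dir)
    # 3. Give the restored tree to the service user, if one was named.
    if owner is not None:
        shutil.chown(data_dir, user=owner, group=owner)
        for root, dirs, files in os.walk(data_dir):
            for name in dirs + files:
                shutil.chown(os.path.join(root, name),
                             user=owner, group=owner)
```

Passing owner=None skips the ownership update, which is convenient
when trying the sketch out as an unprivileged user.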

Notes on 'cluster_name' property:

Cassandra has a concept of clusters. Clusters are composed of
nodes (instances). All nodes belonging to one cluster must have the
same 'cluster_name' property. This prevents nodes from different logical
clusters from accidentally talking to each other.

The cluster name can be changed in the configuration file.
It is also stored in the system keyspace.
When the Cassandra service boots up it verifies that the cluster name
stored in the database matches the name in the configuration file and
fails if not. This is to prevent the operator from accidentally
launching a node with data from another cluster.
The operator has to update the configuration file.

Similarly, when a backup is restored it carries the original cluster
name with it. We have to update the configuration file to use the old
name.
When a node gets restored it will still belong to the original cluster.
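Updating the configured name could be sketched as a plain text
rewrite of the configuration file. This is an illustration only: it
assumes a top-level "cluster_name: ..." line as found in
cassandra.yaml, and the function name is hypothetical.

```python
import re

_CLUSTER_NAME = re.compile(r"^cluster_name:.*$", re.MULTILINE)


def set_cluster_name(config_text, name):
    """Return 'config_text' with its top-level 'cluster_name' replaced."""
    # YAML single-quoted strings escape a quote by doubling it.
    line = "cluster_name: '%s'" % name.replace("'", "''")
    # A callable replacement keeps backslashes in 'name' literal.
    new_text, count = _CLUSTER_NAME.subn(lambda _m: line, config_text,
                                         count=1)
    if count == 0:
        raise ValueError("no 'cluster_name' entry found")
    return new_text
```

The function only transforms text, so reading the file and writing
the result back is left to the caller.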

Notes on superuser password reset:

The database is no longer wide open and requires password authentication.
The 'root' password stored in the system keyspace
needs to be reset before we can start up with restored data.

A general password reset procedure is:
- disable user authentication and remote access
- restart the service
- update the password in the 'system_auth.credentials' table
- re-enable authentication and make the host reachable
- restart the service
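The ordering of these steps can be sketched as below. The 'node'
object and all of its method names are hypothetical placeholders for
whatever actually edits the configuration and controls the service;
the try/finally is a design choice of this sketch, so that the node
is locked down again even if the password update fails.

```python
def reset_superuser_password(node, new_password):
    """Run the reset steps in order against a hypothetical 'node' helper."""
    node.disable_auth_and_remote_access()
    node.restart()
    try:
        # The password lives in the 'system_auth.credentials' table.
        node.update_password('system_auth.credentials', new_password)
    finally:
        # Always re-enable authentication before the node is reachable.
        node.enable_auth_and_remote_access()
        node.restart()
```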

Note: The superuser-password-reset and related methods that
      potentially expose the database contents are intentionally
      named with leading '_' and '__' to discourage a caller from
      using them unless absolutely necessary.

Additional changes:

- Adds backup/restore namespaces to the sample config
  file 'trove-guestagent.conf.sample'.
  We include the other datastores too
  for the sake of consistency.
  (Auston McReynolds, Jul 6, 2014)

Implements: blueprint cassandra-backup-restore
Co-Authored-By: Denis Makogon <dmakogon@mirantis.com>
Change-Id: I3671a737d3e71305982d8f4965215a73e785ea2d

2016-02-13 03:29:28 +00:00
