Add borg-backup roles

This adds roles to implement backup with borg [1].

Our current tool "bup" has no Python 3 support and is not packaged for
Ubuntu Focal.  This means it is effectively end-of-life.  borg fits
our model of servers backing themselves up to a central location, is
well documented and seems well supported.  It also has the clarkb seal
of approval :)

As mentioned, borg works in the same manner as bup, doing an
efficient backup over ssh to a remote server.  The core of these
roles is the same as the bup-based ones in terms of creating a
separate user for each host and deploying keys and ssh config.

This installs borg in a virtualenv under /opt, for a number of
reasons.  Firstly, the history of borg shows incompatible updates
(although a tool is provided to update repository formats); it seems
important that we both pin the version we are using and keep clients
and server in sync.  Since we have a heterogeneous collection of
distributions, we don't want to rely on the packaged tools, which may
differ.  I don't feel this is a great application for a container; we
actually don't want it that isolated from the base system, because its
goal is to read the system and copy it offsite with as little chance
of things going wrong as possible.

Borg has a lot of support for encrypting the data at rest in various
ways.  However, that introduces the possibility we could lose both the
key and the backup data.  Really the only thing stopping this is key
management, and if we want to go down this path we can do it as a
follow-on.

The remote end server is configured via ssh command rules to run in
append-only mode.  This means a misbehaving client can't delete its
old backups.  In theory we can prune backups on the server side --
something we could not do with bup.  The documentation has been
updated but is vague on this part; I think we should get some hosts in
operation, see how the de-duplication is working out and then decide
how we want to manage things long term.

Testing is added; a Focal and a Bionic host both run a full backup of
themselves to the backup server.  Pretty cool; the logs are in
/var/log/borg-backup-<backup-server>.log on each host.

No hosts are currently in the borg groups, so this can be applied
without affecting production.  I'd suggest the next steps are to bring
up a borg-based backup server and put a few hosts into this.  After
running for a while, we can add all hosts, and then deprecate the
current bup-based backup server in vexxhost and replace it with a
borg-based one, giving us dual offsite backups.

[1] https://borgbackup.readthedocs.io/en/stable/

Change-Id: I2a125f2fac11d8e3a3279eb7fa7adb33a3acaa4e
Ian Wienand 2020-07-16 13:43:18 +10:00
parent c3b2aac1c1
commit 028d655375
18 changed files with 434 additions and 46 deletions

@ -231,12 +231,14 @@ Or use matching to cover a range of servers::
Backups
=======
Infra uses the `bup <https://bup.github.io>`__ tool for backups.
Infra uses the `borg <https://borgbackup.readthedocs.io>`__ backup
tool.
Hosts in the ``backup`` Ansible inventory group will be backed up to
servers in the ``backup-server`` group with ``bup``. The
``playbooks/roles/backup`` and ``playbooks/roles/backup-server`` roles
implement the required setup.
Hosts in the ``borg-backup`` Ansible inventory group will be backed up
to servers in the ``borg-backup-server`` group with ``borg``. The
``playbooks/roles/borg-backup`` and
``playbooks/roles/borg-backup-server`` roles implement the required
setup.
The backup server has a unique Unix user for each host to be backed
up. The roles will set up required users, their home directories in
@ -250,52 +252,27 @@ key setup just for backup communication (see ``/root/.ssh/config``).
Restore from Backup
-------------------
On the server that needs items restored from backup become root, start a
screen session as restoring can take a while, and create a working
directory to restore the backups into. This allows us to be selective in
how we restore content from backups::
``borg`` has many options for restoring, but a basic way to dump a host
at a particular time is to:
sudo su -
screen
mkdir /root/backup-restore-$DATE
cd /root/backup-restore-$DATE
Root uses a separate ssh key and remote user to communicate with the
backup server(s); the username and key to use for backup should be
automatically configured in ``/root/.ssh/config``. The backup server
hostname can be taken from there.
At this point we can join the tar that was split by the backup cron::
bup join -r backup.x.y.opendev.org: root > backup.tar
At this point you may need to wait a while. These backups are stored on
servers geographically distant from our normal servers resulting in less
network throughput between servers than we are used to.
Once the ``bup join`` is complete you will have a tar archive of that
backup. It may be useful to list the files in the backup
``tar -tf backup.tar`` to get an idea of what things are available. At
this point you will probably either want to extract the entire backup::
tar -xvf backup.tar
ls -al
Or selectively extract files::
# path/to/file needs to match the output given by tar -t
tar -xvf backup.tar path/to/file
Note if you created your working directory in a path that is not
excluded by bup you will want to remove that directory when your work is
done. /root/backup-restore-* is excluded so the path above is safe.
* log into the backup server
* use ``sudo su - <username>`` to switch to the backup user for the host
  to be restored (``borg-`` followed by the short hostname, unless
  ``borg_username`` was overridden)
* you will now be in the home directory of that user
* run ``/opt/borg/bin/borg list ./backup`` to list the archives available
* these should look like ``hostname-YYYY-MM-DDTHH:MM:SS``
* move to a working directory; ``borg extract`` writes files relative to
  the current directory
* extract one of the appropriate archives with ``/opt/borg/bin/borg extract ~/backup::<archive-tag>``
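Picking the archive for a particular time by hand means scanning the ``borg list`` output. A minimal Python sketch of that selection follows; it assumes archive names follow the ``hostname-YYYY-MM-DDTHH:MM:SS`` pattern above, and the helper names (``archive_time``, ``latest_archive_before``, ``extract_command``) are hypothetical. It only builds the command line and does not run borg.

```python
from datetime import datetime

# Assumed archive name format: "<hostname>-YYYY-MM-DDTHH:MM:SS".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"
TIMESTAMP_LEN = 19  # len("2020-07-16T05:12:02")


def archive_time(name: str) -> datetime:
    """Parse the timestamp suffix of an archive name."""
    # Slicing from the end copes with hostnames that contain hyphens.
    return datetime.strptime(name[-TIMESTAMP_LEN:], TIMESTAMP_FORMAT)


def latest_archive_before(names: list, when: datetime) -> str:
    """Return the newest archive taken at or before ``when``."""
    candidates = [n for n in names if archive_time(n) <= when]
    if not candidates:
        raise ValueError("no archive at or before " + when.isoformat())
    return max(candidates, key=archive_time)


def extract_command(repo: str, archive: str) -> list:
    """Build the argv for extracting one archive into the current directory."""
    # borg 1.1 addresses a single archive as REPOSITORY::ARCHIVE.
    return ["/opt/borg/bin/borg", "extract", repo + "::" + archive]


if __name__ == "__main__":
    # Hypothetical archive names, as "borg list" might show them.
    names = [
        "host01-2020-07-14T05:12:01",
        "host01-2020-07-15T05:12:03",
        "host01-2020-07-16T05:12:02",
    ]
    chosen = latest_archive_before(names, datetime(2020, 7, 15, 12, 0))
    print(" ".join(extract_command("~/backup", chosen)))
    # prints: /opt/borg/bin/borg extract ~/backup::host01-2020-07-15T05:12:03
```

The printed command is meant to be pasted into the backup user's shell; ``~`` is only expanded by a shell, so pass an absolute repository path if running the argument list directly (e.g. with ``subprocess.run``).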
Rotating backup storage
-----------------------
Since ``bup`` only stores differences, it does not have an effective
way to prune old backups. The easiest way is to simply periodically
start the backups fresh.
We run ``borg`` in append-only mode, so that clients cannot remove
old backups on the server.
TODO(ianw): Write instructions on how to prune on the server side. We
should monitor growth to see if automatic pruning would be
appropriate, or periodic manual pruning, or something similar to this
existing system where we keep a historic archive and start fresh.
The backup server keeps an active volume and the previously rotated
volume. Each consists of 3 x 1TiB volumes grouped with LVM. The

@ -0,0 +1,15 @@
Setup backup server
This role configures backup server(s) in the ``borg-backup-server`` group
to accept backups from remote hosts.
Note that the ``borg-backup`` role must have run on each host in the
``borg-backup`` group before this role. That role will create a
``borg_user`` tuple in the hostvars for each host consisting of
the required username and public key.
Each required user gets a separate home directory in ``/opt/backups``.
Their ``authorized_keys`` file is configured with the public key to
allow the remote host to log in and only run ``borg`` in server mode.
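The forced command in ``authorized_keys`` is what keeps a client confined to its own repository. As a sketch only, the Python below renders the same line the ``authorized_key`` task builds from ``key_options``; ``authorized_keys_line`` is a hypothetical helper and the key in the example is a placeholder.

```python
BORG = "/opt/borg/bin/borg"
BACKUP_ROOT = "/opt/backups"


def authorized_keys_line(user_name: str, public_key: str) -> str:
    """Hypothetical helper mirroring the role's key_options."""
    repo = "%s/%s/backup" % (BACKUP_ROOT, user_name)
    command = "%s serve --append-only --restrict-to-path %s" % (BORG, repo)
    # "restrict" disables port, agent and X11 forwarding and PTY
    # allocation; command= forces borg serve whatever the client asks for.
    return 'command="%s",restrict %s' % (command, public_key)


if __name__ == "__main__":
    # Placeholder key material, not a real key.
    print(authorized_keys_line("borg-host01", "ssh-ed25519 AAAA... root@host01"))
```

Because ``command=`` is forced, sshd runs ``borg serve`` regardless of what the client requests, and ``--append-only`` means that connection can only append to the repository, so existing backups cannot be removed through it.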
**Role Variables**

@ -0,0 +1 @@
borg_users: []

@ -0,0 +1,19 @@
- name: Create backup directory
  file:
    state: directory
    path: /opt/backups

- name: Install borg
  include_role:
    name: install-borg

- name: Build all borg users from backup hosts
  set_fact:
    borg_users: '{{ borg_users + [ hostvars[item]["borg_user"] ] }}'
  with_inventory_hostnames: 'borg-backup:!disabled'

- name: Create borg users
  include_tasks: user.yaml
  loop: '{{ borg_users }}'
  loop_control:
    loop_var: borg_user
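The ``set_fact`` loop above accumulates one ``[username, public key]`` pair per client. Modelled in plain Python (the ``groups`` and ``hostvars`` dictionaries are hypothetical stand-ins for what Ansible provides at runtime), the task does roughly this:

```python
def collect_borg_users(groups: dict, hostvars: dict) -> list:
    """Plain-Python model of the "Build all borg users" task."""
    # Equivalent of the host pattern 'borg-backup:!disabled'.
    disabled = set(groups.get("disabled", []))
    users = []
    for host in groups.get("borg-backup", []):
        if host in disabled:
            continue
        # Each client published (username, public key) as borg_user.
        users.append(tuple(hostvars[host]["borg_user"]))
    return users


if __name__ == "__main__":
    # Hypothetical inventory data.
    groups = {
        "borg-backup": ["a.example.org", "b.example.org"],
        "disabled": ["b.example.org"],
    }
    hostvars = {"a.example.org": {"borg_user": ["borg-a", "ssh-ed25519 AAAA..."]}}
    print(collect_borg_users(groups, hostvars))
```

Skipping disabled hosts matters because the client role never ran on them, so they would have no ``borg_user`` fact to read.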

@ -0,0 +1,31 @@
# note borg_user is the parent loop variable name; this works on each
# element from the borg_users global.
- name: Set variables
  set_fact:
    user_name: '{{ borg_user[0] }}'
    user_key: '{{ borg_user[1] }}'

- name: Create borg user
  user:
    name: '{{ user_name }}'
    comment: 'Backup user'
    shell: /bin/bash
    home: '/opt/backups/{{ user_name }}'
    create_home: yes
  register: homedir

- name: Create borg user authorized key
  authorized_key:
    user: '{{ user_name }}'
    state: present
    key: '{{ user_key }}'
    key_options: 'command="/opt/borg/bin/borg serve --append-only --restrict-to-path /opt/backups/{{ user_name }}/backup",restrict'

# ansible-lint wants this in a handler, but it should be done here and
# now; this isn't like a service restart where multiple things might
# call it.
- name: Initialise borg
  command: /opt/borg/bin/borg init --encryption=none /opt/backups/{{ user_name }}/backup
  become: yes
  become_user: '{{ user_name }}'
  when: homedir.changed

@ -0,0 +1,36 @@
Configure a host to be backed up
This role sets up a host to use ``borg`` for backup to any hosts in the
``borg-backup-server`` group.
A separate ssh key will be generated for root to connect to the backup
server(s), and the host keys of the backup servers will be added to
root's ``known_hosts``.
The ``borg`` tool is installed and a cron job is set up to run the
backup periodically.
Note the ``borg-backup-server`` role must run after this to create the user
correctly on the backup server. This role sets a tuple ``borg_user``
with the username and public key; the ``borg-backup-server`` role uses this
variable for each host in the ``borg-backup`` group to initialise users.
**Role Variables**

.. zuul:rolevar:: borg_username

   The username to connect to the backup server. If this is left
   undefined, it will be automatically set to ``borg-`` followed by
   the first component of the host's inventory hostname.

.. zuul:rolevar:: borg_backup_excludes_extra
   :default: []

   A list of extra items to pass as ``--exclude`` arguments to borg.
   Appended to the global default list of excludes set with
   ``borg_backup_excludes``.

.. zuul:rolevar:: borg_backup_dirs_extra
   :default: []

   A list of extra directories to back up. Appended to the global
   default list of directories set with ``borg_backup_dirs``.
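How the role's variables resolve for a given host can be sketched in Python as follows. This is an illustration, not part of the role: ``resolve_backup_vars`` and its dictionary input are hypothetical, standing in for Ansible's variable lookup.

```python
def resolve_backup_vars(inventory_hostname: str, host_vars: dict) -> dict:
    """Hypothetical model of the effective borg-backup role variables."""
    username = host_vars.get("borg_username")
    if username is None:
        # Mirrors the role: 'borg-' plus the first label of the
        # inventory hostname (not the full FQDN).
        username = "borg-" + inventory_hostname.split(".", 1)[0]
    return {
        "borg_username": username,
        # Extras are appended after the global defaults, never replacing them.
        "excludes": host_vars.get("borg_backup_excludes", [])
        + host_vars.get("borg_backup_excludes_extra", []),
        "dirs": host_vars.get("borg_backup_dirs", [])
        + host_vars.get("borg_backup_dirs_extra", []),
    }


if __name__ == "__main__":
    print(resolve_backup_vars(
        "borg-backup-test01.opendev.org",
        {"borg_backup_dirs": ["/etc", "/var"],
         "borg_backup_dirs_extra": ["/srv"]}))
```

This also explains the ``borg-borg-backup-test01`` style usernames checked by the tests: the ``borg-`` prefix is added to a short hostname that already starts with ``borg-``.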

@ -0,0 +1,13 @@
borg_backup_excludes:
  - '/home/*/.cache/*'
  - '/var/cache/*'
  - '/var/tmp/*'
borg_backup_excludes_extra: []
borg_backup_dirs:
  - /etc
  - /home
  - /root
  - /var
borg_backup_dirs_extra: []

@ -0,0 +1,63 @@
- name: Generate borg username for this host
  set_fact:
    borg_username: 'borg-{{ inventory_hostname.split(".", 1)[0] }}'
  when: borg_username is not defined

- debug:
    var: borg_username

- name: Install borg
  include_role:
    name: install-borg

- name: Install backup script
  template:
    src: borg-backup.j2
    dest: /usr/local/bin/borg-backup
    mode: 0755

- name: Generate keypair for backups
  openssh_keypair:
    path: /root/.ssh/id_borg_backup_ed25519
    type: ed25519
  register: borg_keypair

- name: Configure ssh for backup server
  blockinfile:
    path: /root/.ssh/config
    create: true
    # A per-server marker keeps one block per backup server; with the
    # default marker each loop iteration would overwrite the last.
    marker: '# {mark} ANSIBLE MANAGED BLOCK {{ item }}'
    block: |
      # {{ item }} backup server
      Host {{ item }}
        HostName {{ item }}
        IdentityFile /root/.ssh/id_borg_backup_ed25519
        User {{ borg_username }}
    mode: 0600
  with_inventory_hostnames: borg-backup-server

- name: Generate borg_user info tuple
  set_fact:
    borg_user: '{{ [ borg_username, borg_keypair["public_key"] ] }}'

- name: Accept hostkey of backup server
  known_hosts:
    state: present
    key: '{{ item }} ssh-ed25519 {{ hostvars[item]["ansible_ssh_host_key_ed25519_public"] }}'
    name: '{{ item }}'
  with_inventory_hostnames: borg-backup-server

- name: Install backup cron job
  cron:
    # The cron module identifies entries by name, so include the server
    # to get one job per backup server.
    name: "Run borg backup to {{ item }}"
    job: "/usr/local/bin/borg-backup {{ item }} 2>> /var/log/borg-backup-{{ item }}.log"
    user: root
    hour: '5'
    minute: '{{ 59|random(seed=item) }}'
  with_inventory_hostnames: borg-backup-server

- name: Install logrotate rules
  include_role:
    name: logrotate
  vars:
    # Must match the log file written by the cron job above.
    logrotate_file_name: '/var/log/borg-backup-{{ item }}.log'
  with_inventory_hostnames: borg-backup-server
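The cron ``minute`` uses Ansible's ``random`` filter with a seed, so the value is stable across playbook runs. The sketch below models it, assuming the filter behaves like ``random.Random(seed).randrange(0, end)`` for an integer ``end``; ``cron_minute`` is a hypothetical name.

```python
import random


def cron_minute(seed: str, end: int = 59) -> int:
    """Deterministic minute in [0, end) for a given seed string."""
    # Assumption: Ansible's "random" filter with an integer end and a
    # seed behaves like random.Random(seed).randrange(0, end).
    return random.Random(seed).randrange(0, end)


if __name__ == "__main__":
    server = "borg-backup01.region.provider.opendev.org"
    # The same seed always yields the same minute, so re-running the
    # playbook does not keep rewriting the cron entry.
    assert cron_minute(server) == cron_minute(server)
    for client in ("borg-backup-test01.opendev.org",
                   "borg-backup-test02.opendev.org"):
        # Server-only seed vs. a seed that also includes the client.
        print(client, cron_minute(server), cron_minute(client + server))
```

Since the seed is only ``item`` (the backup server), every client backing up to the same server gets the same minute; if that load spike matters, seeding with the client name as well would spread clients across the hour.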

@ -0,0 +1,53 @@
#!/bin/bash

# Flags based on
# https://borgbackup.readthedocs.io/en/stable/quickstart.html

if [ -z "$1" ]; then
    echo "Must specify backup host"
    exit 1
fi

BORG="/opt/borg/bin/borg"

# Setting this, so the repo does not need to be given on the commandline:
export BORG_REPO="ssh://{{ borg_username }}@${1}/opt/backups/{{ borg_username }}/backup"

# some helpers and error handling:
info() { printf "\n%s %s\n\n" "$( date )" "$*" >&2; }
trap 'echo $( date ) Backup interrupted >&2; exit 2' INT TERM

info "Starting backup"

# This avoids UI prompts when first accessing the remote repository
export BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK=1

# Backup the most important directories into an archive named after
# the machine this script is currently running on:
${BORG} create \
    --verbose \
    --filter AME \
    --list \
    --stats \
    --show-rc \
    --compression lz4 \
    --exclude-caches \
{% for item in borg_backup_excludes + borg_backup_excludes_extra -%}
    --exclude '{{ item }}' \
{% endfor -%}
    \
    ::'{hostname}-{now}' \
{% for item in borg_backup_dirs + borg_backup_dirs_extra -%}
    {{ item }} {{ '\\' if not loop.last }}
{% endfor -%}

backup_exit=$?

if [ ${backup_exit} -eq 0 ]; then
    info "Backup finished successfully"
else
    info "Backup finished with errors"
fi

exit ${backup_exit}
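For reasoning about what the template renders, the ``borg create`` invocation can be expressed as a Python argument list. This is a sketch assuming the same flags as the script; ``borg_repo`` and ``borg_create_argv`` are hypothetical helpers, and nothing is executed.

```python
def borg_repo(username: str, server: str) -> str:
    """Repository URL exported as BORG_REPO by the backup script."""
    return "ssh://%s@%s/opt/backups/%s/backup" % (username, server, username)


def borg_create_argv(excludes: list, dirs: list,
                     borg: str = "/opt/borg/bin/borg") -> list:
    """Argument list equivalent to the rendered "borg create" call."""
    argv = [borg, "create", "--verbose", "--filter", "AME", "--list",
            "--stats", "--show-rc", "--compression", "lz4",
            "--exclude-caches"]
    for pattern in excludes:
        # Passed verbatim; borg, not a shell, interprets the pattern.
        argv += ["--exclude", pattern]
    # With BORG_REPO set, "::NAME" names an archive in that repository;
    # borg expands the {hostname} and {now} placeholders itself.
    argv.append("::{hostname}-{now}")
    argv.extend(dirs)
    return argv


if __name__ == "__main__":
    # Hypothetical user and server names.
    print(borg_repo("borg-host01", "backup01.example.org"))
    print(borg_create_argv(["/var/cache/*"], ["/etc", "/root"]))
```

Passing a list (for example to ``subprocess.run`` with ``BORG_REPO`` in ``env``) avoids the shell entirely, which is why the exclude patterns need no quoting here, unlike the single-quoted ``--exclude`` arguments in the template.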

@ -0,0 +1,11 @@
Install borg backup tool to /opt/borg

Install borg to a virtualenv; the binary will be available at
``/opt/borg/bin/borg``.

**Role Variables**

.. zuul:rolevar:: borg_version

   The version of ``borg`` to install. This should be pinned to the
   same version on the server and all clients.

@ -0,0 +1 @@
borg_version: 1.1.13

@ -0,0 +1,24 @@
# We install into a virtualenv here for two reasons; we want a
# specific version pinned between server and client -- borg has had
# updates that required transitions so we don't want to use system
# packages where things might get out of sync.  Secondly we want to
# minimise the number of things that can go wrong when running backups.
- name: Install build deps
  package:
    name:
      - python3-dev
      - libssl-dev
      - openssl
      - libacl1-dev
      - libacl1
      - build-essential

- name: Install borg
  pip:
    # borg build deps are a little ... interesting, it needs cython
    # but the requirements don't bring it in.
    name:
      - cython
      - 'borgbackup=={{ borg_version }}'
    virtualenv: /opt/borg
    virtualenv_command: /usr/bin/python3 -m venv
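Since the version must match on both ends, a check comparing both sides against the pin may be useful. A minimal sketch, assuming ``borg --version`` prints something like ``borg 1.1.13``; the function names are hypothetical.

```python
import re


def parse_borg_version(output: str) -> str:
    """Extract the dotted version from "borg --version" output."""
    match = re.search(r"\d+(?:\.\d+)+", output)
    if match is None:
        raise ValueError("unrecognised borg --version output: %r" % output)
    return match.group(0)


def versions_match(client_output: str, server_output: str, pinned: str) -> bool:
    """True only if client and server both report exactly the pinned version."""
    return (parse_borg_version(client_output)
            == parse_borg_version(server_output)
            == pinned)


if __name__ == "__main__":
    print(versions_match("borg 1.1.13\n", "borg 1.1.13\n", "1.1.13"))
```

Comparing parsed versions exactly avoids the substring match in ``testinfra/test_borg_backups.py``, where ``'1.1.13' in cmd.stdout`` would also accept a hypothetical ``1.1.130``.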

@ -0,0 +1,12 @@
# This needs to happen in order.  Backup hosts export their username/key
# combos which are installed onto the backup server
- hosts: "borg-backup:!disabled"
  name: "Base: Generate borg backup users and keys"
  roles:
    - iptables
    - borg-backup

- hosts: "borg-backup-server:!disabled"
  name: "Generate borg configuration"
  roles:
    - iptables
    - borg-backup-server

@ -20,3 +20,10 @@ groups:
  backup:
    - backup-test01.opendev.org
    - backup-test02.opendev.org
  borg-backup-server:
    - borg-backup01.region.provider.opendev.org
  borg-backup:
    - borg-backup-test01.opendev.org
    - borg-backup-test02.opendev.org

@ -0,0 +1,77 @@
# Copyright 2019 Red Hat, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

import os.path

import pytest


testinfra_hosts = ['borg-backup01.region.provider.opendev.org',
                   'borg-backup-test01.opendev.org',
                   'borg-backup-test02.opendev.org']


def test_borg_installed(host):
    f = host.file('/opt/borg/bin/borg')
    assert f.exists

    cmd = host.run('/opt/borg/bin/borg --version')
    assert cmd.succeeded
    # NOTE(ianw): deliberately pinned; we want to be careful if we
    # update that the new version is compatible with old repos.
    assert '1.1.13' in cmd.stdout


def test_borg_server_users(host):
    hostname = host.backend.get_hostname()
    if hostname.startswith('borg-backup-test'):
        pytest.skip()

    for username in 'borg-borg-backup-test01', 'borg-borg-backup-test02':
        homedir = os.path.join('/opt/backups/', username)
        borg_repo = os.path.join(homedir, 'backup')
        authorized_keys = os.path.join(homedir, '.ssh', 'authorized_keys')

        user = host.user(username)
        assert user.exists
        assert user.home == homedir

        f = host.file(authorized_keys)
        assert f.exists
        assert f.contains("ssh-ed25519")

        f = host.file(borg_repo)
        assert f.exists


def test_borg_backup_host_config(host):
    hostname = host.backend.get_hostname()
    if hostname == 'borg-backup01.region.provider.opendev.org':
        pytest.skip()

    f = host.file('/usr/local/bin/borg-backup')
    assert f.exists

    f = host.file('/root/.ssh/id_borg_backup_ed25519')
    assert f.exists

    f = host.file('/root/.ssh/config')
    assert f.exists
    assert f.contains('Host borg-backup01.region.provider.opendev.org')


def test_borg_backup(host):
    hostname = host.backend.get_hostname()
    if hostname == 'borg-backup01.region.provider.opendev.org':
        pytest.skip()

    cmd = host.run(
        '/usr/local/bin/borg-backup borg-backup01.region.provider.opendev.org 2>> '
        '/var/log/borg-backup-borg-backup01.region.provider.opendev.org.log')
    assert cmd.succeeded

@ -285,6 +285,19 @@
      - playbooks/roles/backup-server/
      - playbooks/roles/iptables/

- job:
    name: infra-prod-service-borg-backup
    parent: infra-prod-service-base
    description: Run service-borg-backup.yaml playbook.
    vars:
      playbook_name: service-borg-backup.yaml
    files:
      - inventory/
      - playbooks/service-borg-backup.yaml
      - playbooks/roles/borg-backup/
      - playbooks/roles/borg-backup-server/
      - playbooks/roles/iptables/

- job:
    name: infra-prod-service-registry
    parent: infra-prod-service-base

@ -13,6 +13,7 @@
- system-config-run-base-ansible-devel:
voting: false
- system-config-run-backup
- system-config-run-borg-backup
- system-config-run-dns
- system-config-run-eavesdrop:
dependencies:
@ -235,6 +236,7 @@
- infra-prod-service-mirror
- infra-prod-service-static
- infra-prod-service-backup
- infra-prod-service-borg-backup
- infra-prod-service-registry
- infra-prod-service-zookeeper
- infra-prod-service-zuul
@ -276,6 +278,7 @@
- infra-prod-service-mirror-update
- infra-prod-service-mirror
- infra-prod-service-static
- infra-prod-service-borg-backup
- infra-prod-service-backup
- infra-prod-service-zookeeper
- infra-prod-service-review

@ -342,6 +342,38 @@
      - playbooks/zuul/templates/host_vars/backup
      - testinfra/test_backups.py

- job:
    name: system-config-run-borg-backup
    parent: system-config-run
    description: |
      Run the playbook for borg backup configuration
    nodeset:
      nodes:
        - name: bridge.openstack.org
          label: ubuntu-bionic
        - name: borg-backup01.region.provider.opendev.org
          label: ubuntu-focal
        - name: borg-backup-test01.opendev.org
          label: ubuntu-focal
        - name: borg-backup-test02.opendev.org
          label: ubuntu-bionic
    vars:
      run_playbooks:
        - playbooks/service-borg-backup.yaml
    files:
      - playbooks/install-ansible.yaml
      - playbooks/roles/borg-backup
      - playbooks/zuul/templates/host_vars/borg-backup
      - testinfra/test_borg_backups.py
    host-vars:
      borg-backup-test01.opendev.org:
        host_copy_output:
          '/var/log/borg-backup-borg-backup01.region.provider.opendev.org.log': logs
      borg-backup-test02.opendev.org:
        host_copy_output:
          '/var/log/borg-backup-borg-backup01.region.provider.opendev.org.log': logs

- job:
    name: system-config-run-mirror-base
    parent: system-config-run