diff --git a/doc/source/sysadmin.rst b/doc/source/sysadmin.rst index b93cc3ff2b..8a3cc2a59c 100644 --- a/doc/source/sysadmin.rst +++ b/doc/source/sysadmin.rst @@ -231,12 +231,14 @@ Or use matching to cover a range of servers:: Backups ======= -Infra uses the `bup `__ tool for backups. +Infra uses the `borg `__ backup +tool. -Hosts in the ``backup`` Ansible inventory group will be backed up to -servers in the ``backup-server`` group with ``bup``. The -``playbooks/roles/backup`` and ``playbooks/roles/backup-server`` roles -implement the required setup. +Hosts in the ``borg-backup`` Ansible inventory group will be backed up +to servers in the ``borg-backup-server`` group with ``borg``. The +``playbooks/roles/borg-backup`` and +``playbooks/roles/borg-backup-server`` roles implement the required +setup. The backup server has a unique Unix user for each host to be backed up. The roles will setup required users, their home directories in @@ -250,52 +252,27 @@ key setup just for backup communication (see ``/root/.ssh/config``). Restore from Backup ------------------- -On the server that needs items restored from backup become root, start a -screen session as restoring can take a while, and create a working -directory to restore the backups into. This allows us to be selective in -how we restore content from backups:: +``borg`` has many options for restoring but a basic way to dump a host +at a particular time is to - sudo su - - screen - mkdir /root/backup-restore-$DATE - cd /root/backup-restore-$DATE - -Root uses a separate ssh key and remote user to communicate with the -backup server(s); the username and key to use for backup should be -automatically configured in ``/root/.ssh/config``. The backup server -hostname can be taken from there. - -At this point we can join the tar that was split by the backup cron:: - - bup join -r backup.x.y.opendev.org: root > backup.tar - -At this point you may need to wait a while. 
These backups are stored on -servers geographically distant from our normal servers resulting in less -network throughput between servers than we are used to. - -Once the ``bup join`` is complete you will have a tar archive of that -backup. It may be useful to list the files in the backup -``tar -tf backup.tar`` to get an idea of what things are available. At -this point you will probably either want to extract the entire backup:: - - tar -xvf backup.tar - ls -al - -Or selectively extract files:: - - # path/to/file needs to match the output given by tar -t - tar -xvf backup.tar path/to/file - -Note if you created your working directory in a path that is not -excluded by bup you will want to remove that directory when your work is -done. /root/backup-restore-* is excluded so the path above is safe. +* log in to the backup server +* sudo ``su -`` to switch to the backup user for the host to be restored +* you will now be in the home directory of that user +* run ``/opt/borg/bin/borg list ./backup`` to list the archives available +* these should look like ``hostname-YYYY-MM-DDTHH:MM:SS`` +* move to a working directory +* extract one of the appropriate archives with ``/opt/borg/bin/borg extract ~/backup `` Rotating backup storage ----------------------- -Since ``bup`` only stores differences, it does not have an effective -way to prune old backups. The easiest way is to simply periodically -start the backups fresh. +We run ``borg`` in append-only mode, so that clients cannot remove +old backups on the server. + +TODO(ianw): Write instructions on how to prune server side. We +should monitor growth to see if automatic pruning would be +appropriate, or periodic manual pruning, or something similar to this +existing system where we keep a historic archive and start fresh. The backup server keeps an active volume and the previously rotated volume. Each consists of 3 x 1TiB volumes grouped with LVM. The
The diff --git a/playbooks/roles/borg-backup-server/README.rst b/playbooks/roles/borg-backup-server/README.rst new file mode 100644 index 0000000000..c78c557dec --- /dev/null +++ b/playbooks/roles/borg-backup-server/README.rst @@ -0,0 +1,15 @@ +Setup backup server + +This role configures backup server(s) in the ``borg-backup-server`` group +to accept backups from remote hosts. + +Note that the ``borg-backup`` role must have run on each host in the +``borg-backup`` group before this role. That role will create a +``borg_user`` tuple in the hostvars for for each host consisting of +the required username and public key. + +Each required user gets a separate home directory in ``/opt/backups``. +Their ``authorized_keys`` file is configured with the public key to +allow the remote host to log in and only run ``borg`` in server mode. + +**Role Variables** diff --git a/playbooks/roles/borg-backup-server/defaults/main.yaml b/playbooks/roles/borg-backup-server/defaults/main.yaml new file mode 100644 index 0000000000..9ba22faee0 --- /dev/null +++ b/playbooks/roles/borg-backup-server/defaults/main.yaml @@ -0,0 +1 @@ +borg_users: [] diff --git a/playbooks/roles/borg-backup-server/tasks/main.yaml b/playbooks/roles/borg-backup-server/tasks/main.yaml new file mode 100644 index 0000000000..bae1f7e826 --- /dev/null +++ b/playbooks/roles/borg-backup-server/tasks/main.yaml @@ -0,0 +1,19 @@ +- name: Create backup directory + file: + state: directory + path: /opt/backups + +- name: Install borg + include_role: + name: install-borg + +- name: Build all borg users from backup hosts + set_fact: + borg_users: '{{ borg_users }} + [ {{ hostvars[item]["borg_user"] }} ]' + with_inventory_hostnames: 'borg-backup:!disabled' + +- name: Create borg users + include_tasks: user.yaml + loop: '{{ borg_users }}' + loop_control: + loop_var: borg_user diff --git a/playbooks/roles/borg-backup-server/tasks/user.yaml b/playbooks/roles/borg-backup-server/tasks/user.yaml new file mode 100644 index 
0000000000..5a1100b0c4 --- /dev/null +++ b/playbooks/roles/borg-backup-server/tasks/user.yaml @@ -0,0 +1,31 @@ +# note borg_user is the parent loop variable name; this works on each +# element from the borg_users global. +- name: Set variables + set_fact: + user_name: '{{ borg_user[0] }}' + user_key: '{{ borg_user[1] }}' + +- name: Create borg user + user: + name: '{{ user_name }}' + comment: 'Backup user' + shell: /bin/bash + home: '/opt/backups/{{ user_name }}' + create_home: yes + register: homedir + +- name: Create borg user authorized key + authorized_key: + user: '{{ user_name }}' + state: present + key: '{{ user_key }}' + key_options: 'command="/opt/borg/bin/borg serve --append-only --restrict-to-path /opt/backups/{{ user_name }}/backup",restrict' + +# ansible-lint wants this in a handler, it should be done here and +# now; this isn't like a service restart where multiple things might +# call it. +- name: Initialise borg + command: /opt/borg/bin/borg init --encryption=none /opt/backups/{{ user_name }}/backup + become: yes + become_user: '{{ user_name }}' + when: homedir.changed diff --git a/playbooks/roles/borg-backup/README.rst b/playbooks/roles/borg-backup/README.rst new file mode 100644 index 0000000000..c97c88161f --- /dev/null +++ b/playbooks/roles/borg-backup/README.rst @@ -0,0 +1,36 @@ +Configure a host to be backed up + +This role sets up a host to use ``borg`` for backup to any hosts in the +``borg-backup-server`` group. + +A separate ssh key will be generated for root to connect to the backup +server(s) and the host key for the backup servers will be accepted on +the host. + +The ``borg`` tool is installed and a cron job is set up to run the +backup periodically. + +Note the ``borg-backup-server`` role must run after this to create the user +correctly on the backup server. 
This role sets a tuple ``borg_user`` +with the username and public key; the ``borg-backup-server`` role uses this +variable for each host in the ``borg-backup`` group to initialise users. + +**Role Variables** + +.. zuul:rolevar:: borg_username + + The username to connect to the backup server. If this is left + undefined, it will be automatically set to ``borg-$(hostname)`` + +.. zuul:rolevar:: borg_backup_excludes_extra + :default: [] + + A list of extra items to pass as ``--exclude`` arguments to borg. + Appended to the global default list of excludes set with + ``borg_backup_excludes``. + +.. zuul:rolevar:: borg_backup_dirs_extra + :default: [] + + A list of extra directories to back up. Appended to the global + default list of directories set with ``borg_backup_dirs``. diff --git a/playbooks/roles/borg-backup/defaults/main.yaml b/playbooks/roles/borg-backup/defaults/main.yaml new file mode 100644 index 0000000000..788d313c6d --- /dev/null +++ b/playbooks/roles/borg-backup/defaults/main.yaml @@ -0,0 +1,13 @@ +borg_backup_excludes: + - '/home/*/.cache/*' + - '/var/cache/*' + - '/var/tmp/*' +borg_backup_excludes_extra: [] + +borg_backup_dirs: + - /etc + - /home + - /root + - /var +borg_backup_dirs_extra: [] + diff --git a/playbooks/roles/borg-backup/tasks/main.yaml b/playbooks/roles/borg-backup/tasks/main.yaml new file mode 100644 index 0000000000..8578cc8dc0 --- /dev/null +++ b/playbooks/roles/borg-backup/tasks/main.yaml @@ -0,0 +1,63 @@ +- name: Generate borg username for this host + set_fact: + borg_username: 'borg-{{ inventory_hostname.split(".", 1)[0] }}' + when: borg_username is not defined + +- debug: + var: borg_username + +- name: Install borg + include_role: + name: install-borg + +- name: Install backup script + template: + src: borg-backup.j2 + dest: /usr/local/bin/borg-backup + mode: 0755 + +- name: Generate keypair for backups + openssh_keypair: + path: /root/.ssh/id_borg_backup_ed25519 + type: ed25519 + register: borg_keypair + +- name: Configure ssh 
for backup server + blockinfile: + path: /root/.ssh/config + create: true + block: | + # {{ item }} backup server + Host {{ item }} + HostName {{ item }} + IdentityFile /root/.ssh/id_borg_backup_ed25519 + User {{ borg_username }} + mode: 0600 + with_inventory_hostnames: borg-backup-server + +- name: Generate borg_user info tuple + set_fact: + borg_user: '{{ [ borg_username, borg_keypair["public_key"] ] }}' + +- name: Accept hostkey of backup server + known_hosts: + state: present + key: '{{ item }} ssh-ed25519 {{ hostvars[item]["ansible_ssh_host_key_ed25519_public"] }}' + name: '{{ item }}' + with_inventory_hostnames: borg-backup-server + +- name: Install backup cron job + cron: + name: "Run borg backup" + job: "/usr/local/bin/borg-backup {{ item }} 2>> /var/log/borg-backup-{{ item }}.log" + user: root + hour: '5' + minute: '{{ 59|random(seed=item) }}' + with_inventory_hostnames: borg-backup-server + +- name: Install logrotate rules + include_role: + name: logrotate + vars: + logrotate_file_name: '/var/log/borg-backup-{{ item }}.log' + with_inventory_hostnames: borg-backup-server diff --git a/playbooks/roles/borg-backup/templates/borg-backup.j2 b/playbooks/roles/borg-backup/templates/borg-backup.j2 new file mode 100644 index 0000000000..1b33e3430e --- /dev/null +++ b/playbooks/roles/borg-backup/templates/borg-backup.j2 @@ -0,0 +1,53 @@ +#!/bin/bash + +# Flags based on +# https://borgbackup.readthedocs.io/en/stable/quickstart.html + +if [ -z "$1" ]; then + echo "Must specify backup host" + exit 1 +fi + +BORG="/opt/borg/bin/borg" + +# Setting this, so the repo does not need to be given on the commandline: +export BORG_REPO="ssh://{{ borg_username }}@${1}/opt/backups/{{ borg_username }}/backup" + +# some helpers and error handling: +info() { printf "\n%s %s\n\n" "$( date )" "$*" >&2; } +trap 'echo $( date ) Backup interrupted >&2; exit 2' INT TERM + +info "Starting backup" + +# This avoids UI prompts when first accessing the remote repository +export 
BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK=1 + +# Backup the most important directories into an archive named after +# the machine this script is currently running on: +${BORG} create \ + --verbose \ + --filter AME \ + --list \ + --stats \ + --show-rc \ + --compression lz4 \ + --exclude-caches \ +{% for item in borg_backup_excludes + borg_backup_excludes_extra -%} + --exclude '{{ item }}' \ +{% endfor -%} + \ + ::'{hostname}-{now}' \ +{% for item in borg_backup_dirs + borg_backup_dirs_extra -%} + {{ item }} {{ '\\' if not loop.last }} +{% endfor -%} + +backup_exit=$? + +if [ ${backup_exit} -eq 0 ]; then + info "Backup finished successfully" +else + info "Backup finished with errors" +fi + +exit ${backup_exit} + diff --git a/playbooks/roles/install-borg/README.rst b/playbooks/roles/install-borg/README.rst new file mode 100644 index 0000000000..57a4559900 --- /dev/null +++ b/playbooks/roles/install-borg/README.rst @@ -0,0 +1,11 @@ +Install borg backup tool to /opt/borg + +Install borg to a virtualenv; the binary will be available at +``/opt/borg/bin/borg``. + +**Role Variables** + +.. zuul:rolevar:: borg_version + + The version of ``borg`` to install. This should likely be pinned + to be the same between server and client. diff --git a/playbooks/roles/install-borg/defaults/main.yaml b/playbooks/roles/install-borg/defaults/main.yaml new file mode 100644 index 0000000000..cf24517f15 --- /dev/null +++ b/playbooks/roles/install-borg/defaults/main.yaml @@ -0,0 +1 @@ +borg_version: 1.1.13 diff --git a/playbooks/roles/install-borg/tasks/main.yaml b/playbooks/roles/install-borg/tasks/main.yaml new file mode 100644 index 0000000000..228cbb9243 --- /dev/null +++ b/playbooks/roles/install-borg/tasks/main.yaml @@ -0,0 +1,24 @@ +# We install into a virtualenv here for two reasons; we want a +# specific version pinned between server and client -- borg has had +# updates that required transitions so we don't want to use system +# packages where things might get out of sync. 
Secondly we want to +# keep as few things as possible to go wrong when running backups. +- name: Install build deps + package: + name: + - python3-dev + - libssl-dev + - openssl + - libacl1-dev + - libacl1 + - build-essential + +- name: Install borg + pip: + # borg build deps are a little ... interesting, it needs cython + # but the requirements don't bring it in. + name: + - cython + - 'borgbackup=={{ borg_version }}' + virtualenv: /opt/borg + virtualenv_command: /usr/bin/python3 -m venv diff --git a/playbooks/service-borg-backup.yaml b/playbooks/service-borg-backup.yaml new file mode 100644 index 0000000000..f0f7f3470e --- /dev/null +++ b/playbooks/service-borg-backup.yaml @@ -0,0 +1,12 @@ +# This needs to happen in order. Backup hosts export their username/key +# combos which are installed onto the backup server +- hosts: "borg-backup:!disabled" + name: "Base: Generate borg backup users and keys" + roles: + - iptables + - borg-backup +- hosts: "borg-backup-server:!disabled" + name: "Generate borg configuration" + roles: + - iptables + - borg-backup-server diff --git a/playbooks/zuul/templates/gate-groups.yaml.j2 b/playbooks/zuul/templates/gate-groups.yaml.j2 index 847b91db38..c420cc6c77 100644 --- a/playbooks/zuul/templates/gate-groups.yaml.j2 +++ b/playbooks/zuul/templates/gate-groups.yaml.j2 @@ -20,3 +20,10 @@ groups: backup: - backup-test01.opendev.org - backup-test02.opendev.org + + borg-backup-server: + - borg-backup01.region.provider.opendev.org + + borg-backup: + - borg-backup-test01.opendev.org + - borg-backup-test02.opendev.org diff --git a/testinfra/test_borg_backups.py b/testinfra/test_borg_backups.py new file mode 100644 index 0000000000..7f9f7d6b4e --- /dev/null +++ b/testinfra/test_borg_backups.py @@ -0,0 +1,77 @@ +# Copyright 2019 Red Hat, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); you may +# not use this file except in compliance with the License. 
You may obtain +# a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT +# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the +# License for the specific language governing permissions and limitations +# under the License. + +import os.path +import pytest + +testinfra_hosts = ['borg-backup01.region.provider.opendev.org', + 'borg-backup-test01.opendev.org', + 'borg-backup-test02.opendev.org'] + + +def test_borg_installed(host): + f = host.file('/opt/borg/bin/borg') + assert f.exists + + cmd = host.run('/opt/borg/bin/borg --version') + assert cmd.succeeded + # NOTE(ianw): deliberately pinned; we want to be careful if we + # update that the new version is compatible with old repos. + assert '1.1.13' in cmd.stdout + +def test_borg_server_users(host): + hostname = host.backend.get_hostname() + if hostname.startswith('borg-backup-test'): + pytest.skip() + + for username in 'borg-borg-backup-test01', 'borg-borg-backup-test02': + homedir = os.path.join('/opt/backups/', username) + borg_repo = os.path.join(homedir, 'backup') + authorized_keys = os.path.join(homedir, '.ssh', 'authorized_keys') + + user = host.user(username) + assert user.exists + assert user.home == homedir + + f = host.file(authorized_keys) + assert f.exists + assert f.contains("ssh-ed25519") + + f = host.file(borg_repo) + assert f.exists + +def test_borg_backup_host_config(host): + hostname = host.backend.get_hostname() + if hostname == 'borg-backup01.region.provider.opendev.org': + pytest.skip() + + f = host.file('/usr/local/bin/borg-backup') + assert f.exists + + f = host.file('/root/.ssh/id_borg_backup_ed25519') + assert f.exists + + f = host.file('/root/.ssh/config') + assert f.exists + assert f.contains('Host borg-backup01.region.provider.opendev.org') + +def test_borg_backup(host): + hostname = 
host.backend.get_hostname() + if hostname == 'borg-backup01.region.provider.opendev.org': + pytest.skip() + + cmd = host.run( + '/usr/local/bin/borg-backup borg-backup01.region.provider.opendev.org 2>> ' + '/var/log/borg-backup-borg-backup01.region.provider.opendev.org.log') + assert cmd.succeeded diff --git a/zuul.d/infra-prod.yaml b/zuul.d/infra-prod.yaml index d52bbe21c2..15741fc76e 100644 --- a/zuul.d/infra-prod.yaml +++ b/zuul.d/infra-prod.yaml @@ -285,6 +285,19 @@ - playbooks/roles/backup-server/ - playbooks/roles/iptables/ +- job: + name: infra-prod-service-borg-backup + parent: infra-prod-service-base + description: Run service-borg-backup.yaml playbook. + vars: + playbook_name: service-borg-backup.yaml + files: + - inventory/ + - playbooks/service-borg-backup.yaml + - playbooks/roles/borg-backup/ + - playbooks/roles/borg-backup-server/ + - playbooks/roles/iptables/ + - job: name: infra-prod-service-registry parent: infra-prod-service-base diff --git a/zuul.d/project.yaml b/zuul.d/project.yaml index 50382f0050..d14d451961 100644 --- a/zuul.d/project.yaml +++ b/zuul.d/project.yaml @@ -13,6 +13,7 @@ - system-config-run-base-ansible-devel: voting: false - system-config-run-backup + - system-config-run-borg-backup - system-config-run-dns - system-config-run-eavesdrop: dependencies: @@ -235,6 +236,7 @@ - infra-prod-service-mirror - infra-prod-service-static - infra-prod-service-backup + - infra-prod-service-borg-backup - infra-prod-service-registry - infra-prod-service-zookeeper - infra-prod-service-zuul @@ -276,6 +278,7 @@ - infra-prod-service-mirror-update - infra-prod-service-mirror - infra-prod-service-static + - infra-prod-service-borg-backup - infra-prod-service-backup - infra-prod-service-zookeeper - infra-prod-service-review diff --git a/zuul.d/system-config-run.yaml b/zuul.d/system-config-run.yaml index 98a25c9329..4aed91314b 100644 --- a/zuul.d/system-config-run.yaml +++ b/zuul.d/system-config-run.yaml @@ -342,6 +342,38 @@ - 
playbooks/zuul/templates/host_vars/backup - testinfra/test_backups.py +- job: + name: system-config-run-borg-backup + parent: system-config-run + description: | + Run the playbook for borg backup configuration + nodeset: + nodes: + - name: bridge.openstack.org + label: ubuntu-bionic + - name: borg-backup01.region.provider.opendev.org + label: ubuntu-focal + - name: borg-backup-test01.opendev.org + label: ubuntu-focal + - name: borg-backup-test02.opendev.org + label: ubuntu-bionic + vars: + run_playbooks: + - playbooks/service-borg-backup.yaml + files: + - playbooks/install-ansible.yaml + - playbooks/roles/borg-backup + - playbooks/zuul/templates/host_vars/borg-backup + - testinfra/test_borg_backups.py + host-vars: + borg-backup-test01.opendev.org: + host_copy_output: + '/var/log/borg-backup-borg-backup01.region.provider.opendev.org.log': logs + borg-backup-test02.opendev.org: + host_copy_output: + '/var/log/borg-backup-borg-backup01.region.provider.opendev.org.log': logs + + - job: name: system-config-run-mirror-base parent: system-config-run