Backup and Restore

Patch 2 review edits.

Stripped spaces at EOLs. Deleted vscode hidden directory.

Initial review and resolved supporting additions to source/shared/strings.txt

Signed-off-by: Stone <ronald.stone@windriver.com>
Change-Id: I21895e0f451d1a11b8f9813034c31a049d4f9ad8
Stone 2020-12-04 11:59:12 -05:00
parent 4918b82348
commit e1180949ca
10 changed files with 933 additions and 0 deletions

View File

@ -0,0 +1,159 @@
.. hgq1552923986183
.. _backing-up-starlingx-system-data:
===================
Back Up System Data
===================
A system data backup of a |prod-long| system captures core system
information needed to restore a fully operational |prod-long| cluster.
.. contents:: In this section:
:local:
:depth: 1
.. _backing-up-starlingx-system-data-section-N1002E-N1002B-N10001:
System Data Backups include:
.. _backing-up-starlingx-system-data-ul-enh-3dl-lp:
- platform configuration details
- system databases
- patching and package repositories
- home directories of the **sysadmin** user and all |LDAP| user accounts
.. xreflink See |sec-doc|: :ref:`Local LDAP Linux User Accounts
<local-ldap-linux-user-accounts>` for additional information.
.. note::
If the hardware configuration changes, for example, if new NICs are
added, a new system backup is required to ensure that there is no
configuration mismatch after a system restore.
.. _backing-up-starlingx-system-data-section-N10089-N1002B-N10001:
------------------------------------
Detailed contents of a system backup
------------------------------------
The backup contains the items listed below:
.. _backing-up-starlingx-system-data-ul-s3t-bz4-kjb:
- PostgreSQL database backups
- |LDAP| database
- Ceph crushmap
- DNS server list
- System Inventory network configuration \(required during a system
restore to set up the OS configuration\)
- Docker registries on controller
- Docker no-proxy
- \(Optional\) Any end-user container images in **registry.local**; that
is, any images other than |org| system and application images.
|prod| system and application images are re-pulled from their
original source \(external registries\) during the restore procedure.
- Backed-up data:
- OS configuration:
- \(item=/etc\)
.. note::
Although everything is backed up, not all the content is restored.
- Home directory of the 'sysadmin' user, and all |LDAP| user accounts
\(item=/etc\)
- Generated platform configuration:
- item=/opt/platform/config/<|prefix|\_version>
- item=/opt/platform/puppet/<|prefix|\_version>/hieradata:
All the hieradata under this directory is backed up. However, only the
static hieradata \(static.yaml and secure\_static.yaml\) will be
restored to the bootstrap controller-0.
- Keyring:
- item=/opt/platform/.keyring/<|prefix|\_version>
- Patching and package repositories:
- item=/opt/patching
- item=/www/pages/updates
- Extension filesystem:
- item=/opt/extension
- dc-vault filesystem for Distributed Cloud system-controller:
- item=/opt/dc-vault
- Armada manifests:
- item=/opt/platform/armada/<|prefix|\_version>
- Helm charts:
- item=/opt/platform/helm\_charts
.. _backing-up-starlingx-system-data-section-N1021A-N1002B-N10001:
-----------------------------------
Data not included in system backups
-----------------------------------
.. _backing-up-starlingx-system-data-ul-im2-b2y-lp:
- Application |PVCs| on Ceph clusters.
- StarlingX application data. Use the command :command:`system
application-list` to display a list of installed applications.
- Modifications made manually to the file systems, such as configuration
changes in the /etc directory. After a restore operation is complete,
these modifications must be reapplied.
- Home directories and passwords of local user accounts. They must be
backed up manually by the system administrator.
- The /root directory. Use the **sysadmin** account instead when root
access is needed.
.. note::
The system data backup can only be used to restore the cluster from
which the backup was made. You cannot use the system data backup to
restore the system to different hardware. Perform a system data backup
for each cluster and label the backup accordingly.
To ensure recovery from the backup file during a restore procedure,
containers must be in the active state when performing the backup.
Containers that are in a shutdown or paused state at the time of the
backup will not be recovered after a subsequent restore procedure.
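A quick way to confirm that no pods are stopped or paused before taking the
backup is shown below. This is an optional sanity check, assuming
:command:`kubectl` is available on the active controller; it is not part of
the documented backup steps.
.. code-block:: none
$ kubectl get pods --all-namespaces --field-selector=status.phase!=Running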
When the system data backup is complete, store the backup file in a secure
location, preferably keeping multiple copies for redundancy.
.. seealso::
:ref:`Run Ansible Backup Playbook Locally on the Controller
<running-ansible-backup-playbook-locally-on-the-controller>`
:ref:`Run Ansible Backup Playbook Remotely
<running-ansible-backup-playbook-remotely>`

View File

@ -0,0 +1,17 @@
=====================================
|prod-long| System Backup and Restore
=====================================
- System Data Backup
- :ref:`Backing Up Platform System Data <backing-up-starlingx-system-data>`
- :ref:`Running Ansible Backup Playbook Locally on the Controller <running-ansible-backup-playbook-locally-on-the-controller>`
- :ref:`Running Ansible Backup Playbook Remotely <running-ansible-backup-playbook-remotely>`
- System Data and Storage Restore
- :ref:`Restoring Platform System Data and Storage <restoring-starlingx-system-data-and-storage>`
- :ref:`Running Restore Playbook Locally on the Controller <running-restore-playbook-locally-on-the-controller>`
- :ref:`Running Ansible Restore Playbook Remotely <system-backup-running-ansible-restore-playbook-remotely>`

View File

@ -0,0 +1,30 @@
.. Backup and Restore file, created by
sphinx-quickstart on Thu Sep 3 15:14:59 2020.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
==================
Backup and Restore
==================
-------------
System backup
-------------
.. toctree::
:maxdepth: 1
backing-up-starlingx-system-data
running-ansible-backup-playbook-locally-on-the-controller
running-ansible-backup-playbook-remotely
--------------------------
System and storage restore
--------------------------
.. toctree::
:maxdepth: 1
restoring-starlingx-system-data-and-storage
running-restore-playbook-locally-on-the-controller
system-backup-running-ansible-restore-playbook-remotely

View File

@ -0,0 +1,385 @@
.. uzk1552923967458
.. _restoring-starlingx-system-data-and-storage:
========================================
Restore Platform System Data and Storage
========================================
You can perform a system restore \(controllers, workers, including or
excluding storage nodes\) of a |prod| cluster from available system data and
bring it back to the operational state it was in when the backup procedure
took place.
.. rubric:: |context|
The backup procedure takes a snapshot of the etcd database and stores it in
the system data backup; the restore then uses this snapshot to initialize the
Kubernetes cluster. Kubernetes configuration will be restored, and pods that
are started from repositories accessible from the internet or from external
repositories will start immediately. StarlingX-specific applications must be
re-applied once a storage cluster is configured.
.. warning::
The system data backup file can only be used to restore the system from
which the backup was made. You cannot use this backup file to restore
the system to different hardware.
To restore the data, use the same version of the boot image \(ISO\) that
was used at the time of the original installation.
The |prod| restore supports two modes:
.. _restoring-starlingx-system-data-and-storage-ol-tw4-kvc-4jb:
#. To keep the Ceph cluster data intact \(false - default option\), use the
following syntax when passing the extra arguments to the Ansible Restore
playbook command:
.. code-block:: none
wipe_ceph_osds=false
#. To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will
need to be recreated, use the following syntax:
.. code-block:: none
wipe_ceph_osds=true
Restoring a |prod| cluster from a backup file involves re-installing the
ISO on controller-0, running the Ansible Restore Playbook, applying updates
\(patches\), unlocking controller-0, and then powering on and unlocking the
remaining hosts one at a time: first the controllers, then the storage hosts
\(ONLY if required\), and lastly the compute \(worker\) hosts.
.. rubric:: |prereq|
Before you start the restore procedure you must ensure the following
conditions are in place:
.. _restoring-starlingx-system-data-and-storage-ul-rfq-qfg-mp:
- All cluster hosts must be prepared for network boot and then powered
down \(a minimal :command:`ipmitool` sketch for forcing network boot
follows this list\).
.. note::
If you are restoring system data only, do not lock, power off, or
prepare the storage hosts to be reinstalled.
- The backup file must be accessible locally if the restore is done by
running the Ansible Restore playbook locally on the controller, or
accessible remotely if the restore is done by running the Ansible Restore
playbook remotely.
- You have the original |prod| ISO installation image available on a USB
flash drive. It is mandatory that you use the exact same version of the
software used during the original installation; otherwise, the restore
procedure will fail.
- The restore procedure requires all hosts but controller-0 to boot
over the internal management network using the |PXE| protocol. Ideally, the
old boot images are no longer present, so that the hosts boot from the
network when powered on. If this is not the case, you must configure each
host manually for network boot immediately after powering it on.
- If you are restoring a Distributed Cloud subcloud first, ensure it is in
an **unmanaged** state on the Central Cloud \(SystemController\) by using
the following commands:
.. code-block:: none
$ source /etc/platform/openrc
~(keystone_admin)$ dcmanager subcloud unmanage <subcloud-name>
where <subcloud-name> is the name of the subcloud to be unmanaged.
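The way a host is prepared for network boot depends on its BMC and BIOS
settings. The following is a minimal sketch using :command:`ipmitool`,
assuming the BMC is reachable from your workstation; the address and
credentials are placeholders, and this is not part of the documented
procedure.
.. code-block:: none
$ ipmitool -I lanplus -H <bmc-ip> -U <bmc-user> -P <bmc-password> chassis bootdev pxe
$ ipmitool -I lanplus -H <bmc-ip> -U <bmc-user> -P <bmc-password> chassis power off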
.. rubric:: |proc|
#. Power down all hosts.
If you have a storage host and want to retain Ceph data, power down
all the nodes except the storage hosts; the Ceph cluster must remain
functional during the restore operation.
.. caution::
Do not use :command:`wipedisk` before a restore operation. This will
lead to data loss on your Ceph cluster. It is safe to use
:command:`wipedisk` during an initial installation, while reinstalling
a host, or during an upgrade.
#. Install the |prod| ISO software on controller-0 from the USB flash
drive.
You can now log in using the host's console.
#. Log in to the console as user **sysadmin** with password **sysadmin**.
#. Install network connectivity required for the subcloud.
#. Ensure the backup file is available on the controller. Run the Ansible
Restore playbook. For more information on restoring the backup file, see
:ref:`Run Restore Playbook Locally on the Controller
<running-restore-playbook-locally-on-the-controller>`, and :ref:`Run
Ansible Restore Playbook Remotely
<system-backup-running-ansible-restore-playbook-remotely>`.
.. note::
The backup file contains the system data and updates.
#. Update the controller's software to the previous update \(patch\) level.
The current software version on the controller is compared against the
version available in the backup file. If the backed-up version includes
updates, the restore process automatically applies the updates and
forces an additional reboot of the controller to make them effective.
After the reboot, you can verify that the updates were applied, as
illustrated in the following example:
.. code-block:: none
$ sudo sw-patch query
Patch ID RR Release Patch State
======================== ========== ===========
COMPUTECONFIG Available 20.06 n/a
LIBCUNIT_CONTROLLER_ONLY Applied 20.06 n/a
STORAGECONFIG Applied 20.06 n/a
Rerun the Ansible Restore Playbook.
#. Unlock controller-0.
.. code-block:: none
~(keystone_admin)$ system host-unlock controller-0
After you unlock controller-0, storage nodes become available and Ceph
becomes operational.
#. Authenticate the system as Keystone user **admin**.
Source the **admin** user environment as follows:
.. code-block:: none
$ source /etc/platform/openrc
#. For Simplex systems only, if :command:`wipe_ceph_osds` is set to false,
wait for the apps to transition from 'restore-requested' to the 'applied'
state.
If the apps are in 'apply-failed' state, ensure access to the docker
registry, and execute the following command for all custom applications
that need to be restored:
.. code-block:: none
~(keystone_admin)$ system application-apply <application>
For example, execute the following to restore stx-openstack.
.. code-block:: none
~(keystone_admin)$ system application-apply stx-openstack
.. note::
If you have a Simplex system, this is the last step in the process.
Wait for controller-0 to be in the unlocked, enabled, and available
state.
#. If you have a Duplex system, restore the controller-1 host.
#. List the current state of the hosts.
.. code-block:: none
~(keystone_admin)$ system host-list
+----+-------------+------------+---------------+-----------+------------+
| id | hostname | personality| administrative|operational|availability|
+----+-------------+------------+---------------+-----------+------------+
| 1 | controller-0| controller | unlocked |enabled |available |
| 2 | controller-1| controller | locked |disabled |offline |
| 3 | storage-0 | storage | locked |disabled |offline |
| 4 | storage-1 | storage | locked |disabled |offline |
| 5 | compute-0 | worker | locked |disabled |offline |
| 6 | compute-1 | worker | locked |disabled |offline |
+----+-------------+------------+---------------+-----------+------------+
#. Power on the host.
Ensure that the host boots from the network, and not from any disk
image that may be present.
The software is installed on the host, and then the host is
rebooted. Wait for the host to be reported as **locked**, **disabled**,
and **offline**.
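One way to confirm that the host has reached this state is to query it
from the active controller; the grep filter below is only for brevity.
.. code-block:: none
~(keystone_admin)$ system host-show controller-1 | grep -E 'administrative|operational|availability'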
#. Unlock controller-1.
.. code-block:: none
~(keystone_admin)$ system host-unlock controller-1
+-----------------+--------------------------------------+
| Property | Value |
+-----------------+--------------------------------------+
| action | none |
| administrative | locked |
| availability | online |
| ... | ... |
| uuid | 5fc4904a-d7f0-42f0-991d-0c00b4b74ed0 |
+-----------------+--------------------------------------+
#. Verify the state of the hosts.
.. code-block:: none
~(keystone_admin)$ system host-list
+----+-------------+------------+---------------+-----------+------------+
| id | hostname | personality| administrative|operational|availability|
+----+-------------+------------+---------------+-----------+------------+
| 1 | controller-0| controller | unlocked |enabled |available |
| 2 | controller-1| controller | unlocked |enabled |available |
| 3 | storage-0 | storage | locked |disabled |offline |
| 4 | storage-1 | storage | locked |disabled |offline |
| 5 | compute-0 | worker | locked |disabled |offline |
| 6 | compute-1 | worker | locked |disabled |offline |
+----+-------------+------------+---------------+-----------+------------+
#. Restore storage configuration. If :command:`wipe_ceph_osds` is set to
**true**, follow the same procedure used to restore controller-1,
beginning with host storage-0 and proceeding in sequence.
.. note::
This step should be performed ONLY if you are restoring storage hosts.
#. For storage hosts, there are two options:
With the controller software installed and updated to the same level
that was in effect when the backup was performed, you can perform
the restore procedure without interruption.
For Standard with Controller Storage, install or reinstall depends on the
:command:`wipe_ceph_osds` configuration:
#. If :command:`wipe_ceph_osds` is set to **true**, reinstall the
storage hosts.
#. If :command:`wipe_ceph_osds` is set to **false** \(default
option\), do not reinstall the storage hosts.
.. caution::
Do not reinstall or power off the storage hosts if you want to
keep previous Ceph cluster data. A reinstall of storage hosts
will lead to data loss.
#. Ensure that the Ceph cluster is healthy. Verify that the three Ceph
monitors \(controller-0, controller-1, storage-0\) are running in
quorum.
.. code-block:: none
~(keystone_admin)$ ceph -s
cluster:
id: 3361e4ef-b0b3-4f94-97c6-b384f416768d
health: HEALTH_OK
services:
mon: 3 daemons, quorum controller-0,controller-1,storage-0
mgr: controller-0(active), standbys: controller-1
osd: 10 osds: 10 up, 10 in
data:
pools: 5 pools, 600 pgs
objects: 636 objects, 2.7 GiB
usage: 6.5 GiB used, 2.7 TiB / 2.7 TiB avail
pgs: 600 active+clean
io:
client: 85 B/s rd, 336 KiB/s wr, 0 op/s rd, 67 op/s wr
.. caution::
Do not proceed until the Ceph cluster is healthy and the message
HEALTH\_OK appears.
If the message HEALTH\_WARN appears, wait a few minutes and then try
again. If the warning condition persists, consult the public
documentation for troubleshooting Ceph monitors \(for example,
`http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/
<http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/>`__\).
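Before consulting the external documentation, you can list the specific
warnings directly on the active controller; this is an optional check, not
an additional required step.
.. code-block:: none
~(keystone_admin)$ ceph health detail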
#. Restore the compute \(worker\) hosts, one at a time, following the same
procedure used to restore controller-1.
#. Unlock the compute hosts. The restore is complete.
The state of the hosts when the restore operation is complete is as
follows:
.. code-block:: none
~(keystone_admin)$ system host-list
+----+-------------+------------+---------------+-----------+------------+
| id | hostname | personality| administrative|operational|availability|
+----+-------------+------------+---------------+-----------+------------+
| 1 | controller-0| controller | unlocked |enabled |available |
| 2 | controller-1| controller | unlocked |enabled |available |
| 3 | storage-0 | storage | unlocked |enabled |available |
| 4 | storage-1 | storage | unlocked |enabled |available |
| 5 | compute-0 | worker | unlocked |enabled |available |
| 6 | compute-1 | worker | unlocked |enabled |available |
+----+-------------+------------+---------------+-----------+------------+
#. For Duplex systems only, if :command:`wipe_ceph_osds` is set to false, wait
for the apps to transition from 'restore-requested' to the 'applied' state.
If the apps are in 'apply-failed' state, ensure access to the docker
registry, and execute the following command for all custom applications
that need to be restored:
.. code-block:: none
~(keystone_admin)$ system application-apply <application>
For example, execute the following to restore stx-openstack.
.. code-block:: none
~(keystone_admin)$ system application-apply stx-openstack
.. rubric:: |postreq|
.. _restoring-starlingx-system-data-and-storage-ul-b2b-shg-plb:
- Passwords for local user accounts must be restored manually since they
are not included as part of the backup and restore procedures.
- After restoring a Distributed Cloud subcloud, you need to bring it back
to the **managed** state on the Central Cloud \(SystemController\), by
using the following commands:
.. code-block:: none
$ source /etc/platform/openrc
~(keystone_admin)$ dcmanager subcloud manage <subcloud-name>
where <subcloud-name> is the name of the subcloud to be managed.
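To confirm that the subcloud has returned to the **managed** state, you can
list the subclouds from the Central Cloud; an optional check using the
standard :command:`dcmanager` CLI.
.. code-block:: none
~(keystone_admin)$ dcmanager subcloud list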
.. comments in steps seem to throw numbering off.
.. xreflink removed from step 'Install the |prod| ISO software on controller-0 from the USB flash
drive.':
For details, refer to the |inst-doc|: :ref:`Installing Software on
controller-0 <installing-software-on-controller-0>`. Perform the
installation procedure for your system and *stop* at the step that
requires you to configure the host as a controller.
.. xreflink removed from step 'Install network connectivity required for the subcloud.':
For details, refer to the |distcloud-doc|: :ref:`Installing and
Provisioning a Subcloud <installing-and-provisioning-a-subcloud>`.

View File

@ -0,0 +1,61 @@
.. bqg1571264986191
.. _running-ansible-backup-playbook-locally-on-the-controller:
=====================================================
Run Ansible Backup Playbook Locally on the Controller
=====================================================
In this method, the Ansible Backup playbook is run on the active controller.
Use the following command to run the Ansible Backup playbook and back up the
|prod| configuration, data, and, optionally, the user container images in
registry.local:
.. code-block:: none
~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/backup.yml -e "ansible_become_pass=<sysadmin password> admin_password=<sysadmin password>" [ -e "backup_user_local_registry=true" ]
The <admin\_password> and <ansible\_become\_pass> need to be set correctly
using the ``-e`` option on the command line, in an override file, or in the
Ansible secret file.
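As a sketch of the override-file approach, the passwords can be placed in a
small YAML file and passed with ``-e @<file>``; the file name below is
hypothetical and the values are placeholders.
.. code-block:: none
# backup-overrides.yml (hypothetical file name)
ansible_become_pass: <sysadmin password>
admin_password: <sysadmin password>
backup_user_local_registry: true
The file is then passed on the command line:
.. code-block:: none
~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/backup.yml -e @backup-overrides.yml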
The output files will be named:
.. _running-ansible-backup-playbook-locally-on-the-controller-ul-wj1-vxh-pmb:
- inventory\_hostname\_platform\_backup\_timestamp.tgz
- inventory\_hostname\_openstack\_backup\_timestamp.tgz
- inventory\_hostname\_docker\_local\_registry\_backup\_timestamp.tgz
The file name prefixes can be overridden through the following variables,
using the ``-e`` option on the command line or an override file:
.. _running-ansible-backup-playbook-locally-on-the-controller-ul-rdp-gyh-pmb:
- platform\_backup\_filename\_prefix
- openstack\_backup\_filename\_prefix
- docker\_local\_registry\_backup\_filename\_prefix
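For example, to change the prefix of the platform archive \(the value shown
is hypothetical\):
.. code-block:: none
-e "platform_backup_filename_prefix=site1_platform_backup"
The resulting archive would then be named along the lines of
site1\_platform\_backup\_<timestamp>.tgz \(illustrative\).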
With the default prefixes, the generated backup tar files have names of the
following form, for example:
.. _running-ansible-backup-playbook-locally-on-the-controller-ul-p3b-f13-pmb:
- localhost\_docker\_local\_registry\_backup\_2020\_07\_15\_21\_24\_22.tgz
- localhost\_platform\_backup\_2020\_07\_15\_21\_24\_22.tgz
- localhost\_openstack\_backup\_2020\_07\_15\_21\_24\_22.tgz
These files are located by default in the /opt/backups directory on
controller-0, and contain the complete system backup.
If the default location needs to be modified, the variable backup\_dir can
be overridden using the ``-e`` option on the command line or by using an
override file.
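For example \(the directory shown is hypothetical and must have sufficient
free space\):
.. code-block:: none
-e "backup_dir=/opt/backups/archive"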

View File

@ -0,0 +1,57 @@
.. kpt1571265015137
.. _running-ansible-backup-playbook-remotely:
====================================
Run Ansible Backup Playbook Remotely
====================================
In this method, you run the Ansible Backup playbook on a remote workstation
and target it at controller-0.
.. rubric:: |prereq|
.. _running-ansible-backup-playbook-remotely-ul-evh-yn4-bkb:
- You need to have Ansible installed on your remote workstation, along
with the Ansible Backup/Restore playbooks.
- Your network must have IPv6 connectivity before running the Ansible
playbook, if the system configuration is IPv6.
.. rubric:: |proc|
.. _running-ansible-backup-playbook-remotely-steps-bnw-bnc-ljb:
#. Log in to the remote workstation.
#. Provide an Ansible hosts file, either a customized one that is
specified using the ``-i`` option, or the default one that resides in the
Ansible configuration directory \(that is, /etc/ansible/hosts\). You must
specify the floating |OAM| IP of the controller host. For example, if the
host name is |prefix|\_Cluster, the inventory file should have an entry
called |prefix|\_Cluster:
.. parsed-literal::
---
all:
hosts:
wc68:
ansible_host: 128.222.100.02
|prefix|\_Cluster:
ansible_host: 128.224.141.74
#. Run Ansible Backup playbook:
.. code-block:: none
~(keystone_admin)$ ansible-playbook <path-to-backup-playbook-entry-file> --limit host-name -i <inventory-file> -e <optional-extra-vars>
The generated backup tar file can be found in <host\_backup\_dir>, which is
/home/sysadmin by default. You can override this location using the ``-e``
option on the command line or in an override file.
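For example, to change this location \(the variable name follows the
placeholder above; the path is hypothetical\):
.. code-block:: none
-e "host_backup_dir=/tmp/stx-backups"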
.. warning::
If a backup of the **local registry images** file is created, the
file is not copied from the remote machine to the local machine.

View File

@ -0,0 +1,66 @@
.. rmy1571265233932
.. _running-restore-playbook-locally-on-the-controller:
==============================================
Run Restore Playbook Locally on the Controller
==============================================
To run the restore on the controller, you need to download the backup file to
the active controller.
.. rubric:: |context|
You can use an external storage device, for example, a USB drive. Use the
following command to run the Ansible Restore playbook:
.. code-block:: none
~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=<location_of_tarball> ansible_become_pass=<admin_password> admin_password=<admin_password> backup_filename=<backup_filename> wipe_ceph_osds=<true/false>"
The |prod| restore supports two modes: keeping the Ceph cluster data
intact or wiping the Ceph cluster.
.. rubric:: |proc|
.. _running-restore-playbook-locally-on-the-controller-steps-usl-2c3-pmb:
#. To keep the Ceph cluster data intact \(false - default option\), use the
following syntax:
.. code-block:: none
wipe_ceph_osds=false
#. To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will
need to be recreated, use the following syntax:
.. code-block:: none
wipe_ceph_osds=true
The following example restores from a backup file located in /home/sysadmin:
.. code-block:: none
~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=/home/sysadmin ansible_become_pass=St0rlingX* admin_password=St0rlingX* backup_filename=localhost_platform_backup_2020_07_27_07_48_48.tgz wipe_ceph_osds=true"
.. note::
If the backup contains patches, the Ansible Restore playbook will apply
the patches and prompt you to reboot the system. You will then need to
re-run the Ansible Restore playbook.
.. rubric:: |postreq|
After running the restore\_platform.yml playbook, you can restore the local
registry images.
.. note::
The backup file of the local registry images may be large. Restore the
backed-up file on the controller where there is sufficient space.
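Before restoring them, you can confirm that the target directory has enough
free space; a minimal check, assuming the default /home/sysadmin location.
.. code-block:: none
$ df -h /home/sysadmin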
For example, run the following to restore the local registry images:
.. code-block:: none
~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_user_images.yml -e "initial_backup_dir=/home/sysadmin backup_filename=localhost_docker_local_registry_backup_2020_07_15_21_24_22.tgz ansible_become_pass=St0rlingX*"

View File

@ -0,0 +1,147 @@
.. quy1571265365123
.. _system-backup-running-ansible-restore-playbook-remotely:
=====================================
Run Ansible Restore Playbook Remotely
=====================================
In this method, you run the Ansible Restore playbook on a remote workstation and target it at controller-0.
.. rubric:: |prereq|
.. _system-backup-running-ansible-restore-playbook-remotely-ul-ylm-g44-bkb:
- You need to have Ansible installed on your remote workstation, along
with the Ansible Backup/Restore playbooks.
- Your network must have IPv6 connectivity before running the Ansible
playbook, if the system configuration is IPv6.
.. rubric:: |proc|
.. _system-backup-running-ansible-restore-playbook-remotely-steps-sgp-jjc-ljb:
#. Log in to the remote workstation.
You can log in directly on the console or remotely using :command:`ssh`.
#. Provide an inventory file, either a customized one that is specified
using the ``-i`` option, or the default one that is in the Ansible
configuration directory \(that is, /etc/ansible/hosts\). You must
specify the floating |OAM| IP of the controller host. For example, if the
host name is |prefix|\_Cluster, the inventory file should have an entry
called |prefix|\_Cluster.
.. parsed-literal::
---
all:
hosts:
wc68:
ansible_host: 128.222.100.02
|prefix|\_Cluster:
ansible_host: 128.224.141.74
#. Run the Ansible Restore playbook:
.. code-block:: none
~(keystone_admin)$ ansible-playbook path-to-restore-platform-playbook-entry-file --limit host-name -i inventory-file -e optional-extra-vars
where optional-extra-vars can be:
- **Optional**: You can select one of the two restore modes:
- To keep Ceph data intact \(false - default option\), use the
following syntax:
:command:`wipe_ceph_osds=false`
- To start with an empty Ceph cluster \(true\), where the Ceph cluster
will be recreated, use the following syntax:
:command:`wipe_ceph_osds=true`
- The backup\_filename is the platform backup tar file. It must be
provided using the ``-e`` option on the command line, for example:
.. code-block:: none
-e backup_filename=localhost_platform_backup_2019_07_15_14_46_37.tgz
- The initial\_backup\_dir is the location on the Ansible control
machine where the platform backup tar file is placed to restore the
platform. It must be provided using ``-e`` option on the command line.
- The :command:`admin\_password`, :command:`ansible\_become\_pass`,
and :command:`ansible\_ssh\_pass` need to be set correctly using
the ``-e`` option on the command line or in the Ansible secret file.
:command:`ansible\_ssh\_pass` is the password to the sysadmin user
on controller-0.
- The :command:`ansible\_remote\_tmp` should be set to a new
directory \(not required to create it ahead of time\) under
/home/sysadmin on controller-0 using the ``-e`` option on the command
line.
For example:
.. parsed-literal::
~(keystone_admin)$ ansible-playbook /localdisk/designer/jenkins/tis-stx-dev/cgcs-root/stx/ansible-playbooks/playbookconfig/src/playbooks/restore_platform.yml --limit |prefix|\_Cluster -i $HOME/br_test/hosts -e "ansible_become_pass=St0rlingX* admin_password=St0rlingX* ansible_ssh_pass=St0rlingX* initial_backup_dir=$HOME/br_test backup_filename= |prefix|\_Cluster_system_backup_2019_08_08_15_25_36.tgz ansible_remote_tmp=/home/sysadmin/ansible-restore"
.. note::
If the backup contains patches, the Ansible Restore playbook will apply
the patches and prompt you to reboot the system. You will then need to
re-run the Ansible Restore playbook.
#. After running the restore\_platform.yml playbook, you can restore the local
registry images.
.. note::
The backup file of the local registry may be large. Restore the
backed up file on the controller, where there is sufficient space.
.. code-block:: none
~(keystone_admin)$ ansible-playbook path-to-restore-user-images-playbook-entry-file --limit host-name -i inventory-file -e optional-extra-vars
where optional-extra-vars can be:
- The backup\_filename is the local registry backup tar file. It
must be provided using the ``-e`` option on the command line, for
example:
.. code-block:: none
-e backup_filename=localhost_docker_local_registry_backup_2020_07_15_21_24_22.tgz
- The initial\_backup\_dir is the location on the Ansible control
machine where the platform backup tar file is located. It must be
provided using ``-e`` option on the command line.
- The :command:`ansible\_become\_pass`, and
:command:`ansible\_ssh\_pass` need to be set correctly using the
``-e`` option on the command line or in the Ansible secret file.
:command:`ansible\_ssh\_pass` is the password to the sysadmin user
on controller-0.
- The backup\_dir should be set to a directory on controller-0.
The directory must have sufficient space for local registry backup
to be copied. The backup\_dir is set using the ``-e`` option on the
command line.
- The :command:`ansible\_remote\_tmp` should be set to a new
directory on controller-0. Ansible will use this directory to copy
files, and the directory must have sufficient space for local
registry backup to be copied. The :command:`ansible\_remote\_tmp`
is set using the ``-e`` option on the command line.
For example, run the local registry restore playbook, where the
/sufficient/space directory on the controller has sufficient space left
for the archived file to be copied.
.. parsed-literal::
~(keystone_admin)$ ansible-playbook /localdisk/designer/jenkins/tis-stx-dev/cgcs-root/stx/ansible-playbooks/playbookconfig/src/playbooks/restore_user_images.yml --limit |prefix|\_Cluster -i $HOME/br_test/hosts -e "ansible_become_pass=St0rlingX* ansible_ssh_pass=St0rlingX* initial_backup_dir=$HOME/br_test backup_filename= |prefix|\_Cluster_docker_local_registry_backup_2020_07_15_21_24_22.tgz ansible_remote_tmp=/sufficient/space backup_dir=/sufficient/space"

View File

@ -102,6 +102,15 @@ Operation guides
operations/index
------------------
Backup and restore
------------------
.. toctree::
:maxdepth: 2
backup/index
---------
Reference
---------

View File

@ -53,6 +53,7 @@
.. |CNI| replace:: :abbr:`CNI (Container Networking Interface)`
.. |IPMI| replace:: :abbr:`IPMI (Intelligent Platform Management Interface)`
.. |LAG| replace:: :abbr:`LAG (Link Aggregation)`
.. |LDAP| replace:: :abbr:`LDAP (Lightweight Directory Access Protocol)`
.. |MEC| replace:: :abbr:`MEC (Multi-access Edge Computing)`
.. |NVMe| replace:: :abbr:`NVMe (Non Volatile Memory express)`
.. |OAM| replace:: :abbr:`OAM (Operations, administration and management)`
@ -60,6 +61,7 @@
.. |OSDs| replace:: :abbr:`OSDs (Object Storage Devices)`
.. |PVC| replace:: :abbr:`PVC (Persistent Volume Claim)`
.. |PVCs| replace:: :abbr:`PVCs (Persistent Volume Claims)`
.. |PXE| replace:: :abbr:`PXE (Preboot Execution Environment)`
.. |SAS| replace:: :abbr:`SAS (Serial Attached SCSI)`
.. |SATA| replace:: :abbr:`SATA (Serial AT Attachment)`
.. |SNMP| replace:: :abbr:`SNMP (Simple Network Management Protocol)`