diff --git a/doc/source/backup/backing-up-starlingx-system-data.rst b/doc/source/backup/backing-up-starlingx-system-data.rst new file mode 100644 index 000000000..8801c1e9d --- /dev/null +++ b/doc/source/backup/backing-up-starlingx-system-data.rst @@ -0,0 +1,159 @@ + +.. hgq1552923986183 +.. _backing-up-starlingx-system-data: + +=================== +Back Up System Data +=================== + +A system data backup of a |prod-long| system captures core system +information needed to restore a fully operational |prod-long| cluster. + +.. contents:: In this section: + :local: + :depth: 1 + +.. _backing-up-starlingx-system-data-section-N1002E-N1002B-N10001: + +System Data Backups include: + +.. _backing-up-starlingx-system-data-ul-enh-3dl-lp: + +- platform configuration details + +- system databases + +- patching and package repositories + +- home directory for the **sysadmin** user and all |LDAP| user accounts. + +.. xreflink See |sec-doc|: :ref:`Local LDAP Linux User Accounts + ` for additional information. + + .. note:: + If there is any change in hardware configuration, for example, new + NICs, a system backup is required to ensure that there is no + configuration mismatch after system restore. + +.. _backing-up-starlingx-system-data-section-N10089-N1002B-N10001: + +------------------------------------ +Detailed contents of a system backup +------------------------------------ + +The backup contains details as listed below: + +.. _backing-up-starlingx-system-data-ul-s3t-bz4-kjb: + +- Postgresql backup databases + +- |LDAP| database + +- Ceph crushmap + +- DNS server list + +- System Inventory network configuration is required during a system + restore to set up the OS configuration. + +- Docker registries on controller + +- Docker no-proxy + +- \(Optional\) Any end user container images in **registry.local**; that + is, any images other than |org| system and application images. + |prod| system and application images are repulled from their + original source, external registries during the restore procedure. + +- Backup up data: + + - OS configuration: + + - \(item=/etc\) + + .. note:: + Although everything is backed up, not all the content is restored. + + - Home directory 'sysadmin' user, and all |LDAP| user accounts + \(item=/etc\) + + - Generated platform configuration: + + - item=/opt/platform/config/<|prefix|\_version> + + - item=/opt/platform/puppet/<|prefix|\_version>/hieradata: + + All the hieradata under is backed-up. However, only the static + hieradata \(static.yaml and secure\_static.yaml\) will be + restored to the bootstrap controller-0. + + - Keyring: + + - item=/opt/platform/.keyring/<|prefix|\_version> + + - Patching and package repositories: + + - item=/opt/patching + + - item=/www/pages/updates + + - Extension filesystem: + + - item=/opt/extension + + - dc-vault filesystem for Distributed Cloud system-controller: + + - item=/opt/dc-vault + + - Armada manifests: + + - item=/opt/platform/armada/<|prefix|\_version> + + - Helm charts: + + - item=/opt/platform/helm\_charts + +.. _backing-up-starlingx-system-data-section-N1021A-N1002B-N10001: + +----------------------------------- +Data not included in system backups +----------------------------------- + +.. _backing-up-starlingx-system-data-ul-im2-b2y-lp: + +- Application |PVCs| on Ceph clusters. + +- StarlingX application data. Use the command :command:`system + application-list` to display a list of installed applications. 
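+
+  For example, you can record the currently installed applications before
+  running a backup so that you know what to re-apply after a restore \(a
+  minimal sketch; the output file name is arbitrary\):
+
+  .. code-block:: none
+
+     ~(keystone_admin)$ system application-list > /home/sysadmin/apps-before-backup.txt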
+ +- Modifications manually made to the file systems, such as configuration + changes on the /etc directory. After a restore operation has been completed, + these modifications have to be reapplied. + +- Home directories and passwords of local user accounts. They must be + backed up manually by the system administrator. + +- The /root directory. Use the **sysadmin** account instead when root + access is needed. + +.. note:: + The system data backup can only be used to restore the cluster from + which the backup was made. You cannot use the system data backup to + restore the system to different hardware. Perform a system data backup + for each cluster and label the backup accordingly. + + To ensure recovery from the backup file during a restore procedure, + containers must be in the active state when performing the backup. + Containers that are in a shutdown or paused state at the time of the + backup will not be recovered after a subsequent restore procedure. + +When the system data backup is complete, the backup file must be kept in a +secured location, probably holding multiple copies of them for redundancy +purposes. + +.. seealso:: + :ref:`Run Ansible Backup Playbook Locally on the Controller + ` + + :ref:`Run Ansible Backup Playbook Remotely + ` diff --git a/doc/source/backup/index.rs1 b/doc/source/backup/index.rs1 new file mode 100644 index 000000000..d4a31a1a1 --- /dev/null +++ b/doc/source/backup/index.rs1 @@ -0,0 +1,17 @@ +===================================== +|prod-long| System Backup and Restore +===================================== + +- System Data Backup + + - :ref:`Backing Up Platform System Data ` + + - :ref:`Running Ansible Backup Playbook Locally on the Controller ` + - :ref:`Running Ansible Backup Playbook Remotely ` + + +- System Data and Storage Restore + + - :ref:`Restoring Platform System Data and Storage ` + - :ref:`Running Restore Playbook Locally on the Controller ` + - :ref:`Running Ansible Restore Playbook Remotely ` diff --git a/doc/source/backup/index.rst b/doc/source/backup/index.rst new file mode 100644 index 000000000..12193ba3b --- /dev/null +++ b/doc/source/backup/index.rst @@ -0,0 +1,30 @@ +.. Backup and Restore file, created by + sphinx-quickstart on Thu Sep 3 15:14:59 2020. + You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. + +================== +Backup and Restore +================== + +------------- +System backup +------------- + +.. toctree:: + :maxdepth: 1 + + backing-up-starlingx-system-data + running-ansible-backup-playbook-locally-on-the-controller + running-ansible-backup-playbook-remotely + +-------------------------- +System and storage restore +-------------------------- + +.. toctree:: + :maxdepth: 1 + + restoring-starlingx-system-data-and-storage + running-restore-playbook-locally-on-the-controller + system-backup-running-ansible-restore-playbook-remotely diff --git a/doc/source/backup/restoring-starlingx-system-data-and-storage.rst b/doc/source/backup/restoring-starlingx-system-data-and-storage.rst new file mode 100644 index 000000000..8bef3923d --- /dev/null +++ b/doc/source/backup/restoring-starlingx-system-data-and-storage.rst @@ -0,0 +1,385 @@ + +.. uzk1552923967458 +.. 
_restoring-starlingx-system-data-and-storage: + +======================================== +Restore Platform System Data and Storage +======================================== + +You can perform a system restore \(controllers, workers, including or +excluding storage nodes\) of a |prod| cluster from available system data and +bring it back to the operational state it was when the backup procedure took +place. + +.. rubric:: |context| + +This procedure takes a snapshot of the etcd database at the time of backup, +stores it in the system data backup, and then uses it to initialize the +Kubernetes cluster during a restore. Kubernetes configuration will be +restored and pods that are started from repositories accessible from the +internet or from external repositories will start immediately. StarlingX +specific applications must be re-applied once a storage cluster is configured. + +.. warning:: + The system data backup file can only be used to restore the system from + which the backup was made. You cannot use this backup file to restore + the system to different hardware. + + To restore the data, use the same version of the boot image \(ISO\) that + was used at the time of the original installation. + +The |prod| restore supports two modes: + +.. _restoring-starlingx-system-data-and-storage-ol-tw4-kvc-4jb: + +#. To keep the Ceph cluster data intact \(false - default option\), use the + following syntax, when passing the extra arguments to the Ansible Restore + playbook command: + + .. code-block:: none + + wipe_ceph_osds=false + +#. To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will + need to be recreated, use the following syntax: + + .. code-block:: none + + wipe_ceph_osds=true + +Restoring a |prod| cluster from a backup file is done by re-installing the +ISO on controller-0, running the Ansible Restore Playbook, applying updates +\(patches\), unlocking controller-0, and then powering on, and unlocking the +remaining hosts, one host at a time, starting with the controllers, and then +the storage hosts, ONLY if required, and lastly the compute \(worker\) hosts. + +.. rubric:: |prereq| + +Before you start the restore procedure you must ensure the following +conditions are in place: + +.. _restoring-starlingx-system-data-and-storage-ul-rfq-qfg-mp: + +- All cluster hosts must be prepared for network boot and then powered + down. You can prepare a host for network boot. + + .. note:: + If you are restoring system data only, do not lock, power off or + prepare the storage hosts to be reinstalled. + +- The backup file is accessible locally, if restore is done by running + Ansible Restore playbook locally on the controller. The backup file is + accessible remotely, if restore is done by running Ansible Restore playbook + remotely. + +- You have the original |prod| ISO installation image available on a USB + flash drive. It is mandatory that you use the exact same version of the + software used during the original installation, otherwise the restore + procedure will fail. + +- The restore procedure requires all hosts but controller-0 to boot + over the internal management network using the |PXE| protocol. Ideally, the + old boot images are no longer present, so that the hosts boot from the + network when powered on. If this is not the case, you must configure each + host manually for network boot immediately after powering it on. 
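+
+  One way to force a one-time network boot, if your servers have an
+  |IPMI|-capable BMC, is to use :command:`ipmitool` from a workstation that
+  can reach the BMC network \(a sketch only; the BMC address and credentials
+  below are placeholders for your environment\):
+
+  .. code-block:: none
+
+     $ ipmitool -I lanplus -H <bmc-ip-address> -U <bmc-user> -P <bmc-password> chassis bootdev pxe
+     $ ipmitool -I lanplus -H <bmc-ip-address> -U <bmc-user> -P <bmc-password> chassis power on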
+ +- If you are restoring a Distributed Cloud subcloud first, ensure it is in + an **unmanaged** state on the Central Cloud \(SystemController\) by using + the following commands: + + .. code-block:: none + + $ source /etc/platform/openrc + ~(keystone_admin)$ dcmanager subcloud unmanage + + where is the name of the subcloud to be unmanaged. + +.. rubric:: |proc| + +#. Power down all hosts. + + If you have a storage host and want to retain Ceph data, then power down + all the nodes except the storage hosts; the cluster has to be functional + during a restore operation. + + .. caution:: + Do not use :command:`wipedisk` before a restore operation. This will + lead to data loss on your Ceph cluster. It is safe to use + :command:`wipedisk` during an initial installation, while reinstalling + a host, or during an upgrade. + +#. Install the |prod| ISO software on controller-0 from the USB flash + drive. + + You can now log in using the host's console. + +#. Log in to the console as user **sysadmin** with password **sysadmin**. + +#. Install network connectivity required for the subcloud. + +#. Ensure the backup file is available on the controller. Run the Ansible + Restore playbook. For more information on restoring the back up file, see + :ref:`Run Restore Playbook Locally on the Controller + `, and :ref:`Run + Ansible Restore Playbook Remotely + `. + + .. note:: + The backup file contains the system data and updates. + +#. Update the controller's software to the previous updating level. + + The current software version on the controller is compared against the + version available in the backup file. If the backed-up version includes + updates, the restore process automatically applies the updates and + forces an additional reboot of the controller to make them effective. + + After the reboot, you can verify that the updates were applied, as + illustrated in the following example: + + .. code-block:: none + + $ sudo sw-patch query + Patch ID RR Release Patch State + ======================== ========== =========== + COMPUTECONFIG Available 20.06 n/a + LIBCUNIT_CONTROLLER_ONLY Applied 20.06 n/a + STORAGECONFIG Applied 20.06 n/a + + Rerun the Ansible Restore Playbook. + +#. Unlock Controller-0. + + .. code-block:: none + + ~(keystone_admin)$ system host-unlock controller-0 + + After you unlock controller-0, storage nodes become available and Ceph + becomes operational. + +#. Authenticate the system as Keystone user **admin**. + + Source the **admin** user environment as follows: + + .. code-block:: none + + $ source /etc/platform/openrc + +#. For Simplex systems only, if :command:`wipe_ceph_osds` is set to false, + wait for the apps to transition from 'restore-requested' to the 'applied' + state. + + If the apps are in 'apply-failed' state, ensure access to the docker + registry, and execute the following command for all custom applications + that need to be restored: + + .. code-block:: none + + ~(keystone_admin)$ system application-apply + + For example, execute the following to restore stx-openstack. + + .. code-block:: none + + ~(keystone_admin)$ system application-apply stx-openstack + + .. note:: + If you have a Simplex system, this is the last step in the process. + + Wait for controller-0 to be in the unlocked, enabled, and available + state. + +#. If you have a Duplex system, restore the controller-1 host. + + #. List the current state of the hosts. + + .. 
code-block:: none + + ~(keystone_admin)$ system host-list + +----+-------------+------------+---------------+-----------+------------+ + | id | hostname | personality| administrative|operational|availability| + +----+-------------+------------+---------------+-----------+------------+ + | 1 | controller-0| controller | unlocked |enabled |available | + | 2 | controller-1| controller | locked |disabled |offline | + | 3 | storage-0 | storage | locked |disabled |offline | + | 4 | storage-1 | storage | locked |disabled |offline | + | 5 | compute-0 | worker | locked |disabled |offline | + | 6 | compute-1 | worker | locked |disabled |offline | + +----+-------------+------------+---------------+-----------+------------+ + + #. Power on the host. + + Ensure that the host boots from the network, and not from any disk + image that may be present. + + The software is installed on the host, and then the host is + rebooted. Wait for the host to be reported as **locked**, **disabled**, + and **offline**. + + #. Unlock controller-1. + + .. code-block:: none + + ~(keystone_admin)$ system host-unlock controller-1 + +-----------------+--------------------------------------+ + | Property | Value | + +-----------------+--------------------------------------+ + | action | none | + | administrative | locked | + | availability | online | + | ... | ... | + | uuid | 5fc4904a-d7f0-42f0-991d-0c00b4b74ed0 | + +-----------------+--------------------------------------+ + + #. Verify the state of the hosts. + + .. code-block:: none + + ~(keystone_admin)$ system host-list + +----+-------------+------------+---------------+-----------+------------+ + | id | hostname | personality| administrative|operational|availability| + +----+-------------+------------+---------------+-----------+------------+ + | 1 | controller-0| controller | unlocked |enabled |available | + | 2 | controller-1| controller | unlocked |enabled |available | + | 3 | storage-0 | storage | locked |disabled |offline | + | 4 | storage-1 | storage | locked |disabled |offline | + | 5 | compute-0 | worker | locked |disabled |offline | + | 6 | compute-1 | worker | locked |disabled |offline | + +----+-------------+------------+---------------+-----------+------------+ + +#. Restore storage configuration. If :command:`wipe_ceph_osds` is set to + **True**, follow the same procedure used to restore controller-1, + beginning with host storage-0 and proceeding in sequence. + + .. note:: + This step should be performed ONLY if you are restoring storage hosts. + + #. For storage hosts, there are two options: + + With the controller software installed and updated to the same level + that was in effect when the backup was performed, you can perform + the restore procedure without interruption. + + Standard with Controller Storage install or reinstall depends on the + :command:`wipe_ceph_osds` configuration: + + #. If :command:`wipe_ceph_osds` is set to **true**, reinstall the + storage hosts. + + #. If :command:`wipe_ceph_osds` is set to **false** \(default + option\), do not reinstall the storage hosts. + + .. caution:: + Do not reinstall or power off the storage hosts if you want to + keep previous Ceph cluster data. A reinstall of storage hosts + will lead to data loss. + + #. Ensure that the Ceph cluster is healthy. Verify that the three Ceph + monitors \(controller-0, controller-1, storage-0\) are running in + quorum. + + .. 
code-block:: none + + ~(keystone_admin)$ ceph -s + cluster: + id: 3361e4ef-b0b3-4f94-97c6-b384f416768d + health: HEALTH_OK + + services: + mon: 3 daemons, quorum controller-0,controller-1,storage-0 + mgr: controller-0(active), standbys: controller-1 + osd: 10 osds: 10 up, 10 in + + data: + pools: 5 pools, 600 pgs + objects: 636 objects, 2.7 GiB + usage: 6.5 GiB used, 2.7 TiB / 2.7 TiB avail + pgs: 600 active+clean + + io: + client: 85 B/s rd, 336 KiB/s wr, 0 op/s rd, 67 op/s wr + + .. caution:: + Do not proceed until the Ceph cluster is healthy and the message + HEALTH\_OK appears. + + If the message HEALTH\_WARN appears, wait a few minutes and then try + again. If the warning condition persists, consult the public + documentation for troubleshooting Ceph monitors \(for example, + `http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshootin + g-mon/ + `__\). + +#. Restore the compute \(worker\) hosts, one at a time. + + Restore the compute \(worker\) hosts following the same procedure used to + restore controller-1. + +#. Unlock the compute hosts. The restore is complete. + + The state of the hosts when the restore operation is complete is as + follows: + + .. code-block:: none + + ~(keystone_admin)$ system host-list + +----+-------------+------------+---------------+-----------+------------+ + | id | hostname | personality| administrative|operational|availability| + +----+-------------+------------+---------------+-----------+------------+ + | 1 | controller-0| controller | unlocked |enabled |available | + | 2 | controller-1| controller | unlocked |enabled |available | + | 3 | storage-0 | storage | unlocked |enabled |available | + | 4 | storage-1 | storage | unlocked |enabled |available | + | 5 | compute-0 | worker | unlocked |enabled |available | + | 6 | compute-1 | worker | unlocked |enabled |available | + +----+-------------+------------+---------------+-----------+------------+ + +#. For Duplex systems only, if :command:`wipe_ceph_osds` is set to false, wait + for the apps to transition from 'restore-requested' to the 'applied' state. + + If the apps are in 'apply-failed' state, ensure access to the docker + registry, and execute the following command for all custom applications + that need to be restored: + + .. code-block:: none + + ~(keystone_admin)$ system application-apply + + For example, execute the following to restore stx-openstack. + + .. code-block:: none + + ~(keystone_admin)$ system application-apply stx-openstack + +.. rubric:: |postreq| + +.. _restoring-starlingx-system-data-and-storage-ul-b2b-shg-plb: + +- Passwords for local user accounts must be restored manually since they + are not included as part of the backup and restore procedures. + +- After restoring a Distributed Cloud subcloud, you need to bring it back + to the **managed** state on the Central Cloud \(SystemController\), by + using the following commands: + + .. code-block:: none + + $ source /etc/platform/openrc + ~(keystone_admin)$ dcmanager subcloud manage + + where is the name of the subcloud to be managed. + + +.. comments in steps seem to throw numbering off. + +.. xreflink removed from step 'Install the |prod| ISO software on controller-0 from the USB flash + drive.': + For details, refer to the |inst-doc|: :ref:`Installing Software on + controller-0 `. Perform the + installation procedure for your system and *stop* at the step that + requires you to configure the host as a controller. + +.. 
xreflink removed from step 'Install network connectivity required for the subcloud.':
+   For details, refer to the |distcloud-doc|: :ref:`Installing and
+   Provisioning a Subcloud `.
\ No newline at end of file
diff --git a/doc/source/backup/running-ansible-backup-playbook-locally-on-the-controller.rst b/doc/source/backup/running-ansible-backup-playbook-locally-on-the-controller.rst
new file mode 100644
index 000000000..d8378c58a
--- /dev/null
+++ b/doc/source/backup/running-ansible-backup-playbook-locally-on-the-controller.rst
@@ -0,0 +1,61 @@
+
+.. bqg1571264986191
+.. _running-ansible-backup-playbook-locally-on-the-controller:
+
+=====================================================
+Run Ansible Backup Playbook Locally on the Controller
+=====================================================
+
+In this method, the Ansible Backup playbook is run on the active controller.
+
+Use the following command to run the Ansible Backup playbook and back up the
+|prod| configuration and data, and, optionally, the user container images
+held in registry.local:
+
+.. code-block:: none
+
+    ~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/backup.yml -e "ansible_become_pass= admin_password=" [ -e "backup_user_local_registry=true" ]
+
+The ``ansible_become_pass`` and ``admin_password`` values need to be set
+correctly using the ``-e`` option on the command line, in an override file,
+or in the Ansible secret file.
+
+The output files will be named:
+
+.. _running-ansible-backup-playbook-locally-on-the-controller-ul-wj1-vxh-pmb:
+
+- inventory\_hostname\_platform\_backup\_timestamp.tgz
+
+- inventory\_hostname\_openstack\_backup\_timestamp.tgz
+
+- inventory\_hostname\_docker\_local\_registry\_backup\_timestamp.tgz
+
+The following variable prefixes can be overridden using the ``-e`` option on
+the command line or by using an override file:
+
+.. _running-ansible-backup-playbook-locally-on-the-controller-ul-rdp-gyh-pmb:
+
+- platform\_backup\_filename\_prefix
+
+- openstack\_backup\_filename\_prefix
+
+- docker\_local\_registry\_backup\_filename\_prefix
+
+The generated backup tar files are named using these prefixes, for example:
+
+.. _running-ansible-backup-playbook-locally-on-the-controller-ul-p3b-f13-pmb:
+
+- localhost\_docker\_local\_registry\_backup\_2020\_07\_15\_21\_24\_22.tgz
+
+- localhost\_platform\_backup\_2020\_07\_15\_21\_24\_22.tgz
+
+- localhost\_openstack\_backup\_2020\_07\_15\_21\_24\_22.tgz
+
+These files are located by default in the /opt/backups directory on
+controller-0, and contain the complete system backup.
+
+If the default location needs to be modified, the variable backup\_dir can
+be overridden using the ``-e`` option on the command line or by using an
+override file.
+
diff --git a/doc/source/backup/running-ansible-backup-playbook-remotely.rst b/doc/source/backup/running-ansible-backup-playbook-remotely.rst
new file mode 100644
index 000000000..a0d5c619d
--- /dev/null
+++ b/doc/source/backup/running-ansible-backup-playbook-remotely.rst
@@ -0,0 +1,57 @@
+
+.. kpt1571265015137
+.. _running-ansible-backup-playbook-remotely:
+
+====================================
+Run Ansible Backup Playbook Remotely
+====================================
+
+In this method, you run the Ansible Backup playbook on a remote workstation
+and target it at controller-0.
+
+.. rubric:: |prereq|
+
+.. _running-ansible-backup-playbook-remotely-ul-evh-yn4-bkb:
+
+- You need to have Ansible installed on your remote workstation, along
+  with the Ansible Backup/Restore playbooks.
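+
+  For example, on a Linux workstation you might install Ansible and fetch the
+  playbooks as follows \(a sketch only; the installation method and checkout
+  location are assumptions for your environment\):
+
+  .. code-block:: none
+
+     $ sudo pip install ansible
+     $ git clone https://opendev.org/starlingx/ansible-playbooks.git
+     $ ls ansible-playbooks/playbookconfig/src/playbooks/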
+
+- If the system configuration is IPv6, your network must have IPv6
+  connectivity before you run the Ansible playbook.
+
+.. rubric:: |proc|
+
+.. _running-ansible-backup-playbook-remotely-steps-bnw-bnc-ljb:
+
+#. Log in to the remote workstation.
+
+#. Provide an Ansible hosts file, either a customized one that is
+   specified using the ``-i`` option, or the default one that resides in the
+   Ansible configuration directory \(that is, /etc/ansible/hosts\). You must
+   specify the floating |OAM| IP of the controller host. For example, if the
+   host name is |prefix|\_Cluster, the inventory file should have an entry
+   called |prefix|\_Cluster, for example:
+
+   .. parsed-literal::
+
+      ---
+      all:
+        hosts:
+          wc68:
+            ansible_host: 128.222.100.02
+          |prefix|\_Cluster:
+            ansible_host: 128.224.141.74
+
+#. Run the Ansible Backup playbook:
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ ansible-playbook path-to-backup-playbook-entry-file --limit host-name -i inventory-file -e optional-extra-vars
+
+   The generated backup tar file can be found in the backup directory on the
+   remote workstation, /home/sysadmin by default. You can override this
+   location using the ``-e`` option on the command line or in an override
+   file.
+
+   .. warning::
+       If a backup of the **local registry images** file is created, the
+       file is not copied from the remote machine to the local machine.
diff --git a/doc/source/backup/running-restore-playbook-locally-on-the-controller.rst b/doc/source/backup/running-restore-playbook-locally-on-the-controller.rst
new file mode 100644
index 000000000..aaa246f8f
--- /dev/null
+++ b/doc/source/backup/running-restore-playbook-locally-on-the-controller.rst
@@ -0,0 +1,66 @@
+
+.. rmy1571265233932
+.. _running-restore-playbook-locally-on-the-controller:
+
+==============================================
+Run Restore Playbook Locally on the Controller
+==============================================
+
+To run a restore on the controller, you need to download the backup file to
+the active controller.
+
+.. rubric:: |context|
+
+You can use an external storage device, for example, a USB drive. Use the
+following command to run the Ansible Restore playbook:
+
+.. code-block:: none
+
+    ~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir= admin_password= wipe_ceph_osds="
+
+The |prod| restore supports two optional modes: keeping the Ceph cluster data
+intact or wiping the Ceph cluster.
+
+.. rubric:: |proc|
+
+.. _running-restore-playbook-locally-on-the-controller-steps-usl-2c3-pmb:
+
+#. To keep the Ceph cluster data intact \(false - default option\), use the
+   following syntax:
+
+   .. code-block:: none
+
+       wipe_ceph_osds=false
+
+#. To wipe the Ceph cluster entirely \(true\), where the Ceph cluster will
+   need to be recreated, use the following syntax:
+
+   .. code-block:: none
+
+       wipe_ceph_osds=true
+
+   For example, to restore from a backup file in /home/sysadmin:
+
+   .. code-block:: none
+
+       ~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml -e "initial_backup_dir=/home/sysadmin ansible_become_pass=St0rlingX* admin_password=St0rlingX* backup_filename=localhost_platform_backup_2020_07_27_07_48_48.tgz wipe_ceph_osds=true"
+
+   .. note::
+       If the backup contains patches, the Ansible Restore playbook will apply
+       the patches and prompt you to reboot the system. You will then need to
+       re-run the Ansible Restore playbook.
+
+.. rubric:: |postreq|
+
+After running the restore\_platform.yml playbook, you can restore the local
+registry images.
+
+.. note::
+    The backup file of the local registry images may be large. Restore the
+    backup file on the controller, where there is sufficient space.
Restore the + backed up file on the controller, where there is sufficient space. + +For example: + +.. code-block:: none + + ~(keystone_admin)$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_user_images.yml -e "initial_backup_dir=/home/sysadmin backup_filename=localhost_docker_local_registry_backup_2020_07_15_21_24_22.tgz ansible_become_pass=St0rlingX*" diff --git a/doc/source/backup/system-backup-running-ansible-restore-playbook-remotely.rst b/doc/source/backup/system-backup-running-ansible-restore-playbook-remotely.rst new file mode 100644 index 000000000..9c3762b73 --- /dev/null +++ b/doc/source/backup/system-backup-running-ansible-restore-playbook-remotely.rst @@ -0,0 +1,147 @@ + +.. quy1571265365123 +.. _system-backup-running-ansible-restore-playbook-remotely: + +===================================== +Run Ansible Restore Playbook Remotely +===================================== + +In this method you can run Ansible Restore playbook and point to controller-0. + +.. rubric:: |prereq| + +.. _system-backup-running-ansible-restore-playbook-remotely-ul-ylm-g44-bkb: + +- You need to have Ansible installed on your remote workstation, along + with the Ansible Backup/Restore playbooks. + +- Your network has IPv6 connectivity before running Ansible Playbook, if + the system configuration is IPv6. + +.. rubric:: |proc| + +.. _system-backup-running-ansible-restore-playbook-remotely-steps-sgp-jjc-ljb: + +#. Log in to the remote workstation. + + You can log in directly on the console or remotely using :command:`ssh`. + +#. Provide an inventory file, either a customized one that is specified + using the ``-i`` option, or the default one that is in the Ansible + configuration directory \(that is, /etc/ansible/hosts\). You must + specify the floating |OAM| IP of the controller host. For example, if the + host name is |prefix|\_Cluster, the inventory file should have an entry + called |prefix|\_Cluster. + + .. parsed-literal:: + + --- + all: + hosts: + wc68: + ansible_host: 128.222.100.02 + |prefix|\_Cluster: + ansible_host: 128.224.141.74 + +#. Run the Ansible Restore playbook: + + .. code-block:: none + + ~(keystone_admin)$ ansible-playbook path-to-restore-platform-playbook-entry-file --limit host-name -i inventory-file -e optional-extra-vars + + where optional-extra-vars can be: + + - **Optional**: You can select one of the two restore modes: + + - To keep Ceph data intact \(false - default option\), use the + following syntax: + + :command:`wipe_ceph_osds=false` + + - Start with an empty Ceph cluster \(true\), to recreate a new + Ceph cluster, use the following syntax: + + :command:`wipe_ceph_osds=true` + + - The backup\_filename is the platform backup tar file. It must be + provided using the ``-e`` option on the command line, for example: + + .. code-block:: none + + -e backup\_filename= localhost_platform_backup_2019_07_15_14_46_37.tgz + + - The initial\_backup\_dir is the location on the Ansible control + machine where the platform backup tar file is placed to restore the + platform. It must be provided using ``-e`` option on the command line. + + - The :command:`admin\_password`, :command:`ansible\_become\_pass`, + and :command:`ansible\_ssh\_pass` need to be set correctly using + the ``-e`` option on the command line or in the Ansible secret file. + :command:`ansible\_ssh\_pass` is the password to the sysadmin user + on controller-0. 
+ + - The :command:`ansible\_remote\_tmp` should be set to a new + directory \(not required to create it ahead of time\) under + /home/sysadmin on controller-0 using the ``-e`` option on the command + line. + + For example: + + .. parsed-literal:: + + ~(keystone_admin)$ ansible-playbook /localdisk/designer/jenkins/tis-stx-dev/cgcs-root/stx/ansible-playbooks/playbookconfig/src/playbooks/restore_platform.yml --limit |prefix|\_Cluster -i $HOME/br_test/hosts -e "ansible_become_pass=St0rlingX* admin_password=St0rlingX* ansible_ssh_pass=St0rlingX* initial_backup_dir=$HOME/br_test backup_filename= |prefix|\_Cluster_system_backup_2019_08_08_15_25_36.tgz ansible_remote_tmp=/home/sysadmin/ansible-restore" + + .. note:: + If the backup contains patches, Ansible Restore playbook will apply + the patches and prompt you to reboot the system. Then you will need to + re-run Ansible Restore playbook. + +#. After running the restore\_platform.yml playbook, you can restore the local + registry images. + + .. note:: + The backup file of the local registry may be large. Restore the + backed up file on the controller, where there is sufficient space. + + .. code-block:: none + + ~(keystone_admin)$ ansible-playbook path-to-restore-user-images-playbook-entry-file --limit host-name -i inventory-file -e optional-extra-vars + + where optional-extra-vars can be: + + - The backup\_filename is the local registry backup tar file. It + must be provided using the ``-e`` option on the command line, for + example: + + .. code-block:: none + + -e backup\_filename= localhost_docker_local_registry_backup_2020_07_15_21_24_22.tgz + + - The initial\_backup\_dir is the location on the Ansible control + machine where the platform backup tar file is located. It must be + provided using ``-e`` option on the command line. + + - The :command:`ansible\_become\_pass`, and + :command:`ansible\_ssh\_pass` need to be set correctly using the + ``-e`` option on the command line or in the Ansible secret file. + :command:`ansible\_ssh\_pass` is the password to the sysadmin user + on controller-0. + + - The backup\_dir should be set to a directory on controller-0. + The directory must have sufficient space for local registry backup + to be copied. The backup\_dir is set using the ``-e`` option on the + command line. + + - The :command:`ansible\_remote\_tmp` should be set to a new + directory on controller-0. Ansible will use this directory to copy + files, and the directory must have sufficient space for local + registry backup to be copied. The :command:`ansible\_remote\_tmp` + is set using the ``-e`` option on the command line. + + For example, run the local registry restore playbook, where + /sufficient/space directory on the controller has sufficient space left + for the archived file to be copied. + + .. parsed-literal:: + + ~(keystone_admin)$ ansible-playbook /localdisk/designer/jenkins/tis-stx-dev/cgcs-root/stx/ansible-playbooks/playbookconfig/src/playbooks/restore_user_images.ym --limit |prefix|\_Cluster -i $HOME/br_test/hosts -e "ansible_become_pass=St0rlingX* ansible_ssh_pass=St0rlingX* initial_backup_dir=$HOME/br_test backup_filename= |prefix|\_Cluster_docker_local_registry_backup_2020_07_15_21_24_22.tgz ansible_remote_tmp=/sufficient/space backup_dir=/sufficient/space" diff --git a/doc/source/index.rst b/doc/source/index.rst index ffd0d9d85..e369979b8 100755 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -102,6 +102,15 @@ Operation guides operations/index +------------------ +Backup and restore +------------------ + +.. 
toctree:: + :maxdepth: 2 + + backup/index + --------- Reference --------- diff --git a/doc/source/shared/strings.txt b/doc/source/shared/strings.txt index b7c9f1a6d..b17177bd1 100644 --- a/doc/source/shared/strings.txt +++ b/doc/source/shared/strings.txt @@ -53,6 +53,7 @@ .. |CNI| replace:: :abbr:`CNI (Container Networking Interface)` .. |IPMI| replace:: :abbr:`IPMI (Intelligent Platform Management Interface)` .. |LAG| replace:: :abbr:`LAG (Link Aggregation)` +.. |LDAP| replace:: :abbr:`LDAP (Lightweight Directory Access Protocol)` .. |MEC| replace:: :abbr:`MEC (Multi-access Edge Computing)` .. |NVMe| replace:: :abbr:`NVMe (Non Volatile Memory express)` .. |OAM| replace:: :abbr:`OAM (Operations, administration and management)` @@ -60,6 +61,7 @@ .. |OSDs| replace:: :abbr:`OSDs (Object Storage Devices)` .. |PVC| replace:: :abbr:`PVC (Persistent Volume Claim)` .. |PVCs| replace:: :abbr:`PVCs (Persistent Volume Claims)` +.. |PXE| replace:: :abbr:`PXE (Preboot Execution Environment)` .. |SAS| replace:: :abbr:`SAS (Serial Attached SCSI)` .. |SATA| replace:: :abbr:`SATA (Serial AT Attachment)` .. |SNMP| replace:: :abbr:`SNMP (Simple Network Management Protocol)`