diff --git a/doc/source/backup/kubernetes/backing-up-starlingx-system-data.rst b/doc/source/backup/kubernetes/backing-up-starlingx-system-data.rst
index c0707c5b9..c725cf3b2 100644
--- a/doc/source/backup/kubernetes/backing-up-starlingx-system-data.rst
+++ b/doc/source/backup/kubernetes/backing-up-starlingx-system-data.rst
@@ -6,108 +6,207 @@
 Back Up System Data
 ===================
 
-A system data backup of a |prod-long| system captures core system
-information needed to restore a fully operational |prod-long| cluster.
+A system data backup of a |prod-long| system captures core system information
+needed to restore a fully operational |prod-long| cluster.
 
-.. contents:: In this section:
+.. contents:: |minitoc|
    :local:
    :depth: 1
 
-.. _backing-up-starlingx-system-data-section-N1002E-N1002B-N10001:
 
-System Data Backups include:
-
-.. _backing-up-starlingx-system-data-ul-enh-3dl-lp:
-
-- platform configuration details
-
-- system databases
-
-- patching and package repositories
-
-- home directory for the **sysadmin** user and all |LDAP| user accounts.
-
-.. warning::
-
-    During a system backup, if the files contained in 'sysadmin' user's home
-    directory (``/home/sysadmin``) result in the overall size of the backup
-    being larger than 2 Gbytes, the backup operation may fail.
-
-.. xreflink See |sec-doc|: :ref:`Local LDAP Linux User Accounts
-    ` for additional information.
-
-.. note::
-    If there is any change in hardware configuration, for example, new
-    NICs, a system backup is required to ensure that there is no
-    configuration mismatch after system restore.
-
-.. _backing-up-starlingx-system-data-section-N10089-N1002B-N10001:
-
-------------------------------------
-Detailed contents of a system backup
-------------------------------------
-
-The backup contains details as listed below:
+Contents of System Backup
+-------------------------
 
 .. _backing-up-starlingx-system-data-ul-s3t-bz4-kjb:
 
-- Platform Configuration Data.
+The following content is included in the backup:
 
-  All platform configuration data and files required to fully restore the
-  system to a working state following the platform restore procedure.
+- All platform configuration data required to fully restore the system to a
+  working state following the platform restore procedure.
 
-- (Optional) Any end user container images in **registry.local**; that
-  is, any images other than |org| system and application images.
-  |prod| system and application images are repulled from their
-  original source, external registries during the restore procedure.
+  - Platform and Kubernetes databases.
 
-- Home directory 'sysadmin' user, and all |LDAP| user accounts
-  (item=/etc)
+  - Platform configuration files.
 
-- Patching and package repositories:
+  - Platform certificates and keys.
 
-  - item=/opt/patching
+- Home directory for the sysadmin user and all |LDAP| user accounts.
 
-  - item=/var/www/pages/updates
+- (Optional) End-user container images in ``registry.local``; that is, any
+  images other than |org| system and application images. |prod| system and
+  application images are re-pulled from their original external registries
+  during the restore procedure.
 
+- Distributed Cloud Vault (Central System Controller only).
 
-.. _backing-up-starlingx-system-data-section-N1021A-N1002B-N10001:
+The following content is excluded from the backup:
 
------------------------------------
-Data not included in system backups
------------------------------------
+- Application |PVC| data on Ceph clusters.
 
-.. _backing-up-starlingx-system-data-ul-im2-b2y-lp:
+- Modifications manually made to the file systems, such as configuration
+  changes on the ``/etc`` directory. After a restore operation has been
+  completed, these modifications must be reapplied.
 
-- Application |PVCs| on Ceph clusters.
+- Home directories and passwords of local user accounts. They must be backed up
+  manually by the sysadmin.
 
-- StarlingX application data. Use the command :command:`system
-  application-list` to display a list of installed applications.
-
-- Modifications manually made to the file systems, such as configuration
-  changes on the /etc directory. After a restore operation has been completed,
-  these modifications have to be reapplied.
-
-- Home directories and passwords of local user accounts. They must be
-  backed up manually by the system administrator.
-
-- The /root directory. Use the **sysadmin** account instead when root
-  access is needed.
+- The ``/root`` directory. Use the sysadmin account instead when root access is
+  needed.
 
 .. note::
-    The system data backup can only be used to restore the cluster from
-    which the backup was made. You cannot use the system data backup to
-    restore the system to different hardware. Perform a system data backup
-    for each cluster and label the backup accordingly.
 
-    To ensure recovery from the backup file during a restore procedure,
-    containers must be in the active state when performing the backup.
-    Containers that are in a shutdown or paused state at the time of the
-    backup will not be recovered after a subsequent restore procedure.
+    Ceph data may be retained when restoring to the same servers and cluster.
+
+
+System Backup Size
+------------------
+
+Consider the following for backup size:
+
+- The base size of a platform system backup ranges from 10 MB to 30 MB,
+  depending on the size of the system and deployment. |AIO-SX| systems are
+  typically 20 MB or less.
+
+- Backing up user home directories can make the backup archive very large;
+  the overall backup is limited to 2 GB or less.
+
+- Total backup size should be below 100 MB when using centralized backup and
+  restore operations.
+
+- Container images are large and are only backed up locally, to avoid
+  transferring large image archives for each system. Container images that are
+  not present on the system may be pulled as part of platform and application
+  deployment, or restored separately to the local registry
+  (``registry.local``).
+
+- There can also be a significant size impact when patching is included in the
+  backup.
+
+
+System Backup Filesystem Usage
+------------------------------
+
+The following filesystems are used during the backup operations of the system
+for both local and centralized backup.
+
+**Staging Storage**
+
+The host filesystem used to stage temporary files during backup operations. The
+filesystem may also be used to store the final backup archives if it is large
+enough to hold them.
+
+Host filesystem name: backup
+
+System path: ``/opt/backups``
+
+Default size: 25 GB
+
+For more information on how to modify the host filesystem sizes, see
+:ref:`Resize Filesystems on a Host `.
+
+**Local Storage**
+
+The host filesystem used to store backup files in a protected partition that
+is not wiped during system reinstallation. The protected local backup
+partition is typically used by |AIO-SX| systems, where there is no redundant
+filesystem storage, and is the default for local backups.
+
+.. note::
+
+    The filesystem is shared with system release pre-staging and needs to be
+    sized for both pre-staging installation media and backup archives.
+
+System path: ``/opt/platform-backup/backups``
+
+Default size: 30 GB
+
+**Centralized Storage**
+
+The Distributed Cloud (DC) Vault filesystem is used to store backup archives
+when using centralized backup and restore. The filesystem size must be
+increased to accommodate subcloud backup archive storage. A separate backup
+archive is stored per subcloud and release, so the filesystem must be sized to
+accommodate all backups.
+
+System path: ``/opt/dc-vault/backups//``
+
+Default size: 15 GB
+
+.. note::
+
+    The filesystem is shared with |DC| subcloud deployment and management and
+    must also be sized to store subcloud deployment files (subcloud
+    configuration, ISO images, and subcloud staging files).
+
+For more information on how to modify the controller filesystem sizes, see
+:ref:`Storage on Controller Hosts
+`.
+
+
+Distributed Cloud Centralized Backups
+-------------------------------------
+
+A subcloud's system data and, optionally, its container images (from
+``registry.local``) can be backed up using the DCManager |CLI|. The
+subcloud's system backup data can be stored either locally on the subcloud or
+on the System Controller. The subcloud's container image backup (from
+``registry.local``) can only be stored locally on the subcloud, to avoid
+overloading the central storage and the network with large data transfers and
+redundant storage of images in a central location.
+
+.. image:: figures/system-controller-backup-and-restore.png
+    :width: 800
+
+For more information on the |CLI| operation of the centralized backup
+capability, see :ref:`Backup a Subcloud/Group of Subclouds using DCManager CLI
+`.
+
+For more information on the DCManager Subcloud Backup API, see `Subcloud
+Backups
+`__.
+
+
+Execution Time for System Backups
+---------------------------------
+
+- The time to execute system backups is approximately 3 to 4 minutes for an
+  idle system.
+
+- Centralized backups may require additional time for network transfer for
+  larger backups.
+
+- Subcloud backups may be initiated and monitored from the DCManager |CLI| or
+  API, including parallel backups.
+
+- A minor alarm (210.001) "System Backup in progress" is raised while backing
+  up an individual system.
+
+- Systems with at least 4 platform cores will have much faster execution times.
+
+
+Recommended Backup and Retention Policies
+-----------------------------------------
+
+- All backups should be performed remotely and stored off the system.
+
+- All backups should be performed during off-peak hours (maintenance window).
+
+  - Weekly backups should be performed under normal steady-state conditions to
+    ensure the system can be restored to a fully operational state.
+
+  - Nightly backups are the exception and should only be performed during
+    periods of significant system reconfiguration, such as a large-scale
+    rollout (addition of subclouds), an upgrade cycle across multiple sites, or
+    disaster recovery rehoming of subclouds.
+
+- Backups should be performed before performing maintenance operations or
+  applying configuration changes to the platform or hosted applications.
+
+- The retention period of backups should be approximately one month.
+
+  - Since Kubernetes is an intent-based system, the most recent backup is the
+    most important.
 
-When the system data backup is complete, the backup file must be kept in a
-secured location, probably holding multiple copies of them for redundancy
-purposes.
 
 .. seealso::
     :ref:`Run Ansible Backup Playbook Locally on the Controller
diff --git a/doc/source/backup/kubernetes/figures/system-controller-backup-and-restore.png b/doc/source/backup/kubernetes/figures/system-controller-backup-and-restore.png
new file mode 100644
index 000000000..b183304ec
Binary files /dev/null and b/doc/source/backup/kubernetes/figures/system-controller-backup-and-restore.png differ
diff --git a/doc/source/updates/kubernetes/upgrading-all-in-one-simplex.rst b/doc/source/updates/kubernetes/upgrading-all-in-one-simplex.rst
index ba7f69850..3eb1ecc01 100644
--- a/doc/source/updates/kubernetes/upgrading-all-in-one-simplex.rst
+++ b/doc/source/updates/kubernetes/upgrading-all-in-one-simplex.rst
@@ -70,7 +70,7 @@ End user container images in ``registry.local`` will be backed up during the
 upgrade process. This only includes images other than |prod| system and
 application images. These images are limited to 5 GB in total size. If the
 system contains more than 5 GB of these images, the upgrade start will fail.
-For more details, see :ref:`Detailed contents of a system backup
+For more details, see :ref:`Contents of System Backup
 `.
 
 .. rubric:: |proc|