Merge "Distributed Cloud - Standalone System Subcloud Enrollment (dsr10)"

This commit is contained in:
Zuul 2024-12-09 14:47:56 +00:00 committed by Gerrit Code Review
commit 8f00ba92df
3 changed files with 776 additions and 1 deletions

View File

@ -0,0 +1,760 @@
.. WARNING: Add no lines of text between the label immediately following
.. and the title.
.. _enroll-a-factory-installed-nondc-standalone-system-as-a-s-87b2fbf81be3:
====================================================================================
Enroll a Factory Installed Non-Distributed Cloud Standalone System as a Subcloud
====================================================================================
The subcloud enrollment feature converts a factory pre-installed system to a
subcloud of a |DC|. For factory pre-installation, standalone systems must be
able to be installed locally in the factory, and later deployed and configured
on-site as a |DC| subcloud without re-installing the system.
.. rubric:: |prereq|
The following requirements must be met for factory installation of a system:
- The standalone system must be |BMC| configured (support Redfish protocol).
- The standalone system must be installed with a prestaged ISO (with archived
  container images).
- The prestaged ISO must be installed with one of the cloud-init boot options.
  ``default-boot >= 2`` can be specified when generating a prestaged ISO;
  otherwise, a cloud-init boot option has to be selected manually during
  installation. The default boot menu has the following options:
- 0 - Prestage Serial Console
- 1 - Prestage Graphical Console (default)
- 2 - Prestage cloud-init All-in-one Serial Console
- 3 - Prestage cloud-init All-in-one Graphical Console
- 4 - Prestage cloud-init Controller Serial Console
- 5 - Prestage cloud-init Controller Graphical Console
To create a prestaged ISO image, see
:ref:`subcloud-deployment-with-local-installation-4982449058d5`.
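
For reference, a hedged example of generating a prestaged ISO that defaults to
a cloud-init boot option, assuming the ``--default-boot`` option of
``gen-prestaged-iso.sh`` (see the linked page for the full option list; the
paths here are placeholders):

.. code-block::

   ./gen-prestaged-iso.sh --input <stx-release>.iso --output prestaged.iso \
   --images <container-images>.tar.gz --default-boot 2
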
--------------------------------
Factory Installation of a System
--------------------------------
*******************************
Factory Installation Automation
*******************************
The automation services are delivered and loaded using a generated seed ISO.
The seed ISO is applied by the cloud-init service, which is enabled during the
prestaged ISO installation. The seed ISO contains the platform automation
services, as well as the cloud-config for cloud-init to set up and trigger the
automation services. The automation services are a set of systemd services
that provide streamlined staged execution.
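
Because the stages are ordinary systemd units grouped under a target, standard
:command:`systemctl` tooling can be used to inspect them; for example (a
sketch, using the unit names listed later on this page):

.. code-block::

   sysadmin@controller-0:~$ systemctl list-dependencies factory-install.target
   sysadmin@controller-0:~$ systemctl status factory-install-bootstrap.service
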
**************************
Prepare Seed Configuration
**************************
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Retrieve Base Seed Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Download and extract ``nocloud-factory-install.tar``, which contains the seed
ISO contents. It consists of the platform automation systemd services in the
``nocloud-factory-install/factory-install`` subdirectory, the base cloud-init
configuration in ``meta-data``, ``network-config``, and ``user-data`` in the
top-level directory, and the base host configuration in the
``nocloud-factory-install/config`` subdirectory.
.. code-block::
   nocloud-factory-install/
   ├── config
   │   └── localhost.yml
   ├── factory-install
   │   ├── scripts
   │   │   ├── 10-init-setup
   │   │   ├── 20-hardware-check
   │   │   └── 90-init-final
   │   ├── setup
   │   │   └── 10-system-setup
   │   ├── systemd
   │   │   ├── factory-install-bootstrap.path
   │   │   ├── factory-install-bootstrap.service
   │   │   ├── factory-install-config.path
   │   │   ├── factory-install-config.service
   │   │   ├── factory-install-setup.path
   │   │   ├── factory-install-setup.service
   │   │   ├── factory-install.target
   │   │   ├── factory-install-tests.path
   │   │   ├── factory-install-tests.service
   │   │   └── utils
   │   │       ├── 20-cloud-init.preset
   │   │       ├── 20-factory-install.preset
   │   │       └── disable-factory-install
   │   └── tests
   │       └── 10-system-health
   ├── meta-data
   ├── network-config
   └── user-data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Prepare cloud-init Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before performing the initial configuration in the factory, the following
requirements must be met:

- Only controller-0 is provisioned by the factory installation process.
- Management network update is only allowed on a Simplex system. On other
  system types, the admin network should be configured during factory
  installation.
- The subcloud platform networks should be configured with the expected IP
  family (IPv4 or IPv6) because the IP family of a subcloud cannot be updated.
- The same SSL CA certs (system_local_ca_cert, system_local_ca_key, and
  system_root_ca_cert) need to be installed on both the central cloud system
  controllers and the factory-installed subclouds via ``localhost.yml`` to
  enable |SSL| communication over the |OAM| connection. Otherwise, the
  enrollment will fail due to an |SSL| failure while requesting the subcloud's
  region name (logs can be found in ``dcmanager.log``).
- Kubernetes RootCA certs need to be specified in ``localhost.yml`` during the
  factory installation process; otherwise, the kube-rootca endpoint will be
  out of sync and a kube-rootca update strategy is needed to bring it back in
  sync.
- Additional applications should not be installed on the factory-installed
  system before completing the enrollment process.
- Other configurations that do not allow reconfiguration (for example,
  creating a new controllerfs or hostfs, storage backend, Ceph, etc.) should
  be configured in the factory before the subcloud enrollment.
The extracted ``nocloud-factory-install`` directory from the tarball includes
the following cloud-init files:

**user-data**

This file is used to customize the instance when it first boots; it performs
the initial setup.
.. code-block::
   user-data

   #cloud-config
   chpasswd:
     list:
       # Changes the sysadmin password - the hash below specifies St8rlingX*1234
       - sysadmin:$5$HElSMXRZZ8wlTiEe$I0hValcFqxLRKm3pFdXrpGZlxnmzQt6i9lhIR9FWAf8
     expire: False
   runcmd:
     - [ /bin/bash, -c, "echo $(date): Initiating factory-install" ]
     - mkdir -p /opt/nocloud
     - mount LABEL=CIDATA /opt/nocloud
     - run-parts --verbose --exit-on-error /opt/nocloud/factory-install/scripts
     - eject /opt/nocloud
The default base configuration file:

- Sets the password to ``St8rlingX*1234`` (specified under the chpasswd
  section). The password can be set either as plain text or as a hash. The
  :command:`mkpasswd` Linux command can be used to generate a new password
  hash.
For example:
.. code-block::
mkpasswd -m sha-512
  It prompts for the new password to be hashed.

- Runs the commands specified under runcmd to set up and start the automation
  services.

Users may also choose to extend user-data to perform additional configuration
for the initial setup of the node. See the official cloud-init documentation at
https://cloudinit.readthedocs.io/en/latest/reference/examples.html for working
with configuration files.
**meta-data**
The meta-data file provides instance-specific information: the host name and
the instance ID.
.. code-block::
   meta-data

   instance-id: iid-local01
   local-hostname: controller-0
This file should not be modified.

User-data is applied once per instance; hence, the instance ID must be changed
for subsequent runs if you need to re-apply the seed (reinsert the seed ISO).
However, this is not recommended as it may lead to a bad state, especially if
the factory-install services have already started from a previous run.
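
If you do choose to re-apply the seed, a minimal sketch is to bump the
instance ID before regenerating the seed ISO, for example:

.. code-block::

   meta-data

   instance-id: iid-local02   # changed from iid-local01 to force re-apply
   local-hostname: controller-0
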
**network-config**

Descriptions of the various file parameters can be found in the official
cloud-init documentation:
https://cloudinit.readthedocs.io/en/latest/reference/network-config-format-v1.html.
"This network configuration format lets users customize their instance's
networking interfaces by assigning subnet configuration, virtual device
creation (bonds, bridges, |VLANs|), routes, and |DNS| configuration."
This configuration is a placeholder and should be updated based on the factory
node's networking requirements. It can be used to assign the |OAM| IP address,
enable the network interface, and add the route needed to |SSH| in and monitor
progress during factory-install.
Example:
The following network-config shows an IPv4 address configuration:
.. code-block::
   network-config

   version: 1
   config:
     - type: physical
       name: enp2s1
       subnets:
         - type: static
           address: 10.10.10.2
           netmask: 255.255.255.0
           gateway: 10.10.10.1
The following network-config shows IPv6 |VLAN| configuration:
.. code-block::
   network-config

   version: 1
   config:
     - type: vlan
       name: vlan401
       vlan_link: enp179s0f0
       vlan_id: 401
       subnets:
         - type: static
           address: 2620:10a:a001:d41::208/64
           gateway: 2620:10a:a001:d41::1
~~~~~~~~~~~~~~~~~~~~~~~~~~
Prepare Host Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~
The standalone host configuration files are the configuration files used by
the platform automation during the bootstrap and config (deployment) stages,
such as ``localhost.yml``, ``deployment-config.yaml``, and
``dm-playbook-overrides.yaml``.

The host configuration files must be placed under the
``nocloud-factory-install/config`` directory. These files are copied to
``/home/sysadmin/`` on the host during setup.
.. note::
Only ``localhost.yml`` is provided as part of the base configuration. It is
a sample placeholder that must be updated. The Deployment Manager
configuration and overrides file must be created.
**localhost.yml**

``localhost.yml`` provides values to be used during the bootstrap process.
Values in this file can be specified the same way as during a normal bootstrap
process.
Example:
.. code-block::
   localhost.yml

   system_type:
   system_mode:
   name:

   # DNS servers need to be of the same IP family (v4 or v6); an IPv6
   # address must be added if installed as IPv6 (default values are IPv4)
   dns_servers:

   # IPv6 addresses need to be assigned; values can be generic (default values
   # are IPv4). OAM networks can be reconfigured during the enrollment process
   external_oam_subnet:
   external_oam_gateway_address:
   external_oam_floating_address:
   external_oam_node_0_address:
   external_oam_node_1_address:

   # The admin network is required because the management network cannot be
   # reconfigured; admin networks can be reconfigured during the enrollment
   # process
   admin_subnet:
   admin_start_address:
   admin_end_address:
   admin_gateway_address:

   # IPv6 addresses need to be assigned; values can be generic (default values
   # are IPv4). Management networks can only be reconfigured on Simplex without
   # the admin network configured. Only the primary stack management network
   # supports reconfiguration.
   management_subnet:
   management_multicast_subnet:
   management_start_address:
   management_end_address:
   # management_gateway cannot be configured with the admin network

   # IPv6 addresses need to be assigned; values can be generic (default values
   # are IPv4)
   cluster_host_subnet:
   cluster_pod_subnet:
   cluster_service_subnet:

   # The password for the factory-install stage; must align with user-data.
   # The admin password will not be updated during the enrollment; however, it
   # will be synchronized with the system controller after managing the
   # subcloud.
   admin_password:

   # Password for the factory-install stage; must align with admin_password
   ansible_become_pass:

   # Optional; the same certs as on the system controller need to be
   # installed, otherwise the k8s-rootca endpoint will be out-of-sync after
   # enrollment (a k8s-rootca-update orchestration can be used to sync it)
   k8s_root_ca_cert:
   k8s_root_ca_key:

   # System SSL CA certs are required and need to align with the system
   # controllers
   system_root_ca_cert:
   system_local_ca_cert:
   system_local_ca_key:
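
As a sanity check before generating the seed ISO, the CA certificates placed
in ``localhost.yml`` can be compared against the ones installed on the system
controllers. A hedged sketch (whether these values are file paths or inline
PEM depends on your release; the file name is a placeholder):

.. code-block::

   # Compare CA fingerprints between the factory host configuration and the
   # system controller
   openssl x509 -in <system_root_ca_cert.pem> -noout -fingerprint -sha256
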
**dm-playbook-overrides.yaml**
.. note::
This file name needs to be the same as given, otherwise automation services
will fail to find the file.
This file is only used to support the installation of Deployment Manager.
.. code-block::
   deployment_config: /home/sysadmin/deployment-config.yaml
   deployment_manager_overrides: /usr/local/share/applications/overrides/wind-river-cloud-platform-deployment-manager-overrides.yaml
   deployment_manager_chart: /usr/local/share/applications/helm/wind-river-cloud-platform-deployment-manager-<version>.tgz
   ansible_become_pass:  # sysadmin password
**deployment-config.yaml**
The ``deployment-config.yaml`` file contains the configuration applied by
Deployment Manager to the system during the factory installation. The values
in ``deployment-config.yaml`` can be used in different deployment options;
however, the following guidelines must be met (a minimal skeleton follows the
list):

- Only controller-0 is provisioned and configured during the factory
  installation. The profiles of the other hosts should be removed from
  ``deployment-config.yaml``.
- Controller-0 should be administratively unlocked before the enrollment
  process.
- Controller-0 should be reconciled at the end of the factory installation
  process to be valid for enrollment.
- Storage backends, filesystems, memory, and processors should be specified in
  the configuration.
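
A minimal, illustrative skeleton of the controller-0 entry is sketched below;
the resource kind and fields follow typical Deployment Manager examples, and
the apiVersion, profile name, and MAC value are placeholders that must be
checked against your Deployment Manager release:

.. code-block::

   # Illustrative only - verify the apiVersion and fields for your release
   apiVersion: starlingx.windriver.com/v1
   kind: Host
   metadata:
     name: controller-0
     namespace: deployment
   spec:
     profile: controller-0-profile  # references a HostProfile defined in the same file
     overrides:
       bootMAC: "<controller-0 boot MAC>"
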
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(Optional) Prepare Custom Setup, Checks, and Tests Script
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The platform-automation framework is designed to run a set of scripts at
various stages. Users may provide their own scripts, checks, and tests. Sample
placeholders are provided as part of the base configuration for:
- Pre-bootstrap (checks during initial setup)
- Setup stage (checks during post deployment-unlock)
- Test stage (final checks)
.. note::
The framework executes scripts using :command:`run-parts`; that is, it runs
every script found in the given stage directory. Additional script/test files
can be added, and file names can be changed.
Users may choose to add new scripts in the parent directory or modify the
existing sample placeholders.
**Pre-bootstrap, initial setup, and checks**
The scripts in ``nocloud-factory-install/factory-install/scripts/`` are
executed at the beginning, before the bootstrap stage.

.. note::

   The files ``10-init-setup`` and ``90-init-final`` in the directory must
   remain in place to correctly set up and start the platform automation.
   These files may be modified for custom behavior; however, modifying them
   may result in unexpected behavior.

A sample placeholder script,
``nocloud-factory-install/factory-install/scripts/20-hardware-check``, is
provided in the directory and is executed before proceeding with the platform
automation.
.. code-block::
   nocloud-factory-install/factory-install/scripts/20-hardware-check

   # cloud-init script to perform hardware and firmware checks
   #
   # SAMPLE ONLY - REPLACE WITH REAL HARDWARE CHECKS
   #
   echo "Hardware Check - Start"

   BOARD_VENDOR=$(cat /sys/devices/virtual/dmi/id/board_vendor)
   BOARD_NAME=$(cat /sys/devices/virtual/dmi/id/board_name)
   PRODUCT_NAME=$(cat /sys/devices/virtual/dmi/id/product_name)
   BIOS_VERSION=$(cat /sys/devices/virtual/dmi/id/bios_version)

   echo "BOARD_VENDOR=${BOARD_VENDOR}"
   echo "BOARD_NAME=${BOARD_NAME}"
   echo "PRODUCT_NAME=${PRODUCT_NAME}"
   echo "BIOS_VERSION=${BIOS_VERSION}"

   echo "Hardware Check - Complete"
   exit 0
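
For instance, a real check might compare the collected values against expected
ones and fail the stage on a mismatch; a sketch (``<expected-vendor>`` is a
placeholder):

.. code-block::

   # Fail factory-install early if this is not the expected hardware
   EXPECTED_VENDOR="<expected-vendor>"
   if [ "${BOARD_VENDOR}" != "${EXPECTED_VENDOR}" ]; then
       echo "FAIL: unexpected board vendor: ${BOARD_VENDOR}"
       exit 1
   fi
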
**Setup stage checks (Post-deployment-unlock)**
The scripts in ``nocloud-factory-install/factory-install/setup/`` are run after
the host is unlocked by Deployment Manager during the setup stage. For example,
``nocloud-factory-install/factory-install/setup/10-system-setup`` ensures that
the host is fully reconciled before proceeding.
.. code-block::
   nocloud-factory-install/factory-install/setup/10-system-setup

   echo "System Setup - Start"
   echo "Wait - host goenabled"
   until [ -f /var/run/goenabled ]; do
       sleep 10
   done
   echo "Ready - host goenabled"

   system_mode=$(awk -F= '/system_mode/ {print $2}' /etc/platform/platform.conf)

   echo "Wait - system deployment reconciled"
   # NOTE: the resource names and jsonpath expressions below are illustrative;
   # adjust them to the Deployment Manager CRDs in your release.
   while true; do
       if [ "$system_mode" = "duplex" ]; then
           SYSTEM_RECONCILED=true
       else
           SYSTEM_RECONCILED=$(kubectl --kubeconfig=/etc/kubernetes/admin.conf -n deployment get system -o jsonpath='{.items[0].status.reconciled}')
       fi
       HOST_RECONCILED=$(kubectl --kubeconfig=/etc/kubernetes/admin.conf -n deployment get hosts controller-0 -o jsonpath='{.status.reconciled}')
       if [ "$SYSTEM_RECONCILED" = true ] && [ "$HOST_RECONCILED" = true ]; then
           break
       fi
       sleep 10
   done
   echo "Ready - system deployment reconciled"
   echo "System Setup - Complete"
   exit 0
**Final checks**
The scripts in ``nocloud-factory-install/factory-install/tests/`` are run as
part of the testing stage, which is the last stage. For instance, the
placeholder ``10-system-health`` performs fm alarm checks.
.. code-block::
   nocloud-factory-install/factory-install/tests/10-system-health

   # Factory install system health checks triggered during the tests stage
   #
   # SAMPLE ONLY - REPLACE WITH REAL SYSTEM HEALTH CHECKS
   #
   echo "System Health Checks - Start"

   log_failure () {
       echo "FAIL: $1"
       exit ${2}
   }

   # Check for service impacting alarms (only recommended on a simplex system;
   # multi-node systems have alarms due to loss of the standby controller)
   source /etc/platform/openrc
   fm --timeout 10 alarm-list --nowrap | grep -e "major\|minor\|warning\|critical"
   if [ $? -eq 0 ]; then
       # Log the health check failure and exit 0 to allow factory-install to finish up.
       # Modify to exit 1 if factory-install should fail the test stage and halt.
       log_failure "service impacting alarms present" 0
   fi

   echo "System Health Checks - Complete"
   exit 0
*******************
Generate Seed Image
*******************
The seed ISO can be generated after the seed (cloud-init and host)
configurations are prepared. The seed ISO is essentially the content of
``nocloud-factory-install`` directory.
Use the Linux :command:`genisoimage` command-line tool to generate the seed ISO:
.. code-block::
genisoimage -o <seed-output-dir>/seed.iso -volid 'CIDATA' -untranslated-filenames -joliet -rock -iso-level 2 <path to extracted nocloud-factory-install dir>
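
Before inserting the image, the generated ISO can optionally be loop-mounted
to confirm that the volume label and layout match what cloud-init expects (a
quick sanity check; the mount point is arbitrary):

.. code-block::

   blkid <seed-output-dir>/seed.iso   # expect LABEL="CIDATA"
   mkdir -p /tmp/seed
   sudo mount -o loop,ro <seed-output-dir>/seed.iso /tmp/seed
   ls /tmp/seed   # expect: config  factory-install  meta-data  network-config  user-data
   sudo umount /tmp/seed
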
*****************
Insert Seed Image
*****************
.. rubric:: |proc|
.. note::
Insert the seed image and power on the system using virtual media, such as the
|BMC| GUI or by utilizing the Redfish API.
#. Host the ISO image. Place the seed ISO in a location/file server accessible by |BMC|.
#. Power off the host.
#. Insert virtual media seed ISO image using Redfish API.
Example:
.. code-block::
   Redfish insert media example usage

   curl -k -X POST "https://<redfish-server-ip>/redfish/v1/Managers/1/VirtualMedia/Cd/Actions/VirtualMedia.InsertMedia" \
   -H "Content-Type: application/json" \
   -u "your-username:your-password" \
   -d '{
         "Image": "http://<seed.iso>",
         "Inserted": true
       }'
#. Power on the system.

   The boot order does not need to be changed; the system should boot from the
   prestaged installation, and the inserted seed image is used as the
   cloud-init data source.
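
The same Redfish API can be used to confirm the media state and to power on
the host; a hedged sketch (resource paths vary by |BMC| vendor):

.. code-block::

   # Confirm the virtual media is attached
   curl -k -u "your-username:your-password" \
   "https://<redfish-server-ip>/redfish/v1/Managers/1/VirtualMedia/Cd"

   # Power on the system
   curl -k -X POST -u "your-username:your-password" \
   -H "Content-Type: application/json" \
   -d '{"ResetType": "On"}' \
   "https://<redfish-server-ip>/redfish/v1/Systems/1/Actions/ComputerSystem.Reset"
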
************************
Factory-install Services
************************
You must access the node to monitor progress, either using a serial console
with a tool like |IPMI|, or using |SSH| if an IP address has been assigned
(with the seed network-config).

- Monitor the progress and output of the various stages by checking
  ``/var/log/factory-install.log``. Automation goes through the stages in the
  order: bootstrap, config, setup, and tests. Overall, it runs the bootstrap
  Ansible playbook and then the Deployment Manager playbook, unlocks the host,
  and waits for the system to be reconciled.
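
  For example:

  .. code-block::

     sysadmin@controller-0:~$ tail -f /var/log/factory-install.log
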
- The following factory-install services can be managed and checked using :command:`systemctl`.
- factory-install-bootstrap.service
- factory-install-config.service
- factory-install-setup.service
- factory-install-tests.service
Example:
.. code-block::
sysadmin@controller-0:~$ systemctl status factory-install-<stage>.service
- Retry and start automation from the failed stage:
.. code-block::
sudo systemctl restart factory-install-<stage>.service --no-block
``--no-block`` must be specified so that the command returns immediately
instead of blocking.
For example, the factory-install failed stage can be restarted as follows:
.. code-block::
sysadmin@controller-0:~$ systemctl status factory-install-tests.service
factory-install-tests.service - Factory Installation Execute System Tests
Loaded: loaded (/etc/systemd/system/factory-install-tests.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2024-10-23 17:28:00 UTC; 4h 31min ago
TriggeredBy: ● factory-install-tests.path
Main PID: 1725 (code=exited, status=1/FAILURE)
CPU: 1.617s
.. code-block::
sysadmin@controller-0:~$ sudo systemctl restart factory-install-tests.service --no-block
- The factory-install state flag can be found in the following:
- ``/var/lib/factory-install/stage/*``
- A flag set at the start of a given stage indicates the stage trigger.
- The ``final`` flag in this directory indicates factory-install
completion.
- ``/var/lib/factory-install/state/*``
A flag set at the successful completion of a given stage indicates the
stage exit.
- ``/var/lib/factory-install/complete``
This flag indicates that factory-install has successfully been completed
(equivalent to the stage final flag).
The following flags indicate that the bootstrap, config, and setup stages have
completed successfully and that the current stage is ``tests``:
.. code-block::
sysadmin@controller-0:~$ ls /var/lib/factory-install/state/
bootstrap config setup
.. code-block::
sysadmin@controller-0:~$ ls /var/lib/factory-install/stage
bootstrap config setup tests
- The general script output from the ``factory-install`` framework is directed
  to ``/var/log/cloud-init-output.log`` by default; that is, logs from before
  the services are started can be found in ``/var/log/cloud-init-output.log``.
  A failure of one of the scripts or checks may surface as an error there.
------------------------------------------------------------------------
Enroll the Factory Installed System as a Subcloud of a Distributed Cloud
------------------------------------------------------------------------
******************************************
Prepare the Subcloud Values for Enrollment
******************************************
The enrollment process requires bootstrap values, install values, and deployment
configurations.
**Install values**
Install values are required to access the |BMC| and to provide the values
needed to enable communication between the system controller and the subcloud
via the |OAM| network.
.. code-block::
   bootstrap_address: the bootstrap address to be used for ansible-playbook
   bootstrap_interface: the interface to assign the bootstrap address to
   bmc_address: the BMC address
   bmc_username: the BMC username
   bootstrap_vlan: the VLAN ID for the bootstrap interface
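
For illustration, a filled-in set of install values might look like the
following (all addresses, interface names, and credentials are placeholders):

.. code-block::

   bootstrap_address: 10.10.10.2
   bootstrap_interface: enp2s1
   bmc_address: 10.10.10.100
   bmc_username: admin
   bootstrap_vlan: 401
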
**Bootstrap values**
Bootstrap values are used during the enrollment process to update the required
configurations for essential services before Deployment Manager can update the
other configurations.
.. code-block::
   # Values to be set for the OAM network
   external_oam_subnet:
   external_oam_gateway_address:
   external_oam_floating_address:
   external_oam_node_0_address:
   external_oam_node_1_address:

   # Values to be set for the admin network
   admin_subnet:
   admin_start_address:
   admin_end_address:
   admin_gateway_address:

   # MGMT network values are only updated on a Simplex system without the
   # admin network configured.
   # Cluster network and pxeboot_subnet values are ignored.
   # System type, system mode, and name are ignored.
   systemcontroller_gateway_address:
   docker_http_proxy:
   docker_no_proxy:
   docker_registries:
   dns_servers:
   # All other values are ignored
**Deployment configurations**
To prepare the deployment configuration for enrollment, this YAML file should
be similar to the one used for the original deployment.

- For multi-node systems, configure the additional hosts of the system that
  were not specified during factory installation.
- As a subcloud, static routes from the hosts' admin/management gateway to the
  system controller's management subnet should be added to establish
  communication between the system controllers and the subcloud hosts.
- Hosts should be administratively unlocked in this configuration.
- Deployment Manager will ignore any network change at this stage. The network
  reconfiguration should be controlled by the bootstrap values.
***************************
Perform Subcloud Enrollment
***************************
.. rubric:: |prereq|
The software ISO and signature need to be uploaded before the subcloud
enrollment. This operation is described in the subcloud installation
documentation.
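
For reference, a hedged sketch of the upload on the system controller (the
exact CLI and arguments depend on your release; follow the subcloud
installation documentation):

.. code-block::

   ~(keystone_admin)]$ software upload <release>.iso <release>.sig
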
Perform subcloud enrollment using the following command:
.. code-block::
   ~(keystone_admin)]$ dcmanager subcloud add --enroll --bootstrap-address <> --bootstrap-values <> --deploy-config <> --install-values <> --sysadmin-password <> --bmc-password <>
If the subcloud enrollment fails, retry the enrollment process using the following command:
.. code-block::
~(keystone_admin)]$ dcmanager subcloud deploy enroll <subcloud-name>
If any value needs to be updated, append the bootstrap values, install values,
|BMC| password, sysadmin password, and deployment configuration as optional
arguments.

.. code-block::

   ~(keystone_admin)]$ dcmanager subcloud deploy enroll <subcloud-name> --bootstrap-values <> --install-values <> --deploy-config <> --sysadmin-password <> --bmc-password <>
After the subcloud reaches the ``enroll-complete`` status, the remaining
deployment operation (config) can be triggered using the following command to
reach the final deploy status of ``complete``:

.. code-block::

   ~(keystone_admin)]$ dcmanager subcloud deploy config <subcloud-name> --deploy-config <> --sysadmin-password <>

View File

@ -38,6 +38,7 @@ Installation
subcloud-deployment-phases-0ce5f6fbf696
reinstalling-a-subcloud-with-redfish-platform-management-service
subcloud-deployment-with-local-installation-4982449058d5
enroll-a-factory-installed-nondc-standalone-system-as-a-s-87b2fbf81be3
---------
Operation
@ -198,7 +199,7 @@ Distributed Cloud System Controller GEO Redundancy
configure-distributed-cloud-system-controller-geo-redundancy-e3a31d6bf662
-----------------------------
Management Network Parameters
-----------------------------
.. toctree::

View File

@ -114,6 +114,10 @@ release directory
Create the Prestaged ISO with gen-prestaged-iso.sh
--------------------------------------------------
.. note::
   The prestaged ISO can be used to prepare for factory installation.
You can prepare and manually prestage the Install Bundle or use the
``gen-prestaged-iso.sh`` tool to create a self-installing prestaging ISO image.
@ -193,6 +197,11 @@ functions. You will find it in the same software distribution location as
#. (Optional) Obtain archived Docker images.
.. note::
This step is required if the prestaged ISO is being prepared for
factory installation.
|prod| uses a large number of Docker images. You can embed Docker images
within your Prestaged ISO.
@ -313,6 +322,11 @@ Use the ``--output`` directive to specify the path/filename of the created
Specify default boot menu option:
0 - Serial Console
1 - Graphical Console (default)
2 - Prestage cloud-init All-in-one Serial Console
3 - Prestage cloud-init All-in-one Graphical Console
4 - Prestage cloud-init Controller Serial Console
5 - Prestage cloud-init Controller Graphical Console
--timeout <menu timeout>:
Specify boot menu timeout, in seconds. (default 30)
A value of -1 will wait forever.