.. This release note was created to address review https://review.opendev.org/c/starlingx/docs/+/862596

.. The Release Notes will be updated and a separate gerrit review will be sent out

.. Ignore the contents in this RN except for the updates stated in the comment above

.. _release-notes:

.. The Stx 10.0 RN is WIP and not ready for review.

.. Removed appearances of Armada as it is not supported

===================
R10.0 Release Notes
===================

.. rubric:: |context|

StarlingX is a fully integrated edge cloud software stack that provides
everything needed to deploy an edge cloud on one, two, or up to 100 servers.

This section describes the new capabilities, known limitations and
workarounds, fixed defects, and deprecated information in the StarlingX 9.0
release.

.. contents::
   :local:
   :depth: 1

---------
ISO image
---------

The pre-built ISO (Debian) for StarlingX Release 9.0 is located at the
``StarlingX mirror`` repo:

https://mirror.starlingx.windriver.com/mirror/starlingx/release/9.0.0/debian/monolithic/outputs/iso/

-------------------------------------
Source Code for StarlingX Release 9.0
-------------------------------------

The source code for StarlingX Release 9.0 is available on the r/stx.9.0
branch in the `StarlingX repositories <https://opendev.org/starlingx>`_.

----------
Deployment
----------

To deploy StarlingX Release 9.0, see `Consuming StarlingX <https://docs.starlingx.io/introduction/consuming.html>`_.

For detailed installation instructions, see `StarlingX 9.0 Installation Guides <https://docs.starlingx.io/deploy_install_guides/index-install-e083ca818006.html>`_.

-----------------------------
New Features and Enhancements
-----------------------------

.. start-new-features-r9

The sections below provide a detailed list of new features and links to the
associated user guides (if applicable).

*********************
Kubernetes up-version
*********************

StarlingX 9.0 supports Kubernetes versions in the range of v1.24 to v1.27.

****************************************
Platform Application Components Revision
****************************************

.. Need updated versions for this section wherever applicable

The following applications have been updated to a new version in StarlingX
Release 9.0. All platform application up-versions are updated to remain
current and address security vulnerabilities in older versions.

- app-sriov-fec-operator: 2.7.1

- cert-manager: 1.11.1

- metrics-server: 1.0.18

- nginx-ingress-controller: 1.9.3

- oidc-dex: 2.37.0

- vault: 1.14.8

- portieris: 0.13.10

- istio: 1.19.4

- kiali: 1.75.0

******************
FluxCD Maintenance
******************

FluxCD helm-controller is upgraded from v0.27.0 to v0.35.0 and is compatible
with Helm versions up to v3.12.1 and Kubernetes v1.27.3.

FluxCD source-controller is upgraded from v0.32.1 to v1.0.1 and is compatible
with Helm versions up to v3.12.1 and Kubernetes v1.27.3.

****************
Helm Maintenance
****************

Helm has been upgraded to v3.12.2 in StarlingX Release 9.0.

*******************************************
Support for Silicom TimeSync Server Adaptor
*******************************************

The Silicom network adaptor, based on the Intel Columbiaville device,
provides local time sync support via a local |GNSS| module.

- ``cvl-4.10`` Silicom driver bundle

  - ice driver: 1.10.1.2
  - i40e driver: 2.21.12
  - iavf driver: 4.6.1

.. note::

   ``cvl-4.10`` is only recommended if the Silicom STS2 card is used.

*********************************************
Kubernetes Upgrade Optimization - AIO-Simplex
*********************************************

**Configure Kubernetes Multi-Version Upgrade Cloud Orchestration for AIO-SX**

You can configure a Kubernetes multi-version upgrade orchestration strategy
using the :command:`sw-manager` command. This feature is available from
|prod| |k8s-multi-ver-orch-strategy-release| and is supported only for
|AIO-SX| systems.

**See**: :ref:`Configure Kubernetes Multi-Version Upgrade Cloud Orchestration for AIO-SX <configuring-kubernetes-multi-version-upgrade-orchestration-aio-b0b59a346466>`
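
A minimal sketch of the orchestrated flow, assuming a target version of
v1.27.5 (the version string and the ``--to-version`` flag are illustrative;
see the linked procedure for the exact syntax and strategy options):

.. code-block:: none

   # Create a multi-version Kubernetes upgrade strategy for the AIO-SX host
   ~(keystone_admin)]$ sw-manager kube-upgrade-strategy create --to-version v1.27.5

   # Apply the strategy and monitor its progress
   ~(keystone_admin)]$ sw-manager kube-upgrade-strategy apply
   ~(keystone_admin)]$ sw-manager kube-upgrade-strategy show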

**Manual Kubernetes Multi-Version Upgrade in AIO-SX**

|AIO-SX| now supports multi-version Kubernetes upgrades. In this model,
Kubernetes is upgraded by two or more versions after applications are
disabled, and the applications are enabled again afterwards. This is faster
than upgrading Kubernetes one version at a time. The upgrade can also be
aborted and reverted to the original version. This feature is supported only
for |AIO-SX|.

**See**: :ref:`Manual Kubernetes Multi-Version Upgrade in AIO-SX <manual-kubernetes-multi-version-upgrade-in-aio-sx-13e05ba19840>`
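
The manual flow follows the standard :command:`system kube-upgrade` steps,
now accepting a target two or more versions ahead; a sketch assuming an
upgrade directly to v1.27.5 on controller-0 (the version string is
illustrative, and the application disable/enable steps from the linked
procedure are omitted here):

.. code-block:: none

   ~(keystone_admin)]$ system kube-upgrade-start v1.27.5
   ~(keystone_admin)]$ system kube-upgrade-download-images
   ~(keystone_admin)]$ system kube-upgrade-networking
   ~(keystone_admin)]$ system kube-host-upgrade controller-0 control-plane
   ~(keystone_admin)]$ system kube-host-upgrade controller-0 kubelet
   ~(keystone_admin)]$ system kube-upgrade-complete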

***********************************
Platform Admin Network Introduction
***********************************

The newly introduced admin network is an optional network used to monitor
and control internal |prod| traffic between the subclouds and system
controllers in a Distributed Cloud environment. This function is performed
by the management network in the absence of an admin network. However, the
admin network is more easily reconfigured to handle subnet and IP address
network parameter changes after initial configuration.

In deployment configurations, static routes from the management or admin
interface of subcloud controller nodes to the system controller's management
subnet must be present. This ensures that the subcloud comes online after
deployment.

.. note::

   The admin network is optional. The default management network will be
   used if it is not present.

You can manage an optional admin network on a subcloud for IP connectivity to
the system controller management network where the IP addresses of the admin
network can be changed.

**See**:

- :ref:`Common Components <common-components>`

- :ref:`Manage Subcloud Network Parameters <update-a-subcloud-network-parameters-b76377641da4>`

****************************************************
L3 Firewalls for all |prod-long| Platform Interfaces
****************************************************

|prod| incorporates default firewall rules for the platform networks (|OAM|,
management, cluster-host, pxeboot, admin, and storage). You can configure
additional Kubernetes Network Policies to augment or override the default
rules.

**See**:

- :ref:`Modify Firewall Options <security-firewall-options>`

- :ref:`Default Firewall Rules <security-default-firewall-rules>`
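
As a sketch of augmenting the default rules, the example below opens an
additional TCP port on the |OAM| network using a Calico GlobalNetworkPolicy
(the policy name, port, and selector are illustrative; see the linked pages
for the exact selectors and procedure used by |prod|):

.. code-block:: none

   apiVersion: crd.projectcalico.org/v1
   kind: GlobalNetworkPolicy
   metadata:
     name: gnp-oam-overrides
   spec:
     ingress:
     - action: Allow
       protocol: TCP
       destination:
         ports: [8080]
     order: 100
     selector: "has(iftype) && iftype == 'oam'"
     types:
     - Ingress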

****************************************************
app-sriov-fec-operator upgrade to FEC operator 2.7.1
****************************************************

A new version of the FEC Operator, v2.7.1 (for all Intel hardware
accelerators), is supported. It adds the ``igb_uio`` driver, makes the
accelerator resource names configurable, and enables accelerator device
configuration using the ``igb_uio`` driver when secure boot is enabled in
the BIOS.

.. note::

   The |FEC| operator now runs on the |prod| platform cores.

**See**: :ref:`Configure Intel Wireless FEC Accelerators using SR-IOV FEC operator <configure-sriov-fec-operator-to-enable-hw-accelerators-for-hosted-vran-containarized-workloads>`

**************************************
Redundant System Clock Synchronization
**************************************

The ``phc2sys`` application can be configured to accept multiple source
clock inputs. The quality of these sources is compared to user-defined
priority values and the best available source is selected to set the system
time.

The quality of the configured sources is continuously monitored by the
``phc2sys`` application, which selects a new best source if the current
source degrades or if another source becomes higher quality.

**See**: :ref:`Redundant System Clock Synchronization <redundant-system-clock-synchronization-89ee23f54fbb>`.
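
A minimal sketch of configuring a redundant ``phc2sys`` instance with a
prioritized source, assuming an instance named ``phc-inst1`` and the
parameter names ``ha_enabled`` and ``ha_priority`` (instance, interface
names, and values are assumptions; see the linked guide for the exact
parameters):

.. code-block:: none

   ~(keystone_admin)]$ system ptp-instance-add phc-inst1 phc2sys
   ~(keystone_admin)]$ system ptp-instance-parameter-add phc-inst1 ha_enabled=1
   ~(keystone_admin)]$ system ptp-interface-add phc-if1 phc-inst1
   ~(keystone_admin)]$ system ptp-interface-parameter-add phc-if1 ha_priority=254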

*******************************************************
Configure Intel E810 NICs using Intel Ethernet Operator
*******************************************************

You can install and use the **Intel Ethernet** operator to orchestrate and
manage the configuration and capabilities provided by Intel E810 Series
network interface cards (NICs).

**See**: :ref:`Configure Intel E810 NICs using Intel Ethernet Operator <configure-intel-e810-nics-using-intel-ethernet-operator>`.

****************
AppArmor Support
****************

AppArmor is a Mandatory Access Control (MAC) system built on Linux's LSM
(Linux Security Modules) interface. In practice, the kernel queries AppArmor
before each system call to know whether the process is authorized to do the
given operation. Through this mechanism, AppArmor confines programs to a
limited set of resources.

AppArmor helps administrators run a more secure Kubernetes deployment by
restricting the operations containers/pods are allowed to perform and/or
providing better auditing through system logs. The access needed by a
container/pod is configured through profiles tuned to allow access to
resources such as Linux capabilities, network access, and file permissions.

**See**: :ref:`About AppArmor <about-apparmor-ebdab8f1ed87>`.
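
Profiles loaded on a host can then be referenced from a pod specification; a
minimal sketch using the standard Kubernetes AppArmor annotation (the
profile name ``my-profile`` is an assumption and must already be loaded on
the node):

.. code-block:: none

   apiVersion: v1
   kind: Pod
   metadata:
     name: hello-apparmor
     annotations:
       # Confine the "hello" container with the host-loaded profile "my-profile"
       container.apparmor.security.beta.kubernetes.io/hello: localhost/my-profile
   spec:
     containers:
     - name: hello
       image: busybox
       command: ["sh", "-c", "echo 'Hello AppArmor!' && sleep 1h"]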

*****************
Support for Vault
*****************

This release re-introduces support for Vault, which was intermittently
unavailable in |prod|. The supported versions are vault 1.14.8 or later,
vault-k8s 1.2.1, and helm-chart 0.25.0, following the Helm v3 up-version
to 3.6+.

|prod| optionally integrates the open source Vault containerized security
application into the |prod| solution; it requires |PVCs| as a storage
backend to be enabled.

**See**: :ref:`Vault Overview <security-vault-overview>`.

*********************
Support for Portieris
*********************

|prod| now supports Portieris version 0.13.10. Portieris is an open source
Kubernetes admission controller which ensures that only policy-compliant
images, such as signed images from trusted registries, can run. The
Portieris application uses images from the ``icr.io`` registry. You must
configure service parameters for the ``icr.io`` registry prior to applying
the Portieris application, see :ref:`About Changing External Registries for StarlingX Installation <about-changing-external-registries-for-starlingx-installation>`.
For Distributed Cloud deployments, the images must be present on the System
Controller registry.

**See**: :ref:`Portieris Overview <portieris-overview>`.
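
A sketch of configuring the registry service parameters before applying the
application, assuming the ``icr-registry`` section name and a direct mirror
of ``icr.io`` (see the linked registry documentation for the exact section
and parameter names):

.. code-block:: none

   ~(keystone_admin)]$ system service-parameter-add docker icr-registry url=icr.io
   ~(keystone_admin)]$ system service-parameter-apply docker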

**************************
Configurable Power Manager
**************************

Configurable Power Manager focuses on containerized applications that use
power profiles individually, by core and/or by application.

|prod| has the capability to regulate the frequency of the entire processor.
However, this control is primarily directed at the classification of the
core, distinguishing between application and platform cores. Consequently,
if a user requires control over an individual core, such as core 10 in a
24-core CPU, adjustments must be applied to all cores collectively. In the
context of containerized operations, it becomes necessary to establish
per-container configurations, assigning each container the requisite power
configuration. In essence, this involves providing specific and
individualized power configurations to each core or group of cores.

**See**: :ref:`Configurable Power Manager <configurable-power-manager-04c24b536696>`.

******************************************************
Technology Preview - Install Power Metrics Application
******************************************************

The Power Metrics app deploys two containers, cAdvisor and Telegraf, that
collect metrics about hardware usage.

**See**: :ref:`Install Power Metrics Application <install-power-metrics-application-a12de3db7478>`.
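
The app is managed like other optional system applications; a sketch,
assuming the application name ``power-metrics`` and a locally available
tarball path (both are assumptions; see the linked procedure for the actual
names):

.. code-block:: none

   ~(keystone_admin)]$ system application-upload /usr/local/share/applications/helm/power-metrics-<version>.tgz
   ~(keystone_admin)]$ system application-apply power-metrics
   ~(keystone_admin)]$ system application-show power-metrics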

*******************************************************
Install Node Feature Discovery (NFD) |prod| Application
*******************************************************

Node Feature Discovery (NFD) version 0.15.0 detects hardware features
available on each node in a Kubernetes cluster and advertises those features
using Kubernetes node labels. This procedure walks you through the process
of installing the |NFD| |prod| application.

**See**: :ref:`Install Node Feature Discovery Application <install-node-feature-discovery-nfd-starlingx-application-70f6f940bb4a>`.
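
Once the application is applied, the advertised features can be inspected on
any node; a sketch assuming a worker node named ``worker-0``:

.. code-block:: none

   ~(keystone_admin)]$ kubectl get node worker-0 --show-labels | tr ',' '\n' | grep feature.node.kubernetes.io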

****************************************************************************
Partial Disk (Transparent) Encryption Support via Software Encryption (LUKS)
****************************************************************************

A new encrypted filesystem using Linux Unified Key Setup (LUKS) is created
automatically on all hosts to store security-sensitive files. It is mounted
at ``/var/luks/stx/luks_fs``, and the files kept in the
``/var/luks/stx/luks_fs/controller`` directory are replicated between
controllers.

*************************************************************
K8s API/CLI OIDC (Dex) Authentication with Local LDAP Backend
*************************************************************

|prod| offers |LDAP| commands to create and manage |LDAP| Linux groups as
part of a StarlingX local |LDAP| server (serving the local StarlingX cluster
and, in the case of Distributed Cloud, the entire Distributed Cloud system).

StarlingX provides procedures to configure the **oidc-auth-apps** |OIDC|
Identity Provider (Dex) system application to use the StarlingX local |LDAP|
server (in addition to, or in place of, the already supported remote Windows
Active Directory) to authenticate users of the Kubernetes API.

**See**:

- :ref:`Overview of LDAP Servers <overview-of-ldap-servers>`

- :ref:`Create LDAP Linux Groups <create-ldap-linux-groups-4c94045f8ee0>`

- :ref:`Configure Kubernetes Client Access <configure-kubernetes-client-access>`

************************
Create LDAP Linux Groups
************************

|prod| offers |LDAP| commands to create and manage |LDAP| Linux groups as
part of the ``ldapscripts`` library.

*****************************************
StarlingX OpenStack now supports Antelope
*****************************************

stx-openstack has been updated and now deploys OpenStack services based on
the Antelope release.

*******************
Pod Security Policy
*******************

|PSP| ONLY applies if running on Kubernetes v1.24 or earlier. |PSP| is
deprecated as of Kubernetes v1.21 and is removed in Kubernetes v1.25.
Instead of using |PSP|, you can enforce similar restrictions on Pods using
:ref:`Pod Security Admission Controller <pod-security-admission-controller-8e9e6994100f>`.

Since its introduction, |PSP| has had usability problems. The way |PSPs|
are applied to pods has proven confusing, especially when trying to use
them. It is easy to accidentally grant broader permissions than intended,
and difficult to inspect which |PSPs| apply in a certain situation.
Kubernetes offers a built-in |PSA| controller that will replace |PSPs| in
the future.
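
With |PSA|, the restrictions are applied per namespace through labels; a
minimal sketch, assuming an application namespace named ``my-app``:

.. code-block:: none

   # Enforce the "restricted" Pod Security Standard and warn on violations
   ~(keystone_admin)]$ kubectl label namespace my-app \
       pod-security.kubernetes.io/enforce=restricted \
       pod-security.kubernetes.io/warn=restricted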

*************************************************
|WAD| users sudo and local Linux group assignment
*************************************************

StarlingX 9.0 supports and provides procedures for centrally configured
Windows Active Directory (|WAD|) users with sudo access and local Linux
group assignments; i.e. with only |WAD| configuration changes.

**See**:

- :ref:`Create LDAP Linux Accounts <create-ldap-linux-accounts>`

- :ref:`Local LDAP Certificates <local-ldap-certificates-4e1df1e39341>`

- :ref:`SSH User Authentication using Windows Active Directory <sssd-support-5fb6c4b0320b>`

*******************************************
Subcloud Error Root Cause Correction Action
*******************************************

This feature provides a root cause analysis of a subcloud
deployment/upgrade failure. This includes:

- the existing ``deploy_status`` attribute, which reports progress through
  the phases of subcloud deployment and, on error, the phase that failed

- a new ``deploy_error_desc`` attribute, which provides a summary of the
  key deployment/upgrade errors

- additional text appended to the end of the ``deploy_error_desc`` error
  message, with information on:

  - troubleshooting commands

  - the root cause of the errors, and

  - the suggested recovery action

**See**: :ref:`Manage Subclouds Using the CLI <managing-subclouds-using-the-cli>`
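
Both attributes are visible in the subcloud details; a sketch, assuming a
subcloud named ``subcloud1`` (the field values shown are illustrative):

.. code-block:: none

   ~(keystone_admin)]$ dcmanager subcloud show subcloud1
   # The output includes, among other fields:
   #   deploy_status     | install-failed
   #   deploy_error_desc | <summary of key errors, troubleshooting commands,
   #                       root cause, and suggested recovery action>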

************************************
Patch Orchestration Phase Operations
************************************

Distributed cloud patch orchestration has the option to separate the upload
operation from the apply, remove, install, and reboot operations. This
facilitates performing the upload operations outside of the system
maintenance window, reducing the total execution time during the patch
activation that occurs within the maintenance window. With the separation of
operations, systems can be prestaged with the updates prior to applying the
changes to the system.

**See**: :ref:`Distributed Cloud Guide <index-dist-cloud-kub-95bef233eef0>`

****************************************************
Long Latency Between System Controller and Subclouds
****************************************************

The rehoming procedure for a subcloud that has been powered off for a long
period of time differs from the regular rehoming procedure. Depending on how
long the subcloud has been offline, the platform certificates may have
expired and will need to be regenerated.

**See**: :ref:`Rehoming Subcloud with Expired Certificates <rehoming-subcloud-with-expired-certificates-00549c4ea6e2>`

**************
GEO Redundancy
**************

|prod| may be deployed across a geographically distributed set of regions. A
region consists of a local Kubernetes cluster with local redundancy and
access to high-bandwidth, low-latency networking between hosts within that
region.

|prod-long| Distributed Cloud GEO redundancy configuration supports the
ability to recover from a catastrophic event that requires subclouds to be
rehomed away from the failed system controller site to the available site(s)
which have enough spare capacity. This way, even if the failed site cannot
be restored in a short time, the subclouds can still be rehomed to available
peer system controller(s) for centralized management.

In this release, the following items are addressed:

* 1+1 GEO redundancy

  - Active-Active redundancy model
  - Total number of subclouds should not exceed 1K

* Automated operations

  - Synchronization and liveness check between peer systems
  - Alarm generation if a peer system controller is down

* Manual operations

  - Batch rehoming from the alive peer system controller

**See**: :ref:`GEO Redundancy <overview-of-distributed-cloud-geo-redundancy>`

********************************
Redfish Virtual Media Robustness
********************************

Redfish virtual media operations have been observed to fail frequently with
transient errors. While the conditions for those failures are not always
known (network issues, BMC timeouts, etc.), it has been observed that if the
subcloud install operation is retried, the operation is successful.

To alleviate the transient conditions, the robustness of the Redfish virtual
media controller (RVMC) is improved by introducing additional error handling
and retry attempts.

**See**: :ref:`Install a Subcloud Using Redfish Platform Management Service <installing-a-subcloud-using-redfish-platform-management-service>`

.. end-new-features-r9

----------------
Hardware Updates
----------------

**See**:

- :ref:`Kubernetes Verified Commercial Hardware <verified-commercial-hardware>`

----------
Bug status
----------

**********
Fixed bugs
**********

This release provides fixes for a number of defects. Refer to the StarlingX
bug database to review the R9.0 `Fixed Bugs <https://bugs.launchpad.net/starlingx/+bugs?field.searchtext=&orderby=-importance&field.status%3Alist=FIXRELEASED&assignee_option=any&field.assignee=&field.bug_reporter=&field.bug_commenter=&field.subscriber=&field.structural_subscriber=&field.tag=stx.9.0&field.tags_combinator=ANY&field.has_cve.used=&field.omit_dupes.used=&field.omit_dupes=on&field.affects_me.used=&field.has_patch.used=&field.has_branches.used=&field.has_branches=on&field.has_no_branches.used=&field.has_no_branches=on&field.has_blueprints.used=&field.has_blueprints=on&field.has_no_blueprints.used=&field.has_no_blueprints=on&search=Search>`_.

.. All please confirm if any Limitations need to be removed / added for Stx 9.0.

---------------------------------
Known Limitations and Workarounds
---------------------------------

The following are known limitations you may encounter with |prod| Release
9.0 and earlier releases. Workarounds are suggested where applicable.

.. note::

   These limitations are considered temporary and will likely be resolved in
   a future release.

************************************************
Suspend/Resume on VMs with SR-IOV (direct) Ports
************************************************

For VMs with SR-IOV ports created with the ``--vnic-type=direct`` option,
after a Suspend action the Resumed instance might come up with all virtual
NICs created but missing the IP address of the vNIC connected to the SR-IOV
port.

**Workaround**: Manually Power-Off and Power-On (or Hard-Reboot) the
instance and the IP should be assigned correctly again (no information is
lost).

*****************************************
Error on Restoring OpenStack after Backup
*****************************************

The ansible command for restoring the app will fail with |prod-long| Release
9.0, with an error message mentioning the absence of an Armada directory.

**Workaround**: Manually change the backup tarball, adding the Armada
directory using the following steps:

.. code-block:: none

   tar -xzf wr_openstack_backup_file.tgz # this will create an opt directory
   cp -r opt/platform/fluxcd/ opt/platform/armada # copy fluxcd to armada
   tar -czf new_wr-openstack_backup.tgz opt/ # tar the opt directory into a new backup tarball

*****************************************
Subcloud Upgrade with Kubernetes Versions
*****************************************

Subcloud Kubernetes versions are upgraded along with the System Controller.
You can add a new subcloud while the System Controller is on intermediate
versions of Kubernetes as long as the needed Kubernetes images are available
at the configured sources.

**Workaround**: In a Distributed Cloud configuration, when upgrading from
|prod-long| Release 7.0, the Kubernetes version is v1.23.1. The default
Kubernetes version for a new install is v1.24.4. Kubernetes must be upgraded
one version at a time on the System Controller.

.. note::

   New subclouds should not be added until the System Controller has been
   upgraded to Kubernetes v1.24.4.

****************************************************
AIO-SX Restore Fails during puppet-manifest-apply.sh
****************************************************

Restore fails using a backup file created after a fresh install.

**Workaround**: During the restore process, after reinstalling the
controllers, the |OAM| interface must be configured with the same IP address
protocol version used during installation.

**************************************************************************
Subcloud Controller-0 is in a degraded state after upgrade and host unlock
**************************************************************************

During an upgrade orchestration of the subcloud from |prod-long| Release 7.0
to |prod-long| Release 8.0, and after host unlock, the subcloud is in a
``degraded`` state, and alarm 200.004 is raised, displaying
"controller-0 experienced a service-affecting failure. Auto-recovery in
progress".

**Workaround**: You can recover the subcloud to the ``available`` state by
locking and unlocking controller-0.

***********************************************************************
Limitations when using Multiple Driver Versions for the same NIC Family
***********************************************************************

The capability to support multiple NIC driver versions has the following
limitations:

- The Intel NIC family supports only the ice, i40e and iavf drivers.

- Driver versions must respect the compatibility matrix between drivers.

- Multiple driver versions cannot be loaded simultaneously; the selected
  version applies to the entire system.

- The latest driver version will be loaded by default, unless the system is
  specifically configured to use a legacy driver version.

- Drivers used by the installer will always use the latest version,
  therefore firmware compatibility must support basic NIC operations for
  each version to facilitate installation.

- A host reboot is required to activate the configured driver versions.

- For Backup and Restore, the host must be rebooted a second time in order
  to activate the configured driver versions.

**Workaround**: NA

*****************
Quartzville Tools
*****************

The :command:`celo64e` and :command:`nvmupdate64e` commands are not
supported in |prod-long| Release 8.0 due to a known issue in Quartzville
tools that crashes the host.

**Workaround**: Reboot the host using the boot screen menu.

*************************************************
Controller SWACT unavailable after System Restore
*************************************************

After performing a restore of the system, the user is unable to swact the
controller.

**Workaround**: NA

*************************************************************
Intermittent Kubernetes Upgrade failure due to missing Images
*************************************************************

During a Kubernetes upgrade, the upgrade may intermittently fail when you
run :command:`system kube-host-upgrade <host> control-plane` due to the
containerd cache being cleared.

**Workaround**: If the above failure is encountered, run the following
commands on the host encountering the failure:

.. rubric:: |proc|

#. Ensure the failure is due to missing images by running ``crictl images``
   and confirming the following are not present:

   .. code-block:: none

      registry.local:9001/k8s.gcr.io/kube-apiserver:v1.24.4
      registry.local:9001/k8s.gcr.io/kube-controller-manager:v1.24.4
      registry.local:9001/k8s.gcr.io/kube-scheduler:v1.24.4
      registry.local:9001/k8s.gcr.io/kube-proxy:v1.24.4

#. Manually pull the images into the containerd cache by running the
   following commands, replacing ``<admin_password>`` with your password for
   the admin user.

   .. code-block:: none

      ~(keystone_admin)]$ crictl pull --creds admin:<admin_password> registry.local:9001/k8s.gcr.io/kube-apiserver:v1.24.4
      ~(keystone_admin)]$ crictl pull --creds admin:<admin_password> registry.local:9001/k8s.gcr.io/kube-controller-manager:v1.24.4
      ~(keystone_admin)]$ crictl pull --creds admin:<admin_password> registry.local:9001/k8s.gcr.io/kube-scheduler:v1.24.4
      ~(keystone_admin)]$ crictl pull --creds admin:<admin_password> registry.local:9001/k8s.gcr.io/kube-proxy:v1.24.4

#. Ensure the images are present when running ``crictl images``. Rerun the
   :command:`system kube-host-upgrade <host> control-plane` command.

***********************************
Docker Network Bridge Not Supported
***********************************

The Docker network bridge, previously created by default, is removed and no
longer supported in |prod-long| Release 8.0, as the default bridge IP
address collides with addresses already in use.

As a result, docker can no longer be used for running containers. This
impacts building docker images directly on the host.

**Workaround**: Create a Kubernetes pod that has network access, log in to
the container, and build the docker images there.

*************************************
Impact of Kubernetes Upgrade to v1.24
*************************************

In Kubernetes v1.24, support for the ``RemoveSelfLink`` feature gate was
removed. In previous releases of |prod-long| this was set to "false" for
backward compatibility, but this is no longer an option and it is now
hardcoded to "true".

**Workaround**: Any application that relies on this feature gate being
disabled (i.e. assumes the existence of the "self link") must be updated
before upgrading to Kubernetes v1.24.

*******************************************************************
Password Expiry Warning Message is not shown for LDAP user on login
*******************************************************************

In |prod-long| Release 8.0, the password expiry warning message is not shown
for LDAP users on login when the password is nearing expiry. This is due to
the ``pam-sssd`` integration.

**Workaround**: It is highly recommended that LDAP users maintain
independent notifications and update their passwords every 3 months.

The expired password can be reset by a user with root privileges using the
following command:

.. code-block:: none

   ~(keystone_admin)]$ sudo ldapsetpasswd ldap-username
   Password:
   Changing password for user uid=ldap-username,ou=People,dc=cgcs,dc=local
   New Password:
   Retype New Password:
   Successfully set password for user uid=ldap-username,ou=People,dc=cgcs,dc=local

******************************************
Console Session Issues during Installation
******************************************

After bootstrap and before unlocking the controller, if the console session
times out (or the user logs out), ``systemd`` does not work properly;
``fm``, ``sysinv`` and ``mtcAgent`` do not initialize.

**Workaround**: If the console times out or the user logs out between
bootstrap and unlock of controller-0, then, to recover from this issue, you
must re-install the ISO.

************************************************
PTP O-RAN Spec Compliant Timing API Notification
************************************************

.. Need the version for the .tgz tarball....Please confirm if this is applicable to stx 8.0?

- The ptp-notification <minor_version>.tgz application tarball and the
  corresponding notificationservice-base:stx8.0-v2.0.2 image are not
  backwards compatible with applications using the ``v1 ptp-notification``
  API and the corresponding notificationclient-base:stx.8.0-v2.0.2 image.

  Backward compatibility will be provided in StarlingX Release 9.0.

  .. note::

     For |O-RAN| Notification support (v2 API), deploy and use the
     ``ptp-notification-<minor_version>.tgz`` application tarball.
     Instructions for this can be found in the |prod-long| Release 8.0
     documentation.

  **See**:

  - :ref:`install-ptp-notifications`

  - :ref:`integrate-the-application-with-notification-client-sidecar`

- The ``v1 API`` only supports monitoring a single ptp4l + phc2sys instance.

  **Workaround**: Ensure the system is not configured with multiple
  instances when using the v1 API.

- The O-RAN Cloud Notification specification defines a /././sync API v2
  endpoint intended to allow a client to subscribe to all notifications from
  a node. This endpoint is not supported in |prod-long| Release 8.0.

  **Workaround**: A specific subscription for each resource type must be
  created instead.

- ``v1 / v2``

  - v1: Support for monitoring a single ptp4l instance per host - no other
    services can be queried/subscribed to.

  - v2: The API conforms to O-RAN.WG6.O-Cloud Notification API-v02.01
    with the following exceptions, which are not supported in |prod-long|
    Release 8.0:

    - O-RAN SyncE Lock-Status-Extended notifications

    - O-RAN SyncE Clock Quality Change notifications

    - O-RAN Custom cluster names

    - /././sync endpoint

  **Workaround**: See the respective PTP-notification v1 and v2 document
  subsections for further details.

  v1: https://docs.starlingx.io/api-ref/ptp-notification-armada-app/api_ptp_notifications_definition_v1.html

  v2: https://docs.starlingx.io/api-ref/ptp-notification-armada-app/api_ptp_notifications_definition_v2.html

**************************************************************************
Upper case characters in host names cause issues with Kubernetes labelling
**************************************************************************

Upper case characters in host names cause issues with Kubernetes labelling.

**Workaround**: Host names should be lower case.

****************
Debian Bootstrap
****************

On CentOS, bootstrap worked even if **dns_servers** were not present in
localhost.yml. This does not work for Debian bootstrap.

**Workaround**: You need to configure the **dns_servers** parameter in
localhost.yml, as long as no |FQDNs| were used in the bootstrap overrides in
the localhost.yml file for Debian bootstrap.

***********************
Installing a Debian ISO
***********************

The disks and disk partitions need to be wiped before the install.
Installing a Debian ISO may fail with a message that the system is in
emergency mode if the disks and disk partitions are not completely wiped
before the install, especially if the server was previously running a
CentOS ISO.

**Workaround**: When installing a lab for any Debian install, the disks must
first be completely wiped using the following procedure before starting an
install.

Run the following wipedisk commands before any Debian install for each disk
(eg: sda, sdb, etc):

.. code-block:: none

   sudo wipedisk
   # Show the current partition table
   sudo sgdisk -p /dev/sda
   # Clear the partition table
   sudo sgdisk -o /dev/sda

.. note::

   The above commands must be run before any Debian install. The above
   commands must also be run if the same lab is used for CentOS installs
   after the lab was previously running a Debian ISO.

**********************************
Security Audit Logging for K8s API
**********************************

A custom policy file can only be created at bootstrap in
``apiserver_extra_volumes``. If a custom policy file was configured at
bootstrap, then after bootstrap the user has the option to configure the
parameter ``audit-policy-file`` to either this custom policy file
(``/etc/kubernetes/my-audit-policy-file.yml``) or the default policy file
``/etc/kubernetes/default-audit-policy.yaml``. If no custom policy file was
configured at bootstrap, then the user can only configure the parameter
``audit-policy-file`` to the default policy file.

Only the parameter ``audit-policy-file`` is configurable after bootstrap, so
the other parameters (``audit-log-path``, ``audit-log-maxsize``,
``audit-log-maxage`` and ``audit-log-maxbackup``) cannot be changed at
runtime.

**Workaround**: NA

**See**: :ref:`kubernetes-operator-command-logging-663fce5d74e7`.
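
A sketch of pointing the apiserver at the custom policy file after
bootstrap, assuming the ``kube_apiserver`` service-parameter section name
(see the linked guide for the exact syntax):

.. code-block:: none

   ~(keystone_admin)]$ system service-parameter-add kubernetes kube_apiserver audit-policy-file=/etc/kubernetes/my-audit-policy-file.yml
   ~(keystone_admin)]$ system service-parameter-apply kubernetes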

******************************************************************
Installing subcloud with patches in Partial-Apply is not supported
******************************************************************

When a patch has been uploaded and applied, but not installed, it is in a
``Partial-Apply`` state. If a remote subcloud is installed via Redfish
(miniboot) at this point, it will run the patched software. Any patches in
this state will be applied on the subcloud as it is installed. However, this
is not reflected in the output from the :command:`sw-patch query` command on
the subcloud.

**Workaround**: For remote subcloud install operations using the Redfish
protocol, you should avoid installing any subclouds if there are System
Controller patches in the ``Partial-Apply`` state.

******************************************
PTP is not supported on Broadcom 57504 NIC
******************************************

|PTP| is not supported on the Broadcom 57504 NIC.

**Workaround**: None. Do not configure |PTP| instances on the Broadcom 57504
NIC.

*************************************
Metrics Server Update across Upgrades
*************************************

After a platform upgrade, the Metrics Server will NOT be automatically
updated.

**Workaround**: To update the Metrics Server, see
:ref:`Install Metrics Server <kubernetes-admin-tutorials-metrics-server>`.

***********************************************************************************
Horizon Drop-Down lists in Chrome and Firefox causes issues due to the new branding
***********************************************************************************

Drop-down menus in Horizon do not render correctly on Chrome and Firefox due
to the 'select' HTML element.

It is considered a 'replaced element', as it is generated by the browser
and/or operating system. This element has a limited range of customizable
CSS properties.

**Workaround**: The system should be 100% usable even with this limitation.
Changing the browser's and/or operating system's theme could solve display
issues in case they limit the legibility of the elements (i.e. white text on
a white background).

************************************************************************************************
Deploying an App using nginx controller fails with internal error after controller.name override
************************************************************************************************

A Helm override of controller.name to the nginx-ingress-controller app may
result in errors when creating ingress resources later on.

Example of a Helm override:

.. code-block:: none

   cat <<EOF > values.yml
   controller:
     name: notcontroller

   EOF

   ~(keystone_admin)$ system helm-override-update nginx-ingress-controller ingress-nginx kube-system --values values.yml
   +----------------+-----------------------+
   | Property       | Value                 |
   +----------------+-----------------------+
   | name           | ingress-nginx         |
   | namespace      | kube-system           |
   | user_overrides | controller:           |
   |                |   name: notcontroller |
   |                |                       |
   +----------------+-----------------------+

   ~(keystone_admin)$ system application-apply nginx-ingress-controller

**Workaround**: NA

************************************************
Kata Container is not supported on StarlingX 8.0
************************************************

Kata Containers that were supported on CentOS in earlier releases of
|prod-long| are not supported on |prod-long| Release 8.0.

***********************************************
Vault is not supported on StarlingX Release 8.0
***********************************************

The Vault application is not supported on |prod-long| Release 8.0.

**Workaround**: NA

***************************************************
Portieris is not supported on StarlingX Release 8.0
***************************************************

The Portieris application is not supported on |prod-long| Release 8.0.

**Workaround**: NA

*****************************
DCManager Patch Orchestration
*****************************

.. warning::

   Patches must be applied or removed on the System Controller prior to
   using the :command:`dcmanager patch-strategy` command to propagate
   changes to the subclouds.

****************************************
Optimization with a Large number of OSDs
****************************************

As Storage nodes are not optimized, you may need to optimize your Ceph
configuration for balanced operation across deployments with a high number
of |OSDs|. This results in an alarm being generated even if the installation
succeeds.

800.001 - Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s'

**Workaround**: To optimize your storage nodes with a large number of
|OSDs|, it is recommended to use the following commands:

.. code-block:: none

   $ ceph osd pool set kube-rbd pg_num 256
   $ ceph osd pool set kube-rbd pgp_num 256

******************************************************************
PTP tx_timestamp_timeout causes ptp4l port to transition to FAULTY
******************************************************************

NICs using the Intel ice NIC driver may report the following in the
``ptp4l`` logs, which might coincide with a |PTP| port switching to
``FAULTY`` before re-initializing.

.. code-block:: none

   ptp4l[80330.489]: timed out while polling for tx timestamp
   ptp4l[80330.489]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug

This is due to a limitation of the Intel ice driver.

**Workaround**: The recommended workaround is to set the
``tx_timestamp_timeout`` parameter to 700 (ms) in the ``ptp4l`` config using
the following command.

.. code-block:: none

   ~(keystone_admin)]$ system ptp-instance-parameter-add ptp-inst1 tx_timestamp_timeout=700

***************
BPF is disabled
***************

|BPF| cannot be used in the PREEMPT_RT/low latency kernel, due to the
inherent incompatibility between PREEMPT_RT and |BPF|; see
https://lwn.net/Articles/802884/.

Some packages might be affected when PREEMPT_RT and |BPF| are used together.
This includes, but is not limited to, the following packages:

- libpcap
- libnet
- dnsmasq
- qemu
- nmap-ncat
- libv4l
- elfutils
- iptables
- tcpdump
- iproute
- gdb
- valgrind
- kubernetes
- cni
- strace
- mariadb
- libvirt
- dpdk
- libteam
- libseccomp
- binutils
- libbpf
- dhcp
- lldpd
- containernetworking-plugins
- golang
- i40e
- ice

**Workaround**: It is recommended not to use |BPF| with the real-time
kernel. If required, it can still be used, for example, for debugging only.

*****************
crashkernel Value
*****************

**crashkernel=auto** is no longer supported by newer kernels, and hence the
v5.10 kernel does not support the "auto" value.

**Workaround**: |prod-long| uses **crashkernel=2048m** instead of
**crashkernel=auto**.

.. note::

   |prod-long| Release 8.0 has increased the amount of reserved memory for
   the crash/kdump kernel from 512 MiB to 2048 MiB.

***********************
Control Group parameter
***********************

The control group (cgroup) parameter **kmem.limit_in_bytes** has been
deprecated, and results in the following message in the kernel's log buffer
(dmesg) during boot-up and/or during the Ansible bootstrap procedure:
"kmem.limit_in_bytes is deprecated and will be removed. Please report your
use case to linux-mm@kvack.org if you depend on this functionality." This
parameter is used by a number of software packages in |prod-long|,
including, but not limited to, **systemd**, **docker**, **containerd**,
**libvirt**, etc.

**Workaround**: NA. This is only a warning message about the future
deprecation of an interface.

****************************************************
Kubernetes Taint on Controllers for Standard Systems
****************************************************

In Standard systems, a Kubernetes taint is applied to controller nodes in
order to prevent application pods from being scheduled on those nodes, since
controllers in Standard systems are intended ONLY for platform services. If
application pods MUST run on controllers, a Kubernetes toleration of the
taint can be specified in the application's pod specifications.

**Workaround**: Customer applications that need to run on controllers on
Standard systems will need to be enabled/configured for Kubernetes
toleration in order to ensure the applications continue working after an
upgrade from |prod-long| Release 6.0 to future |prod-long| releases. It is
suggested to add the Kubernetes toleration to your application prior to
upgrading to |prod-long| Release 8.0.

You can specify toleration for a pod through the pod specification
(PodSpec). For example:

.. code-block:: none

   spec:
     ....
     template:
       ....
       spec:
         tolerations:
           - key: "node-role.kubernetes.io/control-plane"
             operator: "Exists"
             effect: "NoSchedule"

**See**: `Taints and Tolerations <https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/>`__.

********************************************************
New Kubernetes Taint on Controllers for Standard Systems
********************************************************

A new Kubernetes taint is applied to controllers for Standard systems in
order to prevent application pods from being scheduled on controllers, since
controllers in Standard systems are intended ONLY for platform services. If
application pods MUST run on controllers, a Kubernetes toleration of the
taint can be specified in the application's pod specifications. You will
also need to change the nodeSelector / nodeAffinity to use the new label.

**Workaround**: Customer applications that need to run on controllers on
Standard systems will need to be enabled/configured for Kubernetes
toleration in order to ensure the applications continue working after an
upgrade to |prod-long| Release 8.0 and future |prod-long| releases.

You can specify toleration for a pod through the pod specification
(PodSpec). For example:

.. code-block:: none

   spec:
     ....
     template:
       ....
       spec:
         tolerations:
           - key: "node-role.kubernetes.io/control-plane"
             operator: "Exists"
             effect: "NoSchedule"

**See**: `Taints and Tolerations <https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/>`__.

**************************************************************
Ceph alarm 800.001 interrupts the AIO-DX upgrade orchestration
**************************************************************

Upgrade orchestration fails on |AIO-DX| systems that have Ceph enabled.

**Workaround**: Clear the Ceph alarm 800.001 by manually upgrading both
controllers and using the following command:

.. code-block:: none

   ~(keystone_admin)]$ ceph mon enable-msgr2

Ceph alarm 800.001 is cleared.

***************************************************************
Storage Nodes are not considered part of the Kubernetes cluster
***************************************************************

When running the :command:`system kube-host-upgrade-list` command, the
output only displays controller and worker hosts that have control-plane and
kubelet components. Storage nodes do not have any of those components and so
are not considered a part of the Kubernetes cluster.

**Workaround**: Do not include Storage nodes.

***************************************************************************************
Backup and Restore of ACC100 (Mount Bryce) configuration requires double unlock attempt
***************************************************************************************

After restoring from a previous backup with an Intel ACC100 processing
accelerator device, the first unlock attempt will be refused since this
specific kind of device will be updated in the same context.

**Workaround**: A second attempt after a few minutes will succeed and unlock
the host.

**************************************
Application Pods with SRIOV Interfaces
**************************************

Application Pods with |SRIOV| interfaces require a
**restart-on-reboot: "true"** label in their pod spec template.

Pods with |SRIOV| interfaces may fail to start after a platform restore or
Simplex upgrade and persist in the **Container Creating** state due to
missing PCI address information in the |CNI| configuration.

**Workaround**: Application pods that require |SRIOV| should add the label
**restart-on-reboot: "true"** to their pod spec template metadata. All pods
with this label will be deleted and recreated after system initialization,
therefore all pods must be restartable and managed by a Kubernetes
controller \(i.e. DaemonSet, Deployment or StatefulSet) for auto recovery.

Pod Spec template example:

.. code-block:: none

   template:
     metadata:
       labels:
         tier: node
         app: sriovdp
         restart-on-reboot: "true"

***********************
Management VLAN Failure
***********************

If the Management VLAN fails on the active System Controller, communication
failure 400.005 is detected, and alarm 280.001 is raised indicating
subclouds are offline.

**Workaround**: The System Controller will recover and subclouds will be
manageable when the Management VLAN is restored.

********************************
Host Unlock During Orchestration
********************************

If a host unlock during orchestration takes longer than 30 minutes to
complete, a second reboot may occur. Due to the delay, the VIM tries to
abort; the abort operation triggers the second reboot.

**Workaround**: NA

**************************************
Storage Nodes Recovery on Power Outage
**************************************

Storage nodes take 10-15 minutes longer to recover in the event of a full
power outage.

**Workaround**: NA

*************************************
Ceph OSD Recovery on an AIO-DX System
*************************************

In certain instances a Ceph OSD may not recover on an |AIO-DX| system
\(for example, if an OSD comes up after a controller reboot and a swact
occurs), and remains in the down state when viewed using the
:command:`ceph -s` command.

**Workaround**: Manual recovery of the OSD may be required.

********************************************************
Using Helm with Container-Backed Remote CLIs and Clients
********************************************************

If **Helm** is used within Container-backed Remote CLIs and Clients:

- You will NOT see any helm installs from |prod| Platform's system
  FluxCD applications.

  **Workaround**: Do not directly use **Helm** to manage |prod| Platform's
  system FluxCD applications. Manage these applications using
  :command:`system application` commands.

- You will NOT see any helm installs from end user applications installed
  using **Helm** on the controller's local CLI.

  **Workaround**: It is recommended that you manage your **Helm**
  applications only remotely; the controller's local CLI should only be used
  for management of the |prod| Platform infrastructure.

*********************************************************************
Remote CLI Containers Limitation for StarlingX Platform HTTPS Systems
*********************************************************************

The python2 SSL library has limitations with reference to how certificates
are validated. If you are using Remote CLI containers, due to a limitation
in the python2 SSL certificate validation, the certificate used for the
'ssl' certificate should have either:

#. CN=IPADDRESS and SAN=empty or,

#. CN=FQDN and SAN=FQDN

**Workaround**: Use CN=FQDN and SAN=FQDN, as CN is a deprecated field in the
certificate.

*******************************************************************
Cert-manager does not work with uppercase letters in IPv6 addresses
*******************************************************************

Cert-manager does not work with uppercase letters in IPv6 addresses.

**Workaround**: Replace the uppercase letters in IPv6 addresses with
lowercase letters.

.. code-block:: none

   apiVersion: cert-manager.io/v1
   kind: Certificate
   metadata:
     name: oidc-auth-apps-certificate
     namespace: test
   spec:
     secretName: oidc-auth-apps-certificate
     dnsNames:
     - ahost.com
     ipAddresses:
     - fe80::903a:1c1a:e802:11e4
     issuerRef:
       name: cloudplatform-interca-issuer
       kind: Issuer

*******************************
Kubernetes Root CA Certificates
*******************************

Kubernetes does not properly support **k8s_root_ca_cert** and
**k8s_root_ca_key** being an Intermediate CA.

**Workaround**: Accept the internally generated **k8s_root_ca_cert/key** or
customize only with a Root CA certificate and key.

************************
Windows Active Directory
************************

- **Limitation**: The Kubernetes API does not support uppercase IPv6
  addresses.

  **Workaround**: The issuer_url IPv6 address must be specified as
  lowercase.

- **Limitation**: The refresh token does not work.

  **Workaround**: If the token expires, manually replace the ID token. For
  more information, see :ref:`Configure Kubernetes Client Access <configure-kubernetes-client-access>`.

- **Limitation**: TLS error logs are reported in the **oidc-dex** container
  on subclouds. These logs should not have any system impact.

  **Workaround**: NA

- **Limitation**: **stx-oidc-client** liveness probe sometimes reports
  failures. These errors may not have system impact.

  **Workaround**: NA

.. Stx LP Bug: https://bugs.launchpad.net/starlingx/+bug/1846418

************
BMC Password
************

The BMC password cannot be updated.

**Workaround**: In order to update the BMC password, de-provision the BMC,
and then re-provision it with the new password.

****************************************
Application Fails After Host Lock/Unlock
****************************************

In some situations, an application may fail to apply after host lock/unlock
due to previously evicted pods.

**Workaround**: Use the :command:`kubectl delete` command to delete the
evicted pods and reapply the application.
|
|
|
|
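
For example, evicted pods can be found and removed as follows; the pod,
namespace, and application names are placeholders:

.. code-block:: none

   # Find pods left in the Failed (evicted) state
   kubectl get pods --all-namespaces --field-selector=status.phase=Failed

   # Delete an evicted pod, then reapply the application
   kubectl delete pod <pod-name> -n <namespace>
   system application-apply <application-name>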

***************************************
Application Apply Failure if Host Reset
***************************************

If an application apply is in progress and a host is reset, the apply will
likely fail.

**Workaround**: Once the host recovers and the system is stable, re-apply
the application.

********************************
Pod Recovery after a Host Reboot
********************************

On occasion, some pods may remain in an unknown state after a host is
rebooted.

**Workaround**: To recover these pods, delete them so that their controllers
recreate them. Also, based on `https://github.com/kubernetes/kubernetes/issues/68211 <https://github.com/kubernetes/kubernetes/issues/68211>`__,
it is recommended that applications avoid using a subPath volume
configuration.
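
A sketch of locating and force-deleting such pods; namespaces and pod names
vary by deployment:

.. code-block:: none

   # Identify pods stuck in the Unknown state
   kubectl get pods --all-namespaces | grep -i unknown

   # Force-delete a stuck pod so its controller can recreate it
   kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0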

****************************
Rare Node Not Ready Scenario
****************************

In rare cases, an instantaneous loss of communication with the active
**kube-apiserver** may result in Kubernetes reporting node(s) as stuck in
the "Not Ready" state after communication has recovered and the node is
otherwise healthy.

**Workaround**: A restart of the **kubelet** process on the affected node(s)
will resolve the issue.
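
For example, assuming **kubelet** runs as a systemd service on the node:

.. code-block:: none

   # On the affected node, restart kubelet
   sudo systemctl restart kubelet

   # Confirm the node returns to the Ready state
   kubectl get nodes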

*************************
Platform CPU Usage Alarms
*************************

Alarms may occur indicating platform CPU usage is >90% if a large number of
pods are configured with liveness probes that run every second.

**Workaround**: To mitigate, either reduce the frequency of the liveness
probes or increase the number of platform cores.
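
For example, a hypothetical container spec probing every 10 seconds instead
of every second; the endpoint and port are placeholders:

.. code-block:: none

   livenessProbe:
     httpGet:
       path: /healthz
       port: 8080
     # Probe every 10 seconds rather than every second to reduce
     # platform CPU overhead
     periodSeconds: 10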

*******************
Pods Using isolcpus
*******************

The isolcpus feature currently does not support allocation of thread siblings
for CPU requests (i.e., a physical thread plus its HT sibling).

**Workaround**: NA

*****************************
system host-disk-wipe command
*****************************

The :command:`system host-disk-wipe` command is not supported in this
release.

**Workaround**: NA

*************************************************************
Restrictions on the Size of Persistent Volume Claims (PVCs)
*************************************************************

There is a limitation on the minimum size of Persistent Volume Claims (PVCs)
that can be used on all StarlingX Platform releases.

**Workaround**: It is recommended that all PVCs have a minimum size of
1GB. For more information, see `https://bugs.launchpad.net/starlingx/+bug/1814595 <https://bugs.launchpad.net/starlingx/+bug/1814595>`__.
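
An illustrative PVC manifest requesting the recommended minimum size; the
claim name and access mode are placeholders:

.. code-block:: none

   apiVersion: v1
   kind: PersistentVolumeClaim
   metadata:
     name: example-pvc
   spec:
     accessModes:
     - ReadWriteOnce
     resources:
       requests:
         # Request at least 1Gi, per the recommendation above
         storage: 1Gi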

***************************************************************
Sub-NUMA Cluster Configuration not Supported on Skylake Servers
***************************************************************

Sub-NUMA cluster configuration is not supported on Skylake servers.

**Workaround**: For servers with Skylake Gold or Platinum CPUs, sub-NUMA
clustering must be disabled in the BIOS.

*****************************************************************
The ptp-notification-demo App is Not a System-Managed Application
*****************************************************************

The ptp-notification-demo app is provided for demonstration purposes only.
Therefore, it is not supported by typical platform operations such as Backup
and Restore.

**Workaround**: NA

*************************************************************************
Deleting image tags in registry.local may delete tags under the same name
*************************************************************************

When deleting image tags in the registry.local docker registry, be aware
that deleting an **<image-name:tag-name>** will delete all tags under the
specified <image-name> that have the same 'digest' as the specified
<image-name:tag-name>. For more information, see :ref:`Delete Image Tags in the Docker Registry <delete-image-tags-in-the-docker-registry-8e2e91d42294>`.

**Workaround**: NA
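
For example, the tags stored under an image name can be checked before
deleting one of them; the image and tag names below are placeholders:

.. code-block:: none

   # List the tags stored for an image in registry.local
   system registry-image-tags <image-name>

   # Deleting one tag also removes any other tags that share its digest
   system registry-image-delete <image-name>:<tag-name>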

------------------
Deprecated Notices
------------------

.. All please confirm if all these have been removed from the StarlingX 9.0 Release?

****************************
Airship Armada is deprecated
****************************

.. note::

   Airship Armada is removed in stx.9.0 and replaced with FluxCD. All
   Armada-based applications have to be removed before you perform an
   upgrade from |prod-long| Release 9.0 to |prod-long| Release 10.0.

.. note::

   Some application repositories may still have "armada" in the file path but
   are now supported by FluxCD. See
   https://opendev.org/starlingx/?sort=recentupdate&language=&q=armada.

StarlingX Release 7.0 introduced FluxCD-based applications that utilize the
FluxCD Helm/source controller pods deployed in the flux-helm Kubernetes
namespace. Airship Armada support was deprecated at that point. The Armada
pod continued to be deployed for use with existing Armada-based applications
and was removed in StarlingX Release 8.0, once the stx-openstack Armada
application was fully migrated to FluxCD.

************************************
Cert-manager API Version deprecation
************************************

The upgrade of cert-manager from 0.15.0 to 1.7.1 deprecated support for the
cert-manager API versions cert-manager.io/v1alpha2 and
cert-manager.io/v1alpha3. When creating cert-manager |CRDs| (certificates,
issuers, etc.) with |prod-long| Release 8.0, use cert-manager.io/v1.

***************
Kubernetes APIs
***************

Kubernetes APIs that are removed in Kubernetes 1.25 are listed in the
deprecation guide.

**See**: https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-25

--------------------------------------
Release Information for other versions
--------------------------------------

You can find details about a release on the specific release page.

.. list-table::
   :header-rows: 1

   * - Version
     - Release Date
     - Notes
     - Status
   * - StarlingX R8.0
     - 2023-02
     - https://docs.starlingx.io/r/stx.8.0/releasenotes/index.html
     - Maintained
   * - StarlingX R7.0
     - 2022-07
     - https://docs.starlingx.io/r/stx.7.0/releasenotes/index.html
     - Maintained
   * - StarlingX R6.0
     - 2021-12
     - https://docs.starlingx.io/r/stx.6.0/releasenotes/index.html
     - Maintained
   * - StarlingX R5.0.1
     - 2021-09
     - https://docs.starlingx.io/r/stx.5.0/releasenotes/index.html
     - :abbr:`EOL (End of Life)`
   * - StarlingX R5.0
     - 2021-05
     - https://docs.starlingx.io/r/stx.5.0/releasenotes/index.html
     - :abbr:`EOL (End of Life)`
   * - StarlingX R4.0
     - 2020-08
     -
     - :abbr:`EOL (End of Life)`
   * - StarlingX R3.0
     - 2019-12
     -
     - :abbr:`EOL (End of Life)`
   * - StarlingX R2.0.1
     - 2019-10
     -
     - :abbr:`EOL (End of Life)`
   * - StarlingX R2.0
     - 2019-09
     -
     - :abbr:`EOL (End of Life)`
   * - StarlingX R1.0
     - 2018-10
     -
     - :abbr:`EOL (End of Life)`

StarlingX follows the release maintenance timelines in the `StarlingX Release
Plan <https://wiki.openstack.org/wiki/StarlingX/Release_Plan#Release_Maintenance>`_.

The Status column uses `OpenStack maintenance phase <https://docs.openstack.org/
project-team-guide/stable-branches.html#maintenance-phases>`_ definitions.