From 52b70f81c23f2404ba27e58879a66fad6300bad4 Mon Sep 17 00:00:00 2001
From: Ron Stone
Date: Fri, 22 Oct 2021 12:12:38 -0400
Subject: [PATCH] Alarm Expiring or Expired Certificates

Added topic on new expiring/expired cert alarms.
Added 2x alarms to 500 series alarms messages page.
NB. Details need to be confirmed.
Minor update for clarity around use of kubernetes edit ...
Added sample fm output
Updates to alarm definitions based on events.yaml
Incorporated (Word) updates from Greg W.
Patchset 4 review updates.
Patchset 5 review updates.
Fixed merge conflict in sec/kub/index
Patchset 7 review updates.
Patchset 8 review update (note about cert expiry check frequency)

Story: 2008946
Task: 43568

Signed-off-by: Ron Stone
Change-Id: Ifeeba7484e49abcaf2d1ad2afc9afc876d479ded
---
 .../kubernetes/500-series-alarm-messages.rst  |  77 ++++++++-
 doc/source/fault-mgmt/kubernetes/index.rst    |   1 +
 ...-and-expired-certificates-baf5b8f73009.rst | 150 ++++++++++++++++++
 doc/source/security/kubernetes/index.rst      |   1 +
 4 files changed, 227 insertions(+), 2 deletions(-)
 create mode 100644 doc/source/security/kubernetes/alarm-expiring-soon-and-expired-certificates-baf5b8f73009.rst

diff --git a/doc/source/fault-mgmt/kubernetes/500-series-alarm-messages.rst b/doc/source/fault-mgmt/kubernetes/500-series-alarm-messages.rst
index caf014ba9..105957d70 100644
--- a/doc/source/fault-mgmt/kubernetes/500-series-alarm-messages.rst
+++ b/doc/source/fault-mgmt/kubernetes/500-series-alarm-messages.rst
@@ -19,7 +19,7 @@ health of the system.
    :header-rows: 0
 
    * - **Alarm ID: 500.100**
-     - TPM initialization failed on host.
+     - |TPM| initialization failed on host.
    * - Entity Instance
      - tenant=
    * - Degrade Affecting Severity:
@@ -46,4 +46,77 @@ health of the system.
      - C
    * - Proposed Repair Action
      - Reinstall system to disable developer certificate and remove untrusted
-       patches.
\ No newline at end of file
+       patches.
+
+-----
+
+.. list-table::
+   :widths: 6 25
+   :header-rows: 0
+
+   * - **Alarm ID: 500.200**
+     - Certificate 'system certificate-show ' (mode=) expiring soon on .
+       OR
+       Certificate '/' expiring soon on .
+       OR
+       Certificate '' expiring soon on .
+       system.certificate.k8sRootCA
+   * - Entity Instance
+     - system.certificate.mode=.uuid=
+       OR
+       namespace=.certificate=
+       OR
+       namespace=.secret=
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - M
+   * - Proposed Repair Action
+     - Renew certificate for the entity identified.
+   * - Alarm_Type:
+     - operational-violation
+   * - Probable_Cause:
+     - certificate-expiration
+   * - Service_Affecting:
+     - False
+   * - Suppression:
+     - False
+   * - Management_Affecting_Severity:
+     - none
+
+-----
+
+.. list-table::
+   :widths: 6 25
+   :header-rows: 0
+
+   * - **Alarm ID: 500.210**
+     - Certificate 'system certificate-show ' (mode=) expired.
+       OR
+       Certificate '/' expired.
+       OR
+       Certificate '' expired.
+   * - Entity Instance
+     - system.certificate.mode=.uuid=
+       OR
+       namespace=.certificate=
+       OR
+       namespace=.secret=
+       OR
+       system.certificate.k8sRootCA
+   * - Degrade Affecting Severity:
+     - None
+   * - Severity:
+     - C
+   * - Proposed Repair Action
+     - Renew certificate for the entity identified.
+   * - Alarm_Type:
+     - operational-violation
+   * - Probable_Cause:
+     - certificate-expiration
+   * - Service_Affecting:
+     - False
+   * - Suppression:
+     - False
+   * - Management_Affecting_Severity:
+     - none
\ No newline at end of file
diff --git a/doc/source/fault-mgmt/kubernetes/index.rst b/doc/source/fault-mgmt/kubernetes/index.rst
index 4142cc373..767f71f65 100644
--- a/doc/source/fault-mgmt/kubernetes/index.rst
+++ b/doc/source/fault-mgmt/kubernetes/index.rst
@@ -89,6 +89,7 @@ SNMP
    setting-snmp-identifying-information
    uninstalling-snmp
 
+
 ******************************
 Troubleshooting log collection
 ******************************
diff --git a/doc/source/security/kubernetes/alarm-expiring-soon-and-expired-certificates-baf5b8f73009.rst b/doc/source/security/kubernetes/alarm-expiring-soon-and-expired-certificates-baf5b8f73009.rst
new file mode 100644
index 000000000..22863c04b
--- /dev/null
+++ b/doc/source/security/kubernetes/alarm-expiring-soon-and-expired-certificates-baf5b8f73009.rst
@@ -0,0 +1,150 @@
+.. _alarm-expiring-soon-and-expired-certificates-baf5b8f73009:
+
+============================================
+Expiring-Soon and Expired Certificate Alarms
+============================================
+
+Expired certificates may prevent the proper operation of the platform and
+applications running on the platform. To avoid expired certificates,
+|prod| generates alarms for certificates that are within 30 days (default) of
+expiry or have already expired.
+
+.. contents:: |minitoc|
+   :local:
+   :depth: 1
+
+This functionality is enabled by default for all platform and user-installed
+certificates that are approaching their respective expiry dates. User-override
+options are available for customizing the alarm behavior.
+
+The two types of certificate alarms are:
+
+* ``Expiring Soon`` (alarm ID: 500.200, severity: major); by default raised 30
+  days prior to expiry of the certificate.
+* ``Expired`` (alarm ID: 500.210, severity: critical).
+
+.. note::
+    Certificates are checked every 24 hours to determine whether an
+    Expiring-Soon or Expired alarm must be raised. As a result, alarms may not
+    occur at precise 24-hour multiples of the times they were set.
+
+.. code-block:: none
+
+    [sysadmin@controller-0 ~(keystone_admin)]$ fm alarm-list
+    +----------+------------------------------------------------------------------------------------------+--------------------------------------+----------+------------------+
+    | Alarm ID | Reason Text                                                                              | Entity ID                            | Severity | Time Stamp       |
+    +----------+------------------------------------------------------------------------------------------+--------------------------------------+----------+------------------+
+    | 500.200  | Certificate 'system certificate-show 89b332d9-d590-4447-bf5a-6edc61c2d0e4' (mode=ssl_ca) | system.certificate.mode=ssl_ca.uuid= | major    | 2021-10-08T15:34 |
+    |          | is expiring soon on 2021-10-15, 00:00:00                                                 | 89b332d9-d590-4447-bf5a-6edc61c2d0e4 |          | :49.451107       |
+    |          |                                                                                          |                                      |          |                  |
+    | 400.001  | Service group controller-services degraded; cert-alarm(enabled-active, )                 | service_domain=controller.           | major    | 2021-10-08T15:34 |
+    |          |                                                                                          | service_group=controller-services.   |          | :27.494473       |
+    |          |                                                                                          | host=controller-0                    |          |                  |
+    |          |                                                                                          |                                      |          |                  |
+    | 100.103  | Memory threshold exceeded ; threshold 80.00%, actual 81.12%                              | host=controller-0.memory=platform    | major    | 2021-10-08T00:21 |
+    |          |                                                                                          |                                      |          | :25.237489       |
+    |          |                                                                                          |                                      |          |                  |
+    +----------+------------------------------------------------------------------------------------------+--------------------------------------+----------+------------------+
+
+The platform monitors the following resources to track and audit certificate
+expiry dates:
+
+* All |TLS| type secrets in all Kubernetes namespaces.
+
+  This includes secrets that you create directly or secrets that are indirectly
+  created by configuring a Cert-Manager certificate.
+
+* All certificates installed on the platform via the :command:`system
+  certificate-install` command.
+
+* Other internal certificates required by the platform, such as Kubernetes
+  RootCA and Etcd RootCA.
+
+  .. note::
+
+     For certificates managed by cert-manager, the Expiring-Soon alarm is not
+     generated until the certificate's ``renewBefore`` date has passed. In this
+     way, alarms for certificates auto-renewed by cert-manager only occur if
+     the renewal failed.
+
+Overriding Default Certificate Alarming Behavior
+================================================
+
+For certificates that exist under the Kubernetes domain, Kubernetes
+annotations can be used to override the default certificate alarming
+behavior. All other certificate types only support the default alarming
+behavior, which cannot be overridden.
+
+.. note::
+
+    If you added a certificate by directly creating a Kubernetes |TLS| Secret,
+    the annotation should be added to that Kubernetes Secret resource. If the
+    Secret was indirectly created by configuring a Cert-Manager certificate
+    resource, the annotation should be added to the certificate resource.
+
+The supported annotations are:
+
+* ``starlingx.io/alarm: `` (default=enabled)
+
+* ``starlingx.io/alarm-before: `` (default=30d)
+
+* ``starlingx.io/alarm-severity: ``
+
+* ``starlingx.io/alarm-text: ``
+
+
+.. rubric:: |eg|
+
+If a ``system-restapi-gui-certificate`` resource has been configured so that
+the StarlingX REST API / web server certificate is managed by Cert-Manager,
+its default annotations can be edited:
+
+#. Open the current configuration:
+
+   .. code-block:: none
+
+      $ kubectl edit certificate system-restapi-gui-certificate -n deployment
+
+#. Make the following configuration changes:
+
+   .. code-block:: none
+
+      metadata:
+
+        annotations:
+
+          starlingx.io/alarm: enabled
+
+          starlingx.io/alarm-before: 15d
+
+          starlingx.io/alarm-severity: minor
+
+          starlingx.io/alarm-text: "webserverAPI certificate"
+
+These override settings cause the ``system-restapi-gui-certificate`` resource
+to be monitored via the ``alarm: enabled`` annotation. An alarm with minor
+severity is raised when the certificate is within 15 days of expiry or has
+already expired. The alarm text is prefixed with the
+string ``webserverAPI certificate``, resulting in ``webserverAPI certificate
+namespace=deployment.certificate=system-restapi-gui-certificate is expiring
+soon on ``.
+
+Corrective Action
+=================
+
+When a certificate alarm occurs, the resource should be updated in order to
+clear the alarm. If the certificate was installed via the :command:`system
+certificate-install` command, a new certificate needs to be obtained and
+re-installed. Certificates that are managed by Cert-Manager
+auto-renew provided there are no configuration errors; to list
+issues with cert-manager auto-renewal of a certificate, use :command:`kubectl
+-n describe certificate `.
+
+.. note::
+
+    It may take up to one hour for an active alarm to clear after corrective
+    action has been taken.
+
+.. seealso::
+
+    :ref:`500-series-alarm-messages`
diff --git a/doc/source/security/kubernetes/index.rst b/doc/source/security/kubernetes/index.rst
index ec53180b0..dc268cae3 100644
--- a/doc/source/security/kubernetes/index.rst
+++ b/doc/source/security/kubernetes/index.rst
@@ -112,6 +112,7 @@ HTTPS Certificate Management
    dc-admin-endpoint-certificates-8fe7adf3f932
    add-a-trusted-ca
    one-single-root-ca-multiple-server-client-certificates-0692df6ce16d
+   alarm-expiring-soon-and-expired-certificates-baf5b8f73009
 
 ************
 Cert Manager
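
The alarm rules described in the new topic (a default 30-day Expiring-Soon window, an ``alarm-before`` override, and suppression for cert-manager certificates until the ``renewBefore`` date has passed) can be summarized in a small sketch. This is an illustration only and is not taken from the platform's cert-alarm service: the function names ``parse_days`` and ``alarm_for`` are hypothetical, the ``<N>d`` duration format is assumed from the ``30d`` and ``15d`` values shown in the topic, and ``renewBefore`` is assumed to be a duration counted back from the expiry date.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_days(value: str) -> timedelta:
    """Parse an alarm-before value such as '30d' (assumed format: <N>d)."""
    if not value.endswith("d") or not value[:-1].isdigit():
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(days=int(value[:-1]))


def alarm_for(now: datetime,
              not_after: datetime,
              alarm_before: str = "30d",
              renew_before: Optional[timedelta] = None) -> Optional[str]:
    """Return the alarm ID that applies at `now`, or None if no alarm applies.

    renew_before is only set for certificates managed by cert-manager
    (assumption: a duration measured back from the expiry date).
    """
    if now >= not_after:
        return "500.210"  # Expired
    if now < not_after - parse_days(alarm_before):
        return None       # still outside the Expiring-Soon window
    if renew_before is not None and now < not_after - renew_before:
        return None       # auto-renewal has not been attempted yet
    return "500.200"      # Expiring Soon


expiry = datetime(2021, 10, 15)
print(alarm_for(datetime(2021, 10, 8), expiry))                       # default 30d window
print(alarm_for(datetime(2021, 9, 25), expiry, alarm_before="15d"))   # override to 15d
```

With the dates from the sample ``fm alarm-list`` output (checked on 2021-10-08, expiry on 2021-10-15), the first call falls inside the default window; the second call is 20 days before expiry, which is outside a 15-day override window.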