Eric MacDonald dab9c4774b Maintenance does not auto-start worker host services in AIO
The mtcClient is required to 'start host services' autonomously
following a node reboot. This handles the use case where
the administrator disables maintenance heartbeat loss auto recovery.
If that node then reboots on its own, for whatever reason, maintenance
needs to ensure that it auto starts 'host services'.

A fairly recent update delivered support for that use case:

    https://opendev.org/starlingx/metal/commit/
    1335bc484df331771e995ae822df3af84cc5739d

However, the mechanism the mtcClient uses to manage auto-starting
host services does not handle the worker subfunction case.
Moreover, the current implementation does not handle the potential
concurrency between the mtcClient process startup case and mtcAgent
requests during unlock recovery.

This update also fixes an issue where the mtcClient sometimes gets
into a mode where it floods the mtcAgent with start host services
result messages, at 20 unnecessary messages per second. The
aforementioned update modified the mtcAgent to log receipt of this
message, so the flood also fills the mtcAgent log, leading to
unnecessary message handling and log rotations.
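
The two problems described above (interleaved start requests and
repeated result messages) can be illustrated with a minimal sketch.
This is not the mtcClient code: the type HostServicesCtl, its members
and the re-run-when-done behaviour are all hypothetical and chosen
only to show one way a single guarded state could serve both the
startup auto-start path and mtcAgent requests while sending each
result once.

```cpp
// Hypothetical sketch; none of these names come from starlingx/metal.
enum class Stage { Idle, Running, Done };

struct HostServicesCtl {
    Stage stage = Stage::Idle;
    bool result_sent = false;  // true once the current result was reported
    int runs_started = 0;      // counts script launches (illustration only)
    int results_sent = 0;      // counts result messages (illustration only)

    // Shared entry point for the startup auto-start path and for a
    // start request from the agent. Returns true only when this call
    // actually launches a run; a request that arrives while a run is
    // in progress is absorbed instead of launching a second one.
    bool request_start() {
        if (stage == Stage::Running) {
            return false;
        }
        stage = Stage::Running;
        result_sent = false;
        ++runs_started;
        return true;
    }

    // Called when the start scripts finish.
    void on_complete() {
        if (stage == Stage::Running) {
            stage = Stage::Done;
        }
    }

    // Called on every pass of the main loop. The result is reported
    // once per completed run, not once per loop iteration.
    void service_loop() {
        if (stage == Stage::Done && !result_sent) {
            ++results_sent;  // stands in for sending the result message
            result_sent = true;
        }
    }
};
```

In this sketch, an auto-start followed by an overlapping agent request
yields one run, and any number of service_loop() passes after
completion yields one result message. A later request after completion
starts a fresh run, which is an assumption about how periodic start
messages might be treated.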

Test Plan:

Success Path:

PASS: Verify mtcClient success path handling of start and stop host
      services function for the various node types in a ...
      - standard system with worker and storage nodes
      - all-in-one system with worker node
PASS: Verify appropriate start host services are run on each node
      type following a Dead Office Recovery (DOR).
      - standard system with worker and storage nodes
      - all-in-one system with worker node
PASS: Verify the mtcClient does not unnecessarily send host services
      result messages.
PASS: Verify handling of periodic start host services message while
      a node is in service.

Failure Path:

PASS: Verify mtcClient failure path handling of start and stop host
      services function for the various node types in a ...
      - standard system with worker and storage nodes
      - all-in-one system with worker node

PASS: Verify mtcClient start host services command handling when
      message requests interleave with auto start handling
      during unlock recovery.

Closes-Bug: 2073802
Change-Id: I0da7a16c1f600cc60364f6bcec7587e2ff71c624
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2024-08-09 14:48:05 +00:00

metal

The starlingx/metal repository handles StarlingX Bare Metal Management [1].

This repository is not intended to be developed standalone, but rather as part of the StarlingX Source System, which is defined by the StarlingX manifest [2].

References


  1. https://docs.starlingx.io/api-ref/metal

  2. https://opendev.org/starlingx/manifest.git

Description
StarlingX Bare Metal and Node Management, Hardware Maintenance
Languages
C++ 83%
Shell 10.2%
Python 3.3%
C 2.5%
Makefile 1%