metal/mtce
Eric MacDonald dab9c4774b Maintenance does not auto-start worker host services in AIO
The mtcClient is required to 'start host services' autonomously
following a node reboot. This handles the use case where
the administrator disables maintenance heartbeat loss auto recovery.
If that node then reboots on its own, for whatever reason, maintenance
needs to ensure that it auto-starts 'host services'.

A fairly recent update delivered support for that use case:

    https://opendev.org/starlingx/metal/commit/1335bc484d

However, the mechanism the mtcClient used to manage auto-starting
host services did not handle the worker subfunction case.
Moreover, the current implementation does not handle the potential
concurrency between the mtcClient process startup case and mtcAgent
requests during unlock recovery.
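
As a rough illustration of the interleaving concern only, the sketch
below models an automatic start and an mtcAgent-requested start that
arrive close together and are serialized through a single in-progress
flag. This is not the mtcClient code (which is C++); the class, method,
and field names are all hypothetical, and what the real client does
with a request that arrives mid-start is not stated here.

```python
class HostServicesStarter:
    """Hypothetical model: only one start runs at a time, whatever its source."""

    def __init__(self):
        self.in_progress = False
        self.launched_by = []  # sources that actually launched a start

    def request_start(self, source):
        """Return True if this request launched a start, False if it was coalesced."""
        if self.in_progress:
            # A start is already running; do not launch a second one.
            return False
        self.in_progress = True
        self.launched_by.append(source)
        return True

    def complete(self):
        """Mark the running start as finished."""
        self.in_progress = False


starter = HostServicesStarter()
auto = starter.request_start("auto-start")  # process startup after reboot
agent = starter.request_start("mtcAgent")   # request arrives while start is running
starter.complete()
```

In this model the second request is coalesced rather than queued; a
real implementation could equally well queue it or acknowledge it
once the running start completes.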

This update also fixes an issue where the mtcClient sometimes gets
into a mode where it floods the mtcAgent with start host services
result messages, sending 20 unnecessary messages per second. The
aforementioned update modified the mtcAgent to log receipt of this
message, which then floods the mtcAgent log, leading to unnecessary
message handling and log rotations.
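
One general way to avoid this kind of flood is to treat a result as
pending until it is sent once, rather than resending it on every pass
of the main loop. The sketch below shows that idea under stated
assumptions: it is not the actual fix, and the class name, method
names, and message text are hypothetical.

```python
class ResultReporter:
    """Hypothetical model: a command result is reported at most once."""

    def __init__(self):
        self.pending = None  # result waiting to be sent, if any

    def command_done(self, status):
        """Record the result of a finished host services command."""
        self.pending = status

    def poll(self):
        """Called on every main-loop pass; returns a message only when one is pending."""
        message, self.pending = self.pending, None
        return message


reporter = ResultReporter()
reporter.command_done("start host services: pass")

# Simulate 20 main-loop passes; only the first yields a message to send.
sent = [m for m in (reporter.poll() for _ in range(20)) if m is not None]
```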

Test Plan:

Success Path:

PASS: Verify mtcClient success path handling of start and stop host
      services function for the various node types in a ...
      - standard system with worker and storage nodes
      - all-in-one system with worker node
PASS: Verify appropriate start host services are run on each node
      type following a Dead Office Recovery (DOR).
      - standard system with worker and storage nodes
      - all-in-one system with worker node
PASS: Verify the mtcClient does not unnecessarily send host services
      result messages.
PASS: Verify handling of periodic start host services message while
      a node is in service.

Failure Path:

PASS: Verify mtcClient failure path handling of start and stop host
      services function for the various node types in a ...
      - standard system with worker and storage nodes
      - all-in-one system with worker node

PASS: Verify mtcClient start host services command handling when
      message requests interleave with auto start handling
      during unlock recovery.

Closes-Bug: 2073802
Change-Id: I0da7a16c1f600cc60364f6bcec7587e2ff71c624
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2024-08-09 14:48:05 +00:00
debian Update crashDumpMgr to source config from envfile 2023-10-06 23:06:54 +00:00
src Maintenance does not auto-start worker host services in AIO 2024-08-09 14:48:05 +00:00
PKG-INFO Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00