metal/mtce/src
Eric MacDonald d863aea172 Increase mtce host offline threshold to handle slow host shutdown
Mtce polls/queries the remote host for mtcAlive messages
over 42 x 100 ms intervals during unlock or host failure
handling. Absence of mtcAlive throughout this (~5 sec)
period indicates the node is offline.

However, in the rare case where shutdown is slow, 5 seconds
is not long enough. Cases have been seen where a wait of 7
or 8 seconds is required to properly declare the host offline.

To avoid the rare transient 200.004 host alarm over an
unlock operation, this update increases the mtce host
offline window from 5 to 10 seconds (approx) by modifying
the mtce configuration file offline threshold from 42 to 90.
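
The counting behaviour described above can be sketched as follows. This is a minimal illustration, not the mtce implementation: the function names, the boolean poll results and the exact restart rule are assumptions made for the example; only the 100 ms interval and the 42 and 90 thresholds come from this commit message. The raw products are 4.2 s and 9 s, which the message describes as roughly 5 and 10 seconds.

```python
from typing import Iterable, Optional

POLL_PERIOD_MS = 100  # polling interval stated in the commit message


def window_seconds(threshold: int, period_ms: int = POLL_PERIOD_MS) -> float:
    """Length of an unchallenged offline window, in seconds."""
    return threshold * period_ms / 1000.0


def polls_until_offline(alive_by_poll: Iterable[bool], threshold: int) -> Optional[int]:
    """Return the 1-based poll at which the host is declared offline.

    alive_by_poll is a hypothetical input: one boolean per poll, True
    when an mtcAlive message was seen during that interval. Any
    mtcAlive (a "challenge") restarts the count of silent polls.
    Returns None if the threshold is never reached.
    """
    misses = 0
    for poll, alive in enumerate(alive_by_poll, start=1):
        if alive:
            misses = 0  # challenged: start counting again
            continue
        misses += 1
        if misses >= threshold:
            return poll
    return None


if __name__ == "__main__":
    print(window_seconds(42), window_seconds(90))
    # Host goes silent immediately: offline after exactly 90 polls.
    print(polls_until_offline([False] * 90, threshold=90))
    # One late mtcAlive at poll 50 restarts the count.
    print(polls_until_offline([False] * 49 + [True] + [False] * 90, threshold=90))
```

Under these assumptions a single late mtcAlive lengthens the time to the offline declaration but does not prevent it, which matches the challenge cases listed in the test plan.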

Test Plan:

PASS: Verify the unchallenged failed-to-offline period is ~10 secs
PASS: Verify the algorithm restarts if an mtcAlive is received
      at any time during the polls/queries (challenge) window.
PASS: Verify challenge handling leads to a longer but
      successful offline declaration.
PASS: Verify the above handling for both the unlock and
      spontaneous failure handling cases.

Closes-Bug: 2024249
Change-Id: Ice41ed611b4ba71d9cf8edbfe98da4b65dcd05cf
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2023-06-16 18:14:08 +00:00
Name         Last commit                                                         Date
alarm        Alarm Hostname controller function has in-service failure reported  2022-10-05 10:30:01 -04:00
common       Cleanup mtcAgent error logging during startup                       2023-02-14 14:18:02 -05:00
fsmon        Fix remaining failing mtce services on Debian                       2022-01-25 12:10:39 -03:00
fsync        Decouple Guest-server/agent from stx-metal                          2018-09-18 17:15:08 -04:00
heartbeat    Remove swerr log in hbsAgent cluster delete                         2021-06-14 19:04:33 -04:00
hostw        Change hostwd emergency log to write to /dev/kmsg                   2023-02-01 23:41:14 +00:00
hwmon        Re-enable sensor suppression support in Mtce Hardware Monitor       2022-08-06 00:02:29 +00:00
lmon         Fix failing mtce services on Debian                                 2022-01-14 10:50:09 -03:00
maintenance  Cleanup mtcAgent error logging during startup                       2023-02-14 14:18:02 -05:00
mtclog       Set restricted permissions for mtce logfiles                        2019-07-17 18:19:52 -04:00
pmon         Fix bashate failure in zuul                                         2022-10-06 17:22:12 +00:00
public       Fix mtce build error with gcc-8.2.1                                 2020-04-03 14:44:21 +08:00
scripts      Increase mtce host offline threshold to handle slow host shutdown   2023-06-16 18:14:08 +00:00
LICENSE      Decouple Guest-server/agent from stx-metal                          2018-09-18 17:15:08 -04:00
Makefile     Remove Resource Monitor ; aka rmon, from the load                   2019-03-19 16:12:38 -04:00