
The maintenance alarm handling daemon (mtcalarmd) should not drop alarm requests simply because the FM process is not running. Instead, it should retry in that case and in other FM error cases that are likely to succeed in time. Some error cases, however, are unlikely to succeed with retries and do need to be dropped.

A review of the FM return codes with the FM designer led to a list of errors that should be dropped and others that should be retried. This update implements that handling by posting to and servicing a first-in/first-out alarm queue.

The typical retry case is the NOCONNECT error code, which occurs when FM is not running.

Alarm ordering and the first-try timestamp are maintained. Retries and logs are throttled to avoid flooding.

Test Plan:
PASS: Verify success path alarm handling End-to-End.
PASS: Verify retry handling while FM is not running.
PASS: Verify handling of all FM error codes (fit tool).
PASS: Verify alarm handling under stress (inject-alarm script) soak.
PASS: Verify no memory leak over stress soak.
PASS: Verify logging (success, retry, failure).
PASS: Verify alarm posted date is maintained over retry success.

Change-Id: Icd1e75583ef660b767e0788dd4af7f184bdb9e86
Closes-Bug: 1841653
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
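
The retry-or-drop queue described above can be illustrated with a minimal sketch. It is not the mtcalarmd implementation: the `FmResult` codes, `AlarmRequest`, `AlarmQueue`, and `should_retry` names are hypothetical stand-ins, and the real list of retryable FM errors is the one agreed with the FM designer. The sketch only shows the idea of servicing a FIFO from the head, stopping at the first retryable failure so that ordering and the first-try timestamp survive.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>

// Hypothetical result codes; the real FM API defines its own set.
enum class FmResult { Ok, NoConnect, InvalidRequest };

// One queued alarm request. The timestamp is taken once, at post time,
// and is never touched by retries.
struct AlarmRequest {
    std::string alarm_id;
    std::chrono::system_clock::time_point first_try;
};

// Assumed policy for this sketch: only NoConnect is worth retrying.
inline bool should_retry(FmResult rc) { return rc == FmResult::NoConnect; }

class AlarmQueue {
public:
    using Sender = std::function<FmResult(const AlarmRequest&)>;

    // Append a request to the tail, stamping it with the current time.
    void post(const std::string& alarm_id) {
        queue_.push_back({alarm_id, std::chrono::system_clock::now()});
    }

    // Service the queue from the head. A retryable failure leaves the head
    // in place and stops servicing, which preserves alarm ordering. Any
    // other result (success or non-retryable error) removes the request.
    // Returns the number of requests removed from the queue.
    std::size_t service(const Sender& send) {
        assert(send && "a send callback is required");
        std::size_t removed = 0;
        while (!queue_.empty()) {
            const FmResult rc = send(queue_.front());
            if (should_retry(rc)) {
                break;  // try again on the next service pass
            }
            queue_.pop_front();
            ++removed;
        }
        return removed;
    }

    std::size_t size() const { return queue_.size(); }

private:
    std::deque<AlarmRequest> queue_;
};
```

In this sketch, throttling is left to the caller: calling `service()` from a periodic timer rather than in a tight loop would bound the retry rate, and log throttling would be layered on in the same place.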
metal
StarlingX Bare Metal Management
Languages: C++ 83%, Shell 10.2%, Python 3.3%, C 2.5%, Makefile 1%