Fix hbsAgent log flooding when SM heartbeat fails persistently

If the SM part of this update is missing, or the SM heartbeat
is missing for a long period of time, the hbsAgent produces
5 logs every 10 seconds reporting the missing SM heartbeat.

This is a follow-up to the parent update:
https://review.opendev.org/c/starlingx/metal/+/751558

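This update throttles the warning log and corresponding
cluster dump when SM heartbeat is persistently missing.
The throttle interval goes from every 100th back-to-back miss
to every 1000th, and both the log and the dump stop once the
miss count reaches 10000.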

PASS: Verify hbsAgent service and log behavior when SM
      heartbeat is persistently missing.

Change-Id: Ib379ed5d37b5349ca170b5661a930b6a71c2bed1
Partial-Fix: 1895350
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Eric MacDonald 2020-12-16 21:16:48 -05:00
parent c88c4cb7ee
commit 484d662cb7

@@ -1521,6 +1521,7 @@ void hbs_sm_handler ( void )
* False if time delta is greater
*
***************************************************************************/
+#define HUGE_NUMBER_B2B_SM_HEARTBEAT_MISSES (10000)
bool manage_sm_heartbeat ( void )
{
struct timespec ts ;
@@ -1532,8 +1533,9 @@ bool manage_sm_heartbeat ( void )
if ( delta_in_ms > SM_HEARTBEAT_PULSE_PERIOD_MSECS )
{
sm_heartbeat_count = 0;
-if (( ++sm_heartbeat_count_b2b_misses < 20 )||
-(!( sm_heartbeat_count_b2b_misses % 100 )))
+if ((( ++sm_heartbeat_count_b2b_misses < 20 ) ||
+(!( sm_heartbeat_count_b2b_misses % 1000 ))) &&
+( sm_heartbeat_count_b2b_misses < HUGE_NUMBER_B2B_SM_HEARTBEAT_MISSES ))
{
wlog("SM Heartbeat missing since %ld.%03ld secs ago ; HBS Period Misses:%3d ; Running HB Count:%4d",
delta.secs, delta.msecs,
@@ -2523,7 +2525,9 @@ void daemon_service_run ( void )
}
}
/* log cluster throttled */
-if (( heartbeat_ok == false ) && ( !( sm_heartbeat_count_b2b_misses % 100 )))
+if ((( heartbeat_ok == false ) &&
+( !( sm_heartbeat_count_b2b_misses % 1000 ))) &&
+( sm_heartbeat_count_b2b_misses < HUGE_NUMBER_B2B_SM_HEARTBEAT_MISSES ))
{
hbs_state_audit ( );
}
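
The throttling conditions in the two hunks above can be read in isolation as a pair of predicates. The following is a minimal sketch, not the hbsAgent code itself: the function names and the `EARLY_MISS_LOG_LIMIT` and `THROTTLED_LOG_INTERVAL` macros are hypothetical, introduced only for illustration. It also assumes the miss counter has already been incremented for the current miss, whereas the real code increments it inside the condition.

```c
#include <stdbool.h>

/* Hypothetical names; the values mirror the literals used in the diff. */
#define EARLY_MISS_LOG_LIMIT   (20)    /* log every miss below this count  */
#define THROTTLED_LOG_INTERVAL (1000)  /* after that, log every Nth miss   */
#define HUGE_NUMBER_B2B_SM_HEARTBEAT_MISSES (10000) /* go silent from here */

/* True when a warning should be logged for this back-to-back miss count.
 * Assumes 'misses' was already incremented for the current miss. */
bool should_log_miss(int misses)
{
    bool early    = (misses < EARLY_MISS_LOG_LIMIT);
    bool periodic = ((misses % THROTTLED_LOG_INTERVAL) == 0);
    return (early || periodic) &&
           (misses < HUGE_NUMBER_B2B_SM_HEARTBEAT_MISSES);
}

/* True when the cluster state dump should run for this miss count. */
bool should_dump_cluster(bool heartbeat_ok, int misses)
{
    return !heartbeat_ok &&
           ((misses % THROTTLED_LOG_INTERVAL) == 0) &&
           (misses < HUGE_NUMBER_B2B_SM_HEARTBEAT_MISSES);
}

/* Counts how many warnings would be logged over misses 1..total_misses. */
int count_logs(int total_misses)
{
    int logged = 0;
    for (int misses = 1; misses <= total_misses; misses++)
    {
        if (should_log_miss(misses))
            logged++;
    }
    return logged;
}
```

Under these assumptions, `count_logs(20000)` returns 28: the first 19 misses plus the 9 multiples of 1000 below 10000. With the previous conditions (every 100th miss, no upper bound) the same run would log 219 times.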