Add PS Redundancy Sensor to Redfish server power sensor group

This update adds the Power Supply Redundancy sensor to the Redfish
server power sensor group.

Special handling is required so that an assertion of this new sensor
has 'major' severity and is raised only when 2 or more power supplies
are provisioned. See the code comments in the review: the assertion
applies only when the redundancy count is 2 or more, and the severity
is overridden from critical to major.
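
The override described above can be sketched as a small standalone
function. This is a minimal sketch and not the commit's code: the function
name `override_redundancy_health` and the two string constants are
hypothetical stand-ins for the in-line logic and for the
`REDFISH_SEVERITY__GOOD` / `REDFISH_SEVERITY__MAJOR` macros shown in the
diff.

```cpp
#include <string>

// Hypothetical stand-ins for REDFISH_SEVERITY__GOOD / REDFISH_SEVERITY__MAJOR.
static const std::string SEVERITY_GOOD  = "OK";
static const std::string SEVERITY_MAJOR = "Warning";

// Returns the health string that should be evaluated for the
// Power Supply Redundancy sensor, given the health reported by the BMC
// and the reported redundancy count.
std::string override_redundancy_health(const std::string& reported_health,
                                       int redundancy_count)
{
    // A healthy reading is never modified.
    if (reported_health == SEVERITY_GOOD)
        return reported_health;

    // Two or more supplies provisioned: cap the severity at major.
    if (redundancy_count > 1)
        return SEVERITY_MAJOR;

    // No count reported: blank the health, as the diff does.
    if (redundancy_count == 0)
        return "";

    // Single supply provisioned: loss of redundancy is not a fault.
    return SEVERITY_GOOD;
}
```

With these assumptions, a 'Critical' reading with a count of 2 maps to
'Warning', while the same reading with a count of 1 maps to 'OK'.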

This update does not apply to the IPMI 'server power' sensor group.
This is because the IPMI protocol does not distinguish between single
and redundant power supply provisioning cases and reports a redundancy
loss in the single power supply case even when that power supply is
operating fine.

Test Plan:

PASS: Verify new PS Redundancy sensor is added to the server
      power sensor group with redfish sensor monitoring.
PASS: Verify no PS Redundancy assertion with redundant power
      supplies installed while both have AC power input.
PASS: Verify major PS Redundancy assertion with redundant power
      supplies installed while one is not receiving AC power input.
PASS: Verify no PS Redundancy assertion with single power supply.

PASS: Verify PS Redundancy sensor goes offline when 'state' is
      not 'Enabled' and returns to operating state when re-Enabled.

PASS: Verify PS Redundancy sensor goes 'offline' when
      Redundancy label is missing.
PASS: Verify PS Redundancy sensor goes 'offline' when
      RedundancySet count is missing.
PASS: Verify PS Redundancy sensor goes 'offline' when
      Status label is missing.

PASS: Verify PS Redundancy sensor assertion when Status:Health
      is not 'OK'.
PASS: Verify PS Redundancy sensor goes 'offline' when Status:State
      is not 'Enabled'.
PASS: Verify new PS Redundancy sensor survives a process restart.
PASS: Verify new PS Redundancy sensor asserts with non-OK status
      while redundancy count is greater than one.
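
The offline and assertion cases exercised by this test plan can be
summarized in a short classification sketch. Everything named here
(`RedundancyReading`, `SensorStatus`, `classify_redundancy`) is
hypothetical and only mirrors the behavior the test cases describe. The
zero-count case, where the diff blanks the health string, is not modeled.

```cpp
#include <string>

// Hypothetical result type for illustration only.
enum class SensorStatus { Ok, Asserted, Offline };

// Hypothetical view of the fields read from the Redfish power data.
struct RedundancyReading {
    bool has_redundancy;  // "Redundancy" label present
    bool has_count;       // "RedundancySet@odata.count" present
    bool has_status;      // "Status" label present
    std::string state;    // Status:State
    std::string health;   // Status:Health
    int count;            // redundancy count
};

SensorStatus classify_redundancy(const RedundancyReading& r)
{
    // Any missing label takes the sensor offline.
    if (!r.has_redundancy || !r.has_count || !r.has_status)
        return SensorStatus::Offline;

    // A state other than 'Enabled' also takes the sensor offline.
    if (r.state != "Enabled")
        return SensorStatus::Offline;

    if (r.health == "OK")
        return SensorStatus::Ok;

    // Non-OK health asserts only with more than one supply provisioned.
    // A count of 0 is not modeled in this sketch.
    return (r.count > 1) ? SensorStatus::Asserted : SensorStatus::Ok;
}
```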

Regression:

PASS: Verify host is degraded when PS redundancy alarm is asserted.
PASS: Verify alarm and degrade are cleared if sensor reads OK.
PASS: Verify alarm and degrade are cleared if sensor goes offline.
PASS: Verify a 'logged-major' PS Redundancy assertion raises alarm
      when the group action is changed to 'alarm'.
PASS: Verify an 'alarm-major' PS Redundancy assertion clears alarm
      when the group action is changed to 'log'.
PASS: Verify no PS Redundancy sensor is added to the server
      power sensor group with ipmi sensor monitoring.
PASS: Verify no PS Redundancy assertion with single or redundant
      power supplies with ipmi sensor monitoring.
PASS: Verify all sensor assertions are cleared when a server's BMC
      is reprovisioned by bm_type or bm_ip address or completely
      deprovisioned by bm_type=none.
PASS: Verify basic hardware monitor sensor assertion/clear operations.

Closes-Bug: 2076200
Change-Id: Ieae8f2b8681d1a2b29da0707b2f439cf10c47a2c
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Eric MacDonald 2024-08-07 01:38:29 +00:00
parent b29fb32f60
commit 50204147ff
2 changed files with 73 additions and 14 deletions

@@ -494,14 +494,18 @@ void * hwmonThread_ipmitool ( void * arg )
     /****************************************************************
      *
      * This fault insertion case is added for PV.
-     * If MTC_CMD_FIT__SENSOR_DATA file is present then no ipmitool
+     * If MTC_CMD_FIT__SENSOR_DATA file is present and
+     * there is an existing datafile then no ipmitool
      * sensor read is performed. Instead, a raw output file can be
-     * placed in /var/run/fit/<hostname>_sensor_data and used to
-     * perform sensor fault insertion that way.
+     * manually updated and used to perform sensor fault insertion.
      *
      *****************************************************************/
-    if ( daemon_is_file_present ( MTC_CMD_FIT__SENSOR_DATA ))
+    if (( daemon_is_file_present ( MTC_CMD_FIT__SENSOR_DATA )) &&
+        ( daemon_is_file_present ( sensor_datafile.data())))
     {
+        ilog_t ("%s bypass sensor data read ; %s FIT file is present",
+                 info_ptr->hostname.c_str(),
+                 MTC_CMD_FIT__SENSOR_DATA);
         rc = PASS ;
     }
 #ifdef WANT_FIT_TESTING
@@ -810,6 +814,7 @@ static void _set_default_unit_type_for_sensor( thread_info_type * info_ptr,
         strcpy( _sample_list[samples].unit, BMC_SENSOR_DEFAULT_UNIT_TYPE_TEMP);
     }
     else if (( label == REDFISH_SENSOR_LABEL_POWER_CTRL ) ||
+             ( label == REDFISH_SENSOR_LABEL_POWER_REDUNDANCY ) ||
              ( label == REDFISH_SENSOR_LABEL_POWER_SUPPLY ))
     {
         strcpy( _sample_list[samples].unit, BMC_SENSOR_DEFAULT_UNIT_TYPE_POWER);
@@ -879,6 +884,10 @@ static int _parse_redfish_sensor_data( char * json_str_ptr, thread_info_type * i
     string temp_str;
     std::list<string>::iterator iter_curr_ptr ;
+    // Required for special case handling of the Power Supply Redundancy Sensor
+    bool is_power_supply_redundancy_sensor = false ;
+    int redundancy_count = 0 ;
     for ( iter_curr_ptr = sensor_list.begin();
           iter_curr_ptr != sensor_list.end() ;
           ++iter_curr_ptr )
@@ -926,13 +935,22 @@ static int _parse_redfish_sensor_data( char * json_str_ptr, thread_info_type * i
         strcpy( _sample_list[samples].unit, BMC_SENSOR_DEFAULT_UNIT_TYPE_FANS);
     }
-    GET_SENSOR_DATA_VALUE( temp_str, json_obj, "LowerThresholdNonRecoverable", lnr )
-    GET_SENSOR_DATA_VALUE( temp_str, json_obj, "LowerThresholdCritical", lcr )
-    GET_SENSOR_DATA_VALUE( temp_str, json_obj, "LowerThresholdNonCritical", lnc )
-    GET_SENSOR_DATA_VALUE( temp_str, json_obj, "UpperThresholdNonCritical", unc )
-    GET_SENSOR_DATA_VALUE( temp_str, json_obj, "UpperThresholdCritical", ucr )
-    GET_SENSOR_DATA_VALUE( temp_str, json_obj, "UpperThresholdNonRecoverable", unr )
+    /* The Redundancy sensor does not have Upper/Lower threshold labels */
+    if ( label.compare(REDFISH_SENSOR_LABEL_POWER_REDUNDANCY ) )
+    {
+        GET_SENSOR_DATA_VALUE( temp_str, json_obj, "LowerThresholdNonRecoverable", lnr )
+        GET_SENSOR_DATA_VALUE( temp_str, json_obj, "LowerThresholdCritical", lcr )
+        GET_SENSOR_DATA_VALUE( temp_str, json_obj, "LowerThresholdNonCritical", lnc )
+        GET_SENSOR_DATA_VALUE( temp_str, json_obj, "UpperThresholdNonCritical", unc )
+        GET_SENSOR_DATA_VALUE( temp_str, json_obj, "UpperThresholdCritical", ucr )
+        GET_SENSOR_DATA_VALUE( temp_str, json_obj, "UpperThresholdNonRecoverable", unr )
+    }
+    else
+    {
+        is_power_supply_redundancy_sensor = true ;
+        redundancy_count = atoi(_sample_list[samples].value);
+        blog2_t ("%s redundancy count is %d", info_ptr->hostname.c_str(), redundancy_count );
+    }
     /* Set default unit type if can not get unit type from json string */
     if ( !strcmp(_sample_list[samples].unit, "na") )
     {
@@ -948,9 +966,42 @@ static int _parse_redfish_sensor_data( char * json_str_ptr, thread_info_type * i
     {
         string state = jsonUtil_get_key_value_string ( json_status_obj, "State" );
         string health = jsonUtil_get_key_value_string ( json_status_obj, "Health" );
+        // string healthRollup = jsonUtil_get_key_value_string ( json_status_obj, "HealthRollup" );
         if ( !strcmp (state.data(),"Enabled" ))
         {
+            // This condition is to override the reported health status
+            // of the Power Supply Redundancy sensor from Critical,
+            // or not OK, to Major.
+            //
+            // Some servers report a Critical status when there is only
+            // a single power supply installed. We don't want to do that
+            // here because some systems may be intentionally provisioned
+            // with a single power supply to save cost. We don't want to
+            // alarm in that case.
+            //
+            // Furthermore, when there are 2 installed power supplies
+            // and one is failing or not plugged in some servers report
+            // that as Critical. We don't want to raise a Critical alarm
+            // simply due to a lack of redundancy. Critical alarms are
+            // reserved for service affecting error conditions.
+            //
+            // System administrators that wish to have an alarm for this
+            // case can choose to modify the hardware monitor server power
+            // group major action handling from the default 'log' to 'alarm'.
+            //
+            // In Summary, only assert the redundancy failure status if
+            // - redundancy count of 2 and
+            // - health reading is not OK
+            if (( is_power_supply_redundancy_sensor == true ) &&
+                ( strcmp (health.data(), REDFISH_SEVERITY__GOOD)))
+            {
+                if ( redundancy_count > 1 )
+                    health = REDFISH_SEVERITY__MAJOR ;
+                else if ( redundancy_count == 0 )
+                    health = "" ;
+                else
+                    health = REDFISH_SEVERITY__GOOD ;
+            }
             if ( !strcmp (health.data(), REDFISH_SEVERITY__GOOD ))
             {
                 strcpy(_sample_list[samples].status, "ok");
@@ -1065,8 +1116,12 @@ static int _redfishUtil_send_request( thread_info_type * info_ptr, string & data
              request.c_str());
     if (( info_ptr->command == BMC_THREAD_CMD__READ_SENSORS ) &&
-        ( daemon_is_file_present ( MTC_CMD_FIT__SENSOR_DATA )))
+        ( daemon_is_file_present ( MTC_CMD_FIT__SENSOR_DATA )) &&
+        ( daemon_is_file_present ( datafile.data())))
     {
+        ilog_t ("%s bypass sensor data read ; %s FIT file is present",
+                 info_ptr->hostname.c_str(),
+                 MTC_CMD_FIT__SENSOR_DATA);
         rc = PASS ;
     }
     else
@@ -1178,6 +1233,8 @@ static int _parse_redfish_sensor_data_output_file( thread_info_type * info_ptr,
     {
         _parse_redfish_sensor_data( buffer, info_ptr, REDFISH_SENSOR_LABEL_VOLT,
                                     REDFISH_SENSOR_LABEL_VOLT_READING, samples);
+        _parse_redfish_sensor_data( buffer, info_ptr, REDFISH_SENSOR_LABEL_POWER_REDUNDANCY,
+                                    REDFISH_SENSOR_LABEL_REDUNDANCY_READING, samples);
         _parse_redfish_sensor_data( buffer, info_ptr, REDFISH_SENSOR_LABEL_POWER_SUPPLY,
                                     REDFISH_SENSOR_LABEL_POWER_SUPPLY_READING, samples);
         _parse_redfish_sensor_data( buffer, info_ptr, REDFISH_SENSOR_LABEL_POWER_CTRL,
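
The tightened fault insertion (FIT) bypass condition in the hunks above,
where the sensor read is skipped only when both the FIT flag file and an
existing sensor datafile are present, can be sketched in isolation. This
is a minimal sketch under assumptions: `file_present` is a hypothetical
stand-in for `daemon_is_file_present`, whose implementation is not shown
in this change, and `should_bypass_sensor_read` is an illustrative name.

```cpp
#include <string>
#include <unistd.h>

// Hypothetical stand-in for daemon_is_file_present: true if the path exists.
static bool file_present(const std::string& path)
{
    return access(path.c_str(), F_OK) == 0;
}

// Bypass the BMC sensor read only when the FIT flag file is present AND
// a previously written sensor datafile exists to be parsed instead.
bool should_bypass_sensor_read(const std::string& fit_flag_file,
                               const std::string& sensor_datafile)
{
    return file_present(fit_flag_file) && file_present(sensor_datafile);
}
```

Requiring the datafile as well means that setting the FIT flag before any
sensor data has been collected still results in a real read, so there is
always a datafile to edit for fault insertion.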

@@ -2,7 +2,7 @@
 #define __INCLUDE_HWMONTHREAD_HH__
 /*
- * Copyright (c) 2016-2017 Wind River Systems, Inc.
+ * Copyright (c) 2016-2017, 2024 Wind River Systems, Inc.
  *
  * SPDX-License-Identifier: Apache-2.0
  *
@@ -23,12 +23,14 @@
 #define REDFISH_SENSOR_LABEL_FANS "Fans"
 #define REDFISH_SENSOR_LABEL_POWER_SUPPLY "PowerSupplies"
 #define REDFISH_SENSOR_LABEL_POWER_CTRL "PowerControl"
+#define REDFISH_SENSOR_LABEL_POWER_REDUNDANCY "Redundancy"
 #define REDFISH_SENSOR_LABEL_VOLT_READING "ReadingVolts"
 #define REDFISH_SENSOR_LABEL_TEMP_READING "ReadingCelsius"
 #define REDFISH_SENSOR_LABEL_FANS_READING "Reading"
 #define REDFISH_SENSOR_LABEL_POWER_SUPPLY_READING "None"
 #define REDFISH_SENSOR_LABEL_POWER_CTRL_READING "PowerConsumedWatts"
+#define REDFISH_SENSOR_LABEL_REDUNDANCY_READING "RedundancySet@odata.count" // keep generic ; don't imply power
 #define REDFISH_SEVERITY__GOOD "OK"
 #define REDFISH_SEVERITY__MAJOR "Warning"