StarlingX Bare Metal and Node Management, Hardware Maintenance
c4b8171ddd
The current mechanism used to preserve the learned bmc protocol in the filesystem on the active controller is problematic over swact. This update removes the file storage method in favor of preserving the learned protocol in the system inventory database, as a key/value pair at the host level in the already existing mtce_info database field. The specified or learned bmc access protocol is then shared with the hardware monitor through inter-daemon maintenance messaging.

This update refactors bmc provisioning to accommodate bmc protocol selection at the host rather than the system level. Towards that, this update removes system level bmc_access_method selection in favor of host level selection through bm_type. A bm_type of 'bmc' specifies that the bmc access protocol for that host be learned. This has the effect of making it the same as what is delivered today, but without support for changing it at the system level.

A system inventory update will be delivered shortly that enables bmc access protocol selection at the host level. That update allows the customer to specify the bmc access protocol at the host level to be either dynamic (aka learned) or to only use 'redfish' or 'ipmi', and delivers that information to maintenance through bm_type via bmc provisioning. Until that update is delivered, bm_type always comes in as 'bmc', which gets interpreted as 'dynamic' to maintain the existing configuration.

The following additional issues were also fixed in this update.

1. The nodeTimers module defaulted the 'ring' member of timers that are not running to false; it should default to true.
2. Added a pingUtil_restart function to facilitate quicker sensor monitoring following provisioning changes and bmc access failures.
3. Enhanced the hardware monitor sensor grouping filter to accommodate non-standard Redfish readout labelling so that more sensors fall into the existing canned groups; this leads to more monitored sensors.
4. Added an 'http security mode' to hardware monitor messaging. This defaults to https, as that is all that is supported by the Redfish implementation today. This field can be used to specify non-secure 'http' mode in the future when that gets implemented.
5. Ensure the hardware monitor performs a bmc password re-fetch on every provisioning change.

Test Plan:

PASS: Verify bmc access protocol is stored/fetched from the database (mtce_info)
PASS: Verify inventory push from mtcAgent to hwmond over mtcAgent restart
PASS: Verify inventory push from mtcAgent to hwmond over hwmon restart
PASS: Verify bmc provisioning of ipmi and redfish servers
PASS: Verify learned bmc protocol persists over process restart and swact
PASS: Verify process startup with protocol already learned

Hardware Monitor:

PASS: Verify bmc_type=ipmi handling; protocol forced to ipmi; (re)prov
PASS: Verify bmc_type=redfish handling; protocol forced to redfish; (re)prov
PASS: Verify bmc_type=dynamic handling; protocol is learned then persisted
PASS: Verify sensor model delete and relearn over ip address change
PASS: Verify sensor model delete and relearn over bm_type change
PASS: Verify sensor model not relearned over username change
PASS: Verify bm pw is re-fetched over any (re)provisioning change
PASS: Verify bmc re-provisioning soak (test-bmc-reprovisioning.sh 50 loops)
PASS: Verify protocol change handling, file cleanup, model recreation
PASS: Verify End-2-End behavior for bm_type change from redfish to ipmi
PASS: Verify End-2-End behavior for bm_type change from ipmi to redfish
PASS: Verify End-2-End behavior for bm_type change from redfish to dynamic
PASS: Verify End-2-End behavior for bm_type change from ipmi to dynamic
PASS: Verify End-2-End behavior for bm_type change from dynamic to ipmi
PASS: Verify End-2-End behavior for bm_type change from dynamic to redfish
PASS: Verify sensor model creation waits for server power to be on
PASS: Verify sensor relearn by provisioning change during model creation (soak)

Regression:

PASS: Verify host power off and on
PASS: Verify BMC access alarm handling (assert and clear)
PASS: Verify mtcAgent and hwmond logs add value
PASS: Verify no core dumps / seg faults
PASS: Verify no mtcAgent and hwmond memory leak
PASS: Verify delete of BMC provisioned host
PASS: Verify sensor monitoring, alarming, degrade and then clear cycle
PASS: Verify static analysis report of changed modules
PASS: Verify host level bm_type=bmc functions as dynamic selection would
PASS: Verify batch provisioning and deprovisioning (7 nodes)
PASS: Verify batch provisioning to different protocol (5 nodes)
PASS: Verify handling of flaky Redfish responses
PEND: Verify System Install

Change-Id: Ic224a9c33e0283a611725b33c90009132cab3382
Closes-Bug: #1853471
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
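The commit message moves the learned protocol out of a file on the active controller and into the host's existing mtce_info field as a key/value pair. A minimal sketch of that idea follows. It assumes mtce_info holds a JSON object string and uses a hypothetical key name `bmc_protocol`; neither the encoding nor the key name is stated in the commit, and the helper functions are illustrative only.

```python
import json
from typing import Optional

# Hypothetical key name; the real key used inside mtce_info is not given.
PROTOCOL_KEY = "bmc_protocol"
VALID_PROTOCOLS = {"redfish", "ipmi"}


def load_mtce_info(mtce_info: Optional[str]) -> dict:
    """Parse the host's mtce_info string, assumed here to be a JSON object.

    An empty, missing or malformed value is treated as 'nothing stored yet'.
    """
    if not mtce_info:
        return {}
    try:
        data = json.loads(mtce_info)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return {}
    return data if isinstance(data, dict) else {}


def get_learned_protocol(mtce_info: Optional[str]) -> Optional[str]:
    """Return the persisted protocol, or None if it has not been learned."""
    value = load_mtce_info(mtce_info).get(PROTOCOL_KEY)
    return value if value in VALID_PROTOCOLS else None


def set_learned_protocol(mtce_info: Optional[str], protocol: str) -> str:
    """Return a new mtce_info string with the protocol stored.

    Other keys already present are preserved, because mtce_info is an
    existing field rather than one dedicated to the bmc protocol.
    """
    if protocol not in VALID_PROTOCOLS:
        raise ValueError(f"unsupported bmc protocol: {protocol!r}")
    data = load_mtce_info(mtce_info)
    data[PROTOCOL_KEY] = protocol
    return json.dumps(data, sort_keys=True)
```

Because the value lives in the inventory database rather than on one controller's filesystem, the controller that becomes active after a swact reads back the same stored protocol, which is the problem the commit sets out to fix.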
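The host level bm_type handling described in the commit message reduces to a small resolution step: 'redfish' and 'ipmi' force a protocol, while 'dynamic' (and 'bmc', until the system inventory update is delivered) uses a previously learned protocol or falls back to learning one. The sketch below is a hypothetical illustration of that mapping; the function name and its return shape are assumptions, not maintenance's actual interface.

```python
from typing import Optional, Tuple

FORCED = {"redfish", "ipmi"}
# Today inventory always sends 'bmc'; per the commit it is treated as 'dynamic'.
ALIASES = {"bmc": "dynamic"}


def resolve_access_mode(bm_type: str,
                        learned: Optional[str]) -> Tuple[str, Optional[str]]:
    """Map a host's bm_type to a (mode, protocol) pair (hypothetical helper).

    mode is 'dynamic' or the forced protocol name. protocol is the protocol
    to use now, or None when a dynamic host still has to learn it.
    """
    mode = ALIASES.get(bm_type.lower(), bm_type.lower())
    if mode in FORCED:
        # Forced selection: any previously learned protocol is ignored.
        return mode, mode
    if mode == "dynamic":
        # None means 'learn the protocol, then persist it'.
        return mode, learned
    raise ValueError(f"unsupported bm_type: {bm_type!r}")
```

Folding 'bmc' into 'dynamic' through an alias table keeps the existing configuration working today, and the table entry can simply be dropped once inventory starts sending explicit values.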
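Item 5 and the Hardware Monitor test plan state how hwmond reacts to bmc (re)provisioning: the password is re-fetched on any (re)provisioning change, the sensor model is deleted and relearned when the ip address or bm_type changes, and it is not relearned for a username change. The sketch below encodes only those stated rules; the `BmcProvisioning` record and the action strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BmcProvisioning:
    """Hypothetical subset of a host's bmc provisioning data."""
    ip: str
    username: str
    bm_type: str


def provisioning_actions(old: BmcProvisioning,
                         new: BmcProvisioning) -> List[str]:
    """List the actions taken for a (re)provisioning event (illustrative).

    - every (re)provisioning event re-fetches the bmc password;
    - an ip address or bm_type change deletes and relearns the sensor model;
    - a username-only change keeps the existing sensor model.
    """
    actions = ["refetch_password"]
    if old.ip != new.ip or old.bm_type != new.bm_type:
        actions.append("delete_and_relearn_sensor_model")
    return actions
```

Keying the relearn on ip address and bm_type, and not on credentials, matches the test plan entries: a new address or protocol may point at a different sensor set, while a new username reaches the same one.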
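Item 3 widens the sensor grouping filter so that non-standard Redfish readout labels still fall into the existing canned groups. One plausible way to do that is case-insensitive keyword matching on a normalized sensor label, sketched below. The group names and keywords are illustrative assumptions, not the groups or matching rules hwmond actually uses.

```python
import re
from typing import Dict, List, Optional

# Illustrative canned groups and the keywords that route a label into them.
# Order matters: the first matching group wins.
CANNED_GROUPS: Dict[str, List[str]] = {
    "temperature": ["temp", "thermal", "inlet", "exhaust"],
    "fans": ["fan"],
    "voltage": ["volt", "vcore", "vbat"],
    "power": ["power", "pwr", "psu"],
}


def normalize(label: str) -> str:
    """Lower-case a label and collapse separators, so that 'CPU1_Temp',
    'cpu1 temp' and 'CPU1-TEMP' all normalize to 'cpu1 temp'."""
    return re.sub(r"[\s_\-.]+", " ", label.strip().lower())


def classify_sensor(label: str) -> Optional[str]:
    """Return the first canned group with a keyword found in the label,
    or None when the label matches no group."""
    text = normalize(label)
    for group, keywords in CANNED_GROUPS.items():
        if any(keyword in text for keyword in keywords):
            return group
    return None
```

Matching substrings of a normalized label, rather than exact names, is what lets vendor-specific spellings land in an existing group; the trade-off is that keywords must be chosen so they do not collide across groups.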
api-ref/source
bsp-files
devstack
doc
installer
inventory
kickstart
mtce
mtce-common
mtce-compute
mtce-control
mtce-storage
python-inventoryclient
releasenotes
.gitignore
.gitreview
.zuul.yaml
centos_build_layer.cfg
centos_iso_image.inc
centos_pkg_dirs
CONTRIBUTORS.wrs
LICENSE
README.rst
test-requirements.txt
tox.ini
metal
StarlingX Bare Metal Management