ansible-playbooks/examples
Tee Ngo a2f684c8fb Improve bootstrap failure recovery in replay
Previously, bootstrap playbook roles were mainly triggered by config
changes during replay. Consequently, the playbook was unable to
recover from a previous failure caused by an issue other than a
misconfiguration in the host override, e.g. bad image/template,
backend code flaw, network glitch, proxy server down, user
interruption, etc.

Furthermore, depending on the step at which the last failure occurred,
a subsequent replay would fail on non-reentrant tasks such as
filesystem resizing, ip addr add/delete, and sysinv REST calls.

This commit addresses these flaws by maximizing the reentrancy
of bootstrap tasks and removing the restriction on role inclusion
based on config changes.
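
As an illustration only, and not the code from this commit, the
check-before-act pattern that makes a task such as "ip addr add" safe
to repeat can be sketched in Python. The function names, the injected
runner, and the example address are all hypothetical; the sketch
assumes that "ip -o addr show dev <dev>" prints one line per address
with the CIDR as a whitespace-separated field.

```python
import subprocess
from typing import Callable, List

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[List[str]], str]


def run_command(argv: List[str]) -> str:
    """Run a command and return its stdout; raises on a non-zero exit."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def address_present(cidr: str, dev: str, run: Runner = run_command) -> bool:
    """Return True if cidr is already assigned to dev (hypothetical helper)."""
    output = run(["ip", "-o", "addr", "show", "dev", dev])
    # Compare whole fields so that 10.0.0.2/24 does not match 110.0.0.2/24.
    return any(cidr in line.split() for line in output.splitlines())


def ensure_address(cidr: str, dev: str, run: Runner = run_command) -> bool:
    """Add cidr to dev only if it is missing.

    Returns True when a change was made and False when nothing was done,
    so calling it again after a partial failure is harmless.
    """
    if address_present(cidr, dev, run):
        return False
    run(["ip", "addr", "add", cidr, "dev", dev])
    return True
```

Calling ensure_address("10.0.0.2/24", "eth0") twice would attempt the
add only once; a plain "ip addr add" would instead fail on the second
run because the address already exists.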

Tests:
  - Bootstrap a simplex system locally
  - Bootstrap a standard system remotely
  - Install and reinstall ssl ca cert via bootstrap replay
  - Induce Kubernetes services bring-up failures due to misconfiguration
    or a bad template file, change config and replay.
  - Induce initial database population failure due to misconfiguration,
    change config and replay.
  - Induce database update failure due to misconfiguration, change
    config and replay.
  - Induce random failures, make no config change and replay.

Known limitation:
  - Failure while applying bootstrap manifests may not be
    recoverable, as most of these manifests are not reentrant.

Closes-Bug: 1830781
Change-Id: Ia2c1e1199f2c67033fb91a7e9f24d808e6fe94c9
Signed-off-by: Tee Ngo <tee.ngo@windriver.com>
2019-07-09 11:49:55 -04:00
remote Improve bootstrap failure recovery in replay 2019-07-09 11:49:55 -04:00