The current start logic, when the existing cluster state is "reboot",
can lead to a split-brain condition under certain circumstances. This
patchset adds an additional step to ensure the cluster state is set to
"live" once the leader node is ready to start, instead of relying on
the slave nodes to handle it. It also adds a simple retry when a
collision is detected while writing to the configmap.
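
For illustration only, the retry on a configmap write collision could
be sketched roughly as below. This is not the code in this patchset:
FakeConfigMap, ConflictError, set_cluster_state and the
"cluster_state" key are hypothetical names, and an in-memory object
stands in for the real configmap so the sketch runs on its own.

```python
import random
import time


class ConflictError(Exception):
    """Hypothetical error: the object changed since it was last read."""


class FakeConfigMap:
    """Hypothetical in-memory stand-in for a versioned configmap."""

    def __init__(self):
        self.data = {"cluster_state": "reboot"}  # key name is an assumption
        self.version = 1

    def read(self):
        # Return a copy of the data plus the version it was read at.
        return dict(self.data), self.version

    def write(self, data, expected_version):
        # Reject the write if another writer got in first.
        if expected_version != self.version:
            raise ConflictError("configmap was modified by another writer")
        self.data = dict(data)
        self.version += 1


def set_cluster_state(configmap, state, attempts=5, base_delay=0.1):
    """Set the state, re-reading and retrying when a collision is detected.

    Returns the number of attempts used; re-raises ConflictError if
    every attempt collides.
    """
    for attempt in range(1, attempts + 1):
        data, version = configmap.read()
        data["cluster_state"] = state
        try:
            configmap.write(data, version)
            return attempt
        except ConflictError:
            if attempt == attempts:
                raise
            # Randomized pause so competing writers do not retry in lockstep.
            time.sleep(base_delay * attempt * random.uniform(0.5, 1.0))
```

Re-reading before each attempt matters here: retrying with the stale
copy would keep colliding, or would overwrite the other writer's
change.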

The existing hair-trigger that moves the cluster state from "live" to
"reboot" could use some fine-tuning, but updating it properly requires
additional investigation and testing, and should therefore be done as
a separate activity outside the scope of this patchset.

Change-Id: Ieb2861d6fbc435e24e20d13c7b358c751890b4c4