Steve Baker 81acd5df24 Implement drain shutdown support
Sending signal ``SIGUSR2`` to a conductor process will now trigger a
drain shutdown. This is similar to a ``SIGTERM`` graceful shutdown but
the timeout is determined by ``[DEFAULT]drain_shutdown_timeout`` which
defaults to ``1800`` seconds. This is enough time for running tasks on
existing reserved nodes to either complete or reach their own failure
timeout.
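
As a rough illustration of the behaviour described above, the sketch
below picks a timeout based on the received signal and then waits for
reservations to clear. It is not the code in this change: the names
``timeout_for`` and ``wait_for_reservations`` are hypothetical, and the
60 second graceful timeout is an assumed value. Only the ``1800``
second drain default comes from this message.

```python
import signal
import time

DRAIN_SHUTDOWN_TIMEOUT = 1800   # default stated in this commit message
GRACEFUL_SHUTDOWN_TIMEOUT = 60  # assumed value, for illustration only


def timeout_for(signum):
    """Return the shutdown timeout in seconds for the received signal."""
    if signum == signal.SIGUSR2:
        return DRAIN_SHUTDOWN_TIMEOUT
    return GRACEFUL_SHUTDOWN_TIMEOUT


def wait_for_reservations(has_reservations, timeout, poll_interval=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Wait until no reservations remain or the timeout elapses.

    ``has_reservations`` is a hypothetical callable that returns True
    while this conductor still holds reserved nodes. Returns True if
    the conductor drained cleanly, False if the timeout was reached.
    """
    deadline = clock() + timeout
    while has_reservations():
        if clock() >= deadline:
            return False
        sleep(poll_interval)
    return True
```

A drain can then be requested from another process with
``os.kill(pid, signal.SIGUSR2)``, or ``kill -USR2 <pid>`` from a shell.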

During the drain period the conductor needs to be removed from the hash
ring to prevent new tasks from starting. Other conductors must also not
fail the nodes reserved by the draining conductor, which would otherwise
appear to be orphaned. This is achieved by continuing to run the
conductor keepalive heartbeat for this period, but setting the
``online`` state to ``False``.
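
The effect of heartbeating with ``online`` set to ``False`` can be
sketched as follows. This is a simplified model, not the conductor
code: ``ConductorRecord``, ``ring_members``, ``is_alive`` and the 60
second ``HEARTBEAT_TIMEOUT`` are all hypothetical names and values
chosen for illustration.

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT = 60  # seconds; assumed value, for illustration only


@dataclass
class ConductorRecord:
    hostname: str
    online: bool
    updated_at: float  # time of the last heartbeat, in seconds


def heartbeat(record, now, draining=False):
    """Refresh the record; a draining conductor reports online=False."""
    record.updated_at = now
    record.online = not draining


def is_alive(record, now):
    """A conductor with a recent heartbeat is alive, even when offline,
    so its reserved nodes are not treated as orphaned."""
    return now - record.updated_at <= HEARTBEAT_TIMEOUT


def ring_members(records, now):
    """Only conductors that are alive and online receive new tasks."""
    return [r.hostname for r in records if r.online and is_alive(r, now)]
```

In this model a draining conductor stays out of ``ring_members`` while
``is_alive`` remains true for as long as it keeps heartbeating.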

When this feature was proposed, ``SIGINT`` was suggested as the signal
to use to trigger a drain shutdown. However, this is already used by
oslo_service for fast exit[1], so using it for drain would be a change
in existing behaviour.

[1] https://opendev.org/openstack/oslo.service/src/branch/master/oslo_service/service.py#L340

Change-Id: I777898f5a14844c9ac9967168f33d55c4f97dfb9
2023-11-13 10:38:18 +13:00