This script always restarted the local sriov device plugin pod, which
could result in sriov pods not starting properly.
Originally, the following sequence of commands did not work properly if
the device plugin was already running:
    kubectl delete pods -n kube-system --selector=app=sriovdp \
        --field-selector=spec.nodeName=${HOST} --wait=false
    kubectl wait pods -n kube-system --selector=app=sriovdp \
        --field-selector=spec.nodeName=${HOST} --for=condition=Ready \
        --timeout=360s
Result when the device plugin is running:
    pod "kube-sriov-device-plugin-amd64-rbjpw" deleted
    pod/kube-sriov-device-plugin-amd64-rbjpw condition met
The wait command succeeds against the pod that was just deleted, so the
script continues. It then deletes the labeled pods without having
confirmed that the device plugin is running, which can result in sriov
pods not starting properly.
Restarting the device plugin pod only when it is not already running
prevents the wait condition from passing immediately.
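A minimal sketch of the guarded restart described above, not the exact
change in this commit. The function names are hypothetical, HOST is
assumed to be set by the surrounding script as in the original commands,
and the status.phase=Running field selector is an assumed approximation
of "the device plugin is running" (a pod in the Running phase is not
necessarily Ready):

```shell
#!/bin/bash
# Sketch only: function names are hypothetical, and HOST is assumed to be
# set by the calling script, as in the original command sequence.

# Returns 0 if a sriovdp pod in the Running phase exists on ${HOST}.
# Assumption: the Running phase is a good enough proxy for "running".
device_plugin_running() {
    local pods
    pods=$(kubectl get pods -n kube-system --selector=app=sriovdp \
        --field-selector=spec.nodeName=${HOST},status.phase=Running \
        --no-headers 2>/dev/null)
    [ -n "${pods}" ]
}

# Delete and wait for the device plugin pod only when it is not running,
# so the wait is never evaluated against a pod that was just deleted.
restart_device_plugin_if_needed() {
    if device_plugin_running; then
        echo "sriov device plugin already running on ${HOST}, not restarting"
        return 0
    fi
    kubectl delete pods -n kube-system --selector=app=sriovdp \
        --field-selector=spec.nodeName=${HOST} --wait=false
    kubectl wait pods -n kube-system --selector=app=sriovdp \
        --field-selector=spec.nodeName=${HOST} --for=condition=Ready \
        --timeout=360s
}
```

With this guard, the delete and wait pair only runs when no running
device plugin pod is found on the node, so the labeled pods are not
deleted on the strength of a Ready condition reported by a pod that is
already being removed.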
Closes-Bug: 1928965
Signed-off-by: Cole Walker <cole.walker@windriver.com>
Change-Id: I1cc576b26a4bba4eba4a088d33f918bb07ef3b0d