Gate: avoid failing before Nodes are defined

This is a workaround for the Kubernetes issue linked below, which has
caused gate failures a few times. The issue is a race condition between
a new apiserver becoming live and its resource definitions being fully
loaded, which results in a "no resource type node" kind of error when
running `kubectl wait node`.

This refactors the "loop until apiserver is ready" logic to poll with
`kubectl get node` instead of `kubectl version`, so that "no resource
found" errors are treated the same as apiserver unavailability: both
result in another round of `sleep`.
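
The retry pattern described above can be sketched as a small reusable
function. This is only an illustration, not code from this commit: the
name `retry_until_success` and its argument order are hypothetical, and
it assumes bash (for `local`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): run a command until it
# succeeds, sleeping between attempts, and give up after max_retry tries.
retry_until_success() {
  local max_retry="$1" delay="$2"
  shift 2
  local n=0
  until [ "$n" -ge "$max_retry" ]; do
    # Any non-zero exit status counts as "not ready yet", whether the
    # apiserver is unreachable or the resource type is not yet defined.
    if "$@"; then
      return 0
    fi
    n=$((n+1))
    echo "$n: Retrying: $*" >&2
    sleep "$delay"
  done
  # All attempts failed.
  return 1
}
```

With this sketch, the loop in the script could be expressed as
`retry_until_success 30 60 timeout 20 kubectl --kubeconfig /tmp/targetkubeconfig get node || exit 1`.
Keying on the exit status of the probe command, rather than parsing its
output, is what lets both failure modes share one retry path.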

https://github.com/kubernetes/kubernetes/issues/83242

Change-Id: I9b1aa0c0c12bbc9399e5a1f22390074319151df3
Matt McEuen 2020-08-07 12:06:30 -05:00
parent 53e4f7ffc6
commit b7d2db0a96


@@ -78,30 +78,27 @@ echo ${KUBECONFIG} | base64 -d > /tmp/targetkubeconfig
 echo "Import target kubeconfig"
 airshipctl config import /tmp/targetkubeconfig
-echo "Check kubectl version"
-VERSION=""
+echo "Wait for apiserver to become available"
 N=0
 MAX_RETRY=30
 DELAY=60
 until [ "$N" -ge ${MAX_RETRY} ]
 do
-  VERSION=$(timeout 20 kubectl --kubeconfig /tmp/targetkubeconfig version | grep 'Server Version' || true)
-  if [[ ! -z "$VERSION" ]]; then
+  if timeout 20 kubectl --kubeconfig /tmp/targetkubeconfig get node; then
     break
   fi
   N=$((N+1))
-  echo "$N: Retry to get kubectl version."
+  echo "$N: Retrying to reach the apiserver"
   sleep ${DELAY}
 done
-if [[ -z "$VERSION" ]]; then
-  echo "Could not get kubectl version."
+if [ "$N" -ge ${MAX_RETRY} ]; then
+  echo "Could not reach the apiserver"
   exit 1
 fi
-echo "Check nodes status"
+echo "Wait for nodes to become Ready"
 kubectl --kubeconfig /tmp/targetkubeconfig wait --for=condition=Ready node --all --timeout 900s
 echo "Get cluster state"