Increasing MariaDB startupProbe timeout

After an uncontrolled reboot on a Standard configuration, the MariaDB pods kept trying to elect a leader and restarting until they ended up in CrashLoopBackOff. After checking the logs closely and reproducing the problem by deleting both pods at the same time, we concluded that the cluster did not have enough time to elect a new leader and recover from the crash.

This patch increases the startup probe timeout of the mariadb statefulset, with some slack to allow production databases to fully resync the data between the 2 pods.

Closes-Bug: #1938346
Signed-off-by: Thiago Brito <thiago.brito@windriver.com>
Change-Id: I19e49dab55f3a8661fa71be315093029adb0947e
2021-07-28 19:09:37 -03:00
commit 52b3185a19
parent 31c4390122
1 changed file with 3 additions and 3 deletions
--- a/openstack-helm-infra/files/0009-Enable-override-of-mariadb-server-probe-parameters.patch
+++ b/openstack-helm-infra/files/0009-Enable-override-of-mariadb-server-probe-parameters.patch
@@ -47,9 +47,9 @@ index 2d75f39..444bba3 100644
 +        startup:
 +          enabled: false
 +          params:
-+            initialDelaySeconds: 30
++            initialDelaySeconds: 60
-+            periodSeconds: 30
++            periodSeconds: 60
-+            failureThreshold: 3
++            failureThreshold: 10
   security_context:
     server:
       pod:
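
For a sense of how much slack this change adds, the startup window can be estimated as initialDelaySeconds + failureThreshold * periodSeconds. This is a rough sketch, not part of the patch: the helper name `startup_budget_seconds` is hypothetical, and the formula is an approximation that ignores `timeoutSeconds` and kubelet scheduling jitter.

```python
def startup_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate time a container has to pass its startup probe.

    Assumption: the kubelet waits initial_delay seconds, then probes every
    period seconds and gives up after failure_threshold consecutive failures.
    """
    return initial_delay + period * failure_threshold


# Values taken from the diff above.
old_budget = startup_budget_seconds(initial_delay=30, period=30, failure_threshold=3)
new_budget = startup_budget_seconds(initial_delay=60, period=60, failure_threshold=10)

print(f"old: ~{old_budget}s, new: ~{new_budget}s")
```

Under these assumptions the window grows from about 2 minutes to about 11 minutes, which gives the two pods more time to elect a leader and resync. Note that the startup probe shown in this hunk has `enabled: false`, so the new values only take effect where the probe is enabled through an override.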