Improvement of ProxySQL Monitoring Configuration

This update enhances the monitoring of the database cluster
in ProxySQL. The default monitoring intervals were insufficient
for reliably detecting failures in the Galera cluster environment.

A detailed configuration for monitoring intervals has been
introduced, providing better control over how quickly and accurately
ProxySQL can identify issues.

  - Variables such as `mariadb_monitor_connect_interval`,
    `mariadb_monitor_galera_healthcheck_interval`, and
    `mariadb_monitor_ping_interval` significantly reduce
    the time between connection checks.

  - Timeouts like `mariadb_monitor_galera_healthcheck_timeout`
    and `mariadb_monitor_ping_timeout` allow faster failure
    detection, while `mariadb_monitor_galera_healthcheck_max_timeout_count`
    sets the maximum number of allowed timeouts before marking a node as down.

Calculation:

 - Galera healthcheck:

   4 seconds (interval) + 1 second (timeout) + 4 seconds (interval)
   + 1 second (timeout) = 10 seconds.

 - Ping healthcheck:

   3 seconds (interval) + 2 seconds (timeout) + 3 seconds (interval)
   + 2 seconds (timeout) = 10 seconds.

The Galera health check and the ping check each detect a node failure
within at most 10 seconds. The two checks operate independently, and
a failure in either one marks the node as failed.
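
The arithmetic above can be reproduced with a minimal sketch. It assumes
the worst case is simply (interval + timeout) repeated once per allowed
failure, as in the calculation above. The helper name
`worst_case_detection_ms` is hypothetical and is not part of ProxySQL
or this change.

```python
def worst_case_detection_ms(interval_ms: int, timeout_ms: int, max_failures: int) -> int:
    """Upper bound on detection time under the assumption stated above:
    every counted failure costs one full interval plus one full timeout."""
    return max_failures * (interval_ms + timeout_ms)


# Values taken from the defaults introduced by this change (milliseconds).
galera_ms = worst_case_detection_ms(interval_ms=4000, timeout_ms=1000, max_failures=2)
ping_ms = worst_case_detection_ms(interval_ms=3000, timeout_ms=2000, max_failures=2)

print(f"Galera healthcheck: {galera_ms / 1000:.0f} s")
print(f"Ping healthcheck:   {ping_ms / 1000:.0f} s")
```

Changing any of the three inputs shows how an override would shift the
detection window.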

Health Check Failure Detection: Up to 10 seconds.
Ping Failure Detection: Up to 10 seconds.
Connect Attempts: ProxySQL also tries to connect every 2 seconds, which
helps monitor connectivity.

These changes ensure that ProxySQL detects issues within 10 seconds,
as haproxy does, which significantly reduces downtime compared to the
default settings. Monitoring in production environments becomes faster
and more reliable, improving system stability.

Change-Id: Ic28801519cdb35ed2387a1468b9df661847a5476
Michal Arbet 2024-09-20 18:40:36 +02:00
parent 7723a6f49c
commit 7989756699
3 changed files with 26 additions and 0 deletions

@@ -480,7 +480,15 @@ mariadb_wsrep_port: "4567"
mariadb_ist_port: "4568"
mariadb_sst_port: "4444"
mariadb_clustercheck_port: "4569"
mariadb_monitor_user: "{{ 'monitor' if enable_proxysql | bool else 'haproxy' }}"
mariadb_monitor_connect_interval: "2000"
mariadb_monitor_galera_healthcheck_interval: "4000"
mariadb_monitor_galera_healthcheck_timeout: "1000"
mariadb_monitor_galera_healthcheck_max_timeout_count: "2"
mariadb_monitor_ping_interval: "3000"
mariadb_monitor_ping_timeout: "2000"
mariadb_monitor_ping_max_failures: "2"
mariadb_datadir_volume: "mariadb"

@@ -22,6 +22,13 @@ mysql_variables:
interfaces: "{{ kolla_internal_vip_address | put_address_in_context('url') }}:{{ database_port }}"
monitor_username: "{{ mariadb_monitor_user }}"
monitor_password: "{{ mariadb_monitor_password }}"
monitor_connect_interval: "{{ mariadb_monitor_connect_interval }}"
monitor_galera_healthcheck_interval: "{{ mariadb_monitor_galera_healthcheck_interval }}"
monitor_galera_healthcheck_timeout: "{{ mariadb_monitor_galera_healthcheck_timeout }}"
monitor_galera_healthcheck_max_timeout_count: "{{ mariadb_monitor_galera_healthcheck_max_timeout_count }}"
monitor_ping_interval: "{{ mariadb_monitor_ping_interval }}"
monitor_ping_timeout: "{{ mariadb_monitor_ping_timeout }}"
monitor_ping_max_failures: "{{ mariadb_monitor_ping_max_failures }}"
mysql_servers:
{% for shard_id, shard in mariadb_shards_info.shards.items() %}

@@ -0,0 +1,11 @@
---
features:
- |
Introduces new variables ``mariadb_monitor_connect_interval``,
``mariadb_monitor_galera_healthcheck_interval``,
``mariadb_monitor_galera_healthcheck_timeout``,
``mariadb_monitor_galera_healthcheck_max_timeout_count``,
``mariadb_monitor_ping_interval``, ``mariadb_monitor_ping_timeout``,
and ``mariadb_monitor_ping_max_failures``.
These allow faster detection of issues in Galera clusters,
reducing downtime to 10 seconds.