Merge "Consolidating galera recovery documentation"
commit 5e76a9283d
@ -19,6 +19,277 @@ entire environment.
Upon completion of this command the cluster should be back online and in
a functional state.

Single-node failure
~~~~~~~~~~~~~~~~~~~

If a single node fails, the other nodes maintain quorum and continue to
process SQL requests.

#. Run the following Ansible command to determine the failed node:

   .. code-block:: shell-session

       # ansible galera_container -m shell -a "mysql -h localhost \
       -e 'show status like \"%wsrep_cluster_%\";'"
       node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
       ERROR 2002 (HY000): Can't connect to local MySQL server through
       socket '/var/run/mysqld/mysqld.sock' (111)

       node2_galera_container-49a47d25 | success | rc=0 >>
       Variable_name             Value
       wsrep_cluster_conf_id     17
       wsrep_cluster_size        3
       wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
       wsrep_cluster_status      Primary

       node4_galera_container-76275635 | success | rc=0 >>
       Variable_name             Value
       wsrep_cluster_conf_id     17
       wsrep_cluster_size        3
       wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
       wsrep_cluster_status      Primary

   In this example, node 3 has failed.

#. Restart MariaDB on the failed node and verify that it rejoins the
   cluster.

#. If MariaDB fails to start, run the **mysqld** command and perform
   further analysis on the output. As a last resort, rebuild the
   container for the node.
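
The check in the first step can be scripted. The following is a minimal
sketch, not part of OpenStack-Ansible: ``failed_galera_nodes`` is a
hypothetical helper name, and it assumes the Ansible host header lines look
exactly like the sample output above (``<host> | FAILED | rc=1 >>``). Other
Ansible versions may format this line differently.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not an OpenStack-Ansible tool).
# Reads the output of the ad-hoc "show status" command on stdin and prints
# the name of every container whose header line reports FAILED.
# Assumption: header lines have the form "<host> | FAILED | rc=1 >>".
failed_galera_nodes() {
    # Split each line on " | " and keep the host name when the second
    # field is exactly FAILED.
    awk -F' [|] ' '$2 == "FAILED" { print $1 }'
}
```

Piping the output of the status command from the first step into
``failed_galera_nodes`` would print one container name per failed node,
which can then be used to decide where MariaDB needs to be restarted.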

Multi-node failure
~~~~~~~~~~~~~~~~~~

When all but one of the nodes fail, the remaining node cannot achieve
quorum and stops processing SQL requests. In this situation, failed nodes
that recover cannot join the cluster because it no longer exists.

#. Run the following Ansible command to show the failed nodes:

   .. code-block:: shell-session

       # ansible galera_container -m shell -a "mysql \
       -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
       node2_galera_container-49a47d25 | FAILED | rc=1 >>
       ERROR 2002 (HY000): Can't connect to local MySQL server
       through socket '/var/run/mysqld/mysqld.sock' (111)

       node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
       ERROR 2002 (HY000): Can't connect to local MySQL server
       through socket '/var/run/mysqld/mysqld.sock' (111)

       node4_galera_container-76275635 | success | rc=0 >>
       Variable_name             Value
       wsrep_cluster_conf_id     18446744073709551615
       wsrep_cluster_size        1
       wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
       wsrep_cluster_status      non-Primary

   In this example, nodes 2 and 3 have failed. The remaining operational
   server indicates ``non-Primary`` because it cannot achieve quorum.

#. Run the following command to
   `rebootstrap <http://galeracluster.com/documentation-webpages/quorumreset.html#id1>`_
   the operational node into the cluster.

   .. code-block:: shell-session

       # mysql -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=yes';"
       node4_galera_container-76275635 | success | rc=0 >>
       Variable_name             Value
       wsrep_cluster_conf_id     15
       wsrep_cluster_size        1
       wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
       wsrep_cluster_status      Primary

       node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
       ERROR 2002 (HY000): Can't connect to local MySQL server
       through socket '/var/run/mysqld/mysqld.sock' (111)

       node2_galera_container-49a47d25 | FAILED | rc=1 >>
       ERROR 2002 (HY000): Can't connect to local MySQL server
       through socket '/var/run/mysqld/mysqld.sock' (111)

   The remaining operational node becomes the primary node and begins
   processing SQL requests.

#. Restart MariaDB on the failed nodes and verify that they rejoin the
   cluster.

   .. code-block:: shell-session

       # ansible galera_container -m shell -a "mysql \
       -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
       node3_galera_container-3ea2cbd3 | success | rc=0 >>
       Variable_name             Value
       wsrep_cluster_conf_id     17
       wsrep_cluster_size        3
       wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
       wsrep_cluster_status      Primary

       node2_galera_container-49a47d25 | success | rc=0 >>
       Variable_name             Value
       wsrep_cluster_conf_id     17
       wsrep_cluster_size        3
       wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
       wsrep_cluster_status      Primary

       node4_galera_container-76275635 | success | rc=0 >>
       Variable_name             Value
       wsrep_cluster_conf_id     17
       wsrep_cluster_size        3
       wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
       wsrep_cluster_status      Primary

#. If MariaDB fails to start on any of the failed nodes, run the
   **mysqld** command and perform further analysis on the output. As a
   last resort, rebuild the container for the node.
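
The "verify that they rejoin the cluster" step can be checked mechanically.
The sketch below is an illustration only: ``galera_cluster_healthy`` is a
hypothetical helper, the expected cluster size is passed in by the operator,
and the parsing assumes the status output has the same shape as the samples
in this section.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not an OpenStack-Ansible tool).
# Usage: <status output> | galera_cluster_healthy EXPECTED_SIZE
# Succeeds (exit 0) only if no host is reported as FAILED, every reporting
# node shows wsrep_cluster_status "Primary", and every reporting node shows
# a wsrep_cluster_size equal to EXPECTED_SIZE.
galera_cluster_healthy() {
    awk -v want="$1" '
        / [|] FAILED [|] /                              { bad = 1 }
        $1 == "wsrep_cluster_status" && $2 != "Primary" { bad = 1 }
        $1 == "wsrep_cluster_size"   && $2 != want      { bad = 1 }
        $1 == "wsrep_cluster_status"                    { seen = 1 }
        # Fail when anything looked wrong or no status line was seen at all.
        END { exit (bad || !seen) }
    '
}
```

For a three-node cluster, piping the status command into
``galera_cluster_healthy 3`` would fail for the ``non-Primary`` output in
the first step and succeed once all three nodes report ``Primary`` with a
cluster size of 3.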

Complete failure
~~~~~~~~~~~~~~~~

If all of the nodes in a Galera cluster fail (do not shut down
gracefully), then the integrity of the database can no longer be
guaranteed and the database should be restored from backup. Run the
following command to determine if all nodes in the cluster have failed:

.. code-block:: shell-session

    # ansible galera_container -m shell -a "cat /var/lib/mysql/grastate.dat"
    node3_galera_container-3ea2cbd3 | success | rc=0 >>
    # GALERA saved state
    version: 2.1
    uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
    seqno: -1
    cert_index:

    node2_galera_container-49a47d25 | success | rc=0 >>
    # GALERA saved state
    version: 2.1
    uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
    seqno: -1
    cert_index:

    node4_galera_container-76275635 | success | rc=0 >>
    # GALERA saved state
    version: 2.1
    uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
    seqno: -1
    cert_index:

All the nodes have failed if ``mysqld`` is not running on any of the
nodes and all of the nodes contain a ``seqno`` value of -1.

If any single node has a positive ``seqno`` value, then that node can be
used to restart the cluster. However, because there is no guarantee that
each node has an identical copy of the data, it is not recommended to
restart the cluster using the **--wsrep-new-cluster** option on one
node.
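
The ``seqno`` inspection described above can be sketched as a small filter.
This is an assumption-laden illustration rather than a supported tool:
``nodes_with_positive_seqno`` is a hypothetical name, and the parsing relies
on the ``grastate.dat`` output being grouped under Ansible host header lines
as in the sample.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not an OpenStack-Ansible tool).
# Reads the output of the ad-hoc "cat /var/lib/mysql/grastate.dat" command
# on stdin. Prints "<host> <seqno>" for every node whose saved seqno is
# positive, and exits non-zero when no such node exists.
nodes_with_positive_seqno() {
    awk '
        # An Ansible host header line such as "<host> | success | rc=0 >>".
        / [|] /                      { node = $1 }
        $1 == "seqno:" && $2 + 0 > 0 { print node, $2; found = 1 }
        END                          { exit !found }
    '
}
```

With the sample output above, where every node reports ``seqno: -1``, the
function prints nothing and returns a non-zero status, which matches the
"all nodes have failed" case. Whether a node it does list is safe to restart
the cluster from remains an operator decision.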

Rebuilding a container
~~~~~~~~~~~~~~~~~~~~~~

Sometimes recovering from a failure requires rebuilding one or more
containers.

#. Disable the failed node on the load balancer.

   Do not rely on the load balancer health checks to disable the node.
   If the node is not disabled, the load balancer will send SQL requests
   to it before it rejoins the cluster and cause data inconsistencies.

#. Use the following commands to destroy the container and remove
   MariaDB data stored outside of the container. In this example, node 3
   failed.

   .. code-block:: shell-session

       # lxc-stop -n node3_galera_container-3ea2cbd3
       # lxc-destroy -n node3_galera_container-3ea2cbd3
       # rm -rf /openstack/node3_galera_container-3ea2cbd3/*

#. Run the host setup playbook to rebuild the container specifically on
   node 3:

   .. code-block:: shell-session

       # openstack-ansible setup-hosts.yml -l node3 \
       -l node3_galera_container-3ea2cbd3

   The playbook will also restart all other containers on the node.

#. Run the infrastructure playbook to configure the container
   specifically on node 3:

   .. code-block:: shell-session

       # openstack-ansible infrastructure-setup.yml \
       -l node3_galera_container-3ea2cbd3

   The new container runs a single-node Galera cluster, a dangerous
   state because the environment contains more than one active database
   with potentially different data.

   .. code-block:: shell-session

       # ansible galera_container -m shell -a "mysql \
       -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
       node3_galera_container-3ea2cbd3 | success | rc=0 >>
       Variable_name             Value
       wsrep_cluster_conf_id     1
       wsrep_cluster_size        1
       wsrep_cluster_state_uuid  da078d01-29e5-11e4-a051-03d896dbdb2d
       wsrep_cluster_status      Primary

       node2_galera_container-49a47d25 | success | rc=0 >>
       Variable_name             Value
       wsrep_cluster_conf_id     4
       wsrep_cluster_size        2
       wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
       wsrep_cluster_status      Primary

       node4_galera_container-76275635 | success | rc=0 >>
       Variable_name             Value
       wsrep_cluster_conf_id     4
       wsrep_cluster_size        2
       wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
       wsrep_cluster_status      Primary

#. Restart MariaDB in the new container and verify that it rejoins the
   cluster.

   .. code-block:: shell-session

       # ansible galera_container -m shell -a "mysql \
       -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
       node2_galera_container-49a47d25 | success | rc=0 >>
       Variable_name             Value
       wsrep_cluster_conf_id     5
       wsrep_cluster_size        3
       wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
       wsrep_cluster_status      Primary

       node3_galera_container-3ea2cbd3 | success | rc=0 >>
       Variable_name             Value
       wsrep_cluster_conf_id     5
       wsrep_cluster_size        3
       wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
       wsrep_cluster_status      Primary

       node4_galera_container-76275635 | success | rc=0 >>
       Variable_name             Value
       wsrep_cluster_conf_id     5
       wsrep_cluster_size        3
       wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
       wsrep_cluster_status      Primary

#. Enable the failed node on the load balancer.
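
The dangerous state described in this procedure, where the rebuilt container
runs its own single-node cluster, shows up as more than one distinct
``wsrep_cluster_state_uuid`` in the status output. A minimal sketch of that
comparison follows; ``galera_single_cluster`` is a hypothetical helper name
and the parsing assumes the output format shown in the samples.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not an OpenStack-Ansible tool).
# Reads the output of the ad-hoc "show status" command on stdin and
# succeeds (exit 0) only if all reporting nodes share exactly one
# wsrep_cluster_state_uuid value.
galera_single_cluster() {
    awk '
        # Count each distinct cluster state UUID once.
        $1 == "wsrep_cluster_state_uuid" && !($2 in seen) { seen[$2] = 1; n++ }
        END { exit (n != 1) }
    '
}
```

Run against the output taken right after the infrastructure playbook, the
check fails because two UUIDs are present. After MariaDB is restarted in the
new container and it rejoins, all nodes report the same UUID and the check
passes, which is a reasonable point to re-enable the node on the load
balancer.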

--------------

@ -8,10 +8,6 @@ Galera cluster maintenance
   ops-galera-remove.rst
   ops-galera-start.rst
   ops-galera-recovery.rst
   ops-galera-recoverysingle.rst
   ops-galera-recoverymulti.rst
   ops-galera-recoverycomplete.rst
   ops-galera-recoverycontainer.rst

Routine maintenance includes gracefully adding or removing nodes from
the cluster without impacting operation and also starting a cluster