Docs: Ops section - cleanup

As per discussion in the OSA docs summit session, clean up of the
installation guide. This fixes typos, minor RST markup changes, and
passive voice.

Change-Id: Ibacaabddafee465a05bcb6eec01dd3ef04b33826
parent 4c393d8b01
commit 7a82904d61
@@ -1,7 +1,8 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

+=====================
 Adding a compute host
----------------------
+=====================

 Use the following procedure to add a compute host to an operational
 cluster.
@@ -14,8 +15,8 @@ cluster.

    If necessary, also modify the ``used_ips`` stanza.

-#. If the cluster is utilizing Ceilometer, it will be necessary to edit the
-   ``/etc/openstack_deploy/conf.d/ceilometer.yml`` file and add the host to
+#. If the cluster is utilizing Telemetry/Metering (Ceilometer),
+   edit the ``/etc/openstack_deploy/conf.d/ceilometer.yml`` file and add the host to
    the ``metering-compute_hosts`` stanza.

 #. Run the following commands to add the host. Replace
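The commands referenced by the last item fall outside this hunk. As a rough sketch only, a compute host is usually added by limiting the standard playbooks to the new host; the playbook directory and the NEW_HOST placeholder below are assumptions, not part of this change:

    cd /opt/openstack-ansible/playbooks
    # configure the new host and build its containers
    openstack-ansible setup-hosts.yml --limit NEW_HOST
    # deploy the OpenStack services, including nova-compute, to it
    openstack-ansible setup-openstack.yml --limit NEW_HOST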
@@ -1,12 +1,12 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

+=======================
 Galera cluster recovery
------------------------
+=======================

-When one or all nodes fail within a galera cluster you may need to
-re-bootstrap the environment. To make take advantage of the
-automation Ansible provides simply execute the ``galera-install.yml``
-play using the **galera-bootstrap** to auto recover a node or an
+Run the ``galera-bootstrap`` playbook to automatically recover
+a node or an entire environment. Run the ``galera-install`` playbook
+using the ``galera-bootstrap`` tag to auto recover a node or an
 entire environment.

 #. Run the following Ansible command to show the failed nodes:
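The Ansible command referenced by the final line of this hunk is cut off by the hunk boundary. In this guide it is normally a ``wsrep`` status query run across the Galera containers, along these lines (the exact module arguments here are an assumption):

    # query cluster status from every Galera container
    ansible galera_container -m shell \
      -a "mysql -h localhost -e 'show status like \"%wsrep_cluster_%\";'"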
@@ -15,15 +15,13 @@ entire environment.

    # openstack-ansible galera-install.yml --tags galera-bootstrap

-Upon completion of this command the cluster should be back online an in
-a functional state.
+The cluster comes back online after completion of this command.

 Single-node failure
 ~~~~~~~~~~~~~~~~~~~

-If a single node fails, the other nodes maintain quorum and continue to
-process SQL requests.
+If a single node fails, the other nodes maintain quorum and
+continue to process SQL requests.

 #. Run the following Ansible command to determine the failed node:

@@ -55,15 +53,15 @@ process SQL requests.
 #. Restart MariaDB on the failed node and verify that it rejoins the
    cluster.

-#. If MariaDB fails to start, run the **mysqld** command and perform
-   further analysis on the output. As a last resort, rebuild the
-   container for the node.
+#. If MariaDB fails to start, run the ``mysqld`` command and perform
+   further analysis on the output. As a last resort, rebuild the container
+   for the node.

 Multi-node failure
 ~~~~~~~~~~~~~~~~~~

-When all but one node fails, the remaining node cannot achieve quorum
-and stops processing SQL requests. In this situation, failed nodes that
+When all but one node fails, the remaining node cannot achieve quorum and
+stops processing SQL requests. In this situation, failed nodes that
 recover cannot join the cluster because it no longer exists.

 #. Run the following Ansible command to show the failed nodes:
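Where the text above says to run the ``mysqld`` command for further analysis, the usual approach is to start the daemon in the foreground so errors print to the terminal; ``--wsrep-recover`` is the standard Galera option for recovering the last committed position. Both invocations are illustrative and not part of this commit:

    # run mysqld in the foreground as the mysql user and read its error output
    /usr/sbin/mysqld --user=mysql
    # or only recover the last committed transaction position
    /usr/sbin/mysqld --wsrep-recover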
@@ -92,7 +90,7 @@ recover cannot join the cluster because it no longer exists.

 #. Run the following command to
    `rebootstrap <http://galeracluster.com/documentation-webpages/quorumreset.html#id1>`_
-   the operational node into the cluster.
+   the operational node into the cluster:

    .. code-block:: shell-session

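The command elided after this hunk performs the Galera quorum reset described at the linked page. On the remaining operational node it is usually the ``pc.bootstrap`` provider option, shown here for context rather than as part of the change:

    # force the surviving node to form a new primary component
    mysql -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=yes';"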
@@ -116,7 +114,7 @@ recover cannot join the cluster because it no longer exists.
    processing SQL requests.

 #. Restart MariaDB on the failed nodes and verify that they rejoin the
-   cluster.
+   cluster:

    .. code-block:: shell-session

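For context, restarting MariaDB on a failed node and confirming that it has rejoined typically looks like the following; the service name and status query are assumptions based on a stock Ubuntu MariaDB Galera setup:

    # on each previously failed node
    service mysql start
    # confirm the node reports itself as part of the primary component
    mysql -e "SHOW STATUS LIKE 'wsrep_cluster_status';"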
@@ -144,16 +142,15 @@ recover cannot join the cluster because it no longer exists.
       wsrep_cluster_status Primary

 #. If MariaDB fails to start on any of the failed nodes, run the
-   **mysqld** command and perform further analysis on the output. As a
+   ``mysqld`` command and perform further analysis on the output. As a
    last resort, rebuild the container for the node.

 Complete failure
 ~~~~~~~~~~~~~~~~

-If all of the nodes in a Galera cluster fail (do not shutdown
-gracefully), then the integrity of the database can no longer be
-guaranteed and should be restored from backup. Run the following command
-to determine if all nodes in the cluster have failed:
+Restore from backup if all of the nodes in a Galera cluster fail (do not shut down
+gracefully). Run the following command to determine if all nodes in the
+cluster have failed:

 .. code-block:: shell-session

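The command referenced at the end of this hunk reads each node's ``grastate.dat`` file so that the ``seqno`` values can be compared. Across all Galera containers this is typically (invocation assumed):

    ansible galera_container -m shell -a "cat /var/lib/mysql/grastate.dat"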
@@ -185,34 +182,35 @@ nodes and all of the nodes contain a ``seqno`` value of -1.

 If any single node has a positive ``seqno`` value, then that node can be
 used to restart the cluster. However, because there is no guarantee that
-each node has an identical copy of the data, it is not recommended to
-restart the cluster using the **--wsrep-new-cluster** command on one
+each node has an identical copy of the data, we do not recommend
+restarting the cluster using the ``--wsrep-new-cluster`` command on one
 node.

 Rebuilding a container
 ~~~~~~~~~~~~~~~~~~~~~~

-Sometimes recovering from a failure requires rebuilding one or more
-containers.
+Recovering from certain failures requires rebuilding one or more containers.

 #. Disable the failed node on the load balancer.

-   Do not rely on the load balancer health checks to disable the node.
-   If the node is not disabled, the load balancer will send SQL requests
-   to it before it rejoins the cluster and cause data inconsistencies.
+   .. note::
+
+      Do not rely on the load balancer health checks to disable the node.
+      If the node is not disabled, the load balancer sends SQL requests
+      to it before it rejoins the cluster, causing data inconsistencies.

-#. Use the following commands to destroy the container and remove
-   MariaDB data stored outside of the container. In this example, node 3
-   failed.
+#. Destroy the container and remove MariaDB data stored outside
+   of the container:

    .. code-block:: shell-session

       # lxc-stop -n node3_galera_container-3ea2cbd3
       # lxc-destroy -n node3_galera_container-3ea2cbd3
       # rm -rf /openstack/node3_galera_container-3ea2cbd3/*

-#. Run the host setup playbook to rebuild the container specifically on
-   node 3:
+   In this example, node 3 failed.
+
+#. Run the host setup playbook to rebuild the container on node 3:

    .. code-block:: shell-session

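On the load-balancer step above, health checks alone are not enough. If HAProxy fronts the cluster, the node can be disabled explicitly through the admin socket; the socket path and the backend/server names below are purely illustrative:

    echo "disable server galera/node3_galera_container-3ea2cbd3" | \
      socat /var/run/haproxy.stat stdio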
@@ -220,7 +218,7 @@ containers.
         -l node3_galera_container-3ea2cbd3


-   The playbook will also restart all other containers on the node.
+   The playbook restarts all other containers on the node.

 #. Run the infrastructure playbook to configure the container
    specifically on node 3:
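The ``-l node3_galera_container-3ea2cbd3`` context line above is the continuation of the playbook invocation that the hunk cuts off. A plausible full command (the first ``-l node3`` host limit is an assumption) is:

    openstack-ansible setup-hosts.yml -l node3 \
        -l node3_galera_container-3ea2cbd3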
@@ -231,9 +229,11 @@ containers.
         -l node3_galera_container-3ea2cbd3


-   The new container runs a single-node Galera cluster, a dangerous
-   state because the environment contains more than one active database
-   with potentially different data.
+   .. warning::
+
+      The new container runs a single-node Galera cluster, which is a dangerous
+      state because the environment contains more than one active database
+      with potentially different data.

    .. code-block:: shell-session

@@ -1,7 +1,8 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

+==============
 Removing nodes
---------------
+==============

 In the following example, all but one node was shut down gracefully:

@@ -1,15 +1,15 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

+==================
 Starting a cluster
------------------
+==================

 Gracefully shutting down all nodes destroys the cluster. Starting or
 restarting a cluster from zero nodes requires creating a new cluster on
 one of the nodes.

-#. The new cluster should be started on the most advanced node. Run the
-   following command to check the ``seqno`` value in the
-   ``grastate.dat`` file on all of the nodes:
+#. Start a new cluster on the most advanced node.
+   Check the ``seqno`` value in the ``grastate.dat`` file on all of the nodes:

    .. code-block:: shell-session

@@ -33,7 +33,7 @@ one of the nodes.
    cert_index:

 In this example, all nodes in the cluster contain the same positive
-``seqno`` values because they were synchronized just prior to
+``seqno`` values as they were synchronized just prior to
 graceful shutdown. If all ``seqno`` values are equal, any node can
 start the new cluster.

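Once the most advanced node (the one with the highest ``seqno``) is identified, a new cluster is normally bootstrapped from that node with the ``--wsrep-new-cluster`` option mentioned earlier in this file. The exact service invocation varies by distribution and is an assumption here:

    # run only on the most advanced node
    service mysql start --wsrep-new-cluster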
@@ -1,7 +1,8 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

+==========================
 Galera cluster maintenance
---------------------------
+==========================

 .. toctree::

@@ -13,8 +14,8 @@ Routine maintenance includes gracefully adding or removing nodes from
 the cluster without impacting operation and also starting a cluster
 after gracefully shutting down all nodes.

-MySQL instances are restarted when creating a cluster, adding a
-node, the service isn't running, or when changes are made to the
+MySQL instances are restarted when creating a cluster, when adding a
+node, when the service is not running, or when changes are made to the
 ``/etc/mysql/my.cnf`` configuration file.

 --------------
@@ -1,15 +1,16 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

-Centralized Logging
--------------------
+===================
+Centralized logging
+===================

-OpenStack-Ansible will configure all instances to send syslog data to a
+OpenStack-Ansible configures all instances to send syslog data to a
 container (or group of containers) running rsyslog. The rsyslog server
 containers are specified in the ``log_hosts`` section of the
 ``openstack_user_config.yml`` file.

 The rsyslog server container(s) have logrotate installed and configured with
 a 14 day retention. All rotated logs are compressed by default.

 Finding logs
 ~~~~~~~~~~~~
@@ -18,10 +19,10 @@ Logs are accessible in multiple locations within an OpenStack-Ansible
 deployment:

 * The rsyslog server container collects logs in ``/var/log/log-storage`` within
-  directories named after the container or physical host
+  directories named after the container or physical host.
 * Each physical host has the logs from its service containers mounted at
-  ``/openstack/log/``
-* Each service container has its own logs stored at ``/var/log/<service_name>``
+  ``/openstack/log/``.
+* Each service container has its own logs stored at ``/var/log/<service_name>``.

 --------------

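For a quick look at the locations listed above, both of the following work from a shell (output naturally differs per deployment):

    # on a physical host: per-container log directories
    ls /openstack/log/
    # inside the rsyslog container: aggregated logs per host and container
    ls /var/log/log-storage/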
@@ -13,8 +13,9 @@ All LXC containers on the host have two virtual Ethernet interfaces:
 * `eth1` in the container connects to `br-mgmt` on the host

 .. note::
-   Some containers, such as cinder, glance, neutron_agents, and
-   swift_proxy, have more than two interfaces to support their
+
+   Some containers, such as ``cinder``, ``glance``, ``neutron_agents``, and
+   ``swift_proxy``, have more than two interfaces to support their
    functions.

 Predictable interface naming
@@ -70,10 +71,15 @@ containers.
 Cached Ansible facts issues
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

-At the beginning of a playbook run, information about each host, such
-as its Linux distribution, kernel version, and network interfaces, is
-gathered. To improve performance, particularly in larger deployments,
-these facts can be cached.
+At the beginning of a playbook run, information about each host is gathered.
+Examples of the information gathered are:
+
+* Linux distribution
+* Kernel version
+* Network interfaces
+
+To improve performance, particularly in large deployments, you can
+cache host facts and information.

 OpenStack-Ansible enables fact caching by default. The facts are
 cached in JSON files within ``/etc/openstack_deploy/ansible_facts``.
@@ -87,8 +93,9 @@ documentation on `fact caching`_ for more details.
 Forcing regeneration of cached facts
 ------------------------------------

-If a host's kernel is upgraded or additional network interfaces or
-bridges are created on the host, its cached facts may be incorrect.
+Cached facts may be incorrect if the host receives a kernel upgrade or new
+network interfaces. Newly created bridges also disrupt cached facts.

 This can lead to unexpected errors while running playbooks, and
 require that the cached facts be regenerated.

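The cache described above is stored as one JSON file per host under ``/etc/openstack_deploy/ansible_facts``. A simple way to force regeneration, offered here as an assumption rather than something this commit documents, is to delete the stale file and rerun the playbook:

    ls /etc/openstack_deploy/ansible_facts/
    # remove the cached facts for one host, then rerun the relevant playbook
    rm /etc/openstack_deploy/ansible_facts/<host_name>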
@@ -1,7 +1,8 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

+=====================
 Chapter 8. Operations
---------------------
+=====================

 .. toctree::
