Docs: Ops section - cleanup

As per discussion in the OSA docs summit session, clean up
the installation guide. This fixes typos, makes minor RST markup
changes, and removes passive voice.

Change-Id: Ibacaabddafee465a05bcb6eec01dd3ef04b33826
Alexandra 2016-04-28 15:23:12 +10:00 committed by Jesse Pretorius (odyssey4me)
parent 4c393d8b01
commit 7a82904d61
8 changed files with 79 additions and 67 deletions

View File

@ -1,7 +1,8 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide
=====================
Adding a compute host
---------------------
=====================
Use the following procedure to add a compute host to an operational
cluster.
@ -14,8 +15,8 @@ cluster.
If necessary, also modify the ``used_ips`` stanza.
#. If the cluster is utilizing Ceilometer, it will be necessary to edit the
``/etc/openstack_deploy/conf.d/ceilometer.yml`` file and add the host to
#. If the cluster is utilizing Telemetry/Metering (Ceilometer),
edit the ``/etc/openstack_deploy/conf.d/ceilometer.yml`` file and add the host to
the ``metering-compute_hosts`` stanza.
#. Run the following commands to add the host. Replace

View File

@ -1,12 +1,12 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide
=======================
Galera cluster recovery
-----------------------
=======================
When one or all nodes fail within a galera cluster you may need to
re-bootstrap the environment. To take advantage of the
automation Ansible provides, simply execute the ``galera-install.yml``
play using the **galera-bootstrap** to auto recover a node or an
Run the ``galera-install.yml`` playbook using the ``galera-bootstrap``
tag to automatically recover a node or an
entire environment.
#. Run the following Ansible command to show the failed nodes:
@ -15,15 +15,13 @@ entire environment.
# openstack-ansible galera-install.yml --tags galera-bootstrap
Upon completion of this command the cluster should be back online and in
a functional state.
The cluster comes back online after completion of this command.
Single-node failure
~~~~~~~~~~~~~~~~~~~
If a single node fails, the other nodes maintain quorum and continue to
process SQL requests.
If a single node fails, the other nodes maintain quorum and
continue to process SQL requests.
#. Run the following Ansible command to determine the failed node:
@ -55,15 +53,15 @@ process SQL requests.
#. Restart MariaDB on the failed node and verify that it rejoins the
cluster.
#. If MariaDB fails to start, run the **mysqld** command and perform
further analysis on the output. As a last resort, rebuild the
container for the node.
#. If MariaDB fails to start, run the ``mysqld`` command and perform
further analysis on the output. As a last resort, rebuild the container
for the node.
Multi-node failure
~~~~~~~~~~~~~~~~~~
When all but one node fails, the remaining node cannot achieve quorum
and stops processing SQL requests. In this situation, failed nodes that
When all but one node fails, the remaining node cannot achieve quorum and
stops processing SQL requests. In this situation, failed nodes that
recover cannot join the cluster because it no longer exists.
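The difference between the single-node and multi-node cases comes down to majority arithmetic. The following is a minimal sketch, assuming a simple majority rule in which every node counts equally; the ``has_quorum`` helper is hypothetical and is not part of OpenStack-Ansible or Galera, and it ignores node weights and gracefully departed nodes.

```shell
#!/bin/sh
# Hypothetical helper: simplified majority check.
# A partition keeps quorum only when it holds more than half of the nodes.
# Assumption: all nodes have equal weight.
has_quorum() {
    alive=$1
    total=$2
    [ $((alive * 2)) -gt "$total" ]
}

# Three-node cluster, one node fails: two nodes remain.
if has_quorum 2 3; then
    echo "2 of 3 nodes: quorum kept"
fi

# Three-node cluster, two nodes fail: one node remains.
if ! has_quorum 1 3; then
    echo "1 of 3 nodes: quorum lost"
fi
```

With three nodes, losing one leaves a majority, while losing two does not, which matches the behavior described for the two failure cases.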
#. Run the following Ansible command to show the failed nodes:
@ -92,7 +90,7 @@ recover cannot join the cluster because it no longer exists.
#. Run the following command to
`rebootstrap <http://galeracluster.com/documentation-webpages/quorumreset.html#id1>`_
the operational node into the cluster.
the operational node into the cluster:
.. code-block:: shell-session
@ -116,7 +114,7 @@ recover cannot join the cluster because it no longer exists.
processing SQL requests.
#. Restart MariaDB on the failed nodes and verify that they rejoin the
cluster.
cluster:
.. code-block:: shell-session
@ -144,16 +142,15 @@ recover cannot join the cluster because it no longer exists.
wsrep_cluster_status Primary
#. If MariaDB fails to start on any of the failed nodes, run the
**mysqld** command and perform further analysis on the output. As a
``mysqld`` command and perform further analysis on the output. As a
last resort, rebuild the container for the node.
Complete failure
~~~~~~~~~~~~~~~~
If all of the nodes in a Galera cluster fail (do not shut down
gracefully), then the integrity of the database can no longer be
guaranteed and should be restored from backup. Run the following command
to determine if all nodes in the cluster have failed:
Restore from backup if all of the nodes in a Galera cluster fail (do not shut down
gracefully). Run the following command to determine if all nodes in the
cluster have failed:
.. code-block:: shell-session
@ -185,25 +182,25 @@ nodes and all of the nodes contain a ``seqno`` value of -1.
If any single node has a positive ``seqno`` value, then that node can be
used to restart the cluster. However, because there is no guarantee that
each node has an identical copy of the data, it is not recommended to
restart the cluster using the **--wsrep-new-cluster** command on one
each node has an identical copy of the data, we do not recommend
restarting the cluster using the ``--wsrep-new-cluster`` option on one
node.
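The ``seqno`` check described above can be expressed as a small filter. This is a sketch only: the ``seqno_summary`` function name and its output labels are hypothetical, and it assumes you have already collected one ``seqno`` value per node, one per line.

```shell
#!/bin/sh
# Hypothetical helper: reads one seqno value per line on stdin.
# Prints "all-failed" when no node reports a positive seqno,
# otherwise prints "positive-seqno-present".
seqno_summary() {
    awk '
        $1 + 0 > 0 { found = 1 }
        END { print (found ? "positive-seqno-present" : "all-failed") }
    '
}

# Every node reports -1: the cluster failed completely.
printf '%s\n' -1 -1 -1 | seqno_summary

# One node still reports a positive seqno.
printf '%s\n' -1 31 -1 | seqno_summary
```

A ``positive-seqno-present`` result only says that a candidate node exists; it does not show that the node's data matches the other nodes.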
Rebuilding a container
~~~~~~~~~~~~~~~~~~~~~~
Sometimes recovering from a failure requires rebuilding one or more
containers.
Recovering from certain failures requires rebuilding one or more containers.
#. Disable the failed node on the load balancer.
.. note::
Do not rely on the load balancer health checks to disable the node.
If the node is not disabled, the load balancer will send SQL requests
If the node is not disabled, the load balancer sends SQL requests
to it before it rejoins the cluster, causing data inconsistencies.
#. Use the following commands to destroy the container and remove
MariaDB data stored outside of the container. In this example, node 3
failed.
#. Destroy the container and remove MariaDB data stored outside
of the container:
.. code-block:: shell-session
@ -211,8 +208,9 @@ containers.
# lxc-destroy -n node3_galera_container-3ea2cbd3
# rm -rf /openstack/node3_galera_container-3ea2cbd3/*
#. Run the host setup playbook to rebuild the container specifically on
node 3:
In this example, node 3 failed.
#. Run the host setup playbook to rebuild the container on node 3:
.. code-block:: shell-session
@ -220,7 +218,7 @@ containers.
-l node3_galera_container-3ea2cbd3
The playbook will also restart all other containers on the node.
The playbook restarts all other containers on the node.
#. Run the infrastructure playbook to configure the container
specifically on node 3:
@ -231,7 +229,9 @@ containers.
-l node3_galera_container-3ea2cbd3
The new container runs a single-node Galera cluster, a dangerous
.. warning::
The new container runs a single-node Galera cluster, which is a dangerous
state because the environment contains more than one active database
with potentially different data.

View File

@ -1,7 +1,8 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide
==============
Removing nodes
--------------
==============
In the following example, all but one node was shut down gracefully:

View File

@ -1,15 +1,15 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide
==================
Starting a cluster
------------------
==================
Gracefully shutting down all nodes destroys the cluster. Starting or
restarting a cluster from zero nodes requires creating a new cluster on
one of the nodes.
#. The new cluster should be started on the most advanced node. Run the
following command to check the ``seqno`` value in the
``grastate.dat`` file on all of the nodes:
#. Start a new cluster on the most advanced node.
Check the ``seqno`` value in the ``grastate.dat`` file on all of the nodes:
.. code-block:: shell-session
@ -33,7 +33,7 @@ one of the nodes.
cert_index:
In this example, all nodes in the cluster contain the same positive
``seqno`` values because they were synchronized just prior to
``seqno`` values as they were synchronized just prior to
graceful shutdown. If all ``seqno`` values are equal, any node can
start the new cluster.
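Choosing the most advanced node can be sketched as follows. The ``read_seqno`` and ``most_advanced_node`` helpers are hypothetical names used for illustration, and the node names in the example are made up. The sketch assumes ``grastate.dat`` contains a ``seqno:`` line followed by the value, and that you have gathered ``node seqno`` pairs from every node.

```shell
#!/bin/sh
# Hypothetical helper: print the seqno recorded in a grastate.dat file.
read_seqno() {
    awk '$1 == "seqno:" { print $2 }' "$1"
}

# Hypothetical helper: reads "node seqno" pairs on stdin and prints the
# node with the highest seqno. On a tie, the first node listed wins.
most_advanced_node() {
    awk '
        NR == 1 || $2 + 0 > max { max = $2 + 0; node = $1 }
        END { print node }
    '
}

# Example with made-up node names and values.
printf '%s\n' 'node2 31' 'node3 29' 'node4 31' | most_advanced_node
```

When all ``seqno`` values are equal, the tie-breaking rule is arbitrary because any node can start the new cluster.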

View File

@ -1,7 +1,8 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide
==========================
Galera cluster maintenance
--------------------------
==========================
.. toctree::
@ -13,8 +14,8 @@ Routine maintenance includes gracefully adding or removing nodes from
the cluster without impacting operation and also starting a cluster
after gracefully shutting down all nodes.
MySQL instances are restarted when creating a cluster, adding a
node, the service isn't running, or when changes are made to the
MySQL instances are restarted when creating a cluster, when adding a
node, when the service is not running, or when changes are made to the
``/etc/mysql/my.cnf`` configuration file.
--------------

View File

@ -1,9 +1,10 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide
Centralized Logging
-------------------
===================
Centralized logging
===================
OpenStack-Ansible will configure all instances to send syslog data to a
OpenStack-Ansible configures all instances to send syslog data to a
container (or group of containers) running rsyslog. The rsyslog server
containers are specified in the ``log_hosts`` section of the
``openstack_user_config.yml`` file.
@ -18,10 +19,10 @@ Logs are accessible in multiple locations within an OpenStack-Ansible
deployment:
* The rsyslog server container collects logs in ``/var/log/log-storage`` within
directories named after the container or physical host
directories named after the container or physical host.
* Each physical host has the logs from its service containers mounted at
``/openstack/log/``
* Each service container has its own logs stored at ``/var/log/<service_name>``
``/openstack/log/``.
* Each service container has its own logs stored at ``/var/log/<service_name>``.
--------------

View File

@ -13,8 +13,9 @@ All LXC containers on the host have two virtual Ethernet interfaces:
* ``eth1`` in the container connects to ``br-mgmt`` on the host
.. note::
Some containers, such as cinder, glance, neutron_agents, and
swift_proxy, have more than two interfaces to support their
Some containers, such as ``cinder``, ``glance``, ``neutron_agents``, and
``swift_proxy``, have more than two interfaces to support their
functions.
Predictable interface naming
@ -70,10 +71,15 @@ containers.
Cached Ansible facts issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~
At the beginning of a playbook run, information about each host, such
as its Linux distribution, kernel version, and network interfaces, is
gathered. To improve performance, particularly in larger deployments,
these facts can be cached.
At the beginning of a playbook run, Ansible gathers information about each host.
Examples of the information gathered are:
* Linux distribution
* Kernel version
* Network interfaces
To improve performance, particularly in large deployments, you can
cache host facts and information.
OpenStack-Ansible enables fact caching by default. The facts are
cached in JSON files within ``/etc/openstack_deploy/ansible_facts``.
@ -87,8 +93,9 @@ documentation on `fact caching`_ for more details.
Forcing regeneration of cached facts
------------------------------------
If a host's kernel is upgraded or additional network interfaces or
bridges are created on the host, its cached facts may be incorrect.
Cached facts may be incorrect if the host receives a kernel upgrade or new network
interfaces. Newly created bridges also invalidate cached facts.
This can lead to unexpected errors while running playbooks, and
require that the cached facts be regenerated.
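One way to force regeneration is to delete the cached files so that the next playbook run gathers facts again. The following is a sketch under stated assumptions: the ``clear_cached_facts`` function is hypothetical, and it assumes each host's facts are stored in a file named after the host inside the cache directory. Verify the file layout in your deployment before deleting anything.

```shell
#!/bin/sh
# Hypothetical helper: remove cached facts so they are gathered again.
# Usage: clear_cached_facts FACTS_DIR [HOST]
# Assumption: each host's facts live in a file named after the host.
clear_cached_facts() {
    facts_dir=$1
    host=${2:-}
    if [ -n "$host" ]; then
        # Remove the cache file for a single host.
        rm -f -- "$facts_dir/$host"
    else
        # Remove the cache files for all hosts.
        rm -f -- "$facts_dir"/*
    fi
}

# Demonstration against a temporary directory with made-up host names.
demo_dir=$(mktemp -d)
touch "$demo_dir/compute01" "$demo_dir/compute02"
clear_cached_facts "$demo_dir" compute01
ls "$demo_dir"
clear_cached_facts "$demo_dir"
rmdir "$demo_dir"
```

In a real deployment the first argument would be ``/etc/openstack_deploy/ansible_facts``, the cache location named earlier in this section.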

View File

@ -1,7 +1,8 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide
=====================
Chapter 8. Operations
---------------------
=====================
.. toctree::