Docs: Ops section - cleanup

As per discussion in the OSA docs summit session, this change cleans up
the installation guide: it fixes typos, makes minor RST markup changes,
and removes passive voice.

Change-Id: Ibacaabddafee465a05bcb6eec01dd3ef04b33826
Alexandra 2016-04-28 15:23:12 +10:00 committed by Jesse Pretorius (odyssey4me)
parent 4c393d8b01
commit 7a82904d61
8 changed files with 79 additions and 67 deletions

View File

@ -1,7 +1,8 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide

=====================
Adding a compute host
=====================

Use the following procedure to add a compute host to an operational
cluster.
@ -14,8 +15,8 @@ cluster.
   If necessary, also modify the ``used_ips`` stanza.

#. If the cluster is utilizing Telemetry/Metering (Ceilometer), edit the
   ``/etc/openstack_deploy/conf.d/ceilometer.yml`` file and add the host to
   the ``metering-compute_hosts`` stanza.

#. Run the following commands to add the host. Replace

View File

@ -1,12 +1,12 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide

=======================
Galera cluster recovery
=======================

Run the ``galera-install`` playbook using the ``galera-bootstrap`` tag
to automatically recover a node or an entire environment.
#. Run the following Ansible command to show the failed nodes:
@ -15,15 +15,13 @@ entire environment.
      # openstack-ansible galera-install.yml --tags galera-bootstrap

   The cluster comes back online after completion of this command.
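To confirm that the recovery completed, you can check the cluster size from
any node. This is a minimal sketch, assuming the ``galera_container``
inventory group and a local ``mysql`` client login inside the containers:

.. code-block:: shell-session

   # ansible galera_container -m shell \
     -a "mysql -e \"SHOW STATUS LIKE 'wsrep_cluster_size';\""

A ``wsrep_cluster_size`` equal to the number of Galera nodes indicates that
all members have rejoined.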
Single-node failure
~~~~~~~~~~~~~~~~~~~

If a single node fails, the other nodes maintain quorum and
continue to process SQL requests.
#. Run the following Ansible command to determine the failed node:
@ -55,15 +53,15 @@ process SQL requests.
#. Restart MariaDB on the failed node and verify that it rejoins the
   cluster.

#. If MariaDB fails to start, run the ``mysqld`` command and perform
   further analysis on the output. As a last resort, rebuild the container
   for the node.
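For example, to see the daemon's startup errors directly, attach to the
failed node's container and start ``mysqld`` in the foreground; the
container name below is hypothetical:

.. code-block:: shell-session

   # lxc-attach -n node2_galera_container-1a2b3c4d
   # mysqld --user=mysql

Review the errors printed to the console before deciding whether a
container rebuild is necessary.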
Multi-node failure
~~~~~~~~~~~~~~~~~~

When all but one node fails, the remaining node cannot achieve quorum and
stops processing SQL requests. In this situation, failed nodes that
recover cannot join the cluster because it no longer exists.
#. Run the following Ansible command to show the failed nodes:
@ -92,7 +90,7 @@ recover cannot join the cluster because it no longer exists.
#. Run the following command to
   `rebootstrap <http://galeracluster.com/documentation-webpages/quorumreset.html#id1>`_
   the operational node into the cluster:
   .. code-block:: shell-session
@ -116,7 +114,7 @@ recover cannot join the cluster because it no longer exists.
   processing SQL requests.
#. Restart MariaDB on the failed nodes and verify that they rejoin the
   cluster:
   .. code-block:: shell-session
@ -144,16 +142,15 @@ recover cannot join the cluster because it no longer exists.
      wsrep_cluster_status   Primary
#. If MariaDB fails to start on any of the failed nodes, run the
   ``mysqld`` command and perform further analysis on the output. As a
   last resort, rebuild the container for the node.
Complete failure
~~~~~~~~~~~~~~~~

Restore from backup if all of the nodes in a Galera cluster fail (do not
shut down gracefully). Run the following command to determine if all nodes
in the cluster have failed:
.. code-block:: shell-session
@ -185,34 +182,35 @@ nodes and all of the nodes contain a ``seqno`` value of -1.
If any single node has a positive ``seqno`` value, then that node can be
used to restart the cluster. However, because there is no guarantee that
each node has an identical copy of the data, we do not recommend
restarting the cluster using the ``--wsrep-new-cluster`` command on one
node.
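One way to compare ``seqno`` values across nodes before deciding how to
proceed is to read each node's ``grastate.dat``. This sketch assumes the
``galera_container`` inventory group and the default MariaDB data directory:

.. code-block:: shell-session

   # ansible galera_container -m shell -a "cat /var/lib/mysql/grastate.dat"

The node reporting the highest ``seqno`` holds the most recent transactions.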
Rebuilding a container
~~~~~~~~~~~~~~~~~~~~~~

Recovering from certain failures requires rebuilding one or more
containers.
#. Disable the failed node on the load balancer.
   .. note::

      Do not rely on the load balancer health checks to disable the node.
      If the node is not disabled, the load balancer sends SQL requests
      to it before it rejoins the cluster, causing data inconsistencies.
#. Destroy the container and remove MariaDB data stored outside
   of the container:

   .. code-block:: shell-session

      # lxc-stop -n node3_galera_container-3ea2cbd3
      # lxc-destroy -n node3_galera_container-3ea2cbd3
      # rm -rf /openstack/node3_galera_container-3ea2cbd3/*

   In this example, node 3 failed.
#. Run the host setup playbook to rebuild the container on node 3:
   .. code-block:: shell-session
@ -220,7 +218,7 @@ containers.
        -l node3_galera_container-3ea2cbd3
   The playbook restarts all other containers on the node.
#. Run the infrastructure playbook to configure the container
   specifically on node 3:
@ -231,9 +229,11 @@ containers.
        -l node3_galera_container-3ea2cbd3
   .. warning::

      The new container runs a single-node Galera cluster, which is a
      dangerous state because the environment contains more than one
      active database with potentially different data.
   .. code-block:: shell-session

View File

@ -1,7 +1,8 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide

==============
Removing nodes
==============

In the following example, all but one node was shut down gracefully:

View File

@ -1,15 +1,15 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide

==================
Starting a cluster
==================

Gracefully shutting down all nodes destroys the cluster. Starting or
restarting a cluster from zero nodes requires creating a new cluster on
one of the nodes.

#. Start a new cluster on the most advanced node.
   Check the ``seqno`` value in the ``grastate.dat`` file on all of the
   nodes:
   .. code-block:: shell-session
@ -33,7 +33,7 @@ one of the nodes.
      cert_index:
   In this example, all nodes in the cluster contain the same positive
   ``seqno`` values as they were synchronized just prior to
   graceful shutdown. If all ``seqno`` values are equal, any node can
   start the new cluster.
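After choosing the most advanced node, one common way to bootstrap the new
cluster on it is Galera's ``--wsrep-new-cluster`` option, mentioned earlier
in this chapter. This is a sketch, assuming a SysV-style ``mysql`` service
script inside the container:

.. code-block:: shell-session

   # service mysql start --wsrep-new-cluster

The remaining nodes can then be restarted normally and synchronize from
this node.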

View File

@ -1,7 +1,8 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide

==========================
Galera cluster maintenance
==========================
.. toctree::
@ -13,8 +14,8 @@ Routine maintenance includes gracefully adding or removing nodes from
the cluster without impacting operation and also starting a cluster
after gracefully shutting down all nodes.

MySQL instances are restarted when creating a cluster, when adding a
node, when the service is not running, or when changes are made to the
``/etc/mysql/my.cnf`` configuration file.
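For example, a change that affects ``/etc/mysql/my.cnf`` can be applied by
rerunning the Galera playbook, which restarts the affected MySQL instances.
A minimal sketch:

.. code-block:: shell-session

   # openstack-ansible galera-install.yml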
--------------

View File

@ -1,15 +1,16 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide

===================
Centralized logging
===================

OpenStack-Ansible configures all instances to send syslog data to a
container (or group of containers) running rsyslog. The rsyslog server
containers are specified in the ``log_hosts`` section of the
``openstack_user_config.yml`` file.

The rsyslog server containers have logrotate installed and configured
with a 14-day retention. All rotated logs are compressed by default.
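For reference, the ``log_hosts`` section names the hosts on which the
rsyslog containers are created. The sketch below is illustrative only; the
host name and IP address are assumptions, not values from this guide:

.. code-block:: shell-session

   # grep -A 2 "^log_hosts" /etc/openstack_deploy/openstack_user_config.yml
   log_hosts:
     infra1:
       ip: 172.29.236.11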
Finding logs
~~~~~~~~~~~~
@ -18,10 +19,10 @@ Logs are accessible in multiple locations within an OpenStack-Ansible
deployment:

* The rsyslog server container collects logs in ``/var/log/log-storage``
  within directories named after the container or physical host.
* Each physical host has the logs from its service containers mounted at
  ``/openstack/log/``.
* Each service container has its own logs stored at
  ``/var/log/<service_name>``, as shown in the example below.
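As a quick example of walking these locations from a physical host, the
paths below follow the layout described above; the container and log file
names are placeholders:

.. code-block:: shell-session

   # ls /openstack/log/
   # less /openstack/log/<container_name>/<log_file>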
--------------

View File

@ -13,8 +13,9 @@ All LXC containers on the host have two virtual Ethernet interfaces:
* `eth1` in the container connects to `br-mgmt` on the host
.. note::

   Some containers, such as ``cinder``, ``glance``, ``neutron_agents``, and
   ``swift_proxy``, have more than two interfaces to support their
   functions.
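To confirm which interfaces a given container actually has, attach to it
and list them; the container name below is hypothetical:

.. code-block:: shell-session

   # lxc-attach -n controller1_neutron_agents_container-1a2b3c4d -- ip addr show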
Predictable interface naming
@ -70,10 +71,15 @@ containers.
Cached Ansible facts issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~

At the beginning of a playbook run, information about each host is
gathered, such as:

* Linux distribution
* Kernel version
* Network interfaces

To improve performance, particularly in large deployments, you can
cache host facts and information.
OpenStack-Ansible enables fact caching by default. The facts are
cached in JSON files within ``/etc/openstack_deploy/ansible_facts``.
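For example, to inspect what has been cached for a particular host (the
``infra1`` file name here is a placeholder):

.. code-block:: shell-session

   # ls /etc/openstack_deploy/ansible_facts/
   # python -m json.tool /etc/openstack_deploy/ansible_facts/infra1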
@ -87,8 +93,9 @@ documentation on `fact caching`_ for more details.
Forcing regeneration of cached facts
------------------------------------
Cached facts may be incorrect if the host receives a kernel upgrade or new
network interfaces. Newly created bridges also disrupt cached facts.
This can lead to unexpected errors while running playbooks and require
that the cached facts be regenerated.
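One straightforward way to regenerate facts, given the cache layout
described above, is to delete the cached JSON files; Ansible gathers fresh
facts on the next playbook run. This is a sketch, not necessarily the
method the guide goes on to describe:

.. code-block:: shell-session

   # rm /etc/openstack_deploy/ansible_facts/*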

View File

@ -1,7 +1,8 @@
`Home <index.html>`_ OpenStack-Ansible Installation Guide

=====================
Chapter 8. Operations
=====================
.. toctree::