Docs: Ops section - cleanup

As per discussion in the OSA docs summit session, clean up of installation guide. This fixes typos, minor RST mark up changes, and passive voice.

Change-Id: Ibacaabddafee465a05bcb6eec01dd3ef04b33826

parent 4c393d8b01
commit 7a82904d61

@@ -1,7 +1,8 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

+=====================
 Adding a compute host
----------------------
+=====================

 Use the following procedure to add a compute host to an operational
 cluster.
@@ -14,8 +15,8 @@ cluster.

 If necessary, also modify the ``used_ips`` stanza.

-#. If the cluster is utilizing Ceilometer, it will be necessary to edit the
-``/etc/openstack_deploy/conf.d/ceilometer.yml`` file and add the host to
+#. If the cluster is utilizing Telemetry/Metering (Ceilometer),
+edit the ``/etc/openstack_deploy/conf.d/ceilometer.yml`` file and add the host to
 the ``metering-compute_hosts`` stanza.

 #. Run the following commands to add the host. Replace
@@ -1,12 +1,12 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

+=======================
 Galera cluster recovery
------------------------
+=======================

-When one or all nodes fail within a galera cluster you may need to
-re-bootstrap the environment. To make take advantage of the
-automation Ansible provides simply execute the ``galera-install.yml``
-play using the **galera-bootstrap** to auto recover a node or an
+Run the `` ``galera-bootstrap`` playbook to automatically recover
+a node or an entire environment. Run the ``galera install`` playbook`
+using the ``galera-bootstrap`` tag to auto recover a node or an
 entire environment.

 #. Run the following Ansible command to show the failed nodes:
@@ -15,15 +15,13 @@ entire environment.

 # openstack-ansible galera-install.yml --tags galera-bootstrap

-
-Upon completion of this command the cluster should be back online an in
-a functional state.
+The cluster comes back online after completion of this command.

 Single-node failure
 ~~~~~~~~~~~~~~~~~~~

-If a single node fails, the other nodes maintain quorum and continue to
-process SQL requests.
+If a single node fails, the other nodes maintain quorum and
+continue to process SQL requests.

 #. Run the following Ansible command to determine the failed node:

@@ -55,15 +53,15 @@ process SQL requests.
 #. Restart MariaDB on the failed node and verify that it rejoins the
 cluster.

-#. If MariaDB fails to start, run the **mysqld** command and perform
-further analysis on the output. As a last resort, rebuild the
-container for the node.
+#. If MariaDB fails to start, run the ``mysqld`` command and perform
+further analysis on the output. As a last resort, rebuild the container
+for the node.

 Multi-node failure
 ~~~~~~~~~~~~~~~~~~

-When all but one node fails, the remaining node cannot achieve quorum
-and stops processing SQL requests. In this situation, failed nodes that
+When all but one node fails, the remaining node cannot achieve quorum and
+stops processing SQL requests. In this situation, failed nodes that
 recover cannot join the cluster because it no longer exists.

 #. Run the following Ansible command to show the failed nodes:
@@ -92,7 +90,7 @@ recover cannot join the cluster because it no longer exists.

 #. Run the following command to
 `rebootstrap <http://galeracluster.com/documentation-webpages/quorumreset.html#id1>`_
-the operational node into the cluster.
+the operational node into the cluster:

 .. code-block:: shell-session

@@ -116,7 +114,7 @@ recover cannot join the cluster because it no longer exists.
 processing SQL requests.

 #. Restart MariaDB on the failed nodes and verify that they rejoin the
-cluster.
+cluster:

 .. code-block:: shell-session

@@ -144,16 +142,15 @@ recover cannot join the cluster because it no longer exists.
 wsrep_cluster_status Primary

 #. If MariaDB fails to start on any of the failed nodes, run the
-**mysqld** command and perform further analysis on the output. As a
+``mysqld`` command and perform further analysis on the output. As a
 last resort, rebuild the container for the node.

 Complete failure
 ~~~~~~~~~~~~~~~~

-If all of the nodes in a Galera cluster fail (do not shutdown
-gracefully), then the integrity of the database can no longer be
-guaranteed and should be restored from backup. Run the following command
-to determine if all nodes in the cluster have failed:
+Restore from backup if all of the nodes in a Galera cluster fail (do not shutdown
+gracefully). Run the following command to determine if all nodes in the
+cluster have failed:

 .. code-block:: shell-session

@@ -185,25 +182,25 @@ nodes and all of the nodes contain a ``seqno`` value of -1.

 If any single node has a positive ``seqno`` value, then that node can be
 used to restart the cluster. However, because there is no guarantee that
-each node has an identical copy of the data, it is not recommended to
-restart the cluster using the **--wsrep-new-cluster** command on one
+each node has an identical copy of the data, we do not recommend to
+restart the cluster using the ``--wsrep-new-cluster`` command on one
 node.

 Rebuilding a container
 ~~~~~~~~~~~~~~~~~~~~~~

-Sometimes recovering from a failure requires rebuilding one or more
-containers.
+Recovering from certain failures require rebuilding one or more containers.

 #. Disable the failed node on the load balancer.

+.. note::
+
 Do not rely on the load balancer health checks to disable the node.
-If the node is not disabled, the load balancer will send SQL requests
+If the node is not disabled, the load balancer sends SQL requests
 to it before it rejoins the cluster and cause data inconsistencies.

-#. Use the following commands to destroy the container and remove
-MariaDB data stored outside of the container. In this example, node 3
-failed.
+#. Destroy the container and remove MariaDB data stored outside
+of the container:

 .. code-block:: shell-session

@@ -211,8 +208,9 @@ containers.
 # lxc-destroy -n node3_galera_container-3ea2cbd3
 # rm -rf /openstack/node3_galera_container-3ea2cbd3/*

-#. Run the host setup playbook to rebuild the container specifically on
-node 3:
+In this example, node 3 failed.
+
+#. Run the host setup playbook to rebuild the container on node 3:

 .. code-block:: shell-session

@@ -220,7 +218,7 @@ containers.
 -l node3_galera_container-3ea2cbd3


-The playbook will also restart all other containers on the node.
+The playbook restarts all other containers on the node.

 #. Run the infrastructure playbook to configure the container
 specifically on node 3:
@@ -231,7 +229,9 @@ containers.
 -l node3_galera_container-3ea2cbd3


-The new container runs a single-node Galera cluster, a dangerous
+.. warning::
+
+The new container runs a single-node Galera cluster, which is a dangerous
 state because the environment contains more than one active database
 with potentially different data.

@@ -1,7 +1,8 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

+==============
 Removing nodes
---------------
+==============

 In the following example, all but one node was shut down gracefully:

@@ -1,15 +1,15 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

+==================
 Starting a cluster
-------------------
+==================

 Gracefully shutting down all nodes destroys the cluster. Starting or
 restarting a cluster from zero nodes requires creating a new cluster on
 one of the nodes.

-#. The new cluster should be started on the most advanced node. Run the
-following command to check the ``seqno`` value in the
-``grastate.dat`` file on all of the nodes:
+#. Start a new cluster on the most advanced node.
+Check the ``seqno`` value in the ``grastate.dat`` file on all of the nodes:

 .. code-block:: shell-session

@@ -33,7 +33,7 @@ one of the nodes.
 cert_index:

 In this example, all nodes in the cluster contain the same positive
-``seqno`` values because they were synchronized just prior to
+``seqno`` values as they were synchronized just prior to
 graceful shutdown. If all ``seqno`` values are equal, any node can
 start the new cluster.

@@ -1,7 +1,8 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

+==========================
 Galera cluster maintenance
---------------------------
+==========================

 .. toctree::

@@ -13,8 +14,8 @@ Routine maintenance includes gracefully adding or removing nodes from
 the cluster without impacting operation and also starting a cluster
 after gracefully shutting down all nodes.

-MySQL instances are restarted when creating a cluster, adding a
-node, the service isn't running, or when changes are made to the
+MySQL instances are restarted when creating a cluster, when adding a
+node, when the service is not running, or when changes are made to the
 ``/etc/mysql/my.cnf`` configuration file.

 --------------
@@ -1,9 +1,10 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

-Centralized Logging
--------------------
+===================
+Centralized logging
+===================

-OpenStack-Ansible will configure all instances to send syslog data to a
+OpenStack-Ansible configures all instances to send syslog data to a
 container (or group of containers) running rsyslog. The rsyslog server
 containers are specified in the ``log_hosts`` section of the
 ``openstack_user_config.yml`` file.
@@ -18,10 +19,10 @@ Logs are accessible in multiple locations within an OpenStack-Ansible
 deployment:

 * The rsyslog server container collects logs in ``/var/log/log-storage`` within
-directories named after the container or physical host
+directories named after the container or physical host.
 * Each physical host has the logs from its service containers mounted at
-``/openstack/log/``
-* Each service container has its own logs stored at ``/var/log/<service_name>``
+``/openstack/log/``.
+* Each service container has its own logs stored at ``/var/log/<service_name>``.

 --------------

@@ -13,8 +13,9 @@ All LXC containers on the host have two virtual Ethernet interfaces:
 * `eth1` in the container connects to `br-mgmt` on the host

 .. note::
-Some containers, such as cinder, glance, neutron_agents, and
-swift_proxy, have more than two interfaces to support their
+
+Some containers, such as ``cinder``, ``glance``, ``neutron_agents``, and
+``swift_proxy``, have more than two interfaces to support their
 functions.

 Predictable interface naming
@@ -70,10 +71,15 @@ containers.
 Cached Ansible facts issues
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

-At the beginning of a playbook run, information about each host, such
-as its Linux distribution, kernel version, and network interfaces, is
-gathered. To improve performance, particularly in larger deployments,
-these facts can be cached.
+At the beginning of a playbook run, information about each host is gathered.
+Examples of the information gathered are:
+
+* Linux distribution
+* Kernel version
+* Network interfaces
+
+To improve performance, particularly in large deployments, you can
+cache host facts and information.

 OpenStack-Ansible enables fact caching by default. The facts are
 cached in JSON files within ``/etc/openstack_deploy/ansible_facts``.
@@ -87,8 +93,9 @@ documentation on `fact caching`_ for more details.
 Forcing regeneration of cached facts
 ------------------------------------

-If a host's kernel is upgraded or additional network interfaces or
-bridges are created on the host, its cached facts may be incorrect.
+Cached facts may be incorrect if the host receives a kernel upgrade or new network
+interfaces. Newly created bridges also disrupt cache facts.
+
 This can lead to unexpected errors while running playbooks, and
 require that the cached facts be regenerated.

@@ -1,7 +1,8 @@
 `Home <index.html>`_ OpenStack-Ansible Installation Guide

+=====================
 Chapter 8. Operations
----------------------
+=====================

 .. toctree::
