Documentation update

This commit unifies and updates the documentation of the various roles
inside the repo.

Change-Id: Ib08e0206992bcc454bd632fa85e2c16ce1111fdf
Raoul Scarazzini 2017-03-27 15:01:50 +02:00
parent 79560a3287
commit 5f4aa533f6
5 changed files with 159 additions and 118 deletions


@@ -1,15 +1,9 @@
Utility roles and docs for tripleo-quickstart
=============================================
[![Team and repository tags](http://governance.openstack.org/badges/tripleo-quickstart-extras.svg)](http://governance.openstack.org/reference/tags/index.html)
<!-- Change things from this point on -->
These Ansible roles are a set of useful tools to be used on top of TripleO
deployments. They can be used together with tripleo-quickstart (and
[tripleo-quickstart-extras](https://github.com/openstack/tripleo-quickstart-extras)).
The documentation of each role is located in the individual role folders, and
general usage information is in the [tripleo-quickstart

View File

@@ -9,7 +9,7 @@ Requirements
------------
This role must be used with a deployed TripleO environment, so you'll need a
working directory of tripleo-quickstart or, in any case, the following files
available:
- **hosts**: which will contain all the hosts used in the deployment;
- **ssh.config.ansible**: which will have all the ssh data to connect to the
  undercloud and all the overcloud nodes;
@@ -24,9 +24,9 @@ Instance HA
-----------
Instance HA is a feature that gives a certain degree of high-availability to the
instances spawned by an OpenStack deployment. Namely, if a compute node on which
an instance is running breaks for whatever reason, this configuration will spawn
the instances that were running on the broken node onto a functioning one.
This role automates all the steps needed to configure the Pacemaker cluster
to support this functionality. A typical cluster configuration on a
clean stock **newton** (or **osp10**) deployment is something like this:
@@ -156,7 +156,8 @@ Where:
[defaults]
roles_path = /path/to/tripleo-quickstart-utils/roles
**hosts** file must be configured with two *controller* and *compute* sections
like these:
undercloud ansible_host=undercloud ansible_user=stack ansible_private_key_file=/path/to/id_rsa_undercloud
overcloud-novacompute-1 ansible_host=overcloud-novacompute-1 ansible_user=heat-admin ansible_private_key_file=/path/to/id_rsa_overcloud
@@ -184,7 +185,8 @@ Where:
overcloud-controller-1
overcloud-controller-0
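Putting these excerpts together, a complete minimal **hosts** inventory might
look like the following sketch (node names and key paths are illustrative, not
prescribed by this role):

    undercloud ansible_host=undercloud ansible_user=stack ansible_private_key_file=/path/to/id_rsa_undercloud
    overcloud-novacompute-0 ansible_host=overcloud-novacompute-0 ansible_user=heat-admin ansible_private_key_file=/path/to/id_rsa_overcloud
    overcloud-controller-0 ansible_host=overcloud-controller-0 ansible_user=heat-admin ansible_private_key_file=/path/to/id_rsa_overcloud

    [compute]
    overcloud-novacompute-0

    [controller]
    overcloud-controller-0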
**ssh.config.ansible** can *optionally* contain specific per-host connection
options, like these:
...
...
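The elided options depend on the environment; purely as an illustration, a
per-host entry in this file often looks something like this (hostnames,
addresses and key paths are hypothetical):

    Host overcloud-controller-0
        Hostname 192.168.24.10
        User heat-admin
        IdentityFile /path/to/id_rsa_overcloud
        ProxyCommand ssh -F /path/to/ssh.config.ansible undercloud -W %h:%p
        StrictHostKeyChecking no
        UserKnownHostsFile /dev/null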


@@ -1,16 +1,62 @@
stonith-config
==============
This role acts on an already deployed tripleo environment, setting up STONITH
(Shoot The Other Node In The Head) inside the Pacemaker configuration for all
the hosts that are part of the overcloud.
Requirements
------------
This role must be used with a deployed TripleO environment, so you'll need a
working directory of tripleo-quickstart or, in any case, the following files
available:
- **hosts**: which will contain all the hosts used in the deployment;
- **ssh.config.ansible**: which will have all the ssh data to connect to the
undercloud and all the overcloud nodes;
- **instackenv.json**: which must be present in the undercloud working
  directory. This should be created by the installer.
STONITH
-------
STONITH is the mechanism a Pacemaker cluster uses to be certain that a node is
powered off. STONITH is the only way to use a shared storage environment
without worrying about concurrent writes on disks. Inside TripleO environments
STONITH is also a prerequisite for activating features like Instance HA
because, before moving any machine, the system needs to be sure that the
machine being moved from is really off.
STONITH configuration relies on the **instackenv.json** file, which TripleO
also uses to configure Ironic and all the provisioning.
Basically this role enables STONITH on the Pacemaker cluster and takes all the
information from the mentioned file, creating a STONITH resource for each host
on the overcloud.
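For reference, a node entry in a typical **instackenv.json** looks like the
sketch below; the IPMI coordinates (*pm_addr*, *pm_user*, *pm_password*) are
the kind of information a *fence_ipmilan* resource is built from (all values
here are placeholders):

    {
      "nodes": [
        {
          "pm_type": "pxe_ipmitool",
          "pm_addr": "192.168.24.101",
          "pm_user": "admin",
          "pm_password": "secret",
          "mac": ["52:54:00:aa:bb:cc"]
        }
      ]
    }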
After running this playbook the cluster configuration will include these properties:
$ sudo pcs property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: tripleo_cluster
...
...
**stonith-enabled: true**
And something like this, depending on how many nodes there are in the overcloud:
$ sudo pcs stonith
ipmilan-overcloud-compute-0 (stonith:fence_ipmilan): Started overcloud-controller-1
ipmilan-overcloud-controller-2 (stonith:fence_ipmilan): Started overcloud-controller-0
ipmilan-overcloud-controller-0 (stonith:fence_ipmilan): Started overcloud-controller-0
ipmilan-overcloud-controller-1 (stonith:fence_ipmilan): Started overcloud-controller-1
ipmilan-overcloud-compute-1 (stonith:fence_ipmilan): Started overcloud-controller-1
Having all this in place is a requirement for a reliable HA solution and for
configuring special OpenStack features like [Instance HA](https://github.com/redhat-openstack/tripleo-quickstart-utils/tree/master/roles/instance-ha).
**Note**: by default this role configures STONITH for all the overcloud nodes,
but it is possible to limit it to just the controllers, or just the computes,
by setting the **stonith_devices** variable, which defaults to "all" but can
also be "*controllers*" or "*computes*".
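For example, to restrict STONITH to just the controllers, the variable can be
overridden at invocation time, in the same *--extra-vars* style used by the
quickstart examples in this repo:

    --extra-vars stonith_devices=controllers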
Quickstart invocation
---------------------
@@ -37,38 +83,11 @@ Basically this command:
**Important note**
You might need to export *ANSIBLE_SSH_ARGS* with the path of the
*ssh.config.ansible* file to make the command work, like this:
export ANSIBLE_SSH_ARGS="-F /path/to/quickstart/workdir/ssh.config.ansible"
Limitations
-----------
@@ -86,7 +105,8 @@ The main playbook couldn't be simpler:
roles:
- stonith-config
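Filled out, a minimal playbook might look like the sketch below; the play name
and the target host are assumptions, not something this README prescribes, so
adjust them to your inventory:

    ---
    - name: Configure STONITH on the overcloud
      hosts: undercloud    # assumption: the role is driven from the undercloud
      gather_facts: no
      roles:
        - stonith-config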
But it could also be used at the end of a deployment, like the validate-ha role
is used in [baremetal-undercloud-validate-ha.yml](https://github.com/redhat-openstack/tripleo-quickstart-utils/blob/master/playbooks/baremetal-undercloud-validate-ha.yml).
License
-------


@@ -1,16 +1,20 @@
overcloud-validate-ha
=====================
This role acts on an already deployed tripleo environment, testing all the
HA-related functionality of the installation.
Requirements
------------
This role must be used with a deployed TripleO environment, so you'll need a
working directory of tripleo-quickstart or, in any case, the following files
available:
- **hosts**: which will contain all the hosts used in the deployment;
- **ssh.config.ansible**: which will have all the ssh data to connect to the
undercloud and all the overcloud nodes;
- A **config file** with a definition for the floating network (which will be
used to test HA instances), like this one:
public_physical_network: "floating"
floating_ip_cidr: "10.0.0.0/24"
@@ -18,6 +22,56 @@
public_net_pool_end: "10.0.0.198"
public_net_gateway: "10.0.0.254"
HA tests
--------
HA tests are meant to check the behavior of the environment in the face of
circumstances that involve service interruption, loss of a node, and in general
actions that stress the OpenStack installation with unexpected failures.
Each test is associated with a global variable that, if true, enables the test.
Tests are grouped and performed by default depending on the OpenStack release.
This is the list of the supported variables, with the test description and the
release(s) on which the test is performed:
- **test_ha_failed_actions**: Look for failed actions (**all**)
- **test_ha_master_slave**: Stop master/slave resources (Galera and Redis); all
  the resources should come down (**all**)
- **test_ha_keystone_constraint_removal**: Stop the keystone resource (by
  stopping httpd), check that no other resource is stopped (**mitaka**)
- Next generation cluster checks (**newton**, **ocata**, **master**):
  - **test_ha_ng_a**: Stop every systemd resource, stop Galera and Rabbitmq,
    start every systemd resource
  - **test_ha_ng_b**: Stop Galera and Rabbitmq, stop every systemd resource,
    start every systemd resource
  - **test_ha_ng_c**: Stop Galera and Rabbitmq, wait 20 minutes to see if
    something fails
- **test_ha_instance**: Instance deployment (**all**)
It is also possible to omit (or add) tests not meant for the specific release
by overriding the variables above, like in this example:
./quickstart.sh \
--retain-inventory \
--ansible-debug \
--no-clone \
--playbook overcloud-validate-ha.yml \
--working-dir /path/to/workdir/ \
--config /path/to/config.yml \
--extra-vars test_ha_failed_actions=false \
--extra-vars test_ha_ng_a=true \
--release mitaka \
--tags all \
<VIRTHOST>
In this case we will not check for failed actions (a test that would otherwise
be run on mitaka) and we will force the execution of the "ng_a" test described
earlier, which is normally executed only on newton versions or above.
All tests are performed using an external application named
[tripleo-director-ha-test-suite](https://github.com/rscarazz/tripleo-director-ha-test-suite).
Quickstart invocation
---------------------
@@ -43,44 +97,14 @@ Basically this command:
**Important note**
If the role is called by itself, that is, not in the same playbook that already
deploys the environment (see
[baremetal-undercloud-validate-ha.yml](https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/baremetal-undercloud-validate-ha.yml)),
you need to export *ANSIBLE_SSH_ARGS* with the path of the *ssh.config.ansible*
file, like this:
export ANSIBLE_SSH_ARGS="-F /path/to/quickstart/workdir/ssh.config.ansible"
Example Playbook
----------------
@@ -93,12 +117,13 @@ The main playbook couldn't be simpler:
roles:
- tripleo-overcloud-validate-ha
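Expanded into a runnable form, the same playbook might look like this; the
target host and the vars file path are assumptions, and the config file is the
floating network definition shown earlier:

    ---
    - name: Validate overcloud HA
      hosts: undercloud          # assumption: adjust to your inventory
      gather_facts: no
      vars_files:
        - /path/to/config.yml    # floating network definition shown earlier
      roles:
        - tripleo-overcloud-validate-ha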
But it could also be used at the end of a deployment, like in this file
[baremetal-undercloud-validate-ha.yml](https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/baremetal-undercloud-validate-ha.yml).
License
-------
GPL
Author Information
------------------