Documentation update

This commit unifies and updates the documentation of the various roles inside the repo.

Change-Id: Ib08e0206992bcc454bd632fa85e2c16ce1111fdf
parent 79560a3287
commit 5f4aa533f6

README.md (16 lines changed)
@@ -1,15 +1,9 @@

Utility roles and docs for tripleo-quickstart
=============================================

These Ansible roles are a set of useful tools to be used on top of TripleO
deployments. They can be used together with tripleo-quickstart (and
[tripleo-quickstart-extras](https://github.com/openstack/tripleo-quickstart-extras)).

The documentation of each role is located in the individual role folders, and
general usage information is in the [tripleo-quickstart
@@ -9,7 +9,7 @@ Requirements
------------

This role must be used with a deployed TripleO environment, so you'll need a
working directory of tripleo-quickstart or, in any case, these files available:

- **hosts**: which will contain all the hosts used in the deployment;
- **ssh.config.ansible**: which will have all the ssh data to connect to the
@@ -24,9 +24,9 @@ Instance HA
-----------

Instance HA is a feature that gives a certain degree of high availability to the
instances spawned by an OpenStack deployment. Namely, if a compute node on which
an instance is running breaks for whatever reason, this configuration will spawn
the instances that were running on the broken node onto a functioning one.
This role automates all the necessary steps needed to configure a Pacemaker
cluster to support this functionality. A typical cluster configuration on a
clean stock **newton** (or **osp10**) deployment is something like this:
@@ -156,7 +156,8 @@ Where:

    [defaults]
    roles_path = /path/to/tripleo-quickstart-utils/roles

The **hosts** file must be configured with two *controller* and *compute* sections
like these:

    undercloud ansible_host=undercloud ansible_user=stack ansible_private_key_file=/path/to/id_rsa_undercloud
    overcloud-novacompute-1 ansible_host=overcloud-novacompute-1 ansible_user=heat-admin ansible_private_key_file=/path/to/id_rsa_overcloud
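For illustration, such an inventory can be put together with plain shell; every hostname and key path below is a placeholder rather than output from a real deployment:

```shell
#!/bin/sh
# Sketch: generate a minimal Ansible 'hosts' inventory like the one described
# above. Hostnames and key paths are illustrative, not from a real deployment.
cat > hosts <<'EOF'
undercloud ansible_host=undercloud ansible_user=stack ansible_private_key_file=/path/to/id_rsa_undercloud
overcloud-novacompute-0 ansible_host=overcloud-novacompute-0 ansible_user=heat-admin ansible_private_key_file=/path/to/id_rsa_overcloud
overcloud-controller-0 ansible_host=overcloud-controller-0 ansible_user=heat-admin ansible_private_key_file=/path/to/id_rsa_overcloud

[controller]
overcloud-controller-0

[compute]
overcloud-novacompute-0
EOF
# Quick sanity check: both group sections made it into the file.
grep -c '^\[' hosts
```

The trailing `grep` only counts the `[controller]` and `[compute]` section headers as a sanity check.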
@@ -184,7 +185,8 @@ Where:

    overcloud-controller-1
    overcloud-controller-0

**ssh.config.ansible** can *optionally* contain specific per-host connection
options, like these:

    ...
    ...
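A minimal sketch of what one per-host block in that file can look like, with a fabricated hostname, address and key path, plus the *ANSIBLE_SSH_ARGS* export that points Ansible at it:

```shell
#!/bin/sh
# Sketch: one per-host block of ssh.config.ansible (hostname, IP and key
# path are illustrative) and the export that makes Ansible use the file.
cat > ssh.config.ansible <<'EOF'
Host overcloud-novacompute-0
    Hostname 192.168.24.15
    User heat-admin
    IdentityFile /path/to/id_rsa_overcloud
    StrictHostKeyChecking no
EOF
export ANSIBLE_SSH_ARGS="-F $PWD/ssh.config.ansible"
echo "$ANSIBLE_SSH_ARGS"
```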
@@ -1,16 +1,62 @@

stonith-config
==============

This role acts on an already deployed tripleo environment, setting up STONITH
(Shoot The Other Node In The Head) inside the Pacemaker configuration for all
the hosts that are part of the overcloud.

Requirements
------------

This role must be used with a deployed TripleO environment, so you'll need a
working directory of tripleo-quickstart or, in any case, these files available:

- **hosts**: which will contain all the hosts used in the deployment;
- **ssh.config.ansible**: which will have all the ssh data to connect to the
  undercloud and all the overcloud nodes;
- **instackenv.json**: which must be present on the undercloud workdir. This
  should be created by the installer;
STONITH
-------

STONITH is the way a Pacemaker cluster makes certain that a node is powered
off. STONITH is the only way to use a shared storage environment without
worrying about concurrent writes on disks. Inside TripleO environments STONITH
is also a prerequisite for activating features like Instance HA because, before
moving any machine, the system needs to be sure that the "move from" machine is
off.

STONITH configuration relies on the **instackenv.json** file, used by TripleO
also to configure Ironic and all the provisioning. Basically this role enables
STONITH on the Pacemaker cluster, taking all the information from the mentioned
file and creating a STONITH resource for each host on the overcloud.
After running this playbook the cluster configuration will have these
properties:

    $ sudo pcs property
    Cluster Properties:
     cluster-infrastructure: corosync
     cluster-name: tripleo_cluster
     ...
     ...
     stonith-enabled: true

And something like this, depending on how many nodes there are in the overcloud:

    $ sudo pcs stonith
     ipmilan-overcloud-compute-0 (stonith:fence_ipmilan): Started overcloud-controller-1
     ipmilan-overcloud-controller-2 (stonith:fence_ipmilan): Started overcloud-controller-0
     ipmilan-overcloud-controller-0 (stonith:fence_ipmilan): Started overcloud-controller-0
     ipmilan-overcloud-controller-1 (stonith:fence_ipmilan): Started overcloud-controller-1
     ipmilan-overcloud-compute-1 (stonith:fence_ipmilan): Started overcloud-controller-1

Having all this in place is a requirement for a reliable HA solution and for
configuring special OpenStack features like [Instance HA](https://github.com/redhat-openstack/tripleo-quickstart-utils/tree/master/roles/instance-ha).

**Note**: by default this role configures STONITH for all the overcloud nodes,
but it is possible to limit it to just the controllers, or just the computes, by
setting the **stonith_devices** variable, which defaults to "all" but can also
be "*controllers*" or "*computes*".
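To make the dependency on **instackenv.json** concrete, here is a fabricated two-node file showing the power-management fields (`pm_addr`, `pm_user`, `pm_password`) that a fence_ipmilan resource is built from; all values and the listing snippet are illustrative only, not the role's actual code:

```shell
#!/bin/sh
# Sketch: the power-management data the role reads from instackenv.json.
# All node names, addresses and credentials below are fabricated.
cat > instackenv.json <<'EOF'
{"nodes": [
  {"name": "overcloud-controller-0", "pm_type": "pxe_ipmitool",
   "pm_addr": "192.168.24.10", "pm_user": "admin", "pm_password": "secret"},
  {"name": "overcloud-novacompute-0", "pm_type": "pxe_ipmitool",
   "pm_addr": "192.168.24.11", "pm_user": "admin", "pm_password": "secret"}
]}
EOF
# List the IPMI endpoints a STONITH resource would be created for.
python3 -c 'import json
for n in json.load(open("instackenv.json"))["nodes"]:
    print(n["name"], n["pm_addr"])'
```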
Quickstart invocation
---------------------
@@ -37,38 +83,11 @@ Basically this command:

**Important note**

You might need to export *ANSIBLE_SSH_ARGS* with the path of the
*ssh.config.ansible* file to make the command work, like this:

    export ANSIBLE_SSH_ARGS="-F /path/to/quickstart/workdir/ssh.config.ansible"
Limitations
-----------
@@ -86,7 +105,8 @@ The main playbook couldn't be simpler:

      roles:
        - stonith-config

But it could also be used at the end of a deployment, like the validate-ha role
is used in [baremetal-undercloud-validate-ha.yml](https://github.com/redhat-openstack/tripleo-quickstart-utils/blob/master/playbooks/baremetal-undercloud-validate-ha.yml).
License
-------
@@ -1,16 +1,20 @@

overcloud-validate-ha
=====================

This role acts on an already deployed tripleo environment, testing all HA
related functionalities of the installation.

Requirements
------------

This role must be used with a deployed TripleO environment, so you'll need a
working directory of tripleo-quickstart or, in any case, these files available:

- **hosts**: which will contain all the hosts used in the deployment;
- **ssh.config.ansible**: which will have all the ssh data to connect to the
  undercloud and all the overcloud nodes;
- A **config file** with a definition for the floating network (which will be
  used to test HA instances), like this one:

        public_physical_network: "floating"
        floating_ip_cidr: "10.0.0.0/24"
@@ -18,6 +22,56 @@

        public_net_pool_end: "10.0.0.198"
        public_net_gateway: "10.0.0.254"
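As a quick sanity check of such a config file, the sketch below writes it out and verifies that the pool addresses fall inside the floating CIDR. Note that `public_net_pool_start` here is an assumed value, not taken from this document:

```shell
#!/bin/sh
# Sketch: write the floating-network config shown above and check that the
# pool range lies inside the CIDR (public_net_pool_start is an assumption).
cat > config.yml <<'EOF'
public_physical_network: "floating"
floating_ip_cidr: "10.0.0.0/24"
public_net_pool_start: "10.0.0.191"
public_net_pool_end: "10.0.0.198"
public_net_gateway: "10.0.0.254"
EOF
python3 - <<'PY'
import ipaddress, re
text = open("config.yml").read()
val = lambda key: re.search(key + r': "([^"]+)"', text).group(1)
net = ipaddress.ip_network(val("floating_ip_cidr"))
assert ipaddress.ip_address(val("public_net_pool_start")) in net
assert ipaddress.ip_address(val("public_net_pool_end")) in net
print("pool range inside", net)
PY
```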
HA tests
--------

HA tests are meant to check the behavior of the environment when facing
circumstances that involve service interruption, loss of a node and, in
general, actions that stress the OpenStack installation with unexpected
failures. Each test is associated to a global variable that, if true, makes
the test happen. Tests are grouped and performed by default depending on the
OpenStack release.
This is the list of the supported variables, with test description and name of
the release on which the test is performed:

- **test_ha_failed_actions**: Look for failed actions (**all**)
- **test_ha_master_slave**: Stop master slave resources (galera and redis), all
  the resources should come down (**all**)
- **test_ha_keystone_constraint_removal**: Stop keystone resource (by stopping
  httpd), check no other resource is stopped (**mitaka**)
- Next generation cluster checks (**newton**, **ocata**, **master**):
  - **test_ha_ng_a**: Stop every systemd resource, stop Galera and Rabbitmq,
    start every systemd resource
  - **test_ha_ng_b**: Stop Galera and Rabbitmq, stop every systemd resource,
    start every systemd resource
  - **test_ha_ng_c**: Stop Galera and Rabbitmq, wait 20 minutes to see if
    something fails
- **test_ha_instance**: Instance deployment (**all**)

It is also possible to omit (or add) tests not made for the specific release,
using the above vars, like in this example:

    ./quickstart.sh \
      --retain-inventory \
      --ansible-debug \
      --no-clone \
      --playbook overcloud-validate-ha.yml \
      --working-dir /path/to/workdir/ \
      --config /path/to/config.yml \
      --extra-vars test_ha_failed_actions=false \
      --extra-vars test_ha_ng_a=true \
      --release mitaka \
      --tags all \
      <VIRTHOST>

In this case we will not check for failed actions (a test that would otherwise
be run on mitaka) and we will force the execution of the "ng_a" test described
earlier, which is originally executed just in newton versions or above.
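The release-plus-overrides behaviour described above can be sketched in shell; this is a simplification for illustration, not the role's actual selection code:

```shell
#!/bin/sh
# Sketch: a release picks a default test set, then --extra-vars style
# overrides remove or add individual tests, mimicking the invocation above.
release="mitaka"
overrides="test_ha_failed_actions=false test_ha_ng_a=true"

case "$release" in
  mitaka)
    tests="test_ha_failed_actions test_ha_master_slave test_ha_keystone_constraint_removal test_ha_instance" ;;
  newton|ocata|master)
    tests="test_ha_failed_actions test_ha_master_slave test_ha_ng_a test_ha_ng_b test_ha_ng_c test_ha_instance" ;;
esac

for override in $overrides; do
  name=${override%=*}
  value=${override#*=}
  if [ "$value" = "false" ]; then
    # Drop a test that the release would run by default.
    tests=$(printf '%s\n' $tests | grep -v "^${name}\$")
  elif ! printf '%s\n' $tests | grep -q "^${name}\$"; then
    # Force a test that the release would normally skip.
    tests="$tests $name"
  fi
done
echo $tests
```

With these values the mitaka default set loses test_ha_failed_actions and gains test_ha_ng_a, matching the example invocation above.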
All tests are performed using an external application named
[tripleo-director-ha-test-suite](https://github.com/rscarazz/tripleo-director-ha-test-suite).
Quickstart invocation
---------------------
@@ -43,44 +97,14 @@ Basically this command:

**Important note**

If the role is called by itself, so not in the same playbook that already
deploys the environment (see
[baremetal-undercloud-validate-ha.yml](https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/baremetal-undercloud-validate-ha.yml)),
you need to export *ANSIBLE_SSH_ARGS* with the path of the *ssh.config.ansible*
file, like this:

    export ANSIBLE_SSH_ARGS="-F /path/to/quickstart/workdir/ssh.config.ansible"
Example Playbook
----------------

@@ -93,12 +117,13 @@ The main playbook couldn't be simpler:

      roles:
        - tripleo-overcloud-validate-ha

But it could also be used at the end of a deployment, like in this file
[baremetal-undercloud-validate-ha.yml](https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/baremetal-undercloud-validate-ha.yml).
License
-------

GPL

Author Information
------------------