Instance-ha role creation

This role executes all the steps needed to configure instance HA
following the Knowledge Base documents published by Red Hat.

Change-Id: I18f459276749a34e9255a1668164b6cf08a19338
Raoul Scarazzini 2017-03-10 10:08:46 -05:00 committed by Michele Baldessari
parent 2a5d666e6d
commit 79560a3287
4 changed files with 539 additions and 0 deletions


@@ -0,0 +1,13 @@
---
- name: Configure STONITH for all the hosts on the overcloud
  hosts: undercloud
  gather_facts: no
  roles:
    - stonith-config

- name: Configure Instance HA
  hosts: undercloud
  gather_facts: no
  roles:
    - instance-ha

roles/instance-ha/README.md (new file, 234 lines)

@@ -0,0 +1,234 @@
instance-ha
===========
This role aims to automate all the steps needed to configure instance HA on a
deployed (via tripleo-quickstart) overcloud environment. For more information
about Instance HA, see the [IHA Documentation](https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/9/html-single/high_availability_for_compute_instances/).
Requirements
------------
This role must be used with a deployed TripleO environment, so you'll need a
working directory of tripleo-quickstart with the following files:

- **hosts**: containing all the hosts used in the deployment;
- **ssh.config.ansible**: containing all the ssh data needed to connect to the
  undercloud and all the overcloud nodes.
**NOTE**: Instance HA depends on STONITH. This means that the steps performed by
this role make sense only if STONITH has already been configured on the overcloud.
A dedicated role that automates the STONITH configuration is available:
[stonith-config](https://github.com/redhat-openstack/tripleo-quickstart-utils/tree/master/roles/stonith-config).
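Before running this role you may want to double-check that fencing is actually in
place on the overcloud. A minimal sanity check (illustrative commands, assuming the
*heat-admin* user and the pcs CLI on the controllers) could be:

$ ssh heat-admin@overcloud-controller-0 'sudo pcs property show stonith-enabled'
$ ssh heat-admin@overcloud-controller-0 'sudo pcs stonith show'

The first command should report *stonith-enabled: true* and the second should list
a fence device for every overcloud node.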
Instance HA
-----------
Instance HA is a feature that gives a certain degree of high availability to the
instances spawned by an OpenStack deployment. Namely, if a compute node on which an
instance is running breaks for whatever reason, this configuration will respawn the
instances that were running on the broken node onto a functioning one.

This role automates all the steps needed to configure the Pacemaker cluster to
support this functionality. A typical cluster configuration on a clean stock
**newton** (or **osp10**) deployment is something like this:
Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Full list of resources:
ip-192.168.24.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.18.0.11 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.20.0.19 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
ip-172.17.0.11 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
ip-172.19.0.12 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
Clone Set: haproxy-clone [haproxy]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Master/Slave Set: galera-master [galera]
Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
ip-172.17.0.18 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
Clone Set: rabbitmq-clone [rabbitmq]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Master/Slave Set: redis-master [redis]
Masters: [ overcloud-controller-0 ]
Slaves: [ overcloud-controller-1 overcloud-controller-2 ]
openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-0
As you can see, there are three controllers, six IP resources, four *core* resources
(*haproxy*, *galera*, *rabbitmq* and *redis*) and one last resource, *openstack-cinder-volume*,
which needs to run as a single active/passive resource inside the cluster. This role
configures all the additional resources needed to have a working Instance HA setup.
Once the playbook has been executed, the configuration will be something like this:
Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
RemoteOnline: [ overcloud-compute-0 overcloud-compute-1 ]
Full list of resources:
ip-192.168.24.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.18.0.11 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
ip-172.20.0.19 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
ip-172.17.0.11 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
ip-172.19.0.12 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
Clone Set: haproxy-clone [haproxy]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Stopped: [ overcloud-compute-0 overcloud-compute-1 ]
Master/Slave Set: galera-master [galera]
Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Stopped: [ overcloud-compute-0 overcloud-compute-1 ]
ip-172.17.0.18 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
Clone Set: rabbitmq-clone [rabbitmq]
Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Stopped: [ overcloud-compute-0 overcloud-compute-1 ]
Master/Slave Set: redis-master [redis]
Masters: [ overcloud-controller-0 ]
Slaves: [ overcloud-controller-1 overcloud-controller-2 ]
Stopped: [ overcloud-compute-0 overcloud-compute-1 ]
openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-0
ipmilan-overcloud-compute-0 (stonith:fence_ipmilan): Started overcloud-controller-1
ipmilan-overcloud-controller-2 (stonith:fence_ipmilan): Started overcloud-controller-0
ipmilan-overcloud-controller-0 (stonith:fence_ipmilan): Started overcloud-controller-0
ipmilan-overcloud-controller-1 (stonith:fence_ipmilan): Started overcloud-controller-1
ipmilan-overcloud-compute-1 (stonith:fence_ipmilan): Started overcloud-controller-1
nova-evacuate (ocf::openstack:NovaEvacuate): Started overcloud-controller-0
Clone Set: nova-compute-checkevacuate-clone [nova-compute-checkevacuate]
Started: [ overcloud-compute-0 overcloud-compute-1 ]
Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
Clone Set: nova-compute-clone [nova-compute]
Started: [ overcloud-compute-0 overcloud-compute-1 ]
Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
fence-nova (stonith:fence_compute): Started overcloud-controller-0
overcloud-compute-1 (ocf::pacemaker:remote): Started overcloud-controller-0
overcloud-compute-0 (ocf::pacemaker:remote): Started overcloud-controller-1
Since this configuration differs considerably from a stock deployment, understanding
the way Instance HA works can be quite hard; additional information about
Instance HA is available at [this link](https://github.com/rscarazz/tripleo-director-instance-ha/blob/master/README.md).
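A quick way to verify the outcome once the playbook has run is to look for the
Instance HA specific resources in the cluster status (an illustrative check, again
using the *heat-admin* user on a controller):

$ ssh heat-admin@overcloud-controller-0 'sudo pcs status | grep -E "nova-evacuate|fence-nova|nova-compute|remote"'

All of these resources should be reported as *Started*.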
Quickstart invocation
---------------------
Quickstart can be invoked like this:
./quickstart.sh \
--retain-inventory \
--playbook overcloud-instance-ha.yml \
--working-dir /path/to/workdir \
--config /path/to/config.yml \
--release <RELEASE> \
--tags all \
<HOSTNAME or IP>
In short, this command:

- **Keeps** the existing inventory data (*--retain-inventory*, the most important option)
- Uses the *overcloud-instance-ha.yml* playbook
- Uses the same custom working dir where quickstart was first deployed
- Selects the specific config file
- Specifies the release (mitaka, newton, or “master” for ocata)
- Performs all the tasks in the playbook overcloud-instance-ha.yml
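For example, a complete invocation against an already deployed virthost could look
like this (all the values are purely illustrative and must be adapted to the
specific environment):

./quickstart.sh \
--retain-inventory \
--playbook overcloud-instance-ha.yml \
--working-dir /home/user/.quickstart \
--config /home/user/.quickstart/config/general_config/pacemaker.yml \
--release newton \
--tags all \
my-virthost.example.com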
**Important note**
You might need to export *ANSIBLE_SSH_ARGS* with the path of the
*ssh.config.ansible* file to make the command work, like this:
export ANSIBLE_SSH_ARGS="-F /path/to/quickstart/workdir/ssh.config.ansible"
Using the playbook on an existing TripleO environment
-----------------------------------------------------
It is possible to execute the playbook on an environment not created via TripleO
quickstart by cloning the tripleo-quickstart-utils repo:
$ git clone https://gitlab.com/redhat-openstack/tripleo-quickstart-utils
then it's just a matter of declaring three environment variables, pointing to
three files:
$ export ANSIBLE_CONFIG=/path/to/ansible.cfg
$ export ANSIBLE_INVENTORY=/path/to/hosts
$ export ANSIBLE_SSH_ARGS="-F /path/to/ssh.config.ansible"
Where:
**ansible.cfg** must contain at least these lines:
[defaults]
roles_path = /path/to/tripleo-quickstart-utils/roles
The **hosts** file must contain the *controller* and *compute* sections (groups), like this:
undercloud ansible_host=undercloud ansible_user=stack ansible_private_key_file=/path/to/id_rsa_undercloud
overcloud-novacompute-1 ansible_host=overcloud-novacompute-1 ansible_user=heat-admin ansible_private_key_file=/path/to/id_rsa_overcloud
overcloud-novacompute-0 ansible_host=overcloud-novacompute-0 ansible_user=heat-admin ansible_private_key_file=/path/to/id_rsa_overcloud
overcloud-controller-2 ansible_host=overcloud-controller-2 ansible_user=heat-admin ansible_private_key_file=/path/to/id_rsa_overcloud
overcloud-controller-1 ansible_host=overcloud-controller-1 ansible_user=heat-admin ansible_private_key_file=/path/to/id_rsa_overcloud
overcloud-controller-0 ansible_host=overcloud-controller-0 ansible_user=heat-admin ansible_private_key_file=/path/to/id_rsa_overcloud
[compute]
overcloud-novacompute-1
overcloud-novacompute-0
[undercloud]
undercloud
[overcloud]
overcloud-novacompute-1
overcloud-novacompute-0
overcloud-controller-2
overcloud-controller-1
overcloud-controller-0
[controller]
overcloud-controller-2
overcloud-controller-1
overcloud-controller-0
**ssh.config.ansible** can *optionally* contain specific per-host connection options, like these:
...
...
Host overcloud-controller-0
ProxyCommand ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=60 -F /path/to/ssh.config.ansible undercloud -W 192.168.24.16:22
IdentityFile /path/to/id_rsa_overcloud
User heat-admin
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
...
...
In this example, to connect to overcloud-controller-0, Ansible will use the
undercloud as a ProxyHost.

With this setup in place it is then possible to launch the playbook:
$ ansible-playbook -vvvv /path/to/tripleo-quickstart-utils/playbooks/overcloud-instance-ha.yml -e release=newton
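If something goes wrong, it is usually worth checking basic connectivity first; with
the three variables exported above, an ad hoc ping against the inventory groups
(standard Ansible modules, group names taken from the hosts file shown earlier) is
enough:

$ ansible undercloud -m ping
$ ansible controller:compute -m ping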
Example Playbook
----------------
The main playbook runs the STONITH configuration role first, since it is a
prerequisite, and then the instance-ha role itself:
---
- name: Configure STONITH for all the hosts on the overcloud
  hosts: undercloud
  gather_facts: no
  roles:
    - stonith-config

- name: Configure Instance HA
  hosts: undercloud
  gather_facts: no
  roles:
    - instance-ha
License
-------
GPL
Author Information
------------------
Raoul Scarazzini <rasca@redhat.com>


@@ -0,0 +1,4 @@
---
overcloud_working_dir: "/home/heat-admin"
working_dir: "/home/stack"


@@ -0,0 +1,288 @@
---
- name: Disable openstack-nova-compute on compute
  service:
    name: openstack-nova-compute
    state: stopped
    enabled: no
  become: yes
  delegate_to: "{{ item }}"
  with_items:
    - "{{ groups['compute'] }}"

- name: Disable neutron-openvswitch-agent on compute
  service:
    name: neutron-openvswitch-agent
    state: stopped
    enabled: no
  become: yes
  delegate_to: "{{ item }}"
  with_items:
    - "{{ groups['compute'] }}"
  when: release == 'liberty' or release == 'mitaka'

- name: Disable openstack-ceilometer-compute on compute
  service:
    name: openstack-ceilometer-compute
    state: stopped
    enabled: no
  become: yes
  delegate_to: "{{ item }}"
  with_items:
    - "{{ groups['compute'] }}"
  when: release == 'liberty' or release == 'mitaka'

- name: Disable libvirtd on compute
  become: yes
  service:
    name: libvirtd
    state: stopped
    enabled: no
  delegate_to: "{{ item }}"
  with_items:
    - "{{ groups['compute'] }}"
  when: release == 'liberty' or release == 'mitaka'

- name: Generate authkey for remote pacemaker
  shell: >
    dd if=/dev/urandom of="/tmp/authkey" bs=4096 count=1
  delegate_to: localhost

- name: Make sure pacemaker config dir exists
  become: yes
  file:
    path: /etc/pacemaker
    state: directory
    mode: 0750
    group: "haclient"
  delegate_to: "{{ item }}"
  with_items:
    - "{{ groups['controller'] }}"
    - "{{ groups['compute'] }}"

- name: Copy authkey on all the overcloud nodes
  become: yes
  copy:
    src: /tmp/authkey
    dest: /etc/pacemaker/authkey
    mode: 0640
    group: "haclient"
  delegate_to: "{{ item }}"
  with_items:
    - "{{ groups['controller'] }}"
    - "{{ groups['compute'] }}"

- name: Remove authkey from local dir
  file:
    path: /tmp/authkey
    state: absent
  delegate_to: localhost

- name: Enable iptables traffic for pacemaker_remote
  become: yes
  shell: >
    iptables -I INPUT -p tcp --dport 3121 -j ACCEPT;
    /sbin/service iptables save
  delegate_to: "{{ item }}"
  with_items:
    - "{{ groups['controller'] }}"
    - "{{ groups['compute'] }}"

- name: Start pacemaker remote service on compute nodes
  become: yes
  service:
    name: pacemaker_remote
    enabled: yes
    state: started
  delegate_to: "{{ item }}"
  with_items:
    - "{{ groups['compute'] }}"

- name: Get the overcloudrc file
  shell: >
    cat {{ working_dir }}/overcloudrc
  register: overcloudrc

- name: Copy overcloudrc file on overcloud-controller-0
  lineinfile:
    destfile: "{{ overcloud_working_dir }}/overcloudrc"
    line: "{{ overcloudrc.stdout }}"
    create: yes
    mode: 0644
  delegate_to: overcloud-controller-0

- name: Get environment vars from overcloudrc
  delegate_to: "overcloud-controller-0"
  shell: >
    grep OS_USERNAME {{ overcloud_working_dir }}/overcloudrc | sed 's/export OS_USERNAME=//g'
  register: "OS_USERNAME"

- name: Get environment vars from overcloudrc
  delegate_to: "overcloud-controller-0"
  shell: >
    grep OS_PASSWORD {{ overcloud_working_dir }}/overcloudrc | sed 's/export OS_PASSWORD=//g'
  register: "OS_PASSWORD"

- name: Get environment vars from overcloudrc
  delegate_to: "overcloud-controller-0"
  shell: >
    grep OS_AUTH_URL {{ overcloud_working_dir }}/overcloudrc | sed 's/export OS_AUTH_URL=//g'
  register: "OS_AUTH_URL"

- name: Get environment vars from overcloudrc
  delegate_to: "overcloud-controller-0"
  shell: >
    grep -E 'OS_PROJECT_NAME|OS_TENANT_NAME' {{ overcloud_working_dir }}/overcloudrc | sed 's/export OS_.*_NAME=//g'
  register: "OS_TENANT_NAME"
- block:
    - name: Create resource nova-evacuate
      shell: >
        pcs resource create nova-evacuate ocf:openstack:NovaEvacuate auth_url=$OS_AUTH_URL username=$OS_USERNAME password=$OS_PASSWORD tenant_name=$OS_TENANT_NAME no_shared_storage=1

    - name: Create pacemaker constraints to start VIP resources before nova-evacuate
      shell: |
        for i in $(pcs status | grep IP | awk '{ print $1 }')
          do pcs constraint order start $i then nova-evacuate
        done

    - name: Create pacemaker constraints to start openstack services before nova-evacuate
      shell: "pcs constraint order start {{ item }} then nova-evacuate require-all=false"
      with_items:
        - openstack-glance-api-clone
        - neutron-metadata-agent-clone
        - openstack-nova-conductor-clone
      when: release == 'liberty' or release == 'mitaka'

    - name: Disable keystone resource
      shell: "pcs resource disable openstack-keystone --wait=900"
      when: release == 'liberty'

    # Keystone resource was replaced by openstack-core resource in RHOS9
    - name: Disable openstack-core resource
      shell: "pcs resource disable openstack-core --wait=900"
      when: release == 'mitaka'

    - name: Set controller pacemaker property on controllers
      shell: "pcs property set --node {{ item }} osprole=controller"
      with_items: "{{ groups['controller'] }}"

    - name: Get stonith devices
      shell: "pcs stonith | awk '{print $1}' | tr '\n' ' '"
      register: stonithdevs

    - name: Setup stonith devices
      shell: |
        for i in $(sudo cibadmin -Q --xpath //primitive --node-path | awk -F "id='" '{print $2}' | awk -F "'" '{print $1}' | uniq); do
          found=0
          if [ -n "{{ stonithdevs.stdout }}" ]; then
            for x in {{ stonithdevs.stdout }}; do
              if [ "$x" == "$i" ]; then
                found=1
              fi
            done
          fi
          if [ $found = 0 ]; then
            sudo pcs constraint location $i rule resource-discovery=exclusive score=0 osprole eq controller
          fi
        done

    - name: Create compute pacemaker resources and constraints
      shell: |
        pcs resource create nova-compute-checkevacuate ocf:openstack:nova-compute-wait auth_url=$OS_AUTH_URL username=$OS_USERNAME password=$OS_PASSWORD tenant_name=$OS_TENANT_NAME domain=localdomain op start timeout=300 --clone interleave=true --disabled --force
        pcs constraint location nova-compute-checkevacuate-clone rule resource-discovery=exclusive score=0 osprole eq compute
        pcs resource create nova-compute systemd:openstack-nova-compute op start timeout=60s --clone interleave=true --disabled --force
        pcs constraint location nova-compute-clone rule resource-discovery=exclusive score=0 osprole eq compute
        pcs constraint order start nova-compute-checkevacuate-clone then nova-compute-clone require-all=true
        pcs constraint order start nova-compute-clone then nova-evacuate require-all=false

    - name: Create compute pacemaker resources and constraints
      shell: |
        pcs resource create neutron-openvswitch-agent-compute systemd:neutron-openvswitch-agent --clone interleave=true --disabled --force
        pcs constraint location neutron-openvswitch-agent-compute-clone rule resource-discovery=exclusive score=0 osprole eq compute
        pcs resource create libvirtd-compute systemd:libvirtd --clone interleave=true --disabled --force
        pcs constraint location libvirtd-compute-clone rule resource-discovery=exclusive score=0 osprole eq compute
        pcs constraint order start neutron-openvswitch-agent-compute-clone then libvirtd-compute-clone
        pcs constraint colocation add libvirtd-compute-clone with neutron-openvswitch-agent-compute-clone
        pcs resource create ceilometer-compute systemd:openstack-ceilometer-compute --clone interleave=true --disabled --force
        pcs constraint location ceilometer-compute-clone rule resource-discovery=exclusive score=0 osprole eq compute
        pcs constraint order start libvirtd-compute-clone then ceilometer-compute-clone
        pcs constraint colocation add ceilometer-compute-clone with libvirtd-compute-clone
        pcs constraint order start libvirtd-compute-clone then nova-compute-clone
        pcs constraint colocation add nova-compute-clone with libvirtd-compute-clone
      when: release == 'liberty' or release == 'mitaka'

    - name: Create pacemaker constraint for neutron-server, nova-conductor and ceilometer-notification
      shell: |
        pcs constraint order start neutron-server-clone then neutron-openvswitch-agent-compute-clone require-all=false
        pcs constraint order start openstack-ceilometer-notification-clone then ceilometer-compute-clone require-all=false
        pcs constraint order start openstack-nova-conductor-clone then nova-compute-checkevacuate-clone require-all=false
      when: release == 'liberty' or release == 'mitaka'

    - name: Check if ipmi exists for all compute nodes
      shell: |
        sudo pcs stonith show ipmilan-{{ item }}
      with_items: "{{ groups['compute'] }}"

    - name: Create fence-nova pacemaker resource
      shell: "pcs stonith create fence-nova fence_compute auth-url=$OS_AUTH_URL login=$OS_USERNAME passwd=$OS_PASSWORD tenant-name=$OS_TENANT_NAME domain=localdomain record-only=1 no-shared-storage=False --force"

    - name: Create pacemaker constraint for fence-nova to fix it on controller node and set resource-discovery never
      shell: "pcs constraint location fence-nova rule resource-discovery=never score=0 osprole eq controller"

    - name: Create pacemaker constraint for fence-nova to start after galera
      shell: "pcs constraint order promote galera-master then fence-nova require-all=false"

    - name: Create nova-compute order constraint on fence-nova
      shell: "pcs constraint order start fence-nova then nova-compute-clone"

    - name: Set cluster recheck interval to 1 minute
      shell: "pcs property set cluster-recheck-interval=1min"

    - name: Create pacemaker remote resource on compute nodes
      shell: "pcs resource create {{ item }} ocf:pacemaker:remote reconnect_interval=240 op monitor interval=20"
      with_items: "{{ groups['compute'] }}"

    - name: Set osprole for compute nodes
      shell: "pcs property set --node {{ item }} osprole=compute"
      with_items: "{{ groups['compute'] }}"

    - name: Add pacemaker stonith devices of compute nodes to level 1
      shell: "pcs stonith level add 1 {{ item }} ipmilan-{{ item }},fence-nova"
      with_items: "{{ groups['compute'] }}"

    - name: Enable keystone resource
      shell: "pcs resource enable openstack-keystone"
      when: release == 'liberty' or release == 'mitaka'

    - name: Enable openstack-core resource
      shell: "pcs resource enable openstack-core"
      when: release == 'mitaka'

    - name: Wait for httpd service to be started
      shell: "systemctl show httpd --property=ActiveState"
      register: httpd_status_result
      until: httpd_status_result.stdout.find('inactive') == -1 and httpd_status_result.stdout.find('activating') == -1
      retries: 30
      delay: 10
      when: release != 'liberty' and release != 'mitaka'

    - name: Enable compute resources
      shell: "pcs resource enable {{ item }}"
      with_items:
        - nova-compute-checkevacuate
        - nova-compute

    - name: Enable compute resources
      shell: "pcs resource enable {{ item }}"
      with_items:
        - neutron-openvswitch-agent-compute
        - libvirtd-compute
        - ceilometer-compute
      when: release == 'liberty' or release == 'mitaka'
  environment:
    OS_USERNAME: "{{ OS_USERNAME.stdout }}"
    OS_PASSWORD: "{{ OS_PASSWORD.stdout }}"
    OS_AUTH_URL: "{{ OS_AUTH_URL.stdout }}"
    OS_TENANT_NAME: "{{ OS_TENANT_NAME.stdout }}"
  become: yes
  delegate_to: "overcloud-controller-0"