.. _replace_controller:

Replacing a Controller Node
===========================

Replace Failed Node
-------------------

First, create a YAML file defining the index of the node to remove. The node
index is the suffix of the instance name in `nova list` output. In the
following example, the `overcloud-controller-1` node is replaced with
`overcloud-controller-3`::

    $ cat ~/remove.yaml
    parameter_defaults:
      ControllerRemovalPolicies:
          [{'resource_list': ['1']}]
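
To double-check which index belongs to the failed node, you can list the
controller instances first; the trailing digit of each instance name is the
index (the `grep` filter here is just for convenience)::

    nova list | grep overcloud-controller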

Then, re-deploy the overcloud, including the extra environment file::

    openstack overcloud deploy --templates --control-scale 3 -e ~/remove.yaml

The old node will be removed and a new one will be added. Because some puppet
modules don't support node replacement, the re-deployment fails and a couple
of manual changes are needed to fix the setup of the controller nodes.

Connect to one of the remaining controller nodes and delete the failed node
from the Pacemaker/Corosync cluster::

    crm_node -R overcloud-controller-1 --force
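
Optionally, verify that the failed node is gone by listing the cluster
members known to Pacemaker::

    crm_node -l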

Delete the failed node from the RabbitMQ cluster::

    rabbitmqctl forget_cluster_node rabbit@overcloud-controller-1
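
Optionally, confirm that the node was forgotten; it should no longer appear
in the cluster status output::

    rabbitmqctl cluster_status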

Delete the failed node from MongoDB::

    # connect to MongoDB on any of the remaining nodes:
    mongo --host <node ip>
    # check the status of the MongoDB cluster:
    rs.status()
    # remove the failed node:
    rs.remove('<node_ip>:27017')
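
Note that `rs.remove()` must be run against the primary member of the
replica set; the `rs.status()` output shows which member is currently
`PRIMARY`. Optionally, confirm the resulting member list afterwards::

    rs.conf()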

Update the list of nodes in the Galera cluster::

    pcs resource update galera wsrep_cluster_address=gcomm://overcloud-controller-0,overcloud-controller-3,overcloud-controller-2
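
Once the new node has joined, you can optionally check the Galera cluster
size from any controller (this assumes root access to MySQL over the local
socket)::

    mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"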

Start Pacemaker/Corosync on the new node::

    pcs cluster node add overcloud-controller-3
    pcs cluster start overcloud-controller-3
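
Optionally, verify that the new node shows up as online::

    pcs status nodes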

Enable the keystone service on the new node::

    # copy /etc/keystone from a remaining node to the new node
    # set admin_bind_host and public_bind_host in /etc/keystone/keystone.conf
    # to the new node's IP address
    pcs resource cleanup openstack-keystone-clone overcloud-controller-3
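
A minimal sketch of those first two steps, assuming `crudini` is available on
the node and that the options live in the `[DEFAULT]` section of
`keystone.conf` (replace `<node ip>` with the new node's IP address)::

    # run on the new node; overcloud-controller-0 is any remaining controller
    # assumes crudini is installed and the options are in [DEFAULT]
    scp -r overcloud-controller-0:/etc/keystone /etc/
    crudini --set /etc/keystone/keystone.conf DEFAULT admin_bind_host <node ip>
    crudini --set /etc/keystone/keystone.conf DEFAULT public_bind_host <node ip>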

Re-deploy the overcloud again::

    openstack overcloud deploy --templates --control-scale 3

.. note::

   If the deployment fails with the error `Failed to call refresh: Could not
   restart Service[httpd]`, try the deployment again.

Completing Update
-----------------

Delete the failed node from the `/etc/corosync/corosync.conf` file on each
node, then restart Corosync, one node at a time::

    systemctl restart corosync
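
For reference, the entry to delete is the failed node's block in the
`nodelist` section of `corosync.conf`; it typically looks similar to this
(the `nodeid` value is illustrative)::

    node {
      ring0_addr: overcloud-controller-1
      nodeid: 2    # illustrative value
    }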

When the re-deployment finishes, connect to one of the controller nodes and
start the services on the new node::

    pcs resource cleanup neutron-server-clone
    pcs resource cleanup openstack-nova-api-clone
    pcs resource cleanup openstack-nova-consoleauth-clone
    pcs resource cleanup openstack-heat-engine-clone
    pcs resource cleanup openstack-cinder-api-clone
    pcs resource cleanup openstack-glance-registry-clone
    pcs resource cleanup httpd-clone
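
After the cleanups, you can optionally confirm that all resources are
running on the new node as well::

    pcs status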

Replacing Bootstrap Node
------------------------

If the node with index 0 is being replaced, it is necessary to edit the heat
templates and change the bootstrap node index before starting the
replacement. Open the `overcloud.yaml` file in the root directory of the
heat templates and change the lines::

    bootstrap_nodeid: {get_attr: [Controller, resource.0.hostname]}
    bootstrap_nodeid_ip: {get_attr: [Controller, resource.0.ip_address]}

to::

    bootstrap_nodeid: {get_attr: [Controller, resource.1.hostname]}
    bootstrap_nodeid_ip: {get_attr: [Controller, resource.1.ip_address]}
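
To locate the lines quickly, you can search for the attribute name in the
template::

    grep -n bootstrap_nodeid overcloud.yaml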