===============
Troubleshooting
===============

***********
Firewalling
***********

Due to the nature of firewall settings and customizations, Bifrost does
**not** change any local firewalling on the node. Users must ensure that
the firewall on the node running Bifrost allows the nodes being booted to
connect to the following ports::

    67/UDP for DHCP requests to be serviced
    69/UDP for TFTP file transfers (initial iPXE binary)
    6301/TCP for the Ironic API
    8080/TCP for HTTP file downloads (iPXE, Ironic-Python-Agent)

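Of these ports, only the TCP ones can be probed with a plain connection
attempt; DHCP and TFTP use UDP and are easier to verify with a packet
capture. The following is a minimal sketch, assuming Python 3 on a machine
attached to the provisioning network. ``BIFROST_HOST`` is a placeholder for
the address of the node running Bifrost, and ``tcp_port_open`` is a
hypothetical helper, not part of Bifrost.

.. code-block:: python

    import socket

    # Placeholder: replace with the address of the node running Bifrost.
    BIFROST_HOST = "192.0.2.10"

    # TCP ports from the list above; 67/UDP and 69/UDP cannot be probed this way.
    TCP_PORTS = {6301: "Ironic API", 8080: "HTTP file downloads"}


    def tcp_port_open(host, port, timeout=3.0):
        """Return True if a TCP connection to host:port succeeds within timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Covers refused connections, unreachable networks and timeouts.
            return False


    if __name__ == "__main__":
        for port, purpose in sorted(TCP_PORTS.items()):
            state = "open" if tcp_port_open(BIFROST_HOST, port) else "unreachable"
            print(f"{port}/TCP ({purpose}): {state}")

A port reported as unreachable from a booting node's network, while the
service is running on the Bifrost host, usually points at the local firewall.
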
If you encounter any additional issues, using tcpdump while attempting to
deploy a single node is highly recommended in order to capture and review
the traffic exchanged between the two nodes.

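To keep such a capture focused, the filter can be limited to the ports
listed in the firewalling section. The sketch below is a hypothetical
helper, not part of Bifrost, that builds a pcap filter expression from that
list; the result can be passed to tcpdump, for example
``tcpdump -n -i <interface> '<filter>'``, where ``<interface>`` is the
provisioning network interface.

.. code-block:: python

    # Ports from the firewalling list, as (protocol, port) pairs.
    BIFROST_PORTS = [("udp", 67), ("udp", 69), ("tcp", 6301), ("tcp", 8080)]


    def build_capture_filter(ports):
        """Join (protocol, port) pairs into a pcap filter expression for tcpdump."""
        return " or ".join(f"{proto} port {port}" for proto, port in ports)


    if __name__ == "__main__":
        # Prints: udp port 67 or udp port 69 or tcp port 6301 or tcp port 8080
        print(build_capture_filter(BIFROST_PORTS))

Because ``port`` matches either the source or the destination port, replies
from the Bifrost node (for example DHCP offers sent from port 67) are
captured as well.
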
|
*****************
|
|
NodeLocked Errors
|
|
*****************
|
|
|
|
This is due to node status checking thread in Ironic, which is a locking
|
|
action as it utilizes IPMI. The best course of action is to retry the
|
|
operation. If this is occurring with a high frequency, tuning might be
|
|
required.
|
|
|
|
Example error::

    NodeLocked: Node 00000000-0000-0000-0000-046ebb96ec21 is locked by
    host $HOSTNAME, please retry after the current operation is completed.

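When scripting against Ironic, the retry can be automated. The following is
a minimal sketch of retrying with exponential backoff; ``NodeLocked`` here
is a local stand-in class, not the exception imported from Ironic or its
client, and ``operation`` is any hypothetical callable that raises it while
the node is locked.

.. code-block:: python

    import time


    class NodeLocked(Exception):
        """Stand-in for the NodeLocked error reported by Ironic (hypothetical)."""


    def retry_on_node_locked(operation, attempts=5, initial_delay=2.0, sleep=time.sleep):
        """Call operation(), retrying with exponential backoff while it raises NodeLocked.

        Returns the operation's result, or re-raises the last NodeLocked error
        once all attempts are used up.
        """
        delay = initial_delay
        for attempt in range(1, attempts + 1):
            try:
                return operation()
            except NodeLocked:
                if attempt == attempts:
                    raise
                # Give the status checking thread time to release the lock.
                sleep(delay)
                delay *= 2

Doubling the delay spreads the attempts out, which makes it less likely that
each retry collides with the next periodic status check.
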
*********************************************
Unexpected/Unknown failure with the IPA Agent
*********************************************

New image appears not to be deploying
=====================================

When deploying a new image with the same name as a previous image, it is
necessary to purge the contents of the TFTP ``master_images`` folder, which
caches the image file for deployments. The default location for this folder
is ``/tftpboot/master_images``.

Additionally, a playbook is included that can be run prior to a
re-installation to ensure that fresh images are deployed. This playbook can
be found at ``playbooks/cleanup-deployment-images.yaml``.

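If you prefer to purge the cache by hand, the following sketch removes
everything inside the cache folder while keeping the folder itself. It
assumes the default ``/tftpboot/master_images`` location and sufficient
privileges to delete its contents; ``purge_directory`` is a hypothetical
helper and does not replace the cleanup playbook.

.. code-block:: python

    import os
    import shutil

    # Default cache location named in this section.
    MASTER_IMAGES_DIR = "/tftpboot/master_images"


    def purge_directory(path=MASTER_IMAGES_DIR):
        """Remove every file, symlink and subdirectory inside path, keeping path itself.

        Returns the number of top-level entries removed.
        """
        # Read the full listing first so nothing is deleted mid-iteration.
        with os.scandir(path) as it:
            entries = list(it)
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                # Regular files and symlinks (including links to directories).
                os.unlink(entry.path)
        return len(entries)

For example, calling ``purge_directory()`` as root clears the default
location; pass a different path if your deployment caches images elsewhere.
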
Building an IPA image
=====================

Troubleshooting issues involving IPA can be time consuming. The IPA
developers **HIGHLY** recommend that users build their own custom IPA
images in order to inject things such as SSH keys, and to turn on agent
debugging, which must be done in a custom image because there is currently
no mechanism to enable debugging via the kernel command line.

IPA's instructions on building a custom image can be found at:
http://git.openstack.org/cgit/openstack/ironic-python-agent/tree/imagebuild/coreos/README.rst

This essentially boils down to the following steps:

1. ``git clone https://git.openstack.org/openstack/ironic-python-agent``
2. ``cd ironic-python-agent``
3. ``pip install -r ./requirements.txt``
4. If you don't already have docker installed, execute:
   ``sudo apt-get install docker docker.io``
5. ``cd imagebuild/coreos``
6. Edit ``oem/cloudconfig.yml`` and add ``--debug`` to the end of the
   ``ExecStart`` setting for the ``ironic-python-agent.service`` unit.
7. Execute ``make`` to complete the build process.

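Step 6 can also be scripted. The sketch below assumes that
``oem/cloudconfig.yml`` declares the unit with a
``- name: ironic-python-agent.service`` entry followed by unit content that
contains a single-line ``ExecStart=`` setting; check the file in your
checkout before relying on it. ``add_debug_flag`` is a hypothetical helper.

.. code-block:: python

    import sys

    UNIT_NAME = "ironic-python-agent.service"


    def add_debug_flag(text, unit_name=UNIT_NAME, flag="--debug"):
        """Append flag to the first ExecStart= line after the unit's "- name:" entry.

        Assumes a cloud-config layout where each unit starts with "- name: <unit>".
        The text is returned unchanged if the line already ends with the flag.
        """
        lines = text.splitlines(keepends=True)
        in_unit = False
        for i, line in enumerate(lines):
            stripped = line.strip()
            if stripped.startswith("- name:"):
                # Track whether we are inside the requested unit's entry.
                in_unit = stripped.split(":", 1)[1].strip() == unit_name
            elif in_unit and stripped.startswith("ExecStart="):
                body = line.rstrip("\r\n")
                ending = line[len(body):]
                if not body.rstrip().endswith(flag):
                    lines[i] = body.rstrip() + " " + flag + ending
                return "".join(lines)
        raise ValueError(f"no ExecStart= line found for {unit_name}")


    if __name__ == "__main__" and len(sys.argv) > 1:
        path = sys.argv[1]  # e.g. oem/cloudconfig.yml
        with open(path) as f:
            updated = add_debug_flag(f.read())
        with open(path, "w") as f:
            f.write(updated)

Running it twice is harmless, since the flag is only appended when it is not
already present.
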
Once your build is completed, you will need to copy the image files written
to the ``UPLOAD`` folder into the ``/httpboot`` folder. If you are using the
default file names, executing ``cp UPLOAD/* /httpboot/`` should achieve this.

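Before redeploying, it can help to confirm that ``/httpboot`` really holds
the freshly built files rather than stale copies. A minimal sketch, assuming
the default file names so that each file in ``UPLOAD`` has a same-named
counterpart in ``/httpboot``; ``stale_files`` and ``sha256_of`` are
hypothetical helpers.

.. code-block:: python

    import hashlib
    import os


    def sha256_of(path, chunk_size=1 << 20):
        """Return the hex SHA-256 digest of a file, read in chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()


    def stale_files(upload_dir="UPLOAD", httpboot_dir="/httpboot"):
        """List files from upload_dir that are missing from, or differ in, httpboot_dir."""
        stale = []
        for name in sorted(os.listdir(upload_dir)):
            source = os.path.join(upload_dir, name)
            if not os.path.isfile(source):
                continue
            target = os.path.join(httpboot_dir, name)
            if not os.path.isfile(target) or sha256_of(source) != sha256_of(target):
                stale.append(name)
        return stale

An empty list means every built file is in place; any names returned still
need to be copied.
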
Since you have updated the image to be deployed, you will need to purge the
contents of ``/tftpboot/master_images`` for the new image to be used in the
deployment process.

Obtaining IPA logs via the console
==================================

1) By default, Bifrost sets the agent journal to be logged to the system
   console. Due to variations in hardware, you may need to tune the
   parameters passed to the deployment ramdisk. This can be done in
   ``ironic.conf``, as shown below::

       agent_pxe_append_params=nofb nomodeset vga=normal console=ttyS0 systemd.journald.forward_to_console=yes

   Parameters will vary by your hardware type and configuration; however,
   the ``systemd.journald.forward_to_console=yes`` setting is a default and
   will only work for systemd-based IPA images, such as the default CoreOS
   image.

   The example above effectively disables all attempts by the kernel to set
   the video mode, defines the console as ``ttyS0`` (the first serial port),
   and instructs systemd to direct logs to the console.

2) Once set, restart the ironic-conductor service, e.g.
   ``service ironic-conductor restart``, and attempt to redeploy the node.
   You will want to view the system console while the deployment is in
   progress. If possible, you may wish to use ipmitool and write the output
   to a log file.

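One way to follow the console is IPMI Serial-over-LAN. The sketch below
assumes the node's BMC supports SOL through ipmitool's ``lanplus`` interface
and that ``ipmitool`` is installed on the Bifrost host; the helper names,
BMC address, credentials and log path are all placeholders.
``ipmitool ... sol activate`` keeps running until it is interrupted, so the
capture is ended with Ctrl-C.

.. code-block:: python

    import subprocess


    def build_sol_command(bmc_host, username, password):
        """Return the ipmitool argument list that attaches to the node's SOL console."""
        return [
            "ipmitool", "-I", "lanplus",
            "-H", bmc_host, "-U", username, "-P", password,
            "sol", "activate",
        ]


    def capture_console(bmc_host, username, password, log_path):
        """Append the SOL console output to log_path until ipmitool exits or is interrupted."""
        command = build_sol_command(bmc_host, username, password)
        with open(log_path, "ab") as log:
            try:
                subprocess.run(command, stdout=log, stderr=subprocess.STDOUT, check=False)
            except KeyboardInterrupt:
                # Ctrl-C ends the capture; the with block closes the log file.
                pass

For example, ``capture_console("192.0.2.20", "admin", "password",
"node-console.log")`` appends the console of the node whose BMC is at the
placeholder address to ``node-console.log`` while you redeploy the node.
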
Gaining access via SSH to the node running IPA
==============================================

If you wish to SSH into the node in order to perform any sort of
post-mortem, you will need to do the following:

1) Set an ``sshkey="ssh-rsa AAAA....."`` value in the
   ``agent_pxe_append_params`` setting in ``/etc/ironic/ironic.conf``.

2) You will need to short circuit the ironic-conductor process. An ideal
   place to do so is the ``reboot_to_instance`` method in
   ``ironic/drivers/modules/agent.py``. Temporarily short circuiting this
   step means you will have to edit the MySQL database if you wish to
   re-deploy the node, but the node should stay online after IPA has
   completed deployment.

3) ``ssh -l core <ip-address-of-node>``

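Because the key contains a space, it has to stay inside double quotes on the
kernel command line, as in step 1. As a sketch, the hypothetical helpers
below read an OpenSSH public key file and append the quoted ``sshkey=``
parameter to an existing ``agent_pxe_append_params`` value. They only build
the string; editing ``/etc/ironic/ironic.conf`` and restarting
ironic-conductor is still up to you.

.. code-block:: python

    def sshkey_param(public_key_path):
        """Return sshkey="<type> <key>" built from an OpenSSH public key file.

        Only the key type and base64 body are kept, matching the
        sshkey="ssh-rsa AAAA....." form; the trailing comment is dropped.
        """
        with open(public_key_path) as f:
            fields = f.readline().split()
        if len(fields) < 2:
            raise ValueError(f"{public_key_path} does not look like a public key")
        return 'sshkey="{} {}"'.format(fields[0], fields[1])


    def append_sshkey(append_params, public_key_path):
        """Return an agent_pxe_append_params value with the sshkey parameter added."""
        parts = [append_params.strip(), sshkey_param(public_key_path)]
        # filter(None, ...) skips an empty existing value.
        return " ".join(filter(None, parts))
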
********************************
ssh_public_key_path is not valid
********************************

Bifrost requires that the user who executes Bifrost has an SSH public key
in their home directory, or that the user defines a variable to tell
Bifrost where to find this file. Once this variable points to a valid file,
the deployment playbook can be re-run.

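Before re-running the playbook, a quick check can confirm that the path
resolves to an actual public key rather than a missing file or the private
key. This is a sketch: ``check_public_key`` is a hypothetical helper, and
``~/.ssh/id_rsa.pub`` is only an assumed default for illustration; what
matters is the path that ``ssh_public_key_path`` resolves to in your
configuration.

.. code-block:: python

    import os

    # Assumed default for illustration only; Bifrost's variables decide the real one.
    DEFAULT_PUBLIC_KEY = "~/.ssh/id_rsa.pub"

    # Common OpenSSH public key type prefixes (ssh-rsa, ssh-ed25519, ecdsa-..., sk-...).
    KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


    def check_public_key(path=DEFAULT_PUBLIC_KEY):
        """Return the expanded path if it is a file that looks like an OpenSSH public key."""
        expanded = os.path.expanduser(path)
        if not os.path.isfile(expanded):
            raise FileNotFoundError(f"{expanded} does not exist or is not a file")
        with open(expanded) as f:
            fields = f.readline().split()
        # A private key starts with "-----BEGIN ..." and is rejected here.
        if len(fields) < 2 or not fields[0].startswith(KEY_TYPE_PREFIXES):
            raise ValueError(f"{expanded} does not look like an OpenSSH public key")
        return expanded

For example, ``check_public_key("~/path/to/public/key/id_rsa.pub")`` returns
the expanded path if the file is usable as ``ssh_public_key_path`` and raises
an error otherwise.
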
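Generating a new ssh key
========================

See the manual page for the ssh-keygen command. For example, running
``ssh-keygen -t rsa`` without further options prompts for a file name that
defaults to ``~/.ssh/id_rsa`` and writes the matching public key to
``~/.ssh/id_rsa.pub``.
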
Defining a specific public key file
===================================

A user can define a specific public key file by using the
``ssh_public_key_path`` variable. This can be set in the
``group_vars/inventory/all`` file, or on the ``ansible-playbook`` command
line using the ``-e`` command line parameter.

Example::

    ansible-playbook -i inventory/bifrost_inventory.py deploy-dynamic.yaml -e ssh_public_key_path=~/path/to/public/key/id_rsa.pub

NOTE: The matching private key will need to be used to log in to the
deployed machine.