.. _troubleshooting:

======================
Troubleshooting Ironic
======================

Nova returns "No valid host was found" Error
============================================

Sometimes the Nova Conductor log file ``nova-conductor.log`` or a message
returned from the Nova API contains the following error::

    NoValidHost: No valid host was found. There are not enough hosts available.

"No valid host was found" means that the Nova Scheduler could not find a bare
metal node suitable for booting the new instance.

This in turn usually means a mismatch between the resources that Nova expects
to find and the resources that Ironic advertised to Nova.

A few things should be checked in this case:

#. Inspection should have succeeded previously, or you should have entered
   the required Ironic node properties manually. For each node listed by
   ``ironic node-list --provision-state available``, run::

       ironic node-show <IRONIC-NODE-UUID>

   and make sure that the ``properties`` JSON field has valid values for the
   keys ``cpus``, ``cpu_arch``, ``memory_mb`` and ``local_gb``.

#. Make sure the Nova flavor that you are using matches the properties of
   the available Ironic nodes. Use::

       nova flavor-show <FLAVOR NAME>

   to compare. If you're using exact match filters in the Nova Scheduler,
   make sure the flavor and the node properties match exactly. Any extra
   specs in the flavor should map to ``node.properties['capabilities']``.

#. Make sure that enough nodes are in the ``available`` state according to
   ``ironic node-list --provision-state available``.

#. Make sure the nodes you're going to deploy to are not in maintenance mode.
   Again, use ``ironic node-list`` to check. A node automatically going into
   maintenance mode usually means wrong power credentials for this node.
   Check them and then remove maintenance mode::

       ironic node-set-maintenance <IRONIC-NODE-UUID> off

#. After making changes to nodes in Ironic, it takes time for those changes
   to propagate from Ironic to Nova. Check that::

       nova hypervisor-stats

   correctly shows the total amount of resources in your system. You can also
   check ``nova hypervisor-list`` to see the status of individual Ironic
   nodes as reported to Nova, and correlate the Nova "hypervisor hostname"
   with the Ironic node UUID.

#. If none of the above helped, check the Ironic conductor log carefully to
   see if there are any conductor-related errors which are the root cause of
   "No valid host was found". If the log contains any "Error in deploy of
   node <IRONIC-NODE-UUID>: [Errno 28] ..." messages, the conductor ran into
   an error during deployment (on Linux, errno 28 is ``ENOSPC``, "No space
   left on device"). Use the log to fix or work around the problem, then
   try again.

Patching the Deploy Ramdisk
===========================

When debugging a problem with deployment and/or inspection you may want to
quickly apply a change to the ramdisk to see if it helps. You can inject
your code and/or SSH keys during the ramdisk build (how to do so depends on
how you've built your ramdisk), but it's also possible to quickly modify an
already built ramdisk.

Create an empty directory and unpack the ramdisk content there::

    mkdir unpack
    cd unpack
    gzip -dc /path/to/the/ramdisk | cpio -id

The last command unpacks the whole Linux file system tree into the current
directory. Now you can modify any files you want. The actual location of the
files depends on the way you've built the ramdisk.

After you've made the modifications, pack the whole content of the current
directory back::

    find . | cpio -H newc -o > /path/to/the/new/ramdisk

.. note:: You don't need to modify the kernel (e.g.
    ``tinyipa-master.vmlinuz``), only the ramdisk part.

.. note:: For a CoreOS-based ramdisk you also need to unpack and pack back
    the squashfs archive inside the unpacked ramdisk.

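
The unpack and repack steps above can be combined into a small helper. The
following is only a sketch, not a tool shipped with Ironic: the function name
``patch_ramdisk`` and its arguments are hypothetical, it assumes a
gzip-compressed cpio ramdisk as in the commands above, and it does not handle
the squashfs archive of a CoreOS-based ramdisk.

.. code-block:: bash

  # patch_ramdisk OLD NEW [COMMAND...]   (hypothetical helper)
  # Unpack the gzip-compressed cpio archive OLD into a temporary directory,
  # run COMMAND inside the unpacked tree (if given), then pack the tree
  # into NEW using the same cpio invocation as in the text above.
  patch_ramdisk() (
      set -e
      # Turn both paths into absolute ones before changing directory.
      case "$1" in /*) old=$1 ;; *) old=$PWD/$1 ;; esac
      case "$2" in /*) new=$2 ;; *) new=$PWD/$2 ;; esac
      shift 2
      workdir=$(mktemp -d)
      trap 'rm -rf "$workdir"' EXIT
      cd "$workdir"
      gzip -dc "$old" | cpio -id
      # Apply the modification, e.g. copy a patched file into place.
      if [ "$#" -gt 0 ]; then "$@"; fi
      find . | cpio -H newc -o > "$new"
  )

For example, ``patch_ramdisk /path/to/the/ramdisk /path/to/the/new/ramdisk
cp /tmp/fixed_file path/inside/ramdisk`` would copy a patched file into the
tree before repacking; both the source file and the destination path here are
placeholders.
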
API Errors
==========

The ``debug_tracebacks_in_api`` config option may be set to return tracebacks
in the API response for all 4xx and 5xx errors.

Retrieving logs from the deploy ramdisk
=======================================

When troubleshooting deployments (especially in case of a deploy failure)
it's important to have access to the deploy ramdisk logs to be able to
identify the source of the problem. By default, Ironic will retrieve the
logs from the deploy ramdisk when the deployment fails and save them on the
local filesystem at ``/var/log/ironic/deploy``.

To change this behavior, operators can make the following changes to
``/etc/ironic/ironic.conf`` under the ``[agent]`` group:

* ``deploy_logs_collect``: Whether Ironic should collect the deployment
  logs on deployment. Valid values for this option are:

  * ``on_failure`` (**default**): Retrieve the deployment logs upon a
    deployment failure.

  * ``always``: Always retrieve the deployment logs, even if the
    deployment succeeds.

  * ``never``: Disable retrieving the deployment logs.

* ``deploy_logs_storage_backend``: The name of the storage backend where
  the logs will be stored. Valid values for this option are:

  * ``local`` (**default**): Store the logs in the local filesystem.

  * ``swift``: Store the logs in Swift.

* ``deploy_logs_local_path``: The path to the directory where the
  logs should be stored, used when ``deploy_logs_storage_backend``
  is set to ``local``. By default logs will be stored at
  ``/var/log/ironic/deploy``.

* ``deploy_logs_swift_container``: The name of the Swift container to
  store the logs in, used when ``deploy_logs_storage_backend`` is set to
  ``swift``. The default is ``ironic_deploy_logs_container``.

* ``deploy_logs_swift_days_to_expire``: Number of days before a log object
  is marked as expired in Swift. If ``None``, the logs will be kept forever
  or until manually deleted. Used when ``deploy_logs_storage_backend`` is
  set to ``swift``. The default is **30** days.

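
As an illustration, the options above could be combined as follows to collect
logs after every deployment and keep them in Swift. This is only an example
fragment; the container name and expiry shown are simply the documented
defaults written out explicitly.

.. code-block:: ini

  [agent]
  # Collect the ramdisk logs after every deployment, not only on failure.
  deploy_logs_collect = always
  # Store the archives in Swift instead of the local filesystem.
  deploy_logs_storage_backend = swift
  deploy_logs_swift_container = ironic_deploy_logs_container
  # Mark log objects as expired after 30 days.
  deploy_logs_swift_days_to_expire = 30
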
When the logs are collected, Ironic will store a *tar.gz* file containing
all the logs according to the ``deploy_logs_storage_backend``
configuration option. All log objects will be named with the following
pattern::

    <node-uuid>[_<instance-uuid>]_<timestamp yyyy-mm-dd-hh:mm:ss>.tar.gz

.. note::
   The *instance_uuid* field is not required for deploying a node when
   Ironic is configured to be used in standalone mode. If present it
   will be appended to the name.

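
The naming pattern above can be split back into its parts when scripting
around the log archives. The helper below is a hypothetical sketch, not part
of Ironic: the function name ``parse_deploy_log_name`` and its output format
are made up for illustration, and it assumes lowercase hexadecimal UUIDs and
the timestamp layout shown in the pattern.

.. code-block:: bash

  # parse_deploy_log_name FILE   (hypothetical helper)
  # Print "node=<uuid> instance=<uuid> timestamp=<ts>" for a log archive
  # name; "instance=" is left empty when the name has no instance UUID.
  # Returns non-zero if the name does not follow the pattern.
  parse_deploy_log_name() (
      name=$(basename "$1")
      uuid='[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
      ts='[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{2}:[0-9]{2}:[0-9]{2}'
      re="^(${uuid})(_(${uuid}))?_(${ts})[.]tar[.]gz\$"
      out=$(printf '%s\n' "$name" |
          sed -nE "s/${re}/node=\1 instance=\3 timestamp=\4/p")
      # sed prints nothing when the name does not match.
      [ -n "$out" ] || { echo "unrecognized log name: $name" >&2; exit 1; }
      printf '%s\n' "$out"
  )
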
Accessing the log data
----------------------

When storing in the local filesystem
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When storing the logs in the local filesystem, the log files can
be found at the path configured in the ``deploy_logs_local_path``
configuration option. For example, to find the logs from the node
``5e9258c4-cfda-40b6-86e2-e192f523d668``:

.. code-block:: bash

  $ ls /var/log/ironic/deploy | grep 5e9258c4-cfda-40b6-86e2-e192f523d668
  5e9258c4-cfda-40b6-86e2-e192f523d668_88595d8a-6725-4471-8cd5-c0f3106b6898_2016-08-08-13:52:12.tar.gz
  5e9258c4-cfda-40b6-86e2-e192f523d668_db87f2c5-7a9a-48c2-9a76-604287257c1b_2016-08-08-14:07:25.tar.gz

.. note::
   When saving the logs to the filesystem, operators may want to enable
   some form of rotation for the logs to avoid disk space problems.

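
One simple form of rotation is to delete old archives periodically, for
example from a cron job. The function below is a minimal sketch under stated
assumptions: the name ``prune_deploy_logs`` is hypothetical, the 30-day
default is an arbitrary choice that mirrors the Swift expiry default, and it
relies on the ``-maxdepth`` and ``-delete`` extensions of ``find``. A tool
such as ``logrotate`` may suit some deployments better.

.. code-block:: bash

  # prune_deploy_logs [DIR] [DAYS]   (hypothetical helper)
  # Delete *.tar.gz files directly inside DIR whose modification time is
  # more than DAYS days old (as counted by "find -mtime"), printing each
  # removed path. Other files and subdirectories are left untouched.
  prune_deploy_logs() {
      dir=${1:-/var/log/ironic/deploy}
      days=${2:-30}
      find "$dir" -maxdepth 1 -type f -name '*.tar.gz' \
          -mtime +"$days" -print -delete
  }
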
When storing in Swift
~~~~~~~~~~~~~~~~~~~~~

When using Swift, operators can associate the objects in the container with
the nodes in Ironic by name, and search for the logs of a given node using
the ``prefix`` parameter. For example, for the node
``5e9258c4-cfda-40b6-86e2-e192f523d668``:

.. code-block:: bash

  $ swift list ironic_deploy_logs_container -p 5e9258c4-cfda-40b6-86e2-e192f523d668
  5e9258c4-cfda-40b6-86e2-e192f523d668_88595d8a-6725-4471-8cd5-c0f3106b6898_2016-08-08-13:52:12.tar.gz
  5e9258c4-cfda-40b6-86e2-e192f523d668_db87f2c5-7a9a-48c2-9a76-604287257c1b_2016-08-08-14:07:25.tar.gz

To download a specific log from Swift, run:

.. code-block:: bash

  $ swift download ironic_deploy_logs_container "5e9258c4-cfda-40b6-86e2-e192f523d668_db87f2c5-7a9a-48c2-9a76-604287257c1b_2016-08-08-14:07:25.tar.gz"
  5e9258c4-cfda-40b6-86e2-e192f523d668_db87f2c5-7a9a-48c2-9a76-604287257c1b_2016-08-08-14:07:25.tar.gz [auth 0.341s, headers 0.391s, total 0.391s, 0.531 MB/s]

The contents of the log file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The log is just a ``.tar.gz`` file that can be extracted with:

.. code-block:: bash

  $ tar xvf <file path>

The contents of the file may differ slightly depending on the distribution
that the deploy ramdisk is using:

* For distributions using ``systemd`` there will be a file called
  **journal** which contains all the system logs collected via the
  ``journalctl`` command.

* For other distributions, the ramdisk will collect all the contents of
  the ``/var/log`` directory.

For all distributions, the log file will also contain the output of
the following commands (if present): ``ps``, ``df``, ``ip addr`` and
``iptables``.

Here is an example of extracting the content of a log file for a
distribution that uses ``systemd``:

.. code-block:: bash

  $ tar xvf 5e9258c4-cfda-40b6-86e2-e192f523d668_88595d8a-6725-4471-8cd5-c0f3106b6898_2016-08-08-13:52:12.tar.gz
  df
  ps
  journal
  ip_addr
  iptables

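
Because the archive members are extracted into the current directory, it can
be convenient to unpack each log into a directory of its own. The helper
below is a hypothetical sketch (the name ``extract_deploy_log`` is not part
of Ironic) that extracts an archive into a fresh temporary directory and
prints that directory's path.

.. code-block:: bash

  # extract_deploy_log ARCHIVE [DEST]   (hypothetical helper)
  # Extract the .tar.gz ARCHIVE into DEST (default: a new temporary
  # directory) and print the directory path on standard output.
  extract_deploy_log() (
      set -e
      archive=$1
      dest=${2:-$(mktemp -d)}
      mkdir -p "$dest"
      tar -xzf "$archive" -C "$dest"
      printf '%s\n' "$dest"
  )

The returned path can then be searched directly, for example with
``grep -i error "$(extract_deploy_log <file path>)/journal"``, assuming the
archive comes from a ``systemd``-based ramdisk and the **journal** file is
plain text.
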