854eb3de54
Change-Id: I94709d5b1db72121c27dcc60725dbf58de21b490 Signed-off-by: Suzana Fernandes <Suzana.Fernandes@windriver.com>
165 lines
5.1 KiB
ReStructuredText

.. ley1552581824091
.. _troubleshooting-log-collection:

===========================
Troubleshoot Log Collection
===========================

The |prod| log collection tool gathers detailed information.

.. contents::
   :local:
   :depth: 1

.. _troubleshooting-log-collection-section-N10061-N1001C-N10001:

------------------------------
Collect Tool Caveats and Usage
------------------------------

.. _troubleshooting-log-collection-ul-dpj-bxp-jdb:

- Log in via SSH or local console on the active controller and use the
  :command:`collect` command.

  .. note::
     The user must have sudo capability and be in the ``sys_protected`` group
     to use the ``collect`` tool.

- All usage options can be found by using the following command:

  .. code-block:: none

     (keystone_admin)$ collect --help

- For |prod| Simplex or Duplex systems, use the following command:

  .. code-block:: none

     (keystone_admin)$ collect --all

- For |prod| Standard systems, use the following commands:

  - For a small deployment (fewer than two worker nodes):

    .. code-block:: none

       (keystone_admin)$ collect --all

    You can also use the short form ``-a`` for this option.

    .. note::
       Hosts or subclouds named explicitly on the command line are ignored
       when the ``--all`` option is used.

  - For large deployments:

    .. code-block:: none

       (keystone_admin)$ collect host1 host2 host3

    Alternatively, you can use the ``--list`` option, although this syntax is
    deprecated:

    .. code-block:: none

       (keystone_admin)$ collect --list host1 host2 host3

    You can also use the short form ``-l`` for this option.

    .. note::
       Systems and subclouds are collected in parallel to reduce the
       overall collection time. Use the ``--inline`` (or ``-in``) option
       to collect serially. ``--inline`` can be combined with the
       ``--all`` option.

    To change the collection timeout, use the ``--timeout`` option:

    .. code-block:: none

       (keystone_admin)$ collect --all [--timeout | -t] <minutes>

    .. note::

       For large deployments, the default timeout value (20 minutes) may
       need to be increased by using the ``--timeout`` (``-t``) option.

       The timeout for collecting from the local host (the host that
       ``collect`` is run from) adopts the global timeout.

       If collection from a particular host times out, run ``collect`` with an
       extended ``--timeout`` locally on that host so that the global timeout
       applies to it.

       Optionally, you can modify the default ``COLLECT_HOST_TIMEOUT_DEFAULT``
       value in the ``/etc/collect/collect_timeouts`` file. This requires
       ``sudo`` privileges, and no processes need to be restarted after the
       change. All subsequent collections adopt the new values in that file.

  - For subcloud deployments:

    .. code-block:: none

       (keystone_admin)$ collect --subcloud subcloud1 subcloud2 subcloud3

    You can also use the short form ``-sc`` for this option. The
    ``--subcloud`` and ``--all`` options can be combined:

    .. code-block:: none

       (keystone_admin)$ collect --all --subcloud

    .. note::
       The ``--all`` (``-a``) option is not recommended with large subcloud
       deployments due to disk storage requirements.

- For systems with an uptime of more than 2 months, use the date range
  options. The default behavior is to collect one month of logs.

  Use ``--start-date`` to collect logs on and after a given date:

  .. code-block:: none

     (keystone_admin)$ collect [--start-date | -s] <YYYYMMDD>

  Use ``--end-date`` to collect logs on and before a given date:

  .. code-block:: none

     (keystone_admin)$ collect [--end-date | -e] <YYYYMMDD>

- To prefix the collect tarball name so that a specific collection is easy
  to identify when several are present, use the following command:

  .. code-block:: none

     (keystone_admin)$ collect [--name | -n] <prefix>

  For example, the following prepends **TEST1** to the name of the tarball:

  .. code-block:: none

     (keystone_admin)$ collect --name TEST1
     [sudo] password for sysadmin:
     collecting data from 1 host(s): controller-0
     collecting controller-0_20200316.155805 ... done (00:01:39 56M)
     creating user-named tarball /scratch/TEST1_20200316.155805.tar ... done (00:01:39 56M)

- Before you use the :command:`collect` command, the nodes must be
  unlocked-enabled or disabled-online, and must have been unlocked at
  least once.

- To collect logs for a node that is rebooting indefinitely, first lock the
  node and wait for it to reach the disabled-online state.

- If the collect tool running from the active controller node fails to
  collect logs from one of the system nodes, you may need to run the
  :command:`collect` command locally. Execute the :command:`collect` command
  using the console or |BMC| connection on the node that displays the failure.
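
The date range options described above take dates in ``YYYYMMDD`` form. The
following is a minimal sketch, assuming GNU :command:`date` is available on the
controller, that computes such a date for a chosen number of days in the past
and prints the resulting :command:`collect` command for review instead of
running it. The ``days_ago`` helper and the 14-day window are illustrative
assumptions and are not part of the collect tool.

.. code-block:: bash

   # Hypothetical helper (not part of collect): print the date N days
   # ago as YYYYMMDD. Assumes GNU date, which accepts -d "N days ago".
   days_ago() {
       date -d "$1 days ago" +%Y%m%d
   }

   # Illustrative window: logs from the last 14 days.
   start_date="$(days_ago 14)"

   # Print the command for review rather than executing it.
   echo "collect --start-date ${start_date}"

Printing the command first makes it easy to confirm the computed date before
starting a potentially long collection.
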

.. only:: partner

   .. include:: /_includes/troubleshooting-log-collection.rest
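
When several named collections accumulate, it can help to locate the newest
tarball for a given prefix. The following is a rough sketch that assumes
tarballs are written to :file:`/scratch` and named ``<prefix>_<timestamp>.tar``,
which is inferred only from the ``--name TEST1`` sample output earlier in this
section. The ``latest_tarball`` helper is hypothetical and is not part of the
collect tool.

.. code-block:: bash

   # Hypothetical helper (not part of collect): print the most recently
   # modified tarball in a directory whose name starts with "<prefix>_".
   # The "<prefix>_*.tar" pattern is inferred from the sample output only.
   latest_tarball() {
       local dir="$1" prefix="$2"
       # ls -t sorts by modification time, newest first; errors from an
       # unmatched pattern are discarded so the function prints nothing.
       ls -t "${dir}/${prefix}_"*.tar 2>/dev/null | head -n 1
   }

   latest_tarball /scratch TEST1

If no tarball matches the prefix, the helper prints nothing.
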