docs/doc/source/fault-mgmt/kubernetes/troubleshooting-log-collection.rst
Suzana Fernandes 854eb3de54 Update for collect tool enhancement to support any LDAP or WAD user account
Change-Id: I94709d5b1db72121c27dcc60725dbf58de21b490
Signed-off-by: Suzana Fernandes <Suzana.Fernandes@windriver.com>
2024-11-18 14:14:20 +00:00

165 lines
5.1 KiB
ReStructuredText

.. ley1552581824091
.. _troubleshooting-log-collection:
===========================
Troubleshoot Log Collection
===========================
The |prod| log collection tool gathers detailed information.
.. contents::
:local:
:depth: 1
.. _troubleshooting-log-collection-section-N10061-N1001C-N10001:
------------------------------
Collect Tool Caveats and Usage
------------------------------
.. _troubleshooting-log-collection-ul-dpj-bxp-jdb:
- Log in via SSH or local console on the active controller and use the
:command:`collect` command.
.. note::
The user must have sudo capability and be in the ``sys_protected`` group
to use the ``collect`` tool.
- All usage options can be found by using the following command:
.. code-block:: none
(keystone_admin)$ collect --help
- For |prod| Simplex or Duplex systems, use the following command:
.. code-block:: none
(keystone_admin)$ collect --all
- For |prod| Standard systems, use the following commands:
- For a small deployment (less than two worker nodes):
.. code-block:: none
(keystone_admin)$ collect --all
You can also use the short form ``-a`` for this option.
.. note::
Hosts or subclouds explicitly added with the ``--all`` option will
be ignored.
- For large deployments:
.. code-block:: none
(keystone_admin)$ collect host1 host2 host3
Or you can use the ``--list`` option. This syntax is deprecated.
.. code-block:: none
(keystone_admin)$ collect --list host1 host2 host3
You can also use the short form ``-l`` for this option.
.. note::
Systems and subclouds are collected in parallel to reduce the
overall collection time. Use the ``--inline`` (or ``-in``) option
to collect serially. ``--inline`` can be combined with the
``--all`` option.
.. code-block:: none
(keystone_admin)$ collect --all [--timeout | -t] <minutes>
.. note::
For large deployments, the default timeout value (20 minutes) may
need to be increased by using the ``--timeout`` (``-t``) option.
The timeout for collecting from the local host, the host that collect
is run from, does adopt the global timeout.
To fix that, run ``collect`` with an extended ``--timeout`` locally on
the host that is experiencing the timeout. That way the global timeout
applies.
Optionally, you can modify the default ``COLLECT_HOST_TIMEOUT_DEFAULT``
value in the ``/etc/collect/collect_timeouts`` file. That requires
``sudo`` command and no processes need to be restarted after the
change. All subsequent collects will adopt the new values in that file.
- For subcloud deployments:
.. code-block:: none
(keystone_admin)$ collect --subcloud subcloud1 subcloud2 subcloud3
You can also use the short form ``-sc`` for this option. The
``--subcloud`` and ``--all`` options can be combined.
.. code-block:: none
(keystone_admin)$ collect --all --subcloud
.. note::
The ``--all`` (``-a``) option is not recommended with large subcloud
deployments due to disk storage requirements.
- For systems with an up-time of more than 2 months, use the date range
options. The default behavior is to collect one month of logs.
Use ``--start-date`` for the collection of logs on and after a given date:
.. code-block:: none
(keystone_admin)$ collect [--start-date | -s] <YYYYMMDD>
Use ``--end-date`` for the collection of logs on and before a given date:
.. code-block:: none
(keystone_admin)$ collect [--end-date | -s] <YYYYMMDD>
- To prefix the collect tar ball name and easily identify the
:command:`collect` when several are present, use the following command.
.. code-block:: none
(keystone_admin)$ collect [--name | -n] <prefix>
For example, the following prepends **TEST1** to the name of the tarball:
.. code-block:: none
(keystone_admin)$ collect --name TEST1
[sudo] password for sysadmin:
collecting data from 1 host(s): controller-0
collecting controller-0_20200316.155805 ... done (00:01:39 56M)
creating user-named tarball /scratch/TEST1_20200316.155805.tar ... done (00:01:39 56M)
- Prior to using the :command:`collect` command, the nodes need to be
unlocked-enabled or disabled online and are required to be unlocked at
least once.
- Lock the node and wait for the node to reach the disabled-online state
before collecting logs for a node that is rebooting indefinitely.
- You may be required to run the local :command:`collect` command if the
collect tool running from the active controller node fails to collect logs
from one of the system nodes. Execute the :command:`collect` command using
the console or |BMC| connection on the node that displays the failure.
.. only:: partner
.. include:: /_includes/troubleshooting-log-collection.rest