Debug StarlingX Issues

This guide contains some basic steps for debugging issues on StarlingX.

Record the issue

Record information about the issue so it can be reproduced during debugging. The items below describe some issue characteristics to capture.

  • Deployment issue type, such as bootstrap failure, provisioning failure, or functional failure.

  • Check the StarlingX version with the command:

    cat /etc/build.info
  • Check the StarlingX deployment configuration (such as Simplex, Duplex, or Multi-node) by viewing the platform configuration file:

    cat /etc/platform/platform.conf
  • Server type, such as bare metal server(s) or VMs.

  • Hardware device types and characteristics, such as NICs, PCI cards, number of hard disks, and RAM size.

  • Other aspects of the issue, such as steps to reproduce, expected results, and actual results.

  • Reproducibility: whether the issue occurs consistently or only occasionally.

  • Gather log files and configuration files using the collect command.
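
The version and configuration checks above can be captured into a single file to attach to an issue report. The following is a minimal sketch, not a StarlingX tool: the function name record_issue_info and the BUILD_INFO and PLATFORM_CONF variables are hypothetical, and the default paths are the two files named in the list above.

```shell
# Hypothetical helper: copy version and configuration details into one report.
# The defaults are the paths named in this guide; override them for testing.
BUILD_INFO="${BUILD_INFO:-/etc/build.info}"
PLATFORM_CONF="${PLATFORM_CONF:-/etc/platform/platform.conf}"

record_issue_info() {
    local report="$1"
    local f
    {
        echo "== Recorded: $(date -u +%Y-%m-%dT%H:%M:%SZ) =="
        for f in "$BUILD_INFO" "$PLATFORM_CONF"; do
            echo "== $f =="
            if [ -r "$f" ]; then
                cat "$f"
            else
                # Keep going so one unreadable file does not lose the rest.
                echo "(not readable)"
            fi
        done
    } > "$report"
}
```

For example, record_issue_info /tmp/issue-report.txt writes both files into one report. Log and configuration bundles still come from the collect command, which this sketch does not replace.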

Check status and logs

  • Log in to the active controller.

  • Check services using the sm-dump command:

    sudo sm-dump
  • Check services using the systemctl command.

  • Apply the platform environment for sysadmin using:

    source /etc/platform/openrc
  • Check alarms from Fault-Manager using:

    fm alarm-list --uuid
    fm alarm-show <uuid>
  • Search for errors in /var/log.

    • You must check /var/log/sysinv.log for errors.
    • You can get hints from sysinv.log for many deployment failures.
    • Look into other log files based on the functional area.
  • If a functional area log file includes errors, check the associated configuration file, which is typically located under the /etc/ directory.

  • You may need to enable the debug option in the configuration file.
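
Searching a log for errors, as described above, can be sketched as a small helper. This is only an illustration: the function name recent_errors is hypothetical, and the case-insensitive match on "error" is an assumption about how failures appear in the log, so adjust the pattern to what you actually see.

```shell
# Hypothetical helper: print the most recent lines that mention an error.
# Usage: recent_errors <logfile> [max_lines]   (max_lines defaults to 20)
recent_errors() {
    local logfile="$1"
    local count="${2:-20}"
    # Case-insensitive match, then keep only the last $count matching lines.
    grep -i 'error' "$logfile" | tail -n "$count"
}
```

For example, recent_errors /var/log/sysinv.log 50 shows up to the last 50 matching lines. Reading files under /var/log may require elevated privileges.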

Debug and triage

  • Check the status of Kubernetes resources: nodes, pods and jobs, endpoints, services, secrets, and configmaps.
  • Check the two major namespaces: kube-system and openstack.
  • If issues occur inside containerized components, enter the container for that service using the kubectl exec command.
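
The resource checks above can be run in one pass. The following is a sketch under stated assumptions: the function name k8s_triage and the KUBECTL override variable are hypothetical, and it assumes kubectl is already configured to reach the cluster.

```shell
# KUBECTL is a hypothetical override: set KUBECTL=echo to print the
# commands instead of running them (a dry run).
KUBECTL="${KUBECTL:-kubectl}"

k8s_triage() {
    local ns kind
    # Nodes are cluster-wide, so they are listed once.
    $KUBECTL get nodes -o wide
    # The remaining resource kinds are listed per namespace.
    for ns in kube-system openstack; do
        for kind in pods jobs endpoints services secrets configmaps; do
            echo "== $kind in $ns =="
            $KUBECTL get "$kind" -n "$ns"
        done
    done
}
```

To look inside a container afterwards, kubectl exec -it <pod_name> -n <namespace> -- /bin/bash opens a shell in the pod, provided the image includes bash. The pod name and namespace are placeholders.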

Implement fixes

  • You can often resolve an issue by making online changes manually, without rebooting Linux or redeploying StarlingX. For example, you can modify system configuration files or the StarlingX configuration or database, and then restart the corresponding services using the systemctl command or the StarlingX sm (service management) command.
  • If the fixes must be applied to certain nodes (controller, worker, storage), you can temporarily lock the node, make changes using StarlingX commands, and then unlock the node so the changes take effect.
  • If the changes must be made in C/C++/Go code, you can:
    • Make the changes in your development workspace with the StarlingX codebase.
    • Build the related packages using build-pkgs <package_name>.
    • Create and apply the patch using the starlingx_patching guide.
    • Restart the services using the systemctl command or the StarlingX sm (service management) command.
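
The lock, change, and unlock sequence described above can be sketched as a wrapper. This is a simplified illustration, not a complete workflow: the function name with_host_locked and the SYSTEM_CMD override are hypothetical, and the sketch does not wait for the host to reach the locked state, which you should confirm (for example with system host-show) before relying on it.

```shell
# SYSTEM_CMD is a hypothetical override: set SYSTEM_CMD=echo for a dry run.
SYSTEM_CMD="${SYSTEM_CMD:-system}"

# Usage: with_host_locked <hostname> <command> [args...]
with_host_locked() {
    local host="$1"
    local rc
    shift
    # Stop here if the lock request is rejected.
    $SYSTEM_CMD host-lock "$host" || return 1
    # Run the caller's change command and remember its exit status.
    "$@"
    rc=$?
    # Always request the unlock, even if the change command failed.
    $SYSTEM_CMD host-unlock "$host" || return 1
    return "$rc"
}
```

With SYSTEM_CMD=echo, calling with_host_locked worker-0 echo change prints the lock request, the change, and the unlock request in order, which is a safe way to review the sequence first. The host name worker-0 is only an example.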

Additional resources