From d748ae0eb7fff1cb1810ddfff4e5aaa5089b189c Mon Sep 17 00:00:00 2001
From: MCamp859
Date: Sat, 30 May 2020 16:34:53 -0400
Subject: [PATCH] Add debug guide

Added new guide to developer resources and contributor pages.
Alphasorted list of guides on developer resources page.

Closes-Bug: 1873500
Change-Id: I2344a437ef85c1352ac9f6a5da0a8ac64ff78f1a
Signed-off-by: MCamp859
---
 doc/source/contributor/index.rst         |   1 +
 .../developer_resources/debug_issues.rst | 131 ++++++++++++++++++
 doc/source/developer_resources/index.rst |  13 +-
 3 files changed, 139 insertions(+), 6 deletions(-)
 create mode 100644 doc/source/developer_resources/debug_issues.rst

diff --git a/doc/source/contributor/index.rst b/doc/source/contributor/index.rst
index c30d60ea0..291fb6553 100644
--- a/doc/source/contributor/index.rst
+++ b/doc/source/contributor/index.rst
@@ -33,6 +33,7 @@ Additional StarlingX-specific resources are listed below.

    development_process
    ../developer_resources/code-submission-guide
+   ../developer_resources/debug_issues

 --------------------
 Additional resources
diff --git a/doc/source/developer_resources/debug_issues.rst b/doc/source/developer_resources/debug_issues.rst
new file mode 100644
index 000000000..7a7c7b031
--- /dev/null
+++ b/doc/source/developer_resources/debug_issues.rst
@@ -0,0 +1,131 @@
+======================
+Debug StarlingX Issues
+======================
+
+This guide contains some basic steps for debugging issues on StarlingX.
+
+.. contents::
+   :local:
+   :depth: 1
+
+----------------
+Record the issue
+----------------
+
+Record information about the issue so it can be reproduced during debugging. The
+items below describe some issue characteristics to capture.
+
+* Deployment issue type, such as bootstrap failure, provisioning failure, or
+  functional failures.
+
+* Check the StarlingX version with the command:
+  ::
+
+    cat /etc/build.info
+
+
+* Check the StarlingX deployment configuration (Simplex, Duplex, or
+  Multi-node) by viewing the platform configuration file:
+  ::
+
+    cat /etc/platform/platform.conf
+
+* Server type, such as bare metal server(s) or VMs.
+
+* Hardware device types and characteristics, such as NICs, PCI cards, number of
+  hard disks, and RAM size.
+
+* Other aspects of the issue, such as steps to reproduce, expected results,
+  and actual results.
+
+* Whether the issue can be reproduced consistently or only occasionally.
+
+* Gather log files and configuration files using the ``collect`` command.
+
+
+---------------------
+Check status and logs
+---------------------
+
+* Log in to the active controller.
+
+* Check services using the ``sm-dump`` command:
+  ::
+
+    sudo sm-dump
+
+* Check services using the ``systemctl`` command.
+
+* Apply the platform environment for ``sysadmin`` using:
+  ::
+
+    source /etc/platform/openrc
+
+* Check alarms from Fault-Manager using:
+  ::
+
+    fm alarm-list --uuid
+    fm alarm-show
+
+* Search for errors in ``/var/log``.
+
+  * You **must** check ``/var/log/sysinv.log`` for errors.
+  * ``sysinv.log`` often contains hints about the cause of deployment failures.
+  * Look into other log files based on the functional area.
+
+* If a functional area log file includes errors, check the associated
+  configuration file, which is typically located under the ``/etc/``
+  directory.
+
+* You may need to enable the ``debug`` option in the configuration file.
+
+----------------
+Debug and triage
+----------------
+
+* Check the status of Kubernetes resources: nodes, pods/jobs, endpoints,
+  services, secrets, and configmaps.
+
+* Check the two major namespaces: ``kube-system`` and ``openstack``.
+
+* If issues occur inside containerized components, open a shell in the
+  affected container using the ``kubectl exec`` command.
+
+---------------
+Implement fixes
+---------------
+
+* You can try to resolve the issue by manually making changes on the running
+  system, without rebooting Linux or redeploying StarlingX. For example,
+  you can modify system configuration files or the StarlingX
+  config/database, and then restart the corresponding services using the
+  ``systemctl`` command or the StarlingX ``sm`` (service management)
+  command.
+
+* If a fix must be applied on a specific node (controller, worker, or storage),
+  you can temporarily **lock** that node, make changes using StarlingX
+  commands, and then **unlock** the node to make the changes take effect.
+
+* If the changes must be made in C/C++/Go code, you can:
+
+  * Make the changes in your *development workspace* with the StarlingX
+    codebase.
+  * Build the related packages using ``build-pkgs``.
+  * Create and apply the patch using the :doc:`starlingx_patching` guide.
+  * Restart the services using the ``systemctl`` command or the StarlingX
+    ``sm`` (service management) command.
+
+--------------------
+Additional resources
+--------------------
+
+* Review the `StarlingX Discuss list `_
+  for similar questions and workarounds from the community.
+
+* Check the `StarlingX Launchpad `_ for
+  similar issues and potential workarounds.
+
+* Open a new `StarlingX Launchpad `_ item to
+  report a bug.
+
+
diff --git a/doc/source/developer_resources/index.rst b/doc/source/developer_resources/index.rst
index 563df892a..b835169e1 100644
--- a/doc/source/developer_resources/index.rst
+++ b/doc/source/developer_resources/index.rst
@@ -10,16 +10,17 @@ Developer Resources

    build_guide
    Layered_Build
+   backup_restore
+   build_docker_image
    code-submission-guide
+   debug_issues
+   stx_tsn_in_kata
+   mirror_repo
+   move_to_new_openstack_version_in_starlingx
    navigate_source_code
+   Project Specifications
    architecture_docs
    starlingx_patching
-   build_docker_image
-   move_to_new_openstack_version_in_starlingx
-   mirror_repo
-   backup_restore
-   Project Specifications
    stx_ipv6_deployment
-   stx_tsn_in_kata
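
The "Check status and logs" section of the guide added by this patch asks readers to search ``/var/log`` for errors, starting with ``/var/log/sysinv.log``, but does not show a command for it. The following is a minimal sketch of that step, not part of the patch. It assumes that a case-insensitive match on the word "error" is a reasonable first filter (the guide does not describe the log format, so a narrower pattern may be needed), and the ``scan_log_errors`` function name is hypothetical.

```shell
#!/bin/sh
# Sketch: count and show the most recent error lines in a log file.
# Assumption: a case-insensitive match on "error" is a useful first filter;
# the real sysinv.log format may call for a narrower pattern.

# scan_log_errors FILE [MAX_LINES]
# Prints the number of matching lines, then the last MAX_LINES matches
# (default 20), each prefixed with its line number in FILE.
scan_log_errors() {
    log_file=$1
    max_lines=${2:-20}

    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi

    # grep -c prints the number of matching lines (0 when there are none).
    count=$(grep -ci 'error' "$log_file")
    echo "$log_file: $count matching line(s)"

    # Show only the most recent matches; -n adds the line number.
    grep -ni 'error' "$log_file" | tail -n "$max_lines"
    return 0
}

# Run only when a file is passed on the command line.
if [ "$#" -gt 0 ]; then
    scan_log_errors "$@"
fi
```

Saved under a hypothetical name such as ``scan_log.sh``, the sketch could be run as ``sh scan_log.sh /var/log/sysinv.log 50`` on the active controller. Reading files under ``/var/log`` may require ``sudo``, depending on file permissions.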