diff --git a/doc/source/install/_images/rdo_upgrades_jobs.png b/doc/source/install/_images/rdo_upgrades_jobs.png
new file mode 100644
index 00000000..7663e185
Binary files /dev/null and b/doc/source/install/_images/rdo_upgrades_jobs.png differ
diff --git a/doc/source/install/developer/upgrades/ci_upgrades.rst b/doc/source/install/developer/upgrades/ci_upgrades.rst
index 2531b7e8..e4427224 100644
--- a/doc/source/install/developer/upgrades/ci_upgrades.rst
+++ b/doc/source/install/developer/upgrades/ci_upgrades.rst
@@ -3,22 +3,265 @@
 here might differ from the ones in the final version.

-Major upgrades/Minor upgrades CI coverage
------------------------------------------
+Major upgrades & Minor updates CI coverage
+------------------------------------------

-Upgrades CI jobs
-~~~~~~~~~~~~~~~~
-# WORK IN PROGRESS
+.. include:: links.rst

-How to add a new upgrade/update job in the CI
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-# WORK IN PROGRESS
+This document gives a detailed overview of the current CI coverage
+for upgrades/updates jobs. It is also intended as a guide to how
+these jobs work, and it includes some tips for debugging them.
+
+Upgrades/Updates CI jobs
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+At the moment most of the upgrade jobs have been moved from upstream
+infrastructure to `RDO Software Factory job definition`_ due to
+runtime constraints of the OpenStack infra jobs.
+
+Each of these jobs is defined by a `featureset file`_ and a `scenario file`_. The
+featureset used in a job can be found in the last part of the job type value.
+This can be found in the CI job definition::
+
+    - '{trigger}-tripleo-ci-{jobname}-{release}{suffix}':
+        jobname: 'centos-7-containers-multinode-upgrades'
+        release:
+          - pike
+          - master
+        suffix: ''
+        type: 'multinode-1ctlr-featureset011'
+        node: upstream-centos-7-2-node
+        trigger: gate
+
+The scenario used is referenced in the featureset file. In the example
+above, `featureset011`_ makes use of the following scenarios::
+
+    composable_scenario: multinode.yaml
+    upgrade_composable_scenario: multinode-containers.yaml
+
+As this job covers the upgrade from one release to another, we need to
+specify two scenario files: the one used during deployment and the one
+used when upgrading. Each of these scenario files defines the services
+deployed in the nodes.
+
+.. note::
+   There is a matrix with the different features deployed per feature set
+   here: `featureset matrix`_
+
+Currently, two types of upgrade jobs exist:
+
+- multinode-upgrade (mixed-version): In this job, an undercloud with
+  release N+1 is deployed, while the overcloud is deployed with
+  release N. Execution time is reduced by not upgrading the
+  undercloud; instead, the Heat templates from the (N+1) undercloud
+  are used when performing the overcloud upgrade.
+
+  .. note::
+     If you want your patch to be tested against this job you need
+     to add *RDO Third Party CI* as a reviewer or reply with the comment
+     *check-rdo experimental*.
+
+- undercloud-upgrade: This job tests the undercloud upgrade from one
+  major release to the next. The undercloud is deployed with release
+  N and upgraded to release N+1. This job does not deploy an overcloud.
+
+.. note::
+   There is an effort to `integrate`_ the new `tripleo-upgrade`_ role into
+   tripleo-quickstart that defines a unified way to upgrade and update.
+
+Upgrade/Update CI jobs, where to look
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The best place to check the current CI jobs status is in the `CI Status`_
+page.
+This webpage contains a log of all the TripleO CI jobs with their
+result status, a link to the logs, the git patch that triggered them
+and statistics about the pass/fail rates.
+
+To check the status of the Upgrades/Updates jobs, you need to click the
+`TripleO CI promotion jobs`_ link from `CI Status`_, where you will find
+the RDO cloud upgrades section:
+
+.. image:: ../../_images/rdo_upgrades_jobs.png
+
+In this section the CI jobs are color coded to show their
+current status at a glance::
+
+    - Red: CI job constantly failing.
+    - Yellow: Unstable job, frequent failures.
+    - Green: CI job passing consistently.
+
+If you scroll down after clicking one of the jobs in the section,
+you will find the CI job statistics and the last 100 job executions
+(the number can be edited). Each of the job executions contains::
+
+    - Date: Time and date the CI job was triggered.
+    - Length: Job duration.
+    - Reason: CI job result or failure reason.
+    - Patch: Git ref of the patch that triggered the job.
+    - Logs: Link to the logs.
+    - Branch: Release branch used to run the job.
+
+
+Debugging Upgrade/Update CI jobs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The logs from a CI job can look a little chaotic, especially the
+first time you open them. It helps to know where to find the logs
+you need, so that you can identify the cause of a failure or debug
+an issue.
+
+.. _logs directory:
+
+The first thing to have a look at when debugging a CI job is the
+console output or full log. When clicking on the job, the following
+folder structure appears::
+
+    job-output.json.gz
+    job-output.txt.gz
+    logs/
+    zuul-info/
+
+The job execution log is located in the *job-output.txt.gz* file. Once
+opened, a huge log will appear in front of you. What should you look
+for?
+
+(1) Find the job result
+
+    A good string to search for is *PLAY RECAP*.
+    At this point, all the
+    playbooks have been executed and a summary of the runs per node
+    is displayed::
+
+        PLAY RECAP *********************************************************************
+        127.0.0.2  : ok=9   changed=0  unreachable=0 failed=0
+        localhost  : ok=10  changed=3  unreachable=0 failed=0
+        subnode-2  : ok=3   changed=1  unreachable=0 failed=0
+        undercloud : ok=120 changed=78 unreachable=0 failed=1
+
+    In this case, one of the playbooks executed in the undercloud has
+    failed. To identify which one, we can look for the string **fatal**::
+
+        fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "set -o pipefail && /home/zuul/overcloud-upgrade.sh 2>&1
+        | awk '{ print strftime(\"%Y-%m-%d %H:%M:%S |\"), $0; fflush(); }' > overcloud_upgrade_console.log",
+        "delta": "0:00:39.175219", "end": "2017-11-14 16:55:47.124998", "failed": true, "rc": 1,
+        "start": "2017-11-14 16:55:07.949779", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}
+
+    From this task, we can guess that something went wrong during the
+    overcloud upgrade process. But where can we find the
+    *overcloud_upgrade_console.log* file referenced in the task?
+
+(2) Undercloud logs
+
+    From the `logs directory`_, you need to open the *logs/*
+    folder. All undercloud logs are located inside the *undercloud/*
+    folder. Opening it will display the following::
+
+        etc/     *configuration files*
+        home/    *job execution logs from the playbooks*
+        var/     *system/services logs*
+
+    The log we are looking for is located in */home/zuul/*. Most of the
+    tasks executed in tripleo-quickstart store the full script as well
+    as the execution log in this directory, so it is a good place to
+    get a better understanding of what went wrong.
+
+    If the overcloud deployment or upgrade failed, you will also find
+    two log files named::
+
+        failed_upgrade.log.txt.gz
+        failed_upgrade_list.log.txt.gz
+
+    The first one stores the output from the debugging command::
+
+        openstack stack failures list --long overcloud
+
+    This command prints out the reason why the deployment or upgrade
+    failed, although sometimes this information is not enough
+    to find the root cause of the problem. The *stack failures*
+    output can give you a clue about which service is causing the
+    problem, but then you'll need to investigate the OpenStack
+    service logs.
+
+(3) Overcloud logs
+
+    In the *logs/* folder, you can find a folder named *subnode-2*
+    which contains most of the overcloud logs::
+
+        apache/
+        ceph_conf.txt.gz
+        deprecations.txt.gz
+        devstack.journal.gz
+        df.txt.gz
+        etc/
+        home/
+        iptables.txt.gz
+        libvirt/
+        listen53.txt.gz
+        openvswitch/
+        pip2-freeze.txt.gz
+        ps.txt.gz
+        resolv_conf.txt.gz
+        rpm-qa.txt.gz
+        sudoers.d/
+        var/
+
+    To access the OpenStack services logs, you need to go to
+    *subnode-2/var/log/* when deploying a baremetal overcloud. If the
+    overcloud is containerized, the service logs are stored under
+    *subnode-2/var/log/containers*.

-Debugging CI jobs
-~~~~~~~~~~~~~~~~~
-# WORK IN PROGRESS

 Replicating CI jobs
 ~~~~~~~~~~~~~~~~~~~
-# WORK IN PROGRESS
+Thanks to `James Slagle`_ there is now a way to reproduce TripleO CI jobs in
+any OpenStack cloud. Everything is enabled by the `traas`_ project,
+a set of Heat templates and scripts that reproduce the TripleO CI jobs
+in the same way they are being run in the Zuul gate.
+
+When cloning the repo, you just need to set some configuration parameters. A
+set of sample templates is located under
+`templates/example-environments`_.
+The parameters defined in this
+template are::
+
+    parameters:
+      overcloud_flavor: [*flavor used for the overcloud instance*]
+      overcloud_image: [*overcloud OS image (available in cloud images)*]
+      key_name: [*private key used to access cloud instances*]
+      private_net: [*network name (it must exist and match)*]
+      overcloud_node_count: [*number of overcloud nodes*]
+      public_net: [*public net in CIDR notation*]
+      undercloud_image: [*undercloud OS image (available in cloud images)*]
+      undercloud_flavor: [*flavor used for the undercloud instance*]
+      toci_jobtype: [*CI job type*]
+      zuul_changes: [*List of patches to retrieve*]
+
+.. note:: The CI job type toci_jobtype can be found in the job definition
+   under `tripleo-ci/zuul.d`_.
+
+A good example to deploy a multinode job in RDO Cloud is this
+`sample template`_. You can test out your patches by appending
+the patch refs, joined with the ^ character::
+
+    zuul_changes: ::[^::]*
+
+This also allows you to test any patch in a local environment without
+consuming CI resources, or to debug an environment after
+a job execution.
+
+Once the template parameters are defined, you just need to create the stack.
+To deploy the *rdo-cloud-env-config-download.yaml*
+`sample template`_ we would run::
+
+    cd traas/
+    openstack stack create traas -t templates/traas.yaml \
+        -e templates/traas-resource-registry.yaml \
+        -e templates/example-environments/rdo-cloud-env-config-download.yaml
+
+This stack will create two instances in your cloud tenant, one for the
+undercloud and another for the overcloud. Once created, the stack will
+directly call the `traas/scripts/traas.sh`_ script which downloads all
+required repositories to start executing the job.
+
+If you want to follow the job execution, you can ssh to the undercloud
+instance and tail *$HOME/tripleo-root/traas.log*. The whole
+execution is logged in that file.
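The search described in the debugging steps (find *PLAY RECAP*, then look for *fatal*) can also be scripted against a downloaded *job-output.txt.gz*, whether it comes from a CI run or from a reproduced job. The following is a minimal sketch, not part of tripleo-ci or traas: the function names are hypothetical, and the sample log is an abbreviated version of the output quoted in the debugging section.

```python
import gzip
import re

# One host line of an Ansible recap, for example:
#   undercloud : ok=120 changed=78 unreachable=0 failed=1
# search() is used so that any prefix on the line (such as a timestamp)
# does not prevent a match.
RECAP_LINE = re.compile(
    r"(?P<host>\S+)\s*:\s*ok=\d+\s+changed=\d+"
    r"\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)

# Abbreviated sample, assumed for illustration only.
SAMPLE_LOG = """\
fatal: [undercloud]: FAILED! => {"changed": true, "failed": true, "rc": 1}
PLAY RECAP *********
127.0.0.2 : ok=9 changed=0 unreachable=0 failed=0
localhost : ok=10 changed=3 unreachable=0 failed=0
subnode-2 : ok=3 changed=1 unreachable=0 failed=0
undercloud : ok=120 changed=78 unreachable=0 failed=1
"""


def load_log(path):
    """Read a gzip-compressed job log as text (hypothetical helper)."""
    with gzip.open(path, "rt", errors="replace") as handle:
        return handle.read()


def failed_hosts(log_text):
    """Return hosts whose recap line shows failed or unreachable tasks."""
    hosts = []
    for line in log_text.splitlines():
        match = RECAP_LINE.search(line)
        if match and (int(match["failed"]) or int(match["unreachable"])):
            hosts.append(match["host"])
    return hosts


def fatal_lines(log_text):
    """Return the lines that report a failed Ansible task."""
    return [line for line in log_text.splitlines() if "fatal: [" in line]


if __name__ == "__main__":
    print(failed_hosts(SAMPLE_LOG))
    for line in fatal_lines(SAMPLE_LOG):
        print(line)
```

For a real job, `load_log("job-output.txt.gz")` would replace `SAMPLE_LOG`. The hosts returned tell you which folder under *logs/* to open next.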
diff --git a/doc/source/install/developer/upgrades/links.rst b/doc/source/install/developer/upgrades/links.rst
new file mode 100644
index 00000000..6f8dfa92
--- /dev/null
+++ b/doc/source/install/developer/upgrades/links.rst
@@ -0,0 +1,31 @@
+.. Links, citations, and others...
+.. _RDO Software Factory job definition:
+    https://github.com/rdo-infra/review.rdoproject.org-config/blob/9668021f655e53413108f8c15988f68caa8d31ba/jobs/tripleo-upstream.yml#L802
+.. _featureset file:
+    https://github.com/openstack/tripleo-quickstart/tree/master/config/general_config
+.. _scenario file:
+    https://github.com/openstack/tripleo-heat-templates/tree/master/ci/environments
+.. _featureset011:
+    https://github.com/openstack/tripleo-quickstart/blob/master/config/general_config/featureset011.yml
+.. _featureset matrix:
+    https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html
+.. _tripleo-upgrade:
+    https://github.com/redhat-openstack/tripleo-upgrade
+.. _integrate:
+    https://review.openstack.org/#/q/topic:link_tripleo_upgrade
+.. _James Slagle:
+    http://lists.openstack.org/pipermail/openstack-dev/2017-February/112993.html
+.. _traas:
+    https://github.com/slagle/traas
+.. _templates/example-environments:
+    https://github.com/slagle/traas/tree/master/templates/example-environments
+.. _tripleo-ci/zuul.d:
+    https://github.com/openstack-infra/tripleo-ci/blob/4042e9c225cf9dac917b8d4c3a245b8ff492056d/zuul.d/multinode-jobs.yaml#L82
+.. _sample template:
+    https://github.com/slagle/traas/blob/master/templates/example-environments/rdo-cloud-env-config-download.yaml
+.. _traas/scripts/traas.sh:
+    https://github.com/slagle/traas/blob/fb447a585895dd783519dfec68a9728fa72b7609/scripts/traas.sh
+.. _CI Status:
+    http://cistatus.tripleo.org/
+.. _TripleO CI promotion jobs:
+    http://38.145.34.234/