Zuul Blog

Meet the Zuul community in Berlin, June 7-9, 2022

Authored by Sunny Cai, May 12, 2022

OpenInfra Summit Berlin is around the corner, and in less than a month, the Zuul community will join 30+ open source projects at the Summit to collaborate in the open. The community will also share the latest upstream developments, use cases, and news from the Zuul project.

Register for your Summit tickets before prices increase on May 16 at 11:59 PM PT to save on your ticket purchase.

Here are all the Zuul sessions you can look forward to at the Summit.

Tuesday, June 7

Volvo and Zuul CI

Speaker:
  • Johannes Foufas, Sr Principal Engineer at Volvo Cars Corporation
Zuul is now the default CI chain at Volvo Cars Corporation, and last year’s expansion has been extensive. Johannes will present last year’s progress with Zuul CI and how Zuul features are used as the first line of integration for all modules in the core computer.

Getting Started with Zuul

Speaker:
  • James Blair, Founder at Acme Gating
Join James Blair, the original author of Zuul and one of the current project maintainers, for this interactive training session on getting started with Zuul.

Wednesday, June 8

Advanced Zuul Features as used by OpenDev

Speaker:
  • Clark Boylan, Infrastructure Engineer at the OpenInfra Foundation
Learn how OpenDev employs Zuul for real-world production deployment management.

Project Gating with Zuul

Speaker:
  • James Blair, Founder at Acme Gating
James Blair will share what makes Zuul unique, the latest improvements in Zuul, and how it can be used in the enterprise to stop merging broken code.

No gain without pain - a story of scaling CI with Zuul

Speakers:
  • Thomas Zink, PO in CICD DevOps at BMW Group
  • Simon Westphahl, Software Engineer at BMW Group
The speakers will share turning points in their journey scaling CI with Zuul, especially during the recent Zuul v5 development, and explain why they chose Zuul and continue contributing to its development and community.

Building Workday's Next Generation Private Cloud with Zuul

Speakers:
  • Jan Gutter, Senior Software Development Engineer at Workday
  • Simon McGuinness, Software Development Engineer at Workday
The Workday Private Cloud teams manage more than a million cores. In late 2020, they started building their next-generation software and chose Zuul to run its CI. They'll take you through the lessons they learned and the pitfalls to avoid when bootstrapping a CI system from scratch.

Thursday, June 9

Workday's Next Generation Private Cloud

Speakers:
  • Howard Abrams, Senior Cloud Engineer at Workday
  • Jan Gutter, Senior Software Development Engineer at Workday
Workday has built one of the largest OpenStack-based private clouds in the world, hosting a workload of over a million physical cores on over 16,000 compute nodes in 5 data centers for over ten years. Hear how they moved their CI/CD from Jenkins to Zuul.

Also, congrats to Volvo Cars Corporation on being nominated for the 2022 Superuser Award! Volvo Cars Corporation uses Zuul as its default CI system for the code in its cars. Join the OpenInfra Summit keynotes on June 8 to find out who wins this year's Superuser Award!

Check out the Summit schedule for 100+ more sessions that you can attend, and register before prices increase on May 16 at 11:59 PM PT to save on your ticket purchase.

Introducing Zuul for improved CI/CD

A quick history of how and why Zuul is replacing Jenkins in CI testing in the OpenStack community.

Authored by Jeremy Stanley, February 7, 2020

(This article originally ran on opensource.com and is reprinted here with permission of the author under the Creative Commons Attribution-Share Alike 4.0 International License.)

Jenkins is a marvelous piece of software. As an execution and automation engine, it's one of the best you're going to find. Jenkins serves as a key component in countless continuous integration (CI) systems, and this is a testament to the value of what its community has built over the years. But that's what it is—a component. Jenkins is not a CI system itself; it just runs things for you. It does that really well and has a variety of built-ins and a vibrant ecosystem of plugins to help you tell it what to run, when, and where.

CI is, at the most fundamental level, about integrating the work of multiple software development streams into a coherent whole with as much frequency and as little friction as possible. Jenkins, on its own, doesn't know about your source code or how to merge it together, nor does it know how to give constructive feedback to you and your colleagues. You can, of course, glue it together with other software that can perform these activities, and this is how many CI systems incorporate Jenkins.

It's what we did for OpenStack, too, at least at first.

If it's not tested, it's broken

In 2010, an open source community of projects called OpenStack was forming. Some of the developers brought in to assist with the collaboration infrastructure also worked on a free database project called Drizzle, and a key philosophy within that community was the idea "if it's not tested, it's broken." So OpenStack, on day one, required all proposed changes to its software to be reviewed and tested for regressions before they could be approved to merge into the trunk of any of its source code repositories. To do this, Hudson (which later forked to form the Jenkins project) was configured to run tests exercising every change.

A plugin was installed to interface with the Gerrit code review system, automatically triggering jobs when new changes were proposed and reporting back with review comments indicating whether they succeeded or failed. This may sound rudimentary by today's standards, but at the time, it was a revolutionary advancement for an open source collaboration. No developer on OpenStack was special in the eyes of CI, and everyone's changes had to pass this growing battery of tests before they could merge—a concept the project called "project gating."

There was, however, an emerging flaw with this gating idea: To guarantee two unrelated changes didn't alter a piece of software in functionally incompatible ways, they had to be tested one at a time in sequence before they could merge. OpenStack was complicated to install and test, even back then, and quickly grew in popularity. The rising volume of developer contributions coupled with increasing test coverage meant that, during busy periods, there was simply not enough time to test every change that passed review. Some longer-running jobs took nearly an hour to complete, so the upper bound for what could get through the gate was roughly two dozen changes in a day. The resulting merge backlog showed a new solution was required.
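The "roughly two dozen changes in a day" figure follows from simple arithmetic: when every change must occupy the gate alone for the full duration of its longest job, daily capacity is at most the minutes in a day divided by that job's duration. The helper below is a hypothetical illustration of that bound, not part of any OpenStack tooling, and it assumes the gate runs around the clock with no idle time between changes.

```python
def serial_gate_capacity(job_minutes: float, hours_per_day: float = 24.0) -> int:
    """Upper bound on changes merged per day when each change must be
    tested alone, one after another, before the next one can start."""
    # Each change holds the gate for the full job duration, so the gate
    # can finish at most floor(total minutes / job minutes) changes.
    return int(hours_per_day * 60 // job_minutes)


print(serial_gate_capacity(60))  # 24: an hour-long job caps the gate at two dozen
print(serial_gate_capacity(55))  # 26: "nearly an hour" barely moves the ceiling
```

Shaving a few minutes off the job barely helps; only testing changes concurrently can lift the ceiling substantially.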

Enter Zuul

During an OpenStack CI meeting in May 2012, one of the CI team members, James Blair, announced that he'd "been working on speculative execution of Jenkins jobs." Speculative execution is an optimization most commonly found in the pipelines of modern microprocessors. As in processor hardware, the theory was that by optimistically predicting positive gating results for changes that had recently been approved but had not yet completed their tests, subsequently approved changes could be tested concurrently and then conditionally merged as long as their predecessors also passed tests and merged. James said he had a name for this intelligent scheduler: Zuul.
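The idea can be made concrete with a small simulation. This is a simplified model written for illustration, not Zuul's actual scheduler code: the `passes` predicate is a hypothetical stand-in for running a change's tests on top of the changes assumed to merge ahead of it. Every queued change is tested at once on its own speculative base; changes merge in order until the first failure, which is ejected, and everything behind it is retested because its base had assumed the failed change would merge.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical predicate: does `change` pass its tests when applied on top
# of `base` (the changes assumed to have merged ahead of it)?
TestFn = Callable[[str, Tuple[str, ...]], bool]


def speculative_gate(changes: Sequence[str], passes: TestFn) -> Tuple[List[str], int, int]:
    """Simplified model of a speculatively executed gate queue.

    Returns (merged changes, number of test rounds, total test runs).
    """
    queue = list(changes)
    merged: List[str] = []
    rounds = runs = 0
    while queue:
        rounds += 1
        # Optimistically assume every change ahead will pass, and test all
        # queued changes concurrently, each on its own speculative base.
        results = []
        for i, change in enumerate(queue):
            base = tuple(merged + queue[:i])
            results.append(passes(change, base))
            runs += 1
        # Merge in queue order until the first failure.
        for i, ok in enumerate(results):
            if ok:
                merged.append(queue[i])
            else:
                # Eject the failed change. Everything behind it was tested on a
                # base that included it, so those results are discarded and
                # the remaining changes are retested next round.
                queue = queue[i + 1:]
                break
        else:
            queue = []  # every change passed; the whole queue merged
    return merged, rounds, runs


merged, rounds, runs = speculative_gate(["A", "B", "C", "D"], lambda c, base: c != "B")
print(merged, rounds, runs)  # ['A', 'C', 'D'] 2 6
```

With one failing change among four, the model finishes in two rounds of concurrent testing instead of four sequential ones, at the cost of a few discarded test runs. When every change passes, the whole queue merges in a single round.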

Around the same time, challenges from trying to perform better revision control for Jenkins' XML job configuration led to the creation of the human-readable YAML-based Jenkins Job Builder templating engine. Limited success with the JClouds plugin for Jenkins and cumbersome attempts to use jobs for refreshing cloud images of single-use Jenkins slaves ended with the creation of the Nodepool service. Limited log-storage capabilities resulted in the team adding separate external solutions for organizing, serving, and indexing job logs and assuming maintainership of an abandoned secure copy protocol (SCP) plugin to replace the less-secure FTP option that Jenkins provided out of the box. The OpenStack infrastructure team was slowly building a fleet of services and utilities around Jenkins but began to bump up against a performance limitation.

Multiplying Jenkins

By mid-2013, Nodepool was constantly recycling as many as 100 virtual machines registered with Jenkins as slaves, but this was no longer enough to keep up with the growing workload. Thread contention for global locks in Jenkins thwarted all attempts to push past this threshold, no matter how much processor power and memory was thrown at the master server. The project had offers to donate additional capacity for Jenkins slaves to help relieve the frequent job backlog, but this would require an additional Jenkins master. The efficient division of work between multiple masters needed a new channel of communication for dispatch and coordination of jobs. Zuul's maintainers identified the Gearman job server protocol as an ideal fit, so they outfitted Zuul with a new geard service and extended Jenkins with a custom Gearman client plugin.

Now that jobs were spread across a growing assembly of Jenkins masters, there was no longer any single dashboard with a complete view of job activity and results. In order to facilitate this new multi-master world, Zuul grew its own status API and WebUI, as well as a feature to emit metrics through the StatsD protocol. Over the next few years, Zuul steadily subsumed more of the CI features its users relied on, while Jenkins' place in the system waned accordingly, and it was becoming a liability. OpenStack made an early choice to standardize on the Python programming language; this was reflected in Zuul's development, yet Jenkins and its plugins were implemented in Java. Zuul's configuration was maintained in the same YAML serialization format that OpenStack used to template its own Jenkins jobs, while Jenkins kept everything in baroque XML. These differences complicated ongoing maintenance and led to an unnecessarily steep learning curve for new administrators from related communities that had started trying to run Zuuls.

The time was right for another revolution.

The rise of Ansible

In early 2016, Zuul's maintainers embarked on an ambitious year-long overhaul of their growing fleet of services with the goal of eliminating Jenkins from the overall system design. By this time, Jenkins was serving only as a conduit for running jobs consisting mostly of shell scripts on slave nodes over SSH, providing real-time streaming of job output and copying resulting artifacts to longer-term storage. Ansible was found to be a great fit for that first need; purpose-built to run commands remotely over SSH, it was written in Python, just like Zuul, and also used YAML to define its tasks. It even had built-in modules for features the team had previously implemented as bespoke Jenkins plugins. Ansible provided true multi-node support right out of the box, so the same playbooks could be used for both simulating and performing complex production deployments. An ever-expanding ecosystem of third-party modules filled in any gaps, in much the same way as the Jenkins community's plugins had before.

A new Zuul executor service filled the prior role of the Jenkins master: it acted on pending requests in the scheduler's geard, dispatched them via Ansible to ephemeral servers managed by Nodepool, then collected results and artifacts for publication. It also exposed in-progress build output over the classic RFC 742 Name/Finger protocol, streamed in real time from an extension of Ansible's command output module. Once it was no longer necessary to limit jobs to what Jenkins' parser could comprehend, Zuul was free to grow new features like distributed in-repository job definitions, shareable between projects with inheritance and secure handling of secrets, as well as the ability to test-drive proposed changes for the jobs themselves. Jenkins served its purpose admirably, but at least for Zuul, its usefulness was finally at an end.

Testing the future

Zuul's community likes to say that it "tests the future" through its novel application of speculative execution. Gone are the harrowing days of wondering whether the improvement you want to make to an existing job will render it non-functional once it's applied in production. Overloaded review teams for a massive central job repository are a thing of the past. Jobs are treated as a part of the software and shipped right alongside the rest of the source code, taking advantage of Zuul's other features like cross-repository dependencies so that your change to part of a job in one project can be exercised with a proposed job change in another project. It will even comment on your job changes, highlighting specific lines with syntax problems as if it were another code reviewer giving you advice.

These were features Zuul's developers had only dreamed of before, and they required freedom from Jenkins so that Zuul could take job parsing into its own hands. This is the future of CI, and Zuul's users are living it.

In early 2019, the OpenStack Foundation recognized Zuul as an independent, openly governed project with its own identity and flourishing community. If you're into open source CI, consider taking a look. Development on the next evolution of Zuul is always underway, and you're welcome to help. Find out more on Zuul's website.