::

  Copyright (c) 2017 IBM

  This work is licensed under a Creative Commons Attribution 3.0
  Unported License.
  http://creativecommons.org/licenses/by/3.0/legalcode

=========================
Zuul v3 Executor Security
=========================

Storyboard: https://storyboard.openstack.org/#!/story/2000910

Playbooks provided in project repos are already run with a set of
Ansible plugins to protect the executor from compromise or information
leaks. While this belt is keeping our security pants on, we definitely
don't want them to fall down if the belt fails, so we need suspenders
in the form of OS level containment.

The goals of this effort are as follows:

* Define simple, automated ways for Zuul to protect its own executor.
* Provide operators with guidance on Executor security measures.
* Keep Zuul simple.

Note that we will not discuss any methods to mitigate resource exhaustion
outside the executor, such as filling up Swift with artifacts, using
nodes for purposes outside the ToS agreed upon by the Zuul operator, etc.

Problem Description
===================

If a bug in Ansible or our Ansible plugins allows users to break out of
the insecure context, the executor will currently be vulnerable to several
known attack vectors.

Local Privilege Escalation (LPE)
--------------------------------

The executor runs as an unprivileged daemon user. It will run
`ansible-playbook` with those same privileges. While administrators
should lock this user down to the minimum amount of access required to
launch jobs, Linux and other operating systems which might run Zuul
are not immune from privilege escalation vulnerabilities.

Critical Information Leaks (CIL)
--------------------------------

Systems which are not generally secured against local users may provide
helpful information to malicious actors. This includes simple things
like operating system kernel versions, networking configuration, and more
critical information like files containing secrets that are accidentally
exposed by incorrect local file permissions.

Denial of Service (DoS)
-----------------------

A bad actor that breaks out of Ansible protections may be able to do
some very small things to consume all of the resources of the executor.

Proposed Change
===============

Execution Flow
--------------

Currently the executor functions like this, with untrusted context being
secured by Ansible plugins. Any of the playbooks may be run in a trusted
or untrusted context depending on whether or not they were defined in
a project repo or in a config repo.

1. Make a writable job dir
2. Copy merged git repos to job dir
3. Run pre playbooks
4. Run in-repo playbooks
5. Run post playbooks
6. Nuke job dir

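The six steps of the current flow can be sketched roughly as follows. This
is only an illustration and not the executor's actual code: `run_job`, its
arguments, and the `run_playbook` callable are hypothetical names introduced
here.

```python
import shutil
import tempfile
from pathlib import Path


def run_job(merged_repos, pre, in_repo, post, run_playbook):
    """Walk one job through the six steps; all names here are hypothetical."""
    job_dir = Path(tempfile.mkdtemp(prefix="job-"))        # 1. writable job dir
    try:
        for repo in merged_repos:                          # 2. copy merged repos
            shutil.copytree(repo, job_dir / Path(repo).name)
        for playbook in pre:                               # 3. pre playbooks
            run_playbook(playbook, job_dir)
        for playbook in in_repo:                           # 4. in-repo playbooks
            run_playbook(playbook, job_dir)
        for playbook in post:                              # 5. post playbooks
            run_playbook(playbook, job_dir)
    finally:
        shutil.rmtree(job_dir, ignore_errors=True)         # 6. nuke job dir
    return job_dir
```

Any containment discussed in this spec would live inside `run_playbook` for
the untrusted steps; the surrounding flow stays the same.
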
Two possible revisions to this are "Secure Execution on Executor" and
"Secure Execution on a Test Node":

Secure Execution on Executor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can rework the untrusted context by wrapping it in containment methods,
outlined below. If we choose containment methods that require an image
we'll add two steps before step (1) above:

-1. Build image for chroot periodically.
0. Copy image to job dir
1. Make a writable job dir
2. Copy merged git repos to job dir
3. Run pre playbooks
4. Run in-repo playbooks
5. Run post playbooks
6. Nuke job dir

Step (-1) above could be done out of band with the executor using
`diskimage-builder`. It's possible `nodepool-builder` could be used here,
but for an initial implementation, I believe this can simply be done by
a daily cron job and configured as a path to a template image directory
or tarball. Any CoW system is an implementation/optimization detail to
speed up the copy step and reduce disk footprint.

A more practical plan is to skip (-1) and (0) and just bind mount /usr
into the working directory, which is already the default method used by
Bubblewrap. This means that a contained attacker has access to all the
tools from the executor host, but in terms of real security, depriving
them of gcc, while we allow running python and contacting the internet,
is only a very minor hurdle for an attacker to get over. We may want
to advise deployers to keep the executor host software footprint to a
minimum as a result.

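As a rough sketch of the bind mount approach, the helper below only builds
a `bwrap` argument list; it does not run anything. The function name, the
merged-/usr symlinks, and the choice of which namespaces to unshare are
assumptions made for illustration, not decisions taken by this spec.

```python
def bwrap_command(job_dir, playbook_argv):
    """Build an argv that runs a command under bwrap with /usr bind mounted.

    The mount layout is an illustrative assumption, not Zuul's real one.
    """
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",       # executor's /usr, read-only
        "--symlink", "usr/bin", "/bin",    # assumes a merged-/usr layout
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", job_dir, job_dir,        # the only writable host path
        "--chdir", job_dir,
        "--unshare-pid",                   # hide the executor's processes
        "--die-with-parent",
    ] + list(playbook_argv)
```

For example, `bwrap_command("/var/lib/zuul/job1", ["ansible-playbook",
"site.yml"])` yields a list that could be handed to `subprocess`. Because
/usr is mounted read-only and the job dir is the only writable path, any
setuid binaries under /usr remain reachable, which is why a minimal host
footprint matters.
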
Diskspace monitoring
--------------------

Because playbooks will need to transfer artifacts around, we will
need to monitor artifact space usage by playbooks. While usage of an
object storage service like Swift is also an option, there will always
be some percentage of space needed on the executor for playbooks to use
as scratch space, and we don't want to require object storage services
for effective use of Zuul.

A simple method will be to have a single process/thread which walks
playbook artifact storage with `du` periodically. Any job that has
exceeded its space allocation will be terminated immediately and have
its artifact space emptied. A very fast consumer of space will be able
to fill a disk before this can be done, so the limit should be relatively
low in comparison to the size of the storage area.

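A minimal sketch of such a walker is shown below. It sums apparent file
sizes with `os.walk` rather than shelling out to `du`, so its numbers will
differ slightly from `du`'s block counts. The function names and the
assumption that each job owns one directory under a common storage root
are hypothetical.

```python
import os


def dir_usage(path):
    """Return the apparent size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # the file vanished while we were walking
    return total


def over_limit(storage_root, limit_bytes):
    """Return the per-job directory names whose usage exceeds limit_bytes."""
    offenders = []
    for entry in os.scandir(storage_root):
        if entry.is_dir(follow_symlinks=False):
            if dir_usage(entry.path) > limit_bytes:
                offenders.append(entry.name)
    return sorted(offenders)
```

A monitoring thread would call `over_limit` on a timer and terminate the
jobs it returns. The walk itself takes time, which is the race described
above.
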
A config option will be created to define a single disk space limit that
applies to every job. This should simplify the initial implementation, but
later on it may be necessary to allow limits to be defined for individual
jobs.

Evaluation of methods of containment will assume that this change precedes
or accompanies any implementation.

Available Containment Methods
-----------------------------

There are a number of different options available to address executor
security.

Some known methods are listed below with general background information,
including a list of pros and cons for each.

Many of these can be combined, some cannot. It seems likely that the end
solution will have us adopting at least two. We may also need to add in a
layer of abstraction to Zuul to allow users to write their own security
integrations based on their knowledge and abilities, but that is beyond
the scope of this document.

ulimit
~~~~~~

This limits what resources a user-space process can consume.

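As a sketch of how such limits could be applied to a single child process
rather than to the whole executor, the helper below uses Python's `resource`
module. The function name and any limit values passed to it are
assumptions; the spec does not prescribe specific limits.

```python
import resource
import subprocess


def run_limited(argv, limits):
    """Run argv with resource limits applied in the child only.

    limits maps resource.RLIMIT_* constants to a cap. The caps are
    deployment choices, not values taken from this spec.
    """

    def apply_limits():
        # Executed in the child after fork() and before exec().
        for res, cap in limits.items():
            _soft, hard = resource.getrlimit(res)
            if hard != resource.RLIM_INFINITY:
                cap = min(cap, hard)  # unprivileged code cannot raise hard
            # Lower soft and hard together so the child cannot undo it.
            resource.setrlimit(res, (cap, cap))

    return subprocess.run(argv, preexec_fn=apply_limits,
                          capture_output=True, text=True)
```

A call such as `run_limited(["ansible-playbook", "site.yml"],
{resource.RLIMIT_NOFILE: 1024})` would cap open files for that playbook
run while leaving the executor's own limits untouched.
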
LPE
***

No coverage.

CIL
***

No coverage.

DoS
***

* Can prevent exhaustion of user-space memory

* Can prevent direct exhaustion of process space

* Still vulnerable to exhaustion of kernel structures and I/O

Pros
****

* Simple implementation

* No filesystem changes needed

* Built-in to all operating systems.

* No performance overhead

Cons
****

* Only covers a few DoS vectors and nothing else

Chroot
~~~~~~

This would involve building a directory with only the binaries needed
to run playbooks, source trees bind mounted or copied in, and writable
space for artifacts.

Special care would be taken to ensure the binary paths are read-only
and any writable paths are mounted noexec.

LPE
***

* Mitigates due to removal of most binaries [binaries]_

* Mitigates due to removal of access to directories outside chroot.

* Vulnerable to kernel problems which allow chroot breakout or
  privilege escalation via Python.

CIL
***

* Mitigates due to removal of most binaries [binaries]_

* Mitigates due to removal of access to directories outside chroot.

* Still vulnerable to any kernel<->user space interaction which Python
  can do natively.

.. [binaries] This mitigation is complicated by the fact that an attacker
   could build binaries on a test node and transfer them back as
   artifacts. Getting permissions and noexec parts right would
   be key.

DoS
***

* No significant improvement.

Pros
****

* Simple, built-in to most operating systems

* Well understood, can be fully achieved by unprivileged user.

Cons
****

* Incomplete coverage

* Known attack vectors

* Requires building chroot filesystem carefully.

Cgroups
~~~~~~~

Cgroups allow one to limit a set of processes' access to various kernel
subsystems, and to identify them as a group.

Various helpers exist for them, and those will be evaluated separately
from the fundamental cgroup capability.

The implementation would be to create a cgroup for each ansible-playbook
execution, with the administrator being able to decide the template for
that cgroup.

LPE
***

* Mitigates somewhat by restricting access to some kernel subsystems.

CIL
***

* Mitigates somewhat by restricting access to some kernel subsystems.

DoS
***

* Significant mitigation due to limitations on all kernel subsystems.

* Provides a convenient way to integrate with the `du` process: any job
  detected overrunning its disk space can have its cgroup 'frozen',
  stopping all processes in the cgroup.

* Controls "noisy neighbor" by guaranteeing even consumption of CPU and IO.

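Freezing a cgroup amounts to writing to a control file in its directory. The
sketch below assumes the job's cgroup already exists and that the caller
has permission to write to it. The function name and the `cgroup_dir`
argument are hypothetical. It handles both the cgroup v1 freezer
controller (`freezer.state`) and the cgroup v2 interface (`cgroup.freeze`).

```python
from pathlib import Path


def set_frozen(cgroup_dir, frozen=True):
    """Freeze or thaw every process in a cgroup via its control file.

    cgroup_dir is a hypothetical per-job cgroup directory, for example
    somewhere under /sys/fs/cgroup; creating it is out of scope here.
    """
    cgroup_dir = Path(cgroup_dir)
    v2_file = cgroup_dir / "cgroup.freeze"
    if v2_file.exists():
        # cgroup v2: "1" freezes, "0" thaws.
        v2_file.write_text("1" if frozen else "0")
        return v2_file
    # cgroup v1 freezer controller: FROZEN or THAWED.
    v1_file = cgroup_dir / "freezer.state"
    v1_file.write_text("FROZEN" if frozen else "THAWED")
    return v1_file
```

The disk monitor could call `set_frozen` on an offending job's cgroup first,
then clean up its artifact space without racing against processes that
are still writing.
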
Pros
****

* Relatively simple to create and modify cgroups

Cons
****

* Direct cgroup manipulation requires root privileges or setuid helper

Seccomp
~~~~~~~

Seccomp is a system by which a process may restrict what syscalls it,
and any of its children, may make. It is a relatively straightforward
process to consider what syscalls Ansible would need to make, since its
primary functions are local file CRUD, and network operations.

LPE
***

* Reduces attack surface of the kernel by limiting to the needed syscalls.

* Reduces ability of python to do real damage beyond what the needed syscalls
  can do.

CIL
***

* Should reduce surface area again by limiting access to syscalls which leak
  information.

DoS
***

* Same mitigations as LPE.

Pros
****

* Well understood, universally available Linux security technology.

* The syscall-oriented nature means it's likely the set of syscalls
  needed will remain relatively static, reducing maintenance load as new
  versions of Ansible are released.

Cons
****

* Tooling is a bit obtuse and user-unfriendly.

LXC
~~~

An LXC container is effectively a combination of chroot, cgroup, and
Linux kernel namespaces.

A potential implementation would be to build a chroot filesystem using
diskimage-builder and then launch an LXC container with that as the root
filesystem, and bind mounts for readonly data (git trees) and writable
space (artifacts).

LPE
***

* Mitigates a bit more than Cgroup+Chroot by preventing crossing user
  namespace boundaries.

CIL
***

* Mitigates a few more leaks by further partitioning processes' access to
  data in the kernel that may belong to other processes.

DoS
***

* No better than cgroups + chroot.

Pros
****

* Simpler implementation than Docker

* Well understood and mature set of technologies

Cons
****

* Less popular than Docker, so at risk of being abandoned

* Single-vendor open source project (Canonical) makes this problematic
  for Zuul deployers on not-Ubuntu/Debian.

* Still requires careful filesystem and mount crafting.

Docker
~~~~~~

Docker started life as a daemon to control LXC, just like LXC 2.0 is
now. It has grown quite a bit from there and provides all of the same
LPE/CIL/DoS protections as LXC.

In addition to the LXC capabilities, it features a rich set of image
build tools, and a daemon for storing and retrieving those called 'docker
registry'. There is also a centralized internet Docker Hub where users
share their container images.

Pros
****

* Industry wide attention means support and adoption will be less
  controversial.

* Includes container storage limits as a feature, possibly mitigating
  the need for the `du` storage monitoring thread, or at least providing
  extra protection against the race condition.

Cons
****

* A mountain of features which we don't need means it is far more
  complex than needed. The net effect of downtime and confusion for
  operators of Zuul may not be worth the security mitigations.

rkt
~~~

Rkt is aimed at those who do feel that Docker is overkill for containing
things. It mostly sits as an abstraction for containment of things, with
systemd-nspawn and kvm available. It provides all the same LPE/CIL/DoS
protections as LXC.

Pros
****

* Well thought out design that tries only to do one thing well

Cons
****

* Single-vendor

* Unknown how well tested it is

Bubblewrap
~~~~~~~~~~

https://github.com/projectatomic/bubblewrap

Bubblewrap is similar to Docker or LXC, except that it may not require
root privileges to sandbox an application. It is also aimed specifically
at sandboxing rather than providing image based isolation like LXC and
Docker. It would be used similarly to LXC or Docker, and provide around
the same level of mitigation for LPE/CIL/DoS.

Pros
****

* Small simple command line utility with no privileged daemons necessary.

* Specifically built for sandboxing partially trusted apps only.

* Supports Seccomp

Cons
****

* User space is not included in Ubuntu 16.04 (backporting is trivial).

* Kernel on Ubuntu 16.04 is limited, Yakkety backport is required to
  get full set of USER_NS features.

* The kernel side is relatively new and untested, and has already had
  a few local root exploits found in it.

systemd-nspawn
~~~~~~~~~~~~~~

Similar to bubblewrap, but coming from the systemd project. It does have
some unprivileged capabilities, but I believe for our use case we would
need it to be setuid or run as root.

Its containment capabilities are comparable to Bubblewrap.

Pros
****

* It can take advantage of Btrfs or LVM for CoW snapshots automatically,
  which is nice for scaling to lots of concurrent jobs.

Cons
****

* Confusing relationship with systemd and machined.

* Seems focused on running a whole OS rather than an app.

AppArmor
~~~~~~~~

AppArmor is a relatively straightforward kernel security module that
allows defining the behavior of individual binaries. Combined with chroot,
this could be enough to mitigate most vulnerabilities.

LPE
***

* Mitigates further by reducing surface area in the kernel and userspace

CIL
***

* Mitigates further by reducing surface area in the kernel and userspace

DoS
***

* No significant improvement.

Pros
****

* Extremely simple profile language adds value without confusing admins
  too much.

Cons
****

* Not supported on CentOS/Fedora/RHEL

* Having AppArmor enforcing can complicate things if packages have
  defined AppArmor profiles that do not agree with how the executor
  wants to use those packages.

SELinux
~~~~~~~

SELinux is similar to AppArmor, but can offer more fine-grained control
and thus more complete protection, at the cost of more complexity and
thus a more difficult implementation. It has more or less the same
LPE/CIL/DoS profile as AppArmor.

Pros
****

* Extremely powerful tools allow extremely fine-grained control

* Specifically limits chroot and/or container breakouts with the
  combination of process contexts and MCS (Multi-Category-Security)

Cons
****

* Having SELinux enforcing means the whole executor system must have its
  SELinux configuration fully defined.

Recommendation
--------------

Based on the surface level evaluations, I believe Bubblewrap has the
highest value for the lowest complexity. We can use it with the /usr
from the executor bind mounted into the chroot, which is slightly less
secure than managing our own overlays and images since we may end up with
dangerous setuid binaries accessible to users. We are already building
working directories for jobs so putting a chroot in there doesn't seem
like too great a departure.

Bubblewrap can be used via setuid on Ubuntu 16.04 (via backports)
without upgrading to a Yakkety kernel. It allows us to get a ton of
containment without adding much complexity. We can combine it with
cgroups later to increase DoS protection once we have it containing the
process. We can also add SELinux support fairly easily once this is known
to work. Finally we can layer on seccomp and reduce surface area even
further.

Building images for the chroot with minimal binaries would reduce surface
area further, but this can be deferred until we have full container/COE
support for testing nodes. This way we can keep image building where it
is now, in Nodepool.

Alternatives
------------

Secure Execution on a Test Node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Alternatively, we could rely on Ansible in the node, and keep the flow
as-is, but make the untrusted context mean "inside a node". In order to
do that we would need to make one of the nodes an "untrusted executor"
(simplest answer on which one to use is the first one in the node set).
This would involve the following changes:

* Build custom inventory

  * An inventory would need to have the untrusted executor set up
    specially so that it uses ansible_connection=local, or it would
    need to be able to SSH to itself.

* Create and distribute creds

  * The untrusted executor would need an ephemeral private SSH key,
    and all other nodes in the nodeset would need this key installed.

* Network Access

  * Currently we verify that nodepool -> nodes works, and assume executor
    -> nodes is equivalent. But this would require that we be able to
    SSH from node to node, which may not always be possible. We also
    likely will want to make sure inventories have the private IP.

* Ansible setup on untrusted executor

  * We currently don't put any restrictions on nodes other than the
    ability to SSH into them. We'd need to install ansible somehow,
    possibly in a chroot to keep it isolated from the user's test
    execution and dependencies. Isolating Ansible in this way should
    be quite a bit simpler than isolating Ansible in a security context
    though.

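The custom inventory item could look something like the sketch below, which
renders an INI-style inventory where the first node in the node set runs
locally and the rest are reached over their private IPs. The function name
and the `(name, private_ip)` input shape are assumptions for illustration;
`ansible_connection` and `ansible_host` are standard Ansible inventory
variables.

```python
def build_inventory(nodes):
    """Render an INI inventory; the first node is the untrusted executor.

    nodes is a list of (name, private_ip) pairs. This input shape is a
    hypothetical one chosen for the example.
    """
    lines = []
    for index, (name, private_ip) in enumerate(nodes):
        if index == 0:
            # The untrusted executor runs playbooks against itself locally.
            lines.append(f"{name} ansible_connection=local")
        else:
            # Other nodes are reached over SSH on their private address.
            lines.append(f"{name} ansible_host={private_ip}")
    return "\n".join(lines) + "\n"
```

This covers only the inventory; distributing the ephemeral SSH key and
confirming node-to-node connectivity would still be separate steps.
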
Pros
****

* Same containment for executor as tests means we could probably
  just drop the Ansible plugins.

* Executor scales with test nodes

Cons
****

* Ansible must be injected or present in all test nodes.

  * Injection is brittle, requiring extra download and build steps that
    add failure risk to test runs, potentially wasting resources.

  * Requiring Ansible to be present is a burden for those who want to
    take advantage of the fact that Zuul and nodepool allow custom images.

  * Ansible's requirements are non-trivial, so if we can't spare more
    test nodes for an executor-specific Ansible, at the very least
    we would need to inject a virtualenv or chroot to run Ansible in,
    contaminating the test nodes' environment.

* Resources normally allocated to running tests will be consumed by
  executor, or nodes will need to be allocated to running playbooks only.

Ultimately, this method is rejected for both of the Cons above. The
Ansible plugin should provide medium level security, and a healthy dose
of namespaces, cgroups, and chroot should keep any breakouts contained.

Implementation
==============

Assignee(s)
-----------

Primary assignee:

* SpamapS

Work Items
----------

* Request backport of bubblewrap userspace from latest Ubuntu stable to
  xenial-backports.
* Create ansible minimal chroot image.
* Add chroot-copy into job dir before insecure contexts.
* Add code to call ansible-playbook via `bwrap` in the insecure context.

Repositories
------------

openstack-infra/zuul (feature/zuulv3)

Servers
-------

N/A

DNS Entries
-----------

N/A

Documentation
-------------

We will need to write heavy documentation outlining not only how to set up
an executor, but what risks are still present.

Security
--------

This spec is entirely focused on enhancing the process for securing Zuul v3.

Testing
-------

Integration tests will need to be configured with the mitigation technologies
we implement.

Dependencies
============

zuulv3