::

  Copyright 2015 Hewlett-Packard Development Company, L.P.

  This work is licensed under a Creative Commons Attribution 3.0
  Unported License.
  http://creativecommons.org/licenses/by/3.0/legalcode

========================================
shade: A library that understands clouds
========================================

Infra uses multiple clouds and, as a result, has learned a lot about what it
takes to do so. In the interest of being good citizens, that knowledge should
live in a reusable library rather than inside of nodepool.

Problem Description
===================

As much as OpenStack promises a utopian future where an application can be
written once and target multiple clouds that run OpenStack, the reality is
that deployer choices leak through the abstractions to the point where the
end user must know about them. As a result, applications need a priori
knowledge about individual clouds, and complex logic even to handle
differences that could be discovered at runtime.

The current user interface libraries, `python-*client`, are particularly
unfriendly to users because they were primarily written with server-to-server
communication in mind. Each of them was also designed completely differently,
so an application that uses more than one OpenStack service quickly becomes
confusing to write.

Besides Infra, another interested party is `ansible`, which has a set of
modules that focus on creating and managing cloud resources. Since Infra
already uses `ansible` to orchestrate `puppet`, it makes sense for Infra to
also use `ansible` to manage its cloud resources, which means that the logic
Infra has learned about how that works should be applicable there as well.
Specifics of using `ansible` for that purpose are out of scope of this spec,
but `ansible` upstream as a consumer is an important design consideration.

Proposed Change
===============

The `shade` library will handle all of this. It will contain the logic learned
from `nodepool` and, moving forward, any new complex cloud manipulation logic
that `nodepool` needs. `nodepool` should be considered `shade's` primary user.

To that end, `shade` must support constructs like application-based API rate
limiting and caching that are appropriate for long-lived connections.
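A minimal sketch of both constructs, assuming a simple policy: real API calls
are spaced at least `min_interval` seconds apart, and repeated lookups are
served from a cache for `cache_ttl` seconds. `ThrottledCachingClient` and its
parameters are hypothetical illustrations, not `shade's` actual API. The clock
and sleep functions are injectable so the behaviour can be tested without
waiting.

.. code-block:: python

   import time


   class ThrottledCachingClient:
       """Hypothetical wrapper that rate limits and caches API calls.

       `fetch` is whatever function performs the real API request.
       """

       def __init__(self, fetch, min_interval=1.0, cache_ttl=5.0,
                    clock=time.monotonic, sleep=time.sleep):
           self._fetch = fetch
           self._min_interval = min_interval
           self._cache_ttl = cache_ttl
           self._clock = clock
           self._sleep = sleep
           self._last_call = None  # time of the most recent real API call
           self._cache = {}        # key -> (time fetched, value)
           self.api_calls = 0

       def get(self, key):
           now = self._clock()
           cached = self._cache.get(key)
           if cached is not None and now - cached[0] < self._cache_ttl:
               return cached[1]  # still fresh: no API call at all
           # Rate limit: wait until min_interval has passed since the last call.
           if self._last_call is not None:
               wait = self._min_interval - (now - self._last_call)
               if wait > 0:
                   self._sleep(wait)
           value = self._fetch(key)
           self._last_call = self._clock()
           self.api_calls += 1
           self._cache[key] = (self._last_call, value)
           return value


   # Usage: the second lookup is served from the cache.
   client = ThrottledCachingClient(lambda key: ["server-1"], cache_ttl=60)
   client.get("servers")
   client.get("servers")
   print(client.api_calls)  # prints 1

The sketch is not thread-safe; a long-lived multi-threaded consumer such as
`nodepool` would need locking around the cache and the rate limiter.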

A consumer of `shade` should never need to put in logic such as "if my cloud
supports X, then do Y, else Z". There are two situations in which such logic
might arise.

The first is when there are two or more ways of doing the same logical action.
An example is getting a floating IP, which could be the purview of either
`neutron` or `nova`. `shade` should present a general `create_floating_ip`
to the user and hide all details about which service actually provided it.
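A minimal sketch of that idea. The service detection, the helper functions and
the return format are all hypothetical; the raw field names are only loosely
modelled on the `neutron` and `nova` floating IP APIs, and the values are made
up. Both paths return the same normalized dictionary, so the caller never
needs to know which service did the work.

.. code-block:: python

   class FakeCloud:
       """Stand-in for a cloud connection; a real one would read the catalog."""

       def __init__(self, services):
           self.services = set(services)

       def has_service(self, service_type):
           return service_type in self.services


   def _neutron_floating_ip(cloud, network):
       # Hypothetical raw result shaped like a neutron response.
       return {"id": "fip-1", "floating_ip_address": "203.0.113.10",
               "floating_network_id": network}


   def _nova_floating_ip(cloud, pool):
       # Hypothetical raw result shaped like a nova response.
       return {"id": "fip-1", "ip": "203.0.113.10", "pool": pool}


   def create_floating_ip(cloud, network="public"):
       """Create a floating IP using whichever service the cloud provides."""
       if cloud.has_service("network"):
           raw = _neutron_floating_ip(cloud, network)
           return {"id": raw["id"],
                   "floating_ip_address": raw["floating_ip_address"],
                   "network": raw["floating_network_id"]}
       raw = _nova_floating_ip(cloud, network)
       return {"id": raw["id"], "floating_ip_address": raw["ip"],
               "network": raw["pool"]}


   # Both clouds give the caller an identical result.
   print(create_floating_ip(FakeCloud(["compute", "network"])) ==
         create_floating_ip(FakeCloud(["compute"])))  # prints True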

The second is when functionality simply does not exist on a cloud. For
example, some clouds are deployed without `trove`. In that case, the user
should receive an error message stating that the selected cloud does not
support managing `trove` resources.
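A minimal sketch of that behaviour, assuming the library checks the service
catalog before calling a service. `ServiceUnavailable`, `Cloud` and
`list_databases` are hypothetical names; `database` is the service type under
which `trove` is normally registered in the catalog.

.. code-block:: python

   class ServiceUnavailable(Exception):
       """Hypothetical error raised when a cloud lacks a service entirely."""

       def __init__(self, cloud_name, service):
           super().__init__(
               f"Cloud {cloud_name} does not support managing {service} resources")
           self.service = service


   class Cloud:
       def __init__(self, name, service_types):
           self.name = name
           # Service types found in the cloud's catalog.
           self._service_types = set(service_types)

       def _require(self, service_type, label):
           if service_type not in self._service_types:
               raise ServiceUnavailable(self.name, label)

       def list_databases(self):
           self._require("database", "trove")
           return []  # a real implementation would call the trove API here


   try:
       Cloud("example", ["compute", "network"]).list_databases()
   except ServiceUnavailable as e:
       print(e)  # prints: Cloud example does not support managing trove resources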

The `python-*client` libraries are not written with end users in mind. Their
primary use case is enabling server-to-server communication. As such, they
make a set of assumptions that are not in keeping with a consumer point of
view. Their use should be replaced by `python-openstacksdk` once it is ready.
It is not ready yet, so in the meantime the `python-*client` libraries need to
be used. As the plan is to replace them eventually, all objects they return
and all exceptions they raise should be expressly hidden from users of
`shade`, even though masking exceptions is generally considered poor form.
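A minimal sketch of that hiding, assuming a decorator on every public method.
`CloudException`, `hide_client_errors` and the `Fake*` classes are
hypothetical; the fakes stand in for a `python-*client` library. Client
exceptions are re-raised as the library's own type, chained with `from` so the
original traceback is still available when debugging, and client objects are
copied into plain dictionaries before being returned.

.. code-block:: python

   import functools


   class CloudException(Exception):
       """Hypothetical: the only exception type callers need to catch."""


   def hide_client_errors(func):
       @functools.wraps(func)
       def wrapper(*args, **kwargs):
           try:
               return func(*args, **kwargs)
           except CloudException:
               raise
           except Exception as e:
               # Hide the client's exception type but keep it as __cause__.
               raise CloudException(f"{func.__name__} failed: {e}") from e
       return wrapper


   class FakeNotFound(Exception):
       """Stand-in for an exception defined by a python-*client library."""


   class FakeServer:
       """Stand-in for a resource object returned by a python-*client library."""

       def __init__(self, server_id, name):
           self.id = server_id
           self.name = name


   class FakeComputeClient:
       def __init__(self, servers):
           self._servers = {s.name: s for s in servers}

       def find(self, name):
           if name not in self._servers:
               raise FakeNotFound(name)
           return self._servers[name]


   @hide_client_errors
   def get_server(client, name):
       server = client.find(name)
       # Return a plain dict, never the client library's own object.
       return {"id": server.id, "name": server.name}


   client = FakeComputeClient([FakeServer("1", "web1")])
   print(get_server(client, "web1"))  # prints {'id': '1', 'name': 'web1'}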

A future state could be imagined where `shade` and `python-openstacksdk` merge,
but that does not seem to be the primary concern of either library at the
moment. If it did happen, it would likely be as a "simple" API on top of, or to
the side of, the rest of the SDK. The main reason is that `python-openstacksdk`
is more concerned with providing an SDK for programming against the OpenStack
APIs, while `shade` is more concerned with hiding the ways in which deployers'
choices leak through the API. A future state in which `shade` is deprecated is
most likely one in which the issues it deals with are handled by the server
APIs themselves, at which point this layer of business logic is no longer
needed.

Passthrough access to the underlying `Client` objects is useful for phased
adoption of `shade`. Before 1.0 is released, its removal should be considered,
or it should be hidden behind a warning that can be disabled. This ensures that
a user has to explicitly opt in, knowing that these objects are not part of
the API.
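A minimal sketch of such a disableable warning using Python's standard
`warnings` module. `PassthroughWarning`, `Cloud` and the `nova_client`
property are hypothetical names. Because the warning has its own category, a
user who understands the risk can opt in by silencing exactly that category
and nothing else.

.. code-block:: python

   import warnings


   class PassthroughWarning(UserWarning):
       """Hypothetical category for access to unsupported client objects."""


   class Cloud:
       def __init__(self, nova_client):
           self._nova_client = nova_client

       @property
       def nova_client(self):
           # Warn on access; stacklevel=2 reports the caller's line.
           warnings.warn(
               "nova_client exposes an underlying client object that is not "
               "part of the API and may be removed before 1.0",
               PassthroughWarning, stacklevel=2)
           return self._nova_client

A caller opts in explicitly with
`warnings.simplefilter("ignore", PassthroughWarning)`, which leaves all other
warnings untouched.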

`ansible` is the second user of `shade`. The main addition this brings is the
need for idempotent operations. The `shade` API must provide enough for the
`ansible` modules to offer idempotency without large amounts of repeated logic
in the modules themselves. Most of the `ansible` modules should contain very
little code that is not related to `ansible` argument processing or to
converting results into a suitable format.
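A minimal sketch of an idempotent operation, assuming an in-memory stand-in
for a cloud's keypair API. `FakeKeypairStore` and `ensure_keypair` are
hypothetical names. The function reports whether it changed anything, which
is what an `ansible` module needs to return as its `changed` result.

.. code-block:: python

   class FakeKeypairStore:
       """In-memory stand-in for a cloud's keypair API."""

       def __init__(self):
           self.keypairs = {}

       def get(self, name):
           return self.keypairs.get(name)

       def create(self, name, public_key):
           self.keypairs[name] = {"name": name, "public_key": public_key}
           return self.keypairs[name]

       def delete(self, name):
           del self.keypairs[name]


   def ensure_keypair(store, name, public_key):
       """Make keypair `name` hold `public_key`; return (changed, keypair).

       Calling it again with the same arguments is a no-op.
       """
       existing = store.get(name)
       if existing is not None and existing["public_key"] == public_key:
           return False, existing  # already in the desired state
       if existing is not None:
           store.delete(name)  # key material differs: replace it
       return True, store.create(name, public_key)


   store = FakeKeypairStore()
   print(ensure_keypair(store, "infra", "ssh-rsa AAAAexample")[0])  # prints True
   print(ensure_keypair(store, "infra", "ssh-rsa AAAAexample")[0])  # prints False

An `ansible` module built on a function like this would mostly parse its
arguments, call it, and pass the `changed` flag back to `ansible`.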

Finally, it is not `shade's` purpose in life to express what is or is not
OpenStack, nor to be involved in such categorizations. Its job is to improve
the end user's experience. For that reason, `shade` should take a maximal
approach to including support for things. If someone wants to add support
for `designate` or `magnum` or `manila` or whatever, that's awesome.

It is a conscious and active decision not to use a plugin interface for this.
Again, because `shade` exists to reduce the cognitive burden on the user, the
user should not have to know to install plugins to be able to use their cloud.
The two main reasons for pluggable clients in the past have been:

* Strict policies on what is 'Integrated'
* To enable proprietary extensions

The first is no longer a problem for OpenStack broadly, and even if it were,
it would still not be a practical issue for an Infra project.

The second is the thing that will ultimately cause OpenStack to die if it is
allowed to continue. While the right of people to choose to destroy all the
goodness in the world is an important right for them to have, there is no need
for Infra to involve itself in such a tragedy.

Anything that's in `shade` needs to be testable by running `shade` functional
tests against a devstack in the Infra gates.

There is currently one exception to the rule that everything be testable in
the Infra gates: the Rackspace Task API for Glance does not work in devstack,
so we cannot test it. We make an exception for it because, at the moment,
`nodepool` must use that API, and it is an API that exists in glance, even if
the backing code is broken. However, the general rule stands, and any
violations of it need to be carefully considered exceptions, probably
accompanied by a large amount of complaining.

Alternatives
------------

We could skip writing a library and write all of our logic directly in
nodepool. This is problematic because it keeps a lot of really useful code
and logic from being easily reusable by the community at large.

We could write all of the logic directly in the ansible modules upstream and
then have nodepool turn into an engine which consumes the ansible modules. This
is more tempting, but ansible does not support long-lived objects, which means
that we would be executing ansible on every operation, which seems rather
extreme. It would also mean that people not using ansible would be unable to
benefit from the logic.

We could improve the client libraries or `python-openstacksdk`. We've tried to
include richer logic in the client libraries and have been told it's not what
they are for. The `python-openstacksdk` is still young and we've been told it's
not ready for production use yet. We need some of the logic for `shade` now,
so the timescale for getting it done in `python-openstacksdk` isn't very
workable.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  mordred

Additional assignee(s):
  Shrews
  greghaynes
  dguerri
  TheJulia
  Spamaps

Gerrit Topic
------------

`shade` is a library itself, so there is no dedicated gerrit topic.

Work Items
----------

* Implement image uploading for nodepool
* Get to feature parity with nodepool on floating IPs and server creation
* Implement ansible modules for every function in shade

Repositories
------------

openstack-infra/shade

Servers
-------

None

DNS Entries
-----------

None

Documentation
-------------

`shade` needs developer documentation of its API.

Security
--------

None

Testing
-------

`shade` should have both unit tests and functional tests. The functional
tests should run against devstack VMs. If a developer chooses to, they should
be able to manually run functional tests against live clouds, since the purpose
of shade is to enable use of myriad clouds, not to support or expose
theoretical APIs.

Dependencies
============

None