..
  This template should be in ReSTructured text. The filename in the git
  repository should match the launchpad URL, for example a URL of
  https://blueprints.launchpad.net/zaqar/+spec/awesome-thing should be named
  awesome-thing.rst.

  Please do not delete any of the sections in this template. If you
  have nothing to say for a whole section, just write: None

  For help with syntax, see http://www.sphinx-doc.org/en/stable/rest.html
  To test out your formatting, see http://www.tele3.cz/jbar/rest/rest.html

=======================
 Mistral Notifications
=======================

https://blueprints.launchpad.net/zaqar/+spec/mistral-notifications

Allow a message to a Zaqar queue to trigger a Mistral workflow via the Zaqar
notification mechanism.

Problem description
===================

Developers of cloud applications expect to be able to build autonomous
applications in the cloud. That is to say, applications that manage themselves
by accessing the APIs of the cloud to manipulate their own infrastructure.
(Examples of this include autoscaling and autorecovery.) This is one of the
primary differences between a cloud platform and a simple virtualisation
platform (the other being multi-tenancy).

There are two parts to this that require integration, which is the purpose of
this blueprint.

The first is that the application must be able to receive information from the
cloud. An example of this would be an Aodh alarm indicating that a server is
overutilised. These notifications must be asynchronous, since the cloud is
multitenant and cannot block waiting for any one user application to
acknowledge it. They must also exhibit queueing semantics with at-least-once
delivery and high durability, since the application may become unreliable if
it misses notifications from the cloud. Not coincidentally, Zaqar offers
exactly these semantics in a public, Keystone-authenticated API that is
accessible to applications, and is therefore a natural choice.
For this reason, a number of OpenStack projects have already started
dispatching user notifications to Zaqar, and more are expected in the near
future. Aodh alarms already support Zaqar as a target, and Heat can push stack
and resource events, as well as notifications about user hooks being
triggered, to Zaqar.

The second is that the application must be able to perform arbitrary, and
arbitrarily complex, actions. This is because in practice the Right Thing to
do in cases like autoscaling and autorecovery is application-specific. There
is also an entire universe of application-specific actions that a user might
want to create. Of course an application can run these actions on a server
provisioned with Nova, but this generally makes things more complex (and
usually more expensive) than they need to be. For example, it is very hard to
host autorecovery code on the servers that are being autorecovered themselves
and still be reliable. Finally, OpenStack makes it difficult to provide
appropriate Keystone credentials to servers provisioned with Nova.

Mistral_ solves these problems by providing a lightweight, multi-tenant way of
reliably running potentially long-running processes, with access to the
OpenStack APIs as well as a number of other actions (some of which, like
sending email and webhooks, are similar to Zaqar's notifications).

The missing link to build fully autonomous applications is for messages
(potentially, but not necessarily, originating from the OpenStack cloud
itself) on Zaqar queues to be able to trigger Mistral workflows (potentially,
but not necessarily, calling other OpenStack APIs). This would give developers
of cloud applications an extremely flexible way of plugging together
event-driven, application-specific, autonomous actions.

.. _Mistral: https://wiki.openstack.org/wiki/Mistral

Proposed change
===============

Create a Zaqar notification sink plugin for Mistral.
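
As a rough preview of the mechanics specified in the rest of this section, the
following Python sketch shows three small pieces: recognising a Mistral
trigger by string comparison, storing a trust ID in the subscriber URL, and
merging the message into the workflow environment. All function names and the
example values are hypothetical and are not part of Zaqar or Mistral. The
sketch assumes the executions URL has already been read from the Keystone
catalog and that the message body is already decoded JSON. It does not create
the Keystone trust or call Mistral.

```python
import copy
from urllib.parse import urlsplit, urlunsplit


def is_mistral_trigger(subscriber, executions_url):
    """Hypothetical check: is this subscriber the Mistral executions endpoint?

    ``executions_url`` is assumed to come from the Keystone catalog. This is
    a straight string comparison, so both URLs must be spelled identically.
    """
    prefix, sep, rest = subscriber.partition("+")
    return prefix == "trust" and sep == "+" and rest == executions_url


def insert_trust_id(subscriber, trust_id):
    """Hypothetical helper: return trust+http(s)://trust_id@host/path."""
    parts = urlsplit(subscriber)
    if parts.scheme not in ("trust+http", "trust+https"):
        raise ValueError("not a trust+http(s) subscriber: %s" % subscriber)
    netloc = "%s@%s" % (trust_id, parts.netloc)
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def build_execution_args(options, message_body, message_id=None):
    """Hypothetical helper: combine subscription options with one message.

    Returns a plain dict of the values described in this spec. It is an
    illustration only, not the Mistral API request format.
    """
    # Copy so the stored subscription options are never modified.
    params = copy.deepcopy(options.get("params", {}))
    # Create ``env`` if the user did not supply one; keep other user keys.
    env = params.setdefault("env", {})
    # Message data overwrites any conflicting user-provided keys.
    env["notification"] = message_body
    if message_id is not None:
        env["notification_id"] = message_id
    return {
        "workflow_id": options["workflow_id"],
        "input": copy.deepcopy(options.get("input", {})),
        "params": params,
    }
```

For instance, calling ``insert_trust_id`` on
``trust+https://mistral.example.net/v2/executions`` with a made-up trust ID of
``abc123`` yields ``trust+https://abc123@mistral.example.net/v2/executions``.
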
The effect of a notification to this sink would be to create a Mistral
workflow Execution_ (i.e. to trigger a pre-existing Mistral workflow).

The ``subscriber`` URI should be the URL of the Mistral executions endpoint,
with the URI scheme ``trust+http`` or ``trust+https``. For example,
``trust+https://mistral.example.net/v2/executions``. This scheme indicates
that Zaqar should create a Keystone trust that allows it to act on behalf of
the user in making API calls to Mistral in the future.

The trust ID will be inserted into the URL before it is stored, in the form
``trust+http://trust_id@host/path``. This form is modelled after `the one used
by Aodh`_. The trust lifetime should be slightly longer than the TTL of the
subscription, or unlimited if there is no TTL for the subscription. Zaqar must
delete the trust when deleting the subscription.

When sending a notification, Zaqar will retrieve a trust token from Keystone
using its own service user token and the trust ID stored in the URL. The trust
token thus obtained should contain the correct tenant information to then make
the request on behalf of the original user.

Since in future Zaqar may want to make ``trust+http`` requests to other API
endpoints, it should distinguish on more than just the URI scheme. When the
subscription is created, Zaqar should compare the URI with the Mistral
executions endpoint URL obtained from the Keystone catalog, in order to
distinguish between Mistral workflow triggers and ordinary webhooks.
Fortunately, the URL is fixed for a given cloud, so the catalog would probably
only need to be read once, and it would be a straight string comparison from
there.

The ``options`` dict should contain the following keys:

* ``workflow_id`` - the ID of the workflow to trigger
* ``params`` - a dict of parameters that varies depending on the workflow
  type. For example, a "reverse workflow" takes a ``task_name`` parameter to
  define the target task.
* ``input`` - an arbitrary dict of keys and values to be passed as input to
  every workflow execution triggered by this notification.

When creating the Mistral execution, the contents of the message and (later)
the message ID will be passed in the environment (the ``env`` key in the
``params``). This allows the workflow to access the message data, but does not
require it to declare a particular input for it (so the notification can be
used to trigger *any* workflow).

The message contents, interpreted as JSON, will be passed in a Mistral
environment variable named ``notification``. When Zaqar supports passing the
message ID in a notification, it will be sent as the Mistral environment
variable ``notification_id``. If these names conflict with keys in the ``env``
passed by the user in ``params``, the user-provided values will be overwritten
with those received in the message. Any other keys in the user's ``env`` will
be preserved. If the user does not specify an ``env``, one will be created.
The ``input`` dict, ``workflow_id`` and all other ``params`` will be passed
through unmodified.

While all the data is available to do a raw HTTP request, it is preferable if
these calls are made through the python-mistralclient library.

.. _Execution: https://docs.openstack.org/mistral/latest/#executions
.. _the one used by Aodh: https://docs.openstack.org/aodh/latest/#trust-http

Alternatives
------------

Instead of a push model, where Zaqar takes messages and notifies Mistral, it
would also be possible to use a pull model where Mistral polls Zaqar topics
for messages. However, while the Zaqar notification implementation already
exists, there is no existing component in Mistral that would be suitable for
polling for triggers. It would need to poll large numbers of topics in
different tenants. A similar design was considered and rejected for the
notification feature of Zaqar; the same arguments apply here.

An alternative authentication method might be to use pre-signed URLs, which
are on the `Mistral roadmap`_. This might be quicker to implement, but in the
longer term, Keystone trusts are probably preferable.

Instead of whitelisting the Mistral executions URL, the ``trust+http`` scheme
could be used to make requests to any OpenStack endpoint. However, in general
the correct method of combining static information from the ``options`` dict
with the contents of the message to obtain the call parameters will be
different for every API. Since Mistral can already call most OpenStack APIs
and supports a language (YAQL) for calculating the arguments using data from
the notification and other input, the simplest way to achieve this is for the
user to encapsulate any other OpenStack API call they wish to make in a
Mistral workflow (which also allows them to define custom error handling).

It would be nice if there were a way to identify an OpenStack resource with a
URI without necessarily requiring a URL (containing redundant information
about the location of the endpoint). AWS uses an `unofficial URN-like
identifier`_ with an ``arn:`` (instead of ``urn:``) scheme for this purpose.
Something similar might be useful in other contexts in OpenStack too (for
example, in Heat we would like to be able to distinguish between files in
Swift containers or Glare links and ordinary HTTP URLs for the purposes of
uploading user data, although there is some precedent for using
``swift+http`` as the scheme in the Swift case). However, this would require,
at a minimum, wide cross-project agreement (and arguably IANA registration).
There are no existing examples of anything like this in OpenStack.

.. _Mistral roadmap: https://wiki.openstack.org/wiki/Mistral/Roadmap
.. _unofficial URN-like identifier: http://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html

Implementation
==============

Assignee(s)
-----------

This is one of those blueprints where I'm throwing it out there to see who
picks it up.

Milestones
----------

Target Milestone for completion:
  Newton-3

Work Items
----------

* Implement the Mistral notification plugin
* Create a Keystone trust and store its ID in the URI when setting up a
  ``trust+http(s)`` notification. Delete the trust again when the notification
  is deleted.
* Add the ability to distinguish between Mistral URLs and other
  ``trust+http(s)`` URLs in the notification URI

Dependencies
============

We won't be able to pass the message ID until
https://review.opendev.org/#/c/276968/ or something equivalent merges.
However, since it can be added to the Mistral environment later without
rewriting any existing workflows (to declare a new input), this is in no way a
blocker.

.. note::

  This work is licensed under a Creative Commons Attribution 3.0
  Unported License.
  http://creativecommons.org/licenses/by/3.0/legalcode