diff --git a/specs/ocata/secure-oslo-messaging-messages.rst b/specs/ocata/secure-oslo-messaging-messages.rst new file mode 100644 index 0000000..065c816 --- /dev/null +++ b/specs/ocata/secure-oslo-messaging-messages.rst @@ -0,0 +1,328 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + + Sections of this template were taken directly from the Nova spec + template at: + https://github.com/openstack/nova-specs/blob/master/specs/juno-template.rst + +================================= +Secure oslo.messaging.rpc message +================================= + +.. sectnum:: +.. contents:: + +Trove utilizes oslo_messaging.rpc to perform RPC calls and the +transport underlying this is oslo_messaging. Messages sent on +oslo.messaging are currently treated as genuine. There is a benefit to +adding a layer of validation that will ensure that the RPC calls are +in fact genuine. We propose that the RPC calls be encrypted with +unique keys. + +Launchpad Blueprint: +https://blueprints.launchpad.net/trove/+spec/secure-oslo-messaging-messages + + +Problem Description +=================== + +Messages sent on oslo.messaging are currently treated as genuine by +the recipient. Given that the names of the topics used are +predictable, it is possible for a person with sufficient knowledge of +Trove to, for example, compromise a guest instance or otherwise obtain +credentials to connect to RabbitMQ (or the underlying transport to +oslo-messaging) and then generate messages to, for example, the task +manager by impersonating the API service. While there are already +safeguards in place to contain the scope of this, such as by requiring +that the message contain a valid keystone token with the appropriate +access, this is still a point of vulnerability. 
+ +Currently, when a client wishes to make an asynchronous RPC (cast()), +the method name and parameters are marshalled and sent down to +oslo_messaging.rpc. It is the responsibility of oslo_messaging.rpc to +transmit the information to the remote side, and then find and invoke +the method specified. After the cast() is invoked on the client side, +the next thing that is seen by the consumer of oslo_messaging.rpc is +an invocation of the desired method on the server side. + +The same thing happens for a synchronous RPC (call()) with the +additional step of the client blocking, the server completing the +operation and sending a response to the client, and the client +receiving that and unblocking. + +Proposed Change +=============== + +After experimenting with several alternative approaches, we +propose to implement custom serializers (and deserializers) which can +be provided to oslo_messaging.rpc. + +All messages sent and return values in RPC call() will be serialized +through these custom methods which will encrypt the content. Due to a +bug (``Failure to use serializer in exception``), if an RPC function +raises an exception, the exception is not encrypted. + +How does TroveRPCDispatcher verify legitimacy of a message +---------------------------------------------------------- + +The proposed implementation relies on cryptography and unique keys for +the control plane and the guests. We propose to use symmetric keys for +the purpose of encryption. + +Trove has the following entities that are party to RPC invocations: + +- Trove API Service (client) +- Trove Taskmanager Service (client and server) +- Trove Conductor Service (server) +- Trove Guestagent (client and server) + +When an RPC call() or cast() is made, the client invokes the +serializer which will encrypt all arguments. When received on the +server side, oslo_messaging.rpc will invoke the deserializer which +will decrypt the arguments.
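The serialize-then-encrypt flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Trove implementation: the class name ``EncryptingSerializer`` and the helpers ``toy_encrypt`` and ``toy_decrypt`` are hypothetical, the method names are chosen to mirror the ``serialize_entity`` and ``deserialize_entity`` hooks of an oslo_messaging serializer, and the HMAC-based keystream is only a standard-library stand-in for the AES cipher the spec has in mind. The stand-in provides no integrity protection and should not be used as real encryption.

```python
import base64
import hashlib
import hmac
import json
import os


def _keystream(key, nonce, length):
    """Derive `length` bytes from key and nonce (toy stand-in for a real cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key, plaintext):
    """Hypothetical helper: XOR the plaintext with a per-message keystream."""
    nonce = os.urandom(16)  # fresh nonce so equal messages differ on the wire
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return base64.b64encode(nonce + body).decode("ascii")


def toy_decrypt(key, token):
    """Hypothetical helper: reverse toy_encrypt using the same key."""
    raw = base64.b64decode(token)
    nonce, body = raw[:16], raw[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class EncryptingSerializer:
    """Hypothetical serializer; method names mirror an oslo_messaging serializer."""

    def __init__(self, key):
        self._key = key  # symmetric key shared by client and server

    def serialize_entity(self, ctxt, entity):
        # Client side: marshal the arguments, then encrypt them.
        return toy_encrypt(self._key, json.dumps(entity).encode("utf-8"))

    def deserialize_entity(self, ctxt, entity):
        # Server side: decrypt, then unmarshal back to the original arguments.
        return json.loads(toy_decrypt(self._key, entity).decode("utf-8"))
```

A client and a server holding the same key would each construct the serializer and hand it to the RPC layer. The value on the wire is then an opaque base64 string that only a holder of the key can turn back into the original arguments.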
+ +It is assumed that the control plane is secure and the control plane +symmetric key is secure. If it is compromised, then all bets are off. + +In communication with the guest agent, each guest has a unique +symmetric key that is generated by the control plane and passed to the +guest at launch. + +Securing the response +--------------------- + +As described earlier, a response to a call() method will be secured in +the same way as the request. As observed earlier, due to a bug, an +exception thrown by an RPC function is not (currently) being +serialized and will therefore be returned unencrypted. When (if) that +bug is fixed in oslo_messaging.rpc, this exposure is minimized. + +Control plane key +----------------- + +The control plane key is constructed at system initialization +time. The key is stored on the control plane (in the configuration +file). + +If the control plane consists of multiple machines, then the control +plane services on all machines must have access to the control plane +key. + +Getting keys to the guest instance +---------------------------------- + +On instance launch, the guest key is created and passed to the guest +as an injected file. We assume that the mechanism for file injection +is secure in that it cannot be intercepted and compromised by a bad +actor. + +A unique key is created for each instance. + +Why is this secure? +------------------- + +We make two assumptions above; these are: + +(a) The control plane is secure, the control plane key is not + compromised, and +(b) The transmission of the guest key to the guest is secure and is + not compromised. + +These are meaningful and reasonable assumptions to make given the +architecture of an OpenStack system. + +Should a guest be compromised, the bad actor can connect to the +underlying transport (say Rabbit) but, apart from traffic encrypted +with that guest's own key, all they will be able to see are +encrypted messages that they cannot decrypt.
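The per-guest key scheme above can be sketched as follows. This is only an illustration of the idea that each instance gets its own key at launch: the ``GuestKeyRegistry`` class and its methods are hypothetical, the in-memory dictionary stands in for whatever store the control plane actually uses, and the 36-character key length is an assumption made for the example.

```python
import secrets


class GuestKeyRegistry:
    """Hypothetical in-memory stand-in for the control plane's per-instance keys."""

    def __init__(self):
        self._keys = {}  # instance_id -> key

    def create_key(self, instance_id):
        """Create the key for a newly launched instance (one key per instance)."""
        if instance_id in self._keys:
            raise ValueError("key already exists for %s" % instance_id)
        # 27 random bytes encode to 36 URL-safe characters (assumed length).
        key = secrets.token_urlsafe(27)
        self._keys[instance_id] = key
        return key

    def key_for(self, instance_id):
        """Look up the key the control plane uses when talking to this guest."""
        return self._keys[instance_id]
```

Because every instance has a different key, a bad actor who extracts the key from one compromised guest learns nothing that helps decrypt traffic for any other guest or for the control plane.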
+ +Configuration +------------- + +The control plane key is stored on the control plane in a secure way +and there are configuration options to tell each service where to find +it. + +Each guest instance will have a key that is stored securely +on the instance, and a configuration setting will tell the guestagent +where to find it. + +.. code-block:: python + + cfg.StrOpt('tm_rpc_encr_key', + default='bzH6y0SGmjuoY0FNSTptrhgieGXNDX6PIhvz', + help='OpenSSL aes_cbc key for taskmanager RPC encryption.'), + cfg.StrOpt('inst_rpc_key_encr_key', + default='emYjgHFqfXNB1NGehAFIUeoyw4V4XwWHEaKP', + help='OpenSSL aes_cbc key to encrypt instance keys in DB.'), + cfg.StrOpt('instance_rpc_encr_key', + help='OpenSSL aes_cbc key for instance RPC encryption.'), + +Database +-------- + +The guest key for each guest instance will be stored in the +database. A table instance_keys is proposed for this. + ++---------------+--------------+------+-----+---------+-------+ +| Field | Type | Null | Key | Default | Extra | ++---------------+--------------+------+-----+---------+-------+ +| id | varchar(64) | NO | PRI | NULL | | +| instance_id | varchar(64) | NO | UNI | NULL | | +| encrypted_key | varchar(255) | NO | | NULL | | +| created | datetime | NO | | NULL | | +| updated | datetime | NO | | NULL | | +| deleted | tinyint(1) | NO | | NULL | | +| deleted_at | datetime | YES | | NULL | | ++---------------+--------------+------+-----+---------+-------+ + +The guest instance keys are encrypted and stored in the encrypted_key +column. A foreign key constraint links instance_id with +instances.id. A unique constraint on instance_id is placed on this +table. + +Public API +---------- + +No changes to the public API. + +Public API Security +------------------- + +No changes. + +Python API +---------- + +No changes. + +CLI (python-troveclient) +------------------------ + +No changes.
+ +Internal API +------------ + +The internal API (from the perspective of developers, and invocations) +will remain unaffected by this change as the implementation seeks to +work below the Trove code entirely. The messages on the wire, however, +will be radically different, and code must be in place to ensure that +encrypting and non-encrypting clients and servers know how to +interoperate. + +Guest Agent +----------- + +The guestagent will receive its key as a part of the configdrive/boot +process and can use it to decrypt all messages. + +Alternatives +------------ + +Several alternatives were considered, prototyped, and abandoned. A +short summary of each is provided below. + +(a) We proposed to the oslo_messaging.rpc team to implement a + lightweight message signing and encryption mechanism in their code + by providing a mechanism of callbacks which would allow the + consumer (trove) to perform the signing and encryption. The + oslo_messaging team did not want to go this route as they felt + that the message included other private data structures which we + (the consumer) could modify and cause unexpected behavior. +(b) We proposed that oslo_messaging.rpc allow consumers to + provide a custom dispatcher for messages on the receiver + side. With this implementation, a signature or message encryption + could be performed on the client side and intercepted on the + server side and reversed, allowing us to have minimal changes on + the server side. Again, the oslo_messaging.rpc team felt that the + dispatcher was a private data structure and they did not feel that + we should be encapsulating it. +(c) We prototyped and experimented with a change where each RPC + endpoint would be decorated and the decorator would provide a + mechanism to construct the proper parameters and the invocation to + the RPC method. The client side change would be identical to (b) + but the server side change would involve a change to every RPC + method to add the decorator.
 In addition, the call context would + not be encrypted in this approach and it was abandoned. +(d) We were advised that we should NOT be using oslo_messaging.rpc the + way we are using it as it was only intended for use on the control + plane, and that we should instead make the guest an RPC + server. Unfortunately, that is not what we need. In Trove, the guest + agent is an extension of the control plane and not well suited to + a REST based communication strategy. What we need is an RPC + mechanism, and it is sad that oslo_messaging.rpc can't seem to + provide a secure one. + +Dashboard Impact (UX) +===================== + +None. + +Implementation +============== + +Assignee(s) +----------- + +Primary assignee: + amrith + +Dashboard assignee: + none + +Milestones +---------- + +Ocata-1 + +Work Items +---------- + +- Implement code on control plane and guest +- Implement changes to devstack plugin to create control plane key +- Implement unit tests +- Implement upgrade handling +- Update documentation + +Upgrade Implications +==================== + +Minimal upgrade implications are anticipated; code is proposed that +handles this transition. + +1. The control plane key will be generated and persisted on all + control plane nodes. +2. When guests are upgraded, a key will be sent to them as part of the + nova migrate process. + +The APIs will be rev'ed by one major version to account for this. + +Dependencies +============ + +There is an assumed dependency on the RPC API versioning which has now +merged. + + +Testing +======= + +Oh yeah, we'll need some of this. + +Documentation Impact +==================== + +And some of this; details to follow. + +References +========== + +``Failure to use serializer in exception``: https://bugs.launchpad.net/oslo.messaging/+bug/1648254 + +Appendix +======== + +Any additional technical information and data.