trove-specs/specs/newton/persist-error-message.rst
Peter Stachowski 1dd207a381 Persist last error message and display on 'show'
This spec describes the work required to persist error
messages to the db and make them available to end users.

References: blueprint persist-error-message
Change-Id: I967b2c9f4f115a36e373eb83d0303601c3388cae
2016-05-19 22:20:34 +00:00


..
    This work is licensed under a Creative Commons Attribution 3.0 Unported
    License.

    http://creativecommons.org/licenses/by/3.0/legalcode

    Sections of this template were taken directly from the Nova spec
    template at:
    https://github.com/openstack/nova-specs/blob/master/specs/template.rst

..
    This template should be in ReSTructured text. The filename in the git
    repository should match the launchpad URL, for example a URL of
    https://blueprints.launchpad.net/trove/+spec/awesome-thing should be named
    awesome-thing.rst.

    Please do not delete any of the sections in this template. If you
    have nothing to say for a whole section, just write: None

    Note: This comment may be removed if desired, however the license notice
    above should remain.


=====================
Persist Error Message
=====================

.. If section numbers are desired, unindent this
    .. sectnum::

.. If a TOC is desired, unindent this
    .. contents::

Errors that occur in Trove should be easy to retrieve so that the end user
can see exactly what is happening with their database instance.

Launchpad Blueprint:
https://blueprints.launchpad.net/trove/+spec/persist-error-message

Problem Description
===================

Historically it has been very difficult to determine the cause of a failure in
Trove, because errors may be logged in multiple places, none of which are
available to the end user. With the advent of Notifications in Trove, however,
it is now feasible to persist error messages in the database so that they can
be retrieved and displayed.

Proposed Change
===============

Each server will register a callback with the notification framework.
Whenever a notification is sent, this callback will be invoked, and any error
carried by the notification can then be saved in the database. The user can
then retrieve this information with the 'trove show' command.

For errors that occur outside the notification framework, a direct call
will be made to persist the error. Not all errors need to be persisted, so an
initial set will be proposed that can be extended over time as the need
arises.
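
The callback flow described above can be sketched in Python as follows. This
is a minimal sketch, not Trove's implementation: the ``Notifier`` class,
``fault_callback`` and ``persist_fault`` are hypothetical names, an in-memory
list stands in for the database, and error notifications are assumed to have
an event type ending in ``.error``.

.. code-block:: python

    # Hypothetical sketch: Notifier, fault_callback and persist_fault are
    # illustrative names, not Trove's actual API.
    from datetime import datetime, timezone

    # Stand-in for the instance_faults table: a plain in-memory list.
    FAULTS = []


    def persist_fault(instance_id, message, details=""):
        """Record a fault. Also usable as the direct call for errors
        raised outside the notification path."""
        FAULTS.append({
            "instance_id": instance_id,
            "message": message,
            "details": details,
            "created": datetime.now(timezone.utc),
        })


    class Notifier:
        """Minimal notification framework that fires registered callbacks."""

        def __init__(self):
            self._callbacks = []

        def register_callback(self, callback):
            self._callbacks.append(callback)

        def notify(self, event_type, payload):
            for callback in self._callbacks:
                callback(event_type, payload)


    def fault_callback(event_type, payload):
        """Persist only notifications whose event type marks an error."""
        # Assumption: error notifications use an event type ending in '.error'.
        if event_type.endswith(".error"):
            persist_fault(payload["instance_id"], payload["message"],
                          payload.get("details", ""))


    notifier = Notifier()
    notifier.register_callback(fault_callback)  # once per server, at startup
    notifier.notify("trove.instance.create.end", {"instance_id": "inst-1"})
    notifier.notify("trove.instance.create.error",
                    {"instance_id": "inst-1",
                     "message": "A guest error occurred"})

Keeping the error filter inside the callback means the notification framework
itself needs no knowledge of faults, and the same ``persist_fault`` helper can
serve as the direct call for errors that never produce a notification.
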

Configuration
-------------

No configuration changes are anticipated.

Database
--------

A new table (instance_faults) will be added to the Trove schema:

================= ============ =========== ==============================
Column            Type         Allow Nulls Description
================= ============ =========== ==============================
id                varchar(64)  No          ID of fault (autogenerated)
instance_id       varchar(64)  No          ID of instance that the fault
                                           occurred on
message           varchar(255) No          Error message of the fault
details           text(65535)  No          Extra details (e.g. stack
                                           trace)
created           DateTime     No          Created date
updated           DateTime     No          Updated date
deleted           tinyint(1)   Yes         Deleted flag
deleted_at        DateTime     Yes         Deleted date
================= ============ =========== ==============================

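
To make the schema concrete, the sketch below creates the table with Python's
built-in ``sqlite3`` module and reads back the most recent fault for an
instance. It is an illustration only: SQLite ignores the declared column
lengths, the real table would be created through Trove's own database
migrations, and treating the newest non-deleted row as the one to display is
an assumption based on the goal of showing the last error.

.. code-block:: python

    import sqlite3
    import uuid

    # Column names, types and nullability mirror the table above.
    DDL = """
    CREATE TABLE instance_faults (
        id          VARCHAR(64)  NOT NULL PRIMARY KEY,
        instance_id VARCHAR(64)  NOT NULL,
        message     VARCHAR(255) NOT NULL,
        details     TEXT(65535)  NOT NULL,
        created     DATETIME     NOT NULL,
        updated     DATETIME     NOT NULL,
        deleted     TINYINT(1),
        deleted_at  DATETIME
    )
    """

    INSTANCE_ID = "73cfc462-dd59-4dc1-9d32-95954171775f"

    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)

    created = "2016-05-06T21:30:06"
    conn.execute(
        "INSERT INTO instance_faults "
        "(id, instance_id, message, details, created, updated, deleted) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), INSTANCE_ID, "A guest error occurred",
         "Traceback (most recent call last): ...", created, created, 0),
    )

    # Fetch the newest fault that has not been soft-deleted.
    latest = conn.execute(
        "SELECT message, created FROM instance_faults "
        "WHERE instance_id = ? AND deleted = 0 "
        "ORDER BY created DESC LIMIT 1",
        (INSTANCE_ID,),
    ).fetchone()
    print(latest)

Storing timestamps as ISO 8601 strings keeps them sortable as text in this
sketch; a production schema would use the database's native datetime type.
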

Public API
----------

The only change to the public API will be the addition of a 'fault' data
structure that is returned when requesting instance details. It will
look like:

.. code-block:: python

    'fault': {
        'created': <date>,
        'message': 'error message',
        'details': 'potential stack trace',
    },

The 'details' value will only be available if the request is made by an admin
user.
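
The admin-only rule for 'details' can be sketched as below. This is a minimal
sketch assuming the stored fault is available as a plain dict; ``add_fault``
is a hypothetical helper, not part of Trove's view code, and omitting the
'fault' key when no fault is recorded is also an assumption.

.. code-block:: python

    def add_fault(instance_view, fault, is_admin):
        """Attach a 'fault' structure to an instance view dict.

        'details' (e.g. a stack trace) is only included for admin users.
        Assumption: instances without a recorded fault get no 'fault' key.
        """
        if fault is None:
            return instance_view
        fault_view = {
            "created": fault["created"],
            "message": fault["message"],
        }
        if is_admin:
            fault_view["details"] = fault["details"]
        instance_view["fault"] = fault_view
        return instance_view


    stored = {"created": "2016-05-06T21:30:06",
              "message": "A guest error occurred",
              "details": "Traceback (most recent call last): ..."}
    user_view = add_fault({"id": "inst-1"}, stored, is_admin=False)
    admin_view = add_fault({"id": "inst-1"}, stored, is_admin=True)

Here ``user_view["fault"]`` carries only 'created' and 'message', while
``admin_view["fault"]`` also includes 'details'.
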

Public API Security
-------------------

No security issues are anticipated. Since all persisted messages are exception
messages that are broadcast as notifications, none should contain sensitive
information. Any that are found to do so should be treated as bugs and
modified accordingly (none have been found so far).

Python API
----------

No changes are anticipated to the Python API.

CLI (python-troveclient)
------------------------

The 'show' Trove CLI command may now display new data:

.. code-block:: bash

    +-------------------+----------------------------------------------------+
    | Property          | Value                                              |
    +-------------------+----------------------------------------------------+
    | created           | 2016-05-06T21:28:53                                |
    | datastore         | mysql                                              |
    | datastore_version | 5.6                                                |
    | fault_date        | 2016-05-06T21:30:06                                |
    | fault_details     | Traceback (most recent call last):                 |
    |                   |   File "/<snip>/manager.py", line 265, in prepare  |
    |                   |     cluster_config, snapshot, modules)             |
    |                   |   File "/<snip>/manager.py", line 355, in _prepare |
    |                   |     raise RuntimeError("A guest error occurred")   |
    |                   | RuntimeError: A guest error occurred               |
    | fault_message     | A guest error occurred                             |
    | flavor            | 15                                                 |
    | id                | 73cfc462-dd59-4dc1-9d32-95954171775f               |
    | ip                | 10.66.25.8                                         |
    | name              | myinst2                                            |
    | status            | ACTIVE                                             |
    | updated           | 2016-05-06T21:28:58                                |
    | volume            | 1                                                  |
    | volume_used       | 0.1                                                |
    +-------------------+----------------------------------------------------+
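
The sample output shows the nested 'fault' structure flattened into
fault_date, fault_message and fault_details rows. The sketch below shows one
way a client could perform that mapping; ``flatten_fault`` is a hypothetical
helper, not python-troveclient's actual code, and the key names are taken
from the sample output.

.. code-block:: python

    def flatten_fault(instance):
        """Return a copy of an instance dict with its nested 'fault'
        replaced by fault_date / fault_message / fault_details keys."""
        flat = {key: value for key, value in instance.items()
                if key != "fault"}
        fault = instance.get("fault")
        if fault:
            flat["fault_date"] = fault["created"]
            flat["fault_message"] = fault["message"]
            if "details" in fault:  # only returned for admin requests
                flat["fault_details"] = fault["details"]
        return flat


    instance = {
        "id": "73cfc462-dd59-4dc1-9d32-95954171775f",
        "status": "ACTIVE",
        "fault": {"created": "2016-05-06T21:30:06",
                  "message": "A guest error occurred"},
    }
    for key, value in sorted(flatten_fault(instance).items()):
        print(f"{key:<17} | {value}")

Because a non-admin response carries no 'details', the fault_details row is
simply absent in that case rather than shown empty.
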

Internal API
------------

No changes need to be made to this API.

Guest Agent
-----------

No changes need to be made to the guest agent.

Alternatives
------------

We could continue to require access to the logs and/or Nova instances to
determine what happened when an error occurs.

Dashboard Impact (UX)
=====================

The relevant fault fields need to be exposed wherever instance details are
displayed, as with the 'show' command.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
[peterstac]

Milestones
----------

Newton

Work Items
----------

The work will be undertaken within a single task.

Upgrade Implications
====================

No upgrade issues are expected.

Dependencies
============

None.

Testing
=======

Scenario tests will be enhanced to verify that errors are persisted in the
database and can be retrieved.

Documentation Impact
====================

This is a net-new feature, and as such will require documentation.

References
==========

None

Appendix
========

None