StackTach

StackTach is a debugging / monitoring utility for OpenStack ([Open]StackTach[ometer]). StackTach can work with multiple datacenters, including multi-cell deployments.

Watch the video here: http://www.youtube.com/watch?v=pZgwDHZ3wm0

Overview

OpenStack has the ability to publish notifications to a RabbitMQ exchange as they occur. So, rather than poring over reams of logs across multiple servers, you can now watch requests travel through the system from a single location.

A detailed description of the notifications published by OpenStack is available here.

StackTach has three primary components:

  1. The Worker daemon. Consumes notifications from the Rabbit queue and writes them to a SQL database.
  2. The Web UI, a Django application. Provides a real-time display of notifications as they are consumed by the worker, along with point-and-click analysis for drilling into related events.
  3. Stacky, the command-line tool. Operators and admins aren't big fans of web interfaces. StackTach also exposes a REST interface, which Stacky uses to provide output suitable for tail/grep post-processing.

Installing StackTach

The "Hurry Up" Install Guide

  1. Create a database for StackTach to use. By default, StackTach assumes MySQL, but you can modify settings.py to use another database.
  2. Install Django and the other required libraries listed in ./etc/pip-requires.txt (I hope I got 'em all)
  3. Clone this repo
  4. Copy and configure the config files in ./etc (see below for details)
  5. Create the necessary database tables (python manage.py syncdb). You don't need an administrator account since no user profiles are used.
  6. Configure OpenStack to publish Notifications back into RabbitMQ (see below)
  7. Restart the OpenStack services.
  8. Run the Worker to start consuming messages. (see below)
  9. Run the web server (python manage.py runserver)
  10. Point your browser to http://127.0.0.1:8000 (the default server location)
  11. Click on stuff, see what happens. You can't hurt anything, it's all read-only.
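
If it helps to see the whole sequence in one place, here is a rough sketch of those steps as shell commands, run from inside your clone of this repo (step 3). The database name and the copied config file names are illustrative; adjust them for your environment.

mysql -u root -e "CREATE DATABASE stacktach;"       # step 1: create the database
pip install -r ./etc/pip-requires.txt               # step 2: install Django and friends
cp ./etc/sample_stacktach_config.sh ./etc/stacktach_config.sh                     # step 4
cp ./etc/sample_stacktach_worker_config.json ./etc/stacktach_worker_config.json   # step 4
# ... edit both copies, export the variables, configure Nova and restart it (steps 4, 6, 7) ...
python manage.py syncdb                             # step 5: create the tables, no admin account needed
./worker/start_workers.py &                         # step 8: start consuming messages
python manage.py runserver                          # step 9: then browse to http://127.0.0.1:8000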

Of course, this is only suitable for playing around. If you want to get serious about deploying StackTach, you should set up a proper web server and database on standalone servers. StackTach collects a lot of data (depending on your deployment size) ... be warned. Keep an eye on DB size.

The Config Files

There are two config files for StackTach. The first one tells us where the second one is. Samples of both files are in ./etc (named sample_*).

The sample_stacktach_config.sh shell script defines the environment variables StackTach needs. Most of these are just information about the database (assuming MySQL), but some are a little different.

If your database host is not on the same machine, you'll need to set the DB host variable accordingly. Otherwise the empty string is fine.

STACKTACH_INSTALL_DIR should point to the directory StackTach runs from. In most cases this will be your repo directory, but it could be elsewhere if you're going for a proper deployment. The StackTach worker needs to know which RabbitMQ servers to listen to. This information is stored in the deployment file; STACKTACH_DEPLOYMENTS_FILE should point to this JSON file. To learn more about the deployments file, see further down.

Finally, DJANGO_SETTINGS_MODULE tells Django where to get its configuration from. This should point to the settings.py file. You shouldn't have to do much with settings.py; most of what it needs comes from these environment variables.
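
As a rough illustration, a filled-in copy of the shell config might look like the sketch below. The install path is made up, and the database variable names shown are only placeholders; copy the exact names from ./etc/sample_stacktach_config.sh.

export STACKTACH_INSTALL_DIR="/srv/stacktach"
export STACKTACH_DEPLOYMENTS_FILE="/srv/stacktach/etc/stacktach_worker_config.json"
export DJANGO_SETTINGS_MODULE="settings"
# Database variables below are illustrative placeholders -- use the exact
# names from ./etc/sample_stacktach_config.sh
export STACKTACH_DB_NAME="stacktach"
export STACKTACH_DB_HOST=""              # empty string if the DB is on this machine
export STACKTACH_DB_USERNAME="stacktach"
export STACKTACH_DB_PASSWORD="password"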

The sample_stacktach_worker_config.json file tells StackTach where to find each of the RabbitMQ servers it needs to get events from. In most cases you'll only have one entry in this file, but for large multi-cell deployments this file can get pretty large. It's also handy for setting up one StackTach per developer environment.

The file is in json format and the main configuration is under the "deployments" key, which should contain a list of deployment dictionaries.

A blank worker config file would look like this:

{"deployments": [] }

But that's not much fun. A deployment entry would look like this:

{"deployments": [
     {
         "name": "east_coast.prod.cell1",
         "durable_queue": false,
         "rabbit_host": "10.0.1.1",
         "rabbit_port": 5672,
         "rabbit_userid": "rabbit",
         "rabbit_password": "rabbit",
         "rabbit_virtual_host": "/"
     }
]}

where name is whatever you want to call your deployment, and the rabbit_* settings are the connectivity details for your RabbitMQ server. These should match the values OpenStack is using in your nova.conf file. Note that JSON has no concept of comments, so using #, //, or /* */ as a comment won't work.

By default, Nova uses ephemeral queues. If you are using durable queues, be sure to set durable_queue to true here.

You can add as many deployments as you like.
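
For example, a multi-cell setup might list two deployments; the names and addresses here are made up:

{"deployments": [
     {
         "name": "east_coast.prod.cell1",
         "durable_queue": false,
         "rabbit_host": "10.0.1.1",
         "rabbit_port": 5672,
         "rabbit_userid": "rabbit",
         "rabbit_password": "rabbit",
         "rabbit_virtual_host": "/"
     },
     {
         "name": "east_coast.prod.cell2",
         "durable_queue": false,
         "rabbit_host": "10.0.2.1",
         "rabbit_port": 5672,
         "rabbit_userid": "rabbit",
         "rabbit_password": "rabbit",
         "rabbit_virtual_host": "/"
     }
]}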

Starting the Worker

Note: the worker now uses librabbitmq; be sure to install that first.

./worker/start_workers.py will spawn a worker.py process for each deployment defined. Each worker will consume from a single Rabbit queue.
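
In practice that usually amounts to something like the following (assuming librabbitmq is available via pip):

pip install librabbitmq        # the worker's AMQP client library
./worker/start_workers.py      # spawns one worker.py process per deployment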

Configuring Nova to generate Notifications

--notification_driver=nova.openstack.common.notifier.rabbit_notifier --notification_topics=monitor

This tells OpenStack to publish notifications to Rabbit under topics beginning with monitor, which may result in monitor.info, monitor.error, etc.

You'll need to restart Nova once these changes are made.
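
For reference, in nova.conf these same options would normally sit in the [DEFAULT] section; this is a sketch, so confirm the option names against your Nova version:

[DEFAULT]
notification_driver = nova.openstack.common.notifier.rabbit_notifier
notification_topics = monitor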

Next Steps

Once you have this working well, you should download and install Stacky and play with the command line tool.