Event-based Debugging, Monitoring and Billing solution for OpenStack.

Andrew Melton 5df346da89 Testing and refactoring stacky_server		2013-02-26 15:35:52 -05:00
etc	Missing argparse in piprequires	2013-02-19 09:59:51 -04:00
migrations	Stored report support.	2013-02-19 09:37:13 -04:00
reports	97 percentile and median	2013-02-25 18:41:44 +00:00
stacktach	Testing and refactoring stacky_server	2013-02-26 15:35:52 -05:00
static	upgrade to StackTach v2	2012-10-26 15:00:50 -03:00
templates	Initial usage parser work	2013-01-25 16:02:52 -05:00
tests	Testing and refactoring stacky_server	2013-02-26 15:35:52 -05:00
worker	Adding gets for individual dbapi resources	2013-02-14 16:35:00 -05:00
__init__.py	started extracting code	2012-02-20 11:54:13 -08:00
.gitignore	Added non-sample config files to .gitignore	2013-02-06 13:50:14 -04:00
manage.py	started extracting code	2012-02-20 11:54:13 -08:00
README.md	better logging, stem the worker memory leaks and fix up the start script	2012-11-07 10:11:04 -04:00
run_integration_tests.sh	Unit tests for RawData parsing	2013-01-25 16:04:21 -05:00
run_tests.sh	Worker unit tests	2013-01-25 18:17:18 -05:00
settings.py	Use system time zone	2013-02-21 09:50:27 -05:00
urls.py	novastats and other site templates	2012-02-29 07:46:37 -06:00

README.md

StackTach

StackTach is a debugging / monitoring utility for OpenStack ([Open]StackTach[ometer]). StackTach can work with multiple datacenters including multi-cell deployments.

Watch the video here: http://www.youtube.com/watch?v=pZgwDHZ3wm0

Overview

OpenStack can publish notifications to a RabbitMQ exchange as they occur. So, rather than poring through reams of logs across multiple servers, you can watch requests travel through the system from a single location.

A detailed description of the notifications published by OpenStack is available here

StackTach has three primary components:

The Worker daemon. Consumes the notifications from the Rabbit queue and writes them to a SQL database.
The Web UI, which is a Django application. Provides a real-time display of notifications as they are consumed by the worker. It also supports point-and-click analysis for following related events.
Stacky, the command line tool. Operators and admins aren't big fans of web interfaces. StackTach also exposes a REST interface which Stacky can use to provide output suitable for tail/grep post-processing.
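
To make the Worker's job concrete, here is a minimal sketch of the "consume a notification, write it to SQL" step. It is not StackTach's actual code: the raw_events table, its columns, and the store_notification helper are hypothetical, and it uses an in-memory SQLite database instead of a real Rabbit connection. It only assumes that each notification arrives as a JSON body carrying fields such as event_type and publisher_id.

```python
import json
import sqlite3


def create_table(conn):
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_events ("
        "deployment TEXT, routing_key TEXT, event_type TEXT, "
        "publisher TEXT, body TEXT)"
    )


def store_notification(conn, deployment, routing_key, body):
    """Parse one notification body (a JSON string) and store it as a row."""
    notification = json.loads(body)
    conn.execute(
        "INSERT INTO raw_events VALUES (?, ?, ?, ?, ?)",
        (
            deployment,
            routing_key,
            notification.get("event_type"),
            notification.get("publisher_id"),
            body,  # keep the full body so nothing is lost
        ),
    )
    conn.commit()
```

A real worker would call something like store_notification from its message callback, once per message taken off the queue.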

Installing StackTach

The "Hurry Up" Install Guide

Create a database for StackTach to use. By default, StackTach assumes MySQL, but you can modify the settings.py file to use another database.
Install Django and the other required libraries listed in ./etc/pip-requires.txt (I hope I got 'em all)
Clone this repo
Copy and configure the config files in ./etc (see below for details)
Create the necessary database tables (python manage.py syncdb). You don't need an administrator account since no user profiles are used.
Configure OpenStack to publish Notifications back into RabbitMQ (see below)
Restart the OpenStack services.
Run the Worker to start consuming messages. (see below)
Run the web server (python manage.py runserver)
Point your browser to http://127.0.0.1:8000 (the default server location)
Click on stuff, see what happens. You can't hurt anything, it's all read-only.

Of course, this is only suitable for playing around. If you want to get serious about deploying StackTach you should set up a proper webserver and database on standalone servers. There is a lot of data that gets collected by StackTach (depending on your deployment size) ... be warned. Keep an eye on DB size.

The Config Files

There are two config files for StackTach. The first one tells us where the second one is. Samples of both files are in ./etc/sample_*

The sample_stacktach_config.sh shell script defines the environment variables StackTach needs. Most of these are just information about the database (assuming MySQL) but some are a little different.

If your db host is not on the same machine, you'll need to set the database host variable in this script. Otherwise the empty string is fine.

STACKTACH_INSTALL_DIR should point to where StackTach is running out of. In most cases this will be your repo directory, but it could be elsewhere if you're going for a proper deployment. The StackTach worker needs to know which RabbitMQ servers to listen to. This information is stored in the deployments file. STACKTACH_DEPLOYMENTS_FILE should point to this JSON file. To learn more about the deployments file, see further down.

Finally, DJANGO_SETTINGS_MODULE tells Django where to get its configuration from. This should point to the settings.py file. You shouldn't have to do much with settings.py, since most of what it needs comes from these environment variables.
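
Before starting anything, it can help to check that the environment is actually set up. The sketch below is an illustration, not part of StackTach: the missing_vars helper is hypothetical, and it only checks the three variables named above. The database variables are left out because their exact names are not listed here and the host may legitimately be an empty string.

```python
import os

# The variables described above that must be non-empty.
REQUIRED_VARS = (
    "STACKTACH_INSTALL_DIR",
    "STACKTACH_DEPLOYMENTS_FILE",
    "DJANGO_SETTINGS_MODULE",
)


def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_vars() with no arguments inspects the current process environment, so it reflects whatever the config shell script exported before Python started.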

The sample_stacktach_worker_config.json file tells StackTach where to find each RabbitMQ server it needs to get events from. In most cases you'll only have one entry in this file, but for large multi-cell deployments, this file can get pretty large. It's also handy for setting up one StackTach for each developer environment.

The file is in JSON format and the main configuration is under the "deployments" key, which should contain a list of deployment dictionaries.

A blank worker config file would look like this:

{"deployments": [] }

But that's not much fun. A deployment entry would look like this:

{"deployments": [
     {
         "name": "east_coast.prod.cell1",
         "durable_queue": false,
         "rabbit_host": "10.0.1.1",
         "rabbit_port": 5672,
         "rabbit_userid": "rabbit",
         "rabbit_password": "rabbit",
         "rabbit_virtual_host": "/"
     }
]}

Here, name is whatever you want to call your deployment, and the rabbit_<> values are the connectivity details for your Rabbit server. They should match the information in the nova.conf file that OpenStack is using. Note that JSON has no concept of comments, so using #, // or /* */ as a comment won't work.

By default, Nova uses ephemeral queues. If you are using durable queues, be sure to set the durable_queue flag to true here.

You can add as many deployments as you like.
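
A quick way to catch mistakes in the deployments file, such as a stray comment or a missing key, is to load it with Python's json module before starting the worker. The following is a minimal sketch under assumptions: the load_deployments helper is hypothetical, and treating every rabbit_<> key as required (with durable_queue defaulting to false) is a choice made for this example, not necessarily what the worker itself enforces.

```python
import json

# Keys taken from the sample deployment entry above.
REQUIRED_KEYS = (
    "name",
    "rabbit_host",
    "rabbit_port",
    "rabbit_userid",
    "rabbit_password",
    "rabbit_virtual_host",
)


def load_deployments(text):
    """Parse the worker config text and return its list of deployments.

    Raises ValueError for invalid JSON (including comments) or for a
    deployment entry that lacks one of REQUIRED_KEYS.
    """
    config = json.loads(text)  # json.JSONDecodeError is a ValueError
    deployments = config.get("deployments", [])
    for deployment in deployments:
        missing = [key for key in REQUIRED_KEYS if key not in deployment]
        if missing:
            raise ValueError("deployment is missing keys: %s" % ", ".join(missing))
        # Assumption for this sketch: queues are ephemeral unless stated.
        deployment.setdefault("durable_queue", False)
    return deployments
```

To check a real file, read it and pass the contents in, for example load_deployments(open(path).read()), where path is wherever STACKTACH_DEPLOYMENTS_FILE points.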

Starting the Worker

Note: the worker now uses librabbitmq; be sure to install that first.

./worker/start_workers.py will spawn a worker.py process for each deployment defined. Each worker will consume from a single Rabbit queue.
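
The one-process-per-deployment pattern can be sketched with Python's multiprocessing module. This is not the contents of start_workers.py: build_workers, start_and_wait and run_worker are hypothetical names, and run_worker only prints what a real worker would connect to instead of consuming from Rabbit.

```python
import multiprocessing


def run_worker(deployment):
    # Placeholder for the real consume loop; a real worker would connect
    # to the Rabbit server and process messages until stopped.
    print("worker %s would connect to %s:%s" % (
        deployment["name"],
        deployment.get("rabbit_host"),
        deployment.get("rabbit_port"),
    ))


def build_workers(config):
    """Create (but do not start) one process per configured deployment."""
    return [
        multiprocessing.Process(
            target=run_worker, args=(deployment,), name=deployment["name"]
        )
        for deployment in config["deployments"]
    ]


def start_and_wait(workers):
    """Start every worker process, then block until all of them exit."""
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

With a parsed config in hand, start_and_wait(build_workers(config)) launches the processes. Naming each process after its deployment makes it easier to tell them apart in logs.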

Configuring Nova to generate Notifications

--notification_driver=nova.openstack.common.notifier.rabbit_notifier
--notification_topics=monitor

This tells OpenStack to publish notifications to a Rabbit exchange using topics that start with monitor, such as monitor.info, monitor.error, etc.
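
The monitor.info and monitor.error names come from combining the configured topic with each notification's priority. As a rough illustration, assuming the notifier appends the lowercased priority to every configured topic, the resulting names can be computed like this (notification_routing_keys is a hypothetical helper, not a Nova function):

```python
def notification_routing_keys(topics, priority):
    """Combine each configured topic with a notification's priority."""
    return ["%s.%s" % (topic, priority.lower()) for topic in topics]
```

For example, notification_routing_keys(["monitor"], "ERROR") gives ["monitor.error"].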

You'll need to restart Nova once these changes are made.

Next Steps

Once you have this working well, you should download and install Stacky and play with the command line tool.