diff --git a/docs/about.html b/docs/about.html

You can write your own Yagi handlers if you like, but a number of useful ones ship with StackTach.v3. The most important of these is the winchester.yagi_handler:WinchesterHandler. This handler is your entry point into StackTach.v3 stream processing. But first, we need to convert those messy notifications into events ...

Distilling Notifications to Events

Now we have notifications coming into Winchester. But, as we hinted at above, we need to take the larger notification and distill it down into a more manageable event. The stack-distiller module makes this happen. Within StackTach.v3, this is part of winchester.yagi_handler:WinchesterHandler.


A notification is a large, nested JSON data structure. But we don't need all of that data for stream processing. In fact, we generally only require a few Traits from the notification. That's what distilling does. It pulls out the important traits, scrubs the data and uses that. Distillations are done via the distillation configuration file (specified in winchester.conf).


Only timestamp and event_type are required traits.
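The idea can be sketched in a few lines of Python. This is not the real stack-distiller configuration syntax or API; the dotted-path scheme and all field names are illustrative assumptions.

```python
# Illustrative sketch of distilling, NOT the stack-distiller API:
# pull a handful of traits out of a nested notification using
# hypothetical dotted paths like 'payload.instance_id'.

def lookup(notification, dotted_path):
    """Walk a nested dict using a dotted path."""
    value = notification
    for key in dotted_path.split('.'):
        value = value[key]
    return value

def distill(notification, trait_paths):
    """Flatten a notification down to just the traits we care about."""
    return {name: lookup(notification, path)
            for name, path in trait_paths.items()}

notification = {
    'event_type': 'compute.instance.create.end',
    'timestamp': '2014-01-01T12:00:00Z',
    'payload': {'instance_id': 'abc-123', 'state': 'active',
                'image_meta': {'os_type': 'linux'}},
}

traits = distill(notification, {
    'event_type': 'event_type',            # required trait
    'timestamp': 'timestamp',              # required trait
    'instance_id': 'payload.instance_id',
    'os_type': 'payload.image_meta.os_type',
})
# traits is now a small, flat dict of key-value pairs
```

The large, nested structure goes in; a small, flat set of key-value pairs comes out, ready for stream processing.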

A sample notification

The first thing you'll notice is the database connection string. But then you'll notice that the Winchester module needs three other configuration files. The distiller config file we've already covered. The other two require a little more explanation. They define your Triggers and your Pipelines.



Streams are buckets that collect events. The bucket the event goes in is determined by the distinguishing traits you define. Generally these are traits that have a somewhat constrained set of values. For example, instance_id, request_id, user_id, tenant_id, region, server, ip_address ... are all good choices. Timestamp is generally not a good distinguishing trait since it varies so greatly. You would end up with a different stream for every incoming event and each stream would only have one event in it. Not very useful. Also, you can define multiple distinguishing traits. For example: region and the "day" portion of the timestamp. This would produce one stream for each region for each day of the month. If you had five regions, you'd end up with 5*31 stream buckets. The choices are limitless.
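The bucketing above can be sketched in plain Python. The bucket-key scheme is the point here; treating the "day" portion of the timestamp as a precomputed trait is an assumption made for brevity.

```python
# Illustrative sketch of stream bucketing: an event lands in the
# bucket keyed by its distinguishing traits. Here we distinguish by
# region plus the day portion of the timestamp, as in the example
# above ('day' is assumed to be precomputed from the timestamp).
from collections import defaultdict

def bucket_key(event, distinguishing_traits):
    return tuple(event[trait] for trait in distinguishing_traits)

streams = defaultdict(list)
events = [
    {'event_type': 'compute.instance.create.end',
     'region': 'DFW', 'day': '2014-01-01'},
    {'event_type': 'compute.instance.delete.end',
     'region': 'DFW', 'day': '2014-01-01'},
    {'event_type': 'compute.instance.create.end',
     'region': 'ORD', 'day': '2014-01-01'},
]
for event in events:
    streams[bucket_key(event, ('region', 'day'))].append(event)

# The two DFW events share one stream; the ORD event gets its own.
```

Note how the two DFW events collect in the same bucket while the ORD event starts a new one; this is exactly why high-variance traits like a full timestamp make poor distinguishing traits.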

At some point you have to do something with the data in your buckets. This is what the fire criteria defines. You can make time-based firing criteria (such as 2 hours past the last collected event) or trait-based criteria (such as "when you see the 'foo' event"). Wildcards are permitted in matching criteria. Time-based firings are defined with the "expiration" setting. There is a simple grammar for defining how much time has to elapse for an expiry to occur. We will go into detail on this later. For real-time stream processing, it's best to keep these expiries short or stick with trait-based firing criteria. Expiries = lag.
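A time-based expiry boils down to a lag check like the one below. Winchester's real expiration grammar is richer than this; the two-hour window and the function shape are assumptions for illustration only.

```python
# Illustrative sketch of a time-based expiry check: the stream
# expires once enough time has passed since its last collected
# event. This is the lag mentioned above -- nothing happens to
# the stream until the full expiry window has elapsed.
from datetime import datetime, timedelta

def expired(last_event_time, now, expiry=timedelta(hours=2)):
    return now - last_event_time >= expiry

last_seen = datetime(2014, 1, 1, 10, 0)

expired(last_seen, datetime(2014, 1, 1, 11, 0))   # too soon
expired(last_seen, datetime(2014, 1, 1, 12, 0))   # two hours: expires
```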


Finally, we define the pipelines that will process the streams when they fire or expire. Pipelines are sets of pipeline handlers that do the processing. A pipeline handler is called with all the events in that stream. The events are in the temporal order they were generated. A pipeline handler does not need to concern itself with querying the database. It has all that it needs. Out-of-the-box, StackTach.v3 comes with a collection of pipeline handlers for computing OpenStack usage for billing as well as re-publishing new notifications back into the queue. More are constantly being added, and writing your own pipeline handlers is trivial. But more on that later.


You can define different pipelines for streams that fire and streams that expire. In the trigger definition file you simply give the name of the pipeline. Your winchester config file points to the pipeline configuration file that lists the pipeline handlers to run.
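The handler-chain idea can be sketched as follows. The `handle_events` method name and the chaining convention are illustrative assumptions, not Winchester's actual pipeline handler API.

```python
# Illustrative sketch of a pipeline: handlers are called in order,
# each receiving the full, time-ordered list of events from a fired
# stream. NOT the real Winchester handler interface.

class CountHandler:
    """A hypothetical handler that tallies events by event_type."""
    def __init__(self):
        self.counts = {}

    def handle_events(self, events):
        for event in events:
            etype = event['event_type']
            self.counts[etype] = self.counts.get(etype, 0) + 1
        return events  # pass the events along to the next handler

def run_pipeline(handlers, events):
    for handler in handlers:
        events = handler.handle_events(events)
    return events

handler = CountHandler()
run_pipeline([handler], [
    {'event_type': 'compute.instance.create.start'},
    {'event_type': 'compute.instance.create.end'},
    {'event_type': 'compute.instance.create.end'},
])
# handler.counts now maps each event_type to its tally
```

Because each handler already holds every event in the stream, in order, it never has to query the database itself.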

diff --git a/docs/contribute.html b/docs/contribute.html

Contributing to StackTach.v3

StackTach.v3 is licensed under the Apache 2.0 license


All the source repos for StackTach.v3 (and .v2) are available on StackForge. Details on contributing to StackForge projects are available here

The core developers are available on Freenode IRC in the #stacktach channel

These docs are available in the Sandbox repo. Patches welcome!

diff --git a/docs/glossary.html b/docs/glossary.html
new file mode 100644

Glossary

  • Atom-Hopper - Atom-Hopper is a Java system that produces ATOM feeds in a pub-sub manner.
  • Stack Distiller - Stack-Distiller is a Python library that extracts key traits from a complex JSON message to produce a smaller, flat set of key-value pairs.
  • Events - Events are what we call Notifications that have been distilled.
  • Handlers - A handler is Python code that processes a small chunk of data. In StackTach.v3 we have a variety of different handlers for different purposes. There are Shoebox Handlers for dealing with notification archives, Yagi Handlers for processing messages as they come off the queue, Winchester Pipeline Handlers for processing completed event streams, etc. Refer to the appropriate library to see the structure of that handler, as they are all a little different.
  • Notification - A notification is a JSON data structure. It can contain nested data with all native JSON data types. A notification must have event_type, message_id and timestamp in the top-level traits.
  • Notigen - Notigen is a Python library that generates fake OpenStack Nova-style notifications. It simulates common Nova operations such as Create/Delete/Resize/Rebuild instance.
  • Notabene - Notabene is a Python library that consumes and publishes notifications to/from RabbitMQ queues. Within StackTach.v3 it is used for its publishing capabilities. There is a Winchester Pipeline Handler that uses Notabene to publish new notifications back to RabbitMQ. Notigen also uses Notabene to push simulated notifications to RabbitMQ.
  • Pipeline - A pipeline is a series of handlers that process data one after another. There are Yagi pipelines, Winchester pipelines and Shoebox pipelines.
  • Queues - A queue refers to a RabbitMQ queue. In RabbitMQ, messages are published to Exchanges, which route them to queues, where they wait until they are read by consumers.
  • Shoebox - Shoebox is a Python library for archiving complex JSON messages. Messages can be stored locally and tarballed (like logfiles) or packaged into binary archives. Archives can be exported to external stores, like HDFS or Swift, when they reach a certain size or age.
  • Traits - Traits are key-value pairs. For example, in {'foo': 1, 'blah': 2}, foo and blah are traits.
  • Trigger - A trigger is a rule that determines when a Winchester stream should be processed. There are triggers that can fire when a particular event is seen or after a period of stream inactivity.
  • Yagi - Yagi is a Python library for consuming messages from queues. It supports a handler-chain approach to processing these messages. A handler can do whatever it wants with consumed messages. Multiple Yagi workers can be run to consume messages faster.

© Dark Secret Software Inc. 2014
