swift/doc/source/development_watchers.rst
Samuel Merritt b971280907 Let developers/operators add watchers to object audit
Swift operators may find it useful to operate on each object in their
cluster in some way. This commit provides them a way to hook into the
object auditor with a simple, clearly-defined boundary so that they
can iterate over their objects without additional disk IO.

For example, a cluster operator may want to ensure semantic
consistency, with all SLO segments accounted for in their manifests,
or locate objects that aren't in container listings. Now that Swift
has encryption support, this could be used to locate unencrypted
objects. The list goes on.

This commit makes the auditor locate, via entry points, the watchers
named in its config file.

A watcher is a class with at least these four methods:

   __init__(self, conf, logger, **kwargs)

   start(self, audit_type, **kwargs)

   see_object(self, object_metadata, data_file_path, **kwargs)

   end(self, **kwargs)

The auditor will call watcher.start(audit_type) at the start of an
audit pass, watcher.see_object(...) for each object audited, and
watcher.end() at the end of an audit pass. All method arguments are
passed as keyword args.

This version of the API is implemented in the context of the
auditor itself, without spawning any additional processes.
If the plugins are not working well -- hang, crash, or leak --
it's easier to debug them when there's no additional complication
of processes that run by themselves.

In addition, we include a reference implementation of a plugin for
the watcher API, as a help to plugin writers.

Change-Id: I1be1faec53b2cdfaabf927598f1460e23c206b0a
2020-12-26 17:16:14 -06:00


Auditor Watchers

Overview

The duty of auditors is to guard Swift against corruption in the storage media. But because auditors crawl all objects, they can also be used to make Swift perform an operation on every object. This is done through an API known as "watcher".

Watchers do not have any private view into the cluster. An operator can write a standalone program that walks the directories and performs any desired inspection or maintenance. What the watcher API brings to the table is a framework to do the same job easily, under the resource restrictions already in place for the auditor.

Operations performed by watchers are often site-specific, or else they would be incorporated into Swift already. However, the code in the tree provides a reference implementation for convenience. It is located in swift/obj/watchers/dark_data.py and implements the so-called "Dark Data Watcher".

Currently, only the object auditor supports watchers.

The API class

The implementation of a watcher is a Python class that may look like this:

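class MyWatcher(object):

    def __init__(self, conf, logger, **kwargs):
        # Called at the start of the plugin; save conf and logger here.
        pass

    def start(self, audit_type, **kwargs):
        # Called when the auditor starts a pass.
        pass

    def see_object(self, object_metadata, policy_index, partition,
                   data_file_path, **kwargs):
        # Called after the auditor has audited an object.
        pass

    def end(self, **kwargs):
        # Called when the auditor finishes a pass.
        pass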

Arguments to watcher methods are passed as keyword arguments, and methods are expected to accept and ignore new, unknown arguments. This is why every signature ends with **kwargs.

The method __init__() is used to save the configuration and the logger at the start of the plugin.

The method start() is invoked when the auditor starts a pass. It usually resets counters. The argument audit_type is a string, either "ALL" or "ZBF", according to the type of the auditor running the watcher. Watchers that talk to the network tend to hang off the ALL-type auditor; the lightweight ones are okay with the ZBF-type.

The method end() is the closing bracket for start(). It is typically used to log something or to dump some statistics.

The method see_object() is called when the auditor has completed an audit of an object. This is where most of the work is done.
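As an illustration of how the four methods fit together, here is a minimal sketch of a watcher that counts objects and bytes per pass. The class name CountingWatcher, the helper run_fake_pass() that stands in for the auditor, the file paths, and the assumption that object metadata carries a 'Content-Length' entry are all hypothetical and only serve the example.

```python
import logging


class CountingWatcher(object):
    """Hypothetical watcher that counts objects and bytes per audit pass."""

    def __init__(self, conf, logger, **kwargs):
        self.conf = conf
        self.logger = logger
        self.audit_type = None
        self.objects = 0
        self.bytes = 0

    def start(self, audit_type, **kwargs):
        # Reset the counters at the beginning of every pass.
        self.audit_type = audit_type
        self.objects = 0
        self.bytes = 0

    def see_object(self, object_metadata, policy_index, partition,
                   data_file_path, **kwargs):
        # Assumption: the metadata carries a 'Content-Length' entry.
        self.objects += 1
        self.bytes += int(object_metadata.get('Content-Length', 0))

    def end(self, **kwargs):
        self.logger.info('%s pass: %d objects, %d bytes',
                         self.audit_type, self.objects, self.bytes)


def run_fake_pass(watcher, audit_type, objects):
    """Stand-in for the auditor: every argument is passed by keyword."""
    watcher.start(audit_type=audit_type)
    for metadata, path in objects:
        # The extra keyword mimics an argument added by a later Swift
        # release; **kwargs lets the watcher ignore it.
        watcher.see_object(object_metadata=metadata, policy_index=0,
                           partition=0, data_file_path=path,
                           some_future_argument=None)
    watcher.end()


watcher = CountingWatcher(conf={}, logger=logging.getLogger('demo'))
run_fake_pass(watcher, 'ZBF', [
    ({'name': '/a/c/o1', 'Content-Length': '10'}, '/fake/path/1.data'),
    ({'name': '/a/c/o2', 'Content-Length': '32'}, '/fake/path/2.data'),
])
```

Because the counters are reset in start() rather than in __init__(), the same watcher instance can be reused across passes without carrying totals over.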

The protocol for see_object() allows it to raise a special exception, QuarantineRequested. The auditor catches it and quarantines the object. In general, it's okay for watcher methods to raise exceptions, so the author of a watcher plugin does not have to catch them explicitly with a try: block; they can simply be permitted to bubble up naturally.
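The exception protocol can be sketched as follows. Everything here is a stand-in: QuarantineSignal is a locally defined class playing the role of the special quarantine exception (a real plugin would import that class from Swift instead), fake_audit() only imitates what the auditor is described as doing, and the rule "quarantine objects without a name" is invented for the example.

```python
class QuarantineSignal(Exception):
    """Hypothetical stand-in for Swift's quarantine exception."""


class NamelessObjectWatcher(object):
    """Hypothetical watcher that asks to quarantine unnamed objects."""

    def __init__(self, conf, logger, **kwargs):
        self.logger = logger
        self.seen = []

    def start(self, audit_type, **kwargs):
        self.seen = []

    def see_object(self, object_metadata, data_file_path, **kwargs):
        # Invented rule: an object without a 'name' entry is suspect.
        if 'name' not in object_metadata:
            raise QuarantineSignal('no name: %s' % data_file_path)
        # No try/except here: a malformed length raises ValueError,
        # which is simply allowed to bubble up to the caller.
        length = int(object_metadata['Content-Length'])
        self.seen.append((object_metadata['name'], length))

    def end(self, **kwargs):
        pass


def fake_audit(watcher, objects):
    """Imitates the auditor: quarantine on request, survive other errors."""
    quarantined, failed = [], []
    watcher.start(audit_type='ALL')
    for metadata, path in objects:
        try:
            watcher.see_object(object_metadata=metadata, policy_index=0,
                               partition=0, data_file_path=path)
        except QuarantineSignal:
            quarantined.append(path)
        except Exception:
            failed.append(path)
    watcher.end()
    return quarantined, failed


demo_watcher = NamelessObjectWatcher(conf={}, logger=None)
quarantined, failed = fake_audit(demo_watcher, [
    ({'name': '/a/c/o1', 'Content-Length': '5'}, '/fake/1.data'),
    ({'Content-Length': '5'}, '/fake/2.data'),
    ({'name': '/a/c/o3', 'Content-Length': 'oops'}, '/fake/3.data'),
])
```

The watcher itself contains no error handling beyond the deliberate quarantine request; deciding what to do with unexpected failures is left to the caller.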

Loading the plugins

The Swift auditor loads watcher classes from eggs, so it is necessary to package the class and provide an entry point for it:

$ cat /usr/lib/python3.8/site-p*/mywatcher*egg-info/entry_points.txt
[mywatcher.mysection]
mywatcherentry = mywatcher:MyWatcher

The operator tells the Swift auditor which plugins to load by adding them to object-server.conf in the section [object-auditor]. It is also possible to pass parameters, which arrive in the argument conf of the method __init__():

[object-auditor]
watchers = mywatcher#mywatcherentry,swift#dark_data

[object-auditor:watcher:mywatcher#mywatcherentry]
myparam=testing2020
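To make the relationship between the two sections concrete, here is a sketch that reads the same configuration text with Python's standard configparser and hands each watcher its own parameters. This is not how Swift itself loads its configuration; the helper watcher_confs() is hypothetical, and the sketch assumes that the parameters reach the watcher as a dictionary through the conf argument of its constructor, as the class signature suggests.

```python
import configparser

CONF_TEXT = """
[object-auditor]
watchers = mywatcher#mywatcherentry,swift#dark_data

[object-auditor:watcher:mywatcher#mywatcherentry]
myparam=testing2020
"""


class MyWatcher(object):
    def __init__(self, conf, logger, **kwargs):
        # Parameters from the watcher's own section, with a fallback.
        self.myparam = conf.get('myparam', 'some-default')
        self.logger = logger


def watcher_confs(text):
    """Hypothetical helper: map each listed watcher to its parameters."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    listed = parser.get('object-auditor', 'watchers')
    names = [name.strip() for name in listed.split(',') if name.strip()]
    confs = {}
    for name in names:
        section = 'object-auditor:watcher:' + name
        if parser.has_section(section):
            confs[name] = dict(parser[section])
        else:
            # A watcher without its own section gets an empty conf.
            confs[name] = {}
    return confs


confs = watcher_confs(CONF_TEXT)
configured = MyWatcher(conf=confs['mywatcher#mywatcherentry'], logger=None)
```

Reading parameters with conf.get() and a default keeps the watcher usable when the operator lists it in watchers but omits its dedicated section.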

Do not forget to remove the watcher from auditors when done. Although the API itself is very lightweight, it is common for watchers to incur a significant performance penalty: they can talk to networked services or access additional objects.

Dark Data Watcher

The watcher API is assumed to be under development. Operators who need extensions are welcome to report any needs for more arguments to see_object(). For now, start by copying the provided template watcher swift/obj/watchers/dark_data.py and see if it is sufficient.

The name "Dark Data" refers to the scientific hypothesis of Dark Matter, which supposes that the universe contains a lot of matter that we cannot observe. Dark Data in Swift is the name for objects that are not accounted for in the containers.

The experience of running large-scale clusters suggests that Swift does not have any particular bugs that trigger the creation of dark data. So, this is an exercise in writing watchers, with a plausible function.

When enabled, the Dark Data watcher noticeably reduces the cluster's overall performance, as mentioned above. The extra load can be mitigated as usual, but at the expense of the total time taken by a pass of the auditor.

Finally, keep in mind that the Dark Data watcher needs the container ring to operate, but runs on an object node. This can come up if the cluster has nodes separated by function.
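The idea can be sketched in a few lines. This is not the code in swift/obj/watchers/dark_data.py: the function find_dark_objects() is hypothetical, the in-memory container_listings dictionary stands in for whatever the real watcher learns from the container servers, and the sketch assumes that each object's metadata has a 'name' entry of the form /account/container/object.

```python
def find_dark_objects(audited_metadata, container_listings):
    """Return names of audited objects missing from container listings.

    audited_metadata: iterable of metadata dicts, each with a 'name' of
        the form '/account/container/object' (an assumption).
    container_listings: dict mapping (account, container) to a set of
        object names; a stand-in for querying the container servers.
    """
    dark = []
    for metadata in audited_metadata:
        name = metadata['name']
        # Split into at most three parts so that object names
        # containing slashes stay intact.
        account, container, obj = name.lstrip('/').split('/', 2)
        listing = container_listings.get((account, container), set())
        if obj not in listing:
            dark.append(name)
    return dark


listings = {('AUTH_test', 'photos'): {'cat.jpg', '2020/dog.jpg'}}
on_disk = [
    {'name': '/AUTH_test/photos/cat.jpg'},
    {'name': '/AUTH_test/photos/2020/dog.jpg'},
    {'name': '/AUTH_test/photos/ghost.jpg'},
    {'name': '/AUTH_test/gone/orphan.bin'},
]
dark = find_dark_objects(on_disk, listings)
```

In this example the last two objects exist on disk but appear in no listing, so they are reported as dark.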