==================
Swift stats system
==================

The swift stats system is composed of three parts: log creation, log
uploading, and log processing. The system handles two types of logs (access
and account stats), but it can be extended to handle other types of logs.

---------
Log Types
---------

***********
Access logs
***********

Access logs are the proxy server logs. Rackspace uses syslog-ng to redirect
the proxy log output to an hourly log file. For example, a proxy request made
on August 4, 2010 at 12:37 is logged in a file named 2010080412. This allows
easy log rotation and easy per-hour log processing.

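As a minimal sketch of the naming scheme described above, the hourly file name
can be derived from a request timestamp with strftime. The helper name
hourly_log_name is hypothetical and is not part of swift; in a real deployment
syslog-ng does the naming.

.. code-block:: python

    from datetime import datetime


    def hourly_log_name(timestamp):
        """Return the hourly log file name (YYYYMMDDHH) for a request time."""
        return timestamp.strftime('%Y%m%d%H')


    # A request made on August 4, 2010 at 12:37 falls in the 12:00 hour file.
    print(hourly_log_name(datetime(2010, 8, 4, 12, 37)))  # 2010080412

Because the name only changes once per hour, every request in the same hour
lands in the same file.
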
******************
Account stats logs
******************

Account stats logs are generated by a stats system process.
swift-account-stats-logger runs on each account server (via cron) and walks
the filesystem looking for account databases. When an account database is
found, the logger selects the account hash, bytes_used, container_count, and
object_count. These values are then written out as one line in a csv file. One
csv file is produced for every run of swift-account-stats-logger. This means
that, system wide, each run produces one csv file per account server node.
Rackspace runs the account stats logger every hour. Therefore, in a cluster of
ten account servers, ten csv files are produced every hour. Also, every
account has one entry for every replica in the system. On average, each
account appears three times in the aggregate of all account stats csv files
created in one system-wide run.

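The shape of one run's output can be sketched as follows. This is an
illustration only, not the swift-account-stats-logger implementation: the
function name write_stats_csv, the column order, and the sample values are
assumptions made for the example.

.. code-block:: python

    import csv
    import io


    def write_stats_csv(rows, out):
        """Write one csv line per account database found on this node."""
        writer = csv.writer(out)
        for account_hash, bytes_used, container_count, object_count in rows:
            writer.writerow(
                [account_hash, bytes_used, container_count, object_count])


    # Hypothetical values standing in for two account databases on one node.
    sample = [
        ('a1b2c3', 1024, 2, 10),
        ('d4e5f6', 0, 0, 0),
    ]
    buf = io.StringIO()
    write_stats_csv(sample, buf)
    print(buf.getvalue())

Since every replica of an account database is logged by the node that holds
it, a consumer of these files should expect roughly one row per replica for
each account.
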
----------------------
Log Processing plugins
----------------------

The swift stats system is written to allow a plugin to be defined for every
log type. Swift includes plugins for both access logs and account stats logs.
Each plugin is responsible for defining, in a config section, where the logs
are stored on disk, where the logs will be stored in swift (account and
container), the filename format of the logs on disk, the location of the
plugin class definition, and any plugin-specific config values.

The plugin class definition defines three methods. The constructor must accept
one argument (the dict representation of the plugin's config section). The
process method must accept an iterator, and the account, container, and object
name of the log. The keylist_mapping method accepts no parameters.

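A plugin skeleton matching the three methods described above might look like
the following. The class name ExampleLogProcessor, the parameter names, the
assumed log line format, and the returned values are all hypothetical; only
the roles of the three methods come from the description above.

.. code-block:: python

    class ExampleLogProcessor(object):
        """Hypothetical plugin skeleton; not a plugin shipped with swift."""

        def __init__(self, conf):
            # conf is the dict representation of the plugin's config section.
            self.conf = conf

        def process(self, obj_stream, account, container, object_name):
            # obj_stream iterates over the lines of one log file.
            # Assumptions for this sketch: each line is "<account> <bytes>"
            # and object_name ends in YYYYMMDDHH.
            stamp = object_name[-10:]
            year, month, day, hour = (
                stamp[:4], stamp[4:6], stamp[6:8], stamp[8:])
            results = {}
            for line in obj_stream:
                acct, nbytes = line.split()
                # Results are keyed on (account, year, month, day, hour).
                key = (acct, year, month, day, hour)
                totals = results.setdefault(key, {'bytes': 0, 'lines': 0})
                totals['bytes'] += int(nbytes)
                totals['lines'] += 1
            return results

        def keylist_mapping(self):
            # Assumed shape: csv column name -> key produced by process().
            return {'total_bytes': 'bytes', 'line_count': 'lines'}


    plugin = ExampleLogProcessor({'container_name': 'log_data'})
    result = plugin.process(
        ['acct1 100', 'acct1 50'], 'stats_acct', 'log_data',
        'access-2010080412')
    print(result)

The constructor only stores the config dict, so plugin-specific values can be
read from self.conf wherever they are needed.
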
-------------
Log Uploading
-------------

swift-log-uploader accepts a config file and a plugin name. It finds the log
files on disk according to the plugin config section and uploads them to the
swift cluster. This means one uploader process will run on each proxy server
node and each account server node. To avoid uploading partially-written log
files, the uploader skips any file whose mtime is less than two hours old.
Rackspace runs this process once an hour via cron.

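The mtime check described above can be sketched as follows. This is not the
swift-log-uploader code: the function name files_ready_for_upload and its
parameters are hypothetical, and only the two-hour cutoff comes from the
text.

.. code-block:: python

    import os
    import time


    def files_ready_for_upload(log_dir, cutoff_seconds=2 * 60 * 60, now=None):
        """Return paths in log_dir whose mtime is at least cutoff_seconds old."""
        now = time.time() if now is None else now
        ready = []
        for name in sorted(os.listdir(log_dir)):
            path = os.path.join(log_dir, name)
            if not os.path.isfile(path):
                continue
            # Recently modified files may still be written to, so skip them.
            if now - os.path.getmtime(path) >= cutoff_seconds:
                ready.append(path)
        return ready

Calling files_ready_for_upload on a log directory returns only the files that
have been left untouched for two hours or more, so the file for the current
hour is never picked up.
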
--------------
Log Processing
--------------

swift-log-stats-collector accepts a config file and generates a csv that is
uploaded to swift. It loads all plugins defined in the config file, generates
a list of all log files in swift that need to be processed, and passes an
iterable of the log file data to the appropriate plugin's process method. The
process method returns a dictionary of data in the log file keyed on (account,
year, month, day, hour). The log-stats-collector process then combines all
dictionaries from all calls to a process method into one dictionary. Key
collisions within each (account, year, month, day, hour) dictionary are
summed. Finally, the summed dictionary is mapped to the final csv values with
each plugin's keylist_mapping method.

The resulting csv file has one line per (account, year, month, day, hour) for
all log files processed in that run of swift-log-stats-collector.


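The combining step can be sketched as follows. This is an illustration under
assumptions, not the swift-log-stats-collector code: the function name
combine_results and the counter names bytes and requests are hypothetical.

.. code-block:: python

    def combine_results(per_file_results):
        """Merge per-file dicts keyed on (account, year, month, day, hour).

        Each value is a dict of counters; counters that collide under the
        same key are summed.
        """
        combined = {}
        for result in per_file_results:
            for key, counters in result.items():
                totals = combined.setdefault(key, {})
                for name, value in counters.items():
                    totals[name] = totals.get(name, 0) + value
        return combined


    key = ('acct1', '2010', '08', '04', '12')
    merged = combine_results([
        {key: {'bytes': 100, 'requests': 1}},
        {key: {'bytes': 50, 'requests': 2}},
    ])
    print(merged[key])  # {'bytes': 150, 'requests': 3}

Two log files covering the same account and hour therefore contribute to a
single line of the final csv.
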
================================
Running the stats system on SAIO
================================

#. Create a swift account to use for storing stats information, and note the
   account hash. The hash will be used in config files.

#. Install syslog-ng::

       sudo apt-get install syslog-ng

#. Add a destination rule to `/etc/syslog-ng/syslog-ng.conf`::

       destination df_syslog_hourly { file("/var/log/swift/access-$YEAR$MONTH$DAY$HOUR"); };

#. Edit the log rule for standard logging in
   `/etc/syslog-ng/syslog-ng.conf` by adding the destination just created.
   This causes syslog messages to also be written to a file, named by the
   current hour, in `/var/log/swift`::

       log {
           source(s_all);
           filter(f_syslog);
           destination(df_syslog);
           destination(df_syslog_hourly);
       };

#. Restart syslog-ng

#. Create `/etc/swift/log-processor.conf`::

       [log-processor]
       swift_account = <your-stats-account-hash>
       user = <your-user-name>

       [log-processor-access]
       swift_account = <your-stats-account-hash>
       container_name = log_data
       source_filename_format = access-%Y%m%d%H
       class_path = swift.stats.access_processor.AccessLogProcessor
       user = <your-user-name>

       [log-processor-stats]
       swift_account = <your-stats-account-hash>
       container_name = account_stats
       source_filename_format = stats-%Y%m%d%H_*
       class_path = swift.stats.stats_processor.StatsLogProcessor
       account_server_conf = /etc/swift/account-server/1.conf
       user = <your-user-name>

#. Create a `cron` job to run once per hour to create the stats logs. In
   `/etc/cron.d/swift-stats-log-creator`::

       0 * * * * <your-user-name> swift-account-stats-logger /etc/swift/log-processor.conf

#. Create a `cron` job to run once per hour to upload the stats logs. In
   `/etc/cron.d/swift-stats-log-uploader`::

       10 * * * * <your-user-name> swift-log-uploader /etc/swift/log-processor.conf stats

#. Create a `cron` job to run once per hour to upload the access logs. In
   `/etc/cron.d/swift-access-log-uploader`::

       5 * * * * <your-user-name> swift-log-uploader /etc/swift/log-processor.conf access

#. Create a `cron` job to run once per hour to process the logs. In
   `/etc/cron.d/swift-stats-processor`::

       30 * * * * <your-user-name> swift-log-stats-collector /etc/swift/log-processor.conf

After running for a few hours, you should start to see .csv files in the
log_processing_data container in the swift stats account that was created
earlier. Each file has one entry per account per hour for each account
with activity in that hour. One .csv file should be produced per hour. Note
that the stats will be delayed by at least two hours by default. This can be
changed with the new_log_cutoff variable in the config file. See
`log-processing.conf-sample` for more details.