==================
Swift stats system
==================

The swift stats system is composed of three parts: log creation, log
uploading, and log processing. The system handles two types of logs (access
and account stats), but it can be extended to handle other types of logs.

---------
Log Types
---------

***********
Access logs
***********

Access logs are the proxy server logs. Rackspace uses syslog-ng to redirect
the proxy log output to an hourly log file. For example, a proxy request that
is made on August 4, 2010 at 12:37 gets logged in a file named 2010080412.
This allows easy log rotation and easy per-hour log processing.

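The hourly naming scheme can be reproduced with a ``strftime`` format. The
following is a minimal sketch; ``hourly_log_name`` is a hypothetical helper
used only for illustration and is not part of swift.

```python
from datetime import datetime


def hourly_log_name(timestamp):
    """Return the hourly log file name (YYYYMMDDHH) for a request time."""
    return timestamp.strftime('%Y%m%d%H')


# A request made on August 4, 2010 at 12:37 lands in the file 2010080412.
print(hourly_log_name(datetime(2010, 8, 4, 12, 37)))
```

Because the minute is dropped, every request in the same hour maps to the
same file name, which is what makes per-hour processing straightforward.
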
*********************************
Account / Container DB stats logs
*********************************

DB stats logs are generated by a stats system process.
swift-account-stats-logger runs on each account server (via cron) and walks
the filesystem looking for account databases. When an account database is
found, the logger selects the account hash, bytes_used, container_count, and
object_count. These values are then written out as one line in a csv file. One
csv file is produced for every run of swift-account-stats-logger. This means
that, system wide, one csv file is produced for every storage node. Rackspace
runs the account stats logger every hour. Therefore, in a cluster of ten
account servers, ten csv files are produced every hour. Also, every account
will have one entry for every replica in the system. On average, there will be
three copies of each account in the aggregate of all account stat csv files
created in one system-wide run. The swift-container-stats-logger runs in a
similar fashion, scanning the container dbs.

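As an illustration of the "one line per account database" output, here is a
rough sketch using Python's ``csv`` module. The function name and the sample
values are made up, and the column order simply follows the paragraph above;
the real logger's columns and quoting may differ.

```python
import csv
import io


def write_account_stats(rows, out):
    """Write one csv line per account database found on this node.

    Each row is (account_hash, bytes_used, container_count, object_count).
    This ordering is an assumption taken from the prose, not from the code.
    """
    writer = csv.writer(out, lineterminator='\n')
    for row in rows:
        writer.writerow(row)


# Made-up sample values for two account databases on one node.
sample = [('a1b2c3', 1024, 2, 10), ('d4e5f6', 0, 0, 0)]
buf = io.StringIO()
write_account_stats(sample, buf)
print(buf.getvalue(), end='')
```
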
----------------------
Log Processing plugins
----------------------

The swift stats system is written to allow a plugin to be defined for every
log type. Swift includes plugins for both access logs and storage stats logs.
Each plugin is responsible for defining, in a config section, where the logs
are stored on disk, where the logs will be stored in swift (account and
container), the filename format of the logs on disk, the location of the
plugin class definition, and any plugin-specific config values.

The plugin class definition defines three methods. The constructor must accept
one argument (the dict representation of the plugin's config section). The
process method must accept an iterator, and the account, container, and object
name of the log. The keylist_mapping method accepts no parameters.

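The three-method interface described above might look roughly like the
following skeleton. Everything specific here is an assumption made for
illustration: the class name, the log line format (account name as the first
whitespace-separated field), the ``YYYY/MM/DD/HH/...`` object name layout, and
the shape of the ``keylist_mapping`` return value.

```python
class LineCountProcessor(object):
    """Hypothetical plugin that counts log lines per (account, hour)."""

    def __init__(self, conf):
        # conf is the dict representation of this plugin's config section.
        self.conf = conf

    def process(self, obj_stream, account, container, object_name):
        # Assumption: the object name starts with year/month/day/hour.
        year, month, day, hour = object_name.split('/')[:4]
        totals = {}
        for line in obj_stream:
            line = line.strip()
            if not line:
                continue
            # Assumption: the first field of each line is the account name.
            acct = line.split()[0]
            key = (acct, year, month, day, hour)
            totals.setdefault(key, {'lines': 0})['lines'] += 1
        return totals

    def keylist_mapping(self):
        # Assumption: maps a csv column name to a key in the per-hour dict.
        return {'line_count': 'lines'}


plugin = LineCountProcessor({'container_name': 'log_data'})
result = plugin.process(
    ['AUTH_a GET /v1\n', 'AUTH_a PUT /v1\n', 'AUTH_b GET /v1\n'],
    'stats', 'log_data', '2010/08/04/12/example.gz')
print(result)
```
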
-------------
Log Uploading
-------------

swift-log-uploader accepts a config file and a plugin name. It finds the log
files on disk according to the plugin config section and uploads them to the
swift cluster. This means one uploader process will run on each proxy server
node and each account server node. To avoid uploading partially written log
files, the uploader skips any file whose mtime is less than two hours old.
Rackspace runs this process once an hour via cron.

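The mtime cutoff can be sketched as follows. This is not the uploader's
actual code; ``uploadable_files`` and its parameters are hypothetical names,
and the two-hour default simply mirrors the text above.

```python
import os
import time

TWO_HOURS = 2 * 60 * 60


def uploadable_files(log_dir, cutoff_seconds=TWO_HOURS, now=None):
    """Return paths in log_dir last modified at least cutoff_seconds ago."""
    now = time.time() if now is None else now
    ready = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        # Files modified more recently than the cutoff may still be written to.
        if os.path.isfile(path) and now - os.path.getmtime(path) >= cutoff_seconds:
            ready.append(path)
    return ready
```

Calling ``uploadable_files('/var/log/swift/hourly/')`` would then list only
the hourly files that have been untouched for two hours or more.
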
--------------
Log Processing
--------------

swift-log-stats-collector accepts a config file and generates a csv that is
uploaded to swift. It loads all plugins defined in the config file, generates
a list of all log files in swift that need to be processed, and passes an
iterable of the log file data to the appropriate plugin's process method. The
process method returns a dictionary of data in the log file keyed on (account,
year, month, day, hour). The log-stats-collector process then combines all
dictionaries from all calls to a process method into one dictionary. Key
collisions within each (account, year, month, day, hour) dictionary are
summed. Finally, the summed dictionary is mapped to the final csv values with
each plugin's keylist_mapping method.

The resulting csv file has one line per (account, year, month, day, hour) for
all log files processed in that run of swift-log-stats-collector.


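The combining step can be illustrated with a short sketch. ``merge_results``
is a hypothetical helper, and the counter names in the sample data are made
up; the only behavior taken from the text is that values under the same
(account, year, month, day, hour) key are summed.

```python
def merge_results(results):
    """Combine per-log dictionaries, summing values on key collisions."""
    combined = {}
    for result in results:
        for key, values in result.items():
            bucket = combined.setdefault(key, {})
            for name, value in values.items():
                bucket[name] = bucket.get(name, 0) + value
    return combined


key = ('AUTH_a', '2010', '08', '04', '12')
merged = merge_results([
    {key: {'requests': 3, 'bytes_out': 100}},  # from one log file
    {key: {'requests': 2}},                    # from another log file
])
print(merged)
```
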
--------------------------------
Running the stats system on SAIO
--------------------------------

#. Create a swift account to use for storing stats information, and note the
   account hash. The hash will be used in config files.

#. Edit /etc/rsyslog.d/10-swift.conf::

    # Uncomment the following to have a log containing all logs together
    #local1,local2,local3,local4,local5.*   /var/log/swift/all.log

    $template HourlyProxyLog,"/var/log/swift/hourly/%$YEAR%%$MONTH%%$DAY%%$HOUR%"
    local1.*;local1.!notice ?HourlyProxyLog

    local1.*;local1.!notice /var/log/swift/proxy.log
    local1.notice           /var/log/swift/proxy.error
    local1.*                ~

#. Edit /etc/rsyslog.conf and make the following change::

    $PrivDropToGroup adm

#. `mkdir -p /var/log/swift/hourly`
#. `chown -R syslog:adm /var/log/swift`
#. `chmod 775 /var/log/swift /var/log/swift/hourly`
#. `service rsyslog restart`
#. `usermod -a -G adm <your-user-name>`
#. Log out and back in to let the group change take effect.
#. Create `/etc/swift/log-processor.conf`::

    [log-processor]
    swift_account = <your-stats-account-hash>
    user = <your-user-name>

    [log-processor-access]
    swift_account = <your-stats-account-hash>
    container_name = log_data
    log_dir = /var/log/swift/hourly/
    source_filename_pattern = ^
        (?P<year>[0-9]{4})
        (?P<month>[0-1][0-9])
        (?P<day>[0-3][0-9])
        (?P<hour>[0-2][0-9])
        .*$
    class_path = slogging.access_processor.AccessLogProcessor
    user = <your-user-name>

    [log-processor-stats]
    swift_account = <your-stats-account-hash>
    container_name = account_stats
    log_dir = /var/log/swift/stats/
    class_path = slogging.stats_processor.StatsLogProcessor
    devices = /srv/1/node
    mount_check = false
    user = <your-user-name>

    [log-processor-container-stats]
    swift_account = <your-stats-account-hash>
    container_name = container_stats
    log_dir = /var/log/swift/stats/
    class_path = slogging.stats_processor.StatsLogProcessor
    processable = false
    devices = /srv/1/node
    mount_check = false
    user = <your-user-name>

#. Add the following under [app:proxy-server] in `/etc/swift/proxy-server.conf`::

    log_facility = LOG_LOCAL1

#. Create a `cron` job to run once per hour to create the stats logs. In
   `/etc/cron.d/swift-stats-log-creator`::

    0 * * * * <your-user-name> /usr/local/bin/swift-account-stats-logger /etc/swift/log-processor.conf

#. Create a `cron` job to run once per hour to create the container stats logs. In
   `/etc/cron.d/swift-container-stats-log-creator`::

    5 * * * * <your-user-name> /usr/local/bin/swift-container-stats-logger /etc/swift/log-processor.conf

#. Create a `cron` job to run once per hour to upload the stats logs. In
   `/etc/cron.d/swift-stats-log-uploader`::

    10 * * * * <your-user-name> /usr/local/bin/swift-log-uploader /etc/swift/log-processor.conf stats

#. Create a `cron` job to run once per hour to upload the container stats logs. In
   `/etc/cron.d/swift-container-stats-log-uploader`::

    15 * * * * <your-user-name> /usr/local/bin/swift-log-uploader /etc/swift/log-processor.conf container-stats

#. Create a `cron` job to run once per hour to upload the access logs. In
   `/etc/cron.d/swift-access-log-uploader`::

    5 * * * * <your-user-name> /usr/local/bin/swift-log-uploader /etc/swift/log-processor.conf access

#. Create a `cron` job to run once per hour to process the logs. In
   `/etc/cron.d/swift-stats-processor`::

    30 * * * * <your-user-name> /usr/local/bin/swift-log-stats-collector /etc/swift/log-processor.conf

After running for a few hours, you should start to see .csv files in the
log_processing_data container in the swift stats account that was created
earlier. Each file will have one entry per account per hour for each account
with activity in that hour. One .csv file should be produced per hour. Note
that the stats will be delayed by at least two hours by default. This can be
changed with the new_log_cutoff variable in the config file. See
`log-processor.conf-sample` for more details.
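
To check that hourly file names match the ``source_filename_pattern`` from
the ``[log-processor-access]`` section, the pattern can be tried out in
Python. This sketch assumes that the whitespace separating the continuation
lines is removed before the pattern is compiled; how the uploader actually
joins the lines is not stated here.

```python
import re

# The multi-line value from the [log-processor-access] config section.
raw_pattern = """^
    (?P<year>[0-9]{4})
    (?P<month>[0-1][0-9])
    (?P<day>[0-3][0-9])
    (?P<hour>[0-2][0-9])
    .*$"""

# Assumption: all whitespace is stripped to join the continuation lines.
pattern = re.compile(''.join(raw_pattern.split()))

match = pattern.match('2010080412')
print(match.groupdict())
```
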