docs: Move metric name/description tables out to separate page(s)
Offer them both by service and as a single, more easily searchable page. That admin guide is *still* too long, but this should help a bit.

Change-Id: I946c72f40dce2f33ef845a0ca816038727848b3a
parent 149b617c28
commit 307315bde2
@ -883,450 +883,28 @@ of async_pendings in real-time, but will not tell you the current number of
|
|||||||
async_pending container updates on disk at any point in time.
|
||||||
|
|
||||||
Note also that the set of metrics collected, their names, and their semantics
|
||||||
are not locked down and will change over time.
|
are not locked down and will change over time. For more details, see the
|
||||||
|
service-specific tables listed below:
|
||||||
|
|
||||||
Metrics for `account-auditor`:
|
.. toctree::
|
||||||
|
metrics/account_auditor
|
||||||
========================== =========================================================
|
metrics/account_reaper
|
||||||
Metric Name Description
|
metrics/account_server
|
||||||
-------------------------- ---------------------------------------------------------
|
metrics/account_replicator
|
||||||
`account-auditor.errors` Count of audit runs (across all account databases) which
|
metrics/container_auditor
|
||||||
caught an Exception.
|
metrics/container_replicator
|
||||||
`account-auditor.passes` Count of individual account databases which passed audit.
|
metrics/container_server
|
||||||
`account-auditor.failures` Count of individual account databases which failed audit.
|
metrics/container_sync
|
||||||
`account-auditor.timing` Timing data for individual account database audits.
|
metrics/container_updater
|
||||||
========================== =========================================================
|
metrics/object_auditor
|
||||||
|
metrics/object_expirer
|
||||||
Metrics for `account-reaper`:
|
metrics/object_reconstructor
|
||||||
|
metrics/object_replicator
|
||||||
============================================== ====================================================
|
metrics/object_server
|
||||||
Metric Name Description
|
metrics/object_updater
|
||||||
---------------------------------------------- ----------------------------------------------------
|
metrics/proxy_server
|
||||||
`account-reaper.errors` Count of devices failing the mount check.
|
|
||||||
`account-reaper.timing` Timing data for each reap_account() call.
|
|
||||||
`account-reaper.return_codes.X` Count of HTTP return codes from various operations
|
|
||||||
(e.g. object listing, container deletion, etc.). The
|
|
||||||
value for X is the first digit of the return code
|
|
||||||
(2 for 201, 4 for 404, etc.).
|
|
||||||
`account-reaper.containers_failures` Count of failures to delete a container.
|
|
||||||
`account-reaper.containers_deleted` Count of containers successfully deleted.
|
|
||||||
`account-reaper.containers_remaining` Count of containers which failed to delete with
|
|
||||||
zero successes.
|
|
||||||
`account-reaper.containers_possibly_remaining` Count of containers which failed to delete with
|
|
||||||
at least one success.
|
|
||||||
`account-reaper.objects_failures` Count of failures to delete an object.
|
|
||||||
`account-reaper.objects_deleted` Count of objects successfully deleted.
|
|
||||||
`account-reaper.objects_remaining` Count of objects which failed to delete with zero
|
|
||||||
successes.
|
|
||||||
`account-reaper.objects_possibly_remaining` Count of objects which failed to delete with at
|
|
||||||
least one success.
|
|
||||||
============================================== ====================================================
|
|
||||||
|
|
||||||
Metrics for `account-server` ("Not Found" is not considered an error and requests
|
|
||||||
which increment `errors` are not included in the timing data):
|
|
||||||
|
|
||||||
======================================== =======================================================
|
|
||||||
Metric Name Description
|
|
||||||
---------------------------------------- -------------------------------------------------------
|
|
||||||
`account-server.DELETE.errors.timing` Timing data for each DELETE request resulting in an
|
|
||||||
error: bad request, not mounted, missing timestamp.
|
|
||||||
`account-server.DELETE.timing` Timing data for each DELETE request not resulting in
|
|
||||||
an error.
|
|
||||||
`account-server.PUT.errors.timing` Timing data for each PUT request resulting in an error:
|
|
||||||
bad request, not mounted, conflict, recently-deleted.
|
|
||||||
`account-server.PUT.timing` Timing data for each PUT request not resulting in an
|
|
||||||
error.
|
|
||||||
`account-server.HEAD.errors.timing` Timing data for each HEAD request resulting in an
|
|
||||||
error: bad request, not mounted.
|
|
||||||
`account-server.HEAD.timing` Timing data for each HEAD request not resulting in
|
|
||||||
an error.
|
|
||||||
`account-server.GET.errors.timing` Timing data for each GET request resulting in an
|
|
||||||
error: bad request, not mounted, bad delimiter,
|
|
||||||
account listing limit too high, bad accept header.
|
|
||||||
`account-server.GET.timing` Timing data for each GET request not resulting in
|
|
||||||
an error.
|
|
||||||
`account-server.REPLICATE.errors.timing` Timing data for each REPLICATE request resulting in an
|
|
||||||
error: bad request, not mounted.
|
|
||||||
`account-server.REPLICATE.timing` Timing data for each REPLICATE request not resulting
|
|
||||||
in an error.
|
|
||||||
`account-server.POST.errors.timing` Timing data for each POST request resulting in an
|
|
||||||
error: bad request, bad or missing timestamp, not
|
|
||||||
mounted.
|
|
||||||
`account-server.POST.timing` Timing data for each POST request not resulting in
|
|
||||||
an error.
|
|
||||||
======================================== =======================================================
|
|
||||||
|
|
||||||
Metrics for `account-replicator`:
|
|
||||||
|
|
||||||
===================================== ====================================================
|
|
||||||
Metric Name Description
|
|
||||||
------------------------------------- ----------------------------------------------------
|
|
||||||
`account-replicator.diffs` Count of syncs handled by sending differing rows.
|
|
||||||
`account-replicator.diff_caps` Count of "diffs" operations which failed because
|
|
||||||
"max_diffs" was hit.
|
|
||||||
`account-replicator.no_changes` Count of accounts found to be in sync.
|
|
||||||
`account-replicator.hashmatches` Count of accounts found to be in sync via hash
|
|
||||||
comparison (`broker.merge_syncs` was called).
|
|
||||||
`account-replicator.rsyncs` Count of completely missing accounts which were sent
|
|
||||||
via rsync.
|
|
||||||
`account-replicator.remote_merges` Count of syncs handled by sending entire database
|
|
||||||
via rsync.
|
|
||||||
`account-replicator.attempts` Count of database replication attempts.
|
|
||||||
`account-replicator.failures` Count of database replication attempts which failed
|
|
||||||
due to corruption (quarantined) or inability to read
|
|
||||||
as well as attempts to individual nodes which
|
|
||||||
failed.
|
|
||||||
`account-replicator.removes.<device>` Count of databases on <device> deleted because the
|
|
||||||
delete_timestamp was greater than the put_timestamp
|
|
||||||
and the database had no rows or because it was
|
|
||||||
successfully sync'ed to other locations and doesn't
|
|
||||||
belong here anymore.
|
|
||||||
`account-replicator.successes` Count of replication attempts to an individual node
|
|
||||||
which were successful.
|
|
||||||
`account-replicator.timing` Timing data for each database replication attempt
|
|
||||||
not resulting in a failure.
|
|
||||||
===================================== ====================================================
|
|
||||||
|
|
||||||
Metrics for `container-auditor`:
|
|
||||||
|
|
||||||
============================ ====================================================
|
|
||||||
Metric Name Description
|
|
||||||
---------------------------- ----------------------------------------------------
|
|
||||||
`container-auditor.errors` Incremented when an Exception is caught in an audit
|
|
||||||
pass (only once per pass, max).
|
|
||||||
`container-auditor.passes` Count of individual containers passing an audit.
|
|
||||||
`container-auditor.failures` Count of individual containers failing an audit.
|
|
||||||
`container-auditor.timing` Timing data for each container audit.
|
|
||||||
============================ ====================================================
|
|
||||||
|
|
||||||
Metrics for `container-replicator`:
|
|
||||||
|
|
||||||
======================================= ====================================================
|
|
||||||
Metric Name Description
|
|
||||||
--------------------------------------- ----------------------------------------------------
|
|
||||||
`container-replicator.diffs` Count of syncs handled by sending differing rows.
|
|
||||||
`container-replicator.diff_caps` Count of "diffs" operations which failed because
|
|
||||||
"max_diffs" was hit.
|
|
||||||
`container-replicator.no_changes` Count of containers found to be in sync.
|
|
||||||
`container-replicator.hashmatches` Count of containers found to be in sync via hash
|
|
||||||
comparison (`broker.merge_syncs` was called).
|
|
||||||
`container-replicator.rsyncs` Count of completely missing containers which were sent
|
|
||||||
via rsync.
|
|
||||||
`container-replicator.remote_merges` Count of syncs handled by sending entire database
|
|
||||||
via rsync.
|
|
||||||
`container-replicator.attempts` Count of database replication attempts.
|
|
||||||
`container-replicator.failures` Count of database replication attempts which failed
|
|
||||||
due to corruption (quarantined) or inability to read
|
|
||||||
as well as attempts to individual nodes which
|
|
||||||
failed.
|
|
||||||
`container-replicator.removes.<device>` Count of databases deleted on <device> because the
|
|
||||||
delete_timestamp was greater than the put_timestamp
|
|
||||||
and the database had no rows or because it was
|
|
||||||
successfully sync'ed to other locations and doesn't
|
|
||||||
belong here anymore.
|
|
||||||
`container-replicator.successes` Count of replication attempts to an individual node
|
|
||||||
which were successful.
|
|
||||||
`container-replicator.timing` Timing data for each database replication attempt
|
|
||||||
not resulting in a failure.
|
|
||||||
======================================= ====================================================
|
|
||||||
|
|
||||||
Metrics for `container-server` ("Not Found" is not considered an error and requests
|
|
||||||
which increment `errors` are not included in the timing data):
|
|
||||||
|
|
||||||
========================================== ====================================================
|
|
||||||
Metric Name Description
|
|
||||||
------------------------------------------ ----------------------------------------------------
|
|
||||||
`container-server.DELETE.errors.timing` Timing data for DELETE request errors: bad request,
|
|
||||||
not mounted, missing timestamp, conflict.
|
|
||||||
`container-server.DELETE.timing` Timing data for each DELETE request not resulting in
|
|
||||||
an error.
|
|
||||||
`container-server.PUT.errors.timing` Timing data for PUT request errors: bad request,
|
|
||||||
missing timestamp, not mounted, conflict.
|
|
||||||
`container-server.PUT.timing` Timing data for each PUT request not resulting in an
|
|
||||||
error.
|
|
||||||
`container-server.HEAD.errors.timing` Timing data for HEAD request errors: bad request,
|
|
||||||
not mounted.
|
|
||||||
`container-server.HEAD.timing` Timing data for each HEAD request not resulting in
|
|
||||||
an error.
|
|
||||||
`container-server.GET.errors.timing` Timing data for GET request errors: bad request,
|
|
||||||
not mounted, parameters not utf8, bad accept header.
|
|
||||||
`container-server.GET.timing` Timing data for each GET request not resulting in
|
|
||||||
an error.
|
|
||||||
`container-server.REPLICATE.errors.timing` Timing data for REPLICATE request errors: bad
|
|
||||||
request, not mounted.
|
|
||||||
`container-server.REPLICATE.timing` Timing data for each REPLICATE request not resulting
|
|
||||||
in an error.
|
|
||||||
`container-server.POST.errors.timing` Timing data for POST request errors: bad request,
|
|
||||||
bad x-container-sync-to, not mounted.
|
|
||||||
`container-server.POST.timing` Timing data for each POST request not resulting in
|
|
||||||
an error.
|
|
||||||
========================================== ====================================================
|
|
||||||
|
|
||||||
Metrics for `container-sync`:
|
|
||||||
|
|
||||||
=============================== ====================================================
|
|
||||||
Metric Name Description
|
|
||||||
------------------------------- ----------------------------------------------------
|
|
||||||
`container-sync.skips` Count of containers skipped because they don't have
|
|
||||||
sync'ing enabled.
|
|
||||||
`container-sync.failures` Count of failures sync'ing individual containers.
|
|
||||||
`container-sync.syncs` Count of individual containers sync'ed successfully.
|
|
||||||
`container-sync.deletes` Count of container database rows sync'ed by
|
|
||||||
deletion.
|
|
||||||
`container-sync.deletes.timing` Timing data for each container database row
|
|
||||||
synchronization via deletion.
|
|
||||||
`container-sync.puts` Count of container database rows sync'ed by Putting.
|
|
||||||
`container-sync.puts.timing` Timing data for each container database row
|
|
||||||
synchronization via Putting.
|
|
||||||
=============================== ====================================================
|
|
||||||
|
|
||||||
Metrics for `container-updater`:
|
|
||||||
|
|
||||||
============================== ====================================================
|
|
||||||
Metric Name Description
|
|
||||||
------------------------------ ----------------------------------------------------
|
|
||||||
`container-updater.successes` Count of containers which successfully updated their
|
|
||||||
account.
|
|
||||||
`container-updater.failures` Count of containers which failed to update their
|
|
||||||
account.
|
|
||||||
`container-updater.no_changes` Count of containers which didn't need to update
|
|
||||||
their account.
|
|
||||||
`container-updater.timing` Timing data for processing a container; only
|
|
||||||
includes timing for containers which needed to
|
|
||||||
update their accounts (i.e. "successes" and
|
|
||||||
"failures" but not "no_changes").
|
|
||||||
============================== ====================================================
|
|
||||||
|
|
||||||
Metrics for `object-auditor`:
|
|
||||||
|
|
||||||
============================ ====================================================
|
|
||||||
Metric Name Description
|
|
||||||
---------------------------- ----------------------------------------------------
|
|
||||||
`object-auditor.quarantines` Count of objects failing audit and quarantined.
|
|
||||||
`object-auditor.errors` Count of errors encountered while auditing objects.
|
|
||||||
`object-auditor.timing` Timing data for each object audit (does not include
|
|
||||||
any rate-limiting sleep time for
|
|
||||||
max_files_per_second, but does include rate-limiting
|
|
||||||
sleep time for max_bytes_per_second).
|
|
||||||
============================ ====================================================
|
|
||||||
|
|
||||||
Metrics for `object-expirer`:
|
|
||||||
|
|
||||||
======================== ====================================================
|
|
||||||
Metric Name Description
|
|
||||||
------------------------ ----------------------------------------------------
|
|
||||||
`object-expirer.objects` Count of objects expired.
|
|
||||||
`object-expirer.errors` Count of errors encountered while attempting to
|
|
||||||
expire an object.
|
|
||||||
`object-expirer.timing` Timing data for each object expiration attempt,
|
|
||||||
including ones resulting in an error.
|
|
||||||
======================== ====================================================
|
|
||||||
|
|
||||||
Metrics for `object-reconstructor`:
|
|
||||||
|
|
||||||
====================================================== ======================================================
|
|
||||||
Metric Name Description
|
|
||||||
------------------------------------------------------ ------------------------------------------------------
|
|
||||||
`object-reconstructor.partition.delete.count.<device>` A count of partitions on <device> which were
|
|
||||||
reconstructed and synced to another node because they
|
|
||||||
didn't belong on this node. This metric is tracked
|
|
||||||
per-device to allow for "quiescence detection" for
|
|
||||||
object reconstruction activity on each device.
|
|
||||||
`object-reconstructor.partition.delete.timing` Timing data for partitions reconstructed and synced to
|
|
||||||
another node because they didn't belong on this node.
|
|
||||||
This metric is not tracked per device.
|
|
||||||
`object-reconstructor.partition.update.count.<device>` A count of partitions on <device> which were
|
|
||||||
reconstructed and synced to another node, but also
|
|
||||||
belong on this node. As with delete.count, this metric
|
|
||||||
is tracked per-device.
|
|
||||||
`object-reconstructor.partition.update.timing` Timing data for partitions reconstructed which also
|
|
||||||
belong on this node. This metric is not tracked
|
|
||||||
per-device.
|
|
||||||
`object-reconstructor.suffix.hashes` Count of suffix directories whose hash (of filenames)
|
|
||||||
was recalculated.
|
|
||||||
`object-reconstructor.suffix.syncs` Count of suffix directories reconstructed with ssync.
|
|
||||||
====================================================== ======================================================
|
|
||||||
|
|
||||||
Metrics for `object-replicator`:
|
|
||||||
|
|
||||||
=================================================== ====================================================
|
|
||||||
Metric Name Description
|
|
||||||
--------------------------------------------------- ----------------------------------------------------
|
|
||||||
`object-replicator.partition.delete.count.<device>` A count of partitions on <device> which were
|
|
||||||
replicated to another node because they didn't
|
|
||||||
belong on this node. This metric is tracked
|
|
||||||
per-device to allow for "quiescence detection" for
|
|
||||||
object replication activity on each device.
|
|
||||||
`object-replicator.partition.delete.timing` Timing data for partitions replicated to another
|
|
||||||
node because they didn't belong on this node. This
|
|
||||||
metric is not tracked per device.
|
|
||||||
`object-replicator.partition.update.count.<device>` A count of partitions on <device> which were
|
|
||||||
replicated to another node, but also belong on this
|
|
||||||
node. As with delete.count, this metric is tracked
|
|
||||||
per-device.
|
|
||||||
`object-replicator.partition.update.timing` Timing data for partitions replicated which also
|
|
||||||
belong on this node. This metric is not tracked
|
|
||||||
per-device.
|
|
||||||
`object-replicator.suffix.hashes` Count of suffix directories whose hash (of filenames)
|
|
||||||
was recalculated.
|
|
||||||
`object-replicator.suffix.syncs` Count of suffix directories replicated with rsync.
|
|
||||||
=================================================== ====================================================
|
|
||||||
|
|
||||||
Metrics for `object-server`:
|
|
||||||
|
|
||||||
======================================= ====================================================
|
|
||||||
Metric Name Description
|
|
||||||
--------------------------------------- ----------------------------------------------------
|
|
||||||
`object-server.quarantines` Count of objects (files) found bad and moved to
|
|
||||||
quarantine.
|
|
||||||
`object-server.async_pendings` Count of container updates saved as async_pendings
|
|
||||||
(may result from PUT or DELETE requests).
|
|
||||||
`object-server.POST.errors.timing` Timing data for POST request errors: bad request,
|
|
||||||
missing timestamp, delete-at in past, not mounted.
|
|
||||||
`object-server.POST.timing` Timing data for each POST request not resulting in
|
|
||||||
an error.
|
|
||||||
`object-server.PUT.errors.timing` Timing data for PUT request errors: bad request,
|
|
||||||
not mounted, missing timestamp, object creation
|
|
||||||
constraint violation, delete-at in past.
|
|
||||||
`object-server.PUT.timeouts` Count of object PUTs which exceeded max_upload_time.
|
|
||||||
`object-server.PUT.timing` Timing data for each PUT request not resulting in an
|
|
||||||
error.
|
|
||||||
`object-server.PUT.<device>.timing` Timing data per kB transferred (ms/kB) for each
|
|
||||||
non-zero-byte PUT request on each device.
|
|
||||||
Useful for monitoring problematic devices; higher is bad.
|
|
||||||
`object-server.GET.errors.timing` Timing data for GET request errors: bad request,
|
|
||||||
not mounted, header timestamps before the epoch,
|
|
||||||
precondition failed.
|
|
||||||
File errors resulting in a quarantine are not
|
|
||||||
counted here.
|
|
||||||
`object-server.GET.timing` Timing data for each GET request not resulting in an
|
|
||||||
error. Includes requests which couldn't find the
|
|
||||||
object (including disk errors resulting in file
|
|
||||||
quarantine).
|
|
||||||
`object-server.HEAD.errors.timing` Timing data for HEAD request errors: bad request,
|
|
||||||
not mounted.
|
|
||||||
`object-server.HEAD.timing` Timing data for each HEAD request not resulting in
|
|
||||||
an error. Includes requests which couldn't find the
|
|
||||||
object (including disk errors resulting in file
|
|
||||||
quarantine).
|
|
||||||
`object-server.DELETE.errors.timing` Timing data for DELETE request errors: bad request,
|
|
||||||
missing timestamp, not mounted, precondition
|
|
||||||
failed. Includes requests which couldn't find or
|
|
||||||
match the object.
|
|
||||||
`object-server.DELETE.timing` Timing data for each DELETE request not resulting
|
|
||||||
in an error.
|
|
||||||
`object-server.REPLICATE.errors.timing` Timing data for REPLICATE request errors: bad
|
|
||||||
request, not mounted.
|
|
||||||
`object-server.REPLICATE.timing` Timing data for each REPLICATE request not resulting
|
|
||||||
in an error.
|
|
||||||
======================================= ====================================================
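As a rough illustration of the ms/kB unit used by ``object-server.PUT.<device>.timing``, the value could be derived as sketched below. The function name, its arguments, and the choice of 1000 bytes per kB are assumptions made for this example, not taken from the Swift source.

```python
from typing import Optional


def put_timing_ms_per_kb(elapsed_seconds: float, bytes_written: int) -> Optional[float]:
    """Return milliseconds spent per kB written, or None for zero-byte PUTs.

    Hypothetical helper: the table only says the metric is reported in ms/kB
    for non-zero-byte PUT requests.
    """
    if bytes_written <= 0:
        # Zero-byte PUTs are excluded from this metric.
        return None
    elapsed_ms = elapsed_seconds * 1000.0
    kilobytes = bytes_written / 1000.0  # assumption: 1 kB = 1000 bytes
    return elapsed_ms / kilobytes


# A 2000-byte upload that took half a second.
sample = put_timing_ms_per_kb(0.5, 2000)
```

A device whose values trend upward relative to its peers takes longer to write the same amount of data, which is why higher numbers point to a problem.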
|
|
||||||
|
|
||||||
Metrics for `object-updater`:
|
|
||||||
|
|
||||||
============================ ====================================================
|
|
||||||
Metric Name Description
|
|
||||||
---------------------------- ----------------------------------------------------
|
|
||||||
`object-updater.errors` Count of drives not mounted or async_pending files
|
|
||||||
with an unexpected name.
|
|
||||||
`object-updater.timing` Timing data for object sweeps to flush async_pending
|
|
||||||
container updates. Does not include object sweeps
|
|
||||||
which did not find an existing async_pending storage
|
|
||||||
directory.
|
|
||||||
`object-updater.quarantines` Count of async_pending container updates which were
|
|
||||||
corrupted and moved to quarantine.
|
|
||||||
`object-updater.successes` Count of successful container updates.
|
|
||||||
`object-updater.failures` Count of failed container updates.
|
|
||||||
`object-updater.unlinks` Count of async_pending files unlinked. An
|
|
||||||
async_pending file is unlinked either when it is
|
|
||||||
successfully processed or when the replicator sees
|
|
||||||
that there is a newer async_pending file for the
|
|
||||||
same object.
|
|
||||||
============================ ====================================================
|
|
||||||
|
|
||||||
Metrics for `proxy-server` (in the table, `<type>` is the proxy-server
|
|
||||||
controller responsible for the request and will be one of "account",
|
|
||||||
"container", or "object"):
|
|
||||||
|
|
||||||
======================================== ====================================================
|
|
||||||
Metric Name Description
|
|
||||||
---------------------------------------- ----------------------------------------------------
|
|
||||||
`proxy-server.errors` Count of errors encountered while serving requests
|
|
||||||
before the controller type is determined. Includes
|
|
||||||
invalid Content-Length, errors finding the internal
|
|
||||||
controller to handle the request, invalid utf8, and
|
|
||||||
bad URLs.
|
|
||||||
`proxy-server.<type>.handoff_count` Count of node hand-offs; only tracked if log_handoffs
|
|
||||||
is set in the proxy-server config.
|
|
||||||
`proxy-server.<type>.handoff_all_count` Count of times *only* hand-off locations were
|
|
||||||
utilized; only tracked if log_handoffs is set in the
|
|
||||||
proxy-server config.
|
|
||||||
`proxy-server.<type>.client_timeouts` Count of client timeouts (client did not read within
|
|
||||||
`client_timeout` seconds during a GET or did not
|
|
||||||
supply data within `client_timeout` seconds during
|
|
||||||
a PUT).
|
|
||||||
`proxy-server.<type>.client_disconnects` Count of detected client disconnects during PUT
|
|
||||||
operations (does NOT include caught Exceptions in
|
|
||||||
the proxy-server which caused a client disconnect).
|
|
||||||
======================================== ====================================================
|
|
||||||
|
|
||||||
Metrics for `proxy-logging` middleware (in the table, `<type>` is either the
|
|
||||||
proxy-server controller responsible for the request: "account", "container",
|
|
||||||
"object", or the string "SOS" if the request came from the `Swift Origin Server`_
|
|
||||||
middleware. The `<verb>` portion will be one of "GET", "HEAD", "POST", "PUT",
|
|
||||||
"DELETE", "COPY", "OPTIONS", or "BAD_METHOD". The list of valid HTTP methods
|
|
||||||
is configurable via the `log_statsd_valid_http_methods` config variable and
|
|
||||||
the default setting yields the above behavior):
|
|
||||||
|
|
||||||
.. _Swift Origin Server: https://github.com/dpgoetz/sos
|
|
||||||
|
|
||||||
==================================================== ============================================
|
|
||||||
Metric Name Description
|
|
||||||
---------------------------------------------------- --------------------------------------------
|
|
||||||
`proxy-server.<type>.<verb>.<status>.timing` Timing data for requests, start to finish.
|
|
||||||
The <status> portion is the numeric HTTP
|
|
||||||
status code for the request (e.g. "200" or
|
|
||||||
"404").
|
|
||||||
`proxy-server.<type>.GET.<status>.first-byte.timing` Timing data up to completion of sending the
|
|
||||||
response headers (only for GET requests).
|
|
||||||
<status> and <type> are as for the main
|
|
||||||
timing metric.
|
|
||||||
`proxy-server.<type>.<verb>.<status>.xfer` This counter metric is the sum of bytes
|
|
||||||
transferred in (from clients) and out (to
|
|
||||||
clients) for requests. The <type>, <verb>,
|
|
||||||
and <status> portions of the metric are just
|
|
||||||
like the main timing metric.
|
|
||||||
==================================================== ============================================
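The naming scheme described for the ``proxy-logging`` timing metric can be sketched as follows. This is a minimal illustration, not the middleware's actual code: ``timing_metric`` and ``DEFAULT_VALID_METHODS`` are hypothetical names, and the method set simply mirrors the default verbs listed in the description.

```python
# Assumed default verb list, copied from the description of <verb>.
DEFAULT_VALID_METHODS = frozenset(
    ["GET", "HEAD", "POST", "PUT", "DELETE", "COPY", "OPTIONS"]
)


def timing_metric(req_type, verb, status, valid_methods=DEFAULT_VALID_METHODS):
    """Build a `proxy-server.<type>.<verb>.<status>.timing` metric name.

    Verbs outside the configured list are reported as BAD_METHOD.
    """
    verb = verb.upper()
    if verb not in valid_methods:
        verb = "BAD_METHOD"
    return "proxy-server.%s.%s.%s.timing" % (req_type, verb, status)


ok_name = timing_metric("object", "GET", 200)
bad_name = timing_metric("account", "PATCH", 405)
```

Folding unknown verbs into a single ``BAD_METHOD`` bucket keeps the number of distinct metric names bounded even when clients send arbitrary methods.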
|
|
||||||
|
|
||||||
The `proxy-logging` middleware also groups these metrics by policy. The
|
|
||||||
`<policy-index>` portion represents a policy index:
|
|
||||||
|
|
||||||
========================================================================== =====================================
|
|
||||||
Metric Name Description
|
|
||||||
-------------------------------------------------------------------------- -------------------------------------
|
|
||||||
`proxy-server.object.policy.<policy-index>.<verb>.<status>.timing` Timing data for requests, aggregated
|
|
||||||
by policy index.
|
|
||||||
`proxy-server.object.policy.<policy-index>.GET.<status>.first-byte.timing` Timing data up to completion of
|
|
||||||
sending the response headers,
|
|
||||||
aggregated by policy index.
|
|
||||||
`proxy-server.object.policy.<policy-index>.<verb>.<status>.xfer` Sum of bytes transferred in and out,
|
|
||||||
aggregated by policy index.
|
|
||||||
========================================================================== =====================================
|
|
||||||
|
|
||||||
Metrics for `tempauth` middleware (in the table, `<reseller_prefix>` represents
|
|
||||||
the actual configured reseller_prefix or "`NONE`" if the reseller_prefix is the
|
|
||||||
empty string):
|
|
||||||
|
|
||||||
========================================= ====================================================
|
|
||||||
Metric Name Description
|
|
||||||
----------------------------------------- ----------------------------------------------------
|
|
||||||
`tempauth.<reseller_prefix>.unauthorized` Count of regular requests which were denied with
|
|
||||||
HTTPUnauthorized.
|
|
||||||
`tempauth.<reseller_prefix>.forbidden` Count of regular requests which were denied with
|
|
||||||
HTTPForbidden.
|
|
||||||
`tempauth.<reseller_prefix>.token_denied` Count of token requests which were denied.
|
|
||||||
`tempauth.<reseller_prefix>.errors` Count of errors.
|
|
||||||
========================================= ====================================================
|
|
||||||
|
|
||||||
|
Or, view :doc:`metrics/all` as one page.
|
||||||
|
|
||||||
------------------------
|
||||||
Debugging Tips and Tools
|
||||||
|
12
doc/source/metrics/account_auditor.rst
Normal file
@ -0,0 +1,12 @@
|
|||||||
|
``account-auditor`` Metrics
|
||||||
|
===========================
|
||||||
|
|
||||||
|
========================== =========================================================
|
||||||
|
Metric Name Description
|
||||||
|
-------------------------- ---------------------------------------------------------
|
||||||
|
`account-auditor.errors` Count of audit runs (across all account databases) which
|
||||||
|
caught an Exception.
|
||||||
|
`account-auditor.passes` Count of individual account databases which passed audit.
|
||||||
|
`account-auditor.failures` Count of individual account databases which failed audit.
|
||||||
|
`account-auditor.timing` Timing data for individual account database audits.
|
||||||
|
========================== =========================================================
|
25
doc/source/metrics/account_reaper.rst
Normal file
@ -0,0 +1,25 @@
``account-reaper`` Metrics
==========================

============================================== ====================================================
Metric Name                                    Description
---------------------------------------------- ----------------------------------------------------
`account-reaper.errors`                        Count of devices failing the mount check.
`account-reaper.timing`                        Timing data for each reap_account() call.
`account-reaper.return_codes.X`                Count of HTTP return codes from various operations
                                               (e.g. object listing, container deletion, etc.). The
                                               value for X is the first digit of the return code
                                               (2 for 201, 4 for 404, etc.).
`account-reaper.containers_failures`           Count of failures to delete a container.
`account-reaper.containers_deleted`            Count of containers successfully deleted.
`account-reaper.containers_remaining`          Count of containers which failed to delete with
                                               zero successes.
`account-reaper.containers_possibly_remaining` Count of containers which failed to delete with
                                               at least one success.
`account-reaper.objects_failures`              Count of failures to delete an object.
`account-reaper.objects_deleted`               Count of objects successfully deleted.
`account-reaper.objects_remaining`             Count of objects which failed to delete with zero
                                               successes.
`account-reaper.objects_possibly_remaining`    Count of objects which failed to delete with at
                                               least one success.
============================================== ====================================================
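
As an illustration of the ``return_codes.X`` naming above, the following
minimal Python sketch buckets an HTTP status by its first digit. The helper
``return_code_metric`` is a hypothetical name used only for this example and
is not part of Swift; the reaper itself may build the name differently.

```python
def return_code_metric(status: int, prefix: str = "account-reaper") -> str:
    """Return the counter name for an HTTP status, bucketed by first digit.

    Hypothetical helper for illustration only; not a Swift API.
    """
    if not 100 <= status <= 599:
        raise ValueError(f"not an HTTP status code: {status}")
    # Integer division by 100 keeps only the leading digit (201 -> 2, 404 -> 4).
    return f"{prefix}.return_codes.{status // 100}"


if __name__ == "__main__":
    for code in (201, 204, 404, 503):
        print(code, "->", return_code_metric(code))
```

Because only the first digit is kept, 201 and 204 land in the same counter, so
this metric distinguishes classes of response rather than individual codes.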
31
doc/source/metrics/account_replicator.rst
Normal file
@ -0,0 +1,31 @@
``account-replicator`` Metrics
==============================

===================================== ====================================================
Metric Name                           Description
------------------------------------- ----------------------------------------------------
`account-replicator.diffs`            Count of syncs handled by sending differing rows.
`account-replicator.diff_caps`        Count of "diffs" operations which failed because
                                      "max_diffs" was hit.
`account-replicator.no_changes`       Count of accounts found to be in sync.
`account-replicator.hashmatches`      Count of accounts found to be in sync via hash
                                      comparison (`broker.merge_syncs` was called).
`account-replicator.rsyncs`           Count of completely missing accounts which were sent
                                      via rsync.
`account-replicator.remote_merges`    Count of syncs handled by sending entire database
                                      via rsync.
`account-replicator.attempts`         Count of database replication attempts.
`account-replicator.failures`         Count of database replication attempts which failed
                                      due to corruption (quarantined) or inability to read
                                      as well as attempts to individual nodes which
                                      failed.
`account-replicator.removes.<device>` Count of databases on <device> deleted because the
                                      delete_timestamp was greater than the put_timestamp
                                      and the database had no rows or because it was
                                      successfully sync'ed to other locations and doesn't
                                      belong here anymore.
`account-replicator.successes`        Count of replication attempts to an individual node
                                      which were successful.
`account-replicator.timing`           Timing data for each database replication attempt
                                      not resulting in a failure.
===================================== ====================================================
37
doc/source/metrics/account_server.rst
Normal file
@ -0,0 +1,37 @@
``account-server`` Metrics
==========================

.. note::
   "Not Found" is not considered an error and requests
   which increment `errors` are not included in the timing data.

======================================== =======================================================
Metric Name                              Description
---------------------------------------- -------------------------------------------------------
`account-server.DELETE.errors.timing`    Timing data for each DELETE request resulting in an
                                         error: bad request, not mounted, missing timestamp.
`account-server.DELETE.timing`           Timing data for each DELETE request not resulting in
                                         an error.
`account-server.PUT.errors.timing`       Timing data for each PUT request resulting in an error:
                                         bad request, not mounted, conflict, recently-deleted.
`account-server.PUT.timing`              Timing data for each PUT request not resulting in an
                                         error.
`account-server.HEAD.errors.timing`      Timing data for each HEAD request resulting in an
                                         error: bad request, not mounted.
`account-server.HEAD.timing`             Timing data for each HEAD request not resulting in
                                         an error.
`account-server.GET.errors.timing`       Timing data for each GET request resulting in an
                                         error: bad request, not mounted, bad delimiter,
                                         account listing limit too high, bad accept header.
`account-server.GET.timing`              Timing data for each GET request not resulting in
                                         an error.
`account-server.REPLICATE.errors.timing` Timing data for each REPLICATE request resulting in an
                                         error: bad request, not mounted.
`account-server.REPLICATE.timing`        Timing data for each REPLICATE request not resulting
                                         in an error.
`account-server.POST.errors.timing`      Timing data for each POST request resulting in an
                                         error: bad request, bad or missing timestamp, not
                                         mounted.
`account-server.POST.timing`             Timing data for each POST request not resulting in
                                         an error.
======================================== =======================================================
24
doc/source/metrics/all.rst
Normal file
@ -0,0 +1,24 @@
:orphan:

All Statsd Metrics
==================

.. include:: account_auditor.rst
.. include:: account_reaper.rst
.. include:: account_server.rst
.. include:: account_replicator.rst

.. include:: container_auditor.rst
.. include:: container_replicator.rst
.. include:: container_server.rst
.. include:: container_sync.rst
.. include:: container_updater.rst

.. include:: object_auditor.rst
.. include:: object_expirer.rst
.. include:: object_reconstructor.rst
.. include:: object_replicator.rst
.. include:: object_server.rst
.. include:: object_updater.rst

.. include:: proxy_server.rst
12
doc/source/metrics/container_auditor.rst
Normal file
@ -0,0 +1,12 @@
``container-auditor`` Metrics
=============================

============================ ====================================================
Metric Name                  Description
---------------------------- ----------------------------------------------------
`container-auditor.errors`   Incremented when an Exception is caught in an audit
                             pass (only once per pass, max).
`container-auditor.passes`   Count of individual containers passing an audit.
`container-auditor.failures` Count of individual containers failing an audit.
`container-auditor.timing`   Timing data for each container audit.
============================ ====================================================
31
doc/source/metrics/container_replicator.rst
Normal file
@ -0,0 +1,31 @@
``container-replicator`` Metrics
================================

======================================= ====================================================
Metric Name                             Description
--------------------------------------- ----------------------------------------------------
`container-replicator.diffs`            Count of syncs handled by sending differing rows.
`container-replicator.diff_caps`        Count of "diffs" operations which failed because
                                        "max_diffs" was hit.
`container-replicator.no_changes`       Count of containers found to be in sync.
`container-replicator.hashmatches`      Count of containers found to be in sync via hash
                                        comparison (`broker.merge_syncs` was called).
`container-replicator.rsyncs`           Count of completely missing containers which were sent
                                        via rsync.
`container-replicator.remote_merges`    Count of syncs handled by sending entire database
                                        via rsync.
`container-replicator.attempts`         Count of database replication attempts.
`container-replicator.failures`         Count of database replication attempts which failed
                                        due to corruption (quarantined) or inability to read
                                        as well as attempts to individual nodes which
                                        failed.
`container-replicator.removes.<device>` Count of databases deleted on <device> because the
                                        delete_timestamp was greater than the put_timestamp
                                        and the database had no rows or because it was
                                        successfully sync'ed to other locations and doesn't
                                        belong here anymore.
`container-replicator.successes`        Count of replication attempts to an individual node
                                        which were successful.
`container-replicator.timing`           Timing data for each database replication attempt
                                        not resulting in a failure.
======================================= ====================================================
35
doc/source/metrics/container_server.rst
Normal file
@ -0,0 +1,35 @@
``container-server`` Metrics
============================

.. note::
   "Not Found" is not considered an error and requests
   which increment `errors` are not included in the timing data.

========================================== ====================================================
Metric Name                                Description
------------------------------------------ ----------------------------------------------------
`container-server.DELETE.errors.timing`    Timing data for DELETE request errors: bad request,
                                           not mounted, missing timestamp, conflict.
`container-server.DELETE.timing`           Timing data for each DELETE request not resulting in
                                           an error.
`container-server.PUT.errors.timing`       Timing data for PUT request errors: bad request,
                                           missing timestamp, not mounted, conflict.
`container-server.PUT.timing`              Timing data for each PUT request not resulting in an
                                           error.
`container-server.HEAD.errors.timing`      Timing data for HEAD request errors: bad request,
                                           not mounted.
`container-server.HEAD.timing`             Timing data for each HEAD request not resulting in
                                           an error.
`container-server.GET.errors.timing`       Timing data for GET request errors: bad request,
                                           not mounted, parameters not utf8, bad accept header.
`container-server.GET.timing`              Timing data for each GET request not resulting in
                                           an error.
`container-server.REPLICATE.errors.timing` Timing data for REPLICATE request errors: bad
                                           request, not mounted.
`container-server.REPLICATE.timing`        Timing data for each REPLICATE request not resulting
                                           in an error.
`container-server.POST.errors.timing`      Timing data for POST request errors: bad request,
                                           bad x-container-sync-to, not mounted.
`container-server.POST.timing`             Timing data for each POST request not resulting in
                                           an error.
========================================== ====================================================
18
doc/source/metrics/container_sync.rst
Normal file
@ -0,0 +1,18 @@
``container-sync`` Metrics
==========================

=============================== ====================================================
Metric Name                     Description
------------------------------- ----------------------------------------------------
`container-sync.skips`          Count of containers skipped because they don't have
                                sync'ing enabled.
`container-sync.failures`       Count of failures while sync'ing individual
                                containers.
`container-sync.syncs`          Count of individual containers sync'ed successfully.
`container-sync.deletes`        Count of container database rows sync'ed by
                                deletion.
`container-sync.deletes.timing` Timing data for each container database row
                                synchronization via deletion.
`container-sync.puts`           Count of container database rows sync'ed via PUT.
`container-sync.puts.timing`    Timing data for each container database row
                                synchronization via PUT.
=============================== ====================================================
17
doc/source/metrics/container_updater.rst
Normal file
@ -0,0 +1,17 @@
``container-updater`` Metrics
=============================

============================== ====================================================
Metric Name                    Description
------------------------------ ----------------------------------------------------
`container-updater.successes`  Count of containers which successfully updated their
                               account.
`container-updater.failures`   Count of containers which failed to update their
                               account.
`container-updater.no_changes` Count of containers which didn't need to update
                               their account.
`container-updater.timing`     Timing data for processing a container; only
                               includes timing for containers which needed to
                               update their accounts (i.e. "successes" and
                               "failures" but not "no_changes").
============================== ====================================================
13
doc/source/metrics/object_auditor.rst
Normal file
@ -0,0 +1,13 @@
``object-auditor`` Metrics
==========================

============================ ====================================================
Metric Name                  Description
---------------------------- ----------------------------------------------------
`object-auditor.quarantines` Count of objects failing audit and quarantined.
`object-auditor.errors`      Count of errors encountered while auditing objects.
`object-auditor.timing`      Timing data for each object audit (does not include
                             any rate-limiting sleep time for
                             max_files_per_second, but does include rate-limiting
                             sleep time for max_bytes_per_second).
============================ ====================================================
12
doc/source/metrics/object_expirer.rst
Normal file
@ -0,0 +1,12 @@
``object-expirer`` Metrics
==========================

======================== ====================================================
Metric Name              Description
------------------------ ----------------------------------------------------
`object-expirer.objects` Count of objects expired.
`object-expirer.errors`  Count of errors encountered while attempting to
                         expire an object.
`object-expirer.timing`  Timing data for each object expiration attempt,
                         including ones resulting in an error.
======================== ====================================================
25
doc/source/metrics/object_reconstructor.rst
Normal file
@ -0,0 +1,25 @@
``object-reconstructor`` Metrics
================================

====================================================== ======================================================
Metric Name                                            Description
------------------------------------------------------ ------------------------------------------------------
`object-reconstructor.partition.delete.count.<device>` A count of partitions on <device> which were
                                                       reconstructed and synced to another node because they
                                                       didn't belong on this node. This metric is tracked
                                                       per-device to allow for "quiescence detection" for
                                                       object reconstruction activity on each device.
`object-reconstructor.partition.delete.timing`         Timing data for partitions reconstructed and synced to
                                                       another node because they didn't belong on this node.
                                                       This metric is not tracked per device.
`object-reconstructor.partition.update.count.<device>` A count of partitions on <device> which were
                                                       reconstructed and synced to another node, but also
                                                       belong on this node. As with delete.count, this metric
                                                       is tracked per-device.
`object-reconstructor.partition.update.timing`         Timing data for partitions reconstructed which also
                                                       belong on this node. This metric is not tracked
                                                       per-device.
`object-reconstructor.suffix.hashes`                   Count of suffix directories whose hash (of filenames)
                                                       was recalculated.
`object-reconstructor.suffix.syncs`                    Count of suffix directories reconstructed with ssync.
====================================================== ======================================================
25
doc/source/metrics/object_replicator.rst
Normal file
@ -0,0 +1,25 @@
``object-replicator`` Metrics
=============================

=================================================== ====================================================
Metric Name                                         Description
--------------------------------------------------- ----------------------------------------------------
`object-replicator.partition.delete.count.<device>` A count of partitions on <device> which were
                                                    replicated to another node because they didn't
                                                    belong on this node. This metric is tracked
                                                    per-device to allow for "quiescence detection" for
                                                    object replication activity on each device.
`object-replicator.partition.delete.timing`         Timing data for partitions replicated to another
                                                    node because they didn't belong on this node. This
                                                    metric is not tracked per device.
`object-replicator.partition.update.count.<device>` A count of partitions on <device> which were
                                                    replicated to another node, but also belong on this
                                                    node. As with delete.count, this metric is tracked
                                                    per-device.
`object-replicator.partition.update.timing`         Timing data for partitions replicated which also
                                                    belong on this node. This metric is not tracked
                                                    per-device.
`object-replicator.suffix.hashes`                   Count of suffix directories whose hash (of filenames)
                                                    was recalculated.
`object-replicator.suffix.syncs`                    Count of suffix directories replicated with rsync.
=================================================== ====================================================
49
doc/source/metrics/object_server.rst
Normal file
@ -0,0 +1,49 @@
``object-server`` Metrics
=========================

======================================= ====================================================
Metric Name                             Description
--------------------------------------- ----------------------------------------------------
`object-server.quarantines`             Count of objects (files) found bad and moved to
                                        quarantine.
`object-server.async_pendings`          Count of container updates saved as async_pendings
                                        (may result from PUT or DELETE requests).
`object-server.POST.errors.timing`      Timing data for POST request errors: bad request,
                                        missing timestamp, delete-at in past, not mounted.
`object-server.POST.timing`             Timing data for each POST request not resulting in
                                        an error.
`object-server.PUT.errors.timing`       Timing data for PUT request errors: bad request,
                                        not mounted, missing timestamp, object creation
                                        constraint violation, delete-at in past.
`object-server.PUT.timeouts`            Count of object PUTs which exceeded max_upload_time.
`object-server.PUT.timing`              Timing data for each PUT request not resulting in an
                                        error.
`object-server.PUT.<device>.timing`     Timing data per kB transferred (ms/kB) for each
                                        non-zero-byte PUT request on each device.
                                        Useful for monitoring problematic devices; higher
                                        values are worse.
`object-server.GET.errors.timing`       Timing data for GET request errors: bad request,
                                        not mounted, header timestamps before the epoch,
                                        precondition failed.
                                        File errors resulting in a quarantine are not
                                        counted here.
`object-server.GET.timing`              Timing data for each GET request not resulting in an
                                        error. Includes requests which couldn't find the
                                        object (including disk errors resulting in file
                                        quarantine).
`object-server.HEAD.errors.timing`      Timing data for HEAD request errors: bad request,
                                        not mounted.
`object-server.HEAD.timing`             Timing data for each HEAD request not resulting in
                                        an error. Includes requests which couldn't find the
                                        object (including disk errors resulting in file
                                        quarantine).
`object-server.DELETE.errors.timing`    Timing data for DELETE request errors: bad request,
                                        missing timestamp, not mounted, precondition
                                        failed. Includes requests which couldn't find or
                                        match the object.
`object-server.DELETE.timing`           Timing data for each DELETE request not resulting
                                        in an error.
`object-server.REPLICATE.errors.timing` Timing data for REPLICATE request errors: bad
                                        request, not mounted.
`object-server.REPLICATE.timing`        Timing data for each REPLICATE request not resulting
                                        in an error.
======================================= ====================================================
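
The ms/kB value reported by ``object-server.PUT.<device>.timing`` can be
approximated as in the sketch below. This is an illustration under stated
assumptions, not the object server's code: the helper ``put_ms_per_kb`` is a
hypothetical name, and it assumes that 1 kB means 1000 bytes, which the table
does not specify.

```python
from typing import Optional


def put_ms_per_kb(elapsed_seconds: float, bytes_transferred: int) -> Optional[float]:
    """Return milliseconds spent per kB, or None for a zero-byte PUT.

    Hypothetical helper for illustration only; not a Swift API.
    """
    if bytes_transferred <= 0:
        # Zero-byte PUTs are excluded from the per-device timing metric.
        return None
    elapsed_ms = elapsed_seconds * 1000.0
    kilobytes = bytes_transferred / 1000.0  # assumption: 1 kB = 1000 bytes
    return elapsed_ms / kilobytes


if __name__ == "__main__":
    # A 4000-byte upload that took 2 seconds.
    print(put_ms_per_kb(2.0, 4000))
```

Normalizing by size is what makes devices comparable: a slow disk shows a
higher ms/kB figure regardless of whether it happened to receive large or
small objects.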
22
doc/source/metrics/object_updater.rst
Normal file
@ -0,0 +1,22 @@
``object-updater`` Metrics
==========================

============================ ====================================================
Metric Name                  Description
---------------------------- ----------------------------------------------------
`object-updater.errors`      Count of drives not mounted or async_pending files
                             with an unexpected name.
`object-updater.timing`      Timing data for object sweeps to flush async_pending
                             container updates. Does not include object sweeps
                             which did not find an existing async_pending storage
                             directory.
`object-updater.quarantines` Count of async_pending container updates which were
                             corrupted and moved to quarantine.
`object-updater.successes`   Count of successful container updates.
`object-updater.failures`    Count of failed container updates.
`object-updater.unlinks`     Count of async_pending files unlinked. An
                             async_pending file is unlinked either when it is
                             successfully processed or when the replicator sees
                             that there is a newer async_pending file for the
                             same object.
============================ ====================================================
91
doc/source/metrics/proxy_server.rst
Normal file
@ -0,0 +1,91 @@
``proxy-server`` Metrics
========================

In the table, ``<type>`` is the proxy-server controller responsible for the
request and will be one of ``account``, ``container``, or ``object``.

======================================== ====================================================
Metric Name                              Description
---------------------------------------- ----------------------------------------------------
`proxy-server.errors`                    Count of errors encountered while serving requests
                                         before the controller type is determined. Includes
                                         invalid Content-Length, errors finding the internal
                                         controller to handle the request, invalid utf8, and
                                         bad URLs.
`proxy-server.<type>.handoff_count`      Count of node hand-offs; only tracked if log_handoffs
                                         is set in the proxy-server config.
`proxy-server.<type>.handoff_all_count`  Count of times *only* hand-off locations were
                                         utilized; only tracked if log_handoffs is set in the
                                         proxy-server config.
`proxy-server.<type>.client_timeouts`    Count of client timeouts (client did not read within
                                         `client_timeout` seconds during a GET or did not
                                         supply data within `client_timeout` seconds during
                                         a PUT).
`proxy-server.<type>.client_disconnects` Count of detected client disconnects during PUT
                                         operations (does NOT include caught Exceptions in
                                         the proxy-server which caused a client disconnect).
======================================== ====================================================

Additionally, middleware often emit their own metrics.

``proxy-logging`` Middleware
----------------------------

In the table, ``<type>`` is either the proxy-server controller responsible
for the request (``account``, ``container``, or ``object``) or the string
``SOS`` if the request came from the `Swift Origin Server`_ middleware.
The ``<verb>`` portion will be one of ``GET``, ``HEAD``, ``POST``, ``PUT``,
``DELETE``, ``COPY``, ``OPTIONS``, or ``BAD_METHOD``. The list of valid
HTTP methods is configurable via the ``log_statsd_valid_http_methods``
config variable and the default setting yields the above behavior.

.. _Swift Origin Server: https://github.com/dpgoetz/sos

==================================================== ============================================
Metric Name                                          Description
---------------------------------------------------- --------------------------------------------
`proxy-server.<type>.<verb>.<status>.timing`         Timing data for requests, start to finish.
                                                     The <status> portion is the numeric HTTP
                                                     status code for the request (e.g. "200" or
                                                     "404").
`proxy-server.<type>.GET.<status>.first-byte.timing` Timing data up to completion of sending the
                                                     response headers (only for GET requests).
                                                     <status> and <type> are as for the main
                                                     timing metric.
`proxy-server.<type>.<verb>.<status>.xfer`           This counter metric is the sum of bytes
                                                     transferred in (from clients) and out (to
                                                     clients) for requests. The <type>, <verb>,
                                                     and <status> portions of the metric are just
                                                     like the main timing metric.
==================================================== ============================================
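
The naming scheme in the table can be sketched as below. The function
``timing_metric_name`` and the constant ``DEFAULT_VALID_METHODS`` are
hypothetical names chosen for illustration and are not taken from the
middleware. The sketch assumes that any method outside the configured list is
reported as ``BAD_METHOD``, as the description of ``<verb>`` suggests.

```python
# Assumed default list, mirroring the verbs named in the text above.
DEFAULT_VALID_METHODS = ("GET", "HEAD", "POST", "PUT", "DELETE", "COPY", "OPTIONS")


def timing_metric_name(req_type, method, status, valid_methods=DEFAULT_VALID_METHODS):
    """Compose a name of the form proxy-server.<type>.<verb>.<status>.timing.

    Hypothetical helper for illustration only; not a Swift API.
    """
    # Methods that are not in the configured list collapse into one bucket.
    verb = method if method in valid_methods else "BAD_METHOD"
    return f"proxy-server.{req_type}.{verb}.{status}.timing"


if __name__ == "__main__":
    print(timing_metric_name("object", "GET", 200))
    print(timing_metric_name("account", "PATCH", 405))
```

Collapsing unknown methods into a single bucket keeps the number of distinct
metric names bounded, even when clients send arbitrary method strings.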

The ``proxy-logging`` middleware also groups these metrics by policy. The
``<policy-index>`` portion represents a policy index:

========================================================================== =====================================
Metric Name                                                                Description
-------------------------------------------------------------------------- -------------------------------------
`proxy-server.object.policy.<policy-index>.<verb>.<status>.timing`         Timing data for requests, aggregated
                                                                           by policy index.
`proxy-server.object.policy.<policy-index>.GET.<status>.first-byte.timing` Timing data up to completion of
                                                                           sending the response headers,
                                                                           aggregated by policy index.
`proxy-server.object.policy.<policy-index>.<verb>.<status>.xfer`           Sum of bytes transferred in and out,
                                                                           aggregated by policy index.
========================================================================== =====================================

``tempauth`` Middleware
-----------------------

In the table, ``<reseller_prefix>`` represents the actual configured
reseller_prefix or ``NONE`` if the reseller_prefix is the empty string:

========================================= ====================================================
Metric Name                               Description
----------------------------------------- ----------------------------------------------------
`tempauth.<reseller_prefix>.unauthorized` Count of regular requests which were denied with
                                          HTTPUnauthorized.
`tempauth.<reseller_prefix>.forbidden`    Count of regular requests which were denied with
                                          HTTPForbidden.
`tempauth.<reseller_prefix>.token_denied` Count of token requests which were denied.
`tempauth.<reseller_prefix>.errors`       Count of errors.
========================================= ====================================================
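
A minimal sketch of the ``<reseller_prefix>`` substitution described above.
The helper ``tempauth_metric`` is a hypothetical name, and the example prefix
``AUTH_`` is only an assumed configuration value; how the middleware
normalizes the configured prefix is not covered here.

```python
def tempauth_metric(reseller_prefix: str, event: str) -> str:
    """Return tempauth.<reseller_prefix>.<event>, using NONE for an empty prefix.

    Hypothetical helper for illustration only; not a Swift API.
    """
    # An empty reseller_prefix would leave an empty name component,
    # so the literal string NONE stands in for it.
    label = reseller_prefix if reseller_prefix else "NONE"
    return f"tempauth.{label}.{event}"


if __name__ == "__main__":
    print(tempauth_metric("AUTH_", "unauthorized"))
    print(tempauth_metric("", "forbidden"))
```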