Add an optional delay to account reaping.

Normally, the reaper begins deleting account information for deleted
accounts immediately. With this patch you can set it to delay its
work. You set the delay_reaping value in the [account-reaper] section
of the account-server.conf. The value is in seconds; 2592000 = 30
days, for example.

Unfortunately, there are currently zero tests for the account-reaper.
This also needs fixing, but I thought I'd submit this delay patch
alone for consideration.

Change-Id: Ic077df9cdd95c5d3f8949dd3bbe9893cf24c6623
gholt 2012-03-16 17:10:36 +00:00
parent 14fbbab0f1
commit ac3cc680de
5 changed files with 82 additions and 10 deletions
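
The behavior described in the commit message can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the reaper's actual code path: the helper names `parse_delay` and `should_reap` are hypothetical, and the only details taken from the patch are the `int(conf.get('delay_reaping') or 0)` parsing and the `time() - delete_timestamp <= delay_reaping` comparison.

```python
import time


def parse_delay(conf):
    """Read delay_reaping (seconds) from a conf dict.

    Hypothetical helper; it mirrors the parsing expression in the patch,
    so a missing or empty value falls back to 0 (no delay).
    """
    return int(conf.get('delay_reaping') or 0)


def should_reap(delete_timestamp, delay_reaping, now=None):
    """Return True once more than delay_reaping seconds have elapsed.

    Hypothetical helper; the patch skips the account while
    now - delete_timestamp <= delay_reaping, so this is the negation.
    """
    if now is None:
        now = time.time()
    return now - float(delete_timestamp) > delay_reaping


thirty_days = 30 * 24 * 60 * 60  # 2592000 seconds, as in the commit message
delay = parse_delay({'delay_reaping': str(thirty_days)})
```

Note that an account deleted exactly `delay_reaping` seconds ago is still skipped; reaping starts only once the elapsed time is strictly greater than the delay.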


@@ -481,6 +481,11 @@ concurrency 25 Number of replication workers to spawn
 interval           3600            Minimum time for a pass to take
 node_timeout       10              Request timeout to external services
 conn_timeout       0.5             Connection timeout to external services
+delay_reaping      0               Normally, the reaper begins deleting
+                                   account information for deleted accounts
+                                   immediately; you can set this to delay
+                                   its work however. The value is in seconds,
+                                   2592000 = 30 days, for example.
 ================== =============== =========================================
 --------------------------


@@ -4,13 +4,22 @@ The Account Reaper
 The Account Reaper removes data from deleted accounts in the background.
-An account is marked for deletion by a reseller through the services server's
-remove_storage_account XMLRPC call. This simply puts the value DELETED into the
-status column of the account_stat table in the account database (and replicas),
-indicating the data for the account should be deleted later. There is no set
-retention time and no undelete; it is assumed the reseller will implement such
-features and only call remove_storage_account once it is truly desired the
-account's data be removed.
+An account is marked for deletion by a reseller issuing a DELETE request on the
+account's storage URL. This simply puts the value DELETED into the status
+column of the account_stat table in the account database (and replicas),
+indicating the data for the account should be deleted later.
+There is normally no set retention time and no undelete; it is assumed the
+reseller will implement such features and only call DELETE on the account once
+it is truly desired the account's data be removed. However, in order to protect
+the Swift cluster accounts from an improper or mistaken delete request, you can
+set a delay_reaping value in the [account-reaper] section of the
+account-server.conf to delay the actual deletion of data. At this time, there
+is no utility to undelete an account; one would have to update the account
+database replicas directly, setting the status column to an empty string and
+updating the put_timestamp to be greater than the delete_timestamp. (On the
+TODO list is writing a utility to perform this task, preferably through a ReST
+call.)
 The account reaper runs on each account server and scans the server
 occasionally for account databases marked for deletion. It will only trigger on
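
The manual undelete described in the documentation change above (empty the status column, then set put_timestamp later than delete_timestamp) might look roughly like the following. This is a sketch under stated assumptions and not a supported tool: it runs against an in-memory SQLite database with a simplified, hypothetical account_stat schema, the function name `undelete_account` is invented for illustration, and timestamps are assumed to be zero-padded strings so that text comparison matches numeric order. A real account database has more columns, and the same change would have to be made on every replica.

```python
import sqlite3


def undelete_account(conn, new_put_timestamp):
    """Clear the DELETED status and move put_timestamp past delete_timestamp.

    Hypothetical helper operating on a simplified account_stat table.
    """
    conn.execute(
        "UPDATE account_stat SET status = '', put_timestamp = ? "
        "WHERE status = 'DELETED' AND delete_timestamp < ?",
        (new_put_timestamp, new_put_timestamp))
    conn.commit()


# Simplified stand-in schema; only the columns named in the docs are modeled.
conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE account_stat ("
    "account TEXT, status TEXT, put_timestamp TEXT, delete_timestamp TEXT)")
conn.execute(
    "INSERT INTO account_stat VALUES "
    "('AUTH_test', 'DELETED', '0000000100.00000', '0000000200.00000')")

undelete_account(conn, '0000000300.00000')
row = conn.execute(
    "SELECT status, put_timestamp, delete_timestamp FROM account_stat"
).fetchone()
```

After the update, `row` holds an empty status and a put_timestamp greater than the delete_timestamp, which is the state the documentation describes for an undeleted account.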


@@ -66,3 +66,7 @@ use = egg:swift#account
 # conn_timeout = 0.5
 # log_facility = LOG_LOCAL0
 # log_level = INFO
+# Normally, the reaper begins deleting account information for deleted accounts
+# immediately; you can set this to delay its work however. The value is in
+# seconds; 2592000 = 30 days for example.
+# delay_reaping = 0


@@ -72,6 +72,7 @@ class AccountReaper(Daemon):
         self.container_concurrency = self.object_concurrency = \
             sqrt(self.concurrency)
         self.container_pool = GreenPool(size=self.container_concurrency)
+        self.delay_reaping = int(conf.get('delay_reaping') or 0)

     def get_account_ring(self):
         """ The account :class:`swift.common.ring.Ring` for the cluster. """
@@ -211,7 +212,10 @@ class AccountReaper(Daemon):
         of the node dicts.
         """
         begin = time()
-        account = broker.get_info()['account']
+        info = broker.get_info()
+        if time() - float(info['delete_timestamp']) <= self.delay_reaping:
+            return False
+        account = info['account']
         self.logger.info(_('Beginning pass on account %s'), account)
         self.stats_return_codes = {}
         self.stats_containers_deleted = 0
@@ -264,6 +268,7 @@ class AccountReaper(Daemon):
             log = log[:-2]
         log += _(', elapsed: %.02fs') % (time() - begin)
         self.logger.info(log)
+        return True

     def reap_container(self, account, account_partition, account_nodes,
                        container):


@@ -16,12 +16,61 @@
 # TODO: Tests

 import unittest
 from swift.account import reaper
+from swift.common.utils import normalize_timestamp

+class FakeBroker(object):

+    def __init__(self):
+        self.info = {}

+    def get_info(self):
+        return self.info

 class TestReaper(unittest.TestCase):

     def test_placeholder(self):
         pass

+    def test_delay_reaping_conf_default(self):
+        r = reaper.AccountReaper({})
+        self.assertEquals(r.delay_reaping, 0)
+        r = reaper.AccountReaper({'delay_reaping': ''})
+        self.assertEquals(r.delay_reaping, 0)

+    def test_delay_reaping_conf_set(self):
+        r = reaper.AccountReaper({'delay_reaping': '123'})
+        self.assertEquals(r.delay_reaping, 123)

+    def test_delay_reaping_conf_bad_value(self):
+        self.assertRaises(ValueError, reaper.AccountReaper,
+                          {'delay_reaping': 'abc'})

+    def test_reap_delay(self):
+        time_value = [100]

+        def _time():
+            return time_value[0]

+        time_orig = reaper.time
+        try:
+            reaper.time = _time
+            r = reaper.AccountReaper({'delay_reaping': '10'})
+            b = FakeBroker()
+            b.info['delete_timestamp'] = normalize_timestamp(110)
+            self.assertFalse(r.reap_account(b, 0, None))
+            b.info['delete_timestamp'] = normalize_timestamp(100)
+            self.assertFalse(r.reap_account(b, 0, None))
+            b.info['delete_timestamp'] = normalize_timestamp(90)
+            self.assertFalse(r.reap_account(b, 0, None))
+            # KeyError raised immediately as reap_account tries to get the
+            # account's name to do the reaping.
+            b.info['delete_timestamp'] = normalize_timestamp(89)
+            self.assertRaises(KeyError, r.reap_account, b, 0, None)
+            b.info['delete_timestamp'] = normalize_timestamp(1)
+            self.assertRaises(KeyError, r.reap_account, b, 0, None)
+        finally:
+            reaper.time = time_orig

 if __name__ == '__main__':