Merge "Write-affinity aware object deletion"

Jenkins 2017-07-06 14:00:05 +00:00 committed by Gerrit Code Review
commit c3f6e82ae1
8 changed files with 471 additions and 230 deletions


@ -1676,187 +1676,207 @@ ionice_priority None I/O scheduling p
[proxy-server] [proxy-server]
============================ =============== ===================================== ====================================== =============== =====================================
Option Default Description Option Default Description
---------------------------- --------------- ------------------------------------- -------------------------------------- --------------- -------------------------------------
use Entry point for paste.deploy for use Entry point for paste.deploy for
the proxy server. For most the proxy server. For most
cases, this should be cases, this should be
`egg:swift#proxy`. `egg:swift#proxy`.
set log_name proxy-server Label used when logging set log_name proxy-server Label used when logging
set log_facility LOG_LOCAL0 Syslog log facility set log_facility LOG_LOCAL0 Syslog log facility
set log_level INFO Log level set log_level INFO Log level
set log_headers True If True, log headers in each set log_headers True If True, log headers in each
request request
set log_handoffs True If True, the proxy will log set log_handoffs True If True, the proxy will log
whenever it has to failover to a whenever it has to failover to a
handoff node handoff node
recheck_account_existence 60 Cache timeout in seconds to recheck_account_existence 60 Cache timeout in seconds to
send memcached for account send memcached for account
existence existence
recheck_container_existence 60 Cache timeout in seconds to recheck_container_existence 60 Cache timeout in seconds to
send memcached for container send memcached for container
existence existence
object_chunk_size 65536 Chunk size to read from object_chunk_size 65536 Chunk size to read from
object servers object servers
client_chunk_size 65536 Chunk size to read from client_chunk_size 65536 Chunk size to read from
clients clients
memcache_servers 127.0.0.1:11211 Comma separated list of memcache_servers 127.0.0.1:11211 Comma separated list of
memcached servers memcached servers
ip:port or [ipv6addr]:port ip:port or [ipv6addr]:port
memcache_max_connections 2 Max number of connections to memcache_max_connections 2 Max number of connections to
each memcached server per each memcached server per
worker worker
node_timeout 10 Request timeout to external node_timeout 10 Request timeout to external
services services
recoverable_node_timeout node_timeout Request timeout to external recoverable_node_timeout node_timeout Request timeout to external
services for requests that, on services for requests that, on
failure, can be recovered failure, can be recovered
from. For example, object GET. from. For example, object GET.
client_timeout 60 Timeout to read one chunk client_timeout 60 Timeout to read one chunk
from a client from a client
conn_timeout 0.5 Connection timeout to conn_timeout 0.5 Connection timeout to
external services external services
error_suppression_interval 60 Time in seconds that must error_suppression_interval 60 Time in seconds that must
elapse since the last error elapse since the last error
for a node to be considered for a node to be considered
no longer error limited no longer error limited
error_suppression_limit 10 Error count to consider a error_suppression_limit 10 Error count to consider a
node error limited node error limited
allow_account_management false Whether account PUTs and DELETEs allow_account_management false Whether account PUTs and DELETEs
are even callable are even callable
object_post_as_copy false Deprecated. object_post_as_copy false Deprecated.
account_autocreate false If set to 'true' authorized account_autocreate false If set to 'true' authorized
accounts that do not yet exist accounts that do not yet exist
within the Swift cluster will within the Swift cluster will
be automatically created. be automatically created.
max_containers_per_account 0 If set to a positive value, max_containers_per_account 0 If set to a positive value,
trying to create a container trying to create a container
when the account already has at when the account already has at
least this maximum containers least this maximum containers
will result in a 403 Forbidden. will result in a 403 Forbidden.
Note: This is a soft limit, Note: This is a soft limit,
meaning a user might exceed the meaning a user might exceed the
cap for cap for
recheck_account_existence before recheck_account_existence before
the 403s kick in. the 403s kick in.
max_containers_whitelist This is a comma separated list max_containers_whitelist This is a comma separated list
of account names that ignore of account names that ignore
the max_containers_per_account the max_containers_per_account
cap. cap.
rate_limit_after_segment 10 Rate limit the download of rate_limit_after_segment 10 Rate limit the download of
large object segments after large object segments after
this segment is downloaded. this segment is downloaded.
rate_limit_segments_per_sec 1 Rate limit large object rate_limit_segments_per_sec 1 Rate limit large object
downloads at this rate. downloads at this rate.
request_node_count 2 * replicas Set to the number of nodes to request_node_count 2 * replicas Set to the number of nodes to
contact for a normal request. contact for a normal request.
You can use '* replicas' at the You can use '* replicas' at the
end to have it use the number end to have it use the number
given times the number of given times the number of
replicas for the ring being used replicas for the ring being used
for the request. for the request.
swift_owner_headers <see the sample These are the headers whose swift_owner_headers <see the sample These are the headers whose
conf file for values will only be shown to conf file for values will only be shown to
the list of swift_owners. The exact the list of swift_owners. The exact
default definition of a swift_owner is default definition of a swift_owner is
headers> up to the auth system in use, headers> up to the auth system in use,
but usually indicates but usually indicates
administrative responsibilities. administrative responsibilities.
sorting_method shuffle Storage nodes can be chosen at sorting_method shuffle Storage nodes can be chosen at
random (shuffle), by using timing random (shuffle), by using timing
measurements (timing), or by using measurements (timing), or by using
an explicit match (affinity). an explicit match (affinity).
Using timing measurements may allow Using timing measurements may allow
for lower overall latency, while for lower overall latency, while
using affinity allows for finer using affinity allows for finer
control. In both the timing and control. In both the timing and
affinity cases, equally-sorting nodes affinity cases, equally-sorting nodes
are still randomly chosen to spread are still randomly chosen to spread
load. This option may be overridden load. This option may be overridden
in a per-policy configuration in a per-policy configuration
section. section.
timing_expiry 300 If the "timing" sorting_method is timing_expiry 300 If the "timing" sorting_method is
used, the timings will only be valid used, the timings will only be valid
for the number of seconds configured for the number of seconds configured
by timing_expiry. by timing_expiry.
concurrent_gets off Use replica count number of concurrent_gets off Use replica count number of
threads concurrently during a threads concurrently during a
GET/HEAD and return with the GET/HEAD and return with the
first successful response. In first successful response. In
the EC case, this parameter only the EC case, this parameter only
affects an EC HEAD as an EC GET affects an EC HEAD as an EC GET
behaves differently. behaves differently.
concurrency_timeout conn_timeout This parameter controls how long concurrency_timeout conn_timeout This parameter controls how long
to wait before firing off the to wait before firing off the
next concurrent_get thread. A next concurrent_get thread. A
value of 0 would be fully concurrent; value of 0 would be fully concurrent;
any other number will stagger the any other number will stagger the
firing of the threads. This number firing of the threads. This number
should be between 0 and node_timeout. should be between 0 and node_timeout.
The default is conn_timeout (0.5). The default is conn_timeout (0.5).
nice_priority None Scheduling priority of server nice_priority None Scheduling priority of server
processes. processes.
Niceness values range from -20 (most Niceness values range from -20 (most
favorable to the process) to 19 (least favorable to the process) to 19 (least
favorable to the process). The default favorable to the process). The default
does not modify priority. does not modify priority.
ionice_class None I/O scheduling class of server ionice_class None I/O scheduling class of server
processes. I/O niceness class values processes. I/O niceness class values
are IOPRIO_CLASS_RT (realtime), are IOPRIO_CLASS_RT (realtime),
IOPRIO_CLASS_BE (best-effort), IOPRIO_CLASS_BE (best-effort),
and IOPRIO_CLASS_IDLE (idle). and IOPRIO_CLASS_IDLE (idle).
The default does not modify class and The default does not modify class and
priority. Linux supports io scheduling priority. Linux supports io scheduling
priorities and classes since 2.6.13 priorities and classes since 2.6.13
with the CFQ io scheduler. with the CFQ io scheduler.
Work only with ionice_priority. Work only with ionice_priority.
ionice_priority None I/O scheduling priority of server ionice_priority None I/O scheduling priority of server
processes. I/O niceness priority is processes. I/O niceness priority is
a number which goes from 0 to 7. a number which goes from 0 to 7.
The higher the value, the lower the The higher the value, the lower the
I/O priority of the process. Work I/O priority of the process. Work
only with ionice_class. only with ionice_class.
Ignored if IOPRIO_CLASS_IDLE is set. Ignored if IOPRIO_CLASS_IDLE is set.
read_affinity None Specifies which backend servers to read_affinity None Specifies which backend servers to
prefer on reads; used in conjunction prefer on reads; used in conjunction
with the sorting_method option being with the sorting_method option being
set to 'affinity'. Format is a comma set to 'affinity'. Format is a comma
separated list of affinity descriptors separated list of affinity descriptors
of the form <selection>=<priority>. of the form <selection>=<priority>.
The <selection> may be r<N> for The <selection> may be r<N> for
selecting nodes in region N or selecting nodes in region N or
r<N>z<M> for selecting nodes in r<N>z<M> for selecting nodes in
region N, zone M. The <priority> region N, zone M. The <priority>
value should be a whole number value should be a whole number
that represents the priority to that represents the priority to
be given to the selection; lower be given to the selection; lower
numbers are higher priority. numbers are higher priority.
Default is empty, meaning no Default is empty, meaning no
preference. This option may be preference. This option may be
overridden in a per-policy overridden in a per-policy
configuration section. configuration section.
write_affinity None Specifies which backend servers to write_affinity None Specifies which backend servers to
prefer on writes. Format is a comma prefer on writes. Format is a comma
separated list of affinity separated list of affinity
descriptors of the form r<N> for descriptors of the form r<N> for
region N or r<N>z<M> for region N, region N or r<N>z<M> for region N,
zone M. Default is empty, meaning no zone M. Default is empty, meaning no
preference. This option may be preference. This option may be
overridden in a per-policy overridden in a per-policy
configuration section. configuration section.
write_affinity_node_count 2 * replicas The number of local (as governed by write_affinity_node_count 2 * replicas The number of local (as governed by
the write_affinity setting) nodes to the write_affinity setting) nodes to
attempt to contact first on writes, attempt to contact first on writes,
before any non-local ones. The value before any non-local ones. The value
should be an integer number, or use should be an integer number, or use
'* replicas' at the end to have it '* replicas' at the end to have it
use the number given times the number use the number given times the number
of replicas for the ring being used of replicas for the ring being used
for the request. This option may be for the request. This option may be
overridden in a per-policy overridden in a per-policy
configuration section. configuration section.
============================ =============== ===================================== write_affinity_handoff_delete_count auto The number of local (as governed by
the write_affinity setting) handoff
nodes to attempt to contact on
deletion, in addition to primary
nodes. Example: in a geographically
distributed deployment with replicas=3,
there may sometimes be 1 primary node
and 2 local handoff nodes in one region
holding the object after upload but
before the object has been replicated to
the appropriate locations in other regions.
In this case, also sending the DELETE
request to these handoff nodes helps the
proxy make the correct decision
for the response. The default value 'auto'
means Swift will calculate the number
automatically as
(replicas - len(local_primary_nodes)).
This option may be overridden in a
per-policy configuration section.
====================================== =============== =====================================
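The 'auto' default described for write_affinity_handoff_delete_count can be illustrated with a minimal sketch. The helper name ``auto_handoff_delete_count`` and the node dicts are hypothetical and are not part of Swift; only the formula (replicas - len(local_primary_nodes)) is taken from the option description.

```python
def auto_handoff_delete_count(primary_nodes, is_local):
    """Return replicas - len(local_primary_nodes), the documented 'auto' value.

    Hypothetical helper for illustration only; not a Swift API.
    """
    local_primaries = [node for node in primary_nodes if is_local(node)]
    return len(primary_nodes) - len(local_primaries)


# Hypothetical 3-replica partition spread over two regions, with region 1
# treated as local (as a write_affinity = r1 setting would).
primaries = [
    {'id': 0, 'region': 1},
    {'id': 1, 'region': 2},
    {'id': 2, 'region': 2},
]
count = auto_handoff_delete_count(primaries, lambda node: node['region'] == 1)
print(count)
```

With one local primary out of three, the sketch asks for two local handoff nodes, which matches the "1 primary node and 2 local handoff nodes" example in the description.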
.. _proxy_server_per_policy_config: .. _proxy_server_per_policy_config:
@ -1871,6 +1891,7 @@ options are:
- ``read_affinity`` - ``read_affinity``
- ``write_affinity`` - ``write_affinity``
- ``write_affinity_node_count`` - ``write_affinity_node_count``
- ``write_affinity_handoff_delete_count``
The per-policy config section name must be of the form:: The per-policy config section name must be of the form::
@ -1900,6 +1921,7 @@ policy with index ``3``::
read_affinity = r2=1 read_affinity = r2=1
write_affinity = r2 write_affinity = r2
write_affinity_node_count = 1 * replicas write_affinity_node_count = 1 * replicas
write_affinity_handoff_delete_count = 2
.. note:: .. note::


@ -82,9 +82,9 @@ Note that read_affinity only affects the ordering of primary nodes
(see ring docs for definition of primary node), not the ordering of (see ring docs for definition of primary node), not the ordering of
handoff nodes. handoff nodes.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~
write_affinity and write_affinity_node_count write_affinity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~
This setting makes the proxy server prefer local backend servers for This setting makes the proxy server prefer local backend servers for
object PUT requests over non-local ones. For example, it may be object PUT requests over non-local ones. For example, it may be
@ -97,9 +97,15 @@ the object won't immediately have any replicas in NY. However,
replication will move the object's replicas to their proper homes in replication will move the object's replicas to their proper homes in
both SF and NY. both SF and NY.
Note that only object PUT requests are affected by the write_affinity One potential issue with write_affinity is that an end user may get a 404 error when
setting; POST, GET, HEAD, DELETE, OPTIONS, and account/container PUT deleting objects before replication. The write_affinity_handoff_delete_count
requests are not affected. setting is used together with write_affinity in order to solve that issue.
With its default configuration, Swift will calculate the proper number of
handoff nodes to send requests to.
Note that only object PUT/DELETE requests are affected by the write_affinity
setting; POST, GET, HEAD, OPTIONS, and account/container PUT requests are
not affected.
This setting lets you trade data distribution for throughput. If This setting lets you trade data distribution for throughput. If
write_affinity is enabled, then object replicas will initially be write_affinity is enabled, then object replicas will initially be


@ -236,6 +236,20 @@ use = egg:swift#proxy
# This option may be overridden in a per-policy configuration section. # This option may be overridden in a per-policy configuration section.
# write_affinity_node_count = 2 * replicas # write_affinity_node_count = 2 * replicas
# #
# The number of local (as governed by the write_affinity setting) handoff nodes
# to attempt to contact on deletion, in addition to primary nodes.
#
# Example: in a geographically distributed deployment of 2 regions with
# replicas=3, there may sometimes be 1 primary node and 2 local handoff nodes
# in one region holding the object after upload but before the object has been
# replicated to the appropriate locations in other regions. In this case,
# also sending the DELETE request to these handoff nodes helps the proxy make
# the correct decision for the response. The default value 'auto' means Swift
# will calculate the number automatically as
# (replicas - len(local_primary_nodes)). This option may be overridden in a
# per-policy configuration section.
# write_affinity_handoff_delete_count = auto
#
# These are the headers whose values will only be shown to swift_owners. The # These are the headers whose values will only be shown to swift_owners. The
# exact definition of a swift_owner is up to the auth system in use, but # exact definition of a swift_owner is up to the auth system in use, but
# usually indicates administrative responsibilities. # usually indicates administrative responsibilities.
@ -264,6 +278,7 @@ use = egg:swift#proxy
# read_affinity = # read_affinity =
# write_affinity = # write_affinity =
# write_affinity_node_count = # write_affinity_node_count =
# write_affinity_handoff_delete_count =
[filter:tempauth] [filter:tempauth]
use = egg:swift#tempauth use = egg:swift#tempauth


@ -1596,7 +1596,8 @@ class Controller(object):
{'method': method, 'path': path}) {'method': method, 'path': path})
def make_requests(self, req, ring, part, method, path, headers, def make_requests(self, req, ring, part, method, path, headers,
query_string='', overrides=None): query_string='', overrides=None, node_count=None,
node_iterator=None):
""" """
Sends an HTTP request to multiple nodes and aggregates the results. Sends an HTTP request to multiple nodes and aggregates the results.
It attempts the primary nodes concurrently, then iterates over the It attempts the primary nodes concurrently, then iterates over the
@ -1613,11 +1614,16 @@ class Controller(object):
:param query_string: optional query string to send to the backend :param query_string: optional query string to send to the backend
:param overrides: optional return status override map used to override :param overrides: optional return status override map used to override
the returned status of a request. the returned status of a request.
:param node_count: optional number of nodes to send request to.
:param node_iterator: optional node iterator.
:returns: a swob.Response object :returns: a swob.Response object
""" """
start_nodes = ring.get_part_nodes(part) nodes = GreenthreadSafeIterator(
nodes = GreenthreadSafeIterator(self.app.iter_nodes(ring, part)) node_iterator or self.app.iter_nodes(ring, part)
pile = GreenAsyncPile(len(start_nodes)) )
node_number = node_count or len(ring.get_part_nodes(part))
pile = GreenAsyncPile(node_number)
for head in headers: for head in headers:
pile.spawn(self._make_request, nodes, part, method, path, pile.spawn(self._make_request, nodes, part, method, path,
head, query_string, self.app.logger.thread_locals) head, query_string, self.app.logger.thread_locals)
@ -1628,7 +1634,7 @@ class Controller(object):
continue continue
response.append(resp) response.append(resp)
statuses.append(resp[0]) statuses.append(resp[0])
if self.have_quorum(statuses, len(start_nodes)): if self.have_quorum(statuses, node_number):
break break
# give any pending requests *some* chance to finish # give any pending requests *some* chance to finish
finished_quickly = pile.waitall(self.app.post_quorum_timeout) finished_quickly = pile.waitall(self.app.post_quorum_timeout)
@ -1637,7 +1643,7 @@ class Controller(object):
continue continue
response.append(resp) response.append(resp)
statuses.append(resp[0]) statuses.append(resp[0])
while len(response) < len(start_nodes): while len(response) < node_number:
response.append((HTTP_SERVICE_UNAVAILABLE, '', '', '')) response.append((HTTP_SERVICE_UNAVAILABLE, '', '', ''))
statuses, reasons, resp_headers, bodies = zip(*response) statuses, reasons, resp_headers, bodies = zip(*response)
return self.best_response(req, statuses, reasons, bodies, return self.best_response(req, statuses, reasons, bodies,
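The effect of the new node_count argument on response padding can be sketched in isolation. This is a simplified stand-in, not the proxy's code: ``pad_responses`` is a hypothetical name, and the tuples only mimic the (status, reason, headers, body) shape used in the hunk above.

```python
HTTP_SERVICE_UNAVAILABLE = 503


def pad_responses(responses, primary_count, node_count=None):
    """Pad responses with 503 placeholders up to the expected node number.

    Hypothetical helper mirroring `node_count or len(primaries)` from the
    diff. Note that a node_count of 0 is falsy and therefore falls back to
    the primary count as well.
    """
    node_number = node_count or primary_count
    padded = list(responses)
    while len(padded) < node_number:
        padded.append((HTTP_SERVICE_UNAVAILABLE, '', '', ''))
    return padded


# One real response, three primaries, two extra local handoffs requested.
padded = pad_responses([(204, 'No Content', '', '')], 3, node_count=5)
print([status for status, _, _, _ in padded])
```

Nodes that never answered are counted as 503 responses, so the aggregated result is always judged against the full number of nodes that were supposed to be contacted.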


@ -128,7 +128,8 @@ class BaseObjectController(Controller):
self.container_name = unquote(container_name) self.container_name = unquote(container_name)
self.object_name = unquote(object_name) self.object_name = unquote(object_name)
def iter_nodes_local_first(self, ring, partition, policy=None): def iter_nodes_local_first(self, ring, partition, policy=None,
local_handoffs_first=False):
""" """
Yields nodes for a ring partition. Yields nodes for a ring partition.
@ -141,6 +142,9 @@ class BaseObjectController(Controller):
:param ring: ring to get nodes from :param ring: ring to get nodes from
:param partition: ring partition to yield nodes for :param partition: ring partition to yield nodes for
:param policy: optional, an instance of :class:`BaseStoragePolicy`
:param local_handoffs_first: optional, if True prefer primaries and
local handoff nodes first before looking elsewhere.
""" """
policy_options = self.app.get_policy_options(policy) policy_options = self.app.get_policy_options(policy)
is_local = policy_options.write_affinity_is_local_fn is_local = policy_options.write_affinity_is_local_fn
@ -148,23 +152,38 @@ class BaseObjectController(Controller):
return self.app.iter_nodes(ring, partition, policy=policy) return self.app.iter_nodes(ring, partition, policy=policy)
primary_nodes = ring.get_part_nodes(partition) primary_nodes = ring.get_part_nodes(partition)
num_locals = policy_options.write_affinity_node_count_fn( handoff_nodes = ring.get_more_nodes(partition)
len(primary_nodes)) all_nodes = itertools.chain(primary_nodes, handoff_nodes)
all_nodes = itertools.chain(primary_nodes, if local_handoffs_first:
ring.get_more_nodes(partition)) num_locals = policy_options.write_affinity_handoff_delete_count
first_n_local_nodes = list(itertools.islice( if num_locals is None:
(node for node in all_nodes if is_local(node)), num_locals)) local_primaries = [node for node in primary_nodes
if is_local(node)]
num_locals = len(primary_nodes) - len(local_primaries)
# refresh it; it moved when we computed first_n_local_nodes first_local_handoffs = list(itertools.islice(
all_nodes = itertools.chain(primary_nodes, (node for node in handoff_nodes if is_local(node)), num_locals)
ring.get_more_nodes(partition)) )
local_first_node_iter = itertools.chain( preferred_nodes = primary_nodes + first_local_handoffs
first_n_local_nodes, else:
(node for node in all_nodes if node not in first_n_local_nodes)) num_locals = policy_options.write_affinity_node_count_fn(
len(primary_nodes)
)
preferred_nodes = list(itertools.islice(
(node for node in all_nodes if is_local(node)), num_locals)
)
# refresh it; it moved when we computed preferred_nodes
handoff_nodes = ring.get_more_nodes(partition)
all_nodes = itertools.chain(primary_nodes, handoff_nodes)
return self.app.iter_nodes( node_iter = itertools.chain(
ring, partition, node_iter=local_first_node_iter, policy=policy) preferred_nodes,
(node for node in all_nodes if node not in preferred_nodes)
)
return self.app.iter_nodes(ring, partition, node_iter=node_iter,
policy=policy)
def GETorHEAD(self, req): def GETorHEAD(self, req):
"""Handle HTTP GET or HEAD requests.""" """Handle HTTP GET or HEAD requests."""
@ -592,10 +611,12 @@ class BaseObjectController(Controller):
raise NotImplementedError() raise NotImplementedError()
def _delete_object(self, req, obj_ring, partition, headers): def _delete_object(self, req, obj_ring, partition, headers):
""" """Delete object considering write-affinity.
send object DELETE request to storage nodes. Subclasses of
the BaseObjectController can provide their own implementation When deleting an object in a write-affinity deployment, also take the configured
of this method. number of handoff nodes into consideration, instead of just sending
requests to primary nodes. Otherwise (write-affinity is disabled),
behave the same as before.
:param req: the DELETE Request :param req: the DELETE Request
:param obj_ring: the object ring :param obj_ring: the object ring
@ -603,11 +624,37 @@ class BaseObjectController(Controller):
:param headers: system headers to storage nodes :param headers: system headers to storage nodes
:return: Response object :return: Response object
""" """
# When deleting objects treat a 404 status as 204. policy_index = req.headers.get('X-Backend-Storage-Policy-Index')
policy = POLICIES.get_by_index(policy_index)
node_count = None
node_iterator = None
policy_options = self.app.get_policy_options(policy)
is_local = policy_options.write_affinity_is_local_fn
if is_local is not None:
primaries = obj_ring.get_part_nodes(partition)
node_count = len(primaries)
local_handoffs = policy_options.write_affinity_handoff_delete_count
if local_handoffs is None:
local_primaries = [node for node in primaries
if is_local(node)]
local_handoffs = len(primaries) - len(local_primaries)
node_count += local_handoffs
node_iterator = self.iter_nodes_local_first(
obj_ring, partition, policy=policy, local_handoffs_first=True
)
status_overrides = {404: 204} status_overrides = {404: 204}
resp = self.make_requests(req, obj_ring, resp = self.make_requests(req, obj_ring,
partition, 'DELETE', req.swift_entity_path, partition, 'DELETE', req.swift_entity_path,
headers, overrides=status_overrides) headers, overrides=status_overrides,
node_count=node_count,
node_iterator=node_iterator)
return resp return resp
def _post_object(self, req, obj_ring, partition, headers): def _post_object(self, req, obj_ring, partition, headers):
@ -734,8 +781,20 @@ class BaseObjectController(Controller):
else: else:
req.headers['X-Timestamp'] = Timestamp(time.time()).internal req.headers['X-Timestamp'] = Timestamp(time.time()).internal
# Include local handoff nodes if write-affinity is enabled.
node_count = len(nodes)
policy = POLICIES.get_by_index(policy_index)
policy_options = self.app.get_policy_options(policy)
is_local = policy_options.write_affinity_is_local_fn
if is_local is not None:
local_handoffs = policy_options.write_affinity_handoff_delete_count
if local_handoffs is None:
local_primaries = [node for node in nodes if is_local(node)]
local_handoffs = len(nodes) - len(local_primaries)
node_count += local_handoffs
headers = self._backend_requests( headers = self._backend_requests(
req, len(nodes), container_partition, container_nodes) req, node_count, container_partition, container_nodes)
return self._delete_object(req, obj_ring, partition, headers) return self._delete_object(req, obj_ring, partition, headers)
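The node ordering that local_handoffs_first=True is meant to produce can be sketched without a ring. This is a minimal illustration under stated assumptions: the function name, the plain lists, and the node dicts are hypothetical, and the real method additionally passes its result through self.app.iter_nodes.

```python
import itertools


def order_local_handoffs_first(primary_nodes, handoff_nodes, is_local,
                               handoff_delete_count=None):
    """Order nodes as: all primaries, N local handoffs, then the rest.

    Hypothetical stand-in for the local_handoffs_first branch of
    iter_nodes_local_first; None plays the role of the 'auto' setting.
    """
    handoff_nodes = list(handoff_nodes)
    if handoff_delete_count is None:
        local_primaries = [n for n in primary_nodes if is_local(n)]
        handoff_delete_count = len(primary_nodes) - len(local_primaries)
    # Take only the first N handoffs that live in the local region.
    first_local_handoffs = list(itertools.islice(
        (n for n in handoff_nodes if is_local(n)), handoff_delete_count))
    preferred = list(primary_nodes) + first_local_handoffs
    # Remaining handoffs keep their original relative order.
    rest = [n for n in handoff_nodes if n not in preferred]
    return preferred + rest


def is_region_1(node):
    return node['region'] == 1


primaries = [{'id': 0, 'region': 1}, {'id': 1, 'region': 2},
             {'id': 2, 'region': 2}]
handoffs = [{'id': 3, 'region': 2}, {'id': 4, 'region': 1},
            {'id': 5, 'region': 1}, {'id': 6, 'region': 2},
            {'id': 7, 'region': 1}]

auto_order = [n['id'] for n in order_local_handoffs_first(
    primaries, handoffs, is_region_1)]
one_handoff = [n['id'] for n in order_local_handoffs_first(
    primaries, handoffs, is_region_1, handoff_delete_count=1)]
print(auto_order)
print(one_handoff)
```

In the 'auto' case the two local handoffs (ids 4 and 5) move directly behind the primaries, so a DELETE that contacts replicas + 2 nodes reaches every place the object may still live before replication.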


@ -35,7 +35,7 @@ from swift.common.ring import Ring
from swift.common.utils import cache_from_env, get_logger, \ from swift.common.utils import cache_from_env, get_logger, \
get_remote_client, split_path, config_true_value, generate_trans_id, \ get_remote_client, split_path, config_true_value, generate_trans_id, \
affinity_key_function, affinity_locality_predicate, list_from_csv, \ affinity_key_function, affinity_locality_predicate, list_from_csv, \
register_swift_info, readconf register_swift_info, readconf, config_auto_int_value
from swift.common.constraints import check_utf8, valid_api_version from swift.common.constraints import check_utf8, valid_api_version
from swift.proxy.controllers import AccountController, ContainerController, \ from swift.proxy.controllers import AccountController, ContainerController, \
ObjectControllerRouter, InfoController ObjectControllerRouter, InfoController
@ -130,13 +130,18 @@ class ProxyOverrideOptions(object):
'Invalid write_affinity_node_count value: %r' % 'Invalid write_affinity_node_count value: %r' %
(' '.join(value))) (' '.join(value)))
self.write_affinity_handoff_delete_count = config_auto_int_value(
get('write_affinity_handoff_delete_count', 'auto'), None
)
def __repr__(self): def __repr__(self):
return '%s({}, {%s})' % (self.__class__.__name__, ', '.join( return '%s({}, {%s})' % (self.__class__.__name__, ', '.join(
'%r: %r' % (k, getattr(self, k)) for k in ( '%r: %r' % (k, getattr(self, k)) for k in (
'sorting_method', 'sorting_method',
'read_affinity', 'read_affinity',
'write_affinity', 'write_affinity',
'write_affinity_node_count'))) 'write_affinity_node_count',
'write_affinity_handoff_delete_count')))
def __eq__(self, other): def __eq__(self, other):
if not isinstance(other, ProxyOverrideOptions): if not isinstance(other, ProxyOverrideOptions):
@ -145,7 +150,8 @@ class ProxyOverrideOptions(object):
'sorting_method', 'sorting_method',
'read_affinity', 'read_affinity',
'write_affinity', 'write_affinity',
'write_affinity_node_count')) 'write_affinity_node_count',
'write_affinity_handoff_delete_count'))
class Application(object): class Application(object):
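How the option value 'auto' becomes None in ProxyOverrideOptions can be sketched with a small stand-in parser. ``auto_int_value`` below is a hypothetical re-implementation written for illustration; it is assumed to behave like the imported config_auto_int_value helper but is not copied from Swift.

```python
def auto_int_value(value, default):
    """Return default for None or 'auto', otherwise the value as an int.

    Hypothetical sketch; raises ValueError for anything that is neither
    'auto' nor an integer.
    """
    if value is None or (isinstance(value, str) and value.lower() == 'auto'):
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        raise ValueError(
            'Config option must be an integer or the string "auto", '
            'not %r' % (value,))


# 'auto' maps to None, which later triggers the automatic calculation
# replicas - len(local_primary_nodes); an explicit number is used as is.
print(auto_int_value('auto', None))
print(auto_int_value('2', None))
```

Keeping None as the sentinel lets the controllers distinguish "not configured" from an explicit count, including an explicit 0.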


@ -279,6 +279,86 @@ class BaseObjectControllerMixin(object):
self.assertEqual(len(all_nodes), len(local_first_nodes)) self.assertEqual(len(all_nodes), len(local_first_nodes))
self.assertEqual(sorted(all_nodes), sorted(local_first_nodes)) self.assertEqual(sorted(all_nodes), sorted(local_first_nodes))
def test_iter_nodes_local_handoff_first_noops_when_no_affinity(self):
# this test needs a stable node order - most don't
self.app.sort_nodes = lambda l, *args, **kwargs: l
controller = self.controller_cls(
self.app, 'a', 'c', 'o')
policy = self.policy
self.app.get_policy_options(policy).write_affinity_is_local_fn = None
object_ring = policy.object_ring
all_nodes = object_ring.get_part_nodes(1)
all_nodes.extend(object_ring.get_more_nodes(1))
local_first_nodes = list(controller.iter_nodes_local_first(
object_ring, 1, local_handoffs_first=True))
self.maxDiff = None
self.assertEqual(all_nodes, local_first_nodes)
def test_iter_nodes_handoff_local_first_default(self):
controller = self.controller_cls(
self.app, 'a', 'c', 'o')
policy_conf = self.app.get_policy_options(self.policy)
policy_conf.write_affinity_is_local_fn = (
lambda node: node['region'] == 1)
object_ring = self.policy.object_ring
primary_nodes = object_ring.get_part_nodes(1)
handoff_nodes_iter = object_ring.get_more_nodes(1)
all_nodes = primary_nodes + list(handoff_nodes_iter)
handoff_nodes_iter = object_ring.get_more_nodes(1)
local_handoffs = [n for n in handoff_nodes_iter if
policy_conf.write_affinity_is_local_fn(n)]
prefered_nodes = list(controller.iter_nodes_local_first(
object_ring, 1, local_handoffs_first=True))
self.assertEqual(len(all_nodes), self.replicas() +
POLICIES.default.object_ring.max_more_nodes)
first_primary_nodes = prefered_nodes[:len(primary_nodes)]
self.assertEqual(sorted(primary_nodes), sorted(first_primary_nodes))
handoff_count = self.replicas() - len(primary_nodes)
first_handoffs = prefered_nodes[len(primary_nodes):][:handoff_count]
self.assertEqual(first_handoffs, local_handoffs[:handoff_count])
def test_iter_nodes_handoff_local_first_non_default(self):
# Obviously this test doesn't work if we're testing 1 replica.
# In that case, we don't have any failovers to check.
if self.replicas() == 1:
return
controller = self.controller_cls(
self.app, 'a', 'c', 'o')
policy_conf = self.app.get_policy_options(self.policy)
policy_conf.write_affinity_is_local_fn = (
lambda node: node['region'] == 1)
policy_conf.write_affinity_handoff_delete_count = 1
object_ring = self.policy.object_ring
primary_nodes = object_ring.get_part_nodes(1)
handoff_nodes_iter = object_ring.get_more_nodes(1)
all_nodes = primary_nodes + list(handoff_nodes_iter)
handoff_nodes_iter = object_ring.get_more_nodes(1)
local_handoffs = [n for n in handoff_nodes_iter if
policy_conf.write_affinity_is_local_fn(n)]
prefered_nodes = list(controller.iter_nodes_local_first(
object_ring, 1, local_handoffs_first=True))
self.assertEqual(len(all_nodes), self.replicas() +
POLICIES.default.object_ring.max_more_nodes)
first_primary_nodes = prefered_nodes[:len(primary_nodes)]
self.assertEqual(sorted(primary_nodes), sorted(first_primary_nodes))
handoff_count = policy_conf.write_affinity_handoff_delete_count
first_handoffs = prefered_nodes[len(primary_nodes):][:handoff_count]
self.assertEqual(first_handoffs, local_handoffs[:handoff_count])
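
Reviewer note: the two `handoff_local_first` tests above assert that, with `local_handoffs_first=True`, the iterator yields the primaries first and then the first `handoff_count` local handoffs in ring order. A minimal standalone sketch of that ordering follows. It is not Swift's implementation: the helper name `local_handoffs_first`, the `'id'` key, and the toy node dicts are assumptions made for illustration, and the primaries are left in ring order here (the tests only compare them after sorting).

```python
from itertools import islice


def local_handoffs_first(primaries, handoffs, is_local, handoff_count):
    """Order nodes as: primaries, chosen local handoffs, remaining handoffs."""
    handoffs = list(handoffs)
    # Take the first `handoff_count` handoffs that the affinity function
    # considers local, preserving their ring order.
    chosen = list(islice((n for n in handoffs if is_local(n)), handoff_count))
    chosen_ids = {n['id'] for n in chosen}
    # Every other handoff keeps its original relative order.
    rest = [n for n in handoffs if n['id'] not in chosen_ids]
    return list(primaries) + chosen + rest


# Hypothetical ring contents: three primaries and four handoffs.
primaries = [{'id': 0, 'region': 0}, {'id': 1, 'region': 1},
             {'id': 2, 'region': 0}]
handoffs = [{'id': 3, 'region': 0}, {'id': 4, 'region': 1},
            {'id': 5, 'region': 1}, {'id': 6, 'region': 0}]
ordered = local_handoffs_first(
    primaries, handoffs, lambda node: node['region'] == 1, handoff_count=1)
ordered_ids = [n['id'] for n in ordered]
```

With `handoff_count=1`, only the first region-1 handoff is promoted ahead of the others, which mirrors the `write_affinity_handoff_delete_count = 1` case in the non-default test.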
def test_connect_put_node_timeout(self): def test_connect_put_node_timeout(self):
controller = self.controller_cls( controller = self.controller_cls(
self.app, 'a', 'c', 'o') self.app, 'a', 'c', 'o')
@ -369,6 +449,36 @@ class BaseObjectControllerMixin(object):
resp = req.get_response(self.app) resp = req.get_response(self.app)
self.assertEqual(resp.status_int, 204) self.assertEqual(resp.status_int, 204)
def test_DELETE_write_affinity_before_replication(self):
policy_conf = self.app.get_policy_options(self.policy)
        policy_conf.write_affinity_handoff_delete_count = self.replicas() // 2
policy_conf.write_affinity_is_local_fn = (
lambda node: node['region'] == 1)
handoff_count = policy_conf.write_affinity_handoff_delete_count
req = swift.common.swob.Request.blank('/v1/a/c/o', method='DELETE')
codes = [204] * self.replicas() + [404] * handoff_count
with set_http_connect(*codes):
resp = req.get_response(self.app)
self.assertEqual(resp.status_int, 204)
def test_DELETE_write_affinity_after_replication(self):
policy_conf = self.app.get_policy_options(self.policy)
        policy_conf.write_affinity_handoff_delete_count = self.replicas() // 2
policy_conf.write_affinity_is_local_fn = (
lambda node: node['region'] == 1)
handoff_count = policy_conf.write_affinity_handoff_delete_count
req = swift.common.swob.Request.blank('/v1/a/c/o', method='DELETE')
codes = ([204] * (self.replicas() - handoff_count) +
[404] * handoff_count +
[204] * handoff_count)
with set_http_connect(*codes):
resp = req.get_response(self.app)
self.assertEqual(resp.status_int, 204)
def test_POST_non_int_delete_after(self): def test_POST_non_int_delete_after(self):
t = str(int(time.time() + 100)) + '.1' t = str(int(time.time() + 100)) + '.1'
req = swob.Request.blank('/v1/a/c/o', method='POST', req = swob.Request.blank('/v1/a/c/o', method='POST',

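Reviewer note: the expectations in the config-loading tests in this commit imply a simple lookup order for `write_affinity_handoff_delete_count`: a value in `[proxy-server:policy:N]` wins, otherwise the `[proxy-server]` value applies, otherwise the option is `None`. The sketch below illustrates that fallback with the standard-library `configparser`. It is an assumption-laden illustration, not the proxy's actual loader: the function name `handoff_delete_count` is hypothetical, and interactions with a `[DEFAULT]` section are not modeled.

```python
import configparser

OPTION = 'write_affinity_handoff_delete_count'


def handoff_delete_count(conf_text, policy_index=None):
    """Resolve the option for a policy, falling back to [proxy-server]."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    sections = ['proxy-server']
    if policy_index is not None:
        # A per-policy section, when present, is consulted first.
        sections.insert(0, 'proxy-server:policy:%d' % policy_index)
    for section in sections:
        if parser.has_section(section) and parser.has_option(section, OPTION):
            return parser.getint(section, OPTION)
    # Unset everywhere: the tests expect None in this case.
    return None


# Option set only in [proxy-server]: every policy inherits it.
inherited_conf = """
[proxy-server]
use = egg:swift#proxy
write_affinity_handoff_delete_count = 3

[proxy-server:policy:0]
read_affinity = r1=100
"""

# Option set only for policy 0: the default and other policies stay None.
policy_only_conf = """
[proxy-server]
use = egg:swift#proxy

[proxy-server:policy:0]
write_affinity = r1
write_affinity_handoff_delete_count = 4
"""
```

For example, `handoff_delete_count(inherited_conf, 0)` resolves to the `[proxy-server]` value, while `handoff_delete_count(policy_only_conf, 1)` finds nothing and returns `None`.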
@ -1366,16 +1366,19 @@ class TestProxyServerConfigLoading(unittest.TestCase):
read_affinity = r1=100 read_affinity = r1=100
write_affinity = r1 write_affinity = r1
write_affinity_node_count = 1 * replicas write_affinity_node_count = 1 * replicas
write_affinity_handoff_delete_count = 4
""" """
expected_default = {"read_affinity": "", expected_default = {"read_affinity": "",
"sorting_method": "shuffle", "sorting_method": "shuffle",
"write_affinity": "", "write_affinity": "",
"write_affinity_node_count_fn": 6} "write_affinity_node_count_fn": 6,
"write_affinity_handoff_delete_count": None}
exp_options = {None: expected_default, exp_options = {None: expected_default,
POLICIES[0]: {"read_affinity": "r1=100", POLICIES[0]: {"read_affinity": "r1=100",
"sorting_method": "affinity", "sorting_method": "affinity",
"write_affinity": "r1", "write_affinity": "r1",
"write_affinity_node_count_fn": 3}, "write_affinity_node_count_fn": 3,
"write_affinity_handoff_delete_count": 4},
POLICIES[1]: expected_default} POLICIES[1]: expected_default}
exp_is_local = {POLICIES[0]: [({'region': 1, 'zone': 2}, True), exp_is_local = {POLICIES[0]: [({'region': 1, 'zone': 2}, True),
({'region': 2, 'zone': 1}, False)], ({'region': 2, 'zone': 1}, False)],
@ -1387,7 +1390,8 @@ class TestProxyServerConfigLoading(unittest.TestCase):
self.assertEqual( self.assertEqual(
"ProxyOverrideOptions({}, {'sorting_method': 'shuffle', " "ProxyOverrideOptions({}, {'sorting_method': 'shuffle', "
"'read_affinity': '', 'write_affinity': '', " "'read_affinity': '', 'write_affinity': '', "
"'write_affinity_node_count': '2 * replicas'})", "'write_affinity_node_count': '2 * replicas', "
"'write_affinity_handoff_delete_count': None})",
repr(default_options)) repr(default_options))
self.assertEqual(default_options, eval(repr(default_options), { self.assertEqual(default_options, eval(repr(default_options), {
'ProxyOverrideOptions': default_options.__class__})) 'ProxyOverrideOptions': default_options.__class__}))
@ -1396,7 +1400,8 @@ class TestProxyServerConfigLoading(unittest.TestCase):
self.assertEqual( self.assertEqual(
"ProxyOverrideOptions({}, {'sorting_method': 'affinity', " "ProxyOverrideOptions({}, {'sorting_method': 'affinity', "
"'read_affinity': 'r1=100', 'write_affinity': 'r1', " "'read_affinity': 'r1=100', 'write_affinity': 'r1', "
"'write_affinity_node_count': '1 * replicas'})", "'write_affinity_node_count': '1 * replicas', "
"'write_affinity_handoff_delete_count': 4})",
repr(policy_0_options)) repr(policy_0_options))
self.assertEqual(policy_0_options, eval(repr(policy_0_options), { self.assertEqual(policy_0_options, eval(repr(policy_0_options), {
'ProxyOverrideOptions': policy_0_options.__class__})) 'ProxyOverrideOptions': policy_0_options.__class__}))
@ -1411,6 +1416,7 @@ class TestProxyServerConfigLoading(unittest.TestCase):
use = egg:swift#proxy use = egg:swift#proxy
sorting_method = affinity sorting_method = affinity
write_affinity_node_count = 1 * replicas write_affinity_node_count = 1 * replicas
write_affinity_handoff_delete_count = 3
[proxy-server:policy:0] [proxy-server:policy:0]
read_affinity = r1=100 read_affinity = r1=100
@ -1419,12 +1425,14 @@ class TestProxyServerConfigLoading(unittest.TestCase):
expected_default = {"read_affinity": "", expected_default = {"read_affinity": "",
"sorting_method": "affinity", "sorting_method": "affinity",
"write_affinity": "", "write_affinity": "",
"write_affinity_node_count_fn": 3} "write_affinity_node_count_fn": 3,
"write_affinity_handoff_delete_count": 3}
exp_options = {None: expected_default, exp_options = {None: expected_default,
POLICIES[0]: {"read_affinity": "r1=100", POLICIES[0]: {"read_affinity": "r1=100",
"sorting_method": "affinity", "sorting_method": "affinity",
"write_affinity": "r1", "write_affinity": "r1",
"write_affinity_node_count_fn": 3}, "write_affinity_node_count_fn": 3,
"write_affinity_handoff_delete_count": 3},
POLICIES[1]: expected_default} POLICIES[1]: expected_default}
exp_is_local = {POLICIES[0]: [({'region': 1, 'zone': 2}, True), exp_is_local = {POLICIES[0]: [({'region': 1, 'zone': 2}, True),
({'region': 2, 'zone': 1}, False)], ({'region': 2, 'zone': 1}, False)],
@ -1440,29 +1448,35 @@ class TestProxyServerConfigLoading(unittest.TestCase):
read_affinity = r2=10 read_affinity = r2=10
write_affinity_node_count = 1 * replicas write_affinity_node_count = 1 * replicas
write_affinity = r2 write_affinity = r2
write_affinity_handoff_delete_count = 2
[proxy-server:policy:0] [proxy-server:policy:0]
read_affinity = r1=100 read_affinity = r1=100
write_affinity = r1 write_affinity = r1
write_affinity_node_count = 5 write_affinity_node_count = 5
write_affinity_handoff_delete_count = 3
[proxy-server:policy:1] [proxy-server:policy:1]
read_affinity = r1=1 read_affinity = r1=1
write_affinity = r3 write_affinity = r3
write_affinity_node_count = 4 write_affinity_node_count = 4
write_affinity_handoff_delete_count = 4
""" """
exp_options = {None: {"read_affinity": "r2=10", exp_options = {None: {"read_affinity": "r2=10",
"sorting_method": "affinity", "sorting_method": "affinity",
"write_affinity": "r2", "write_affinity": "r2",
"write_affinity_node_count_fn": 3}, "write_affinity_node_count_fn": 3,
"write_affinity_handoff_delete_count": 2},
POLICIES[0]: {"read_affinity": "r1=100", POLICIES[0]: {"read_affinity": "r1=100",
"sorting_method": "affinity", "sorting_method": "affinity",
"write_affinity": "r1", "write_affinity": "r1",
"write_affinity_node_count_fn": 5}, "write_affinity_node_count_fn": 5,
"write_affinity_handoff_delete_count": 3},
POLICIES[1]: {"read_affinity": "r1=1", POLICIES[1]: {"read_affinity": "r1=1",
"sorting_method": "affinity", "sorting_method": "affinity",
"write_affinity": "r3", "write_affinity": "r3",
"write_affinity_node_count_fn": 4}} "write_affinity_node_count_fn": 4,
"write_affinity_handoff_delete_count": 4}}
exp_is_local = {POLICIES[0]: [({'region': 1, 'zone': 2}, True), exp_is_local = {POLICIES[0]: [({'region': 1, 'zone': 2}, True),
({'region': 2, 'zone': 1}, False)], ({'region': 2, 'zone': 1}, False)],
POLICIES[1]: [({'region': 3, 'zone': 2}, True), POLICIES[1]: [({'region': 3, 'zone': 2}, True),
@ -1533,18 +1547,21 @@ class TestProxyServerConfigLoading(unittest.TestCase):
None: {"read_affinity": "r1=100", None: {"read_affinity": "r1=100",
"sorting_method": "shuffle", "sorting_method": "shuffle",
"write_affinity": "r0", "write_affinity": "r0",
"write_affinity_node_count_fn": 6}, "write_affinity_node_count_fn": 6,
"write_affinity_handoff_delete_count": None},
# policy 0 read affinity is r2, dictated by policy 0 section # policy 0 read affinity is r2, dictated by policy 0 section
POLICIES[0]: {"read_affinity": "r2=100", POLICIES[0]: {"read_affinity": "r2=100",
"sorting_method": "affinity", "sorting_method": "affinity",
"write_affinity": "r2", "write_affinity": "r2",
"write_affinity_node_count_fn": 6}, "write_affinity_node_count_fn": 6,
"write_affinity_handoff_delete_count": None},
# policy 1 read_affinity is r0, dictated by DEFAULT section, # policy 1 read_affinity is r0, dictated by DEFAULT section,
# overrides proxy server section # overrides proxy server section
POLICIES[1]: {"read_affinity": "r0=100", POLICIES[1]: {"read_affinity": "r0=100",
"sorting_method": "affinity", "sorting_method": "affinity",
"write_affinity": "r0", "write_affinity": "r0",
"write_affinity_node_count_fn": 6}} "write_affinity_node_count_fn": 6,
"write_affinity_handoff_delete_count": None}}
exp_is_local = { exp_is_local = {
# default write_affinity is r0, dictated by DEFAULT section # default write_affinity is r0, dictated by DEFAULT section
None: [({'region': 0, 'zone': 2}, True), None: [({'region': 0, 'zone': 2}, True),