Foundational support for PUT and GET of erasure-coded objects

This commit makes it possible to PUT an object into Swift and have it
stored using erasure coding instead of replication, and also to GET
the object back from Swift at a later time.

This works by splitting the incoming object into a number of segments,
erasure-coding each segment in turn to get fragments, then
concatenating the fragments into fragment archives. Segments are 1 MiB
in size, except the last, which is between 1 B and 1 MiB.

+====================================================================+
|                             object data                            |
+====================================================================+

                                   |
          +------------------------+----------------------+
          |                        |                      |
          v                        v                      v

+===================+    +===================+         +==============+
|     segment 1     |    |     segment 2     |   ...   |   segment N  |
+===================+    +===================+         +==============+

          |                       |
          |                       |
          v                       v

     /=========\             /=========\
     | pyeclib |             | pyeclib |         ...
     \=========/             \=========/

          |                       |
          |                       |
          +--> fragment A-1       +--> fragment A-2
          |                       |
          |                       |
          |                       |
          |                       |
          |                       |
          +--> fragment B-1       +--> fragment B-2
          |                       |
          |                       |
         ...                     ...

Then, object server A gets the concatenation of fragment A-1, A-2,
..., A-N, so its .data file looks like this (called a "fragment archive"):

+=====================================================================+
|     fragment A-1     |     fragment A-2     |  ...  |  fragment A-N |
+=====================================================================+

Since this means that the object server never sees the object data as
the client sent it, we have to do a few things to ensure data
integrity.

First, the proxy has to check the Etag if the client provided it; the
object server can't do it since the object server doesn't see the raw
data.

Second, if the client does not provide an Etag, the proxy computes it
and uses the MIME-PUT mechanism to provide it to the object servers
after the object body. Otherwise, the object would not have an Etag at
all.

Third, the proxy computes the MD5 of each fragment archive and sends
it to the object server using the MIME-PUT mechanism. With replicated
objects, the proxy checks that the Etags from all the object servers
match, and if they don't, returns a 500 to the client. This mitigates
the risk of data corruption in one of the proxy --> object connections,
and signals to the client when it happens. With EC objects, we can't
use that same mechanism, so we must send the checksum with each
fragment archive to get comparable protection.

On the GET path, the inverse happens: the proxy connects to a bunch of
object servers (M of them, for an M+K scheme), reads one fragment at a
time from each fragment archive, decodes those fragments into a
segment, and serves the segment to the client.

When an object server dies partway through a GET response, any
partially-fetched fragment is discarded, the resumption point is wound
back to the nearest fragment boundary, and the GET is retried with the
next object server.

GET requests for a single byterange work; GET requests for multiple
byteranges do not.

There are a number of things _not_ included in this commit. Some of
them are listed here:

 * multi-range GET

 * deferred cleanup of old .data files

 * durability (daemon to reconstruct missing archives)

Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I9c13c03616489f8eab7dcd7c5f21237ed4cb6fd2
This commit is contained in:
Samuel Merritt 2014-10-22 13:18:34 -07:00 committed by Clay Gerrard
parent b1eda4aef8
commit decbcd24d4
19 changed files with 3882 additions and 557 deletions

View File

@ -31,10 +31,28 @@ class SwiftException(Exception):
pass
class PutterConnectError(Exception):
def __init__(self, status=None):
self.status = status
class InvalidTimestamp(SwiftException):
pass
class InsufficientStorage(SwiftException):
pass
class FooterNotSupported(SwiftException):
pass
class MultiphasePUTNotSupported(SwiftException):
pass
class DiskFileError(SwiftException):
pass
@ -103,6 +121,10 @@ class ConnectionTimeout(Timeout):
pass
class ResponseTimeout(Timeout):
pass
class DriveNotMounted(SwiftException):
pass

View File

@ -218,7 +218,14 @@ class FormPost(object):
env, attrs['boundary'])
start_response(status, headers)
return [body]
except (FormInvalid, MimeInvalid, EOFError) as err:
except MimeInvalid:
body = 'FormPost: invalid starting boundary'
start_response(
'400 Bad Request',
(('Content-Type', 'text/plain'),
('Content-Length', str(len(body)))))
return [body]
except (FormInvalid, EOFError) as err:
body = 'FormPost: %s' % err
start_response(
'400 Bad Request',

View File

@ -243,7 +243,7 @@ class Ring(object):
if dev_id not in seen_ids:
part_nodes.append(self.devs[dev_id])
seen_ids.add(dev_id)
return part_nodes
return [dict(node, index=i) for i, node in enumerate(part_nodes)]
def get_part(self, account, container=None, obj=None):
"""
@ -291,6 +291,7 @@ class Ring(object):
====== ===============================================================
id unique integer identifier amongst devices
index offset into the primary node list for the partition
weight a float of the relative weight of this device as compared to
others; this indicates how many partitions the builder will try
to assign to this device

View File

@ -356,6 +356,36 @@ class ECStoragePolicy(BaseStoragePolicy):
def ec_segment_size(self):
return self._ec_segment_size
@property
def fragment_size(self):
"""
Maximum length of a fragment, including header.
NB: a fragment archive is a sequence of 0 or more max-length
fragments followed by one possibly-shorter fragment.
"""
# Technically pyeclib's get_segment_info signature calls for
# (data_len, segment_size) but on a ranged GET we don't know the
# ec-content-length header before we need to compute where in the
# object we should request to align with the fragment size. So we
# tell pyeclib a lie - from it's perspective, as long as data_len >=
# segment_size it'll give us the answer we want. From our
# perspective, because we only use this answer to calculate the
# *minimum* size we should read from an object body even if data_len <
# segment_size we'll still only read *the whole one and only last
# fragment* and pass than into pyeclib who will know what to do with
# it just as it always does when the last fragment is < fragment_size.
return self.pyeclib_driver.get_segment_info(
self.ec_segment_size, self.ec_segment_size)['fragment_size']
@property
def ec_scheme_description(self):
"""
This short hand form of the important parts of the ec schema is stored
in Object System Metadata on the EC Fragment Archives for debugging.
"""
return "%s %d+%d" % (self._ec_type, self._ec_ndata, self._ec_nparity)
def __repr__(self):
return ("%s, EC config(ec_type=%s, ec_segment_size=%d, "
"ec_ndata=%d, ec_nparity=%d)") % (

View File

@ -2236,11 +2236,16 @@ class GreenAsyncPile(object):
Correlating results with jobs (if necessary) is left to the caller.
"""
def __init__(self, size):
def __init__(self, size_or_pool):
"""
:param size: size pool of green threads to use
:param size_or_pool: thread pool size or a pool to use
"""
self._pool = GreenPool(size)
if isinstance(size_or_pool, GreenPool):
self._pool = size_or_pool
size = self._pool.size
else:
self._pool = GreenPool(size_or_pool)
size = size_or_pool
self._responses = eventlet.queue.LightQueue(size)
self._inflight = 0
@ -2646,6 +2651,10 @@ def public(func):
def quorum_size(n):
"""
quorum size as it applies to services that use 'replication' for data
integrity (Account/Container services). Object quorum_size is defined
on a storage policy basis.
Number of successful backend requests needed for the proxy to consider
the client request successful.
"""
@ -3139,6 +3148,26 @@ _rfc_extension_pattern = re.compile(
r'(?:\s*;\s*(' + _rfc_token + r")\s*(?:=\s*(" + _rfc_token +
r'|"(?:[^"\\]|\\.)*"))?)')
_content_range_pattern = re.compile(r'^bytes (\d+)-(\d+)/(\d+)$')
def parse_content_range(content_range):
"""
Parse a content-range header into (first_byte, last_byte, total_size).
See RFC 7233 section 4.2 for details on the header format, but it's
basically "Content-Range: bytes ${start}-${end}/${total}".
:param content_range: Content-Range header value to parse,
e.g. "bytes 100-1249/49004"
:returns: 3-tuple (start, end, total)
:raises: ValueError if malformed
"""
found = re.search(_content_range_pattern, content_range)
if not found:
raise ValueError("malformed Content-Range %r" % (content_range,))
return tuple(int(x) for x in found.groups())
def parse_content_type(content_type):
"""
@ -3293,8 +3322,11 @@ def iter_multipart_mime_documents(wsgi_input, boundary, read_chunk_size=4096):
:raises: MimeInvalid if the document is malformed
"""
boundary = '--' + boundary
if wsgi_input.readline(len(boundary + '\r\n')).strip() != boundary:
raise swift.common.exceptions.MimeInvalid('invalid starting boundary')
blen = len(boundary) + 2 # \r\n
got = wsgi_input.readline(blen)
if got.strip() != boundary:
raise swift.common.exceptions.MimeInvalid(
'invalid starting boundary: wanted %r, got %r', (boundary, got))
boundary = '\r\n' + boundary
input_buffer = ''
done = False

View File

@ -530,6 +530,11 @@ class DiskFileRouter(object):
their DiskFile implementation.
"""
def register_wrapper(diskfile_cls):
if policy_type in cls.policy_type_to_manager_cls:
raise PolicyError(
'%r is already registered for the policy_type %r' % (
cls.policy_type_to_manager_cls[policy_type],
policy_type))
cls.policy_type_to_manager_cls[policy_type] = diskfile_cls
return diskfile_cls
return register_wrapper

View File

@ -13,7 +13,7 @@
from swift.proxy.controllers.base import Controller
from swift.proxy.controllers.info import InfoController
from swift.proxy.controllers.obj import ObjectController
from swift.proxy.controllers.obj import ObjectControllerRouter
from swift.proxy.controllers.account import AccountController
from swift.proxy.controllers.container import ContainerController
@ -22,5 +22,5 @@ __all__ = [
'ContainerController',
'Controller',
'InfoController',
'ObjectController',
'ObjectControllerRouter',
]

View File

@ -58,9 +58,10 @@ class AccountController(Controller):
constraints.MAX_ACCOUNT_NAME_LENGTH)
return resp
partition, nodes = self.app.account_ring.get_nodes(self.account_name)
partition = self.app.account_ring.get_part(self.account_name)
node_iter = self.app.iter_nodes(self.app.account_ring, partition)
resp = self.GETorHEAD_base(
req, _('Account'), self.app.account_ring, partition,
req, _('Account'), node_iter, partition,
req.swift_entity_path.rstrip('/'))
if resp.status_int == HTTP_NOT_FOUND:
if resp.headers.get('X-Account-Status', '').lower() == 'deleted':

View File

@ -28,6 +28,7 @@ import os
import time
import functools
import inspect
import logging
import operator
from sys import exc_info
from swift import gettext_ as _
@ -39,14 +40,14 @@ from eventlet.timeout import Timeout
from swift.common.wsgi import make_pre_authed_env
from swift.common.utils import Timestamp, config_true_value, \
public, split_path, list_from_csv, GreenthreadSafeIterator, \
quorum_size, GreenAsyncPile
GreenAsyncPile, quorum_size, parse_content_range
from swift.common.bufferedhttp import http_connect
from swift.common.exceptions import ChunkReadTimeout, ChunkWriteTimeout, \
ConnectionTimeout
from swift.common.http import is_informational, is_success, is_redirection, \
is_server_error, HTTP_OK, HTTP_PARTIAL_CONTENT, HTTP_MULTIPLE_CHOICES, \
HTTP_BAD_REQUEST, HTTP_NOT_FOUND, HTTP_SERVICE_UNAVAILABLE, \
HTTP_INSUFFICIENT_STORAGE, HTTP_UNAUTHORIZED
HTTP_INSUFFICIENT_STORAGE, HTTP_UNAUTHORIZED, HTTP_CONTINUE
from swift.common.swob import Request, Response, HeaderKeyDict, Range, \
HTTPException, HTTPRequestedRangeNotSatisfiable
from swift.common.request_helpers import strip_sys_meta_prefix, \
@ -593,16 +594,37 @@ def close_swift_conn(src):
pass
def bytes_to_skip(record_size, range_start):
"""
Assume an object is composed of N records, where the first N-1 are all
the same size and the last is at most that large, but may be smaller.
When a range request is made, it might start with a partial record. This
must be discarded, lest the consumer get bad data. This is particularly
true of suffix-byte-range requests, e.g. "Range: bytes=-12345" where the
size of the object is unknown at the time the request is made.
This function computes the number of bytes that must be discarded to
ensure only whole records are yielded. Erasure-code decoding needs this.
This function could have been inlined, but it took enough tries to get
right that some targeted unit tests were desirable, hence its extraction.
"""
return (record_size - (range_start % record_size)) % record_size
class GetOrHeadHandler(object):
def __init__(self, app, req, server_type, ring, partition, path,
backend_headers):
def __init__(self, app, req, server_type, node_iter, partition, path,
backend_headers, client_chunk_size=None):
self.app = app
self.ring = ring
self.node_iter = node_iter
self.server_type = server_type
self.partition = partition
self.path = path
self.backend_headers = backend_headers
self.client_chunk_size = client_chunk_size
self.skip_bytes = 0
self.used_nodes = []
self.used_source_etag = ''
@ -649,6 +671,35 @@ class GetOrHeadHandler(object):
else:
self.backend_headers['Range'] = 'bytes=%d-' % num_bytes
def learn_size_from_content_range(self, start, end):
"""
If client_chunk_size is set, makes sure we yield things starting on
chunk boundaries based on the Content-Range header in the response.
Sets our first Range header to the value learned from the
Content-Range header in the response; if we were given a
fully-specified range (e.g. "bytes=123-456"), this is a no-op.
If we were given a half-specified range (e.g. "bytes=123-" or
"bytes=-456"), then this changes the Range header to a
semantically-equivalent one *and* it lets us resume on a proper
boundary instead of just in the middle of a piece somewhere.
If the original request is for more than one range, this does not
affect our backend Range header, since we don't support resuming one
of those anyway.
"""
if self.client_chunk_size:
self.skip_bytes = bytes_to_skip(self.client_chunk_size, start)
if 'Range' in self.backend_headers:
req_range = Range(self.backend_headers['Range'])
if len(req_range.ranges) > 1:
return
self.backend_headers['Range'] = "bytes=%d-%d" % (start, end)
def is_good_source(self, src):
"""
Indicates whether or not the request made to the backend found
@ -674,42 +725,74 @@ class GetOrHeadHandler(object):
"""
try:
nchunks = 0
bytes_read_from_source = 0
client_chunk_size = self.client_chunk_size
bytes_consumed_from_backend = 0
node_timeout = self.app.node_timeout
if self.server_type == 'Object':
node_timeout = self.app.recoverable_node_timeout
buf = ''
while True:
try:
with ChunkReadTimeout(node_timeout):
chunk = source.read(self.app.object_chunk_size)
nchunks += 1
bytes_read_from_source += len(chunk)
buf += chunk
except ChunkReadTimeout:
exc_type, exc_value, exc_traceback = exc_info()
if self.newest or self.server_type != 'Object':
raise exc_type, exc_value, exc_traceback
try:
self.fast_forward(bytes_read_from_source)
self.fast_forward(bytes_consumed_from_backend)
except (NotImplementedError, HTTPException, ValueError):
raise exc_type, exc_value, exc_traceback
buf = ''
new_source, new_node = self._get_source_and_node()
if new_source:
self.app.exception_occurred(
node, _('Object'),
_('Trying to read during GET (retrying)'))
_('Trying to read during GET (retrying)'),
level=logging.ERROR, exc_info=(
exc_type, exc_value, exc_traceback))
# Close-out the connection as best as possible.
if getattr(source, 'swift_conn', None):
close_swift_conn(source)
source = new_source
node = new_node
bytes_read_from_source = 0
continue
else:
raise exc_type, exc_value, exc_traceback
if buf and self.skip_bytes:
if self.skip_bytes < len(buf):
buf = buf[self.skip_bytes:]
bytes_consumed_from_backend += self.skip_bytes
self.skip_bytes = 0
else:
self.skip_bytes -= len(buf)
bytes_consumed_from_backend += len(buf)
buf = ''
if not chunk:
break
if buf:
with ChunkWriteTimeout(self.app.client_timeout):
yield chunk
bytes_consumed_from_backend += len(buf)
yield buf
buf = ''
break
if client_chunk_size is not None:
while len(buf) >= client_chunk_size:
client_chunk = buf[:client_chunk_size]
buf = buf[client_chunk_size:]
with ChunkWriteTimeout(self.app.client_timeout):
yield client_chunk
bytes_consumed_from_backend += len(client_chunk)
else:
with ChunkWriteTimeout(self.app.client_timeout):
yield buf
bytes_consumed_from_backend += len(buf)
buf = ''
# This is for fairness; if the network is outpacing the CPU,
# we'll always be able to read and write data without
# encountering an EWOULDBLOCK, and so eventlet will not switch
@ -757,7 +840,7 @@ class GetOrHeadHandler(object):
node_timeout = self.app.node_timeout
if self.server_type == 'Object' and not self.newest:
node_timeout = self.app.recoverable_node_timeout
for node in self.app.iter_nodes(self.ring, self.partition):
for node in self.node_iter:
if node in self.used_nodes:
continue
start_node_timing = time.time()
@ -793,8 +876,10 @@ class GetOrHeadHandler(object):
src_headers = dict(
(k.lower(), v) for k, v in
possible_source.getheaders())
if src_headers.get('etag', '').strip('"') != \
self.used_source_etag:
if self.used_source_etag != src_headers.get(
'x-object-sysmeta-ec-etag',
src_headers.get('etag', '')).strip('"'):
self.statuses.append(HTTP_NOT_FOUND)
self.reasons.append('')
self.bodies.append('')
@ -832,7 +917,9 @@ class GetOrHeadHandler(object):
src_headers = dict(
(k.lower(), v) for k, v in
possible_source.getheaders())
self.used_source_etag = src_headers.get('etag', '').strip('"')
self.used_source_etag = src_headers.get(
'x-object-sysmeta-ec-etag',
src_headers.get('etag', '')).strip('"')
return source, node
return None, None
@ -841,13 +928,17 @@ class GetOrHeadHandler(object):
res = None
if source:
res = Response(request=req)
res.status = source.status
update_headers(res, source.getheaders())
if req.method == 'GET' and \
source.status in (HTTP_OK, HTTP_PARTIAL_CONTENT):
cr = res.headers.get('Content-Range')
if cr:
start, end, total = parse_content_range(cr)
self.learn_size_from_content_range(start, end)
res.app_iter = self._make_app_iter(req, node, source)
# See NOTE: swift_conn at top of file about this.
res.swift_conn = source.swift_conn
res.status = source.status
update_headers(res, source.getheaders())
if not res.environ:
res.environ = {}
res.environ['swift_x_timestamp'] = \
@ -993,7 +1084,8 @@ class Controller(object):
else:
info['partition'] = part
info['nodes'] = nodes
info.setdefault('storage_policy', '0')
if info.get('storage_policy') is None:
info['storage_policy'] = 0
return info
def _make_request(self, nodes, part, method, path, headers, query,
@ -1098,6 +1190,13 @@ class Controller(object):
'%s %s' % (self.server_type, req.method),
overrides=overrides, headers=resp_headers)
def _quorum_size(self, n):
"""
Number of successful backend responses needed for the proxy to
consider the client request successful.
"""
return quorum_size(n)
def have_quorum(self, statuses, node_count):
"""
Given a list of statuses from several requests, determine if
@ -1107,16 +1206,18 @@ class Controller(object):
:param node_count: number of nodes being queried (basically ring count)
:returns: True or False, depending on if quorum is established
"""
quorum = quorum_size(node_count)
quorum = self._quorum_size(node_count)
if len(statuses) >= quorum:
for hundred in (HTTP_OK, HTTP_MULTIPLE_CHOICES, HTTP_BAD_REQUEST):
for hundred in (HTTP_CONTINUE, HTTP_OK, HTTP_MULTIPLE_CHOICES,
HTTP_BAD_REQUEST):
if sum(1 for s in statuses
if hundred <= s < hundred + 100) >= quorum:
return True
return False
def best_response(self, req, statuses, reasons, bodies, server_type,
etag=None, headers=None, overrides=None):
etag=None, headers=None, overrides=None,
quorum_size=None):
"""
Given a list of responses from several servers, choose the best to
return to the API.
@ -1128,10 +1229,16 @@ class Controller(object):
:param server_type: type of server the responses came from
:param etag: etag
:param headers: headers of each response
:param overrides: overrides to apply when lacking quorum
:param quorum_size: quorum size to use
:returns: swob.Response object with the correct status, body, etc. set
"""
if quorum_size is None:
quorum_size = self._quorum_size(len(statuses))
resp = self._compute_quorum_response(
req, statuses, reasons, bodies, etag, headers)
req, statuses, reasons, bodies, etag, headers,
quorum_size=quorum_size)
if overrides and not resp:
faked_up_status_indices = set()
transformed = []
@ -1145,7 +1252,8 @@ class Controller(object):
statuses, reasons, headers, bodies = zip(*transformed)
resp = self._compute_quorum_response(
req, statuses, reasons, bodies, etag, headers,
indices_to_avoid=faked_up_status_indices)
indices_to_avoid=faked_up_status_indices,
quorum_size=quorum_size)
if not resp:
resp = Response(request=req)
@ -1156,14 +1264,14 @@ class Controller(object):
return resp
def _compute_quorum_response(self, req, statuses, reasons, bodies, etag,
headers, indices_to_avoid=()):
headers, quorum_size, indices_to_avoid=()):
if not statuses:
return None
for hundred in (HTTP_OK, HTTP_MULTIPLE_CHOICES, HTTP_BAD_REQUEST):
hstatuses = \
[(i, s) for i, s in enumerate(statuses)
if hundred <= s < hundred + 100]
if len(hstatuses) >= quorum_size(len(statuses)):
if len(hstatuses) >= quorum_size:
resp = Response(request=req)
try:
status_index, status = max(
@ -1228,22 +1336,25 @@ class Controller(object):
else:
self.app.logger.warning('Could not autocreate account %r' % path)
def GETorHEAD_base(self, req, server_type, ring, partition, path):
def GETorHEAD_base(self, req, server_type, node_iter, partition, path,
client_chunk_size=None):
"""
Base handler for HTTP GET or HEAD requests.
:param req: swob.Request object
:param server_type: server type used in logging
:param ring: the ring to obtain nodes from
:param node_iter: an iterator to obtain nodes from
:param partition: partition
:param path: path for the request
:param client_chunk_size: chunk size for response body iterator
:returns: swob.Response object
"""
backend_headers = self.generate_request_headers(
req, additional=req.headers)
handler = GetOrHeadHandler(self.app, req, self.server_type, ring,
partition, path, backend_headers)
handler = GetOrHeadHandler(self.app, req, self.server_type, node_iter,
partition, path, backend_headers,
client_chunk_size=client_chunk_size)
res = handler.get_working_response(req)
if not res:

View File

@ -93,8 +93,9 @@ class ContainerController(Controller):
return HTTPNotFound(request=req)
part = self.app.container_ring.get_part(
self.account_name, self.container_name)
node_iter = self.app.iter_nodes(self.app.container_ring, part)
resp = self.GETorHEAD_base(
req, _('Container'), self.app.container_ring, part,
req, _('Container'), node_iter, part,
req.swift_entity_path)
if 'swift.authorize' in req.environ:
req.acl = resp.headers.get('x-container-read')

File diff suppressed because it is too large Load Diff

View File

@ -20,6 +20,8 @@ from swift import gettext_ as _
from random import shuffle
from time import time
import itertools
import functools
import sys
from eventlet import Timeout
@ -32,11 +34,12 @@ from swift.common.utils import cache_from_env, get_logger, \
affinity_key_function, affinity_locality_predicate, list_from_csv, \
register_swift_info
from swift.common.constraints import check_utf8
from swift.proxy.controllers import AccountController, ObjectController, \
ContainerController, InfoController
from swift.proxy.controllers import AccountController, ContainerController, \
ObjectControllerRouter, InfoController
from swift.proxy.controllers.base import get_container_info
from swift.common.swob import HTTPBadRequest, HTTPForbidden, \
HTTPMethodNotAllowed, HTTPNotFound, HTTPPreconditionFailed, \
HTTPServerError, HTTPException, Request
HTTPServerError, HTTPException, Request, HTTPServiceUnavailable
# List of entry points for mandatory middlewares.
@ -109,6 +112,7 @@ class Application(object):
# ensure rings are loaded for all configured storage policies
for policy in POLICIES:
policy.load_ring(swift_dir)
self.obj_controller_router = ObjectControllerRouter()
self.memcache = memcache
mimetypes.init(mimetypes.knownfiles +
[os.path.join(swift_dir, 'mime.types')])
@ -235,29 +239,44 @@ class Application(object):
"""
return POLICIES.get_object_ring(policy_idx, self.swift_dir)
def get_controller(self, path):
def get_controller(self, req):
"""
Get the controller to handle a request.
:param path: path from request
:param req: the request
:returns: tuple of (controller class, path dictionary)
:raises: ValueError (thrown by split_path) if given invalid path
"""
if path == '/info':
if req.path == '/info':
d = dict(version=None,
expose_info=self.expose_info,
disallowed_sections=self.disallowed_sections,
admin_key=self.admin_key)
return InfoController, d
version, account, container, obj = split_path(path, 1, 4, True)
version, account, container, obj = split_path(req.path, 1, 4, True)
d = dict(version=version,
account_name=account,
container_name=container,
object_name=obj)
if obj and container and account:
return ObjectController, d
info = get_container_info(req.environ, self)
policy_index = req.headers.get('X-Backend-Storage-Policy-Index',
info['storage_policy'])
policy = POLICIES.get_by_index(policy_index)
if not policy:
# This indicates that a new policy has been created,
# with rings, deployed, released (i.e. deprecated =
# False), used by a client to create a container via
# another proxy that was restarted after the policy
# was released, and is now cached - all before this
# worker was HUPed to stop accepting new
# connections. There should never be an "unknown"
# index - but when there is - it's probably operator
# error and hopefully temporary.
raise HTTPServiceUnavailable('Unknown Storage Policy')
return self.obj_controller_router[policy], d
elif container and account:
return ContainerController, d
elif account and not container and not obj:
@ -317,7 +336,7 @@ class Application(object):
request=req, body='Invalid UTF8 or contains NULL')
try:
controller, path_parts = self.get_controller(req.path)
controller, path_parts = self.get_controller(req)
p = req.path_info
if isinstance(p, unicode):
p = p.encode('utf-8')
@ -474,9 +493,9 @@ class Application(object):
def iter_nodes(self, ring, partition, node_iter=None):
"""
Yields nodes for a ring partition, skipping over error
limited nodes and stopping at the configurable number of
nodes. If a node yielded subsequently gets error limited, an
extra node will be yielded to take its place.
limited nodes and stopping at the configurable number of nodes. If a
node yielded subsequently gets error limited, an extra node will be
yielded to take its place.
Note that if you're going to iterate over this concurrently from
multiple greenthreads, you'll want to use a
@ -527,7 +546,8 @@ class Application(object):
if nodes_left <= 0:
return
def exception_occurred(self, node, typ, additional_info):
def exception_occurred(self, node, typ, additional_info,
**kwargs):
"""
Handle logging of generic exceptions.
@ -536,11 +556,18 @@ class Application(object):
:param additional_info: additional information to log
"""
self._incr_node_errors(node)
self.logger.exception(
_('ERROR with %(type)s server %(ip)s:%(port)s/%(device)s re: '
'%(info)s'),
{'type': typ, 'ip': node['ip'], 'port': node['port'],
'device': node['device'], 'info': additional_info})
if 'level' in kwargs:
log = functools.partial(self.logger.log, kwargs.pop('level'))
if 'exc_info' not in kwargs:
kwargs['exc_info'] = sys.exc_info()
else:
log = self.logger.exception
log(_('ERROR with %(type)s server %(ip)s:%(port)s/%(device)s'
' re: %(info)s'), {
'type': typ, 'ip': node['ip'], 'port':
node['port'], 'device': node['device'],
'info': additional_info
}, **kwargs)
def modify_wsgi_pipeline(self, pipe):
"""

View File

@ -67,11 +67,11 @@ def patch_policies(thing_or_policies=None, legacy_only=False,
elif with_ec_default:
default_policies = [
ECStoragePolicy(0, name='ec', is_default=True,
ec_type='jerasure_rs_vand', ec_ndata=4,
ec_nparity=2, ec_segment_size=4096),
ec_type='jerasure_rs_vand', ec_ndata=10,
ec_nparity=4, ec_segment_size=4096),
StoragePolicy(1, name='unu'),
]
default_ring_args = [{'replicas': 6}, {}]
default_ring_args = [{'replicas': 14}, {}]
else:
default_policies = [
StoragePolicy(0, name='nulo', is_default=True),
@ -223,7 +223,7 @@ class FakeRing(Ring):
return self.replicas
def _get_part_nodes(self, part):
return list(self._devs)
return [dict(node, index=i) for i, node in enumerate(list(self._devs))]
def get_more_nodes(self, part):
# replicas^2 is the true cap

View File

@ -297,7 +297,8 @@ class TestReaper(unittest.TestCase):
'X-Backend-Storage-Policy-Index': policy.idx
}
ring = r.get_object_ring(policy.idx)
expected = call(ring.devs[i], 0, 'a', 'c', 'o',
expected = call(dict(ring.devs[i], index=i), 0,
'a', 'c', 'o',
headers=headers, conn_timeout=0.5,
response_timeout=10)
self.assertEqual(call_args, expected)

View File

@ -363,63 +363,74 @@ class TestRing(TestRingBase):
self.assertRaises(TypeError, self.ring.get_nodes)
part, nodes = self.ring.get_nodes('a')
self.assertEquals(part, 0)
self.assertEquals(nodes, [self.intended_devs[0],
self.intended_devs[3]])
self.assertEquals(nodes, [dict(node, index=i) for i, node in
enumerate([self.intended_devs[0],
self.intended_devs[3]])])
part, nodes = self.ring.get_nodes('a1')
self.assertEquals(part, 0)
self.assertEquals(nodes, [self.intended_devs[0],
self.intended_devs[3]])
self.assertEquals(nodes, [dict(node, index=i) for i, node in
enumerate([self.intended_devs[0],
self.intended_devs[3]])])
part, nodes = self.ring.get_nodes('a4')
self.assertEquals(part, 1)
self.assertEquals(nodes, [self.intended_devs[1],
self.intended_devs[4]])
self.assertEquals(nodes, [dict(node, index=i) for i, node in
enumerate([self.intended_devs[1],
self.intended_devs[4]])])
part, nodes = self.ring.get_nodes('aa')
self.assertEquals(part, 1)
self.assertEquals(nodes, [self.intended_devs[1],
self.intended_devs[4]])
self.assertEquals(nodes, [dict(node, index=i) for i, node in
enumerate([self.intended_devs[1],
self.intended_devs[4]])])
part, nodes = self.ring.get_nodes('a', 'c1')
self.assertEquals(part, 0)
self.assertEquals(nodes, [self.intended_devs[0],
self.intended_devs[3]])
self.assertEquals(nodes, [dict(node, index=i) for i, node in
enumerate([self.intended_devs[0],
self.intended_devs[3]])])
part, nodes = self.ring.get_nodes('a', 'c0')
self.assertEquals(part, 3)
self.assertEquals(nodes, [self.intended_devs[1],
self.intended_devs[4]])
self.assertEquals(nodes, [dict(node, index=i) for i, node in
enumerate([self.intended_devs[1],
self.intended_devs[4]])])
part, nodes = self.ring.get_nodes('a', 'c3')
self.assertEquals(part, 2)
self.assertEquals(nodes, [self.intended_devs[0],
self.intended_devs[3]])
self.assertEquals(nodes, [dict(node, index=i) for i, node in
enumerate([self.intended_devs[0],
self.intended_devs[3]])])
part, nodes = self.ring.get_nodes('a', 'c2')
self.assertEquals(part, 2)
self.assertEquals(nodes, [self.intended_devs[0],
self.intended_devs[3]])
self.assertEquals(nodes, [dict(node, index=i) for i, node in
enumerate([self.intended_devs[0],
self.intended_devs[3]])])
part, nodes = self.ring.get_nodes('a', 'c', 'o1')
self.assertEquals(part, 1)
self.assertEquals(nodes, [self.intended_devs[1],
self.intended_devs[4]])
self.assertEquals(nodes, [dict(node, index=i) for i, node in
enumerate([self.intended_devs[1],
self.intended_devs[4]])])
part, nodes = self.ring.get_nodes('a', 'c', 'o5')
self.assertEquals(part, 0)
self.assertEquals(nodes, [self.intended_devs[0],
self.intended_devs[3]])
self.assertEquals(nodes, [dict(node, index=i) for i, node in
enumerate([self.intended_devs[0],
self.intended_devs[3]])])
part, nodes = self.ring.get_nodes('a', 'c', 'o0')
self.assertEquals(part, 0)
self.assertEquals(nodes, [self.intended_devs[0],
self.intended_devs[3]])
self.assertEquals(nodes, [dict(node, index=i) for i, node in
enumerate([self.intended_devs[0],
self.intended_devs[3]])])
part, nodes = self.ring.get_nodes('a', 'c', 'o2')
self.assertEquals(part, 2)
self.assertEquals(nodes, [self.intended_devs[0],
self.intended_devs[3]])
self.assertEquals(nodes, [dict(node, index=i) for i, node in
enumerate([self.intended_devs[0],
self.intended_devs[3]])])
def add_dev_to_ring(self, new_dev):
self.ring.devs.append(new_dev)

View File

@ -2190,13 +2190,14 @@ cluster_dfw1 = http://dfw1.host/v1/
self.assertFalse(utils.streq_const_time('a', 'aaaaa'))
self.assertFalse(utils.streq_const_time('ABC123', 'abc123'))
def test_quorum_size(self):
def test_replication_quorum_size(self):
expected_sizes = {1: 1,
2: 2,
3: 2,
4: 3,
5: 3}
got_sizes = dict([(n, utils.quorum_size(n)) for n in expected_sizes])
got_sizes = dict([(n, utils.quorum_size(n))
for n in expected_sizes])
self.assertEqual(expected_sizes, got_sizes)
def test_rsync_ip_ipv4_localhost(self):
@ -4593,6 +4594,22 @@ class TestLRUCache(unittest.TestCase):
self.assertEqual(f.size(), 4)
class TestParseContentRange(unittest.TestCase):
def test_good(self):
start, end, total = utils.parse_content_range("bytes 100-200/300")
self.assertEqual(start, 100)
self.assertEqual(end, 200)
self.assertEqual(total, 300)
def test_bad(self):
self.assertRaises(ValueError, utils.parse_content_range,
"100-300/500")
self.assertRaises(ValueError, utils.parse_content_range,
"bytes 100-200/aardvark")
self.assertRaises(ValueError, utils.parse_content_range,
"bytes bulbous-bouffant/4994801")
class TestParseContentDisposition(unittest.TestCase):
def test_basic_content_type(self):
@ -4622,7 +4639,8 @@ class TestIterMultipartMimeDocuments(unittest.TestCase):
it.next()
except MimeInvalid as err:
exc = err
self.assertEquals(str(exc), 'invalid starting boundary')
self.assertTrue('invalid starting boundary' in str(exc))
self.assertTrue('--unique' in str(exc))
def test_empty(self):
it = utils.iter_multipart_mime_documents(StringIO('--unique'),

View File

@ -21,9 +21,11 @@ from swift.proxy.controllers.base import headers_to_container_info, \
headers_to_account_info, headers_to_object_info, get_container_info, \
get_container_memcache_key, get_account_info, get_account_memcache_key, \
get_object_env_key, get_info, get_object_info, \
Controller, GetOrHeadHandler, _set_info_cache, _set_object_info_cache
Controller, GetOrHeadHandler, _set_info_cache, _set_object_info_cache, \
bytes_to_skip
from swift.common.swob import Request, HTTPException, HeaderKeyDict, \
RESPONSE_REASONS
from swift.common import exceptions
from swift.common.utils import split_path
from swift.common.http import is_success
from swift.common.storage_policy import StoragePolicy
@ -159,9 +161,11 @@ class TestFuncs(unittest.TestCase):
def test_GETorHEAD_base(self):
base = Controller(self.app)
req = Request.blank('/v1/a/c/o/with/slashes')
ring = FakeRing()
nodes = list(ring.get_part_nodes(0)) + list(ring.get_more_nodes(0))
with patch('swift.proxy.controllers.base.'
'http_connect', fake_http_connect(200)):
resp = base.GETorHEAD_base(req, 'object', FakeRing(), 'part',
resp = base.GETorHEAD_base(req, 'object', iter(nodes), 'part',
'/a/c/o/with/slashes')
self.assertTrue('swift.object/a/c/o/with/slashes' in resp.environ)
self.assertEqual(
@ -169,14 +173,14 @@ class TestFuncs(unittest.TestCase):
req = Request.blank('/v1/a/c/o')
with patch('swift.proxy.controllers.base.'
'http_connect', fake_http_connect(200)):
resp = base.GETorHEAD_base(req, 'object', FakeRing(), 'part',
resp = base.GETorHEAD_base(req, 'object', iter(nodes), 'part',
'/a/c/o')
self.assertTrue('swift.object/a/c/o' in resp.environ)
self.assertEqual(resp.environ['swift.object/a/c/o']['status'], 200)
req = Request.blank('/v1/a/c')
with patch('swift.proxy.controllers.base.'
'http_connect', fake_http_connect(200)):
resp = base.GETorHEAD_base(req, 'container', FakeRing(), 'part',
resp = base.GETorHEAD_base(req, 'container', iter(nodes), 'part',
'/a/c')
self.assertTrue('swift.container/a/c' in resp.environ)
self.assertEqual(resp.environ['swift.container/a/c']['status'], 200)
@ -184,7 +188,7 @@ class TestFuncs(unittest.TestCase):
req = Request.blank('/v1/a')
with patch('swift.proxy.controllers.base.'
'http_connect', fake_http_connect(200)):
resp = base.GETorHEAD_base(req, 'account', FakeRing(), 'part',
resp = base.GETorHEAD_base(req, 'account', iter(nodes), 'part',
'/a')
self.assertTrue('swift.account/a' in resp.environ)
self.assertEqual(resp.environ['swift.account/a']['status'], 200)
@ -546,7 +550,7 @@ class TestFuncs(unittest.TestCase):
resp,
headers_to_object_info(headers.items(), 200))
def test_have_quorum(self):
def test_base_have_quorum(self):
base = Controller(self.app)
# just throw a bunch of test cases at it
self.assertEqual(base.have_quorum([201, 404], 3), False)
@ -648,3 +652,88 @@ class TestFuncs(unittest.TestCase):
self.assertEqual(v, dst_headers[k.lower()])
for k, v in bad_hdrs.iteritems():
self.assertFalse(k.lower() in dst_headers)
def test_client_chunk_size(self):
class TestSource(object):
def __init__(self, chunks):
self.chunks = list(chunks)
def read(self, _read_size):
if self.chunks:
return self.chunks.pop(0)
else:
return ''
source = TestSource((
'abcd', '1234', 'abc', 'd1', '234abcd1234abcd1', '2'))
req = Request.blank('/v1/a/c/o')
node = {}
handler = GetOrHeadHandler(self.app, req, None, None, None, None, {},
client_chunk_size=8)
app_iter = handler._make_app_iter(req, node, source)
client_chunks = list(app_iter)
self.assertEqual(client_chunks, [
'abcd1234', 'abcd1234', 'abcd1234', 'abcd12'])
def test_client_chunk_size_resuming(self):
class TestSource(object):
def __init__(self, chunks):
self.chunks = list(chunks)
def read(self, _read_size):
if self.chunks:
chunk = self.chunks.pop(0)
if chunk is None:
raise exceptions.ChunkReadTimeout()
else:
return chunk
else:
return ''
node = {'ip': '1.2.3.4', 'port': 6000, 'device': 'sda'}
source1 = TestSource(['abcd', '1234', 'abc', None])
source2 = TestSource(['efgh5678'])
req = Request.blank('/v1/a/c/o')
handler = GetOrHeadHandler(
self.app, req, 'Object', None, None, None, {},
client_chunk_size=8)
app_iter = handler._make_app_iter(req, node, source1)
with patch.object(handler, '_get_source_and_node',
lambda: (source2, node)):
client_chunks = list(app_iter)
self.assertEqual(client_chunks, ['abcd1234', 'efgh5678'])
self.assertEqual(handler.backend_headers['Range'], 'bytes=8-')
def test_bytes_to_skip(self):
# if you start at the beginning, skip nothing
self.assertEqual(bytes_to_skip(1024, 0), 0)
# missed the first 10 bytes, so we've got 1014 bytes of partial
# record
self.assertEqual(bytes_to_skip(1024, 10), 1014)
# skipped some whole records first
self.assertEqual(bytes_to_skip(1024, 4106), 1014)
# landed on a record boundary
self.assertEqual(bytes_to_skip(1024, 1024), 0)
self.assertEqual(bytes_to_skip(1024, 2048), 0)
# big numbers
self.assertEqual(bytes_to_skip(2 ** 20, 2 ** 32), 0)
self.assertEqual(bytes_to_skip(2 ** 20, 2 ** 32 + 1), 2 ** 20 - 1)
self.assertEqual(bytes_to_skip(2 ** 20, 2 ** 32 + 2 ** 19), 2 ** 19)
# odd numbers
self.assertEqual(bytes_to_skip(123, 0), 0)
self.assertEqual(bytes_to_skip(123, 23), 100)
self.assertEqual(bytes_to_skip(123, 247), 122)
# prime numbers
self.assertEqual(bytes_to_skip(11, 7), 4)
self.assertEqual(bytes_to_skip(97, 7873823), 55)

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff