
Samuel Merritt decbcd24d4 Foundational support for PUT and GET of erasure-coded objects

This commit makes it possible to PUT an object into Swift and have it
stored using erasure coding instead of replication, and also to GET
the object back from Swift at a later time.

This works by splitting the incoming object into a number of segments,
erasure-coding each segment in turn to get fragments, then
concatenating the fragments into fragment archives. Segments are 1 MiB
in size, except the last, which is between 1 B and 1 MiB.
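
The segmentation step can be sketched as follows. This is a minimal
illustration, not the code from this commit; the names SEGMENT_SIZE and
iter_segments are made up for the example.

```python
SEGMENT_SIZE = 1024 * 1024  # 1 MiB, the segment size stated above


def iter_segments(data, segment_size=SEGMENT_SIZE):
    """Yield consecutive segments of `data`; only the last may be short."""
    for offset in range(0, len(data), segment_size):
        yield data[offset:offset + segment_size]


# An object of 2 MiB + 5 bytes yields two full segments and a 5-byte tail.
segments = list(iter_segments(b"x" * (2 * SEGMENT_SIZE + 5)))
print([len(s) for s in segments])  # [1048576, 1048576, 5]
```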

+====================================================================+
|                             object data                            |
+====================================================================+

                                   |
          +------------------------+----------------------+
          |                        |                      |
          v                        v                      v

+===================+    +===================+         +==============+
|     segment 1     |    |     segment 2     |   ...   |   segment N  |
+===================+    +===================+         +==============+

          |                       |
          |                       |
          v                       v

     /=========\             /=========\
     | pyeclib |             | pyeclib |         ...
     \=========/             \=========/

          |                       |
          |                       |
          +--> fragment A-1       +--> fragment A-2
          |                       |
          |                       |
          |                       |
          |                       |
          |                       |
          +--> fragment B-1       +--> fragment B-2
          |                       |
          |                       |
         ...                     ...

Then, object server A gets the concatenation of fragment A-1, A-2,
..., A-N, so its .data file looks like this (called a "fragment archive"):

+=====================================================================+
|     fragment A-1     |     fragment A-2     |  ...  |  fragment A-N |
+=====================================================================+
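
A rough sketch of how fragment archives are assembled. The encoder here
is a toy stand-in (M data fragments plus one XOR parity fragment), chosen
only so the example runs without pyeclib; toy_encode and
build_fragment_archives are hypothetical names, not Swift or pyeclib
APIs.

```python
def toy_encode(segment, m=2):
    """Toy stand-in for an erasure coder: m data fragments + 1 XOR parity."""
    size = -(-len(segment) // m)  # ceiling division
    padded = segment.ljust(size * m, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(m)]
    parity = bytearray(size)
    for fragment in fragments:
        for i, byte in enumerate(fragment):
            parity[i] ^= byte
    return fragments + [bytes(parity)]


def build_fragment_archives(segments, m=2):
    """Encode each segment, appending fragment i to archive i."""
    archives = [bytearray() for _ in range(m + 1)]
    for segment in segments:
        for archive, fragment in zip(archives, toy_encode(segment, m)):
            archive += fragment
    return [bytes(a) for a in archives]


# Three tiny "segments"; archive 0 is the concatenation of fragment A-1,
# A-2, A-3, archive 1 of B-1, B-2, B-3, and so on.
archives = build_fragment_archives([b"abcd", b"efgh", b"ij"])
print(archives[0], archives[1])  # b'abefi' b'cdghj'
```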

Since this means that the object server never sees the object data as
the client sent it, we have to do a few things to ensure data
integrity.

First, the proxy has to check the Etag if the client provided it; the
object server can't do it since the object server doesn't see the raw
data.

Second, if the client does not provide an Etag, the proxy computes it
and uses the MIME-PUT mechanism to provide it to the object servers
after the object body. Otherwise, the object would not have an Etag at
all.
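
The first two points amount to the proxy hashing the client's bytes as
they stream through. A minimal sketch, assuming the Etag is the MD5 hex
digest of the object body; proxy_etag is a hypothetical helper, and the
ValueError merely stands in for whatever error response the proxy
returns on a mismatch.

```python
import hashlib


def proxy_etag(body_chunks, client_etag=None):
    """Hash the raw object body; verify it if the client sent an Etag."""
    hasher = hashlib.md5()
    for chunk in body_chunks:
        hasher.update(chunk)
    computed = hasher.hexdigest()
    if client_etag is not None:
        if client_etag.strip('"').lower() != computed:
            raise ValueError("client Etag does not match object body")
    # With no client Etag, this value is what would be sent to the
    # object servers after the body.
    return computed


etag = proxy_etag([b"hello ", b"world"])
```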

Third, the proxy computes the MD5 of each fragment archive and sends
it to the object server using the MIME-PUT mechanism. With replicated
objects, the proxy checks that the Etags from all the object servers
match, and if they don't, returns a 500 to the client. This mitigates
the risk of data corruption in one of the proxy --> object connections,
and signals to the client when it happens. With EC objects, we can't
use that same mechanism, so we must send the checksum with each
fragment archive to get comparable protection.
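
Because the checksum is only known once the whole archive has been sent,
it has to travel after the body. One way to picture this is a hasher per
proxy --> object connection. ArchiveWriter and the "archive_md5" key are
illustrative assumptions, not names used by this commit.

```python
import hashlib


class ArchiveWriter:
    """Hypothetical per-connection helper for one fragment archive."""

    def __init__(self):
        self.sent = bytearray()  # stands in for bytes written to the socket
        self._md5 = hashlib.md5()

    def send_fragment(self, fragment):
        self.sent += fragment
        self._md5.update(fragment)

    def footer(self):
        # Sent after the body so the object server can verify what it stored.
        return {"archive_md5": self._md5.hexdigest()}


writer = ArchiveWriter()
for fragment in (b"ab", b"ef", b"i"):
    writer.send_fragment(fragment)
footer = writer.footer()

# The receiving side can recompute the digest over what it actually got.
received_ok = hashlib.md5(bytes(writer.sent)).hexdigest() == footer["archive_md5"]
```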

On the GET path, the inverse happens: the proxy connects to a bunch of
object servers (M of them, for an M+K scheme), reads one fragment at a
time from each fragment archive, decodes those fragments into a
segment, and serves the segment to the client.
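
The read loop can be sketched like this. To keep the example
self-contained it assumes a toy scheme in which the M data fragments of
a segment are plain slices of that segment, so "decoding" is just
concatenation; a real decode would go through pyeclib.
iter_decoded_segments and its parameters are hypothetical.

```python
def iter_decoded_segments(archives, fragment_size, object_length):
    """Read one fragment per archive at a time and yield decoded segments.

    archives: the M data fragment archives (bytes), in fragment order.
    fragment_size: size of every fragment except possibly the last.
    object_length: original object size, used to trim trailing padding.
    """
    remaining = object_length
    offset = 0
    while remaining > 0:
        fragments = [a[offset:offset + fragment_size] for a in archives]
        segment = b"".join(fragments)[:remaining]  # toy "decode"
        yield segment
        remaining -= len(segment)
        offset += fragment_size


body = b"".join(iter_decoded_segments([b"abefi", b"cdghj"], 2, 10))
print(body)  # b'abcdefghij'
```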

When an object server dies partway through a GET response, any
partially-fetched fragment is discarded, the resumption point is wound
back to the nearest fragment boundary, and the GET is retried with the
next object server.
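
Winding back to a fragment boundary is simple integer arithmetic. The
sketch below assumes every fragment before the failure point has the
same size; resume_offset is a made-up name.

```python
def resume_offset(bytes_received, fragment_size):
    """Round down to the start of the fragment that was in flight."""
    return (bytes_received // fragment_size) * fragment_size


# 2500 bytes arrived with 1024-byte fragments: two whole fragments are
# kept, the 452 bytes of the third are discarded, and the retry resumes
# at offset 2048.
received = 2500
restart = resume_offset(received, 1024)
discarded = received - restart
print(restart, discarded)  # 2048 452
```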

GET requests for a single byterange work; GET requests for multiple
byteranges do not.

There are a number of things _not_ included in this commit. Some of
them are listed here:

 * multi-range GET

 * deferred cleanup of old .data files

 * durability (daemon to reconstruct missing archives)

Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I9c13c03616489f8eab7dcd7c5f21237ed4cb6fd2

2015-04-14 00:52:17 -07:00

bin

Add swift-recon feature to track swift-drive-audit error count

2015-03-23 11:38:32 +00:00

doc

Add some debug output to the ring builder

2015-03-30 17:47:28 -07:00

etc

Add support for policy types, 'erasure_coding' policy

2015-04-13 22:57:42 -07:00

examples

Add a user variable to templates

2013-09-17 11:46:04 +10:00

swift

Foundational support for PUT and GET of erasure-coded objects

2015-04-14 00:52:17 -07:00

test

Foundational support for PUT and GET of erasure-coded objects

2015-04-14 00:52:17 -07:00

.coveragerc

Align tox.ini and fix coverage jobs in jenkins.

2012-06-08 20:05:14 -04:00

.functests

Move the tests from functionalnosetests

2014-01-07 15:58:11 +08:00

.gitignore

more probe test refactoring

2015-02-13 16:55:45 -08:00

.gitreview

make git review easier

2015-04-01 12:41:44 -07:00

.mailmap

2.2.2 changelog and authors update

2015-01-28 11:44:58 -08:00

.probetests

Allow specify arguments to .probetests script

2013-12-24 01:18:19 -08:00

.unittests

Fix coverage report for newer versions of coverage

2014-04-24 16:50:03 +00:00

AUTHORS

Promote some of the best developers I know to CORE Emeritus

2015-02-13 13:11:40 -08:00

babel.cfg

add pybabel setup.py commands and initial .pot

2011-01-27 00:01:24 +00:00

CHANGELOG

2.2.2 changelog and authors update

2015-01-28 11:44:58 -08:00

CONTRIBUTING.md

Add Swift Design Principles to CONTRIBUTING.md

2015-03-27 13:13:31 -04:00

LICENSE

Convert LICENSE to use unix style line endings.

2012-12-19 12:48:27 -05:00

MANIFEST.in

Add requirements files to the source distribution

2013-06-03 19:26:20 +04:00

README.md

added testing notes to the contributing doc

2014-12-04 10:41:11 -05:00

requirements.txt

Merge "Bump eventlet version to 0.16.1"

2015-03-25 23:15:58 +00:00

setup.cfg

Fix translation setup

2014-11-19 09:11:55 -05:00

setup.py

taking the global reqs that we can

2014-05-21 09:37:22 -07:00

test-requirements.txt

warn against sorting requirements

2014-09-03 12:03:57 -05:00

tox.ini

updated hacking rules

2014-09-25 11:04:31 -07:00

README.md

Swift

A distributed object storage system designed to scale from a single machine to thousands of servers. Swift is optimized for multi-tenancy and high concurrency. Swift is ideal for backups, web and mobile content, and any other unstructured data that can grow without bound.

Swift provides a simple, REST-based API fully documented at http://docs.openstack.org/.

Swift was originally developed as the basis for Rackspace's Cloud Files and was open-sourced in 2010 as part of the OpenStack project. It has since grown to include contributions from many companies and has spawned a thriving ecosystem of 3rd party tools. Swift's contributors are listed in the AUTHORS file.

Docs

To build the documentation, install sphinx (pip install sphinx), run python setup.py build_sphinx, and then browse to /doc/build/html/index.html. These docs are auto-generated after every commit and available online at http://docs.openstack.org/developer/swift/.

For Developers

The best place to get started is the "SAIO - Swift All In One". This document will walk you through setting up a development cluster of Swift in a VM. The SAIO environment is ideal for running small-scale tests against swift and trying out new features and bug fixes.

You can run unit tests with .unittests and functional tests with .functests.

If you would like to start contributing, check out these notes to help you get started.

Code Organization

bin/: Executable scripts that are the processes run by the deployer
doc/: Documentation
etc/: Sample config files
swift/: Core code
- account/: account server
- common/: code shared by different modules
  - middleware/: "standard", officially-supported middleware
  - ring/: code implementing Swift's ring
- container/: container server
- obj/: object server
- proxy/: proxy server
test/: Unit and functional tests

Data Flow

Swift is a WSGI application and uses eventlet's WSGI server. After the processes are running, the entry point for new requests is the Application class in swift/proxy/server.py. From there, a controller is chosen, and the request is processed. The proxy may choose to forward the request to a backend server. For example, the entry point for requests to the object server is the ObjectController class in swift/obj/server.py.
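
As a rough illustration of that flow, the sketch below is a tiny WSGI callable that picks a "controller" from the shape of the request path. It is a simplified stand-in written for this README, not the real Application class; ToyApplication and its dispatch table are assumptions made for the example.

```python
class ToyApplication:
    """Simplified stand-in for a proxy-style WSGI entry point."""

    # Number of path components (/version/account/container/object)
    # mapped to the controller that would handle the request.
    controllers = {2: "account", 3: "container", 4: "object"}

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        # Split at most three times so object names may contain slashes.
        parts = [p for p in path.lstrip("/").split("/", 3) if p]
        name = self.controllers.get(len(parts))
        if name is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found\n"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [("handled by %s controller\n" % name).encode("ascii")]


def call(app, path):
    """Invoke a WSGI app directly and return (status, body)."""
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return seen["status"], body


status, body = call(ToyApplication(), "/v1/AUTH_test/c/o/with/slash")
```

Any WSGI server can host such a callable; the helper `call` only exists to exercise it without a network socket.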

For Deployers

Deployer docs are also available at http://docs.openstack.org/developer/swift/. A good starting point is at http://docs.openstack.org/developer/swift/deployment_guide.html

You can run functional tests against a swift cluster with .functests. These functional tests require /etc/swift/test.conf to run. A sample config file can be found in this source tree in test/sample.conf.

For Client Apps

For client applications, official Python language bindings are provided at http://github.com/openstack/python-swiftclient.

Complete API documentation is available at http://docs.openstack.org/api/openstack-object-storage/1.0/content/

For more information come hang out in #openstack-swift on freenode.

Thanks,

The Swift Development Team