OpenStack Storage (Swift)
Go to file
Samuel Merritt decbcd24d4 Foundational support for PUT and GET of erasure-coded objects
This commit makes it possible to PUT an object into Swift and have it
stored using erasure coding instead of replication, and also to GET
the object back from Swift at a later time.

This works by splitting the incoming object into a number of segments,
erasure-coding each segment in turn to get fragments, then
concatenating the fragments into fragment archives. Segments are 1 MiB
in size, except the last, which is between 1 B and 1 MiB.

+====================================================================+
|                             object data                            |
+====================================================================+

                                   |
          +------------------------+----------------------+
          |                        |                      |
          v                        v                      v

+===================+    +===================+         +==============+
|     segment 1     |    |     segment 2     |   ...   |   segment N  |
+===================+    +===================+         +==============+

          |                       |
          |                       |
          v                       v

     /=========\             /=========\
     | pyeclib |             | pyeclib |         ...
     \=========/             \=========/

          |                       |
          |                       |
          +--> fragment A-1       +--> fragment A-2
          |                       |
          |                       |
          |                       |
          |                       |
          |                       |
          +--> fragment B-1       +--> fragment B-2
          |                       |
          |                       |
         ...                     ...

Then, object server A gets the concatenation of fragment A-1, A-2,
..., A-N, so its .data file looks like this (called a "fragment archive"):

+=====================================================================+
|     fragment A-1     |     fragment A-2     |  ...  |  fragment A-N |
+=====================================================================+

Since this means that the object server never sees the object data as
the client sent it, we have to do a few things to ensure data
integrity.

First, the proxy has to check the Etag if the client provided it; the
object server can't do it since the object server doesn't see the raw
data.

Second, if the client does not provide an Etag, the proxy computes it
and uses the MIME-PUT mechanism to provide it to the object servers
after the object body. Otherwise, the object would not have an Etag at
all.

Third, the proxy computes the MD5 of each fragment archive and sends
it to the object server using the MIME-PUT mechanism. With replicated
objects, the proxy checks that the Etags from all the object servers
match, and if they don't, returns a 500 to the client. This mitigates
the risk of data corruption in one of the proxy --> object connections,
and signals to the client when it happens. With EC objects, we can't
use that same mechanism, so we must send the checksum with each
fragment archive to get comparable protection.

On the GET path, the inverse happens: the proxy connects to a bunch of
object servers (M of them, for an M+K scheme), reads one fragment at a
time from each fragment archive, decodes those fragments into a
segment, and serves the segment to the client.

When an object server dies partway through a GET response, any
partially-fetched fragment is discarded, the resumption point is wound
back to the nearest fragment boundary, and the GET is retried with the
next object server.

GET requests for a single byterange work; GET requests for multiple
byteranges do not.

There are a number of things _not_ included in this commit. Some of
them are listed here:

 * multi-range GET

 * deferred cleanup of old .data files

 * durability (daemon to reconstruct missing archives)

Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I9c13c03616489f8eab7dcd7c5f21237ed4cb6fd2
2015-04-14 00:52:17 -07:00
bin Add swift-recon feature to track swift-drive-audit error count 2015-03-23 11:38:32 +00:00
doc Add some debug output to the ring builder 2015-03-30 17:47:28 -07:00
etc Add support for policy types, 'erasure_coding' policy 2015-04-13 22:57:42 -07:00
examples Add a user variable to templates 2013-09-17 11:46:04 +10:00
swift Foundational support for PUT and GET of erasure-coded objects 2015-04-14 00:52:17 -07:00
test Foundational support for PUT and GET of erasure-coded objects 2015-04-14 00:52:17 -07:00
.coveragerc Align tox.ini and fix coverage jobs in jenkins. 2012-06-08 20:05:14 -04:00
.functests Move the tests from functionalnosetests 2014-01-07 15:58:11 +08:00
.gitignore more probe test refactoring 2015-02-13 16:55:45 -08:00
.gitreview make git review easier 2015-04-01 12:41:44 -07:00
.mailmap 2.2.2 changelog and authors update 2015-01-28 11:44:58 -08:00
.probetests Allow specify arguments to .probetests script 2013-12-24 01:18:19 -08:00
.unittests Fix coverage report for newer versions of coverage 2014-04-24 16:50:03 +00:00
AUTHORS Promote some of the best developers I know to CORE Emeritus 2015-02-13 13:11:40 -08:00
babel.cfg add pybabel setup.py commands and initial .pot 2011-01-27 00:01:24 +00:00
CHANGELOG 2.2.2 changelog and authors update 2015-01-28 11:44:58 -08:00
CONTRIBUTING.md Add Swift Design Principles to CONTRIBUTING.md 2015-03-27 13:13:31 -04:00
LICENSE Convert LICENSE to use unix style line endings. 2012-12-19 12:48:27 -05:00
MANIFEST.in Add requirements files to the source distribution 2013-06-03 19:26:20 +04:00
README.md added testing notes to the contributing doc 2014-12-04 10:41:11 -05:00
requirements.txt Merge "Bump eventlet version to 0.16.1" 2015-03-25 23:15:58 +00:00
setup.cfg Fix translation setup 2014-11-19 09:11:55 -05:00
setup.py taking the global reqs that we can 2014-05-21 09:37:22 -07:00
test-requirements.txt warn against sorting requirements 2014-09-03 12:03:57 -05:00
tox.ini updated hacking rules 2014-09-25 11:04:31 -07:00

Swift

A distributed object storage system designed to scale from a single machine to thousands of servers. Swift is optimized for multi-tenancy and high concurrency. Swift is ideal for backups, web and mobile content, and any other unstructured data that can grow without bound.

Swift provides a simple, REST-based API fully documented at http://docs.openstack.org/.

Swift was originally developed as the basis for Rackspace's Cloud Files and was open-sourced in 2010 as part of the OpenStack project. It has since grown to include contributions from many companies and has spawned a thriving ecosystem of 3rd party tools. Swift's contributors are listed in the AUTHORS file.

Docs

To build documentation install sphinx (pip install sphinx), run python setup.py build_sphinx, and then browse to /doc/build/html/index.html. These docs are auto-generated after every commit and available online at http://docs.openstack.org/developer/swift/.

For Developers

The best place to get started is the "SAIO - Swift All In One". This document will walk you through setting up a development cluster of Swift in a VM. The SAIO environment is ideal for running small-scale tests against swift and trying out new features and bug fixes.

You can run unit tests with .unittests and functional tests with .functests.

If you would like to start contributing, check out these notes to help you get started.

Code Organization

  • bin/: Executable scripts that are the processes run by the deployer
  • doc/: Documentation
  • etc/: Sample config files
  • swift/: Core code
    • account/: account server
    • common/: code shared by different modules
      • middleware/: "standard", officially-supported middleware
      • ring/: code implementing Swift's ring
    • container/: container server
    • obj/: object server
    • proxy/: proxy server
  • test/: Unit and functional tests

Data Flow

Swift is a WSGI application and uses eventlet's WSGI server. After the processes are running, the entry point for new requests is the Application class in swift/proxy/server.py. From there, a controller is chosen, and the request is processed. The proxy may choose to forward the request to a back- end server. For example, the entry point for requests to the object server is the ObjectController class in swift/obj/server.py.

For Deployers

Deployer docs are also available at http://docs.openstack.org/developer/swift/. A good starting point is at http://docs.openstack.org/developer/swift/deployment_guide.html

You can run functional tests against a swift cluster with .functests. These functional tests require /etc/swift/test.conf to run. A sample config file can be found in this source tree in test/sample.conf.

For Client Apps

For client applications, official Python language bindings are provided at http://github.com/openstack/python-swiftclient.

Complete API documentation at http://docs.openstack.org/api/openstack-object-storage/1.0/content/


For more information come hang out in #openstack-swift on freenode.

Thanks,

The Swift Development Team