Documentation of the manifest/segments feature

2010-11-23 14:26:48 -08:00 · 2010-11-23 14:26:48 -08:00 · 1fa4ba38e5
commit 1fa4ba38e5
parent 83e54dda91
2 changed files with 122 additions and 0 deletions
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -44,6 +44,7 @@ Overview and Concepts
    overview_replication
    overview_stats
    ratelimit
    overview_large_objects

 Developer Documentation
 =======================
--- a/doc/source/overview_large_objects.rst
+++ b/doc/source/overview_large_objects.rst
@@ -0,0 +1,121 @@
 ====================
 Large Object Support
 ====================

 --------
 Overview
 --------

 Swift limits the size of a single uploaded object; by default the limit is
 5GB. With segmentation, however, the size of an object as downloaded is
 virtually unlimited. The segments of a larger object are uploaded
 individually, and a special manifest file is created that, when downloaded,
 sends all the segments concatenated as a single object. Because the segments
 can be uploaded in parallel, this can also make uploads much faster.
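To make the idea concrete without a Swift cluster, here is a purely local sketch. It cuts a file into pieces no larger than a chosen segment size, then concatenates the pieces back in name order, which is analogous to what a manifest download does with segments. `split_and_rejoin` is a hypothetical helper. The sketch assumes `split(1)` and `cmp(1)` (GNU or BSD) and a non-empty input file.

```shell
#!/usr/bin/env bash
# Local illustration only; no Swift cluster is involved.
# split_and_rejoin FILE SEGMENT_SIZE (hypothetical helper): cuts FILE into
# pieces of at most SEGMENT_SIZE bytes, prints how many pieces it made,
# rejoins them in name order, and succeeds only if the copy equals FILE.
split_and_rejoin() {
  local file=$1 seg_size=$2 dir status
  dir=$(mktemp -d) || return 1
  # -a 4 gives suffixes part_aaaa, part_aaab, ... that sort in cut order.
  split -b "$seg_size" -a 4 "$file" "$dir/part_" || { rm -rf "$dir"; return 1; }
  # Glob expansion is sorted, so "$@" now lists the pieces in cut order.
  set -- "$dir"/part_*
  echo "$#"
  # Concatenate the pieces, as a manifest download does with segments.
  cat "$@" > "$dir/rejoined"
  if cmp -s "$file" "$dir/rejoined"; then status=0; else status=1; fi
  rm -rf "$dir"
  return "$status"
}

# Example: a 10-byte file with 3-byte segments gives 4 pieces (3+3+3+1).
example=$(mktemp)
printf 'abcdefghij' > "$example"
split_and_rejoin "$example" 3 && echo "rejoined copy matches"
rm -f "$example"
```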

 ----------------------------------
 Using ``st`` for Segmented Objects
 ----------------------------------

 The quickest way to try out this feature is to use the included ``st`` Swift
 Tool. You can use the ``-S`` option to specify the segment size, in bytes, to
 use when splitting a large file. For example::

    st upload test_container -S 1073741824 large_file

 This would split ``large_file`` into 1GB (1073741824-byte) segments and begin
 uploading those segments in parallel. Once all the segments have been
 uploaded, ``st`` then creates the manifest file so the segments can be
 downloaded as one.

 The following ``st`` command would then download the entire large object::

    st download test_container large_file

 ``st`` uses a strict convention for its segmented object support. In the above
 example it will upload all the segments into a second container named
 ``test_container_segments``. These segments will have names like
 ``large_file/1290206778.25/21474836480/00000000``,
 ``large_file/1290206778.25/21474836480/00000001``, etc.
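The naming convention can be sketched as a small shell function. `st_segment_names` is a hypothetical helper, not code taken from `st`. It assumes two things the example names suggest: that `<size>` is the size of the whole file (21474836480 bytes is exactly 20 × 1073741824), and that `<segment>` is an eight-digit counter starting at 0.

```shell
#!/usr/bin/env bash
# st_segment_names NAME TIMESTAMP FILE_SIZE SEGMENT_SIZE (hypothetical helper)
# Prints the names the <name>/<timestamp>/<size>/<segment> convention yields,
# ASSUMING <size> is the whole file's size and numbering starts at 0.
st_segment_names() {
  local name=$1 ts=$2 size=$3 seg_size=$4 count i=0
  # Ceiling division: a final, shorter segment still needs its own name.
  count=$(( (size + seg_size - 1) / seg_size ))
  while [ "$i" -lt "$count" ]; do
    # Zero-padded to eight digits, matching 00000000, 00000001, ...
    printf '%s/%s/%s/%08d\n' "$name" "$ts" "$size" "$i"
    i=$((i + 1))
  done
}

# The example above: 21474836480 bytes in 1073741824-byte segments.
st_segment_names large_file 1290206778.25 21474836480 1073741824
```

Under those assumptions the example yields 20 names, ending in `00000000` through `00000019`.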

 The main benefit of using a separate container is that the main container
 listings will not be polluted with all the segment names. The reason for using
 the segment name format of ``<name>/<timestamp>/<size>/<segment>`` is so that
 an upload of a new file with the same name won't overwrite the contents of the
 first until the last moment, when the manifest file is updated.

 ``st`` manages these segment files for you; for example, it deletes the old
 segments when a large object is deleted or overwritten. You can override this
 behavior with the ``--leave-segments`` option if desired; this is useful if
 you want to keep multiple versions of the same large object available.

 ----------
 Direct API
 ----------

 You can also work with the segments and manifests directly through HTTP
 requests instead of having ``st`` do it for you. You upload the segments like
 any other object, and the manifest is just a zero-byte file with an extra
 ``X-Object-Manifest`` header.

 All the object segments need to be in the same container, share a common
 object name prefix, and have names that sort in the order they should be
 concatenated. They don't have to be in the same container as the manifest
 file, which is useful for keeping container listings clean, as explained above
 with ``st``.
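The concatenation order comes only from how the segment names sort, so numbering matters once there are ten or more segments. This is a minimal sketch. It assumes container listings are sorted in plain byte order, which `LC_ALL=C sort` reproduces locally. `listing_order` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# listing_order (hypothetical helper): sorts object names from stdin in
# plain byte order, the order ASSUMED here for container listings.
listing_order() {
  LC_ALL=C sort
}

# Unpadded numbers: "10" sorts before "2", so segment 10 would be
# concatenated before segment 2.
printf '%s\n' myobject/1 myobject/2 myobject/10 | listing_order

# Zero-padded numbers sort in numeric order.
printf '%s\n' myobject/00000001 myobject/00000002 myobject/00000010 | listing_order
```

Single-digit names, as in the tiny `curl` example, sort correctly either way.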

 The manifest file is simply a zero-byte file with the extra
 ``X-Object-Manifest: <container>/<prefix>`` header, where ``<container>`` is
 the container the object segments are in and ``<prefix>`` is the common prefix
 for all the segments.
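This sketch assumes `<prefix>` is matched as a plain string prefix against the container listing. That would explain why the `curl` example below uses `container/myobject/` with a trailing slash. The code is a local simulation of which names a prefix would select, not Swift's own code. `manifest_members` is a hypothetical helper and the listing is made-up sample data.

```shell
#!/usr/bin/env bash
# manifest_members PREFIX (hypothetical helper): reads object names from
# stdin and prints, in byte order, those starting with PREFIX, i.e. the
# segments a manifest with this <prefix> is ASSUMED to pick up.
manifest_members() {
  local prefix=$1 name
  while IFS= read -r name; do
    case $name in
      "$prefix"*) printf '%s\n' "$name" ;;   # quoted part matches literally
    esac
  done | LC_ALL=C sort
}

listing='myobject
myobject/1
myobject/2
myobject2/1'

# With the trailing slash only the segments match.
printf '%s\n' "$listing" | manifest_members 'myobject/'
# Without it, "myobject" itself and "myobject2/1" would match too.
printf '%s\n' "$listing" | manifest_members 'myobject'
```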

 It is best to upload all the segments first and then create or update the
 manifest. In this way, the full object won't be available for downloading until
 the upload is complete. Also, you can upload a new set of segments to a second
 location and then update the manifest to point to this new location. During the
 upload of the new segments, the original manifest will still be available to
 download the first set of segments.

 Here's an example using ``curl`` with tiny 1-byte segments::

    # First, upload the segments
    curl -X PUT -H 'X-Auth-Token: <token>' \
        http://<storage_url>/container/myobject/1 --data-binary '1'
    curl -X PUT -H 'X-Auth-Token: <token>' \
        http://<storage_url>/container/myobject/2 --data-binary '2'
    curl -X PUT -H 'X-Auth-Token: <token>' \
        http://<storage_url>/container/myobject/3 --data-binary '3'

    # Next, create the manifest file
    curl -X PUT -H 'X-Auth-Token: <token>' \
        -H 'X-Object-Manifest: container/myobject/' \
        http://<storage_url>/container/myobject --data-binary ''

    # And now we can download the segments as a single object
    curl -H 'X-Auth-Token: <token>' \
        http://<storage_url>/container/myobject
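Building on the `curl` example above, the recommended order can be scripted for a real file: upload all segments first, under a fresh prefix, and write the manifest last. This is a minimal sketch under several assumptions. `TOKEN` and `STORAGE_URL` are hypothetical environment variables holding an auth token and the account's storage URL, and `upload_large_object` is a hypothetical helper name. Object names are assumed to need no URL-encoding. Segments are uploaded one at a time rather than in parallel.

```shell
#!/usr/bin/env bash
# upload_large_object CONTAINER OBJECT SEGMENT_CONTAINER FILE SEGMENT_SIZE
# Hypothetical helper. ASSUMES $TOKEN and $STORAGE_URL are set by the caller.
upload_large_object() {
  local container=$1 object=$2 seg_container=$3 file=$4 seg_size=$5
  local prefix dir piece i=0
  # A fresh prefix per upload, so an existing manifest keeps serving the
  # old segments until the new manifest is written.
  prefix="$object/$(date +%s)/"
  dir=$(mktemp -d) || return 1
  split -b "$seg_size" -a 4 "$file" "$dir/part_" || { rm -rf "$dir"; return 1; }

  # 1. Upload every segment; the glob expands in sorted (cut) order.
  for piece in "$dir"/part_*; do
    [ -e "$piece" ] || continue   # no pieces at all (e.g. an empty FILE)
    curl --fail -X PUT -H "X-Auth-Token: $TOKEN" -T "$piece" \
      "$STORAGE_URL/$seg_container/$prefix$(printf '%08d' "$i")" \
      || { rm -rf "$dir"; return 1; }
    i=$((i + 1))
  done
  rm -rf "$dir"

  # 2. Only now create or update the zero-byte manifest.
  curl --fail -X PUT -H "X-Auth-Token: $TOKEN" \
    -H "X-Object-Manifest: $seg_container/$prefix" \
    --data-binary '' "$STORAGE_URL/$container/$object"
}
```

For instance, `upload_large_object test_container large_file test_container_segments large_file 1073741824` would upload 1GB segments under `test_container_segments/large_file/<upload time>/`. It would then point `test_container/large_file` at them. Unlike `st`, this sketch never deletes older segments.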

 ----------------
 Additional Notes
 ----------------

 * With a ``GET`` or ``HEAD`` of a manifest file, the
   ``X-Object-Manifest: <container>/<prefix>`` header will be returned with the
   concatenated object, so you can tell where it's getting its segments from.

 * The response's ``Content-Length`` for a ``GET`` or ``HEAD`` on the manifest
   file will be the sum of the sizes of all the segments in the
   ``<container>/<prefix>`` listing, computed dynamically. So, uploading
   additional segments after the manifest is created will cause the
   concatenated object to be that much larger; there's no need to recreate the
   manifest file.

 * The response's ``Content-Type`` for a ``GET`` or ``HEAD`` on the manifest
   will be the same as the ``Content-Type`` set during the ``PUT`` request that
   created the manifest. You can easily change the ``Content-Type`` by reissuing
   the ``PUT``.

 * The response's ``ETag`` for a ``GET`` or ``HEAD`` on the manifest file will
   be the MD5 sum of the concatenated string of ETags for each of the segments
   in the ``<container>/<prefix>`` listing, computed dynamically. Usually in
   Swift the ETag is the MD5 sum of the contents of the object, and that holds
   true for each segment independently. However, it's not feasible to generate
   such an ETag for the manifest itself, so this method was chosen to at least
   offer change detection.
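The `Content-Length` and `ETag` rules above can be checked against local copies of the segments. This is a minimal sketch. It assumes GNU coreutils `md5sum` and that the files are passed in the same order as the container listing. `manifest_expectations` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# manifest_expectations SEGMENT_FILE... (hypothetical helper)
# Given local copies of the segments in listing order, prints the
# Content-Length and ETag a GET/HEAD on the manifest should report.
# ASSUMES GNU md5sum; on BSD/macOS, `md5 -q FILE` prints the bare digest.
manifest_expectations() {
  local total=0 etags="" seg size etag
  for seg in "$@"; do
    size=$(wc -c < "$seg")
    total=$((total + size))
    # Each segment's own ETag is the MD5 hex digest of its contents.
    etag=$(md5sum < "$seg" | cut -d ' ' -f 1)
    etags="$etags$etag"
  done
  echo "Content-Length: $total"
  # The manifest's ETag is the MD5 of the concatenated segment ETag strings.
  echo "ETag: $(printf '%s' "$etags" | md5sum | cut -d ' ' -f 1)"
}
```

For the three 1-byte segments of the `curl` example, this reports `Content-Length: 3`. Each segment's own ETag is the MD5 of a single digit, for example `c4ca4238a0b923820dcc509a6f75849b` for `1`.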