Merge "Add Pros/Cons docs for global cluster consideration"
commit 714384a716
@@ -496,133 +496,65 @@ When you specify a policy the containers created also include the policy index,
thus even when running a container_only report, you will need to specify the
policy not using the default.

-----------------------------------------------
Geographically Distributed Swift Considerations
-----------------------------------------------

Swift provides two features that may be used to distribute replicas of objects
across multiple geographically distributed data-centers: with
:doc:`overview_global_cluster` object replicas may be dispersed across devices
from different data-centers by using `regions` in ring device descriptors; with
:doc:`overview_container_sync` objects may be copied between independent Swift
clusters in each data-center. The operation and configuration of each are
described in their respective documentation. The following points should be
considered when selecting the feature that is most appropriate for a particular
use case:

#. Global Clusters allows the distribution of object replicas across
   data-centers to be controlled by the cluster operator on a per-policy
   basis, since the distribution is determined by the assignment of devices
   from each data-center in each policy's ring file. With Container Sync
   the end user controls the distribution of objects across clusters on a
   per-container basis (see the example sketch after this list).

#. Global Clusters requires an operator to coordinate ring deployments across
   multiple data-centers. Container Sync allows for independent management of
   separate Swift clusters in each data-center, and for existing Swift
   clusters to be used as peers in Container Sync relationships without
   deploying new policies/rings.

#. Global Clusters seamlessly supports features that may rely on
   cross-container operations such as large objects and versioned writes.
   Container Sync requires the end user to ensure that all required
   containers are sync'd for these features to work in all data-centers.

#. Global Clusters makes objects available for GET or HEAD requests in both
   data-centers even if a replica of the object has not yet been
   asynchronously migrated between data-centers, by forwarding requests
   between data-centers. Container Sync is unable to serve requests for an
   object in a particular data-center until the asynchronous sync process has
   copied the object to that data-center.

#. Global Clusters may require less storage capacity than Container Sync to
   achieve equivalent durability of objects in each data-center. Global
   Clusters can restore replicas that are lost or corrupted in one
   data-center using replicas from other data-centers. Container Sync
   requires each data-center to independently manage the durability of
   objects, which may result in each data-center storing more replicas than
   with Global Clusters.

#. Global Clusters executes all account/container metadata updates
   synchronously to account/container replicas in all data-centers, which may
   incur delays when making updates across WANs. Container Sync only copies
   objects between data-centers and all Swift internal traffic is
   confined to each data-center.

#. Global Clusters does not yet guarantee the availability of objects stored
   in Erasure Coded policies when one data-center is offline. With Container
   Sync the availability of objects in each data-center is independent of the
   state of other data-centers once objects have been synced. Container Sync
   also allows objects to be stored using different policy types in different
   data-centers.
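
To make the comparison concrete, the following sketch shows how each approach
is typically driven. The device addresses, weights, realm/cluster names and
keys below are purely illustrative assumptions, not values from any real
cluster::

    # Global Clusters: the operator places devices from both regions in
    # the same policy ring, so replica placement spans data-centers.
    swift-ring-builder object.builder add r1z1-10.1.0.1:6200/sda 100
    swift-ring-builder object.builder add r2z1-10.2.0.1:6200/sda 100
    swift-ring-builder object.builder rebalance

    # Container Sync: the end user points a single container at a peer
    # cluster (the realm, cluster and key are defined by the operators
    # in container-sync-realms.conf).
    swift post -t '//realm/ny/AUTH_test/backups' -k 'secret' backups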

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Checking handoff partition distribution

@@ -52,6 +52,7 @@ Overview and Concepts
ratelimit
overview_large_objects
overview_object_versioning
overview_global_cluster
overview_container_sync
overview_expiring_objects
cors

doc/source/overview_global_cluster.rst (new file, 133 lines)
@@ -0,0 +1,133 @@
===============
Global Clusters
===============

--------
Overview
--------

Swift's default configuration is currently designed to work in a
single region, where a region is defined as a group of machines with
high-bandwidth, low-latency links between them. However, configuration
options exist that make running a performant multi-region Swift
cluster possible.

For the rest of this section, we will assume a two-region Swift
cluster: region 1 in San Francisco (SF), and region 2 in New York
(NY). Each region shall contain within it 3 zones, numbered 1, 2, and
3, for a total of 6 zones.

---------------------------
Configuring Global Clusters
---------------------------

~~~~~~~~~~~~~
read_affinity
~~~~~~~~~~~~~

This setting, combined with the sorting_method setting, makes the proxy
server prefer local backend servers for GET and HEAD requests over
non-local ones. For example, it is preferable for an SF proxy server
to service object GET requests by talking to SF object servers, as the
client will receive lower latency and higher throughput.

By default, Swift randomly chooses one of the three replicas to give
to the client, thereby spreading the load evenly. In the case of a
geographically-distributed cluster, the administrator is likely to
prioritize keeping traffic local over even distribution of results.
This is where the read_affinity setting comes in.

Example::

    [app:proxy-server]
    sorting_method = affinity
    read_affinity = r1=100

This will make the proxy attempt to service GET and HEAD requests from
backends in region 1 before contacting any backends in region 2.
However, if no region 1 backends are available (due to replica
placement, failed hardware, or other reasons), then the proxy will
fall back to backend servers in other regions.

Example::

    [app:proxy-server]
    sorting_method = affinity
    read_affinity = r1z1=100, r1=200

This will make the proxy attempt to service GET and HEAD requests from
backends in region 1 zone 1, then backends in region 1, then any other
backends. If a proxy is physically close to a particular zone or
zones, this can provide bandwidth savings. For example, if a zone
corresponds to servers in a particular rack, and the proxy server is
in that same rack, then setting read_affinity to prefer reads from
within the rack will result in less traffic between the top-of-rack
switches.

The read_affinity setting may contain any number of region/zone
specifiers; the priority number (after the equals sign) determines the
ordering in which backend servers will be contacted. A lower number
means higher priority.
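
For instance, a proxy that should prefer its own rack, then the rest of
its region, and only then the remote region could combine several
specifiers. This is a sketch only; the zone and priority values are
illustrative, not a recommendation::

    [app:proxy-server]
    sorting_method = affinity
    read_affinity = r1z1=100, r1=200, r2=300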

Note that read_affinity only affects the ordering of primary nodes
(see ring docs for definition of primary node), not the ordering of
handoff nodes.
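
The examples above are written from the point of view of the SF proxies;
the proxy servers in each region get their own value. Under the same
assumptions, the NY proxy servers would simply prefer region 2 instead
(a sketch, not a definitive configuration)::

    [app:proxy-server]
    sorting_method = affinity
    read_affinity = r2=100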

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
write_affinity and write_affinity_node_count
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This setting makes the proxy server prefer local backend servers for
object PUT requests over non-local ones. For example, it may be
preferable for an SF proxy server to service object PUT requests
by talking to SF object servers, as the client will receive lower
latency and higher throughput. However, if this setting is used, note
that a NY proxy server handling a GET request for an object that was
PUT using write affinity may have to fetch it across the WAN link, as
the object won't immediately have any replicas in NY. Replication will
eventually move the object's replicas to their proper homes in both SF
and NY.

Note that only object PUT requests are affected by the write_affinity
setting; POST, GET, HEAD, DELETE, OPTIONS, and account/container PUT
requests are not affected.

This setting lets you trade data distribution for throughput. If
write_affinity is enabled, then object replicas will initially be
stored all within a particular region or zone, thereby decreasing the
quality of the data distribution, but the replicas will be written
over fast LAN links, giving higher throughput to clients. Note that
the replicators will eventually move objects to their proper,
well-distributed homes.

The write_affinity setting is useful only when you don't typically
read objects immediately after writing them. For example, consider a
workload of mainly backups: if you have a bunch of machines in NY that
periodically write backups to Swift, then odds are that you don't then
immediately read those backups in SF. If your workload doesn't look
like that, then you probably shouldn't use write_affinity.

The write_affinity_node_count setting is only useful in conjunction
with write_affinity; it governs how many local object servers will be
tried before falling back to non-local ones.

Example::

    [app:proxy-server]
    write_affinity = r1
    write_affinity_node_count = 2 * replicas

Assuming 3 replicas, this configuration will make object PUTs try
storing the object's replicas on up to 6 disks ("2 * replicas") in
region 1 ("r1"). The proxy server first tries to find 3 devices for
storing the object. If a device is unavailable, it queries the ring for
a 4th device, and so on, up to a 6th device. If the 6th device is still
unavailable, the remaining replica is sent to another region. This does
not mean there will be 6 replicas of the object in region 1.

You should be aware that, if you have data coming into SF faster than
your replicators are transferring it to NY, then your cluster's data
distribution will get worse and worse over time as objects pile up in SF.
If this happens, it is recommended to disable write_affinity and simply let
object PUTs traverse the WAN link, as that will naturally limit the
object growth rate to what your WAN link can handle.
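
Putting the read and write settings together, an SF proxy server in the
example cluster might combine them along these lines (a sketch under the
assumptions of this section, not a definitive configuration; the NY
proxies would use r2 in place of r1)::

    [app:proxy-server]
    sorting_method = affinity
    read_affinity = r1=100
    write_affinity = r1
    write_affinity_node_count = 2 * replicas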