Merge "Add Pros/Cons docs for global cluster consideration"
commit 714384a716
@@ -496,133 +496,65 @@ When you specify a policy the containers created also include the policy index,
thus even when running a container_only report, you will need to specify the
policy not using the default.

-----------------------------------------------
Geographically Distributed Swift Considerations
-----------------------------------------------

Swift provides two features that may be used to distribute replicas of objects
across multiple geographically distributed data-centers: with
:doc:`overview_global_cluster` object replicas may be dispersed across devices
from different data-centers by using `regions` in ring device descriptors; with
:doc:`overview_container_sync` objects may be copied between independent Swift
clusters in each data-center. The operation and configuration of each are
described in their respective documentation. The following points should be
considered when selecting the feature that is most appropriate for a particular
use case:

#. Global Clusters allows the distribution of object replicas across
   data-centers to be controlled by the cluster operator on a per-policy
   basis, since the distribution is determined by the assignment of devices
   from each data-center in each policy's ring file. With Container Sync
   the end user controls the distribution of objects across clusters on a
   per-container basis (see the example sketch after this list).

#. Global Clusters requires an operator to coordinate ring deployments across
   multiple data-centers. Container Sync allows for independent management of
   separate Swift clusters in each data-center, and for existing Swift
   clusters to be used as peers in Container Sync relationships without
   deploying new policies/rings.

#. Global Clusters seamlessly supports features that may rely on
   cross-container operations such as large objects and versioned writes.
   Container Sync requires the end user to ensure that all required
   containers are sync'd for these features to work in all data-centers.

#. Global Clusters makes objects available for GET or HEAD requests in both
   data-centers even if a replica of the object has not yet been
   asynchronously migrated between data-centers, by forwarding requests
   between data-centers. Container Sync is unable to serve requests for an
   object in a particular data-center until the asynchronous sync process has
   copied the object to that data-center.

#. Global Clusters may require less storage capacity than Container Sync to
   achieve equivalent durability of objects in each data-center. Global
   Clusters can restore replicas that are lost or corrupted in one
   data-center using replicas from other data-centers. Container Sync
   requires each data-center to independently manage the durability of
   objects, which may result in each data-center storing more replicas than
   with Global Clusters.

#. Global Clusters executes all account/container metadata updates
   synchronously to account/container replicas in all data-centers, which may
   incur delays when making updates across WANs. Container Sync only copies
   objects between data-centers and all Swift internal traffic is
   confined to each data-center.

#. Global Clusters does not yet guarantee the availability of objects stored
   in Erasure Coded policies when one data-center is offline. With Container
   Sync the availability of objects in each data-center is independent of the
   state of other data-centers once objects have been synced. Container Sync
   also allows objects to be stored using different policy types in different
   data-centers.
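
To make the comparison concrete, the following sketch shows how each approach
is typically driven. The device addresses, weights, realm/cluster names and
keys below are purely illustrative assumptions, not values from any real
cluster::

    # Global Clusters: the operator places devices from both regions in
    # the same policy ring, so replica placement spans data-centers.
    swift-ring-builder object.builder add r1z1-10.1.0.1:6200/sda 100
    swift-ring-builder object.builder add r2z1-10.2.0.1:6200/sda 100
    swift-ring-builder object.builder rebalance

    # Container Sync: the end user points a single container at a peer
    # cluster (the realm, cluster and key are defined by the operators
    # in container-sync-realms.conf).
    swift post -t '//realm/ny/AUTH_test/backups' -k 'secret' backups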

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Checking handoff partition distribution

@@ -52,6 +52,7 @@ Overview and Concepts
ratelimit
overview_large_objects
overview_object_versioning
overview_global_cluster
overview_container_sync
overview_expiring_objects
cors

doc/source/overview_global_cluster.rst (new file, 133 lines)
@@ -0,0 +1,133 @@
===============
Global Clusters
===============

--------
Overview
--------

Swift's default configuration is currently designed to work in a
single region, where a region is defined as a group of machines with
high-bandwidth, low-latency links between them. However, configuration
options exist that make running a performant multi-region Swift
cluster possible.

For the rest of this section, we will assume a two-region Swift
cluster: region 1 in San Francisco (SF), and region 2 in New York
(NY). Each region shall contain within it 3 zones, numbered 1, 2, and
3, for a total of 6 zones.

---------------------------
Configuring Global Clusters
---------------------------

~~~~~~~~~~~~~
read_affinity
~~~~~~~~~~~~~

This setting, combined with the sorting_method setting, makes the proxy
server prefer local backend servers for GET and HEAD requests over
non-local ones. For example, it is preferable for an SF proxy server
to service object GET requests by talking to SF object servers, as the
client will receive lower latency and higher throughput.

By default, Swift randomly chooses one of the three replicas to give
to the client, thereby spreading the load evenly. In the case of a
geographically-distributed cluster, the administrator is likely to
prioritize keeping traffic local over even distribution of results.
This is where the read_affinity setting comes in.

Example::

    [app:proxy-server]
    sorting_method = affinity
    read_affinity = r1=100

This will make the proxy attempt to service GET and HEAD requests from
backends in region 1 before contacting any backends in region 2.
However, if no region 1 backends are available (due to replica
placement, failed hardware, or other reasons), then the proxy will
fall back to backend servers in other regions.

Example::

    [app:proxy-server]
    sorting_method = affinity
    read_affinity = r1z1=100, r1=200

This will make the proxy attempt to service GET and HEAD requests from
backends in region 1 zone 1, then backends in region 1, then any other
backends. If a proxy is physically close to a particular zone or
zones, this can provide bandwidth savings. For example, if a zone
corresponds to servers in a particular rack, and the proxy server is
in that same rack, then setting read_affinity to prefer reads from
within the rack will result in less traffic between the top-of-rack
switches.

The read_affinity setting may contain any number of region/zone
specifiers; the priority number (after the equals sign) determines the
ordering in which backend servers will be contacted. A lower number
means higher priority.
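
For instance, a proxy that should prefer its own rack, then the rest of
its region, and only then the remote region could combine several
specifiers. This is a sketch only; the zone and priority values are
illustrative, not a recommendation::

    [app:proxy-server]
    sorting_method = affinity
    read_affinity = r1z1=100, r1=200, r2=300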

Note that read_affinity only affects the ordering of primary nodes
(see ring docs for definition of primary node), not the ordering of
handoff nodes.
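
The examples above are written from the point of view of the SF proxies;
the proxy servers in each region get their own value. Under the same
assumptions, the NY proxy servers would simply prefer region 2 instead
(a sketch, not a definitive configuration)::

    [app:proxy-server]
    sorting_method = affinity
    read_affinity = r2=100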

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
write_affinity and write_affinity_node_count
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This setting makes the proxy server prefer local backend servers for
object PUT requests over non-local ones. For example, it may be
preferable for an SF proxy server to service object PUT requests
by talking to SF object servers, as the client will receive lower
latency and higher throughput. However, if this setting is used, note
that a NY proxy server handling a GET request for an object that was
PUT using write affinity may have to fetch it across the WAN link, as
the object won't immediately have any replicas in NY. Replication will
eventually move the object's replicas to their proper homes in both SF
and NY.

Note that only object PUT requests are affected by the write_affinity
setting; POST, GET, HEAD, DELETE, OPTIONS, and account/container PUT
requests are not affected.

This setting lets you trade data distribution for throughput. If
write_affinity is enabled, then object replicas will initially be
stored all within a particular region or zone, thereby decreasing the
quality of the data distribution, but the replicas will be written
over fast LAN links, giving higher throughput to clients. Note that
the replicators will eventually move objects to their proper,
well-distributed homes.

The write_affinity setting is useful only when you don't typically
read objects immediately after writing them. For example, consider a
workload of mainly backups: if you have a bunch of machines in NY that
periodically write backups to Swift, then odds are that you don't then
immediately read those backups in SF. If your workload doesn't look
like that, then you probably shouldn't use write_affinity.

The write_affinity_node_count setting is only useful in conjunction
with write_affinity; it governs how many local object servers will be
tried before falling back to non-local ones.

Example::

    [app:proxy-server]
    write_affinity = r1
    write_affinity_node_count = 2 * replicas

Assuming 3 replicas, this configuration will make object PUTs try
storing the object's replicas on up to 6 disks ("2 * replicas") in
region 1 ("r1"). The proxy server first tries to find 3 devices for
storing the object. If a device is unavailable, it queries the ring for
a 4th device, and so on, up to a 6th device. If the 6th device is still
unavailable, the remaining replica is sent to another region. This does
not mean there will be 6 replicas of the object in region 1.

You should be aware that, if you have data coming into SF faster than
your replicators are transferring it to NY, then your cluster's data
distribution will get worse and worse over time as objects pile up in SF.
If this happens, it is recommended to disable write_affinity and simply let
object PUTs traverse the WAN link, as that will naturally limit the
object growth rate to what your WAN link can handle.
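
Putting the read and write settings together, an SF proxy server in the
example cluster might combine them along these lines (a sketch under the
assumptions of this section, not a definitive configuration; the NY
proxies would use r2 in place of r1)::

    [app:proxy-server]
    sorting_method = affinity
    read_affinity = r1=100
    write_affinity = r1
    write_affinity_node_count = 2 * replicas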