Add Pros/Cons docs for global cluster consideration
This comes from discussion at the Bristol Hackathon (Feb 2016). Currently Swift
offers two choices (Global Cluster and Container Sync) for syncing stored data
across geographically distributed locations. This patch adds a summary of the
discussion comparing Global Cluster and Container Sync, so that operators can
tell which feature fits their own use case.

Also, to be fair to container-sync, this patch moves the global cluster docs
from admin_guide.rst into overview_global_cluster.rst.

Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Change-Id: I624eb519503ae71dbc82245c33dab6e8637d0f8b
parent e9f5e7966a
commit dfa5523d8c
@@ -496,133 +496,65 @@ When you specify a policy the containers created also include the policy index,
|
|||||||
thus even when running a container_only report, you will need to specify the
|
thus even when running a container_only report, you will need to specify the
|
||||||
policy not using the default.
|
policy not using the default.
|
||||||
|
|
||||||
-----------------------------------
|
-----------------------------------------------
|
||||||
Geographically Distributed Clusters
|
Geographically Distributed Swift Considerations
|
||||||
-----------------------------------
|
-----------------------------------------------
|
||||||
|
|
||||||
Swift's default configuration is currently designed to work in a
|
Swift provides two features that may be used to distribute replicas of objects
|
||||||
single region, where a region is defined as a group of machines with
|
across multiple geographically distributed data-centers: with
|
||||||
high-bandwidth, low-latency links between them. However, configuration
|
:doc:`overview_global_cluster` object replicas may be dispersed across devices
|
||||||
options exist that make running a performant multi-region Swift
|
from different data-centers by using `regions` in ring device descriptors; with
|
||||||
cluster possible.
|
:doc:`overview_container_sync` objects may be copied between independent Swift
|
||||||
|
clusters in each data-center. The operation and configuration of each are
|
||||||
|
described in their respective documentation. The following points should be
|
||||||
|
considered when selecting the feature that is most appropriate for a particular
|
||||||
|
use case:
|
||||||
|
|
||||||
For the rest of this section, we will assume a two-region Swift
|
#. Global Clusters allows the distribution of object replicas across
|
||||||
cluster: region 1 in San Francisco (SF), and region 2 in New York
|
data-centers to be controlled by the cluster operator on a per-policy basis,
|
||||||
(NY). Each region shall contain within it 3 zones, numbered 1, 2, and
|
since the distribution is determined by the assignment of devices from
|
||||||
3, for a total of 6 zones.
|
each data-center in each policy's ring file. With Container Sync the end
|
||||||
|
user controls the distribution of objects across clusters on a
|
||||||
|
per-container basis.
|
||||||
|
|
||||||
~~~~~~~~~~~~~
|
#. Global Clusters requires an operator to coordinate ring deployments across
|
||||||
read_affinity
|
multiple data-centers. Container Sync allows for independent management of
|
||||||
~~~~~~~~~~~~~
|
separate Swift clusters in each data-center, and for existing Swift
|
||||||
|
clusters to be used as peers in Container Sync relationships without
|
||||||
|
deploying new policies/rings.
|
||||||
|
|
||||||
This setting, combined with sorting_method setting, makes the proxy server prefer local backend servers for
|
#. Global Clusters seamlessly supports features that may rely on
|
||||||
GET and HEAD requests over non-local ones. For example, it is
|
cross-container operations such as large objects and versioned writes.
|
||||||
preferable for an SF proxy server to service object GET requests
|
Container Sync requires the end user to ensure that all required
|
||||||
by talking to SF object servers, as the client will receive lower
|
containers are sync'd for these features to work in all data-centers.
|
||||||
latency and higher throughput.
|
|
||||||
|
|
||||||
By default, Swift randomly chooses one of the three replicas to give
|
#. Global Clusters makes objects available for GET or HEAD requests in both
|
||||||
to the client, thereby spreading the load evenly. In the case of a
|
data-centers even if a replica of the object has not yet been
|
||||||
geographically-distributed cluster, the administrator is likely to
|
asynchronously migrated between data-centers, by forwarding requests
|
||||||
prioritize keeping traffic local over even distribution of results.
|
between data-centers. Container Sync is unable to serve requests for an
|
||||||
This is where the read_affinity setting comes in.
|
object in a particular data-center until the asynchronous sync process has
|
||||||
|
copied the object to that data-center.
|
||||||
|
|
||||||
Example::
|
#. Global Clusters may require less storage capacity than Container Sync to
|
||||||
|
achieve equivalent durability of objects in each data-center. Global
|
||||||
|
Clusters can restore replicas that are lost or corrupted in one
|
||||||
|
data-center using replicas from other data-centers. Container Sync
|
||||||
|
requires each data-center to independently manage the durability of
|
||||||
|
objects, which may result in each data-center storing more replicas than
|
||||||
|
with Global Clusters.
|
||||||
|
|
||||||
[app:proxy-server]
|
#. Global Clusters executes all account/container metadata updates
|
||||||
sorting_method = affinity
|
synchronously to account/container replicas in all data-centers, which may
|
||||||
read_affinity = r1=100
|
incur delays when making updates across WANs. Container Sync only copies
|
||||||
|
objects between data-centers and all Swift internal traffic is
|
||||||
This will make the proxy attempt to service GET and HEAD requests from
|
confined to each data-center.
|
||||||
backends in region 1 before contacting any backends in region 2.
|
|
||||||
However, if no region 1 backends are available (due to replica
|
|
||||||
placement, failed hardware, or other reasons), then the proxy will
|
|
||||||
fall back to backend servers in other regions.
|
|
||||||
|
|
||||||
Example::
|
|
||||||
|
|
||||||
[app:proxy-server]
|
|
||||||
sorting_method = affinity
|
|
||||||
read_affinity = r1z1=100, r1=200
|
|
||||||
|
|
||||||
This will make the proxy attempt to service GET and HEAD requests from
|
|
||||||
backends in region 1 zone 1, then backends in region 1, then any other
|
|
||||||
backends. If a proxy is physically close to a particular zone or
|
|
||||||
zones, this can provide bandwidth savings. For example, if a zone
|
|
||||||
corresponds to servers in a particular rack, and the proxy server is
|
|
||||||
in that same rack, then setting read_affinity to prefer reads from
|
|
||||||
within the rack will result in less traffic between the top-of-rack
|
|
||||||
switches.
|
|
||||||
|
|
||||||
The read_affinity setting may contain any number of region/zone
|
|
||||||
specifiers; the priority number (after the equals sign) determines the
|
|
||||||
ordering in which backend servers will be contacted. A lower number
|
|
||||||
means higher priority.
|
|
||||||
|
|
||||||
Note that read_affinity only affects the ordering of primary nodes
|
|
||||||
(see ring docs for definition of primary node), not the ordering of
|
|
||||||
handoff nodes.
|
|
||||||
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
write_affinity and write_affinity_node_count
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
This setting makes the proxy server prefer local backend servers for
|
|
||||||
object PUT requests over non-local ones. For example, it may be
|
|
||||||
preferable for an SF proxy server to service object PUT requests
|
|
||||||
by talking to SF object servers, as the client will receive lower
|
|
||||||
latency and higher throughput. However, if this setting is used, note
|
|
||||||
that a NY proxy server handling a GET request for an object that was
|
|
||||||
PUT using write affinity may have to fetch it across the WAN link, as
|
|
||||||
the object won't immediately have any replicas in NY. However,
|
|
||||||
replication will move the object's replicas to their proper homes in
|
|
||||||
both SF and NY.
|
|
||||||
|
|
||||||
Note that only object PUT requests are affected by the write_affinity
|
|
||||||
setting; POST, GET, HEAD, DELETE, OPTIONS, and account/container PUT
|
|
||||||
requests are not affected.
|
|
||||||
|
|
||||||
This setting lets you trade data distribution for throughput. If
|
|
||||||
write_affinity is enabled, then object replicas will initially be
|
|
||||||
stored all within a particular region or zone, thereby decreasing the
|
|
||||||
quality of the data distribution, but the replicas will be distributed
|
|
||||||
over fast WAN links, giving higher throughput to clients. Note that
|
|
||||||
the replicators will eventually move objects to their proper,
|
|
||||||
well-distributed homes.
|
|
||||||
|
|
||||||
The write_affinity setting is useful only when you don't typically
|
|
||||||
read objects immediately after writing them. For example, consider a
|
|
||||||
workload of mainly backups: if you have a bunch of machines in NY that
|
|
||||||
periodically write backups to Swift, then odds are that you don't then
|
|
||||||
immediately read those backups in SF. If your workload doesn't look
|
|
||||||
like that, then you probably shouldn't use write_affinity.
|
|
||||||
|
|
||||||
The write_affinity_node_count setting is only useful in conjunction
|
|
||||||
with write_affinity; it governs how many local object servers will be
|
|
||||||
tried before falling back to non-local ones.
|
|
||||||
|
|
||||||
Example::
|
|
||||||
|
|
||||||
[app:proxy-server]
|
|
||||||
write_affinity = r1
|
|
||||||
write_affinity_node_count = 2 * replicas
|
|
||||||
|
|
||||||
Assuming 3 replicas, this configuration will make object PUTs try
|
|
||||||
storing the object's replicas on up to 6 disks ("2 * replicas") in
|
|
||||||
region 1 ("r1"). Proxy server tries to find 3 devices for storing the
|
|
||||||
object. While a device is unavailable, it queries the ring for the 4th
|
|
||||||
device and so on until 6th device. If the 6th disk is still unavailable,
|
|
||||||
the last replica will be sent to other region. It doesn't mean there'll
|
|
||||||
have 6 replicas in region 1.
|
|
||||||
|
|
||||||
|
|
||||||
You should be aware that, if you have data coming into SF faster than
|
|
||||||
your replicators are transferring it to NY, then your cluster's data distribution
|
|
||||||
will get worse and worse over time as objects pile up in SF. If this
|
|
||||||
happens, it is recommended to disable write_affinity and simply let
|
|
||||||
object PUTs traverse the WAN link, as that will naturally limit the
|
|
||||||
object growth rate to what your WAN link can handle.
|
|
||||||
|
|
||||||
|
#. Global Clusters does not yet guarantee the availability of objects stored
|
||||||
|
in Erasure Coded policies when one data-center is offline. With Container
|
||||||
|
Sync the availability of objects in each data-center is independent of the
|
||||||
|
state of other data-centers once objects have been synced. Container Sync
|
||||||
|
also allows objects to be stored using different policy types in different
|
||||||
|
data-centers.
|
||||||
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
Checking handoff partition distribution
|
Checking handoff partition distribution
|
||||||
|
@@ -52,6 +52,7 @@ Overview and Concepts
|
|||||||
ratelimit
|
ratelimit
|
||||||
overview_large_objects
|
overview_large_objects
|
||||||
overview_object_versioning
|
overview_object_versioning
|
||||||
|
overview_global_cluster
|
||||||
overview_container_sync
|
overview_container_sync
|
||||||
overview_expiring_objects
|
overview_expiring_objects
|
||||||
cors
|
cors
|
||||||
|
133
doc/source/overview_global_cluster.rst
Normal file
@@ -0,0 +1,133 @@
|
|||||||
|
===============
|
||||||
|
Global Clusters
|
||||||
|
===============
|
||||||
|
|
||||||
|
--------
|
||||||
|
Overview
|
||||||
|
--------
|
||||||
|
|
||||||
|
Swift's default configuration is currently designed to work in a
|
||||||
|
single region, where a region is defined as a group of machines with
|
||||||
|
high-bandwidth, low-latency links between them. However, configuration
|
||||||
|
options exist that make running a performant multi-region Swift
|
||||||
|
cluster possible.
|
||||||
|
|
||||||
|
For the rest of this section, we will assume a two-region Swift
|
||||||
|
cluster: region 1 in San Francisco (SF), and region 2 in New York
|
||||||
|
(NY). Each region shall contain within it 3 zones, numbered 1, 2, and
|
||||||
|
3, for a total of 6 zones.
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
Configuring Global Clusters
|
||||||
|
---------------------------
|
||||||
|
~~~~~~~~~~~~~
|
||||||
|
read_affinity
|
||||||
|
~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
This setting, combined with the sorting_method setting, makes the proxy
|
||||||
|
server prefer local backend servers for GET and HEAD requests over
|
||||||
|
non-local ones. For example, it is preferable for an SF proxy server
|
||||||
|
to service object GET requests by talking to SF object servers, as the
|
||||||
|
client will receive lower latency and higher throughput.
|
||||||
|
|
||||||
|
By default, Swift randomly chooses one of the three replicas to give
|
||||||
|
to the client, thereby spreading the load evenly. In the case of a
|
||||||
|
geographically-distributed cluster, the administrator is likely to
|
||||||
|
prioritize keeping traffic local over even distribution of results.
|
||||||
|
This is where the read_affinity setting comes in.
|
||||||
|
|
||||||
|
Example::
|
||||||
|
|
||||||
|
[app:proxy-server]
|
||||||
|
sorting_method = affinity
|
||||||
|
read_affinity = r1=100
|
||||||
|
|
||||||
|
This will make the proxy attempt to service GET and HEAD requests from
|
||||||
|
backends in region 1 before contacting any backends in region 2.
|
||||||
|
However, if no region 1 backends are available (due to replica
|
||||||
|
placement, failed hardware, or other reasons), then the proxy will
|
||||||
|
fall back to backend servers in other regions.
|
||||||
|
|
||||||
|
Example::
|
||||||
|
|
||||||
|
[app:proxy-server]
|
||||||
|
sorting_method = affinity
|
||||||
|
read_affinity = r1z1=100, r1=200
|
||||||
|
|
||||||
|
This will make the proxy attempt to service GET and HEAD requests from
|
||||||
|
backends in region 1 zone 1, then backends in region 1, then any other
|
||||||
|
backends. If a proxy is physically close to a particular zone or
|
||||||
|
zones, this can provide bandwidth savings. For example, if a zone
|
||||||
|
corresponds to servers in a particular rack, and the proxy server is
|
||||||
|
in that same rack, then setting read_affinity to prefer reads from
|
||||||
|
within the rack will result in less traffic between the top-of-rack
|
||||||
|
switches.
|
||||||
|
|
||||||
|
The read_affinity setting may contain any number of region/zone
|
||||||
|
specifiers; the priority number (after the equals sign) determines the
|
||||||
|
ordering in which backend servers will be contacted. A lower number
|
||||||
|
means higher priority.
|
||||||
|
|
||||||
|
Note that read_affinity only affects the ordering of primary nodes
|
||||||
|
(see ring docs for definition of primary node), not the ordering of
|
||||||
|
handoff nodes.
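
As a rough illustration of how the priority numbers order backends, the
following Python sketch parses a read_affinity value and sorts a small list of
primary nodes by it. The function names (``parse_read_affinity``,
``affinity_sort_key``) and the node dictionaries are assumptions made for this
example; this is not Swift's actual implementation.

```python
import re


def parse_read_affinity(value):
    """Parse a string such as "r1z1=100, r1=200" into a list of
    ((region, zone_or_None), priority) pairs."""
    rules = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        match = re.fullmatch(r"r(\d+)(?:z(\d+))?=(\d+)", item)
        if match is None:
            raise ValueError("invalid affinity specifier: %r" % item)
        region, zone, priority = match.groups()
        rules.append(
            ((int(region), int(zone) if zone else None), int(priority)))
    return rules


def affinity_sort_key(rules):
    """Return a key function mapping a node to its best (lowest) priority.
    Nodes that match no rule sort last."""
    def key(node):
        best = float("inf")
        for (region, zone), priority in rules:
            if node["region"] == region and zone in (None, node["zone"]):
                best = min(best, priority)
        return best
    return key


# Hypothetical primary nodes for one object (illustrative data only).
nodes = [
    {"name": "ny-1", "region": 2, "zone": 1},
    {"name": "sf-2", "region": 1, "zone": 2},
    {"name": "sf-1", "region": 1, "zone": 1},
]
rules = parse_read_affinity("r1z1=100, r1=200")
ordered = sorted(nodes, key=affinity_sort_key(rules))
print([n["name"] for n in ordered])  # ['sf-1', 'sf-2', 'ny-1']
```

The region 1 zone 1 node is tried first, then the remaining region 1 node, and
the region 2 node comes last because it matches no specifier.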
|
||||||
|
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
write_affinity and write_affinity_node_count
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
This setting makes the proxy server prefer local backend servers for
|
||||||
|
object PUT requests over non-local ones. For example, it may be
|
||||||
|
preferable for an SF proxy server to service object PUT requests
|
||||||
|
by talking to SF object servers, as the client will receive lower
|
||||||
|
latency and higher throughput. However, if this setting is used, note
|
||||||
|
that a NY proxy server handling a GET request for an object that was
|
||||||
|
PUT using write affinity may have to fetch it across the WAN link, as
|
||||||
|
the object won't immediately have any replicas in NY. However,
|
||||||
|
replication will move the object's replicas to their proper homes in
|
||||||
|
both SF and NY.
|
||||||
|
|
||||||
|
Note that only object PUT requests are affected by the write_affinity
|
||||||
|
setting; POST, GET, HEAD, DELETE, OPTIONS, and account/container PUT
|
||||||
|
requests are not affected.
|
||||||
|
|
||||||
|
This setting lets you trade data distribution for throughput. If
|
||||||
|
write_affinity is enabled, then object replicas will initially be
|
||||||
|
stored all within a particular region or zone, thereby decreasing the
|
||||||
|
quality of the data distribution, but the replicas will be distributed
|
||||||
|
over fast local links rather than the WAN, giving higher throughput to clients. Note that
|
||||||
|
the replicators will eventually move objects to their proper,
|
||||||
|
well-distributed homes.
|
||||||
|
|
||||||
|
The write_affinity setting is useful only when you don't typically
|
||||||
|
read objects immediately after writing them. For example, consider a
|
||||||
|
workload of mainly backups: if you have a bunch of machines in NY that
|
||||||
|
periodically write backups to Swift, then odds are that you don't then
|
||||||
|
immediately read those backups in SF. If your workload doesn't look
|
||||||
|
like that, then you probably shouldn't use write_affinity.
|
||||||
|
|
||||||
|
The write_affinity_node_count setting is only useful in conjunction
|
||||||
|
with write_affinity; it governs how many local object servers will be
|
||||||
|
tried before falling back to non-local ones.
|
||||||
|
|
||||||
|
Example::
|
||||||
|
|
||||||
|
[app:proxy-server]
|
||||||
|
write_affinity = r1
|
||||||
|
write_affinity_node_count = 2 * replicas
|
||||||
|
|
||||||
|
Assuming 3 replicas, this configuration will make object PUTs try
|
||||||
|
storing the object's replicas on up to 6 disks ("2 * replicas") in
|
||||||
|
region 1 ("r1"). The proxy server tries to find 3 devices for storing the
|
||||||
|
object. If a device is unavailable, it queries the ring for the 4th
|
||||||
|
device, and so on up to the 6th device. If the 6th device is also unavailable,
|
||||||
|
the last replica will be sent to another region. It doesn't mean there'll
|
||||||
|
be 6 replicas in region 1.
|
||||||
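
The arithmetic and fallback behaviour described above can be sketched in
Python as follows. This is a simplified model under stated assumptions, not
Swift's proxy code: the helper names (``local_node_count``,
``choose_put_nodes``), the node names and the set of unavailable nodes are all
hypothetical, and the sketch treats local candidates as one flat list.

```python
def local_node_count(setting, replicas):
    """Evaluate a value of the form "N * replicas" (the form used above)."""
    factor, star, word = setting.split()
    if star != "*" or word != "replicas":
        raise ValueError("unsupported value: %r" % setting)
    return int(factor) * replicas


def choose_put_nodes(local_nodes, remote_nodes, down, replicas, max_local):
    """Pick `replicas` nodes, trying at most `max_local` local nodes
    before falling back to nodes in other regions."""
    chosen = []
    for node in local_nodes[:max_local]:
        if len(chosen) == replicas:
            break
        if node not in down:
            chosen.append(node)
    for node in remote_nodes:
        if len(chosen) == replicas:
            break
        if node not in down:
            chosen.append(node)
    return chosen


# Hypothetical cluster state: four of the six local candidates are down.
local = ["sf-%d" % i for i in range(1, 7)]
remote = ["ny-1", "ny-2", "ny-3"]
down = {"sf-2", "sf-3", "sf-4", "sf-5"}

limit = local_node_count("2 * replicas", 3)
chosen = choose_put_nodes(local, remote, down, replicas=3, max_local=limit)
print(limit, chosen)  # 6 ['sf-1', 'sf-6', 'ny-1']
```

Up to 6 local candidates are tried, but only 3 replicas are written; here two
land in region 1 and the last one falls back to the other region.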
|
|
||||||
|
|
||||||
|
You should be aware that, if you have data coming into SF faster than
|
||||||
|
your replicators are transferring it to NY, then your cluster's data
|
||||||
|
distribution will get worse and worse over time as objects pile up in SF.
|
||||||
|
If this happens, it is recommended to disable write_affinity and simply let
|
||||||
|
object PUTs traverse the WAN link, as that will naturally limit the
|
||||||
|
object growth rate to what your WAN link can handle.
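
To get a feel for how quickly objects can pile up, the following Python
sketch estimates the backlog of data waiting to be replicated out of SF. The
function name and all rates are hypothetical figures chosen for the example,
and the model assumes constant ingest and replication rates.

```python
def backlog_gb(hours, ingest_mb_s, replication_mb_s):
    """Estimate data (in GB) accumulated in the local region after `hours`,
    given constant ingest and cross-region replication rates in MB/s."""
    surplus_mb_s = max(0.0, ingest_mb_s - replication_mb_s)
    return surplus_mb_s * 3600 * hours / 1000


# Hypothetical rates: 120 MB/s written into SF, 100 MB/s replicated to NY.
print(backlog_gb(24, 120, 100))  # 1728.0
print(backlog_gb(24, 80, 100))   # 0.0
```

With these example numbers the backlog grows by roughly 1.7 TB per day; when
replication keeps up with ingest, no backlog accumulates.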
|