First pass at admin-guide cleanup
Change-Id: I005232d95a3e1d181271eef488a828ad330e6006
parent 4cb76a41ce
commit d112b7d29d
@@ -38,7 +38,7 @@ Typically, the tier consists of a collection of 1U servers. These
 machines use a moderate amount of RAM and are network I/O intensive.
 Since these systems field each incoming API request, you should
 provision them with two high-throughput (10GbE) interfaces - one for the
-incoming ``front-end`` requests and the other for the ``back-end`` access to
+incoming front-end requests and the other for the back-end access to
 the object storage nodes to put and fetch data.
 
 Factors to consider
@@ -59,9 +59,10 @@ amount of storage capacity. Storage nodes use a reasonable amount of
 memory and CPU. Metadata needs to be readily available to return objects
 quickly. The object stores run services not only to field incoming
 requests from the access tier, but to also run replicators, auditors,
-and reapers. You can provision object stores provisioned with single
-gigabit or 10 gigabit network interface depending on the expected
-workload and desired performance.
+and reapers. You can provision storage nodes with single gigabit or
+10 gigabit network interface depending on the expected workload and
+desired performance, although it may be desirable to isolate replication
+traffic with a second interface.
 
 **Object Storage (swift)**
 
@@ -6,8 +6,12 @@ The key characteristics of Object Storage are that:
 
 - All objects stored in Object Storage have a URL.
 
-- All objects stored are replicated 3✕ in as-unique-as-possible zones,
-  which can be defined as a group of drives, a node, a rack, and so on.
+- "Storage Policies" may be used to define different levels of durability
+  for objects stored in the cluster. These policies support not only
+  complete replicas but also erasure-coded fragments.
+
+- All replicas or fragments for an object are stored in as-unique-as-possible
+  zones to increase durability and availability.
 
 - All objects have their own metadata.
 
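An illustrative sketch of two of these characteristics - every object is
reachable at its own URL and carries its own metadata. The endpoint, account,
container, object name, and token below are placeholders; Swift expects custom
object metadata in ``X-Object-Meta-*`` headers::

    import requests

    # Placeholder storage URL and token; a real deployment obtains both
    # from the identity service.
    url = "https://swift.example.com/v1/AUTH_account/container/object"
    token = "AUTH_tk_example"

    # Upload the object; custom metadata travels in X-Object-Meta-* headers.
    requests.put(url, data=b"hello", headers={
        "X-Auth-Token": token,
        "X-Object-Meta-Color": "blue",
    })

    # The object's own metadata comes back on a HEAD of the same URL.
    resp = requests.head(url, headers={"X-Auth-Token": token})
    print(resp.headers.get("X-Object-Meta-Color"))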
@@ -39,13 +39,13 @@ Proxy servers
 Proxy servers are the public face of Object Storage and handle all of
 the incoming API requests. Once a proxy server receives a request, it
 determines the storage node based on the object's URL, for example:
-https://swift.example.com/v1/account/container/object. Proxy servers
+``https://swift.example.com/v1/account/container/object``. Proxy servers
 also coordinate responses, handle failures, and coordinate timestamps.
 
 Proxy servers use a shared-nothing architecture and can be scaled as
 needed based on projected workloads. A minimum of two proxy servers
-should be deployed for redundancy. If one proxy server fails, the others
-take over.
+should be deployed behind a separately-managed load balancer. If one
+proxy server fails, the others take over.
 
 For more information concerning proxy server configuration, see
 `Configuration Reference
@@ -54,23 +54,24 @@ For more information concerning proxy server configuration, see
 Rings
 -----
 
-A ring represents a mapping between the names of entities stored on disks
-and their physical locations. There are separate rings for accounts,
-containers, and objects. When other components need to perform any
-operation on an object, container, or account, they need to interact
-with the appropriate ring to determine their location in the cluster.
+A ring represents a mapping between the names of entities stored in the
+cluster and their physical locations on disks. There are separate rings
+for accounts, containers, and objects. When components of the system need
+to perform an operation on an object, container, or account, they need to
+interact with the corresponding ring to determine the appropriate location
+in the cluster.
 
 The ring maintains this mapping using zones, devices, partitions, and
 replicas. Each partition in the ring is replicated, by default, three
 times across the cluster, and partition locations are stored in the
 mapping maintained by the ring. The ring is also responsible for
-determining which devices are used for handoff in failure scenarios.
+determining which devices are used as handoffs in failure scenarios.
 
-Data can be isolated into zones in the ring. Each partition replica is
-guaranteed to reside in a different zone. A zone could represent a
+Data can be isolated into zones in the ring. Each partition replica
+will try to reside in a different zone. A zone could represent a
 drive, a server, a cabinet, a switch, or even a data center.
 
-The partitions of the ring are equally divided among all of the devices
+The partitions of the ring are distributed among all of the devices
 in the Object Storage installation. When partitions need to be moved
 around (for example, if a device is added to the cluster), the ring
 ensures that a minimum number of partitions are moved at a time, and
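As a rough illustration of how such a mapping can work, the toy sketch below
hashes an object path and uses the top bits of the digest as a partition
index, then looks that index up in one table per replica. The partition
power, device list, and replica tables are invented for the example; a real
ring builder also mixes a per-cluster hash prefix/suffix into the path and
assigns partitions so that replicas land in distinct zones::

    import hashlib

    PART_POWER = 16                 # 2**16 partitions (cluster-dependent)
    PART_SHIFT = 32 - PART_POWER

    # Example device list; ids, zones, and addresses are made up.
    devices = [
        {"id": 0, "zone": 1, "ip": "10.0.0.1", "device": "sdb1"},
        {"id": 1, "zone": 2, "ip": "10.0.0.2", "device": "sdb1"},
        {"id": 2, "zone": 3, "ip": "10.0.0.3", "device": "sdb1"},
        {"id": 3, "zone": 4, "ip": "10.0.0.4", "device": "sdb1"},
    ]

    # One partition->device table per replica; here a simple offset keeps the
    # three assignments for any partition on different devices (and zones).
    replica2part2dev = [
        [(part + shift) % len(devices) for part in range(2 ** PART_POWER)]
        for shift in range(3)
    ]

    def get_nodes(account, container, obj):
        """Map an object name to its partition and the devices holding it."""
        path = "/%s/%s/%s" % (account, container, obj)
        digest = hashlib.md5(path.encode("utf-8")).digest()
        # The top PART_POWER bits of the hash select the partition.
        partition = int.from_bytes(digest[:4], "big") >> PART_SHIFT
        nodes = [devices[table[partition]] for table in replica2part2dev]
        return partition, nodes

    partition, nodes = get_nodes("AUTH_test", "photos", "cat.jpg")
    print(partition, [node["ip"] for node in nodes])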
@@ -104,7 +105,7 @@ working with each item separately or the entire cluster all at once.
 
 Another configurable value is the replica count, which indicates how
 many of the partition-device assignments make up a single ring. For a
-given partition number, each replica's device will not be in the same
+given partition index, each replica's device will not be in the same
 zone as any other replica's device. Zones can be used to group devices
 based on physical locations, power separations, network separations, or
 any other attribute that would improve the availability of multiple
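Continuing the toy ring sketch above, that distinct-zone property can be
spot-checked directly; in the sketch every device sits in its own zone, so
the assertion holds by construction::

    partition, nodes = get_nodes("AUTH_test", "photos", "cat.jpg")
    zones = [node["zone"] for node in nodes]
    assert len(set(zones)) == len(zones), "replicas share a zone"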
@@ -167,7 +168,7 @@ System replicators and object uploads/downloads operate on partitions.
 As the system scales up, its behavior continues to be predictable
 because the number of partitions is a fixed number.
 
-Implementing a partition is conceptually simple, a partition is just a
+Implementing a partition is conceptually simple: a partition is just a
 directory sitting on a disk with a corresponding hash table of what it
 contains.
 
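On an object server this looks roughly like the layout below; the mount
point, device name, partition value, and file names are only examples and
vary between deployments and releases::

    /srv/node/sdb1/objects/98765/                    # one partition directory
    /srv/node/sdb1/objects/98765/hashes.pkl          # cached hashes of its contents
    /srv/node/sdb1/objects/98765/a83/                # suffix directory
    /srv/node/sdb1/objects/98765/a83/<object hash>/1713200000.00000.data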
@@ -189,19 +190,19 @@ the other zones to see if there are any differences.
 
 The replicator knows if replication needs to take place by examining
 hashes. A hash file is created for each partition, which contains hashes
-of each directory in the partition. Each of the three hash files is
-compared. For a given partition, the hash files for each of the
-partition's copies are compared. If the hashes are different, then it is
-time to replicate, and the directory that needs to be replicated is
-copied over.
+of each directory in the partition. For a given partition, the hash files
+for each of the partition's copies are compared. If the hashes are
+different, then it is time to replicate, and the directory that needs to
+be replicated is copied over.
 
 This is where partitions come in handy. With fewer things in the system,
 larger chunks of data are transferred around (rather than lots of little
 TCP connections, which is inefficient) and there is a consistent number
 of hashes to compare.
 
-The cluster eventually has a consistent behavior where the newest data
-has a priority.
+The cluster has an eventually-consistent behavior where old data may be
+served from partitions that missed updates, but replication will cause
+all partitions to converge toward the newest data.
 
 
 .. _objectstorage-replication-figure:
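A minimal sketch of that comparison step, assuming each copy of a partition
exposes its hash file as a mapping from directory name to content hash (the
names and digests below are invented)::

    def suffixes_to_sync(local_hashes, remote_hashes):
        """Return the directories whose content hashes disagree."""
        return sorted(suffix for suffix, digest in local_hashes.items()
                      if remote_hashes.get(suffix) != digest)

    local = {"a83": "9f2c", "d41": "07b1", "e5f": "aa90"}
    remote = {"a83": "9f2c", "d41": "ffff", "e5f": "aa90"}
    print(suffixes_to_sync(local, remote))   # only 'd41' needs to be copied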
@@ -252,7 +253,7 @@ Download
 ~~~~~~~~
 
 A request comes in for an account/container/object. Using the same
-consistent hashing, the partition name is generated. A lookup in the
+consistent hashing, the partition index is determined. A lookup in the
 ring reveals which storage nodes contain that partition. A request is
 made to one of the storage nodes to fetch the object and, if that fails,
 requests are made to the other nodes.
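Reusing the toy ``get_nodes`` sketch from the ring section, the retry
behaviour could be expressed along these lines; ``fetch_from_node`` stands
in for the actual GET issued to a single storage node and is assumed to
raise ``OSError`` on failure::

    def download(account, container, obj, fetch_from_node):
        """Try each node holding the object's partition until one succeeds."""
        partition, nodes = get_nodes(account, container, obj)
        failures = []
        for node in nodes:
            try:
                return fetch_from_node(node, partition, account, container, obj)
            except OSError as exc:      # that node failed; try the next one
                failures.append((node["ip"], exc))
        raise RuntimeError("all storage nodes failed: %r" % failures)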
@@ -50,14 +50,3 @@ Features and benefits
      - Utilize tools that were designed for the popular S3 API.
    * - Restrict containers per account
      - Limit access to control usage by user.
-   * - Support for NetApp, Nexenta, Solidfire
-     - Unified support for block volumes using a variety of storage
-       systems.
-   * - Snapshot and backup API for block volumes.
-     - Data protection and recovery for VM data.
-   * - Standalone volume API available
-     - Separate endpoint and API for integration with other compute
-       systems.
-   * - Integration with Compute
-     - Fully integrated with Compute for attaching block volumes and
-       reporting on usage.