First pass at admin-guide cleanup
Change-Id: I005232d95a3e1d181271eef488a828ad330e6006
parent 4cb76a41ce
commit d112b7d29d
@@ -38,7 +38,7 @@ Typically, the tier consists of a collection of 1U servers. These
 machines use a moderate amount of RAM and are network I/O intensive.
 Since these systems field each incoming API request, you should
 provision them with two high-throughput (10GbE) interfaces - one for the
-incoming ``front-end`` requests and the other for the ``back-end`` access to
+incoming front-end requests and the other for the back-end access to
 the object storage nodes to put and fetch data.

 Factors to consider
@@ -59,9 +59,10 @@ amount of storage capacity. Storage nodes use a reasonable amount of
 memory and CPU. Metadata needs to be readily available to return objects
 quickly. The object stores run services not only to field incoming
 requests from the access tier, but to also run replicators, auditors,
-and reapers. You can provision object stores provisioned with single
-gigabit or 10 gigabit network interface depending on the expected
-workload and desired performance.
+and reapers. You can provision storage nodes with single gigabit or
+10 gigabit network interface depending on the expected workload and
+desired performance, although it may be desirable to isolate replication
+traffic with a second interface.

 **Object Storage (swift)**

@@ -6,8 +6,12 @@ The key characteristics of Object Storage are that:

 - All objects stored in Object Storage have a URL.

-- All objects stored are replicated 3✕ in as-unique-as-possible zones,
-  which can be defined as a group of drives, a node, a rack, and so on.
+- "Storage Policies" may be used to define different levels of durability
+  for objects stored in the cluster. These policies support not only
+  complete replicas but also erasure-coded fragments.
+
+- All replicas or fragments for an object are stored in as-unique-as-possible
+  zones to increase durability and availability.

 - All objects have their own metadata.

@@ -39,13 +39,13 @@ Proxy servers
 Proxy servers are the public face of Object Storage and handle all of
 the incoming API requests. Once a proxy server receives a request, it
 determines the storage node based on the object's URL, for example:
-https://swift.example.com/v1/account/container/object. Proxy servers
+``https://swift.example.com/v1/account/container/object``. Proxy servers
 also coordinate responses, handle failures, and coordinate timestamps.

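As a minimal sketch of that URL-to-name mapping (assuming the standard ``/v1/account/container/object`` path layout; the helper below is illustrative, not the proxy server's actual code):

.. code-block:: python

   # Illustrative only: split a Swift-style request URL into the account,
   # container, and object names a proxy server needs before it can consult
   # the ring for that object's location.
   from urllib.parse import urlsplit

   def split_object_path(url):
       """Return (account, container, object) from a /v1/... request URL."""
       path = urlsplit(url).path          # '/v1/account/container/object'
       _version, account, container, obj = path.lstrip('/').split('/', 3)
       return account, container, obj

   print(split_object_path(
       'https://swift.example.com/v1/account/container/object'))
   # ('account', 'container', 'object')
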
 Proxy servers use a shared-nothing architecture and can be scaled as
 needed based on projected workloads. A minimum of two proxy servers
-should be deployed for redundancy. If one proxy server fails, the others
-take over.
+should be deployed behind a separately-managed load balancer. If one
+proxy server fails, the others take over.

 For more information concerning proxy server configuration, see
 `Configuration Reference
@@ -54,23 +54,24 @@ For more information concerning proxy server configuration, see
 Rings
 -----

-A ring represents a mapping between the names of entities stored on disks
-and their physical locations. There are separate rings for accounts,
-containers, and objects. When other components need to perform any
-operation on an object, container, or account, they need to interact
-with the appropriate ring to determine their location in the cluster.
+A ring represents a mapping between the names of entities stored in the
+cluster and their physical locations on disks. There are separate rings
+for accounts, containers, and objects. When components of the system need
+to perform an operation on an object, container, or account, they need to
+interact with the corresponding ring to determine the appropriate location
+in the cluster.

 The ring maintains this mapping using zones, devices, partitions, and
 replicas. Each partition in the ring is replicated, by default, three
 times across the cluster, and partition locations are stored in the
 mapping maintained by the ring. The ring is also responsible for
-determining which devices are used for handoff in failure scenarios.
+determining which devices are used as handoffs in failure scenarios.

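A rough sketch of how a name can map to one of a fixed number of partitions by hashing (the partition count and shift below are invented example values; a real ring also stores an explicit partition-to-device assignment built by the ring builder):

.. code-block:: python

   # Illustrative sketch of hashing an object's full name down to a partition
   # index, in the spirit of the ring described above. PART_POWER is an
   # invented example value, not taken from a real deployment.
   import hashlib
   import struct

   PART_POWER = 10                     # 2**10 = 1024 partitions (example)
   PART_SHIFT = 32 - PART_POWER

   def get_partition(account, container, obj):
       """Keep the top PART_POWER bits of an MD5 of the object's name."""
       name = '/%s/%s/%s' % (account, container, obj)
       digest = hashlib.md5(name.encode('utf-8')).digest()
       return struct.unpack_from('>I', digest)[0] >> PART_SHIFT

   print(get_partition('AUTH_test', 'photos', 'cat.jpg'))   # a value in 0..1023
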
-Data can be isolated into zones in the ring. Each partition replica is
-guaranteed to reside in a different zone. A zone could represent a
+Data can be isolated into zones in the ring. Each partition replica
+will try to reside in a different zone. A zone could represent a
 drive, a server, a cabinet, a switch, or even a data center.

-The partitions of the ring are equally divided among all of the devices
+The partitions of the ring are distributed among all of the devices
 in the Object Storage installation. When partitions need to be moved
 around (for example, if a device is added to the cluster), the ring
 ensures that a minimum number of partitions are moved at a time, and
@@ -104,7 +105,7 @@ working with each item separately or the entire cluster all at once.

 Another configurable value is the replica count, which indicates how
 many of the partition-device assignments make up a single ring. For a
-given partition number, each replica's device will not be in the same
+given partition index, each replica's device will not be in the same
 zone as any other replica's device. Zones can be used to group devices
 based on physical locations, power separations, network separations, or
 any other attribute that would improve the availability of multiple
@@ -167,7 +168,7 @@ System replicators and object uploads/downloads operate on partitions.
 As the system scales up, its behavior continues to be predictable
 because the number of partitions is a fixed number.

-Implementing a partition is conceptually simple, a partition is just a
+Implementing a partition is conceptually simple: a partition is just a
 directory sitting on a disk with a corresponding hash table of what it
 contains.

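A minimal sketch of that idea, assuming a partition directory whose subdirectories hold object files; the layout and helper are invented for illustration and are not swift's exact on-disk format:

.. code-block:: python

   # A sketch of "a directory with a hash table of what it contains": walk one
   # partition directory and record, per subdirectory, a hash of the file
   # names found beneath it.
   import hashlib
   import os

   def hash_partition(partition_dir):
       """Map each immediate subdirectory to a hash of the file names in it."""
       hashes = {}
       for entry in sorted(os.listdir(partition_dir)):
           subdir = os.path.join(partition_dir, entry)
           if not os.path.isdir(subdir):
               continue
           digest = hashlib.md5()
           for _root, _dirs, files in os.walk(subdir):
               for name in sorted(files):
                   digest.update(name.encode('utf-8'))
           hashes[entry] = digest.hexdigest()
       return hashes
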
@@ -189,19 +190,19 @@ the other zones to see if there are any differences.

 The replicator knows if replication needs to take place by examining
 hashes. A hash file is created for each partition, which contains hashes
-of each directory in the partition. Each of the three hash files is
-compared. For a given partition, the hash files for each of the
-partition's copies are compared. If the hashes are different, then it is
-time to replicate, and the directory that needs to be replicated is
-copied over.
+of each directory in the partition. For a given partition, the hash files
+for each of the partition's copies are compared. If the hashes are
+different, then it is time to replicate, and the directory that needs to
+be replicated is copied over.

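A toy illustration of that comparison step, assuming each copy of a partition exposes a mapping of directory name to content hash (the names and hash values below are made up):

.. code-block:: python

   # Toy comparison step: given per-directory hashes for two copies of the
   # same partition, return the directories whose contents differ (or are
   # missing remotely) and therefore need to be pushed over.
   def dirs_needing_sync(local_hashes, remote_hashes):
       """Both arguments map directory name -> hash of that directory."""
       return [dirname for dirname, local_hash in local_hashes.items()
               if remote_hashes.get(dirname) != local_hash]

   local = {'0a3': 'd41d8cd9', '1f7': '9e107d9d', 'c2b': 'e4d909c2'}
   remote = {'0a3': 'd41d8cd9', '1f7': '0cc175b9'}   # '1f7' differs, 'c2b' missing
   print(dirs_needing_sync(local, remote))           # ['1f7', 'c2b']
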
 This is where partitions come in handy. With fewer things in the system,
 larger chunks of data are transferred around (rather than lots of little
 TCP connections, which is inefficient) and there is a consistent number
 of hashes to compare.

-The cluster eventually has a consistent behavior where the newest data
-has a priority.
+The cluster has an eventually-consistent behavior where old data may be
+served from partitions that missed updates, but replication will cause
+all partitions to converge toward the newest data.


 .. _objectstorage-replication-figure:
@@ -252,7 +253,7 @@ Download
 ~~~~~~~~

 A request comes in for an account/container/object. Using the same
-consistent hashing, the partition name is generated. A lookup in the
+consistent hashing, the partition index is determined. A lookup in the
 ring reveals which storage nodes contain that partition. A request is
 made to one of the storage nodes to fetch the object and, if that fails,
 requests are made to the other nodes.
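A schematic of that fetch-with-failover behavior; the ``ring.get_nodes`` and ``fetch_from`` callables are placeholders standing in for the real ring lookup and the HTTP requests to storage nodes:

.. code-block:: python

   # Schematic of the download path: look up which nodes hold the object's
   # partition, then try each one in turn until a copy is fetched.
   def download(ring, account, container, obj, fetch_from):
       path = '/%s/%s/%s' % (account, container, obj)
       partition, nodes = ring.get_nodes(account, container, obj)
       last_error = None
       for node in nodes:                  # primary nodes for that partition
           try:
               return fetch_from(node, partition, path)
           except Exception as err:        # node unreachable or copy missing
               last_error = err
       raise last_error                    # every replica location failed
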
@@ -50,14 +50,3 @@ Features and benefits
   - Utilize tools that were designed for the popular S3 API.
 * - Restrict containers per account
   - Limit access to control usage by user.
-* - Support for NetApp, Nexenta, Solidfire
-  - Unified support for block volumes using a variety of storage
-    systems.
-* - Snapshot and backup API for block volumes.
-  - Data protection and recovery for VM data.
-* - Standalone volume API available
-  - Separate endpoint and API for integration with other compute
-    systems.
-* - Integration with Compute
-  - Fully integrated with Compute for attaching block volumes and
-    reporting on usage.