Explain how replication works more clearly

A replicator creates an extra replica when it detects a remote disk
failure. However, when it fails to create a replica for other
reasons (e.g. entire node failures), it doesn't create another replica
at all. We should state this explicitly so that users are aware of it.
This fixes bug 906976.

Change-Id: I2f56428ccbbb0cf0d8538ca6e08f7da71257e661
MORITA Kazutaka 2011-12-21 02:08:40 +09:00
parent 79fbd95433
commit c98ee54f68


@@ -8,7 +8,7 @@ Replication uses a push model, with records and files generally only being copie
 Every deleted record or file in the system is marked by a tombstone, so that deletions can be replicated alongside creations. These tombstones are cleaned up by the replication process after a period of time referred to as the consistency window, which is related to replication duration and how long transient failures can remove a node from the cluster. Tombstone cleanup must be tied to replication to reach replica convergence.
-If a replicator detects that a remote drive is has failed, it will use the ring's "get_more_nodes" interface to choose an alternate node to synchronize with. The replicator can generally maintain desired levels of replication in the face of hardware failures, though some replicas may not be in an immediately usable location.
+If a replicator detects that a remote drive is has failed, it will use the ring's "get_more_nodes" interface to choose an alternate node to synchronize with. The replicator can maintain desired levels of replication in the face of disk failures, though some replicas may not be in an immediately usable location. Note that the replicator doesn't maintain desired levels of replication in the case of other failures (e.g. entire node failures) because the most of such failures are transient.
 Replication is an area of active development, and likely rife with potential improvements to speed and correctness.
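
The behaviour described in the changed paragraph can be illustrated with a small, self-contained sketch. It is not the replicator's actual code. ToyRing, choose_sync_targets, the probe callback and the three status constants are hypothetical names made up for this example; only the "get_more_nodes" name comes from the text above. The sketch assumes that a remote disk failure and an unreachable node can be told apart, and that only the former triggers the choice of an alternate node.

```python
# Hypothetical probe results; how real failures are detected is not modelled here.
OK = "ok"
DISK_FAILED = "disk_failed"        # remote node answered, but its drive is unusable
NODE_UNREACHABLE = "unreachable"   # the whole node is down or timed out


class ToyRing:
    """Toy stand-in for a ring: primary nodes plus alternate candidates."""

    def __init__(self, primaries, alternates):
        self._primaries = list(primaries)
        self._alternates = list(alternates)

    def get_nodes(self, part):
        # Primary nodes for the partition (the same list for every partition here).
        return list(self._primaries)

    def get_more_nodes(self, part):
        # Yields alternate nodes, mirroring the interface named in the text.
        yield from self._alternates


def choose_sync_targets(ring, part, probe):
    """Return the nodes one replication pass would push to for a partition."""
    targets = []
    more_nodes = ring.get_more_nodes(part)
    for node in ring.get_nodes(part):
        status = probe(node)
        if status == OK:
            targets.append(node)
        elif status == DISK_FAILED:
            # Disk failure: pick an alternate node so the replica count is kept.
            alternate = next(more_nodes, None)
            if alternate is not None:
                targets.append(alternate)
        # NODE_UNREACHABLE: assumed transient, so no extra replica is created.
    return targets


ring = ToyRing(primaries=["a", "b", "c"], alternates=["h1", "h2"])
states = {"a": OK, "b": DISK_FAILED, "c": NODE_UNREACHABLE}
print(choose_sync_targets(ring, 0, states.get))  # ['a', 'h1']
```

In this toy run, node "b" reports a failed disk and is replaced by the alternate "h1", while the unreachable node "c" is simply skipped, so the partition is pushed to two nodes instead of three until "c" returns.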