diff --git a/doc/source/admin_guide.rst b/doc/source/admin_guide.rst
index 9c74e0a1e0..78c5b2bec7 100644
--- a/doc/source/admin_guide.rst
+++ b/doc/source/admin_guide.rst
@@ -145,9 +145,20 @@ then it is just best to replace the drive, format it, remount it, and let
 replication fill it up.
 
 If the drive can't be replaced immediately, then it is best to leave it
-unmounted, and remove the drive from the ring. This will allow all the
+unmounted, and set the device weight to 0. This will allow all the
 replicas that were on that drive to be replicated elsewhere until the drive
-is replaced. Once the drive is replaced, it can be re-added to the ring.
+is replaced. Once the drive is replaced, the device weight can be increased
+again. Setting the device weight to 0 instead of removing the drive from the
+ring gives Swift the chance to replicate data from the failing disk too (in case
+it is still possible to read some of the data).
+
+Setting the device weight to 0 (or removing a failed drive from the ring) has
+another benefit: all partitions that were stored on the failed drive are
+distributed over the remaining disks in the cluster, and each disk only needs to
+store a few new partitions. This is much faster compared to replicating all
+partitions to a single, new disk. It decreases the time to recover from a
+degraded number of replicas significantly, and becomes more and more important
+with bigger disks.
 
 -----------------------
 Handling Server Failure
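
Note on the second added paragraph: the claim that spreading partitions over the remaining disks is much faster than refilling a single new disk can be illustrated with a rough back-of-the-envelope model. The sketch below is not part of the patch and not Swift code. The function name `recovery_hours`, the 8 TB disk size, the 100 MB/s write rate, and the count of 100 remaining disks are all hypothetical, and the model assumes that per-disk write throughput is the only bottleneck and that data is spread perfectly evenly.

```python
def recovery_hours(data_tb: float, write_mb_per_s: float, target_disks: int) -> float:
    """Estimate hours needed to re-create `data_tb` of replica data.

    Hypothetical model: the data is split evenly over `target_disks`,
    every target disk writes in parallel at `write_mb_per_s`, and nothing
    else (network, CPU, read side) limits throughput.
    """
    if target_disks < 1:
        raise ValueError("target_disks must be at least 1")
    if write_mb_per_s <= 0:
        raise ValueError("write_mb_per_s must be positive")
    data_mb = data_tb * 1_000_000          # decimal TB -> MB
    per_disk_mb = data_mb / target_disks   # share each target disk receives
    return per_disk_mb / write_mb_per_s / 3600


# Assumed example numbers, chosen only for illustration.
single_disk = recovery_hours(data_tb=8, write_mb_per_s=100, target_disks=1)
spread_out = recovery_hours(data_tb=8, write_mb_per_s=100, target_disks=100)

print(f"refill one replacement disk: {single_disk:.1f} h")
print(f"spread over 100 disks:       {spread_out:.1f} h")
```

Under these assumptions the recovery time scales inversely with the number of disks that receive data, which is also why the effect grows with disk size: the single-disk time rises linearly with capacity, while the spread-out time stays small as long as the cluster is large. Real clusters will see smaller gains because replication is also limited by network and read throughput.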