1880351f1a
Imagine a 3-zone ring, and consider a partition in that ring with replicas placed as follows:

* replica 0 is on device A (zone 2)
* replica 1 is on device B (zone 1)
* replica 2 is on device C (zone 2)

Further, imagine that there are zero parts_wanted in all of zone 3; that is, zone 3 is completely full. However, zones 1 and 2 each have at least one parts_wanted on at least one device.

When the ring builder goes to gather replicas to move, it gathers replica 0 because there are three zones available, but the replicas are only in two of them. Then, it places replica 0 in zone 1 or 2 somewhere because those are the only zones with parts_wanted. Notice that this does *not* do anything to spread the partition out better.

Then, on the next rebalance, replica 0 gets picked up and moved (again) but doesn't improve its placement (again).

If your builder has min_part_hours > 0 (and it should), then replicas 1 and 2 cannot move at all. A coworker observed the bug because a customer had such a partition, and its replica 2 was on a zero-weight device. He thought it odd that a zero-weight device should still have one partition on it despite the ring having been rebalanced dozens of times.

Even if you don't have zero-weight devices, having a bunch of partitions trade places on each rebalance isn't particularly good.

Note that this only happens with an unbalanceable ring; if the ring *can* balance, the gathered partitions will swap places, but they will get spread across more zones, so they won't get gathered up again on the next rebalance.

Change-Id: I8f44f032caac25c44778a497dedf23f5cb61b6bb
Closes-Bug: 1400083

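The scenario above can be reproduced with a toy model. This is only a sketch under stated assumptions, not the ring builder's code: `rebalance_once` is a hypothetical helper, zones stand in for devices, and the tie-breaking rule (fewest replicas of this partition, then most parts_wanted) is an assumption chosen for illustration.

```python
from collections import Counter


def rebalance_once(replica_zones, parts_wanted):
    """Toy model of one rebalance pass for a single partition.

    replica_zones: list of zone ids, indexed by replica number.
    parts_wanted: dict of zone id -> parts wanted (mutated in place).
    Returns (new_replica_zones, moved_replica_or_None).
    Hypothetical helper; not the real RingBuilder logic.
    """
    zones_used = Counter(replica_zones)
    best_spread = min(len(parts_wanted), len(replica_zones))
    if len(zones_used) >= best_spread:
        # Already spread across as many zones as possible: gather nothing.
        return list(replica_zones), None

    # Gather the first replica that shares its zone with another replica.
    moved = next(i for i, z in enumerate(replica_zones) if zones_used[z] > 1)
    parts_wanted[replica_zones[moved]] += 1

    # Only zones that still want partitions are eligible; a full zone
    # (parts_wanted == 0) can never receive the gathered replica.
    remaining = Counter(z for i, z in enumerate(replica_zones) if i != moved)
    candidates = [z for z, want in parts_wanted.items() if want > 0]
    target = min(candidates,
                 key=lambda z: (remaining[z], -parts_wanted[z], z))
    parts_wanted[target] -= 1

    new_zones = list(replica_zones)
    new_zones[moved] = target
    return new_zones, moved


if __name__ == "__main__":
    zones = [2, 1, 2]              # replicas 0, 1, 2 as in the example
    wanted = {1: 1, 2: 1, 3: 0}    # zone 3 is completely full
    for n in range(3):
        zones, moved = rebalance_once(zones, wanted)
        print(n, zones, "moved replica:", moved)
```

In this model, replica 0 is gathered on every pass while zone 3 stays full, and the partition never occupies more than two zones. If zone 3 has any parts_wanted, the first pass moves replica 0 there and later passes gather nothing.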
||
---|---|---|
.. | ||
__init__.py | ||
test_builder.py | ||
test_ring.py | ||
test_utils.py |