swift/test
Clay Gerrard 5070869ac0 Validate against duplicate device part replica assignment
We should never assign multiple replicas of the same partition to the
same device - our on-disk layout can only support a single replica of a
given part on a single device.  We should not do this, so we validate
against it and raise a loud warning if this terrible state is ever
observed after a rebalance.

Unfortunately currently there's a couple not necessarily uncommon
scenarios which will trigger this observed state today:

 1. If we have less devices than replicas
 2. If a server or zones aggregate device weight make it the most
    appropriate candidate for multiple replicas and you're a bit unlucky

Fixing #1 would be easy, we should just not allow that state anymore.
Really we never did - if you have a 3 replica ring with one device - you
have one replica.  Everything that iter_nodes'd would de-dupe.  We
should just be insisting that you explicitly acknowledge your replica
count with set_replicas.

I have been lost in the abyss for days searching for a general solutions
to #2.  I'm sure it exists, but I will not have wrestled it to
submission by RC1.  In the meantime we can eliminate a great deal of the
luck required simply by refusing to place more than one replica of a
part on a device in assign_parts.

The meat of the change is a small update to the .validate method in
RingBuilder.  It basically unrolls a pre-existing (part, replica) loop
so that all the replicas of the part come out in order so that we can
build up the set of dev_id's for which all the replicas of a given part
are assigned part-by-part.

If we observe any duplicates - we raise a warning.

To clean the cobwebs out of the rest of the corner cases we're going to
delay get_required_overload from kicking in until we achive dispersion,
and a small check was added when selecting a device subtier to validate
if it's already being used - picking any other device in the tier works
out much better.  If no other devices are available in the tier - we
raise a warning.  A more elegant or optimized solution may exist.

Many unittests did not meet the criteria #1, but the fix was straight
forward after being identified by the pigeonhole check.

However, many more tests were affected by #2 - but again the fix came to
be simply adding more devices.  The fantasy that all failure domains
contain at least replica count devices is prevalent in both our ring
placement algorithm and it's tests.  These tests were trying to
demonstrate some complex characteristics of our ring placement algorithm
and I believe we just got a bit too carried away trying to find the
simplest possible example to demonstrate the desirable trait.  I think
a better example looks more like a real ring - with many devices in each
server and many servers in each zone - I think more devices makes the
tests better.  As much as possible I've tried to maintain the original
intent of the tests - when adding devices I've either spread the weight
out amongst them or added proportional weights to the other tiers.

I added an example straw man test to validate that three devices with
different weights in three different zones won't blow up.  Once we can
do that without raising warnings and assigning duplicate device part
replicas - we can add more.  And more importantly change the warnings to
errors - because we would much prefer to not do that #$%^ anymore.

Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Related-Bug: #1452431
Change-Id: I592d5b611188670ae842fe3d030aa3b340ac36f9
2015-10-02 16:42:25 -07:00
..
functional Add test that a tempurl POST cannot set a DLO manifest header 2015-08-26 16:30:23 +01:00
probe Merge "Fix purge for tombstone only REVERT job" 2015-09-19 18:42:30 +00:00
unit Validate against duplicate device part replica assignment 2015-10-02 16:42:25 -07:00
__init__.py test/(functional/probe):Replace python print operator with print function (pep H233, py33) 2015-08-20 11:42:58 +09:00
sample.conf Add extra_header_count to document and config. 2015-08-05 22:09:40 +08:00