Clay Gerrard 7035639dfd Put part-replicas where they go

It's harder than it sounds.  There were really three challenges.

Challenge #1 Initial Assignment
===============================

Before starting to assign parts on this new shiny ring you've
constructed, maybe we'll pause for a moment up front and consider the
lay of the land.  This process is called the replica_plan.

The replica_plan approach is separating part assignment failures into
two modes:

 1) we considered the cluster topology and its weights and came up with
    the wrong plan

 2) we failed to execute on the plan

I failed at both parts plenty of times before I got it this close.  I'm
sure a counter example still exists, but when we find it the new helper
methods will let us reason about where things went wrong.
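
As a rough illustration of the "consider the lay of the land" step, here
is a minimal sketch that splits the replica count across zones in
proportion to device weight.  The function name, the device dict layout,
and the use of zones as the only tier are assumptions made for this
example; it is not the builder's actual replica_plan code.

```python
from collections import defaultdict


def weighted_replicas_by_zone(devs, replicas):
    """Return {zone: replica share}, proportional to total device weight.

    Hypothetical helper for illustration only.
    """
    total_weight = sum(dev['weight'] for dev in devs)
    if total_weight <= 0:
        raise ValueError('no device has a positive weight')
    plan = defaultdict(float)
    for dev in devs:
        # Each device contributes its weight fraction of all replicas.
        plan[dev['zone']] += replicas * dev['weight'] / total_weight
    return dict(plan)


devs = [
    {'id': 0, 'zone': 1, 'weight': 100.0},
    {'id': 1, 'zone': 1, 'weight': 100.0},
    {'id': 2, 'zone': 2, 'weight': 200.0},
]
plan = weighted_replicas_by_zone(devs, replicas=3)
```

Having the plan as plain data is what makes the two failure modes
separable: the plan can be checked on its own before any part is moved.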

Challenge #2 Fixing Placement
=============================

With a sound plan in hand, it's much easier to fail to execute on it the
less material you have to execute with - so we gather up as many parts
as we can - as long as we think we can find them a better home.

Picking the right parts for gather is a black art - when you notice a
balance is slow it's because it's spending so much time iterating over
replica2part2dev trying to decide just the right parts to gather.

The replica plan can help at least in the gross dispersion collection to
gather up the worst offenders first before considering balance.  I think
trying to avoid picking up parts that are stuck to the tier before
falling into a forced grab on anything over parts_wanted helps with
stability generally - but depending on where the parts_wanted are in
relation to the full devices it's pretty easy to pick up something that'll
end up really close to where it started.

I tried to break the gather methods into smaller pieces so it looked
like I knew what I was doing.

Going with a MAXIMUM gather iteration instead of balance (which doesn't
reflect the replica_plan) doesn't seem to be costing me anything - most
of the time the exit condition is either solved or all the parts are
overly aggressively locked up on min_part_hours.  So far, it mostly
seems that if the thing is going to balance this round it'll get it in
the first couple of shakes.
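
The bounded loop described above might look roughly like the sketch
below.  The names run_gather_rounds, gather, reassign and max_rounds are
hypothetical stand-ins, and the default of 5 rounds is an arbitrary
value chosen for the example, not the limit used by the builder.

```python
def run_gather_rounds(gather, reassign, max_rounds=5):
    """Gather and reassign until nothing is gathered or the cap is hit.

    Returns (parts_moved, rounds_that_moved_parts).
    """
    moved = 0
    for round_number in range(1, max_rounds + 1):
        parts = gather()
        if not parts:
            # Either everything is placed, or whatever is left is
            # locked up (for example by min_part_hours).
            return moved, round_number - 1
        reassign(parts)
        moved += len(parts)
    return moved, max_rounds


# Toy stand-ins: two rounds find parts to move, the third finds none.
pending = [[1, 2], [3], []]
placed = []
result = run_gather_rounds(lambda: pending.pop(0), placed.extend)
```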

Challenge #3 Crazy replica2part2dev tables
==========================================

I think there's lots of ways "scars" can build up in a ring which can
result in very particular replica2part2dev tables that are physically
difficult to dig out of.  It's repairing these scars that will take
multiple rebalances to resolve.

... but at this point ...

... lacking a counter example ...

I've been able to close up all the edge cases I was able to find.  It
may not be quick, but progress will be made.

Basically my strategy just required a better understanding of how
previous algorithms were able to *mostly* keep things moving by brute
forcing the whole mess with a bunch of randomness.  Then when we detect
our "elegant" careful part selection isn't making progress - we can fall
back to the same old tricks.

Validation
==========

We validate against duplicate part replica assignment after rebalance
and raise an ERROR if we detect more than one replica of a part assigned
to the same device.

In order to meet that requirement we have to have at least as many
devices as replicas, so attempting to rebalance with too few devices
without changing your replica_count is also an ERROR not a warning.
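
The two checks above can be sketched as follows.  This is a hedged
illustration: validate_part_replicas is a hypothetical helper, it
assumes replica2part2dev is a list with one row per replica where each
row maps part number to device id, and it raises ValueError as a
stand-in for whatever error the builder really raises.

```python
def validate_part_replicas(replica2part2dev, dev_count):
    """Raise ValueError on too few devices or duplicate assignments."""
    replica_count = len(replica2part2dev)
    if dev_count < replica_count:
        raise ValueError('%d replicas need at least %d devices, found %d'
                         % (replica_count, replica_count, dev_count))
    part_count = max((len(row) for row in replica2part2dev), default=0)
    for part in range(part_count):
        # A row may be shorter than the others, so skip rows that
        # have no entry for this part.
        dev_ids = [row[part] for row in replica2part2dev
                   if part < len(row)]
        if len(set(dev_ids)) != len(dev_ids):
            raise ValueError('part %d has more than one replica on the '
                             'same device: %r' % (part, dev_ids))


good_table = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
validate_part_replicas(good_table, dev_count=3)  # passes silently
```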

Random Thoughts
===============

As usual with rings, the test diff can be hard to reason about -
hopefully I've added enough comments to assure future me that these
assertions make sense.

Despite being a large rewrite of a lot of important code, the existing
code is known to have failed us.  This change fixes a critical bug that's
trivial to reproduce in a critical component of the system.

There's probably a bunch of error messages and exit status stuff that's
not as helpful as it could be considering the new behaviors.

Change-Id: I1bbe7be38806fc1c8b9181a722933c18a6c76e05
Closes-Bug: #1452431

2015-12-07 16:06:42 -08:00

bin

py3: Replace urllib imports with six.moves.urllib

2015-10-08 15:24:13 +02:00

doc

Merge "Fix missing *-replicator conf sections in deployment guide"

2015-11-03 22:51:24 +00:00

etc

Fix missing *-replicator conf sections in deployment guide

2015-10-23 14:58:38 +01:00

examples

Add a user variable to templates

2013-09-17 11:46:04 +10:00

swift

Put part-replicas where they go

2015-12-07 16:06:42 -08:00

test

Put part-replicas where they go

2015-12-07 16:06:42 -08:00

.alltests

Script for running unit, func and probe tests at once

2015-10-13 09:10:09 +02:00

.coveragerc

Fix .coveragrc to prevent nose tests error

2015-09-21 10:06:29 +01:00

.functests

Move the tests from functionalnosetests

2014-01-07 15:58:11 +08:00

.gitignore

more probe test refactoring

2015-02-13 16:55:45 -08:00

.gitreview

make git review easier

2015-04-01 12:41:44 -07:00

.mailmap

authors and changelog update for 2.5.0

2015-10-02 21:28:15 -07:00

.probetests

Allow specify arguments to .probetests script

2013-12-24 01:18:19 -08:00

.unittests

Fix coverage report for newer versions of coverage

2014-04-24 16:50:03 +00:00

AUTHORS

authors and changelog update for 2.5.0

2015-10-02 21:28:15 -07:00

babel.cfg

add pybabel setup.py commands and initial .pot

2011-01-27 00:01:24 +00:00

bandit.yaml

Adding bandit for security static analysis testing in swift

2015-07-31 07:37:33 +05:30

CHANGELOG

authors and changelog update for 2.5.0

2015-10-02 21:28:15 -07:00

CONTRIBUTING.md

Add Swift Design Principles to CONTRIBUTING.md

2015-03-27 13:13:31 -04:00

LICENSE

Convert LICENSE to use unix style line endings.

2012-12-19 12:48:27 -05:00

MANIFEST.in

Add requirements files to the source distribution

2013-06-03 19:26:20 +04:00

README.md

added testing notes to the contributing doc

2014-12-04 10:41:11 -05:00

requirements.txt

On py3, use dnspython3 dependency, not dnspython

2015-11-05 15:56:24 +01:00

setup.cfg

versioned writes middleware

2015-08-07 14:11:32 -04:00

setup.py

taking the global reqs that we can

2014-05-21 09:37:22 -07:00

test-requirements.txt

Merge "Adding bandit for security static analysis testing in swift"

2015-08-12 20:55:16 +00:00

tox.ini

Enable H234 check (assertEquals is deprecated, use assertEqual)

2015-10-12 07:40:17 +00:00

README.md

Swift

A distributed object storage system designed to scale from a single machine to thousands of servers. Swift is optimized for multi-tenancy and high concurrency. Swift is ideal for backups, web and mobile content, and any other unstructured data that can grow without bound.

Swift provides a simple, REST-based API fully documented at http://docs.openstack.org/.

Swift was originally developed as the basis for Rackspace's Cloud Files and was open-sourced in 2010 as part of the OpenStack project. It has since grown to include contributions from many companies and has spawned a thriving ecosystem of 3rd party tools. Swift's contributors are listed in the AUTHORS file.

Docs

To build documentation install sphinx (pip install sphinx), run python setup.py build_sphinx, and then browse to /doc/build/html/index.html. These docs are auto-generated after every commit and available online at http://docs.openstack.org/developer/swift/.

For Developers

The best place to get started is the "SAIO - Swift All In One". This document will walk you through setting up a development cluster of Swift in a VM. The SAIO environment is ideal for running small-scale tests against swift and trying out new features and bug fixes.

You can run unit tests with .unittests and functional tests with .functests.

If you would like to start contributing, check out these notes to help you get started.

Code Organization

bin/: Executable scripts that are the processes run by the deployer
doc/: Documentation
etc/: Sample config files
swift/: Core code
- account/: account server
- common/: code shared by different modules
  - middleware/: "standard", officially-supported middleware
  - ring/: code implementing Swift's ring
- container/: container server
- obj/: object server
- proxy/: proxy server
test/: Unit and functional tests

Data Flow

Swift is a WSGI application and uses eventlet's WSGI server. After the processes are running, the entry point for new requests is the Application class in swift/proxy/server.py. From there, a controller is chosen, and the request is processed. The proxy may choose to forward the request to a back-end server. For example, the entry point for requests to the object server is the ObjectController class in swift/obj/server.py.
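
To make the "entry point, then controller" flow concrete, here is a toy WSGI callable that picks a controller name from the depth of a /v1/account/container/object path. It is only a sketch under stated assumptions: choose_controller and application are hypothetical names, nothing here is taken from swift/proxy/server.py, and it neither uses eventlet nor forwards anything to a back-end server.

```python
def choose_controller(path):
    """Toy dispatch: name a controller from the depth of /v1/a/c/o."""
    segments = [segment for segment in path.split('/') if segment]
    if len(segments) >= 4:
        # Object names may themselves contain slashes.
        return 'object'
    return {2: 'account', 3: 'container'}.get(len(segments))


def application(environ, start_response):
    """Minimal WSGI callable: choose a controller, then answer."""
    controller = choose_controller(environ.get('PATH_INFO', ''))
    headers = [('Content-Type', 'text/plain')]
    if controller is None:
        start_response('404 Not Found', headers)
        return [b'no controller\n']
    start_response('200 OK', headers)
    body = 'handled by %s controller\n' % controller
    return [body.encode('utf-8')]
```

Any WSGI server can host a callable with this signature, which is why the server underneath can be chosen independently of the request-handling code.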

For Deployers

Deployer docs are also available at http://docs.openstack.org/developer/swift/. A good starting point is at http://docs.openstack.org/developer/swift/deployment_guide.html

You can run functional tests against a swift cluster with .functests. These functional tests require /etc/swift/test.conf to run. A sample config file can be found in this source tree in test/sample.conf.

For Client Apps

For client applications, official Python language bindings are provided at http://github.com/openstack/python-swiftclient.

Complete API documentation at http://docs.openstack.org/api/openstack-object-storage/1.0/content/

For more information come hang out in #openstack-swift on freenode.

Thanks,

The Swift Development Team