Previously, the object replicator checked each partition directory to
ensure it was really a directory and not a zero-byte file. This
happened in collect_jobs(), which is the first thing that the object
replicator does.

The effect was that, at startup, the object-replicator process would
list each "objects" or "objects-N" directory on each object device,
then stat() every single thing in there. On devices with lots of
partitions, this made the replicator take a long time before it did
anything useful.

If you have a cluster with a too-high part_power plus some failing
disks elsewhere, you can easily get thousands of partition directories
on each disk. If you've got 36 disks per node, that turns into a very
long wait for the object replicator to do anything. Worse yet, if you
add in a configuration management system that pushes new rings every
couple of hours, the object replicator can spend the vast majority of
its time collecting jobs, then do only a short stretch of useful work
before the ring changes and it has to start all over again.

This commit moves the stat() call (os.path.isfile) to the loop that
processes jobs. In a complete pass, the total work done is about the
same, but the replicator starts doing useful work much sooner.
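
As a rough illustration of the idea only (not the actual Swift code),
the sketch below builds jobs from a bare directory listing and defers
the os.path.isfile() check until each job is processed. The job dict
layout, the handle_partition callback, and the choice to simply skip
stray files are assumptions made for this example.

```python
import os


def collect_jobs(objects_dir):
    """Build jobs from a directory listing only; no stat() per entry."""
    return [
        {"partition": name, "path": os.path.join(objects_dir, name)}
        for name in sorted(os.listdir(objects_dir))
    ]


def process_jobs(jobs, handle_partition):
    """Check each entry right before working on it."""
    handled = []
    for job in jobs:
        if os.path.isfile(job["path"]):
            # A plain file where a partition directory should be.
            # This sketch just skips it (an assumption); the point is
            # that the stat() happens here, not during collection.
            continue
        handle_partition(job)  # hypothetical per-partition work
        handled.append(job["partition"])
    return handled
```

With this shape, collect_jobs() costs one listdir() per objects
directory, and the first partition can be handled after a single
stat() rather than after one stat() per entry on the device.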

Change-Id: I5ed4cd09dde514ec7d1e74afe35feaab0cf28a10