Clay Gerrard 754defc39c Client should retry when there's just one 404 and a bunch of errors
During a rebalance, it's expected that we may get a 404 for data that
does exist elsewhere in the cluster. Normally this isn't a problem; the
proxy sees the 404, keeps digging, and one of the other primaries will
serve the response.

Previously, if the other replicas were heavily loaded, the proxy would
see a bunch of timeouts and the fresh (empty) primary, treat the 404 as
good, and send that on to the client.

Now, have the proxy throw out that first 404 (provided it doesn't have a
timestamp); it will then return a 503 to the client, indicating that it
should try again.

Add a new (per-policy) proxy-server config option,
rebalance_missing_suppression_count; operators may use this to increase
the number of 404-no-timestamp responses to discard if their rebalances
are going faster than replication can keep up, or set it to zero to
return to the previous behavior.

Change-Id: If4bd39788642c00d66579b26144af8f116735b4d
2020-09-08 14:33:09 -07:00
..