Multipart uploads in AWS (seem to) have ETags like:
'"' + MD5_hex(MD5(part1) + ... + MD5(partN)) + '-' + N + '"'
On the other hand, Swift SLOs have ETags like:
MD5_hex(MD5_hex(part1) + ... + MD5_hex(partN))
(In both examples, MD5 returns the raw 16-byte digest while MD5_hex
returns the 32-character hex-encoded digest.)
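As a rough illustration of the two formulas above, here is a minimal
Python sketch. The function names are hypothetical and parts are passed
as in-memory byte strings for simplicity; this is not the code used by
AWS or Swift.

```python
import hashlib


def aws_multipart_etag(parts):
    """AWS-style ETag: MD5 over the concatenated raw 16-byte part digests,
    hex-encoded, followed by a dash and the part count, wrapped in quotes."""
    raw_digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return '"%s-%d"' % (hashlib.md5(raw_digests).hexdigest(), len(parts))


def swift_slo_etag(parts):
    """SLO-style ETag: MD5 over the concatenated 32-character hex part
    digests, hex-encoded, with no dash and no part count."""
    hex_digests = "".join(hashlib.md5(p).hexdigest() for p in parts)
    return hashlib.md5(hex_digests.encode("ascii")).hexdigest()


if __name__ == "__main__":
    parts = [b"first part", b"second part"]
    print(aws_multipart_etag(parts))
    print(swift_slo_etag(parts))
```

The two values differ both in the digest itself (raw versus hex-encoded
inputs to the outer MD5) and in the "-N" suffix, which is what clients
key on.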
Some clients (such as aws-sdk-java) use the presence of a dash
to decide whether to perform client-side validation of downloads.
Other clients (like s3cmd) use the presence of a dash *in bucket
listings* to decide whether or not to perform additional HEAD requests
to look for MD5 metadata that can be used to compare against the MD5s
of local files.
Now we include a dash as well, to prevent spurious errors like
> Unable to verify integrity of data download. Client calculated
> content hash didn't match hash calculated by Amazon S3. The data
> may be corrupt.
or unnecessary uploads/downloads because the client assumes that
unchanged data has changed.
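The kind of client-side check described above can be sketched as
follows. This is an assumption-laden illustration in Python, not the
actual logic of aws-sdk-java or s3cmd; both function names are
hypothetical.

```python
def is_multipart_etag(etag):
    """Treat an ETag as a multipart ETag if it contains a dash
    once any surrounding double quotes are removed."""
    return "-" in etag.strip('"')


def should_validate_md5(etag):
    """A client can only compare a download against the ETag when the
    ETag is a plain MD5 of the content, i.e. when it has no dash."""
    return not is_multipart_etag(etag)
```

With a dash-free SLO ETag, such a client would compare the MD5 of the
downloaded bytes against a value that is not the MD5 of the content and
report a mismatch, even though the data is intact.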
For new multipart-uploads via the S3 API, the ETag that is stored will
be calculated in the same way that AWS uses. This ETag will be used in
GET/HEAD responses, bucket listings, and conditional requests via the S3
API. Accessing the same object via the Swift API will use the SLO ETag;
however, in JSON container listings the multipart-upload ETag will be
exposed in a new "s3_etag" key.
New SLOs and pre-existing multipart-uploads will continue to behave as
before; there is no data migration or mitigation as part of this patch.
Change-Id: Ibe68c44bef6c17605863e9084503e8f5dc577fab
Closes-Bug: 1522578