Skip to content

Commit

Permalink
Do not set Content-MD5 in S3. (#5226)
Browse files Browse the repository at this point in the history
[SC-52300](https://app.shortcut.com/tiledb-inc/story/52300/do-not-set-content-md5-in-s3)

This PR removes setting the `Content-MD5` header on S3 uploads. There
are multiple reasons for this:

* The header is [not supported on directory
buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-differences.html#s3-express-differences-unsupported-features)
used by S3 Express One Zone.
* Computing the MD5 hash of the data immediately before uploading them
has a performance overhead and does not provide additional security.
* We don't do a similar thing in the other cloud VFSes.

Validated by:
* Creating, writing, consolidating and vacuuming an array on a directory
bucket.
* Running `quickstart_dense_cpp` on a Google Cloud Storage bucket using
the S3 compatibility API.

---
TYPE: BUG
DESC: Fix incompatibilities with S3 Express One Zone, by stopping
setting `Content-MD5` on all S3 uploads.

Co-authored-by: KiterLuc <[email protected]>
  • Loading branch information
teo-tsirpanis and KiterLuc authored Aug 15, 2024
1 parent ca51aa0 commit 5b92634
Showing 1 changed file with 0 additions and 18 deletions.
18 changes: 0 additions & 18 deletions tiledb/sm/filesystem/s3.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1786,12 +1786,6 @@ void S3::write_direct(const URI& uri, const void* buffer, uint64_t length) {
put_object_request.SetBody(stream);
put_object_request.SetContentLength(length);

// we only want to hash once, and must do it after setting the body
auto md5_hash =
Aws::Utils::HashingUtils::CalculateMD5(*put_object_request.GetBody());

put_object_request.SetContentMD5(
Aws::Utils::HashingUtils::Base64Encode(md5_hash));
put_object_request.SetContentType("application/octet-stream");
put_object_request.SetBucket(aws_uri.GetAuthority());
put_object_request.SetKey(aws_uri.GetPath());
Expand All @@ -1815,16 +1809,6 @@ void S3::write_direct(const URI& uri, const void* buffer, uint64_t length) {
outcome_error_message(put_object_outcome));
}

// verify the MD5 hash of the result
// note the etag is hex-encoded not base64
Aws::StringStream md5_hex;
md5_hex << "\"" << Aws::Utils::HashingUtils::HexEncode(md5_hash) << "\"";
if (md5_hex.str() != put_object_outcome.GetResult().GetETag()) {
throw S3Exception(
"Object uploaded successfully, but MD5 hash does "
"not match result from server!' ");
}

throw_if_not_ok(wait_for_object_to_propagate(
put_object_request.GetBucket(), put_object_request.GetKey()));
}
Expand Down Expand Up @@ -1974,8 +1958,6 @@ S3::MakeUploadPartCtx S3::make_upload_part_req(
upload_part_request.SetPartNumber(upload_part_num);
upload_part_request.SetUploadId(upload_id);
upload_part_request.SetBody(stream);
upload_part_request.SetContentMD5(Aws::Utils::HashingUtils::Base64Encode(
Aws::Utils::HashingUtils::CalculateMD5(*stream)));
upload_part_request.SetContentLength(length);
if (request_payer_ != Aws::S3::Model::RequestPayer::NOT_SET)
upload_part_request.SetRequestPayer(request_payer_);
Expand Down

0 comments on commit 5b92634

Please sign in to comment.