From 0da68e4f74652eb943ceacdee5bc87755aeca045 Mon Sep 17 00:00:00 2001
From: Aindriu Lavelle
Date: Wed, 8 Jan 2025 04:57:52 +0000
Subject: [PATCH 1/2] Adding information for handling S3 Sink multipart upload aborted parts

Signed-off-by: Aindriu Lavelle
---
 s3-sink-connector/README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/s3-sink-connector/README.md b/s3-sink-connector/README.md
index fb690d5e8..1c4af1c88 100644
--- a/s3-sink-connector/README.md
+++ b/s3-sink-connector/README.md
@@ -625,6 +625,14 @@ There are four configuration properties to configure retry strategy exists.
 - To use SSE-KMS set to `aws:kms`
 - To use DSSE-KMS set to `aws:kms:dsse`
 
+
+### Incomplete Multipart uploads
+The S3 Sink Connector uploads the files using the S3 Mutli part upload API for improved performance and handling large file sizes.
+Occasionally the API can throw an exception or the connector can fail to complete a multipart upload.
+This can leave "Parts" of the multipart upload on S3 waiting to complete taking up unnecessary space.
+To handle these incomplete parts AWS recommends setting up a Lifecycle rule to delete old parts that weren't completed as described in this excellent (blog post)[https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/]. 
+Or if you would prefer to work through the official documentation it is available (here)[https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html]
+
 ## Development
 
 ### Developing together with Commons library

From 5249f8b69f6bc5d2cf0862720e0fd8a51340a10e Mon Sep 17 00:00:00 2001
From: Aindriu Lavelle
Date: Wed, 22 Jan 2025 10:52:36 +0000
Subject: [PATCH 2/2] Corrections & improvements from review

Signed-off-by: Aindriu Lavelle
---
 s3-sink-connector/README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/s3-sink-connector/README.md b/s3-sink-connector/README.md
index 1c4af1c88..f0e68ee61 100644
--- a/s3-sink-connector/README.md
+++ b/s3-sink-connector/README.md
@@ -626,12 +626,12 @@ There are four configuration properties to configure retry strategy exists.
 - To use DSSE-KMS set to `aws:kms:dsse`
 
 
-### Incomplete Multipart uploads
-The S3 Sink Connector uploads the files using the S3 Mutli part upload API for improved performance and handling large file sizes.
+### Cleaning temporary files from failed multipart uploads
+The S3 Sink Connector uploads files using the S3 multipart upload API for improved performance and handling large files.
 Occasionally the API can throw an exception or the connector can fail to complete a multipart upload.
-This can leave "Parts" of the multipart upload on S3 waiting to complete taking up unnecessary space.
+This can leave orphaned "parts" of a failed multipart upload taking up unnecessary space.
 To handle these incomplete parts AWS recommends setting up a Lifecycle rule to delete old parts that weren't completed as described in this excellent (blog post)[https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/]. 
-Or if you would prefer to work through the official documentation it is available (here)[https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html]
+Alternatively, if you would prefer to work through the official documentation it is available (here)[https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html]
 
 ## Development
 
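The lifecycle rule that the README change recommends can also be applied programmatically. The Python snippet here is a minimal sketch using boto3 that adds an `AbortIncompleteMultipartUpload` rule to a bucket. It is not part of the S3 Sink Connector. The rule ID `abort-incomplete-multipart-uploads`, the 7-day default, the empty default prefix and the helper names are all hypothetical choices made for illustration. `put_bucket_lifecycle_configuration` replaces a bucket's entire lifecycle configuration, so the sketch reads any existing rules and merges the new one in rather than overwriting them. It assumes that existing rules can be written back unchanged.

```python
"""Sketch: add an S3 lifecycle rule that aborts incomplete multipart uploads.

Assumptions (hypothetical, not part of the S3 Sink Connector): the rule ID,
the 7-day default and the helper names below. boto3 is only needed when
apply_abort_rule() is actually called.
"""
from typing import Any, Dict, List

# Hypothetical rule ID used to recognise (and replace) this rule on re-runs.
RULE_ID = "abort-incomplete-multipart-uploads"


def build_abort_rule(days_after_initiation: int = 7, prefix: str = "") -> Dict[str, Any]:
    """Build one lifecycle rule that aborts multipart uploads under `prefix`
    that are still incomplete `days_after_initiation` days after they started."""
    if days_after_initiation < 1:
        raise ValueError("days_after_initiation must be a positive integer")
    return {
        "ID": RULE_ID,
        "Filter": {"Prefix": prefix},  # "" matches every object in the bucket
        "Status": "Enabled",
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days_after_initiation},
    }


def merge_rule(existing: List[Dict[str, Any]], new_rule: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the existing rules, with any rule sharing new_rule's ID replaced."""
    kept = [rule for rule in existing if rule.get("ID") != new_rule["ID"]]
    return kept + [new_rule]


def apply_abort_rule(bucket: str, days_after_initiation: int = 7, prefix: str = "") -> None:
    """Read the bucket's current lifecycle rules, merge ours in, and write them back."""
    import boto3  # imported lazily so the pure helpers above work without boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    try:
        existing = s3.get_bucket_lifecycle_configuration(Bucket=bucket)["Rules"]
    except ClientError as err:
        # A bucket without any lifecycle configuration reports this error code.
        if err.response["Error"]["Code"] != "NoSuchLifecycleConfiguration":
            raise
        existing = []
    rules = merge_rule(existing, build_abort_rule(days_after_initiation, prefix))
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration={"Rules": rules}
    )
```

For example, `apply_abort_rule("my-connector-bucket", 7)` would abort uploads that are still incomplete seven days after they started. The bucket name is hypothetical. The same rule can instead be created by hand by following the AWS documentation linked in the patch.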