Support SSE-C for S3 object storage #941

Merged 7 commits on Oct 12, 2024

@MihirLuthra MihirLuthra commented Sep 25, 2024

Fixes #919.

This PR introduces a new argument and environment variable that let users encrypt objects in S3 with SSE-C.

The argument is --object-sse and the environment variable is P_OBJECT_SSE. Example usage:

parseable s3-store --object-sse SSE-C:AES256:$BASE64_ENCRYPTION_KEY
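
As an illustration of the value format above, here is a minimal sketch of how a `SSE-C:AES256:<base64_key>` string could be split and validated. The `ObjectSse` type and its field names are hypothetical and chosen only for this example; they are not necessarily what the PR uses, and the sketch does not decode or length-check the key.

```rust
use std::str::FromStr;

/// Hypothetical type for this sketch; not necessarily the name used in the PR.
#[derive(Debug, PartialEq)]
enum ObjectSse {
    /// SSE-C with a customer-provided key, kept base64-encoded as supplied.
    SseC { algorithm: String, base64_key: String },
}

impl FromStr for ObjectSse {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split into at most three parts: type, algorithm, key.
        let mut parts = s.splitn(3, ':');
        let (kind, algorithm, key) = match (parts.next(), parts.next(), parts.next()) {
            (Some(kind), Some(algorithm), Some(key)) => (kind, algorithm, key),
            _ => return Err(format!("expected SSE-C:AES256:<base64_key>, got {s:?}")),
        };
        if kind != "SSE-C" {
            return Err(format!("unsupported encryption type {kind:?}"));
        }
        if algorithm != "AES256" {
            return Err(format!("unsupported algorithm {algorithm:?}"));
        }
        if key.is_empty() {
            return Err("encryption key must not be empty".to_string());
        }
        Ok(ObjectSse::SseC {
            algorithm: algorithm.to_string(),
            base64_key: key.to_string(),
        })
    }
}
```

With this in place, `"SSE-C:AES256:<key>".parse::<ObjectSse>()` returns `Ok(..)` for a well-formed value and a descriptive `Err(..)` otherwise, which is the shape a CLI argument parser typically wants.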

Fortunately, the object_store crate has already added a way to set an SSE-C key through AmazonS3Builder: apache/arrow-rs#6230. However, no version containing this change has been published to crates.io yet.

Even if one had been published, it would not be usable yet: the version of datafusion used in parseable is not compatible with any object_store release above 0.10.2. So, for now, I have created a fork of arrow-rs with a branch checked out from the object_store_0.10.2 tag. On top of that, I have cherry-picked commits from:

The changes here use my own fork, but if this approach is acceptable, a fork would be needed in the parseablehq organization instead.

Other ways could be:

  • Fix datafusion to work with 0.11.0 and ask the arrow-rs maintainers to publish a new version on crates.io. I think this would be a separate issue in itself.
  • Ask the arrow-rs maintainers whether they are okay with backporting the SSE-C change to 0.10 and releasing a new 0.10.3.

I have tested this with my AWS account. If parseable is started with SSE-C configured, an attempt to download an object without the key looks like this:

❯ aws s3api get-object  --bucket parseable-bucket-2 --key demo/date=2024-09-24/hour=23/minute=46/mihir.data.czlkrB4wsZ6vf2X.parquet /dev/stdout | more

An error occurred (InvalidRequest) when calling the GetObject operation: The object was stored using a form of Server Side Encryption. The correct parameters must be provided to retrieve the object.

But the following works when the correct key and its MD5 hash are given:

❯ aws s3api get-object  --bucket parseable-bucket-2 --key demo/date=2024-09-24/hour=23/minute=46/mihir.data.czlkrB4wsZ6vf2X.parquet /dev/stdout --sse-customer-algorithm AES256 --sse-customer-key $ENCRYPTION_KEY --sse-customer-key-md5 $ENCRYPTION_KEY_MD5 | more
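
For context, the three `--sse-customer-*` flags correspond to the SSE-C request headers that S3 expects on every read or write of such an object. The sketch below only assembles those header pairs; the function name `sse_c_headers` is hypothetical, and it assumes the caller already has the base64-encoded key and the base64-encoded MD5 digest of the raw key.

```rust
/// Hypothetical helper: build the three SSE-C request headers for S3.
/// Both arguments are assumed to be base64-encoded already.
fn sse_c_headers(base64_key: &str, base64_key_md5: &str) -> Vec<(&'static str, String)> {
    vec![
        // Only AES256 is accepted by S3 for SSE-C.
        (
            "x-amz-server-side-encryption-customer-algorithm",
            "AES256".to_string(),
        ),
        // The customer-provided key, base64-encoded.
        (
            "x-amz-server-side-encryption-customer-key",
            base64_key.to_string(),
        ),
        // MD5 digest of the raw key bytes, base64-encoded; S3 uses it as an integrity check.
        (
            "x-amz-server-side-encryption-customer-key-MD5",
            base64_key_md5.to_string(),
        ),
    ]
}
```

A request that omits these headers for an SSE-C object is rejected, which matches the InvalidRequest error shown earlier.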

For my testing, keys were generated as:

#!/bin/bash

# Generate a 256-bit (32-byte) AES key and base64 encode it
ENCRYPTION_KEY=$(openssl rand -base64 32)

# Compute the MD5 hash of the encryption key and base64 encode it
ENCRYPTION_KEY_MD5=$(printf '%s' "$ENCRYPTION_KEY" | base64 -d | openssl md5 -binary | base64)

echo "Encryption Key: $ENCRYPTION_KEY"
echo "Key MD5: $ENCRYPTION_KEY_MD5"
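
Since the key must decode to exactly 32 bytes for AES-256, a quick sanity check before passing it along can catch a truncated or malformed value early. The sketch below uses a small hand-rolled base64 decoder so that it depends only on the standard library; a real project would more likely use an existing base64 crate. The function names are hypothetical, and the decoder is deliberately simple (standard alphabet, `=` padding, no check of trailing bits).

```rust
/// Decode standard base64 (RFC 4648 alphabet, '=' padding).
/// Returns None if the input length or any character is invalid.
fn base64_decode(input: &str) -> Option<Vec<u8>> {
    let bytes = input.as_bytes();
    if bytes.len() % 4 != 0 {
        return None;
    }
    // Count trailing '=' padding characters; at most two are allowed.
    let pad = bytes.iter().rev().take_while(|&&b| b == b'=').count();
    if pad > 2 {
        return None;
    }
    let mut out = Vec::with_capacity(bytes.len() / 4 * 3);
    let mut acc: u32 = 0; // bit accumulator
    let mut bits = 0; // number of valid bits currently in `acc`
    for &b in &bytes[..bytes.len() - pad] {
        let v = match b {
            b'A'..=b'Z' => b - b'A',
            b'a'..=b'z' => b - b'a' + 26,
            b'0'..=b'9' => b - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            _ => return None, // includes '=' appearing before the end
        };
        acc = (acc << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Hypothetical check: true if the base64 string decodes to a 256-bit key.
fn is_valid_sse_c_key(base64_key: &str) -> bool {
    matches!(base64_decode(base64_key), Some(key) if key.len() == 32)
}
```

A key produced by `openssl rand -base64 32` should pass `is_valid_sse_c_key`, while a shorter or non-base64 string should not.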

This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.


github-actions bot commented Sep 25, 2024

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@MihirLuthra
Contributor Author

I have read the CLA Document and I hereby sign the CLA

nitisht added a commit to parseablehq/.github that referenced this pull request Sep 25, 2024
@@ -84,6 +86,12 @@ pub struct S3Config {
#[arg(long, env = "P_S3_BUCKET", value_name = "bucket-name", required = true)]
pub bucket_name: String,

/// Server side encryption to use for operations with objects.
/// Currently, this only supports SSE-C. Value should be
/// like AES256:<base64_encoded_encryption_key>.
Contributor Author

This was meant to be of the form SSE-C:AES256:<base64_encoded_encryption_key>. I will fix that later today.

@nikhilsinhaparseable
Contributor

@MihirLuthra parseable now uses object_store version 0.11.0; can you please update this PR to use the same version?
Also, can you use https://github.com/parseablehq/arrow-rs instead of your fork for the SSE-C changes? (This is only temporary, and will be used until an object_store release includes the changes.)
Please update your PR to address the comments above.

@MihirLuthra
Contributor Author

Sure, will do tonight.

@MihirLuthra
Contributor Author

Updated. Did a confirmation test against S3, works correctly.

@MihirLuthra
Contributor Author

Actually, it just clicked for me that we don't really need an arrow-rs fork anymore, since object_store 0.11.0 is already being used. So, we could use the particular rev directly from https://github.com/apache/arrow-rs.git.

Let me know if you want me to change that. There is no particular issue with using the fork either (it is temporary anyway).

@nikhilsinhaparseable
Contributor

@MihirLuthra you can do that; there is no need to use the fork then.

@MihirLuthra
Contributor Author

Done

@nikhilsinhaparseable
Contributor

@MihirLuthra I have renamed the env var and related variables in the change.
Otherwise, the change looks good and is ready to merge (once all checks have passed).

Thanks for the enhancement!


@nikhilsinhaparseable nikhilsinhaparseable left a comment

looks good to merge

@nitisht nitisht merged commit ebb51cd into parseablehq:main Oct 12, 2024
8 checks passed
Development

Successfully merging this pull request may close these issues.

Support encryption on object storage
3 participants