Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kms_key in sagemaker.processing.Processor should default to output_kms_key #4874

Open
admivsn opened this issue Sep 17, 2024 · 1 comment
Open
Labels

Comments

@admivsn
Copy link
Contributor

admivsn commented Sep 17, 2024

Describe the bug
The kms_key used to encrypt either the user code file or local inputs when uploading to S3 should default to output_kms_key.

This would align the behaviour of with sagemaker.estimator.Estimator where output_kms_key is used to encrypt the tar'd user training code when uploading to S3.

Also, since output_kms_key is resolved from the config it means that kms_key can inherit this default from the config.

To reproduce
A clear, step-by-step set of instructions to reproduce the bug.
The provided code need to be complete and runnable, if additional data is needed, please include them in the issue.

Expected behavior
The kms_key should default to output_kms_key. This can be implemented in either:

Screenshots or logs
If applicable, add screenshots or logs to help explain your problem.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: v2.232.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): N/A
  • Framework version: N/A
  • Python version: 3.10
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N/A

Additional context
Add any other context about the problem here.

@admivsn admivsn added the bug label Sep 17, 2024
@cylaceste
Copy link

This is well-thought out! My thoughts are: output_kms_key being applied to the input code doesn't quite make sense from a naming perspective.

And in the estimator case, it's only when the buckets are the same (output and code) that it's applied.

kms_key = self.output_kms_key if code_bucket == output_bucket else None

Probably because it's generally safe to assume if the bucket is the same and the output has some encryption, the same encryption can be applied to the code, whereas not applying it to the code might not work for uploading due to bucket policies. Maybe this logic was added as a fix to get around some common user errors caused by bucket encryption policies.

Actually it might make sense for the estimator to encrypt the code with a kms_key that is not the output_kms_key.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants