[FEATURE] Enhance checkpoint location pre-validation for write permissions #404

Closed
dai-chen opened this issue Jun 28, 2024 · 0 comments
Labels
0.5 enhancement New feature or request

@dai-chen
Collaborator

Is your feature request related to a problem?

The existing pre-validation mechanism introduced in PR #65 handles common cases such as Hive tables and checkpoint locations. However, for the checkpoint location it only validates read permission, by checking whether the checkpoint folder exists. It does not verify write permission, so a checkpoint location that is readable but not writable passes pre-validation, and the failure only surfaces later, when the streaming job starts.

What solution would you like?

Potential solution:

  1. Write Permission Verification: Create a temporary folder at the given checkpoint location to verify that the location is writable.
  2. Read Permission Verification: Check that this temporary folder exists to confirm read permission.
  3. Cleanup: Delete the temporary folder afterwards? [TBD]
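The three steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions: a local filesystem path stands in for the real checkpoint location (in practice an S3 path accessed through Hadoop's FileSystem layer), and the function name `validate_checkpoint_location` and the `_validation_<uuid>` probe-folder name are hypothetical, not existing project code. It performs the cleanup from step 3 unconditionally, since that step is still marked TBD.

```python
import os
import shutil
import tempfile
import uuid


def validate_checkpoint_location(location: str) -> bool:
    """Return True if a probe folder can be created (write) and then seen (read).

    Sketch only: a local path stands in for the real checkpoint location.
    """
    # Hypothetical probe name; the random suffix avoids clashing with
    # existing checkpoint data or concurrent validations.
    probe = os.path.join(location, f"_validation_{uuid.uuid4().hex}")
    try:
        # 1. Write permission: creating the folder fails if the location is
        #    not writable. makedirs also creates the location itself if it
        #    does not exist yet.
        os.makedirs(probe)
        # 2. Read permission: the folder just created must be visible.
        return os.path.isdir(probe)
    except OSError:
        # Any filesystem error (e.g. PermissionError) fails the validation.
        return False
    finally:
        # 3. Cleanup (TBD in the issue): remove the probe folder if it exists.
        shutil.rmtree(probe, ignore_errors=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        print(validate_checkpoint_location(checkpoint_dir))
```

Returning a boolean keeps the probe easy to plug into an existing pre-validation step; an alternative design is to re-raise the underlying exception so the user sees the original access-denied message at validation time rather than at `start()`.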

What alternatives have you considered?

  1. Use a temporary file instead of a folder to reduce the footprint.
  2. Add the pre-validation in the SQL plugin instead.
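The first alternative, probing with a temporary file instead of a folder, might look like the sketch below. It rests on the same assumption (a local path standing in for the checkpoint location), and the function name `validate_checkpoint_location_with_file` and the `.tmp` probe naming are hypothetical. Unlike a recursive folder creation, this variant requires the location to exist already.

```python
import os
import tempfile
import uuid


def validate_checkpoint_location_with_file(location: str) -> bool:
    """Return True if an empty probe file can be written and then seen.

    Sketch only: a local path stands in for the real checkpoint location.
    """
    # Hypothetical probe file name with a random suffix to avoid collisions.
    probe = os.path.join(location, f"_validation_{uuid.uuid4().hex}.tmp")
    try:
        # Write permission: "x" mode creates a new, empty file and raises
        # if it cannot be created (e.g. PermissionError, missing location).
        with open(probe, "x"):
            pass
        # Read permission: the file just written must be visible.
        return os.path.isfile(probe)
    except OSError:
        return False
    finally:
        # Cleanup: remove the probe file, ignoring errors if it was never created.
        try:
            os.remove(probe)
        except OSError:
            pass


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        print(validate_checkpoint_location_with_file(checkpoint_dir))
```

One trade-off worth weighing: a file probe checks the same write permission with a smaller footprint, but it does not exercise folder creation, which is the operation the checkpoint manager actually attempts at `start()` in the failure described below.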

Do you have any additional context?

Currently, the query passes pre-validation and calls the refreshIndex API. The failure only appears later: when start() is called on the Spark streaming job, the checkpoint manager tries to create the checkpoint folder before the streaming thread starts, and fails because of insufficient write permission, with the following error:

24/06/27 00:22:53 ERROR DefaultOptimisticTransaction: Rolling back transient log due to transaction operation failure
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied 
(Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: QPPMMXGBH3EX7W9W; 
S3 Extended Request ID: GdtSQViVaQNHUh7xLjc+TqvjeKcSwQduf47lWk+c8DlvE47qhyXImyNVQ2Yj6TOi3v7OMTJYV8A=; 
Proxy: null)
    ...