Is your feature request related to a problem?
The existing pre-validation mechanism introduced in PR #65 handles common cases such as Hive tables and checkpoint locations. However, it currently only validates read permissions on the checkpoint location by checking the existence of the checkpoint folder. This approach does not verify write permissions, which can lead to issues if the checkpoint location is not writable.
What solution would you like?
Potential solution:
1. Write Permission Verification: Create a temporary folder at the given checkpoint location to ensure that the location has write permissions.
2. Read Permission Verification: Check the existence of this temporary folder to confirm read permissions.
3. Clean up the temporary folder? [TBD]
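A minimal sketch of this flow using the Hadoop FileSystem API, which resolves the correct implementation (e.g. S3A/EMRFS) from the checkpoint path. The helper name and the temporary folder naming scheme are assumptions for illustration, not the actual Flint implementation:

```scala
import java.util.UUID

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: verifies write and read access by round-tripping
// a temporary folder under the checkpoint location.
def validateCheckpointLocation(checkpointLocation: String, conf: Configuration): Boolean = {
  val tempDir = new Path(checkpointLocation, s".tmp-validation-${UUID.randomUUID()}")
  val fs = tempDir.getFileSystem(conf)
  try {
    // Write check: mkdirs fails fast (e.g. S3 403 AccessDenied) if not writable
    fs.mkdirs(tempDir) &&
      // Read check: confirm the folder we just created is visible
      fs.exists(tempDir)
  } catch {
    case _: java.io.IOException => false
  } finally {
    // Cleanup is still an open question ([TBD] above); best effort here
    try fs.delete(tempDir, true) catch { case _: java.io.IOException => () }
  }
}
```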
What alternatives have you considered?
- Use a temporary file to reduce the footprint (sketched below)
- Add pre-validation in the SQL plugin
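The temporary-file alternative could look much the same; a sketch under the same assumptions, with a zero-byte file instead of a folder to keep the footprint minimal:

```scala
import java.util.UUID

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical variant of the check using a zero-byte file.
def validateWithTempFile(checkpointLocation: String, conf: Configuration): Boolean = {
  val tempFile = new Path(checkpointLocation, s".tmp-validation-${UUID.randomUUID()}")
  val fs = tempFile.getFileSystem(conf)
  try {
    fs.create(tempFile, false).close() // write check: 403 surfaces here on S3
    fs.exists(tempFile)                // read check
  } catch {
    case _: java.io.IOException => false
  } finally {
    try fs.delete(tempFile, false) catch { case _: java.io.IOException => () }
  }
}
```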
Do you have any additional context?
Currently, the query passes pre-validation and the refreshIndex API executes. However, during the start() API call in the Spark streaming job, before the streaming thread starts, the checkpoint manager attempts to create the checkpoint folder and fails due to insufficient write permissions, with the detailed error below:
24/06/27 00:22:53 ERROR DefaultOptimisticTransaction: Rolling back transient log due to transaction operation failure
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied
(Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: QPPMMXGBH3EX7W9W;
S3 Extended Request ID: GdtSQViVaQNHUh7xLjc+TqvjeKcSwQduf47lWk+c8DlvE47qhyXImyNVQ2Yj6TOi3v7OMTJYV8A=;
Proxy: null)
...