Cannot load Delta Table from S3 through AWS Lambda #53

Open
ISimion opened this issue Nov 15, 2023 · 7 comments · Fixed by #54

Comments

@ISimion

ISimion commented Nov 15, 2023

Unfortunately, I cannot use the latest s3fs, because the latest delta-lake-reader[aws]==0.2.14 requires s3fs < 2023, and with s3fs==2022.11.0 I am hitting a known issue with s3fs.

I do not know why that issue was closed, since it has affected many people, even on later versions such as 2023.1.0.

I would also like to point out that the IAM role of my Lambdas grants all policy permissions on all S3 buckets and objects.

Could you please release an update that can use the latest s3fs==2023.10.0, so that I would know to address this as an s3fs issue?


[ERROR] PermissionError: Forbidden
Traceback (most recent call last):
  File "/var/task/inquire-data-set.py", line 118, in lambda_handler
    dt = DeltaTable(s3_path, file_system=fs)
  File "/var/lang/lib/python3.10/site-packages/deltalake/deltatable.py", line 40, in __init__
    if not self._is_delta_table():
  File "/var/lang/lib/python3.10/site-packages/deltalake/deltatable.py", line 62, in _is_delta_table
    return self.filesystem.exists(f"{self.log_path}")
  File "/var/lang/lib/python3.10/site-packages/fsspec/asyn.py", line 113, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/var/lang/lib/python3.10/site-packages/fsspec/asyn.py", line 98, in sync
    raise return_result
  File "/var/lang/lib/python3.10/site-packages/fsspec/asyn.py", line 53, in _runner
    result[0] = await coro
  File "/var/lang/lib/python3.10/site-packages/s3fs/core.py", line 946, in _exists
    await self._info(path, bucket, key, version_id=version_id)
  File "/var/lang/lib/python3.10/site-packages/s3fs/core.py", line 1210, in _info
    out = await self._call_s3(
  File "/var/lang/lib/python3.10/site-packages/s3fs/core.py", line 339, in _call_s3
    return await _error_wrapper(
  File "/var/lang/lib/python3.10/site-packages/s3fs/core.py", line 139, in _error_wrapper
    raise err
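
The traceback above shows the failure comes from `filesystem.exists(...)` on the table's log path, before any Delta-specific parsing happens. A minimal sketch for isolating that call from delta-lake-reader follows. The helper name `probe_exists` and the example path are hypothetical, and the `_delta_log` suffix is an assumption about how the log path is built; it only relies on the file system object having an `exists` method.

```python
def probe_exists(fs, log_path):
    """Call fs.exists(log_path) and report the outcome instead of raising."""
    try:
        return ("ok", fs.exists(log_path))
    except PermissionError as exc:
        # In the traceback above, s3fs surfaces the S3 "Forbidden" response
        # as a PermissionError, so that is the exception caught here.
        return ("forbidden", str(exc))


# Hypothetical usage inside the Lambda handler (needs s3fs and real credentials):
#   from s3fs import S3FileSystem
#   print(probe_exists(S3FileSystem(), "my-bucket/my-table/_delta_log"))
```

If this probe also returns "forbidden", the problem would sit between s3fs and S3 rather than in the Delta reader itself.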
@jeppe742
Owner

Thanks @ISimion . Will try to have a look tomorrow

@jeppe742
Owner

@ISimion published a new version. Let me know if it fixes your issues

@ISimion
Author

ISimion commented Nov 27, 2023

@jeppe742 Thank you for upgrading that library. I retried the code with the updated delta-lake-reader[aws]==0.2.16, which now includes the latest s3fs==2023.10.0.

Unfortunately, the code now breaks on another requirement. The latest delta-lake-reader installs botocore==1.31.64; however, if I run an AWS Lambda from an image based on the official Python 3.10 runtime, the runtime forces botocore 1.29.90, which causes the following failure:

[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda-name-function': No module named 'botocore.compress'

I do not know if there are other corner cases like this, but for sure, a lambda in an official Python 3.10 image won't work with the delta-lake-reader==0.2.16 library at this point.
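
A minimal sketch for logging which package versions are visible inside the Lambda, using only the standard library. The helper name `installed_versions` and the default package list are assumptions for illustration. Note that this reads installed distribution metadata, which may not match the copy that actually gets imported if several copies are on the path.

```python
from importlib import metadata


def installed_versions(packages=("botocore", "boto3", "s3fs", "delta-lake-reader")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one line per package, e.g. from the top of the Lambda handler.
for package, version in installed_versions().items():
    print(f"{package}: {version}")
```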

Maybe another issue should be opened?

@jeppe742 jeppe742 reopened this Nov 28, 2023
@jeppe742
Owner

Hey @ISimion
Just so I understand:
when you got the initial error, were you using s3fs==2022.11.0?

If I look for the latest version of s3fs compatible with botocore==1.29.90, I have to go all the way back to s3fs==0.4.2, which was released in 2020 and seems pretty old to me.
Maybe you were just lucky that s3fs==2022.11.0 worked with botocore==1.29.90 despite technically not being compatible? If so, I'm not sure there is a nice way to handle it from my side, unless I'm missing something.

I have no experience with AWS Lambda, but isn't it possible to define your own dependencies, including botocore?

@ISimion
Author

ISimion commented Dec 2, 2023

Hey @jeppe742

I will answer all your questions in order.

  1. Your assessment is correct. And yes, using s3fs==0.4.2 just to have botocore==1.29.90 is not a solution.

  2. It might be the case that I was lucky.

  3. I do not think you are missing anything. At this point, I have all the reasons to believe that this issue is related more to s3fs, so I will address it there. Since I was using s3fs through the library you provided, I believe it was only fair to ask you first for assistance.

  4. When running AWS Lambda on the officially provided AWS runtime image (i.e. the official Amazon Linux 2 image with Python 3.10, botocore, and boto3 preinstalled), it seems that one only gets the botocore and boto3 versions specified by AWS, i.e. botocore==1.29.90 and boto3==1.26.90, even when I tried to force-install other versions on that image.
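
One way to check which copy of a module the runtime would pick up is to ask the import system where it resolves the name. This is a minimal sketch using only the standard library; the helper name `module_origin` is hypothetical, and what it prints depends entirely on how the image is built.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "botocore") is itself missing.
        return None
    return spec.origin if spec is not None else None


# Shows which botocore is first on sys.path and whether it ships the
# submodule named in the import error.
print(module_origin("botocore"))
print(module_origin("botocore.compress"))
```

If the first path points at the runtime's bundled copy rather than the force-installed one, the runtime's copy is earlier on `sys.path`.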

Your point is valid: theoretically, I could build an image with an operating system and requirements of my choosing and run the Lambda from it, but that would be too much of a burden just to make a Lambda work. Before taking that step, I wanted to rule out any other way of reading Delta Tables through your library on the official AWS runtime.

As I said, at this point it is not a delta-lake-reader issue but an s3fs issue, and I will raise it there. Thank you so much for your time and involvement. I consider this issue closed and will let you do the honors.

@jeppe742
Owner

jeppe742 commented Dec 2, 2023

Thanks @ISimion
Hope you get it to work

Alternatively you can also try looking into delta-rs

@ISimion
Author

ISimion commented Dec 4, 2023

Thank you as well. And yes, delta-rs is what I eventually ended up using.
