S3.read(-1) for a large file (2^31+α bytes) fails due to an SSL OverflowError #271
It seems that this Python bug (bpo-42853) is deeply related, and it is apparently dealt with in Python 3.10 and later. For pfio, since we cannot drop support for Python 3.8 right now, I guess we need some workaround that prevents attempting to read the whole content at once even when the full size is requested. The naive approach would be to modify the read path on the pfio side; I wonder if there is a way to somehow force the underlying layers to avoid a single huge read.
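As a rough sketch of such a workaround (nothing here is pfio's actual API; the helper and chunk size are hypothetical), the full content can be accumulated through bounded reads so that no single read handed to the SSL layer approaches 2^31 bytes:

```python
import io

# Hypothetical helper, not part of pfio: drain a file-like object with
# bounded reads instead of a single read(-1).
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB; anything well below 2**31 works


def read_all_chunked(fp, chunk_size=CHUNK_SIZE):
    buf = io.BytesIO()
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:  # EOF
            break
        buf.write(chunk)
    return buf.getvalue()
```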
Strictly speaking, bpo-42853 was fixed in Python 3.9.7 (release note). I knew about this issue in January, but I didn't report it here, sorry! At the time I thought that reading a fairly large file (>2 GB) at once was a rare enough use case that it didn't pay to implement a workaround. Regarding what you report here, did you run into this issue in an actual application? Python 3.8 EoL is scheduled for 2024-10. That is more than two years from today, and 3.8 is in security-fix-only maintenance; bpo-42853 isn't a vulnerability, so it won't be fixed in the 3.8 branch. Hmmm....
We just observed an internal use case where loading a large pickle file fails like this:

```python
import pickle

from pfio.v2 import open_url

with open_url("s3://very/large/file.pickle", "rb") as fp:
    pickle.load(fp)  # Gets the exception
```
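As a stopgap on the application side (a sketch only, reusing the placeholder path above; not something pfio provides), the file can be buffered in bounded chunks before handing it to pickle, so deserialization never triggers a single >2 GiB read against the S3 file object:

```python
import io
import pickle

from pfio.v2 import open_url

CHUNK_SIZE = 64 * 1024 * 1024  # keep every read well below 2**31 bytes

with open_url("s3://very/large/file.pickle", "rb") as fp:
    buf = io.BytesIO()
    while True:
        chunk = fp.read(CHUNK_SIZE)
        if not chunk:
            break
        buf.write(chunk)
buf.seek(0)
obj = pickle.load(buf)  # pickle now reads from memory only
```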
Update: even after 3.9.7, this issue reproduces when loading pickled files containing large ndarrays, possibly because the pickle binary protocol forces a single read of more than 2 GB from SSL at once. This is fixed in Python 3.10, which handles reads larger than 2 GiB in the SSL layer.

So the complete resolution for this issue is to use Python 3.10.
I found that reading the entire content of a 2+α GiB file from S3 fails with an

`OverflowError: signed integer is greater than maximum`

exception raised from the Python SSL library. Here is the minimal reproduction.
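A sketch of that reproduction, with a placeholder bucket and key standing in for any S3 object a bit larger than 2^31 bytes:

```python
from pfio.v2 import open_url

# Placeholder path: any HTTPS-backed S3 object larger than roughly 2**31 bytes.
with open_url("s3://some-bucket/large-file.bin", "rb") as fp:
    data = fp.read(-1)  # OverflowError: signed integer is greater than maximum
```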
Reading the error message, I assumed that reading a file of 2^31 bytes would be fine and 2^31+1 bytes would fail, but it seems to be slightly different; the threshold is somewhere between 2147490816 (2^31 + 7 KiB) and 2147491840 (2^31 + 8 KiB).
I think the S3 API itself can serve such a large file; the issue is in the Python SSL library layer (if so, it may be worth trying Python 3.9 and 3.10).
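For reference, the S3 API can indeed serve such an object in bounded pieces via ranged GETs. A sketch that bypasses pfio and uses boto3 directly (bucket, key, and chunk size are placeholders):

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "some-bucket", "large-file.bin"  # placeholders

size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
chunk = 64 * 1024 * 1024  # 64 MiB ranges, well below the 2**31 limit

parts = []
for start in range(0, size, chunk):
    end = min(start + chunk, size) - 1
    resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
    parts.append(resp["Body"].read())  # each response body stays small
data = b"".join(parts)
```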
Here is my environment: