Add the ability to fetch remote files (s3 and http[s]) #34
base: master
bdb76c1 to 792c289
Added the ability to read directly from a buffer containing the whole parquet file.
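To illustrate what reading from a buffer can look like, here is a minimal sketch. The names `sliceRange` and `createBufferReader`, and the `{ fileSize, read(offset, length) }` shape, are assumptions made for this example and are not taken from the PR.

```javascript
// Hypothetical helper: return bytes [offset, offset + length) of an in-memory file.
function sliceRange(buffer, offset, length) {
  if (offset < 0 || length < 0 || offset + length > buffer.length) {
    throw new RangeError(
      `range ${offset}+${length} is outside a ${buffer.length}-byte buffer`
    );
  }
  // subarray returns a view on the same memory, so nothing is copied.
  return buffer.subarray(offset, offset + length);
}

// Hypothetical reader shape: a known size plus an async read(offset, length),
// so a buffer can stand in wherever a file, S3 or http(s) source is expected.
function createBufferReader(buffer) {
  return {
    fileSize: buffer.length,
    read: (offset, length) =>
      new Promise((resolve) => resolve(sliceRange(buffer, offset, length))),
  };
}
```

Because the size is known up front and reads never touch the network, a reader like this needs no extra dependencies to test.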
LGTM
@ZJONSSON let's add tests if you have the time; if not, let us know and we'll try to do it.
I agree. To keep tests as unit tests we would have to add a couple of things to the
A test for the buffer reader does not require any additional dependencies.
Adapters included for S3 files and files available over http(s)
Instantiating a new client blocks on retrieving filesize. But there are cases when we really don't need the filesize, for example when we have the metadata cached already.
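One way to avoid blocking on the file size is to look it up only on first use and to accept a size that is already known. The sketch below assumes a hypothetical `LazySizeSource` class and a caller-supplied `fetchSize` function; neither is the PR's actual API.

```javascript
class LazySizeSource {
  // fetchSize: hypothetical async function that asks the remote for the file size.
  // knownSize: optional size obtained elsewhere, for example from a metadata cache.
  constructor(fetchSize, knownSize) {
    this._fetchSize = fetchSize;
    this._sizePromise =
      knownSize === undefined ? null : Promise.resolve(knownSize);
  }

  // Returns a promise for the size; the remote lookup happens at most once,
  // and only if somebody actually asks for the size.
  fileSize() {
    if (this._sizePromise === null) {
      this._sizePromise = this._fetchSize();
    }
    return this._sizePromise;
  }
}
```

Constructing the source is then synchronous, and a caller with cached metadata never triggers the size request.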
4f5ad32 to e3c70df
What is the status of this? Abandoned?
@Kosta-Github it's not abandoned, but we (mainly me) have had trouble allocating time for this project since the initial effort of building it. If you want to contribute, let's discuss :-)
Allow MAP and LIST (for athena/hive)
Adapters included for S3 files and files available over http(s). Only the parts of interest are fetched over the wire, eliminating the need to download the complete files. The performance of the adapters when fetching full rows is drastically improved with #33.
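As a rough illustration of fetching only the parts of interest: a Parquet file ends with the file metadata, a 4-byte little-endian metadata length, and the magic bytes `PAR1`, so a remote reader can locate the metadata with two small range requests. The helper names below are hypothetical and only compute the `Range` header values; they are a sketch, not the adapters from this PR.

```javascript
// Parquet file tail: <metadata> <4-byte LE metadata length> "PAR1"
const MAGIC = 'PAR1';
const TAIL_LENGTH = 8;

// HTTP byte ranges are inclusive on both ends.
function rangeHeader(offset, length) {
  return `bytes=${offset}-${offset + length - 1}`;
}

// First request: the last 8 bytes of the file.
function tailRange(fileSize) {
  return rangeHeader(fileSize - TAIL_LENGTH, TAIL_LENGTH);
}

// Second request: the metadata block, located using the tail bytes.
function metadataRange(fileSize, tail) {
  if (tail.length !== TAIL_LENGTH || tail.toString('latin1', 4) !== MAGIC) {
    throw new Error('not a parquet file: missing PAR1 magic in the tail');
  }
  const metadataLength = tail.readUInt32LE(0);
  const offset = fileSize - TAIL_LENGTH - metadataLength;
  // The file also starts with a 4-byte magic, so metadata cannot begin before byte 4.
  if (offset < 4) {
    throw new Error('invalid metadata length in parquet tail');
  }
  return rangeHeader(offset, metadataLength);
}
```

The same offsets can be passed as the `Range` parameter of an S3 `GetObject` call or as a `Range` header on an http(s) request.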
No tests so far, but if you are happy with the approach we could add tests using a simple localhost server and an S3-compatible local service, if that makes sense?
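A simple localhost server for such tests could look roughly like the sketch below, which serves a buffer and honours single `bytes=start-end` ranges. `parseRange` and `createRangeServer` are hypothetical names for this illustration; a real fixture might need to handle more range forms.

```javascript
const http = require('http');

// Parse a single "bytes=start-end" range; returns null for anything else.
function parseRange(header, size) {
  const match = /^bytes=(\d+)-(\d+)$/.exec(header || '');
  if (!match) return null;
  const start = Number(match[1]);
  const end = Math.min(Number(match[2]), size - 1);
  return start <= end ? { start, end } : null;
}

// Hypothetical test fixture: serves `buffer` and answers simple Range requests.
// The server is returned without listening; the test decides the port.
function createRangeServer(buffer) {
  return http.createServer((req, res) => {
    if (req.headers.range === undefined) {
      // No range: report the size (useful for HEAD) and send the whole file on GET.
      res.writeHead(200, {
        'Content-Length': buffer.length,
        'Accept-Ranges': 'bytes',
      });
      return res.end(req.method === 'HEAD' ? undefined : buffer);
    }
    const range = parseRange(req.headers.range, buffer.length);
    if (!range) {
      res.writeHead(416, { 'Content-Range': `bytes */${buffer.length}` });
      return res.end();
    }
    const body = buffer.subarray(range.start, range.end + 1);
    res.writeHead(206, {
      'Content-Range': `bytes ${range.start}-${range.end}/${buffer.length}`,
      'Content-Length': body.length,
    });
    res.end(body);
  });
}
```

In a test, `createRangeServer(fileBuffer).listen(0)` binds to a free local port, and the server should be closed once the test finishes.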