-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow passing in metadata_size_hint on a per-file basis #13213
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adriangb -- the code looks good to me.
However, I worry about the lack of a test. Specifically without a test of some sort I think it is likely this feature could easily get broken in a refactor and we wouldn't catch it in review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you @adriangb
Amazing! Let me know what else is needed to merge |
Currently you can only pass it to
ParquetExecBuilder
but that applies to entire groups of files. This adds an API to set it on a per-file basis in cases where you know the size of the metadata in each file. In our case we have a metadata store that stores information about each file, including the size of the metadata, so we know the exact size in bytes and can ensure that we never have to double-request to read the entire metadata.