Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow passing in metadata_size_hint on a per-file basis #13213

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

adriangb
Copy link
Contributor

@adriangb adriangb commented Nov 1, 2024

Currently you can only pass it to ParquetExecBuilder but that applies to entire groups of files. This adds an API to set it on a per-file basis in cases where you know the size of the metadata in each file. In our case we have a metadata store that stores information about each file, including the size of the metadata, so we know the exact size in bytes and can ensure that we never have to double-request to read the entire metadata.

@github-actions github-actions bot added core Core DataFusion crate substrait proto Related to proto crate labels Nov 1, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adriangb -- the code looks good to me.

However, I worry about the lack of a test. Specifically without a test of some sort I think it is likely this feature could easily get broken in a refactor and we wouldn't catch it in review

@adriangb
Copy link
Contributor Author

adriangb commented Nov 3, 2024

@alamb I added a test in b3890a8. It doesn't test below create_reader because that part was not touched and it starts getting complicated (tracking ObjectStore calls, etc.) so I tried to restrict the test to only the bits of code impacted by this change.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @adriangb

@adriangb
Copy link
Contributor Author

adriangb commented Nov 5, 2024

Amazing! Let me know what else is needed to merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate proto Related to proto crate substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants