Make use of Parquet size statistics when reading #14714
/ok to test
Started some perf testing and found that reading the indexes was hurting the non-string benchmarks. After dealing with that, the current code is anywhere from 1% to 20% faster for strings.
For comparison, reading the page stats when they aren't used has the following impact:
/ok to test
No fundamental concerns with the changes, just some minor comments/questions.
Going to close this for now and wait for #14360 to merge before trying again.
Description
#14000 added writing of page size statistics to the Parquet writer. This PR is an attempt to use those statistics to skip some preprocessing steps, primarily the page string size calculations.
Still a work in progress; feedback welcome.
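As a rough illustration of the idea in this description (not the code in this PR), here is a minimal Python sketch of choosing between a stored size statistic and a fallback scan. `PageInfo`, `scan_string_bytes`, and `page_string_bytes` are hypothetical names made up for this example. The field name `unencoded_byte_array_data_bytes` is borrowed from the Parquet size statistics, but how the reader actually surfaces it per page is an assumption here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class PageInfo:
    """Hypothetical stand-in for one string page as seen by a reader."""
    # Decoded values; None marks a null entry.
    values: Sequence[Optional[bytes]]
    # Total unencoded string bytes, if the writer recorded size statistics.
    unencoded_byte_array_data_bytes: Optional[int] = None


def scan_string_bytes(page: PageInfo) -> int:
    """Fallback preprocessing step: walk the page and sum the string lengths."""
    return sum(len(v) for v in page.values if v is not None)


def page_string_bytes(page: PageInfo) -> Tuple[int, bool]:
    """Return (string byte count, whether a scan was needed).

    Uses the stored statistic when present and only scans the page otherwise.
    """
    if page.unencoded_byte_array_data_bytes is not None:
        return page.unencoded_byte_array_data_bytes, False
    return scan_string_bytes(page), True


if __name__ == "__main__":
    with_stats = PageInfo([b"ab", None, b"cde"], unencoded_byte_array_data_bytes=5)
    without_stats = PageInfo([b"ab", None, b"cde"])
    print(page_string_bytes(with_stats))
    print(page_string_bytes(without_stats))
```

The fallback path matters because files written without size statistics still have to be read correctly; the statistic only lets the reader skip the scan when it is available.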