Statistics #53
base: master
Improved tests are required: they should capture statistics that differ across pages and row groups, and should include null value and unique value counts.
Not ready to merge.
* Bit-packing should work for data of any length, not just multiples of 8 (the last packed group is padded if it holds fewer than 8 values)
* Improve run estimation: only start a new run at a position where `mod 8 === 0`; otherwise use bit-packing
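Both notes concern Parquet's RLE/bit-packing hybrid encoding, where each run starts with a ULEB128 header whose low bit is 1 for a bit-packed run (the remaining bits count groups of 8 values) and 0 for an RLE run (the remaining bits count repeats). The following is a minimal sketch, not the PR's code, assuming non-negative integers of at most 32 bits; `encodeHybrid`, `writeBitPacked`, `writeRle` and the `minRun` threshold of 8 are hypothetical. It pads only the final bit-packed group and switches to an RLE run only when the values queued for bit-packing fill whole groups of 8.

```typescript
// Sketch of the Parquet RLE/bit-packing hybrid (hypothetical helper names).
// Assumes non-negative integers that fit in `bitWidth` bits (bitWidth <= 32).

// ULEB128 varint, used for run headers.
function writeVarint(out: number[], n: number): void {
  while (n >= 0x80) {
    out.push((n & 0x7f) | 0x80);
    n >>>= 7;
  }
  out.push(n);
}

// Bit-packed run: header = (groups << 1) | 1, values packed LSB-first.
// Any length is accepted; the last group is padded with zeros.
function writeBitPacked(out: number[], values: number[], bitWidth: number): void {
  const groups = Math.ceil(values.length / 8);
  writeVarint(out, (groups << 1) | 1);
  const bytes = new Array<number>(groups * bitWidth).fill(0); // 8 values = bitWidth bytes
  let bitPos = 0;
  for (let i = 0; i < groups * 8; i++) {
    const v = i < values.length ? values[i] : 0; // zero padding
    for (let b = 0; b < bitWidth; b++) {
      if ((v >>> b) & 1) bytes[bitPos >> 3] |= 1 << (bitPos & 7);
      bitPos++;
    }
  }
  out.push(...bytes);
}

// RLE run: header = count << 1, then the value in ceil(bitWidth / 8) bytes (little-endian).
function writeRle(out: number[], value: number, count: number, bitWidth: number): void {
  writeVarint(out, count << 1);
  const width = Math.ceil(bitWidth / 8);
  for (let i = 0; i < width; i++) out.push((value >>> (8 * i)) & 0xff);
}

function encodeHybrid(values: number[], bitWidth: number, minRun = 8): Uint8Array {
  const out: number[] = [];
  let pending: number[] = []; // values queued for bit-packing
  let i = 0;
  while (i < values.length) {
    let j = i;
    while (j < values.length && values[j] === values[i]) j++;
    const runLen = j - i;
    // Start an RLE run only at a group boundary (pending.length % 8 === 0),
    // so padding is never needed in the middle of the stream.
    if (runLen >= minRun && pending.length % 8 === 0) {
      if (pending.length > 0) writeBitPacked(out, pending, bitWidth);
      pending = [];
      writeRle(out, values[i], runLen, bitWidth);
      i = j;
    } else {
      pending.push(values[i]); // otherwise keep bit-packing
      i++;
    }
  }
  if (pending.length > 0) writeBitPacked(out, pending, bitWidth); // only the last group is padded
  return Uint8Array.from(out);
}
```

Because a reader knows the total value count, zero padding is harmless at the end of the stream but would insert spurious values in the middle, which is why a new run may only begin at a multiple of 8.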
This moves data into the encoded buffer as soon as possible, reducing the memory required for the whole row group.
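A rough sketch of eager encoding, assuming a fixed page size in rows and PLAIN little-endian INT32 encoding for simplicity; `EagerColumnWriter`, `pageRowLimit` and `maxPending` are hypothetical names, not parquetjs APIs. Raw values are held only until a page fills, after which they are replaced by their encoded bytes.

```typescript
// Hypothetical writer: raw values are buffered only up to one page,
// then encoded immediately and released.
class EagerColumnWriter {
  private pending: number[] = [];
  private readonly pageRowLimit: number;
  readonly pages: Uint8Array[] = [];
  maxPending = 0; // high-water mark of un-encoded values, for illustration

  constructor(pageRowLimit: number) {
    this.pageRowLimit = pageRowLimit;
  }

  append(value: number): void {
    this.pending.push(value);
    this.maxPending = Math.max(this.maxPending, this.pending.length);
    if (this.pending.length >= this.pageRowLimit) this.flushPage();
  }

  // Encode pending values as PLAIN little-endian INT32 and drop the raw array.
  flushPage(): void {
    if (this.pending.length === 0) return;
    const buf = new Uint8Array(this.pending.length * 4);
    const view = new DataView(buf.buffer);
    this.pending.forEach((v, i) => view.setInt32(i * 4, v, true));
    this.pages.push(buf);
    this.pending = [];
  }
}
```

The raw buffer never exceeds `pageRowLimit` values regardless of row-group size. Encoded pages still accumulate until the row group is written, so the saving relies on the encoded form being more compact than the in-memory values.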
Branch updated from e6a3cfa to 70a67f1
Statistics are computed by default for all columns unless `statistics: false` is set in the field definition.
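A small sketch of the opt-out rule, assuming a schema object keyed by column name; `FieldDefinition`, `statisticsEnabled` and the sample columns are hypothetical, and only the `statistics: false` option comes from the commit message.

```typescript
// Hypothetical field shape; `statistics` is the per-field opt-out.
interface FieldDefinition {
  type: string;
  optional?: boolean;
  statistics?: boolean;
}

// Enabled unless the field explicitly sets `statistics: false`.
function statisticsEnabled(field: FieldDefinition): boolean {
  return field.statistics !== false;
}

const schema: Record<string, FieldDefinition> = {
  name: { type: "UTF8" },
  quantity: { type: "INT64", optional: true },
  payload: { type: "BYTE_ARRAY", statistics: false },
};

const withStats = Object.keys(schema).filter((k) => statisticsEnabled(schema[k]));
// withStats: ["name", "quantity"]
```

Testing `!== false` rather than `=== true` keeps statistics on when the option is omitted, which is what makes them the default.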
Hi, I see this PR has been pending for almost a year now. Do you need any help?
Is there anything I could do to help with this PR?
Subsequent to #52
Calculate statistics for each page and each column, including `max_value`, `min_value`, `null_count` and `distinct_count`.
For any column that is sorted, the statistics at either the column or the page level allow a reader to skip sections whose value range cannot match the query.
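A minimal sketch of per-page statistics and min/max page skipping for a numeric column. The field names mirror Parquet's `Statistics` struct as listed above; `computeStatistics`, `pagesToRead` and the sample pages are hypothetical. The distinct count here is exact via a `Set`, which costs memory proportional to cardinality; a real writer might approximate it instead.

```typescript
interface Statistics {
  min_value: number | null;
  max_value: number | null;
  null_count: number;
  distinct_count: number;
}

function computeStatistics(values: Array<number | null>): Statistics {
  let min: number | null = null;
  let max: number | null = null;
  let nullCount = 0;
  const distinct = new Set<number>(); // exact count
  for (const v of values) {
    if (v === null) {
      nullCount++;
      continue;
    }
    distinct.add(v);
    if (min === null || v < min) min = v;
    if (max === null || v > max) max = v;
  }
  return { min_value: min, max_value: max, null_count: nullCount, distinct_count: distinct.size };
}

// Indexes of pages whose [min_value, max_value] range could contain `target`;
// all other pages (including all-null ones) can be skipped without decoding.
function pagesToRead(pages: Statistics[], target: number): number[] {
  const result: number[] = [];
  pages.forEach((s, i) => {
    if (s.min_value !== null && s.max_value !== null && s.min_value <= target && target <= s.max_value) {
      result.push(i);
    }
  });
  return result;
}

// Usage: a column sorted by value (nulls aside), split into pages of four values.
const pageValues: Array<Array<number | null>> = [
  [1, 2, 2, null],
  [3, 5, 8, 9],
  [10, 11, null, null],
];
const pages = pageValues.map((p) => computeStatistics(p));
const column = computeStatistics(([] as Array<number | null>).concat(...pageValues));
const candidates = pagesToRead(pages, 5); // only page 1 needs to be read
```

Because the column is sorted, page ranges do not overlap, so a point lookup touches at most the pages whose range brackets the target; on an unsorted column the same check is still valid but prunes far less.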