Statistics #53

Open
wants to merge 4 commits into base: master

ZJONSSON
Contributor

Subsequent to #52

Calculate statistics for each page and each column, including `max_value`, `min_value`, `null_count` and `distinct_count`. For any columns that are sorted, the statistics at either the column level or the page level allow a reader to skip over sections that are not of interest.
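
To illustrate the idea, here is a minimal sketch of how per-page statistics could be computed and then used to skip pages. It is not the code in this PR: `computeStatistics` and `canSkip` are hypothetical helpers, and the sample pages are made up. Only the four statistic names come from the description above.

```javascript
// Hypothetical helper: summarise one page (or a whole column) of values.
// null/undefined count as nulls and are excluded from min/max/distinct.
function computeStatistics(values) {
  const distinct = new Set();
  let min;
  let max;
  let nullCount = 0;
  for (const v of values) {
    if (v === null || v === undefined) {
      nullCount++;
      continue;
    }
    distinct.add(v);
    if (min === undefined || v < min) min = v;
    if (max === undefined || v > max) max = v;
  }
  return {
    min_value: min,
    max_value: max,
    null_count: nullCount,
    distinct_count: distinct.size,
  };
}

// A reader looking for values in [lo, hi] can skip any page whose
// [min_value, max_value] range does not overlap the requested range.
function canSkip(stats, lo, hi) {
  if (stats.min_value === undefined) return true; // page holds only nulls
  return stats.max_value < lo || stats.min_value > hi;
}

// Made-up sample data: three pages of one sorted column.
const pages = [[1, 2, null, 2], [5, 7, 9], [12, 15, 15, null]];
const pageStats = pages.map(computeStatistics);

// Indexes of the pages that still have to be read for the range [6, 10].
const wanted = pageStats
  .map((s, i) => (canSkip(s, 6, 10) ? -1 : i))
  .filter((i) => i >= 0);

console.log(pageStats[0], wanted);
```

The range check is valid for any column, but on a sorted column the page ranges do not overlap, so most pages can be ruled out.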

@ZJONSSON
Contributor Author

Improved tests required: they should capture statistics that differ across pages and row groups, and should include null-value and unique-value counts.

@ZJONSSON
Contributor Author

ZJONSSON commented Feb 28, 2018

Not ready to merge. `max_value` and `min_value` have to be encoded with the column encoding.
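
As a rough sketch of what that means, the statistics would be stored as encoded bytes rather than as raw JavaScript values. The example below assumes PLAIN encoding of integer columns (fixed-width little-endian) purely for illustration; `encodePlain` is a hypothetical helper, not a function from this PR, and other encodings or types would need their own handling.

```javascript
// Hypothetical helper: PLAIN-encode a single INT32 or INT64 value
// (fixed-width, little-endian) so it can be stored as a statistic.
function encodePlain(type, value) {
  if (type === 'INT32') {
    const buf = Buffer.alloc(4);
    buf.writeInt32LE(value, 0);
    return buf;
  }
  if (type === 'INT64') {
    const buf = Buffer.alloc(8);
    buf.writeBigInt64LE(BigInt(value), 0);
    return buf;
  }
  throw new Error('unsupported type in this sketch: ' + type);
}

// Made-up statistics for an INT32 column.
const stats = { min_value: -3, max_value: 1000 };
const encoded = {
  min_value: encodePlain('INT32', stats.min_value),
  max_value: encodePlain('INT32', stats.max_value),
};

console.log(encoded.min_value.toString('hex'), encoded.max_value.toString('hex'));
```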

* Bitpacking should work for any length of data, not just multiples of 8 (the last packed group is padded if it holds fewer than 8 values)

* Improve runs estimation: only start a new run when the current position mod 8 === 0, otherwise use bitpacking
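
A minimal sketch of the padding behaviour described in the first commit, assuming values are packed least-significant bit first in groups of 8. `bitPack` is a hypothetical helper written for this illustration, not the function changed in the commit, and it only handles non-negative integers with a bit width of at most 31. The runs estimation from the second commit is not covered here.

```javascript
// Hypothetical helper: bit-pack non-negative integers using `bitWidth`
// bits per value, least-significant bit first, in groups of 8 values.
// A short final group is padded with zeros up to 8 values.
function bitPack(values, bitWidth) {
  const groups = Math.ceil(values.length / 8);
  const padding = new Array(groups * 8 - values.length).fill(0);
  const padded = values.concat(padding);
  // 8 values * bitWidth bits = bitWidth bytes per group.
  const out = Buffer.alloc(groups * bitWidth);
  let bitPos = 0;
  for (const v of padded) {
    for (let b = 0; b < bitWidth; b++, bitPos++) {
      if ((v >> b) & 1) out[bitPos >> 3] |= 1 << (bitPos & 7);
    }
  }
  return out;
}

// A full group of 8 values, 3 bits each: exactly 3 bytes.
const full = bitPack([0, 1, 2, 3, 4, 5, 6, 7], 3);

// Only 3 values, 2 bits each: still one whole group, so 2 bytes.
const short = bitPack([1, 2, 3], 2);

console.log(full.toString('hex'), short.toString('hex'));
```

Padding to a whole group keeps the output length a simple function of the group count, at the cost of a few unused bits at the end.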
ZJONSSON added 2 commits March 2, 2018 11:36
This moves data into the encoded buffer as soon as possible, reducing memory requirements for the whole rowGroup
ZJONSSON force-pushed the statistics branch 2 times, most recently from e6a3cfa to 70a67f1 March 3, 2018 00:31
Statistics are calculated by default for all columns unless `statistics: false` is set in the field definition
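
For example, opting a single column out might look like the sketch below. Only the `statistics: false` option comes from the commit message; the field names, the type names and the `statisticsEnabled` helper are assumptions made for this illustration.

```javascript
// Made-up field definitions; `statistics: false` opts a column out.
const fields = {
  name: { type: 'UTF8' },
  price: { type: 'DOUBLE' },
  payload: { type: 'UTF8', statistics: false },
};

// Hypothetical helper: statistics are on unless explicitly disabled.
function statisticsEnabled(field) {
  return field.statistics !== false;
}

// Names of the columns that would get statistics.
const enabled = Object.keys(fields).filter((k) => statisticsEnabled(fields[k]));

console.log(enabled);
```

Checking for `!== false` rather than truthiness keeps the default on for fields that do not mention the option at all.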
@hadrienk

hadrienk commented Feb 11, 2019

Hi,

I see this PR has been pending for almost a year now. Do you need any help?
I can test locally or contribute if there's more to do.

@dobesv
Contributor

dobesv commented Nov 29, 2019

Is there anything I could do to help with this PR?

3 participants