Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smoketest for SamplingCompressor, fix bug in varbin stats #510

Merged
merged 1 commit into from
Jul 24, 2024

Conversation

a10y
Copy link
Contributor

@a10y a10y commented Jul 24, 2024

There are a number of failures in the past few months that only show up in an ETE test like random_access benchmark. However, random_access benchmark is very slow to run in CI.

I'm adding a very tiny SamplingCompressor test that runs at CI time. It takes <1s on my macbook, and uses 65k rows of synthetic data to try and get the compressor to go down a few common paths.

It's not perfect, but it does trigger Constant, DateTimeParts, Dict and could trigger Bitpacking if we implement #509.

Bonus: after writing this test and examining the compressed output in a few scenarios, I was noticing that Dict compression was never kicking in for out-of-order arrays. It looks like we were computing the is_strict_sorted stat incorrectly for varbin.

There are a number of failures in the past few months that only show up in an ETE test
like random_access benchmark. However, random_access benchmark is very slow to run in
CI.

I'm adding a very tiny SamplingCompressor test that runs at CI time. It takes <1s
on my macbook, and uses 65k rows of synthetic data to try and get the compressor
to go down a few common paths.

It's not perfect, but it does trigger Constant, DateTimeParts, Dict and Bitpacking
@a10y a10y requested a review from robert3005 July 24, 2024 17:51
@a10y
Copy link
Contributor Author

a10y commented Jul 24, 2024

CI Overhead seems to be minimal:

image

Ordering::Less => self.is_sorted = false,
Ordering::Less => {
self.is_sorted = false;
self.is_strict_sorted = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortuante

@a10y a10y merged commit 28d6e17 into develop Jul 24, 2024
2 checks passed
@a10y a10y deleted the aduffy/compressor-smoketest branch July 24, 2024 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants