Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix chunked array stat merging #303

Merged
merged 5 commits into from
May 8, 2024
Merged

Fix chunked array stat merging #303

merged 5 commits into from
May 8, 2024

Conversation

robert3005
Copy link
Member

Stats merging conflated min and max statistics and would wrongly merge max
values. If statistics were not computed for an array then merging would result
in incorrect results only spanning one part of the array

@robert3005 robert3005 enabled auto-merge (squash) May 7, 2024 16:50
@robert3005
Copy link
Member Author

@jdcasale this is probably the way it should have been. Before we assumed that empty was completely empty but that is a weird assumption to make and can lead to hard to notice bugs

@gatesn gatesn requested a review from jdcasale May 8, 2024 08:57
Copy link
Contributor

@jdcasale jdcasale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge_sortedness_stat is wrong too, fixed this and added some more tests in #306

If that merges into this branch, lgtm

Entry::Vacant(e) => {
if let Some(min) = other.get(Stat::Max) {
e.insert(min.clone());
fn merge_ordered<F: Fn(&Scalar, &Scalar) -> bool>(&mut self, stat: Stat, other: &Self, cmp: F) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth a docstring, eg

/// Merges stats if both are present, if either stat is not present, drops the stat from the result set. For example, if we know the minimums of two arrays, the minimum of their union is the minimum-of-minimums, but if we only know the minimum of one of the two arrays, we do not know the minimum of their union.

e.insert(min.clone());
}
if let Entry::Occupied(mut e) = self.values.entry(stat) {
if let Some(other_value) = other.get_as::<usize>(stat) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matters because negative values will panic on the coercion to usize, but this will mutate the PType of the scalars to be merged -- ie if we merge an u32 with an u32, the result of this will be a u64.

Also maybe worth renaming this method to merge_count_stat

@@ -213,3 +193,29 @@ impl IntoIterator for StatsSet {
self.values.into_iter()
}
}

#[cfg(test)]
mod test {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want tests for the other codepaths as well

@robert3005 robert3005 merged commit f1242b5 into develop May 8, 2024
2 checks passed
@robert3005 robert3005 deleted the rk/chunkedstats branch May 8, 2024 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants