Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FilterIndices compute function #326

Merged
merged 17 commits into from
May 20, 2024
Merged

FilterIndices compute function #326

merged 17 commits into from
May 20, 2024

Conversation

jdcasale
Copy link
Contributor

@jdcasale jdcasale commented May 16, 2024

Calculates which indices match a given predicate. Includes an implementation for for PrimitiveArray.

@jdcasale jdcasale changed the title [WIP] FilterIndices compute function [WIP] FilterIndices compute function for PrimitiveArray May 16, 2024
@jdcasale jdcasale changed the title [WIP] FilterIndices compute function for PrimitiveArray [WIP] FilterIndices compute function May 16, 2024
@jdcasale jdcasale marked this pull request as ready for review May 16, 2024 17:26
@jdcasale jdcasale changed the title [WIP] FilterIndices compute function FilterIndices compute function May 16, 2024
vortex-array/src/array/primitive/compute/filter_indices.rs Outdated Show resolved Hide resolved
vortex-array/src/array/primitive/compute/filter_indices.rs Outdated Show resolved Hide resolved
vortex-array/src/array/primitive/compute/filter_indices.rs Outdated Show resolved Hide resolved
MergeOp::All(
conj.predicates
.iter()
.map(|pred| indices_matching_predicate(self, pred).unwrap())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to echo my understanding:

right now, this instantiates a byte per value per predicate, then converts those Vec<bool> instances to iterators, then calls collect_vec again...? So we have a Vec of iterators (which are backed by Vecs), basically (N + 1) allocations of self.len bytes for N predicates, which get passed to MergeOp.

MergeOp is itself an iterator, which we collect in order to force evaluation of the All predicate, then we do it all over again with a cycle of iterators and collect calls to evaluate the Any predicate, enumerate/filter/collect to get indices.

I'm concerned that that's an awful lot of heap allocations on an extremely hot code path. From a machine efficiency point-of-view, we would ideally have at most 2 allocations and end up producing SIMD instructions to do bitwise AND on bitmaps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point -- initially I wanted to avoid doing pairwise reductions in favor of a single-pass, but the allocations/vec overhead here might outweigh that benefit anyway. I'll rewrite this to be fully lazy and use bitmaps instead of vecs.

The one thing I'm not sure of here is whether we have a 64-bit bitmap in the rust croaring crate -- at first glance I didn't see one, will take a closer look tomorrow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's not actually that hot. It's an allocation per predicate, so an expression of (X > 2 & X < 10) is two predicates, regardless of how big the array X is.

With the right bitset utility, you should be able to mutate-in-place instead of allocating a third bitset for the result. But without benchmarking, it's hard to intuit which will perform better.

vortex-array/src/array/primitive/compute/filter_indices.rs Outdated Show resolved Hide resolved
vortex-array/src/compute/filter_indices.rs Outdated Show resolved Hide resolved
vortex-dtype/src/field_paths.rs Outdated Show resolved Hide resolved
@jdcasale jdcasale marked this pull request as draft May 16, 2024 22:23
@jdcasale jdcasale marked this pull request as ready for review May 17, 2024 16:07
@jdcasale jdcasale merged commit 8b6606a into develop May 20, 2024
3 checks passed
@jdcasale jdcasale deleted the jc/filter-indices branch May 20, 2024 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants