FilterIndices compute function #326
MergeOp::All(
    conj.predicates
        .iter()
        .map(|pred| indices_matching_predicate(self, pred).unwrap())
just to echo my understanding:
right now, this instantiates a byte per value per predicate, then converts those Vec<bool> instances to iterators, then calls collect_vec again...? So we have a Vec of iterators (which are backed by Vecs), basically (N + 1) allocations of self.len bytes for N predicates, which get passed to MergeOp.
MergeOp is itself an iterator, which we collect in order to force evaluation of the All predicate, then we do it all over again with a cycle of iterators and collect calls to evaluate the Any predicate, enumerate/filter/collect to get indices.
I'm concerned that that's an awful lot of heap allocations on an extremely hot code path. From a machine efficiency point-of-view, we would ideally have at most 2 allocations and end up producing SIMD instructions to do bitwise AND on bitmaps.
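To make the suggestion concrete, here is a rough sketch of what a low-allocation version could look like. It is not the code in this PR: `indices_matching_all` and the closure-based predicates are hypothetical stand-ins for the real predicate types, and a plain `Vec<u64>` is assumed as the bitmap instead of a bitmap crate. Each predicate is evaluated 64 values at a time and ANDed straight into one accumulator, so the only heap allocations are the accumulator and the output vector.

```rust
/// Hypothetical sketch: returns the indices of `values` that satisfy every predicate.
/// Bit `i % 64` of `acc[i / 64]` stays set only while row `i` matches all predicates so far.
fn indices_matching_all(values: &[i64], preds: &[&dyn Fn(i64) -> bool]) -> Vec<usize> {
    let n_words = (values.len() + 63) / 64;
    // Allocation 1: the accumulator bitmap, initially "every row matches".
    let mut acc = vec![u64::MAX; n_words];

    for pred in preds {
        // `chunks(64)` yields exactly `n_words` chunks, one per accumulator word.
        for (word, chunk) in acc.iter_mut().zip(values.chunks(64)) {
            let mut mask = 0u64;
            for (bit, &v) in chunk.iter().enumerate() {
                mask |= (pred(v) as u64) << bit;
            }
            // Bitwise AND into the accumulator; no per-predicate buffer is kept.
            *word &= mask;
        }
    }

    // Allocation 2: the resulting indices, read off the set bits.
    let mut out = Vec::new();
    for (w_idx, &word) in acc.iter().enumerate() {
        let mut w = word;
        while w != 0 {
            let idx = w_idx * 64 + w.trailing_zeros() as usize;
            // Guard against padding bits in the last word (only reachable with no predicates).
            if idx < values.len() {
                out.push(idx);
            }
            w &= w - 1; // clear the lowest set bit
        }
    }
    out
}
```

Whether the compiler actually emits SIMD for the inner loops depends on the predicate and the optimization level, so this would still need a benchmark against the iterator-based version.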
This is a good point -- initially I wanted to avoid doing pairwise reductions in favor of a single pass, but the allocations/vec overhead here might outweigh that benefit anyway. I'll rewrite this to be fully lazy and use bitmaps instead of vecs.
The one thing I'm not sure of here is whether we have a 64-bit bitmap in the Rust croaring crate -- at first glance I didn't see one, but I will take a closer look tomorrow.
So it's not actually that hot. It's an allocation per predicate, so an expression of (X > 2 & X < 10) is two predicates, regardless of how big the array X is.
With the right bitset utility, you should be able to mutate-in-place instead of allocating a third bitset for the result. But without benchmarking, it's hard to intuit which will perform better.
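For illustration, the two options being compared look roughly like this. Both functions are hypothetical and assume bitsets stored as equal-length slices of `u64` words; a real bitset utility would expose its own equivalents.

```rust
/// Allocating variant: builds a third bitset to hold the result.
fn and_alloc(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter().zip(b).map(|(x, y)| x & y).collect()
}

/// In-place variant: overwrites `a` with `a AND b`, so no new buffer is needed.
fn and_in_place(a: &mut [u64], b: &[u64]) {
    for (x, y) in a.iter_mut().zip(b) {
        *x &= *y;
    }
}
```

For (X > 2 & X < 10) the in-place variant saves one allocation of roughly len / 8 bytes, which is why only a benchmark can say whether the difference matters.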
Calculates which indices match a given predicate. Includes an implementation for PrimitiveArray.