FilterIndices compute function #326

jdcasale · 2024-05-16T16:23:18Z

Calculates which indices match a given predicate. Includes an implementation for PrimitiveArray.

vortex-array/src/array/primitive/compute/filter_indices.rs

lwwmanning · 2024-05-16T20:41:30Z

vortex-array/src/array/primitive/compute/filter_indices.rs

+                MergeOp::All(
+                    conj.predicates
+                        .iter()
+                        .map(|pred| indices_matching_predicate(self, pred).unwrap())


just to echo my understanding:

right now, this instantiates a byte per value per predicate, then converts those Vec<bool> instances to iterators, then calls collect_vec again...? So we have a Vec of iterators (which are backed by Vecs), basically (N + 1) allocations of self.len bytes for N predicates, which get passed to MergeOp.

MergeOp is itself an iterator, which we collect in order to force evaluation of the All predicate. Then we do it all over again with a cycle of iterators and collect calls to evaluate the Any predicate, and finally enumerate/filter/collect to get indices.

I'm concerned that this is an awful lot of heap allocations on an extremely hot code path. From a machine-efficiency point of view, we would ideally have at most 2 allocations and end up producing SIMD instructions to do bitwise AND on bitmaps.
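To make the "at most 2 allocations" idea concrete, here is a minimal sketch, assuming packed `u64` words as the bitmap representation. The names `fill_bitmap`, `all_match`, and `set_indices` are hypothetical and are not part of Vortex or Arrow. One accumulator and one reusable scratch buffer are allocated regardless of the number of predicates, and the word-wise AND loop is the kind of code a compiler can usually auto-vectorize.

```rust
/// Hypothetical helper: evaluate `pred` over `values` and write the result
/// into `out` as packed bits (bit i of the bitmap <=> values[i] matches).
fn fill_bitmap<T>(out: &mut [u64], values: &[T], pred: impl Fn(&T) -> bool) {
    out.fill(0);
    for (i, v) in values.iter().enumerate() {
        if pred(v) {
            out[i / 64] |= 1u64 << (i % 64);
        }
    }
}

/// Conjunction of all predicates using exactly two heap allocations:
/// the accumulator and a scratch bitmap that is reused for every predicate.
fn all_match<T>(values: &[T], preds: &[&dyn Fn(&T) -> bool]) -> Vec<u64> {
    let n_words = (values.len() + 63) / 64;
    let mut acc = vec![u64::MAX; n_words]; // allocation 1: starts as "all true"
    // Clear the padding bits past the end of the array.
    let rem = values.len() % 64;
    if rem != 0 {
        if let Some(last) = acc.last_mut() {
            *last &= (1u64 << rem) - 1;
        }
    }
    let mut scratch = vec![0u64; n_words]; // allocation 2: reused per predicate
    for pred in preds {
        fill_bitmap(&mut scratch, values, |v| pred(v));
        // Word-wise AND, mutating the accumulator in place.
        for (a, s) in acc.iter_mut().zip(&scratch) {
            *a &= *s;
        }
    }
    acc
}

/// Convert a packed bitmap to the indices of its set bits.
/// (This allocates the output Vec; it replaces enumerate/filter/collect.)
fn set_indices(words: &[u64]) -> Vec<usize> {
    let mut out = Vec::new();
    for (w_idx, &word) in words.iter().enumerate() {
        let mut w = word;
        while w != 0 {
            out.push(w_idx * 64 + w.trailing_zeros() as usize);
            w &= w - 1; // clear the lowest set bit
        }
    }
    out
}
```

Whether this beats the iterator-based version in practice would still need to be measured; the sketch only shows that the allocation count can be made independent of the number of predicates.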

jdcasale · 2024-05-16

This is a good point -- initially I wanted to avoid doing pairwise reductions in favor of a single pass, but the allocation/Vec overhead here might outweigh that benefit anyway. I'll rewrite this to be fully lazy and use bitmaps instead of Vecs.

The one thing I'm not sure of here is whether we have a 64-bit bitmap in the rust croaring crate -- at first glance I didn't see one, will take a closer look tomorrow.

gatesn · 2024-05-18

So it's not actually that hot. It's one allocation per predicate, so an expression like (X > 2 & X < 10) is two predicates, and therefore two allocations, regardless of how large the array X is.

With the right bitset utility, you should be able to mutate in place instead of allocating a third bitset for the result. But without benchmarking, it's hard to intuit which will perform better.
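As a rough illustration of the two options being compared, the sketch below puts an allocating AND next to an in-place AND over packed `u64` words and times both with `std::time::Instant`. The function names are hypothetical, and the timing loop is deliberately crude; a real measurement would use a proper benchmark harness (for example Criterion) on realistic array sizes.

```rust
use std::time::{Duration, Instant};

/// AND two bitmaps into a freshly allocated third bitmap.
fn and_alloc(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter().zip(b).map(|(x, y)| x & y).collect()
}

/// AND `other` into `acc`, mutating it in place (no new allocation).
fn and_in_place(acc: &mut [u64], other: &[u64]) {
    for (x, y) in acc.iter_mut().zip(other) {
        *x &= *y;
    }
}

/// Crude wall-clock comparison; returns (allocating, in-place) durations.
/// The input words are arbitrary filler values, not real predicate results.
fn compare(n_words: usize, iters: u32) -> (Duration, Duration) {
    let a: Vec<u64> = (0..n_words as u64)
        .map(|i| i.wrapping_mul(0x9E37_79B9_7F4A_7C15))
        .collect();
    let b: Vec<u64> = (0..n_words as u64).map(|i| !i.rotate_left(17)).collect();

    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        std::hint::black_box(and_alloc(&a, &b));
    }
    let alloc_time = start.elapsed();

    let mut acc = a.clone();
    let start = Instant::now();
    for _ in 0..iters {
        and_in_place(&mut acc, &b);
        std::hint::black_box(&acc);
    }
    let in_place_time = start.elapsed();

    (alloc_time, in_place_time)
}
```

Calling `compare` with a word count that matches the array sizes of interest gives a first impression only; the numbers depend heavily on the allocator, cache state, and compiler flags.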

vortex-array/src/array/primitive/compute/filter_indices.rs

vortex-array/src/compute/filter_indices.rs

vortex-dtype/src/field_paths.rs

initial sketch

f8cd016

robert3005 reviewed May 16, 2024

vortex-array/src/array/primitive/compute/filter_indices.rs (outdated)

robert3005 reviewed May 16, 2024

vortex-array/src/array/primitive/compute/filter_indices.rs (outdated)

robert3005 reviewed May 16, 2024

vortex-array/src/array/primitive/compute/filter_indices.rs (outdated)

robert3005 reviewed May 16, 2024

vortex-array/src/array/primitive/compute/filter_indices.rs (outdated)

jdcasale added 2 commits May 16, 2024 18:17

review

3a92740

nit

f316729

jdcasale changed the title ~~[WIP] FilterIndices compute function~~ [WIP] FilterIndices compute function for PrimitiveArray May 16, 2024

jdcasale changed the title ~~[WIP] FilterIndices compute function for PrimitiveArray~~ [WIP] FilterIndices compute function May 16, 2024

jdcasale marked this pull request as ready for review May 16, 2024 17:26

jdcasale changed the title ~~[WIP] FilterIndices compute function~~ FilterIndices compute function May 16, 2024

lwwmanning reviewed May 16, 2024

vortex-array/src/compute/filter_indices.rs (outdated)

vortex-dtype/src/field_paths.rs (outdated)

jdcasale added 2 commits May 16, 2024 23:13

fixes

7254956

nit

44f226c

jdcasale marked this pull request as draft May 16, 2024 22:23

jdcasale added 11 commits May 16, 2024 23:33

comment

655ea7a

temp

f9e9268

mergeops uses booleanbuffer and tree_fold

daf268e

nit

c31925d

use bitset instead of booleanbuffer because allocations

8c3a211

temp

9dc3ae3

use booleanbuffers (again), return boolarray

08720d4

nit

5fbaa2a

nit

affde69

nit

3f1c16a

nit

09b28b8

jdcasale marked this pull request as ready for review May 17, 2024 16:07

benchmark

180af05

gatesn approved these changes May 20, 2024

jdcasale merged commit 8b6606a into develop May 20, 2024
3 checks passed

jdcasale deleted the jc/filter-indices branch May 20, 2024 10:47

lwwmanning May 16, 2024

jdcasale May 16, 2024

gatesn May 18, 2024