Add FilterFn trait + default implementation #458

Merged
merged 4 commits into from
Jul 16, 2024
Changes from 1 commit
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions vortex-array/Cargo.toml
@@ -22,6 +22,7 @@ workspace = true
 arrow-array = { workspace = true }
 arrow-buffer = { workspace = true }
 arrow-cast = { workspace = true }
+arrow-select = { workspace = true }
 arrow-schema = { workspace = true }
 enum-iterator = { workspace = true }
 flatbuffers = { workspace = true }
5 changes: 2 additions & 3 deletions vortex-array/src/canonical.rs
@@ -48,11 +48,10 @@ use crate::{Array, ArrayDType, IntoArray, ToArray};
 ///
 /// Binary and String views are a new, better encoding format for nearly all use-cases. For now,
 /// because DataFusion does not include pervasive support for compute over StringView, we opt to use
-/// the [`VarBinArray`] as the canonical encoding (which corresponds to the Arrow
-/// [`BinaryViewArray`]).
+/// the [`VarBinArray`] as the canonical encoding (which corresponds to the Arrow `BinaryViewArray`).
 ///
 /// We expect to change this soon once DataFusion is able to finish up some initial support, which
-/// is tracked in https://github.com/apache/datafusion/issues/10918.
+/// is tracked in <https://github.com/apache/datafusion/issues/10918>.
Contributor Author

These were cleanups to make `cargo doc -p vortex-array` complete cleanly.

 #[derive(Debug, Clone)]
 pub enum Canonical {
     Null(NullArray),
78 changes: 78 additions & 0 deletions vortex-array/src/compute/filter.rs
@@ -0,0 +1,78 @@
+use arrow_array::cast::AsArray;
+use vortex_dtype::{DType, Nullability};
+use vortex_error::VortexResult;
+
+use crate::arrow::FromArrowArray;
+use crate::{Array, ArrayDType, ArrayData, IntoArray, IntoCanonical};
+
+pub trait FilterFn {
+    /// Filter an array by the provided predicate.
+    fn filter(&self, predicate: &Array) -> Array;
Member

What do you think about changing this to take a `&dyn BoolArrayTrait`?

+}
+
+/// Return a new array by applying a boolean predicate to select items from a base Array.
+///
+/// # Performance
+///
+/// This function attempts to amortize the cost of copying
+///
+/// # Panics
+///
+/// The `predicate` must receive an Array with type non-nullable bool, and will panic if this is
+/// not the case.
+pub fn filter(array: &Array, predicate: &Array) -> VortexResult<Array> {
+    assert_eq!(
+        predicate.dtype(),
+        &DType::Bool(Nullability::NonNullable),
+        "predicate must be non-nullable bool"
+    );
+    assert_eq!(
+        predicate.len(),
+        array.len(),
+        "predicate.len() must equal array.len()"
+    );
+
+    array.with_dyn(|a| {
+        if let Some(ref filter_fn) = a.filter() {
+            Ok(filter_fn.filter(array))
Member

Should this be `filter_fn.filter(predicate)`?

Contributor Author

Whoops, yes

Member

I'll fix it in a follow-up.

+        } else {
+            // Fallback: implement using Arrow kernels.
+            let array_ref = array.clone().into_canonical()?.into_arrow();
+            let predicate_ref = predicate.clone().into_canonical()?.into_arrow();
+            let filtered =
+                arrow_select::filter::filter(array_ref.as_ref(), predicate_ref.as_boolean())?;
+
+            Ok(ArrayData::from_arrow(filtered, array.dtype().is_nullable()).into_array())
+        }
+    })
+}
+
+#[cfg(test)]
+mod test {
+    use crate::array::bool::BoolArray;
+    use crate::array::primitive::PrimitiveArray;
+    use crate::compute::filter::filter;
+    use crate::validity::Validity;
+    use crate::{IntoArray, IntoCanonical};
+
+    #[test]
+    fn test_filter() {
+        let items =
+            PrimitiveArray::from_nullable_vec(vec![Some(0i32), None, Some(1i32), None, Some(2i32)])
+                .into_array();
+        let predicate =
+            BoolArray::from_vec(vec![true, false, true, false, true], Validity::NonNullable)
+                .into_array();
+
+        let filtered = filter(&items, &predicate).unwrap();
+        assert_eq!(
+            filtered
+                .into_canonical()
+                .unwrap()
+                .into_primitive()
+                .unwrap()
+                .into_maybe_null_slice::<i32>(),
+            vec![0i32, 1i32, 2i32]
+        );
+    }
+}
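
For readers unfamiliar with mask-based filtering, the semantics exercised by `test_filter` can be sketched in plain Rust without any Vortex or Arrow types. This is only an illustration: `filter_by_mask` is a hypothetical helper, not part of `vortex-array`, and it models nullable values as `Option<T>` rather than as an array with a validity buffer.

```rust
/// Hypothetical helper (not part of vortex-array): keep `items[i]` wherever
/// `mask[i]` is true, preserving the original order.
fn filter_by_mask<T: Clone>(items: &[T], mask: &[bool]) -> Vec<T> {
    // Mirrors the length assertion in the PR's `filter` function.
    assert_eq!(mask.len(), items.len(), "mask.len() must equal items.len()");
    items
        .iter()
        .zip(mask)
        // `keep` is a `&&bool` here, so dereference twice.
        .filter(|(_, keep)| **keep)
        .map(|(item, _)| item.clone())
        .collect()
}
```

Calling `filter_by_mask` with the test's inputs, `[Some(0), None, Some(1), None, Some(2)]` and `[true, false, true, false, true]`, drops both nulls because they sit at the positions where the mask is false.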
16 changes: 16 additions & 0 deletions vortex-array/src/compute/mod.rs
@@ -1,3 +1,12 @@
+//! Compute kernels on top of Vortex Arrays.
+//!
+//! We aim to provide a basic set of compute kernels that can be used to efficiently index, slice,
+//! and filter Vortex Arrays in their encoded forms.
+//!
+//! Every [array variant][crate::ArrayTrait] has the ability to implement their own efficient
+//! implementations of these operators, else we will decode, and perform the equivalent operator
+//! from Arrow.
+
 use compare::CompareFn;
 use search_sorted::SearchSortedFn;
 use slice::SliceFn;
@@ -8,14 +17,17 @@ use self::unary::cast::CastFn;
 use self::unary::fill_forward::FillForwardFn;
 use self::unary::scalar_at::ScalarAtFn;
 use self::unary::scalar_subtract::SubtractScalarFn;
+use crate::compute::filter::FilterFn;

 pub mod compare;
+mod filter;
 pub mod filter_indices;
 pub mod search_sorted;
 pub mod slice;
 pub mod take;
 pub mod unary;

+/// Trait providing compute functions on top of
 pub trait ArrayCompute {
     fn cast(&self) -> Option<&dyn CastFn> {
         None
@@ -29,6 +41,10 @@ pub trait ArrayCompute {
         None
     }

+    fn filter(&self) -> Option<&dyn FilterFn> {
+        None
+    }
+
     fn filter_indices(&self) -> Option<&dyn FilterIndicesFn> {
         None
     }
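
The "optional kernel, else decode and fall back" pattern described in the module docs can be sketched in isolation. The sketch below is a simplified model, not the Vortex API: `FilterKernel`, `Compute`, `Constant`, and `Plain` are hypothetical names, arrays are reduced to `Vec<i32>`, and the Arrow fallback is replaced by a plain iterator filter.

```rust
/// Hypothetical stand-in for an encoding-specific filter kernel.
trait FilterKernel {
    fn filter(&self, mask: &[bool]) -> Vec<i32>;
}

/// Hypothetical stand-in for the compute trait: kernels are opt-in and default to `None`.
trait Compute {
    fn filter_kernel(&self) -> Option<&dyn FilterKernel> {
        None
    }
    /// Decode to a plain vector; this plays the role of canonicalization.
    fn decode(&self) -> Vec<i32>;
}

/// A constant encoding: one value repeated `len` times. It can filter without decoding.
struct Constant {
    value: i32,
    len: usize,
}

impl FilterKernel for Constant {
    fn filter(&self, mask: &[bool]) -> Vec<i32> {
        // Only the number of selected positions matters for a constant array.
        let kept = mask.iter().filter(|keep| **keep).count();
        vec![self.value; kept]
    }
}

impl Compute for Constant {
    fn filter_kernel(&self) -> Option<&dyn FilterKernel> {
        Some(self)
    }
    fn decode(&self) -> Vec<i32> {
        vec![self.value; self.len]
    }
}

/// A plain encoding with no specialized kernel; it relies on the fallback path.
struct Plain(Vec<i32>);

impl Compute for Plain {
    fn decode(&self) -> Vec<i32> {
        self.0.clone()
    }
}

/// Dispatch: use the encoding's kernel if it has one, otherwise decode and filter.
fn filter(array: &dyn Compute, mask: &[bool]) -> Vec<i32> {
    match array.filter_kernel() {
        // The kernel receives the mask, not the array itself.
        Some(kernel) => kernel.filter(mask),
        None => array
            .decode()
            .into_iter()
            .zip(mask)
            .filter(|(_, keep)| **keep)
            .map(|(value, _)| value)
            .collect(),
    }
}
```

Because every trait method defaults to `None`, an encoding such as `Plain` gets correct (if slower) behaviour for free, while `Constant` opts in to a kernel that never materializes the decoded values.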
13 changes: 12 additions & 1 deletion vortex-array/src/compute/slice.rs
@@ -2,11 +2,22 @@ use vortex_error::{vortex_bail, vortex_err, VortexResult};

 use crate::Array;

-/// Limit array to start..stop range
+/// Limit array to start...stop range
 pub trait SliceFn {
     fn slice(&self, start: usize, stop: usize) -> VortexResult<Array>;
 }

+/// Return a zero-copy slice of an array, between `start` (inclusive) and `end` (exclusive).
+///
+/// # Panics
+///
+/// Slicing will panic if you attempt to slice a range that exceeds the bounds of the
+/// underlying array.
+///
+/// # Errors
+///
+/// Slicing returns an error if the underlying codec's [slice](SliceFn::slice()) implementation
+/// returns an error.
 pub fn slice(array: &Array, start: usize, stop: usize) -> VortexResult<Array> {
     check_slice_bounds(array, start, stop)?;
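
As a rough illustration of the half-open range documented above, here is a sketch of a bounds-checked, zero-copy slice over a plain Rust slice. `check_bounds` and `slice_checked` are hypothetical names, and the body of the real `check_slice_bounds` is not shown in this diff. Since the visible call propagates its result with `?`, the sketch assumes out-of-range requests surface as errors rather than panics.

```rust
/// Hypothetical bounds check that reports problems as errors instead of panicking.
fn check_bounds(len: usize, start: usize, stop: usize) -> Result<(), String> {
    if start > len {
        return Err(format!("start {start} exceeds length {len}"));
    }
    if stop > len {
        return Err(format!("stop {stop} exceeds length {len}"));
    }
    if start > stop {
        return Err(format!("start {start} is greater than stop {stop}"));
    }
    Ok(())
}

/// Borrow `values[start..stop]`: `start` is inclusive, `stop` is exclusive,
/// and no elements are copied.
fn slice_checked<T>(values: &[T], start: usize, stop: usize) -> Result<&[T], String> {
    check_bounds(values.len(), start, stop)?;
    Ok(&values[start..stop])
}
```

With a four-element input, `slice_checked(&v, 1, 3)` borrows the second and third elements, `slice_checked(&v, 2, 2)` yields an empty slice, and `slice_checked(&v, 3, 5)` returns an error.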

2 changes: 1 addition & 1 deletion vortex-array/src/lib.rs
@@ -1,6 +1,6 @@
 //! Vortex crate containing core logic for encoding and memory representation of [arrays](Array).
 //!
-//! At the heart of Vortex are [arrays](Array) and [encodings](crate::encoding::EncodingCompression).
+//! At the heart of Vortex are [arrays](Array) and [encodings](crate::encoding::ArrayEncoding).
 //! Arrays are typed views of memory buffers that hold [scalars](vortex_scalar::Scalar). These
 //! buffers can be held in a number of physical encodings to perform lightweight compression that
 //! exploits the particular data distribution of the array's values.