Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add and/or compute functions #481

Merged
merged 21 commits into from
Jul 22, 2024
Merged

Conversation

AdamGS
Copy link
Contributor

@AdamGS AdamGS commented Jul 18, 2024

No description provided.

@AdamGS AdamGS marked this pull request as draft July 18, 2024 12:19
vortex-array/src/array/bool/mod.rs Outdated Show resolved Hide resolved
vortex-datafusion/src/eval.rs Outdated Show resolved Hide resolved
vortex-datafusion/src/eval.rs Outdated Show resolved Hide resolved
@AdamGS AdamGS force-pushed the adamg/shortcircuit-filtering-pushdown branch from b26d28e to acef812 Compare July 18, 2024 16:29
@AdamGS AdamGS requested a review from gatesn July 19, 2024 09:22
@AdamGS AdamGS changed the title Short-circuit some logical operations Add and/or compute functions Jul 19, 2024
@AdamGS AdamGS marked this pull request as ready for review July 19, 2024 10:12
@AdamGS AdamGS force-pushed the adamg/shortcircuit-filtering-pushdown branch from 202bf1a to c2727b7 Compare July 19, 2024 10:31
@AdamGS AdamGS force-pushed the adamg/shortcircuit-filtering-pushdown branch from c2727b7 to 6257749 Compare July 19, 2024 10:31
gatesn
gatesn previously requested changes Jul 19, 2024
vortex-array/Cargo.toml Outdated Show resolved Hide resolved
vortex-array/src/compute/boolean.rs Show resolved Hide resolved
vortex-array/src/compute/boolean.rs Outdated Show resolved Hide resolved
vortex-array/src/compute/mod.rs Outdated Show resolved Hide resolved
@AdamGS AdamGS requested a review from gatesn July 19, 2024 14:32
Cargo.toml Outdated Show resolved Hide resolved
vortex-array/src/array/constant/compute.rs Outdated Show resolved Hide resolved
vortex-array/src/array/constant/compute.rs Outdated Show resolved Hide resolved
let data = PrimitiveArray::from(mem::take(&mut self.data));
pub fn finish(mut self, dtype: DType) -> VarBinArray {
let offsets = PrimitiveArray::from(self.offsets);
let data = PrimitiveArray::from(Vec::from(self.data.freeze()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't zero copy is it? We can add a function to PrimitiveArray to construct directly from a buffer (for PType==u8?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, @robert3005 did we discuss VarBin data being a buffer instead of a child array?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we thought about it but then we wanted to have something like ZstdEncoding

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m starting to think general purpose compression can be configured on buffers at write-time though; using the layouts mechanism.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not zero-copy, but still pretty cheap IMO. Constructing things from Bytes is hard because there's no guarantee the instance is exclusive

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But PrimitiveArray wraps a vortex-buffer, which itself wraps Bytes. So this copy is purely because the right API isn't exposed / isn't used

Copy link
Contributor Author

@AdamGS AdamGS Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, fixing. Added a from_bytes function.

vortex-array/src/compute/boolean.rs Show resolved Hide resolved
&& constant_array.len() == other.len()
{
if let Ok(array) = ConstantArray::try_from(other.clone()) {
let lhs = constant_array.scalar().value().as_bool()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe first check if either scalar is_null, and then I think the more canonical conversion would be bool::try_from(array.scalar())?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should chat through the scalar API next week, it's a bit weird, you might have some ideas to improve


Ok(ConstantArray::new(scalar, constant_array.len()).into_array())
} else {
AndFn::and(&constant_array.clone().into_bool()?, other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either this fallback will fail (into_bool when it's not?), or into_bool does an expensive expansion of the constant.

I think the best thing is to have the convention that constant goes on the RHS, so we should fallback to and(other, constant_array) to allow the RHS encoding a chance to be efficient, e.g. if it's run-end it only needs to run over the values.

Copy link
Contributor Author

@AdamGS AdamGS Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't actually fallback into AndFn here because this might also be an or, I'm trying to think of a better way of generalizing boolean ops here (I do agree with trying to let rhs to give us a more efficient implementation) Found a solution I'm happy with here.

Copy link
Member

@robert3005 robert3005 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a couple of small comments

vortex-array/Cargo.toml Outdated Show resolved Hide resolved
Comment on lines +15 to +16
let rhs = array.clone().into_canonical()?.into_arrow();
let rhs = rhs.as_boolean();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this maybe dispatch on the right hand side instead of converting to arrow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely possible, but I think we might be missing some larger abstraction here to solve the whole left/right issue

vortex-array/src/array/constant/compute.rs Show resolved Hide resolved
vortex-array/src/array/constant/compute.rs Outdated Show resolved Hide resolved
@AdamGS AdamGS enabled auto-merge (squash) July 22, 2024 16:34
@AdamGS AdamGS disabled auto-merge July 22, 2024 16:34
@AdamGS AdamGS dismissed gatesn’s stale review July 22, 2024 16:35

Addressed most things

@AdamGS AdamGS merged commit 062c817 into develop Jul 22, 2024
2 checks passed
@AdamGS AdamGS deleted the adamg/shortcircuit-filtering-pushdown branch July 22, 2024 16:35
AdamGS added a commit that referenced this pull request Jul 22, 2024
Doesn't quite get us where we want performance-wise, but does seem much
better.
Follow up of #481.

---------

Co-authored-by: Robert Kruszewski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants