Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Eq, PartialEq, Hash for dyn PhysicalExpr #13005

Merged
merged 7 commits into from
Nov 5, 2024

Conversation

peter-toth
Copy link
Contributor

@peter-toth peter-toth commented Oct 18, 2024

Which issue does this PR close?

Part of #12599.

Rationale for this change

Common subexpression elimination will require Arc<dyn PhysicalExpr> tree nodes to implement Eq, PartialEq, Hash.

What changes are included in this PR?

This PR:

  • Refactors PhysicalExpr trait and moves dyn_hash() to a new DynHash trait.
  • Adds blanket implementation of DynHash for types that implement Hash. This enables us to drop most of the previous dyn_hash() boilerplate implementations.
  • Introduces DynEq trait with dyn_eq().
  • Adds blanket implementation of DynEq for types that implement Eq.

Please note that:

  • This PR contains an API breaking change that dyn_hash() is moved from PhysicalExpr to separate DynHash trait.
  • This PR contains manual implementation of PartialEq and Hash for some dyn PhysicalExpr types (e.g. BinaryExpr) where adding just #[derive(PartialEq, Hash)] would be straightforward at first glance.
    This is due to a Rust bug: Error on deriving PartialEq on Foo and then implementing it for dyn Foo rust-lang/rust#78808 that the derive macro would generate code with the following issue:
       error[E0507]: cannot move out of `other.left` which is behind a shared reference
      --> datafusion/physical-expr/src/expressions/binary.rs:53:5
       |
    51 | #[derive(Debug, Hash, Clone, Eq, PartialEq)]
       |                                  --------- in this derive macro expansion
    52 | pub struct BinaryExpr {
    53 |     left: Arc<dyn PhysicalExpr>,
       |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^ move occurs because `other.left` has type `Arc<dyn PhysicalExpr>`, which does not implement the `Copy` trait
    

Are these changes tested?

Yes, with existing tests.

Are there any user-facing changes?

No.

@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate proto Related to proto crate labels Oct 18, 2024
@peter-toth
Copy link
Contributor Author

cc @alamb

@alamb
Copy link
Contributor

alamb commented Oct 21, 2024

On my list for review tomorrow

@alamb alamb added the api change Changes the API exposed to users of the crate label Oct 24, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @peter-toth -- this PR is a pretty nice improvement in its own right (even without the improved CSE).

Let's see if we can avoid making BinaryExpr generic

@@ -525,20 +519,6 @@ impl PhysicalExpr for BinaryExpr {
}
}

impl PartialEq<dyn Any> for BinaryExpr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing this is great

pub struct BinaryExpr {
left: Arc<dyn PhysicalExpr>,
#[derive(Debug, Hash, Clone, Eq, PartialEq)]
pub struct BinaryExpr<DynPhysicalExpr: ?Sized = dyn PhysicalExpr> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of the boiler plate in this PR is great ❤️

I feel like this additional trait / workaround is non ideal as it makes understanding the BinaryExpr very unobvious (there is now a level of indirection that is irrelevant for most of users of BinaryExpr).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am playing around with it to see if I can avoid it somehow

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is one way that seems to work: peter-toth#5

More verbose, but I think it keeps the structs simpler to understand

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we don't need to keep the generic parameter workaround. I agree that those params look a bit weird.
peter-toth#5 looks good to me, if you extend it to other expressions I can merge it tomorrow and amend my PR description.

Copy link
Contributor

@alamb alamb Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github copilot and I banged out the code. However, I think I accidentally push the commits to your branch 😬 -- hopefully that is ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. 😁 I've just updated the PR description.

@@ -183,6 +151,39 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + PartialEq<dyn Any> {
}
}

pub trait DynEq {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add some documentation here explaining why this is needed and what it is used for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in f1917fa.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @peter-toth

I think this is an improvement in my mind, even outside the context of CSE improvements, as it permits PhysicalExpr to implement the standard PartialEq rather than a more special DynEq / any pattern

Perhaps @akurmustafa / @mustafasrepo may have some additonal thoughts as I think they added the original dyn hash work

cc @berkaysynnada

Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this PR brings significant improvements to the API, enhancing both code readability and usability. The clarity of the trait has increased, making it more intuitive to work with. LGTM, Thank you @peter-toth.

@@ -80,7 +79,7 @@ enum EvalMethod {
/// [WHEN ...]
/// [ELSE result]
/// END
#[derive(Debug, Hash)]
#[derive(Debug, Hash, PartialEq, Eq)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can it permit auto derivation of PartialEq for CaseExpr 🤔 -- we need manual impl's for others

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is probably the strangest part of the bug. If you have double wrappers around dyn Trait then there is no issue: rust-lang/rust#78808 (comment)
CaseExpr doesn't have any dyn Trait fields with a single wrapper so the derive macro has no problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create a ticket to remove those impl's once the bug is resolved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about #13196?

# Conflicts:
#	datafusion/physical-expr/src/expressions/not.rs
@alamb
Copy link
Contributor

alamb commented Oct 31, 2024

What I would like to propose for this PR is wait until we have made the next release: #12470 and then we can merge it after that

I am thinking of how we can spread out the API upgrade pain

@alamb
Copy link
Contributor

alamb commented Nov 5, 2024

The 43.0.0 release candidate has been made and we have started voting on it. I merged this branch up from main locally to ensure everything still compiles. It looks good so this is ready to go 🏁

THanks again @peter-toth and @berkaysynnada

@alamb alamb merged commit cc43766 into apache:main Nov 5, 2024
24 checks passed
@peter-toth
Copy link
Contributor Author

Thanks for review @alamb and @berkaysynnada.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate physical-expr Physical Expressions proto Related to proto crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants