
Add Common Subexpression Elimination for PhysicalExpr trees #13046

Open: wants to merge 9 commits into base: main

peter-toth (Contributor) commented Oct 21, 2024

Please note that this PR is WIP and also contains the changes of #13005. Once that PR gets merged, only the main commit of this PR will remain.

Which issue does this PR close?

Part of #12599.

Rationale for this change

As described in #12599, there is already a CSE rule for logical plans, but some projects create physical plans directly and could benefit from physical CSE.

What changes are included in this PR?

This PR:

  • Adds the EliminateCommonPhysicalSubexprs rule to eliminate common subtrees in Arc<dyn PhysicalExpr> trees. This initial implementation targets ProjectionExec nodes only; follow-up PRs can add support for other nodes such as aggregates and filters.
  • Adds the DynHashNode trait and implements it for PhysicalExprs.
  • Includes some code cleanup in the CommonSubexprEliminate rule.
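The general technique behind a CSE rule like this is two traversals: the first counts how often each subtree occurs, and the second walks top-down and extracts the largest subtrees seen more than once, without descending into them. The following is a minimal sketch of that technique, not the PR's code. It assumes a simplified, hypothetical `Expr` enum that derives `Hash`/`Eq`; real `Arc<dyn PhysicalExpr>` trees cannot use `Hash` directly because `Hash` is not object-safe, which is the gap a trait like `DynHashNode` is meant to fill.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for a physical expression tree.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Leaves are never worth extracting: they are already cheap.
fn is_leaf(expr: &Expr) -> bool {
    matches!(expr, Expr::Column(_) | Expr::Literal(_))
}

/// First traversal: count occurrences of every subtree.
/// (Cloning whole subtrees as map keys is quadratic; fine for a sketch.)
fn count_subtrees(expr: &Expr, counts: &mut HashMap<Expr, usize>) {
    *counts.entry(expr.clone()).or_insert(0) += 1;
    if let Expr::Add(l, r) | Expr::Mul(l, r) = expr {
        count_subtrees(l, counts);
        count_subtrees(r, counts);
    }
}

/// Second traversal (top-down): the first non-leaf subtree seen more than
/// once is recorded and its children are skipped, so only the largest
/// common subtrees are extracted.
fn collect_common(expr: &Expr, counts: &HashMap<Expr, usize>, out: &mut Vec<Expr>) {
    if !is_leaf(expr) && counts[expr] > 1 {
        if !out.contains(expr) {
            out.push(expr.clone());
        }
        return;
    }
    if let Expr::Add(l, r) | Expr::Mul(l, r) = expr {
        collect_common(l, counts, out);
        collect_common(r, counts, out);
    }
}

/// Returns the common subtrees of a projection's expression list,
/// in the order they are first encountered.
fn common_subtrees(exprs: &[Expr]) -> Vec<Expr> {
    let mut counts = HashMap::new();
    for e in exprs {
        count_subtrees(e, &mut counts);
    }
    let mut out = Vec::new();
    for e in exprs {
        collect_common(e, &counts, &mut out);
    }
    out
}
```

For the projection `[(a + b) * 2, (a + b) * 2 + c]`, only `(a + b) * 2` is returned: `a + b` also occurs twice, but always inside the larger common subtree, so extracting it separately would not save any work.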

Are these changes tested?

Yes, new unit tests were added.

Are there any user-facing changes?

No.

… shorter names for eliminator and controller, change `CSE::extract_common_nodes()` to return `Result<FoundCommonNodes<N>>` (instead of `Result<Transformed<FoundCommonNodes<N>>>`)
# Conflicts:
#	datafusion-cli/Cargo.lock
github-actions bot added the logical-expr (Logical plan and expressions), physical-expr (Physical Expressions), optimizer (Optimizer rules), core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), common (Related to common crate) and proto (Related to proto crate) labels Oct 21, 2024
peter-toth (Contributor, Author) commented:

cc @alamb, @andygrove

# Conflicts:
#	datafusion/common/src/cse.rs
#	datafusion/optimizer/src/common_subexpr_eliminate.rs
github-actions bot removed the logical-expr (Logical plan and expressions) label Oct 21, 2024
andygrove changed the title from "Add Common Subexpression Elimination for PsysicalExpr trees" to "Add Common Subexpression Elimination for PhysicalExpr trees" Oct 22, 2024
andygrove (Member) commented:

Thanks @peter-toth. I plan on testing this out with Comet.

// The EliminateCommonPhysicalSubexprs rule extracts common physical
// subexpression trees into a `ProjectionExec` node under the actual node to
// calculate the common values only once.
Arc::new(EliminateCommonPhysicalSubexprs::new()),
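The comment above describes the resulting plan shape: an intermediate projection under the original node computes each shared subexpression once, and the original projection reads it back as a column. The sketch below illustrates that effect on a single row. It is not DataFusion's `ProjectionExec`; it assumes a toy, hypothetical `Expr` type with index-based columns and an `Expensive` node standing in for a costly function `f`, and it assumes the intermediate projection passes all input columns through.

```rust
use std::cell::Cell;

/// Toy, hypothetical expression type; columns are referenced by index.
#[derive(Debug, Clone)]
enum Expr {
    Column(usize),
    Add(Box<Expr>, Box<Expr>),
    /// Stands in for a costly function `f`: doubles its input and counts calls.
    Expensive(Box<Expr>),
}

fn eval(expr: &Expr, row: &[i64], calls: &Cell<usize>) -> i64 {
    match expr {
        Expr::Column(i) => row[*i],
        Expr::Add(l, r) => eval(l, row, calls) + eval(r, row, calls),
        Expr::Expensive(e) => {
            calls.set(calls.get() + 1);
            eval(e, row, calls) * 2
        }
    }
}

/// Evaluates one projection (a list of expressions) over one row.
fn project(exprs: &[Expr], row: &[i64], calls: &Cell<usize>) -> Vec<i64> {
    exprs.iter().map(|e| eval(e, row, calls)).collect()
}

fn bx(e: Expr) -> Box<Expr> {
    Box::new(e)
}

/// Projects `[f(a + b), f(a + b) + c]` over a row `[a, b, c]`, first directly
/// and then via an intermediate projection that computes `f(a + b)` once.
/// Returns (output, number of `f` calls) for each plan.
fn compare(row: &[i64]) -> ((Vec<i64>, usize), (Vec<i64>, usize)) {
    let common = Expr::Expensive(bx(Expr::Add(bx(Expr::Column(0)), bx(Expr::Column(1)))));

    // Without CSE: `f(a + b)` is evaluated once per occurrence.
    let naive = vec![common.clone(), Expr::Add(bx(common.clone()), bx(Expr::Column(2)))];
    let calls = Cell::new(0);
    let naive_out = project(&naive, row, &calls);
    let naive_calls = calls.get();

    // With CSE: the lower projection passes a, b, c through and appends
    // `f(a + b)` as column 3; the upper projection only reads column 3.
    let lower = vec![Expr::Column(0), Expr::Column(1), Expr::Column(2), common];
    let upper = vec![Expr::Column(3), Expr::Add(bx(Expr::Column(3)), bx(Expr::Column(2)))];
    let calls = Cell::new(0);
    let intermediate = project(&lower, row, &calls);
    let cse_out = project(&upper, &intermediate, &calls);

    ((naive_out, naive_calls), (cse_out, calls.get()))
}
```

For the row `[1, 2, 3]` both plans produce `[6, 9]`, but the direct plan calls `f` twice while the two-level plan calls it once. The trade-off is an extra projection step, which only pays off when the shared subexpression is more expensive than copying a column.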
Contributor commented:

Do you have any examples where this rule performs eliminations on a plan that has already been optimized by the corresponding logical rule?

If not, it's not clear to me why it should be part of the default/recommended list of rules.

Contributor commented:

I think the use case is for systems (like Comet) that don't use the LogicalPlanner.

That being said, I think the question of "should it be run by default" is a good one, and I agree with your assertion: if we can't find an example where this helps a plan, we should probably not enable it by default.

FYI @andygrove

peter-toth (Contributor, Author) commented Oct 23, 2024:

Yes, this rule is useful only for Comet and similar use cases. Let me remove it from the default optimizer: d2529ce

github-actions bot removed the sqllogictest (SQL Logic Tests (.slt)) label Oct 23, 2024
# Conflicts:
#	datafusion/common/src/cse.rs
type CommonNodes<'n, N> = IndexMap<Identifier<'n, N>, (N, String)>;
/// A list that contains the common [`TreeNode`]s and their alias, extracted during the
/// second, rewriting traversal.
type CommonNodes<'n, N> = Vec<(N, String)>;
peter-toth (Contributor, Author) commented:

The reason for changing the type of CommonNodes from IndexMap to Vec is that physical columns work with indexes rather than names. E.g. when we replace a common subexpression with a column during rewrite, we need both the name and the index of the common subexpression in the intermediate ProjectionExec node. Storing the index in NodeStats and using a Vec for CommonNodes fits this use case better.
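A minimal sketch of that bookkeeping, under stated assumptions: `Column`, `NodeStats`, the `&str` identifiers and the `__common_expr_N` alias format are all hypothetical simplifications, not the PR's actual types. The position of a common node in the `Vec` doubles as its position among the intermediate projection's appended outputs, and the per-node stats remember that position so later occurrences reuse it.

```rust
use std::collections::HashMap;

/// Hypothetical physical column: resolved by `index`, `name` kept for display.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
    index: usize,
}

/// Hypothetical per-node stats from the first traversal. `common_idx` is
/// filled the first time the node is extracted during the rewrite.
struct NodeStats {
    count: usize,
    common_idx: Option<usize>,
}

/// Same shape as the new `CommonNodes`: (node, alias) in extraction order.
type CommonNodes<N> = Vec<(N, String)>;

/// Returns the column that replaces `node` (identified by `id`) in the upper
/// projection, or `None` if the node is unknown or occurs only once. The
/// intermediate projection is assumed to output `input_len` pass-through
/// columns followed by the common nodes in `common` order.
fn replace_with_column<N: Clone>(
    id: &str,
    node: &N,
    stats: &mut HashMap<String, NodeStats>,
    common: &mut CommonNodes<N>,
    input_len: usize,
) -> Option<Column> {
    let s = stats.get_mut(id)?;
    if s.count < 2 {
        return None;
    }
    // First occurrence: append to the Vec and remember the position.
    // Later occurrences: reuse the stored position, no search needed.
    let idx = *s.common_idx.get_or_insert_with(|| {
        let alias = format!("__common_expr_{}", common.len() + 1);
        common.push((node.clone(), alias));
        common.len() - 1
    });
    Some(Column {
        name: common[idx].1.clone(),
        index: input_len + idx,
    })
}
```

With three input columns, the first common node becomes column `__common_expr_1` at index 3, and every further occurrence of the same node maps to that same column.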

Labels: common (Related to common crate), core (Core DataFusion crate), optimizer (Optimizer rules), physical-expr (Physical Expressions), proto (Related to proto crate)
Projects: None yet

4 participants