-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support unparsing plans after applying optimize_projections
rule
#13267
base: main
Are you sure you want to change the base?
Conversation
90b33e1
to
3b3f58e
Compare
|
||
/// When set to true, the `optimize_projections` rule will not attempt to move, add, or remove existing projections. | ||
/// This flag helps maintain the original structure of the `LogicalPlan` when converting it back into SQL via the `unparser` module. It ensures the query layout remains simple and readable, relying on the underlying SQL engine to apply its own optimizations during execution. | ||
pub optimize_projections_preserve_existing_projections: bool, default = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might it be better to make this flag more generic, for example, just preserve_existing_projections
or prefer_existing_plan_nodes
, so it can be reused in the future in similar cases
// Avoid creating a duplicate Projection node, which would result in an additional subquery if a projection already exists. | ||
// For example, if the `optimize_projection` rule is applied, there will be a Projection node, and duplicate projection | ||
// information included in the TableScan node. | ||
if !already_projected { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This prevents from generating queries like SELECT a, b from (SELECT a, b from my_table)
.
@@ -882,6 +882,7 @@ fn test_table_scan_pushdown() -> Result<()> { | |||
let query_from_table_scan_with_projection = LogicalPlanBuilder::from( | |||
table_scan(Some("t1"), &schema, Some(vec![0, 1]))?.build()?, | |||
) | |||
.project(vec![col("id"), col("age")])? | |||
.project(vec![wildcard()])? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this actually be a real plan with a wildcard projection and a TableScan that includes only two columns? I would expect them to match. If this is a real use case I will improve logic above (check for parent projection is a wildcard or does not match). Running all TPC-H and TPC-DS queries I've not found query where it was the case.
https://github.com/apache/datafusion/pull/13267/files#r1830100022
3b3f58e
to
32da325
Compare
3ae32bf
to
808b99f
Compare
808b99f
to
37b5245
Compare
Which issue does this PR close?
The
optimize_projections
optimization is very useful as it pushes down projections to theTableScan
and ensures only required columns are fetched. This is useful when used alongside unparsing scenarios for plans involving multiple data sources, as the plan must be optimized to push down projections and fetch only the required columns. The downside of this process is that the rule modifies the original plan in a way that makes it difficult to unparse, and the resultant plan is not always optimal or efficient for unparsing use cases, for examplehttps://gist.github.com/sgrebnov/5071d2834e812b62bfdf434cf7e7e54c
Original query (TPC-DS Q72)
Plan and query after applying
optimize_projections
rule. Notice the additional projections added after joins.Rationale for this change
To support unparsing plans after
optimize_projections
is applied, it is proposed to add theoptimize_projections_preserve_existing_projections
configuration option to prevent the optimization logic from creating or removing projections and to preserve the original structure. It ensures the query layout remains simple and readable, relying on the underlying SQL engine to apply its own optimizations during execution.Are these changes tested?
Added test for
optimize_projections_preserve_existing_projections
configuration option. Unparsing have been tested by running all TPC-H and TPC-DS queries withoptimization_projections
enabled.Are there any user-facing changes?
Yes, a new
optimize_projections_preserve_existing_projections
configuration option has been introduced, which can be specified viaSessionConfig
or at a lower level usingOptimizerContext::new_with_options
.Or
There are no changes in default behavior.