-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: optimise outer joins #15840
feat: optimise outer joins #15840
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15840 +/- ##
==========================================
+ Coverage 68.24% 68.25% +0.01%
==========================================
Files 1562 1562
Lines 197171 197300 +129
==========================================
+ Hits 134550 134669 +119
- Misses 62621 62631 +10 ☔ View full report in Codecov by Sentry. |
func IsOuter(outer JoinOp) bool { | ||
return !outer.IsInner() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to define this on the JoinOp
type? I'm not really sure when we prefer that over a standalone function, so I'm asking mostly out of curiosity. 🙇♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did consider that, but that forces all implementations to have to have an implementation of this method, and that seems silly.
If golang had some way of defining method implementations on interfaces, I'd use that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be an issue with TestAliasesInOuterJoinQueries
when running an outer join where the column names returned are different than what MySQL is sending back. Instead of "t0", "t1", "col"
we are getting "t0", "t0", "col"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no end to end test for this change.
I see MySQL syntax error for
select * from t limit 1 + 2;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '+ 2' at line 1
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
6628e84
to
7c2b2ee
Compare
Signed-off-by: Andres Taylor <[email protected]>
7c2b2ee
to
d8b5aba
Compare
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Andres Taylor <[email protected]>
if len(op.Source.GetColumns(ctx)) == len(cols) && offsetInInputOrder(cols) { | ||
cols = nil | ||
} | ||
return newSimpleProjection(cols, colNames, src) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can still do better here by changing a little more on simple projection like checking len(op.Source.GetColumns(ctx)) >= len(cols)
. And Simple project truncates from the end.
can leave a TODO here if you think it make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the columns need to be of the expected size. we should not return more columns than what the user asked for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we can do again minimal work of truncate. not important for this PR
Description
This PR introduces enhancements to the query planner, enabling the push down of
LIMIT
clauses to the right-hand side (RHS) of joins, including outer joins, when the expressions permit. Additionally, logic has been added to push downLIMIT
under routes when applicable.These changes aim to optimize query efficiency and resource utilization, ensuring faster query processing and lower overhead in the presence of outer joins.
For inner joins, we can't push down
LIMIT
on the left-hand side (LHS) of our nested loop joins (ApplyJoin). For a join to produce a row, we need a match on both sides, and we don't know how many rows we need from the LHS for the join to produce sufficient rows to satisfy the user-setLIMIT
. The best we can do is push aLIMIT
on the RHS and then alsoLIMIT
on top of the join.On the other hand, for left outer joins, we know that every row from the LHS will be returned from the join. This means we can push down the
LIMIT
to the LHS of the join. This is really good and important - we want to run the RHS of the join as few times as possible.Example
The first, unoptimized version of the query with the
LIMIT
looks like this:Now we rewrite by doing iterative tree rewrites until we reach a fixed point. It will look something like this:
Here are the reasons for the different
LIMIT
operations:LIMIT
. It ensures that the final results do not contain more than 10 rows.LIMIT
on top of the LHS of the JOIN.LIMIT
under the route. 🎉LIMIT
on top of the Route to minimize the number of rows processed by the join.Related Issue(s)
Fixes #15828
Checklist