Commit
[SPARK-49646][SQL] fix subquery decorrelation for union/set operations when parentOuterReferences has references not covered in collectedChildOuterReferences

### What changes were proposed in this pull request?

Fix a bug triggered when a union/setOp appears under a limit/aggregation in a lateral join, with filter predicates that cannot be pulled up directly. For example:

```
create table IF NOT EXISTS t(t1 INT,t2 int) using json;
CREATE TABLE IF NOT EXISTS a (a1 INT) using json;

select 1
from t as t_outer
left join lateral(
  select b1,b2
  from (
    select a.a1 as b1, 1 as b2 from a
    union
    select t_outer.t1 as b1, null as b2
  ) as t_inner
  where (t_inner.b1 < t_outer.t2 or t_inner.b1 is null)
    and t_inner.b1 = t_outer.t1
  order by t_inner.b1,t_inner.b2 desc
  limit 1
) as lateral_table
```

### Why are the changes needed?

In general, Spark cannot handle this query because:

1. The decorrelation logic tries to rewrite the limit operator into a Window aggregation and pull up the correlated predicates, and the Union operator is rewritten to have a DomainJoin with outer references within its children.
2. When rewriting a DomainJoin into a real join, Spark needs an attribute reference map, built from the pulled-up correlated predicates, to rewrite the outer references in the DomainJoin. However, each child of a Union/SetOp operator uses different attribute references, even when they refer to the same column of the outer table. The Union/SetOp output and its children's output are needed to map between these references.
3. Combined with aggregation and filters with inequality comparisons, more outer references remain within the children of the Union operator. These references are not covered by the Union/SetOp output, so information is missing when mapping between the different attribute references within the children of the Union/SetOp operator.

For more context, please read this short investigation doc (I've changed the link and it's now public): https://docs.google.com/document/d/1_pJIi_8GuLHOXabLEgRy2e7OHw-OIBnWbwGwSkwIcxg/edit?usp=sharing

### Does this PR introduce _any_ user-facing change?

Yes, the bug is fixed and the above query can be handled without error.

### How was this patch tested?

Added a unit test.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #48109 from averyqi-db/averyqi-db/SPARK-49646.

Authored-by: Avery Qi <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
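
### Illustration (toy model, not Spark code)

Points 2 and 3 of the explanation above can be sketched with a small Python model. This is only an illustration under stated assumptions: `Attr`, `fresh`, `positional_map`, and `uncovered` are hypothetical names, not Spark APIs, and the model assumes the Union output reuses the first child's attributes. It shows that a positional map can translate attributes exposed through the Union output into each child's attributes, while an outer reference that never appears in that output (here `t_outer.t2`) has nothing to map through.

```python
from dataclasses import dataclass
from itertools import count

_next_id = count(1)  # toy stand-in for unique expression ids


@dataclass(frozen=True)
class Attr:
    """Hypothetical attribute reference: a name plus a unique id."""
    name: str
    expr_id: int


def fresh(name: str) -> Attr:
    """Create an attribute with a new id, even if the name repeats."""
    return Attr(name, next(_next_id))


def positional_map(union_output, child_output):
    """Map each Union output attribute to the child attribute at the same position."""
    return dict(zip(union_output, child_output))


def uncovered(parent_outer_refs, collected_child_outer_refs):
    """Outer references the parent needs that the children did not expose."""
    return [r for r in parent_outer_refs if r not in collected_child_outer_refs]


# Outer table columns referenced from inside the lateral subquery.
t1, t2 = fresh("t_outer.t1"), fresh("t_outer.t2")

# Each Union child carries its own copy of the domain attribute for t1.
child1 = [fresh("b1"), fresh("b2"), fresh("t1_domain")]
child2 = [fresh("b1"), fresh("b2"), fresh("t1_domain")]
union_output = child1  # toy assumption: output reuses the first child's attributes

# The same outer column has a different attribute in each child,
# so the Union output is needed to translate between them.
to_child2 = positional_map(union_output, child2)

# Only t1 is exposed through the Union; the filter above it also needs t2.
collected_child_outer_refs = [t1]
parent_outer_refs = [t1, t2]
missing = uncovered(parent_outer_refs, collected_child_outer_refs)
print([a.name for a in missing])  # ['t_outer.t2']
```

In this model, `missing` corresponds to the situation named in the title: references in `parentOuterReferences` that are not covered in `collectedChildOuterReferences`.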