[SPARK-49618][SQL]: Union & UnionExec nodes equality not take into account unaligned positions of branches causing cache miss and non reuse of exchange #48094

Open
wants to merge 7 commits into
base: master

ahshahid

What changes were proposed in this pull request?

A trait, UnionEquality, is introduced and implemented by the Union and UnionExec nodes. It checks equality of the Union node's legs in an order-agnostic manner and computes a hashCode that is independent of the order of the legs. The equality check also requires the output attributes of the head nodes to match in name, data type, metadata, nullability, etc. (but not exprIDs).
Converting the sequence of legs into a Set to get an order-agnostic hashCode does have a side effect: for example, Seq(leg1, leg2) and Seq(leg1, leg2, leg2) get the same hashCode once converted to a Set. That should not cause a logical problem, because equality also checks the number of legs.
If we want to avoid the hash collision in that situation, the code can be changed to:
Objects.hashCode(Seq(leg1, leg2).map(_.hashCode).sorted: _*)
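
To make the trade-off concrete, here is a minimal, self-contained sketch of order-agnostic equality and hashing. It is not the code in this PR: `Leg` and `ToyUnion` are hypothetical stand-ins for plan nodes, the multiset comparison in `equals` is an assumption about one reasonable way to compare legs, and the hash uses the sorted-leg-hash idea from above via a plain `Seq[Int].hashCode()`.

```scala
// Hypothetical stand-in for a union leg; NOT Spark's LogicalPlan or SparkPlan.
final case class Leg(name: String)

// Hypothetical toy node whose equality and hashCode ignore the order of legs.
final class ToyUnion(val legs: Seq[Leg]) {
  // Sorting the per-leg hashes removes the dependence on leg order,
  // while duplicate legs still contribute to the result.
  private def sortedLegHashes: Seq[Int] = legs.map(_.hashCode).sorted

  // How many times each distinct leg occurs (a multiset view of the legs).
  private def legCounts: Map[Leg, Int] =
    legs.groupBy(identity).map { case (leg, group) => leg -> group.size }

  override def hashCode(): Int = sortedLegHashes.hashCode()

  override def equals(other: Any): Boolean = other match {
    case that: ToyUnion =>
      // Same number of legs, and each leg occurs the same number of times.
      legs.length == that.legs.length && legCounts == that.legCounts
    case _ => false
  }
}

object UnionEqualitySketch {
  def main(args: Array[String]): Unit = {
    val leg1 = Leg("leg1")
    val leg2 = Leg("leg2")

    // Reordered legs compare equal and hash identically.
    println(new ToyUnion(Seq(leg1, leg2)) == new ToyUnion(Seq(leg2, leg1))) // true

    // A Set-based hash cannot tell these two apart ...
    println(Seq(leg1, leg2).toSet.hashCode == Seq(leg1, leg2, leg2).toSet.hashCode) // true

    // ... but equality still separates them because the leg counts differ.
    println(new ToyUnion(Seq(leg1, leg2)) == new ToyUnion(Seq(leg1, leg2, leg2))) // false
  }
}
```

In this sketch the Set-based collision is harmless for correctness, since `equals` rejects the pair; the sorted-hash variant only reduces the chance that such unequal nodes land in the same hash bucket.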

Why are the changes needed?

With the current equality of Union nodes, changing the order of the legs causes a cache miss and prevents reuse of exchange, because the canonicalized plans do not match.
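
The effect can be illustrated with a toy lookup table, without any Spark classes. In the sketch below everything is hypothetical: a "plan" is modelled only as the names of its legs, `orderSensitiveKey` mimics position-by-position comparison, and `orderAgnosticKey` normalizes by sorting. Real cache lookup compares canonicalized plans, which this does not attempt to reproduce.

```scala
object CacheMissSketch {
  // Order-sensitive key: legs are compared position by position.
  def orderSensitiveKey(legs: Seq[String]): Seq[String] = legs

  // Order-agnostic key: a toy normalization that sorts the legs first.
  def orderAgnosticKey(legs: Seq[String]): Seq[String] = legs.sorted

  def main(args: Array[String]): Unit = {
    val cachedUnion = Seq("scanA", "scanB")    // legs of the plan that was cached
    val reorderedUnion = Seq("scanB", "scanA") // same legs, different order

    // Keyed by leg order: the reordered union misses the cache.
    val bySeq = Map(orderSensitiveKey(cachedUnion) -> "cached data")
    println(bySeq.contains(orderSensitiveKey(reorderedUnion))) // false

    // Keyed by normalized legs: the reordered union hits the cache.
    val byNormalized = Map(orderAgnosticKey(cachedUnion) -> "cached data")
    println(byNormalized.contains(orderAgnosticKey(reorderedUnion))) // true
  }
}
```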

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added tests to check the equality of Union and UnionExec nodes when the legs appear in a different order.
Added a test to verify cache lookup of InMemoryRelation and reuse of exchange.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Sep 12, 2024