[SPARK-51738][SQL][FOLLOWUP] Fix HashJoin to accept structurally-equal types
### What changes were proposed in this pull request?
This is a follow-up of #50537.
Fixes `HashJoin` to accept structurally-equal types.
### Why are the changes needed?
#50537 relaxed the type requirement for binary comparison, and `HashJoin` should be relaxed in the same way; otherwise, it can fail with `IllegalArgumentException`.
For example, in `SubquerySuite`:
```scala
sql("""
|SELECT foo IN (SELECT struct(c, d) FROM r)
|FROM (SELECT struct(a, b) foo FROM l)
|""".stripMargin).show()
```
fails with:
```
[info] java.lang.IllegalArgumentException: requirement failed: Join keys from two sides should have same length and types
[info] at scala.Predef$.require(Predef.scala:337)
[info] at org.apache.spark.sql.execution.joins.HashJoin.org$apache$spark$sql$execution$joins$HashJoin$$x$6(HashJoin.scala:115)
[info] at org.apache.spark.sql.execution.joins.HashJoin.org$apache$spark$sql$execution$joins$HashJoin$$x$6$(HashJoin.scala:110)
[info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$6$lzycompute(BroadcastHashJoinExec.scala:40)
[info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$6(BroadcastHashJoinExec.scala:40)
[info] at org.apache.spark.sql.execution.joins.HashJoin.buildKeys(HashJoin.scala:110)
[info] at org.apache.spark.sql.execution.joins.HashJoin.buildKeys$(HashJoin.scala:110)
[info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys$lzycompute(BroadcastHashJoinExec.scala:40)
[info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys(BroadcastHashJoinExec.scala:40)
[info] at org.apache.spark.sql.execution.joins.HashJoin.buildBoundKeys(HashJoin.scala:130)
[info] at org.apache.spark.sql.execution.joins.HashJoin.buildBoundKeys$(HashJoin.scala:129)
[info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildBoundKeys$lzycompute(BroadcastHashJoinExec.scala:40)
[info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildBoundKeys(BroadcastHashJoinExec.scala:40)
[info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.requiredChildDistribution(BroadcastHashJoinExec.scala:63)
...
```
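To illustrate why the requirement in the failure above is too strict, here is a minimal, self-contained sketch. It uses a toy type model, not Spark's `DataType` classes, and every name in it (`DType`, `Field`, `StructT`, `sameType`, `equalsStructurally`, `keysCompatible`) is hypothetical. It assumes that "structurally equal" means the same shape and leaf types with field names ignored, and that the columns `a`, `b`, `c`, `d` are all integers, which the example query does not state.

```scala
// Toy model of data types. These are NOT Spark's classes.
sealed trait DType
case object IntT extends DType
case object StrT extends DType
final case class Field(name: String, dtype: DType)
final case class StructT(fields: List[Field]) extends DType

object StructuralEquality {
  // Strict check: struct field names must match as well as field types.
  def sameType(a: DType, b: DType): Boolean = a == b

  // Structural check: same shape and leaf types, field names ignored.
  def equalsStructurally(a: DType, b: DType): Boolean = (a, b) match {
    case (StructT(fa), StructT(fb)) =>
      fa.length == fb.length &&
        fa.zip(fb).forall { case (x, y) => equalsStructurally(x.dtype, y.dtype) }
    case _ => a == b
  }

  // Same shape as the failing requirement: equal key counts and
  // pairwise-compatible key types under the supplied comparison.
  def keysCompatible(
      left: Seq[DType],
      right: Seq[DType],
      same: (DType, DType) => Boolean): Boolean =
    left.length == right.length &&
      left.zip(right).forall { case (l, r) => same(l, r) }

  // Assumed key types for `struct(a, b)` and `struct(c, d)` in the query above.
  val foo: DType = StructT(List(Field("a", IntT), Field("b", IntT)))
  val sub: DType = StructT(List(Field("c", IntT), Field("d", IntT)))
}
```

In this model, `keysCompatible(Seq(foo), Seq(sub), sameType)` rejects the key pair because the field names differ, while `keysCompatible(Seq(foo), Seq(sub), equalsStructurally)` accepts it. The real change applies the relaxed comparison inside `HashJoin`; the exact Spark helper it calls is not shown here.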
### Does this PR introduce _any_ user-facing change?
Yes, `HashJoin` now accepts join keys with structurally-equal types instead of failing with `IllegalArgumentException`.
### How was this patch tested?
Added the related test.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #50549 from ueshin/issues/SPARK-51738/hashjoin.
Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>