Skip to content

Commit a9987a3

Browse files
ueshincloud-fan
authored andcommitted
[SPARK-51738][SQL][FOLLOWUP] Fix HashJoin to accept structurally-equal types
### What changes were proposed in this pull request? This is a follow-up of #50537. Fixes `HashJoin` to accept structurally-equal types. ### Why are the changes needed? #50537 relaxed the requirement for binary comparison, so should `HashJoin`; otherwise, it can fail with `IllegalArgumentException`. For example, in `SubquerySuite`: ```scala sql(""" |SELECT foo IN (SELECT struct(c, d) FROM r) |FROM (SELECT struct(a, b) foo FROM l) |""".stripMargin).show() ``` fails with: ``` [info] java.lang.IllegalArgumentException: requirement failed: Join keys from two sides should have same length and types [info] at scala.Predef$.require(Predef.scala:337) [info] at org.apache.spark.sql.execution.joins.HashJoin.org$apache$spark$sql$execution$joins$HashJoin$$x$6(HashJoin.scala:115) [info] at org.apache.spark.sql.execution.joins.HashJoin.org$apache$spark$sql$execution$joins$HashJoin$$x$6$(HashJoin.scala:110) [info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$6$lzycompute(BroadcastHashJoinExec.scala:40) [info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$6(BroadcastHashJoinExec.scala:40) [info] at org.apache.spark.sql.execution.joins.HashJoin.buildKeys(HashJoin.scala:110) [info] at org.apache.spark.sql.execution.joins.HashJoin.buildKeys$(HashJoin.scala:110) [info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys$lzycompute(BroadcastHashJoinExec.scala:40) [info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys(BroadcastHashJoinExec.scala:40) [info] at org.apache.spark.sql.execution.joins.HashJoin.buildBoundKeys(HashJoin.scala:130) [info] at org.apache.spark.sql.execution.joins.HashJoin.buildBoundKeys$(HashJoin.scala:129) [info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildBoundKeys$lzycompute(BroadcastHashJoinExec.scala:40) [info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildBoundKeys(BroadcastHashJoinExec.scala:40) [info] at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.requiredChildDistribution(BroadcastHashJoinExec.scala:63) ... ``` ### Does this PR introduce _any_ user-facing change? Yes, `HashJoin` will work. ### How was this patch tested? Added the related test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #50549 from ueshin/issues/SPARK-51738/hashjoin. Authored-by: Takuya Ueshin <ueshin@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent c8cdeec commit a9987a3

File tree

2 files changed

+11
-3
lines changed

2 files changed

+11
-3
lines changed

sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -25,11 +25,10 @@ import org.apache.spark.sql.catalyst.expressions.codegen._
2525
import org.apache.spark.sql.catalyst.optimizer.{BuildLeft, BuildRight, BuildSide}
2626
import org.apache.spark.sql.catalyst.plans._
2727
import org.apache.spark.sql.catalyst.plans.physical.Partitioning
28-
import org.apache.spark.sql.catalyst.types.DataTypeUtils
2928
import org.apache.spark.sql.errors.QueryExecutionErrors
3029
import org.apache.spark.sql.execution.{CodegenSupport, ExplainUtils, RowIterator}
3130
import org.apache.spark.sql.execution.metric.SQLMetric
32-
import org.apache.spark.sql.types.{BooleanType, IntegralType, LongType}
31+
import org.apache.spark.sql.types.{BooleanType, DataType, IntegralType, LongType}
3332

3433
/**
3534
* @param relationTerm variable name for HashedRelation
@@ -111,7 +110,7 @@ trait HashJoin extends JoinCodegenSupport {
111110
require(leftKeys.length == rightKeys.length &&
112111
leftKeys.map(_.dataType)
113112
.zip(rightKeys.map(_.dataType))
114-
.forall(types => DataTypeUtils.sameType(types._1, types._2)),
113+
.forall(types => DataType.equalsStructurally(types._1, types._2, ignoreNullability = true)),
115114
"Join keys from two sides should have same length and types")
116115
buildSide match {
117116
case BuildLeft => (leftKeys, rightKeys)

sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2836,5 +2836,14 @@ class SubquerySuite extends QueryTest
28362836
sql("SELECT foo IN (SELECT struct(1 a)) FROM (SELECT struct(1 b) foo)"),
28372837
Row(true)
28382838
)
2839+
2840+
checkAnswer(
2841+
sql("""
2842+
|SELECT foo IN (SELECT struct(c, d) FROM r)
2843+
|FROM (SELECT struct(a, b) foo FROM l)
2844+
|""".stripMargin),
2845+
Row(false) :: Row(false) :: Row(false) :: Row(false) :: Row(false)
2846+
:: Row(true) :: Row(true) :: Row(true) :: Nil
2847+
)
28392848
}
28402849
}

0 commit comments

Comments
 (0)