Commit c8cdeec

Pajaraja and pavle-martinovic_data authored and committed
[SPARK-51751][SQL] Fix multiple rCTEs for one WITH statement that reference each other
### What changes were proposed in this pull request?

Add a check to `UnionLoopExec` that the `UnionLoopRef` in its subtree refers to the correct query. This matters because if the recursion calls another rCTE, that rCTE also has a `UnionLoopRef`, which must not be replaced with the outer loop's reference.

### Why are the changes needed?

Multiple rCTEs within one WITH statement ended in an infinite recursion.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New golden file test added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50544 from Pajaraja/pavle-martinovic_data/multiplerctesfix.

Lead-authored-by: Pavle Martinovic <34302662+Pajaraja@users.noreply.github.com>
Co-authored-by: pavle-martinovic_data <pavle.martinovic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

1 parent c7ef21a commit c8cdeec

4 files changed, +79 -2 lines changed

sql/core/src/main/scala/org/apache/spark/sql/execution/UnionLoopExec.scala

Lines changed: 1 addition & 1 deletion
@@ -188,7 +188,7 @@ case class UnionLoopExec(
         // This way we support only UNION ALL case. Additional case should be added for UNION case.
         // One way of supporting UNION case can be seen at SPARK-24497 PR from Peter Toth.
         val newRecursion = recursion.transform {
-          case r: UnionLoopRef =>
+          case r: UnionLoopRef if r.loopId == loopId =>
             val logicalPlan = prevDF.logicalPlan
             val optimizedPlan = prevDF.queryExecution.optimizedPlan
             val (stats, constraints) = rewriteStatsAndConstraints(logicalPlan, optimizedPlan)
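
The effect of the added guard can be illustrated outside Spark with a toy plan tree. The sketch below is only a model under stated assumptions: `Plan`, `LoopRef`, `Rows`, `Join`, `Loop` and the `transform` helper are hypothetical stand-ins, not Spark's `UnionLoopRef`, `UnionLoop` or `TreeNode.transform`, and it assumes the inner rCTE appears inside the outer recursion as a nested loop with its own self-reference.

```scala
// Toy model only: every name here is a hypothetical stand-in, not a Spark class.
sealed trait Plan
case class LoopRef(loopId: Long) extends Plan                              // models a self-reference of loop `loopId`
case class Rows(label: String) extends Plan                                // models rows from a previous iteration
case class Join(left: Plan, right: Plan) extends Plan
case class Loop(loopId: Long, anchor: Plan, recursion: Plan) extends Plan  // models a nested recursive CTE

object LoopRefSketch {
  // Simple recursive rewrite: rewrite the children first, then try the rule on the node.
  def transform(plan: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
    val withNewChildren = plan match {
      case Join(l, r)       => Join(transform(l)(rule), transform(r)(rule))
      case Loop(id, a, rec) => Loop(id, transform(a)(rule), transform(rec)(rule))
      case leaf             => leaf
    }
    rule.applyOrElse(withNewChildren, (p: Plan) => p)
  }

  // Before the fix: every reference is replaced, including the inner loop's own.
  def substituteUnguarded(recursion: Plan, prev: Plan): Plan =
    transform(recursion) { case _: LoopRef => prev }

  // After the fix: only references that belong to this loop are replaced.
  def substituteGuarded(recursion: Plan, loopId: Long, prev: Plan): Plan =
    transform(recursion) { case r: LoopRef if r.loopId == loopId => prev }
}
```

With a recursion such as `Join(LoopRef(2L), Loop(1L, Rows("t1 anchor"), LoopRef(1L)))`, the unguarded version overwrites `LoopRef(1L)` with loop 2's rows, while the guarded version leaves the inner loop's reference intact.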

sql/core/src/test/resources/sql-tests/analyzer-results/cte-recursion.sql.out

Lines changed: 44 additions & 0 deletions
@@ -1256,3 +1256,47 @@ WithCTE
 +- Project [id#x, xid#x]
    +- SubqueryAlias t
       +- CTERelationRef xxxx, true, [id#x, xid#x], false, false
+
+
+-- !query
+WITH RECURSIVE t1(a, b) AS (
+    SELECT 1, 1
+    UNION ALL
+    SELECT a + b, a FROM t1 WHERE a < 20
+  ),
+  t2(n) AS (
+    SELECT 1
+    UNION ALL
+    SELECT n + 1 FROM t2, t1 WHERE n + 1 = a
+  )
+SELECT * FROM t2
+-- !query analysis
+WithCTE
+:- CTERelationDef xxxx, false
+:  +- SubqueryAlias t1
+:     +- Project [1#x AS a#x, 1#x AS b#x]
+:        +- UnionLoop xxxx
+:           :- Project [1 AS 1#x, 1 AS 1#x]
+:           :  +- OneRowRelation
+:           +- Project [(a#x + b#x) AS (a + b)#x, a#x]
+:              +- Filter (a#x < 20)
+:                 +- SubqueryAlias t1
+:                    +- Project [1#x AS a#x, 1#x AS b#x]
+:                       +- UnionLoopRef xxxx, [1#x, 1#x], false
+:- CTERelationDef xxxx, false
+:  +- SubqueryAlias t2
+:     +- Project [1#x AS n#x]
+:        +- UnionLoop xxxx
+:           :- Project [1 AS 1#x]
+:           :  +- OneRowRelation
+:           +- Project [(n#x + 1) AS (n + 1)#x]
+:              +- Filter ((n#x + 1) = a#x)
+:                 +- Join Inner
+:                    :- SubqueryAlias t2
+:                    :  +- Project [1#x AS n#x]
+:                    :     +- UnionLoopRef xxxx, [1#x], false
+:                    +- SubqueryAlias t1
+:                       +- CTERelationRef xxxx, true, [a#x, b#x], false, false
++- Project [n#x]
+   +- SubqueryAlias t2
+      +- CTERelationRef xxxx, true, [n#x], false, false

sql/core/src/test/resources/sql-tests/inputs/cte-recursion.sql

Lines changed: 14 additions & 1 deletion
@@ -486,4 +486,17 @@ WITH RECURSIVE
     UNION ALL
     SELECT t.id + 1, xid * 10 + x.id FROM t CROSS JOIN x WHERE t.id < 3
   )
-SELECT * FROM t
+SELECT * FROM t;
+
+-- rCTE referencing other rCTE
+WITH RECURSIVE t1(a, b) AS (
+    SELECT 1, 1
+    UNION ALL
+    SELECT a + b, a FROM t1 WHERE a < 20
+  ),
+  t2(n) AS (
+    SELECT 1
+    UNION ALL
+    SELECT n + 1 FROM t2, t1 WHERE n + 1 = a
+  )
+SELECT * FROM t2;

sql/core/src/test/resources/sql-tests/results/cte-recursion.sql.out

Lines changed: 20 additions & 0 deletions
@@ -1178,3 +1178,23 @@ struct<id:int,xid:int>
 3 212
 3 221
 3 222
+
+
+-- !query
+WITH RECURSIVE t1(a, b) AS (
+    SELECT 1, 1
+    UNION ALL
+    SELECT a + b, a FROM t1 WHERE a < 20
+  ),
+  t2(n) AS (
+    SELECT 1
+    UNION ALL
+    SELECT n + 1 FROM t2, t1 WHERE n + 1 = a
+  )
+SELECT * FROM t2
+-- !query schema
+struct<n:int>
+-- !query output
+1
+2
+3
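
To see why the golden output is 1, 2, 3, the new test query can be replayed by hand. The following is a minimal sketch in plain Scala collections, assuming the usual UNION ALL recursion semantics (each step sees only the rows produced by the previous step, and the loop stops when a step produces nothing). `unionAllLoop`, `t1` and `t2` are illustrative names, not Spark APIs, and real query output order is not guaranteed the way it is here.

```scala
object RecursiveCteSketch {
  // Start from the anchor rows, feed only the previous step's rows into the
  // recursive step, and stop once a step returns no rows (UNION ALL semantics).
  def unionAllLoop[A](anchor: Seq[A])(step: Seq[A] => Seq[A]): Seq[A] = {
    val out = Seq.newBuilder[A]
    var prev = anchor
    while (prev.nonEmpty) {
      out ++= prev
      prev = step(prev)
    }
    out.result()
  }

  // t1(a, b): SELECT 1, 1 UNION ALL SELECT a + b, a FROM t1 WHERE a < 20
  val t1: Seq[(Int, Int)] =
    unionAllLoop(Seq((1, 1)))(_.collect { case (a, b) if a < 20 => (a + b, a) })

  // t2(n): SELECT 1 UNION ALL SELECT n + 1 FROM t2, t1 WHERE n + 1 = a
  // t1 is fully evaluated on its own; only t2's self-reference is iterated.
  val t2: Seq[Int] =
    unionAllLoop(Seq(1)) { prev =>
      for { n <- prev; row <- t1 if n + 1 == row._1 } yield n + 1
    }
}
```

In this model `t1` yields the values 1, 2, 3, 5, 8, 13, 21 in column `a`, so `t2` advances from 1 to 2 to 3 and stops because no `t1` row has `a = 4`.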
