Commit 2bf5b4b

chore(explain): Adds support for explain subquery in plans (#17861)
* Adds support for subquery representation in plans
* update optimizer test
* fix

Signed-off-by: coldWater <forsaken628@gmail.com>
1 parent b8b2440 commit 2bf5b4b
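
The commit title refers to showing subqueries inside explain plans. As a rough illustration of that idea, and not the project's actual implementation, the sketch below renders a small plan tree in which a filter's subquery appears as an extra child group. The `Node` class, the `render` and `explain` helpers, and the table names `t1` and `t2` are hypothetical; only the `subquerys` and `Subquery (Scalar)` labels are taken from the golden file in this commit.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical plan node: a label, leaf properties, and child nodes."""
    label: str
    props: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def render(node: Node, prefix: str = "") -> list[str]:
    """Return the lines below `node`, drawn with box-drawing connectors."""
    lines = []
    # Properties are listed before child operators.
    entries = [(p, None) for p in node.props] + [(c.label, c) for c in node.children]
    for i, (text, child) in enumerate(entries):
        last = i == len(entries) - 1
        lines.append(prefix + ("└── " if last else "├── ") + text)
        if child is not None:
            lines.extend(render(child, prefix + ("    " if last else "│   ")))
    return lines


def explain(root: Node) -> str:
    """Join the root label and its rendered subtree into one string."""
    return "\n".join([root.label] + render(root))


# A Filter whose predicate contains a scalar subquery: the subquery plan is
# attached as its own child group instead of staying hidden behind SUBQUERY.
subquery = Node("Subquery (Scalar)", children=[Node("Scan", ["table: t2"])])
plan = Node(
    "Filter",
    props=["filters: [gt(t1.a (#0), SUBQUERY)]"],
    children=[Node("subquerys", children=[subquery]), Node("Scan", ["table: t1"])],
)
print(explain(plan))
```

Printing the subquery as a sibling group of the input operator keeps the predicate line short while still exposing the nested plan.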

13 files changed: +514 -375 lines changed
src/query/service/tests/it/sql/planner/optimizer/data/README.md

Lines changed: 5 additions & 10 deletions
@@ -6,8 +6,9 @@ This directory contains test data for TPC-DS optimizer tests. The tests are stru

 ```
 data
-├── tables/ # SQL table definitions
-└── yaml/ # YAML test case definitions
+├── tables/ # SQL table definitions
+├── statistics/ # SQL table definitions
+└── cases/ # YAML test case definitions and golden files
 ```

 ## YAML Test Case Format
@@ -37,12 +38,6 @@ column_statistics: # Column statistics
 ndv: 10 # Number of distinct values
 null_count: 0 # Number of null values

-raw_plan: | # Expected raw plan
-...
-
-optimized_plan: | # Expected optimized plan
-...
-
 good_plan: | # Optional expected good plan
 ...
 ```
@@ -63,5 +58,5 @@ To add a new test case:

 If the expected output of a test changes (e.g., due to optimizer improvements):

-1. Run the test to see the actual output.
-2. Update the `raw_plan`, `optimized_plan`, or `good_plan` field in the YAML file to match the actual output.
+1. Run the test with UPDATE_GOLDENFILES to regenerate the golden files.
+2. Check that the changes to the files are as expected.
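
The update procedure in the README hunk above follows the usual golden-file pattern: compare the actual output with a stored file, and rewrite that file only when asked to. A minimal sketch of that pattern follows. It assumes `UPDATE_GOLDENFILES` is read as an environment variable (the README names it but does not show how the test harness consumes it); `check_golden` and the file name are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def check_golden(path: Path, actual: str) -> bool:
    """Compare `actual` with the golden file, or rewrite the file on request.

    Assumption: UPDATE_GOLDENFILES is an environment variable; any non-empty
    value switches from "compare" to "regenerate".
    """
    if os.environ.get("UPDATE_GOLDENFILES"):
        path.write_text(actual, encoding="utf-8")
        return True
    return path.exists() and path.read_text(encoding="utf-8") == actual


with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "example_plan.txt"  # hypothetical file name
    golden.write_text("Raw plan:\nLimit\n", encoding="utf-8")

    # Normal run: the output is only compared against the golden file.
    os.environ.pop("UPDATE_GOLDENFILES", None)
    unchanged = check_golden(golden, "Raw plan:\nLimit\n")
    mismatch = check_golden(golden, "Raw plan:\nSort\n")

    # Update run: the golden file is overwritten with the new output.
    os.environ["UPDATE_GOLDENFILES"] = "1"
    regenerated = check_golden(golden, "Raw plan:\nSort\n")
    new_content = golden.read_text(encoding="utf-8")
    os.environ.pop("UPDATE_GOLDENFILES", None)

print(unchanged, mismatch, regenerated)
```

After an update run, the second README step still applies: the rewritten files should be reviewed in the diff before they are committed.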
Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
+Raw plan:
+Limit
+├── limit: [100]
+├── offset: [0]
+└── Sort
+    ├── sort keys: [default.customer.c_customer_id (#79) ASC NULLS LAST]
+    ├── limit: [NONE]
+    └── EvalScalar
+        ├── scalars: [customer.c_customer_id (#79) AS (#79)]
+        └── Filter
+            ├── filters: [gt(ctr1.ctr_total_return (#48), SUBQUERY), eq(store.s_store_sk (#49), ctr1.ctr_store_sk (#7)), eq(store.s_state (#73), 'TN'), eq(ctr1.ctr_customer_sk (#3), customer.c_customer_sk (#78))]
+            ├── subquerys
+            │   └── Subquery (Scalar)
+            │       ├── output_column: derived.sum(ctr_total_return) / if(count(ctr_total_return) = 0, 1, count(ctr_total_return)) * 1.2 (#147)
+            │       └── EvalScalar
+            │           ├── scalars: [multiply(divide(sum(ctr_total_return) (#145), if(eq(count(ctr_total_return) (#146), 0), 1, count(ctr_total_return) (#146))), 1.2) AS (#147)]
+            │           └── Aggregate(Initial)
+            │               ├── group items: []
+            │               ├── aggregate functions: [sum(ctr_total_return) AS (#145), count(ctr_total_return) AS (#146)]
+            │               └── EvalScalar
+            │                   ├── scalars: [ctr2.ctr_total_return (#144) AS (#144), ctr2.ctr_total_return (#144) AS (#144)]
+            │                   └── Filter
+            │                       ├── filters: [eq(ctr1.ctr_store_sk (#7), ctr2.ctr_store_sk (#103))]
+            │                       └── EvalScalar
+            │                           ├── scalars: [store_returns.sr_customer_sk (#99) AS (#99), store_returns.sr_store_sk (#103) AS (#103), Sum(sr_return_amt) (#144) AS (#144)]
+            │                           └── Aggregate(Initial)
+            │                               ├── group items: [store_returns.sr_customer_sk (#99) AS (#99), store_returns.sr_store_sk (#103) AS (#103)]
+            │                               ├── aggregate functions: [Sum(sr_return_amt) AS (#144)]
+            │                               └── EvalScalar
+            │                                   ├── scalars: [store_returns.sr_customer_sk (#99) AS (#99), store_returns.sr_store_sk (#103) AS (#103), store_returns.sr_return_amt (#107) AS (#107)]
+            │                                   └── Filter
+            │                                       ├── filters: [eq(store_returns.sr_returned_date_sk (#96), date_dim.d_date_sk (#116)), eq(date_dim.d_year (#122), 2001)]
+            │                                       └── Join(Cross)
+            │                                           ├── build keys: []
+            │                                           ├── probe keys: []
+            │                                           ├── other filters: []
+            │                                           ├── Scan
+            │                                           │   ├── table: default.store_returns (#4)
+            │                                           │   ├── filters: []
+            │                                           │   ├── order by: []
+            │                                           │   └── limit: NONE
+            │                                           └── Scan
+            │                                               ├── table: default.date_dim (#5)
+            │                                               ├── filters: []
+            │                                               ├── order by: []
+            │                                               └── limit: NONE
+            └── Join(Cross)
+                ├── build keys: []
+                ├── probe keys: []
+                ├── other filters: []
+                ├── Join(Cross)
+                │   ├── build keys: []
+                │   ├── probe keys: []
+                │   ├── other filters: []
+                │   ├── EvalScalar
+                │   │   ├── scalars: [store_returns.sr_customer_sk (#3) AS (#3), store_returns.sr_store_sk (#7) AS (#7), Sum(sr_return_amt) (#48) AS (#48)]
+                │   │   └── Aggregate(Initial)
+                │   │       ├── group items: [store_returns.sr_customer_sk (#3) AS (#3), store_returns.sr_store_sk (#7) AS (#7)]
+                │   │       ├── aggregate functions: [Sum(sr_return_amt) AS (#48)]
+                │   │       └── EvalScalar
+                │   │           ├── scalars: [store_returns.sr_customer_sk (#3) AS (#3), store_returns.sr_store_sk (#7) AS (#7), store_returns.sr_return_amt (#11) AS (#11)]
+                │   │           └── Filter
+                │   │               ├── filters: [eq(store_returns.sr_returned_date_sk (#0), date_dim.d_date_sk (#20)), eq(date_dim.d_year (#26), 2001)]
+                │   │               └── Join(Cross)
+                │   │                   ├── build keys: []
+                │   │                   ├── probe keys: []
+                │   │                   ├── other filters: []
+                │   │                   ├── Scan
+                │   │                   │   ├── table: default.store_returns (#0)
+                │   │                   │   ├── filters: []
+                │   │                   │   ├── order by: []
+                │   │                   │   └── limit: NONE
+                │   │                   └── Scan
+                │   │                       ├── table: default.date_dim (#1)
+                │   │                       ├── filters: []
+                │   │                       ├── order by: []
+                │   │                       └── limit: NONE
+                │   └── Scan
+                │       ├── table: default.store (#2)
+                │       ├── filters: []
+                │       ├── order by: []
+                │       └── limit: NONE
+                └── Scan
+                    ├── table: default.customer (#3)
+                    ├── filters: []
+                    ├── order by: []
+                    └── limit: NONE
+
+Optimized plan:
+Limit
+├── limit: [100]
+├── offset: [0]
+└── Sort
+    ├── sort keys: [default.customer.c_customer_id (#79) ASC NULLS LAST]
+    ├── limit: [100]
+    └── EvalScalar
+        ├── scalars: [customer.c_customer_id (#79) AS (#79), ctr1.ctr_total_return (#48) AS (#154), scalar_subquery_147 (#147) AS (#155), store.s_store_sk (#49) AS (#156), ctr1.ctr_store_sk (#7) AS (#157), store.s_state (#73) AS (#158), ctr1.ctr_customer_sk (#3) AS (#159), customer.c_customer_sk (#78) AS (#160)]
+        └── Join(Inner)
+            ├── build keys: [sr_store_sk (#103)]
+            ├── probe keys: [sr_store_sk (#7)]
+            ├── other filters: [gt(ctr1.ctr_total_return (#48), scalar_subquery_147 (#147))]
+            ├── Join(Inner)
+            │   ├── build keys: [customer.c_customer_sk (#78)]
+            │   ├── probe keys: [ctr1.ctr_customer_sk (#3)]
+            │   ├── other filters: []
+            │   ├── Aggregate(Final)
+            │   │   ├── group items: [store_returns.sr_customer_sk (#3) AS (#3), store_returns.sr_store_sk (#7) AS (#7)]
+            │   │   ├── aggregate functions: [Sum(sr_return_amt) AS (#48)]
+            │   │   └── Aggregate(Partial)
+            │   │       ├── group items: [store_returns.sr_customer_sk (#3) AS (#3), store_returns.sr_store_sk (#7) AS (#7)]
+            │   │       ├── aggregate functions: [Sum(sr_return_amt) AS (#48)]
+            │   │       └── EvalScalar
+            │   │           ├── scalars: [store_returns.sr_customer_sk (#3) AS (#3), store_returns.sr_store_sk (#7) AS (#7), store_returns.sr_return_amt (#11) AS (#11), store_returns.sr_returned_date_sk (#0) AS (#148), date_dim.d_date_sk (#20) AS (#149), date_dim.d_year (#26) AS (#150)]
+            │   │           └── Join(Inner)
+            │   │               ├── build keys: [date_dim.d_date_sk (#20)]
+            │   │               ├── probe keys: [store_returns.sr_returned_date_sk (#0)]
+            │   │               ├── other filters: []
+            │   │               ├── Scan
+            │   │               │   ├── table: default.store_returns (#0)
+            │   │               │   ├── filters: []
+            │   │               │   ├── order by: []
+            │   │               │   └── limit: NONE
+            │   │               └── Scan
+            │   │                   ├── table: default.date_dim (#1)
+            │   │                   ├── filters: [eq(date_dim.d_year (#26), 2001)]
+            │   │                   ├── order by: []
+            │   │                   └── limit: NONE
+            │   └── Scan
+            │       ├── table: default.customer (#3)
+            │       ├── filters: []
+            │       ├── order by: []
+            │       └── limit: NONE
+            └── Join(Inner)
+                ├── build keys: [sr_store_sk (#103)]
+                ├── probe keys: [store.s_store_sk (#49)]
+                ├── other filters: []
+                ├── Scan
+                │   ├── table: default.store (#2)
+                │   ├── filters: [eq(store.s_state (#73), 'TN')]
+                │   ├── order by: []
+                │   └── limit: NONE
+                └── EvalScalar
+                    ├── scalars: [outer.sr_store_sk (#103) AS (#103), multiply(divide(sum(ctr_total_return) (#145), if(eq(count(ctr_total_return) (#146), 0), 1, count(ctr_total_return) (#146))), 1.2) AS (#147)]
+                    └── Aggregate(Final)
+                        ├── group items: [outer.sr_store_sk (#103) AS (#103)]
+                        ├── aggregate functions: [sum(ctr_total_return) AS (#145), count(ctr_total_return) AS (#146)]
+                        └── Aggregate(Partial)
+                            ├── group items: [outer.sr_store_sk (#103) AS (#103)]
+                            ├── aggregate functions: [sum(ctr_total_return) AS (#145), count(ctr_total_return) AS (#146)]
+                            └── Aggregate(Final)
+                                ├── group items: [store_returns.sr_customer_sk (#99) AS (#99), store_returns.sr_store_sk (#103) AS (#103)]
+                                ├── aggregate functions: [Sum(sr_return_amt) AS (#144)]
+                                └── Aggregate(Partial)
+                                    ├── group items: [store_returns.sr_customer_sk (#99) AS (#99), store_returns.sr_store_sk (#103) AS (#103)]
+                                    ├── aggregate functions: [Sum(sr_return_amt) AS (#144)]
+                                    └── EvalScalar
+                                        ├── scalars: [store_returns.sr_customer_sk (#99) AS (#99), store_returns.sr_store_sk (#103) AS (#103), store_returns.sr_return_amt (#107) AS (#107), store_returns.sr_returned_date_sk (#96) AS (#151), date_dim.d_date_sk (#116) AS (#152), date_dim.d_year (#122) AS (#153)]
+                                        └── Join(Inner)
+                                            ├── build keys: [date_dim.d_date_sk (#116)]
+                                            ├── probe keys: [store_returns.sr_returned_date_sk (#96)]
+                                            ├── other filters: []
+                                            ├── Scan
+                                            │   ├── table: default.store_returns (#4)
+                                            │   ├── filters: []
+                                            │   ├── order by: []
+                                            │   └── limit: NONE
+                                            └── Scan
+                                                ├── table: default.date_dim (#5)
+                                                ├── filters: [eq(date_dim.d_year (#122), 2001)]
+                                                ├── order by: []
+                                                └── limit: NONE
+
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+name: "Q01"
+description: "TPC-DS Query 1 optimizer test"
+
+sql: |
+  WITH customer_total_return
+       AS (SELECT sr_customer_sk AS ctr_customer_sk,
+                  sr_store_sk AS ctr_store_sk,
+                  Sum(sr_return_amt) AS ctr_total_return
+           FROM store_returns,
+                date_dim
+           WHERE sr_returned_date_sk = d_date_sk
+                 AND d_year = 2001
+           GROUP BY sr_customer_sk,
+                    sr_store_sk)
+  SELECT c_customer_id
+  FROM customer_total_return ctr1,
+       store,
+       customer
+  WHERE ctr1.ctr_total_return > (SELECT Avg(ctr_total_return) * 1.2
+                                 FROM customer_total_return ctr2
+                                 WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
+        AND s_store_sk = ctr1.ctr_store_sk
+        AND s_state = 'TN'
+        AND ctr1.ctr_customer_sk = c_customer_sk
+  ORDER BY c_customer_id
+  LIMIT 100
+
+# Reference to external statistics file
+statistics_file: statistics.yaml
+
+# Converted from tabular format to tree format based on parent-child relationships
+good_plan: |
+  Result
+  └── SortWithLimit [sortKey: (CUSTOMER.C_CUSTOMER_ID ASC NULLS LAST), rowCount: 100]
+      └── InnerJoin [joinKey: (CTR1.CTR_CUSTOMER_SK = CUSTOMER.C_CUSTOMER_SK)]
+          ├── InnerJoin [joinKey: (STORE.S_STORE_SK = CTR1.CTR_STORE_SK)]
+          │   ├── Filter [STORE.S_STATE = 'TN']
+          │   │   └── TableScan [SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.STORE] [S_STORE_SK, S_STATE] [partitions: 1/1, bytes: 135,680]
+          │   └── InnerJoin [joinKey: (CTR2.CTR_STORE_SK = CTR1.CTR_STORE_SK), joinFilter: (CTR1.CTR_TOTAL_RETURN) > (((SUM(CTR2.CTR_TOTAL_RETURN)) / (NVL(COUNT(CTR2.CTR_TOTAL_RETURN), 0))) * 1.2)]
+          │       ├── Filter [(SUM(CTR2.CTR_TOTAL_RETURN) IS NOT NULL) AND (COUNT(CTR2.CTR_TOTAL_RETURN) IS NOT NULL)]
+          │       │   └── Aggregate [aggExprs: [SUM(CTR2.CTR_TOTAL_RETURN), COUNT(CTR2.CTR_TOTAL_RETURN)], groupKeys: [CTR2.CTR_STORE_SK]]
+          │       │       └── JoinFilter [joinKey: (STORE.S_STORE_SK = CTR1.CTR_STORE_SK)]
+          │       │           └── WithReference [CTR2]
+          │       │               └── Filter [STORE_RETURNS.SR_STORE_SK IS NOT NULL]
+          │       │                   └── WithClause [CUSTOMER_TOTAL_RETURN]
+          │       │                       └── Aggregate [aggExprs: [SUM(SUM(SUM(STORE_RETURNS.SR_RETURN_AMT)))], groupKeys: [STORE_RETURNS.SR_CUSTOMER_SK, STORE_RETURNS.SR_STORE_SK]]
+          │       │                           └── Aggregate [aggExprs: [SUM(SUM(STORE_RETURNS.SR_RETURN_AMT))], groupKeys: [STORE_RETURNS.SR_CUSTOMER_SK, STORE_RETURNS.SR_STORE_SK]]
+          │       │                               └── InnerJoin [joinKey: (DATE_DIM.D_DATE_SK = STORE_RETURNS.SR_RETURNED_DATE_SK)]
+          │       │                                   ├── Filter [DATE_DIM.D_YEAR = 2001]
+          │       │                                   │   └── TableScan [SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.DATE_DIM] [D_DATE_SK, D_YEAR] [partitions: 1/1, bytes: 2,138,624]
+          │       │                                   └── Aggregate [aggExprs: [SUM(STORE_RETURNS.SR_RETURN_AMT)], groupKeys: [STORE_RETURNS.SR_CUSTOMER_SK, STORE_RETURNS.SR_STORE_SK, STORE_RETURNS.SR_RETURNED_DATE_SK]]
+          │       │                                       └── Filter [STORE_RETURNS.SR_RETURNED_DATE_SK IS NOT NULL]
+          │       │                                           └── JoinFilter [joinKey: (DATE_DIM.D_DATE_SK = STORE_RETURNS.SR_RETURNED_DATE_SK)]
+          │       │                                               └── TableScan [SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.STORE_RETURNS] [SR_RETURNED_DATE_SK, SR_CUSTOMER_SK, SR_STORE_SK, SR_RETURN_AMT] [partitions: 7070/7070, bytes: 124,763,446,272]
+          │       └── JoinFilter [joinKey: (STORE.S_STORE_SK = CTR1.CTR_STORE_SK)]
+          │           └── WithReference [CTR1]
+          │               └── Filter [(STORE_RETURNS.SR_STORE_SK IS NOT NULL) AND (STORE_RETURNS.SR_CUSTOMER_SK IS NOT NULL)]
+          │                   └── WithClause [CUSTOMER_TOTAL_RETURN] (reference to earlier WITH clause)
+          └── JoinFilter [joinKey: (CTR1.CTR_CUSTOMER_SK = CUSTOMER.C_CUSTOMER_SK)]
+              └── TableScan [SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.CUSTOMER] [C_CUSTOMER_SK, C_CUSTOMER_ID] [partitions: 261/261, bytes: 2,328,538,624]
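
The Q01 query compares each row with a correlated `Avg(ctr_total_return) * 1.2`, while the optimized plan in this commit computes `sum` and `count` once per `sr_store_sk` and joins the result back with a `gt(...)` join filter. The sketch below mimics both shapes on a few toy rows to show why they agree. The row values and the function names `correlated` and `decorrelated` are invented for illustration; this is not the optimizer's code.

```python
from collections import defaultdict

# Toy rows of customer_total_return:
# (ctr_customer_sk, ctr_store_sk, ctr_total_return). Values are made up.
ctr = [(1, 10, 100.0), (2, 10, 300.0), (3, 20, 50.0), (4, 20, 50.0)]


def correlated(rows):
    """Evaluate the subquery once per outer row, as the SQL is written."""
    result = []
    for cust, store, total in rows:
        same_store = [t for _, s, t in rows if s == store]
        threshold = sum(same_store) / len(same_store) * 1.2
        if total > threshold:
            result.append(cust)
    return result


def decorrelated(rows):
    """Aggregate once per store, then join, as in the optimized plan."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _, store, total in rows:
        sums[store] += total
        counts[store] += 1
    # Mirrors multiply(divide(sum, if(eq(count, 0), 1, count)), 1.2).
    threshold = {
        s: sums[s] / (1 if counts[s] == 0 else counts[s]) * 1.2 for s in sums
    }
    # Inner join on the store key, with the comparison as an extra join filter.
    return [c for c, s, t in rows if s in threshold and t > threshold[s]]


print(correlated(ctr), decorrelated(ctr))
```

The per-row version rescans the input for every outer row, whereas the grouped version makes one aggregation pass followed by a lookup.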