[SPARK-51747][SQL] Data source cached plan should respect options
### What changes were proposed in this pull request?
Data source cached plans should respect options, such as the CSV delimiter. Before this change, DataSourceStrategy cached the first plan for a table and reused it for later queries, ignoring any updated options. This change returns a **new plan** if the options have changed.
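The intended lookup behavior can be illustrated with a toy model. This is only a sketch and not the code in DataSourceStrategy: `CachedPlan`, `PlanCache`, `getOrBuild`, and `builds` are hypothetical names, and the choice to leave the cached entry untouched when the options differ is an assumption made for the illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a resolved plan; not a Spark class.
final case class CachedPlan(table: String, options: Map[String, String])

// Toy plan cache: a cached plan is reused only when the options match.
final class PlanCache {
  private val cache = mutable.Map.empty[String, CachedPlan]
  var builds = 0 // number of plans built from scratch

  def getOrBuild(table: String, options: Map[String, String]): CachedPlan =
    cache.get(table) match {
      // Same options as the cached plan: reuse it.
      case Some(plan) if plan.options == options => plan
      // Options differ: return a new plan (assumption: the cache is left as-is).
      case Some(_) =>
        builds += 1
        CachedPlan(table, options)
      // Nothing cached yet: build the plan and cache it.
      case None =>
        builds += 1
        val plan = CachedPlan(table, options)
        cache.update(table, plan)
        plan
    }
}
```

In this model, a lookup for `t` with `Map("delimiter" -> ";")` after a lookup with no options yields a plan that carries the new delimiter, while a later lookup with no options still reuses the first cached plan.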
### Why are the changes needed?
For example:
```
spark.sql("CREATE TABLE t(a string, b string) USING CSV")
spark.sql("INSERT INTO TABLE t VALUES ('a;b', 'c')")
spark.sql("SELECT * FROM t").show()
spark.sql("SELECT * FROM t WITH ('delimiter' = ';')").show()
```
Expected output:
```
+----+----+
|col1|col2|
+----+----+
| a;b| c|
+----+----+
+----+----+
|col1|col2|
+----+----+
| a| b,c|
+----+----+
```
Output before this PR:
```
+----+----+
|col1|col2|
+----+----+
| a;b| c|
+----+----+
+----+----+
|col1|col2|
+----+----+
| a;b| c|
+----+----+
```
Without this PR, the second query ignores the `delimiter` option and returns the same rows as the first.
### Does this PR introduce _any_ user-facing change?
Yes. It corrects the caching behavior in DataSourceStrategy so that queries with changed options no longer reuse a stale cached plan.
### How was this patch tested?
Added a test in DDLSuite.scala.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #50538 from asl3/asl3/datasourcestrategycacheoptions.
Lead-authored-by: Amanda Liu <amanda.liu@databricks.com>
Co-authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>