-
Notifications
You must be signed in to change notification settings - Fork 1k
Open
Description
Consider the following query:
>>> query = parse_one("select REGEXP_EXTRACT('foo', 'foo', CAST(4 AS INT64)) as call_0", dialect="bigquery")
# original, roundtripped
>>> query.sql(dialect="bigquery")
"SELECT REGEXP_EXTRACT('foo', 'foo', CAST(4 AS INT64)) AS call_0"
# transpiled to DuckDB
>>> query.sql(dialect="duckdb")
"SELECT REGEXP_EXTRACT(SUBSTRING('foo', CAST(4 AS BIGINT)), 'foo') AS call_0"
The BQ Docs state:
If position is greater than the length of value, NULL is returned.
When the original is run against BigQuery, it returns NULL:
$ bq query --use_legacy_sql=false "SELECT REGEXP_EXTRACT('foo', 'foo', CAST(4 AS INT64)) AS call_0"
+--------+
| call_0 |
+--------+
| NULL |
+--------+
When the transpiled version is run against DuckDB, it returns empty string:
D SELECT REGEXP_EXTRACT(SUBSTRING('foo', CAST(4 AS BIGINT)), 'foo') AS call_0;
┌─────────┐
│ call_0 │
│ varchar │
├─────────┤
│ │
└─────────┘
I think the transpilation needs to wrap the SUBSTRING in NULLIF to match, eg the following returns the correct result:
D SELECT REGEXP_EXTRACT(NULLIF(SUBSTRING('foo', CAST(4 AS BIGINT)), ''), 'foo') AS call_0;
┌─────────┐
│ call_0 │
│ varchar │
├─────────┤
│ NULL │
└─────────┘
Metadata
Metadata
Assignees
Labels
No labels