Skip to content

[bigquery -> duckdb] REGEXP_EXTRACT transpilation can result in different return type #6442

@erindru

Description

@erindru

Consider the following query:

>>> query = parse_one("select REGEXP_EXTRACT('foo', 'foo', CAST(4 AS INT64)) as call_0", dialect="bigquery")

# original, roundtripped
>>> query.sql(dialect="bigquery")
"SELECT REGEXP_EXTRACT('foo', 'foo', CAST(4 AS INT64)) AS call_0"

# transpiled to DuckDB
>>> query.sql(dialect="duckdb")
"SELECT REGEXP_EXTRACT(SUBSTRING('foo', CAST(4 AS BIGINT)), 'foo') AS call_0"

The BQ Docs state:

If position is greater than the length of value, NULL is returned.

When the original is run against BigQuery, it returns NULL:

$ bq query --use_legacy_sql=false "SELECT REGEXP_EXTRACT('foo', 'foo', CAST(4 AS INT64)) AS call_0"
+--------+
| call_0 |
+--------+
| NULL   |
+--------+

When the transpiled version is run against DuckDB, it returns empty string:

D SELECT REGEXP_EXTRACT(SUBSTRING('foo', CAST(4 AS BIGINT)), 'foo') AS call_0;
┌─────────┐
│ call_0  │
│ varchar │
├─────────┤
│         │
└─────────┘

I think the transpilation needs to wrap the SUBSTRING in NULLIF to match, eg the following returns the correct result:

D SELECT REGEXP_EXTRACT(NULLIF(SUBSTRING('foo', CAST(4 AS BIGINT)), ''), 'foo') AS call_0;
┌─────────┐
│ call_0  │
│ varchar │
├─────────┤
│ NULL    │
└─────────┘

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions