
feat(duckdb): Add transpilation support for SUBSTR-SUBSTRING functions#7471

Merged
georgesittas merged 3 commits into main from RD-1147762-substr-substring
Apr 16, 2026

@fivetran-amrutabhimsenayachit
Collaborator

This PR resolves two differences between Snowflake and DuckDB in SUBSTR/SUBSTRING:

Issue 1: Zero start position
Snowflake SUBSTR(str, 0, n) treats a start position of 0 as 1. DuckDB does not, so it returns a shorter string. Fix: clamp 0 → 1.

Issue 2: Negative length
Snowflake SUBSTR(str, pos, -n) returns ''. DuckDB does not raise an error and returns a different result. Fix: clamp negative → 0.

python3 -c 'import sqlglot; print(sqlglot.transpile("SELECT SUBSTR('"'"'testing 1 2 3'"'"', 9, 3) AS basic, SUBSTR('"'"'testing 1 2 3'"'"', 9) AS no_len, SUBSTR('"'"'testing 1 2 3'"'"', 0, 3) AS zero_start, SUBSTR('"'"'testing 1 2 3'"'"', -5, 3) AS neg_start, SUBSTR(NULL, 1, 3) AS null_str, SUBSTR('"'"'testing'"'"', 1, 0) AS zero_len, SUBSTR('"'"'testing 1 2 3'"'"', 5, -3) AS neg_len", read="snowflake", write="duckdb")[0])' | duckdb
┌─────────┬─────────┬────────────┬───────────┬──────────┬──────────┬─────────┐
│  basic  │ no_len  │ zero_start │ neg_start │ null_str │ zero_len │ neg_len │
│ varchar │ varchar │  varchar   │  varchar  │ varchar  │ varchar  │ varchar │
├─────────┼─────────┼────────────┼───────────┼──────────┼──────────┼─────────┤
│ 1 2     │ 1 2 3   │ tes        │ 1 2       │ NULL     │          │         │
└─────────┴─────────┴────────────┴───────────┴──────────┴──────────┴─────────┘
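
The two clamps described above can be sketched in plain Python. This is only an illustrative model, not the code added by the PR: `clamp_substr_args` and `model_substr` are hypothetical names, and the model is only meant to cover the cases shown in the table.

```python
from typing import Optional, Tuple


def clamp_substr_args(start: int, length: Optional[int] = None) -> Tuple[int, Optional[int]]:
    """Hypothetical helper: apply the two clamps from the PR description."""
    new_start = 1 if start == 0 else start                    # Issue 1: 0 -> 1
    new_length = None if length is None else max(length, 0)   # Issue 2: negative -> 0
    return new_start, new_length


def model_substr(s: Optional[str], start: int, length: Optional[int] = None) -> Optional[str]:
    """Rough model of a 1-based SUBSTR after clamping (assumption, not DuckDB's code)."""
    if s is None:
        return None  # NULL in, NULL out
    start, length = clamp_substr_args(start, length)
    # A negative start counts from the end of the string; out-of-range
    # negative starts are not modeled faithfully here.
    begin = start - 1 if start > 0 else max(len(s) + start, 0)
    end = len(s) if length is None else begin + length
    return s[begin:end]


print(model_substr("testing 1 2 3", 0, 3))   # zero_start case
print(model_substr("testing 1 2 3", 5, -3))  # neg_len case
```

The model reproduces the `zero_start` and `neg_len` columns because both clamps are applied before any slicing happens, which mirrors the idea of rewriting the arguments rather than the function itself.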

@github-actions
Contributor

github-actions Bot commented Apr 8, 2026

SQLGlot Integration Test Results

Comparing:

  • this branch (sqlglot:RD-1147762-substr-substring, sqlglot version: RD-1147762-substr-substring)
  • baseline (main, sqlglot version: 0.0.1.dev1)

By Dialect

| dialect | main | sqlglot:RD-1147762-substr-substring | transitions | links |
|---|---|---|---|---|
| bigquery -> bigquery | 25161/25166 passed (100.0%) | 23491/23491 passed (100.0%) | No change | full result / delta |
| bigquery -> duckdb | 1304/1674 passed (77.9%) | 0/0 passed (0.0%) | Results not found | full result / delta |
| snowflake -> duckdb | 1526/2678 passed (57.0%) | 1526/2678 passed (57.0%) | No change | full result / delta |
| snowflake -> snowflake | 65926/65926 passed (100.0%) | 65926/65926 passed (100.0%) | No change | full result / delta |
| databricks -> databricks | 1370/1370 passed (100.0%) | 1370/1370 passed (100.0%) | No change | full result / delta |
| postgres -> postgres | 6042/6042 passed (100.0%) | 6042/6042 passed (100.0%) | No change | full result / delta |
| redshift -> redshift | 7101/7101 passed (100.0%) | 7101/7101 passed (100.0%) | No change | full result / delta |

Overall

main: 109957 total, 108430 passed (pass rate: 98.6%), sqlglot version: 0.0.1.dev1

sqlglot:RD-1147762-substr-substring: 106608 total, 105456 passed (pass rate: 98.9%), sqlglot version: RD-1147762-substr-substring

Transitions:
No change

Dialect pair changes: 0 previous results not found, 1 current results not found

✅ 23 test(s) passed
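
The overall figures in the report above can be cross-checked against the per-dialect rows. The following is a small sketch of that arithmetic; the dictionary layout and variable names are illustrative, and the numbers are copied from the table. It suggests that the gap in totals between the two runs comes entirely from the two bigquery rows, not from a change in SUBSTR handling.

```python
# (main_passed, main_total, branch_passed, branch_total) per dialect pair,
# copied from the "By Dialect" table in the bot comment.
rows = {
    "bigquery -> bigquery": (25161, 25166, 23491, 23491),
    "bigquery -> duckdb": (1304, 1674, 0, 0),
    "snowflake -> duckdb": (1526, 2678, 1526, 2678),
    "snowflake -> snowflake": (65926, 65926, 65926, 65926),
    "databricks -> databricks": (1370, 1370, 1370, 1370),
    "postgres -> postgres": (6042, 6042, 6042, 6042),
    "redshift -> redshift": (7101, 7101, 7101, 7101),
}

main_passed = sum(r[0] for r in rows.values())
main_total = sum(r[1] for r in rows.values())
branch_passed = sum(r[2] for r in rows.values())
branch_total = sum(r[3] for r in rows.values())

main_rate = round(100 * main_passed / main_total, 1)
branch_rate = round(100 * branch_passed / branch_total, 1)

# Difference in totals contributed by the two bigquery rows alone.
bigquery_gap = sum(
    r[1] - r[3] for name, r in rows.items() if name.startswith("bigquery")
)

print(main_total, main_passed, main_rate)
print(branch_total, branch_passed, branch_rate)
print(main_total - branch_total, bigquery_gap)
```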

@georgesittas georgesittas force-pushed the RD-1147762-substr-substring branch 2 times, most recently from 1728111 to 0d33d57 Compare April 8, 2026 22:05
Comment thread sqlglot/generators/duckdb.py Outdated
Comment on lines +2512 to +2525
new_start = (
    exp.Literal.number(1)
    if start is not None and start.is_number and start.to_py() == 0
    else exp.If(this=start.eq(0), true=exp.Literal.number(1), false=start.copy())
    if start is not None and not start.is_number
    else None
)
new_length = (
    exp.Literal.number(0)
    if length is not None and length.is_number and length.to_py() < 0
    else exp.If(this=length.copy() < 0, true=exp.Literal.number(0), false=length.copy())
    if length is not None and not length.is_number
    else None
)
Collaborator

We should not handle literals manually. It results in more complicated logic and sets a precedent that we don't want to follow. Let's refactor these to always inject the If node in this branch.

Comment thread sqlglot/generators/duckdb.py Outdated
)

if new_start is not None or new_length is not None:
    expression = expression.copy()
Collaborator

We should not copy here. The generator already does this for you.

@georgesittas georgesittas force-pushed the RD-1147762-substr-substring branch from ae5603e to 9ddccff Compare April 16, 2026 12:24
@georgesittas georgesittas merged commit b985092 into main Apr 16, 2026
7 of 8 checks passed
@georgesittas georgesittas deleted the RD-1147762-substr-substring branch April 16, 2026 12:27