feat(typing): add databricks type inference for REGEXP_INSTR, REGEXP_LIKE, REGEXP_SUBSTR, REGR_AVGX by fivetran-amrutabhimsenayachit · Pull Request #7815 · tobymao/sqlglot

fivetran-amrutabhimsenayachit · 2026-06-29T17:14:02Z

Summary

Adds Databricks type inference support for REGEXP_INSTR (INT), REGEXP_LIKE (BOOLEAN via fixture), REGEXP_SUBSTR (VARCHAR via parser mapping to RegexpExtract), and REGR_AVGX (DOUBLE), plus fixture coverage for all four functions.

Tickets

RD-1229633 (REGEXP_INSTR) — new INT entry in typing/databricks.py
RD-1229634 (REGEXP_LIKE) — fixture entry only (inherited from base typing)
RD-1229635 (REGEXP_REPLACE) — already complete, no changes
RD-1229636 (REGEXP_SUBSTR) — parser mapping REGEXP_SUBSTR→RegexpExtract + fixture entry
RD-1229637 (REGR_AVGX) — new DOUBLE entry in typing/databricks.py
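
For readers unfamiliar with how these entries fit together, the ticket breakdown above can be pictured as a dialect-level lookup layered over inherited types. The following is a toy model only, not sqlglot's actual typing module: the names `INHERITED_RETURN_TYPES`, `DATABRICKS_RETURN_TYPES`, `PARSER_ALIASES`, and `infer_return_type` are hypothetical and exist purely for illustration.

```python
from typing import Optional

# Hypothetical stand-in for return types inherited from the base typing rules.
INHERITED_RETURN_TYPES = {
    "REGEXP_LIKE": "BOOLEAN",
    "REGEXP_EXTRACT": "VARCHAR",
}

# Hypothetical stand-in for the new Databricks-specific entries.
DATABRICKS_RETURN_TYPES = {
    "REGEXP_INSTR": "INT",
    "REGR_AVGX": "DOUBLE",
}

# Hypothetical stand-in for the parser mapping described for REGEXP_SUBSTR.
PARSER_ALIASES = {
    "REGEXP_SUBSTR": "REGEXP_EXTRACT",
}


def infer_return_type(function_name: str) -> Optional[str]:
    """Resolve aliases first, then prefer the dialect entry over the inherited one."""
    name = function_name.upper()
    name = PARSER_ALIASES.get(name, name)
    if name in DATABRICKS_RETURN_TYPES:
        return DATABRICKS_RETURN_TYPES[name]
    return INHERITED_RETURN_TYPES.get(name)


for fn in ("REGEXP_INSTR", "REGEXP_LIKE", "REGEXP_SUBSTR", "REGR_AVGX"):
    print(fn, "->", infer_return_type(fn))
```

In this sketch only REGEXP_INSTR and REGR_AVGX need new dialect entries, which mirrors why REGEXP_LIKE and REGEXP_SUBSTR are listed as fixture-only or parser-only changes.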

Test plan

make style — PASS
make unit — PASS

github-actions · 2026-06-29T17:40:31Z

SQLGlot Integration Test Results

✅ All tests passed

Comparing:

this branch (sqlglot:type-inference-batch-2 @ sqlglot 2195638)
baseline (main @ sqlglot fd6d4d6)

By Dialect

dialect	main	feature branch	transitions	links
databricks -> databricks	9982/11820 passed (84.5%)	9998/11820 passed (84.6%)	16 fail -> pass	full result / delta

Overall

main: 192428 total, 153523 passed (pass rate: 79.8%)

sqlglot:type-inference-batch-2: 180234 total, 142394 passed (pass rate: 79.0%)

Transitions:
16 fail -> pass

Dialect pair changes: 0 previous results not found, 3 current results not found
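
The percentages in the report can be re-derived from the raw counts. The snippet below is a small sanity check, not part of the integration tooling; the helper name `pass_rate` is hypothetical. Note that the two overall totals differ (192428 vs 180234), possibly because of the results reported as not found, so the overall rates are not a like-for-like comparison. The per-dialect row, which shares a total of 11820, is the cleaner signal.

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    return round(100 * passed / total, 1)


# Figures copied from the report above.
databricks_main = pass_rate(9982, 11820)
databricks_branch = pass_rate(9998, 11820)
overall_main = pass_rate(153523, 192428)
overall_branch = pass_rate(142394, 180234)

# The per-dialect gain should equal the number of fail -> pass transitions.
newly_passing = 9998 - 9982

# The overall totals are not the same size on both sides.
missing_from_branch = 192428 - 180234

print(databricks_main, databricks_branch, newly_passing)
print(overall_main, overall_branch, missing_from_branch)
```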

geooo109 · 2026-06-30T08:49:58Z

+VARCHAR;
+
+# dialect: databricks
+REGR_AVGX(tbl.double_col, tbl.double_col);


Let's also add tests covering the ALL and DISTINCT variants, and REGR_AVGX(...) OVER (PARTITION BY 1).
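
One way to draft the requested cases is to generate them from the existing fixture line. This is a sketch under assumptions: the layout (dialect comment, SQL line, expected type line) is inferred from the diff excerpt above, DOUBLE is taken from the PR summary, and the helper `fixture_entry` is hypothetical. Whether each variant parses and annotates as expected has not been verified here.

```python
# Arguments reused from the existing fixture line in the diff.
COLUMN_ARGS = "tbl.double_col, tbl.double_col"

# The three variants requested in the review comment.
variants = [
    f"REGR_AVGX(ALL {COLUMN_ARGS})",
    f"REGR_AVGX(DISTINCT {COLUMN_ARGS})",
    f"REGR_AVGX({COLUMN_ARGS}) OVER (PARTITION BY 1)",
]


def fixture_entry(sql: str, expected_type: str = "DOUBLE") -> str:
    """Format one entry in the assumed layout: dialect comment, SQL, expected type."""
    return f"# dialect: databricks\n{sql};\n{expected_type};\n"


fixture_text = "\n".join(fixture_entry(sql) for sql in variants)
print(fixture_text)
```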

geooo109 · 2026-06-30T09:03:20Z

+class RegexpSubstr(Expression, Func):
+    arg_types = {"this": True, "expression": True}


Can you check whether we can reuse exp.RegexpExtract for this? The default group should be 0 here, so compare the semantics of regexp_extract and regexp_substr: if they match (for group 0), we can reuse the existing expression. If they don't, let's keep this addition and add round-trip tests.

Minimal example:

SELECT regexp_extract('Order: 100-200', '(\\d+)-(\\d+)', 0) > 100-200
SELECT regexp_substr('Order: 100-200', '(\\d+)-(\\d+)') > 100-200
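
The group-0 convention behind this comparison can be illustrated with Python's `re` module. This is only an analogy: Python's regex engine is not the one Databricks uses, so it shows the group-numbering idea rather than confirming Databricks behavior. The pattern is written as a raw string, so the doubled backslashes from the SQL literals are not needed.

```python
import re

text = "Order: 100-200"
pattern = r"(\d+)-(\d+)"

match = re.search(pattern, text)
assert match is not None

# Group 0 is the entire matched substring, which is what a
# "substring matching the pattern" function would return.
whole_match = match.group(0)

# Groups 1 and 2 are the individual capture groups.
first_group = match.group(1)
second_group = match.group(2)

print(whole_match)   # 100-200
print(first_group)   # 100
print(second_group)  # 200
```

If regexp_substr always corresponds to group 0 of regexp_extract, the two calls in the minimal example are interchangeable, which is the condition for reusing the existing expression.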

feat(typing): add databricks type inference for REGEXP_INSTR, REGEXP_LIKE, REGEXP_SUBSTR, REGR_AVGX

a5e24b3

fivetran-amrutabhimsenayachit self-assigned this Jun 29, 2026

fix(typing): use dedicated RegexpSubstr class to preserve REGEXP_SUBSTR round-trip in databricks

e17246d

geooo109 self-assigned this Jun 30, 2026

geooo109 reviewed Jun 30, 2026

fivetran-amrutabhimsenayachit wants to merge 2 commits into main from type-inference-batch-2
