GSP SQLFlow
Works alongside DataHub · pip install · 60 seconds

DataHub's SQL parser silently drops lineage. Across every warehouse.

sqlglot falls back to an opaque Command node on procedural SQL — stored procedures, control flow, dynamic SQL — losing column-level lineage without warning. gsp-datahub-sidecar recovers it.

340+
SQL/lineage issues tracked across ecosystems
8
Warehouse dialects covered
25
Column-level relationships recovered in the demo query
The problem

Your lineage graph is lying to you.

DataHub uses sqlglot to parse SQL. When it hits procedural constructs — stored procedures, control flow, dynamic SQL — it silently falls back to an opaque Command node. Zero lineage extracted. Zero warnings.

This isn't a BigQuery-only gap. The same silent fallback happens across Snowflake, SQL Server, Oracle, Databricks, and every dialect with procedural SQL.

-- Real query from DataHub issue #11654
DECLARE cutoff_date DATE;
IF condition THEN
  CREATE TEMP TABLE stg AS SELECT * FROM source_table;
  INSERT INTO final_output SELECT * FROM stg;
END IF;
sqlglot result: Command('DECLARE cutoff_date...') — 0 lineage edges
-- Real query from DataHub issue #11251
SELECT upper(cs.customercode) AS customercode
     , cs.ear2id, db.branch_rollup_name
FROM dim_customer cs --join ... (commented out)
JOIN ref_branch db ON ...
WHERE cs.status = 1 --active
Without #(lf) decoding: only 1 upstream table found; the JOINs after the -- comment are dropped
-- Snowflake stored procedure
CREATE PROCEDURE load_daily()
RETURNS STRING
LANGUAGE SQL
AS
$$
  INSERT INTO analytics.daily_summary
  SELECT region, SUM(revenue) FROM raw.transactions GROUP BY region;
$$;
sqlglot result: Command('CREATE PROCEDURE...') — 0 lineage edges
-- SQL Server stored procedure
CREATE PROCEDURE usp_UpdateOrders AS
BEGIN
  BEGIN TRY
    INSERT INTO dbo.order_archive
    SELECT * FROM dbo.orders WHERE status = 'closed';
  END TRY
  BEGIN CATCH ... END CATCH
END;
sqlglot result: Command('CREATE PROCEDURE...') — 0 lineage edges
-- Oracle PL/SQL package body
CREATE PACKAGE BODY etl_pkg AS
  PROCEDURE transform IS
  BEGIN
    INSERT INTO dim_customer
    SELECT id, name, region FROM stg_customer WHERE active = 1;
  END;
END etl_pkg;
sqlglot result: Command('CREATE PACKAGE...') — 0 lineage edges
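The comment-swallowing case above is a pre-processing gap rather than a parser gap: SQL exported from Power Query embeds #(lf) escape sequences instead of real newlines, so a trailing -- comment swallows every JOIN that follows. A minimal decoder sketch (decode_m_escapes is a hypothetical helper name; the sidecar's actual pre-processing may differ):

```python
def decode_m_escapes(sql: str) -> str:
    """Turn Power Query M escape sequences (#(lf), #(cr), #(tab)) back into
    real whitespace, so a trailing -- comment ends at its own line instead
    of swallowing the JOINs on the lines that follow."""
    return (
        sql.replace("#(lf)", "\n")
           .replace("#(cr)", "\r")
           .replace("#(tab)", "\t")
    )
```

Running it over the issue #11251 query restores the line breaks, so the JOIN after the commented-out hint is parsed and both upstream tables are found.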
The proof

Same query. Dramatically different results.

DataHub default (sqlglot)
0
column-level relationships
sqlglot fallback
With gsp-datahub-sidecar
25
column-level relationships
GSP SQLFlow engine

Same 4 tables. Same query. 0 vs 25 column-level relationships.

Same pattern, any dialect — the gap is universal.

How it works

3 steps. 60 seconds.

1

Install

One pip command. No Docker, no infra changes, no DataHub plugins.

pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git
2

Detect & Re-parse

The sidecar identifies every SQL statement where sqlglot fell back to a Command node, then re-parses with the GSP engine.

3

Emit to DataHub

Column-level lineage is emitted back into DataHub via the GMS API. Your lineage graph is complete — no fork, no redeploy.

gsp-sidecar emit --gms-url http://localhost:8080
Choose your backend

Three backends. Pick your comfort level.

Every backend uses the same GSP SQLFlow engine. The only difference is where SQL gets parsed.

Anonymous

Free · No signup
  • Cloud-parsed, not logged
  • Rate-limited (fair use)
  • Great for evaluation
Get started →

Authenticated

Free API key · Higher limits
  • Cloud-parsed, not logged
  • Higher rate limits with a personal API key

Self-Hosted

On-premise · SQLFlow license
  • SQL never leaves your network
  • No rate limits
  • Full audit trail
  • Enterprise support
Talk to us →
FAQ

Common questions.

Which SQL databases are supported?

BigQuery is live today with a full deep-dive page. Snowflake, Databricks, SQL Server, Oracle, Hive, Spark SQL, and DB2 are coming soon. The underlying GSP SQLFlow engine already parses all 8 dialects (and 20+ more) — the sidecar integration is what we're shipping per dialect. Email us if you need a specific dialect prioritized.

Does this replace DataHub's lineage parser?

No. The sidecar complements DataHub's existing parser. DataHub still runs sqlglot for standard SQL — it handles straightforward queries well. The sidecar only re-parses statements where sqlglot falls back to a Command node, recovering the lineage that would otherwise be silently lost.

How is the sidecar different from what DataHub already does?

DataHub's native SQL parser (sqlglot) handles standard SELECT/INSERT/UPDATE queries. But procedural SQL — stored procedures, DECLARE blocks, TRY/CATCH, dynamic SQL, temp tables — produces an opaque Command node with zero lineage. The sidecar detects those gaps and re-parses with the GSP engine, which fully understands procedural constructs across all supported dialects.

Is my SQL sent to a third party?

Depends on your backend. Anonymous and Authenticated modes parse SQL via Gudu Software's cloud API (processed in memory, never logged or stored). Self-hosted mode runs the GSP engine on your infrastructure — SQL never leaves your network. Choose Self-hosted for regulated or sensitive environments.

What's the licensing model?

The sidecar tool itself is open source (Apache 2.0). The Anonymous backend is free with fair-use rate limits. The Authenticated backend provides higher limits with a personal API key. Self-hosted deployments require a commercial SQLFlow license — contact us for pricing.

When will Snowflake / Databricks / other dialects be available?

We're shipping dialect-specific sidecar integrations based on community demand. Snowflake and Databricks are the highest priority (Tier 1). If you need a specific dialect, email support@gudusoft.com — every email directly influences our roadmap sequencing.

Install in 60 seconds. See what you've been missing.

One command. Every missing column-level relationship back in DataHub.

$ pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git

Open source on GitHub · Apache 2.0 license · Read the deep dive blog post