GSP SQLFlow
Works alongside DataHub · pip install · 60 seconds

DataHub's SQL parser silently drops lineage. Across every warehouse.

sqlglot falls back to an opaque Command node on procedural SQL — stored procedures, control flow, dynamic SQL — losing column-level lineage without warning. gsp-datahub-sidecar recovers it.

  • 340+ SQL/lineage issues tracked across ecosystems
  • 8 warehouse dialects covered
  • 25 relationships recovered per query
The problem

Your lineage graph is lying to you.

DataHub uses sqlglot to parse SQL. When it hits procedural constructs — stored procedures, control flow, dynamic SQL — it silently falls back to an opaque Command node. Zero lineage extracted. Zero warnings.

This isn't a BigQuery-only gap. The same silent fallback happens across Snowflake, SQL Server, Oracle, Databricks, and every dialect with procedural SQL.

-- Real query from DataHub issue #11654
DECLARE cutoff_date DATE;
IF condition THEN
  CREATE TEMP TABLE stg AS
  SELECT * FROM source_table;
  INSERT INTO final_output
  SELECT * FROM stg;
END IF;
sqlglot result: Command('DECLARE cutoff_date...') — 0 lineage edges
-- Real query from DataHub issue #11251
SELECT upper(cs.customercode) AS customercode
     , cs.ear2id
     , db.branch_rollup_name
FROM dim_customer cs
--join ... (commented out)
JOIN ref_branch db ON ...
WHERE cs.status = 1 --active
Without #(lf) decoding: 1 upstream table — JOINs after -- comment dropped
-- Snowflake stored procedure
CREATE PROCEDURE load_daily()
RETURNS STRING
LANGUAGE SQL
AS $$
  INSERT INTO analytics.daily_summary
  SELECT region, SUM(revenue) FROM raw.transactions GROUP BY region;
$$;
sqlglot result: Command('CREATE PROCEDURE...') — 0 lineage edges
-- SQL Server stored procedure
CREATE PROCEDURE usp_UpdateOrders AS
BEGIN
  BEGIN TRY
    INSERT INTO dbo.order_archive
    SELECT * FROM dbo.orders WHERE status = 'closed';
  END TRY
  BEGIN CATCH
    ...
  END CATCH
END;
sqlglot result: Command('CREATE PROCEDURE...') — 0 lineage edges
-- Oracle PL/SQL package body
CREATE PACKAGE BODY etl_pkg AS
  PROCEDURE transform IS
  BEGIN
    INSERT INTO dim_customer
    SELECT id, name, region FROM stg_customer WHERE active = 1;
  END;
END etl_pkg;
sqlglot result: Command('CREATE PACKAGE...') — 0 lineage edges
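All four snippets hit the same trigger: procedural keywords that sqlglot's statement parser doesn't model. If you want a rough sense of how much of your own SQL corpus is affected before installing anything, a keyword heuristic gives a first estimate. This is a hand-rolled sketch with illustrative names (`PROCEDURAL_MARKERS`, `estimate_at_risk` are ours, not the sidecar's API); the sidecar itself detects fallbacks precisely by inspecting the parse output rather than guessing from keywords.

```python
import re

# Hypothetical heuristic, not the sidecar's detector: keywords that commonly
# push sqlglot into its opaque Command fallback across dialects.
PROCEDURAL_MARKERS = re.compile(
    r"\b(DECLARE|BEGIN|END\s+IF|CREATE\s+(OR\s+REPLACE\s+)?(PROCEDURE|PACKAGE)"
    r"|BEGIN\s+TRY|EXECUTE\s+IMMEDIATE)\b",
    re.IGNORECASE,
)

def likely_command_fallback(sql: str) -> bool:
    """Rough guess at whether a statement would lose lineage under sqlglot."""
    return bool(PROCEDURAL_MARKERS.search(sql))

def estimate_at_risk(corpus: list[str]) -> float:
    """Fraction of statements in a corpus that look procedural."""
    if not corpus:
        return 0.0
    return sum(likely_command_fallback(s) for s in corpus) / len(corpus)
```

Run it over an export of your warehouse query history: any statement this flags is a candidate for zero-lineage silent loss.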
The proof

Same query. Dramatically different results.

DataHub default (sqlglot): 0 column-level relationships (sqlglot fallback)
With gsp-datahub-sidecar: 25 column-level relationships (GSP SQLFlow engine)

Same 4 tables. Same query. 0 vs 25 column-level relationships.

Same pattern, any dialect — the gap is universal.

How it works

3 steps. 60 seconds.

1. Install

One pip command. No Docker, no infra changes, no DataHub plugins.

pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git
2. Detect & Re-parse

The sidecar identifies every SQL statement where sqlglot fell back to a Command node, then re-parses with the GSP engine.
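In code, the routing logic of this step is simple. The sketch below is ours, not the sidecar's real API: `ParseResult`, `sqlglot_parse`, and `gsp_parse` are stand-in names for the two parser passes. Every statement goes through the default parser first; only Command fallbacks get the second, targeted re-parse.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ParseResult:
    # Illustrative shape: did the parser give up, and what edges did it find?
    is_command_fallback: bool
    edges: list = field(default_factory=list)  # (upstream_col, downstream_col)

def recover_lineage(statements: list[str],
                    sqlglot_parse: Callable[[str], ParseResult],
                    gsp_parse: Callable[[str], ParseResult]) -> list:
    edges = []
    for sql in statements:
        result = sqlglot_parse(sql)        # DataHub's default pass
        if result.is_command_fallback:     # opaque Command node: no lineage
            result = gsp_parse(sql)        # targeted re-parse with GSP
        edges.extend(result.edges)
    return edges
```

The key design point: standard SQL never leaves the default path, so the sidecar adds work only where sqlglot already gave up.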

3. Emit to DataHub

Column-level lineage is emitted back into DataHub via the GMS API. Your lineage graph is complete — no fork, no redeploy.

gsp-sidecar emit --gms-url http://localhost:8080
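Under the hood, the emit step boils down to posting an `upstreamLineage` aspect to GMS as a MetadataChangeProposal. The sketch below builds such a payload with only the standard library; the helper names are illustrative, the structure follows DataHub's REST `ingestProposal` endpoint (check your DataHub version's API docs for the exact aspect schema), and the real sidecar CLI assembles and batches these for you.

```python
import json
import urllib.request

def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """DataHub dataset URN, e.g. urn:li:dataset:(urn:li:dataPlatform:bigquery,p.d.t,PROD)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

def build_lineage_proposal(downstream: str, upstreams: list[str]) -> dict:
    """MetadataChangeProposal upserting an upstreamLineage aspect (illustrative)."""
    aspect = {
        "upstreams": [{"dataset": urn, "type": "TRANSFORMED"} for urn in upstreams]
    }
    return {
        "proposal": {
            "entityType": "dataset",
            "entityUrn": downstream,
            "changeType": "UPSERT",
            "aspectName": "upstreamLineage",
            "aspect": {
                "contentType": "application/json",
                "value": json.dumps(aspect),  # aspect value is a JSON string
            },
        }
    }

def emit(gms_url: str, proposal: dict) -> None:
    """POST the proposal to GMS (network call; shown for shape only)."""
    req = urllib.request.Request(
        f"{gms_url}/aspects?action=ingestProposal",
        data=json.dumps(proposal).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)
```

Because this is plain GMS ingestion, nothing about your DataHub deployment changes: the recovered edges land in the same graph, through the same API, as everything else.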
Choose your backend

Three backends. Pick your comfort level.

Every backend uses the same GSP SQLFlow engine. The only difference is where SQL gets parsed.

Anonymous

Free · No signup
  • Cloud-parsed, not logged
  • Rate-limited (fair use)
  • Great for evaluation
Get started →

Authenticated

Free · Personal API key
  • Cloud-parsed, not logged
  • Higher rate limits
Get started →

Self-Hosted

On-premise · SQLFlow license
  • SQL never leaves your network
  • No rate limits
  • Full audit trail
  • Enterprise support
Talk to us →
FAQ

Common questions.

Which SQL databases are supported?

BigQuery and Power BI deep dives are live today. For Snowflake, Databricks, SQL Server, Oracle, Hive, Spark SQL, and DB2, the underlying GSP SQLFlow engine already parses the dialects; contact us to prioritize a DataHub sidecar integration for your warehouse and SQL corpus.

Does this replace DataHub's lineage parser?

No. The sidecar complements DataHub's existing parser. DataHub still runs sqlglot for standard SQL — it handles straightforward queries well. The sidecar only re-parses statements where sqlglot falls back to a Command node, recovering the lineage that would otherwise be silently lost.

How is the sidecar different from what DataHub already does?

DataHub's native SQL parser (sqlglot) handles standard SELECT/INSERT/UPDATE queries. But procedural SQL — stored procedures, DECLARE blocks, TRY/CATCH, dynamic SQL, temp tables — produces an opaque Command node with zero lineage. The sidecar detects those gaps and re-parses with the GSP engine, which fully understands procedural constructs across all supported dialects.

Is my SQL sent to a third party?

Depends on your backend. Anonymous and Authenticated modes parse SQL via Gudu Software's cloud API (processed in memory, never logged or stored). Self-hosted mode runs the GSP engine on your infrastructure — SQL never leaves your network. Choose Self-hosted for regulated or sensitive environments.

What's the licensing model?

The sidecar tool itself is open source (Apache 2.0). The Anonymous backend is free with fair-use rate limits. The Authenticated backend provides higher limits with a personal API key. Self-hosted deployments require a commercial SQLFlow license — contact us for pricing.

When will Snowflake / Databricks / other dialects be available?

We're shipping dialect-specific sidecar integrations based on community demand. Snowflake and Databricks are the highest priority (Tier 1). If you need a specific dialect, email support@gudusoft.com — every email directly influences our roadmap sequencing.

Install in 60 seconds. See what you've been missing.

One command. Every missing column-level relationship back in DataHub.

$ pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git

Open source on GitHub · Apache 2.0 license · Read the deep dive blog post