GSP SQLFlow
Works alongside DataHub · pip install · 60 seconds

DataHub's SQL parser silently drops lineage. Across every warehouse.

sqlglot falls back to an opaque Command node on procedural SQL — stored procedures, control flow, dynamic SQL — losing column-level lineage without warning. gsp-datahub-sidecar recovers it.

340+
SQL/lineage issues tracked across ecosystems
8
Warehouse dialects covered
25
Column-level relationships recovered in the demo query
The problem

Your lineage graph is lying to you.

DataHub uses sqlglot to parse SQL. When it hits procedural constructs — stored procedures, control flow, dynamic SQL — it silently falls back to an opaque Command node. Zero lineage extracted. Zero warnings.

This isn't a BigQuery-only gap. The same silent fallback happens across Snowflake, SQL Server, Oracle, Databricks, and every dialect with procedural SQL.

-- Real query from DataHub issue #11654
DECLARE cutoff_date DATE;
IF condition THEN
  CREATE TEMP TABLE stg AS SELECT * FROM source_table;
  INSERT INTO final_output SELECT * FROM stg;
END IF;
sqlglot result: Command('DECLARE cutoff_date...') — 0 lineage edges
-- Real query from DataHub issue #11251
SELECT upper(cs.customercode) AS customercode
     , cs.ear2id, db.branch_rollup_name
FROM dim_customer cs --join ... (commented out)
JOIN ref_branch db ON ...
WHERE cs.status = 1 --active
Without #(lf) decoding: only 1 upstream table found; the JOINs after the -- comment are dropped
-- Snowflake stored procedure
CREATE PROCEDURE load_daily()
RETURNS STRING
LANGUAGE SQL
AS
$$
  INSERT INTO analytics.daily_summary
  SELECT region, SUM(revenue) FROM raw.transactions GROUP BY region;
$$;
sqlglot result: Command('CREATE PROCEDURE...') — 0 lineage edges
-- SQL Server stored procedure
CREATE PROCEDURE usp_UpdateOrders AS
BEGIN
  BEGIN TRY
    INSERT INTO dbo.order_archive
    SELECT * FROM dbo.orders WHERE status = 'closed';
  END TRY
  BEGIN CATCH ... END CATCH
END;
sqlglot result: Command('CREATE PROCEDURE...') — 0 lineage edges
-- Oracle PL/SQL package body
CREATE PACKAGE BODY etl_pkg AS
  PROCEDURE transform IS
  BEGIN
    INSERT INTO dim_customer
    SELECT id, name, region FROM stg_customer WHERE active = 1;
  END;
END etl_pkg;
sqlglot result: Command('CREATE PACKAGE...') — 0 lineage edges
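The comment-swallowing case above is a pre-processing gap rather than a parser gap: SQL exported from Power Query embeds #(lf) escape sequences instead of real newlines, so a trailing -- comment swallows every JOIN that follows. A minimal decoder sketch (decode_m_escapes is a hypothetical helper name; the sidecar's actual pre-processing may differ):

```python
def decode_m_escapes(sql: str) -> str:
    """Turn Power Query M escape sequences (#(lf), #(cr), #(tab)) back into
    real whitespace, so a trailing -- comment ends at its own line instead
    of swallowing the JOINs on the lines that follow."""
    return (
        sql.replace("#(lf)", "\n")
           .replace("#(cr)", "\r")
           .replace("#(tab)", "\t")
    )
```

Running it over the issue #11251 query restores the line breaks, so the JOIN after the commented-out hint is parsed and both upstream tables are found.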
The proof

Same query. Dramatically different results.

DataHub default (sqlglot)
0
column-level relationships
sqlglot fallback
With gsp-datahub-sidecar
25
column-level relationships
GSP SQLFlow engine

Same 4 tables. Same query. 0 vs 25 column-level relationships.

Same pattern, any dialect — the gap is universal.

How it works

3 steps. 60 seconds.

1

Install

One pip command. No Docker, no infra changes, no DataHub plugins.

pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git
2

Detect & Re-parse

The sidecar identifies every SQL statement where sqlglot fell back to a Command node, then re-parses with the GSP engine.

3

Emit to DataHub

Column-level lineage is emitted back into DataHub via the GMS API. Your lineage graph is complete — no fork, no redeploy.

gsp-sidecar emit --gms-url http://localhost:8080
Choose your backend

Three backends. Pick your comfort level.

Every backend uses the same GSP SQLFlow engine. The only difference is where SQL gets parsed.

Anonymous

Free · No signup
  • Cloud-parsed, not logged
  • Rate-limited (fair use)
  • Great for evaluation
Get started →

Authenticated

Free API key · Higher limits
  • Cloud-parsed, not logged
  • Higher rate limits with a personal API key

Self-Hosted

On-premise · SQLFlow license
  • SQL never leaves your network
  • No rate limits
  • Full audit trail
  • Enterprise support
Talk to us →
FAQ

Common questions.

Which SQL databases are supported?

BigQuery is live today with a full deep-dive page. Snowflake, Databricks, SQL Server, Oracle, Hive, Spark SQL, and DB2 are coming soon. The underlying GSP SQLFlow engine already parses all 8 dialects (and 20+ more) — the sidecar integration is what we're shipping per dialect. Email us if you need a specific dialect prioritized.

Does this replace DataHub's lineage parser?

No. The sidecar complements DataHub's existing parser. DataHub still runs sqlglot for standard SQL — it handles straightforward queries well. The sidecar only re-parses statements where sqlglot falls back to a Command node, recovering the lineage that would otherwise be silently lost.

How is the sidecar different from what DataHub already does?

DataHub's native SQL parser (sqlglot) handles standard SELECT/INSERT/UPDATE queries. But procedural SQL — stored procedures, DECLARE blocks, TRY/CATCH, dynamic SQL, temp tables — produces an opaque Command node with zero lineage. The sidecar detects those gaps and re-parses with the GSP engine, which fully understands procedural constructs across all supported dialects.

Is my SQL sent to a third party?

Depends on your backend. Anonymous and Authenticated modes parse SQL via Gudu Software's cloud API (processed in memory, never logged or stored). Self-hosted mode runs the GSP engine on your infrastructure — SQL never leaves your network. Choose Self-hosted for regulated or sensitive environments.

What's the licensing model?

The sidecar tool itself is open source (Apache 2.0). The Anonymous backend is free with fair-use rate limits. The Authenticated backend provides higher limits with a personal API key. Self-hosted deployments require a commercial SQLFlow license — contact us for pricing.

When will Snowflake / Databricks / other dialects be available?

We're shipping dialect-specific sidecar integrations based on community demand. Snowflake and Databricks are the highest priority (Tier 1). If you need a specific dialect, email support@gudusoft.com — every email directly influences our roadmap sequencing.

Install in 60 seconds. See what you've been missing.

One command. Every missing column-level relationship back in DataHub.

$ pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git

Open source on GitHub · Apache 2.0 license · Read the deep dive blog post