Open source · pip install · 60 seconds

Stop losing BigQuery lineage in DataHub.

Lineage from procedural BigQuery SQL (DECLARE, IF/END IF, CALL, CREATE TEMP TABLE) silently disappears from DataHub's lineage graph. One pip install brings it back.

Fix it in 60 seconds → See broken vs fixed ↓

0
Relationships found by sqlglot

11
Column-level relationships recovered
(plus 2 table-level)

20+
SQL dialects supported

Works with: DataHub, BigQuery, dbt, Airflow, Snowflake

The problem

Your lineage graph is lying to you.

DataHub uses sqlglot to parse SQL. When sqlglot hits procedural constructs (stored procedures, control flow, dynamic SQL), it silently falls back to an opaque Command node. Zero lineage extracted. Zero warnings.

In many BigQuery warehouses, the most business-critical transformations live in exactly these scripts, and they are invisible in the lineage graph. You see empty lineage panels and assume "it's fine."

It's not.

-- Real query from DataHub issue #11654
DECLARE cutoff_date DATE;
IF condition THEN
  CREATE TEMP TABLE stg AS
  SELECT * FROM source_table;
  INSERT INTO final_output
  SELECT * FROM stg;
END IF;

sqlglot result: Command('DECLARE cutoff_date...') — 0 lineage edges extracted
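A quick way to see which statements in a script fall into this category is a keyword check on each statement's first words. The sketch below is a hypothetical heuristic: the function names and the keyword list are assumptions for illustration, not DataHub, sqlglot, or sidecar code, and the naive split on semicolons ignores string literals and comments.

```python
import re

# Hypothetical list of leading keywords for BigQuery procedural statements.
# This is an illustrative assumption, not taken from DataHub, sqlglot,
# or gsp-datahub-sidecar.
PROCEDURAL_PREFIXES = (
    r"DECLARE\b",
    r"IF\b",
    r"END\s+IF\b",
    r"CALL\b",
    r"CREATE\s+(?:OR\s+REPLACE\s+)?TEMP(?:ORARY)?\s+TABLE\b",
    r"EXECUTE\s+IMMEDIATE\b",
)
_PATTERN = re.compile(r"^\s*(?:" + "|".join(PROCEDURAL_PREFIXES) + ")", re.IGNORECASE)


def split_statements(script: str) -> list[str]:
    """Naively split a script on semicolons (ignores quotes and comments)."""
    return [part.strip() for part in script.split(";") if part.strip()]


def looks_procedural(statement: str) -> bool:
    """Return True if the statement starts with a procedural keyword."""
    return bool(_PATTERN.match(statement))


if __name__ == "__main__":
    reproducer = """
DECLARE cutoff_date DATE;
IF condition THEN
  CREATE TEMP TABLE stg AS
  SELECT * FROM source_table;
  INSERT INTO final_output
  SELECT * FROM stg;
END IF;
"""
    for stmt in split_statements(reproducer):
        flag = "procedural" if looks_procedural(stmt) else "standard"
        # Print only the first line of each statement for readability.
        print(f"{flag:>10}: {stmt.splitlines()[0]}")
```

On the reproducer above, the DECLARE, IF, and END IF statements are flagged but the INSERT is not, even though it runs inside the IF block. A keyword filter sees statements, not control-flow structure, which is why recovering that INSERT's lineage takes a parser that understands the whole script.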

The proof

Same query. Dramatically different results.

DataHub default (sqlglot): 0 column-level relationships (sqlglot Command fallback)

With gsp-datahub-sidecar: 11 column-level relationships + 2 table-level (GSP SQLFlow engine)

Same 4 tables. Same query. 0 vs 11 column-level relationships, plus 2 table-level.

How it works

3 steps. 60 seconds.

Install

One pip command. No Docker, no infra changes, no DataHub plugins.

pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git

Run

Point the sidecar at your DataHub GMS. It automatically re-parses the SQL statements sqlglot couldn't handle and sends the recovered lineage to DataHub.

gsp-sidecar emit --gms-url http://localhost:8080

See the lineage

Open DataHub. The missing column-level relationships are back: stored procedures, temp tables, and control flow are all visible.

Choose your backend

Three backends. Pick your comfort level.

Every backend uses the same GSP SQLFlow engine. The only difference is where SQL gets parsed.

Anonymous

Free · No signup

Cloud-parsed, not logged
Rate-limited (fair use)
Great for evaluation

Get started →

Authenticated

API key · Higher limits

Personal API key
Higher per-minute quota
Usage dashboard
Priority processing

Get API key →

Self-Hosted

On-premise · SQLFlow license

SQL never leaves your network
No rate limits
Full audit trail
Enterprise support

Talk to us →

FAQ

Common questions.

Does this replace DataHub's lineage parser?

No. The sidecar augments DataHub's existing parser. DataHub still runs sqlglot for standard SQL. The sidecar only re-parses statements that sqlglot couldn't handle: procedural constructs such as DECLARE, IF/END IF, CALL, and CREATE TEMP TABLE.
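The "augment, don't replace" behavior amounts to a per-statement fallback: keep whatever the primary parser returns, and hand a statement to the second engine only when the primary gave up on it. The sketch below models that routing with plain functions so it runs without sqlglot or a GSP backend; LineageEdge, merge_lineage, and the toy parsers are hypothetical names, not the sidecar's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class LineageEdge:
    """Hypothetical column-level edge: upstream column -> downstream column."""
    source: str
    target: str


# A parser maps one SQL statement to its lineage edges,
# or returns None when it could not parse the statement at all.
Parser = Callable[[str], Optional[list[LineageEdge]]]


def merge_lineage(statements: list[str], primary: Parser, fallback: Parser) -> list[list[LineageEdge]]:
    """Use the primary parser's result; call the fallback only where the primary gave up."""
    merged: list[list[LineageEdge]] = []
    for stmt in statements:
        edges = primary(stmt)
        if edges is None:  # e.g. an opaque Command with no structure to walk
            edges = fallback(stmt)
        merged.append(edges if edges is not None else [])
    return merged


if __name__ == "__main__":
    def toy_primary(sql: str) -> Optional[list[LineageEdge]]:
        # Stand-in for a standard-SQL parser: gives up on DECLARE scripts.
        if sql.lstrip().upper().startswith("DECLARE"):
            return None
        return [LineageEdge("src.a", "dst.a")]

    def toy_fallback(sql: str) -> Optional[list[LineageEdge]]:
        # Stand-in for a procedural-SQL engine; always returns one fixed edge.
        return [LineageEdge("source_table.id", "final_output.id")]

    result = merge_lineage(
        ["INSERT INTO dst SELECT a FROM src", "DECLARE cutoff_date DATE"],
        toy_primary,
        toy_fallback,
    )
    for edges in result:
        print(edges)
```

Returning None (rather than an empty list) for "could not parse" matters here: a statement such as SELECT 1 legitimately has no lineage and should not be re-sent to the fallback engine.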

Which SQL dialects are supported?

BigQuery is the primary focus. The GSP SQLFlow engine also supports SQL Server, Oracle, PostgreSQL, Snowflake, Redshift, Teradata, and other dialects, 20+ in total. If your warehouse uses procedural SQL in one of these dialects, the sidecar can likely parse it.

Is my SQL sent to a third party?

It depends on the backend you choose. Anonymous and Authenticated modes parse SQL in Gudu Software's cloud (processed in memory, not logged or stored). Self-hosted mode keeps everything on your infrastructure; SQL never leaves your network.

How long does installation take?

Under 60 seconds. Install with pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git, then run gsp-sidecar emit with --gms-url pointed at your DataHub GMS endpoint. No Docker, no Kubernetes, no DataHub plugins to install.

How is this different from sqlglot?

sqlglot handles standard SQL well but silently drops procedural constructs: it falls back to an opaque Command node with zero lineage. The GSP engine parses the full procedural SQL, including control flow, temp tables, and dynamic SQL. On the reproducer from DataHub issue #11654, sqlglot finds 0 relationships; GSP finds 11 column-level and 2 table-level.

What does it cost?

The sidecar tool is open source (Apache 2.0). The Anonymous backend is free with fair-use rate limits. Authenticated and Self-hosted backends have separate pricing — contact us for details.

Recover your BigQuery lineage.

One command. Every missing column-level relationship back in DataHub.

pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git

Get API key → Book a demo →

Open source on GitHub · Apache 2.0 license
