Stop losing BigQuery lineage in DataHub.
Procedural SQL (DECLARE, IF/END IF, CALL, CREATE TEMP TABLE) silently disappears from DataHub's lineage graph. One pip install brings it all back.
11 column-level relationships recovered on the DataHub #11654 reproducer (plus 2 table-level)
Your lineage graph is lying to you.
DataHub uses sqlglot to parse SQL. When sqlglot hits procedural constructs (stored procedures, control flow, dynamic SQL), it silently falls back to an opaque Command node. Zero lineage extracted. Zero warnings.
Your BigQuery warehouse's most business-critical queries are invisible in the lineage graph. You see empty panels and assume "it's fine."
It's not.
DECLARE cutoff_date DATE;
IF condition THEN
  CREATE TEMP TABLE stg AS
  SELECT * FROM source_table;
  INSERT INTO final_output
  SELECT * FROM stg;
END IF;
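To get a rough sense of how many of your own scripts contain the constructs listed above, a quick scan along these lines may help. It is only a regex heuristic built from the keywords named on this page. It is not how DataHub or the sidecar decide which statements failed, and the names `PROCEDURAL_PATTERNS` and `find_procedural_constructs` are made up for this sketch.

```python
import re
from typing import List

# Procedural constructs named on this page. The patterns are a rough
# heuristic (assumption): they can also match keywords inside comments
# or string literals.
PROCEDURAL_PATTERNS = {
    "DECLARE": re.compile(r"^\s*DECLARE\b", re.IGNORECASE | re.MULTILINE),
    "IF/END IF": re.compile(r"\bEND\s+IF\b", re.IGNORECASE),
    "CALL": re.compile(r"^\s*CALL\b", re.IGNORECASE | re.MULTILINE),
    "CREATE TEMP TABLE": re.compile(
        r"\bCREATE\s+(?:OR\s+REPLACE\s+)?TEMP(?:ORARY)?\s+TABLE\b",
        re.IGNORECASE,
    ),
}


def find_procedural_constructs(sql: str) -> List[str]:
    """Return the names of the procedural constructs found in a script."""
    return [
        name
        for name, pattern in PROCEDURAL_PATTERNS.items()
        if pattern.search(sql)
    ]


if __name__ == "__main__":
    script = """
    DECLARE cutoff_date DATE;
    IF condition THEN
      CREATE TEMP TABLE stg AS
      SELECT * FROM source_table;
      INSERT INTO final_output
      SELECT * FROM stg;
    END IF;
    """
    # Lists which of the four constructs appear in the example script.
    print(find_procedural_constructs(script))
```

A script that returns a non-empty list is a candidate for missing lineage. Whether lineage is actually missing still has to be checked in DataHub.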
Same query. Dramatically different results.
Same 4 tables. Same query. DataHub's default parser extracts 0 relationships; the sidecar recovers 11 column-level relationships plus 2 table-level.
3 steps. 60 seconds.
Install
One pip command. No Docker, no infra changes, no DataHub plugins.
pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git
Run
Point the sidecar at your DataHub GMS. It re-parses every failed SQL statement automatically.
gsp-sidecar emit --gms-url http://localhost:8080
See the lineage
Open DataHub. Every column-level relationship is back. Stored procedures, temp tables, control flow — all visible.
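If you would rather trigger the sidecar from a scheduled Python job than from a shell, a thin wrapper might look like the sketch below. Only the `gsp-sidecar emit --gms-url` invocation comes from this page. The function names are hypothetical, and the sketch assumes the CLI follows the usual convention of a nonzero exit code on failure.

```python
import shutil
import subprocess
from typing import List


def build_emit_command(gms_url: str) -> List[str]:
    """Build the argument list for the emit command shown in the Run step."""
    return ["gsp-sidecar", "emit", "--gms-url", gms_url]


def run_sidecar(gms_url: str) -> int:
    """Run the sidecar once and return its exit code.

    Assumption: a nonzero exit code signals failure; this page does not
    document the CLI's exit codes.
    """
    if shutil.which("gsp-sidecar") is None:
        raise FileNotFoundError(
            "gsp-sidecar is not on PATH; run the pip install step first"
        )
    completed = subprocess.run(build_emit_command(gms_url), check=False)
    return completed.returncode
```

Calling `run_sidecar("http://localhost:8080")` from a scheduler task mirrors the command in the Run step. Passing the arguments as a list avoids shell quoting problems if the GMS URL contains special characters.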
Three backends. Pick your comfort level.
Every backend uses the same GSP SQLFlow engine. The only difference is where SQL gets parsed.
Anonymous
- Cloud-parsed, not logged
- Rate-limited (fair use)
- Great for evaluation
Authenticated
- Personal API key
- Higher per-minute quota
- Usage dashboard
- Priority processing
Self-Hosted
- SQL never leaves your network
- No rate limits
- Full audit trail
- Enterprise support
Common questions.
Does this replace DataHub's lineage parser?
No. The sidecar augments DataHub's existing parser. DataHub still runs sqlglot for standard SQL. The sidecar only re-parses statements that sqlglot couldn't handle — procedural constructs like DECLARE, IF/END IF, CALL, and CREATE TEMP TABLE.
Which SQL dialects are supported?
BigQuery is the primary focus. The GSP SQLFlow engine also supports SQL Server, Oracle, PostgreSQL, Snowflake, Redshift, Teradata, and 20+ other dialects. If your warehouse uses procedural SQL, the sidecar can likely parse it.
Is my SQL sent to a third party?
Depends on the backend you choose. Anonymous and Authenticated modes parse SQL in Gudu Software's cloud (processed in memory, not logged or stored). Self-hosted mode keeps everything on your infrastructure — SQL never leaves your network.
How long does installation take?
Under 60 seconds. Run pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git, then run gsp-sidecar emit with --gms-url set to your DataHub GMS endpoint. No Docker, no Kubernetes, no DataHub plugins to install.
How is this different from sqlglot?
sqlglot handles standard SQL well but drops procedural constructs silently — it falls back to an opaque Command node with zero lineage. The GSP engine parses the full procedural SQL, including control flow, temp tables, and dynamic SQL. On the DataHub #11654 reproducer: 0 relationships (sqlglot) vs 11 column-level + 2 table-level (GSP).
What does it cost?
The sidecar tool is open source (Apache 2.0). The Anonymous backend is free with fair-use rate limits. Authenticated and Self-hosted backends have separate pricing — contact us for details.
Recover your BigQuery lineage.
One command. Every missing column-level relationship back in DataHub.
Open source on GitHub · Apache 2.0 license
The evidence is in the issue tracker.
On the procedural BigQuery script pasted in DataHub issue #11654: 0 relationships recovered by the default parser vs 11 column-level + 2 table-level recovered by the sidecar — on the exact same query.
Works with your existing DataHub 0.13+ installation. No DataHub fork or plugin required.