Stop losing Power BI lineage in DataHub.
Power BI encodes newlines as #(lf) in M-language.
Without decoding, -- comments swallow every JOIN and
WHERE clause after them. DataHub's lineage graph goes silent.
One pip install brings it all back.
Without decoding: 1 upstream table (out of 2). With the sidecar: 5 column-level lineages (plus 2 table-level).
Your lineage graph is lying to you.
Power BI embeds SQL inside M-language Value.NativeQuery calls,
encoding newlines as #(lf). When DataHub's parser hits a
-- comment, it treats it as running to the end of the entire
string — because #(lf) isn't a real newline.
Every JOIN, WHERE clause, and upstream table after that first comment vanishes from lineage. No warning. No error. Just silence.
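The failure mode can be reproduced in a few lines. The sketch below is a toy stand-in, not DataHub's or sqlglot's actual parser: it strips `--` comments to the end of each physical line, then collects the identifiers that follow FROM or JOIN. The query text is illustrative, and the `branchid` join column is a made-up name.

```python
import re

# Illustrative query as it might sit inside an M string: each newline is the
# literal five-character sequence "#(lf)". The join column is hypothetical.
RAW = (
    "SELECT cs.customercode, db.branch_rollup_name#(lf)"
    "FROM dim_customer cs#(lf)"
    "--old join, commented out#(lf)"
    "JOIN ref_branch db ON cs.branchid = db.branchid#(lf)"
    "WHERE cs.customerstatusid = 1 --active"
)


def strip_line_comments(sql: str) -> str:
    """Drop '--' through end of each physical line (naive: ignores string literals)."""
    return "\n".join(line.split("--", 1)[0] for line in sql.split("\n"))


def upstream_tables(sql: str) -> list[str]:
    """Collect identifiers after FROM or JOIN; a toy substitute for a real SQL parser."""
    cleaned = strip_line_comments(sql)
    return re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", cleaned, flags=re.IGNORECASE)


# Undecoded: the whole query is one physical line, so the first comment eats the rest.
undecoded = upstream_tables(RAW)

# Decoded: "#(lf)" becomes a real newline, so each comment ends where its line ends.
decoded = upstream_tables(RAW.replace("#(lf)", "\n"))

print(undecoded)
print(decoded)
```

With the undecoded string, only the table before the first comment survives; after decoding, the JOIN target is visible again.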
Open since August 2024. Three users confirmed it blocks DataHub adoption. Issue #11251 — still no fix.
SELECT upper(cs.customercode),
cs.ear2id, db.branch_rollup_name
FROM dim_customer cs
--join ... (commented out)
JOIN dim_customer_ear2... as so
JOIN ref_branch db ON ...
WHERE cs.customerstatusid = 1 --active
Without #(lf) decoding: 1 upstream table. Everything after the first -- comment is gone.
Same query. Dramatically different results.
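Without the sidecar: 1 upstream table, 0 column-level lineages
With the sidecar: 2 upstream tables, 5 column-level lineages + 2 table-level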
Same query with SQL comments. 1 vs 2 upstream tables, 0 vs 5 column-level lineages.
3 steps. 60 seconds.
Install
One pip command. No Docker, no infra changes, no DataHub plugins.
pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git
Run
The sidecar decodes Power BI's #(lf) encoding, then GSP parses the clean SQL and recovers all JOINs.
gsp-datahub-sidecar --sql-file query.sql --db-vendor dbvmssql --dry-run
See the lineage
Open DataHub. Every upstream table and column-level relationship is back. Comments handled cleanly.
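For readers curious what the decoding step in the Run stage amounts to, here is a minimal sketch. It is not the sidecar's own code; `decode_m_escapes` is a hypothetical helper name. It assumes the Power Query M character-escape rules: `#(cr)`, `#(lf)`, `#(tab)`, `#(#)` for a literal `#`, 4- or 8-digit hex code points, and comma-separated lists such as `#(cr,lf)`. Anything it does not recognize is left untouched.

```python
import re

# Named character escapes in Power Query M text literals.
_NAMED = {"cr": "\r", "lf": "\n", "tab": "\t", "#": "#"}
_HEX = re.compile(r"[0-9A-Fa-f]{4}|[0-9A-Fa-f]{8}")
_ESCAPE = re.compile(r"#\(([^()]*)\)")


def _decode_code(code: str):
    """Return the character for one escape code, or None if it is not recognized."""
    code = code.strip()
    if code in _NAMED:
        return _NAMED[code]
    if _HEX.fullmatch(code):
        value = int(code, 16)
        if value <= 0x10FFFF:
            return chr(value)
    return None


def decode_m_escapes(text: str) -> str:
    """Replace M escape sequences with the characters they stand for."""

    def _replace(match: re.Match) -> str:
        parts = [_decode_code(code) for code in match.group(1).split(",")]
        if any(part is None for part in parts):
            return match.group(0)  # unknown escape: leave the text as it was
        return "".join(parts)

    return _ESCAPE.sub(_replace, text)


sql = decode_m_escapes("SELECT 1 --note#(lf)FROM dim_customer#(cr,lf)WHERE x = 1")
print(sql)
```

Handling `#(#)` in the same single pass matters: it lets a query that really does contain the characters `#(lf)` survive, because the escaped `#` is emitted and the scan moves on without re-reading it.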
Three backends. Pick your comfort level.
Every backend uses the same GSP SQLFlow engine. The only difference is where SQL gets parsed.
Anonymous
- Cloud-parsed, not logged
- Rate-limited (fair use)
- Great for evaluation
Authenticated
- Personal API key
- Higher per-minute quota
- Usage dashboard
- Priority processing
Self-Hosted
- SQL never leaves your network
- No rate limits
- Full audit trail
- Enterprise support
Common questions.
Does this replace DataHub's lineage parser?
No. The sidecar augments DataHub's existing parser. DataHub still runs sqlglot for standard SQL. The sidecar re-parses statements where comments or encoding cause sqlglot to drop lineage edges.
Why do comments break Power BI lineage?
Power BI's M-language encodes newlines as #(lf) in SQL passed via Value.NativeQuery. Since #(lf) isn't a real newline, -- comments consume everything to end-of-string instead of end-of-line. All subsequent JOINs and WHERE clauses vanish from lineage.
Which SQL dialects are supported?
The GSP SQLFlow engine supports SQL Server, BigQuery, Snowflake, Oracle, PostgreSQL, Redshift, Teradata, and 20+ other dialects. For Power BI queries that typically target SQL Server or Snowflake, use --db-vendor dbvmssql or --db-vendor dbvsnowflake.
Is my SQL sent to a third party?
Depends on the backend you choose. Anonymous and Authenticated modes parse SQL in Gudu Software's cloud (processed in memory, not logged or stored). Self-hosted mode keeps everything on your infrastructure.
How is this different from sqlglot?
sqlglot parses SQL comments correctly when newlines are real — the issue is that Power BI encodes newlines as #(lf), and no SQL parser (sqlglot or GSP) can handle that without preprocessing. The sidecar adds the missing step: it decodes #(lf) back to real newlines before sending the SQL to GSP. On the #11251 reproducer: 1 upstream table (without sidecar) vs 2 tables + 5 column-level lineages (with sidecar).
What does it cost?
The sidecar tool is open source (Apache 2.0). The Anonymous backend is free with fair-use rate limits. Authenticated and Self-hosted backends have separate pricing — contact us for details.
Recover your Power BI lineage.
One command. Every missing upstream table and column-level relationship back in DataHub.
Open source on GitHub · Apache 2.0 license
The evidence is in the issue tracker.
“This is something very important. Especially when SQL comments are very common in PBI M-queries.”
“We also run into this issue and it is a real issue for part of our business to accept Datahub as a common catalog solution.”
Open since August 2024. 3 affected users. No maintainer fix. The sidecar recovers every missing relationship.