-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingestion/tableau): support column level lineage for custom sql #8466
feat(ingestion/tableau): support column level lineage for custom sql #8466
Conversation
siddiquebagwan
commented
Jul 20, 2023
•
edited
Loading
edited
- CLL extraction in custom data sources (self.emit_custom_sql_datasources())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some high level questions:
- it looks like we're calling
parse_custom_sql
twice, once for table lineage and once for column lineage - what's the reasoning behind that - if self.ctx.graph is unset, we should still be able to get table lineage using sqlglot_lineage. It might be necessary to use the underyling method instead of graph.parse_sql_lineage
- does it make sense to fail if database info is missing? what was the logic behind that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had a few comments on the unit test, and seems like on of the tableau tests is still failing
"query": "SELECT user_id, source, user_source FROM (SELECT *, ROW_NUMBER() OVER (partition BY user_id ORDER BY __partition_day DESC) AS rank_ FROM invent_dw.UserDetail ) source_user WHERE rank_ = 1", | ||
"isUnsupportedCustomSql": "true", | ||
"database": { | ||
"name": "production database", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's change this to be my-bigquery-project
Given that it's set, shouldn't this be used as upstream_db, and eventually passed as default_db to sqlglot_lineage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. It is getting used as default_db
code link: https://github.com/datahub-project/datahub/pull/8466/files#diff-f618f3bf4fb8b6cd799b3375ab22051094345681942a27895f8a7d56f3c36951R1606
mcp.entityUrn | ||
== "urn:li:dataset:(urn:li:dataPlatform:tableau,09988088-05ad-173c-a2f1-f33ba3a13d1a,PROD)" | ||
) | ||
sqlglot_lineage.return_value = SqlParsingResult( # type:ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to mock this - does it not produce correct results otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In test_tableau_cll_ingest graph is None follow getting tested and in this test case I am providing Mock graph, so without mocking the sqlglot_lineage it returns empty array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully follow this - let's talk about it next time we chat
mcp.entityUrn | ||
== "urn:li:dataset:(urn:li:dataPlatform:tableau,09988088-05ad-173c-a2f1-f33ba3a13d1a,PROD)" | ||
) | ||
sqlglot_lineage.return_value = SqlParsingResult( # type:ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully follow this - let's talk about it next time we chat
@mohdsiddique looks like there's a merge conflict |
…k into master+tableau-cll
…into master+tableau-cll
…8466) Co-authored-by: MohdSiddiqueBagwan <[email protected]>
* tag 'v0.10.5': (222 commits) fix(test): increase siblings.js test stability (datahub-project#8542) feat(search): Allow aggregating on facets that are not explicitly part of default filter set (datahub-project#8540) fix(ui) Make multiple small updates to new search and browse (datahub-project#8524) feat(presto-on-hive): allow v1 fieldpaths in the presto-on-hive source (datahub-project#8474) feat(cli): Adds ability to upload recipes to DataHub's UI (datahub-project#8317) feat(browseV2): add browseV2 logic to system update (datahub-project#8506) fix(ingest/json-schema): convert non-string enums to strings (datahub-project#8479) feat(ingestion/tableau): support column level lineage for custom sql (datahub-project#8466) test(ingest): test case statements with sql parser (datahub-project#8437) feat(ingest/vertica): performance improvement and bug fixes (datahub-project#8328) ci: reduce git fetch depth (datahub-project#8473) fix(ingest): remove duplication of tags (datahub-project#8532) docs: small update to homepage (datahub-project#8483) fix(ingest): pin boto3-stubs in CI (datahub-project#8527) feat(siblings): hiding non-existant siblings in FE (datahub-project#8528) fix(ingest/build): Fix sagemaker mypy and flake8 issues (datahub-project#8530) feat(metrics): add metrics for aspect write and bytes (datahub-project#8526) feat(elasticsearch): allow bulk delete (datahub-project#8424) fix(ui): use locale lowercase when filtering columns of an entity in the lineage (datahub-project#8213) fix(auth): ignore case when comparing http headers (datahub-project#8356) ...