DataHub v0.10.1
Known Issues
CLI
- BigQuery: Table- and column-level profiling is broken due to a bad assumption introduced in this version. If you use the BigQuery profiling feature, please use a different version.
Elasticsearch
Clusters running Elasticsearch 7.9 and below are no longer supported in this release because those versions lack case-insensitive matching in term queries.
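As a pre-upgrade check, the sketch below compares a cluster's version string (for example, the version.number field returned by the cluster's root endpoint) against 7.10, which we believe is the release that first added the case_insensitive option to term queries. The function names are hypothetical helpers, not part of DataHub; treat this as a minimal sketch rather than an official compatibility check.

```python
# Assumption: the "case_insensitive" term-query option first shipped in Elasticsearch 7.10.
MIN_SUPPORTED = (7, 10)


def parse_version(number: str) -> tuple[int, int]:
    """Turn a version string such as "7.17.9" into a (major, minor) tuple."""
    major, minor = number.split(".")[:2]
    return int(major), int(minor)


def is_supported(number: str) -> bool:
    """True if the cluster version is at least 7.10."""
    return parse_version(number) >= MIN_SUPPORTED


def case_insensitive_term_query(field: str, value: str) -> dict:
    """Build a term query body that ignores case (rejected by clusters older than 7.10)."""
    return {"query": {"term": {field: {"value": value, "case_insensitive": True}}}}
```

For example, is_supported("7.9.3") returns False, while is_supported("7.17.9") returns True.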
Release Highlights
User Experience
- The Queries tab has a new look and supports manually adding and annotating queries directly from the UI, making it easier to share trusted SQL logic with others
- Glossary Terms now show "Contained by" and "Inherited by" relationships
- Resolved issues with Download to CSV for large volumes of entities
- Analytics tab update: view Monthly Active Users to keep track of DataHub adoption and activity within your organization
- Ongoing UI optimizations focused on improving the navigation experience
Metadata Ingestion
BigQuery
- Improvements to memory usage during metadata extraction
- Ingestion now captures Dataset Labels
- Ingestion now emits cross-project usage
PowerBI
- Support for platform instances, to uniquely identify multiple instances of the same platform
- Support for lineage extraction between PowerBI and Redshift/BigQuery
- Extract entity descriptions
Miscellaneous
- DataHub Integrations Catalog to quickly filter and search for supported integrations
- Kafka Connect - support for stateful ingestion & lowercasing URNs
- Snowflake: improvements to memory usage during metadata extraction
- Postgres: supports estimated row counts during profiling
- Fix to dbt ingestion to address inconsistent upper/lower casing
- S3 ingestion now supports path_specs of multiple buckets in the same recipe
- Looker: Upgrade Looker API from 3.1 to 4.0
- Great Expectations: support for lowercasing URNs
- Tableau: Support for Project Path & Containers; ingestion more resilient to timeout exceptions
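To illustrate the S3 change, here is a hypothetical recipe, written as a Python dict, whose path_specs point at two different buckets, plus a small helper that lists the buckets it references. The bucket names and paths are placeholders, and the include/{table} layout reflects our understanding of the S3 source's path_specs format; check the S3 source documentation before relying on it.

```python
from urllib.parse import urlparse

# Hypothetical recipe: two path_specs that point at different S3 buckets.
recipe = {
    "source": {
        "type": "s3",
        "config": {
            "path_specs": [
                {"include": "s3://sales-bucket/data/{table}/*.parquet"},
                {"include": "s3://marketing-bucket/events/{table}/*.json"},
            ],
        },
    },
}


def buckets_in_recipe(recipe: dict) -> list[str]:
    """Return the distinct bucket names referenced by the recipe's path_specs, in order."""
    buckets: list[str] = []
    for spec in recipe["source"]["config"]["path_specs"]:
        # For an s3:// URL, the network location is the bucket name.
        bucket = urlparse(spec["include"]).netloc
        if bucket not in buckets:
            buckets.append(bucket)
    return buckets
```

Here, buckets_in_recipe(recipe) returns ["sales-bucket", "marketing-bucket"].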
Developer Experience
Miscellaneous
- Neo4j support for lineage time filter
- Metadata model support for JSON schemas stored in Files, Directories, and Kafka Schema Registry
- Timeline API now supports Glossary Terms
- Improvements to startup time for the DataHub CLI
API Docs & Guides
- Table of contents to understand DataHub APIs at a glance
- Guides:
  - Add Tags, Terms, Owners to entities
  - Create datasets
  - Manage Lineage
Search Improvements
- searchAcrossEntities and searchAcrossLineage improvements
- Support for searchAfter pagination
- Advanced query, identity autocomplete, and exact match weighting
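search_after pagination replaces offset-based paging with a cursor: results are sorted by a stable key, and the sort values of the last hit on a page are passed back to fetch the next page. The in-memory sketch below shows the general technique only; it is not DataHub's implementation, and the (score, urn) hit shape and function names are assumptions for illustration.

```python
from typing import Optional

Hit = tuple[float, str]  # (score, urn)
Cursor = tuple[float, str]  # sort key (-score, urn) of the last hit returned


def sort_key(hit: Hit) -> Cursor:
    """Sort by score descending, then by URN ascending as a unique tiebreaker."""
    score, urn = hit
    return (-score, urn)


def search_after_page(
    hits: list[Hit], size: int, after: Optional[Cursor] = None
) -> tuple[list[Hit], Optional[Cursor]]:
    """Return one page of hits plus the cursor to request the following page."""
    ordered = sorted(hits, key=sort_key)
    if after is not None:
        # Keep only hits that sort strictly after the cursor.
        ordered = [h for h in ordered if sort_key(h) > after]
    page = ordered[:size]
    next_cursor = sort_key(page[-1]) if page else None
    return page, next_cursor
```

The URN tiebreaker makes the sort order total, so hits with equal scores are never skipped or repeated across pages; keep calling search_after_page with the returned cursor until it yields an empty page.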
Breaking Changes
Lineage Graph UI
- Previously, DataHub displayed nodes in the Lineage visualization even for URNs that do not technically exist (do not have any aspects defined). Now, those nodes are filtered out, so lineage that previously appeared may no longer show up in the Lineage Graph. This change was made to improve the correctness and consistency of the DataHub experience. If you have feedback, feel free to reach out to the core team. To restore these nodes, produce a "DatasetKey" aspect for each URN you'd like to show in the Lineage Graph.
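As a minimal sketch, assuming you emit metadata through GMS's generic ingestProposal REST endpoint, the helper below builds a proposal that carries only a datasetKey aspect (platform, name, origin) for a dataset URN. The payload shape is our assumption and the function names are hypothetical; with the acryl-datahub Python SDK you would more typically use helpers such as make_dataset_urn and MetadataChangeProposalWrapper together with its emitter.

```python
import json


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN in DataHub's standard form."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def dataset_key_proposal(platform: str, name: str, env: str = "PROD") -> dict:
    """Build an UPSERT proposal carrying only a datasetKey aspect (assumed payload shape)."""
    key_aspect = {
        "platform": f"urn:li:dataPlatform:{platform}",
        "name": name,
        "origin": env,  # fabric type, e.g. PROD or DEV
    }
    return {
        "proposal": {
            "entityType": "dataset",
            "entityUrn": dataset_urn(platform, name, env),
            "changeType": "UPSERT",
            "aspectName": "datasetKey",
            # The aspect is sent as a serialized JSON string plus its content type.
            "aspect": {"value": json.dumps(key_aspect), "contentType": "application/json"},
        }
    }


if __name__ == "__main__":
    # Print the body you would POST for a hypothetical Hive table.
    print(json.dumps(dataset_key_proposal("hive", "analytics.orders"), indent=2))
```

You would then POST this body to your GMS instance (for example, to /aspects?action=ingestProposal), adding an authorization header if token authentication is enabled.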
What's Changed
- fix(test): cleanup test on setup error by @david-leifker in #7259
- feat(cli): add 0.10 awareness to upgrade prompt by @shirshanka in #7273
- chore(ci): cleanup build to remove dependencies duckdb, dev by @anshbansal in #7267
- feat(oidc): add options for preferred jws algorithm by @david-leifker in #7245
- chore(cypress): upgrade cypress to latest v12.5.1 by @aditya-radhakrishnan in #7276
- fix(ingest/bigquery) - Fix for Bigquery parser quoted semicolon in the FROM table name as well by @treff7es in #7277
- chore(ci): ensure kafka setup runs for smoke tests by @anshbansal in #7278
- feat(ingest/bigquery) - Reporting current state of BigQuery ingestion by @treff7es in #7282
- feat(graphql): enabling graphql for data platform instance aspects by @sgomezvillamor in #7177
- feat(api): Timeline API supports Glossary Terms now by @vojtechneradatos in #7229
- getting rid of build locally(broken) for ./gradlew quickstart(working) by @laulpogan in #7283
- chore(ci): remove redundant quickstart check by @anshbansal in #7286
- Update smoke.sh by @david-leifker in #7284
- docs(release notes): Managed DataHub v0.2.0 release notes by @david-leifker in #7299
- docs(release): updating docs per release process by @david-leifker in #7281
- doc(access): move heading above the images by @anshbansal in #7291
- fix(docs): kafka - update docs to indicate protobuf support by @shirshanka in #7280
- fix(system-update): fixes system-update with more than 1 partition by @david-leifker in #7302
- fix(ui): fix styling on sign up and reset screens by @aditya-radhakrishnan in #7301
- fix(cypress): fix broken cypress tag tests by @aditya-radhakrishnan in #7306
- chore(ci): speed up ingestion test runs by @anshbansal in #7296
- docs(release notes): Update updating-datahub.md by @david-leifker in #7311
- fix(ingest/bigquery): Usage rate limiting and lineage exported log fix by @treff7es in #7297
- fix(bootstrap): do not re-run retention policy ingestion by @anshbansal in #7295
- refactor(github): change github reference to git references by @anshbansal in #7308
- fix(datahub-upgrade): allow registry override by @david-leifker in #7258
- feat(cli): improve startup time by @hsheth2 in #7292
- fix(search): correctly filter fields in EDITABLE_FIELD_TO_QUERY_PAIRS with a list of values by @jinlintt in #7303
- fix(ingest/bigquery) Lowering significantly the memory usage of the BigQuery connector by @treff7es in #7315
- chore(ingest): upgrade to mypy 1.0.0 by @hsheth2 in #7313
- fix(tests): Remove rollback-reports, add to ignore by @david-leifker in #7312
- perf(ingest): speed up MCPW.validate() by @hsheth2 in #7319
- fix(ingest/bigquery): Fix for table cache was not cleared by @treff7es in #7323
- fix(ingest/bigquery): Improve memory usage of lineage extraction by @treff7es in #7326
- docs(): Adding notebook support disclaimer by @jjoyce0510 in #7327
- fix(docs): sort sources by display name in doc's sidebar by @Masterchen09 in #7322
- fix(transformers): pattern add domain transformer - enable replace_existing by @asikowitz in #7317
- fix(ci): remove command from cache key as irrelevant for dependency by @anshbansal in #7314
- fix(check upgrade): update logic to compare server and client version by @mayurinehate in #7238
- fix(tracking): Remove 'title' field from tracking by @jjoyce0510 in #7328
- fix(homepage): make entity counts execute in parallel and make cache configurable by @RyanHolstien in #7249
- docs(delete): cleanup removed option by @anshbansal in #7335
- feat(ingestion): powerbi # Configurable Admin API by @mohdsiddique in #7055
- fix(sso) Retrieve cookie configs separately from SSO configs by @chriscollins3456 in #7330
- logging(cli): dropping neo4j message to debug to avoid confusion by @shirshanka in #7340
- perf(metadata-io): neo4j generateLineageStatement use shortestPath by @shidianshifen in #7219
- fix(tableau): make Tableau ingestor resilient to timeout exceptions by @skrydal in #7333
- chore(ci): mark tests correctly by @anshbansal in #7337
- refactor(upgrade): Trim upgrade name before executing by @jjoyce0510 in #7343
- fix(ui) Update styles of embedded profile page to match designs by @chriscollins3456 in #7348
- fixed links and improved recommendations by @laulpogan in #7334
- gradle(development): add additional commands for development by @david-leifker in #7321
- fix(search): support searchFlags for GraphQL by @RyanHolstien in #7346
- fix(elasticsearch): make alias creation atomic by @david-leifker in #7332
- Saas docs migration by @laulpogan in #6603
- removing local airflow from sidebar and adding a warning at the top by @laulpogan in #7331
- development(docker): add flag to gradle for quickstart by @david-leifker in #7355
- fix(gradle): fix gradle command referenced in docs by @david-leifker in #7318
- fix(ingest/bigquery): Increase batch size in metadata extraction if no partitioned table involved by @treff7es in #7252
- feat(cli): make deprecations, renames easier to notice by @anshbansal in #7310
- fix(cli): Corrects search filter for delete by @pedro93 in #7367
- fix(ingestion/snowflake): Fixing stateful ingestion commit at Snowflake source by @treff7es in #7363
- fix(ingestion): powerbi # continue ingestion if m-query parsing fail by @mohdsiddique in #7360
- feat: add chart entities to similar browsepath as dashboards by @looppi in #7293
- fix(lineage): Include maxHops in Lineage Cache Key + misc UI improvements by @jjoyce0510 in #7351
- refactor(ingest,athena): update athena sample recipe by @bossenti in #7368
- fix(ingest/looker): do not instantiate LookerDashboardSource on test_connection by @asikowitz in #7369
- fix(deps): pin snowflake-connector-python by @asikowitz in #7365
- feat(ingest): json-schema - add json schema support for files and kaf… by @shirshanka in #7361
- docs: fix broken link by @sandertan in #7344
- chore(versions): bump versions by @david-leifker in #7358
- test(cli): add check for missing init files by @anshbansal in #7378
- fix(ingest/snowflake): Improve memory usage of metadata extraction by @asikowitz in #7349
- feat(elasticsearch): advanced query, identity autocomplete, exact match weight by @david-leifker in #7354
- feat(queries): Overhaul Queries Tab by @jjoyce0510 in #7366
- chore(version): additional version bumps & suppressions by @david-leifker in #7382
- fix(lineage): Fix Upstream + Downstream Count in presence of Soft-Deleted / Non-Existent references by @jjoyce0510 in #7374
- fix(dep/json-schema): Fixing json-schema dependencies by @treff7es in #7383
- feat(analytics): add monthly active users in highlights by @anshbansal in #7341
- fix(search): fix search filters, handle detection of keyword subfield by @david-leifker in #7372
- chore(bump): bump hadoop client and fix exclusion name by @david-leifker in #7386
- build(scan): enable trivy scan ingestion-base by @david-leifker in #7389
- chore(ci): relax bigquery dependency by @anshbansal in #7309
- build(idea): mark metadata-ingestion sources and tests by @asikowitz in #7394
- docs(website): Add airtel logo by @jeffmerrick in #7395
- fix(ingest/oracle) add database name to oracle urn name by @jaegwonseo in #7016
- fix(docs) Update transformers docs to note not minting urns by @chriscollins3456 in #7399
- chore(ci): update base dependencies by @anshbansal in #7390
- fix(search): exact match updates per review by @david-leifker in #7385
- fix(ingest): Do not require platform_instance for stateful ingestion by @asikowitz in #7397
- Dockerize updates by @david-leifker in #7387
- fix(ingest/bigquery): Correctly upsert lineage_map when parsing view ddl by @asikowitz in #7403
- feat(timeBasedLineage): add feature flag for always producing MCL by @pedro93 in #7407
- fix(ingest/bigquery): Prefer parsed lineage for view over lineage from audit logs by @mayurinehate in #7408
- Update README.md by @amartinson193 in #7400
- docs(logo) add VanMoof logo to site by @maggiehays in #7402
- feat(ingest/kafka-connect): add config to lowercase urns, do not emit… by @mayurinehate in #7393
- feat(frontend): add additional tabs to glossary terms view by @alexey-kravtsov in #7392
- feat(auth): REST API authorization by @RyanHolstien in #6614
- fix(ingest/kafka): Remove topic from browse path by @asikowitz in #7398
- feat(ingest/bigquery) - Emit cross-project usage from gcp logs by @treff7es in #7364
- feat(elasticsearch): support searchAfter by @RyanHolstien in #7235
- docs(managed datahub): release notes for v0.2.1 by @anshbansal in #7414
- fix(frontend) support utf-8 charset by @lutongzero in #7405
- fix(ingest/bigquery) Filter upstream lineage by list of existing tables by @asikowitz in #7415
- refactor(ingest): lookml - fix up golden files in normalized form by @shirshanka in #7423
- fix(ingest/bigquery): Fixing double quoting in profiling approx count query by @treff7es in #7416
- fix(ingest): lookml - add support for includes, extends, view_name i… by @shirshanka in #7428
- fix(recommendations): fix recommendations on homepage by @david-leifker in #7433
- docs(website): fix homepage logo sizing by @jeffmerrick in #7430
- feat(queries): Adding Tooltips to Queries Tab by @jjoyce0510 in #7421
- fix(analytics): remove zero values being added in charts by @anshbansal in #7425
- docs(ingest): add ingestion configs guide by @hsheth2 in #7438
- fix(ingest/bigquery): Querying table metadata details in batch properly by @treff7es in #7429
- fix(ingest/snowflake): fixing Snowflake state issue by @treff7es in #7443
- refactor(tests): extract common code by @anshbansal in #7441
- fix date ranges being queried in charts by @anshbansal in #7444
- feat(tests): allow use of system auth for test session by @anshbansal in #7445
- fix(ingest/athena): Fix athena source if dbname is not specified in the connection string by @treff7es in #7417
- fix(lineage): Fixing Timeline Lineage Filters by @jjoyce0510 in #7435
- fix(ingest/unity): Use assigned metastore if not metastore listed in unity catalog by @treff7es in #7446
- chore(ingest): cleanup unused files/vars in tests by @hsheth2 in #7450
- Feat/s3 ingestion enhancement to update schema from latest partition by @nachiket-juneja in #7410
- chore(ingest/glue): cleanup deprecated underlying_platform config by @hsheth2 in #7449
- refactor(ingest): avoid allowing extras for all DataHubGraphConfig by @hsheth2 in #7448
- docs(ingest): add more guidelines for writing sources by @hsheth2 in #7451
- fix(smoke): add missing test resource by @hsheth2 in #7455
- refactor(ingest): subtypes - standardize by @shirshanka in #7437
- docs(ingest): add details about backwards compatibility guarantees by @hsheth2 in #7439
- fix(ui) Merge duplicate schema fields on siblings regardless of casing by @chriscollins3456 in #7413
- fix(kafka-setup): configure sasl.mechanism in case SASL_PLAINTEXT by @k-popov in #7447
- docs(managed): v0.2.2 managed datahub release notes by @david-leifker in #7456
- chore(ci): exclude duckdb from smoke test by @anshbansal in #7458
- fix(ingest/bigquery): simplify type annotations for bigquery usage by @hsheth2 in #7457
- feat(ingest): Introduce FileBackedDict for offloading data to disk by @asikowitz in #7461
- fix(ingest/dbt): remove deprecated backcompat_skip_source_on_lineage_edge option by @hsheth2 in #7466
- refactor(ingest): use auto_stale_entity_removal in json schema source by @hsheth2 in #7465
- fix(ingest/bigquery): update bigquery platform_instance capability by @TonyOuyangGit in #7467
- fix(ingest/s3): propagate s3 endpoint to profiling by @tmemenga in #7431
- fix(ingest): remove extraneous platform configs by @hsheth2 in #7454
- feat(ingest/bigquery) - Capture dataset labels in bigquery by @treff7es in #7460
- Add setup job labels to compose files by @szalai1 in #7473
- chore(ci): upgrade GE version by @anshbansal in #7290
- fix(ingest/dbt): check for nodes key before accessing by @khgould in #7462
- fix(search): per field analyzers for simple_query_string by @david-leifker in #7436
- tests(cypress): add improved Cypress tests for timeline lineage by @aditya-radhakrishnan in #7464
- fix(ui) Standardize subtypes casing with View Definition tab by @chriscollins3456 in #7477
- feat(elasticsearch): validate index.blocks.write setting by @david-leifker in #7478
- feat(ingest/tableau): project path and container support by @mohdsiddique in #7426
- refactor(ingest): Convert FileBackedDict to dataclass for cleaner init by @asikowitz in #7469
- chore(ingest): pin acryl-datahub-classify by @hsheth2 in #7485
- fix(ingest/tableau): load project workbook hierarchy correctly by @hsheth2 in #7483
- fix(ingest): redact auth info in curl commands by @hsheth2 in #7496
- fix(ingest): prevent logging from blowing up on TypeErrors by @hsheth2 in #7497
- fix(ui) Make tooltip on search results stats summary clearer by @chriscollins3456 in #7492
- fix(ui) Fix UI flickering when switching between glossary entities by @chriscollins3456 in #7432
- feat(ingest/GX): add urn lowercasing option for GX assertions by @mayurinehate in #7472
- feat(cli): introduce remote config for quickstart by @szalai1 in #7424
- feat(ingestion): powerbi # support Google BigQuery table lineage by @mohdsiddique in #7502
- feat(ingest): unbundle airflow plugin emitter dependencies by @cburroughs in #7493
- feat(cli): finalizing quickstart config commit hash by @szalai1 in #7509
- feat(ingest/postgres): support estimated row counts in profiling by @arunvasudevan in #7476
- fix(ingest/bigquery): fix missing materialized views by @mayurinehate in #7511
- fix(ingest): make quickstart error handling more robust by @hsheth2 in #7513
- fix(ingest): limit typing_extensions classes to those available in min version by @cburroughs in #7490
- feat(ingest/vertica): improve vertica type mappings by @NotYuki in #7459
- chore(ingest): remove unused dependency for bigquery by @mayurinehate in #7510
- feat(ingest/looker): upgrade to Looker API from 3.1 to 4.0 by @feljen in #7411
- Docs update by @szalai1 in #7517
- feat(graphql): Added GraphQL mappings for the "created" and "lastModified" fields in "DatasetProperties" aspect by @siladitya2 in #7463
- docs(guidelines) Update community guidelines by @maggiehays in #7518
- fix(ui-ingestion) Fix UI manual ingestion runs by consistently setting pipeline_name by @chriscollins3456 in #7521
- feat(docs-website): support category links by @hsheth2 in #7516
- feat(ingest/powerbi): support PowerBI parameter references by @hsheth2 in #7523
- feat(ingest): enable joins across FileBackedDicts + add FileBackedList by @hsheth2 in #7506
- fix(): Fix Query Detail Modal Scroll + add misc log messages by @jjoyce0510 in #7530
- fix(frontend proxy): Disable unnecessary URL encoding at the proxy layer by @jjoyce0510 in #7532
- fix(ingest): delta-lake - support assume aws role by @shirshanka in #7524
- docs(ingest): add guidelines around proactive version pinning by @hsheth2 in #7534
- docs(): add sources summary page by @laulpogan in #7480
- fix(grafana): use variable datasource uid by @maaaikoool in #7488
- fix(ingest/looker): stringify looker user ids by @hsheth2 in #7531
- Revert "docs(): add sources summary page" by @laulpogan in #7546
- feat(openapi): add relationships endpoint by @shirshanka in #7547
- Add documentation example for using restoreIndices with an urnLike argument by @iprentic in #7544
- fix(ingest/snowflake): bump up classification library version to 0.0.6 by @mayurinehate in #7542
- fix(test): suppress s3 golden file test for specific paths by @shirshanka in #7551
- fix(docs-website): reflect PythonSDK & GraphQL Docs changes by @yoonhyejin in #7557
- feat(search): searchAcrossEntities/Lineage improvements by @david-leifker in #7550
- docs(): re-add sources summary page by @laulpogan in #7563
- Update restore indices docs to include batch information by @iprentic in #7564
- feat(ingest): fix edge cases + interface cleanup for file-system APIs by @hsheth2 in #7533
- feat(ingest): powerbi # store powerbi entity descriptions by @looppi in #7154
- fix(cli): Adding exit code to correctly return failure or success by @jjoyce0510 in #7520
- feat(cli): switch default quickstart to v0.10.0 by @hsheth2 in #7567
- chore(ci): try Qodana Scan for quality by @anshbansal in #7560
- chore(ci): add daylight savings timezone for tests, fix daylight saving bug in analytics charts by @anshbansal in #7484
- fix(lineage): nullpointer exceptions by @anshbansal in #7577
- docs(managed ingestion): add release notes for v0.2.3 by @anshbansal in #7578
- fix(logging): increase log level for system-upgrade job to complete before starting by @iprentic in #7566
- fix(ui) Safeguard ingestion execution request check by @chriscollins3456 in #7584
- refactor(ui): Separate entity lineage counts query from rest of entity query by @jjoyce0510 in #7569
- feat(ingest/snowflake): use auto_workunit_reporter helper by @hsheth2 in #7568
- feat(ingest/kafka-connect): add stateful ingestion and platform instance support by @mayurinehate in #7526
- fix(gms): convert obj to string, fix wrong setup by @anshbansal in #7582
- refactor(ingest): Use shared connection wrapper over connection cache by @asikowitz in #7570
- Extend character limit for Create Domain Modal by @gabe-lyons in #7589
- fix(smoke-test): always use built images in smoke tests by @hsheth2 in #7587
- feat(ingest/s3): support path_specs of different S3 buckets in the same recipe by @harsha-mandadi-4026 in #7514
- fix(ingest): pin typeguard version for feast by @hsheth2 in #7591
- chore(ci): update dependencies, fix smoke image build by @anshbansal in #7580
- chore(deps): bump @sideway/formula from 3.0.0 to 3.0.1 in /datahub-web-react by @dependabot in #7554
- fix(ingest/powerbi): support each expression in m-query function invocation by @mohdsiddique in #7541
- fix(ingestion): Readd batchDelayMs by @egemenberk in #7559
- chore(deps): bump @sideway/formula from 3.0.0 to 3.0.1 in /docs-website by @dependabot in #7553
- fix(docker): fix elasticsearch image tag by @david-leifker in #7548
- feat(docs): add docs on lineage by @yoonhyejin in #7576
- refactor: misc fixes logging, annotations by @anshbansal in #7579
- fix(policies): add missing policies, add check to prevent problems by @anshbansal in #7586
- docs: misc fixes by @anshbansal in #7603
- feat: add docs on adding column/dataset description by @yoonhyejin in #7597
- feat(cli): show image pull progress in quickstart by @hsheth2 in #7593
- fix(ingest/snowflake): Allow SnowflakeObjectAccessEntry.objectId to be None by @asikowitz in #7601
- docs(): Add View-related permissions to DataHub docs by @jjoyce0510 in #7600
- feat(ingest): add urn modification helper by @hsheth2 in #7440
- feat: add docs on creating tags/terms/datasets by @yoonhyejin in #7608
- feat(metadata-io): add support in Neo4jGraphService for lineage time filter by @shidianshifen in #7375
- docs: add new code examples on creating entities & fix minor typos by @yoonhyejin in #7613
- chore(ci): fix flakiness, misc improvements by @anshbansal in #7605
- feat(ingest/docs): json-schema fixes, improvements to ingestion doc generation by @shirshanka in #7615
- fix(docker): fix gradle quickstart version parsing by @hsheth2 in #7614
- fix(elasticsearch): make indexNameMapping in IndexConventionImpl threadsafe by @iprentic in #7565
- docs: add CLI installation guide via poetry by @yoonhyejin in #7619
- fix(ingest/docs): improve matcher to include types with spaces in them by @shirshanka in #7631
- docs: reformat use case guide toc & api comparison table by @yoonhyejin in #7621
- docs: fix image in development by @jx2lee in #7637
- docs: fix typo and image by @yoonhyejin in #7635
- feat(ingestion): powerbi # Amazon Redshift lineage support by @mohdsiddique in #7562
- fix(ingest/dbt): introduce lowercase column urn option by @alex-magno in #7418
- fix(smoke-test): fix native user and access token tests by @aditya-radhakrishnan in #7628
- build(docker): metadata-ingestion images build and add slim version by @david-leifker in #7412
- fix(search): tags with colons exercises search with urns, must follow… by @david-leifker in #7602
- feat(ingest): add auto_materialize_referenced_tags helper by @hsheth2 in #7626
- fix(ingest): remove get_platform_instance_id from stateful ingestion by @hsheth2 in #7572
- fix(ingest/superset): support superset v2 by @hsheth2 in #7588
- fix(entity registry): Fix patching aspects onto existing Config based entity by @jjoyce0510 in #7624
- fix(docker): fix image name for datahub-ingestion-slim by @shirshanka in #7653
- misc fixes by @david-leifker in #7633
- fix(impactAnalysis): fix filtering for lightning mode search (#1225) by @shirshanka in #7652
- fix(platform): Ensure time based lineage handles noop changes by @shirshanka in #7657
- refactor(ui): Fix scrolling behavior for compact entity profile by @jjoyce0510 in #7599
- feat(ingestion): powerbi # support platform instance by @mohdsiddique in #7583
- feat(ingestion): powerbi # uniquely identify the multiple instance of same platform by @mohdsiddique in #7632
- fix(datahub-upgrade) custom timeseries aspect index creation issue. by @siladitya2 in #7622
- fix(ui): Fix download to CSV flow using Scroll across entities api by @jjoyce0510 in #7629
- fix(search): missing model updates and tests by @david-leifker in #7617
- fix(revert): remove unnecessary class check by @RyanHolstien in #7658
- lint(test): remove unused imports, other test fixes by @david-leifker in #7659
- refactor(ui): Loading schema dynamically for dataset profile by @jjoyce0510 in #7558
- fix(ui): Address regression in column usage stats + add unit test by @jjoyce0510 in #7645
- refactor(ui): Make Navigating DataHub UI easier, fix duplicate tracking, duplicate networks calls, + misc optimizations by @jjoyce0510 in #7592
- refactor(lineage): Refactor getAndUpdatePaths inside of ESGraphQueryDao by @jjoyce0510 in #7556
- feat(cli): build and upload Python wheels in CI by @hsheth2 in #7537
- fix(ingest/bigquery): Pass whether view is materialized; pass last_altered correctly by @asikowitz in #7660
- feat(docs-website): add vercel preview environment by @hsheth2 in #7644
New Contributors
- @jinlintt made their first contribution in #7303
- @asikowitz made their first contribution in #7317
- @shidianshifen made their first contribution in #7219
- @sandertan made their first contribution in #7344
- @amartinson193 made their first contribution in #7400
- @lutongzero made their first contribution in #7405
- @nachiket-juneja made their first contribution in #7410
- @k-popov made their first contribution in #7447
- @TonyOuyangGit made their first contribution in #7467
- @tmemenga made their first contribution in #7431
- @khgould made their first contribution in #7462
- @NotYuki made their first contribution in #7459
- @siladitya2 made their first contribution in #7463
- @iprentic made their first contribution in #7544
- @yoonhyejin made their first contribution in #7557
- @harsha-mandadi-4026 made their first contribution in #7514
- @egemenberk made their first contribution in #7559
- @alex-magno made their first contribution in #7418
Full Changelog: v0.10.0...v0.10.1
DataHub v0.10.0
Release Highlights
Potential Downtime
This release introduces substantial improvements to search functionality that require reindexing.
During the reindexing:
- a system-update job will set indices to read-only and create a backup/clone of each index
- new components will be prevented from start-up until the reindex completes
- Helm deployments will go into read-only mode and new ingestion runs will fail
This process can take anywhere from 5 minutes to multiple hours; as a rough estimate, expect it to take about 1 hour for every 2.3 million entities. After the reindex is complete, check your ingestion runs and re-run any that did not complete.
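The rough estimate of 1 hour per 2.3 million entities is linear, so you can turn your entity count into an expected maintenance window. The sketch below simply applies that rate; real durations depend on cluster size and load, so treat the result as a planning figure only. The constant and function names are our own.

```python
# Rough planning rate quoted in the v0.10.0 release notes.
ENTITIES_PER_HOUR = 2_300_000


def estimated_reindex_hours(entity_count: int) -> float:
    """Estimate reindex duration in hours using the linear rule of thumb."""
    return entity_count / ENTITIES_PER_HOUR


if __name__ == "__main__":
    for count in (500_000, 2_300_000, 5_000_000):
        print(f"{count:>9,} entities -> about {estimated_reindex_hours(count):.1f} h")
```

For example, 5 million entities works out to roughly 2.2 hours.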
User Experience
We have some really exciting improvements to the DataHub user experience in this release!
Improved documentation editor, contributed by @ngamanda and the Grab Team.
This work provides a much more intuitive documentation editing experience within the UI, with “what you see is what you get” formatting that removes the need for Markdown expertise.
Additionally, you can easily:
- add links to other entities/users within DataHub
- embed and resize tables & images
- toggle between font sizes and formats
- embed syntax-highlighted code blocks
Filter lineage graphs based on time windows
You can now easily see the full lineage graph of an entity at a specific point in time. This makes it much easier to understand how interdependencies have evolved over time and to troubleshoot data issues in the past.
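Conceptually, a time-window filter keeps only the lineage edges that were observed during the selected window. The sketch below illustrates that idea with a hypothetical LineageEdge record carrying first-seen and last-seen timestamps; the field names and the interval-overlap rule are assumptions for illustration, not DataHub's internal model.

```python
from dataclasses import dataclass


@dataclass
class LineageEdge:
    upstream: str
    downstream: str
    first_seen_ms: int  # hypothetical: epoch millis when the edge was first observed
    last_seen_ms: int  # hypothetical: epoch millis when the edge was last observed


def edges_in_window(edges: list[LineageEdge], start_ms: int, end_ms: int) -> list[LineageEdge]:
    """Keep edges whose observed lifetime overlaps the [start_ms, end_ms] window."""
    return [e for e in edges if e.first_seen_ms <= end_ms and e.last_seen_ms >= start_ms]
```

An edge that stopped being observed before the window starts, or was first observed after it ends, is dropped, which is what lets an older graph differ from today's.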
Improvements in Search
As noted above, we have rolled out substantial improvements to Search functionality, making it easier than ever for end users to find the entities that matter most. This release includes:
- Stemming & synonyms
- Search by full or partial URN
- Autocomplete improvements
- Quoted search analyzer for exact & prefix match
Metadata Ingestion
Here are some of the most notable ingestion-related improvements:
- Redshift: You can now extract lineage information from unload queries – thanks for the contrib, @mmmeeedddsss
- PowerBI: Ingestion now maps Workspaces to DataHub Containers – thanks for the contrib, @looppi
- BigQuery: You can now extract lineage metadata from the Catalog API – thanks for the contrib, @PatrickfBraz
- Glue: Ingestion now uses table name as the human-readable name – thanks for the contrib, @danielcmessias
Developer Experience
- This release introduces DataHub Lite, a new experimental lightweight implementation of DataHub. It is intended to enable local developer tooling use cases, such as simple access to metadata for scripts and other tools. DataHub Lite is compatible with the DataHub metadata format and all the ingestion connectors that DataHub supports. See the DataHub Lite documentation for details.
Breaking Changes
#7103: This should only impact users who have configured explicit non-default names for DataHub's Kafka topics. The environment variables used to configure Kafka topics in the kafka-setup Docker image have been updated to be in line with other DataHub components; for more info, see our docs on Configuring Kafka in DataHub. Previously these variables were suffixed with _TOPIC; the correct suffix is now _TOPIC_NAME. This change does not affect users who use the default Kafka topic names.
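If your deployment sets custom topic names through the old variables, a small helper like the one below can rename every variable ending in _TOPIC to its _TOPIC_NAME equivalent before you pass the environment to the kafka-setup container. It is a hypothetical migration aid, not something shipped with DataHub, and the variable name in the usage note is only an example; check Configuring Kafka in DataHub for the exact variables your deployment uses.

```python
def migrate_topic_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with every *_TOPIC key renamed to *_TOPIC_NAME.

    If both the old and the new name are set, the new-style value wins.
    """
    migrated: dict[str, str] = {}
    for key, value in env.items():
        if key.endswith("_TOPIC"):
            # Only fill the new name if it has not been set already.
            migrated.setdefault(key + "_NAME", value)
        else:
            # Other keys pass through; an explicit *_TOPIC_NAME overwrites
            # any value previously copied over from the matching *_TOPIC key.
            migrated[key] = value
    return migrated
```

For example, migrate_topic_env({"METADATA_CHANGE_PROPOSAL_TOPIC": "my-mcp"}) returns {"METADATA_CHANGE_PROPOSAL_TOPIC_NAME": "my-mcp"}. Because it renames every key with the _TOPIC suffix, filter the input first if other tools in the same environment use that suffix.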
What's Changed
- fix(ci): only scan on master branch by @anshbansal in #7047
- fix(ci): use trivy offline scanning by @anshbansal in #7050
- docs(get-started) Simplify copy on Get Started landing page by @maggiehays in #7043
- fix(ingest/kafka): fix ResourceType import error for confluent_kafka<1.9.0 by @mayurinehate in #7046
- docs(dbt): fix indentation in dbt meta mapping docs by @jx2lee in #7045
- fix(ingest): temporarily disable vertica tests by @hsheth2 in #7059
- feat(editor): improve documentation editor using Remirror by @ngamanda in #6631
- fix(bootstrap): add EDIT_LINEAGE privilege to some default policies by @aditya-radhakrishnan in #7060
- feat(ingest): add entity registry in codegen by @hsheth2 in #6984
- feat(ingest): extract powerbi endorsements to tags by @looppi in #6638
- feat(ingestion): pull metabase database, schema names from raw query and api by @remisalmon in #7039
- fix(ingest): support multiple entity_registry sections by @hsheth2 in #7066
- ci(ingest): add flag to skip tests but run codegen during release by @hsheth2 in #7067
- fix(ingest): preserve dbt column name casing by @hsheth2 in #7063
- fix(ingest/tableau): fix node limit exceeded error for workbooks query by @mayurinehate in #7068
- fix(build/airflow): Fixing gradlew path by @treff7es in #7069
- feat(ingest): support snapshots in dbt and dbt-cloud by @hsheth2 in #7062
- fix(ui) Fix duplicate schema field rendering with siblings by @chriscollins3456 in #7057
- refactor(ingest/athena): Replace s3_staging_dir parameter in Athena source with query_result_location by @bossenti in #7044
- feat(ingest): fix handling of unions with aliases in post restli conversion by @hsheth2 in #7058
- fix(ui) Make checkboxes in ingestion forms easier to see by @chriscollins3456 in #7061
- fix(ingest): support git clone of non-github repos by @hsheth2 in #7065
- feat(ingest): reporting revamp, part 1 by @hsheth2 in #7031
- fix(secret-service): fix default encrypt key by @david-leifker in #7074
- feat(datahub-lite): introduces a new experimental lightweight impleme… by @shirshanka in #7052
- feat(datahub-lite): adding tab completion, small serialization fixes by @shirshanka in #7079
- docs: add docs for managed DataHub v0.1.72 by @anshbansal in #7070
- docs(readme): add inovex as adopter by @DSchmidtDev in #7077
- docs: add warning about clearing cookies for login by @anshbansal in #7084
- feat(cache): add hazelcast distributed cache option by @RyanHolstien in #6645
- docs(datahub-lite): small improvement for zsh tab completion by @shirshanka in #7085
- fix(ingest/bigquery): clear stateful ingestion correctly by @hsheth2 in #7075
- fix(graphql): Return with appropriate status code instead of stacktrace by @szalai1 in #7086
- fix(sso): Clear cookies on SSO redirect error by @aditya-radhakrishnan in #7088
- fix(docs): add missing mutation literal by @ruedigerblock in #7082
- fix(ui): display the correct access token expiry in AccessTokenModal by @ngamanda in #7078
- fix(cli/lite): fix datahub lite serve command by @hsheth2 in #7089
- fix(profiling): Fix syntax for APPROX_COUNT_DISTINCT on bigquery and snowflake by @feljen in #7087
- fix(ingest): fix logic error of google protobuf wrapper type. by @wngus606 in #7076
- feat(ui): Documentation Editor Improvements by @jjoyce0510 in #7072
- fix(uri): marks uri field as deprecated, removes problem code, and adds coercer for usages of URI typeref by @RyanHolstien in #7093
- fix(build): postgres docker secret by @david-leifker in #7092
- fix(ingest/snowflake): handle corrupted snowflake OCSP cache file by @hsheth2 in #7095
- refactor(ingest): Refactoring container creation to common place by @treff7es in #6877
- feat(ingest): move datahub-lite to optional dep and add shim when missing by @hsheth2 in #7097
- fix(docker): support non amd64 dockerize in setup containers by @tonycsoka in #7091
- test(ingest): fix kafka admin client mocking by @hsheth2 in #7098
- fix(build): Fix postgres setup gha by @david-leifker in #7104
- fix(ingest/profile): properly quoting approx_count_distinct by @treff7es in #7101
- style(models): Replaces non-ASCII characters in pdl files with ASCII c… by @nmbryant in #7105
- feat(ingest): hide cartesian product warnings in GE profiler by @hsheth2 in #7096
- feat(ingest): add removing partition pattern in spark lineage by @ssilb4 in #6605
- feat(redshift): Fetch lineage from unload queries by @mmmeeedddsss in #7041
- fix(ci): do not confirm on force for deletion by @anshbansal in #7106
- fix(analytics): add missing usage events causing warning in logs by @anshbansal in #7109
- feat(quickstart): Remove kafka-setup as a hard deployment requirement by @pedro93 in #7073
- fix(tests): Fixing add_users smoke test by @jjoyce0510 in #7116
- chore(deps): bump ua-parser-js from 0.7.32 to 0.7.33 in /docs-website by @dependabot in #7122
- docs(gms): clarify behavior of soft deletion in UI by @aditya-radhakrishnan in #7117
- fix(kafka-setup): Make topic name consistent with other images by @pedro93 in #7103
- chore(deps): bump ua-parser-js from 0.7.32 to 0.7.33 in /datahub-web-react by @dependabot in #7123
- feat(ingest): powerbi # add powerbi workspaces to containers by @looppi in #6532
- fix(diffMode): prevent misconfiguration of diff mode by @RyanHolstien in #7127
- fix(ui) Display glossary term name in analytics page properly by @chriscollins3456 in #7128
- fix(ui): only use visible and enabled tabs for selected tab and routing in entity profiles by @Masterchen09 in #6629
- fix(htrace): remove htrace jar by @szalai1 in #7126
- feat(datahub-lite): simplify get response by @shirshanka in #7131
- fix(doc/bigquery): Updating bigquery capability doc by @treff7es in #7136
- fix(ci): do not fail fast for matrix runs by @anshbansal in #7132
- refactor(ui): refactor capitalization of platform name and sub types by @Masterchen09 in #7099
- refactor(cli): extract method, change wording by @anshbansal in #7134
- docs(lineage): Updating Lineage feature guide by @maggiehays in #6257
- removing WIP by @laulpogan in #7140
- docs(oidc): Updating + improving docs around OIDC configuration by @jjoyce0510 in #7141
- fix(ingest): add message proto check by @tinolyu in #7130
- fix(ingest): use snowflake median function in profiling by @hsheth2 in #6987
- feat(ui): allow removing parentNodes of Glossary Nodes and Glossary Terms by @ngamanda in #7135
- feat(ui) Add new embedded profile to be displayed in extension by @chriscollins3456 in #7113
- feat(ingest): add --log-file option and show CLI logs in UI report by @hsheth2 in #7118
- fix(misc): NPE and GraphQL case fixes by @david-leifker in #7149
- fix(ingest/snowflake): fix regression in approx count distinct by @hsheth2 in #7146
- [docs] fix typo / add missing line for docker compose / attach overwriting system action config for confluent. by @kdongho in #7142
- reordering sidebar and adding homepage to apis by @laulpogan in #7139
- fix(ingestion): powerbi # Not all arguments converted to string by @mohdsiddique in #7157
- fix(ui): Sort top users by their query count in datasets stats tab by @jaykadambi in #7148
- refactor(ui): Updates to Manual Lineage search by @jjoyce0510 in #7151
- feat(ui) Build entity doesn't exist page for entity profiles by @chriscollins3456 in #7150
- ci(ingest): fix broken CI workflow for metadata-ingestion by @hsheth2 in #7161
- fix(ingest): azuread group mapping do not stop ingestion by @anshbansal in #7169
- fix(docs): Fixes links to docs templates by @viniciusdsmello in #7171
- refactor(ui ingest): Allow enabling / disabling ingestion schedule easily by @jjoyce0510 in #7162
- fix(ingest): switch various sources to auto_stale_entity_removal helper by @hsheth2 in #7158
- docs(townhall) Update Townhall History doc by @maggiehays in #7180
- test(ingest/delta-lake): fix spurious directory creation by @hsheth2 in #7179
- feat: add a linter for github actions workflows by @hsheth2 in #7178
- fix(quickstart): adding back kafka-setup by @szalai1 in #7181
- fix(docs) Fix broken links in ingestion docs by @chriscollins3456 in #7183
- fix(ingest/GX): fix snowflake urn generated from connection string by @mayurinehate in #7173
- feat(ingest): switch dbt to use auto_stale_entity_removal by @hsheth2 in #7160
- fix(ingest): fix issue in glue tests by @hsheth2 in #7185
- fix(log): logging timestamp in ISO8601 format instead of time by @anshbansal in #7188
- feat(ingest): bigquery - extracts lineage metadata from catalog api by @PatrickfBraz in #7137
- fix(ingest/tableau): show warning about token expiry for PATs by @hsheth2 in #7187
- fix(ingest/vertica): Fixing missing container properties by @treff7es in #7197
- chore(deps): bump Netty from 4.1.85.Final to 4.1.86.Final by @janhicken in #7191
- docs(ingestion): powerbi # Add permission for DAX and mashup expressions by @mohdsiddique in #7195
- feat(elasticsearch): Elasticsearch improvements by @david-leifker in #6894
- fix(test): spark-lineage # build task as dependency of integrationTest by @mohdsiddique in #7189
- chore(sample): add status removed aspect for sample data by @anshbansal in #7203
- docs(managed datahub): release notes for v0.1.73 by @anshbansal in #7194
- fix(bootstrapdata): update timestamp to be in the last 1 year by @szalai1 in #7206
- fix(ingest/bigquery): quoting for APPROX_COUNT_DISTINCT in BigQuery by @mryorik in #7207
- fix(versioning): Ensure that CLI version is always dot-delimited even in minor release versions by @jjoyce0510 in #7200
- fix(test): missing variables in test causing error in logs by @anshbansal in #7210
- feat(mlModel): mark downstream jobs as ml model downstreams lineage by @mayurinehate in #7205
- ci(): fix datahub-upgrade quickstart regression by @hsheth2 in #7217
- feat(ingest): Add custom properties to the ldap ingestion by @bda618 in #7125
- fix(ingest): upgrade feast to avoid build issues by @hsheth2 in #7218
- fix(ui) Increase the number of assertions that we query for in tab by @chriscollins3456 in #7215
- fix(ci): trivy code scanning fix by @anshbansal in #7232
- feat(glue): Use table name as human-readable name for Glue ingestion by @danielcmessias in #7213
- feat(ui): Supporting display of columns and storage count in previews by @jjoyce0510 in #7198
- fix(gms): Fixes delete references for single relationship aspects by @pedro93 in #7211
- docs(ingest/lineage): clarify name field in entity config for file based lineage by @mayurinehate in #7225
- fix(ui): typo 'Documenataion' by @vojtechneradatos in #7227
- fix(cli/delete): skip references prompt if deleting an aspect by @hsheth2 in #7220
- fix(ingest/tableau): implement workbook_page_size parameter by @hsheth2 in #7216
- fix(gms): Corrects MCP generation in async mode by @pedro93 in #7214
- fix(ingest): redshift # build late binding view lineage when sql written in upper case by @looppi in #7223
- fix(siblings) Fix editing of schema fields for siblings with unequal schemas by @chriscollins3456 in #7199
- fix(ingest-idp): emit empty GroupMembership when there are no groups by @aditya-radhakrishnan in #7196
- feat(lineage): add time filtering for lineage edges by @aditya-radhakrishnan in #7159
- chore(deps): bump http-cache-semantics from 4.1.0 to 4.1.1 in /docs-website by @dependabot in #7230
- refactor(docs): Minor language updates for kafka source doc header by @jjoyce0510 in #7237
- docs(website): fix feature availability dark mode styles by @jeffmerrick in #7233
- chore(log/docs): improve error log, docs by @anshbansal in #7239
- fix(dev.sh): Add context to kafka-setup build by @szalai1 in #7234
- feat(cli): improve docker quickstart by @hsheth2 in #7184
- fix(elasticsearch): fix orphan index clean up pattern, consistent top… by @david-leifker in #7242
- chore(deps): bump http-cache-semantics from 4.1.0 to 4.1.1 in /datahub-web-react by @dependabot in #7231
New Contributors
- @bossenti made their first contribution in #7044
- @ruedigerblock made their first contribution in #7082
- @feljen made their first contribution in #7087
- @tonycsoka made their first contribution in #7091
- @tinolyu made their first contribution in #7130
- @kdongho made their first contribution in #7142
- @jaykadambi made their first contribution in #7148
- @viniciusdsmello made their first contribution in #7171
- @mryorik made their first contribution in #7207
- @danielcmessias made their first contribution in #7213
- @vojtechneradatos made their first contribution in #7227
Full Changelog: v0.9.6...v0.9.7
v0.9.6.1
Release Highlights
Please disregard release v0.9.6 in favor of this release v0.9.6.1
Bug fix for secrets encryption
- Prevents decryption errors for existing secrets
- Affects reading ingestion secrets created with a previous release
- Affects native user password validation
What's Changed
Full Changelog: v0.9.6...v0.9.6.1
v0.9.6
Release Highlights
User Experience
DataHub now supports embedding Dashboards, Charts, and Datasets. This lets you embed Looker / Tableau / Mode / Redash Looks, Dashboards, and Explores directly into Dataset pages.
[Experimental] You can now customize the number of queries displayed on the Query tab of a Dataset entity
Improved error messaging for bulk editing via the UI
Metadata Ingestion
Data profiling now supports a configurable number of returned sample values
Postgres ingestion now supports emitting lineage edges for Views - shoutout to @LucasRoesler for the contribution!
Snowflake ingestion now supports extracting tags - shoutout to @frsann for the contribution!
Vertica ingestion now supports projections and lineage - thanks for the contribution, @vishalkSimplify!
Glue ingestion now emits an s3 lineage edge when data was written with an s3a/s3n client - thanks for the contribution, @danielli-ziprecruiter!
Developer Experience
Fixes quickstart/docker compose issues for M1 machines
Improvements in reliability and performance of the Restli Service endpoints for ingestion:
Scale Restli Service thread pool based on CPU
Add retry (exponential backoff) to Restli Entity Client
MCE no longer relies on GMS for Restli service
Converted Restli Service from standalone servlet to Spring injectable
Docker build externalized (significantly faster on M1, <7 minute build times, based on this)
Frontend asset generation refactor (causing tests to fail intermittently)
What's Changed
- feat(ingest): add pydantic helper for removed fields by @hsheth2 in #6853
- chore(0.9.5): Bump defaults for release v0.9.5 by @jjoyce0510 in #6856
- Revert "fix(ci): remove warnings due to deprecated action" by @anshbansal in #6857
- refactor(restli-mce-consumer) by @david-leifker in #6744
- fix(ci): reduce smoke test run time by @anshbansal in #6841
- fix(security): require signed/encrypted jwt tokens by @david-leifker in #6565
- feat(ingest): update profiling to fetch configurable number of sample values by @mayurinehate in #6859
- feat(ingest/airflow): support raw dataset urns in airflow lineage by @hsheth2 in #6854
- refactor(graphql): make graphqlengine easier to use by @anshbansal in #6865
- fix(kafka): datahub-upgrade job by @david-leifker in #6864
- feat(ingest): pass timeout config in kafka admin client api calls by @mayurinehate in #6863
- chore(ingest): loosen requirements file by @hsheth2 in #6867
- feat(ingest): upgrade pydantic version by @cccs-eric in #6858
- fix(elasticsearch): fixes out of order runId writes by @david-leifker in #6845
- chore(ingest): loosen additional requirements by @hsheth2 in #6868
- feat(ingest): bigquery/snowflake - Store last profile date in state by @treff7es in #6832
- docs(google-analytics): Correct grammatical error in README.md by @jx2lee in #6870
- feat(CI): add venv caching by @szalai1 in #6843
- feat(ingest/snowflake): handle failures gracefully and raise permission failures by @mayurinehate in #6748
- fix(runid): always update runid, except when queued by @david-leifker in #6876
- fix(ingest): conditionally include env in assertion guid by @hsheth2 in #6811
- chore(ci): update dependencies docs-website by @anshbansal in #6871
- feat(ui) - Add a custom error message for bulk edit to add clarity by @mkamalas in #6775
- docs(adding users): Refreshing the docs for adding new DataHub Users by @jjoyce0510 in #6879
- test(mce-consumer): mockbeans by @david-leifker in #6878
- feat(ingest): avoid embedding serialized json in metadata files by @hsheth2 in #6742
- refactor(gradle): move the local docker registry to common location by @david-leifker in #6881
- refactor(smoke): use env variables by @anshbansal in #6866
- fix(lint): pin pydantic version by @anshbansal in #6886
- refactor(docs): Correctly spell elasticsearch in docs by @jjoyce0510 in #6880
- fix(ingest): okta undefined variable error by @anshbansal in #6882
- fix(ci): reduce flakiness in add_users, siblings smoke test by @anshbansal in #6883
- fix(ingest): fall back to default table comment method for all Trino query errors by @marvin-roesch in #6873
- test(misc): misc test updates by @david-leifker in #6890
- deprecate(ingest): bigquery - Removing bigquery-legacy source by @treff7es in #6851
- chore(ingest): remove inferred args to MCPW, part 1 by @hsheth2 in #6819
- test(ingest/kafka-connect): make docker setup more reliable by @hsheth2 in #6902
- fix(ingest): profiling (bigquery) - Address bigquery profiling query error due to timestamp vs data mismatch by @treff7es in #6874
- fix(cli): Make datahub quickstart work with latest docker compose in M1 by @pedro93 in #6891
- fix(cli): fix delete urn cli bug + stricter type annotations by @hsheth2 in #6903
- fix(ingest/airflow): reorder imports to avoid cyclical dependencies by @stijndehaes in #6719
- feat: remove jq requirement + tweak modeldocgen args by @hsheth2 in #6904
- chore(ingest): loosen pyspark and pydeequ deps by @hsheth2 in #6908
- docs(ingest/looker): fix typos + update lookml github action example by @hsheth2 in #6910
- fix(ingest/metabase): use card_id in dashboard to chart lineage by @ccpypy in #6583
- fix(es-setup): create data stream on non-aws by @szalai1 in #6926
- Adding missing Platform logos by @maggiehays in #6892
- feat(ingestion): PowerBI# Improve PowerBI source ingestion by @mohdsiddique in #6549
- Fix compose context for kafka-setup by @szalai1 in #6923
- feat(backend): Supporting Embeddable Previews for Dashboards, Charts, Datasets by @jjoyce0510 in #6875
- chore(deps): bump json5 from 2.2.1 to 2.2.3 in /docs-website by @dependabot in #6930
- chore(deps): bump json5 from 1.0.1 to 1.0.2 in /datahub-web-react by @dependabot in #6931
- fix(ci): managed ingestion test fix by @anshbansal in #6946
- feat(ingest): add include_table_location_lineage flag for SQL common by @hsheth2 in #6934
- feat(ingest): allow extracting snowflake tags by @frsann in #6500
- chore(ingest): unpin pydantic dep by @hsheth2 in #6909
- chore(ingest): partially revert pyspark dep from #6908 by @hsheth2 in #6954
- fix(ingest): use branch info when cloning git repos by @hsheth2 in #6937
- chore(ingest): remove inferred args to MCPW, part 2 by @hsheth2 in #6905
- fix(ingest/unity): simplify MCP generation and reporting by @hsheth2 in #6911
- chore(ci): parallelise build and test workflow to reduce time by @anshbansal in #6949
- fix(frontend): sasl.client.callback.handler.class by @szalai1 in #6962
- chore(react): remove outdated cypress tests and dependency by @anshbansal in #6948
- fix(ci): restrict GE to fix build issues by @anshbansal in #6967
- feat(queries): [Experimental] Allow customization of # of queries in Query tab via env var by @gabe-lyons in #6964
- feat(ingest/postgres): emit lineage for postgres views by @LucasRoesler in #6953
- feat(ingest/vertica): support projections and lineage in vertica by @vishalkSimplify in #6785
- fix(ingest): add missing dep for powerbi by @hsheth2 in #6969
- Docs fixes week of 12 22 by @laulpogan in #6963
- fix(ingest): unfreeze bigquery/snowflake column dataclass by @mayurinehate in #6921
- chore(frontend) Remove unused dependencies from package.json by @chriscollins3456 in #6974
- chore: misc fixes by @anshbansal in #6966
- feat(ingest/glue): emit s3 lineage for s3a and s3n schemes by @danielli-ziprecruiter in #6788
- fix(kafka-setup): Make kafka-setup run with multiple threads by @pedro93 in #6970
- feat(ingest): mark database_alias and env as deprecated by @hsheth2 in #6901
- fix(docs): Updating Tag, Glossary Term docs to point to correct GraphQL methods by @jjoyce0510 in #6965
- chore(deps): bump certifi from 2020.12.5 to 2022.12.7 in /metadata-ingestion/src/datahub/ingestion/source/feast_image by @dependabot in #6979
- fix(ingest): profiling - Fixing issue with the wrong timestamp stored in check by @treff7es in #6978
- config(quickstart): enable auto-reindex for quickstart by @david-leifker in #6983
- feat(privileges) - Create a privilege to manage glossary children recursively by @mkamalas in #6731
- chore(ingest): finish removing feast-legacy by @hsheth2 in #6985
- feat(ingest): add import descriptions of two or more nested messages by @wngus606 in #6959
- feat(docs) Add feature guide for Manual Lineage by @chriscollins3456 in #6933
- docs(rfc): Serialising GMS Updates with Preconditions by @mattmatravers in #5818
- fix(ingest/kafka-connect) support newer version of debezium by @jaegwonseo in #6943
- fix(docs): build and broken snowflake docs fix by @anshbansal in #6997
- fix(ingest): bigquery - views in case more than 1 datasets with views by @anshbansal in #6995
- fix(docs): Renaming Business Glossary Doc by @jjoyce0510 in #7001
- fix(ingest/snowflake): fix type annotations + refactor get_connect_args by @hsheth2 in #7004
- fix(docs): Changing the platform event topic name in kafka custom topic docs by @blankon123 in #7007
- fix(docs): fix name of privilege referenced in posts doc by @aditya-radhakrishnan in #7002
- fix(SSO): Correctly redirect to originally requested URL in SSO by @jjoyce0510 in #7011
- fix(ingest): remove dead code from tests by @hsheth2 in #7005
- feat(ingestion): Tableau # Embed links by @mohdsiddique in #6994
- feat(auth) Update auth cookies to have same-site none for chrome extension by @chriscollins3456 in #6976
- docs(website): DPG WIP by @maggiehays in #6998
- docs: resize datahub logo by @hsheth2 in #7014
- fix(kafka-setup): Remove reference to non-existing topic by @pedro93 in #7019
- fix(ingest): powerbi # use display name field as title for powerbi report page by @looppi in #7017
- feat(auth) Allow session ttl to be configurable by env variable by @chriscollins3456 in #7022
- fix(ui): URL Encode all Entity Profile URLs by @jjoyce0510 in #7023
- fix(ui ingest): Fix test connection when stateful ingest is enabled by @jjoyce0510 in #7013
- docs(sso) move root user warning to earlier in SSO guides by @maggiehays in #7028
- fix(ingest/looker): add clarity in chart input parsing logs by @hsheth2 in #7003
- chore(ingest): remove duplicate data_platform.json file by @hsheth2 in #7026
- feat(ingestion): PowerBI # Remove corpUserInfo aspect ingestion by @mohdsiddique in #7034
- fix(metadata-models): remove unnecessary bin folder by @jjoyce0510 in #7035
- fixing typos by @maggiehays in #7030
New Contributors
- @marvin-roesch made their first contribution in #6873
- @stijndehaes made their first contribution in #6719
- @ccpypy made their first contribution in #6583
- @LucasRoesler made their first contribution in #6953
- @vishalkSimplify made their first contribution in #6785
- @wngus606 made their first contribution in #6959
- @jaegwonseo made their first contribution in #6943
- @blankon123 made their first contribution in #7007
Full Changelog: v0.9.5...v0.9.6
v0.9.4
Release Highlights
Known Issues
There is a known issue with OIDC which we will address in a fast-follow release. If you use OIDC, please wait for v0.9.5 to upgrade.
User Experience
Manual Lineage is LIVE! You can now add and remove lineage between entities in the Lineage Visualization screen, making it easier than ever to manage the complex relationships between your data resources.
Our new Views feature makes it easy to create curated sets of Entities within DataHub. This is a great way to start to isolate the entities that matter most, and provide your DataHub end-users with a streamlined view of the assets that are relevant to their use cases.
In-App Product Tours are here! When logging into DataHub and/or visiting a new page type for the first time, new users will be prompted with a helpful walkthrough of core functionality to get them familiar with the platform. We’ll continue to add modules as we roll out new features!
Automatically send updates to Slack and/or Microsoft Teams when changes are made within DataHub by leveraging the new Slack and Teams Actions
Metadata Ingestion
We’re continuing to improve the user experience for UI-based ingestion for the following sources:
dbt Cloud
Databricks Unity Catalog
MySQL
Trino/Presto
MSSQL
MariaDB
If you’re just getting started with UI-based Ingestion, check out our new BigQuery & Snowflake guides
Stateful ingestion is now supported for Iceberg (thanks for the contrib, @cccs-Dustin!) and LDAP (thanks for the contrib, @bda618!)
Speaking of Stateful Ingestion, we’re taking some steps to simplify the code behind Sta
What's Changed
- chore(): Updating default CLI version, update updating-datahub.md by @jjoyce0510 in #6590
- fix(ingest): profiling - Profiling failed if column cardinality threw an error by @treff7es in #6582
- fix(actions): add missing datahub-gms-protocol env var by @shirshanka in #6593
- fix(ingest): restrict snowflake-connector-python dependency by @mayurinehate in #6594
- feat(ingest/bigquery): avoid creating/deleting tables for profiling by @hsheth2 in #6578
- fix(ingest): unify emit interface by @hsheth2 in #6592
- fix(security): security version updates by @david-leifker in #6602
- docs: remove Kafka Streams from documentation by @maver1ck in #6596
- refactor(ui): Improving Kafka UI Ingestion Form, Create Domain, Create Secret Modals by @jjoyce0510 in #6588
- fix(ingest): clarify tableau auth error messages by @hsheth2 in #6600
- docs(graphql): fix deleteTest "Create"->"Delete" by @nickwu241 in #6574
- fix(gms/startup): remove set -x from start.sh by @timcosta in #6589
- feat(sql): Add SQL index on createdon field by @pedro93 in #6522
- feat(ml model): updating view of ml model feature list by @gabe-lyons in #6576
- fix(ingest/bigquery): ignore complex types from profiling by @treff7es in #6613
- feat(ingest): add external url for snowflake objects by @mayurinehate in #6580
- chore(ingest): bump and pin mypy by @hsheth2 in #6584
- fix(ingest): only require github_info for lookml and not looker by @hsheth2 in #6608
- docs(ingest): add airflow docs that use the PythonVirtualenvOperator by @hsheth2 in #6604
- fix(ui) Fix double scroll in embedded list search sections by @chriscollins3456 in #6618
- feat(ingest): print detailed GMS error messages by @djordje-mijatovic in #6519
- Townhall agenda wikimedia by @maggiehays in #6622
- fix(analytics): skip ListDomains if user cannot manage domains and have only one loading message by @aditya-radhakrishnan in #6624
- feat(quickstart): add support for passing thru env vars needed by Sla… by @shirshanka in #6591
- docs(actions): slack, teams by @shirshanka in #6632
- fix(logging): Remove lombok as source of slf4j-api by @david-leifker in #6616
- docs: add links from main README to slack, teams actions by @shirshanka in #6633
- feat(ingest): Support config variable for specifying a direct privat… by @mayurinehate in #6609
- Add AWS Postgres Iam Auth jar to GMS by @syedzoherer in #6371
- feat(ingest/snowflake): support filtering by fully qualified schema_pattern by @mayurinehate in #6611
- feat(ingest/kafka-connect): support MongoSourceConnector by @frsann in #6416
- feat(graph) Add createdOn, createdActor, updatedOn, updatedActor to graph edges by @chriscollins3456 in #6615
- refactor(ui): Making improvements to UI ingestion forms, adding MySQL, Trino, Presto, MSSQL, MariaDB forms by @jjoyce0510 in #6607
- perf(ui-ingestion): cache on creation or deletion of ingestion sources to reduce latency by @aditya-radhakrishnan in #6647
- feat(ingest): add dummy data source for automated testing by @anshbansal in #6550
- docs(managed datahub): adding release notes for v0.1.70 by @anshbansal in #6655
- feat(gms): Pluggable Authentication & Authorization Framework by @mohdsiddique in #6634
- docs: move rfcs to separate repo by @laulpogan in #6621
- fix(ingest): fix lingering demo-data source issues by @hsheth2 in #6659
- feat(ingest): bigquery - Running lineage extraction after metadata extraction by @treff7es in #6653
- fix(ingest): issue deprecation warning correctly by @hsheth2 in #6623
- chore(ingest): remove feast-legacy by @hsheth2 in #6661
- fix(ingest/snowflake): support domains for snowflake schema containers by @hsheth2 in #6662
- build(deps): bump decode-uri-component from 0.2.0 to 0.2.2 in /datahub-web-react by @dependabot in #6617
- feat(ingest/dbt): add support for latest DBT version 1.3 by @MatthieuBlais in #6651
- docs: add languages to code highlighting by @hsheth2 in #5576
- docs(typo) Correct typo in domains.md by @maggiehays in #6667
- feat(gms): Enable auth-api publishing to maven by @mohdsiddique in #6671
- fix(ingest/powerbi-report-server): deprecate unused graphql config by @daha in #6630
- fix(docker): Fix datahub-frontend dockerfile by @jjoyce0510 in #6670
- fix(ingest): profiling - Changing profiling defaults by @treff7es in #6640
- feat(ci): add smoke test for domain mutation by @anshbansal in #6641
- fix(datahub-protobuf): fix missing httpclient dependency by @shirshanka in #6672
- feat(ingest): update snowflake docs, add simple validations by @mayurinehate in #6636
- fix(gms): DataHub Auth API java doc fix by @mohdsiddique in #6674
- feat(ingest): run profiler in more cardinality cases by @hsheth2 in #6397
- docs(search) update broken youtube link by @maggiehays in #6678
- docs(protobuf): update examples for protobuf by @david-leifker in #6681
- feat(ingest): support knowledge links in business glossary by @mohdsiddique in #6375
- fix(ingestion/vertica): support columns with timestamp precision by @inancdokurel in #6295
- feat(ingest): add timestamps for snowflake objects by @mayurinehate in #6570
- feat(onboarding): adds framework and some steps for onboarding steps UI by @aditya-radhakrishnan in #6462
- feat(ingest): use entry point for registering transformers by @Masterchen09 in #6628
- chore(ci): update base ingestion image requirements file by @anshbansal in #6687
- fix(ci): reduce warnings due to deprecated action by @anshbansal in #6686
- refactor(ui): Adding caching for users, groups, and roles by @jjoyce0510 in #6673
- fix(ci): revert confluent kafka in base image by @anshbansal in #6690
- fix(security): version bump to latest minor python image by @david-leifker in #6694
- docs(ingest/salesforce): list required permissions by @orlandine in #6610
- feat(ingest): bigquery - option to set on behalf project by @treff7es in #6660
- ci: stop commenting test results on PR by @hsheth2 in #6700
- fix(auth-api): Attempting to fix publish for auth-api by @jjoyce0510 in #6695
- build(deps): bump qs from 6.5.2 to 6.5.3 in /smoke-test/tests/cypress by @dependabot in #6663
- build(deps): bump express from 4.17.1 to 4.18.2 in /datahub-web-react by @dependabot in #6665
- fix(ingest/tableau): support ssl_verify flag properly by @hsheth2 in #6682
- fix(config): unify the handling of boolean environment variables by @Masterchen09 in #6684
- fix(ui): fix search on policy builder by @aditya-radhakrishnan in #6703
- build(deps): bump qs from 6.5.2 to 6.5.3 in /datahub-web-react by @dependabot in #6664
- fix(ingest): cleanup config extra usage by @hsheth2 in #6699
- docs(logos): add Great Expectations logo by @maggiehays in #6698
- fix(security): play framework upgrade by @david-leifker in #6626
- fix(ingest/sagemaker): handle missing ProcessingInputs field by @hsheth2 in #6697
- build: add retries to gradle wrapper download in ingestion docker by @hsheth2 in #6704
- test(quickstart): add debugging to quickstart test by @david-leifker in #6718
- fix(setup): Bump setup images to alpine 3.14 with arch based on machine OS. by @pedro93 in #6612
- fix(ingest): fix bug in auto_status_aspect by @hsheth2 in #6705
- fix(security): commons-text, hadoop-commons versions by @david-leifker in #6723
- fix(build): rename conflicting module auth-api by @david-leifker in #6728
- docs(aws): edit markdown link by @jx2lee in #6706
- fix(ingest): fix mysql ingestion issue with non-lowercase database by @mayurinehate in #6713
- feat(ingest): redact configs reported in ingestion_run_summary by @hsheth2 in #6696
- fix(ingest): rectify filter for BigQuery external tables by @janhicken in #6691
- feat(ingest): add separate config for include_column_lineage in snowf… by @mayurinehate in #6712
- fix(ci): flakiness due to onboarding tour in add user test by @anshbansal in #6734
- feat(ui): Support DataBricks Unity Catalog Source in Ui Ingestion by @jjoyce0510 in #6707
- feat(ingest/iceberg): add stateful ingestion by @cccs-Dustin in #6344
- doc(restore): document restore indices API endpoint by @anshbansal in #6737
- feat(): Views Feature Milestone 1 by @jjoyce0510 in #6666
- feat(ingest): bigquery - external url support and a small profiling filter fix by @treff7es in #6714
- test(ingest): make hive/trino test more reliable by @hsheth2 in #6741
- Initial commit for bigquery ingestion guide by @treff7es in #6587
- fix(ci): remove warnings due to deprecated action by @anshbansal in #6735
- feat(ingest): add stateful ingestion to the ldap source by @bda618 in #6127
- fix(ingest): fix codegen from_obj for empty dicts in unions with null by @hsheth2 in #6745
- feat(ingest): start simplifying stateful ingestion state by @hsheth2 in #6740
- docs(gms): plugins# auth-api as compileOnly dependency by @mohdsiddique in #6747
- fix(elasticsearch): build in resilience against IO exceptions on httpclient by @RyanHolstien in #6680
- ci: fix ingestion gradle retry by @hsheth2 in #6752
- fix(ingest): support airflow mapped operators by @cccs-seb in #6738
- fix(actions): fix mistype slack/teams base url by @ssilb4 in #6754
- fix(smoke-test): fix stateful ingestion test regression by @hsheth2 in #6753
- fix(auth): Renames metadata-auth archive name to not conflict with other modules. by @pedro93 in #6749
- fix(ingest/lookml): fix directory handling and a config validation bug by @hsheth2 in #6751
- refactor(ingest): bigquery-lineage - allow tables and datasets in uppercase by @PatrickfBraz in #6739
- refactor(ux): Misc UX Improvements (tutorial copy, caching, filters) by @jjoyce0510 in #6743
- Added build failed yarn error by @jakobhanna in #6757
- feat(ingest): remove source config from DatahubIngestionCheckpoint by @hsheth2 in #6722
- fix(python-sdk): DataHubGraph get_aspect should accept empty responses by @shirshanka in #6760
- fix(datahub-web-react): Properly escape a quote in React by @jjoyce0510 in #6764
- docs(ingest/airflow): clarify Airflow 1.x docs for airflow plugin by @hsheth2 in #6761
- feat(ingest): simplify more stateful ingestion state by @hsheth2 in #6762
- fix(ingest): bigquery - handling custom sql errors as warning in profiling by @treff7es in #6777
- docs(docker): add section for adding community images by @anshbansal in #6770
- docs(ingest): fix error in custom tags transformer example by @hsheth2 in #6767
- feat(ingest): add datahub state inspect command by @hsheth2 in #6763
- refactor(ui): Caching Ingestion Secrets by @jjoyce0510 in #6772
- docs(snowflake) Snowflake quick ingestion guide by @maggiehays in #6750
- Optimize kafka setup by @david-leifker in #6778
- feat(ingest/lookml): add unreachable views to report by @hsheth2 in #6779
- feat(ci): adding github security reporting to trivy scans by @shirshanka in #6773
- fix(smoke-test): remove stateful ingestion config check by @hsheth2 in #6781
- fix(ingest): correct external url for account identifier with account name by @mayurinehate in #6715
- fix(tutorial): skip getting steps if there is no user by @aditya-radhakrishnan in #6786
- fix(kafka-setup): fix return code check by @david-leifker in #6782
- refactor(ui): Make include_tables and include_views default to True. Improve Tableau default recipe. by @jjoyce0510 in #6790
- fix(ingest): prevent NullPointerException when non-jdbc SaveIntoDataS… by @danielli-ziprecruiter in #6803
- docs(architecture): edit documents in architecture section by @jx2lee in #6798
- fix(ingest/dbt): remove unsupported usage indicator by @hsheth2 in #6805
- refactor(ui): Adding frontend caching + some misc. refactoring by @jjoyce0510 in #6796
- fix(ingest): bigquery - sharded table support improvements by @treff7es in #6789
- chore(ingest): pin black version by @hsheth2 in #6807
- refactor(ingest/stateful): remove most remaining state classes by @hsheth2 in #6791
- fix(profile): bigquery-legacy - Fix for TypeError-related failures in legacy plugin by @senapatim in #6806
- Update Grafana Dashboard by @NavinSharma13 in #6076
- refactor(ingest/stateful): remove IngestionJobStateProvider by @hsheth2 in #6792
- chore(ingest): bump python package dependencies to resolve vulns by @cyberay01 in #6384
- refactor(ingest/stateful): remove get_last_state method by @hsheth2 in #6794
- fix(ui): URL encode urns for ownership entity links by @aditya-radhakrishnan in #6814
- fix(posts): add deletePost GraphQL endpoint by @aditya-radhakrishnan in #6813
- fix(policies): resolve the associated domain for a domain as the domain itself by @aditya-radhakrishnan in #6812
- feat(lineage) Adds ability to edit lineage manually from the UI by @chriscollins3456 in #6816
- fix(ui): change caching to happen post server-response when creating a UI ingestion recipe by @aditya-radhakrishnan in #6815
- feat(ingest/stateful): remove platform_instance_id from state urn by @hsheth2 in #6795
- feat(ui): Adding DBT Cloud support for UI ingestion by @jjoyce0510 in #6804
- feat(kafka): expose default kafka producer mechanism by @djordje-mijatovic in #6381
New Contributors
- @maver1ck made their first contribution in #6596
- @MatthieuBlais made their first contribution in #6651
- @inancdokurel made their first contribution in #6295
- @orlandine made their first contribution in #6610
- @janhicken made their first contribution in #6691
- @cccs-Dustin made their first contribution in #6344
- @cccs-seb made their first contribution in #6738
- @ssilb4 made their first contribution in #6754
- @senapatim made their first contribution in #6806
- @cyberay01 made their first contribution in #6384
Full Changelog: v0.9.3...v0.9.4
DataHub v0.9.3
Release Highlights
User Experience
- Column Level Lineage Impact Analysis is live! Read more about it here
- You can now sort Dataset field names alphabetically - this is super handy for finding columns within wide datasets that may not have an easy-to-follow order by default
- Miscellaneous UX improvements:
  - “Explore All” button on the home page, making it easier to jump into the search experience
  - “Share” button on entity pages
- [Community Contribution] You can now assign the same user as different owner types - thanks for the contrib, @rtekal!
Metadata Ingestion
- Snowflake Automated PII Classification is here! We’re eager for feedback on the utility of this feature - check out this guide, take it for a spin, and let us know what you think!
- We’ve simplified the configs required to add stateful ingestion to an ingestion source - check out the updated docs here
- Speaking of stateful ingestion, it’s now supported with:
  - Looker & LookML ingestion sources
  - [Community Contribution] Container-level ingestion - thanks for the contrib, @wangsaisai!
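To make the simplified stateful ingestion configuration concrete, here is a minimal sketch that builds an ingestion recipe as a Python dict. It assumes the recipe shape described in DataHub's stateful ingestion docs: a stateful_ingestion block with enabled (and optionally remove_stale_metadata) under the source config, plus a top-level pipeline_name, with no separate state provider block needed when sinking to datahub-rest. The helper build_stateful_recipe, the pipeline name, and the Looker connection settings are hypothetical placeholders, not part of DataHub.

```python
from typing import Any, Dict


def build_stateful_recipe(
    source_type: str,
    source_config: Dict[str, Any],
    pipeline_name: str,
    server: str = "http://localhost:8080",
) -> Dict[str, Any]:
    """Return a recipe dict with stateful ingestion switched on.

    Hypothetical helper: assumes the simplified config where enabling the
    stateful_ingestion block and setting a top-level pipeline_name is enough.
    """
    if not pipeline_name:
        # Stateful ingestion checkpoints are tracked per pipeline name.
        raise ValueError("pipeline_name is required for stateful ingestion")
    config = dict(source_config)  # copy so the caller's dict is not mutated
    config["stateful_ingestion"] = {
        "enabled": True,
        # Soft-delete entities that were not seen in the latest run.
        "remove_stale_metadata": True,
    }
    return {
        "pipeline_name": pipeline_name,
        "source": {"type": source_type, "config": config},
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


recipe = build_stateful_recipe(
    "looker",
    {"base_url": "https://company.looker.com"},  # hypothetical connection settings
    pipeline_name="my_looker_pipeline",  # hypothetical pipeline name
)
print(recipe["source"]["config"]["stateful_ingestion"])
```

With the acryl-datahub package installed, a dict like this can be passed to Pipeline.create(...) from datahub.ingestion.run.pipeline and executed with pipeline.run(); real Looker credentials (client_id, client_secret) would also be required.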
Developer Experience
- NEW! dbt Cloud ingestion is ready for ya - check out the module details here
- [Community Contribution] For those of you deploying DataHub with Neo4j, we now support Lineage Impact Analysis via Neo4j multi-hop functionality. Thanks for the contrib, @djordje-mijatovic!
- We’ve loosened our SQLAlchemy dependencies to support Airflow 2.3+
What's Changed
- fix(spark-lineage): Smoke test fix + smoke test m1 support by @treff7es in #6372
- feat(ingest): supports MCEs in domain transformer by @hsheth2 in #6364
- feat(ingest): enable container stateful ingestion by @wangsaisai in #6343
- build(ingest): pin mypy version by @hsheth2 in #6391
- build: use acryl's gradle-avro-plugin by @hsheth2 in #6390
- fix(ingest): unity - add missing date type by @ms32035 in #6385
- fix(ingest): unity-catalog - Removing unneeded sqlalchemy dependency to fix install by @treff7es in #6379
- feat(ingest/tableau): re-authenticate if the token expires by @hsheth2 in #6380
- fix(ingest): use profiler config settings correctly by @hsheth2 in #6354
- fix(ingest): handle error when query returns no columns in snowflake lineage by @mayurinehate in #6404
- fix(ingest): fix missing snowflake lineage when table_pattern is set by @mayurinehate in #6410
- feat(ingest): loosen sqlalchemy dep & support airflow 2.3+ by @hsheth2 in #6204
- fix(ingest/s3): add status aspect for detected s3 datasets by @mayurinehate in #6402
- fix(ingest/snowflake): loosen snowflake connector version requirement by @hsheth2 in #6418
- fix(mysql): fix native data type for mysql set type by @mayurinehate in #6407
- perf(ui): virtualized schema table rows by @stanbaker in #6287
- fix(ui) Improve HoverEntityTooltip and truncate parent glossary nodes by @chriscollins3456 in #6417
- feat(ingest): support incremental lineage to dbt node from external platform by @mayurinehate in #6392
- fix(ingest): init dataset props if missing in transformer by @hsheth2 in #6429
- fix(change-event): remove unnecessary dependencies on EntityChangeEventGeneratorRegistryFactory by @aditya-radhakrishnan in #6431
- build(deps): bump moment-timezone from 0.5.34 to 0.5.35 in /datahub-web-react by @dependabot in #5783
- feat(frontend): Adding support to show externalUrl and institutionalMemoryFields for MLModels by @lurecas in #6053
- feat(model): adds properties, ownership, deprecated, institutional memory and tags as aspects for data platform instance entity by @sgomezvillamor in #5728
- docs(ingest/airflow): clarify docs around 1.x compat by @hsheth2 in #6436
- feat(recommendations): add last edited entities by @CorentinDuhamel in #6329
- fix(ingest): correctly compute entity change percentage by @hsheth2 in #6438
- docs(townhall) Updating Townhall History by @maggiehays in #6336
- Neo4j multihop support by @djordje-mijatovic in #6104
- fix(mae-consumer): Set proper variable expansion for JMX_OPTS and JAVA_OPTS in MAE docker by @skrydal in #6378
- docs(ingest): move prerequisite section before the ingestion recipe example by @mayurinehate in #6341
- fix(dataset): improve glossary term load performance for datasets by @Reilman79 in #6396
- feat(lineage) Implement CLL impact analysis for inputFields by @chriscollins3456 in #6426
- feat(ui) Add upgrade step to enable CLL impact analysis for existing data by @chriscollins3456 in #6427
- Added functionality to copy fieldpath and urn of each column by @Ankit-Keshari-Vituity in #6398
- fix(ingestion): add output converters for ODBC unsupported datatype in… by @LavinaVRovine in #6134
- fix(ui) Fix parentNodes overfetching everywhere it's used by @chriscollins3456 in #6446
- fix(ingest): snowflake - Fixing top query trimming in snowflake by @treff7es in #6447
- feat(elasticsearch): Updates to elasticsearch configuration, dao, tests by @david-leifker in #6269
- chore(ingest): fix mssql lint by @hsheth2 in #6453
- fix(ingest): add cli info to ingestion reporter by @hsheth2 in #6451
- fix(ui) Fix glossary side browser width fluctuating by @chriscollins3456 in #6457
- fix(python): Fix python dependencies for doc generation by @david-leifker in #6460
- docs(website): add homepage links by @jeffmerrick in #6458
- build(ingest): loosen jinja2 dependency for superset by @KulykDmytro in #6433
- fix(ingest): lowercase db name in mssql ingestion by @hsheth2 in #6448
- fix(ingest): handle missing schema in transformer by @hsheth2 in #6445
- feat(ingest): allow specific profiler config fields to override profile_table_level_only by @hsheth2 in #6366
- docs(enrichment) updating enrichment landing page by @maggiehays in #6286
- fix(home-page): remove redundant getAuthenticatedUser query by @aditya-radhakrishnan in #6464
- feat(ingest): detect old or missing docker compose by @hsheth2 in #6466
- feat(ingestion): powerbi # Power BI report support by @mohdsiddique in #6339
- fix(ingest/dbt): disable incremental lineage by default by @hsheth2 in #6467
- fix(logging): print logging timestamp in ISO8601 format instead of jus… by @szalai1 in #6474
- docs(ingest/trino): add example of http connection by @hsheth2 in #6461
- refactor(ui): Simplify base glossary page toolbar by @jjoyce0510 in #6469
- revert: mssql - lowercase db name in mssql ingestion by @hsheth2 in #6481
- build: remove Jinja2 dependency from superset by @KulykDmytro in #6476
- fix(roles): allows role service to unassign roles by @aditya-radhakrishnan in #6434
- fix(docs): update the Okta and Azure AD docs to clarify the point of ingesting users by @aditya-radhakrishnan in #6465
- Highlighted the description text on search by @Ankit-Keshari-Vituity in #6400
- Ownership type is deprecated by @jakobhanna in #6477
- feat(ui): Adding Explore all button on home page search by @jjoyce0510 in #6468
- fix(ingest): fix athena and GE lint errors by @hsheth2 in #6482
- refactor(ingest): simplify stateful ingestion config by @hsheth2 in #6454
- docs(ingest/tableau): required permissions + doc formatting by @hsheth2 in #6484
- feat(ingest): presto - Adding presto source by @treff7es in #6459
- fix(ui) Fix lineage graph rendering with duplicate nodes by @chriscollins3456 in #6480
- docs(cypress): adding local cypress running instructions by @gabe-lyons in #6492
- fix(managed ingestion): updating snowflake schema pattern placeholder text by @gabe-lyons in #6493
- feat(ui): Adding External URLs to search preview for Dataset, Container, DataFlow, DataJob by @jjoyce0510 in #6496
- fix(ingest/tableau): check tableName existence on datasource response by @lustefaniak in #6478
- fix(build): do not use neo4j for dev by @anshbansal in #6501
- docs(gms): update search example, do not use deprecated clause by @mayurinehate in #6340
- feat(ingest): add stateful ingestion support to looker and lookml source by @mayurinehate in #6443
- feat(ingest): dbt cloud integration by @hsheth2 in #6323
- fix(tableau): extra defensive error-handling by @hsheth2 in #6503
- fix(ingest): remove redundant types by @hsheth2 in #6486
- fix(ingest/snowflake): fix lineage allow/deny pattern typo by @hsheth2 in #6506
- fix(docs): add missing docs for 0.9.1 by @anshbansal in #6515
- feat(ui): Introducing Share Button on Entity Pages by @jjoyce0510 in #6450
- Added IAM auth for OpenSearch by @syedzoherer in #6370
- fix(ingest): correctly handle transformer patch semantics by @hsheth2 in #6505
- feat(ingest/csv-enrich): handle BOM character by @hsheth2 in #6509
- feat(airflow): support kafka hook in the airflow plugin by @hsheth2 in #6508
- fix(patch): cover case where patch is used to create an entity by @RyanHolstien in #6504
- build(deps): bump loader-utils from 2.0.0 to 2.0.4 in /docs-website by @dependabot in #6452
- fix(ingest): add alias for bigquery-beta by @hsheth2 in #6521
- feat(ingest): add config for ingesting delta table without files by @mayurinehate in #6403
- fix(ingest): fix typo in unique count profiling by @mayurinehate in #6517
- fix(ui) Fix roles not always displaying on page load by @chriscollins3456 in #6524
- feat(datahub-upgrade): Added msk IAM auth as a build dependency. by @pghazanfari in #6439
- feat(kafka-setup): Added support for MSK IAM authentication. by @pghazanfari in #6435
- Added sorting method to fieldpath column of schema tab by @Ankit-Keshari-Vituity in #6510
- fix(ingest): make kafka emit callback optional by @hsheth2 in #6525
- feat(ingest): automated term classification for snowflake by @mayurinehate in #6376
- fix(ingest): fix typo in urn utilities by @bskim45 in #6520
- fix(ingest): fix trino properties and tests by @mayurinehate in #6518
- fix(build): remove warnings in github actions by @anshbansal in #6512
- fix(security): Bump ranger plugin commons dependency by @pedro93 in #6535
- fix(ingest): kafka - properly picking doc from union type by @treff7es in #6472
- feat(ingest): disable stateful_ingestion fail-safe by default by @hsheth2 in #6537
- fix(ingest/airflow): respect enabled flag in airflow plugin by @hsheth2 in #6528
- refactor(ui): Adding apollo caching to manage domains page. by @jjoyce0510 in #6494
- refactor(recommendations): Filtering for specific entity types in recommendations by @jjoyce0510 in #6538
- fix(ingest): handle groupby custom label case by @phongvu99 in #6456
- build(ingest): support flake8 6.0.0 by @hsheth2 in #6540
- fix(ui) Wrap schema field descriptions to allow read more/less always by @chriscollins3456 in #6541
- fix(ui) Display duplicate nodes in lineage viz by @chriscollins3456 in #6526
- style(ingest): fix lint checks for superset by @mayurinehate in #6548
- fix(envs): remove DATASET_ENABLE_SCSI stale env var by @szalai1 in #6546
- feat(upgrade): Make restore from backup logic generic by @pedro93 in #6536
- feat(ingest): refactor classification mixin, support new infotypes by @mayurinehate in #6545
- fix(ingest): bigquery - missing sqlalchemy dep and row count fix by @treff7es in #6553
- fix(ingest): bigquery - Fixing querying non-date partition columns in profiling by @treff7es in #6554
- feat(ingest): powerbi # scan all accessible workspaces by @looppi in #6441
- fix(ingest): bigquery - Setting partition id for profiling data and project_id fix by @treff7es in #6558
- fix(gms): fix java.lang.NoClassDefFoundError: com/sun/syndication/io/FeedException for apache-ranger authorizer by @mohdsiddique in #6560
- feat(ui): Add Test Connection Support for BigQuery ingestion source by @jjoyce0510 in #6543
- fix(contrib): Update base python image for es7-upgrade by @david-leifker in #6562
- fix(ingest): handle docker-compose version v prefix by @hsheth2 in #6561
- docs(ingest/kafka): add field descriptions of kafka-related configs to pydantic by @mmmeeedddsss in #6559
- feat(platform): Support @searchable + @relationship Annotations for Timeseries Aspects by @jjoyce0510 in #6455
- feat(models): Adding 'created', 'lastModified' timestamp to Dataset, Container, Dashboard, Chart by @jjoyce0510 in #6527
- fix(ingest): set DataProcessInstance created ts to start time by @hsheth2 in #6566
- feat(docs-site): fast reload command for markdown edits by @hsheth2 in #6539
- fix(ingest): graceful error handling in snowflake classification by @mayurinehate in #6568
- ci(label): add smoke test label by @anshbansal in #6571
- fix(ingest): fix types changes in clickhouse sqlalchemy 0.2.3 by @mayurinehate in #6572
- fix(tests): Misc updates for tests, auth log level, and quickstart by @david-leifker in #6491
- feat(ui) Add owner to dataset - allow same owner with a different type by @rtekal in #6463
- fix(versions): Update opentelemetry and updates from pr-5239 by @david-leifker in #6563
- refactor(airflow): remove verbose log from airflow plugin by @bskim45 in #6516
- feat(cli): remove inconsistency check command by @anshbansal in #6569
- fix(ingest): restrict snowflake's sqlalchemy dep by @hsheth2 in #6579
- docs(notes): add release notes for v0.1.69 managed DataHub by @anshbansal in #6573
- fix(test): fix delete smoke test by @david-leifker in #6585
New Contributors
- @wangsaisai made their first contribution in #6343
- @stanbaker made their first contribution in #6287
- @lurecas made their first contribution in #6053
- @Reilman79 made their first contribution in #6396
- @LavinaVRovine made their first contribution in #6134
- @KulykDmytro made their first contribution in #6433
- @jakobhanna made their first contribution in #6477
- @lustefaniak made their first contribution in #6478
- @syedzoherer made their first contribution in #6370
- @phongvu99 made their first contribution in #6456
- @looppi made their first contribution in #6441
- @rtekal made their first contribution in #6463
Full Changelog: v0.9.2...v0.9.3
DataHub v0.9.2
Release Highlights
User Experience
Metadata Ingestion
- New ingestion source: PowerBI Report Server
DataHub Docs Site
What's Changed
- feat(change-event): add change events for DataProcessInstanceRunEvent by @aditya-radhakrishnan in #6320
- Worked on the Usage column & Lineage Drawer by @Ankit-Keshari-Vituity in #6290