Releases: datahub-project/datahub
v0.8.40
Highlights
Fixes bug in 0.8.39 that prevented standalone MAE consumers from being deployed.
User Experience
Support for deleting Tags and Domains via the UI
Support for editing Domain name via the UI
Visualize Glossary Term source on the Glossary Term Entity Page
Developer Experience
Fix for issue where standalone MAE consumers could not be deployed
Metadata Ingestion
Script to re-index sibling associations for dbt nodes that had already been ingested before 0.8.39
What's Changed
- feat(search) Allow users to update the number of search results per page by @chriscollins3456 in #5212
- feat(build): add base image for ingest by @anshbansal in #5243
- feat(ingest): working with multiple bigquery projects by @anshbansal in #5240
- fix(build): missing libs by @anshbansal in #5254
- fix(build): use correct creds by @anshbansal in #5261
- feat(ingest): redshift - Option to define path spec for Redshift lineage generation by @treff7es in #5256
- fix(ui): Enable previews properly when browsing for DataJob by @MikeSchlosser16 in #5250
- fix(docs): Fix acronym on mxe docs by @MikeSchlosser16 in #5249
- fix(ui): Support deleting references to glossary terms / nodes, users, assertions, and groups by @jjoyce0510 in #5248
- feat(docs) add links in quickstart for adding users by @pedro93 in #5267
- fix(siblings) Display sibling assertions in Validations tab by @chriscollins3456 in #5268
- Feat(domain) Add ability to edit a Domain name from the UI by @chriscollins3456 in #5266
- Delta lake base by @MugdhaHardikar-GSLab in #5259
- fix(siblings) Update the names of siblings utils args for readability by @chriscollins3456 in #5269
- docs(adopters): add showroomprive and n26 as DataHub adopters by @maggiehays in #5271
- feat(glossary) Add Source section to sidebar for Glossary Terms by @chriscollins3456 in #5262
- fix(delta-lake): fix dependency issue for snowflake due to s3_util by @MugdhaHardikar-GSLab in #5274
- fix(ingest): s3 - Remove unneeded methods from s3_util by @MugdhaHardikar-GSLab in #5276
- Selector recommendations in Owner, Tag and Domain Modal by @Ankit-Keshari-Vituity in #5197
- fix(security) Sanitize rich text before sending to backend or rendering on frontend by @chriscollins3456 in #5278
- feat(GraphQL): Support for Deleting Domains, Tags via GraphQL API by @jjoyce0510 in #5272
- feat(build): reduce build time for ingestion image by @anshbansal in #5225
- fix(ingestion): profiling - Fixing partitioned table profiling in BQ by @treff7es in #5283
- fix(ingest) redshift: Adding missing dependencies and relaxing sqlalchemy dependency by @treff7es in #5284
- fix(ingestion): Reverting sqlalchemy upgrade because it caused issues with mssql and redshift-usage by @treff7es in #5289
- fix(Siblings): Have sibling hook use entity client by @gabe-lyons in #5279
- Show message when related glossary terms are empty. by @Ankit-Keshari-Vituity in #5285
- docs(adopter): add Digital Turbine as DataHub adopter by @maggiehays in #5290
- Update schema-registry docker.env by @liyuhui666 in #5231
- feat(siblings): index sibling aspects for historical dbt metadata by @gabe-lyons in #5291
- feat(ui) Adding support for deleting Tags and Domains via the UI by @jjoyce0510 in #5280
Full Changelog: v0.8.39...v0.8.40
v0.8.39
Release Highlights
Known Issues
This release does not work with stand-alone MAE consumers (mae-consumer-job); this has been resolved in v0.8.40.
User Experience
- NEW: support for surfacing outcomes of dbt Tests in dataset entity pages (see it in action here)
- NEW: Improved navigation of dbt resources: dbt models and their associated warehouse tables are now merged into a unified entity (see it here). This will automatically be enabled for all newly ingested entities. To view this for entities you have already ingested, you will need to run a restore indices job.
- Improvement to Impact Analysis: When looking at the Lineage tab, you can now easily toggle between “Upstream” and “Downstream” entities (try it out here)
Developer Experience
- NEW: Java Kafka Emitter – Use this when you want to decouple your metadata producer from the uptime of your datahub metadata server by utilizing Kafka as a highly available message bus
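The decoupling idea behind the emitter can be sketched roughly as follows. This is not the Java Kafka Emitter's API: an in-memory queue stands in for the Kafka topic, and the `emit` and `deliver` names are hypothetical, chosen only to show that the producer does not need the metadata server to be up.

```python
import queue

# In-memory stand-in for a Kafka topic (hypothetical; not the emitter's API).
bus = queue.Queue()


def emit(event):
    """Producer side: hand the event to the bus and return immediately."""
    bus.put(event)


def deliver(server_is_up):
    """Consumer side: forward buffered events only while the server is reachable."""
    delivered = []
    while server_is_up and not bus.empty():
        delivered.append(bus.get())
    return delivered


# The producer keeps working while the metadata server is down.
emit("event-1")
emit("event-2")
print(deliver(server_is_up=False))  # nothing is forwarded yet
print(deliver(server_is_up=True))   # buffered events arrive in order
```

In a real deployment the buffering and durability come from Kafka itself; the sketch only illustrates why the producer's availability no longer depends on the server.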
Metadata Ingestion
- NEW: Make bulk edits to your metadata via CSV (read more)
- Snowflake ingestion improvements: configure profiling to run only on tables that have been updated within the prior N days
- Managed ingestion update: removed need for sink block
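As a minimal sketch of the Snowflake profiling rule above, the filter below keeps only tables altered within the prior N days. The function name, the `last_altered` mapping, and the sample table names are hypothetical and are not taken from the Snowflake source's configuration or code.

```python
from datetime import datetime, timedelta, timezone


def tables_to_profile(last_altered, max_age_days, now=None):
    """Return names of tables altered within the last `max_age_days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return sorted(name for name, ts in last_altered.items() if ts >= cutoff)


# Hypothetical last-altered timestamps for three tables.
now = datetime(2022, 6, 30, tzinfo=timezone.utc)
last_altered = {
    "orders": datetime(2022, 6, 29, tzinfo=timezone.utc),
    "customers": datetime(2022, 6, 1, tzinfo=timezone.utc),
    "archive_2019": datetime(2020, 1, 15, tzinfo=timezone.utc),
}

print(tables_to_profile(last_altered, max_age_days=7, now=now))
```

Skipping tables that have not changed avoids re-running profiling queries whose results would be the same as in the previous run.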
What's Changed
- fix(ui-ingestion): update looker ingestion warning banner by @aditya-radhakrishnan in #5142
- chore: Bump Default UI Ingestion Version 0.8.38 by @jjoyce0510 in #5145
- feat(schema): support rendering schemas with `.` in field names by @gabe-lyons in #5141
- feat(dbt): Platform instances for target platform by @skrydal in #5129
- feat(ingest): snowflake profile tables only if they have been updates… by @mayurinehate in #5132
- fix(airflow): fixes DeprecationWarning with hook-class-names by @sayakmaity in #5143
- feat(frontend): Parse JWT access token claims by @chen4119 in #5138
- fix(tokens): Using keyword search filters for ListAccessTokensResolver by @jjoyce0510 in #5154
- feat(ui) Update the max text length of Terms/Term Groups by @chriscollins3456 in #5162
- docs(policies): add info about Manage User Credentials by @aditya-radhakrishnan in #5157
- fix(restore-indices): Do not fail on MAE row count diff by @dexter-mh-lee in #5165
- fix(Kafka-setup): Make sure it doesn't fail when the new envs are not set by @dexter-mh-lee in #5168
- chore(deps): Bump Nimbus Jose JWT dependency by @pedro93 in #5158
- fix(recs): Verify that an entity exists before recommending by @jjoyce0510 in #5163
- fix(business glossary): setting properties to be empty if the node has no properties aspect by @gabe-lyons in #5166
- refactor(ui): Misc improvements to Dataset Assertions UI by @jjoyce0510 in #5155
- chore(guava): force version of guava in client jars per #5134 by @RyanHolstien in #5153
- feat(boot): Make Glossary Term Upgrade Async by @jjoyce0510 in #5164
- fix(frontend): Add iam auth jar to frontend by @dexter-mh-lee in #5171
- docs(features): update & clean up Features page by @maggiehays in #5175
- fix(glue): fix glue profiling config option by @kangseonghyun in #5178
- feat(upgrade) Check version when determining to run RestoreGlossaryIndices step by @chriscollins3456 in #5182
- fix(jaas): fixed auth.jaas.enabled option parsing by @alexey-kravtsov in #5179
- feat(ingestion): bigquery - Option to send usage queries as well as Operational metadata by @treff7es in #5151
- feat(build): changes to decrease build time, cancel runs in case of multiple commits by @anshbansal in #5187
- refactor(docs): Update Metadata Events Docs by @jjoyce0510 in #5173
- fix(ingest): If there is no manager for a LDAP user (example: system account) by @bda618 in #5180
- bug(ingest): correct case of sys views for mssql description populati… by @BALyons in #5186
- refactor(configs): Simplify Kafka Topic name configurations + docs by @jjoyce0510 in #5198
- feat(ingest): dbt - adding support for dbt tests by @shirshanka in #5201
- fix(cli): correct handling of env variables by @anshbansal in #5203
- feat(ci): split integration tests to reduce run time by @anshbansal in #5205
- feat(datahub-client): add java kafka emitter by @MugdhaHardikar-GSLab in #5074
- feat(graphql): add metrics capturing for graphql latency by @RyanHolstien in #5200
- test(ingestion): bigquery-usage - Adding tests for bigquery usage filters by @treff7es in #5195
- fix(ui): load monaco-editor as a dependency and not from a third party CDN by @Masterchen09 in #5189
- feat(cli): Add token parameter for sample ingestion by @pedro93 in #5160
- feat(lineage) Update Lineage tab and Impact Analysis feature by @chriscollins3456 in #5121
- fix(ingest): add missing ownership types by @afghori in #5209
- feat(ingestion) ldap: make ldap attrs keys configurable by @atulsaurav in #4682
- Remove unnecessary space from application.yml of GMS by @mmmeeedddsss in #5216
- fix(upgrade): fix upgrade when s3 path has = by @RyanHolstien in #5220
- feat(docs) Add and update docs for the new Glossary experience by @chriscollins3456 in #5211
- feat(glossary) Add empty state for the Business Glossary home page by @chriscollins3456 in #5217
- feat(bootstrap): add bootstrap step to clear out unknown aspect rows from the database by @RyanHolstien in #5148
- feat(ingest): adds csv enricher ingestion source by @aditya-radhakrishnan in #5221
- fix(build): pin confluent kafka dependency by @anshbansal in #5224
- fix(ingest): databricks - ingest structs correctly through hive by @shirshanka in #5223
- feat(dbt): add sibling association logic to associate dbt elements with their target systems by @gabe-lyons in #5190
- feat(tableau): use pagination for all connection queries by @mayurinehate in #5204
- Handling 404 page not found by @Ankit-Keshari-Vituity in #5227
- refactor(UI): Refactor Dataset Health Status by @jjoyce0510 in #5222
- fix(dbt-test): Inconsistency in assertions by @Santhin in #5214
- feat(ingest): remove need for sink block in UI based ingestion by @anshbansal in #5208
- fix(ingest): bigquery - Grouping date named tables at bigquery by @treff7es in #5230
- Add check for 0 rows when profiling datasets from s3 by @Jiafi in #5219
- [bug fix]: disabled create buttons by @xiphl in #5234
- fix(ingest): bigquery - Handling gracefully sql parser error in bq lineage by @treff7es in #5238
- fix(ingest): do not dump password by @anshbansal in #5235
- feat(ingest): dbt - improving dbt_meta mapping by @shirshanka in https://github.com/datahub-project/data...
[!] DataHub v0.8.38
Notice: There is a known issue in this release. Listing access tokens for a user may not return the correct results to the UI due to an unreliable query to DataHub's search backend. This will be resolved in v0.8.39. Note that this does not mean that access tokens will not work or are in any way compromised - the functionality of generating and using access tokens is not impacted.
The below release notes are copied from v0.8.37 release notes.
Highlights
User Experience
This release comes packed full of new features and updates.
- NEW – Create & Revoke Access Tokens via the UI - Find this under Settings > Developer. This replaces the previous stateless tokens UI.
- NEW – Create and Invite Users to DataHub via the UI - Find this under Users & Groups > Invite DataHub users. Admins can also now generate password reset links for their users.
- NEW - Manage Related Glossary Terms via the UI - Add and remove Glossary Terms Contained By and Inherited From a parent via the UI. Find this under Glossary
- UPDATE - Rename “Manage” navigation item to “Govern”
- [IMPORTANT] UPDATE - Move “Users & Groups” navigation item into Settings > Access
- [IMPORTANT] UPDATE - Move “Policies” navigation item into Settings > Access (Privileges)
- FIX - You no longer need to run a reindexing job to start using the new Business Glossary UI. This process is handled for you at boot time.
- Minor fixes & improvements to UI for adding policy users + groups.
Metadata Ingestion
- Support Snowflake ingest via OAuth
- Misc fixes and improvements to existing ingestion sources
Disclaimers:
With this upgrade, we've added a new mechanism for authenticating users: native authentication. It is enabled by default, which allows Admins to create new users and those users to log in.
If you were previously disabling BOTH JaaS (via AUTH_JAAS_ENABLED = false) AND OIDC, and you still do not want to require a username + password to log in, you'll need to add a new environment variable to the datahub-frontend-react container: AUTH_NATIVE_ENABLED=false.
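The rule in this disclaimer can be illustrated with a small check of which login mechanisms a set of environment variables leaves enabled. This is a sketch under assumptions: only AUTH_JAAS_ENABLED and AUTH_NATIVE_ENABLED are named in these notes, so the `AUTH_OIDC_ENABLED` name and the JaaS and OIDC defaults below are assumed for illustration, and the function itself is hypothetical.

```python
def enabled_login_mechanisms(env):
    """Return the login settings that remain enabled for the given variables."""
    # Defaults are assumptions for illustration; the notes only state that
    # native authentication is enabled by default.
    defaults = {
        "AUTH_JAAS_ENABLED": "true",
        "AUTH_OIDC_ENABLED": "false",  # assumed variable name
        "AUTH_NATIVE_ENABLED": "true",
    }
    enabled = []
    for name, default in defaults.items():
        if env.get(name, default).strip().lower() == "true":
            enabled.append(name)
    return enabled


# A deployment that had already switched off JaaS and OIDC before upgrading.
before = {"AUTH_JAAS_ENABLED": "false"}
# The same deployment with the new variable added.
after = {"AUTH_JAAS_ENABLED": "false", "AUTH_NATIVE_ENABLED": "false"}

print(enabled_login_mechanisms(before))  # native authentication is still on
print(enabled_login_mechanisms(after))   # no mechanism requires a password
```

Without the new variable, such a deployment would start asking for a username and password after the upgrade because native authentication defaults to enabled.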
What's Changed
- feat(docs): auto-open config section for ingestion sources by @shirshanka in #5075
- feat(spark-lineage): coalesce spark jobs by @MugdhaHardikar-GSLab in #5077
- refactor(ui): UI Navigation Refactoring by @jjoyce0510 in #5076
- Update docs to alert users to restore indices for their Glossary by @chriscollins3456 in #5082
- fix(restore-indices): Do not fail while working with each row by @dexter-mh-lee in #5084
- fix(ingestion): looker - Handling gracefully invalid json in query dynamic field by @treff7es in #5083
- feat(docs): ingest - add tab for config json schema by @shirshanka in #5086
- chore(dep): upgrade json-smart by @RyanHolstien in #5081
- feat(ingest): rest_emitter - Adding option to rest emitter to disable ssl verification by @treff7es in #5042
- feat(cli): suggest upgrades when appropriate by @shirshanka in #5091
- feat(doc): Generating json schema for ingestion recipes by @treff7es in #5092
- feat(ingest): snowflake using oauth by @saxo-lalrishav in #4647
- fix(ui): do not show copy URN buttons when Clipboard API is not available by @Masterchen09 in #5087
- feat(kafka): use a thread pool executor for kafka for thread reuse by @RyanHolstien in #5079
- Manage Access Tokens by @Ankit-Keshari-Vituity in #5067
- tests(lookml): adding tests for model deny patterns by @gabe-lyons in #4934
- feat(model): Add optional context field to tag/term association by @dexter-mh-lee in #5085
- fix(glossary) Two quick followup fixes around the new Glossary updates by @chriscollins3456 in #5065
- chore(deps): bump eventsource from 1.1.0 to 1.1.1 in /docs-website by @dependabot in #5057
- feat(oidc): add configurable read timeout by @RyanHolstien in #5088
- feat(glossary) Display Incoming 'IsA' Glossary related entities by @chriscollins3456 in #5063
- fix(profiling): don't stop if some steps fail by @anshbansal in #5095
- feat(upgrades) Create new DataHubUpgrade + Restore Glossary Entities Bootstrap step by @chriscollins3456 in #5099
- fix(deps): ingest - moving packaging to framework_common by @shirshanka in #5096
- feat(frontend) Allow overriding akka-max-header-value-length by @karoliskascenas in #5094
- refactor(graphql): Migrate Visual Config into the Configuration Provider by @jjoyce0510 in #4780
- chore(akka): upgrade akka http for vuln by @RyanHolstien in #5100
- fix(build): reduce time taken for resolution by @anshbansal in #5106
- fix(build): remove dependencies added for compatibility by @anshbansal in #5108
- fix(ci): pin google-cloud-logging to avoid pip backtracking by @shirshanka in #5109
- Policies page issue by @Ankit-Keshari-Vituity in #5107
- chore(deps): Bump spring to 5.3.20 for vuln fix by @pedro93 in #5110
- fix(cli): Bumping avro-gen3 to 0.7.4 by @jjoyce0510 in #5098
- feat(docs): Updating example files with the new ingestion recipe suffix by @treff7es in #5103
- feat(graphql): add graphql endpoint to check whether an entity exists by @aditya-radhakrishnan in #5102
- feat(looker): ensure explore name matches looker's display name by @shirshanka in #5111
- fix(ui): Fixing missing homescreen logo by @jjoyce0510 in #5112
- fix(dbt): final fix of dbt platform instance issues by @gabe-lyons in #5115
- feat(ingestion): bigquery-usage - Collect stats from read event reasons by @treff7es in #5118
- feat(terms) Add ability to Add and Remove Related Terms to Glossary Terms by @chriscollins3456 in #5120
- Fixed Issue : Add Members Modal by @Ankit-Keshari-Vituity in #5117
- fix(bigquery): handling of empty partitioned tables, improve report message by @anshbansal in #5122
- feat(glossary) Hide self and children from select when moving a GlossaryNode by @chriscollins3456 in #5123
- fix(ingestion): bigquery-usage - Removing filtering at queryevents by @treff7es in #5124
- feat(users): add ability to add native users from the UI by @aditya-radhakrishnan in #5097
- fix(ingestion): Looker original view name should be used for explore_joins by @sebkim in #4928
- fix(iceberg): Change how MapType are mapped to Avro to support complex Map key type by @cccs-eric in #5060
- fix(ingestion): bigquery-usage - Only send operational metadata for allowed tables by @treff7es in #5127
- fix(dbt): Validator error fix by @BoyuanZhangDE in #5125
- feat(settings): skip calling graphql hooks if user does not have the right permissions by @aditya-radhakrishnan in #5136
- fix(ingest): fix table urn for athena connectionType by @mayurinehate in #5135
- Fixed the UI issue on Deprecated Pop-Up issue by @Ankit-Keshari-Vituity in #5130
- fix(ui-ingestion): show warning banner when configuring looker ui-ingestion for the first time by @aditya-radhakrishnan in #5139
- fix(tokens): Fix stale cache problem, reduce cache timeout for access tokens + fix listing owner tokens by @jjoyce0510 in #5140
Full Changelog: v0.8.37...v0.8.38
[!] DataHub v0.8.37
Notice! This version has a few known bugs regarding revocable access tokens. Specifically, the UI for listing access tokens does not work properly unless you have a specific platform privilege. Additionally, there is a delay in revoking access tokens of 6 hours. We recommend that you skip this version and upgrade directly to v0.8.38.
Highlights
User Experience
This release comes packed full of new features and updates.
- NEW – Create & Revoke Access Tokens via the UI - Find this under Settings > Developer. This replaces the previous stateless tokens UI.
- NEW – Create and Invite Users to DataHub via the UI - Find this under Users & Groups > Invite DataHub users. Admins can also now generate password reset links for their users.
- NEW - Manage Related Glossary Terms via the UI - Add and remove Glossary Terms Contained By and Inherited From a parent via the UI. Find this under Glossary
- UPDATE - Rename “Manage” navigation item to “Govern”
- [IMPORTANT] UPDATE - Move “Users & Groups” navigation item into Settings > Access
- [IMPORTANT] UPDATE - Move “Policies” navigation item into Settings > Access (Privileges)
- FIX - You no longer need to run a reindexing job to start using the new Business Glossary UI. This process is handled for you at boot time.
- Minor fixes & improvements to UI for adding policy users + groups.
Metadata Ingestion
- Support Snowflake ingest via OAuth
- Misc fixes and improvements to existing ingestion sources
What's Changed
- feat(docs): auto-open config section for ingestion sources by @shirshanka in #5075
- feat(spark-lineage): coalesce spark jobs by @MugdhaHardikar-GSLab in #5077
- refactor(ui): UI Navigation Refactoring by @jjoyce0510 in #5076
- Update docs to alert users to restore indices for their Glossary by @chriscollins3456 in #5082
- fix(restore-indices): Do not fail while working with each row by @dexter-mh-lee in #5084
- fix(ingestion): looker - Handling gracefully invalid json in query dynamic field by @treff7es in #5083
- feat(docs): ingest - add tab for config json schema by @shirshanka in #5086
- chore(dep): upgrade json-smart by @RyanHolstien in #5081
- feat(ingest): rest_emitter - Adding option to rest emitter to disable ssl verification by @treff7es in #5042
- feat(cli): suggest upgrades when appropriate by @shirshanka in #5091
- feat(doc): Generating json schema for ingestion recipes by @treff7es in #5092
- feat(ingest): snowflake using oauth by @saxo-lalrishav in #4647
- fix(ui): do not show copy URN buttons when Clipboard API is not available by @Masterchen09 in #5087
- feat(kafka): use a thread pool executor for kafka for thread reuse by @RyanHolstien in #5079
- Manage Access Tokens by @Ankit-Keshari-Vituity in #5067
- tests(lookml): adding tests for model deny patterns by @gabe-lyons in #4934
- feat(model): Add optional context field to tag/term association by @dexter-mh-lee in #5085
- fix(glossary) Two quick followup fixes around the new Glossary updates by @chriscollins3456 in #5065
- chore(deps): bump eventsource from 1.1.0 to 1.1.1 in /docs-website by @dependabot in #5057
- feat(oidc): add configurable read timeout by @RyanHolstien in #5088
- feat(glossary) Display Incoming 'IsA' Glossary related entities by @chriscollins3456 in #5063
- fix(profiling): don't stop if some steps fail by @anshbansal in #5095
- feat(upgrades) Create new DataHubUpgrade + Restore Glossary Entities Bootstrap step by @chriscollins3456 in #5099
- fix(deps): ingest - moving packaging to framework_common by @shirshanka in #5096
- feat(frontend) Allow overriding akka-max-header-value-length by @karoliskascenas in #5094
- refactor(graphql): Migrate Visual Config into the Configuration Provider by @jjoyce0510 in #4780
- chore(akka): upgrade akka http for vuln by @RyanHolstien in #5100
- fix(build): reduce time taken for resolution by @anshbansal in #5106
- fix(build): remove dependencies added for compatibility by @anshbansal in #5108
- fix(ci): pin google-cloud-logging to avoid pip backtracking by @shirshanka in #5109
- Policies page issue by @Ankit-Keshari-Vituity in #5107
- chore(deps): Bump spring to 5.3.20 for vuln fix by @pedro93 in #5110
- fix(cli): Bumping avro-gen3 to 0.7.4 by @jjoyce0510 in #5098
- feat(docs): Updating example files with the new ingestion recipe suffix by @treff7es in #5103
- feat(graphql): add graphql endpoint to check whether an entity exists by @aditya-radhakrishnan in #5102
- feat(looker): ensure explore name matches looker's display name by @shirshanka in #5111
- fix(ui): Fixing missing homescreen logo by @jjoyce0510 in #5112
- fix(dbt): final fix of dbt platform instance issues by @gabe-lyons in #5115
- feat(ingestion): bigquery-usage - Collect stats from read event reasons by @treff7es in #5118
- feat(terms) Add ability to Add and Remove Related Terms to Glossary Terms by @chriscollins3456 in #5120
- Fixed Issue : Add Members Modal by @Ankit-Keshari-Vituity in #5117
- fix(bigquery): handling of empty partitioned tables, improve report message by @anshbansal in #5122
- feat(glossary) Hide self and children from select when moving a GlossaryNode by @chriscollins3456 in #5123
- fix(ingestion): bigquery-usage - Removing filtering at queryevents by @treff7es in #5124
- feat(users): add ability to add native users from the UI by @aditya-radhakrishnan in #5097
- fix(ingestion): Looker original view name should be used for explore_joins by @sebkim in #4928
- fix(iceberg): Change how MapType are mapped to Avro to support complex Map key type by @cccs-eric in #5060
- fix(ingestion): bigquery-usage - Only send operational metadata for allowed tables by @treff7es in #5127
- fix(dbt): Validator error fix by @BoyuanZhangDE in #5125
- feat(settings): skip calling graphql hooks if user does not have the right permissions by @aditya-radhakrishnan in #5136
Full Changelog: v0.8.36...v0.8.37
DataHub v0.8.36
v0.8.36
Highlights
User Experience
NEW – Manage Glossary Terms via the DataHub UI! Delivering on our Q2’22 Roadmap item, end users can now create, edit, move, delete, and deprecate Glossary Terms via the UI! With this new experience comes some new ways of indexing data in order to make viewing and traversing the different levels of your Glossary possible. Therefore, you will have to restore your indices in order for the new Glossary experience to work for users that already have existing Glossaries. If this is your first time using DataHub Glossaries, you're all set!
Ability to add multiple Owners, Tags, Terms
Developer Experience
The new Revokable Token API supports a new type of Access Token which can be revoked & queried, allowing admins to easily delete tokens for operational & security reasons. Read all about it in the Access Token Management Usage Guide.
Ingestion Updates
This release includes 3 new Metadata Sources:
- Iceberg
- Vertica
- SAP HANA
📣 Massive shoutout to DataHub Community members @cccs-eric, @eburairu, and @buggythepirate for driving these contributions! 📣
These sources are currently marked as “Testing” - we encourage you to try them out & provide feedback in the DataHub #ingestion Slack channel!
We’ve rolled out the following ingestion-related improvements:
- AWS Glue - data profiling is now supported
- S3 ingestion speed-up
- Various bug fixes
Full Commit Log
- #5071 @dexter-mh-lee fix(docker): Fix mysql setup bug
- #5066 @jjoyce0510 refactor(docs): Rename metadata modeling ingestion sidebar titles
- #5036 @mmmeeedddsss fix(mysql-setup-job): add mysql default port override support
- #5056 @nj7 fix: ES Rest Client Creation for non ssl authenticated connection
- #5053 @ShubhamThakre fix(ui): ui bug fix for datasets sidebar stats section
- #5061 @anshbansal feat(redash): add parallelism support for ingestion
- #5017 @anshbansal feat(model): new chart types
- #5047 @RyanHolstien fix(datahub-upgrade): exclude unnecessary configuration from standalone applications
- #5052 @shirshanka feat(ci): datahub-client - add workflow, fix build
- #5054 @jjoyce0510 docs(actions): Adding DataHub Actions to docs website
- #5031 @piyushn-stripe feat(frontend): Allow overriding frontend with a custom akka http server
- #5050 @dexter-mh-lee Remove exception on ingest policies
- #5043 @Masterchen09 fix(docs): hana - rename SAP HANA source and data platform
- #5051 @shirshanka fix(ingest): fix build breakage due to traitlets 5.2.2 bug
- #5045 @anshbansal fix(redash): fix bug with names, add option for page size, debugging info
- #5022 @jjoyce0510 fix(restore): Add RESTATE ChangeType to MCL / MCP to permit restore indices
- #5041 @anshbansal doc(bigquery): fix missing permissions
- #5030 @endeesa fix(doc) - Specify docker-compose version to avoid compatibility issues
- #4879 @BoyuanZhangDE feat(ingest): glue - enable profiling
- #5035 @treff7es fix(profiling): bigquery - Fix for Bigquery temp table creation on GE >= 0.15.3
- #5040 @shirshanka fix(build): m1 build fails to install hdb-cli
- #5026 @chriscollins3456 feat(glossary) Business Glossary updates
- #4940 @MugdhaHardikar-GSLab fix(spark-lineage): remove need for sparksession.stop call
- #5023 @rslanka fix(ingest): common - fix nullability determination for the AVRO fixed type.
- #5012 @anshbansal fix(cli): don't use env for container, add example
- #5021 @maggiehays docs(townhall): update townhall rsvp link and add may townhall detail
- #5038 @shirshanka fix(build): docgen should fail if plugin is not loadable
- #5033 @RyanHolstien fix(timelineAPI): fix issue with semantic versioning
- #5034 @RyanHolstien fix(telemetry): exclude configuration from standalone apps
- #5029 @RyanHolstien feat: telemetry improvements
- #5028 @gabe-lyons dont set platform instances for sources
- #5027 @anshbansal fix(parsing): incorrect parsing for commas
- #4938 @Ankit-Keshari-Vituity refactor(ui): UI Integration to add multiple tags, terms and owners
- #5025 @anshbansal fix(parsing): improve sql parsing, some debugging redash
- #5024 @rslanka fix(ingestion): Remove hana from base_dev_requirements to unblock m1 users
- #5014 @anshbansal fix(bigquery): reduce number of calls for details of partitioning
- #5016 @ShubhamThakre fix(ui): arrow click position update
- #5019 @rslanka fix(build): fix for hana build failure for aarch64.
- #5020 @jjoyce0510 feat(Tests): Make DataHub Tests Feature configurable via env variable
- #5005 @hsheth2 test(ingestion): change class names to avoid unittest warnings
- #5006 @hsheth2 fix(ingestion): use raw strings for regexes
- #5010 @rslanka feat(ingestion): Add Iceberg source
- #5001 @PatrickfBraz fix(bigquery-usage): fix audit metadata query template
- #4997 @anshbansal fix(redash): improve logging for debugging, add validation for dataset urn, some refactoring
- #4376 @buggythepirate feat(ingest): Added new ingestion source SAP HANA
- #5011 @rslanka Fix pulsar source docs.
- #4555 @eburairu feat(ingest): Add Source from Vertica
- #5008 @anshbansal fix(dbt): missing aws dependency
- #5007 @anshbansal fix(bigquery): restrict protobuf version
- #5004 @pedro93 fix(gms): Fix incorrect StatefulTokenService init
- #5002 @ShubhamThakre fix(ui): ui bug fix - fixing search card vertical margin
- #4994 @anshbansal doc(delete): add example for dataflow and datajob
- #4988 @jjoyce0510 feat(DataHub Operations): Adding GraphQL mutation for reporting Dataset operations
- #4998 @shirshanka fix(cli): timeline - adjust for timeline API changes on server
- #5000 @pedro93 fix(docs): Fixes token docs
- #4989 @jjoyce0510 feat(Tests): Metadata Tests Models + APIs + UI (Part 1)
- #4995 @treff7es fix(airflow): Fix for Airflow 1 support
- #4993 @shirshanka chore(deps): upgrade gson version
- #4935 @BoyuanZhangDE feat(dbt): enable dbt read artifacts from s3
- #4833 @treff7es feat(airflow): Airflow lineage ingestion plugin
- #4931 @mayurinehate fix(ingest): tableau - fix chart custom properties None key error, update docs
- #4943 @mayurinehate feat(model): add created, lastModified auditstamps to SchemaField
- #4991 @anshbansal refactor(redash): emit charts first and try with id based dashboard API first
- #4942 @mohdsiddique metabase chart are missing from dashboard
- #4992 @anshbansal doc(ingest): update golden file command
- #4927 @treff7es feat(ingest): s3 - speeding up ingestion with sampling
- #4979 @pedro93 fix(smoke-tests) Increases sleep timeout in rollback test to prevent flakiness
- #4964 @dexter-mh-lee feat(run): Create a describe run endpoint for fetching aspects created by the ingestion run
- #4169 @claudio-benfatto feat(ingestion): optionally disable some kafka schema warnings
- #4972 @mayurinehate feat(great-expectations): allow DATAHUB_DEBUG env var to enable debug logs in GE Action
- #4957 @justinas-marozas refactor(metadata-io): introduce a storage-independent in-memory entity aspect model
- #4982 @jjoyce0510 feat(authorization): Adding AuthorizerContext + ResourceSpecResolver to context
- #4984 @anshbansal doc(ingestion): default boolean fix, broken bigquery docgen
- #4970 @pedro93 feat(graphql) Add new Revokable Token API
- #4987 @anshbansal fix(ingest): remove new schema field usage
- #4985 @anshbansal fix(redash): use dashboard id if slug does not work
- #4986 @pedro93 chore(deps): upgrade datastax libs version
- #4981 @RyanHolstien fix(metadata-service): telemetry - fix hardcoded aspect name, suppress errors when producing MAE
- #4983 @shirshanka fix(ingest): mode - dashboards without creator info fails to process
- #4975 @chriscollins3456 fix(UI) Fix multiple UI usability issues
- #4977 @maggiehays docs(townhall): update invite links and townhall history
- #4980 @MugdhaHardikar-GSLab feat(spark-lineage): support for persist API
- #4974 @anshbansal feat(bigquery): add partition key tag
- #4967 @anshbansal fix(bigquery): add rate limiting for api calls made
- #4971 @shirshanka fix(cli): graph - get_aspect_v2 method fails to deserialize aspects correctly
- #4958 @anshbansal doc(ingest): mysql - describe required grants
- #4969 @RyanHolstien doc(telemetry): fix telemetry doc
- #4878 @MugdhaHardikar-GSLab fix(datahub-client): support utf8 encoding
- #4961 @anshbansal feat(bigquery): reduce logging
- #4909 @ShubhamThakre fix(ui): policy outside modal click issue update
- #4968 @jeffmerrick docs(website): Remove banner and nav item for metadata day 2022
- #4965 @mmmeeedddsss docs(datahub-kafka-sink): add topic_routes config to doc of datahub-kafka-sink
- #4966 @liyuhui666 fix(data platforms): Update data_platforms.json
- #4922 @mayurinehate feat(cli): raise error if get entity api fails
- #4963 @Masterchen09 fix(ui): do not show copy URN buttons when Clipboard API is not available
- #4962 @RyanHolstien feat(release): update CLI version
- #4960 @RyanHolstien feat: updates for 0.8.35
- #4945 @treff7es Revert "feat(spark-lineage): add support for iceberg and cache based plans (#4882)"
- #4954 @dexter-mh-lee fix(ci): remove scheduled artifact deletion run to avoid api rate limiting
-...
[!] DataHub v0.8.35
Notice: Deploying this release will result in an incorrectly named aspect entry existing in the database. As a result, some upgrade jobs may fail to perform full scans of the database. To fix this, either upgrade to a version later than v0.8.38 or pull the latest DataHub Upgrade docker image and run the following upgrade:
./datahub-upgrade.sh -u RemoveUnknownAspects
v0.8.35
Highlights
Reduced vulnerability counts in project
Various bug fixes
New streamlined docker workflow
Full Commit Log
- #4937 @RyanHolstien fix(env): provide default for unset telemetry variable
- #4926 @gabe-lyons feat(dbt): enable data platform instance on dbt
- #4933 @anshbansal fix(lint): lint failure due to mypy upgrade
- #4925 @RyanHolstien feat(telemetry): add server side telemetry
- #4917 @jjoyce0510 feat(graphql): Adding resolvers for adding multiple tags, terms, and owners
- #4924 @chen4119 fix(kafka-setup): Check if keystore/truststore location env variables are set
- #4919 @jjoyce0510 feat(ui): Adding Search Bar to all List Views (groups, users, domains, policies, ingestion)
- #4923 @chen4119 fix(kafka-setup): Add ssl.keystore.type and ssl.truststore.type
- #4882 @maggie-zhu feat(spark-lineage): add support for iceberg and cache based plans
- #4918 @RyanHolstien fix(idea): change location of coercer to make intellij not complain about classes
- #4916 @chriscollins3456 fix(ui) Fix some spacing issues on the search card
- #4914 @anshbansal docs(ingest): remove incorrectly annotated lineage capability
- #4912 @mayurinehate docs(transformer): update custom transform example to add missing super init
- #4903 @jjoyce0510 refactor(actions): Migrate to use new datahub-actions container
- #4869 @jjoyce0510 refactor(API): Add "Filter" support for Assertion Run Events, Dataset Profiles, Dataset Operations
- #4860 @anshbansal fix(doc): update doc url to generated docs
- #4910 @chriscollins3456 feat(containers) Get and display all parent containers in header and search
- #4791 @pedro93 feat(gms): Add support for deleting reference pointers when deleting by urn
- #4911 @RyanHolstien docs(frontend): update build command for partial build
- #4839 @BoyuanZhangDE feat(ingestion): For all usage connectors, allow exclusion of top_n_queries from ingestion via a config param.
- #4908 @jeffmerrick fix(docs): Metadata day 2022: Fix year
- #4859 @anshbansal doc(biqquery): add caveat for materialized view
- #4906 @jeffmerrick docs(website): add banner and nav item for metadata day 2022
- #4905 @anshbansal fix(build): Fix breaking changes from GE 0.15.3
- #4884 @shirshanka fix(deps): reduce frontend dependency
- #4902 @anshbansal doc(ingestion): add note for UI ingestion & custom sources
- #4901 @anshbansal revert(bigquery-usage): dataset allow filter impl
- #4824 @gabe-lyons fix(usage): pull usage from environment source rather than args
- #4899 @SagarTiwari24 fix(docs): Update developing.md to mention directory context
- #4892 @gabe-lyons fix(ui): fix side panel resize css
- #4890 @justinas-marozas fix(mxe-consumer): exclude CassandraAutoConfiguration from consumer boot
- #4853 @sebkim fix(ingestion): ElasticSearch when no properties from elastic_mappings, gracefully continue
- #4865 @dependabot chore(deps): bump axios from 0.21.1 to 0.21.4 in /datahub-web-react
- #4898 @treff7es fix(ingestion): bigquery-usage: Fix biquery usage table deny pattern template
- #4893 @shirshanka fix(ci): remove logging statement
- #4891 @RyanHolstien chore(deps): play - upgrade for CVEs
- #4889 @shirshanka fix(ci): clean up docker workflow for multi-tags
- #4875 @shirshanka fix(ingest): lookml - add view definitions for all views
- #4887 @shirshanka fix(ci): docker - either load or push, don't do both
- #4885 @shirshanka fix(ci): remove buildx and qemu for non multi-platform images
- #4862 @anshbansal fix(sql-parsing): improve error handling
- #4883 @shirshanka fix(ci): remove multiplatform builds from containers that don't support it
- #4881 @shirshanka feat(ci): docker actions simplify, add vulnerability scanner, simplify smoke-tests
- #4867 @chriscollins3456 feat(dataPlatformInstance) - Resolve and display dataPlatformInstance on entities
- #4880 @shirshanka fix(docs): ingest - sort modules, fix small typos
- #4866 @ShubhamThakre fix(ui): search filter entity ui update
- #4855 @treff7es fix(ingestion): dependencies - Downgrading typing-extension dependency to work with Airflow 2.0.2
- #4600 @pedro93 Use ingest proposal to submit status updates
- #4868 @RyanHolstien Revert "chore(deps): upgrade play to remove CVEs (#4864)"
- #4857 @RyanHolstien chore(jetty): upgrade jetty to 9.4.46 for CVE
- #4776 @tha23rd fix(bigquery-usage): dataset allow filter impl
- #4864 @RyanHolstien chore(deps): upgrade play to remove CVEs
- #4843 @cristiancalugaru ssl configuration support for elasticsearch source
- #4861 @RyanHolstien Revert "chore(deps): upgrade play dependencies to remove CVE vulnerabilities (#4820)"
- #4846 @dependabot chore(deps): bump async from 2.6.3 to 2.6.4 in /docs-website
- #4847 @dependabot chore(deps): bump minimist from 1.2.5 to 1.2.6 in /docs-website
- #4820 @RyanHolstien chore(deps): upgrade play dependencies to remove CVE vulnerabilities
- #4842 @rslanka fix(ingestion): Allow profiling of only those tables that are allowed by the table_pattern.
- #4844 @RyanHolstien Revert "fix(jetty): upgrade jetty dependency for CVE (#4838)"
- #4838 @RyanHolstien fix(jetty): upgrade jetty dependency for CVE
- #4840 @rslanka chore(deps): upgrade dependency io.netty:netty-all to address vulnerability
- #4841 @RyanHolstien fix(policies): change order of operations for policies bootstrap step to update index after database
- #4837 @RyanHolstien chore(deps): move from velocity 1.7 to 2.3
- #4821 @ShubhamThakre feat(ui): entity profile add copy url option update
- #4817 @aditya-radhakrishnan docs(schema-history): add usage guide for schema history
- #4835 @gabe-lyons hide soft deleted entities in lineage
- #4836 @shirshanka refactor(metadata-service): remove redundant file
- #4826 @jjoyce0510 chore(deps): pinning jackson dataformat cbor
- #4777 @treff7es feat(ingest): s3 - add support for multiple pathspecs in one recipe
- #4807 @eclaassen-pb chore(deps): upgrade spring and parquet dependencies
- #4813 @pedro93 fix(docs): Adds access policy documentation
- #4832 @mayurinehate feat(ingest): great-expectations - add more logs
v0.8.34
Release Highlights
Developer Experience
- DataHub Actions Framework is LIVE! The Actions Framework makes responding to real-time changes in your Metadata Graph easy, enabling you to seamlessly integrate DataHub into a broader events-based architecture. Check out the repo here
- This release also introduces OpenAPI endpoints to post, get, and delete entities. Check out the usage guide here
- Metadata Ingestion Source docs have a new look! We now have code-generated documentation to apply consistency in format and contents
User Experience
- New! The Dataset Schema page now supports a “Blame View” to quickly understand how a field has evolved over semantic schema versions. You can find more info about how we compute versions here.
Ingestion Improvements
- New! Now incubating the Apache Pulsar source
- Update to Feast connector to support v0.18
- Ongoing improvements to Snowflake external table support
- Improvements to handling BigQuery audit log SQL queries
- Miscellaneous Tableau fixes for lineage, browse path, non-embedded datasets
What's Changed
- fix(cypress) - enable retries for failed tests to minimize flaking by @aditya-radhakrishnan in #4680
- Deprecate an entity by @Ankit-Keshari-Vituity in #4633
- fix(timeline): enhance schema field name change and removal support by @RyanHolstien in #4603
- fix(cli): rest emitter should override config and env variables by @anshbansal in #4622
- fix(docs): elasticsearch secret reference by @felixb in #4314
- fix(mcl-processor): Remove unnecessary log.info by @dexter-mh-lee in #4686
- fix(datahub-client): avoid parallel execution of metadat-io:test by @MugdhaHardikar-GSLab in #4685
- docs(metadata-models-custom): add example script to show producing cu… by @shirshanka in #4681
- fix(gms): Ensure Ordering by version when fetching next version by @arunvasudevan in #4696
- fix(docker): Fix issue #4683 by @jjoyce0510 in #4697
- feat(vulnerability): Upgrade spring libraries to latest version by @dexter-mh-lee in #4698
- refactor(gms): EbeanAspectDao - make the orderBy clause explicitly ascending in getNextVersions by @jjoyce0510 in #4699
- feat(gms): Entity change events v1 (Platform Event) by @jjoyce0510 in #4687
- Redesign the login page by @Ankit-Keshari-Vituity in #4684
- fix(snowflake): remove extra lineage edges in reports, change badly named config variable by @anshbansal in #4595
- fix(bigquery): error due to not handling data properly by @anshbansal in #4702
- fix(looker): Fix for Pydantic validation error for Looker TransportOptions on python 3.8 by @treff7es in #4705
- fix(ingest) bigquery: Moving bigquery temporary credential deletion to atexit by @treff7es in #4701
- fix(lineage): Fix lineage entity drawer height UI bug by @chriscollins3456 in #4707
- feat(ingest) - update identity sources to add flags for masking sensitive work units by @aditya-radhakrishnan in #4711
- fix(snowflake): deprecate config, update examples by @anshbansal in #4644
- fix(glue): delete CatalogId parameter from get_jobs api call by @BoyuanZhangDE in #4646
- fix(ui): Show deprecate button only for specific entity pages. by @jjoyce0510 in #4712
- feat(ml): show custom properties for MLFeatureTable in UI by @maaaikoool in #4706
- fix(glue): fix error for custom connector if ignore_unsupported_conne… by @mayurinehate in #4667
- feat(ingest): add decimal128 custom type for mysql by @kevinhu in #4624
- fix(policy): Use search to fetch all policies by @dexter-mh-lee in #4713
- fix(transformers): add snapshot aspects from dataset into base_transf… by @shirshanka in #4719
- Revert "fix(policy): Use search to fetch all policies" by @dexter-mh-lee in #4725
- minor fix(metadata-ingestion): Add new schemas to python codegen by @jjoyce0510 in #4726
- fix(ui): Display warning in UI when metadata service auth is disabled. by @jjoyce0510 in #4728
- fix(timelineCli): fix naming for timeline cli by @RyanHolstien in #4729
- fix(entity header): Fixes two issues in the EntityHeader - update UI and remove link by @chriscollins3456 in #4720
- Revert "fix(timelineCli): fix naming for timeline cli (#4729)" by @jjoyce0510 in #4731
- feat(cli): suppress stacktrace printing on configuration errors by @shirshanka in #4718
- fix(cli): align default sink env variables across ingest and other cl… by @shirshanka in #4739
- feat(ingest) dbt: Dbt query tag mapping and match template by @treff7es in #4744
- fix(cli): telemetry - make config file processing more robust by @shirshanka in #4738
- feat(react theming): stop homepage flicker for env-var based logos by @gabe-lyons in #4730
- feat(Cassandra): add Cassandra implementation of EntityService by @xdl in #3286
- fix(policies): Re-revert the policies fix + ingest documents directly to search by @dexter-mh-lee in #4733
- feat(cli): Eagerly load datahub actions CLI commands by @jjoyce0510 in #4748
- fix(ingest) bigquery: Fix BigQuery Datetime/Timestamp type column partition table profile bug by @sebkim in #4658
- docs: add missing PR numbers by @anshbansal in #4742
- fix(azure_ad): silently discard other Azure AD object types (#4693) by @cccs-eric in #4704
- fix(datahub-frontend): OIDC discovery URL will not have NONE as auth_methods_supported by @chen4119 in #4710
- fix(docs): fix links by @daha in #4703
- feat(ingest): add Feast repository source by @danilopeixoto in #4094
- feat(soft deletes): rephrasing soft delete banner by @gabe-lyons in #4753
- feat(ebeans): Add metrics to track connection pool by @dexter-mh-lee in #4755
- fix(AWS) When using aws_profile, grab temporary credentials from the session. by @Jiafi in #4751
- feat(metadata-ingestion): Custom endpoint url and proxies in S3. by @pawel3275 in #4708
- fix(tableau): miscellaneous tableau fixes for lineage, browse path, non-embedded datasets by @mayurinehate in #4724
- doc: add warning for JDK by @anshbansal in #4761
- fix(ui): fix expandedName for dataset by @mayurinehate in #4762
- fix(ui): Users and Groups UI bug fixes by @ShubhamThakre in #4746
- fix(azure_ad): make redirect and graph_url optional parameters and update docs by @aditya-radhakrishnan in #4754
- docs(glue): clarify that table regex patterns should be fully-qualified by @aditya-radhakrishnan in #4747
- fix(ml models): fix features tab by @gabe-lyons in #4769
- fix(lint): lib upgrade caused by @anshbansal in #4773
- fix(lineage) Filter dataset -> dataset lineage edges if data is transformed by @chriscollins3456 in #4732
- fix(build): Fix breaking changes from GE 0.15.3 that are affecting our Python3.6 smoke_tests by @rslanka in #4779
- fix(ingestion): Fixing how we eagerly import DataHub actions by @jjoyce0510 in #4784
- fix(ingest): fwk - datahub_api should be initialized by datahub-rest … by @shirshanka in #4786
...
DataHub v0.8.33
Release Highlights
User Experience
Refreshed the ML Entity page to match the feel of all other entity types; improved ML lineage functionality
Ingestion Improvements
- Airflow Improvements - as demoed in March Town Hall
  - Add support to capture Airflow execution runs from lineage backend
  - Introduce new High level API for generating dataflow/job/dataprocessinstance
- MS SQL ingestion now captures table & column descriptions
- Trino platform support for Great Expectations
- New Presto-on-Hive ingestion source
- BigQuery ingestion now supports extraction of usage info from audit logs
- Fix to Looker ingestion to extract Explore Views from join names
- Fix to Tableau ingestion to avoid duplicating schema in URNs for upstream tables
- Simplify & annotate Redshift Usage source
Full Commit Log
- feat(gms): Expose kafka listener concurrency as a GMS setting by @jjoyce0510 in #4536
- feat(ingest): add option for external Spark cluster by @kevinhu in #4571
- fix(upgrade): Renaming kafka producer since it clashes with spring-internal by @dexter-mh-lee in #4573
- feat(GraphQL): Add data platform query to GraphQL API by @jjoyce0510 in #4574
- build(ui): Fix Windows UI lint by @mattmatravers in #4556
- doc: make note prominent on quickstart by @anshbansal in #4558
- fix(protobuf) minor bugfixes for protobuf by @leifker in #4553
- feat(docs) Improves docs around developing datahub, removes deprecated docs on building metadata service by @pedro93 in #4552
- chore: cleanup extra file by @anshbansal in #4541
- feat(snowflake): reduce permissions provisioned by default by @anshbansal in #4543
- fix(ingestion): Redshift usage refactoring - simplify, annotate, fix bugs by @rslanka in #4572
- fix(graphql): Adding PRE FabricType to GraphQL by @jjoyce0510 in #4582
- feat(search) - add DATETIME FieldType by @aditya-radhakrishnan in #4407
- fix(tableau): fix for incorrect schema returned by tableau api for sn… by @mayurinehate in #4577
- chore: update default cli for managed ingestion by @anshbansal in #4581
- feat(okta) - add support for filtering/searching when ingesting Okta groups and users by @aditya-radhakrishnan in #4586
- doc(snowflake): add example of table pattern by @anshbansal in #4580
- fix(doc): try to fix broken link by @daha in #4593
- fix(bigquery): incorrect lineage when views are present by @anshbansal in #4568
- feat(metadata-service): Supporting a configurable Authorizer Chain by @jjoyce0510 in #4584
- fix(search): Make sure home page and search pages are consistent by @dexter-mh-lee in #4588
- fix(browse): Reduce browse aggregation size by @dexter-mh-lee in #4601
- doc: add page for handling deprecations, breaking changes etc. by @anshbansal in #4590
- docs(GraphQL): fix typo by @Falci in #4605
- feat(search): Add SearchScore annotation to use fields for search ranking by @dexter-mh-lee in #4596
- feat(ingestion): Redshift Usage Source - simplify OperationalStats workunit generation. by @rslanka in #4585
- feat(tableau): add some logic to normalize table names in tableau by @gabe-lyons in #4609
- fix: urlencode slash in urns too by @daha in #4527
- fix(bigquery): fix lineage bug, improve docs, add dataset filter config by @anshbansal in #4607
- fix(protobuf) fix test instabilitity by @leifker in #4612
- fix(ui): Fix dashboard tags display by @jjoyce0510 in #4611
- feat(ui): Adding GraphQL queries to fetch entity deprecation status by @jjoyce0510 in #4614
- feat(ingest): enable connection string for all sqlalchemy datasources by @ms32035 in #4508
- fix(docs): add grant statements for redshift-ingestion by @Abhiram98 in #4559
- chore: fix lint and remove incorrect integration mark from unit tests by @anshbansal in #4621
- feat: adding gradle, pip cache via gh cache, docker cache via dockerhub by @anshbansal in #4387
- doc(scheduling): make it easier to find ui ingestion by @anshbansal in #4610
- feat(glue): add CatalogId parameter for cross-account access by @BoyuanZhangDE in #4608
- doc(cli): add env variables and options for ingest command by @anshbansal in #4598
- fix(ingest): Restricting pytest docker version to <0.12 by @treff7es in #4639
- fix(cypress) - add waits for cypress search test to remove flakiness by @aditya-radhakrishnan in #4640
- Revert "feat: adding gradle, pip cache via gh cache, docker cache via dockerhub" by @dexter-mh-lee in #4637
- feat(search): Only reindex if the mappings for an existing field changed by @dexter-mh-lee in #4629
- feat: add presto-on-hive metadata ingestion source by @jchen0824 in #4625
- feat(ingest): add trino platform for great expectations by @ms32035 in #4594
- fix(kafka): Stop overriding kafka registry props with empty values by @jsotelo in #4604
- [model]: Dataprocess instance entity to model datajob/jobflow runs by @treff7es in #4459
- feat(ingest): add Urn python library for DataJob, DataFlow, Domain and Tag by @tc350981 in #4618
- fix(ingestion): ensure source/sink reports are always logged by @anshbansal in #4592
- fix(ingestion): extract explore views from join name in Looker by @dyanarose in #4627
- feat(ingestion): Enable lower-casing of the name part of dataset urn if env variable is set. by @rslanka in #4649
- feat: Enable the ingestion of bigquery audit logs to parse usage info… by @tha23rd in #4441
- fix(ingest): Fix snowflake KEY_PAIR auth by @mkamalas in #4638
- fix(home): Fix issue where some browse cards are missing by @dexter-mh-lee in #4652
- fix(tableau): avoid duplicate schema in URNs for upstream tables by @maaaikoool in #4645
- feat(ingest): capture MSSQL table+column descriptions by @kevinhu in #4579
- feat(ml): bringing ml screens up to date w/ the modern ui layout & improving ml lineage by @gabe-lyons in #4651
- (feat:airflow) Add support to capture airflow executions + high level dataflow/jobs api by @treff7es in #4615
- fix(ingestion): add missing workunit ids by @anshbansal in #4657
- fix(ingestion): Adding missing init.py by @anshbansal in #4659
- fix(bigquery-usage): missing dependency by @anshbansal in #4661
- feat(cypress) - add cypress dashboard view to CI by @aditya-radhakrishnan in #4654
- feat(autocomplete): show fully qualified name in autocomplete by @gabe-lyons in #4663
- feat(ingestion) dbt: Fixing issue with strip_user_ids_from_email and adding owner_naming_pattern by @arunvasudevan in #4587
- fix(sqlparser): fix sqlparser breaking due to # sign by @anshbansal in #4662
- fix(ingestion): validate datasource in Tableau connector, before creating its upstream by @nandacamargo in #4613
- Added Relative Routing on the Users & Groups screen by @Ankit-Keshari-Vituity in #4664
- fix(airflow): Not importing emitters directly to eliminate unneeded dependency by @treff7es in #4668
- docs:...
DataHub v0.8.32
Release Highlights
User Experience
We're excited to announce View-based RBAC Policies! You can now create and apply view-only permissions to your DataHub end-users, providing more robust access controls.
We've also included some small (but impactful!) improvements to UX, including:
- Display recent search terms when beginning the search flow
- Consistently display entity subtypes for dbt, Looker, Kafka, & more. For example, Kafka entities are displayed as "topics" instead of "datasets"
Ingestion Highlights
- New! Protobuf ingestion (shoutout to @leifker for this Community-led contribution!)
- Initial work to support a "Notebook" entity (shoutout to @tc350981 for spearheading this work!!)
- Stateful ingestion for dbt is now supported
- Ongoing improvements to our Tableau ingestion source from @nandacamargo & @cuong-pham
- Improvements to handling database aliases for Redshift ingestion
- Improvements to S3 source:
  - Add containers for datasets
  - Support platform_instance
  - Support for folder level datasets
  - Increased flexibility to specify dataset paths
- Ingestion Fixes:
  - Snowflake Usage - log warning instead of error out & other error handling
  - Snowflake allow/deny patterns
  - Examples of allow/deny patterns added to docs
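
The allow/deny patterns mentioned above are regular expressions that decide which databases, schemas, or tables an ingestion run picks up. The following is a standalone sketch of that filtering idea, not DataHub's own implementation. The function name `is_allowed`, the sample table names, and the choices that deny rules win over allow rules and that matching is case-insensitive are all assumptions made for illustration.

```python
import re
from typing import List, Optional


def is_allowed(
    name: str,
    allow: Optional[List[str]] = None,
    deny: Optional[List[str]] = None,
) -> bool:
    """Return True if `name` matches an allow regex and no deny regex.

    Assumptions in this sketch: patterns are matched from the start of the
    name (re.match), matching ignores case, and deny takes precedence.
    """
    allow_patterns = allow if allow is not None else [".*"]  # default: allow everything
    deny_patterns = deny or []

    # Any matching deny pattern rejects the name outright.
    if any(re.match(pattern, name, re.IGNORECASE) for pattern in deny_patterns):
        return False
    # Otherwise the name must match at least one allow pattern.
    return any(re.match(pattern, name, re.IGNORECASE) for pattern in allow_patterns)


if __name__ == "__main__":
    # Hypothetical fully-qualified table names.
    tables = [
        "analytics.public.orders",
        "analytics.staging.tmp_orders",
        "raw.public.events",
    ]
    kept = [
        table
        for table in tables
        if is_allowed(table, allow=[r"analytics\..*"], deny=[r".*\.staging\..*"])
    ]
    print(kept)
```

With these sample inputs only `analytics.public.orders` survives: the staging table is rejected by the deny rule, and the `raw` table matches no allow rule. Escaping the dots (`\.`) keeps a pattern from matching unintended names.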
Full Commit Log
- #4570 @gabe-lyons fix(search): handle commas in search queries in the UI
- #4557 @daha fix: replace direct and indirect references to linkedin with datahub-project
- #4569 @dexter-mh-lee fix(policy): Add view entity page priv to all entity types
- #4567 @anshbansal fix(bigquery): missing dependency
- #4548 @mayurinehate fix(tableau): gracefully stop ingestion if tableau sign in not successful
- #4564 @dexter-mh-lee fix(docs): fix logo links on ingestion docs
- #4396 @Abhiram98 feat(ingestion): schema, table filtering for redshift-usage
- #4554 @darapuk (fix): Update path generated when creating LookML URL
- #4549 @maggiehays docs: add sumup logo
- #4560 @mhw docs: Fix PostgreSQL typo in features.md
- #4562 @gabe-lyons feat(lineage): show fully qualified dataset name on expansion
- #4561 @anshbansal fix: dependencies for usage sources
- #3782 @CorentinDuhamel feat(ingest): indent sql queries for usage sources
- #4551 @pedro93 fix(rollback) Removes status & key aspects from affected aspects count during rollback
- #4538 @dexter-mh-lee fix(policy): Remove all from the resource type choices
- #4544 @anshbansal fix(ingest): snowflake-usage - log warning instead of error out
- #4542 @mattmatravers build(ui): allow custom nodeDistBaseUrl
- #4545 @mayurinehate fix(kafka-connect): add platform for default case in jdbc connector, update tests for platform instance map
- #4547 @anshbansal chore: update pull request template
- #4537 @RyanHolstien fix(dataPlatformInstance): add data platform instance to entity registry
- #4375 @mayurinehate fix(kafka-connect): fix lineage for postgres-like 3-level hierarchy d…
- #4492 @RyanHolstien fix(cli): delete - handle case insensitive entity types
- #4130 @sgomezvillamor feat(ingest): glue - adds platform instance capability
- #4456 @mohdsiddique feat(stateful dbt): add stateful ingestion capability in dbt source
- #4482 @pedro93 feat(platform): adds side-effect report for rollbacks
- #4275 @maggiehays docs: Ingestion Source Docs Template
- #4470 @mayurinehate feat(tableau): emit lineage edge from embedded datasource to upstream…
- #4535 @pppsunil feat(ingestion): Support pluggable Schema Registry for Kafka Source
- #4369 @eburairu feat(ui): Add new loading pattern logo
- #4493 @leifker feat(integration): protobuf - additional annotations and features
- #4435 @zhoxie-cisco perf(docker): datahub-gms - add jetty configuration xml
- #4532 @anshbansal doc: add example of profiling in default example
- #4533 @anshbansal doc: clarify CLI releases
- #4528 @daha fix(doc): Change to forward slash-separated strings, as in the example
- #4315 @daha fix(docs): Minor fixes to pip install commands
- #4521 @anshbansal doc: update docker docs for mentioning Python CLI
- #4523 @kevinhu fix(ingest): mssql - support database_alias
- #4501 @arunvasudevan feat(ingest): kafka-connect - support mapping for multiple DB instances
- #4477 @jjoyce0510 feat(metadata service): Introducing Platform Events
- #4526 @jjoyce0510 Adding has container
- #4494 @cuong-pham fix(ingest): make tableau ingestion more resilient to error
- #4519 @anshbansal doc(ingestion): add examples of running in docker and Kubernetes
- #4485 @andres-lowrie docs(metadata-ingestion): callout props in para
- #4511 @RyanHolstien Oss/urn validation
- #4525 @dexter-mh-lee feat(policy): Add tooltip and view button
- #4507 @mayurinehate feat(assertion): update python example, assertion entity doc
- #4513 @kevinhu feat(ingestion): detect and disable telemetry in CI
- #4516 @dexter-mh-lee feat(policy): Add domain based and view based policies
- #4520 @rslanka Fix: Snowflake Table to View lineage
- #4510 @anshbansal feat(ingest): Add config to improve user exp for initial ingestion and fix docs
- #4517 @anshbansal feat(ingest): option for number of workunits in preview
- #4503 @treff7es feat(ingest): athena - set Athena location as upstream
- #4490 @MugdhaHardikar-GSLab feat(s3): add s3 source
- #4468 @tc350981 feat(notebook): graphqul related logic change for notebook
- #4401 @anshbansal doc: update instructions for updating DataHub on quickstart
- #4500 @ShubhamThakre SecretBuilderModal -> name field validation updated
- #4505 @anshbansal docs: add example of database and schema allow/deny patterns
- #4504 @anshbansal fix(snowflake): allow/deny patterns
- #4496 @shirshanka feat(ingest): dbt,looker,sql_common,kafka - moving sources to produce display names and subtypes more consistently
- #4480 @anshbansal feat(snowflake): stop querying for usage data when no mix/max dates
- #4483 @anshbansal fix(snowflake-usage): do not ingest for stage as a dataaset
- #4497 @shirshanka moving to dockerhub for actions container
- #4475 @darapuk fix: Update GroupProfile to read from properties over deprecated info aspect
- #4489 @anshbansal fix(ingestion): pin Jinja2 to version < 3.1.0
- #4484 @anshbansal fix(ingestion): stop CLI build failures
- #4467 @anshbansal doc: add caveats to snowflake doc
- #4481 @anshbansal fix(snowflake): don't recommend accountadmin role for snowflake
- #4479 @anshbansal fix: change log level to debug
- #4476 @dexter-mh-lee Add recent searches filtering
- #4469 @tc350981 feat(ingest): add python utility classes for NotebookUrn, CorpuserUrn and CorpGroupUrn
- #4439 @ShubhamThakre Feature/modal-validation-and-UI-fixes-updates
- #4398 @kevinhu feat(ingest): simplify event IDs for function invocations
- #4474 @sgomezvillamor chore: acryl-data 0.6.12
- #4473 @treff7es fix(redshift) Properly handling database alias in redshift usage and redshift lineage generation
- #4471 @gabe-lyons enabling ml tabs
- #4460 @eclaassen-pb fix: java dependency vulnerabilities
- #4453 @treff7es feat(ingest) data-lake: Add s3 properties metadata when ingesting s3 files
- #4464 @anshbansal fix: change for repository change
- #4466 @anshbansal fix(snowflake-usage): add more error handling
- #4445 @nandacamargo fix(ingest): add fix to tableau connector when table has None fields
- #4457 @mayurinehate docs(hive): update recipe with example to specify kerberos auth
- #4313 @pedro-iatzky fix(ingest): bigquery - fix ingestion of external tables
- #4462 @gabe-lyons adding final transport options
- #4223 @tc350981 feat(notebook): add data models for Notebook entity
- #4450 @pedro93 feat(frontend) Adds multiple group claim support
- #4442 @anshbansal feat(ingestion): snowflake, bigquery - enhancements to log and bugfix
- #4333 @anshbansal doc: add guide for ui tabs
- #4443 @jjoyce0510 Fixing privilege option display bug
- #4451 @gabe-lyons fix tableau connector when it cannot connect to URI
- #4237 @tc350981 (docs) add RFC file to introduce Notebook entity data model
- #4446 @kevinneville fix: Replace old repository link with new link
- #4447 @cuong-pham getting database directly from upstream tables incase there are multiple databases in upstreamDatabases
- #4436 @treff7es Passing entity properly on deletion
- #4433 @rslanka Fix bug in the SchemaField type computation for AVRO logical types.
DataHub v0.8.31
Bugfix release to prevent failing reindexing of system metadata index in elasticsearch