DataHub V0.8.36
Highlights
User Experience
NEW – Manage Glossary Terms via the DataHub UI! Delivering on our Q2’22 Roadmap item, end users can now create, edit, move, delete, and deprecate Glossary Terms via the UI. This new experience changes how Glossary data is indexed so that the different levels of your Glossary can be viewed and traversed. If you already have existing Glossaries, you will have to restore your indices for the new Glossary experience to work. If this is your first time using DataHub Glossaries, you're all set!
Ability to add multiple Owners, Tags, and Terms via the UI
Developer Experience
The new Revokable Token API supports a new type of Access Token which can be revoked & queried, allowing admins to easily delete tokens for operational & security reasons. Read all about it in the Access Token Management Usage Guide.
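As a rough illustration of revoking a token programmatically, the sketch below builds the GraphQL request body and headers for a revoke call. It assumes the mutation is named `revokeAccessToken` and takes a `tokenId` argument, as we understand it from the Access Token Management Usage Guide; check both against your deployment's GraphQL schema. The helper names `build_revoke_request` and `build_auth_headers` are hypothetical and exist only for this example.

```python
from typing import Any, Dict

# Assumption: mutation name and argument as described in the Access Token
# Management Usage Guide; verify against your own GraphQL schema.
REVOKE_MUTATION = """
mutation revokeAccessToken($tokenId: String!) {
  revokeAccessToken(tokenId: $tokenId)
}
"""


def build_revoke_request(token_id: str) -> Dict[str, Any]:
    """Build the JSON body for a GraphQL call that revokes one token.

    `token_id` is the identifier of the token to revoke, not the token itself.
    """
    if not token_id:
        raise ValueError("token_id must be a non-empty string")
    return {"query": REVOKE_MUTATION, "variables": {"tokenId": token_id}}


def build_auth_headers(admin_token: str) -> Dict[str, str]:
    """Build HTTP headers that authenticate the caller with a bearer token."""
    if not admin_token:
        raise ValueError("admin_token must be a non-empty string")
    return {
        "Authorization": f"Bearer {admin_token}",
        "Content-Type": "application/json",
    }
```

The resulting body would be sent as a JSON POST to your DataHub GraphQL endpoint with any HTTP client. The endpoint path depends on your deployment, so it is deliberately left out of the sketch.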
Ingestion Updates
This release includes 3 new Metadata Sources:
- Iceberg
- Vertica
- SAP HANA
📣 Massive shoutout to DataHub Community members @cccs-eric, @eburairu, and @buggythepirate for driving these contributions! 📣
These sources are currently marked as “Testing” - we encourage you to try them out & provide feedback in the DataHub #ingestion Slack channel!
We’ve rolled out the following ingestion-related improvements:
- AWS Glue - data profiling is now supported
- S3 - ingestion is now faster thanks to sampling
- Various bug fixes
Full Commit Log
- #5071 @dexter-mh-lee fix(docker): Fix mysql setup bug
- #5066 @jjoyce0510 refactor(docs): Rename metadata modeling ingestion sidebar titles
- #5036 @mmmeeedddsss fix(mysql-setup-job): add mysql default port override support
- #5056 @nj7 fix: ES Rest Client Creation for non ssl authenticated connection
- #5053 @ShubhamThakre fix(ui): ui bug fix for datasets sidebar stats section
- #5061 @anshbansal feat(redash): add parallelism support for ingestion
- #5017 @anshbansal feat(model): new chart types
- #5047 @RyanHolstien fix(datahub-upgrade): exclude unnecessary configuration from standalone applications
- #5052 @shirshanka feat(ci): datahub-client - add workflow, fix build
- #5054 @jjoyce0510 docs(actions): Adding DataHub Actions to docs website
- #5031 @piyushn-stripe feat(frontend): Allow overriding frontend with a custom akka http server
- #5050 @dexter-mh-lee Remove exception on ingest policies
- #5043 @Masterchen09 fix(docs): hana - rename SAP HANA source and data platform
- #5051 @shirshanka fix(ingest): fix build breakage due to traitlets 5.2.2 bug
- #5045 @anshbansal fix(redash): fix bug with names, add option for page size, debugging info
- #5022 @jjoyce0510 fix(restore): Add RESTATE ChangeType to MCL / MCP to permit restore indices
- #5041 @anshbansal doc(bigquery): fix missing permissions
- #5030 @endeesa fix(doc) - Specify docker-compose version to avoid compatibility issues
- #4879 @BoyuanZhangDE feat(ingest): glue - enable profiling
- #5035 @treff7es fix(profiling): bigquery - Fix for Bigquery temp table creation on GE >= 0.15.3
- #5040 @shirshanka fix(build): m1 build fails to install hdb-cli
- #5026 @chriscollins3456 feat(glossary) Business Glossary updates
- #4940 @MugdhaHardikar-GSLab fix(spark-lineage): remove need for sparksession.stop call
- #5023 @rslanka fix(ingest): common - fix nullability determination for the AVRO fixed type.
- #5012 @anshbansal fix(cli): don't use env for container, add example
- #5021 @maggiehays docs(townhall): update townhall rsvp link and add may townhall detail
- #5038 @shirshanka fix(build): docgen should fail if plugin is not loadable
- #5033 @RyanHolstien fix(timelineAPI): fix issue with semantic versioning
- #5034 @RyanHolstien fix(telemetry): exclude configuration from standalone apps
- #5029 @RyanHolstien feat: telemetry improvements
- #5028 @gabe-lyons dont set platform instances for sources
- #5027 @anshbansal fix(parsing): incorrect parsing for commas
- #4938 @Ankit-Keshari-Vituity refactor(ui): UI Integration to add multiple tags, terms and owners
- #5025 @anshbansal fix(parsing): improve sql parsing, some debugging redash
- #5024 @rslanka fix(ingestion): Remove hana from base_dev_requirements to unblock m1 users
- #5014 @anshbansal fix(bigquery): reduce number of calls for details of partitioning
- #5016 @ShubhamThakre fix(ui): arrow click position update
- #5019 @rslanka fix(build): fix for hana build failure for aarch64.
- #5020 @jjoyce0510 feat(Tests): Make DataHub Tests Feature configurable via env variable
- #5005 @hsheth2 test(ingestion): change class names to avoid unittest warnings
- #5006 @hsheth2 fix(ingestion): use raw strings for regexes
- #5010 @rslanka feat(ingestion): Add Iceberg source
- #5001 @PatrickfBraz fix(bigquery-usage): fix audit metadata query template
- #4997 @anshbansal fix(redash): improve logging for debugging, add validation for dataset urn, some refactoring
- #4376 @buggythepirate feat(ingest): Added new ingestion source SAP HANA
- #5011 @rslanka Fix pulsar source docs.
- #4555 @eburairu feat(ingest): Add Source from Vertica
- #5008 @anshbansal fix(dbt): missing aws dependency
- #5007 @anshbansal fix(bigquery): restrict protobuf version
- #5004 @pedro93 fix(gms): Fix incorrect StatefulTokenService init
- #5002 @ShubhamThakre fix(ui): ui bug fix - fixing search card vertical margin
- #4994 @anshbansal doc(delete): add example for dataflow and datajob
- #4988 @jjoyce0510 feat(DataHub Operations): Adding GraphQL mutation for reporting Dataset operations
- #4998 @shirshanka fix(cli): timeline - adjust for timeline API changes on server
- #5000 @pedro93 fix(docs): Fixes token docs
- #4989 @jjoyce0510 feat(Tests): Metadata Tests Models + APIs + UI (Part 1)
- #4995 @treff7es fix(airflow): Fix for Airflow 1 support
- #4993 @shirshanka chore(deps): upgrade gson version
- #4935 @BoyuanZhangDE feat(dbt): enable dbt read artifacts from s3
- #4833 @treff7es feat(airflow): Airflow lineage ingestion plugin
- #4931 @mayurinehate fix(ingest): tableau - fix chart custom properties None key error, update docs
- #4943 @mayurinehate feat(model): add created, lastModified auditstamps to SchemaField
- #4991 @anshbansal refactor(redash): emit charts first and try with id based dashboard API first
- #4942 @mohdsiddique metabase chart are missing from dashboard
- #4992 @anshbansal doc(ingest): update golden file command
- #4927 @treff7es feat(ingest): s3 - speeding up ingestion with sampling
- #4979 @pedro93 fix(smoke-tests) Increases sleep timeout in rollback test to prevent flakiness
- #4964 @dexter-mh-lee feat(run): Create a describe run endpoint for fetching aspects created by the ingestion run
- #4169 @claudio-benfatto feat(ingestion): optionally disable some kafka schema warnings
- #4972 @mayurinehate feat(great-expectations): allow DATAHUB_DEBUG env var to enable debug logs in GE Action
- #4957 @justinas-marozas refactor(metadata-io): introduce a storage-independent in-memory entity aspect model
- #4982 @jjoyce0510 feat(authorization): Adding AuthorizerContext + ResourceSpecResolver to context
- #4984 @anshbansal doc(ingestion): default boolean fix, broken bigquery docgen
- #4970 @pedro93 feat(graphql) Add new Revokable Token API
- #4987 @anshbansal fix(ingest): remove new schema field usage
- #4985 @anshbansal fix(redash): use dashboard id if slug does not work
- #4986 @pedro93 chore(deps): upgrade datastax libs version
- #4981 @RyanHolstien fix(metadata-service): telemetry - fix hardcoded aspect name, suppress errors when producing MAE
- #4983 @shirshanka fix(ingest): mode - dashboards without creator info fails to process
- #4975 @chriscollins3456 fix(UI) Fix multiple UI usability issues
- #4977 @maggiehays docs(townhall): update invite links and townhall history
- #4980 @MugdhaHardikar-GSLab feat(spark-lineage): support for persist API
- #4974 @anshbansal feat(bigquery): add partition key tag
- #4967 @anshbansal fix(bigquery): add rate limiting for api calls made
- #4971 @shirshanka fix(cli): graph - get_aspect_v2 method fails to deserialize aspects correctly
- #4958 @anshbansal doc(ingest): mysql - describe required grants
- #4969 @RyanHolstien doc(telemetry): fix telemetry doc
- #4878 @MugdhaHardikar-GSLab fix(datahub-client): support utf8 encoding
- #4961 @anshbansal feat(bigquery): reduce logging
- #4909 @ShubhamThakre fix(ui): policy outside modal click issue update
- #4968 @jeffmerrick docs(website): Remove banner and nav item for metadata day 2022
- #4965 @mmmeeedddsss docs(datahub-kafka-sink): add topic_routes config to doc of datahub-kafka-sink
- #4966 @liyuhui666 fix(data platforms): Update data_platforms.json
- #4922 @mayurinehate feat(cli): raise error if get entity api fails
- #4963 @Masterchen09 fix(ui): do not show copy URN buttons when Clipboard API is not available
- #4962 @RyanHolstien feat(release): update CLI version
- #4960 @RyanHolstien feat: updates for 0.8.35
- #4945 @treff7es Revert "feat(spark-lineage): add support for iceberg and cache based plans (#4882)"
- #4954 @dexter-mh-lee fix(ci): remove scheduled artifact deletion run to avoid api rate limiting
- #4932 @anshbansal fix(bigquery): add dataset_id for bigquery
- #4952 @RyanHolstien fix(metadata-service): timeline - ignore platform and schema changes
- #4953 @dexter-mh-lee fix(ci): docker - remove multiplatform builds for unsupported images
- #4950 @dexter-mh-lee fix(ci): add artifact cleaner, make docker publish sections consistent
- #4947 @RyanHolstien fix(workflow): fix mysql credentials
- #4951 @aditya-radhakrishnan fix(frontend): Update run-local-frontend to reflect the new Play changes
- #4936 @gabe-lyons feat(transformers): add transformers to provide tags & terms to schema fields based on regex patterns
- #4948 @dexter-mh-lee Fix docker unified
- #4944 @gabe-lyons make graphql OperationType enum match up w/ pdl