v0.10.3
Release Highlights
User Experience
- Define Data Products via YAML and manage associated entities within a Domain
- Search experience: quickly apply a filter at time of search
- Form-based PowerBI ingestion
Developer Experience
- Progress toward removing the Confluent Schema Registry requirement; Helm & Quickstart simplifications to follow
  - NOTE: this only works for new deployments of DataHub. If you have already deployed DataHub with Confluent Schema Registry, you will not be able to disable it
- Delete CLI - correctly handles deleting timeseries aspects
- Ongoing improvements to Quickstart stability
- Support entity types filter in `get_urns_by_filter`
- Search customization
  - regex-based query matching
  - full control over scoring functions (usable on any document field, e.g. tags, deprecated flags, etc.)
  - enable/disable fuzzy, prefix, and exact match queries
Ingestion
- BigQuery - Improve ingestion disk usage & speed; extract dataset usage from Views
- Unity Catalog - Capture create/last modified timestamps; extract usage; data profiling support
- PowerBI - Update workspace concept mapping; support `modified_since`, `extract_dataset_schema`, and more
- Superset – support stateful ingestion
- Business Glossary – Simplify ingestion source
- Kafka – Add description in dataset properties
- S3 – Support stateful ingestion & `last_updated`
- CSV Enricher – Support updating more types
- PII Classification - Configurable sample size
- Nifi - Support Kerberos authentication
What's Changed
- fix(ingest/bigquery): Add to lineage, not overwrite, when using sql parser by @asikowitz in #7814
- fix(ingest/bigquery): Enable lineage and usage ingestion without tables by @asikowitz in #7820
- fix(ingest/bigquery): Do not query columns when not ingesting tables or views by @asikowitz in #7823
- fix(ingest/bigquery): update usage query, remove erroneous init by @mayurinehate in #7811
- fix(ingest/bigquery): Handle null values from usage aggregation by @asikowitz in #7827
- perf(ingest/bigquery): Improve bigquery usage disk usage and speed by @asikowitz in #7825
- fix(cli): use correct ingestion image in script by @hsheth2 in #7826
- fix(release): prevent republish of images on release edits by @RyanHolstien in #7828
- feat(): finish populating the entity registry by @hsheth2 in #7818
- fix(ui) Fix 404 page routing bug by @chriscollins3456 in #7824
- feat(ui): Support PowerBI Ingestion via UI form by @jjoyce0510 in #7817
- fix(ingest/snowflake): fix column name in snowflake optimised lineage by @mayurinehate in #7834
- feat(ingest/unity): capture create/lastModified timestamps by @hsheth2 in #7819
- fix(test): fix spark lineage test by @david-leifker in #7829
- docs(): add markprompt help chat by @jeffmerrick in #7837
- Update DataJobInputOutput.pdl to express that CLL fields are not shown in the UI right now by @gabe-lyons in #7830
- feat(cli): improve quickstart stability by @hsheth2 in #7839
- chore(ci): regular upgrade base requirements.txt by @anshbansal in #7821
- feat(timeseries): Support sorting timeseries aspects by non-timestampMillis field + fix operations resolver by @jjoyce0510 in #7840
- doc(ingestion/tableau): Fix rendering ingestion quickstart guide by @mohdsiddique in #7808
- fix(ingest): pin sqlparse version by @hsheth2 in #7847
- feat(urn): Add a validator when creating an URN that it is no longer than the li… by @iprentic in #7836
- chore(ingest): bug fix in sqlparse pin by @hsheth2 in #7848
- feat: enriching guide on creating dataset by @yoonhyejin in #7777
- feat(docs): consolidate api guides by @yoonhyejin in #7857
- fix(ingest/salesforce): use report timestamp for operations by @hsheth2 in #7838
- chore(ci): fix CI failing due to lint by @anshbansal in #7863
- fix(mcl): fix improper pass by reference by @RyanHolstien in #7860
- feat(urn) Add validator to reject URNs which contain the character we plan to u… by @iprentic in #7859
- feat(elasticsearch): Add servlet which provides an endpoint for a healthcheck on the ES cl… by @iprentic in #7799
- fix(ui) Add UI fixes and design tweaks to AutoComplete by @chriscollins3456 in #7845
- fix(ui) Get all entity assertions in chrome extension by @chriscollins3456 in #7849
- refactor(platform): Refactoring ES Utils, adding EXISTS condition support to Filter Criterion by @jjoyce0510 in #7832
- chore(ui): change background color to transparent for avatar with photoUrl by @hieunt-itfoss in #7527
- refactor(ingest): Add helper DataHubGraph methods by @asikowitz in #7851
- fix(ui) Disable cache on Domain and Glossary Related Entities pages by @chriscollins3456 in #7867
- fix(cache): Fix cache key serialization in search service by @pedro93 in #7858
- docs(ingest): update dbt and aws docs by @hsheth2 in #7870
- docs(ingest): fix CorpGroup example by @hsheth2 in #7816
- docs(ingest/powerbi): update workspace concept mapping by @eeepmb in #7835
- feat(ingest/powerbi): support modified_since, extract_dataset_schema and many more by @aezomz in #7519
- Remove usages of commons-text library lower than 1.10.0 by @iprentic in #7850
- feat(glue): allow resource links to be ignored by @YusufMahtab in #7639
- feat(ingestion): lookml refinement support by @mohdsiddique in #7781
- feat(ingest/unity): Ingest ownership for containers; lookup service principal display names by @asikowitz in #7869
- Logging and test models fixes by @david-leifker in #7884
- feat(model) Add ContainerPath aspect model by @chriscollins3456 in #7774
- bug(7882): run kafka-configs.sh on DataHubUpgradeHistory_v1 to make sure the retention.ms is set to infinite by @jinlintt in #7883
- fix: refactor toc by @yoonhyejin in #7862
- feat(cli): Modifies ingest-sample-data command to use DataHub url & token based on config by @pedro93 in #7896
- feat(ingest/snowflake): optionally emit all upstreams irrespective of recipe pattern by @mayurinehate in #7842
- fix(ingestion/tableau): backward compatibility with version 2021.1 an… by @mayurinehate in #7864
- fix(ingest/dbt): ensure dbt shows view properties by @hsheth2 in #7872
- docs(airflow): add debug guide on url generation by @hsheth2 in #7885
- feat(sdk): support entity types filter in `get_urns_by_filter` by @hsheth2 in #7902
- fix(ingest/snowflake): fix optimised lineage query, filter temporary … by @mayurinehate in #7894
- fix(ingest/bigquery): fix handling of time decorator offset queries by @mayurinehate in #7843
- fix(ingest): fix minor bug + protective dep requirements by @hsheth2 in #7861
- fix(cli): remove duplicate labels from quickstart files by @hsheth2 in #7886
- Revert "feat(cli): Modifies ingest-sample-data command to use DataHub… by @pedro93 in #7899
- feat(sdk): add `DataHubGraph.get_entity_semityped` method by @hsheth2 in #7905
- test(ingest/biz-glossary): add test for enable_auto_id by @hsheth2 in #7911
- feat(ingest): add GCS ingestion source by @mayurinehate in #7903
- [bugfix] Fix remote file ingestion for Windows by @xiphl in #7888
- refactor(ingest): report soft deleted stale entities with LossyList by @asikowitz in #7907
- fix(platforms): fix json parse exception for data platforms by @RyanHolstien in #7918
- docs(release): managed DataHub 0.2.6 by @anshbansal in #7922
- fix(deploy): add missing plugin files for mysql-client library in mysql-setup by @AndrewZures in #7909
- docs(deploy): document some of the environment variables by @david-leifker in #7906
- fix(system-update): fix no wait flag by @david-leifker in #7927
- fix(consumer): fix datahub usage event topic consumer by @david-leifker in #7866
- logging(auth): adding optional logging to authentication exceptions by @david-leifker in #7929
- feat(search): enable search initial customization by @david-leifker in #7901
- feat(schema-registry): replace confluent schema registry by @david-leifker in #7930
- feat(ingest/unity): Add usage extraction; add TableReference by @asikowitz in #7910
- fix(ingest/unity-catalog): Add usage_common dependency to unity catalog plugin by @asikowitz in #7935
- feat(search): add filter for specific entities by @iprentic in #7919
- fix(ingest/unity): Add sqllineage dependency by @asikowitz in #7938
- fix(ingest/hive): fix containers generation for hive by @mayurinehate in #7926
- docs(ingest): add note about path_specs configuration in data lake sources by @mayurinehate in #7941
- feat: add missing python sdk guides based on DatahubGraph by @yoonhyejin in #7875
- fix(ingest/unity): use fully qualified catalog/schema patterns by @hsheth2 in #7900
- feat(airflow): respect port parameter if provided by @hsheth2 in #7945
- fix(ingest): improve error message when graph connection fails by @hsheth2 in #7946
- fix(docs): Adding relationship types section to Business Glossary docs by @jjoyce0510 in #7949
- docs(ingest): update max_threads default value by @felipeac in #7947
- fix(ui) Fix Tag Details button to use url encoding by @chriscollins3456 in #7948
- docs: amend italic formatting by @HansBambel in #7893
- fix(ldap): properly handle escaped characters in LDAP DNs by @Reilman79 in #7928
- docs(ingest/postgres): add example with ssl configuration by @hsheth2 in #7916
- refactor(ingest/biz-glossary): simplify business glossary source by @hsheth2 in #7912
- fix: Fix broken links on PowerBI by @yoonhyejin in #7959
- feat(model) Update aspect containerPath -> browsePathsV2 by @chriscollins3456 in #7942
- fix(ui) Fix displaying column level lineage for sibling nodes by @chriscollins3456 in #7955
- fix(ingest/bigquery): Filter projects for lineage and usage by @asikowitz in #7954
- feat(tracking) Add tracking events to our chrome extension page by @chriscollins3456 in #7967
- fix(search): Handle .keyword properly in the entity type query to ind… by @iprentic in #7957
- feat(es) Store and map containerPath to elastic search properly by @chriscollins3456 in #7898
- fix: build vercel python from source by @hsheth2 in #7972
- feat(models): Make assets searchable by their external URLs by @jjoyce0510 in #7953
- fix(ingest/salesforce): support JSON web token auth by @matthew-piatkus-cko in #7963
- fix(SearchBar): Restore explore all link by @joshuaeilers in #7973
- fix(ingest/tableau): Add a try catch to LineageRunner parser by @maaaikoool in #7965
- fix(ingest/salesforce): fix lint by @hsheth2 in #7980
- fix(ingest): use certs correctly in rest emitter by @hsheth2 in #7978
- fix(ingestion/redshift) - Fixing schema query by @treff7es in #7975
- chore(log): change sout to log by @anshbansal in #7931
- fix(ingest/redshift): Enabling autocommit for Redshift connection by @treff7es in #7983
- fix(ingest): use with for opened connections by @mayurinehate in #7908
- fix(ingest/unity): improve error message if no scheme in workspace_url by @mayurinehate in #7951
- fix(download as csv): Support download to csv for impact analysis tab by @jjoyce0510 in #7956
- docs(development): update per feedback from community by @david-leifker in #7958
- fix(ingest/bigquery): remove incorrectly used table_pattern filter by @mayurinehate in #7810
- feat(snowflake): add config option to specify deny patterns for upstreams by @mayurinehate in #7962
- fix(docker-compose): make startup more robust with deterministic services' dependencies by @gcernier-semarchy in #7880
- fix(cache): update search cache when skipped, but enabled by @RyanHolstien in #7936
- feat(telemetry): add server version by @RyanHolstien in #7979
- docs: add tips on language switchable tap on docs by @yoonhyejin in #7984
- fix(privileges) Use glossary term manage children privileges for edit docs and links by @chriscollins3456 in #7985
- fix(ingest/postgres): Allow specification of initial engine database; set default database to postgres by @asikowitz in #7915
- refactor(ingest/unity): Use databricks-sdk over databricks-cli for usage query by @asikowitz in #7981
- chore: cleanup some devtool console warnings by @joshuaeilers in #7988
- feat(search): support only searching by quick filter by @joshuaeilers in #7997
- feat(docs): Add cli documentation on how to add custom platforms by @pedro93 in #7993
- fix(search): fix custom search config parsing by @david-leifker in #8010
- fix(auth): guards against creating a user for the system actor by @aditya-radhakrishnan in #7996
- chore(security): update org json json dependency - cve-2022-45688 by @RyanHolstien in #7991
- feat(metrics): add metrics for upgrade steps by @RyanHolstien in #7992
- feat(models): Adding searchable for chart and dashboard url by @jjoyce0510 in #8002
- feat(ingest/s3): Inferring schema from the alphabetically last folder by @treff7es in #8005
- feat(ingest/classification): add classification report by @mayurinehate in #7925
- docs(managed datahub): release notes for v0.2.7 by @anshbansal in #8020
- fix(ui ingest): Fix mapping for token_name, token_value form fields for Tableau by @jjoyce0510 in #8018
- fix(ui): add loading indicator for download as CSV action by @aditya-radhakrishnan in #8003
- fix(ingest/snowflake): fix lineage query aggregation for optimised li… by @mayurinehate in #8011
- feat(ingest/unity): Add profiling support by @asikowitz in #7976
- feat(docs): Add example documentation for scrollAcrossEntities by @pedro93 in #8014
- fix(ingest/unity): Update databricks-cli pin by @asikowitz in #8024
- fix(ingest/s3) Adding missing more-itertools dependency by @treff7es in #8023
- feat(cli): move registry delete to separate subcommand by @hsheth2 in #7968
- fix(sdk): throw errors on empty gms server urls by @hsheth2 in #8017
- feat(ingest/superset): add stateful ingestion by @cccs-Dustin in #8013
- Gitignor'ing generated binary files in OSS by @meyerkev in #8031
- fix(PFP-260): Upgrading sqlite to fix SQLITE-449762 by @meyerkev in #8032
- feat(ingest): support importing local modules by @hsheth2 in #8026
- fix(timeline-events): fix NPE in timeline events by @david-leifker in #8038
- fix(posts): fix formatting for posts where the title can get cut off by @aditya-radhakrishnan in #8001
- fix(ingestion/metabase): metabase connector bigquery lineage fix by @shubhamjagtap639 in #8042
- fix(es) Fix browseV2 index mappings by @chriscollins3456 in #8034
- fix(search): enter key with no query should search all by @joshuaeilers in #8036
- feat(ingest): Allow csv-enricher to update more types by @xiphl in #7932
- fix(search): only show explore all btn on search and home by @joshuaeilers in #8047
- fix(ingest/dbt): fix dbt subtypes for sources by @hsheth2 in #8048
- fix(ingest/bigquery): update usage audit log query to include create/… by @mayurinehate in #7995
- feat(docs): add guide on integration ML system via SDKs by @yoonhyejin in #8029
- refactor(ingest): Make get_workunits() return MetadataWorkUnits by @asikowitz in #8051
- refractor(classification): simplify classification handler by @mayurinehate in #8056
- feat: Add support for Data Products by @shirshanka in #8039
- fix(build): fix lint issue by @shirshanka in #8066
- feat(system-update): remove datahub-update requirement on schema reg by @david-leifker in #7999
- fix(gitignore): update gitignore for generated files by @minjin0121 in #7940
- feat(ingestion/kafka): add description in dataset properties by @shubhamjagtap639 in #7974
- fix(ingestion/tableau): ingest parent project name in container properties by @mohdsiddique in #8030
- refactor(ingest): Move source_helpers.py from datahub/utilities -> datahub/api by @asikowitz in #8052
- fix(ingest/snowflake): lowercase user urn when using email by @matwalk in #7767
- fix(ingest/tableau): don't use unsupported sql condition field by @mayurinehate in #8065
- fix(ingest/looker): don't prematurely show connectivity success by @hsheth2 in #8070
- feat(web): update AWS logos by @rinzool in #8057
- fix(metadata-io): remove assert in favor of exceptions by @david-leifker in #8035
- feat: add docs on column-level linage by @yoonhyejin in #8062
- ci: prevent qodana from using all of our cache by @hsheth2 in #8054
- ci(ingest/clickhouse): don't use kernel ephemeral ports by @hsheth2 in #8060
- test(sdk): better error messages in registry codegen test by @hsheth2 in #8081
- doc(managed datahub): update release notes for 0.2.7 by @anshbansal in #8088
- feat(ingest/s3) - Stateful ingestion and last-updated support by @treff7es in #8022
- docs(ingest/snowflake): fix authentication type docs by @hsheth2 in #8059
- fix(ingest/s3_data_lake)_ingestor_skips_directories_with_similar_prefix by @alplatonov in #8078
- fix(ui) Fix entity name styling to show deprecation and others properly by @chriscollins3456 in #8084
- test(sdk): move cli tests into the unit dir by @hsheth2 in #8028
- feat(sdk): better auth error messages in the rest emitter by @hsheth2 in #8025
- feat(caching): skip cache on ownership tabs by @gabe-lyons in #8082
- feat(embed): embed lookup route by @joshuaeilers in #8033
- fix(ingest/delta-lake): Walk through directory structure with full path; reduce resource creation by @asikowitz in #8072
- feat(search): Add AggregateAcrossEntities endpoint by @iprentic in #8000
- chore(vulnerability): add exclusions for json to prevent leaking dependency by @RyanHolstien in #8090
- fix(ingestion/powerbi): skip erroneous pages of a report by @shubhamjagtap639 in #8021
- feat(docs): Update markprompt by @jeffmerrick in #8079
- feat(images): Add build processes for arm64v8 image variants by @pedro93 in #7990
- feat(ingest): add `env` to container properties by @hsheth2 in #8027
- fix(checkstyle): Fix checkstyle violations to turn master green by @iprentic in #8099
- doc(auth): fixes doc in DataHubSystemAuthenticator.java by @sgomezvillamor in #8071
- refactor(ingest): Auto report workunits by @asikowitz in #8061
- feat(cli): support `datahub ingest mcps` by @hsheth2 in #7871
- feat: datahub-upgrade.sh to support old versions by @ollisala in #7891
- feat(ingest/s3): type aware directory sorting by @treff7es in #8089
- fix(ci): add missing updates to restli-spec by @anshbansal in #8106
- fix(ingest/build): setting typing extension <4.6.0 because it breaks tests by @treff7es in #8108
- fix(upgrade): removes sleep from bootstrap process by @RyanHolstien in #8016
- fix(jackson): increase max serialized string length default by @RyanHolstien in #8053
- fix(ui): SchemaDescriptionField 'read-more' doesn't affect table height by @jfrancos-mai in #7970
- fix(ingest): emitter bug fixes by @hsheth2 in #8093
- fix(sample data): Update timestamps in bootstrap_mce.json to more recent by @iprentic in #8103
- feat(ui) Add readOnly flag that disables profile URL editing by @chriscollins3456 in #8067
- feat(cli): delete cli v2 by @hsheth2 in #8068
- refactor(ingest): simplify stateful ingestion provider interface by @hsheth2 in #8104
- Update updating-datahub.md with breaking changes by @chriscollins3456 in #7964
- feat(ui) Show documentation on Domain pages first by @chriscollins3456 in #8110
- docs(readme): adds PITS Global Data Recovery Services to the adopters list by @pheianox in #8080
- fix(ingest/redshift): Making Redshift source more verbose by @treff7es in #8109
- feat(ingest): Browse Path v2 helper by @asikowitz in #8012
- feat(classification): configurable sample size by @mayurinehate in #8096
- fix logic for multiple entities found and clean up messy code by @joshuaeilers in #8113
- fix(search): Update _entityType transform logic to work for entities containing _ by @iprentic in #8112
- feat(ingest/bigquery): usage for views by @mayurinehate in #8046
- fix(ui): Open mailto link in new tab by @jfrancos-mai in #7982
- fix(search): Transform _entityType/index output for scroll across entities as well by @iprentic in #8117
- feat(ingest): Add GenericAspectTransformer by @amanda-her in #7994
- refactor(ingest): Call source_helpers via new WorkUnitProcessors in base Source by @asikowitz in #8101
- feat(ingest/nifi): kerberos authentication by @mayurinehate in #8097
- fix(ingest/redshift):fixing schema filter by @treff7es in #8119
- feat(ingest/unity): Allow ingestion without metastore admin role by @asikowitz in #8091
- feat(ingest/bigquery): Add BigQuery Views lineage extraction from Google Data Catalog API by @viniciusdsmello in #8100
- fix(ingest/redshift): Fixing Redshift subtypes by @treff7es in #8125
- fix(ingest): Fix breaking smoke test on stateful ingestion by @asikowitz in #8128
New Contributors
- @eeepmb made their first contribution in #7835
- @YusufMahtab made their first contribution in #7639
- @AndrewZures made their first contribution in #7909
- @HansBambel made their first contribution in #7893
- @matthew-piatkus-cko made their first contribution in #7963
- @joshuaeilers made their first contribution in #7973
- @gcernier-semarchy made their first contribution in #7880
- @shubhamjagtap639 made their first contribution in #8042
- @minjin0121 made their first contribution in #7940
- @matwalk made their first contribution in #7767
- @rinzool made their first contribution in #8057
- @alplatonov made their first contribution in #8078
- @ollisala made their first contribution in #7891
- @jfrancos-mai made their first contribution in #7970
- @pheianox made their first contribution in #8080
Full Changelog: v0.10.2...v0.10.3