[FEATURE] Data profiling in Databricks for data quality #52

Open
jonaslieben opened this issue Oct 20, 2023 · 0 comments
Labels
enhancement New feature or request


jonaslieben commented Oct 20, 2023

Problem statement
As a user implementing data quality in Databricks, I want to perform data profiling in addition to DQ validations. Databricks supports profiling directly, but I also want to save the profiling results so that we can use them for data observability and for detecting unusual patterns.

Describe the solution you'd like

  • Profiling functionality in Spark Expectations that also lets you save data profiles over time in a separate table
  • Ability to define expectations on the profiling results (for user-friendliness)
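
To make the two points above more concrete, here is a rough sketch of what I have in mind. It is not an existing Spark Expectations API: the function names (`profile_dataframe`, `save_profile`, `check_profile`), the long-format row layout, and the choice of metrics are all assumptions made for illustration. Only standard PySpark calls are used.

```python
from datetime import datetime, timezone


def profile_dataframe(df, table_name, run_ts=None):
    """Return one dict per (column, metric) for a PySpark DataFrame.

    Hypothetical helper: the metric set and row layout are illustrative only.
    """
    # Imported lazily so check_profile below can be used without a Spark runtime.
    from pyspark.sql import functions as F

    run_ts = run_ts or datetime.now(timezone.utc)
    # Build every aggregate up front so Spark computes them in a single job.
    aggs = [F.count(F.lit(1)).alias("__row_count")]
    for c in df.columns:
        aggs.append(F.count(F.col(f"`{c}`")).alias(f"{c}__non_null"))
        aggs.append(F.countDistinct(F.col(f"`{c}`")).alias(f"{c}__distinct"))
    stats = df.agg(*aggs).first().asDict()

    row_count = stats["__row_count"]
    rows = []
    for c in df.columns:
        non_null = stats[f"{c}__non_null"]
        metrics = {
            "row_count": row_count,
            "null_fraction": (row_count - non_null) / row_count if row_count else 0.0,
            "distinct_count": stats[f"{c}__distinct"],
        }
        for name, value in metrics.items():
            rows.append({
                "run_ts": run_ts,
                "table_name": table_name,
                "column_name": c,
                "metric": name,
                "value": float(value),
            })
    return rows


def save_profile(spark, rows, history_table):
    """Append profile rows to a history table so profiles accumulate over time."""
    spark.createDataFrame(rows).write.mode("append").saveAsTable(history_table)


def check_profile(rows, rules):
    """Return the profile rows that violate a rule.

    rules maps (column_name, metric) to an inclusive (min, max) range.
    Rows without a matching rule are ignored.
    """
    failures = []
    for row in rows:
        bounds = rules.get((row["column_name"], row["metric"]))
        if bounds is None:
            continue
        low, high = bounds
        if not (low <= row["value"] <= high):
            failures.append(row)
    return failures
```

In a Databricks notebook this could be used roughly as `rows = profile_dataframe(df, "orders")`, then `save_profile(spark, rows, "dq.profile_history")`, then `check_profile(rows, {("email", "null_fraction"): (0.0, 0.1)})`. The table names and the rule are placeholders. Storing one row per column and metric keeps the history table schema stable when new metrics are added later.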

Describe alternatives you've considered

  • Some tools on the market, such as Informatica, offer this capability, but they are not directly available within the Databricks ecosystem.
  • The default Databricks profiling capability. It is useful for ad hoc data profiling, but it cannot save the results and therefore cannot support data observability.

Additional context
I can provide additional context if needed or if the requirement is not entirely clear.

jonaslieben added the enhancement label Oct 20, 2023