Brainstorm data quality features #53
@robertkossendey - these sound like good suggestions. I'm guessing some external libs would help for this type of functionality (Great Expectations or PyDeequ perhaps), but don't want to add any dependencies to this lib. Let's keep this open as a "meta-issue". When you have ideas for individual functions, feel free to open up a separate issue and we can chat in detail before you put in the work. Thanks!
@MrPowers I wouldn't like to use any other framework tbh. If you're okay with it, I would create a PoC PR that lets you specify a condition; if that condition is not fulfilled, the write would fail.
@robertkossendey - yep, PoC PR sounds like a great next step!
Hey guys, I actually built a library to mock the DLT behaviors outside of Databricks: dlt-with-debug. I think I can take out the expectation mock APIs and add them here in mack.
@souvik-databricks very cool! Maybe you can open up a PR and we can collaborate on that then :)
I will raise the PR on this @robertkossendey
Constraints are great data quality features that allow users to define filters / rules that identify invalid records.
But they only allow failing on invalid records, and I think we could do better.
Some ideas:
- Ability to write invalid rows to a "Quarantine" table

WDYT @MrPowers
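To make the quarantine idea concrete, here is a rough sketch in plain Python rather than Spark, so it runs without a cluster. The function name `split_by_constraint`, the `Row` alias, and the sample rows are all hypothetical and only for illustration; none of this is an existing mack or Delta Lake API. The point is just that one constraint can route rows to two destinations instead of failing the whole write.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical row type for this sketch: a plain dict per record.
Row = Dict[str, Any]


def split_by_constraint(
    rows: Iterable[Row], is_valid: Callable[[Row], bool]
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (valid, quarantined) using a single constraint."""
    valid: List[Row] = []
    quarantined: List[Row] = []
    for row in rows:
        # Rows that satisfy the constraint go to the main table,
        # everything else goes to the quarantine table.
        (valid if is_valid(row) else quarantined).append(row)
    return valid, quarantined


# Sample data, made up for this example.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},
    {"id": 3, "age": 51},
]

# Constraint: age must be non-negative.
valid, quarantined = split_by_constraint(rows, lambda r: r["age"] >= 0)
```

In a Spark version, I'd assume the same shape: filter the DataFrame once on the condition and once on its negation, then write each result to its own table.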