[FEATURE] Add historic rules #38

Open
holdenk opened this issue Sep 19, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@holdenk
Contributor

holdenk commented Sep 19, 2023

Is your feature request related to a problem? Please describe.

It would be nice to be able to express the desire for the new data to "look like" the old data (in terms of distribution).

Describe the solution you'd like

Since spark-expectations already collects summary stats, adding validation rules that allow tolerances on the difference between today's summary and the previous summary could be a good start.
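
To make the idea concrete, here is a rough sketch of what a tolerance check between two summaries might look like. It is plain Python and not part of spark-expectations: `ToleranceRule`, `check_against_previous`, and the stat names in the sample dicts are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToleranceRule:
    """Hypothetical rule: a summary stat may drift by at most `max_rel_change`."""
    stat: str              # key in the summary dict (illustrative name)
    max_rel_change: float  # 0.10 means +/-10% relative to the previous run


def check_against_previous(today: dict, previous: dict, rules: list) -> dict:
    """Return {stat: passed} for each rule, comparing today's summary to the previous one."""
    results = {}
    for rule in rules:
        old, new = previous[rule.stat], today[rule.stat]
        if old == 0:
            # Relative change is undefined against zero; pass only if unchanged.
            results[rule.stat] = new == 0
        else:
            results[rule.stat] = abs(new - old) / abs(old) <= rule.max_rel_change
    return results


# Made-up summaries for two consecutive runs.
previous = {"input_count": 1000, "error_count": 10}
today = {"input_count": 1050, "error_count": 25}
rules = [ToleranceRule("input_count", 0.10), ToleranceRule("error_count", 0.50)]
print(check_against_previous(today, previous, rules))
```

With these made-up numbers, `input_count` moves by 5% and passes, while `error_count` moves by 150% and fails. A relative tolerance is only one option; an absolute bound or a z-score against several previous runs would fit the same shape.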

Describe alternatives you've considered

I suppose we could write a query rule where folks just manually write the SQL query.

Additional context

TFDV goes above and beyond with its historic views: https://www.tensorflow.org/tfx/data_validation/get_started

@holdenk holdenk added the enhancement New feature or request label Sep 19, 2023
@asingamaneni
Collaborator

@holdenk This is a good idea. We could add another rule_type, stats_dq, which would complement our existing rule types row_dq, agg_dq, and query_dq.

By default, we can offer a view derived from the stats_df we generate. Currently, users specify a stats_table. We can propose a new stats_table_view constructed from the job's stats_df. Additionally, we can read from the current stats_table to establish a stats_table_existing_view.

Leveraging these two views, users can craft queries tailored to their validation needs. For added convenience, we'll include standard query examples in our documentation.
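
As a rough illustration of the kind of query a user might write against the two views, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for Spark SQL so it runs anywhere. The view names come from the proposal above; the columns (`table_name`, `run_date`, `output_count`), the sample rows, and the 10% threshold are assumptions, not the actual stats schema.

```python
import sqlite3

# In-memory SQLite tables stand in for the two proposed Spark views.
# Column names and rows are assumptions made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stats_table_view (table_name TEXT, output_count INTEGER);
    CREATE TABLE stats_table_existing_view (
        table_name TEXT, run_date TEXT, output_count INTEGER
    );
    INSERT INTO stats_table_view VALUES ('orders', 1200);
    INSERT INTO stats_table_existing_view VALUES
        ('orders', '2023-09-17', 990),
        ('orders', '2023-09-18', 1000);
""")

# Hypothetical stats_dq expectation: the current output_count must be within
# 10% of the most recent previous run for the same table.
STATS_DQ_QUERY = """
    SELECT ABS(cur.output_count - prev.output_count) * 1.0
           / prev.output_count <= 0.10
    FROM stats_table_view AS cur
    JOIN stats_table_existing_view AS prev
      ON prev.table_name = cur.table_name
    WHERE prev.run_date = (
        SELECT MAX(run_date)
        FROM stats_table_existing_view
        WHERE table_name = cur.table_name
    )
"""

passed = bool(conn.execute(STATS_DQ_QUERY).fetchone()[0])
print(passed)
```

Here the current count is 20% above the latest previous run, so the check fails. The same boolean-valued SELECT would presumably carry over to Spark SQL with little change, though that has not been verified against the library.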

@holdenk
Contributor Author

holdenk commented Sep 19, 2023

Awesome 👏

@phanikumarvemuri
Contributor

@holdenk @asingamaneni Great feature.
I think at some point we need to provide an interface for users to implement custom rule types that can be integrated with spark-expectations.
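
One possible shape for such an interface, sketched in plain Python: a small registry that maps a rule_type name to an evaluator function. Nothing here exists in spark-expectations today; `register_rule_type`, `evaluate`, the rule dict layout, and the `max_value_dq` rule type are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical evaluator signature: (rule definition, summary stats) -> passed?
RuleEvaluator = Callable[[dict, dict], bool]

# Hypothetical registry of rule types, keyed by rule_type name.
_RULE_TYPES: Dict[str, RuleEvaluator] = {}


def register_rule_type(name: str):
    """Decorator that registers an evaluator under a rule_type name."""
    def decorator(fn: RuleEvaluator) -> RuleEvaluator:
        if name in _RULE_TYPES:
            raise ValueError(f"rule_type {name!r} is already registered")
        _RULE_TYPES[name] = fn
        return fn
    return decorator


@register_rule_type("max_value_dq")
def max_value(rule: dict, stats: dict) -> bool:
    """Example user-defined rule type: a stat must not exceed rule['max']."""
    return stats[rule["stat"]] <= rule["max"]


def evaluate(rule: dict, stats: dict) -> bool:
    """Dispatch a rule to the evaluator registered for its rule_type."""
    return _RULE_TYPES[rule["rule_type"]](rule, stats)


rule = {"rule_type": "max_value_dq", "stat": "error_count", "max": 5}
print(evaluate(rule, {"error_count": 3}))
```

A registry keeps the core dispatch unaware of individual rule types, so users could add their own without changes to the library. An abstract base class with an `evaluate` method would be a reasonable alternative.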


3 participants