Implemented options for num_quantiles feature in NumericStatsMixin #896

Closed
wants to merge 15 commits into from

clee1152
Contributor

Original issue here.

lizlouise1335 and others added 4 commits June 13, 2023 16:50
* memory optimization preset

trying again

trying again 3

trying again 4

accidentally pushed my updated makefile

* Wrote catch for invalid presets, wrote test for catch for invalid presets, debugged new optimization preset

* Forgot to run pre-commit, fixed those issues

* black doing weird things

* made preset validation more maintainable by moving it to the constructor and getting rid of preset list
* RowStatisticsOptions: Add null row count

Added null_row_count as an option in RowStatisticsOptions. It toggles the functionality for row_has_null_ratio and row_is_null_ratio in _update_row_statistics.

* Unit test for RowStatisticsOptions

* Black formatting

* added a unit test for RowStatisticsOptions

* Deleted test cases that were written in the wrong file

* updated testing for null_count toggle in _update_row_statistics

* removed the RowStatisticsOptions from test_profiler_options imports

* add line

* Created toggle option for null_count

* RowStatisticsOptions: Add implementation

* Revert "RowStatisticsOptions: Add implementation"

This reverts commit 2da6a93.

* RowStatisticsOptions: Create option

* fixed pre-commit error

* Update dataprofiler/profilers/profiler_options.py

Co-authored-by: Taylor Turner <[email protected]>

* Update dataprofiler/profilers/profiler_options.py

Co-authored-by: Taylor Turner <[email protected]>

* fixed documentation

---------

Co-authored-by: Taylor Turner <[email protected]>
* memory optimization preset

trying again

trying again 3

trying again 4

accidentally pushed my updated makefile

* trying

* trying

* black doing weird things

* trying

* made preset validation more maintainable by moving it to the constructor and getting rid of preset list

* Update to open-source in prep for wrapper changes for mem op preset

* updated preset toggles and preset name (mem op -> large data)

* updated tests to match

* continued name and test and toggle updates

* fix comments
* Implementing option

* Implementing option

* took out redundant if statement. added test case for when null_count is disabled.

* attempt to check for conflicts between profile merges

* added test to check if two profilers have null_count enabled before merging them together

* fixed typo and added a try/except to prevent a failing test

* No mocks needed. Fixed assertRaisesRegex error

* Changed variable names and added a new test to check the null_count when null_count is disabled.

* Changed name of test, moved tests to TestStructuredProfilerRowStatistics. Fixed position of if statement to prevent unnecessary code from running.

* added null_count test cases

* fixed indentation mistake

* fixed typo

* removed a useless commented-out line

* Updated test name
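The null_count commits above describe a toggle: when it is disabled, the row-level null ratios (row_has_null_ratio and row_is_null_ratio) are not computed, and merging two profiles whose null_count settings differ is rejected. The following is a minimal, self-contained sketch of that behavior under stated assumptions; RowStats, update, merge, and the error message are hypothetical stand-ins, not DataProfiler's actual _update_row_statistics or profiler merge code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class RowStats:
    """Hypothetical stand-in for a profiler's row-level null statistics."""

    null_count_enabled: bool = True
    total_rows: int = 0
    rows_with_null: int = 0  # rows containing at least one null
    rows_all_null: int = 0   # rows where every value is null

    def update(self, rows: Sequence[Sequence[Optional[Any]]]) -> None:
        """Count rows; scan for nulls only when the toggle is enabled."""
        self.total_rows += len(rows)
        if not self.null_count_enabled:
            return  # toggle off: skip the per-row null scan entirely
        for row in rows:
            nulls = sum(value is None for value in row)
            if nulls:
                self.rows_with_null += 1
            if row and nulls == len(row):
                self.rows_all_null += 1

    def _ratio(self, count: int) -> Optional[float]:
        # No ratio is reported when the toggle is off or no rows were seen.
        if not self.null_count_enabled or self.total_rows == 0:
            return None
        return count / self.total_rows

    @property
    def row_has_null_ratio(self) -> Optional[float]:
        return self._ratio(self.rows_with_null)

    @property
    def row_is_null_ratio(self) -> Optional[float]:
        return self._ratio(self.rows_all_null)

    def merge(self, other: RowStats) -> RowStats:
        """Combine two profiles, refusing if their null_count settings differ."""
        if self.null_count_enabled != other.null_count_enabled:
            raise ValueError("Cannot merge profiles with conflicting null_count options.")
        return RowStats(
            null_count_enabled=self.null_count_enabled,
            total_rows=self.total_rows + other.total_rows,
            rows_with_null=self.rows_with_null + other.rows_with_null,
            rows_all_null=self.rows_all_null + other.rows_all_null,
        )
```

In this sketch total_rows is counted even when the toggle is off, so the sample size stays available while the more expensive null scan is skipped; a test in the style the commits mention would use assertRaisesRegex around merge with mismatched settings.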
@CLAassistant

CLAassistant commented Jun 22, 2023

CLA assistant check
All committers have signed the CLA.

taylorfturner added the New Feature label Jun 22, 2023
Contributor

taylorfturner left a comment

  • Needs tests for these edits.
  • Also, one small change: don't return num_quantiles in __calculations.

taylorfturner added the Work In Progress label Jun 22, 2023
@clee1152
Contributor Author

clee1152 commented Jun 23, 2023

Changes:

  • Added tests for num_quantiles in test_numerical_options.py
  • Removed the num_quantiles return from __calculations

@clee1152
Contributor Author

  • Fixed num_quantiles so that it passes the IntColumns, FloatColumns, TextColumns, and NumericStatsMixin tests.
  • Still need to implement test_num_quantiles_option.

@taylorfturner
Contributor

@clee1152 you also need to add scipy>=1.4.1,<1.11.0 on line 13 of your requirements.txt file; that will resolve the graph_profile test errors.

@clee1152
Contributor Author

Finished writing tests for the num_quantiles feature; they are in test_num_quantiles_option.py.

taylorfturner removed the Work In Progress label Jun 28, 2023
taylorfturner enabled auto-merge (squash) June 28, 2023 15:41
tyfarnan previously approved these changes Jun 28, 2023
auto-merge was automatically disabled June 28, 2023 17:25

Head branch was pushed to by a user without write access

taylorfturner enabled auto-merge (squash) June 28, 2023 18:20
@@ -269,6 +269,40 @@ def _validate_helper(self, variable_path: str = "ModeOption") -> list[str]:
return errors


class NumQuantilesOption(BooleanOption):
Contributor

Should we call this QuantilesOption instead? That way it can be extended to more than just the number of quantiles.

Contributor

Yeah, I agree; it's more extensible for future development.
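For reference, the option under discussion pairs an on/off toggle with a count. The sketch below assumes the shape suggested by the diff context: a BooleanOption subclass whose _validate_helper returns a list of error strings. The BooleanOption stand-in, the default of 1000 quantiles, and the error messages are assumptions for illustration, not the PR's actual code; the proposed QuantilesOption name would leave room for further quantile settings next to num_quantiles.

```python
from __future__ import annotations


class BooleanOption:
    """Minimal stand-in for the library's BooleanOption (assumed shape)."""

    def __init__(self, is_enabled: bool = True) -> None:
        self.is_enabled = is_enabled

    def _validate_helper(self, variable_path: str = "BooleanOption") -> list[str]:
        errors = []
        if not isinstance(self.is_enabled, bool):
            errors.append(f"{variable_path}.is_enabled must be a Boolean.")
        return errors


class NumQuantilesOption(BooleanOption):
    """Hypothetical sketch: a toggle plus how many quantiles to compute."""

    def __init__(self, is_enabled: bool = True, num_quantiles: int = 1000) -> None:
        super().__init__(is_enabled=is_enabled)
        self.num_quantiles = num_quantiles

    def _validate_helper(self, variable_path: str = "NumQuantilesOption") -> list[str]:
        # Reuse the parent's is_enabled check, then validate the count.
        errors = super()._validate_helper(variable_path=variable_path)
        value = self.num_quantiles
        # bool is a subclass of int, so rule it out explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            errors.append(f"{variable_path}.num_quantiles must be a positive integer.")
        return errors
```

For example, NumQuantilesOption(num_quantiles=0)._validate_helper() returns ["NumQuantilesOption.num_quantiles must be a positive integer."], while the defaults validate cleanly.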

@@ -367,6 +404,7 @@ def __init__(self) -> None:
self.median_abs_deviation = BooleanOption(is_enabled=True)
self.num_zeros = BooleanOption(is_enabled=True)
self.num_negatives = BooleanOption(is_enabled=True)
self.num_quantiles = NumQuantilesOption(is_enabled=True)
Contributor

I'm wondering whether this should go in NumericalOptions. We already have
self.histogram_and_quantiles = HistogramOption().

Should we rename HistogramOption and move num_quantiles into that class?

Contributor Author

I think this makes sense. Does this mean we will not go with your previous suggestion of renaming NumQuantilesOption to QuantilesOption, and will instead just move it into HistogramOption?

Contributor

Fair question. @taylorfturner may have more thoughts, but if we are keeping histogram_and_quantiles, then we should at least rename HistogramOption -> HistogramAndQuantilesOption. We could then just move the num_quantiles setting over.

Contributor Author

That sounds good to me! I'll see what @taylorfturner thinks, then implement it.

Contributor

Yes, @clee1152 and I just talked about this. I'm game for self.histogram_and_quantiles = HistogramAndQuantilesOption(), with self.num_quantiles set in class HistogramAndQuantilesOption. @JGSweets
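The layout agreed on here nests the count inside the histogram option, so NumericalOptions exposes a single histogram_and_quantiles attribute. A minimal sketch follows. The dataclass form, the bin_count_or_method field, the 1000 default, and the quantile_cut_points helper are illustrative assumptions rather than the merged implementation, and the helper uses the standard library's statistics.quantiles in place of however the profiler actually derives its quantiles.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass, field
from typing import Optional, Sequence, Union


@dataclass
class HistogramAndQuantilesOption:
    """Hypothetical sketch: one option object owning histogram and quantile settings."""

    is_enabled: bool = True
    bin_count_or_method: Union[str, int] = "auto"
    num_quantiles: int = 1000


@dataclass
class NumericalOptions:
    """Only the field relevant here; the real options class has many more toggles."""

    histogram_and_quantiles: HistogramAndQuantilesOption = field(
        default_factory=HistogramAndQuantilesOption
    )


def quantile_cut_points(
    data: Sequence[float], options: NumericalOptions
) -> Optional[list[float]]:
    """Return num_quantiles - 1 cut points, or None when the option is disabled."""
    setting = options.histogram_and_quantiles
    if not setting.is_enabled:
        return None
    # Splitting data into n equal-probability groups needs n - 1 boundaries.
    return statistics.quantiles(data, n=setting.num_quantiles)
```

With num_quantiles=4 and the data 1 through 8, quantile_cut_points returns the quartile cut points [2.25, 4.5, 6.75] under statistics.quantiles' default "exclusive" method. Keeping the count on the histogram option means callers toggle and size quantiles from one place.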

taylorfturner changed the base branch from feature/options to feature/quantile_options July 6, 2023 12:55
taylorfturner added the Work In Progress label Jul 6, 2023
@clee1152
Contributor Author

Migrated to new PR here.

clee1152 closed this Jul 10, 2023
auto-merge was automatically disabled July 10, 2023 16:52

Pull request was closed

Labels
New Feature (A feature addition not currently in the library)
Work In Progress (Solution is being developed)

7 participants