-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented options for num_quantiles
feature in NumericStatsMixin
#896
Implemented options for num_quantiles
feature in NumericStatsMixin
#896
Conversation
* memory optimization preset ttrying again ttrying again 3 ttrying again 4 accidentally pushed my updated makefile * Wrote catch for invalid presets, wrote test for catch for invalid presets, debugged new optimization preset * Forgot to run pre-commit, fixed those issues * black doing weird things * made preset validation more maintainable by moving it to the constructor and getting rid of preset list
* RowStatisticsOptions: Add null row count Added null_row_count as an option in RowStatisticsOptions. It toggles the functionality for row_has_null_ratio and row_is_null_ratio in _update_row_statistics. * Unit test for RowStatisticOptions: * Black formatting * RowStatisticsOptions: Add null row count Added null_row_count as an option in RowStatisticsOptions. It toggles the functionality for row_has_null_ratio and row_is_null_ratio in _update_row_statistics. * Unit test for RowStatisticOptions: * Black formatting * added a unit test for RowStatisticsOptions * Deleted test cases that were written in the wrong file * updated testing for null_count toggle in _update_row_statistics * removed the RowStatisticsOptions from test_profiler_options imports * add line * Created toggle option for null_count * RowStatisticsOptions: Add implementation * Revert "RowStatisticsOptions: Add implementation" This reverts commit 2da6a93. * RowStatsticsOptions: Create option * fixed pre-commit error * Update dataprofiler/profilers/profiler_options.py Co-authored-by: Taylor Turner <[email protected]> * Update dataprofiler/profilers/profiler_options.py Co-authored-by: Taylor Turner <[email protected]> * fixed documentation --------- Co-authored-by: Taylor Turner <[email protected]>
* memory optimization preset ttrying again ttrying again 3 ttrying again 4 accidentally pushed my updated makefile * trying * trying * black doing weird things * trying * made preset validation more maintainable by moving it to the constructor and getting rid of preset list * Update to open-source in prep for wrapper changes for mem op preset * updated preset toggles and preset name (mem op -> large data) * updated tests to match * continued name and test and toggle updates * fix comments
* Implementing option * Implementing option * took out redundant if statement. added test case for when null_count is disabled. * attempt to check for conflicts between profile merges * added test to check if two profilers have null_count enabled before merging them together * fixed typo and added a trycatch to prevent failing test * No mocks needed. Fixed assertRaisesRegex error * Changed variables names and added a new test to check for check the null_count when null_count is disabled. * Changed name of test, moved tests to TestStructuredProfilerRowStatistics. Fixed position of if statement to prevent unnecessary code from running. * added null_count test cases * fixed indentation mistake * fixed typo * removed a useless commented a line * Updated test name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- needs tests for these edits
- also one slight change not to return in
__calculations
Changes:
|
|
@clee1152 you also need to add |
Finished writing tests for |
dataprofiler/tests/profilers/profiler_options/test_num_quantiles_option.py
Outdated
Show resolved
Hide resolved
dataprofiler/tests/profilers/profiler_options/test_num_quantiles_option.py
Outdated
Show resolved
Hide resolved
dataprofiler/tests/profilers/profiler_options/test_num_quantiles_option.py
Outdated
Show resolved
Hide resolved
dataprofiler/tests/profilers/profiler_options/test_num_quantiles_option.py
Outdated
Show resolved
Hide resolved
dataprofiler/tests/profilers/profiler_options/test_num_quantiles_option.py
Outdated
Show resolved
Hide resolved
dataprofiler/tests/profilers/profiler_options/test_num_quantiles_option.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Taylor Turner <[email protected]>
Head branch was pushed to by a user without write access
…es_option.py Co-authored-by: Taylor Turner <[email protected]>
@@ -269,6 +269,40 @@ def _validate_helper(self, variable_path: str = "ModeOption") -> list[str]: | |||
return errors | |||
|
|||
|
|||
class NumQuantilesOption(BooleanOption): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we call this QuantilesOption
instead? That way it is extensible to more than just the number of quantiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree -- more extensible for future development
@@ -367,6 +404,7 @@ def __init__(self) -> None: | |||
self.median_abs_deviation = BooleanOption(is_enabled=True) | |||
self.num_zeros = BooleanOption(is_enabled=True) | |||
self.num_negatives = BooleanOption(is_enabled=True) | |||
self.num_quantiles = NumQuantilesOption(is_enabled=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this should be in NumericalOptions
? We already have
self.histogram_and_quantiles = HistogramOption()
.
Should we rename HistogramOption
and insert it in that class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clee1152 @taylorfturner , thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense. Does this mean we will not go with your previous suggestion of renaming NumQuantilesOption
to QuantileOptions
and instead just insert it into HistogramOption
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair Q. @taylorfturner may have more thoughts, but if we are keeping
histogram_and_quantiles
Then we should at least rename HistogramOption
-> HistogramAndQuantilesOption
We could just move over the num_quantiles
setting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me! Will see what @taylorfturner thinks, then implement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrated to new PR here. |
Pull request was closed
Original issue here.