Scarliles/defuse partitioner #70

SamuelCarliles3 · 2024-07-06T02:42:30Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Defuses `Partitioner` to prevent the viral spread of concrete implementations, one per `Partitioner` subtype, into classes that hold a concrete instance.
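The Python sketch below illustrates the idea, assuming (as the word "defuse" suggests) that `Partitioner` was previously a Cython fused type. A fused type makes every holder compile one copy of its code per partitioner subtype. With a common base interface, a holder keeps a single reference typed as the base class and dispatches through it, so it needs only one implementation. `Splitter`, `partition_samples`, and the dense/sparse data layouts here are hypothetical illustrations, not the actual scikit-learn signatures. Python has no fused types, so only the interface side is shown.

```python
from abc import ABC, abstractmethod


class Partitioner(ABC):
    """Hypothetical common interface shared by all partitioner subtypes."""

    @abstractmethod
    def partition_samples(self, samples, feature, threshold):
        """Reorder `samples` in place; return the index where the right child starts."""


class DensePartitioner(Partitioner):
    def __init__(self, X):
        self.X = X  # dense data: a list of rows

    def partition_samples(self, samples, feature, threshold):
        left = [i for i in samples if self.X[i][feature] <= threshold]
        right = [i for i in samples if self.X[i][feature] > threshold]
        samples[:] = left + right
        return len(left)


class SparsePartitioner(Partitioner):
    def __init__(self, nonzeros):
        # Sparse data: nonzeros[i] maps feature index -> value; missing entries are 0.0
        self.nonzeros = nonzeros

    def _value(self, i, feature):
        return self.nonzeros[i].get(feature, 0.0)

    def partition_samples(self, samples, feature, threshold):
        left = [i for i in samples if self._value(i, feature) <= threshold]
        right = [i for i in samples if self._value(i, feature) > threshold]
        samples[:] = left + right
        return len(left)


class Splitter:
    """Hypothetical holder: one implementation, whatever the concrete partitioner."""

    def __init__(self, partitioner: Partitioner):
        self.partitioner = partitioner  # held through the base interface only

    def split(self, samples, feature, threshold):
        pos = self.partitioner.partition_samples(samples, feature, threshold)
        return samples[:pos], samples[pos:]


dense = Splitter(DensePartitioner([[0.5], [2.0], [1.0]]))
sparse = Splitter(SparsePartitioner([{}, {0: 2.0}, {0: 1.0}]))
print(dense.split([0, 1, 2], 0, 1.0))   # ([0, 2], [1])
print(sparse.split([0, 1, 2], 0, 1.0))  # ([0, 2], [1])
```

The trade-off is a virtual call per partition operation in exchange for not duplicating the holder's code for each subtype.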

Any other comments?

asv benchmarks run fine in my Linux dev VM but fail on `setup_cache` on my M2 MacBook.

Added a regression forest benchmark.


github-actions · 2024-07-06T02:43:42Z

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here

`ruff`

ruff detected issues. Please run `ruff check --fix --output-format=full .` locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is `ruff=0.5.1`.


examples/linear_model/plot_tweedie_regression_insurance_claims.py:82:35: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
   |
81 |     # unquote string fields
82 |     for column_name in df.columns[df.dtypes.values == object]:
   |                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
83 |         df[column_name] = df[column_name].str.strip("'")
84 |     return df.iloc[:n_samples]
   |

sklearn/cluster/_optics.py:327:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
325 |         """
326 |         dtype = bool if self.metric in PAIRWISE_BOOLEAN_FUNCTIONS else float
327 |         if dtype == bool and X.dtype != bool:
    |            ^^^^^^^^^^^^^ E721
328 |             msg = (
329 |                 "Data will be converted to boolean for"
    |

sklearn/cluster/tests/test_dbscan.py:294:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
292 |     obj = DBSCAN()
293 |     s = pickle.dumps(obj)
294 |     assert type(pickle.loads(s)) == obj.__class__
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
    |

sklearn/linear_model/tests/test_ridge.py:1023:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1022 |     assert len(ridge_cv.coef_.shape) == 1
1023 |     assert type(ridge_cv.intercept_) == np.float64
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
1024 | 
1025 |     cv = KFold(5)
     |

sklearn/linear_model/tests/test_ridge.py:1031:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1030 |     assert len(ridge_cv.coef_.shape) == 1
1031 |     assert type(ridge_cv.intercept_) == np.float64
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
     |

sklearn/metrics/pairwise.py:2364:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
2362 |         dtype = bool if metric in PAIRWISE_BOOLEAN_FUNCTIONS else "infer_float"
2363 | 
2364 |         if dtype == bool and (X.dtype != bool or (Y is not None and Y.dtype != bool)):
     |            ^^^^^^^^^^^^^ E721
2365 |             msg = "Data was converted to boolean for metric %s" % metric
2366 |             warnings.warn(msg, DataConversionWarning)
     |

sklearn/model_selection/_search.py:1100:24: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1098 |                 arr_dtype = np.dtype(object)
1099 |             else:
1100 |                 if any(np.min_scalar_type(x) == object for x in param_list):
     |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
1101 |                     # `np.result_type` might get thrown off by `.dtype` properties
1102 |                     # (which some estimators have).
     |

sklearn/model_selection/_search.py:1107:52: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1105 |                     # https://github.com/scikit-learn/scikit-learn/issues/29157
1106 |                     arr_dtype = np.dtype(object)
1107 |             if len(param_list) == n_candidates and arr_dtype != object:
     |                                                    ^^^^^^^^^^^^^^^^^^^ E721
1108 |                 # Exclude `object` else the numpy constructor might infer a list of
1109 |                 # tuples to be a 2d array.
     |

sklearn/model_selection/_split.py:2899:27: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
2897 |                 if value is None and hasattr(self, "cvargs"):
2898 |                     value = self.cvargs.get(key, None)
2899 |             if len(w) and w[0].category == FutureWarning:
     |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
2900 |                 # if the parameter is deprecated, don't show it
2901 |                 continue
     |

sklearn/model_selection/tests/test_validation.py:589:20: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
588 |             # Make sure all the arrays are of np.ndarray type
589 |             assert type(cv_results["test_r2"]) == np.ndarray
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
590 |             assert type(cv_results["test_neg_mean_squared_error"]) == np.ndarray
591 |             assert type(cv_results["fit_time"]) == np.ndarray
    |

sklearn/model_selection/tests/test_validation.py:590:20: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
588 |             # Make sure all the arrays are of np.ndarray type
589 |             assert type(cv_results["test_r2"]) == np.ndarray
590 |             assert type(cv_results["test_neg_mean_squared_error"]) == np.ndarray
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
591 |             assert type(cv_results["fit_time"]) == np.ndarray
592 |             assert type(cv_results["score_time"]) == np.ndarray
    |

sklearn/model_selection/tests/test_validation.py:591:20: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
589 |             assert type(cv_results["test_r2"]) == np.ndarray
590 |             assert type(cv_results["test_neg_mean_squared_error"]) == np.ndarray
591 |             assert type(cv_results["fit_time"]) == np.ndarray
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
592 |             assert type(cv_results["score_time"]) == np.ndarray
    |

sklearn/model_selection/tests/test_validation.py:592:20: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
590 |             assert type(cv_results["test_neg_mean_squared_error"]) == np.ndarray
591 |             assert type(cv_results["fit_time"]) == np.ndarray
592 |             assert type(cv_results["score_time"]) == np.ndarray
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
593 | 
594 |             # Ensure all the times are within sane limits
    |

sklearn/utils/estimator_checks.py:1509:8: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1508 |     # func can output tuple (e.g. score_samples)
1509 |     if type(result_full) == tuple:
     |        ^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
1510 |         result_full = result_full[0]
1511 |         result_by_batch = list(map(lambda x: x[0], result_by_batch))
     |

sklearn/utils/tests/test_validation.py:1343:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
     |
1341 |         )
1342 |     assert str(raised_error.value) == str(err_msg)
1343 |     assert type(raised_error.value) == type(err_msg)
     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E721
     |

sklearn/utils/validation.py:874:49: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
    |
872 |         if all(isinstance(dtype_iter, np.dtype) for dtype_iter in dtypes_orig):
873 |             dtype_orig = np.result_type(*dtypes_orig)
874 |         elif pandas_requires_conversion and any(d == object for d in dtypes_orig):
    |                                                 ^^^^^^^^^^^ E721
875 |             # Force object if any of the dtypes is an object
876 |             dtype_orig = object
    |

Found 16 errors.
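The E721 findings above fall into a few patterns. The sketch below, assuming NumPy is installed, shows E721-friendly alternatives for exact-type checks and warning-category checks, and flags one pitfall. For `np.dtype` instances (for example `arr_dtype` and the result of `np.min_scalar_type(x)` in `_search.py`), `== object` works today because NumPy coerces `object` to a dtype before comparing, but a mechanical rewrite to `is object` would always be `False`. Comparing the dtype's `.kind` avoids the type-object comparison altogether. The helper names are hypothetical, not scikit-learn functions.

```python
import warnings

import numpy as np


def is_float64_scalar(x):
    # Exact-type check as E721 asks: identity on the type object.
    return type(x) is np.float64


def has_object_dtype(dtype):
    # For np.dtype instances, `dtype is object` is always False,
    # so compare the dtype kind instead ("O" means object).
    return np.dtype(dtype).kind == "O"


def caught_future_warning():
    # Warning categories are classes; issubclass() also accepts subclasses.
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        warnings.warn("deprecated", FutureWarning)
    return len(w) > 0 and issubclass(w[0].category, FutureWarning)


dt = np.dtype(object)
print(dt == object)  # True: np.dtype.__eq__ coerces `object` to a dtype
print(dt is object)  # False: a dtype instance is not the `object` type itself
print(has_object_dtype(dt))  # True
```

Where the left-hand side really is a Python type, such as `dtype` in `_optics.py`, which is assigned either `bool` or `float`, `dtype is bool` is a safe replacement; the dtype cases need the more careful treatment above.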

`cython-lint`

cython-lint detected issues. Please fix them locally and push the changes. Here you can see the detected issues. Note that the installed cython-lint version is cython-lint=0.16.2.


/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pxd:13:1: E265 block comment should start with '# '
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pxd:71:90: W291 trailing whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:9:40: 'swap' imported but unused
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:16:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:36:1: W293 blank line contains whitespace
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:539:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:540:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:541:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:542:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:543:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:544:37: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:550:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:551:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:552:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:553:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_partitioner.pyx:554:41: E127 continuation line over-indented for visual indent
/home/runner/work/scikit-learn/scikit-learn/sklearn/tree/_sort.pxd:13:5: E128 continuation line under-indented for visual indent

_Generated for commit: 09a8ec5. Link to the linter CI: here_

SamuelCarliles3 and others added 9 commits April 22, 2024 18:54

added regression forest benchmark

7a70a0b

Merge pull request #2 from ssec-jhu/scarliles/regression-benchmark

d9ad68a

added regression forest benchmark

ran black for linting check

893d588

Merge branch 'submodulev3' of github.com:ssec-jhu/scikit-learn into s…

548493c

…ubmodulev3

Merge branch 'neurodata:submodulev3' into submodulev3

089d901

Merge branch 'submodulev3' of github.com:ssec-jhu/scikit-learn into s…

3ba5f74

…ubmodulev3

Merge remote-tracking branch 'neurodata/submodulev3' into submodulev3

29a52be

broke sort functions, partitioners out of _splitter.pyx

cf52ff5

refactored partitioner

8e433a6

fixed some unintended commented out lines in SparsePartitioner

09a8ec5
