adding OrdinalRidge and LAD regressors #687

Status: Open — wants to merge 1 commit into base: main
2 changes: 2 additions & 0 deletions conda-recipe/skll/meta.yaml
Original file line number Diff line number Diff line change
@@ -37,6 +37,7 @@ requirements:
- setuptools
- beautifulsoup4
- joblib
- mord
- numpy {{ numpy }}
- pandas
- ruamel.yaml
@@ -49,6 +50,7 @@ requirements:
- python
- beautifulsoup4
- joblib
- mord
- numpy
- pandas
- ruamel.yaml
1 change: 1 addition & 0 deletions conda_requirements.txt
@@ -1,5 +1,6 @@
beautifulsoup4
joblib
mord
numpy
nose-cov
pandas
22 changes: 15 additions & 7 deletions doc/run_experiment.rst
@@ -334,11 +334,13 @@ Regressors:
* **GradientBoostingRegressor**: `Gradient Boosting Regressor <https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html#sklearn.ensemble.GradientBoostingRegressor>`__
* **HuberRegressor**: `Huber Regression <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html#sklearn.linear_model.HuberRegressor>`__
* **KNeighborsRegressor**: `K-Nearest Neighbors Regression <https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html#sklearn.neighbors.KNeighborsRegressor>`__
* **LAD**: `Least Absolute Deviation <https://pythonhosted.org/mord/reference.html#mord.LAD>`__
* **Lars**: `Least Angle Regression <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lars.html#sklearn.linear_model.Lars>`__
* **Lasso**: `Lasso Regression <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso>`__
* **LinearRegression**: `Linear Regression <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression>`__
* **LinearSVR**: `Support Vector Regression using LibLinear <https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVR.html#sklearn.svm.LinearSVR>`__
* **MLPRegressor**: `Multi-layer Perceptron Regression <https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor>`__
* **OrdinalRidge**: `Ridge Regression using negative absolute error as the scoring function <https://pythonhosted.org/mord/reference.html#mord.OrdinalRidge>`__
* **RandomForestRegressor**: `Random Forest Regression <https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor>`__
* **RANSACRegressor**: `RANdom SAmple Consensus Regression <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RANSACRegressor.html#sklearn.linear_model.RANSACRegressor>`__. Note that the default base estimator is a ``LinearRegression``. A different base regressor can be used by specifying a ``base_estimator`` fixed parameter in the :ref:`fixed_parameters <fixed_parameters>` list.
* **Ridge**: `Ridge Regression <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge>`__
@@ -354,11 +356,17 @@ Regressors:

Refer to this `example voting configuration file <https://github.com/EducationalTestingService/skll/blob/main/examples/boston/voting.cfg>`__ to see how these parameters are used.

For all regressors *except* ``VotingRegressor``, you can also prepend
``Rescaled`` to the beginning of the full name (e.g., ``RescaledSVR``)
to get a version of the regressor where predictions are rescaled and
constrained to better match the training set. Rescaled regressors
can, however, be used as underlying estimators for ``VotingRegressor``
For all regressors *except* ``LAD``, ``OrdinalRidge``, and ``VotingRegressor``,
you can also prepend ``Rescaled`` to the beginning of the full name
(e.g., ``RescaledSVR``) to get a version of the regressor where predictions
are rescaled and constrained to better match the training set.
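The rescaling described above can be sketched as follows. This is a simplified illustration of the idea, not SKLL's exact implementation: raw predictions are shifted and scaled so their mean and standard deviation match the training labels, then constrained to the observed label range.

```python
import numpy as np

def rescale_predictions(preds, y_train):
    # standardize the raw predictions
    z = (preds - preds.mean()) / preds.std()
    # map them onto the training-label distribution
    rescaled = z * y_train.std() + y_train.mean()
    # constrain them to the observed label range
    return np.clip(rescaled, y_train.min(), y_train.max())

y_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
preds = np.array([0.2, 0.4, 0.6, 0.8, 1.0])  # systematically compressed outputs
print(rescale_predictions(preds, y_train))   # predictions on the label scale
```

The relative ordering of the predictions is preserved; only their location and spread change.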

``Rescaled`` versions of the ``LAD`` and ``OrdinalRidge`` regressors are not
available because these models already clip their predictions to the range
from zero to the maximum label value; rescaling such clipped predictions
would no longer correlate with the original model outputs.
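A small numeric sketch of why this is the case (the values below are hypothetical): once predictions are clipped into ``[0, max(label)]``, they no longer track the raw model outputs, so a mean/std-based rescaling cannot recover the original relationship.

```python
import numpy as np

raw = np.array([-2.0, -1.0, 1.0, 3.0, 6.0])  # hypothetical raw, unclipped outputs
clipped = np.clip(raw, 0.0, 4.0)             # clipped to [0, max(label)], as mord does
r = np.corrcoef(raw, clipped)[0, 1]          # correlation degrades below 1.0
print(round(r, 3))
```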

Rescaled regressors can, however, be used as underlying estimators for ``VotingRegressor``
learners.

.. _featuresets:
@@ -611,7 +619,7 @@ Lasso:

{'random_state': 123456789}

LinearSVC and LinearSVR
LAD, LinearSVC and LinearSVR
.. code-block:: python

{'random_state': 123456789}
@@ -638,7 +646,7 @@ RANSACRegressor

{'loss': 'squared_loss', 'random_state': 123456789}

Ridge and RidgeClassifier
OrdinalRidge, Ridge and RidgeClassifier
.. code-block:: python

{'random_state': 123456789}
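For reference, a minimal configuration sketch for trying the two new learners. The experiment name, paths, and featureset name here are hypothetical placeholders; the section and field names follow SKLL's standard configuration format:

```ini
[General]
experiment_name = mord_example
task = cross_validate

[Input]
train_directory = train
featuresets = [["example_features"]]
learners = ["OrdinalRidge", "LAD"]

[Tuning]
grid_search = true
objectives = ["pearson"]

[Output]
results = output
```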
1 change: 1 addition & 0 deletions requirements.txt
@@ -1,5 +1,6 @@
beautifulsoup4
joblib
mord
numpy
pandas
ruamel.yaml
35 changes: 25 additions & 10 deletions skll/learner/__init__.py
@@ -18,6 +18,7 @@
import joblib
import numpy as np
import scipy.sparse as sp
from mord import LAD, OrdinalRidge
from sklearn.dummy import DummyClassifier, DummyRegressor # noqa: F401
from sklearn.ensemble import (
AdaBoostClassifier,
@@ -274,16 +275,31 @@ def __init__(self,  # noqa: C901
self._model_kwargs['multi_class'] = 'auto'

if issubclass(self._model_type,
(AdaBoostClassifier, AdaBoostRegressor,
DecisionTreeClassifier, DecisionTreeRegressor,
DummyClassifier, ElasticNet,
(AdaBoostClassifier,
AdaBoostRegressor,
DecisionTreeClassifier,
DecisionTreeRegressor,
DummyClassifier,
ElasticNet,
GradientBoostingClassifier,
GradientBoostingRegressor, Lasso, LinearSVC,
LinearSVR, LogisticRegression, MLPClassifier,
MLPRegressor, RandomForestClassifier,
RandomForestRegressor, RANSACRegressor, Ridge,
RidgeClassifier, SGDClassifier, SGDRegressor,
SVC, TheilSenRegressor)):
GradientBoostingRegressor,
LAD,
Lasso,
LinearSVC,
LinearSVR,
LogisticRegression,
MLPClassifier,
MLPRegressor,
OrdinalRidge,
RandomForestClassifier,
RandomForestRegressor,
RANSACRegressor,
Ridge,
RidgeClassifier,
SGDClassifier,
SGDRegressor,
SVC,
TheilSenRegressor)):
self._model_kwargs['random_state'] = 123456789

if sampler_kwargs:
@@ -612,7 +628,6 @@ def _create_estimator(self):
if default_param_grid is None:
raise ValueError(f"{self._model_type.__name__} is not a valid "
"learner type.")

estimator = self._model_type(**self._model_kwargs)

return estimator, default_param_grid
33 changes: 25 additions & 8 deletions tests/test_regression.py
@@ -131,6 +131,10 @@ def check_rescaling(name, grid_search=False):


def test_rescaling():
"""Test that a rescaled model gives the same performance as the original."""
# LAD and OrdinalRidge are excluded here because they clip their
# predictions, so the rescaled predictions would differ from the
# originals.
for regressor_name in ['BayesianRidge',
'ElasticNet',
'HuberRegressor',
@@ -236,7 +240,8 @@ def test_linear_models():
# the utility function to run the non-linear tests
def check_non_linear_models(name,
use_feature_hashing=False,
use_rescaling=False):
use_rescaling=False,
expected_corr=0.95):

# create a FeatureSet object with the data we want to use
if use_feature_hashing:
@@ -269,7 +274,7 @@ def check_non_linear_models(name,
# using make_regression_data. To do this, we just
# make sure that they are correlated with pearson > 0.95
cor, _ = pearsonr(predictions, test_fs.labels)
assert_greater(cor, 0.95)
assert_greater(cor, expected_corr)


# the runner function for non-linear regression models
@@ -284,11 +289,23 @@ def test_non_linear_models():
yield (check_non_linear_models,
regressor_name,
use_feature_hashing,
use_rescaling)
use_rescaling,
0.95)


# the runner function for MORD regression models
def test_mord_models():
for (regressor_name,
use_feature_hashing) in product(['OrdinalRidge', 'LAD'],
[False, True]):
yield (check_non_linear_models,
regressor_name,
use_feature_hashing,
False,
0.86)


# the utility function to run the tree-based regression tests
def check_tree_models(name,
use_feature_hashing=False,
use_rescaling=False):
@@ -696,8 +713,8 @@ def test_invalid_regression_grid_objective():
for learner in ['AdaBoostRegressor', 'BayesianRidge',
'DecisionTreeRegressor', 'ElasticNet',
'GradientBoostingRegressor', 'HuberRegressor',
'KNeighborsRegressor', 'Lars', 'Lasso',
'LinearRegression', 'MLPRegressor',
'KNeighborsRegressor', 'LAD', 'Lars', 'Lasso',
'LinearRegression', 'MLPRegressor', 'OrdinalRidge',
'RandomForestRegressor', 'RANSACRegressor',
'Ridge', 'LinearSVR', 'SVR', 'SGDRegressor',
'TheilSenRegressor']:
@@ -721,8 +738,8 @@ def test_invalid_regression_metric():
for learner in ['AdaBoostRegressor', 'BayesianRidge',
'DecisionTreeRegressor', 'ElasticNet',
'GradientBoostingRegressor', 'HuberRegressor',
'KNeighborsRegressor', 'Lars', 'Lasso',
'LinearRegression', 'MLPRegressor',
'KNeighborsRegressor', 'LAD', 'Lars', 'Lasso',
'LinearRegression', 'MLPRegressor', 'OrdinalRidge',
'RandomForestRegressor', 'RANSACRegressor',
'Ridge', 'LinearSVR', 'SVR', 'SGDRegressor',
'TheilSenRegressor']: