Adds sample_weight option to estimators fit method #441

Merged: 10 commits merged into rasbt:master from kota7:fit-with-sample_weight, Sep 24, 2018

Conversation

kota7 (Contributor) commented Sep 23, 2018

Description

Adds sample_weight option to the fit method of estimators.
Aims to cover the following estimators (see the usage sketch after the list):

  • StackingClassifier
  • StackingCVClassifier
  • EnsembleVoteClassifier
  • StackingRegressor
  • StackingCVRegressor
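
A hedged usage sketch (the individual estimators are from scikit-learn; the sample_weight pass-through in fit is what this PR adds):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.svm import SVR
from mlxtend.regressor import StackingRegressor

X = np.random.rand(40, 1)
y = np.random.rand(40)
w = np.random.random(40)  # one weight per sample

stregr = StackingRegressor(regressors=[LinearRegression(), Ridge()],
                           meta_regressor=SVR(kernel='rbf'))
stregr.fit(X, y, sample_weight=w)  # forwarded to each regressor and the meta_regressor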

Related issues or pull requests

Fixes #438

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
  • Modified documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
  • Ran nosetests ./mlxtend -sv and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., nosetests ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./mlxtend

pep8speaks commented Sep 23, 2018

Hello @kota7! Thanks for updating the PR.

Line 176:54: W291 trailing whitespace

Line 69:80: E501 line too long (80 > 79 characters)

Comment last updated on September 24, 2018 at 03:55 UTC

@@ -111,6 +111,8 @@ def fit(self, X, y):
             n_features is the number of features.
         y : array-like, shape = [n_samples] or [n_samples, n_targets]
             Target values.
+        sample_weight : array-like, shape = [n_samples], optional
+            Sample weights.

rasbt (Owner):

Could you additionally specify that these are used by both the level-1 and meta-regressors?

kota7 (Contributor, Author):

Sure. And the meta regressor too.

rasbt (Owner):

Yeah, maybe as a docstring, something like:

Sample weights passed as sample_weights to each regressor in the regressors list as well as the meta_regressor.

The rest of the PR looks great btw! Thanks!

coveralls commented Sep 23, 2018

Coverage increased (+0.5%) to 91.556% when pulling 4a7229d on kota7:fit-with-sample_weight into 030c1b7 on rasbt:master.

kota7 (Contributor, Author) commented Sep 24, 2018

@rasbt This thread may not be the best place to talk about this, but when I updated the test script test_stacking_cv_regressor.py, I found that generating another random vector makes existing tests fail.

For example, given this existing setup:

# Some test data
np.random.seed(1)
X1 = np.sort(5 * np.random.rand(40, 1), axis=0)
X2 = np.sort(5 * np.random.rand(40, 2), axis=0)
X3 = np.zeros((40, 3))
y = np.sin(X1).ravel()
y[::5] += 3 * (0.5 - np.random.rand(8))
y2 = np.zeros((40,))

if you add another line:

w = np.random.random(40)

then the tests start to fail. This is presumably because the cross-validation generators share the global random seed with numpy.
To see this, the code below produces two different fold assignments, due to the call to np.random.random after setting the seed.

import numpy as np
from sklearn.model_selection import KFold

np.random.seed(1)
cv = KFold(2, shuffle=True)
print(list(cv.split([1,2,3,4,5,6])))

np.random.seed(1)
np.random.random(10)
cv = KFold(2, shuffle=True)
print(list(cv.split([1,2,3,4,5,6])))

As a side effect, when some test raises an exception, the global random state changes, which makes other tests fail. Because of this, the error report becomes confusing, since tests that have no problem fail too.

A possible workaround is to explicitly define a cross-validation object with a specific random state at the top of the script and keep using it in the tests, as in the sketch below. Would you like me to do this, or do you have any thoughts?
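
A minimal sketch of that workaround, reusing X1 and y from above (assuming StackingCVRegressor's cv parameter accepts a scikit-learn splitter object):

from sklearn.linear_model import Ridge, Lasso
from sklearn.svm import SVR
from sklearn.model_selection import KFold
from mlxtend.regressor import StackingCVRegressor

# Defined once at the top of the test script: with a fixed random_state,
# the fold assignment no longer depends on the global numpy RNG state.
cv = KFold(n_splits=2, shuffle=True, random_state=1)

# ... then reused inside each test:
stregr = StackingCVRegressor(regressors=[Ridge(), Lasso()],
                             meta_regressor=SVR(kernel='rbf'), cv=cv)
stregr.fit(X1, y)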

rasbt (Owner) commented Sep 24, 2018

Thanks for bringing this up. This is actually a bit of a messy unit test design. The random seed should either be set for each function individually or it should be using the random_state via scikit-learn, which is probably the better solution.

I think the reason why the StackingCVRegressor currently doesn't have a random_state parameter is because it uses the sklearn.model_selection.check_cv method, which doesn't support that either.

So, one option would be resetting the random seed in each unit test, but that might unfortunately be a lot of work, because the unit tests might then produce different results, like you mentioned.

Your suggestion

A possible workaround is to explicitly define cross validation object with specific random state at the top of script, and keep using it in tests.

is probably better because, in case we change the StackingCVRegressor in the future to have its own random_state, a fixed KFold object wouldn't require changing all the unit tests again.

rasbt (Owner) commented Sep 24, 2018

Looks great so far, thanks! Are you planning to add this for the other Stacking classes as well? You don't have to, but if you do, I would really appreciate it and wait a bit before merging.

kota7 changed the title from "[WIP] Adds sample_weight option to estimators fit method" to "Adds sample_weight option to estimators fit method" on Sep 24, 2018

kota7 (Contributor, Author) commented Sep 24, 2018

@rasbt Yeah, I made very similar edits to all five estimators. I think it is ready for your review. Thanks!

rasbt (Owner) commented Sep 24, 2018

This looks really nice overall. I am wondering, though: don't the sklearn estimators accept sample_weight=None? That could simplify a lot of the if/else statements currently in there:

if sample_weight is None:
    regr.fit(X, y)
else:
    regr.fit(X, y, sample_weight=sample_weight)

kota7 (Contributor, Author) commented Sep 24, 2018

As far as I know, Lasso, MLPClassifier, and KNeighborsClassifier do not support it, and hence raise an exception if you give sample_weight to them. This piece of the unit tests catches that:

def test_weight_unsupported_with_no_weight():
    # pass no weight to regressors with no weight support
    # should not be a problem
    lr = LinearRegression()
    svr_lin = SVR(kernel='linear')
    ridge = Ridge(random_state=1)
    svr_rbf = SVR(kernel='rbf')
    lasso = Lasso(random_state=1)
    stregr = StackingRegressor(regressors=[svr_lin, lr, ridge, lasso],
                               meta_regressor=svr_rbf)
    stregr.fit(X1, y).predict(X1)
    stregr = StackingRegressor(regressors=[svr_lin, lr, ridge],
                               meta_regressor=lasso)
    stregr.fit(X1, y).predict(X1)

That is, if you code just regr.fit(X, y, sample_weight=sample_weight), then sample_weight=None is passed to Lasso, which raises an error.
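
A hypothetical alternative that keeps the branch in one place by building the keyword arguments conditionally (just a sketch under the same constraint, not what the PR does):

# Only include the keyword when a weight is given, so estimators
# without sample_weight support never receive it.
fit_params = {} if sample_weight is None else {'sample_weight': sample_weight}
regr.fit(X, y, **fit_params)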

rasbt (Owner) commented Sep 24, 2018

Good point! Otherwise the PR seems fine and I'd be happy to merge. Or do you have any additions in mind?

kota7 (Contributor, Author) commented Sep 24, 2018

No more additions. Please merge!

rasbt (Owner) commented Sep 24, 2018

Awesome! Thanks for this PR, really appreciate it!

rasbt merged commit c55d849 into rasbt:master on Sep 24, 2018