Adds sample_weight option to estimators fit method #441
Conversation
Hello @kota7! Thanks for updating the PR.
Comment last updated on September 24, 2018 at 03:55 UTC
@@ -111,6 +111,8 @@ def fit(self, X, y):
        n_features is the number of features.
    y : array-like, shape = [n_samples] or [n_samples, n_targets]
        Target values.
    sample_weight : array-like, shape = [n_samples], optional
        Sample weights.
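For context, the parameter follows the usual sklearn `sample_weight` semantics; a minimal sketch with a plain `LinearRegression` (the data and weights here are purely illustrative, not from the PR):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 2.0, 3.0, 10.0])  # the last point is an outlier

# Without weights, the outlier pulls the slope upward.
unweighted = LinearRegression().fit(X, y)

# Down-weighting the outlier moves the fit back toward the first three points.
weights = np.array([1.0, 1.0, 1.0, 0.01])
weighted = LinearRegression().fit(X, y, sample_weight=weights)

print(unweighted.coef_[0], weighted.coef_[0])
```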
Could you additionally specify that these are used by both the level-1 and meta-regressors?
Sure. And the meta regressor too.
Yeah, maybe as a docstring something like: "Sample weights passed as `sample_weights` to each regressor in the `regressors` list as well as the `meta_regressor`."
The rest of the PR looks great btw! Thanks!
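Put together, the docstring entry could look roughly like this (a sketch only; the `X` and `y` descriptions follow the diff above, the other details are elided, and the exact wording is up to the author):

```python
# Docstring sketch: the sample_weight description follows the wording
# suggested above; the method body itself is elided here.
def fit(self, X, y, sample_weight=None):
    """Learn weight coefficients from training data.

    Parameters
    ----------
    X : {array-like, sparse matrix}, shape = [n_samples, n_features]
        Training vectors, where n_samples is the number of samples and
        n_features is the number of features.
    y : array-like, shape = [n_samples] or [n_samples, n_targets]
        Target values.
    sample_weight : array-like, shape = [n_samples], optional
        Sample weights passed as sample_weights to each regressor
        in the regressors list as well as the meta_regressor.
    """
```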
@rasbt This thread may not be the best place to talk about this, but when I update the test script I run into a problem with random seeds. For example, below this part of mlxtend/mlxtend/regressor/tests/test_stacking_cv_regression.py (lines 22 to 30 in 030c1b7), if you add another line, then tests start to fail. This is perhaps because the cross-validation generators share a random seed with numpy.

As a side effect, when some test raises an exception, the random seed changes, which makes other tests fail. Because of this, the error report becomes confusing, since tests that have no problem also fail. A possible workaround is to explicitly define a cross-validation object with a specific random state at the top of the script, and keep using it in the tests. Would you want me to do this, or do you have any thoughts?
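The workaround described above could be sketched like this (the name `CV_FIXED_SEED` and the fold counts are illustrative, not from the PR):

```python
import numpy as np
from sklearn.model_selection import KFold

# Sketch of the workaround: one module-level CV object with a fixed
# random_state, shared by all tests, so fold assignments do not depend
# on numpy's global seed or on which tests ran (or failed) earlier.
CV_FIXED_SEED = KFold(n_splits=3, shuffle=True, random_state=42)

X = np.arange(12).reshape(6, 2)
splits_a = [tuple(test_idx) for _, test_idx in CV_FIXED_SEED.split(X)]

np.random.rand(100)  # perturbing numpy's global randomness...

splits_b = [tuple(test_idx) for _, test_idx in CV_FIXED_SEED.split(X)]
print(splits_a == splits_b)  # ...does not change the folds: prints True
```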
Thanks for bringing this up. This is actually a bit of a messy unit test design: the random seed should either be set in each test function individually, or a cross-validation object with a fixed random state should be used. I think that is also related to why the StackingCVRegressor currently doesn't have a random_state of its own. So, one option would be resetting the random seed in each unit test; that seems to be the best option for now, but it might be a lot of work, unfortunately, because the unit tests might then produce different results, like you mentioned. Your suggestion is probably better, because in case we change the StackingCVRegressor in the future to have its own random_state, a fixed KFold object wouldn't require changing all the unit tests again.
Looks great so far, thanks! Are you planning to add this for the other Stacking classes as well? You don't have to, but if you do, I would really appreciate it and wait a bit before merging.
@rasbt Yeah, I made very similar edits to the five estimators. I think it is ready for your review. Thanks!
This looks really nice overall. I am wondering, though: don't the sklearn estimators accept `sample_weight=None`?
As far as I know, some of the sklearn estimators do not accept a sample_weight argument at all (see mlxtend/mlxtend/regressor/tests/test_stacking_regression.py, lines 132 to 146 in b46079a). That is, if you code just `fit(X, y, sample_weight=None)` for every regressor, the ones without sample_weight support fail.
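One way to check whether a given regressor supports the argument at all is to inspect its `fit` signature (a sketch of the general idea; mlxtend's actual handling may differ, and the helper name is hypothetical):

```python
import inspect
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

def accepts_sample_weight(estimator):
    """True if the estimator's fit() signature lists a sample_weight parameter."""
    return 'sample_weight' in inspect.signature(estimator.fit).parameters

print(accepts_sample_weight(LinearRegression()))     # True
print(accepts_sample_weight(KNeighborsRegressor()))  # False
```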
Good point! Otherwise the PR seems fine and I'd be happy to merge. Or do you have any additions in mind?

No more additions. Please merge!

Awesome! Thanks for this PR, really appreciate it!
Description

Adds a `sample_weight` option to the `fit` method of estimators. Aims to cover the following:
Related issues or pull requests
Fixes #438
Pull Request Checklist

- Added a note about the modification or contribution to the `./docs/sources/CHANGELOG.md` file (if applicable)
- Added appropriate unit test functions in the `./mlxtend/*/tests` directories (if applicable)
- Modified documentation in the appropriate location under `mlxtend/docs/sources/` (if applicable)
- Ran `nosetests ./mlxtend -sv` and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., `nosetests ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv`)
- Checked for style issues by running `flake8 ./mlxtend`