Hey,
in the notebook, when using:
we introduce data leakage, since we select the features on the whole data set.
With scikit-learn's pipelines, it's possible to select the 20 best features (or rather a fraction of the features) within each fold, e.g. with `SelectFromModel`.
With the current setup, you are probably overestimating the AUC.
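A minimal sketch of fold-wise selection, assuming a binary target. Synthetic data from `make_classification` stands in for the notebook's features, and the random-forest selector and logistic-regression classifier are placeholder choices, not the notebook's models. Because `SelectFromModel` is a pipeline step, `cross_val_score` refits it on each fold's training part only, so the held-out fold never influences which features are kept:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the notebook's feature matrix (assumption: binary target).
X, y = make_classification(
    n_samples=500, n_features=100, n_informative=10, random_state=0
)

pipe = Pipeline([
    # Fitted inside each CV fold, on that fold's training rows only.
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=50, random_state=0),
        max_features=20,
        threshold=-np.inf,  # ignore the threshold, keep exactly the top 20
    )),
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print(scores.mean())
```

Setting `threshold=-np.inf` makes `max_features` the only selection criterion; a float in `(0, 1)` is not accepted there, so a "fraction of features" would have to be converted to an integer count first.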
Besides, `cross_val_score` with its default K-fold splitting assumes i.i.d. samples. However, this is clearly not the case here, since one entity typically has several occurrences. I think something like `TimeSeriesSplit`, or rather an adaptation of it (since we don't have time series in the classical sense but rather time slices), would be the correct thing to use here.
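One possible adaptation, sketched under the assumption that each row carries a discrete time-slice label. `time_slice_split` below is a hypothetical helper (not part of scikit-learn) that trains on all earlier slices and tests on the next one, i.e. an expanding-window backtest. `cross_val_score` accepts any iterable of `(train, test)` index pairs as `cv`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def time_slice_split(time_slices, min_train_slices=1):
    """Hypothetical helper: expanding-window splits over discrete time slices.

    For each fold, train on every row whose slice is strictly earlier than the
    test slice, and test on all rows of that single slice.
    """
    time_slices = np.asarray(time_slices)
    unique_slices = np.unique(time_slices)  # sorted ascending
    for i in range(min_train_slices, len(unique_slices)):
        test_slice = unique_slices[i]
        train_idx = np.flatnonzero(time_slices < test_slice)
        test_idx = np.flatnonzero(time_slices == test_slice)
        yield train_idx, test_idx


# Synthetic example (assumption): 50 entities observed in each of 6 time
# slices, so every entity occurs several times.
rng = np.random.default_rng(0)
n_entities, n_slices = 50, 6
time = np.repeat(np.arange(n_slices), n_entities)  # time-slice label per row
X = rng.normal(size=(n_entities * n_slices, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=len(X)) > 0).astype(int)

splits = list(time_slice_split(time, min_train_slices=2))
scores = cross_val_score(LogisticRegression(), X, y, cv=splits, scoring="roc_auc")
print(scores)
```

Here an entity in the test slice also appears in earlier training slices, which mirrors scoring future observations of known entities. If the concern were only the repeated entities, `GroupKFold` with the entity id as `groups` would keep each entity in a single fold, but it ignores time order.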
Any comments on these issues?
I'm currently facing the same issues at work, so I really appreciate the library you have developed. I haven't seen anything similar so far, so thumbs up in any case!
You are right. Actually, the other examples show the correct usage of "backtesting". Shouldn't we at least add a remark here? Changing this completely would produce different Kaggle results and would therefore take more effort.