-
Notifications
You must be signed in to change notification settings - Fork 223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix package dependency issues #36
base: master
Are you sure you want to change the base?
Conversation
# by the `fit` method, since this allows us to access the transformed | ||
# dataframe. For other models we could use the `transform` method, but | ||
# the GradientBoostingRegressor does not have a `transform` method. | ||
X_transformed, _ = pipeline.price_pipe._fit(X_train, y_train) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@solegalli previously I was able to use the _fit
method to access the transformed dataframe of the pipeline. This method seems to have been updated (possibly here: scikit-learn/scikit-learn@88ce8cd) and is throwing an error.
Any idea how to access the transformed dataframe for the GradientBoostingRegressor
(since it doesn't have a transform
method?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not straightforward. Here some potential solutions:
https://stackoverflow.com/questions/54332654/how-to-analyse-the-intermediate-steps-of-sklearn-pipeline
https://stackoverflow.com/questions/48743032/get-intermediate-data-state-in-scikit-learn-pipeline?
In short, I think you need to apply all the transformations one after the other, just up to the one before the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I asked the sklearn community through the email list and one of the developers told me to do this:
Using slicing: model[:-1].transform(X)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That worked perfectly :)
from gradient_boosting_model import pipeline
from gradient_boosting_model.config.core import config
from functools import reduce
def test_pipeline_drops_unnecessary_features(pipeline_inputs):
# Given
X_train, X_test, y_train, y_test = pipeline_inputs
assert config.model_config.drop_features in X_train.columns
pipeline.price_pipe.fit(X_train, y_train)
# When
# We access the transformed inputs with slicing
transformed_inputs = pipeline.price_pipe[:-1].transform(X_train)
# Then
assert config.model_config.drop_features in X_train.columns
assert config.model_config.drop_features not in transformed_inputs.columns
def test_pipeline_transforms_temporal_features(pipeline_inputs):
# Given
X_train, X_test, y_train, y_test = pipeline_inputs
# When
# We access the transformed inputs with slicing
transformed_inputs = pipeline.price_pipe[:-1].transform(X_train)
# Then
assert (
transformed_inputs.iloc[0]["YearRemodAdd"]
== X_train.iloc[0]["YrSold"] - X_train.iloc[0]["YearRemodAdd"]
)
@@ -1,2 +1,3 @@ | |||
flask>=1.1.1,<1.2.0 | |||
markupsafe==2.0.1 # https://github.com/aws/aws-sam-cli/issues/3661 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key Section 5 fix
feature_engine>=0.3.1,<0.4.0 | ||
joblib>=0.14.1,<0.15.0 | ||
numpy>=1.20.0,<1.21.0 | ||
pandas>=1.3.5,<1.4.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key section 4 fix 1
|
||
# old model for testing purposes | ||
# source code: https://github.com/trainindata/deploying-machine-learning-models/tree/master/packages/regression_model | ||
tid-regression-model>=2.0.20,<2.1.0 | ||
tid-regression-model==3.1.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key section 4 fix 2
@@ -1,4 +1,5 @@ | |||
Flask>=1.1.1,<1.2.0 | |||
markupsafe==2.0.1 # https://github.com/aws/aws-sam-cli/issues/3661 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key fix for section 9
@@ -1,12 +1,13 @@ | |||
# ML Model | |||
tid-gradient-boosting-model>=0.1.18,<0.2.0 | |||
tid-gradient-boosting-model>=0.3.0,<0.4.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key fix for section 9 (2/2)
Note required new publishing of gradient boosting model (so sklearn versions are compatible)
1716ef4
to
72ae086
Compare
Update: For students checking this PR, the fixes have now been applied to the commit history. Please update your local copy from the current master branch
Fix for the
Unit Testing a Production ML Model
section