Custom model no longer supporting scikit-learn 1.3.0 ColumnTransformer #128
The issue seems to be fixed by using scikit-learn 1.5.1 and snowflake-ml-python 1.6.4 (both of which are available and compatible in the Snowflake Anaconda Channel for Python 3.11). This issue affected a production model that had been working for the last 6 months with scikit-learn 1.3.0, which was pinned both in a stored procedure's … I imagine one mitigation would be to export a conda environment.yml file from a successfully deployed environment and reuse it for future deployments?
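A minimal sketch of that mitigation, assuming you have already recorded the exact package versions of the deployment that last worked (for example scikit-learn 1.3.0 and snowflake-ml-python 1.5.4). The helper `render_environment_yml` is hypothetical and only produces text; the channel list is an assumption and should be whichever channel your deployment actually resolves from. `==` asks conda for an exact version, so it suits fully specified versions such as 1.3.0.

```python
def render_environment_yml(name, channels, pins):
    """Render a minimal conda environment.yml that pins exact versions.

    `pins` maps a package name to a fully specified version, e.g.
    {"scikit-learn": "1.3.0"}. Hypothetical helper for illustration only.
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    # Sorted so the file is stable and easy to diff between deployments.
    lines += [f"  - {pkg}=={version}" for pkg, version in sorted(pins.items())]
    return "\n".join(lines) + "\n"


# Versions taken from the deployment that last worked (assumed to be known).
print(render_environment_yml(
    "model-deploy",  # hypothetical environment name
    ["defaults"],    # assumption: use the channel your deployment resolves from
    {"scikit-learn": "1.3.0", "snowflake-ml-python": "1.5.4"},
))
```

Generating the file from a version record, rather than editing it by hand, keeps the pins identical from one monthly run to the next.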
Glad to know that you were able to fix the issue. But it is also important that once something runs in production, it keeps running. Could you please tell us a bit more about your production setup? Are you training a new model within the sproc every time and then logging it? And then one day, log_model() suddenly stopped working. Am I right?
For example, in the log above you see that …
My setup consists of two stored procedures. The first trains the model and dumps two scikit-learn objects (a preprocessing pipeline and a model) to a stage. The second deploys it: it loads the objects from the stage, instantiates a custom model with those objects, and logs it to the registry. In the training procedure, I've pinned these dependencies.
In the deployment procedure, I've pinned these dependencies.
I also pass these conda dependencies. Every month we run both procedures, and it worked for, I'd say, 6 consecutive runs and then suddenly stopped working. If a new image is created every time we run …
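Because training and deployment happen in separate stored procedures, the objects written to the stage can end up being unpickled under a different scikit-learn version than the one that produced them. Below is a minimal stdlib sketch, assuming plain pickle serialization (the thread does not say how the objects are dumped): record the training-time versions next to the pickled bytes and compare them before unpickling. `installed_versions`, `dump_with_versions` and `load_checked` are hypothetical names; `lookup` defaults to `importlib.metadata.version` and can be replaced for testing.

```python
import importlib.metadata
import pickle


def installed_versions(packages, lookup=importlib.metadata.version):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = lookup(pkg)
        except importlib.metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions


def dump_with_versions(obj, packages, lookup=importlib.metadata.version):
    """Pickle obj and record the package versions present when it was created."""
    return pickle.dumps({
        "versions": installed_versions(packages, lookup),
        # Pickled separately so the versions can be checked before obj is unpickled.
        "payload": pickle.dumps(obj),
    })


def load_checked(blob, lookup=importlib.metadata.version):
    """Refuse to unpickle the object if runtime versions differ from training."""
    outer = pickle.loads(blob)  # holds only strings and bytes at this level
    trained = outer["versions"]
    current = installed_versions(trained, lookup)
    mismatches = {
        pkg: (trained[pkg], current[pkg])
        for pkg in trained
        if trained[pkg] != current[pkg]
    }
    if mismatches:
        raise RuntimeError(f"version mismatch (trained, runtime): {mismatches}")
    return pickle.loads(outer["payload"])
```

Running the check where the objects are actually loaded for inference, rather than only in the deployment procedure, is what would surface a mismatch in the inference environment.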
I've been debugging the following error, which occurs when the model is used for inference and which I never got before with the same deployment procedure.
I suspect the issue is caused by ColumnTransformer: a string (perhaps the name of the estimator) is passed instead of the estimator instance.
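ColumnTransformer accepts the strings "drop" and "passthrough" in place of a transformer object, so a string can legitimately occupy a transformer slot. The toy below is not scikit-learn's code; it is a stdlib sketch showing how the reported AttributeError arises if such a string reaches a `.transform()` call without first being replaced by an object. Whether that is what happens here is an assumption, and `Scale` and `apply_column_steps` are hypothetical.

```python
class Scale:
    """Stand-in for a fitted transformer that has a transform() method."""

    def transform(self, rows):
        return [[2 * v for v in row] for row in rows]


def apply_column_steps(steps, rows):
    """Naively call transform() on every (name, transformer, columns) step."""
    outputs = []
    for name, transformer, columns in steps:
        subset = [[row[i] for i in columns] for row in rows]
        # No special handling of string markers: a str here has no transform().
        outputs.append(transformer.transform(subset))
    return outputs


rows = [[1, 10], [2, 20]]
print(apply_column_steps([("scale", Scale(), [0])], rows))  # [[[2], [4]]]

try:
    apply_column_steps([("scale", Scale(), [0]), ("rest", "passthrough", [1])], rows)
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute 'transform'
```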
This is my testing code. I'm using snowflake-ml 1.5.4 and scikit-learn 1.3.0.
Failed test results:
2024-11-06 16:24:34.893 | INFO | main:test_deployment:36 - Logging model to registry...
2024-11-06 16:25:25.593 | INFO | main:test_deployment:46 - Done
2024-11-06 16:25:25.594 | INFO | main:test_deployment:49 - Testing deployed model...
2024-11-06 16:25:36.653 | ERROR | main:test_deployment:54 - (1300) (1304): 01b83179-0105-548e-00ff-7501687c0df7: 100357 (P0000): Python Interpreter Error:
Traceback (most recent call last):
File "/home/udf/4285898201/predict.py", line 78, in infer
predictions_df = runner(input_df[input_cols])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/folders/6l/309364ds7qd0ph4xl9p7c2k40000gq/T/ipykernel_86032/1446644176.py", line 16, in predict
File "/home/udf/4285898201/snowflake-ml-python.zip/snowflake/ml/model/custom_model.py", line 28, in __call__
return self._func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 313, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/sklearn/compose/_column_transformer.py", line 1076, in transform
Xs = self._call_func_on_transformers(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/sklearn/compose/_column_transformer.py", line 885, in _call_func_on_transformers
return Parallel(n_jobs=self.n_jobs)(jobs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/sklearn/utils/parallel.py", line 74, in __call__
return super().__call__(iterable_with_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/joblib/parallel.py", line 1918, in __call__
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/sklearn/utils/parallel.py", line 136, in __call__
return self.function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/sklearn/pipeline.py", line 1290, in _transform_one
res = transformer.transform(X, **params.transform)
^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'transform'
in function PREDICT with handler predict.infer
2024-11-06 16:25:36.654 | INFO | main:test_deployment:56 - Cleaning up...
2024-11-06 16:25:36.825 | INFO | main:test_deployment:58 - Done
Same as above, but with just a no-op inside the ColumnTransformer:
2024-11-06 16:29:05.159 | INFO | main:test_deployment:36 - Logging model to registry...
2024-11-06 16:29:51.906 | INFO | main:test_deployment:46 - Done
2024-11-06 16:29:51.908 | INFO | main:test_deployment:49 - Testing deployed model...
2024-11-06 16:29:59.890 | ERROR | main:test_deployment:54 - (1300) (1304): 01b8317d-0105-56a8-00ff-7501687c875f: 100357 (P0000): Python Interpreter Error:
Traceback (most recent call last):
File "/home/udf/4285898205/predict.py", line 78, in infer
predictions_df = runner(input_df[input_cols])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/folders/6l/309364ds7qd0ph4xl9p7c2k40000gq/T/ipykernel_86032/2800746830.py", line 16, in predict
File "/home/udf/4285898205/snowflake-ml-python.zip/snowflake/ml/model/custom_model.py", line 28, in __call__
return self._func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 313, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/sklearn/compose/_column_transformer.py", line 1076, in transform
Xs = self._call_func_on_transformers(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/sklearn/compose/_column_transformer.py", line 885, in _call_func_on_transformers
return Parallel(n_jobs=self.n_jobs)(jobs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/sklearn/utils/parallel.py", line 74, in __call__
return super().__call__(iterable_with_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/joblib/parallel.py", line 1918, in __call__
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/sklearn/utils/parallel.py", line 136, in __call__
return self.function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python_udf/4efe4f8655d1cbd717f4875029b07a29850ba16ad61ee320466104b713e358ec/lib/python3.11/site-packages/sklearn/pipeline.py", line 1290, in _transform_one
res = transformer.transform(X, **params.transform)
^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'transform'
in function PREDICT with handler predict.infer
2024-11-06 16:29:59.890 | INFO | main:test_deployment:56 - Cleaning up...
2024-11-06 16:30:00.043 | INFO | main:test_deployment:58 - Done
Successful test results:
Here are some successful results using just an estimator, and even a pipeline.
2024-11-06 16:34:54.166 | INFO | main:test_deployment:36 - Logging model to registry...
2024-11-06 16:36:02.810 | INFO | main:test_deployment:46 - Done
2024-11-06 16:36:02.811 | INFO | main:test_deployment:49 - Testing deployed model...
2024-11-06 16:36:12.690 | INFO | main:test_deployment:52 - PASS
2024-11-06 16:36:12.691 | INFO | main:test_deployment:56 - Cleaning up...
2024-11-06 16:36:12.842 | INFO | main:test_deployment:58 - Done
2024-11-06 16:37:13.142 | INFO | main:test_deployment:36 - Logging model to registry...
2024-11-06 16:37:48.581 | INFO | main:test_deployment:46 - Done
2024-11-06 16:37:48.582 | INFO | main:test_deployment:49 - Testing deployed model...
2024-11-06 16:37:58.237 | INFO | main:test_deployment:52 - PASS
2024-11-06 16:37:58.239 | INFO | main:test_deployment:56 - Cleaning up...
2024-11-06 16:37:58.377 | INFO | main:test_deployment:58 - Done