BUG: Fix identification of deserialized np.nan #15
I believe the bug is the same as:
scikit-learn/scikit-learn#11462
After a KNN or Forest imputer is serialized and deserialized, it fails to transform new data and crashes with:
  File "/home/rbowden/.local/share/virtualenvs/qls_py-a_jz9n52/lib/python3.7/site-packages/missingpy/missforest.py", line 505, in transform
    force_all_finite=force_all_finite, copy=self.copy)
  File "/home/rbowden/.local/share/virtualenvs/qls_py-a_jz9n52/lib/python3.7/site-packages/sklearn/utils/validation.py", line 542, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/home/rbowden/.local/share/virtualenvs/qls_py-a_jz9n52/lib/python3.7/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
The temporary fix in my code had been:
imputer.missing_values = np.nan
But I believe this patch fixes the issue within missingpy itself (or at least, fixes that particular issue on my end).
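For anyone wanting to reproduce the underlying behaviour without missingpy, here is a minimal sketch. It assumes the failing check compares `missing_values` against `np.nan` by identity or list membership; I have not quoted the library's exact code here. `is_nan_marker` is a hypothetical helper used only for illustration, not a missingpy function.

```python
import pickle

import numpy as np

# Round-trip the NaN marker, as happens when an imputer is pickled and loaded.
restored = pickle.loads(pickle.dumps(np.nan))

# The restored value is still NaN, but it is a new float object, so
# identity-based comparisons against np.nan no longer succeed.
same_object = restored is np.nan               # False
in_marker_list = restored in ["NaN", np.nan]   # False: identity lost, NaN != NaN
original_in_list = np.nan in ["NaN", np.nan]   # True, but only via identity


def is_nan_marker(value):
    """Hypothetical helper: True if value means 'missing entries are NaN'."""
    if isinstance(value, str):
        return value == "NaN"
    return isinstance(value, float) and np.isnan(value)


# A value-based check recognises both the original and the restored marker.
robust_original = is_nan_marker(np.nan)    # True
robust_restored = is_nan_marker(restored)  # True
```

This is also why reassigning `imputer.missing_values = np.nan` after loading works as a stopgap: it puts the original `np.nan` object back, so an identity-based check passes again.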