Missing features removal with SimpleImputer #124

Open
vitorsrg opened this issue Feb 10, 2020 · 0 comments
Labels
bug Something isn't working enhancement New feature or request

Code sample

In the sample code below, the second column is dropped from the dataset when the imputer is applied in the pipeline:

>>> from sklearn.impute import SimpleImputer
>>> import numpy as np
>>> imp = SimpleImputer()
>>> imp.fit([[0, np.nan], [1, np.nan]])
>>> imp.transform([[0, np.nan], [1, 1]])
array([[0.],
       [1.]])

Problem description

Currently, sklearn.impute.SimpleImputer silently removes any feature that is np.nan in every training sample, so the output of transform has fewer columns than its input.

Therefore, the call

new_cols = pd.DataFrame(data=new_data, columns=columns_to_impute).to_dict('list')

fails, because new_data.shape[1] != len(columns_to_impute).
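
The mismatch can be reproduced and inspected with a small sketch. It assumes the default mean strategy, under which `statistics_` holds NaN for a column that had no observed values during fit. The column names `"a"` and `"b"` are hypothetical stand-ins for `columns_to_impute`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical column names standing in for columns_to_impute.
columns_to_impute = ["a", "b"]

imp = SimpleImputer()  # default strategy="mean"
imp.fit([[0, np.nan], [1, np.nan]])

# Assumption: with the mean strategy, statistics_ is NaN for a column
# that had no observed values during fit, which marks it as dropped.
kept_mask = ~np.isnan(imp.statistics_)
kept_columns = [c for c, keep in zip(columns_to_impute, kept_mask) if keep]

new_data = imp.transform([[0, np.nan], [1, 1]])

# The transformed data lines up with the kept columns,
# not with the full columns_to_impute list.
print(new_data.shape[1], len(columns_to_impute), kept_columns)
```

Building the DataFrame with `kept_columns` instead of `columns_to_impute` would avoid the shape error, at the cost of losing the all-NaN column.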

Possible solutions

For the problematic features (those with no observed values during fit), either keep their values when they are valid at transform time, or impute a default value during transform.
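
A minimal sketch of this idea, not a definitive fix: the helper `transform_keeping_columns` and its `default` argument are hypothetical names. It assumes the mean strategy with `add_indicator` left off, and it detects dropped columns through NaN entries in `statistics_`.

```python
import numpy as np
from sklearn.impute import SimpleImputer


def transform_keeping_columns(imp, X, default=0.0):
    """Hypothetical helper: transform X without losing any column.

    Columns the imputer would drop keep their valid values and get
    `default` wherever the value is missing.
    """
    X = np.asarray(X, dtype=float)
    # Assumption: NaN in statistics_ marks a column with no observed
    # values during fit (true for the default mean strategy).
    kept = ~np.isnan(imp.statistics_)
    # Start from the input, replacing missing entries with the default.
    out = np.where(np.isnan(X), default, X)
    # Overwrite the columns the imputer does handle with its output.
    out[:, kept] = imp.transform(X)
    return out


imp = SimpleImputer()
imp.fit([[0, np.nan], [1, np.nan]])
result = transform_keeping_columns(imp, [[np.nan, np.nan], [1, 1]])
print(result.shape)  # same number of columns as the input
```

In this example the first column is imputed with its training mean, while the second column keeps the valid value 1 and receives the default where it is missing. Newer scikit-learn releases (1.2 and later) also expose a `keep_empty_features` parameter on SimpleImputer that retains such columns, which may make a helper like this unnecessary.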

@vitorsrg vitorsrg added bug Something isn't working enhancement New feature or request labels Feb 10, 2020
This was referenced Feb 12, 2020