Missing features removal with SimpleImputer #124

vitorsrg · 2020-02-10T19:49:19Z

Code sample

In the sample code below, a column is removed from the dataset during the pipeline:

>>> import numpy as np
>>> from sklearn.impute import SimpleImputer
>>> imp = SimpleImputer()  # default strategy="mean"
>>> _ = imp.fit([[0, np.nan], [1, np.nan]])  # second column is all NaN
>>> imp.transform([[0, np.nan], [1, 1]])  # second column is dropped
array([[0.],
       [1.]])

Problem description

Currently, sklearn.impute.SimpleImputer silently drops any feature that is np.nan in every training sample, so transform returns fewer columns than it receives.

Therefore, line 43 of fklearn/src/fklearn/training/imputation.py (at 06475b6)

    new_cols = pd.DataFrame(data=new_data, columns=columns_to_impute).to_dict('list')

fails, because new_data.shape[1] != len(columns_to_impute).
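The failure can be reproduced without the imputer. The snippet below is only a sketch: the column names "a" and "b" are hypothetical, and new_data is a hand-written stand-in for the imputer output after the all-NaN column was dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; in fklearn they come from the caller.
columns_to_impute = ["a", "b"]

# Stand-in for the imputer output: two rows, but only one column survived.
new_data = np.array([[0.0], [1.0]])

try:
    pd.DataFrame(data=new_data, columns=columns_to_impute)
    error = None
except ValueError as exc:
    # pandas rejects data whose column count differs from len(columns).
    error = exc

print(type(error).__name__)
```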

Possible solutions

For features that are entirely np.nan during fit, either pass their valid values through unchanged or impute a default value during transform.
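A rough sketch of what this could look like, combining both options. The helper name impute_keep_columns and its default parameter are hypothetical, not part of fklearn or scikit-learn. It assumes a fitted SimpleImputer without add_indicator, and relies on statistics_ being np.nan for the features that transform drops.

```python
import numpy as np
from sklearn.impute import SimpleImputer


def impute_keep_columns(imputer, X, default=0.0):
    """Hypothetical helper: impute X while keeping the original column count."""
    X = np.asarray(X, dtype=float)
    # Features whose fitted statistic is NaN are the ones transform() drops.
    kept = ~np.isnan(imputer.statistics_)
    # Start from X with every missing entry set to the default value,
    # so dropped features keep their valid values.
    out = np.where(np.isnan(X), default, X)
    # Overwrite the surviving features with the imputer's own output.
    out[:, kept] = imputer.transform(X)
    return out


imp = SimpleImputer().fit([[0, np.nan], [1, np.nan]])
result = impute_keep_columns(imp, [[0, np.nan], [1, 1]])
print(result.shape)
```

With this approach the result has one column per input feature, so the DataFrame construction with columns_to_impute would no longer see a shape mismatch.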


vitorsrg added the bug and enhancement labels on Feb 10, 2020

This was referenced on Feb 12, 2020:

Imputer fill_value #125 (Closed)

Imputer fill_value #126 (Merged)
