The PerColumnImputer can impute floating point values into integer data with the mean or median numeric impute strategies. When this happens, we cannot simply reinitialize the original data's woodwork schema via X_t.ww.init(schema=original_schema.get_subset_schema(X_t.columns)) like we currently do, since it would try to use Int64 on floating point data, which results in an error.
We'll need to use _get_new_logical_types_for_imputed_data, as the other imputers do, so that the imputed data gets the correct logical types. Note that because the per-column imputer can have a different strategy for each column, we'll need to either change _get_new_logical_types_for_imputed_data to accept per-column strategies, or call it individually for every column.
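To illustrate the "call it individually for every column" option, here is a minimal sketch of per-column type resolution in plain Python. It is not evalml's implementation and does not reproduce the signature of _get_new_logical_types_for_imputed_data: the helper name resolve_imputed_logical_types, the column "another int", and the "most_frequent" default are assumptions made for illustration, and logical types are represented by their woodwork names as plain strings.

```python
from typing import Dict

# Strategies whose result can be non-integer even when every input is an integer.
FLOAT_PRODUCING_STRATEGIES = {"mean", "median"}
# Woodwork integer logical types, referred to here by name only.
INTEGER_TYPES = {"Integer", "IntegerNullable"}


def resolve_imputed_logical_types(
    original_types: Dict[str, str],
    impute_strategies: Dict[str, Dict[str, str]],
    default_strategy: str = "most_frequent",  # assumed default, for illustration
) -> Dict[str, str]:
    """Hypothetical helper: choose a logical type per column after imputation."""
    new_types = {}
    for column, logical_type in original_types.items():
        # Each column may carry its own strategy; fall back to the default.
        strategy = impute_strategies.get(column, {}).get(
            "impute_strategy", default_strategy
        )
        if logical_type in INTEGER_TYPES and strategy in FLOAT_PRODUCING_STRATEGIES:
            # A mean or median of integers may be fractional, so widen to Double.
            new_types[column] = "Double"
        else:
            new_types[column] = logical_type
    return new_types


example = resolve_imputed_logical_types(
    {"int with nan": "IntegerNullable", "another int": "IntegerNullable"},
    {
        "int with nan": {"impute_strategy": "mean"},
        "another int": {"impute_strategy": "most_frequent"},
    },
)
print(example)
```

Only the column imputed with the mean is widened; the column using most_frequent keeps its integer type, which is why a single strategy applied to the whole frame would not be enough here.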
Below is a test that reproduces the type conversion error:
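from evalml.pipelines.components import PerColumnImputer


def test_per_column_imputer_float_imputed_into_int(imputer_test_data):
    # Select a single integer column that contains missing values.
    X = imputer_test_data.ww[["int with nan"]]
    strategies = {
        "int with nan": {"impute_strategy": "mean"},
    }
    transformer = PerColumnImputer(impute_strategies=strategies)
    transformer.fit(X)
    # Mean imputation yields floats, so restoring the integer schema fails here.
    transformer.transform(X)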