Describe the issue: dask-ml's OneHotEncoder sets `categories_` to a NumPy array with dtype `<U1` when categories are given explicitly as an array of text, e.g. [["a", "b", "c"]]. sklearn sets `categories_` to dtype `object`.
Which is the correct behavior:
either cast `categories_` to dtype=object in dask-ml's OneHotEncoder when categories are set explicitly (auto[1] is off)
or fix the test so it does not care about the `categories_` dtype mismatch, OR explicitly set dtype as an argument in the tests
[1] Please note this issue occurs only when categories are set explicitly, not when set to 'auto'.
When set to 'auto', the dtype is a numpy.float64 array, which is a third behavior. However, we are matching what sklearn does there.
* text matrix
* splitting the string creates the expected input to FeatureHasher #964
* FeatureHasher issue #963
* addressing categories_ type mismatch when auto by explicitly setting dtype on test data to object #964
* reverted to just ubuntu for time saving
Minimal Complete Verifiable Example:
see test_basic_array
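For illustration, here is a minimal sketch of the dtype difference using NumPy only. It does not call dask-ml or sklearn, so it only shows how the two dtypes arise from the same input. The helper `categories_match` is hypothetical and is not part of either library or its test suite; it sketches what a dtype-insensitive comparison in the test might look like.

```python
import numpy as np

# Explicit categories, in the shape they would be passed to the encoder.
categories = [["a", "b", "c"]]

# Without an explicit dtype, NumPy infers a fixed-width unicode dtype
# (shown as <U1 for single-character strings on little-endian machines).
inferred = [np.array(c) for c in categories]

# Casting to object yields the dtype that sklearn reports for categories_.
as_object = [np.array(c, dtype=object) for c in categories]

print(inferred[0].dtype)
print(as_object[0].dtype)


def categories_match(left, right):
    """Hypothetical helper: compare category values while ignoring dtype."""
    return len(left) == len(right) and all(
        np.array_equal(np.asarray(a, dtype=object), np.asarray(b, dtype=object))
        for a, b in zip(left, right)
    )


# The values are identical even though the dtypes differ.
print(categories_match(inferred, as_object))
```

A comparison like this corresponds to the "fix the test" option; the alternative of casting to object inside the encoder would make the dtypes themselves agree.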
Anything else we need to know?:
This is illustrated in the failing test.
The sklearn documentation (found here) shows the dtype as `object`, whereas the dask-ml documentation (found here) shows the dtype as `<U1`.
Environment: