Describe the bug
When using cudf.pandas and iterating over the dtypes of a dataframe, categorical dtype objects are reported as cudf.CategoricalDtype and not pandas.CategoricalDtype, causing isinstance checks to fail unexpectedly.
Steps/Code to reproduce bug
Run repro.py with python -m cudf.pandas and compare to the output without cudf.pandas:
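The script itself did not survive in this copy of the report. A likely reconstruction, assuming repro.py contains the same statements as the IPython session quoted in the follow-up comment (the variable names `in_loop` and `with_iloc` are added here for readability and are not from the report):

```python
import pandas as pd

# One string column converted to the categorical dtype.
df = pd.DataFrame({"A": ["a", "b", "c", "a"]})
df["A"] = df["A"].astype("category")

# Check the dtype obtained by iterating over df.dtypes ...
in_loop = [isinstance(t, pd.CategoricalDtype) for t in df.dtypes][0]
# ... and the same dtype obtained by positional indexing.
with_iloc = isinstance(df.dtypes.iloc[0], pd.CategoricalDtype)

print("In for loop: ", in_loop)
print("With iloc: ", with_iloc)
```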
$ python repro.py
In for loop: True
With iloc: True
$ python -m cudf.pandas repro.py
In for loop: False
With iloc: True
Expected behavior
The isinstance checks should produce the same output with and without cudf.pandas, regardless of whether we iterate over the dtypes or select them by index.
Environment details (please complete the following information):
Environment location: GCP g2-standard-8 instance
Linux Distro/Architecture: Debian 11 Bullseye amd64
GPU Model/Driver: L4 / 550.90.07
CUDA: 12.4
Method of cuDF & cuML install: conda (RAPIDS 24.10)
Additional context
This prevents training an XGBoost model on categorical variables using cudf.pandas if the .plot method of a Series has been called beforehand. See #17166 for information on unexpected behavior from .plot.
Okay, removing the custom iterator made your minimal repro work, but it could break other things (we'll see).
In [1]: %load_ext cudf.pandas
In [2]: import pandas as pd
...:
...: df = pd.DataFrame({"A": ["a", "b", "c", "a"]})
...: df["A"] = df["A"].astype('category')
...:
...: print("In for loop: ", [isinstance(t, pd.CategoricalDtype) for t in df.dtypes][0])
...: print("With iloc: ", isinstance(df.dtypes.iloc[0], pd.CategoricalDtype))
In for loop: True
With iloc: True
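To illustrate why a custom iterator can produce this kind of mismatch, here is a toy sketch in plain Python. It is not the cudf.pandas implementation: every class name is hypothetical, and it only assumes that indexed access converts backend objects to the public type while the custom `__iter__` does not.

```python
class PublicDtype:
    """Hypothetical stand-in for the type user code checks against."""


class BackendDtype:
    """Hypothetical stand-in for the accelerated backend's own type."""

    def to_public(self):
        return PublicDtype()


class ProxyWithoutCustomIter:
    """Toy wrapper: every element access converts to the public type."""

    def __init__(self, backend_items):
        self._backend_items = list(backend_items)

    def __getitem__(self, index):
        # Raises IndexError past the end, which ends fallback iteration.
        return self._backend_items[index].to_public()


class ProxyWithCustomIter(ProxyWithoutCustomIter):
    """Same wrapper plus an iterator that skips the conversion."""

    def __iter__(self):
        # Assumed failure mode: backend objects leak out unconverted.
        return iter(self._backend_items)


leaky = ProxyWithCustomIter([BackendDtype()])
leaky_in_loop = [isinstance(t, PublicDtype) for t in leaky][0]
leaky_by_index = isinstance(leaky[0], PublicDtype)

# Without __iter__, Python falls back to calling __getitem__ with 0, 1, ...
plain = ProxyWithoutCustomIter([BackendDtype()])
plain_in_loop = [isinstance(t, PublicDtype) for t in plain][0]
plain_by_index = isinstance(plain[0], PublicDtype)

print("Custom iterator, in for loop:", leaky_in_loop)
print("Custom iterator, by index:", leaky_by_index)
print("No custom iterator, in for loop:", plain_in_loop)
print("No custom iterator, by index:", plain_by_index)
```

In this toy model, dropping the custom `__iter__` routes iteration through the same conversion path as indexing, so both checks agree. Whether the real fix behaves the same way depends on cudf.pandas internals that this sketch does not model.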