BUG (string dtype): comparison of string column to mixed object column fails #60228 (fixed) #60392

Open
wants to merge 6 commits into
base: main
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -769,6 +769,7 @@ Styler
Other
^^^^^
- Bug in :class:`DataFrame` when passing a ``dict`` with a NA scalar and ``columns`` that would always return ``np.nan`` (:issue:`57205`)
- Bug in :func:`comparison_op` where comparing a ``string`` dtype array with an ``object`` dtype array containing mixed types would raise a ``TypeError`` when PyArrow-based strings are enabled (:issue:`60228`)
- Bug in :func:`eval` on :class:`ExtensionArray` on including division ``/`` failed with a ``TypeError``. (:issue:`58748`)
- Bug in :func:`eval` where the names of the :class:`Series` were not preserved when using ``engine="numexpr"``. (:issue:`10239`)
- Bug in :func:`eval` with ``engine="numexpr"`` returning unexpected result for float division. (:issue:`59736`)
16 changes: 15 additions & 1 deletion pandas/core/ops/array_ops.py
@@ -40,6 +40,7 @@
    is_numeric_v_string_like,
    is_object_dtype,
    is_scalar,
    is_string_dtype,
)
from pandas.core.dtypes.generic import (
    ABCExtensionArray,
@@ -53,7 +54,10 @@

from pandas.core import roperator
from pandas.core.computation import expressions
from pandas.core.construction import ensure_wrapped_if_datetimelike
from pandas.core.construction import (
    array as pd_array,
    ensure_wrapped_if_datetimelike,
)
from pandas.core.ops import missing
from pandas.core.ops.dispatch import should_extension_dispatch
from pandas.core.ops.invalid import invalid_comparison
@@ -321,6 +325,16 @@ def comparison_op(left: ArrayLike, right: Any, op) -> ArrayLike:
                "Lengths must match to compare", lvalues.shape, rvalues.shape
            )

    if (is_string_dtype(lvalues) and is_object_dtype(rvalues)) or (
        is_object_dtype(lvalues) and is_string_dtype(rvalues)
    ):
        if lvalues.dtype.name == "string" and rvalues.dtype == object:
            lvalues = lvalues.astype("string")
            rvalues = pd_array(rvalues, dtype="string")
        elif rvalues.dtype.name == "string" and lvalues.dtype == object:
            rvalues = rvalues.astype("string")
            lvalues = pd_array(lvalues, dtype="string")

    if should_extension_dispatch(lvalues, rvalues) or (
        (isinstance(rvalues, (Timedelta, BaseOffset, Timestamp)) or right is NaT)
        and lvalues.dtype != object
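
One consequence of casting the object side with `pd_array(..., dtype="string")` may be worth checking. The `StringArray` documentation states that it converts values to strings, so a non-string element such as `1` would become `"1"` before the comparison runs. The sketch below is an illustration only, using the public `pd.array` rather than the internal import, and assuming the installed pandas follows that documented conversion; the variable names are made up for the example.

```python
import pandas as pd

# Object-dtype inputs: the first pair differs in type ("1" vs. 1).
left = pd.Series(["1", "b"], dtype=object)
right = pd.Series([1, "b"], dtype=object)

# Plain Python element-wise equality keeps the type distinction.
elementwise = [l == r for l, r in zip(left, right)]

# Casting the mixed side to the string dtype stringifies its elements
# (assumption: documented StringArray conversion behaviour).
coerced = pd.array(right.to_numpy(), dtype="string")
after_coercion = [l == r for l, r in zip(left, coerced)]

print(elementwise)
print(after_coercion)
```

If the two lists differ, the coercion changes the result for values whose string form matches a string on the other side, which the test in this PR (`[1, "b"]` against `["a", "b"]`) would not catch.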
14 changes: 14 additions & 0 deletions pandas/tests/series/methods/test_compare.py
@@ -138,3 +138,17 @@ def test_compare_datetime64_and_string():
    tm.assert_series_equal(result_eq1, expected_eq)
    tm.assert_series_equal(result_eq2, expected_eq)
    tm.assert_series_equal(result_neq, expected_neq)


def test_comparison_string_mixed_object():
    # Issue https://github.com/pandas-dev/pandas/issues/60228
    pd.options.future.infer_string = True

    ser_string = pd.Series(["a", "b"], dtype="string")
    ser_mixed = pd.Series([1, "b"])

    result = ser_string == ser_mixed
    expected = pd.Series([False, True], dtype="boolean")
    tm.assert_series_equal(result, expected)

    pd.options.future.infer_string = False