BUG: .str.contains na validation #59561

Closed
jbrockmendel opened this issue Aug 20, 2024 · 4 comments · Fixed by #59615
Labels
API - Consistency (Internal Consistency of API/Behavior), Bug, Strings (String extension data type and string data)
Milestone

Comments

@jbrockmendel
Member

jbrockmendel commented Aug 20, 2024

```python
import pandas as pd
import pyarrow as pa

ser = pd.Series(['a', 'b', None], dtype=pd.StringDtype(storage="pyarrow"))
ser2 = ser.astype(pd.ArrowDtype(pa.string()))

ser.str.contains("foo", na="bar")  # <- casts "bar" to True
ser2.str.contains("foo", na="bar")  # <- raises
```

There's a small difference between the _str_contains methods of ArrowExtensionArray (which backs ArrowDtype) and ArrowStringArray (which backs StringDtype(storage="pyarrow")): the latter fills null entries with bool(na), while the former uses na directly.
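To make the difference concrete, here is a simplified pure-Python model of the two fill strategies. It assumes the per-element match results are already computed as a list where None marks a null entry; fill_nulls_casting and fill_nulls_strict are hypothetical names for illustration, not pandas internals, and the TypeError only stands in for whatever pyarrow actually raises.

```python
from typing import List, Optional


def fill_nulls_casting(matches: List[Optional[bool]], na: object) -> List[bool]:
    """Model of the ArrowStringArray path: null entries become bool(na)."""
    fill = bool(na)  # any truthy object, e.g. "bar", becomes True
    return [fill if m is None else m for m in matches]


def fill_nulls_strict(matches: List[Optional[bool]], na: object) -> List[Optional[bool]]:
    """Model of the ArrowExtensionArray path: na is used as given, without casting."""
    if na is None:
        return list(matches)  # nothing to fill; nulls stay null
    if not isinstance(na, bool):
        # Stand-in for pyarrow rejecting a non-bool fill value for a boolean array;
        # the real exception type may differ.
        raise TypeError(f"na must be a bool, got {type(na).__name__}")
    return [na if m is None else m for m in matches]


matches = [False, False, None]  # "foo" is not in "a" or "b"; the third entry is null
print(fill_nulls_casting(matches, "bar"))  # [False, False, True]
try:
    fill_nulls_strict(matches, "bar")
except TypeError as exc:
    print(exc)  # na must be a bool, got str
```

The casting variant silently accepts any object because every Python object has a truth value, which is why na="bar" turns into True rather than raising.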

I prefer the no-casting behavior, but mainly think we should be consistent.

Update: it looks like pandas/tests/strings/test_find_replace.py::test_contains_nan specifically tests na="foo".

jbrockmendel added the Bug and Needs Triage labels Aug 20, 2024
@mroeschke
Member

+1 for the no-casting behavior

mroeschke added the Strings and API - Consistency labels and removed the Needs Triage label Aug 20, 2024
jorisvandenbossche added this to the 2.3 milestone Aug 20, 2024
@jorisvandenbossche
Member

Regardless of which option we choose, it differs from the current object-dtype behavior (which just puts "bar" into the result, leaving it as object dtype), so it's another case we should document in #59328.
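For comparison, a rough pure-Python model of the object-dtype behavior described above. It assumes null entries are simply filled with na unchanged, so the result can mix bools with arbitrary objects; contains_object_model is a hypothetical name, and a Python list stands in for the object-dtype result.

```python
from typing import List, Optional


def contains_object_model(values: List[Optional[str]], pat: str, na: object) -> List[object]:
    """Substring check per element; null entries receive na exactly as given."""
    return [na if v is None else (pat in v) for v in values]


result = contains_object_model(["a", "b", None], "foo", na="bar")
print(result)  # [False, False, 'bar']: a mixed list, the analogue of an object-dtype result
```

Neither casting nor raising happens here, which is why this path counts as a third behavior alongside the two pyarrow-backed ones.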

@jbrockmendel
Member Author

How about we deprecate (in 2.3) the no-validation behavior across the board?
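One way such a deprecation could look, sketched in pure Python under the assumption that bools and null-like values (None, NaN) stay valid while everything else triggers a FutureWarning before falling back to the existing behavior. validate_na is a hypothetical helper, not pandas API, and the warning text is illustrative only.

```python
import math
import warnings


def validate_na(na: object, method: str = "contains") -> object:
    """Warn on non-bool, non-null na values, then return na unchanged."""
    is_null = na is None or (isinstance(na, float) and math.isnan(na))
    if not (isinstance(na, bool) or is_null):
        warnings.warn(
            f"Allowing a non-bool 'na' in str.{method} is deprecated; "
            "pass True, False, or a missing value instead.",
            FutureWarning,
            stacklevel=2,
        )
    return na  # existing behavior is kept during the deprecation period


validate_na(True)  # no warning
validate_na(float("nan"))  # no warning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    validate_na("bar")
print([w.category.__name__ for w in caught])  # ['FutureWarning']
```

Returning na unchanged means results stay the same during the deprecation period and only the warning is new, which is what turns a later switch to raising into a less abrupt change.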

@jorisvandenbossche
Member

Yes, that sounds like a good idea. That makes it less of a plain breaking change for 3.0.

3 participants