Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: get_indexer rountripping through string dtype #56013

Merged
merged 9 commits into from
Nov 16, 2024

Conversation

phofl
Copy link
Member

@phofl phofl commented Nov 16, 2023

cc @jbrockmendel this feels a bit special casy, thoughts?

@mroeschke mroeschke added Strings String extension data type and string data Index Related to the Index class or subclasses labels Nov 17, 2023
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Dec 22, 2023
@@ -6591,6 +6591,9 @@ def _maybe_cast_listlike_indexer(self, target) -> Index:
"""
Analogue to maybe_cast_indexer for get_indexer instead of get_loc.
"""
if not hasattr(target, "dtype") and self.dtype == object:
# Avoid inference for object since we are casting back later anyway
return Index(target, dtype=self.dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this might prevent us from doing intentional inference in _maybe_downcast_for_indexing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel could you think of a concrete situation where this might happen?

(the tests are all passing, so don't seem to cover that?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at _maybe_downcast_for_indexing more closely, we are downcasting object dtype index to the other dtype, because this can give a performance improvement, right?

This might actually also be true for object/string combo, but I think correctness if more important, and if missing values are handled differently by downcasting, we should avoid that for now for this combo, until we can solve that better.

I will just make the check a bit more specific for object dtype then (even more ugly, but at least fixing this issue)

@mroeschke
Copy link
Member

Looks like this has gone stale and needs a rebase. Closing to clear the queue but feel free to reopen when you have time

@jorisvandenbossche jorisvandenbossche added this to the 2.3 milestone Nov 4, 2024
@jorisvandenbossche jorisvandenbossche merged commit 54c88a2 into pandas-dev:main Nov 16, 2024
51 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Nov 16, 2024
jorisvandenbossche pushed a commit that referenced this pull request Nov 17, 2024
…ough string dtype) (#60339)

Backport PR #56013: BUG: get_indexer rountripping through string dtype

Co-authored-by: Patrick Hoefler <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Index Related to the Index class or subclasses Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Index.get_indexer casts values is given as list
4 participants