
Unexpected Behavior: AnnData Allows Indexing with Float Arrays Without Error #1735

Open
yubin-ai opened this issue Oct 31, 2024 · 5 comments · May be fixed by #1746

@yubin-ai

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the master branch of anndata.

Report

I encountered unexpected behavior when indexing an AnnData object with a list of float values. Indexing with a single float correctly raises an IndexError, but indexing with a list or array of floats raises no error: each float is silently converted to an integer and the corresponding rows are returned. This is inconsistent and can lead to unintended results.

Code:

```python
import anndata
import numpy as np

# Create a sample AnnData object
adata = anndata.AnnData(np.random.rand(100, 10))

# Indexing with a single float value raises an error (as expected)
try:
    adata[43.4, :].obs
except IndexError as e:
    print(f"Single float index error (expected): {e}")

# Indexing with a list of floats does not raise an error (unexpected behavior)
float_indices = [42.85256014, 62.04391223, 26.08972756, 54.38563822, 90.45806554, 78.73412668]
try:
    result = adata[float_indices, :].obs
    print("Indexing with float list succeeded (unexpected):")
    print(result)
except IndexError as e:
    print(f"Float list index error (expected): {e}")
```

Output:

```
Single float index error (expected): Unknown indexer 43.4 of type <class 'float'>
Indexing with float list succeeded (unexpected):
Empty DataFrameView
Columns: []
Index: [42, 62, 26, 54, 90, 78]
```
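The printed index suggests the floats were truncated toward zero rather than rounded: 42.85 became 42 and 78.73 became 78. The short NumPy check below compares both interpretations against the index from the output above. It only reproduces the arithmetic; it makes no claim about where inside anndata or pandas the conversion happens.

```python
import numpy as np

# Float "indices" from the report and the obs index printed in the output
float_indices = [42.85256014, 62.04391223, 26.08972756, 54.38563822, 90.45806554, 78.73412668]
observed = [42, 62, 26, 54, 90, 78]

# astype(int) truncates toward zero; np.rint rounds to the nearest integer
truncated = np.asarray(float_indices).astype(int).tolist()
rounded = np.rint(float_indices).astype(int).tolist()

print(truncated == observed)  # True
print(rounded == observed)    # False: rounding gives 43 and 79 for the first and last entries
```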

Versions


```
anndata 0.10.9
numpy 1.26.4
pandas 2.2.3
session_info 1.0.0
torch 2.5.0+cu124
tqdm 4.66.5
...
Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0]
Linux-5.10.226-214.880.amzn2.x86_64-x86_64-with-glibc2.36
```

@ilan-gold
Contributor

ilan-gold commented Nov 4, 2024

Hmm, @yubin-ai, I am not sure this is unexpected, though it is perhaps strange. For example:

```python
import anndata as ad
import pandas as pd

adata = ad.AnnData(obs=pd.DataFrame({'a': ['c', 'b']}, index=[1.2, 1.3]))
adata[[1.2]]
```

works, and should work, but

```python
adata[1.2]
```

would error. In general, I think floating-point numbers as an index are probably fine, but this also raises a question of ambiguity with integers: if someone has an integer index, how should we interpret those integers? pandas has mechanisms for disambiguating the purpose of an indexing object, i.e. whether it is label-based (as above) or positional.
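In pandas the two readings are separated explicitly: `.loc` treats the indexer as labels and `.iloc` treats it as positions. The sketch below uses small, made-up `Series` objects (not AnnData) to show that the same `[0]` selects different rows depending on the accessor, and that a float index works under `.loc` as in the example above.

```python
import pandas as pd

# Integer labels that deliberately do not match the row positions
s = pd.Series(["a", "b", "c"], index=[2, 0, 1])

by_label = s.loc[[0]].tolist()      # label 0 -> "b"
by_position = s.iloc[[0]].tolist()  # position 0 -> "a"

# A float index is looked up by exact label under .loc
f = pd.Series(["c", "b"], index=[1.2, 1.3])
float_label = f.loc[[1.2]].tolist()  # label 1.2 -> "c"

print(by_label, by_position, float_label)  # ['b'] ['a'] ['c']
```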

@flying-sheep thoughts here?

@yubin-ai
Author

yubin-ai commented Nov 4, 2024

@ilan-gold Indeed, it’s strange. In your case, querying with floats makes sense because the index itself contains floats. In my example, however, none of the index labels are floats. It appears that AnnData (or perhaps just the underlying pandas DataFrame) truncates each float to an integer (42.85 becomes 42) and then selects the corresponding rows, which is concerning because it returns entries that don’t actually match the query. I may need to revise the title to describe this more clearly.

In my case, the float input was caused by an error on my end (I passed the wrong variable), and I didn’t catch it because the indexing handled it so smoothly. A quick assertion or error check confirming that the queried labels exist in `adata.obs.index` could help prevent issues like this.
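A minimal sketch of the check suggested above, written as a hypothetical helper `require_obs_labels` (not part of anndata). It assumes rows are being selected by label, looks each requested label up in an index such as `adata.obs_names`, and raises before any selection happens if some are missing.

```python
import pandas as pd


def require_obs_labels(index: pd.Index, labels) -> list:
    """Hypothetical helper (not part of anndata): return `labels` unchanged
    if every entry is present in `index`, otherwise raise KeyError.

    Failing here means a wrong variable passed as an indexer is reported
    instead of silently selecting unrelated rows.
    """
    labels = list(labels)
    missing = [lab for lab in labels if lab not in index]
    if missing:
        raise KeyError(f"{len(missing)} label(s) not found in index, e.g. {missing[:5]}")
    return labels


# Mirrors the default string obs_names "0", "1", ..., "99" of the report's object
obs_names = pd.Index([str(i) for i in range(100)])

ok = require_obs_labels(obs_names, ["42", "62"])

try:
    require_obs_labels(obs_names, [42.85256014, 62.04391223])
except KeyError as err:
    print(err)
```

With a real AnnData object the call would look like `adata[require_obs_labels(adata.obs_names, labels), :]`.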

@ilan-gold
Contributor

I agree, @yubin-ai. I misread your original print statement too, so I didn't catch that the floats were actually being downcast to integers. So yes, we should check for that.

@ilan-gold
Contributor

@yubin-ai, I have just been informed that we only accept string indices, so my example doesn't really apply. I think we should just error out, then. Thanks for the issue!
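Erroring out could look roughly like the dtype check below. `check_indexer_dtype` is a hypothetical helper written for illustration only; it is not taken from #1746 and does not reproduce anndata's internal index normalization. It accepts integer positions, boolean masks and string labels, and raises `IndexError` for floating-point arrays, matching the error already raised for a single float.

```python
import numpy as np


def check_indexer_dtype(indexer) -> np.ndarray:
    """Hypothetical helper (not anndata's API): coerce a list-like indexer
    to an array and reject floating-point values instead of truncating them."""
    arr = np.asarray(indexer)
    if arr.size == 0:
        # np.asarray([]) is float64; treat an empty selection as integer positions
        return arr.astype(int)
    if np.issubdtype(arr.dtype, np.floating):
        raise IndexError(
            f"Floating-point indexer (dtype {arr.dtype}) is not supported; "
            "use integer positions, a boolean mask, or string labels."
        )
    allowed = (np.integer, np.bool_, np.str_)
    if not any(np.issubdtype(arr.dtype, kind) for kind in allowed):
        raise IndexError(f"Unsupported indexer dtype {arr.dtype}")
    return arr


check_indexer_dtype([0, 5, 9])     # integer positions pass through
check_indexer_dtype(["42", "62"])  # string labels pass through
try:
    check_indexer_dtype([42.85256014, 62.04391223])
except IndexError as err:
    print(err)
```

Rejecting every float array, including whole-number values such as `[1.0, 2.0]`, is a deliberate simplification in this sketch; a real fix might choose to accept those.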

ilan-gold added this to the 0.11.1 milestone Nov 8, 2024
ilan-gold linked a pull request Nov 8, 2024 that will close this issue
@yubin-ai
Author

yubin-ai commented Nov 8, 2024

@ilan-gold, erroring out sounds great. Thanks a lot for working on it!

ilan-gold modified the milestones: 0.11.1, 0.11.2 Nov 12, 2024