Merging AnnData.obs object with Pandas DataFrame may fail #441

Closed
WeilerP opened this issue Nov 3, 2020 · 5 comments

Comments

@WeilerP
Contributor

WeilerP commented Nov 3, 2020

Description

Given an AnnData object adata, it is currently not possible to merge adata.obs with a Pandas DataFrame if the resulting index differs from the original adata.obs_names.

For example (using anndata=0.7.4),

import numpy as np
import pandas as pd
from anndata import AnnData

adata = AnnData(
    pd.DataFrame(
        np.random.randint(0, 10, (3, 3)),
        index=["barcode_1", "barcode_2", "barcode_3"],
        columns=["gene_1", "gene_2", "gene_3"]
    )
)

df = pd.DataFrame(index=["barcode_1", "barcode_2"])

adata.obs = adata.obs.merge(df, left_index=True, right_index=True, how="inner")

throws the ValueError

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_core/anndata.py", line 834, in obs
    self._set_dim_df(value, "obs")
  File "/opt/anaconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_core/anndata.py", line 783, in _set_dim_df
    value_idx = self._prep_dim_index(value.index, attr)
  File "/opt/anaconda3/envs/anndata_bug/lib/python3.8/site-packages/anndata/_core/anndata.py", line 795, in _prep_dim_index
    raise ValueError(
ValueError: Length of passed value for obs_names is 2, but this AnnData has shape: (3, 3)

Is this intended or is it planned to allow merges of this kind in the future? Is there an easier workaround than subsetting both the DataFrame and AnnData prior to merging?
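To make the length mismatch concrete, here is a minimal pandas-only sketch. It assumes a plain DataFrame `obs` standing in for `adata.obs` and an illustrative `label` column that is not part of the original report. An inner join drops the unmatched observation, whereas a left join keeps every row of the original index, so the result should stay shape-compatible with the AnnData object:

```python
import pandas as pd

# Stand-in for adata.obs: three observations, no annotation columns yet.
obs = pd.DataFrame(index=["barcode_1", "barcode_2", "barcode_3"])

# Annotations for only two of the three observations ("label" is illustrative).
df = pd.DataFrame({"label": ["a", "b"]}, index=["barcode_1", "barcode_2"])

# Inner join: barcode_3 is dropped, so the result no longer matches obs_names.
inner = obs.merge(df, left_index=True, right_index=True, how="inner")

# Left join: all three observations are kept; barcode_3 gets NaN in "label".
left = obs.merge(df, left_index=True, right_index=True, how="left")

print(len(inner))  # 2
print(len(left))   # 3
```

If dropping observations is not actually required, assigning the left-joined frame back to `adata.obs` avoids changing the shape at all.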

@ivirshup
Member

ivirshup commented Nov 9, 2020

I don't think we'll allow assignment to the .obs attribute to change the shape of the AnnData object, no.

@ivirshup
Member

ivirshup commented Nov 9, 2020

I think it could be useful for this kind of thing to be easier. Not sure what the right api is for it though.

@WeilerP
Contributor Author

WeilerP commented Nov 19, 2020

How about a new AnnData method, merge_obs, which basically executes the following steps:

df, adata_obs = df.align(adata.obs, join="inner", axis=0)
adata = adata[adata_obs.index, :]
adata.obs = adata.obs.merge(df, left_index=True, right_index=True)

The arguments to tweak Pandas align and merge should be arguments of the method.

@ivirshup
Member

ivirshup commented Nov 20, 2020

How about something more like xarray.merge? This could reuse code and arguments from anndata.concat. Usage could look like:

adata = anndata.merge([adata, {"obs": df, "obsm": {...}}], join="inner", strategy="first")

One issue is that this doesn't mutate the AnnData object, but there could be an inplace variant that just didn't allow outer joins.


This could also lead to adding reindex and reindex_like methods.

@ivirshup
Member

ivirshup commented Dec 6, 2021

I'm going to close this in favor of #658, where I've fleshed the idea out a bit more, so that the issue can focus on what would actually be implemented.

@ivirshup ivirshup closed this as completed Dec 6, 2021