
ad.merge function #658

Open
ivirshup opened this issue Dec 6, 2021 · 3 comments

@ivirshup
Member

ivirshup commented Dec 6, 2021

I would like to add an anndata.merge function, with functionality similar to xarray.merge.

Example

>>> expr_adata
AnnData object with n_obs × n_vars = 3000 × 15000
    layers: 'counts'
>>> pca_adata
AnnData object with n_obs × n_vars = 3000 × 15000
    uns: 'pca'
    obsm: 'X_pca'
    varm: 'PCs'
>>> ad.merge([expr_adata, pca_adata])
AnnData object with n_obs × n_vars = 3000 × 15000
    layers: 'counts'
    uns: 'pca'
    obsm: 'X_pca'
    varm: 'PCs'

Use cases

Partial-AnnDatas returned from functions

Many scanpy functions take an AnnData object, produce a number of elements, and add them back to the original object. We could instead produce a new object that holds only the new elements, then ad.merge the results together. On its own, this is exactly equivalent, but the refactoring would enable a few new uses.

Instead of updating the original, we could keep the results separate. This would be useful for generating multiple parameterizations, or for having a lightweight object to pass to further functions, as opposed to mutating or copying the whole original object.
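To make the pattern concrete, here is a minimal sketch in plain Python. It does not use anndata at all: `Partial`, `merge`, and `toy_pca` are hypothetical stand-ins, and the conflict policy (raise on a duplicate key, require identical axes) is an assumption, since the proposal does not fix one.

```python
from dataclasses import dataclass, field


@dataclass
class Partial:
    """Toy stand-in for an AnnData-like container (not the real class)."""
    obs_names: list
    var_names: list
    layers: dict = field(default_factory=dict)
    obsm: dict = field(default_factory=dict)
    varm: dict = field(default_factory=dict)
    uns: dict = field(default_factory=dict)


SLOTS = ("layers", "obsm", "varm", "uns")


def merge(parts):
    """Combine containers that share identical axes and hold disjoint keys."""
    first = parts[0]
    out = Partial(list(first.obs_names), list(first.var_names))
    for part in parts:
        if part.obs_names != first.obs_names or part.var_names != first.var_names:
            raise ValueError("axes differ; align the objects first")
        for slot in SLOTS:
            target = getattr(out, slot)
            for key, value in getattr(part, slot).items():
                if key in target:  # assumed policy: never overwrite silently
                    raise ValueError(f"duplicate key {key!r} in {slot}")
                target[key] = value
    return out


def toy_pca(part):
    """Hypothetical analysis step that returns only the elements it computed."""
    n_obs, n_vars = len(part.obs_names), len(part.var_names)
    return Partial(
        part.obs_names,
        part.var_names,
        obsm={"X_pca": [[0.0, 0.0] for _ in range(n_obs)]},   # placeholder values
        varm={"PCs": [[0.0, 0.0] for _ in range(n_vars)]},    # placeholder values
        uns={"pca": {"n_comps": 2}},
    )


expr = Partial(["c1", "c2", "c3"], ["g1", "g2"],
               layers={"counts": [[1, 0], [0, 2], [3, 1]]})
pca = toy_pca(expr)           # lightweight result; expr is left untouched
merged = merge([expr, pca])   # combine only when the full object is wanted
print(sorted(merged.layers), sorted(merged.obsm), sorted(merged.varm), sorted(merged.uns))
```

Because `toy_pca` never touches its input, several parameterizations could be kept side by side as small objects and merged into the expression data only when needed.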

Separating parts of analyses

We might want to keep elements from annotation or analysis separate until we need them. For example, we could avoid keeping the large arrays in layers for a velocity analysis until we actually want them.

scirpy

Scirpy has a function for doing this specifically with immune receptor data: scirpy.pp.merge_with_ir. This would be a more general case. The IR AnnData here is a bit like the "partial-AnnDatas" discussed above.

(please let me know if this isn't the case @grst)

Previous discussion

This has been suggested and discussed in a number of places.

Requirements

This would require full support for adata.X = None (#467).

Implementing this would fit well with an anndata.align function (#531), which would take multiple AnnData objects and return them with their axes aligned, since the updates and reindexing are orthogonal.
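A rough sketch of how the two steps could compose, again in plain Python rather than anndata. The dict layout, the `align` and `merge` helpers, and the choice of an inner join on observation names are all assumptions made for illustration, and only one axis is modeled.

```python
def align(objs):
    """Restrict every object to the shared obs names, in the first object's order."""
    shared = set(objs[0]["obs_names"])
    for obj in objs[1:]:
        shared &= set(obj["obs_names"])
    order = [name for name in objs[0]["obs_names"] if name in shared]

    aligned = []
    for obj in objs:
        row_of = {name: i for i, name in enumerate(obj["obs_names"])}
        idx = [row_of[name] for name in order]
        aligned.append({
            "obs_names": list(order),
            # reindex every per-observation array with the same row selection
            "obsm": {key: [rows[i] for i in idx] for key, rows in obj["obsm"].items()},
        })
    return aligned


def merge(objs):
    """Combine already-aligned objects; no reindexing happens here."""
    names = objs[0]["obs_names"]
    out = {"obs_names": list(names), "obsm": {}}
    for obj in objs:
        if obj["obs_names"] != names:
            raise ValueError("objects are not aligned")
        for key, rows in obj["obsm"].items():
            if key in out["obsm"]:
                raise ValueError(f"duplicate key {key!r}")
            out["obsm"][key] = rows
    return out


a = {"obs_names": ["c1", "c2", "c3"], "obsm": {"X_pca": [[1.0], [2.0], [3.0]]}}
b = {"obs_names": ["c3", "c1"], "obsm": {"X_umap": [[30.0], [10.0]]}}

merged = merge(align([a, b]))
print(merged["obs_names"])        # ['c1', 'c3']
print(merged["obsm"]["X_pca"])    # [[1.0], [3.0]]
print(merged["obsm"]["X_umap"])   # [[10.0], [30.0]]
```

Keeping the reindexing in `align` means `merge` only has to check that the axes already match and then combine keys.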

@ivirshup
Member Author

sgkit does something similar to the "Partial-AnnDatas returned from functions" idea, and controls whether a partial or an updated object is returned with the merge kwarg.
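As a sketch of what such a keyword could look like, the toy function below returns either only the new elements or a combined copy. The dict layout and the `total_counts` function are hypothetical, and the default of `merge=True` is an assumption here rather than a proposal for anndata or scanpy.

```python
def total_counts(data, merge=True):
    """Hypothetical analysis step with an sgkit-style `merge` keyword."""
    counts = data["elements"]["counts"]
    new = {
        "obs_names": list(data["obs_names"]),
        "elements": {"total_counts": [sum(row) for row in counts]},
    }
    if not merge:
        return new  # partial result: only what this step computed

    combined = dict(data["elements"])  # shallow copy, so the input is not mutated
    combined.update(new["elements"])
    return {"obs_names": list(data["obs_names"]), "elements": combined}


data = {
    "obs_names": ["c1", "c2", "c3"],
    "elements": {"counts": [[1, 0], [0, 2], [3, 1]]},
}

partial = total_counts(data, merge=False)
full = total_counts(data)

print(sorted(partial["elements"]))       # ['total_counts']
print(sorted(full["elements"]))          # ['counts', 'total_counts']
print(full["elements"]["total_counts"])  # [1, 2, 4]
```

With `merge=False` the caller gets the lightweight partial object and can decide later whether to combine it with the original.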

@grst
Contributor

grst commented Jan 29, 2023

Just noting that the scirpy use-case is obsolete in favor of MuData after scverse/scirpy#356 gets merged.

@pcm32

pcm32 commented May 8, 2024

This would be very nice to have; we produce a lot of heavy objects redundantly due to parametrisation of runs.

@ivirshup ivirshup modified the milestones: 0.11.0, 0.12.0 Aug 8, 2024