Add counts layer automatically #2261

Open
1 task
mbuttner opened this issue May 23, 2022 · 8 comments
Labels
Area – API API design

Comments

@mbuttner
Contributor

mbuttner commented May 23, 2022

  • Additional function parameters / changed functionality / changed defaults?

Hi there,
It would be great if the counts layer were added automatically to the AnnData object when data is normalised. Could this be added to the sc.pp.normalize_total function?

...

@wubaosheng

Before normalization, you can do adata.layers['counts']=adata.X.copy() to add the counts of all genes to the layers.

@ivirshup
Member

@gtca has wanted this too (e.g. scverse/anndata#706)

@gtca, we've talked about implementing this, and what the API could look like. Did we write more than the referenced issue?

@ivirshup ivirshup added the Area – API API design label Jun 15, 2022
@gtca

gtca commented Jun 15, 2022

@ivirshup I don't think so, unless there's work towards scverse/anndata#244.

To follow the ideas in scverse/anndata#706, seems like the steps would be:

  • add an attribute ._X_layer to store which layer .X references;
  • use .X to reference .layers[._X_layer];
  • add in_layer= and out_layer= arguments to scanpy's .pp functions;
  • these functions will also alter ._X_layer.

The second to last point can actually be implemented irrespective of the AnnData change as in_layer=None will mean taking .X.
The question is, should we consider changing the defaults right away, e.g. in_layer="counts", out_layer="lognorm"?
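To make the proposal above concrete, here is a toy sketch of the four steps. It deliberately does not use AnnData: `LayeredMatrix`, its `_X_layer` attribute, and the `in_layer=`/`out_layer=` arguments below are all hypothetical stand-ins for an API that has only been discussed, not implemented.

```python
import numpy as np


class LayeredMatrix:
    """Toy stand-in for AnnData; every name here is hypothetical."""

    def __init__(self, counts, layer="counts"):
        self.layers = {layer: np.asarray(counts, dtype=float)}
        self._X_layer = layer  # which layer .X currently refers to

    @property
    def X(self):
        # .X is just a reference to .layers[._X_layer]
        return self.layers[self._X_layer]


def _resolve(data, in_layer, out_layer):
    # in_layer=None means "take .X"; out_layer=None means "overwrite the input layer"
    src = data._X_layer if in_layer is None else in_layer
    dst = src if out_layer is None else out_layer
    return src, dst


def normalize_total(data, target_sum=1e4, in_layer=None, out_layer=None):
    src, dst = _resolve(data, in_layer, out_layer)
    x = data.layers[src]
    # Division allocates a new array, so the source layer survives when dst != src
    data.layers[dst] = x / x.sum(axis=1, keepdims=True) * target_sum
    data._X_layer = dst  # the function also updates what .X points to


def log1p(data, in_layer=None, out_layer=None):
    src, dst = _resolve(data, in_layer, out_layer)
    data.layers[dst] = np.log1p(data.layers[src])
    data._X_layer = dst


data = LayeredMatrix([[1, 3], [2, 2]])
normalize_total(data, target_sum=1.0, out_layer="norm")
log1p(data, in_layer="norm", out_layer="lognorm")

print(data._X_layer)
print(sorted(data.layers))
```

With the defaults left at `None`, each step overwrites the layer that `.X` points to, which mirrors the current in-place behaviour; passing `out_layer=` is what preserves the earlier stages.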

@carmensandoval

carmensandoval commented Feb 2, 2023

Related question - is it necessary to do

adata.layers['counts']=adata.X.copy()

or is:

adata.layers['counts']=adata.X sufficient?

@gtca

gtca commented Feb 2, 2023

@carmensandoval, there's no implicit copying so one should make a copy explicitly.

Please see the following code for more details:

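import numpy as np
from anndata import AnnData
from jax import random

# Build a 100 x 10 AnnData object filled with random normal values
adata = AnnData(X=np.array(random.normal(random.PRNGKey(0), (100, 10))))

# id() values differ from run to run; only equality between them matters
print(id(adata.X))
# => 5393766064

# Plain assignment stores a reference to the same array; .copy() allocates a new one
adata.layers["X"] = adata.X
adata.layers["X_copy"] = adata.X.copy()

for layer in ("X", "X_copy"):
    print(f"{layer}: ", id(adata.layers[layer]))
# => X:  5393766064
# => X_copy:  5393773552

print(adata.X[0, 0])
# => -1.5721827

# Modifying .X in place also changes the aliased layer, but not the copy
adata.X[0, 0] = 0.0
for layer in ("X", "X_copy"):
    print(f"{layer}: ", adata.layers[layer][0, 0])
# => X:  0.0
# => X_copy:  -1.5721827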

@cdpolt

cdpolt commented Apr 22, 2024

So far as I can tell, any further downstream operations also act on layers... so it is not useful to store raw counts there since they will just be modified by counts normalization, log normalization, etc. Storing things in layers sequentially, I just end up with a bunch of layers that are all identically fully processed rather than preserving the raw-er aspect of the counts matrix. Not sure if this is new behavior, but it is super frustrating.

@gtca

gtca commented Apr 22, 2024

@cdpolt, is there a specific change ("new behavior") you're referring to?

"Storing things in layers sequentially, I just end up with a bunch of layers that all are identically fully processed"

Would the code in the previous message be helpful to understand why that happens and how to fix that?

@cdpolt

cdpolt commented Apr 22, 2024

@gtca Yes I now see the point about explicit copying preventing the further modification, thanks, that's perfect


6 participants