[RFC] Make .X a layer #706
Closed
Probably makes sense to implement this before we move to supporting Awkward Arrays (or fwiw other data types) in ...
I was wondering if it would make sense that ...
This is an implementation proposal draft for unifying `.X` and layers as mentioned e.g. in #244. Expanding on previous issues, this one tracks concrete implementation details for achieving the end goal.
At this stage, feedback on the topic is appreciated, either here or in #244.
`.X` attribute as a reference
The proposal is to use `adata.X` as a reference to one of the layers. In the text of the proposal below, `*.X` or `*adata.X` will denote the data in the layer that `adata.X` points to (akin to pointer dereferencing).
Benefits
Some benefits include:
`.X` and `.layers` items store conceptually identical entities. `adata.X` will just point to one of them instead of containing its copy.
Options
1. Explicit layer name
`.X` links to an explicit layer. This reference can be changed by a function that is run on an AnnData object. E.g. a normalisation function can create a new layer and update the reference instead of modifying the values in-place. In-place APIs will continue to modify the values in `*.X`.
When `*.X` is deleted, `.X` can be either `None` or, less favourably, point to the layer that was linked previously, provided there is a memory of which layers `.X` pointed to before.
2. Last layer
`.X` can link to `adata.layers[list(adata.layers.keys())[-1]]`. This is easier to reason about but might be an issue when the layers do not follow an explicit order (such as raw counts -> normalised counts -> scaled counts) and new layers are added (e.g. unspliced counts).
Additional methods
There should be a way to explicitly change which layer `.X` points to.
Alternatives
An alternative, as mentioned in #244, would be to have `.X` as a specific layer. While this is simpler in the short term, it might not provide some of the benefits listed above.
Effect for the end user
Taking the current scanpy API as an example where the matrix is modified in-place, nothing changes for the end user:
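The behaviour described here can be sketched without scanpy. The following is a minimal illustration, not the actual AnnData implementation: `ToyAnnData`, `set_X` and `scale_inplace` are hypothetical names introduced only for this example, and plain Python lists stand in for matrices. It assumes option 1 (an explicit layer name) and shows that an in-place operation on `adata.X` also updates the layer it points to, so existing in-place code keeps working.

```python
class ToyAnnData:
    """Hypothetical stand-in for AnnData: .X is a named reference into .layers."""

    def __init__(self, layers, x_key):
        self.layers = dict(layers)  # layer name -> matrix (list of rows)
        self._x_key = x_key         # name of the layer that .X points to

    @property
    def X(self):
        # "Dereference" the pointer; None if nothing is linked or the layer is gone.
        if self._x_key is None:
            return None
        return self.layers.get(self._x_key)

    def set_X(self, key):
        # Explicitly change which layer .X points to.
        if key not in self.layers:
            raise KeyError(key)
        self._x_key = key


def scale_inplace(adata, factor):
    # In-place API: mutates the rows of *adata.X directly.
    for row in adata.X:
        for j, value in enumerate(row):
            row[j] = value * factor


adata = ToyAnnData({"counts": [[1.0, 2.0], [3.0, 4.0]]}, x_key="counts")
scale_inplace(adata, 10.0)

# .X and the layer are the same object, so no copy is stored.
same_object = adata.X is adata.layers["counts"]
scaled = [list(row) for row in adata.X]

# Deleting the linked layer leaves .X as None in this sketch.
del adata.layers["counts"]
x_after_delete = adata.X
```

Returning `None` after deletion corresponds to the first of the two behaviours discussed under option 1; remembering the previously linked layer would need an extra history list.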
On disk
For HDF5, a non-breaking change should be achievable with the HDF5 references functionality. It remains to be checked that the R (`rhdf5`, `hdf5r`) and Julia HDF5 libraries are able to work with references.
For Zarr, feedback is needed on whether this is possible. If it is not, that is probably not an issue, as Zarr storage for AnnData is less mature/common.
Potential
This is to be expanded in other issues to come.
Idempotent analysis API
In practice, the scaling example above typically looks like this together with its context:
This feature would allow implementing idempotent APIs, i.e. APIs that would not modify counts and, importantly, would not affect the end-user API, provided that appropriate default arguments for `in_layer` and `out_layer` are chosen.