Decision: Could .X = None? #464

ivirshup · 2020-11-24T08:09:23Z

I'd like X to be optional. I mean, it already is, but apparently more thorough testing is needed to verify that it works properly.

What is the opinion of the major anndata developers? 👍 or 👎?

If we agree X should be able to be None, we can use this PR to add tests and fix the code where necessary.

Originally posted by @rcannood in #463 (comment)

The text was updated successfully, but these errors were encountered:

ivirshup · 2020-11-24T09:02:13Z

Some additional context

This has come up previously in #330. It seems that at some point this may have worked, but it's not covered in many tests.

Why

If X is just another layer, why not? It would be a more flexible model. If X is optional, the only truly necessary attributes would be obs_names and var_names.

As mentioned by @rcannood (#463 (comment)), you could leave X out of the saved data – storing just annotations. Now you could store different sets of annotations separately, then merge the elements you want together (#441 (comment)).

I think this could be a nice way to structure the output of operations when inplace=None. It could return an object with a similar structure, but only the resulting set of elements. The end of sc.pp.neighbors could look like:

result = AnnData(
    obs=input.obs[[]],  # Just the index
    var = input.var[[]],
    obsp={"distances": distances, "connectivities": connectivities},
    uns={"neighbors": metadata}
)
if inplace:
    input.update(result)
else:
    return result

Why not

I think most code assumes that X exists. That's certainly true for some code in anndata. There would probably need to be updates in scanpy to handle this case as well.

My main worry is that this touches the X property, and the AnnData initializer, which I think are the roughest parts of the codebase.

Current status

What currently works

You can currently instantiate an AnnData object with X = None (relevant code).

a = AnnData(shape=(10, 20))

This can be read and written.

For Raw, passing X=None seems to mean X is either backed, or should be copied from the parent. This constructor is pretty weird though.

What doesn't

Copying and raw for sure. adata.X = None and del adata.X error as well, but the linked PR has fixes. Just search the codebase for .X and you'll find code that doesn't handle this case (e.g. concat, .chunk_X, .transpose). For elements in layers, this is handled by throwing key errors (when a key is specified) or iterating over the extant elements.

While it turns out we do have some tests around this case, we don't have much. I think we should be able to parameterize existing tests around this though.

Miscellaneous thoughts

This would be a great time to drop dtype, since it doesn't really mean anything if X is None Remove dtype argument from AnnData constructor #434

Ping: @Koncopd @gokceneraslan @falexwolf @flying-sheep

flying-sheep · 2020-11-24T09:16:20Z

I’m in favor. I agree that since X is just the default layer, that means that the list of layers should also be able to be empty.

For Raw, we can use the empty pattern to take the place of the default, right? With None just meaning None:

# utils.py
class Empty(Enum):
    token = auto()

# _core/raw.py
def __init__(
    self,
    ...
    X: Union[np.ndarray, sparse.spmatrix, None, Empty] = Empty.token,
    ...
): ...

Regarding initializers: I think it’s really hard to write them well, as you can’t depend on most things being there yet, so most assumptions in methods are broken, and one needs to be clever, arrange things carefully, and probably still repeat some code. The only fix I can come up with is “composition instead of inheritance”: Break an object down into independent parts which are created individually and then combined. Python isn’t the ideal language for it.

ivirshup · 2020-11-25T06:28:13Z

@flying-sheep

I agree that since X is just the default layer, that means that the list of layers should also be able to be empty.

Unless we actually store X in layers and have an attribute saying which layer corresponds to X, I think X will remain special. That would be a big API change, though could be closer to what SingleCellExperiment and loom do. It would also make backed modification harder.

For Raw, we can use the empty pattern to take the place of the default, right? With None just meaning None:

Isn't None sufficient here? Backed mode can be handled the same way as AnnData, e.g. an .is_backed property. I think the issue with raw is more that it's initializer is broken. See #449 for some examples. Basically it assumes the only way it's going to be called by a user is adata.raw = adata. In most other cases, it's going to do the wrong thing.

Regarding initializers: I think it’s really hard to write them well, as you can’t depend on most things being there yet.

I think we also have an issue of handling too many cases in one function right now. I think it would be reasonable to split out backed initialization into a seperate function, as well as I also think this could even make this even more simple. After setting .obs_names and .var_names, you've got shape. The rest of the attributes can just be assigned through their respective setter methods.

LuckyMD · 2020-11-25T10:28:54Z

Another argument for adata.X being allowed to be None is that there was just a preprint posted to BioRxiv about insufficient cellular metadata being stored in GEO for scRNA-seq submissions. AnnData might be a good standard format for metadata storage also in GEO?

ivirshup · 2020-11-26T07:10:49Z

There has also been broad agreement with this on slack (@rcannood did you want to be invited to that btw?). I'd be happy with aiming for this in 0.8.

@rcannood, how much of this were you interested in helping with?

Of the top of my head, these are the features we'll need check:

Getter, setter, deleter
Initializer (seems to work, needs testing)
Views (basically just check that .X = None doesn't change other view behavior, a[idx].X doesn't do anything weird, and copying views still works)
_subset_inplace
Copying
Backed mode
Concatenation
Transpose
Methods which default to values from X, e.g. obs_vector, to_df.
IO (seems to work, needs testing)
- Maybe specialized writers like to_csvs will need some attention here?
Raw (pretty similar set of things to be checked)
- Initializer
- Backed
- Copies
- Concatenation
- IO

It's a long list, but I'm being granular. These should each be quite small tasks. Ideally many of these can be covered by a refactor of the tests.

ivirshup · 2020-11-26T07:11:47Z

@LuckyMD I would be thrilled if GEO started having some standards for reporting beyond data being in SRA.

rcannood · 2020-11-26T07:56:09Z

I can take a crack at it, as long as it's reviewed by a core anndata developer :) I already have small changes in #463. Do you want to create a separate feature/x_is_none branch for this? Or do we continue from the PR I created earlier?

ivirshup · 2020-11-27T03:28:04Z

That sounds great!

Do you want to create a separate feature/x_is_none branch for this?

I was thinking this should be split into multiple PRs, since many of these are orthogonal. Plus it's much easier to review and merge small PRs. Let's keep that PR's scope to just fixing up the .X property.

I'm gonna mark this as resolved and start tracking what needs to be done in #467

ivirshup added enhancement question labels Nov 24, 2020

ivirshup added this to the 0.8 milestone Nov 25, 2020

rcannood mentioned this issue Nov 26, 2020

Allow del adata.X, adata.X = None #463

Merged

22 tasks

ivirshup closed this as completed Nov 27, 2020

tomouellette mentioned this issue Sep 8, 2023

Assigning to X after initializing as NoneType #1124

Open

rcannood mentioned this issue Sep 10, 2024

[REVIEW]: anndata: Access and store annotated data matrices openjournals/joss-reviews#4371

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decision: Could .X = None? #464

Decision: Could .X = None? #464

ivirshup commented Nov 24, 2020 •

edited

Loading

ivirshup commented Nov 24, 2020 •

edited

Loading

flying-sheep commented Nov 24, 2020 •

edited

Loading

ivirshup commented Nov 25, 2020

LuckyMD commented Nov 25, 2020 •

edited

Loading

ivirshup commented Nov 26, 2020

ivirshup commented Nov 26, 2020

rcannood commented Nov 26, 2020 •

edited

Loading

ivirshup commented Nov 27, 2020 •

edited

Loading

Decision: Could .X = None? #464

Decision: Could .X = None? #464

Comments

ivirshup commented Nov 24, 2020 • edited Loading

ivirshup commented Nov 24, 2020 • edited Loading

Some additional context

Why

Why not

Current status

What currently works

What doesn't

Miscellaneous thoughts

flying-sheep commented Nov 24, 2020 • edited Loading

ivirshup commented Nov 25, 2020

LuckyMD commented Nov 25, 2020 • edited Loading

ivirshup commented Nov 26, 2020

ivirshup commented Nov 26, 2020

rcannood commented Nov 26, 2020 • edited Loading

ivirshup commented Nov 27, 2020 • edited Loading

ivirshup commented Nov 24, 2020 •

edited

Loading

ivirshup commented Nov 24, 2020 •

edited

Loading

flying-sheep commented Nov 24, 2020 •

edited

Loading

LuckyMD commented Nov 25, 2020 •

edited

Loading

rcannood commented Nov 26, 2020 •

edited

Loading

ivirshup commented Nov 27, 2020 •

edited

Loading