Remove inplace argument in scanpy 2.0 #2583

grst · 2023-08-01T19:23:56Z

What kind of feature would you like to request?

Other?

Please describe your wishes

Another point that I'd like to throw into the scanpy 2.0 discussion:

Right now, many functions have the inplace argument, that determines if a function should write back to adata or return the result instead.

With this behavior it is hard to make correct type hints. While it is possible with @overload, it is cumbersome because it requires to type out the entire function signature twice. When I asked if it is possible to write these overloads in a more concise way on stackoverflow several users argued that changing the return type based on an argument is an anti-pattern, and I think they convinced me.

Alternative approach

Have two API levels, e.g.

def scanpy.tl.pca(adata: AnnData, **kwargs) -> None: ...

and

def scanpy.lowlevel.tl.pca(data: np.ndarray | sp.spmatrix, n_pcs) -> np.ndarray: ...

Where the former is a wrapper for the latter. This allows to separate the implementation of the actual method using only numpy/scipy data types from the scverse-specific behavior.

The text was updated successfully, but these errors were encountered:

grst · 2023-08-03T15:28:59Z

@flying-sheep suggested to write a hatch-plugin that can automatically create the respective @overload type hints

ivirshup · 2023-08-03T15:37:27Z

Notes from discussion with @Intron7, @flying-sheep, @gtca, and @grst at hackathon:

Updated idea from @gtca, based on: [RFC] Make .X a layer anndata#706
- Use layer_to, layer_fromas argument. Has possibility to still do operations inplace on arrays if you only pass layer_from
- Could be useful to have semantically meaningful default arguments e.g. layers_from="counts"``layers_to="normalized"
Returning a new AnnData object with only new arrays could be a flexible base, as discussed in ad.merge function anndata#658
- inplace=False returning function specific types (sometimes an array, sometimes a dict of arrays) is bad. This would be an alternative.
Being able to update arrays inplace is still important for memory usage
Lots of discussion of when/ how we want to modify the AnnData

flying-sheep · 2023-08-03T17:18:24Z

Unformatted notes by me:

Behaviors that exist for inplace/copy:

update AnnData in place (where appropriate, choose target layer, obsm[key], …)
leave original AnnData alone, return
- new AnnData
- newly created array

inplace=False/copy=True returning array instead of whole object (AnnData) is confusing, but is sometimes done.
but having a choice to return the array makes sense.

grst · 2023-08-11T07:31:22Z

What if we had copy-on-write behavior for AnnData? Then we could never modify AnnData inplace but always return a view of an AnnData with references to the objects that were unchanged and only the new data added.

Pandas seems to be going that route: https://github.com/pandas-dev/pandas/blob/57390ada100466dac777e5b66d5a4f2a72700c38/web/pandas/pdeps/0008-inplace-methods-in-pandas.md (HT @bernheder)

flying-sheep · 2023-08-11T07:51:30Z

Our views are a limited version of COW (_init_as_actual).

If I understand the idea correctly, I’d be for it if we could guarantee that all nested data structures delegate modification to us. I don’t know if that’s possible with things like dask arrays and (at the time of writing) pandas data frames.

gtca · 2023-08-11T11:10:54Z

@flying-sheep This guarantee seems hard to ask for though?

@grst It would be great but it's unclear how to do that properly for a container-like data structure such as AnnData. E.g. even for data frames for value changes that document already states the following:

In contrast, the inplace parameter will be kept for any methods that only modify the underlying data of a pandas object.

grst · 2023-08-11T11:23:49Z

@flying-sheep This guarantee seems hard to ask for though?

Agree it's challenging, but maybe not impossible

@grst It would be great but it's unclear how to do that properly for a container-like data structure such as AnnData.

A couple of examples how I think this would look like:

sc.tl.pca adds a key to .varm, to .obsm and to .uns. We create a new AnnData object where every layer, every element of varm, every element of .obsm, .X, .obs, and .var are references to the objects already present in the previous AnnData object. There's now just additional keys in .obsm and .varm pointing to the new objects.
assign a new column in obs: adata.obs["key"] = value. Create a new AnnData object where all elements are pointers to the objects that already exist. Only .obs becomes a copy of the previous obs data frame (for objects that support copy on write themselves, like pandas does (or will?), the behavior could be delegated and not even the entire data frame would need to be copied.
modify a value in X: adata.X[20, 25] = 42. Again everything would be pointers to the old objects, but X needs to be copied.

grst added Enhancement ✨ Area – API API design labels Aug 1, 2023

grst added this to 2023-08-03 Scanpy tasks for Core Hackathon Aug 1, 2023

grst moved this to Meta in 2023-08-03 Scanpy tasks for Core Hackathon Aug 1, 2023

flying-sheep mentioned this issue Feb 23, 2024

log1p return matrix #2881

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove inplace argument in scanpy 2.0 #2583

Remove inplace argument in scanpy 2.0 #2583

grst commented Aug 1, 2023 •

edited by flying-sheep

Loading

grst commented Aug 3, 2023

ivirshup commented Aug 3, 2023

flying-sheep commented Aug 3, 2023 •

edited

Loading

grst commented Aug 11, 2023

flying-sheep commented Aug 11, 2023

gtca commented Aug 11, 2023 •

edited

Loading

grst commented Aug 11, 2023

Remove inplace argument in scanpy 2.0 #2583

Remove inplace argument in scanpy 2.0 #2583

Comments

grst commented Aug 1, 2023 • edited by flying-sheep Loading

What kind of feature would you like to request?

Please describe your wishes

Alternative approach

grst commented Aug 3, 2023

ivirshup commented Aug 3, 2023

flying-sheep commented Aug 3, 2023 • edited Loading

grst commented Aug 11, 2023

flying-sheep commented Aug 11, 2023

gtca commented Aug 11, 2023 • edited Loading

grst commented Aug 11, 2023

grst commented Aug 1, 2023 •

edited by flying-sheep

Loading

flying-sheep commented Aug 3, 2023 •

edited

Loading

gtca commented Aug 11, 2023 •

edited

Loading