Selecting rows in AnnData #8

tischi · 2022-07-14T12:05:11Z

Hi,

This is more a discussion than an issue. Please let me know if we should move that somewhere else.

Let me start with a question: Do you have some mechanism to select a couple of rows in an AnnData object? If yes, how do you represent this in terms of python code / data structure? For example, is it just a list with the selected row indices or something else?


LucaMarconato · 2022-07-14T12:32:20Z

For the moment I would just create a new object that is a subset of the original one. This can be stored inside the same SpatialData object (or in a new one) and can be plotted together with the rest of the building blocks (like large images).

In the case of very large feature tables, or point tables, one may take advantage of something like an AnnData view. When implementing the backing system we need to take this into account.

tischi · 2022-07-14T12:46:19Z

of something like an AnnData view

In MoBIE, we have a concept of selected annotations, and we can also serialize them to a JSON file, which lives outside AnnData. The JSON file contains Strings, each of which uniquely identifies one row in the table (AnnData). One issue is that such an identifier sometimes comprises several columns. For example, to uniquely identify an image segment we need: (String) imageId, (int) timePoint, (int) labelId (because we allow that the same labelId MAY occur several times if it annotates different time points in the image).

Any thoughts/comments about this?

LucaMarconato · 2022-07-14T12:57:26Z

Ok, I understand. If the selected entities are rows of feature tables, possibly coming from different tables, then we can have an in-memory and serializable representation by creating a global table and adding a boolean column in obsm (this is similar to Squidpy's approach, for example when selecting Visium spots from an interactive napari session). These selected rows are not the selected regions/cells themselves, but we can easily retrieve and plot that information since feature tables are linked to regions.
On the other hand, we currently don't have a way to represent a serializable selection of points (we will only have an in-memory representation, useful for operations like accumulation of points into regions).

tischi · 2022-07-14T13:06:38Z

On the other hand, we currently don't have a way to represent a serializable selection of points (we will only have an in-memory representation, useful for operations like accumulation of points into regions).

Serialization is one thing.

Another related thing (in MoBIE) is that we have a text field (UI) where a user can type "a unique ID" in order to manually select a region/segment.
This can be useful for collaborative work, e.g. for telling another scientist: "Hey, you should really look at that region". For this (I think) we may need a String representation of a region?!

...or we could simply go for the rowIndex in the table. I think we decided against this because we thought this is less robust?!
@constantinpape, do you remember?

tischi · 2022-07-14T13:10:34Z

Sorry, brainstorm mode...

I do think rowIndex is maybe less flexible, because one may combine tables into bigger tables, and then the rowIndex would change.
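
To make the concern about rowIndex concrete, here is a minimal plain-Python sketch (no AnnData involved). The tables, the `segment_id` column name, and the ID values are all made up for illustration; the point is only that a stored row index silently points at a different row after two tables are combined, while a stored ID still resolves to the intended row.

```python
# Two small annotation tables; "segment_id" is a hypothetical stable identifier.
table_a = [
    {"segment_id": "img1;0;1", "area": 10.0},
    {"segment_id": "img1;0;2", "area": 12.5},
]
table_b = [
    {"segment_id": "img2;0;1", "area": 7.0},
    {"segment_id": "img2;0;2", "area": 9.0},
]

# A selection made on table_b alone, stored in two different ways.
selected_row_index = 1
selected_id = table_b[selected_row_index]["segment_id"]

# Combining the tables shifts every row of table_b by len(table_a).
combined = table_a + table_b

# The stored index now resolves to a row that came from table_a.
row_by_index = combined[selected_row_index]
# The stored ID still resolves to the originally selected row.
row_by_id = next(row for row in combined if row["segment_id"] == selected_id)

print(row_by_index["segment_id"])  # img1;0;2
print(row_by_id["segment_id"])     # img2;0;2
```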

LucaMarconato · 2022-07-14T14:03:19Z

For “short term usage”, when consistency over time is not a major point, the index would work. A UUID-based approach (basically a name for each region) would make it possible to have persistent naming, even when merging tables, but if the user modifies the object (for example by applying a transformation that creates a new image), it would fail because the UUID would change. To overcome this, one could use the tuple (coordinates, coordinate space) to query which region covers that particular coordinate. This would be slower than an index or a UUID, but would allow for “long term usage”.
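
A rough sketch of the (coordinates, coordinate space) lookup, under strong simplifying assumptions: regions are modeled as axis-aligned boxes and the search is a linear scan. The `Region` class and `find_region` function are hypothetical names invented here, not part of any existing API; real regions would be polygons or label masks and would likely use a spatial index.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical region: an axis-aligned box in a named coordinate space."""
    name: str  # may change over time; not used for the lookup
    coordinate_space: str
    min_corner: Tuple[float, ...]
    max_corner: Tuple[float, ...]

    def contains(self, point: Sequence[float]) -> bool:
        # Assumes point has the same dimensionality as the box corners.
        return all(
            lo <= p <= hi
            for lo, p, hi in zip(self.min_corner, point, self.max_corner)
        )


def find_region(
    regions: Sequence[Region], point: Sequence[float], coordinate_space: str
) -> Optional[Region]:
    """Return the first region in the given space that covers the point."""
    for region in regions:
        if region.coordinate_space == coordinate_space and region.contains(point):
            return region
    return None


regions = [
    Region("cell_1", "global", (0.0, 0.0), (10.0, 10.0)),
    Region("cell_2", "global", (20.0, 0.0), (30.0, 10.0)),
]

hit = find_region(regions, (25.0, 4.0), "global")
print(hit.name if hit else None)  # cell_2
```

The lookup never touches the region name or the row index, so it keeps working after renaming or merging, at the cost of scanning the regions.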

tischi · 2022-07-18T08:07:45Z

FWIW that's the WIP in MoBIE:

public interface Location
{
	int timePoint();
	double[] anchor();
}

public interface Annotation extends Location
{
	String getId();
	Object getValue( String columnName );
	void setString( String columnName, String value );
}

public interface Segment extends Location
{
	String imageId();
	int labelId();

	static String toAnnotationId( String imageId, int timePoint, int labelId )
	{
		return ""+imageId+";"+timePoint+";"+labelId;
	}

	static String toAnnotationId( Segment segment )
	{
		return toAnnotationId( segment.imageId(), segment.timePoint(), segment.labelId() );
	}

	RealInterval boundingBox();
	void setBoundingBox( RealInterval boundingBox );

	float[] mesh();
	void setMesh( float[] mesh );
}

public interface AnnotatedSegment extends Segment, Annotation
{
	static String toAnnotationId( String imageId, int timePoint, int labelId )
	{
		return ""+imageId+";"+timePoint+";"+labelId;
	}

	static String toAnnotationId( Segment segment )
	{
		return toAnnotationId( segment.imageId(), segment.timePoint(), segment.labelId() );
	}
}

In other words, any Annotation MUST have an ID and a Location. It might be better to call it SpatialAnnotation, but I wanted to keep it short. The idea would be that no matter whether this is a Segment, Region, or Spot, it always MUST have an ID.

TBH, I am not sure I am getting it right yet, because now both Segment and Annotation extend Location; it seems like some conceptual fine-tuning still needs to be done...
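
For readers on the Python side, here is a small sketch of the same "imageId;timePoint;labelId" string ID, together with a JSON round trip of a selection. `SegmentKey`, `to_annotation_id`, and `from_annotation_id` are names made up for this illustration; in particular the parsing direction is an assumption and does not appear in the Java interfaces above.

```python
import json
from typing import NamedTuple


class SegmentKey(NamedTuple):
    """Hypothetical Python counterpart of the three fields that identify a segment."""
    image_id: str
    time_point: int
    label_id: int


def to_annotation_id(key: SegmentKey) -> str:
    # Same layout as the Java toAnnotationId: imageId;timePoint;labelId
    return f"{key.image_id};{key.time_point};{key.label_id}"


def from_annotation_id(annotation_id: str) -> SegmentKey:
    # Split from the right so that an image_id containing ';' still parses.
    image_id, time_point, label_id = annotation_id.rsplit(";", 2)
    return SegmentKey(image_id, int(time_point), int(label_id))


# The same labelId at two time points gives two distinct IDs.
selection = [SegmentKey("em-raw", 0, 17), SegmentKey("em-raw", 1, 17)]

serialized = json.dumps([to_annotation_id(key) for key in selection])
restored = [from_annotation_id(s) for s in json.loads(serialized)]

print(serialized)             # ["em-raw;0;17", "em-raw;1;17"]
print(restored == selection)  # True
```

Splitting from the right is a design choice for this sketch: the two trailing fields are integers and cannot contain the separator, so only the image ID is allowed to.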

LucaMarconato · 2024-07-09T14:58:11Z

Hi, here is a recap of the status of the framework with regard to this discussion. We are going through the old issues in the repo; closing the discussion.

We don't have an explicit concept of "selected annotations" that is saved on disk, nor plans to have such information in the storage format. This is because the user can achieve equivalent behavior by saving a categorical column to an annotation table. We also provide APIs that allow retrieving geometries based on a list of instances from the table (see this PR: #627).

LucaMarconato closed this as completed Jul 9, 2024
