Selecting rows in AnnData #8

Closed
tischi opened this issue Jul 14, 2022 · 8 comments

Comments

@tischi

tischi commented Jul 14, 2022

Hi,

This is more a discussion than an issue. Please let me know if we should move that somewhere else.

Let me start with a question: Do you have some mechanism to select a couple of rows in an AnnData object? If yes, how do you represent this in terms of python code / data structure? For example, is it just a list with the selected row indices or something else?

@LucaMarconato
Member

For the moment I would just create a new object that is a subset of the original one. This can be stored inside the same SpatialData object (or in a new one) and can be plotted together with the rest of the building blocks (like large images).

In the case of very large feature tables, or point tables, one may take advantage of something like an AnnData view. In implementing the backing system we need to take this into account.

@tischi
Author

tischi commented Jul 14, 2022

> of something like an AnnData view

In MoBIE, we have a concept of selected annotations, and we can also serialize them to a JSON file, which lives outside AnnData. The JSON file contains Strings, each of which uniquely identifies one row in the table (AnnData). An issue is that such an identifier sometimes comprises several columns. For example, to uniquely identify an image segment we need: (String) imageId, (int) timePoint, (int) labelId (because we allow that the same labelId MAY occur several times if it annotates different time points in the image).

Any thoughts/comments about this?
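
To make the composite identifier concrete, here is a minimal Python sketch of the idea described above: three columns are joined into one string, and the selection is written to and read back from JSON. The class name `SegmentKey`, the `;` separator, the JSON key `selectedAnnotationIds`, and the sample values are all assumptions made for illustration; they are not taken from MoBIE's actual file format.

```python
import json
from typing import NamedTuple


class SegmentKey(NamedTuple):
    """Hypothetical composite key: the three columns that identify one segment."""
    image_id: str
    time_point: int
    label_id: int

    def to_annotation_id(self) -> str:
        # Join the three columns into one string (separator is an assumption).
        return f"{self.image_id};{self.time_point};{self.label_id}"

    @classmethod
    def from_annotation_id(cls, annotation_id: str) -> "SegmentKey":
        # rsplit from the right so an image_id that contains ";" still parses.
        image_id, time_point, label_id = annotation_id.rsplit(";", 2)
        return cls(image_id, int(time_point), int(label_id))


# The same label_id at two time points yields two distinct identifiers.
selected = [SegmentKey("em_raw", 0, 17), SegmentKey("em_raw", 1, 17)]

# Serialize the selection to JSON, outside of the table itself.
payload = json.dumps(
    {"selectedAnnotationIds": [key.to_annotation_id() for key in selected]}
)

# Read it back and recover the three columns.
restored = [
    SegmentKey.from_annotation_id(text)
    for text in json.loads(payload)["selectedAnnotationIds"]
]
```

Keeping the key as a typed tuple in memory and flattening it to a string only at the serialization boundary avoids re-parsing strings whenever the individual columns are needed.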

@LucaMarconato
Member

Ok, I understand. If the selected entities are rows of feature tables, possibly coming from different tables, then we can have an in-memory and serializable representation by creating a global table and adding a boolean column in obsm (this is similar to Squidpy's approach when selecting Visium spots from an interactive napari session). These selected rows are not selection regions/cells themselves, but we can easily retrieve and plot that information since feature tables are linked to regions.
On the other hand, we currently don't have a way to represent a serializable selection of points (we will only have an in-memory representation, useful for operations like accumulation of points into regions).
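
As a rough illustration of the "global table plus boolean column" idea, the sketch below uses plain Python dictionaries rather than AnnData, so it says nothing about where the column would actually live in the real object. The table names, the `region_id` and `selected` column names, and the values are hypothetical.

```python
# Two hypothetical feature tables; each row is linked to a region by "region_id".
cells = [
    {"region_id": "cell_1", "area": 10.0},
    {"region_id": "cell_2", "area": 12.5},
]
nuclei = [
    {"region_id": "nucleus_1", "area": 3.2},
]

# Global table: concatenate all rows, remember the source table,
# and start with nothing selected.
global_table = [
    {**row, "table": name, "selected": False}
    for name, rows in (("cells", cells), ("nuclei", nuclei))
    for row in rows
]


def select(table, region_ids):
    """Set the boolean column to True for the given regions, False otherwise."""
    wanted = set(region_ids)
    for row in table:
        row["selected"] = row["region_id"] in wanted


select(global_table, ["cell_2", "nucleus_1"])

# Because every row carries its region link, the selected regions
# can be looked up (and then plotted) directly from the column.
selected_regions = [
    (row["table"], row["region_id"]) for row in global_table if row["selected"]
]
```

Since the selection is just another column, it is serialized together with the table and needs no separate file.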

@tischi
Author

tischi commented Jul 14, 2022

> On the other hand, we currently don't have a way to represent a serializable selection of points (we will only have an in-memory representation, useful for operations like accumulation of points into regions).

Serialization is one thing.

Another related thing (in MoBIE) is that we have a text field (UI) where a user can type "a unique ID" in order to manually select a region/segment.
This can be useful for collaborative work, e.g. telling another scientist: "Hey you should really look at that region". For this (I think) we may need a String representation of a region?!

...or we could simply go for the rowIndex in the table. I think we decided against this because we thought this is less robust?!
@constantinpape, do you remember?

@tischi
Author

tischi commented Jul 14, 2022

Sorry, brainstorm mode...

I do think rowIndex is maybe less flexible, because one may combine tables into bigger tables, and then the rowIndex would change.
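
The concern about row indices can be shown with a small Python sketch. Tables are reduced to lists of row identifiers, and the identifier strings are made up for illustration.

```python
# Two hypothetical tables, each reduced to its list of row identifiers.
table_a = ["imgA;0;1", "imgA;0;2"]
table_b = ["imgB;0;1", "imgB;0;2"]

# A user selects the second row of table_b, stored in two different ways.
selected_row_index = 1
selected_id = table_b[selected_row_index]

# Combining the tables shifts every row that came from table_b.
combined = table_a + table_b

# The stored index now points at a row from table_a ...
row_found_by_index = combined[selected_row_index]

# ... whereas the identifier still resolves to the intended row.
index_found_by_id = combined.index(selected_id)
```

After concatenation the old index resolves to a different row, while a lookup by identifier still finds the original one at its new position.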

@LucaMarconato
Member

For “short term usage”, when consistency over time is not a major concern, the index would work. A UUID-based approach (basically a name for each region) would make it possible to have persistent naming, even when merging tables, but if the user modifies the object (e.g. applies a transformation that creates a new image), it would fail because the UUID would change. To overcome this, one could use the tuple (coordinates, coordinate space) to query which region covers that particular coordinate. This would be slower than an index or UUID, but would allow for “long term usage”.
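
A minimal sketch of the (coordinates, coordinate space) query, assuming each region is approximated by an axis-aligned bounding box. The `Region` class, the `find_region` helper, and the sample regions are hypothetical; a real implementation would test against the actual geometry and would likely use a spatial index instead of a linear scan.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical region, approximated by an axis-aligned bounding box."""
    name: str
    coordinate_space: str
    min_corner: Tuple[float, ...]
    max_corner: Tuple[float, ...]

    def covers(self, point: Sequence[float], coordinate_space: str) -> bool:
        # The point must be given in the same coordinate space and lie
        # inside the box along every axis.
        return (
            coordinate_space == self.coordinate_space
            and len(point) == len(self.min_corner)
            and all(
                lo <= p <= hi
                for lo, p, hi in zip(self.min_corner, point, self.max_corner)
            )
        )


def find_region(
    regions: Sequence[Region], point: Sequence[float], coordinate_space: str
) -> Optional[Region]:
    # Linear scan over all regions: slower than an index or UUID lookup,
    # but independent of row order and of any naming scheme.
    for region in regions:
        if region.covers(point, coordinate_space):
            return region
    return None


regions = [
    Region("nucleus_7", "global", (0.0, 0.0), (10.0, 10.0)),
    Region("nucleus_8", "global", (20.0, 20.0), (30.0, 30.0)),
]
hit = find_region(regions, (25.0, 21.0), "global")
miss = find_region(regions, (25.0, 21.0), "other_space")
```

The coordinate space is part of the query because the same numbers can refer to different places in different spaces.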

@tischi
Author

tischi commented Jul 18, 2022

FWIW that's the WIP in MoBIE:

public interface Location
{
	int timePoint();
	double[] anchor();
}

public interface Annotation extends Location
{
	String getId();
	Object getValue( String columnName );
	void setString( String columnName, String value );
}

public interface Segment extends Location
{
	String imageId();
	int labelId();

	static String toAnnotationId( String imageId, int timePoint, int labelId )
	{
		return ""+imageId+";"+timePoint+";"+labelId;
	}

	static String toAnnotationId( Segment segment )
	{
		return toAnnotationId( segment.imageId(), segment.timePoint(), segment.labelId() );
	}

	RealInterval boundingBox();
	void setBoundingBox( RealInterval boundingBox );

	float[] mesh();
	void setMesh( float[] mesh );
}

public interface AnnotatedSegment extends Segment, Annotation
{
	static String toAnnotationId( String imageId, int timePoint, int labelId )
	{
		return ""+imageId+";"+timePoint+";"+labelId;
	}

	static String toAnnotationId( Segment segment )
	{
		return toAnnotationId( segment.imageId(), segment.timePoint(), segment.labelId() );
	}
}

In other words any Annotation MUST have an ID and a Location. One may actually better call it SpatialAnnotation but I wanted to keep it short. The idea would be that no matter whether this is a Segment, Region, or Spot, it always MUST have an ID.

TBH, I am not sure I am getting it right yet, because now both Segment and Annotation extend Location, seems like some conceptual fine tuning still needs to be done...

@LucaMarconato
Member

Hi, here is a recap of the status of the framework with regard to this discussion. We are going through the old issues in the repo; closing the discussion.

We don't have an explicit concept of "selected annotations" that is saved on disk, nor plans to have such information in the storage format. This is because the user can achieve an equivalent behavior by saving a categorical column to an annotation table. We also provide APIs that allow retrieving geometries based on a list of instances from the table (see this PR: #627).
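
The pattern described in this recap can be sketched in plain Python. This is a stand-in only: it does not use the framework's actual API, and the column names, category labels, helper functions, and placeholder geometries are all hypothetical.

```python
# Hypothetical annotation table with a categorical "selection" column.
annotation_table = [
    {"instance_id": 1, "selection": "interesting"},
    {"instance_id": 2, "selection": "unselected"},
    {"instance_id": 3, "selection": "interesting"},
]

# Hypothetical geometries keyed by instance id (strings as placeholders).
geometries = {1: "circle A", 2: "circle B", 3: "polygon C"}


def instances_in_category(table, category):
    """Return the instance ids whose categorical column equals `category`."""
    return [row["instance_id"] for row in table if row["selection"] == category]


def geometries_for(instance_ids):
    """Look up the geometry of each listed instance, preserving order."""
    return [geometries[instance_id] for instance_id in instance_ids]


interesting_ids = instances_in_category(annotation_table, "interesting")
interesting_geometries = geometries_for(interesting_ids)
```

A categorical column can hold several named selections at once, which a single boolean column cannot.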


2 participants