Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reader for Stereo-seq files. #70

Merged
merged 54 commits into from
May 24, 2024

Conversation

LLehner
Copy link
Member

@LLehner LLehner commented Jul 18, 2023

Add reader for Stereo-seq files.

TODO:

  • Read cellbin.gef completely
  • Fix LabelsModel
  • Fix ShapesModel
  • counts_per_cell present in .obs?
  • Update table.obsm["cellBorder"]
  • Automatically retrieve dataset identifier

@LLehner
Copy link
Member Author

LLehner commented Jul 31, 2023

file format description @LucaMarconato

https://github.com/STOmics/SAW/tree/main/Documents/FileFormat

@codecov-commenter
Copy link

codecov-commenter commented Aug 3, 2023

Codecov Report

Attention: Patch coverage is 45.51282% with 85 lines in your changes are missing coverage. Please review.

Project coverage is 36.98%. Comparing base (755d475) to head (10c267a).
Report is 199 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #70      +/-   ##
==========================================
- Coverage   41.92%   36.98%   -4.94%     
==========================================
  Files          16       17       +1     
  Lines         854     1352     +498     
==========================================
+ Hits          358      500     +142     
- Misses        496      852     +356     
Files Coverage Δ
src/spatialdata_io/__init__.py 100.00% <100.00%> (ø)
src/spatialdata_io/_constants/_constants.py 100.00% <100.00%> (ø)
src/spatialdata_io/readers/stereoseq.py 16.66% <16.66%> (ø)

... and 9 files with indirect coverage changes

@LLehner LLehner marked this pull request as draft August 20, 2023 11:41
@giovp
Copy link
Member

giovp commented Apr 24, 2024

@LLehner @LucaMarconato how is this looking? should we merge?

@LLehner
Copy link
Member Author

LLehner commented Apr 24, 2024

@giovp tasks 3. and 4. from issue#97 still need to be fixed.

@giovp
Copy link
Member

giovp commented Apr 25, 2024

fantastic, what is the blocker?

@LLehner
Copy link
Member Author

LLehner commented Apr 28, 2024

fantastic, what is the blocker?

aggregation of e.g. image channels over segmentation masks doesn't work yet, perhaps the segmentation mask isn't properly linked to the table. Also plotting with rendering shapes doesn't work yet.

@timtreis
Copy link
Member

timtreis commented May 6, 2024

Is the plotting an issue in spatialdata-plot ? Can you open an issue and tag me, referencing this convo?

@LucaMarconato
Copy link
Member

LucaMarconato commented May 10, 2024

I looked into the points 3 and 4 from #97.

  • Point 3: the labels object has integer values in {0, 1}, which the background being 0 and each cell having label index 1. This makes the aggregation not computable. Possible solutions are to introduce an arbitrary labeling by identifying the connected components, or to look if the labeling already exists in the raw data. @florianingelfinger do you know if the raw data contains such information or should we proceed with a arbitrary labeling? Also, Florian mentioned that some data is available in obsm, I will look into it to see if it can be helpful for this. Edit: I am now parsing the polygonal data from obsm. I am not converting the labels (explained in a comment below).

  • Point 4: there was a bug with the instance_key column getting set to None, I fixed it in ac34e78 (#70). This still doesn't fix point 4 but now I think it should be easy, probably just a string mismatch. I will look into it.

In addition to solving the above, I would also like to address the following points:

  • now that we support representing multiple annotations tables, I would create a table for each bin size (currently we just parse the table for the cell-level data).
  • currently in napari we tacitly subsample a points layer if it contains more than 100000 points. I will add a warning icon next to the label so that the user is warned that this happens (otherwise plotting the bin sizes of 1 leads to a unintuitive visualization). The warnings will tell how to remove this limit.

Finally, we will be working on a rasterization-based approach for rendering large collections of bins. This will come after this PR is merged, but when available, will improve the user experience around Stereo-seq data.

@florianingelfinger
Copy link

Many thanks for your work! To my knowledge there is no cell identifier associated with each cell in the raw data or at least we have not used one so far. I would proceed as suggested with arbitrary labeling!

@LucaMarconato
Copy link
Member

I fixed all the points above, with the exception to the parsing of the labels, which I now parse as an image with two colors instead of as a labels, to avoid confusion. I have tried using scikit-image to relabel the labels image but the number of labels that I obtain and the number of cells is slightly different. This is easily fixable but I will rather skip doing this proprocessing within the reader and let the user choose to perform this if needed.

I will polish the code and make a short example notebook, after this we are good to merge.

@LucaMarconato
Copy link
Member

I prepared and uploaded the notebook here; I removed the outputs because the data is not currently public.

The notebook is affected by two bugs of spatialdata-plot, that I tracked here and here. The visualization work with napari-spatialdata. @timtreis since you have the data locally, could you please have a look at them?

Anyway, since the bugs are not in spatialdata-io, now the PR is ready to merge! Thanks all for the work! 🚀

@LucaMarconato LucaMarconato merged commit 5c912d1 into scverse:main May 24, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improvements for Stereoseq reader
6 participants