Visium: Failed to mask tissue #33

Open
iaaaka opened this issue Dec 29, 2021 · 9 comments

Comments

@iaaaka

iaaaka commented Dec 29, 2021

Hi, thank you for the great method! I am trying to apply it to my Visium dataset, but I get the following warnings for all my samples at the conversion step:
[2021-12-29 13:40:49,020] ℹ : Running xfuse version 0.2.1
[2021-12-29 13:40:56,493] ℹ : Computing tissue mask:
[2021-12-29 13:40:56,500] ⚠ WARNING : UserWarning (/nfs/users/nfs_p/pm19/.local/lib/python3.9/site-packages/xfuse/utility/mask.py:67): Failed to mask tissue
OpenCV(4.5.4) /tmp/pip-req-build-kv0l0wqx/opencv/modules/imgproc/src/grabcut.cpp:386: error: (-215:Assertion failed) !bgdSamples.empty() && !fgdSamples.empty() in function 'initGMMs'
[2021-12-29 13:41:07,029] ⚠ WARNING : UserWarning (/nfs/users/nfs_p/pm19/.local/lib/python3.9/site-packages/xfuse/convert/utility.py:217): Count matrix contains duplicated columns. Counts will be summed by column name.
[2021-12-29 13:41:09,749] ⚠ WARNING : FutureWarning (/nfs/users/nfs_p/pm19/.local/lib/python3.9/site-packages/xfuse/convert/utility.py:227): Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead. df.sum(level=1) should use df.groupby(level=1).sum().
I mostly worry about the "Failed to mask tissue" warning. In this dataset we instructed spaceranger to consider all spots because tissue autodetection failed to find the relatively transparent adipose tissue. We then manually annotated the tissue spots, and I added this information to the tissue-positions file (second column). As far as I can see, xfuse ignores this information and attempts to mask the tissue internally, but this procedure fails. Am I right that in this case xfuse considers all spots? At least it looks that way based on manual inspection of the data.h5 file and the high intensity of some metagenes in out-of-tissue regions. Can I force xfuse to use the tissue mask provided in the tissue-positions file?

Is the "Count matrix contains duplicated columns" warning about gene names?

Also, when I run xfuse, at some point it reports "Registering experiment: ST (data type: "ST")", although this is actually Visium data. Is this important, or can I just ignore it?

@ludvb
Owner

ludvb commented Jan 3, 2022

Hi,

Thanks for your feedback! You are right that xfuse currently ignores the second column in the spot_positions file and only uses the image to compute the tissue mask. Tissue masking does not always work, especially when the tissue is not clearly delineated from the background. It is definitely something that would be good to improve.

I have created a new branch improve-visium-masking that attempts to make use of the tissue information in the spot_positions file. Would be interesting to hear if it works better for your tissue! You can use the command pip install git+https://github.com/ludvb/xfuse.git@improve-visium-masking to install the new branch if you'd like to try it out. We could potentially also provide a way for users to provide a custom image mask if this still doesn't work.

To visualize the mask, you can run something like:

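import h5py
import matplotlib.pyplot as plt

# Open the converted data file read-only and load the label image.
with h5py.File("/path/to/data.h5", "r") as d:
    # True wherever the label differs from 1.
    mask = d["label"][()] != 1

plt.imshow(mask)
plt.show()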

Regarding the duplicated columns warning: Xfuse uses the HGNC IDs from the Space Ranger hdf5 file. There will be some distinct HGNC IDs that refer to multiple ENSEMBL IDs (typically corresponding to different splice variants). The counts for those ENSEMBL IDs are summed when computing the counts for each HGNC ID. This warning is expected for Space Ranger data.
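
As a rough illustration of that summing step, here is a minimal sketch in plain Python. It is not xfuse's own code (which works on a pandas DataFrame), and the identifiers and counts are made up for the example:

```python
# Hypothetical per-spot counts keyed by (gene identifier, ENSEMBL ID).
# Two ENSEMBL IDs share the identifier "GENE_A", so that column is duplicated.
counts_by_feature = {
    ("GENE_A", "ENSG_1"): [3, 0, 1],
    ("GENE_A", "ENSG_2"): [1, 2, 0],
    ("GENE_B", "ENSG_3"): [5, 5, 5],
}


def sum_by_gene_name(counts):
    """Collapse features that share an identifier by summing their counts per spot."""
    summed = {}
    for (name, _ensembl_id), per_spot in counts.items():
        if name in summed:
            summed[name] = [a + b for a, b in zip(summed[name], per_spot)]
        else:
            summed[name] = list(per_spot)
    return summed


counts_by_name = sum_by_gene_name(counts_by_feature)
print(counts_by_name)
```

In pandas, the replacement suggested by the FutureWarning in the log above is df.groupby(level=1).sum().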

Regarding experiment type: I agree this log message is confusing, ST and Visium data are in fact modeled in the same way. The "ST" experiment type is currently the only one in use.

@iaaaka
Author

iaaaka commented Jan 26, 2022

Hi
Thank you, it works much better, but on some samples it masks fiducials as tissue (LP3_2 and LP4_2, see image).
My other question is whether it is possible to extract the predictions in numerical form.
image

@ludvb
Owner

ludvb commented Feb 1, 2022

Thanks for reporting back! And great to see that the masking works better now. I would not worry too much about the fiducials as they shouldn't impact learning too much, but imputation results may be off in those areas.

It should be possible to extract the prediction data by setting writer = "tensor" under the gene maps config section, e.g.:

[analyses.analysis-gene_maps]
type = "gene_maps"

[analyses.analysis-gene_maps.options]
gene_regex = ".*"
writer = "tensor"

The results are saved as pickled torch.Tensors and can be loaded using torch.load.

Something to be mindful of is that the output files tend to be very large, as they store all Monte Carlo samples, so it may be a good idea to limit the analysis to specific genes using the gene_regex option.
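
One way to build such a pattern is sketched below. How xfuse applies gene_regex internally (match, search, or full match) is not stated in this thread, so the sketch anchors the pattern at both ends to make it behave the same in each case. The helper build_gene_regex and the gene lists are hypothetical:

```python
import re


def build_gene_regex(genes):
    """Return an anchored alternation that matches exactly the given gene names."""
    escaped = sorted(re.escape(g) for g in genes)
    return "^(?:" + "|".join(escaped) + ")$"


gene_regex = build_gene_regex(["CD3E", "MS4A1"])
pattern = re.compile(gene_regex)

# Hypothetical gene list, used only to preview what the pattern would select.
all_genes = ["CD3E", "CD3EAP", "MS4A1", "ACTB"]
selected = [g for g in all_genes if pattern.search(g)]

print(gene_regex)
print(selected)
```

The printed pattern can then be used as the gene_regex value in the config section shown above.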

@angadps
Contributor

angadps commented Apr 4, 2022

Hi, I'm hitting the same error as described in the first post above. Incidentally, I hit the error when using the improve-visium-masking branch instead of the master.

When running master, most samples work fine, except for ~15% of them where the masks are inverted. This usually happens in slides that are almost fully covered by tissue (and so have less clearly visible background).

To fix that, I tried the improve-visium-masking branch, which, according to the discussion above, uses the tissue-position-list file. However, I don't see that happening: both tissue and background are being picked up, so masking is effectively not happening.

Am I missing something here? Do I need to set certain config options to make this work smoothly?

@ludvb
Owner

ludvb commented Apr 6, 2022

Hi,

Thanks for the report and all the debugging effort so far! :) It seems the current masking procedure has several failure modes. A lot of tweaking would probably be required to make it fully robust, but I think we at least should provide a means for users to specify a custom mask. The custom mask can be annotated manually or created by more specialized tools.

I've updated the improve-visium-masking branch. There is now a new command line flag --mask-file which can be used to specify a single-channel image file of the same size as the --image with the annotations 0=background, 1=foreground, 2=likely background, 3=likely foreground. If you have time to try it out, any feedback would be super helpful!
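
A minimal sketch of preparing such a mask array with NumPy, assuming you already have a boolean tissue array from some other tool. The helper names labels_from_tissue and check_mask are made up for illustration and are not part of xfuse:

```python
import numpy as np

# Label values described above; they coincide with OpenCV's GrabCut constants
# (cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD, cv2.GC_PR_FGD).
BACKGROUND, FOREGROUND, LIKELY_BACKGROUND, LIKELY_FOREGROUND = 0, 1, 2, 3


def labels_from_tissue(tissue, certain=False):
    """Turn a boolean tissue array into a single-channel uint8 label array."""
    tissue = np.asarray(tissue, dtype=bool)
    if certain:
        fg, bg = FOREGROUND, BACKGROUND
    else:
        fg, bg = LIKELY_FOREGROUND, LIKELY_BACKGROUND
    return np.where(tissue, fg, bg).astype(np.uint8)


def check_mask(mask, image_shape):
    """Raise ValueError unless the mask matches the image size and uses only labels 0-3."""
    if mask.ndim != 2:
        raise ValueError("mask must be single-channel (2-D)")
    if mask.shape != tuple(image_shape[:2]):
        raise ValueError("mask and image sizes differ")
    if not np.isin(mask, [0, 1, 2, 3]).all():
        raise ValueError("mask contains values outside 0-3")


# Toy 4x6 "image" with tissue in the two middle columns.
tissue = np.zeros((4, 6), dtype=bool)
tissue[:, 2:4] = True
mask = labels_from_tissue(tissue)
check_mask(mask, image_shape=(4, 6, 3))
print(mask)
```

The array could then be written out with an image library, for example cv2.imwrite("mask.png", mask). The file name is only an example, and which image formats --mask-file accepts is not stated here.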

@angadps
Contributor

angadps commented Apr 12, 2022

Hi,

Thanks for the new option. At the moment I won't have time to try it out so sorry about that. When reviewing the masking (for samples where it fails), I notice only a small number of pixels with certain 1=foreground values, most of them being GC_PR_FGD. Perhaps this approach might help. I could create a mask file using the histomicstk package which has worked for me in the past.
A simple alternative in my mind is to create a slightly crude mask using the tissue_positions_list.csv file in the spaceranger output. Each circular spot could be stretched into a rectangle of appropriate size to cover all the pixels (or a better approach could be used if you already have one). I hope to find time for this later. Thanks!
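
A rough sketch of that idea, assuming the spot centres have already been read from the positions file and converted to pixel coordinates of the image in use. The function rectangles_mask, the spot list, and the choice to leave uncovered pixels as 2 ("likely background") are all illustrative assumptions:

```python
import numpy as np


def rectangles_mask(image_shape, spots, half_size):
    """Paint one axis-aligned square per spot into a uint8 label array.

    spots: iterable of (row, col, in_tissue) in image pixel coordinates.
    Pixels not covered by any square stay at 2 ("likely background").
    """
    height, width = image_shape
    mask = np.full((height, width), 2, dtype=np.uint8)
    for row, col, in_tissue in spots:
        # Clip each square to the image bounds.
        top, bottom = max(row - half_size, 0), min(row + half_size + 1, height)
        left, right = max(col - half_size, 0), min(col + half_size + 1, width)
        # 1 = foreground for in-tissue spots, 0 = background otherwise.
        mask[top:bottom, left:right] = 1 if in_tissue else 0
    return mask


# Hypothetical spot centres: (pixel row, pixel column, in_tissue flag).
spots = [(2, 2, 1), (2, 7, 0)]
mask = rectangles_mask((5, 10), spots, half_size=1)
print(mask)
```

If squares overlap, later spots overwrite earlier ones, so half_size should be chosen with the spot spacing in mind.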

@ludvb
Owner

ludvb commented Apr 13, 2022

The way the masking should work right now is that pixels inside spots are assigned GC_FGD or GC_BGD based on the annotation in the tissue_positions_list.csv file, while pixels outside the spots are assigned GC_PR_FGD or GC_PR_BGD based on the closest spot:

initial_mask = np.where(
    label != 0,
    np.where(in_tissue, cv.GC_FGD, cv.GC_BGD),
    np.where(in_tissue[idx1, idx2], cv.GC_PR_FGD, cv.GC_PR_BGD),
)

There are probably better ways to do this - if you figure something out, any contribution would be much appreciated!
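
For anyone who wants to experiment, the initialization can be reproduced on a toy array. This is only a sketch: how idx1 and idx2 are computed is not shown in this thread, so the code assumes they hold the coordinates of the nearest spot pixel for every pixel and finds them by brute force. The integer constants stand in for the OpenCV ones so that the example needs only NumPy:

```python
import numpy as np

# Same values as cv.GC_BGD, cv.GC_FGD, cv.GC_PR_BGD, cv.GC_PR_FGD in OpenCV.
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3

# Toy 1x6 image with two spots: label 1 (in tissue) and label 2 (not in tissue).
label = np.array([[0, 1, 0, 0, 2, 0]])
in_tissue = np.array([[False, True, False, False, False, False]])

# Assumption: idx1/idx2 give, for every pixel, the coordinates of the nearest
# spot pixel. Brute force is fine for a toy array but not for real images.
spot_rows, spot_cols = np.nonzero(label)
rows, cols = np.indices(label.shape)
dist = (rows[..., None] - spot_rows) ** 2 + (cols[..., None] - spot_cols) ** 2
nearest = dist.argmin(axis=-1)
idx1, idx2 = spot_rows[nearest], spot_cols[nearest]

initial_mask = np.where(
    label != 0,
    np.where(in_tissue, GC_FGD, GC_BGD),
    np.where(in_tissue[idx1, idx2], GC_PR_FGD, GC_PR_BGD),
)
print(initial_mask)
```

Pixels next to the in-tissue spot become "probable foreground" and pixels next to the other spot become "probable background", which is the behaviour described above.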

One thing to keep in mind with this way of initializing the mask is that it's best to use the raw_feature_bc_matrix from Space Ranger. The filtered matrix does not contain data from spots outside the tissue, so those spots will get filtered out before the masking step here:

.loc[bc_matrix["matrix"]["barcodes"][()].astype(str)]

This means everything will be assigned as GC_FGD or GC_PR_FGD when using the filtered matrix. I'm not sure if this may be the cause of the issues you are experiencing, but we should probably add a note about this in the README or postpone filtering the tissue_positions_list until after the mask has been created.
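
The effect can be shown with a small stand-in example. The real code filters a pandas table with .loc; the sketch below uses NumPy's isin instead, and the barcodes and flags are invented:

```python
import numpy as np

# Hypothetical barcodes and in_tissue flags, as in a tissue positions table.
barcodes = np.array(["AAAC-1", "AAAG-1", "AACT-1", "AAGA-1"])
in_tissue = np.array([1, 0, 1, 0], dtype=bool)

# The filtered matrix only lists in-tissue barcodes; the raw matrix lists all.
filtered_barcodes = np.array(["AAAC-1", "AACT-1"])
raw_barcodes = barcodes.copy()


def surviving_flags(matrix_barcodes):
    """Return the in_tissue flags of spots whose barcode appears in the matrix."""
    keep = np.isin(barcodes, matrix_barcodes)
    return in_tissue[keep]


print(surviving_flags(filtered_barcodes))  # only True values remain
print(surviving_flags(raw_barcodes))       # both True and False remain
```

With the filtered barcodes there are no out-of-tissue spots left to seed the background labels.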

@angadps
Contributor

angadps commented Apr 13, 2022

Thanks for the explanation. That might explain one of the two failure scenarios that I'm seeing. It might be worth looking at the raw_feature_bc_matrix file instead of the filtered one as I'm seeing a lot more GC_PR_FGD than I should be.

So far this has impacted < 20% of my test samples so I'm still able to evaluate a lot of them with the current piece of code. Eventually I'll be getting back to those 20% and using the raw data matrix will be the first thing that I might try out. I will definitely keep you posted!

@ludvb
Owner

ludvb commented Apr 14, 2022

Yep, could be the case. Thanks for your help ironing out all the issues so far. Do keep me posted on how it goes! :)
