Visium: Failed to mask tissue #33
Hi,

Thanks for your feedback! You are right that xfuse currently ignores the second column in the spot_positions file and uses only the image to compute the tissue mask. Tissue masking does not always work, especially when the tissue is not clearly delineated from the background, and it is definitely something that would be good to improve. I have created a new branch, improve-visium-masking, that attempts to make use of the tissue information in the spot_positions file. It would be interesting to hear whether it works better for your tissue! You can use the command …

To visualize the mask, you can run something like:

```python
import h5py
import matplotlib.pyplot as plt

# Read the label image from the converted data file; label 1 marks background,
# so every other pixel is shown as tissue.
with h5py.File("/path/to/data.h5", "r") as d:
    mask = d["label"][()] != 1

plt.imshow(mask)
plt.show()
```

Regarding the duplicated columns warning: xfuse uses the HGNC IDs from the Space Ranger HDF5 file. Some HGNC IDs refer to multiple Ensembl IDs (typically corresponding to different splice variants), and the counts for those Ensembl IDs are summed when computing the counts for each HGNC ID. This warning is expected for Space Ranger data.

Regarding the experiment type: I agree this log message is confusing. ST and Visium data are in fact modeled in the same way, and the "ST" experiment type is currently the only one in use.
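To make "counts are summed by column name" concrete, here is a minimal pandas sketch on a toy spots-by-genes matrix. It is not xfuse's code, and the helper name `sum_duplicate_columns` is hypothetical. It transposes the matrix and uses `groupby(level=0)`, which avoids the deprecated `df.sum(level=...)` form.

```python
import pandas as pd


def sum_duplicate_columns(counts: pd.DataFrame) -> pd.DataFrame:
    """Collapse columns that share a name by summing them (hypothetical helper).

    Transposing and grouping on the index avoids both the deprecated
    ``df.sum(level=...)`` and ``groupby(axis=1)`` forms.
    """
    # groupby(level=0) groups rows with identical labels; sort=False keeps first-seen order
    return counts.T.groupby(level=0, sort=False).sum().T


if __name__ == "__main__":
    # Toy matrix: two Ensembl IDs that both map to the symbol "GENE_B"
    counts = pd.DataFrame(
        [[1, 2, 3], [4, 5, 6]],
        index=["spot1", "spot2"],
        columns=["GENE_A", "GENE_B", "GENE_B"],
    )
    print(sum_duplicate_columns(counts))
```

The two GENE_B columns collapse into one column holding 5 for spot1 and 11 for spot2, while GENE_A is unchanged.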
Thanks for reporting back, and great to see that the masking works better now! I would not worry too much about the fiducials, as they shouldn't impact learning much, but imputation results may be off in those areas. It should be possible to extract the prediction data by setting:

```toml
[analyses.analysis-gene_maps]
type = "gene_maps"

[analyses.analysis-gene_maps.options]
gene_regex = ".*"
writer = "tensor"
```

The results are saved as pickled torch.Tensors and can be loaded using … Something to be mindful of is that the output files tend to be very large, as they store all Monte Carlo samples, so it may be a good idea to limit the analysis to specific genes using the …
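To restrict the gene_maps analysis to a handful of genes, one option is to generate an anchored alternation for `gene_regex`. This is a sketch: the helper `gene_regex_for` is hypothetical, and whether xfuse matches the pattern at the start of a gene name or anywhere in it is an assumption, so the pattern is anchored with `^...$` to behave the same either way.

```python
import re


def gene_regex_for(genes):
    """Build an anchored regex that matches exactly the given gene names.

    Hypothetical helper; the ^...$ anchors make the result independent of
    whether the consumer uses re.match or re.search.
    """
    # re.escape guards against gene names containing regex metacharacters
    alternatives = "|".join(re.escape(g) for g in sorted(set(genes)))
    return f"^(?:{alternatives})$"


if __name__ == "__main__":
    pattern = re.compile(gene_regex_for(["ACTB", "GAPDH", "HLA-A"]))
    for name in ["ACTB", "ACTB2", "GAPDH", "HLA-A", "HLA-AS1"]:
        print(name, bool(pattern.search(name)))
```

When pasting the generated pattern into the TOML config, use a single-quoted literal string (for example `gene_regex = '^(?:ACTB|GAPDH|HLA\-A)$'`), because an escape such as `\-` is not valid inside a double-quoted TOML string.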
Hi, I'm hitting the same error as described in the first post above. Incidentally, I hit the error when using the … When running … To fix that, I tried the … Am I missing something here? Do I need to set certain config options to make this work smoothly?
Hi, thanks for the report and all the debugging effort so far! :) It seems the current masking procedure has several failure modes. A lot of tweaking would probably be required to make it fully robust, but I think we should at least provide a means for users to specify a custom mask. The custom mask can be annotated manually or created by more specialized tools. I've updated the …
Hi, thanks for the new option. I won't have time to try it out at the moment, sorry about that. When reviewing the masking (for samples where it fails), I notice only a small number of pixels with certain 1=foreground values, most of them being …
The way the masking should work right now is that spots will be assigned as shown in lines 105 to 109 of commit 042e9a9.

There are probably better ways to do this; if you figure something out, any contribution would be much appreciated! One thing to keep in mind with this way of initializing the mask is that it's best to use the raw_feature_bc_matrix from Space Ranger. The filtered matrix does not contain data from spots outside the tissue, so those spots will be filtered out before the masking step (line 70 of commit 042e9a9).

This means that everything will be assigned as GC_FGD or GC_PR_FGD when using the filtered matrix. I'm not sure whether this is the cause of the issues you are experiencing, but we should probably add a note about this to the README or postpone filtering the tissue_positions_list until after the mask has been created.
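The failure mode described above can be illustrated with a small NumPy sketch. The labeling scheme is hypothetical (the actual xfuse logic is in the lines referenced above and may differ): every pixel starts as probable foreground, pixels near out-of-tissue spots become definite background, and pixels near in-tissue spots become definite foreground. OpenCV's grabCut needs at least one background-labeled and one foreground-labeled pixel; otherwise it fails with the `!bgdSamples.empty() && !fgdSamples.empty()` assertion reported in this thread. Dropping the out-of-tissue spots, as the filtered matrix does, leaves no background samples.

```python
import numpy as np

# Label values of OpenCV's grabCut mask (same as cv2.GC_BGD, cv2.GC_FGD,
# cv2.GC_PR_BGD, cv2.GC_PR_FGD); redefined so the sketch runs without OpenCV.
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3


def init_mask_from_spots(shape, spot_rows, spot_cols, in_tissue, radius):
    """Hypothetical grabCut mask initialization from spot annotations.

    Every pixel starts as probable foreground; pixels within `radius` of an
    out-of-tissue spot become definite background and pixels within `radius`
    of an in-tissue spot become definite foreground.
    """
    mask = np.full(shape, GC_PR_FGD, dtype=np.uint8)
    rr, cc = np.ogrid[: shape[0], : shape[1]]
    for r, c, inside in zip(spot_rows, spot_cols, in_tissue):
        disc = (rr - r) ** 2 + (cc - c) ** 2 <= radius**2
        mask[disc] = GC_FGD if inside else GC_BGD
    return mask


def has_both_classes(mask):
    """True if the mask supplies both background and foreground samples."""
    background = np.isin(mask, (GC_BGD, GC_PR_BGD)).any()
    foreground = np.isin(mask, (GC_FGD, GC_PR_FGD)).any()
    return bool(background and foreground)


if __name__ == "__main__":
    rows, cols = [10, 10, 30], [10, 30, 30]
    # Raw matrix: all spots present, one annotated as outside the tissue
    raw = init_mask_from_spots((40, 40), rows, cols, [True, True, False], radius=4)
    # Filtered matrix: the out-of-tissue spot has already been dropped
    filtered = init_mask_from_spots((40, 40), rows[:2], cols[:2], [True, True], radius=4)
    print(has_both_classes(raw), has_both_classes(filtered))  # True False
```

Running a check like `has_both_classes` before calling grabCut would turn the OpenCV assertion into an explicit, easier-to-diagnose condition.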
Thanks for the explanation. That might explain one of the two failure scenarios that I'm seeing. It might be worth looking at the raw_feature_bc_matrix file instead of the filtered one, as I'm seeing a lot more … So far this has impacted fewer than 20% of my test samples, so I'm still able to evaluate most of them with the current code. When I get back to those 20%, using the raw data matrix will be the first thing I try. I will definitely keep you posted!
Yep, could be the case. Thanks for your help ironing out all the issues so far. Do keep me posted on how it goes! :)
Hi, thank you for the great method! I am trying to apply it to my Visium dataset, but I got the following warnings for all my samples during the conversion step:
```
[2021-12-29 13:40:49,020] ℹ : Running xfuse version 0.2.1
[2021-12-29 13:40:56,493] ℹ : Computing tissue mask:
[2021-12-29 13:40:56,500] ⚠ WARNING : UserWarning (/nfs/users/nfs_p/pm19/.local/lib/python3.9/site-packages/xfuse/utility/mask.py:67): Failed to mask tissue
OpenCV(4.5.4) /tmp/pip-req-build-kv0l0wqx/opencv/modules/imgproc/src/grabcut.cpp:386: error: (-215:Assertion failed) !bgdSamples.empty() && !fgdSamples.empty() in function 'initGMMs'
[2021-12-29 13:41:07,029] ⚠ WARNING : UserWarning (/nfs/users/nfs_p/pm19/.local/lib/python3.9/site-packages/xfuse/convert/utility.py:217): Count matrix contains duplicated columns. Counts will be summed by column name.
[2021-12-29 13:41:09,749] ⚠ WARNING : FutureWarning (/nfs/users/nfs_p/pm19/.local/lib/python3.9/site-packages/xfuse/convert/utility.py:227): Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead. df.sum(level=1) should use df.groupby(level=1).sum().
```
I am mostly worried about the "Failed to mask tissue" warning. In this dataset we instructed Space Ranger to consider all spots, because tissue autodetection failed to find the relatively transparent adipose tissue. We then manually annotated the tissue spots, and I entered this information into the tissue positions file (second column). As far as I can see, xfuse ignores this information and attempts to mask the tissue internally, but this procedure fails. Am I right that in this case xfuse considers all spots? At least it looks that way, based on manual inspection of the data.h5 file and the high intensity of some metagenes in out-of-tissue regions. Can I force xfuse to use the tissue mask provided in the tissue positions file?

Is the "Count matrix contains duplicated columns" warning about gene names?

Also, when I run xfuse, at some point it logs "Registering experiment: ST (data type: "ST")" even though this is Visium data. Is that important, or can I just ignore it?
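To check which spots are annotated as tissue in an edited positions file, a small reader could look like the sketch below. It assumes Space Ranger's standard column order (barcode, in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres) and skips a header row if one is present. The function name `read_tissue_positions` is hypothetical and this is not part of xfuse; the barcodes in the usage example are made-up placeholders.

```python
import csv
import io

# Assumed column order of Space Ranger's tissue positions file
COLUMNS = ["barcode", "in_tissue", "array_row", "array_col",
           "pxl_row_in_fullres", "pxl_col_in_fullres"]


def read_tissue_positions(handle):
    """Return a dict mapping barcode -> (in_tissue, pixel_row, pixel_col).

    Hypothetical helper; a header row starting with "barcode" is skipped.
    """
    spots = {}
    for row in csv.reader(handle):
        if not row or row[0] == "barcode":
            continue
        record = dict(zip(COLUMNS, row))
        spots[record["barcode"]] = (
            record["in_tissue"] == "1",
            int(round(float(record["pxl_row_in_fullres"]))),
            int(round(float(record["pxl_col_in_fullres"]))),
        )
    return spots


if __name__ == "__main__":
    # For a real file: open("tissue_positions_list.csv", newline="")
    example = "SPOT_A-1,1,50,102,7475,8501\nSPOT_B-1,0,59,19,8553,2788\n"
    spots = read_tissue_positions(io.StringIO(example))
    n_inside = sum(inside for inside, _, _ in spots.values())
    print(f"{n_inside} of {len(spots)} spots annotated as in tissue")
```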