csangara/thesis

Cell Type Deconvolution in Spatial Transcriptomics

‼️ NEW REPOSITORY: https://github.com/saeyslab/spotless-benchmark

The new repo includes a Nextflow pipeline and 6 additional methods.


In this repository, you can find the analysis scripts and plots for the dissertation. There are scripts to run and evaluate five deconvolution methods (cell2location, MuSiC, stereoscope, RCTD, and SPOTlight). Later, I also apply cell2location and RCTD to real data.

Benchmarking

To benchmark the methods, synthetic spatial data must first be created from a reference scRNA-seq dataset. We then run the deconvolution methods on these synthetic datasets and evaluate their predictions against the known cell-type composition. I used seven scRNA-seq datasets to generate synthetic spatial data with the package synthvisium (not yet publicly available). The raw datasets and their download links are listed below.

Dataset | Direct download link
--- | ---
Brain cortex | Link
Cerebellum (sc) | Link
Cerebellum (sn) | Same link as Cerebellum (sc)
Hippocampus | Link
Kidney | Link
PBMC | Link
SCC (patient 5) | Link

(Both cerebellum datasets can be downloaded from the Cerebellum (sc) link.)

I did not preprocess the scRNA-seq data myself, so I cannot share the scripts here; the procedure is described in Section 5.1 of the dissertation. For a quick preprocessing of the PBMC data, you can also follow this Seurat vignette.
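For orientation, one standard step in the Seurat workflow is library-size normalization ("LogNormalize"): each cell's counts are divided by that cell's total, multiplied by a scale factor (10,000 by default), and transformed with the natural log1p. The Python sketch below reproduces only that step with NumPy, assuming a genes x cells count matrix; it is not necessarily the procedure from Section 5.1, and `log_normalize` is a hypothetical helper name.

```python
import numpy as np


def log_normalize(counts, scale_factor=1e4):
    """LogNormalize-style transform of a genes x cells count matrix.

    Hypothetical helper: each cell (column) is divided by its total count,
    multiplied by scale_factor, and transformed with the natural log1p.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)
    # Cells with zero total counts stay all-zero instead of dividing by zero.
    totals[totals == 0] = 1.0
    return np.log1p(counts / totals * scale_factor)


if __name__ == "__main__":
    toy = np.array([[3, 0], [1, 5], [0, 5]])  # 3 genes x 2 cells
    print(np.round(log_normalize(toy), 3))
```

Normalizing per cell before the log transform removes differences in sequencing depth between cells, so that expression values are comparable across cells.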

Synthetic data generation

As an alternative to synthvisium, you can generate synthetic data using scripts from SPOTlight, stereoscope, or cell2location. Sample code for running these functions is in Scripts/synthetic_data_generation, although the cell2location functions have to be cloned from here.
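The core idea shared by these generators is to pool cells from the scRNA-seq reference into artificial spots, so the true cell-type proportions of every spot are known. The sketch below shows that idea in its simplest form: draw a random number of cells, sum their counts, and record the fraction of each cell type as ground truth. It is a generic illustration under stated assumptions, not synthvisium's algorithm or the exact SPOTlight, stereoscope, or cell2location scripts; the function name and the `cells_per_spot` range are hypothetical.

```python
import numpy as np


def make_synthetic_spots(counts, cell_types, n_spots, cells_per_spot=(2, 10), seed=0):
    """Sum randomly drawn reference cells into synthetic spots (hypothetical helper).

    counts: genes x cells count matrix from the scRNA-seq reference
    cell_types: array of n_cells cell-type labels
    cells_per_spot: inclusive (min, max) number of cells per spot; max must not
        exceed the number of reference cells because cells are drawn without replacement
    Returns (spot_counts: genes x n_spots, proportions: n_spots x n_types, types).
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    cell_types = np.asarray(cell_types)
    types = np.unique(cell_types)
    n_genes, n_cells = counts.shape
    spot_counts = np.zeros((n_genes, n_spots), dtype=counts.dtype)
    proportions = np.zeros((n_spots, len(types)))
    for s in range(n_spots):
        # Number of cells in this spot, uniform over the inclusive range.
        k = rng.integers(cells_per_spot[0], cells_per_spot[1] + 1)
        chosen = rng.choice(n_cells, size=k, replace=False)
        # Spot expression is the sum of the sampled cells' counts.
        spot_counts[:, s] = counts[:, chosen].sum(axis=1)
        # Ground truth: fraction of sampled cells belonging to each type.
        for j, t in enumerate(types):
            proportions[s, j] = np.mean(cell_types[chosen] == t)
    return spot_counts, proportions, types
```

Real generators typically add more structure on top of this, for example controlling which cell types co-occur in a spot or rescaling the counts to a realistic sequencing depth.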

The countsimQC reports comparing the different synthetic data generation algorithms are in the folder countsimQC/.

Running deconvolution methods

Scripts for running the deconvolution methods are in Scripts/run_deconv, along with a description of how to use them. The deconvolution results are compiled in the folder results/.
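To make the task concrete: each method takes cell-type expression information from the scRNA-seq reference and estimates, for every spot, the proportion of each cell type. The toy sketch below does this with plain non-negative least squares (solved by multiplicative updates) followed by rescaling to proportions. It is much simpler than any of the five benchmarked methods and only illustrates the input and output shapes; the signature matrix and the function name are hypothetical.

```python
import numpy as np


def estimate_proportions(signatures, spot, n_iter=5000, eps=1e-12):
    """Toy regression-based deconvolution of a single spot (hypothetical helper).

    signatures: genes x types matrix of non-negative mean expression per cell type
    spot: non-negative vector of observed counts, one entry per gene
    Minimizes ||signatures @ w - spot||^2 subject to w >= 0 using
    multiplicative updates, then rescales w so it sums to 1.
    """
    A = np.asarray(signatures, dtype=float)
    b = np.asarray(spot, dtype=float)
    Atb = A.T @ b
    AtA = A.T @ A
    w = np.ones(A.shape[1])
    for _ in range(n_iter):
        # With non-negative A and b, this update keeps w non-negative and
        # leaves it unchanged once AtA @ w equals Atb (the least-squares optimum).
        w *= Atb / (AtA @ w + eps)
    total = w.sum()
    # An all-zero spot has no defined composition; return zeros in that case.
    return w / total if total > 0 else w
```

Dividing by the total turns the fitted abundances into proportions, which is the quantity the evaluation scripts compare against the synthetic ground truth.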

For scripts that generate downsampled data and measure the runtime of each method, see Scripts/run_deconv_downsample.
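A runtime benchmark of this kind usually follows a simple pattern: build smaller versions of a dataset, run each method on them, and record the wall-clock time. The sketch below assumes that downsampling means keeping a random subset of spots (the actual scripts may downsample differently, for example by genes or cells) and times an arbitrary Python callable; both helper names are hypothetical.

```python
import time

import numpy as np


def downsample_spots(spot_counts, n_keep, seed=0):
    """Keep a random subset of n_keep spots (columns), drawn without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(spot_counts.shape[1], size=n_keep, replace=False)
    # Sorting keeps the retained spots in their original order.
    return spot_counts[:, np.sort(idx)]


def time_method(method, data):
    """Run method(data) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = method(data)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    full = np.random.default_rng(1).poisson(5, size=(100, 1000))  # genes x spots
    for n in (100, 500, 1000):
        sub = downsample_spots(full, n)
        # Placeholder "method": per-spot totals stand in for a real deconvolution call.
        _, secs = time_method(lambda x: x.sum(axis=0), sub)
        print(f"{n} spots: {secs:.6f} s")
```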

Evaluation

Evaluation scripts are in Scripts/ and have the prefix evaluation_. They use the deconvolution results saved in results/.

Results folder

The summary files/ folder contains a few spreadsheets, e.g., the p-values of the pairwise Wilcoxon tests and the median RMSE values. Apart from that folder, results/ is organized as dataset → replicate → method output. Each replicate folder (prefixed with rep) contains:

  • all_metrics*.rds: a file with the computed metrics (RMSE and six classification metrics) for all 8 dataset types
  • plots/: UMAP plots shaded with inferred proportions of each cell type (not in dissertation)
  • corr_distribution/: density plots of the correlation across all spots (not in dissertation)
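Both the RMSE in all_metrics*.rds and the correlation plots in corr_distribution/ compare a method's predicted proportions with the known proportions of the synthetic spots. The sketch below computes an RMSE over all spots and cell types and a Pearson correlation per spot, assuming both inputs are spots x cell types matrices. These are common definitions chosen for illustration; the exact formulas used in the evaluation scripts (and the six classification metrics) may differ, and the function names are hypothetical.

```python
import numpy as np


def rmse(true_props, pred_props):
    """Root-mean-square error over all spots and cell types (spots x types inputs)."""
    t = np.asarray(true_props, dtype=float)
    p = np.asarray(pred_props, dtype=float)
    return float(np.sqrt(np.mean((t - p) ** 2)))


def per_spot_correlation(true_props, pred_props):
    """Pearson correlation between true and predicted proportions, one value per spot.

    Spots whose true or predicted proportions are constant give NaN.
    """
    t = np.asarray(true_props, dtype=float)
    p = np.asarray(pred_props, dtype=float)
    # Center each spot (row) before computing the correlation.
    tc = t - t.mean(axis=1, keepdims=True)
    pc = p - p.mean(axis=1, keepdims=True)
    denom = np.sqrt((tc ** 2).sum(axis=1) * (pc ** 2).sum(axis=1))
    with np.errstate(invalid="ignore", divide="ignore"):
        return (tc * pc).sum(axis=1) / denom
```

A density plot of the values returned by `per_spot_correlation` corresponds to what a single corr_distribution/ plot summarizes, and taking the median of `rmse` across replicates gives a summary in the spirit of the median RMSE spreadsheet.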

Plots

Along with high-resolution versions of the plots in the thesis, you can also find the scripts used to generate them. The plots are in the directory plots/, where I try to link each plot to its corresponding script.

Application on real data

You can find the scripts for preprocessing, running deconvolution tools, and evaluating the liver dataset in the folder Scripts/liver/.
