Readme for Doha, Wang et al. manuscript submitted to Nature Communications (NCOMMS-22-49559-T)
Included are six .rmd scripts (s1 to s6) used to analyze the single-cell RNA-seq data in this manuscript, plus a package installation script (s0).
Scripts are designed to run sequentially (s1->s2->...->s6) and store their output in the 'analysis_files' directory.
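The sequential order described above could be scripted roughly as follows. This is a minimal sketch in R, not part of the submitted code: the helper names find_pipeline_scripts and run_pipeline are hypothetical, and it assumes the scripts carry an .rmd extension, sit in a single directory, and that the rmarkdown package is installed.

```r
# Sketch only: knit the analysis scripts in order (s1 -> s6).
# Helper names are hypothetical and not taken from the manuscript code.

# Return the s1..s6 scripts found in `script_dir`, sorted by numeric prefix.
find_pipeline_scripts <- function(script_dir = ".") {
  files <- list.files(script_dir, pattern = "^s[1-6]_.*\\.[Rr]md$")
  files[order(as.integer(sub("^s([0-9]+)_.*$", "\\1", files)))]
}

# Knit each script in turn. An error in any script stops the loop,
# which is intended because later steps read the output of earlier ones.
run_pipeline <- function(script_dir = ".") {
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("The rmarkdown package is required to knit the scripts.")
  }
  for (f in find_pipeline_scripts(script_dir)) {
    message("Knitting ", f)
    # A fresh environment keeps objects from one script out of the next.
    rmarkdown::render(file.path(script_dir, f), envir = new.env())
  }
  invisible(TRUE)
}
```

Calling run_pipeline("path/to/scripts") would then knit s1 through s6 in order. The s0 installation script is deliberately excluded by the file pattern and would be run once beforehand.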
System requirements
>64GB system memory
All code was originally processed using:
Software:
Windows 10 Enterprise (21H2)
RStudio (2022.07.2, Build 576)
R (4.2.2)
System hardware:
AMD Ryzen Threadripper 2990WX
128GB System memory
Included code files:
s0_package_installation.rmd
s1_vehicle_stroma_preprocessing.rmd
s2_qc_integration.rmd
s3_cluster_optimization
s4_celltype_definition.rmd
s5_human_tnbc_integration_rliger.rmd
s6_manuscript_figures.rmd
Other files:
readme.txt : readme file
library_metadata : Metadata file used to associate treatment / hashtag / phenotype data with sequencing results
s{1:6}*.html : HTML output of each associated code file when knitted.
Code descriptions:
s0: Installation for all required analysis packages.
s1: Load UMI count matrices from 10x Genomics Cell Ranger output, use SoupX to remove contaminating transcripts, perform hashtag demultiplexing, identify doublets with DoubletFinder, and combine all libraries into a single .rds file.
s2: Combine libraries into a single Seurat object, then normalize, reduce dimensionality, and cluster without integration. Compute QC metrics and assign cell cycle phase. Filter to high-quality singlets, then integrate with rliger and harmony.
s3: Sweep clustering resolution (Leiden algorithm) and select the resolution that minimizes RMSE and maximizes approximate silhouette width. Compute DEGs across the optimized clusters and assign cell type lineage based on canonical markers. Subset the proliferative population and assign lineage.
s4: Compute DEGs across clusters within each lineage and perform gene enrichment analysis. Assign cluster labels.
s5: Train a classifier on the cell types identified in Wu et al. (Nat. Genet. 2021) and apply it to the MycPten;fl data. Integrate the Wu et al. human data set with the MycPten;fl data using iNMF (rliger).
s6: Visualize results for manuscript main and supplemental figures.
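To illustrate the idea behind the parameter sweep in s3, the toy sketch below tries several values of a clustering parameter and keeps the one with the highest mean silhouette width. It is not the manuscript's method: k-means on simulated 2-D points stands in for Leiden clustering on the integrated embedding, the exact silhouette from the cluster package stands in for the approximate silhouette, and the RMSE criterion is omitted. All variable names are illustrative.

```r
library(cluster)  # provides silhouette()

set.seed(1)

# Toy data: three well-separated 2-D groups stand in for a cell embedding.
centers <- rbind(c(0, 0), c(6, 0), c(0, 6))
emb <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, centers[i, 1], 0.5), rnorm(50, centers[i, 2], 0.5))
}))
d <- dist(emb)

# Sweep the clustering parameter (here: number of k-means centers) and
# record the mean silhouette width obtained for each value.
sweep_values <- 2:6
mean_sil <- vapply(sweep_values, function(k) {
  cl <- kmeans(emb, centers = k, nstart = 10)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
}, numeric(1))

# Keep the value with the highest mean silhouette width.
best <- sweep_values[which.max(mean_sil)]
print(data.frame(value = sweep_values, mean_silhouette = round(mean_sil, 3)))
message("Selected value: ", best)
```

In the actual pipeline the swept parameter is the Leiden resolution rather than a cluster count, so the sketch only conveys the selection logic.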
Code runtime estimate for standard desktop:
s0: ~10 minutes
s1: ~3 hours
s2: ~12 hours
s3: ~3 hours
s4: ~1 hour
s5: ~4 hours
s6: ~10 minutes
Required R packages (version used, source)
Matrix (1.5-1, CRAN)
tidyverse (1.3.2, CRAN)
Seurat (4.3.0, CRAN)
ggalluvial (0.12.3, CRAN)
harmony (0.1.1, CRAN)
SoupX (1.6.2, CRAN)
cluster (2.1.4, CRAN)
clusterProfiler (4.4.4, Bioconductor)
org.Mm.eg.db (3.16, Bioconductor)
bluster (1.6.0, Bioconductor)
enrichplot (1.16.2, Bioconductor)
rliger (1.0.0, github: 'welch-lab/liger')
DoubletFinder (2.0.3, github: 'chris-mcginnis-ucsf/DoubletFinder')
SeuratWrappers (0.3.0, github: 'satijalab/seurat-wrappers')
nichenetr (1.1.0, github: 'saeyslab/nichenetr')
scPred (1.9.2, github: 'immunogenomics/scPred')
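Before running the analysis, it may help to compare the installed package versions against the list above. The sketch below is one possible way to do that in base R. The function name check_packages is hypothetical, and only a subset of the listed packages is included for brevity.

```r
# Sketch only: report whether installed versions match the ones listed above.
# `check_packages` is a hypothetical helper, not part of the manuscript code.
required <- c(
  Matrix  = "1.5-1",
  Seurat  = "4.3.0",
  harmony = "0.1.1",
  SoupX   = "1.6.2",
  cluster = "2.1.4"
)

check_packages <- function(required) {
  rows <- lapply(names(required), function(pkg) {
    # system.file() returns "" when the package is not installed.
    found <- nzchar(system.file(package = pkg))
    installed <- if (found) utils::packageVersion(pkg) else NA
    data.frame(
      package   = pkg,
      expected  = required[[pkg]],
      installed = if (found) as.character(installed) else NA_character_,
      # Compare as version objects so "1.5-1" and "1.5.1" are treated alike.
      matches   = found && installed == package_version(required[[pkg]])
    )
  })
  do.call(rbind, rows)
}

print(check_packages(required))
```

A mismatch does not necessarily mean the scripts will fail. It only flags where the local environment differs from the one used for the manuscript.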
Demo data:
s1_seurat_list_subset.rds (A subset of 50 cells per library, produced by s1. ~30MB uncompressed)