Analysis of patient survival, DNA alterations, gene expression, TCR sequencing and immunofluorescence staining versus organotropism.
- Main analysis notebook to produce figures in paper is found here. Contents:
- Detect nuclear foci of replication stress markers pRPA, gH2AX and RAD51.
- Calculate mean foci per epithelial cell in primary PDAC tumors and link to organotropism (i.e. liver or lung metastasis)
- Calculate fraction of multiplex IHC cell types per tissue and link to organotropism.
- Patient metadata. Primary vs met. DDR vs TMB.
- CPH modeling; CPH forest plots
- Gene expression analysis
- TCR analysis; TCR survival
- GSVA violins; GSEA bar plots
- Figures made using R
- R scripts to develop pORG/pSUB gene sets, calculate PurIST subtyping and GSVA scores, and generate additional plots for figures. Two scripts handle these tasks: PublicAnalysis.R and SupportFunctions.R. The main script includes instructions for downloading prerequisite data and software from public sources. It then performs the following tasks:
- Setup paths and load scripts and data
- Apply PurIST subtyping to PDAC samples
- Generate pORG/pSUB gene sets and GSVA scores (for pORG/pSUB and Hallmarks gene sets)
- Generate OncoPrint plots for figures
- Generate some additional Kaplan-Meier plots for figures
- Generate *.gct and *.cls files to reproduce GSEA analysis with the Broad GSEA software (also includes notes on settings used for analysis shown in publication)
- Large files, including raw image data, single cell image features, and detailed Adaptive TCRseq and DNA sequence data, can be found here.
- An additional analysis notebook that loads Adaptive TCRseq data and calculates TCRseq metrics is found here.
- Immunarch code to generate repertoire overlap is found here.
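As an illustration of the foci quantification listed in the notebook contents above, the following is a minimal sketch, not the notebook's actual code. It assumes a hypothetical per-cell table with columns `patient`, `cell_type` and `pRPA_foci`, plus a hypothetical patient-to-cohort mapping; the real column names and data layout in the notebook may differ.

```python
import pandas as pd

# Hypothetical per-cell table: one row per segmented cell (toy values).
cells = pd.DataFrame({
    "patient": ["P1", "P1", "P1", "P2", "P2", "P3", "P3"],
    "cell_type": ["epithelial", "epithelial", "stromal",
                  "epithelial", "epithelial", "epithelial", "immune"],
    "pRPA_foci": [2, 4, 0, 1, 1, 6, 0],
})

# Hypothetical organotropism label per patient (liver or lung metastasis).
cohort = pd.Series({"P1": "liver", "P2": "lung", "P3": "liver"}, name="cohort")


def mean_foci_per_epithelial_cell(cells, marker):
    """Return the mean foci count per epithelial cell for each patient."""
    epithelial = cells[cells["cell_type"] == "epithelial"]
    return epithelial.groupby("patient")[marker].mean()


per_patient = mean_foci_per_epithelial_cell(cells, "pRPA_foci")

# Attach the cohort label so the two groups can be compared.
summary = per_patient.to_frame("mean_foci").join(cohort)
print(summary.groupby("cohort")["mean_foci"].mean())
```

The same function could be applied to other marker columns (for example gH2AX or RAD51 foci counts) if they are stored in the same table.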
If utilizing images, data or code, please cite our work: Ongoing Replication Stress Response and New Clonal T Cell Development Discriminate Between Liver and Lung Recurrence Sites and Patient Outcomes in Pancreatic Ductal Adenocarcinoma
To run the analysis notebooks, install python3/miniconda (installers for Windows, macOS and Linux), and enter the following in the terminal to set up an analysis environment.
conda create -n analysis
conda activate analysis
conda install seaborn pytables pandas ipykernel
conda install -c conda-forge jupyterlab matplotlib scikit-image tifffile statsmodels
pip install statannotations
Finally, clone my repo for processing, visualization and analysis of multiplex imaging data:
git clone https://gitlab.com/engje/mplex_image.git
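Once the environment is set up, a quick check like the one below can confirm that the packages are importable. This is only a sketch: the helper name `missing_modules` is hypothetical, and the list uses import names, which for some packages differ from the install names (`tables` for pytables, `skimage` for scikit-image).

```python
from importlib.util import find_spec

# Import names for the packages installed above (import name != install name
# for pytables -> tables and scikit-image -> skimage).
REQUIRED = [
    "seaborn", "tables", "pandas", "ipykernel", "jupyterlab",
    "matplotlib", "skimage", "tifffile", "statsmodels", "statannotations",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

Run it from inside the activated `analysis` environment so that it inspects the right interpreter.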
R version 4.1.2 was used with R packages DESeq2, GSVA, msigdbr, gplots, and ggplot2. GSEA was run in Java using the command line interface.
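For readers unfamiliar with the GSEA input files mentioned above, the sketch below writes a tiny expression matrix in GCT format and sample labels in CLS format. It is an illustration in Python with made-up gene names, sample names and values; in this repository the actual files are generated by the R script, and the helper names `write_gct` and `write_cls` are hypothetical.

```python
import io


def write_gct(handle, genes, samples, matrix):
    """Write an expression matrix (genes x samples) in GCT v1.2 format."""
    handle.write("#1.2\n")
    handle.write(f"{len(genes)}\t{len(samples)}\n")
    handle.write("NAME\tDescription\t" + "\t".join(samples) + "\n")
    for gene, row in zip(genes, matrix):
        values = "\t".join(str(value) for value in row)
        handle.write(f"{gene}\tna\t{values}\n")


def write_cls(handle, labels):
    """Write categorical class labels (one per sample) in CLS format."""
    classes = list(dict.fromkeys(labels))  # unique, in order of first use
    handle.write(f"{len(labels)} {len(classes)} 1\n")
    handle.write("# " + " ".join(classes) + "\n")
    handle.write(" ".join(labels) + "\n")


# Toy data: two genes, three samples, two classes (all values are made up).
genes = ["GENE_A", "GENE_B"]
samples = ["S1", "S2", "S3"]
matrix = [[1.5, 2.0, 0.5], [3.0, 0.1, 4.2]]
labels = ["liver", "lung", "liver"]

gct_buffer, cls_buffer = io.StringIO(), io.StringIO()
write_gct(gct_buffer, genes, samples, matrix)
write_cls(cls_buffer, labels)
gct_text, cls_text = gct_buffer.getvalue(), cls_buffer.getvalue()
print(gct_text)
print(cls_text)
```

The sample order in the CLS labels has to match the column order of the GCT file, since GSEA pairs them by position.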
R version 3.6.3 was used with the edgeR package (v3.26.8).
Additional R packages, tools and reference data used:
- immunarch (v0.9.0)
- clusterProfiler (v4.6.2)
- immunedeconv (v2.1.0)
- enrichplot (v1.18.4)
- Seurat (v4.3.0)
- pheatmap (v1.0.12)
- MSigDB database (v7.5.1)
- FastQC (v0.11.8)
- MultiQC (v1.7)
- trim-galore (v0.6.3)
- kallisto (v0.44.0)
- genome assembly GRCh38.p5
- GENCODE (v24)
- CNAtools
- GenVisR