Analysis of omic and imaging data in pancreatic ductal adenocarcinoma distinguishes liver and lung recurrence and patient outcomes

engjen/Liver_Lung_PDAC

Liver_Lung_PDAC

Analysis of patient survival, DNA alterations, gene expression, TCR sequencing, and immunofluorescence staining versus organotropism.

Code

  • The main analysis notebook, which produces the figures in the paper, is found here. Contents:
  1. Detect nuclear foci of replication stress markers pRPA, gH2AX and RAD51.
  2. Calculate mean foci per epithelial cell in primary PDAC tumors and link to organotropism (i.e. liver or lung metastasis)
  3. Calculate fraction of multiplex IHC cell types per tissue and link to organotropism.
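  4. Patient metadata: primary vs. metastasis, and DDR (DNA damage response) vs. TMB (tumor mutational burden).
  5. Cox proportional hazards (CPH) modeling and CPH forest plots.
  6. Gene expression analysis.
  7. TCR analysis and TCR survival analysis.
  8. GSVA violin plots and GSEA bar plots.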
  • Figures made using R
  1. Figure 2 A-C here
  2. Figure 7D here
  3. Supplemental figure 4A here
  4. Supplemental figure 5 here
  • R scripts to develop pORG/pSUB gene sets, calculate PurIST subtyping and GSVA scores, and generate additional plots for figures. Two scripts handle these tasks: PublicAnalysis.R and SupportFunctions.R. The main script includes instructions for downloading the prerequisite public data and software. It then performs the following tasks:
  1. Set up paths and load scripts and data
  2. Apply PurIST subtyping to PDAC samples
  3. Generate pORG/pSUB gene sets and GSVA scores (for pORG/pSUB and Hallmarks gene sets)
  4. Generate OncoPrint plots for figures
  5. Generate some additional Kaplan-Meier plots for figures
  6. Generate *.gct and *.cls files to reproduce GSEA analysis with the Broad GSEA software (also includes notes on settings used for analysis shown in publication)
  • Large files including raw image data, single cell image features, and detailed Adaptive TCRseq and DNA sequence data can be found here.

  • An additional analysis notebook that loads Adaptive TCRseq data and calculates TCRseq metrics is found here.

  • Immunarch code to generate repertoire overlap found here.
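Steps 1 and 2 of the main notebook (mean foci per epithelial cell, linked to organotropism) can be illustrated with a minimal sketch. This is not the notebook's code: the field names `patient_id`, `cell_type` and `n_foci`, the function names, and the toy values are all hypothetical, and the sketch uses only the Python standard library, whereas the environment described below installs pandas for this kind of aggregation.

```python
from collections import defaultdict
from statistics import mean


def mean_foci_per_epithelial_cell(cells):
    """Return {patient_id: mean foci count over that patient's epithelial cells}.

    `cells` is an iterable of dicts with hypothetical keys
    "patient_id", "cell_type" and "n_foci" (one dict per segmented cell).
    """
    foci_by_patient = defaultdict(list)
    for cell in cells:
        # Only epithelial cells contribute; other cell types are skipped.
        if cell["cell_type"] == "epithelial":
            foci_by_patient[cell["patient_id"]].append(cell["n_foci"])
    return {patient: mean(counts) for patient, counts in foci_by_patient.items()}


def summarize_by_cohort(patient_means, cohort_by_patient):
    """Average the per-patient means within each recurrence-site cohort."""
    grouped = defaultdict(list)
    for patient, value in patient_means.items():
        grouped[cohort_by_patient[patient]].append(value)
    return {cohort: mean(values) for cohort, values in grouped.items()}


# Toy data for illustration only; these are not values from the study.
cells = [
    {"patient_id": "P1", "cell_type": "epithelial", "n_foci": 4},
    {"patient_id": "P1", "cell_type": "epithelial", "n_foci": 2},
    {"patient_id": "P1", "cell_type": "stromal", "n_foci": 9},
    {"patient_id": "P2", "cell_type": "epithelial", "n_foci": 1},
    {"patient_id": "P2", "cell_type": "epithelial", "n_foci": 0},
]
cohort_by_patient = {"P1": "liver", "P2": "lung"}

patient_means = mean_foci_per_epithelial_cell(cells)
cohort_means = summarize_by_cohort(patient_means, cohort_by_patient)
```

Averaging per patient first, and only then per cohort, keeps patients with many segmented cells from dominating the cohort summary.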

Citation

If utilizing images, data or code, please cite our work: Ongoing Replication Stress Response and New Clonal T Cell Development Discriminate Between Liver and Lung Recurrence Sites and Patient Outcomes in Pancreatic Ductal Adenocarcinoma

Analysis environment

Python

To run the analysis notebooks, install Python 3 with Miniconda (installers are available for Windows, macOS and Linux), then enter the following commands in a terminal to set up an analysis environment.

conda create -n analysis

conda activate analysis

conda install seaborn pytables pandas ipykernel

conda install -c conda-forge jupyterlab matplotlib scikit-image tifffile statsmodels

pip install statannotations

Finally, clone my mplex_image repo for processing, visualization and analysis of multiplex imaging data:

git clone https://gitlab.com/engje/mplex_image.git
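After running the commands above, a quick check that the environment is complete can save time before opening the notebooks. The following is a minimal sketch, not part of the repository: the function name `missing_modules` and the list `REQUIRED_MODULES` are introduced here for illustration, and the list only covers the packages named in the install commands. Run it with the `analysis` environment activated.

```python
from importlib.util import find_spec

# Import names for the packages installed above. Some differ from the
# conda/pip package name: pytables imports as "tables" and scikit-image
# imports as "skimage".
REQUIRED_MODULES = [
    "seaborn", "tables", "pandas", "ipykernel", "jupyterlab",
    "matplotlib", "skimage", "tifffile", "statsmodels", "statannotations",
]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

`find_spec` only locates each module without importing it, so the check stays fast even for heavy packages.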

R Packages and Versions

R version 4.1.2 was used with the R packages DESeq2, GSVA, msigdbr, gplots, and ggplot2. GSEA was run in Java using the command line interface.

R version 3.6.3 was used with the edgeR package (v3.26.8).

Additional R packages, command-line tools and reference data used:

  • immunarch (v0.9.0)
  • clusterProfiler (v4.6.2)
  • immunedeconv (v2.1.0)
  • enrichplot (v1.18.4)
  • Seurat (v4.3.0)
  • pheatmap (v1.0.12)
  • MSigDB database (v7.5.1)
  • FastQC (v0.11.8)
  • MultiQC (v1.7)
  • trim-galore (v0.6.3)
  • kallisto (v0.44.0)
  • genome assembly GRCh38.p5
  • GENCODE (v24)
  • CNAtools
  • GenVisR
