This is the code for a re-analysis of a GEO dataset that I originally analyzed for this paper. The re-analysis uses statistical methods that were not yet available at the time, such as the csaw Bioconductor package, which provides a principled way to normalize windowed counts of ChIP-seq reads and test them for differential binding. The original paper only analyzed binding within pre-defined promoter regions, whereas the windowed approach tests for differential binding across the whole genome. The RNA-seq analysis has also been improved using newer features of limma, such as quality weights.
This workflow downloads the sequence data and sample metadata from the public GEO/SRA release, so anyone can download and run this code to reproduce the full analysis.
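The first step of a csaw-style analysis is counting reads in windows tiled across the genome; those counts are what later get normalized and tested. The sketch below illustrates only that counting step for a single chromosome in plain Python. It is not csaw's implementation (csaw's `windowCounts` does this in R with many more options), and the read representation (0-based 5' position plus strand) and the parameter names `width`, `spacing`, and `frag_len` are assumptions chosen for illustration.

```python
from typing import Iterable, List, Tuple

# Illustrative sketch only, not csaw's implementation. Assumptions: reads on a
# single chromosome are given as (0-based 5' position, strand) pairs, and every
# read is extended to the same fragment length.


def extend_read(pos: int, strand: str, frag_len: int) -> Tuple[int, int]:
    """Return the half-open interval [start, end) covered by the extended read."""
    if strand == "+":
        return pos, pos + frag_len
    # A minus-strand read's 5' end is its rightmost base, so extend leftwards.
    return max(0, pos - frag_len + 1), pos + 1


def window_counts(reads: Iterable[Tuple[int, str]], chrom_len: int,
                  width: int, spacing: int,
                  frag_len: int) -> List[Tuple[int, int, int]]:
    """Count extended reads overlapping windows of `width` bp whose starts are
    placed every `spacing` bp. Returns (window_start, window_end, count)."""
    starts = list(range(0, max(chrom_len - width, 0) + 1, spacing))
    counts = [0] * len(starts)
    for pos, strand in reads:
        frag_start, frag_end = extend_read(pos, strand, frag_len)
        frag_end = min(frag_end, chrom_len)
        # Window i = [i*spacing, i*spacing + width) overlaps [frag_start, frag_end)
        # iff frag_start - width < i*spacing < frag_end.
        first = max(0, -(-(frag_start - width + 1) // spacing))  # ceiling division
        last = min(len(starts) - 1, (frag_end - 1) // spacing)
        for i in range(first, last + 1):
            counts[i] += 1
    return [(s, s + width, c) for s, c in zip(starts, counts)]


if __name__ == "__main__":
    # Two reads on a 100 bp chromosome, 20 bp windows every 10 bp, 30 bp fragments.
    print(window_counts([(5, "+"), (50, "-")], chrom_len=100, width=20,
                        spacing=10, frag_len=30)[:3])
    # → [(0, 20, 1), (10, 30, 2), (20, 40, 2)]
```

Overlapping windows (spacing smaller than width) mean each fragment is counted in several adjacent windows. In csaw, the resulting window-by-sample count matrix is then filtered and normalized before being tested for differential binding with edgeR.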
- ChIP-seq
  - Mapping with bowtie2
  - Peak calling with MACS2 and Epic
  - Fetching of blacklists from UCSC
  - Generation of greylists from ChIP-seq input samples
  - IDR analysis of blacklist-filtered peak calls
  - Computation of the cross-correlation function for ChIP-seq samples, excluding blacklisted regions
  - Counting in windows across the genome
- RNA-seq
  - Mapping with STAR & HISAT2
  - Counting reads aligned to genes
  - Alignment-free bias-corrected transcript quantification using Salmon & Kallisto
  - Differential gene expression
- Integrating RNA-seq and ChIP-seq
- Gene set tests
- QC Stuff
- mixOmics: http://mixomics.org/
- ica: https://cran.rstudio.com/web/packages/ica/index.html
- Motif enrichment
- pcaExplorer: https://bioconductor.org/packages/release/bioc/html/pcaExplorer.html
- Remove unnecessary library() calls
- Put spaces around equals signs
- Document how to run the pipeline
- Provide an install script for the R and Python packages
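The ChIP-seq cross-correlation step correlates read coverage on the plus and minus strands at a range of shifts; the shift with the highest correlation approximates the average fragment length. Below is a minimal pure-Python sketch of that idea for one chromosome, masking blacklisted bases before each Pearson correlation. The inputs (lists of 0-based 5' read positions per strand, blacklist regions as half-open `(start, end)` pairs) and all function names are assumptions for illustration, not the workflow's actual code; csaw's `correlateReads` is one real implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch under assumed inputs, not the workflow's implementation.


def coverage(positions: Sequence[int], chrom_len: int) -> List[int]:
    """Count 5' read ends at each base of one chromosome."""
    cov = [0] * chrom_len
    for p in positions:
        if 0 <= p < chrom_len:
            cov[p] += 1
    return cov


def blacklist_mask(blacklist: Sequence[Tuple[int, int]], chrom_len: int) -> List[bool]:
    """True for bases inside any half-open blacklisted interval [start, end)."""
    mask = [False] * chrom_len
    for start, end in blacklist:
        for i in range(max(0, start), min(end, chrom_len)):
            mask[i] = True
    return mask


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(plus: Sequence[int], minus: Sequence[int], chrom_len: int,
                      blacklist: Sequence[Tuple[int, int]],
                      max_shift: int) -> Dict[int, float]:
    """Correlate plus-strand coverage at base i with minus-strand coverage at
    base i + shift, for shift = 0..max_shift, skipping pairs of bases where
    either base is blacklisted."""
    plus_cov = coverage(plus, chrom_len)
    minus_cov = coverage(minus, chrom_len)
    masked = blacklist_mask(blacklist, chrom_len)
    result = {}
    for shift in range(max_shift + 1):
        keep = [i for i in range(chrom_len - shift)
                if not masked[i] and not masked[i + shift]]
        result[shift] = pearson([plus_cov[i] for i in keep],
                                [minus_cov[i + shift] for i in keep])
    return result


if __name__ == "__main__":
    plus = [100, 250, 400, 600, 800]
    minus = [p + 50 for p in plus]  # simulated 50 bp fragments
    ccf = cross_correlation(plus, minus, chrom_len=1000,
                            blacklist=[(390, 470)], max_shift=300)
    print("estimated fragment length:", max(ccf, key=ccf.get))  # → 50
```

Masking matters because blacklisted regions tend to have artifactual pileups on both strands, which can distort the cross-correlation; excluding them keeps the estimate driven by genuine binding signal.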

Running the workflow requires the following software:

- ascp (Aspera download client) for downloading SRA files
- Bedtools
- Bowtie2 aligner
- Epic peak caller
- fastq-tools
- HISAT2 aligner
- IDR Python script
- Kallisto RNA-seq quantifier
- MACS2 peak caller
- Picard tools, for various file manipulation tasks
- Salmon RNA-seq quantifier (devel version 0.7.3)
- Shoal
- Snakemake for running the workflow
- SRA toolkit for extracting reads from SRA files
- STAR aligner
- UCSC command-line tools (e.g. liftOver)
- R, Bioconductor, and the following R packages:
  - From CRAN: assertthat, doParallel, dplyr, future, getopt, GGally, ggforce, ggfortify, ggplot2, ks, lazyeval, lubridate, magrittr, MASS, Matrix, openxlsx, optparse, purrr, RColorBrewer, readr, reshape2, rex, scales, stringi, stringr
  - Included with base R: parallel
  - From Bioconductor: annotate, Biobase, BiocParallel, BSgenome.Hsapiens.UCSC.hg19, BSgenome.Hsapiens.UCSC.hg38, ChIPQC, csaw, edgeR, GenomicFeatures, GenomicRanges, GEOquery, limma, org.Hs.eg.db, Rsamtools, Rsubread, rtracklayer, S4Vectors, SRAdb, SummarizedExperiment, TxDb.Hsapiens.UCSC.hg19.knownGene, tximport
  - Installed manually: sleuth, wasabi
- Python 3 and the following Python packages: biopython, atomicwrites, numpy, pandas, plac, pysam, rpy2, snakemake
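Before launching the workflow, it can help to confirm that the command-line tools and Python packages above are actually available. The following is a minimal sketch of such a check, not part of the workflow itself. The executable names are assumptions and the list is partial (for example, Picard is often run as a JAR file and is omitted), and the R packages are not checked.

```python
import importlib.util
import shutil
from typing import List, Sequence

# Assumed executable names for some of the tools listed above; adjust them to
# match your installation. This list is partial and illustrative.
EXECUTABLES = ["ascp", "bedtools", "bowtie2", "epic", "hisat2", "idr",
               "kallisto", "macs2", "salmon", "snakemake", "fastq-dump",
               "STAR", "liftOver"]

# Import names for the Python packages listed above (biopython imports as "Bio").
PYTHON_MODULES = ["Bio", "atomicwrites", "numpy", "pandas", "plac",
                  "pysam", "rpy2", "snakemake"]


def missing_executables(names: Sequence[str]) -> List[str]:
    """Return the names that cannot be found on the PATH."""
    return [n for n in names if shutil.which(n) is None]


def missing_python_modules(names: Sequence[str]) -> List[str]:
    """Return the top-level module names that cannot be located for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    checks = [("executables", missing_executables(EXECUTABLES)),
              ("Python modules", missing_python_modules(PYTHON_MODULES))]
    for label, missing in checks:
        if missing:
            print(f"Missing {label}: {', '.join(missing)}")
        else:
            print(f"All {label} found.")
```

`importlib.util.find_spec` locates a module without importing it, so the check is fast and does not trigger any package's import-time side effects.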