This Nextflow-based ATAC-seq pipeline was originally built by the nf-core team and was then adapted for the axolotl genome. Many features were modified or discarded to keep the pipeline simple rather than comprehensive.
nfcore/atacseq is a bioinformatics analysis pipeline used for ATAC-seq data.
The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
- Raw read QC (FastQC)
- Adapter trimming (Trim Galore!)
- Alignment (BWA)
- Mark duplicates (picard)
- Merge alignments from multiple libraries of the same sample (picard)
  - Re-mark duplicates (picard)
  - Filtering to remove:
    - reads mapping to mitochondrial DNA (SAMtools)
    - reads mapping to blacklisted regions (SAMtools, BEDTools)
    - reads that are marked as duplicates (SAMtools)
    - reads that aren't marked as primary alignments (SAMtools)
    - reads that are unmapped (SAMtools)
    - reads that map to multiple locations (SAMtools)
    - reads containing > 4 mismatches (BAMTools)
    - reads that are soft-clipped (BAMTools)
    - reads that have an insert size > 2kb (BAMTools; paired-end only)
    - reads that map to different chromosomes (Pysam; paired-end only)
    - reads that aren't in FR orientation (Pysam; paired-end only)
    - reads where only one read of the pair fails the above criteria (Pysam; paired-end only)
  - Alignment-level QC and estimation of library complexity (picard, Preseq)
  - Create normalised bigWig files scaled to 1 million mapped reads (BEDTools, bedGraphToBigWig)
  - Generate gene-body meta-profile from bigWig files (deepTools)
  - Calculate genome-wide enrichment (deepTools)
  - Call broad/narrow peaks (MACS2)
  - Annotate peaks relative to gene features (HOMER)
  - Create consensus peakset across all samples and create tabular file to aid in the filtering of the data (BEDTools)
  - Count reads in consensus peaks (featureCounts)
  - Differential accessibility analysis, PCA and clustering (R, DESeq2)
  - Generate ATAC-seq specific QC html report (ataqv)
- Merge filtered alignments across replicates (picard)
  - Re-mark duplicates (picard)
  - Remove duplicate reads (SAMtools)
  - Create normalised bigWig files scaled to 1 million mapped reads (BEDTools, bedGraphToBigWig)
  - Call broad/narrow peaks (MACS2)
  - Annotate peaks relative to gene features (HOMER)
  - Create consensus peakset across all samples and create tabular file to aid in the filtering of the data (BEDTools)
  - Count reads in consensus peaks relative to merged library-level alignments (featureCounts)
  - Differential accessibility analysis, PCA and clustering (R, DESeq2)
- Create IGV session file containing bigWig tracks, peaks and differential sites for data visualisation (IGV)
- Present QC for raw read, alignment, peak-calling and differential accessibility results (ataqv, MultiQC, R)
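
The per-read filtering criteria listed above can be illustrated with a small pure-Python sketch. This is not the pipeline's own code (the pipeline relies on SAMtools, BAMTools and Pysam for these steps). The `Read` class, the `failure_reasons` and `pair_passes` helpers, the mitochondrial contig name `chrM`, the use of a MAPQ threshold as a stand-in for "maps to multiple locations", and the use of the NM tag as a mismatch count are all assumptions made for illustration. Blacklist filtering is left out because it needs an interval file.

```python
from dataclasses import dataclass

# SAM FLAG bits, as defined in the SAM format specification.
PAIRED = 0x1
UNMAPPED = 0x4
REVERSE = 0x10
MATE_REVERSE = 0x20
NOT_PRIMARY = 0x100
DUPLICATE = 0x400


@dataclass
class Read:
    """Hypothetical minimal stand-in for one SAM record."""
    flag: int
    rname: str   # reference the read maps to
    pos: int     # 1-based leftmost position
    mapq: int
    cigar: str
    rnext: str   # mate reference ("=" means same as rname)
    pnext: int   # mate position
    tlen: int    # signed template length
    nm: int      # NM tag (edit distance), used here as a mismatch proxy


def failure_reasons(read, mito_name="chrM", min_mapq=1,
                    max_mismatches=4, max_insert=2000):
    """Return the filtering criteria that this read fails (empty list = keep)."""
    if read.flag & UNMAPPED:
        # Other fields are not meaningful for unmapped reads.
        return ["unmapped"]
    reasons = []
    if read.rname == mito_name:
        reasons.append("mitochondrial")
    if read.flag & DUPLICATE:
        reasons.append("duplicate")
    if read.flag & NOT_PRIMARY:
        reasons.append("not primary")
    if read.mapq < min_mapq:
        # Assumption: a very low MAPQ is treated as multi-mapping.
        reasons.append("multi-mapped")
    if read.nm > max_mismatches:
        reasons.append("too many mismatches")
    if "S" in read.cigar:
        reasons.append("soft-clipped")
    if read.flag & PAIRED:
        mate_chrom = read.rname if read.rnext == "=" else read.rnext
        if mate_chrom != read.rname:
            reasons.append("mate on different chromosome")
        else:
            if abs(read.tlen) > max_insert:
                reasons.append("insert too large")
            read_rev = bool(read.flag & REVERSE)
            mate_rev = bool(read.flag & MATE_REVERSE)
            # FR: opposite strands, forward read starts at or before the reverse read.
            fwd_pos, rev_pos = (read.pnext, read.pos) if read_rev else (read.pos, read.pnext)
            if read_rev == mate_rev or fwd_pos > rev_pos:
                reasons.append("not FR orientation")
    return reasons


def pair_passes(read1, read2):
    """A pair is kept only if both reads pass every criterion."""
    return not failure_reasons(read1) and not failure_reasons(read2)


# Hypothetical records: a clean FR pair and a read failing several criteria.
good = Read(flag=99, rname="chr1", pos=100, mapq=60, cigar="50M",
            rnext="=", pnext=250, tlen=200, nm=0)
mate = Read(flag=147, rname="chr1", pos=250, mapq=60, cigar="50M",
            rnext="=", pnext=100, tlen=-200, nm=0)
bad = Read(flag=99 | DUPLICATE, rname="chrM", pos=100, mapq=0, cigar="5S45M",
           rnext="=", pnext=250, tlen=200, nm=6)

print(failure_reasons(good))
print(failure_reasons(bad))
print(pair_passes(good, mate), pair_passes(bad, mate))
```

Returning the list of failed criteria, rather than a single boolean, makes it easier to see which filter removes most reads when inspecting a new library.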
- Install nextflow
- Install either Docker or Singularity for full pipeline reproducibility (please only use Conda as a last resort; see docs)
- Download the pipeline and test it on a minimal dataset with a single command:

      nextflow run nf-core/atacseq -profile test,<docker/singularity/conda/institute>

  Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
- Start running your own analysis!

      nextflow run nf-core/atacseq -profile <docker/singularity/conda/institute> --input design.csv --genome GRCh37

See usage docs for all of the available options when running the pipeline.
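
The `--input design.csv` argument above expects a sample sheet. As a rough sketch, the snippet below writes one with Python's `csv` module. The column names (`group`, `replicate`, `fastq_1`, `fastq_2`) are assumed from the upstream nf-core/atacseq design format and may differ in this adapted version, so please confirm them against the usage docs. The group names, FASTQ file names and the `write_design` helper are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Assumed column layout (taken from the upstream nf-core/atacseq design format).
FIELDS = ["group", "replicate", "fastq_1", "fastq_2"]


def write_design(samples, path):
    """Write one row per library to a design CSV and return the path."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(samples)
    return path


# Hypothetical paired-end libraries: two control replicates, one treatment.
samples = [
    {"group": "control", "replicate": 1,
     "fastq_1": "control_rep1_R1.fastq.gz", "fastq_2": "control_rep1_R2.fastq.gz"},
    {"group": "control", "replicate": 2,
     "fastq_1": "control_rep2_R1.fastq.gz", "fastq_2": "control_rep2_R2.fastq.gz"},
    {"group": "treatment", "replicate": 1,
     "fastq_1": "treatment_rep1_R1.fastq.gz", "fastq_2": "treatment_rep1_R2.fastq.gz"},
]

# Written to a temporary directory here; in practice, write next to your data.
design_path = write_design(samples, Path(tempfile.mkdtemp()) / "design.csv")
print(design_path.read_text())
```

Generating the sheet from a script rather than by hand helps keep replicate numbering and file paths consistent when the number of libraries grows.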
The nf-core/atacseq pipeline comes with documentation about the pipeline, found in the docs/ directory:
- Installation
- Pipeline configuration
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting
The pipeline was originally written by The Bioinformatics & Biostatistics Group for use at The Francis Crick Institute, London.
The pipeline was developed by Harshil Patel.