add some docs
ATpoint committed Mar 4, 2022
1 parent 0637b97 commit 458cab4
Showing 1 changed file with 33 additions and 22 deletions.
55 changes: 33 additions & 22 deletions README.md
@@ -4,9 +4,7 @@ This Nextflow pipeline is intended for preprocessing of DNA-seq
 experiments such as ATAC-seq, ChIP-seq and CUT&RUN. It currently
 performs trimming and alignment of fastq files, filtering of the
 resulting BAM files, basic peak calling and a QC assessment by
-calculating Fractions Of Reads Per Peaks (FRiPs). Specific CUT&RUN
-features might be added in later versions, the defaults should work well
-for most input data though.
+calculating Fractions Of Reads Per Peaks (FRiPs).
 
 <br>
 
@@ -20,31 +18,44 @@ for most input data though.
 
 ## Usage
 
-...documentation will follow...
+Creating an index for the mapping on our HPC, with 16 cores using SLURM and the `normal` queue.
+Output will be in `atac_chip_preprocess_results/bowtie2_idx/` created in the location from which the run is launched.
 
-## Citations
+```bash
 
-- [nf-core project](https://nf-co.re/)
+NXF_VER=21.04.3 \
+nextflow run atpoint/atac_chip_preprocess -r 2.0 -profile singularity,slurm -with-trace -with-report report.html \
+--ref_genome /path/to/genome.fa.gz --idx_threads 16 --idx_mem '16.GB' --only_idx --queue 'normal' \
+-bg > report.log
 
-- [Ewels et al (2020) The nf-core paper. Nature Biotechnology volume
-38, pages
-276–278](https://www.nature.com/articles/s41587-020-0439-x)
+```
 
-- [Nextflow Docs](https://www.nextflow.io/docs/latest/index.html#)
+Mapping a dataset. Note, for single-end data use `--mode single`, for paired-end data use `--mode paired` and for ATAC-seq
+add additionally the `--atacseq` flag.
 
-- [Seqera Training](https://seqera.io/training/)
+```bash
 
-- [https://github.com/nextflow-io/rnaseq-nf -- The Seqera Labs DSL2
-proof-of-concept workflow](https://github.com/nextflow-io/rnaseq-nf)
+NXF_VER=21.04.3 \
+nextflow run atpoint/atac_chip_preprocess -r 2.0 -profile singularity,slurm -with-trace -with-report report.html \
+--idx path/to/idx/idxbasename \
+--fastq "$(pwd)"'/*_{1,2}.fastq.gz' \
+--align_threads 12 --sort_threads 2 --sort_mem '4G' --queue 'normal' \
+-bg > report.log
 
-- [Merkel, D (2014). Docker: lightweight linux containers for
-consistent development and deployment. Linux
-Journal](https://dl.acm.org/doi/10.5555/2600239.2600241)
+```
 
-- [Kurtzer et al (2017) Singularity: Scientific containers for
-mobility of compute. PLoS
-ONE](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0177459)
+## Pipeline
+
+- trim reads with `cutadapt`, either the TruSeq adapter (default) or the Nextera adapter (default is using `--atacseq`)
+- map trimmed reads with `bowtie2` and mark duplicates with `samblaster`
+- remove unmapped reads, alignments with MAPQ < 20, mitochondrial alignments (for ATAC-seq), alignments to non-primary chromosomes,
+non-primary and supplementary alignments, PCR duplicates
+- call peaks per sample with `macs2`
+- calculate FRiPs per sample based on the per-sample peaks
+- check insert sizes (for paired-end data)
+
+## Containerization
+
+Use either `-profile docker,singularity` to run via the hardcoded container.
+Using `-profile conda` is possible but untested, it will build the environment based on the `environment.yml` file which the container is based on.
 
-- [Grüning et al (2018) Bioconda: sustainable and comprehensive
-software distribution for the life sciences. Nat Methods
-15:475-476](https://www.nature.com/articles/s41592-018-0046-7)

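The filtering step listed in the README's Pipeline section (unmapped reads, alignments with MAPQ < 20, non-primary and supplementary alignments, PCR duplicates) can be illustrated with a small shell sketch. The README does not say which tool does the filtering, so `samtools` is an assumption here, and the file names are hypothetical. The sketch only builds the command string and does not run it. Removal of mitochondrial and non-primary-chromosome alignments is not covered.

```shell
#!/usr/bin/env bash
# Sketch only: samtools is an assumed tool, not confirmed by the README.

# SAM flag bits for the alignment classes the README says are removed.
UNMAPPED=4          # 0x4   read is unmapped
SECONDARY=256       # 0x100 non-primary (secondary) alignment
DUPLICATE=1024      # 0x400 PCR or optical duplicate
SUPPLEMENTARY=2048  # 0x800 supplementary alignment

# Combined mask for `samtools view -F`, which drops any read that has
# at least one of these bits set.
exclude_flags=$(( UNMAPPED + SECONDARY + DUPLICATE + SUPPLEMENTARY ))

# Print (but do not execute) a filtering command.
# -q 20 skips alignments with MAPQ below 20; file names are hypothetical.
filter_cmd() {
  local in_bam=$1 out_bam=$2
  printf 'samtools view -b -q 20 -F %d -o %s %s\n' \
    "$exclude_flags" "$out_bam" "$in_bam"
}

filter_cmd sample.bam sample.filtered.bam
```

With the four bits above, the mask is 3332, so the printed command is `samtools view -b -q 20 -F 3332 -o sample.filtered.bam sample.bam`. Keeping the bits as named variables makes it easy to see which alignment classes are excluded.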