diff --git a/.gitattributes b/.gitattributes index e00a13e..d81f2df 100644 --- a/.gitattributes +++ b/.gitattributes @@ -1 +1,2 @@ -*.nf linguist-language=Groovy \ No newline at end of file +*.nf linguist-language=Groovy +*.config linguist-language=Groovy \ No newline at end of file diff --git a/.gitignore b/.gitignore index 2280015..6272424 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,5 @@ .nextflow* work/ -data/ \ No newline at end of file +data/ +results/ +.DS_Store \ No newline at end of file diff --git a/README.md b/README.md index 1067e0e..aaeaa9c 100644 --- a/README.md +++ b/README.md @@ -14,53 +14,62 @@ the results files. ## Installation ### NextFlow installation -To use this pipeline, you need to have a working version of NextFlow installed. You can find more -information about this pipeline tool at [nextflow.io](http://www.nextflow.io/). The typical installation -of NextFlow looks like this: +See https://github.com/SciLifeLab/NGI-NextflowDocs for instructions on how to install and configure +Nextflow. -``` -curl -fsSL get.nextflow.io | bash -mv ./nextflow ~/bin -``` +### Pipeline installation +This pipeline itself needs no installation - NextFlow will automatically fetch it from GitHub when run if +`SciLifeLab/NGI-MethylSeq` is specified as the pipeline name. -#### UPPMAX -If you're running on a Swedish UPPMAX cluster you can load NextFlow as an environment module instead: +If you prefer, you can download the files yourself from GitHub and run them directly: ``` -module load nextflow +git clone https://github.com/SciLifeLab/NGI-MethylSeq.git +nextflow NGI-MethylSeq/main.nf ``` -The first time you load this you will get a warning about setting environment variables. To automatically set these at login, you can add the following lines to your `~/.bashrc` file: -```bash -export NXF_LAUNCHBASE=$SNIC_TMP -export NXF_TEMP=$SNIC_TMP +## Configuration +By default, the pipeline is configured to run on the Swedish UPPMAX cluster (milou / irma). 
+ +You will need to specify your UPPMAX project ID when running a pipeline. To do this, use +the command line flag `--project project_ID`. + +To avoid having to specify this every time you run Nextflow, you can add it to your +personal Nextflow config file instead. Add this line to `~/.nextflow/config`: + +```groovy +params.project = 'project_ID' ``` -### NextFlow configuration -Next, you need to set up a config file so that NextFlow knows how to run and where to find reference -indexes. You can find an example configuration file for UPPMAX (milou) with this repository: -[`example_uppmax_config`](https://github.com/SciLifeLab/NGI-MethylSeq/blob/master/example_uppmax_config). +The pipeline will exit with an error message if you try to run it with the default +UPPMAX config profile and don't set a project. -Copy this file to `~/.nextflow/config` and edit the line `'-A YOUR_PROJECT_ID'` to contain your -UPPMAX project identifier. -It is entirely possible to run this pipeline on other clusters - just note that you may need to customise -the `process` environment (eg. if you're using a cluster system other than SLURM) and the paths to reference -files. +### Running on other clusters +It is entirely possible to run this pipeline on other clusters, though you will need to set up +your own config file so that the script knows where to find your reference files and how your +cluster works. -### Pipeline installation -This pipeline itself needs no installation - NextFlow will automatically fetch it from GitHub when run if -`SciLifeLab/NGI-MethylSeq` is specified as the pipeline name. +Copy the contents of [`conf/uppmax.config`](conf/uppmax.config) to your own config file somewhere +and then reference it with `-c` when running the pipeline. + +If you think that there are other people using the pipeline who would benefit from your configuration +(eg. other common cluster setups), please let us know. 
It should be easy to create a new config file +in `conf` and reference this as a named profile in [`nextflow.config`](nextflow.config). Then these +configuration options can be used by specifying `-profile ` when running the pipeline. -If you prefer, you can download the files yourself from GitHub and run them directly: -``` -git clone https://github.com/SciLifeLab/NGI-MethylSeq.git -nextflow NGI-MethylSeq/main.nf -``` ## Running the pipeline The typical command for running the pipeline is as follows: ``` -nextflow SciLifeLab/NGI-MethylSeq --reads '*_R{1,2}.fastq.gz' --genome GRCm38 +nextflow SciLifeLab/NGI-MethylSeq --reads '*_R{1,2}.fastq.gz' --genome GRCh37 +``` + +Note that the pipeline will create files in your working directory: +```bash +work # Directory containing the nextflow working files +results # Finished results (configurable, see below) +.nextflow_log # Log file from Nextflow +# Other nextflow hidden files, eg. history of pipeline runs and old logs. ``` ### `--reads` @@ -69,6 +78,8 @@ Location of the input FastQ files: --reads 'path/to/data/sample_*_{1,2}.fastq' ``` +**NB: Must be enclosed in quotes!** + Note that the `{1,2}` parentheses are required to specify paired end data. Running `--reads '*.fastq'` will treat all files as single end. Also, note that the file path should be in quotation marks to prevent shell glob expansion. @@ -76,13 +87,26 @@ If left unspecified, the pipeline will assume that the data is in a directory ca ### `--genome` The reference genome to use of the analysis, needs to be one of the genome specified in the config file. -The human `GRCh37` genome is set as default. + +See [`conf/uppmax.config`](conf/uppmax.config) for a list of the supported reference genomes +and their keys. Common genomes that are supported are: + +* Human + * `--genome GRCh37` +* Mouse + * `--genome GRCm38` +* Drosophila + * `--genome BDGP6` +* _S. cerevisiae_ + * `--genome 'R64-1-1'` + +> There are numerous others - check the config file for more. 
+ +If you usually want to work with a single species, you can set a default in your user config file. +For example, add this line to `~/.nextflow/config`: ``` ---genome 'GRCm38' +params.genome = 'GRCh37' ``` -The `example_uppmax_config` file currently has the location of references for most of the -[Illumina iGenomes](http://support.illumina.com/sequencing/sequencing_software/igenome.html) -held on UPPMAX. ### Trimming Parameters The pipeline accepts a number of parameters to change how the trimming is done, according to your data type. @@ -105,7 +129,7 @@ You can specify custom trimming parameters as follows: Finally, specifying `--rrbs` will pass on the `--rrbs` parameter to TrimGalore! -## Bismark Parameters +### Bismark Parameters Using the `--pbat` parameter will affect the trimming (see above) and also set the `--pbat` flag when aligning with Bismark. @@ -114,6 +138,19 @@ This can also be set with `--non_directional` (doesn't affect trimming). Use the `--unmapped` flag to set the `--unmapped` flag with Bismark align and save the unmapped reads. +### Deduplication +By default, the pipeline includes a deduplication step after alignment. If you would like to skip this +step (eg. for RRBS data), use the `--nodedup` command line option. + +### `--bismark_index` +If you prefer, you can specify the full path to your reference genome when you run the pipeline: +``` +--bismark_index [path to Bismark index] +``` + +### `--outdir` +The output directory where the results will be saved. + ### `-c` Specify the path to a specific config file (this is a core NextFlow command). Useful if using different UPPMAX projects or different sets of reference genomes. 
diff --git a/bwa-meth.nf b/bwa-meth.nf new file mode 100644 index 0000000..9605e27 --- /dev/null +++ b/bwa-meth.nf @@ -0,0 +1,441 @@ +#!/usr/bin/env nextflow +/* +vim: syntax=groovy +-*- mode: groovy;-*- +======================================================================================== + B S - S E Q M E T H Y L A T I O N : B W A - M E T H +======================================================================================== + Methylation (BS-Seq) Analysis Pipeline using bwa-meth. Started November 2016. + #### Homepage / Documentation + https://github.com/SciLifeLab/NGI-MethylSeq + #### Authors + Phil Ewels +---------------------------------------------------------------------------------------- +*/ + + +/* + * SET UP CONFIGURATION VARIABLES + */ + +// Pipeline version +version = 0.1 + +// Configurable variables +params.project = false +params.genome = false +params.fasta = params.genome ? params.genomes[ params.genome ].fasta ?: false : false +params.fasta_index = params.genome ? params.genomes[ params.genome ].fasta_index ?: false : false +params.bwa_meth_index = params.genome ? 
params.genomes[ params.genome ].bwa_meth ?: false : false +params.saveReference = true +params.reads = "data/*_R{1,2}.fastq.gz" +params.outdir = './results' +params.notrim = false +params.nodedup = false +params.allcontexts = false +params.mindepth = 0 +params.ignoreFlags = false + +// Validate inputs +if( params.bwa_meth_index ){ + bwa_meth_index = file("${params.bwa_meth_index}.bwameth.c2t.bwt") + bwa_meth_indices = Channel.fromPath( "${params.bwa_meth_index}*" ).toList() + if( !bwa_meth_index.exists() ) exit 1, "bwa-meth index not found: ${params.bwa_meth_index}" +} +if( params.fasta_index ){ + fasta_index = file(params.fasta_index) + if( !fasta_index.exists() ) exit 1, "Fasta index file not found: ${params.fasta_index}" +} +if ( params.fasta ){ + fasta = file(params.fasta) + if( !fasta.exists() ) exit 1, "Fasta file not found: ${params.fasta}" +} else { + exit 1, "No reference Fasta file specified! Please use --fasta" +} + +params.pbat = false +params.single_cell = false +params.epignome = false +params.accel = false +params.cegx = false +if(params.pbat){ + params.clip_r1 = 6 + params.clip_r2 = 6 + params.three_prime_clip_r1 = 0 + params.three_prime_clip_r2 = 0 +} else if(params.single_cell){ + params.clip_r1 = 9 + params.clip_r2 = 9 + params.three_prime_clip_r1 = 0 + params.three_prime_clip_r2 = 0 +} else if(params.epignome){ + params.clip_r1 = 6 + params.clip_r2 = 6 + params.three_prime_clip_r1 = 6 + params.three_prime_clip_r2 = 6 +} else if(params.accel){ + params.clip_r1 = 10 + params.clip_r2 = 15 + params.three_prime_clip_r1 = 10 + params.three_prime_clip_r2 = 10 +} else if(params.cegx){ + params.clip_r1 = 6 + params.clip_r2 = 6 + params.three_prime_clip_r1 = 2 + params.three_prime_clip_r2 = 2 +} else { + params.clip_r1 = 0 + params.clip_r2 = 0 + params.three_prime_clip_r1 = 0 + params.three_prime_clip_r2 = 0 +} + +def single + +log.info "===================================================" +log.info " NGI-MethylSeq : Bisulfite-Seq BWA-Meth v${version}" +log.info 
"===================================================" +log.info "Reads : ${params.reads}" +log.info "Genome : ${params.genome}" +log.info "Bismark Index : ${params.bismark_index}" +log.info "Current home : $HOME" +log.info "Current user : $USER" +log.info "Current path : $PWD" +log.info "Script dir : $baseDir" +log.info "Working dir : $workDir" +log.info "Output dir : ${params.outdir}" +log.info "---------------------------------------------------" +if(params.rrbs){ log.info "RRBS Mode : On" } +log.info "Deduplication : ${params.nodedup ? 'No' : 'Yes'}" +log.info "PileOMeth : C Contexts - ${params.allcontexts ? 'All (CpG, CHG, CHH)' : 'CpG only'}" +log.info "PileOMeth : Minimum Depth - ${params.mindepth}" +if(params.ignoreFlags){ log.info "PileOMeth: : Ignoring SAM Flags" } +log.info "---------------------------------------------------" +if(params.notrim){ log.info "Trimming Step : Skipped" } +if(params.pbat){ log.info "Trim Profile : PBAT" } +if(params.single_cell){ log.info "Trim Profile : Single Cell" } +if(params.epignome){ log.info "Trim Profile : Epignome" } +if(params.accel){ log.info "Trim Profile : Accel" } +if(params.cegx){ log.info "Trim Profile : CEGX" } +if(params.clip_r1 > 0) log.info "Trim R1 : ${params.clip_r1}" +if(params.clip_r2 > 0) log.info "Trim R2 : ${params.clip_r2}" +if(params.three_prime_clip_r1 > 0) log.info "Trim 3' R1 : ${params.three_prime_clip_r1}" +if(params.three_prime_clip_r2 > 0) log.info "Trim 3' R2 : ${params.three_prime_clip_r2}" +log.info "---------------------------------------------------" +log.info "Config Profile : ${workflow.profile}" +if(params.project) log.info "UPPMAX Project : ${params.project}" +log.info "===================================================" + +// Validate inputs +if( workflow.profile == 'standard' && !params.project ) exit 1, "No UPPMAX project ID found! 
Use --project" + +/* + * Create a channel for input read files + */ +Channel + .fromFilePairs( params.reads, size: -1 ) + .ifEmpty { exit 1, "Cannot find any reads matching: ${params.reads}" } + .into { read_files_fastqc; read_files_trimming } + + + +/* + * PREPROCESSING - Build bwa-mem index + */ +if(!params.bwa_meth_index){ + process makeBwaMemIndex { + tag fasta + publishDir path: "${params.outdir}/reference_genome", saveAs: { params.saveReference ? it : null }, mode: 'copy' + + input: + file fasta from fasta + + output: + file "${fasta}.bwameth.c2t.bwt" into bwa_meth_index + file "${fasta}*" into bwa_meth_indices + + script: + """ + bwameth.py index $fasta + """ + } +} + +/* + * PREPROCESSING - Index Fasta file + */ +if(!params.fasta_index){ + process makeFastaIndex { + tag fasta + publishDir path: "${params.outdir}/reference_genome", saveAs: { params.saveReference ? it : null }, mode: 'copy' + + input: + file fasta + + output: + file "${fasta}.fai" into fasta_index + + script: + """ + samtools faidx $fasta + """ + } +} + + + +/* + * STEP 1 - FastQC + */ +process fastqc { + tag "$name" + publishDir "${params.outdir}/fastqc", mode: 'copy' + + input: + set val(name), file(reads) from read_files_fastqc + + output: + file '*_fastqc.{zip,html}' into fastqc_results + + script: + """ + fastqc -q $reads + """ +} + +/* + * STEP 2 - Trim Galore! + */ +if(params.notrim){ + trimmed_reads = read_files_trimming + trimgalore_results = [] +} else { + process trim_galore { + tag "$name" + publishDir "${params.outdir}/trim_galore", mode: 'copy' + + input: + set val(name), file(reads) from read_files_trimming + + output: + set val(name), file('*fq.gz') into trimmed_reads + file '*trimming_report.txt' into trimgalore_results + + script: + single = reads instanceof Path + c_r1 = params.clip_r1 > 0 ? "--clip_r1 ${params.clip_r1}" : '' + c_r2 = params.clip_r2 > 0 ? "--clip_r2 ${params.clip_r2}" : '' + tpc_r1 = params.three_prime_clip_r1 > 0 ? 
"--three_prime_clip_r1 ${params.three_prime_clip_r1}" : '' + tpc_r2 = params.three_prime_clip_r2 > 0 ? "--three_prime_clip_r2 ${params.three_prime_clip_r2}" : '' + rrbs = params.rrbs ? "--rrbs" : '' + if (single) { + """ + trim_galore --gzip $rrbs $c_r1 $tpc_r1 $reads + """ + } else { + """ + trim_galore --paired --gzip $rrbs $c_r1 $c_r2 $tpc_r1 $tpc_r2 $reads + """ + } + } +} + +/* + * STEP 3 - align with bwa-mem + */ +process bwamem_align { + tag "$name" + publishDir "${params.outdir}/bwa-mem_alignments", mode: 'copy' + + input: + set val(name), file(reads) from trimmed_reads + file index from bwa_meth_index.first() + file bwa_meth_indices from bwa_meth_indices.first() + + output: + file '*.bam' into bam_aligned, bam_flagstat + + script: + fasta = index.toString() - '.bwameth.c2t.bwt' + prefix = reads[0].toString() - ~/(_R1)?(_trimmed)?(_val_1)?(\.fq)?(\.fastq)?(\.gz)?$/ + """ + set -o pipefail # Capture exit codes from bwa-meth + bwameth.py \\ + --threads ${task.cpus} \\ + --reference $fasta \\ + $reads | samtools view -bS - > ${prefix}.bam + """ +} + +/* + * STEP 4.1 - samtools flagstat on samples + */ +process samtools_flagstat { + tag "${bam.baseName}" + publishDir "${params.outdir}/bwa-mem_alignments", mode: 'copy' + + input: + file bam from bam_flagstat + + output: + file "${bam.baseName}_flagstat.txt" into flagstat_results + file "${bam.baseName}_stats.txt" into samtools_stats_results + + script: + """ + samtools flagstat $bam > ${bam.baseName}_flagstat.txt + samtools stats $bam > ${bam.baseName}_stats.txt + """ +} +/* + * STEP 4.2 - sort and index alignments + */ +process samtools_sort { + tag "${bam.baseName}" + publishDir "${params.outdir}/bwa-mem_alignments_sorted", mode: 'copy' + + executor 'local' + + input: + file bam from bam_aligned + + output: + file "${bam.baseName}.sorted.bam" into bam_sorted, bam_for_index + + script: + """ + samtools sort \\ + $bam + -m ${task.memory.toBytes() / task.cpus} \\ + -@ ${task.cpus} \\ + > 
${bam.baseName}.sorted.bam + """ +} +/* + * STEP 4.3 - index sorted alignments + */ +process samtools_index { + tag "${bam.baseName}" + publishDir "${params.outdir}/bwa-mem_alignments_sorted", mode: 'copy' + + input: + file bam from bam_for_index + + output: + file "${bam}.bai" into bam_index + + script: + """ + samtools index $bam + """ +} + + +/* + * STEP 5 - Mark duplicates + */ +process markDuplicates { + tag "${bam.baseName}" + publishDir "${params.outdir}/bwa-mem_markDuplicates", mode: 'copy' + + input: + file bam from bam_sorted + + output: + file "${bam.baseName}.markDups.bam" into bam_md, bam_md_qualimap + file "${bam.baseName}.markDups_metrics.txt" into picard_results + + script: + """ + java -Xmx2g -jar \$PICARD_HOME/picard.jar MarkDuplicates \\ + INPUT=$bam \\ + OUTPUT=${bam.baseName}.markDups.bam \\ + METRICS_FILE=${bam.baseName}.markDups_metrics.txt \\ + REMOVE_DUPLICATES=false \\ + ASSUME_SORTED=true \\ + PROGRAM_RECORD_ID='null' \\ + VALIDATION_STRINGENCY=LENIENT + + # Print version number to standard out + echo "File name: $bam Picard version "\$(java -Xmx2g -jar \$PICARD_HOME/picard.jar MarkDuplicates --version 2>&1) + """ +} + + +/* + * STEP 6 - extract methylation with PileOMeth + */ +process pileOMeth { + tag "${bam.baseName}" + publishDir "${params.outdir}/PileOMeth", mode: 'copy' + + input: + file bam from bam_md + file fasta from fasta + file fasta_index from fasta_index + + output: + file '*' into pileometh_results + + script: + allcontexts = params.allcontexts ? '--CHG --CHH' : '' + mindepth = params.mindepth > 0 ? "--minDepth ${params.mindepth}" : '' + ignoreFlags = params.ignoreFlags ? 
"--ignoreFlags" : '' + """ + PileOMeth extract $allcontexts $ignoreFlags $mindepth $fasta $bam + PileOMeth mbias $allcontexts $ignoreFlags $fasta $bam ${bam.baseName} + """ +} + +/* + * STEP 7 - Qualimap + */ +process qualimap { + tag "${bam.baseName}" + publishDir "${params.outdir}/Qualimap", mode: 'copy' + + input: + file bam from bam_md_qualimap + + output: + file '${bam.baseName}_qualimap' into qualimap_results + + script: + gcref = params.genome == 'GRCh37' ? '-gd HUMAN' : '' + gcref = params.genome == 'GRCm38' ? '-gd MOUSE' : '' + """ + samtools sort $bam -o ${bam.baseName}.sorted.bam + qualimap bamqc $gcref \\ + -bam ${bam.baseName}.sorted.bam \\ + -outdir ${bam.baseName}_qualimap \\ + --skip-duplicated \\ + --collect-overlap-pairs \\ + --java-mem-size=${task.memory.toGiga()}G \\ + -nt ${task.cpus} + """ +} + +/* + * STEP 8 - MultiQC + */ +process multiqc { + publishDir "${params.outdir}/MultiQC", mode: 'copy' + + input: + file ('fastqc/*') from fastqc_results.flatten().toList() + file ('trimgalore/*') from trimgalore_results.flatten().toList() + file ('samtools/*') from flagstat_results.flatten().toList() + file ('samtools/*') from samtools_stats_results.flatten().toList() + file ('picard/*') from picard_results.flatten().toList() + file ('pileometh/*') from pileometh_results.flatten().toList() + file ('qualimap/*') from qualimap_results.flatten().toList() + + output: + file '*multiqc_report.html' + file '*multiqc_data' + + script: + """ + multiqc -f . 
+ """ +} diff --git a/conf/uppmax-devel.config b/conf/uppmax-devel.config new file mode 100644 index 0000000..9b4f7cd --- /dev/null +++ b/conf/uppmax-devel.config @@ -0,0 +1,199 @@ +/* ------------------------------------------------- + * Example Nextflow config file for UPPMAX (milou) + * ------------------------------------------------- + * Defines reference genomes, using iGenome paths + * Should be saved either with Nextflow installation or + * as file ~/.nextflow/config + */ + + +process { + executor = 'slurm' + queue = 'devcore' + cpus = { 1 * task.attempt } + memory = { 8.GB * task.attempt } + time = 1.h + clusterOptions = { "-A $params.project " + (params.clusterOptions ?: '') } + + errorStrategy = { ( task.exitStatus == 143 || task.exitStatus == 137 ) ? 'retry' : 'finish' } + maxRetries = 3 + maxErrors = '-1' + + // Environment modules and resource requirements + $fastqc { + module = ['bioinfo-tools', 'FastQC'] + errorStrategy = { ( task.exitStatus == 143 || task.exitStatus == 137 ) ? 'retry' : 'ignore' } + } + $trim_galore { + module = ['bioinfo-tools', 'TrimGalore'] + cpus = { 2 * task.attempt } + memory = { 16.GB * task.attempt } + } + $bismark_align { + module = ['bioinfo-tools', 'samtools/1.3', 'bismark'] + cpus = { 8 * task.attempt } + memory = { 64.GB * task.attempt } + } + $bismark_deduplicate { + module = ['bioinfo-tools', 'samtools/1.3', 'bismark'] + cpus = { 8 * task.attempt } + memory = { 64.GB * task.attempt } + } + $bismark_methXtract { + module = ['bioinfo-tools', 'samtools/1.3', 'bismark'] + cpus = { 4 * task.attempt } + memory = { 32.GB * task.attempt } + } + $bismark_report { + module = ['bioinfo-tools', 'bismark'] + errorStrategy = { ( task.exitStatus == 143 || task.exitStatus == 137 ) ? 'retry' : 'ignore' } + } + $bismark_summary { + module = ['bioinfo-tools', 'bismark'] + errorStrategy = { ( task.exitStatus == 143 || task.exitStatus == 137 ) ? 
'retry' : 'ignore' } + } + $qualimap { + module = ['bioinfo-tools', 'samtools/1.3', 'QualiMap'] + cpus = { 4 * task.attempt } + memory = { 32.GB * task.attempt } + errorStrategy = { ( task.exitStatus == 143 || task.exitStatus == 137 ) ? 'retry' : 'ignore' } + } + // NB: Overwrite this in your user config file (~/.nextflow/config) + // if you have your own installation of MultiQC outside of the environment module system. + // eg: Add the line: process.$multiqc.module = '' + $multiqc { + module = ['bioinfo-tools', 'MultiQC'] + errorStrategy = { ( task.exitStatus == 143 || task.exitStatus == 137 ) ? 'retry' : 'ignore' } + } + + $makeBwaMemIndex { + module = ['bioinfo-tools', 'bwa', 'bwa-meth', 'samtools/1.3'] + } + $makeFastaIndex { + module = ['bioinfo-tools', 'samtools/1.3'] + } + $bwamem_align { + module = ['bioinfo-tools', 'bwa', 'bwa-meth', 'samtools/1.3'] + cpus = { 8 * task.attempt } + memory = { 64.GB * task.attempt } + } + $samtools_flagstat { + module = ['bioinfo-tools', 'samtools/1.3'] + } + $samtools_sort { + module = ['bioinfo-tools', 'samtools/1.3'] + cpus = { 4 * task.attempt } + memory = { 32.GB * task.attempt } + } + $samtools_index { + module = ['bioinfo-tools', 'samtools/1.3'] + } + $markDuplicates { + module = ['bioinfo-tools', 'picard/2.0.1'] + cpus = { 4 * task.attempt } + memory = { 32.GB * task.attempt } + } + $pileOMeth { + module = ['bioinfo-tools', 'PileOMeth'] + cpus = { 6 * task.attempt } + memory = { 48.GB * task.attempt } + } + + +} + +params { + clusterOptions = false + saveReference = true + // illumina iGenomes reference file paths on UPPMAX + genomes { + 'GRCh37' { + bismark = '/sw/data/uppnex/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta' + } + 'GRCm38' { + bismark = '/sw/data/uppnex/igenomes/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex' + fasta = 
'/sw/data/uppnex/igenomes/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta' + } + 'TAIR10' { + bismark = '/sw/data/uppnex/igenomes/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta' + } + 'EB2' { + bismark = '/sw/data/uppnex/igenomes/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta' + } + 'UMD3.1' { + bismark = '/sw/data/uppnex/igenomes/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta' + } + 'WBcel235' { + bismark = '/sw/data/uppnex/igenomes/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta' + } + 'CanFam3.1' { + bismark = '/sw/data/uppnex/igenomes/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta' + } + 'GRCz10' { + bismark = '/sw/data/uppnex/igenomes/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta' + } + 'BDGP6' { + bismark = '/sw/data/uppnex/igenomes/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta' + } + 'EquCab2' { + bismark = '/sw/data/uppnex/igenomes/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta' + } + 'EB1' { + bismark = '/sw/data/uppnex/igenomes/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta' + } + 'Galgal4' { + bismark = 
'/sw/data/uppnex/igenomes/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta' + } + 'Gm01' { + bismark = '/sw/data/uppnex/igenomes/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta' + } + 'Mmul_1' { + bismark = '/sw/data/uppnex/igenomes/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta' + } + 'IRGSP-1.0' { + bismark = '/sw/data/uppnex/igenomes/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta' + } + 'CHIMP2.1.4' { + bismark = '/sw/data/uppnex/igenomes/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta' + } + 'Rnor_6.0' { + bismark = '/sw/data/uppnex/igenomes/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta' + } + 'R64-1-1' { + bismark = '/sw/data/uppnex/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta' + } + 'EF2' { + bismark = '/sw/data/uppnex/igenomes/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta' + } + 'Sbi1' { + bismark = '/sw/data/uppnex/igenomes/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta' + } + 'Sscrofa10.2' { + bismark = '/sw/data/uppnex/igenomes/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex' + fasta = 
'/sw/data/uppnex/igenomes/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta' + } + 'AGPv3' { + bismark = '/sw/data/uppnex/igenomes/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta' + } + } +} diff --git a/conf/uppmax.config b/conf/uppmax.config new file mode 100644 index 0000000..f422cf8 --- /dev/null +++ b/conf/uppmax.config @@ -0,0 +1,205 @@ +/* ------------------------------------------------- + * Example Nextflow config file for UPPMAX (milou) + * ------------------------------------------------- + * Defines reference genomes, using iGenome paths + * Should be saved either with Nextflow installation or + * as file ~/.nextflow/config + */ + + +process { + executor = 'slurm' + cpus = { 1 * task.attempt } + memory = { 8.GB * task.attempt } + time = { 2.h * task.attempt } + clusterOptions = { "-A $params.project " + (params.clusterOptions ?: '') } + + errorStrategy = { ( task.exitStatus == 143 || task.exitStatus == 137 ) ? 'retry' : 'finish' } + maxRetries = 3 + maxErrors = '-1' + + // Environment modules and resource requirements + $fastqc { + module = ['bioinfo-tools', 'FastQC'] + errorStrategy = { ( task.exitStatus == 143 || task.exitStatus == 137 ) ? 
'retry' : 'ignore' } + } + $trim_galore { + module = ['bioinfo-tools', 'TrimGalore'] + cpus = { 2 * task.attempt } + memory = { 16.GB * task.attempt } + time = { 12.h * task.attempt } + } + $bismark_align { + module = ['bioinfo-tools', 'samtools/1.3', 'bismark'] + cpus = { 8 * task.attempt } + memory = { 64.GB * task.attempt } + time = { 36.h * task.attempt } + } + $bismark_deduplicate { + module = ['bioinfo-tools', 'samtools/1.3', 'bismark'] + cpus = { 8 * task.attempt } + memory = { 64.GB * task.attempt } + time = { 12.h * task.attempt } + } + $bismark_methXtract { + module = ['bioinfo-tools', 'samtools/1.3', 'bismark'] + cpus = { 4 * task.attempt } + memory = { 32.GB * task.attempt } + time = { 8.h * task.attempt } + } + $bismark_report { + module = ['bioinfo-tools', 'bismark'] + errorStrategy = { ( task.exitStatus == 143 || task.exitStatus == 137 ) ? 'retry' : 'ignore' } + } + $bismark_summary { + module = ['bioinfo-tools', 'bismark'] + errorStrategy = { ( task.exitStatus == 143 || task.exitStatus == 137 ) ? 'retry' : 'ignore' } + } + $qualimap { + module = ['bioinfo-tools', 'samtools/1.3', 'QualiMap'] + cpus = { 4 * task.attempt } + memory = { 32.GB * task.attempt } + time = { 6.h * task.attempt } + errorStrategy = { ( task.exitStatus == 143 || task.exitStatus == 137 ) ? 'retry' : 'ignore' } + } + // NB: Overwrite this in your user config file (~/.nextflow/config) + // if you have your own installation of MultiQC outside of the environment module system. + // eg: Add the line: process.$multiqc.module = '' + $multiqc { + module = ['bioinfo-tools', 'MultiQC'] + errorStrategy = { ( task.exitStatus == 143 || task.exitStatus == 137 ) ? 
'retry' : 'ignore' } + } + + $makeBwaMemIndex { + module = ['bioinfo-tools', 'bwa', 'bwa-meth', 'samtools/1.3'] + } + $makeFastaIndex { + module = ['bioinfo-tools', 'samtools/1.3'] + } + $bwamem_align { + module = ['bioinfo-tools', 'bwa', 'bwa-meth', 'samtools/1.3'] + cpus = { 8 * task.attempt } + memory = { 64.GB * task.attempt } + time = { 24.h * task.attempt } + } + $samtools_flagstat { + module = ['bioinfo-tools', 'samtools/1.3'] + } + $samtools_sort { + module = ['bioinfo-tools', 'samtools/1.3'] + cpus = { 4 * task.attempt } + memory = { 32.GB * task.attempt } + time = { 8.h * task.attempt } + } + $samtools_index { + module = ['bioinfo-tools', 'samtools/1.3'] + } + $markDuplicates { + module = ['bioinfo-tools', 'picard/2.0.1'] + cpus = { 4 * task.attempt } + memory = { 32.GB * task.attempt } + } + $pileOMeth { + module = ['bioinfo-tools', 'PileOMeth'] + cpus = { 6 * task.attempt } + memory = { 48.GB * task.attempt } + } + + +} + +params { + clusterOptions = false + saveReference = true + // illumina iGenomes reference file paths on UPPMAX + genomes { + 'GRCh37' { + bismark = '/sw/data/uppnex/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta' + } + 'GRCm38' { + bismark = '/sw/data/uppnex/igenomes/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta' + } + 'TAIR10' { + bismark = '/sw/data/uppnex/igenomes/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta' + } + 'EB2' { + bismark = '/sw/data/uppnex/igenomes/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta' + } + 'UMD3.1' { + bismark = '/sw/data/uppnex/igenomes/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex' + fasta 
= '/sw/data/uppnex/igenomes/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta' + } + 'WBcel235' { + bismark = '/sw/data/uppnex/igenomes/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta' + } + 'CanFam3.1' { + bismark = '/sw/data/uppnex/igenomes/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta' + } + 'GRCz10' { + bismark = '/sw/data/uppnex/igenomes/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta' + } + 'BDGP6' { + bismark = '/sw/data/uppnex/igenomes/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta' + } + 'EquCab2' { + bismark = '/sw/data/uppnex/igenomes/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta' + } + 'EB1' { + bismark = '/sw/data/uppnex/igenomes/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta' + } + 'Galgal4' { + bismark = '/sw/data/uppnex/igenomes/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta' + } + 'Gm01' { + bismark = '/sw/data/uppnex/igenomes/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta' + } + 'Mmul_1' { + bismark = '/sw/data/uppnex/igenomes/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta' + } + 'IRGSP-1.0' { + bismark = 
'/sw/data/uppnex/igenomes/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta' + } + 'CHIMP2.1.4' { + bismark = '/sw/data/uppnex/igenomes/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta' + } + 'Rnor_6.0' { + bismark = '/sw/data/uppnex/igenomes/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta' + } + 'R64-1-1' { + bismark = '/sw/data/uppnex/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta' + } + 'EF2' { + bismark = '/sw/data/uppnex/igenomes/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta' + } + 'Sbi1' { + bismark = '/sw/data/uppnex/igenomes/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta' + } + 'Sscrofa10.2' { + bismark = '/sw/data/uppnex/igenomes/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta' + } + 'AGPv3' { + bismark = '/sw/data/uppnex/igenomes/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex' + fasta = '/sw/data/uppnex/igenomes/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta' + } + } +} diff --git a/example_uppmax_config b/example_uppmax_config deleted file mode 100644 index 0f64015..0000000 --- a/example_uppmax_config +++ /dev/null @@ -1,243 +0,0 @@ -/* ------------------------------------------------- - * Example Nextflow config file for UPPMAX (milou) - * ------------------------------------------------- - * Defines 
reference genomes, using iGenome paths - * Should be saved either with Nextflow installation or - * as file ~/.nextflow/config - */ - - -process { - executor = 'slurm' - cpus = 1 - memory = '16 GB' - time = '48h' - clusterOptions { - '-A YOUR_PROJECT_ID' - } -} - -params { - genomes { - 'GRCh37' { - bed12 = '/sw/data/uppnex/igenomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/' - } - 'GRCm38' { - bed12 = '/sw/data/uppnex/igenomes/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Mus_musculus/Ensembl/GRCm38/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/' - } - 'TAIR10' { - bed12 = '/sw/data/uppnex/igenomes/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex' - bowtie = 
'/sw/data/uppnex/igenomes/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/' - } - 'EB2' { - bed12 = '/sw/data/uppnex/igenomes/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/' - } - 'UMD3.1' { - bed12 = '/sw/data/uppnex/igenomes/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Bos_taurus/Ensembl/UMD3.1/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta' - gtf = 
'/sw/data/uppnex/igenomes/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/' - } - 'WBcel235' { - bed12 = '/sw/data/uppnex/igenomes/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/' - } - 'CanFam3.1' { - bed12 = '/sw/data/uppnex/igenomes/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/' - } - 'GRCz10' { - bed12 = '/sw/data/uppnex/igenomes/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed' - bismark = 
'/sw/data/uppnex/igenomes/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Danio_rerio/Ensembl/GRCz10/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/' - } - 'BDGP6' { - bed12 = '/sw/data/uppnex/igenomes/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/' - } - 'EquCab2' { - bed12 = '/sw/data/uppnex/igenomes/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Equus_caballus/Ensembl/EquCab2/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/genome.fa' - fasta = 
'/sw/data/uppnex/igenomes/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/' - } - 'EB1' { - bed12 = '/sw/data/uppnex/igenomes/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/' - } - 'Galgal4' { - bed12 = '/sw/data/uppnex/igenomes/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Gallus_gallus/Ensembl/Galgal4/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/' - } - 'Gm01' { - bed12 = '/sw/data/uppnex/igenomes/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed' - bismark = 
'/sw/data/uppnex/igenomes/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Glycine_max/Ensembl/Gm01/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/' - } - 'Mmul_1' { - bed12 = '/sw/data/uppnex/igenomes/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/' - } - 'IRGSP-1.0' { - bed12 = '/sw/data/uppnex/igenomes/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/genome.fa' - fasta = 
'/sw/data/uppnex/igenomes/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/' - } - 'CHIMP2.1.4' { - bed12 = '/sw/data/uppnex/igenomes/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/' - } - 'Rnor_6.0' { - bed12 = '/sw/data/uppnex/igenomes/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/' - } - 'R64-1-1' { - bed12 = 
'/sw/data/uppnex/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/' - } - 'EF2' { - bed12 = '/sw/data/uppnex/igenomes/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/' - } - 'Sbi1' { - bed12 = '/sw/data/uppnex/igenomes/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BowtieIndex/genome' - bowtie2 = 
'/sw/data/uppnex/igenomes/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/' - } - 'Sscrofa10.2' { - bed12 = '/sw/data/uppnex/igenomes/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/' - } - 'AGPv3' { - bed12 = '/sw/data/uppnex/igenomes/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed' - bismark = '/sw/data/uppnex/igenomes/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex' - bowtie = '/sw/data/uppnex/igenomes/Zea_mays/Ensembl/AGPv3/Sequence/BowtieIndex/genome' - bowtie2 = '/sw/data/uppnex/igenomes/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/genome' - bwa = '/sw/data/uppnex/igenomes/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/genome.fa' - fasta = '/sw/data/uppnex/igenomes/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta' - gtf = '/sw/data/uppnex/igenomes/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf' - star = '/sw/data/uppnex/igenomes/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/' - } - } -} diff --git a/main.nf b/main.nf index 8a69f04..57a2c03 100644 --- 
a/main.nf +++ b/main.nf @@ -6,41 +6,14 @@ vim: syntax=groovy B S - S E Q M E T H Y L A T I O N B E S T - P R A C T I C E ======================================================================================== New Methylation (BS-Seq) Best Practice Analysis Pipeline. Started June 2016. - @Authors + #### Homepage / Documentation + https://github.com/SciLifeLab/NGI-MethylSeq + #### Authors Phil Ewels ----------------------------------------------------------------------------------------- - Basic command: - $ nextflow main.nf - - Pipeline variables can be configured with the following command line options: - --genome [ID] (default: GRCh37) - --index [path] (default: set by genome ID in config) - --reads [path] (default: data/*{_1,_2}*.fastq.gz) - --outdir [path] (default: ./results) - --name [str] (default: BS-Seq Best Practice) - - For example: - $ nextflow main.nf --reads 'path/to/data/sample_*_{1,2}.fq.gz' --genome GRCm38 ---------------------------------------------------------------------------------------- -The pipeline can determine whether the input data is single or paired end. This relies on -specifying the input files correctly. For paired en data us the example above, i.e. -'sample_*_{1,2}.fastq.gz'. Without the glob {1,2} (or similiar) the data will be treated -as single end. ----------------------------------------------------------------------------------------- - Pipeline overview: - - FastQC - read quility control - - Trim Galore! - trimming - - Bismark - align - - Bismark - deduplication - - Bismark - methylation extraction - - Bismark - sample report - - Bismark - summary report - - MultiQC ---------------------------------------------------------------------------------------- */ - /* * SET UP CONFIGURATION VARIABLES */ @@ -49,11 +22,36 @@ as single end. 
version = 0.1 // Configurable variables -params.genome = 'GRCh37' -params.index = params.genomes[ params.genome ].bismark -params.reads = "data/*{_1,_2}*.fastq.gz" +params.project = false +params.emailAddress = false +params.genome = false +params.bismark_index = params.genome ? params.genomes[ params.genome ].bismark ?: false : false +params.saveReference = false +params.reads = "data/*_R{1,2}.fastq.gz" params.outdir = './results' +params.notrim = false +params.nodedup = false +params.relaxMismatches = false +params.numMismatches = 0.6 +// 0.6 will allow a penalty of bp * -0.6 +// For 100bp reads, this is -60. Mismatches cost -6, gap opening -5 and gap extension -2 +// So -60 would allow 10 mismatches or ~ 8 x 1-2bp indels +// Bismark default is 0.2 (L,0,-0.2), Bowtie2 default is 0.6 (L,0,-0.6) + +// Validate inputs +if( params.bismark_index ){ + bismark_index = file(params.bismark_index) + if( !bismark_index.exists() ) exit 1, "Bismark index not found: ${params.bismark_index}" +} else { + exit 1, "No reference genome specified! 
Please use --genome or --bismark_index" +} +params.rrbs = false +params.pbat = false +params.single_cell = false +params.epignome = false +params.accel = false +params.cegx = false if(params.pbat){ params.clip_r1 = 6 params.clip_r2 = 6 @@ -88,236 +86,186 @@ if(params.pbat){ def single -log.info "====================================" +log.info "==================================================" log.info " NGI-MethylSeq : Bisulfite-Seq Best Practice v${version}" -log.info "====================================" -log.info "Reads : ${params.reads}" -log.info "Genome : ${params.genome}" -log.info "Index : ${params.index}" -log.info "Current home : $HOME" -log.info "Current user : $USER" -log.info "Current path : $PWD" -log.info "Script dir : $baseDir" -log.info "Working dir : $workDir" -log.info "Output dir : ${params.outdir}" -log.info "====================================" -if(params.pbat){ log.info "Trim Profile : PBAT" } -if(params.single_cell){ log.info "Trim Profile : Single Cell" } -if(params.epignome){ log.info "Trim Profile : Epignome" } -if(params.accel){ log.info "Trim Profile : Accel" } -if(params.cegx){ log.info "Trim Profile : CEGX" } -log.info "Output dir : ${params.outdir}" -log.info "Trim R1 : ${params.clip_r1}" -log.info "Trim R2 : ${params.clip_r2}" -log.info "Trim 3' R1 : ${params.three_prime_clip_r1}" -log.info "Trim 3' R2 : ${params.three_prime_clip_r2}" -log.info "====================================" +log.info "==================================================" +log.info "Reads : ${params.reads}" +log.info "Genome : ${params.genome}" +log.info "Bismark Index : ${params.bismark_index}" +log.info "Current home : $HOME" +log.info "Current user : $USER" +log.info "Current path : $PWD" +log.info "Script dir : $baseDir" +log.info "Working dir : $workDir" +log.info "Output dir : ${params.outdir}" +log.info "---------------------------------------------------" +log.info "Deduplication : ${params.nodedup ? 
'No' : 'Yes'}" +if(params.rrbs){ log.info "RRBS Mode : On" } +if(params.relaxMismatches){ log.info "Mismatch Func : L,0,-${params.numMismatches} (Bismark default = L,0,-0.2)" } +log.info "---------------------------------------------------" +if(params.notrim){ log.info "Trimming Step : Skipped" } +if(params.pbat){ log.info "Trim Profile : PBAT" } +if(params.single_cell){ log.info "Trim Profile : Single Cell" } +if(params.epignome){ log.info "Trim Profile : Epignome" } +if(params.accel){ log.info "Trim Profile : Accel" } +if(params.cegx){ log.info "Trim Profile : CEGX" } +if(params.clip_r1 > 0) log.info "Trim R1 : ${params.clip_r1}" +if(params.clip_r2 > 0) log.info "Trim R2 : ${params.clip_r2}" +if(params.three_prime_clip_r1 > 0) log.info "Trim 3' R1 : ${params.three_prime_clip_r1}" +if(params.three_prime_clip_r2 > 0) log.info "Trim 3' R2 : ${params.three_prime_clip_r2}" +log.info "---------------------------------------------------" +log.info "Config Profile : ${workflow.profile}" +if(params.project) log.info "UPPMAX Project : ${params.project}" +log.info "==================================================" // Validate inputs -index = file(params.index) -if( !index.exists() ) exit 1, "Missing Bismark index: $index" +if( workflow.profile == 'standard' && !params.project ) exit 1, "No UPPMAX project ID found! 
Use --project" /* - * Create a channel for read files - groups based on shared prefixes + * Create a channel for input read files */ Channel - .fromPath( params.reads ) - .ifEmpty { error "Cannot find any reads matching: ${params.reads}" } - .map { path -> - def prefix = readPrefix(path, params.reads) - tuple(prefix, path) - } - .groupTuple(sort: true) - .set { read_files } - -read_files.into { read_files_fastqc; read_files_trimming } - + .fromFilePairs( params.reads, size: -1 ) + .ifEmpty { exit 1, "Cannot find any reads matching: ${params.reads}" } + .into { read_files_fastqc; read_files_trimming } /* * STEP 1 - FastQC */ - process fastqc { - tag "$prefix" - - module 'bioinfo-tools' - module 'FastQC' - - memory { 2.GB * task.attempt } - time { 4.h * task.attempt } - errorStrategy { task.exitStatus == 143 ? 'retry' : 'ignore' } - maxRetries 3 - maxErrors '-1' - + tag "$name" publishDir "${params.outdir}/fastqc", mode: 'copy' input: - set val(prefix), file(reads:'*') from read_files_fastqc + set val(name), file(reads) from read_files_fastqc output: file '*_fastqc.{zip,html}' into fastqc_results + script: """ - fastqc $reads + fastqc -q $reads """ } - /* * STEP 2 - Trim Galore! */ - -process trim_galore { - tag "$prefix" - - module 'bioinfo-tools' - module 'TrimGalore' - - cpus 3 - memory { 3.GB * task.attempt } - time { 16.h * task.attempt } - errorStrategy { task.exitStatus == 143 ? 'retry' : 'terminate' } - maxRetries 3 - maxErrors '-1' - - publishDir "${params.outdir}/trim_galore", mode: 'copy' - - input: - set val(prefix), file(reads:'*') from read_files_trimming - - output: - file '*fq.gz' into trimmed_reads - file '*trimming_report.txt' into trimgalore_results - - script: - single = reads instanceof Path - c_r1 = params.clip_r1 > 0 ? "--clip_r1 ${params.clip_r1}" : '' - c_r2 = params.clip_r2 > 0 ? "--clip_r2 ${params.clip_r2}" : '' - tpc_r1 = params.three_prime_clip_r1 > 0 ? 
"--three_prime_clip_r1 ${params.three_prime_clip_r1}" : '' - tpc_r2 = params.three_prime_clip_r2 > 0 ? "--three_prime_clip_r2 ${params.three_prime_clip_r2}" : '' - rrbs = params.rrbs ? "--rrbs" : '' - if (single) { - """ - trim_galore --gzip $rrbs $c_r1 $c_r2 $tpc_r1 $tpc_r2 $reads - """ - } else { - """ - trim_galore --paired --gzip $rrbs $c_r1 $c_r2 $tpc_r1 $tpc_r2 $reads - """ +if(params.notrim){ + trimmed_reads = read_files_trimming + trimgalore_results = [] +} else { + process trim_galore { + tag "$name" + publishDir "${params.outdir}/trim_galore", mode: 'copy' + + input: + set val(name), file(reads) from read_files_trimming + + output: + set val(name), file('*fq.gz') into trimmed_reads + file '*trimming_report.txt' into trimgalore_results + + script: + single = reads instanceof Path + c_r1 = params.clip_r1 > 0 ? "--clip_r1 ${params.clip_r1}" : '' + c_r2 = params.clip_r2 > 0 ? "--clip_r2 ${params.clip_r2}" : '' + tpc_r1 = params.three_prime_clip_r1 > 0 ? "--three_prime_clip_r1 ${params.three_prime_clip_r1}" : '' + tpc_r2 = params.three_prime_clip_r2 > 0 ? "--three_prime_clip_r2 ${params.three_prime_clip_r2}" : '' + rrbs = params.rrbs ? "--rrbs" : '' + if (single) { + """ + trim_galore --gzip $rrbs $c_r1 $tpc_r1 $reads + """ + } else { + """ + trim_galore --paired --gzip $rrbs $c_r1 $c_r2 $tpc_r1 $tpc_r2 $reads + """ + } } } /* * STEP 3 - align with Bismark */ - process bismark_align { - tag "$trimmed_reads" - - module 'bioinfo-tools' - module 'samtools' - module 'bismark' - - cpus 6 - memory { 32.GB * task.attempt } - time { 36.h * task.attempt } - errorStrategy { task.exitStatus == 143 ? 
'retry' : 'terminate' } - maxRetries 3 - maxErrors '-1' - - publishDir "${params.outdir}/bismark/aligned", mode: 'copy' + tag "$name" + publishDir "${params.outdir}/bismark_alignments", mode: 'copy' input: - file index - file trimmed_reads + file index from bismark_index + set val(name), file(reads) from trimmed_reads output: - file '*.bam' into bam, bam_2 - file '*report.txt' into bismark_align_log_1, bismark_align_log_2, bismark_align_log_3 - file '*.{png,gz}' into bismark_align_results - file '*.{fq, fastq}' into bismark_unmapped + file "*.bam" into bam, bam_2 + file "*report.txt" into bismark_align_log_1, bismark_align_log_2, bismark_align_log_3 + if(params.unmapped){ file "*.fq.gz" into bismark_unmapped } script: pbat = params.pbat ? "--pbat" : '' non_directional = params.single_cell || params.non_directional ? "--non_directional" : '' unmapped = params.unmapped ? "--unmapped" : '' + mismatches = params.relaxMismatches ? "--score_min L,0,-${params.numMismatches}" : '' if (single) { """ - bismark --bam $pbat $non_directional $unmapped $index $trimmed_reads + bismark --bam $pbat $non_directional $unmapped $mismatches $index $reads """ } else { """ - bismark --bam $pbat $non_directional $unmapped $index -1 ${trimmed_reads[0]} -2 ${trimmed_reads[1]} + bismark \\ + --bam \\ + --dovetail \\ + $pbat $non_directional $unmapped $mismatches \\ + $index \\ + -1 ${reads[0]} \\ + -2 ${reads[1]} """ } } - /* * STEP 4 - Bismark deduplicate */ - -process bismark_deduplicate { - tag "$bam" - - module 'bioinfo-tools' - module 'samtools' - module 'bismark' - - memory { 32.GB * task.attempt } - time { 12.h * task.attempt } - errorStrategy { task.exitStatus == 143 ? 
'retry' : 'terminate' } - maxRetries 3 - maxErrors '-1' - - publishDir "${params.outdir}/bismark/deduplicated", mode: 'copy' - - input: - file bam - - output: - file '*deduplicated.bam' into bam_dedup - file '*.deduplication_report.txt' into bismark_dedup_log_1, bismark_dedup_log_2, bismark_dedup_log_3 - file '*.{png,gz}' into bismark_dedup_results - - script: - if (single) { - """ - deduplicate_bismark -s --bam $bam - """ - } else { - """ - deduplicate_bismark -p --bam $bam - """ +if (params.nodedup) { + bam_dedup = bam +} else { + process bismark_deduplicate { + tag "${bam.baseName}" + publishDir "${params.outdir}/bismark_deduplicated", mode: 'copy' + + input: + file bam + + output: + file "${bam.baseName}.deduplicated.bam" into bam_dedup, bam_dedup_qualimap + file "${bam.baseName}.deduplication_report.txt" into bismark_dedup_log_1, bismark_dedup_log_2, bismark_dedup_log_3 + + script: + if (single) { + """ + deduplicate_bismark -s --bam $bam + """ + } else { + """ + deduplicate_bismark -p --bam $bam + """ + } } } /* * STEP 5 - Bismark methylation extraction */ - process bismark_methXtract { - tag "$bam_dedup" - - module 'bioinfo-tools' - module 'samtools' - module 'bismark' - - cpus 4 - memory { 8.GB * task.attempt } - time { 8.h * task.attempt } - errorStrategy { task.exitStatus == 143 ? 
'retry' : 'terminate' } - maxRetries 3 - maxErrors '-1' - - publishDir "${params.outdir}/bismark/methylation", mode: 'copy' + tag "${bam.baseName}" + publishDir "${params.outdir}/bismark_methylation_calls", mode: 'copy' input: - file bam_dedup + file bam from bam_dedup output: - file '*.splitting_report.txt' into bismark_splitting_report_1, bismark_splitting_report_2, bismark_splitting_report_3 - file '*.M-bias.txt' into bismark_mbias_1, bismark_mbias_3, bismark_mbias_3 + file "${bam.baseName}_splitting_report.txt" into bismark_splitting_report_1, bismark_splitting_report_2, bismark_splitting_report_3 + file "${bam.baseName}.M-bias.txt" into bismark_mbias_1, bismark_mbias_2, bismark_mbias_3 file '*.{png,gz}' into bismark_methXtract_results script: @@ -325,19 +273,20 @@ process bismark_methXtract { """ bismark_methylation_extractor \\ --multi ${task.cpus} \\ - --buffer_size ${task.memory} \\ + --buffer_size ${task.memory.toGiga()}G \\ + --ignore_r2 2 \\ --bedGraph \\ --counts \\ --gzip \\ -s \\ --report \\ - $bam_dedup + $bam """ } else { """ bismark_methylation_extractor \\ --multi ${task.cpus} \\ - --buffer_size ${task.memory} \\ + --buffer_size ${task.memory.toGiga()}G \\ --ignore_r2 2 \\ --ignore_3prime_r2 2 \\ --bedGraph \\ @@ -346,7 +295,7 @@ process bismark_methXtract { -p \\ --no_overlap \\ --report \\ - $bam_dedup + $bam """ } } @@ -356,14 +305,8 @@ process bismark_methXtract { * STEP 6 - Bismark Sample Report */ process bismark_report { - module 'bioinfo-tools' - module 'bismark' - - memory '2GB' - time '1h' - errorStrategy 'ignore' - - publishDir "${params.outdir}/bismark/summaries", mode: 'copy' + tag "$name" + publishDir "${params.outdir}/bismark_reports", mode: 'copy' input: file bismark_align_log_1 @@ -374,6 +317,8 @@ process bismark_report { output: file '*{html,txt}' into bismark_reports_results + script: + name = bismark_align_log_1.toString() - ~/(_R1)?(_trimmed|_val_1).+$/ """ bismark2report \\ --alignment_report $bismark_align_log_1 \\ @@ 
-383,110 +328,105 @@ process bismark_report { """ } - /* * STEP 7 - Bismark Summary Report */ - process bismark_summary { - module 'bioinfo-tools' - module 'bismark' - - memory '2GB' - time '1h' - errorStrategy 'ignore' - - publishDir "${params.outdir}/bismark", mode: 'copy' + publishDir "${params.outdir}/bismark_summary", mode: 'copy' input: - file bam_2.toList() - file bismark_align_log_2.toList() - file bismark_dedup_log_2.toList() - file bismark_splitting_report_2.toList() - file bismark_mbias_2.toList() + file ('*') from bam_2.flatten().toList() + file ('*') from bismark_align_log_2.flatten().toList() + file ('*') from bismark_dedup_log_2.flatten().toList() + file ('*') from bismark_splitting_report_2.flatten().toList() + file ('*') from bismark_mbias_2.flatten().toList() output: file '*{html,txt}' into bismark_summary_results + script: """ - bismark2summary . + bismark2summary """ } - /* - * STEP 7 - MultiQC + * STEP 8 - Qualimap */ - -process multiqc { - module 'bioinfo-tools' - // Don't load MultiQC module here as overwrites environment installation. - // Load env module in process instead if multiqc command isn't found. +process qualimap { + tag "${bam.baseName}" + publishDir "${params.outdir}/Qualimap", mode: 'copy' - memory '4GB' - time '2h' - errorStrategy 'ignore' + input: + file bam from bam_dedup_qualimap + + output: + file "${bam.baseName}_qualimap" into qualimap_results + script: + gcref = params.genome == 'GRCh37' ? '-gd HUMAN' : '' + gcref = params.genome == 'GRCm38' ? 
'-gd MOUSE' : gcref + """ + samtools sort $bam -o ${bam.baseName}.sorted.bam + qualimap bamqc $gcref \\ + -bam ${bam.baseName}.sorted.bam \\ + -outdir ${bam.baseName}_qualimap \\ + --collect-overlap-pairs \\ + --java-mem-size=${task.memory.toGiga()}G \\ + -nt ${task.cpus} + """ +} + +/* + * STEP 9 - MultiQC + */ +process multiqc { publishDir "${params.outdir}/MultiQC", mode: 'copy' input: - file ('fastqc/*') from fastqc_results.toList() - file ('trimgalore/*') from trimgalore_results.toList() - file ('bismark/*') from bismark_align_log_3.toList() - file ('bismark/*') from bismark_dedup_log_3.toList() - file ('bismark/*') from bismark_splitting_report_3.toList() - file ('bismark/*') from bismark_mbias_3.toList() - file ('bismark/*') from bismark_reports_results.toList() - file ('bismark/*') from bismark_summary_results.toList() + file ('fastqc/*') from fastqc_results.flatten().toList() + file ('trimgalore/*') from trimgalore_results.flatten().toList() + file ('bismark/*') from bismark_align_log_3.flatten().toList() + file ('bismark/*') from bismark_dedup_log_3.flatten().toList() + file ('bismark/*') from bismark_splitting_report_3.flatten().toList() + file ('bismark/*') from bismark_mbias_3.flatten().toList() + file ('bismark/*') from bismark_reports_results.flatten().toList() + file ('bismark/*') from bismark_summary_results.flatten().toList() + file ('qualimap/*') from qualimap_results.flatten().toList() output: file '*multiqc_report.html' file '*multiqc_data' - + + script: """ - # Load MultiQC with environment module if not already in PATH - type multiqc >/dev/null 2>&1 || { module load MultiQC; }; multiqc -f . """ } -/* - * Helper function, given a file Path - * returns the file name region matching a specified glob pattern - * starting from the beginning of the name up to last matching group. 
- * - * For example: - * readPrefix('/some/data/file_alpha_1.fa', 'file*_1.fa' ) - * - * Returns: - * 'file_alpha' - */ -def readPrefix( Path actual, template ) { - - final fileName = actual.getFileName().toString() - - def filePattern = template.toString() - int p = filePattern.lastIndexOf('/') - if( p != -1 ) filePattern = filePattern.substring(p+1) - if( !filePattern.contains('*') && !filePattern.contains('?') ) - filePattern = '*' + filePattern - - def regex = filePattern - .replace('.','\\.') - .replace('*','(.*)') - .replace('?','(.?)') - .replace('{','(?:') - .replace('}',')') - .replace(',','|') - - def matcher = (fileName =~ /$regex/) - if( matcher.matches() ) { - def end = matcher.end(matcher.groupCount() ) - def prefix = fileName.substring(0,end) - while(prefix.endsWith('-') || prefix.endsWith('_') || prefix.endsWith('.') ) - prefix=prefix[0..-2] - return prefix +// E-mail sent upon pipeline completion +workflow.onComplete { + if(params.emailAddress){ + def subject = "NGI-MethylSeq pipeline completed ${ workflow.success ? 'successfully' : 'with errors' }" + + ['mail', '-s', subject, params.emailAddress].execute() << """ + + NGI-MethylSeq pipeline execution summary + ---------------------------------------- + Started at : ${workflow.start} + Completed at : ${workflow.complete} + Duration : ${workflow.duration} + Success : ${workflow.success} + Launch Dir : ${workflow.launchDir} + Work Dir : ${workflow.workDir} + Command : ${workflow.commandLine} + Resumed : ${workflow.resume} + Profile : ${workflow.profile == 'standard' ? 
'UPPMAX' : workflow.profile} + Nextflow v : ${nextflow.version} + Exit status : ${workflow.exitStatus} + Error msg : ${workflow.errorMessage ?: '-'} + Error report : ${workflow.errorReport ?: '-'} + """ } - return fileName } diff --git a/nextflow.config b/nextflow.config new file mode 100644 index 0000000..71fe8a2 --- /dev/null +++ b/nextflow.config @@ -0,0 +1,28 @@ +/* +vim: syntax=groovy +-*- mode: groovy;-*- + * ------------------------------------------------- + * NGI-MethylSeq Nextflow config file + * ------------------------------------------------- + * Default config options for all environments. + * Cluster-specific config options should be saved + * in the conf folder and imported under a profile + * name here. + */ + +profiles { + + standard { + includeConfig 'conf/uppmax.config' + } + devel { + includeConfig 'conf/uppmax-devel.config' + } + + // UNDER DEVELOPMENT (not yet written) + docker { + process.container = 'your/image' + docker.enabled = true + } + +}