Lysiane changes

It includes the following changes:

- Fix linter warnings
- Introduce conditional parameter validation logic for exomiser and vep
- Remove exomiser test analysis file, as default files seem to be compatible with public test dataset
- Use a dedicated `exomiser_genome` parameter
- Add utility functions to check if a tool is present, and corresponding nf-test tests
- Make Exomiser stub output files identical to real output files
- Infer exomiser version from version file
- Standardize exomiser process outputs
- Introduce per sequencing type analysis file
- Use process input instead of params to pass configuration information
- Update README.md, OUTPUT.md and USAGE.md
- Add REFERENCE_DATA.md
- Modify postprocessing workflow code to use def keyword for local variables and use more standard variable names
- Modify the github ci nf-test command: remove the local tag constraint (not necessary anymore) and activate ci mode
- Add basic module test for exomiser (stub mode)
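
The commit message mentions utility functions that check whether a tool is present. The pipeline itself is written in Nextflow, and the actual helper is not shown in this excerpt, so the following is only an illustrative Python sketch of the idea. The names `parse_tools` and `is_tool_present` are hypothetical, and the whitespace trimming is an assumption rather than documented pipeline behaviour.

```python
def parse_tools(tools):
    """Split a comma-separated `tools` value (e.g. "vep,exomiser") into a set of names.

    Hypothetical helper: empty or missing values give an empty set, and
    surrounding whitespace is trimmed (an assumption, not confirmed by the source).
    """
    if not tools:
        return set()
    return {name.strip() for name in str(tools).split(",") if name.strip()}


def is_tool_present(tools, name):
    """Return True if `name` is one of the tools listed in `tools`."""
    return name.strip() in parse_tools(tools)


if __name__ == "__main__":
    # Value taken from the `--tools vep,exomiser` example in the README hunk.
    print(is_tool_present("vep,exomiser", "exomiser"))
    print(is_tool_present("vep", "exomiser"))
```

Matching whole names after splitting, rather than searching for a substring, avoids treating a partial name such as `ve` as a match for `vep`.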
Ferlab-Ste-Justine · Sep 24, 2024 · ae7818e
1 parent 58855f2
commit ae7818e
Showing 23 changed files with 806 additions and 332 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -20,5 +20,6 @@ These are the most common things requested on pull requests (PRs).
 - [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
 - [ ] Usage Documentation in `docs/usage.md` is updated.
 - [ ] Output Documentation in `docs/output.md` is updated.
+- [ ] Reference Data Documentation in `docs/reference_data.md` is updated.
 - [ ] `CHANGELOG.md` is updated.
 - [ ] `README.md` is updated (including new tool citations and authors/contributors).
diff --git a/.github/workflows/ci-nf-test.yml b/.github/workflows/ci-nf-test.yml
@@ -38,12 +38,10 @@ jobs:
       - name: Run nf-test
         run: |
           nf-test test \
-            --tag=local \
+            --ci \
             --changed-since="HEAD^1" \
             --tap=test.tap \
             --verbose
         # Notes:
-        #  - The --tag option must appear before the --changed-since option to be applied
-        #    correctly.
         #  - The --verbose option is required for some nf-core tests to pass. It's not 
         #    needed now as we only run local tests, but we mention for future use.
diff --git a/.nf-core.yml b/.nf-core.yml
@@ -34,6 +34,9 @@ lint:
   nextflow_config:
   - manifest.name
   - manifest.homePage
+  - config_defaults:
+    - params.exomiser_analysis_wes
+    - params.exomiser_analysis_wgs
 nf_core_version: 2.14.1
 repository_type: pipeline
 template:

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## v2.0.0dev - [date]
 
 ### `Added`
+- [#25](https://github.com/Ferlab-Ste-Justine/Post-processing-Pipeline/pull/25) Added Exomiser module and introduced `tools` parameter to control the execution of VEP and Exomiser.
 - [#26](https://github.com/Ferlab-Ste-Justine/Post-processing-Pipeline/pull/26) Add version file in exomiser docker image
 
 ### `Known issues`

diff --git a/README.md b/README.md
@@ -1,16 +1,18 @@
 [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)
 
 [![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.10.1-23aa62.svg)](https://www.nextflow.io/)
-[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
 [![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
 [![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
-[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/ferlab/postprocessing)
+
+<!-- HIDING BECAUSE NOT SUPPORTED YET
+[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
+-->
 
 ## Introduction
 
-**ferlab/postprocessing** is a bioinformatics pipeline that takes GVCFs from several samples to combine, perform joint genotyping, tag low quality variant and annotate a final vcf version.
+**Ferlab-Ste-Justine/Post-processing-Pipeline** is a bioinformatics pipeline designed for family-based analysis of GVCFs from multiple samples. 
+It performs joint genotyping, tags low-quality variants, and optionally annotates the final vcf data using vep and/or exomiser.
 
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
 ###  Summary:
 1. Remove MNPs using bcftools 
 2. Normalize .gvcf
@@ -19,104 +21,56 @@
 5. Tag false positive variants with either:
   - For whole genome sequencing data: [Variant quality score recalibration (VQSR)](https://gatk.broadinstitute.org/hc/en-us/articles/360036510892-VariantRecalibrator)
   - For whole exome sequencing data: [Hard-Filtering](https://gatk.broadinstitute.org/hc/en-us/articles/360036733451-VariantFiltration)
-6. Annotate variants with [Variant effect predictor (VEP)](https://useast.ensembl.org/info/docs/tools/vep/index.html)
+6. Optionally annotate variants with [Variant effect predictor (VEP)](https://useast.ensembl.org/info/docs/tools/vep/index.html)
+7. Optionally integrate phenotype data to annotate, filter and prioritise variants likely to be disease-causing with [exomiser](https://www.sanger.ac.uk/tool/exomiser/)
 
+<!-- TODO: UPDATE THIS DIAGRAM -->
 ![PostProcessingDiagram](assets/PostProcessingImage.png?raw=true)
 
 ## Usage
 
-> [!NOTE]
-> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
-
-### Samples
-The workflow will accept sample data separated by commas (CSV format). The path to the sample file must be specified with the "**input**" parameter. The column names are : familyId,sample,sequencingType,file. The sequencing type must be either WES (Whole Exome Sequencing) or WGS (Whole Genome Sequencing).
-
-**sample.csv**
-```csv
-**familyId**,**sample**,**sequencingType**,**file**
-CONGE-XXX,01,WES,CONGE-XXX-01.hard-filtered.gvcf.gz
-CONGE-XXX,02,WES,CONGE-XXX-02.hard-filtered.gvcf.gz
-CONGE-XXX,03,WES,CONGE-XXX-03.hard-filtered.gvcf.gz
-CONGE-YYY,01,WGS,CONGE-YYY-01.hard-filtered.gvcf.gz
-CONGE-YYY,02,WGS,CONGE-YYY-02.hard-filtered.gvcf.gz
-CONGE-YYY,03,WGS,CONGE-YYY-03.hard-filtered.gvcf.gz
-```
-
-
-> [!NOTE]
-> The sequencing type also determines the type of variant filtering the pipeline will use.
-> 
-> In the case of Whole Genome Sequencing, VQSR (Variant Quality Score Recalibration) is used (preferred method).
-> 
-> In the case of Whole Exome Sequencing, Hard-filtering needs to be used.
-
-Now, you can run the pipeline using:
-
-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
+Here is an example nextflow command to run the pipeline:
 
 ```bash
-nextflow run ferlab/postprocessing \
-   -profile <docker/singularity/.../> \
+nextflow run -c cluster.config Ferlab-Ste-Justine/Post-processing-Pipeline -r "v2.0.0" \
+    -params-file params.json  \
    --input samplesheet.csv \
-   --outdir <OUTDIR>
+   --outdir results/dir \
+   --tools vep,exomiser
 ```
 
+> [!NOTE]
+> If you are new to nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up nextflow.
+
 > [!WARNING]
-> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
+> Please provide pipeline parameters via the CLI or nextflow `-params-file` option. Custom config files including those provided by the `-c` nextflow option can be used to provide any configuration _**except for parameters**_;
 > see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
 
-### References
-Reference files are necessary at multiple steps of the workflow, notably for joint-genotyping,the variant effect predictor (VEP) and VQSR. 
-Using igenome, we can retrieve the relevant files for the desired version of the human genome.
-Specifically, we specifiy the igenome version with the **genome** parameter. Most likely this value will be *'GRCh38'*
-
 
-Next, we also need broader references, which are contained in a path defined by the **broad** parameter.
+For more details, see [docs/usage.md](docs/usage.md) and [docs/reference_data.md](docs/reference_data.md).
 
-The broad directory must contain the following files:
 
-- The interval list which determines the genomic interval(s) over which we operate: filename of this list must be defined with the **intervalsFile** parameter
-- Highly validated variance ressources currently required by VQSR. ***These are currently hard coded in the pipeline!***
-  - HapMap file : hapmap_3.3.hg38.vcf.gz
-  - 1000G omni2.5 file : 1000G_omni2.5.hg38.vcf.gz
-  - 1000G reference file : 1000G_phase1.snps.high_confidence.hg38.vcf.gz
-  - SNP database : Homo_sapiens_assembly38.dbsnp138.vcf.gz
+### Stub mode and quick tests
 
-
-Finally, the vep cache directory must be specified with **vepCache**, which is usually created by vep itself on first installation.
-Generally, we only need the human files obtainable from https://ftp.ensembl.org/pub/release-112/variation/vep/homo_sapiens_vep_112_GRCh38.tar.gz
+The `-stub` (or `-stub-run`) option can be added to run the "stub" block of processes instead of the "script" block. This can be helpful for testing.
 
-### Stub run
-The -stub-run option can be added to run the "stub" block of processes instead of the "script" block. This can be helpful for testing.
 
-🚧
-
-Parameters summary
------
+To test your setup in stub mode, simply run `nextflow run Ferlab-Ste-Justine/Post-processing-Pipeline -profile test,docker -stub`. 
 
-| Parameter name | Required? | Accepted input |
-| --- | --- | --- |
-| `input` | _Required_ | file |
-| `outdir` | _Required_ | path |
-| `genome` | _Required_ | igenome version, ie 'GRCh38'|
-| `broad` | _Required_ | path |
-| `intervalsFile` | _Required_ | list of genome intervals |
-| `vepCache` | _Required_ | path |
+For tests with real data, see documentation in the [test configuration profile](conf/test.config).
 
 
 Pipeline Output
 -----
-Path to output directory must be specified in **outdir** parameter.
-🚧
+Path to output directory must be specified via the `outdir` parameter.
 
+See [docs/output.md](docs/output.md) for more details about pipeline outputs.
 
-## Credits
 
-ferlab/postprocessing was originally written by Damien Geneste, David Morais, Felix-Antoine Le Sieur, Jeremy Costanza, Lysiane Bouchard.
+## Credits
 
-We thank the following people for their extensive assistance in the development of this pipeline:
+Ferlab-Ste-Justine/Post-processing-Pipeline was originally written by Damien Geneste, David Morais, Felix-Antoine Le Sieur, Jeremy Costanza, Lysiane Bouchard.
 
-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
 
 ## Contributions and Support
 
@@ -140,11 +94,10 @@ The documentation of the various tools used in this workflow are available here:
 
 [VEP](https://useast.ensembl.org/info/docs/tools/vep/script/vep_options.html)
 
-## Citations
+[EXOMISER](https://exomiser.readthedocs.io/en/latest/)
 
-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
 
-An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
+## Citations
 
 This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).
 

diff --git a/assets/exomiser/test_exomiser_analysis.yml b/assets/exomiser/test_exomiser_analysis.yml
diff --git a/conf/test.config b/conf/test.config
@@ -50,8 +50,7 @@ params {
     tools = "vep,exomiser"
 
     // Exomiser parameters
-    exomiser_analysis = "assets/exomiser/test_exomiser_analysis.yml"
     exomiser_data_dir = "data-test/reference/exomiser"
     exomiser_data_version = "2402"
-    genome = "hg38"
+    exomiser_genome = "hg38"
 }
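
The test profile above enables exomiser through `tools` and sets the exomiser-specific parameters, which is where the conditional parameter validation named in the commit message applies. The real validation lives in the Nextflow code and is not part of this excerpt. The Python sketch below only illustrates the principle: exomiser parameters are checked only when exomiser is requested. The function name `missing_exomiser_params` and the choice of which parameters count as required are assumptions based on the three `exomiser_*` values visible in `conf/test.config`.

```python
# Assumption: these three parameters are treated as required when exomiser runs.
# They are the exomiser_* parameters visible in conf/test.config, not a confirmed list.
REQUIRED_EXOMISER_PARAMS = (
    "exomiser_data_dir",
    "exomiser_data_version",
    "exomiser_genome",
)


def missing_exomiser_params(params):
    """Return the required exomiser parameters that are unset.

    Hypothetical sketch: the check is skipped entirely (empty list) when
    "exomiser" is not listed in the comma-separated `tools` parameter.
    """
    raw_tools = str(params.get("tools") or "")
    tools = {name.strip() for name in raw_tools.split(",") if name.strip()}
    if "exomiser" not in tools:
        return []
    return [key for key in REQUIRED_EXOMISER_PARAMS if not params.get(key)]


if __name__ == "__main__":
    # Values copied from the test profile in this diff.
    test_params = {
        "tools": "vep,exomiser",
        "exomiser_data_dir": "data-test/reference/exomiser",
        "exomiser_data_version": "2402",
        "exomiser_genome": "hg38",
    }
    missing = missing_exomiser_params(test_params)
    if missing:
        raise SystemExit("Missing exomiser parameters: " + ", ".join(missing))
    print("exomiser parameters look complete")
```

Returning the list of missing names, instead of failing on the first one, lets the caller report every problem in a single error message.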
diff --git a/docs/output.md b/docs/output.md
@@ -3,9 +3,8 @@
 ## Introduction
 
 This document describes the output produced by the pipeline.
-The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
+The directories listed below will be created in the output directory after the pipeline has finished. All paths are relative to the top-level output directory.
 
-<!-- TODO nf-core: Write this documentation describing your workflow's output -->
 
 ## Pipeline overview
 
@@ -20,7 +19,11 @@ The directories listed below will be created in the results directory after the
   - A copy of the nextflow log file: `nextflow.log`. Note that it will miss logs written after the workflow.onComplete handler is run.
   - Copies of the configuration files used: `config/*.config`. This includes the default `nextflow.config` file as well as any additional configuration files passed as parameters.
   - Other metadata relevant for reproducibility: `metadata.txt` . It contains information such as the original command line, the name of the branch and revision used, the username of the person who submitted the job, a list of configuration files passed, the nextflow work directory, etc.
-
+- `splitmultiallelics/`: pipeline output before running the tools specified via the `tools` parameter.
+- `vep/`: vep output
+- `exomiser/results`: exomiser output
+
+You might see other folders named after different pipeline processes. These are considered intermediate pipeline outputs.
 
 </details>