
Commit

Lysiane changes
It includes the following changes:
- Fix linter warnings
- Introduce conditional parameter validation logic for exomiser and vep
- Remove the exomiser test analysis file, as the default files seem to be compatible with the public test dataset
- Use a dedicated `exomiser_genome` parameter
- Add utility functions to check if a tool is present, with corresponding nf-test tests
- Make Exomiser stub output files identical to real output files
- Infer the exomiser version from the version file
- Standardize exomiser process outputs
- Introduce a per-sequencing-type analysis file
- Use process inputs instead of params to pass configuration information
- Update README.md, OUTPUT.md and USAGE.md
- Add REFERENCE_DATA.md
- Modify the postprocessing workflow code to use the `def` keyword for local variables and to use more standard variable names
- Modify the GitHub CI nf-test command: remove the local tag constraint (no longer necessary) and activate CI mode
- Add a basic module test for exomiser (stub mode)
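
To illustrate the "tool presence" helpers and the conditional parameter validation mentioned in the list above, here is a minimal Python sketch. It is not the pipeline's code (which is written in Nextflow/Groovy and is not shown in this excerpt). The function names and the mapping from tools to required parameters are assumptions made for illustration; only the parameter names themselves (`tools`, `vepCache`, `exomiser_data_dir`, `exomiser_data_version`, `exomiser_genome`) appear in the diff.

```python
# Illustrative sketch only: names and the tool-to-parameter mapping are hypothetical.

def parse_tools(tools_param):
    """Split a comma-separated `tools` value (e.g. "vep,exomiser") into a set."""
    if not tools_param:
        return set()
    return {t.strip().lower() for t in tools_param.split(",") if t.strip()}


def is_tool_selected(tools_param, tool):
    """Return True if `tool` is listed in the `tools` parameter."""
    return tool.lower() in parse_tools(tools_param)


# Assumed mapping: which parameters each optional tool needs.
REQUIRED_PARAMS = {
    "vep": ["vepCache"],
    "exomiser": ["exomiser_data_dir", "exomiser_data_version", "exomiser_genome"],
}


def missing_params(params):
    """List the parameters that are required by the selected tools but unset."""
    missing = []
    for tool in sorted(parse_tools(params.get("tools"))):
        for name in REQUIRED_PARAMS.get(tool, []):
            if not params.get(name):
                missing.append(name)
    return missing
```

With this approach, parameters for a tool are only checked when that tool is actually requested, so a run without `--tools exomiser` does not fail on missing Exomiser settings.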
LysianeBouchard committed Sep 24, 2024
1 parent 58855f2 commit ae7818e
Showing 23 changed files with 806 additions and 332 deletions.
1 change: 1 addition & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -20,5 +20,6 @@ These are the most common things requested on pull requests (PRs).
- [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
- [ ] Reference Data Documentation in `docs/reference_data.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).
4 changes: 1 addition & 3 deletions .github/workflows/ci-nf-test.yml
@@ -38,12 +38,10 @@ jobs:
- name: Run nf-test
run: |
nf-test test \
--tag=local \
--ci \
--changed-since="HEAD^1" \
--tap=test.tap \
--verbose
# Notes:
# - The --tag option must appear before the --changed-since option to be applied
# correctly.
# - The --verbose option is required for some nf-core tests to pass. It's not
# needed now as we only run local tests, but we mention for future use.
3 changes: 3 additions & 0 deletions .nf-core.yml
@@ -34,6 +34,9 @@ lint:
nextflow_config:
- manifest.name
- manifest.homePage
- config_defaults:
- params.exomiser_analysis_wes
- params.exomiser_analysis_wgs
nf_core_version: 2.14.1
repository_type: pipeline
template:
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## v2.0.0dev - [date]

### `Added`
- [#25](https://github.com/Ferlab-Ste-Justine/Post-processing-Pipeline/pull/25) Added Exomiser module and introduced `tools` parameter to control the execution of VEP and Exomiser.
- [#26](https://github.com/Ferlab-Ste-Justine/Post-processing-Pipeline/pull/26) Add version file in exomiser docker image

### `Known issues`
105 changes: 29 additions & 76 deletions README.md
@@ -1,16 +1,18 @@
[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)

[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.10.1-23aa62.svg)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/ferlab/postprocessing)

<!-- HIDING BECAUSE NOT SUPPORTED YET
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
-->

## Introduction

**ferlab/postprocessing** is a bioinformatics pipeline that takes GVCFs from several samples to combine, perform joint genotyping, tag low quality variant and annotate a final vcf version.
**Ferlab-Ste-Justine/Post-processing-Pipeline** is a bioinformatics pipeline designed for family-based analysis of GVCFs from multiple samples.
It performs joint genotyping, tags low-quality variants, and optionally annotates the final vcf data using vep and/or exomiser.

<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
### Summary:
1. Remove MNPs using bcftools
2. Normalize .gvcf
@@ -19,104 +21,56 @@
5. Tag false positive variants with either:
- For whole genome sequencing data: [Variant quality score recalibration (VQSR)](https://gatk.broadinstitute.org/hc/en-us/articles/360036510892-VariantRecalibrator)
- For whole exome sequencing data: [Hard-Filtering](https://gatk.broadinstitute.org/hc/en-us/articles/360036733451-VariantFiltration)
6. Annotate variants with [Variant effect predictor (VEP)](https://useast.ensembl.org/info/docs/tools/vep/index.html)
6. Optionally annotate variants with [Variant effect predictor (VEP)](https://useast.ensembl.org/info/docs/tools/vep/index.html)
7. Optionally integrate phenotype data to annotate, filter and prioritise variants likely to be disease-causing with [exomiser](https://www.sanger.ac.uk/tool/exomiser/)

<!-- TODO: UPDATE THIS DIAGRAM -->
![PostProcessingDiagram](assets/PostProcessingImage.png?raw=true)

## Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
### Samples
The workflow will accept sample data separated by commas (CSV format). The path to the sample file must be specified with the "**input**" parameter. The column names are : familyId,sample,sequencingType,file. The sequencing type must be either WES (Whole Exome Sequencing) or WGS (Whole Genome Sequencing).

**sample.csv**
```csv
familyId,sample,sequencingType,file
CONGE-XXX,01,WES,CONGE-XXX-01.hard-filtered.gvcf.gz
CONGE-XXX,02,WES,CONGE-XXX-02.hard-filtered.gvcf.gz
CONGE-XXX,03,WES,CONGE-XXX-03.hard-filtered.gvcf.gz
CONGE-YYY,01,WGS,CONGE-YYY-01.hard-filtered.gvcf.gz
CONGE-YYY,02,WGS,CONGE-YYY-02.hard-filtered.gvcf.gz
CONGE-YYY,03,WGS,CONGE-YYY-03.hard-filtered.gvcf.gz
```
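
The samplesheet rules described above (four named columns, and a sequencing type of either WES or WGS that determines the filtering method) can be sketched as a small Python check. This is only an illustration, not the pipeline's own validation: the function name is hypothetical, and requiring the columns in exactly this order is an assumption.

```python
import csv
import io

# Column names as listed in the README text.
EXPECTED_COLUMNS = ["familyId", "sample", "sequencingType", "file"]

# Filtering method per sequencing type, as described in the README.
FILTER_BY_SEQUENCING_TYPE = {"WES": "hard-filtering", "WGS": "VQSR"}


def read_samplesheet(text):
    """Parse samplesheet text and attach the filtering method to each row.

    Hypothetical helper: raises ValueError on an unexpected header or an
    unknown sequencing type.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for line_number, row in enumerate(reader, start=2):
        seq_type = row["sequencingType"]
        if seq_type not in FILTER_BY_SEQUENCING_TYPE:
            raise ValueError(
                f"line {line_number}: sequencingType must be WES or WGS, got {seq_type!r}"
            )
        rows.append({**row, "filter": FILTER_BY_SEQUENCING_TYPE[seq_type]})
    return rows
```

For example, calling `read_samplesheet(open("sample.csv").read())` would return one dictionary per sample, each with an extra `filter` key naming the method that applies to it.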


> [!NOTE]
> The sequencing type also determines the type of variant filtering the pipeline will use.
>
> In the case of Whole Genome Sequencing, VQSR (Variant Quality Score Recalibration) is used (preferred method).
>
> In the case of Whole Exome Sequencing, Hard-filtering needs to be used.
Now, you can run the pipeline using:

<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
Here is an example nextflow command to run the pipeline:

```bash
nextflow run ferlab/postprocessing \
-profile <docker/singularity/.../> \
nextflow run -c cluster.config Ferlab-Ste-Justine/Post-processing-Pipeline -r "v2.0.0" \
-params-file params.json \
--input samplesheet.csv \
--outdir <OUTDIR>
--outdir results/dir \
--tools vep,exomiser
```

> [!NOTE]
> If you are new to nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up nextflow.
> [!WARNING]
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
> Please provide pipeline parameters via the CLI or nextflow `-params-file` option. Custom config files including those provided by the `-c` nextflow option can be used to provide any configuration _**except for parameters**_;
> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
### References
Reference files are necessary at multiple steps of the workflow, notably for joint-genotyping, the variant effect predictor (VEP) and VQSR.
Using igenome, we can retrieve the relevant files for the desired version of the human genome.
Specifically, we specify the igenome version with the **genome** parameter. Most likely this value will be *'GRCh38'*


Next, we also need broader references, which are contained in a path defined by the **broad** parameter.
For more details, see [docs/usage.md](docs/usage.md) and [docs/reference_data.md](docs/reference_data.md).

The broad directory must contain the following files:

- The interval list which determines the genomic interval(s) over which we operate: filename of this list must be defined with the **intervalsFile** parameter
- Highly validated variant resources currently required by VQSR. ***These are currently hard coded in the pipeline!***
- HapMap file : hapmap_3.3.hg38.vcf.gz
- 1000G omni2.5 file : 1000G_omni2.5.hg38.vcf.gz
- 1000G reference file : 1000G_phase1.snps.high_confidence.hg38.vcf.gz
- SNP database : Homo_sapiens_assembly38.dbsnp138.vcf.gz
### Stub mode and quick tests


Finally, the vep cache directory must be specified with **vepCache**, which is usually created by vep itself on first installation.
Generally, we only need the human files obtainable from https://ftp.ensembl.org/pub/release-112/variation/vep/homo_sapiens_vep_112_GRCh38.tar.gz
The `-stub` (or `-stub-run`) option can be added to run the "stub" block of processes instead of the "script" block. This can be helpful for testing.

### Stub run
The -stub-run option can be added to run the "stub" block of processes instead of the "script" block. This can be helpful for testing.

🚧

Parameters summary
-----
To test your setup in stub mode, simply run `nextflow run Ferlab-Ste-Justine/Post-processing-Pipeline -profile test,docker -stub`.

| Parameter name | Required? | Accepted input |
| --- | --- | --- |
| `input` | _Required_ | file |
| `outdir` | _Required_ | path |
| `genome` | _Required_ | igenome version, ie 'GRCh38'|
| `broad` | _Required_ | path |
| `intervalsFile` | _Required_ | list of genome intervals |
| `vepCache` | _Required_ | path |
For tests with real data, see documentation in the [test configuration profile](conf/test.config)


Pipeline Output
-----
Path to output directory must be specified in **outdir** parameter.
🚧
Path to output directory must be specified via the `outdir` parameter.

See [docs/output.md](docs/output.md) for more details about pipeline outputs.

## Credits

ferlab/postprocessing was originally written by Damien Geneste, David Morais, Felix-Antoine Le Sieur, Jeremy Costanza, Lysiane Bouchard.
## Credits

We thank the following people for their extensive assistance in the development of this pipeline:
Ferlab-Ste-Justine/Post-processing-Pipeline was originally written by Damien Geneste, David Morais, Felix-Antoine Le Sieur, Jeremy Costanza, Lysiane Bouchard.

<!-- TODO nf-core: If applicable, make list of people who have also contributed -->

## Contributions and Support

@@ -140,11 +94,10 @@ The documentation of the various tools used in this workflow are available here:

[VEP](https://useast.ensembl.org/info/docs/tools/vep/script/vep_options.html)

## Citations
[EXOMISER](https://exomiser.readthedocs.io/en/latest/)

<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
## Citations

This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).

32 changes: 0 additions & 32 deletions assets/exomiser/test_exomiser_analysis.yml

This file was deleted.

3 changes: 1 addition & 2 deletions conf/test.config
@@ -50,8 +50,7 @@ params {
tools = "vep,exomiser"

// Exomiser parameters
exomiser_analysis = "assets/exomiser/test_exomiser_analysis.yml"
exomiser_data_dir = "data-test/reference/exomiser"
exomiser_data_version = "2402"
genome = "hg38"
exomiser_genome = "hg38"
}
9 changes: 6 additions & 3 deletions docs/output.md
@@ -3,9 +3,8 @@
## Introduction

This document describes the output produced by the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
The directories listed below will be created in the output directory after the pipeline has finished. All paths are relative to the top-level output directory.

<!-- TODO nf-core: Write this documentation describing your workflow's output -->

## Pipeline overview

@@ -20,7 +19,11 @@ The directories listed below will be created in the results directory after the
- A copy of the nextflow log file: `nextflow.log`. Note that it will miss logs written after the workflow.onComplete handler is run.
- Copies of the configuration files used: `config/*.config`. This includes the default `nextflow.config` file as well as any additional configuration files passed as parameters.
- Other metadata relevant for reproducibility: `metadata.txt` . It contains information such as the original command line, the name of the branch and revision used, the username of the person who submitted the job, a list of configuration files passed, the nextflow work directory, etc.

- `splitmultiallelics/`: pipeline output before running the tools specified via the `tools` parameter.
- `vep/`: vep output
- `exomiser/results`: exomiser output

You might see other folders named after different pipeline processes. These are considered intermediate pipeline outputs.

</details>
