Feat/clin 2947 add exomiser #25

Merged
merged 1 commit into from
Oct 1, 2024
1 change: 1 addition & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -20,5 +20,6 @@ These are the most common things requested on pull requests (PRs).
- [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
- [ ] Reference Data Documentation in `docs/reference_data.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).
4 changes: 1 addition & 3 deletions .github/workflows/ci-nf-test.yml
@@ -38,12 +38,10 @@ jobs:
- name: Run nf-test
run: |
nf-test test \
--tag=local \
--ci \
--changed-since="HEAD^1" \
--tap=test.tap \
--verbose
# Notes:
# - The --tag option must appear before the --changed-since option to be applied
# correctly.
# - The --verbose option is required for some nf-core tests to pass. It's not
# needed now as we only run local tests, but we mention for future use.
4 changes: 4 additions & 0 deletions .nf-core.yml
@@ -24,6 +24,7 @@ lint:
- assets/nf-core-postprocessing_logo_light.png
- docs/images/nf-core-postprocessing_logo_light.png
- docs/images/nf-core-postprocessing_logo_dark.png
- docs/README.md
- .github/ISSUE_TEMPLATE/bug_report.yml
- .github/CONTRIBUTING.md
- .github/PULL_REQUEST_TEMPLATE.md
@@ -34,6 +35,9 @@ lint:
nextflow_config:
- manifest.name
- manifest.homePage
- config_defaults:
- params.exomiser_analysis_wes
- params.exomiser_analysis_wgs
nf_core_version: 2.14.1
repository_type: pipeline
template:
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## v2.0.0dev - [date]

### `Added`
- [#25](https://github.com/Ferlab-Ste-Justine/Post-processing-Pipeline/pull/25) Added exomiser module and introduced `tools` parameter to control the execution of VEP and Exomiser.
- [#25](https://github.com/Ferlab-Ste-Justine/Post-processing-Pipeline/pull/25) Group vep output files in subfolder `vep`.
- [#26](https://github.com/Ferlab-Ste-Justine/Post-processing-Pipeline/pull/26) Add version file in exomiser docker image

### `Known issues`
110 changes: 35 additions & 75 deletions README.md
@@ -1,16 +1,18 @@
[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)

[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.10.1-23aa62.svg)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/ferlab/postprocessing)

<!-- HIDING BECAUSE NOT SUPPORTED YET
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
-->

## Introduction

**ferlab/postprocessing** is a bioinformatics pipeline that takes GVCFs from several samples to combine, perform joint genotyping, tag low quality variant and annotate a final vcf version.
**Ferlab-Ste-Justine/Post-processing-Pipeline** is a bioinformatics pipeline designed for family-based analysis of GVCFs from multiple samples.
It performs joint genotyping, tags low-quality variants, and optionally annotates the final VCF data using VEP and/or prioritizes variants using Exomiser.

<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
### Summary:
1. Remove MNPs using bcftools
2. Normalize .gvcf
@@ -19,104 +21,62 @@
5. Tag false positive variants with either:
- For whole genome sequencing data: [Variant quality score recalibration (VQSR)](https://gatk.broadinstitute.org/hc/en-us/articles/360036510892-VariantRecalibrator)
- For whole exome sequencing data: [Hard-Filtering](https://gatk.broadinstitute.org/hc/en-us/articles/360036733451-VariantFiltration)
6. Annotate variants with [Variant effect predictor (VEP)](https://useast.ensembl.org/info/docs/tools/vep/index.html)
6. Optionally annotate variants with [Variant effect predictor (VEP)](https://useast.ensembl.org/info/docs/tools/vep/index.html)
7. Optionally integrate phenotype data to annotate, filter and prioritise variants likely to be disease-causing with [exomiser](https://www.sanger.ac.uk/tool/exomiser/)

![PostProcessingDiagram](assets/PostProcessingImage.png?raw=true)

## Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

### Samples
The workflow will accept sample data separated by commas (CSV format). The path to the sample file must be specified with the "**input**" parameter. The column names are : familyId,sample,sequencingType,file. The sequencing type must be either WES (Whole Exome Sequencing) or WGS (Whole Genome Sequencing).

**sample.csv**
```csv
**familyId**,**sample**,**sequencingType**,**file**
CONGE-XXX,01,WES,CONGE-XXX-01.hard-filtered.gvcf.gz
CONGE-XXX,02,WES,CONGE-XXX-02.hard-filtered.gvcf.gz
CONGE-XXX,03,WES,CONGE-XXX-03.hard-filtered.gvcf.gz
CONGE-YYY,01,WGS,CONGE-YYY-01.hard-filtered.gvcf.gz
CONGE-YYY,02,WGS,CONGE-YYY-02.hard-filtered.gvcf.gz
CONGE-YYY,03,WGS,CONGE-YYY-03.hard-filtered.gvcf.gz
```
### Workflow subway schema

The full Ferlab workflow is shown in the image below, including the steps applicable prior to this pipeline. The steps relevant to the Ferlab-Ste-Justine/Post-processing-Pipeline correspond to the post-processing block.
![PostProcessingDiagram](docs/images/ferlab_workflow.png)

> [!NOTE]
> The sequencing type also determines the type of variant filtering the pipeline will use.
>
> In the case of Whole Genome Sequencing, VQSR (Variant Quality Score Recalibration) is used (preferred method).
>
> In the case of Whole Exome Sequencing, Hard-filtering needs to be used.
This schema was made with [inkscape](https://inkscape.org/), following the good practices recommended by the nf-core community. See [nf-core Graphic Design](https://nf-co.re/docs/guidelines/graphic_design).

Now, you can run the pipeline using:
## Usage

<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
Here is an example nextflow command to run the pipeline:

```bash
nextflow run ferlab/postprocessing \
-profile <docker/singularity/.../> \
nextflow run -c cluster.config Ferlab-Ste-Justine/Post-processing-Pipeline -r "v2.0.0" \
-params-file params.json \
--input samplesheet.csv \
--outdir <OUTDIR>
--outdir results/dir \
--tools vep,exomiser
```

> [!NOTE]
> If you are new to nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up nextflow.

> [!WARNING]
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
> Please provide pipeline parameters via the CLI or nextflow `-params-file` option. Custom config files including those provided by the `-c` nextflow option can be used to provide any configuration _**except for parameters**_;
> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).

### References
Reference files are necessary at multiple steps of the workflow, notably for joint-genotyping, the variant effect predictor (VEP) and VQSR.
Using igenome, we can retrieve the relevant files for the desired version of the human genome.
Specifically, we specify the igenome version with the **genome** parameter. Most likely this value will be *'GRCh38'*.


Next, we also need broader references, which are contained in a path defined by the **broad** parameter.
For more details, see [docs/usage.md](docs/usage.md) and [docs/reference_data.md](docs/reference_data.md).

The broad directory must contain the following files:

- The interval list which determines the genomic interval(s) over which we operate: filename of this list must be defined with the **intervalsFile** parameter
- Highly validated variant resources currently required by VQSR. ***These are currently hard coded in the pipeline!***
- HapMap file : hapmap_3.3.hg38.vcf.gz
- 1000G omni2.5 file : 1000G_omni2.5.hg38.vcf.gz
- 1000G reference file : 1000G_phase1.snps.high_confidence.hg38.vcf.gz
- SNP database : Homo_sapiens_assembly38.dbsnp138.vcf.gz
### Stub mode and quick tests


Finally, the vep cache directory must be specified with **vepCache**, which is usually created by vep itself on first installation.
Generally, we only need the human files obtainable from https://ftp.ensembl.org/pub/release-112/variation/vep/homo_sapiens_vep_112_GRCh38.tar.gz
The `-stub` (or `-stub-run`) option can be added to run the "stub" block of processes instead of the "script" block. This can be helpful for testing.

### Stub run
The -stub-run option can be added to run the "stub" block of processes instead of the "script" block. This can be helpful for testing.

🚧

Parameters summary
-----
To test your setup in stub mode, simply run `nextflow run Ferlab-Ste-Justine/Post-processing-Pipeline -profile test,docker -stub`.

| Parameter name | Required? | Accepted input |
| --- | --- | --- |
| `input` | _Required_ | file |
| `outdir` | _Required_ | path |
| `genome` | _Required_ | igenome version, ie 'GRCh38'|
| `broad` | _Required_ | path |
| `intervalsFile` | _Required_ | list of genome intervals |
| `vepCache` | _Required_ | path |
For tests with real data, see the documentation in the [test configuration profile](conf/test.config).


Pipeline Output
-----
Path to output directory must be specified in **outdir** parameter.
🚧
Path to output directory must be specified via the `outdir` parameter.

See [docs/output.md](docs/output.md) for more details about pipeline outputs.

## Credits

ferlab/postprocessing was originally written by Damien Geneste, David Morais, Felix-Antoine Le Sieur, Jeremy Costanza, Lysiane Bouchard.
## Credits

We thank the following people for their extensive assistance in the development of this pipeline:
Ferlab-Ste-Justine/Post-processing-Pipeline was originally written by Damien Geneste, David Morais, Felix-Antoine Le Sieur, Jeremy Costanza, Lysiane Bouchard.

<!-- TODO nf-core: If applicable, make list of people who have also contributed -->

## Contributions and Support

@@ -136,15 +96,15 @@ The documentation of the various tools used in this workflow are available here:
- [CombineGVCFs](https://gatk.broadinstitute.org/hc/en-us/articles/360037593911-CombineGVCFs)
- [GenotypeGVCFs](https://gatk.broadinstitute.org/hc/en-us/articles/360037057852-GenotypeGVCFs)
- [VariantRecalibrator](https://gatk.broadinstitute.org/hc/en-us/articles/360035531612-Variant-Quality-Score-Recalibration-VQSR)
- [VariantFiltration](https://gatk.broadinstitute.org/hc/enus/articles/360041850471-VariantFiltration))
- [VariantFiltration](https://gatk.broadinstitute.org/hc/en-us/articles/360041850471-VariantFiltration)
- [HardFiltering](https://gatk.broadinstitute.org/hc/en-us/articles/360035531112--How-to-Filter-variants-either-with-VQSR-or-by-hard-filtering)

Review comment (Contributor): Remove last parenthesis + update link

Reply (Contributor Author): Done

[VEP](https://useast.ensembl.org/info/docs/tools/vep/script/vep_options.html)

## Citations
[EXOMISER](https://exomiser.readthedocs.io/en/latest/)

<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
## Citations

This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).

Binary file removed assets/PostProcessingImage.png
Binary file not shown.
7 changes: 4 additions & 3 deletions assets/TestSampleSheet.csv
@@ -1,3 +1,4 @@
familyId,sample,sequencingType,gvcf
Family1,Test1,WES,https://github.com/nf-core/test-datasets/raw/modules/data/genomics/homo_sapiens/illumina/gvcf/test.genome.vcf.gz
Family1,Test2,WGS,https://github.com/nf-core/test-datasets/raw/modules/data/genomics/homo_sapiens/illumina/gvcf/test2.genome.vcf.gz
familyId,sample,sequencingType,gvcf,familyPheno
Family1,Test1,WES,https://github.com/nf-core/test-datasets/raw/modules/data/genomics/homo_sapiens/illumina/gvcf/test.genome.vcf.gz,assets/exomiser/pheno/family1.yml
Family1,Test2,WES,https://github.com/nf-core/test-datasets/raw/modules/data/genomics/homo_sapiens/illumina/gvcf/test2.genome.vcf.gz,assets/exomiser/pheno/family1.yml
Family2,Test1,WGS,https://github.com/nf-core/test-datasets/raw/modules/data/genomics/homo_sapiens/illumina/gvcf/test.genome.vcf.gz,assets/exomiser/pheno/family2.yml
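
The updated sample sheet adds a `familyPheno` column that points to a per-family phenotype file. As a rough illustration only (this is not part of the pipeline), the Python sketch below checks a sheet of this shape: the header matches the five columns above, `sequencingType` is either WES or WGS, and all rows of a family reference the same `familyPheno` file. The function name `check_samplesheet` and the one-file-per-family rule are assumptions made for this example, and the `gvcf` values are shortened for readability.

```python
import csv
import io

EXPECTED_COLUMNS = ["familyId", "sample", "sequencingType", "gvcf", "familyPheno"]
VALID_SEQUENCING_TYPES = {"WES", "WGS"}


def check_samplesheet(text):
    """Return a list of problems found in a sample sheet given as CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]

    pheno_by_family = {}
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["sequencingType"] not in VALID_SEQUENCING_TYPES:
            problems.append(f"line {line_no}: sequencingType must be WES or WGS")
        # Assumption for this sketch: one phenotype file per family.
        first_seen = pheno_by_family.setdefault(row["familyId"], row["familyPheno"])
        if first_seen != row["familyPheno"]:
            problems.append(f"line {line_no}: familyPheno differs within {row['familyId']}")
    return problems


# Shortened copy of the test sample sheet (gvcf URLs replaced by file names).
SHEET = """familyId,sample,sequencingType,gvcf,familyPheno
Family1,Test1,WES,test.genome.vcf.gz,assets/exomiser/pheno/family1.yml
Family1,Test2,WES,test2.genome.vcf.gz,assets/exomiser/pheno/family1.yml
Family2,Test1,WGS,test.genome.vcf.gz,assets/exomiser/pheno/family2.yml
"""

print(check_samplesheet(SHEET))
```

A check like this treats a misspelled `familyId` as a separate family, so it would not catch every typo; it only flags the structural problems listed above.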
64 changes: 64 additions & 0 deletions assets/exomiser/default_exomiser_WES_analysis.yml
@@ -0,0 +1,64 @@
## Exomiser Analysis Template.
# These are all the possible options for running exomiser. Use this as a template for
# your own set-up.
---
analysisMode: PASS_ONLY
inheritanceModes: {
AUTOSOMAL_DOMINANT: 0.1,
AUTOSOMAL_RECESSIVE_HOM_ALT: 0.1,
AUTOSOMAL_RECESSIVE_COMP_HET: 2.0,
X_DOMINANT: 0.1,
X_RECESSIVE_HOM_ALT: 0.1,
X_RECESSIVE_COMP_HET: 2.0,
MITOCHONDRIAL: 0.2
}
frequencySources: [
UK10K,

GNOMAD_E_AFR,
GNOMAD_E_AMR,
# GNOMAD_E_ASJ,
GNOMAD_E_EAS,
# GNOMAD_E_FIN,
GNOMAD_E_NFE,
# GNOMAD_E_OTH,
GNOMAD_E_SAS,

GNOMAD_G_AFR,
GNOMAD_G_AMR,
# GNOMAD_G_ASJ,
GNOMAD_G_EAS,
# GNOMAD_G_FIN,
GNOMAD_G_NFE,
# GNOMAD_G_OTH,
GNOMAD_G_SAS
]
# Possible pathogenicitySources: (POLYPHEN, MUTATION_TASTER, SIFT), (REVEL, MVP), CADD, REMM, SPLICE_AI, ALPHA_MISSENSE
# REMM is trained on non-coding regulatory regions
# *WARNING* if you enable CADD or REMM ensure that you have downloaded and installed the CADD/REMM tabix files
# and updated their location in the application.properties. Exomiser will not run without this.
pathogenicitySources: [ REVEL, MVP ]
#this is the standard exomiser order.
steps: [
failedVariantFilter: { },
variantEffectFilter: {
remove: [
FIVE_PRIME_UTR_EXON_VARIANT,
FIVE_PRIME_UTR_INTRON_VARIANT,
THREE_PRIME_UTR_EXON_VARIANT,
THREE_PRIME_UTR_INTRON_VARIANT,
NON_CODING_TRANSCRIPT_EXON_VARIANT,
NON_CODING_TRANSCRIPT_INTRON_VARIANT,
CODING_TRANSCRIPT_INTRON_VARIANT,
UPSTREAM_GENE_VARIANT,
DOWNSTREAM_GENE_VARIANT,
INTERGENIC_VARIANT,
REGULATORY_REGION_VARIANT
]
},
frequencyFilter: { maxFrequency: 2.0 },
pathogenicityFilter: { keepNonPathogenic: true },
inheritanceFilter: { },
omimPrioritiser: { },
hiPhivePrioritiser: { }
]
55 changes: 55 additions & 0 deletions assets/exomiser/default_exomiser_WGS_analysis.yml
@@ -0,0 +1,55 @@
## Exomiser genome analysis template.
# These are all the possible options for running exomiser. Use this as a template for
# your own set-up.
---
analysisMode: PASS_ONLY
inheritanceModes: {
AUTOSOMAL_DOMINANT: 0.1,
AUTOSOMAL_RECESSIVE_HOM_ALT: 0.1,
AUTOSOMAL_RECESSIVE_COMP_HET: 2.0,
X_DOMINANT: 0.1,
X_RECESSIVE_HOM_ALT: 0.1,
X_RECESSIVE_COMP_HET: 2.0,
MITOCHONDRIAL: 0.2
}
frequencySources: [
UK10K,

GNOMAD_E_AFR,
GNOMAD_E_AMR,
# GNOMAD_E_ASJ,
GNOMAD_E_EAS,
# GNOMAD_E_FIN,
GNOMAD_E_NFE,
# GNOMAD_E_OTH,
GNOMAD_E_SAS,

GNOMAD_G_AFR,
GNOMAD_G_AMR,
# GNOMAD_G_ASJ,
GNOMAD_G_EAS,
# GNOMAD_G_FIN,
GNOMAD_G_NFE,
# GNOMAD_G_OTH,
GNOMAD_G_SAS
]
# Possible pathogenicitySources: (POLYPHEN, MUTATION_TASTER, SIFT), (REVEL, MVP), CADD, REMM, SPLICE_AI, ALPHA_MISSENSE
# REMM is trained on non-coding regulatory regions
# *WARNING* if you enable CADD or REMM ensure that you have downloaded and installed the CADD/REMM tabix files
# and updated their location in the application.properties. Exomiser will not run without this.
pathogenicitySources: [ REVEL, MVP ]
# this is the recommended order for a genome-sized analysis.
steps: [
hiPhivePrioritiser: { },
# running the prioritiser followed by a priorityScoreFilter will remove genes
# which are least likely to contribute to the phenotype defined in hpoIds, this will
# dramatically reduce the time and memory required to analyse a genome.
# 0.501 is a good compromise to select good phenotype matches and the best protein-protein interactions hits from hiPhive
priorityScoreFilter: { priorityType: HIPHIVE_PRIORITY, minPriorityScore: 0.501 },
failedVariantFilter: { },
regulatoryFeatureFilter: { },
frequencyFilter: { maxFrequency: 2.0 },
pathogenicityFilter: { keepNonPathogenic: true },
inheritanceFilter: { },
omimPrioritiser: { }
]
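
The comments in the WGS template explain why `hiPhivePrioritiser` runs first: scoring genes against the phenotype and then applying `priorityScoreFilter` discards poorly matching genes before the remaining filters run, which reduces time and memory for a genome-sized analysis. The Python sketch below only illustrates that idea. It is not Exomiser code, the gene names and scores are invented, and treating the threshold as inclusive is an assumption.

```python
MIN_PRIORITY_SCORE = 0.501  # value used in the WGS template above


def filter_by_priority_score(gene_scores, min_score=MIN_PRIORITY_SCORE):
    """Keep only genes whose phenotype priority score reaches the threshold."""
    return {gene: score for gene, score in gene_scores.items() if score >= min_score}


# Invented scores, standing in for what a phenotype prioritiser might produce.
hypothetical_scores = {"GENE_A": 0.92, "GENE_B": 0.501, "GENE_C": 0.30, "GENE_D": 0.0}

kept = filter_by_priority_score(hypothetical_scores)
print(sorted(kept))
```

Only the genes that survive this step would have their variants passed on to the later filters, which is where the savings described in the template comments come from.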
30 changes: 30 additions & 0 deletions assets/exomiser/pheno/family1.yml
@@ -0,0 +1,30 @@
---
id: family1
proband:
  subject:
    id: testN
    sex: FEMALE
  phenotypicFeatures:
    - type:
        id: HP:0001159
        label: Syndactyly

pedigree:
  persons:
    - individualId: testN
      paternalId: testT
      sex: FEMALE
      affectedStatus: AFFECTED
    - individualId: testT
      sex: MALE
      affectedStatus: UNAFFECTED

metaData:
  resources:
    - id: hp
      name: human phenotype ontology
      url: http://purl.obolibrary.org/obo/hp.owl
      version: hp/releases/2019-11-08
      namespacePrefix: HP
      iriPrefix: 'http://purl.obolibrary.org/obo/HP_'
  phenopacketSchemaVersion: 2.0
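
The phenotype file ties a proband to a pedigree, so the identifiers in both parts have to agree. The Python sketch below is a hedged illustration of that consistency check, not something the pipeline or Exomiser is known to run: it verifies that the proband appears in the pedigree with the same sex and that every referenced parent is also listed. The function name `check_family` is hypothetical, the nesting of the dict follows the Phenopackets family layout as an assumption, and `maternalId` is included alongside `paternalId` even though this test file only uses the latter.

```python
def check_family(packet):
    """Return a list of inconsistencies between proband and pedigree."""
    problems = []
    persons = {p["individualId"]: p for p in packet["pedigree"]["persons"]}
    proband = packet["proband"]["subject"]

    if proband["id"] not in persons:
        problems.append(f"proband {proband['id']} is missing from the pedigree")
    elif persons[proband["id"]]["sex"] != proband["sex"]:
        problems.append(f"sex of {proband['id']} differs between proband and pedigree")

    # Every referenced parent must itself be a person in the pedigree.
    for person in persons.values():
        for key in ("paternalId", "maternalId"):
            parent = person.get(key)
            if parent is not None and parent not in persons:
                problems.append(f"{key} {parent} of {person['individualId']} is not in the pedigree")
    return problems


# Proband and pedigree parts of family1.yml, written as a dict to avoid a YAML dependency.
FAMILY1 = {
    "id": "family1",
    "proband": {
        "subject": {"id": "testN", "sex": "FEMALE"},
        "phenotypicFeatures": [{"type": {"id": "HP:0001159", "label": "Syndactyly"}}],
    },
    "pedigree": {
        "persons": [
            {"individualId": "testN", "paternalId": "testT", "sex": "FEMALE", "affectedStatus": "AFFECTED"},
            {"individualId": "testT", "sex": "MALE", "affectedStatus": "UNAFFECTED"},
        ]
    },
}

print(check_family(FAMILY1))
```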