Update docs
julesjacobsen committed Feb 28, 2024
1 parent 8bfbed7 commit eceb9aa
Showing 6 changed files with 93 additions and 67 deletions.
23 changes: 22 additions & 1 deletion docs/acmg_assignment.rst
@@ -30,13 +30,24 @@ Computational and Predictive Data
PVS1
----
Variants must have a predicted loss of function effect, be in a gene with known disease associations and have a gene
constraint LOF O/E < 0.7635 (gnomAD 2.1.1) to suggest that a gene is LoF intolerant. Variants not predicted to lead to
constraint LOF O/E < 0.7635 (gnomAD 4.0) to suggest that a gene is LoF intolerant. Variants not predicted to lead to
NMD (those located in the last exon) will have the modifier downgraded to Strong.
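
As a rough illustration only, the PVS1 rule described above could be sketched in Python as follows. The function name, its parameters and the returned labels are hypothetical and are not Exomiser's API; only the 0.7635 threshold and the last-exon downgrade to Strong are taken from the text, and "VeryStrong" is assumed as the default PVS1 strength.

```python
from typing import Optional

# Gene constraint LOF O/E cut-off quoted in the text above.
LOF_OE_THRESHOLD = 0.7635


def pvs1_strength(predicted_lof: bool,
                  gene_has_disease_association: bool,
                  lof_oe: float,
                  in_last_exon: bool) -> Optional[str]:
    """Hypothetical helper: return a PVS1 strength, or None if PVS1 is not assigned."""
    # All three conditions from the text must hold for PVS1 to be considered.
    if not (predicted_lof and gene_has_disease_association):
        return None
    if lof_oe >= LOF_OE_THRESHOLD:
        return None
    # Variants in the last exon are not predicted to lead to NMD,
    # so the modifier is downgraded to Strong.
    return "Strong" if in_last_exon else "VeryStrong"
```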

PS1
---
Variants with the same amino acid change as previously reported P/LP missense or in-frame indel ClinVar variants will be
assigned `PS1` with a strength of `Strong` for variants >= 2 stars, `Moderate` for variants with 1 star or `Supporting`
for those without a ClinVar star rating.
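
The mapping from ClinVar review status to `PS1` strength described above can be sketched as a small lookup. This is a minimal sketch: the function name and the use of ``None`` for a missing rating are assumptions made for illustration, not part of Exomiser.

```python
from typing import Optional


def ps1_strength(clinvar_stars: Optional[int]) -> str:
    """Hypothetical helper: PS1 strength for a matching P/LP ClinVar variant."""
    if clinvar_stars is not None and clinvar_stars >= 2:
        return "Strong"
    if clinvar_stars == 1:
        return "Moderate"
    # Zero stars or no rating at all.
    return "Supporting"
```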

PM4
---
Stop-loss variants and in-frame insertions or deletions not previously assigned a `PVS1` criterion are assigned `PM4`.

PM5
---
Variants having a novel missense change to an amino acid where a previously reported ClinVar P/LP variant has been seen
will be assigned `PM5` with a strength of `Moderate` for those with >=2 stars or `Supporting` otherwise.

PP3 / BP4
---------
If REVEL is chosen as a pathogenicity predictor for missense variants, `PP3` and `BP4` are assigned using the modifiers
@@ -46,6 +57,16 @@ Note that this suggests the use of modifiers up to Strong in the case of pathoge
Otherwise, an ensemble-based approach will be used for other pathogenicity predictors as per the original 2015 guidelines.
It should be noted we found better performance using the REVEL-based approach when testing against the 100K genomes data.

Functional Data
===============
PM1
---
Missense and in-frame indels are assigned `PM1` if the surrounding region of 25 nucleotides either side of the variant
contains at least 4 reported P/LP variants in ClinVar and no B/LB variants. If the number of P/LP variants is greater
than the number of VUS in the region the strength will be assigned `Moderate` but regions containing P/LP <= VUS
(and no B/LB) will have the strength downgraded to `Supporting`.
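
To make the counting rule concrete, here is a minimal sketch of the `PM1` decision, assuming the ClinVar variants within 25 nucleotides either side of the variant have already been counted. The function and parameter names are hypothetical and do not reflect Exomiser's implementation.

```python
from typing import Optional


def pm1_strength(n_plp: int, n_vus: int, n_blb: int) -> Optional[str]:
    """Hypothetical helper: PM1 strength from ClinVar counts in the flanking region.

    n_plp: pathogenic / likely pathogenic variants in the region
    n_vus: variants of uncertain significance in the region
    n_blb: benign / likely benign variants in the region
    """
    # PM1 requires at least 4 P/LP variants and no B/LB variants.
    if n_plp < 4 or n_blb > 0:
        return None
    # More P/LP than VUS keeps Moderate; otherwise downgrade to Supporting.
    return "Moderate" if n_plp > n_vus else "Supporting"
```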


Segregation Data
================
BS4
69 changes: 35 additions & 34 deletions docs/advanced_analysis.rst
@@ -107,12 +107,7 @@ requires anything different, it is possible to manually define the data sources
TOPMED,
UK10K,
ESP_AFRICAN_AMERICAN, ESP_EUROPEAN_AMERICAN, ESP_ALL,
EXAC_AFRICAN_INC_AFRICAN_AMERICAN, EXAC_AMERICAN,
EXAC_SOUTH_ASIAN, EXAC_EAST_ASIAN,
EXAC_FINNISH, EXAC_NON_FINNISH_EUROPEAN,
EXAC_OTHER,
ESP_AA, ESP_EA, ESP_ALL,
GNOMAD_E_AFR,
GNOMAD_E_AMR,
@@ -208,25 +203,18 @@ Here you can specify which variant frequency databases you want to use. You can
array format as the HPO IDs.

The data sources used are from `1000 genomes <http://www.1000genomes.org>`_ (via DBSNP), `DBSNP <https://www.ncbi.nlm.nih.gov/projects/SNP/>`_,
`ESP <https://evs.gs.washington.edu/EVS/>`_, `ExAC, gnomAD exomes and gnomAD genomes <https://gnomad.broadinstitute.org/about>`_,
`UK10K <https://www.uk10k.org/>`_ (via DBSNP), `TOPMed <https://topmed.nhlbi.nih.gov/>`_ (via DBSNP).
`ESP <https://evs.gs.washington.edu/EVS/>`_, `UK10K <https://www.uk10k.org/>`_ (via DBSNP), `TOPMed <https://topmed.nhlbi.nih.gov/>`_ (via DBSNP).

As of the 2402 data release, the ExAC source has been removed, as ExAC is part of the gnomAD 2.1+ data
(see `gnomAD exomes and gnomAD genomes <https://gnomad.broadinstitute.org/about>`_).

DBSNP:
``THOUSAND_GENOMES``,
``UK10K``,
``TOPMED``

ESP:
``ESP_AFRICAN_AMERICAN``, ``ESP_EUROPEAN_AMERICAN``, ``ESP_ALL``

ExAC:
``EXAC_AFRICAN_INC_AFRICAN_AMERICAN``,
``EXAC_AMERICAN``,
``EXAC_SOUTH_ASIAN``,
``EXAC_EAST_ASIAN``,
``EXAC_FINNISH``,
``EXAC_NON_FINNISH_EUROPEAN``,
``EXAC_OTHER``
``ESP_AA``, ``ESP_EA``, ``ESP_ALL``

gnomAD exomes:
``GNOMAD_E_AFR``,
@@ -235,21 +223,26 @@ gnomAD exomes:
``GNOMAD_E_EAS``,
``GNOMAD_E_FIN``,
``GNOMAD_E_NFE``,
``GNOMAD_E_MID``,
``GNOMAD_E_OTH``,
``GNOMAD_E_SAS``,

gnomAD genomes:
``GNOMAD_G_AFR``,
``GNOMAD_G_AMR``,
``GNOMAD_G_AMI``,
``GNOMAD_G_ASJ``,
``GNOMAD_G_EAS``,
``GNOMAD_G_FIN``,
``GNOMAD_G_NFE``,
``GNOMAD_G_MID``,
``GNOMAD_G_OTH``,
``GNOMAD_G_SAS``

We recommend using all databases if the proband population background is unknown, although removing the ``GNOMAD_E_ASJ``
and ``GNOMAD_G_ASJ``, unless your proband is known to come from an Ashkenazi population e.g.
We recommend using all databases if the proband population background is unknown, with the exception of the ``ASJ``, ``AMI``,
``FIN``, ``MID`` and ``OTH`` populations: these are small/founder populations which are likely to have
artificially high allele frequencies for some relevant variants. These populations will not be included when assessing
the population frequency for the ACMG assignments, even if used in the filtering.

.. code-block:: yaml
@@ -258,29 +251,24 @@ and ``GNOMAD_G_ASJ``, unless your proband is known to come from an Ashkenazi pop
TOPMED,
UK10K,
ESP_AFRICAN_AMERICAN, ESP_EUROPEAN_AMERICAN, ESP_ALL,
EXAC_AFRICAN_INC_AFRICAN_AMERICAN, EXAC_AMERICAN,
EXAC_SOUTH_ASIAN, EXAC_EAST_ASIAN,
EXAC_FINNISH, EXAC_NON_FINNISH_EUROPEAN,
EXAC_OTHER,
ESP_AA, ESP_EA, ESP_ALL,
GNOMAD_E_AFR,
GNOMAD_E_AMR,
# GNOMAD_E_ASJ,
# GNOMAD_E_ASJ,
GNOMAD_E_EAS,
GNOMAD_E_FIN,
# GNOMAD_E_FIN,
GNOMAD_E_NFE,
GNOMAD_E_OTH,
# GNOMAD_E_OTH,
GNOMAD_E_SAS,
GNOMAD_G_AFR,
GNOMAD_G_AMR,
# GNOMAD_G_ASJ,
# GNOMAD_G_ASJ,
GNOMAD_G_EAS,
GNOMAD_G_FIN,
# GNOMAD_G_FIN,
GNOMAD_G_NFE,
GNOMAD_G_OTH,
# GNOMAD_G_OTH,
GNOMAD_G_SAS
]
@@ -289,14 +277,27 @@ and ``GNOMAD_G_ASJ``, unless your proband is known to come from an Ashkenazi pop

pathogenicitySources:
---------------------
Possible pathogenicitySources: ``POLYPHEN``, ``MUTATION_TASTER``, ``SIFT``, ``REVEL``, ``MVP``, ``CADD``, ``REMM``. ``REMM`` is trained on
Possible pathogenicitySources: ``POLYPHEN``, ``MUTATION_TASTER``, ``SIFT``, ``REVEL``, ``MVP``, ``ALPHA_MISSENSE``,
``SPLICE_AI`` (derived from gnomAD 4.0, so only available for hg38), ``CADD``, ``REMM``. ``REMM`` is trained on
non-coding regulatory regions. **WARNING** if you enable ``CADD``, ensure that you have downloaded and installed the CADD
tabix files and updated their location in the ``application.properties`` (see :ref:`cadd-install`). Exomiser will not run
without this.

We recommend using either ``[REVEL, MVP]`` **OR** ``[POLYPHEN, MUTATION_TASTER, SIFT]`` as REVEL and MVP are newer
predictors which have been shown to have better performance and are more nuanced. Mixing them with the Polyphen2,
MutationTaster or SIFT will give worse performance.
MutationTaster or SIFT will give worse performance. Testing on GEL solved cases with AlphaMissense slightly increased
performance when combined with MVP. We advise testing on local cohorts for assessing local performance.

`REVEL scores are freely available for non-commercial use. For other uses, please contact Weiva Sieh.`

`AlphaMissense Database Copyright (2023) DeepMind Technologies Limited. All predictions are provided for non-commercial
research use only under CC BY-NC-SA license. Researchers interested in predictions not yet provided, and for
non-commercial use, can send an expression of interest to [email protected].`

`SpliceAI source code is provided under the GPLv3 license. SpliceAI includes several third party packages provided under
other open source licenses, please see NOTICE for additional details. The trained models used by SpliceAI (located in
this package at spliceai/models) are provided under the CC BY NC 4.0 license for academic and non-commercial use; other
use requires a commercial license from Illumina, Inc.`

.. code-block:: yaml
8 changes: 8 additions & 0 deletions docs/input_files_and_options.rst
@@ -94,6 +94,14 @@ setting the analysisMode option to PASS_ONLY. This will also aid your ability to

Analyses can be run in batch mode. Simply put the path to each analysis file in the batch file - one file path per line.

.. important::

The exome and genome analyses found in the `test-analysis-exome.yml` and `test-analysis-genome.yml` files are
recommended for use in most situations, and removing steps from the analysis is likely to negatively impact
performance. It is *strongly* recommended to test any changes against the standard setup on the example samples and
your own solved cases to check the impact of any changes you might want to make.


.. parsed-literal::
java -jar exomiser-cli-|version|.jar --analysis-batch examples/test-analysis-batch.txt
48 changes: 22 additions & 26 deletions docs/installation.rst
@@ -5,9 +5,9 @@ Installation
Software and Hardware requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Minimum 8/16GB RAM (For an exome analysis of a 30,000 variant sample 4GB RAM should suffice. For a genome analysis of a 4,400,000 variant sample 12GB RAM should suffice.)
- Minimum 8/16GB RAM (For an exome analysis of a 30,000 variant sample 4GB RAM should suffice. For a genome analysis of a 4,400,000 variant sample 8GB RAM should suffice.)
- Any 64-bit operating system
- Java 11 or above
- Java 17 or above
- At least 100GB free disk space (SSD preferred for best performance)
- An internet connection is not required to run the Exomiser, although network access will be required if accessing a networked database (optional).
- By default the Exomiser is completely self-contained and is able to run on standard consumer laptops.
@@ -30,7 +30,7 @@ Windows install
1. Install `7-Zip <http://www.7-zip.org>`_ for unzipping the archive files. The built-in archiving software has issues extracting the zip files.
2. Download the data and distribution files from https://data.monarchinitiative.org/exomiser/latest
3. Extract the distribution files by right-clicking exomiser-cli-|version|-distribution.zip and selecting 7-Zip > Extract Here
4. Extract the data files (e.g. 2109_phenotype.zip, 2109_hg19.zip) by right-clicking the archive and selecting 7-Zip > Extract files... into the exomiser data directory. By default exomiser expects this to be 'exomiser-cli-\ |version|\/data', but this can be changed in the ``application.properties``
4. Extract the data files (e.g. 2402_phenotype.zip, 2402_hg19.zip) by right-clicking the archive and selecting 7-Zip > Extract files... into the exomiser data directory. By default exomiser expects this to be 'exomiser-cli-\ |version|\/data', but this can be changed in the ``application.properties``
5. cd exomiser-cli-|version|
6. java -Xmx4g -jar exomiser-cli-|version|.jar --analysis examples/test-analysis-exome.yml

@@ -44,18 +44,18 @@ The following shell script should work-
# download the distribution (won't take long)
wget https://data.monarchinitiative.org/exomiser/latest/exomiser-cli-\ |version|\-distribution.zip
# download the data (this is ~80GB and will take a while). If you only require a single assembly, only download the relevant file.
wget https://data.monarchinitiative.org/exomiser/latest/2202_hg19.zip
wget https://data.monarchinitiative.org/exomiser/latest/2202_hg38.zip
wget https://data.monarchinitiative.org/exomiser/latest/2202_phenotype.zip
wget https://data.monarchinitiative.org/exomiser/latest/2402_hg19.zip
wget https://data.monarchinitiative.org/exomiser/latest/2402_hg38.zip
wget https://data.monarchinitiative.org/exomiser/latest/2402_phenotype.zip
# unzip the distribution and data files - this will create a directory called 'exomiser-cli-|version|' in the current working directory
unzip exomiser-cli-|version|-distribution.zip
unzip 2202_*.zip -d exomiser-cli-|version|/data
unzip 2402_*.zip -d exomiser-cli-|version|/data
# Check the application.properties are pointing to the correct versions
# exomiser.hg19.data-version=2202
# exomiser.hg38.data-version=2202
# exomiser.phenotype.data-version=2202
# exomiser.hg19.data-version=2402
# exomiser.hg38.data-version=2402
# exomiser.phenotype.data-version=2402
# run a test exome analysis
cd exomiser-cli-|version|
@@ -155,7 +155,7 @@ with
exomiser.data-directory=/full/path/to/alternative/data/directory
For example, assuming you unzipped the contents of the `2202_hg38.zip` data file into `/data/exomiser-data`:
For example, assuming you unzipped the contents of the `2402_hg38.zip` data file into `/data/exomiser-data`:

.. parsed-literal::
@@ -167,9 +167,9 @@ where the contents of `exomiser-data` looks something like this:
$ tree -L 1 /data/exomiser-data/
/data/exomiser-data/
├── 2202_hg19
├── 2202_hg38
├── 2202_phenotype
├── 2402_hg19
├── 2402_hg38
├── 2402_phenotype
├── cadd
└── remm
@@ -182,45 +182,41 @@ the ``application.properties`` to contain this:
.. code-block:: yaml
### hg19 assembly ###
exomiser.hg19.data-version=2109
exomiser.hg19.variant-white-list-path=2109_hg19_clinvar_whitelist.tsv.gz
exomiser.hg19.data-version=2402
### phenotypes ###
exomiser.phenotype.data-version=2109
exomiser.phenotype.data-version=2402
For a GRCh38/hg38 only setup:

.. code-block:: yaml
### hg38 assembly ###
exomiser.hg38.data-version=2109
exomiser.hg38.variant-white-list-path=2109_hg38_clinvar_whitelist.tsv.gz
exomiser.hg38.data-version=2402
### phenotypes ###
exomiser.phenotype.data-version=2109
exomiser.phenotype.data-version=2402
Or an install supporting both assemblies:

.. code-block:: yaml
### hg19 assembly ###
exomiser.hg19.data-version=2109
exomiser.hg19.variant-white-list-path=2109_hg19_clinvar_whitelist.tsv.gz
exomiser.hg19.data-version=2402
### hg38 assembly ###
exomiser.hg38.data-version=2109
exomiser.hg38.variant-white-list-path=2109_hg38_clinvar_whitelist.tsv.gz
exomiser.hg38.data-version=2402
### phenotypes ###
exomiser.phenotype.data-version=2109
exomiser.phenotype.data-version=2402
*n.b.* each assembly will require approximately 1GB RAM to load. Attempting to analyse a VCF called using an
unsupported/unloaded assembly data will result in an unrecoverable error being thrown.

Notice here that we are loading a whitelist created from ClinVar data. Exomiser will consider any variant on the whitelist
By default, Exomiser uses a whitelist created from ClinVar data. Exomiser will consider any variant on the whitelist
to be maximally pathogenic, regardless of the underlying data (*e.g.* variant effect, allele frequency, predicted pathogenicity)
and always include these in the results.

2 changes: 1 addition & 1 deletion docs/result_interpretation.rst
@@ -75,7 +75,7 @@ was (1) or wasn't (0) used for calculating the EXOMISER_GENE_COMBINED_SCORE and

.. code-block:: tsv
#RANK ID GENE_SYMBOL ENTREZ_GENE_ID MOI P-VALUE EXOMISER_GENE_COMBINED_SCORE EXOMISER_GENE_PHENO_SCORE EXOMISER_GENE_VARIANT_SCORE EXOMISER_VARIANT_SCORE CONTRIBUTING_VARIANT WHITELIST_VARIANT VCF_ID RS_ID CONTIG START END REF ALT CHANGE_LENGTH QUAL FILTER GENOTYPE FUNCTIONAL_CLASS HGVS EXOMISER_ACMG_CLASSIFICATION EXOMISER_ACMG_EVIDENCE EXOMISER_ACMG_DISEASE_ID EXOMISER_ACMG_DISEASE_NAME CLINVAR_ALLELE_ID CLINVAR_PRIMARY_INTERPRETATION CLINVAR_STAR_RATING GENE_CONSTRAINT_LOEUF GENE_CONSTRAINT_LOEUF_LOWER GENE_CONSTRAINT_LOEUF_UPPER MAX_FREQ_SOURCE MAX_FREQ ALL_FREQ MAX_PATH_SOURCE MAX_PATH ALL_PATH
#RANK ID GENE_SYMBOL ENTREZ_GENE_ID MOI P-VALUE EXOMISER_GENE_COMBINED_SCORE EXOMISER_GENE_PHENO_SCORE EXOMISER_GENE_VARIANT_SCORE EXOMISER_VARIANT_SCORE CONTRIBUTING_VARIANT WHITELIST_VARIANT VCF_ID RS_ID CONTIG START END REF ALT CHANGE_LENGTH QUAL FILTER GENOTYPE FUNCTIONAL_CLASS HGVS EXOMISER_ACMG_CLASSIFICATION EXOMISER_ACMG_EVIDENCE EXOMISER_ACMG_DISEASE_ID EXOMISER_ACMG_DISEASE_NAME CLINVAR_VARIANT_ID CLINVAR_PRIMARY_INTERPRETATION CLINVAR_STAR_RATING GENE_CONSTRAINT_LOEUF GENE_CONSTRAINT_LOEUF_LOWER GENE_CONSTRAINT_LOEUF_UPPER MAX_FREQ_SOURCE MAX_FREQ ALL_FREQ MAX_PATH_SOURCE MAX_PATH ALL_PATH
1 10-123256215-T-G_AD FGFR2 2263 AD 0.0000 0.9981 1.0000 1.0000 1.0000 1 1 rs121918506 10 123256215 123256215 T G 0 100.0000 PASS 1|0 missense_variant FGFR2:ENST00000346997.2:c.1688A>C:p.(Glu563Ala) LIKELY_PATHOGENIC PM2,PP3_Strong,PP4,PP5 OMIM:123150 Jackson-Weiss syndrome 28333 LIKELY_PATHOGENIC 1 0.13692 0.074 0.27 REVEL 0.965 REVEL=0.965,MVP=0.9517972
2 6-132203615-G-A_AD ENPP1 5167 AD 0.0049 0.8690 0.5773 0.9996 0.9996 1 0 rs770775549 6 132203615 132203615 G A 0 922.9800 PASS 0/1 splice_donor_variant ENPP1:ENST00000360971.2:c.2230+1G>A:p.? UNCERTAIN_SIGNIFICANCE PVS1_Strong OMIM:615522 Cole disease NOT_PROVIDED 0 0.41042 0.292 0.586 GNOMAD_E_SAS 0.0032486517 TOPMED=7.556E-4,EXAC_NON_FINNISH_EUROPEAN=0.0014985314,GNOMAD_E_NFE=0.0017907989,GNOMAD_E_SAS=0.0032486517
//
10 changes: 5 additions & 5 deletions docs/running.rst
@@ -90,7 +90,7 @@ or
only recognizes class file versions up to 52.0
You are running an older unsupported version of Java. Exomiser requires java version 11 or higher. This can be checked by running:
You are running an older unsupported version of Java. Exomiser requires java version 17 or higher. This can be checked by running:

.. code-block:: console
@@ -100,9 +100,9 @@ You should see something like this in response:

.. code-block:: console
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing)
openjdk version "17.0.9" 2023-10-17
OpenJDK Runtime Environment (build 17.0.9+9-Ubuntu-122.04)
OpenJDK 64-Bit Server VM (build 17.0.9+9-Ubuntu-122.04, mixed mode, sharing)
Versions lower than 11 (e.g. 1.5, 1.6, 1.7, 1.8, 9, 10) will not run exomiser, so you will need to install the latest java version.
Versions lower than 17 (e.g. 1.5, 1.6, 1.7, 1.8, 9, 10, 11) will not run exomiser, so you will need to install a newer java version (17 or above).
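
For scripted setups, the same check could be automated. The following is a minimal sketch, not part of Exomiser: the helper names are hypothetical, and it assumes the ``java -version`` banner has the usual ``version "..."`` form shown above (including the legacy ``1.x`` numbering used up to Java 8).

```python
import re
import subprocess
from typing import Optional

MIN_JAVA_MAJOR = 17  # minimum version stated in the text above


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output (hypothetical helper)."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy numbering: "1.8.0_292" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> Optional[int]:
    """Run `java -version` and parse it; returns None if java is not on the PATH."""
    try:
        # `java -version` writes its banner to stderr rather than stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return java_major_version(result.stderr or result.stdout)


def java_is_supported(major: Optional[int]) -> bool:
    """True if the parsed major version meets the stated minimum."""
    return major is not None and major >= MIN_JAVA_MAJOR
```

Calling ``java_is_supported(installed_java_major())`` before launching an analysis would then flag an unsupported runtime early.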