Update docs

exomiser · Feb 28, 2024 · eceb9aa
1 parent 8bfbed7
commit eceb9aa

Showing 6 changed files with 93 additions and 67 deletions.
diff --git a/docs/acmg_assignment.rst b/docs/acmg_assignment.rst
@@ -30,13 +30,24 @@ Computational and Predictive Data
 PVS1
 ----
 Variants must have a predicted loss of function effect, be in a gene with known disease associations and have a gene
-constraint LOF O/E < 0.7635 (gnomAD 2.1.1) to suggest that a gene is LoF intolerant. Variants not predicted to lead to
+constraint LOF O/E < 0.7635 (gnomAD 4.0) to suggest that a gene is LoF intolerant. Variants not predicted to lead to
 NMD (those located in the last exon) will have the modifier downgraded to Strong.
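The PVS1 rule above can be sketched in Python as follows. This is a minimal illustration, not Exomiser's (Java) implementation. The `Variant` fields and the `pvs1_strength` function are hypothetical, and "very strong" is assumed to be the default PVS1 strength before the last-exon downgrade.

```python
from dataclasses import dataclass
from typing import Optional

# LoF observed/expected threshold quoted in the PVS1 section above.
LOF_OE_THRESHOLD = 0.7635


@dataclass
class Variant:
    # Hypothetical fields for illustration only; not Exomiser's data model.
    predicted_lof: bool
    gene_has_disease_association: bool
    gene_lof_oe: float
    in_last_exon: bool


def pvs1_strength(v: Variant) -> Optional[str]:
    """Return the PVS1 strength for a variant, or None if PVS1 does not apply."""
    if not (v.predicted_lof and v.gene_has_disease_association):
        return None
    if v.gene_lof_oe >= LOF_OE_THRESHOLD:
        # The gene is not considered LoF intolerant.
        return None
    # Variants in the last exon are not expected to trigger NMD, so downgrade.
    return "Strong" if v.in_last_exon else "VeryStrong"
```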
 
+PS1
+---
+Variants with the same amino acid change as previously reported P/LP missense or in-frame indel ClinVar variants will be
+assigned `PS1` with a strength of `Strong` for variants with >= 2 stars, `Moderate` for variants with 1 star or `Supporting`
+for those without a ClinVar star rating.
+
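The `PS1` star-rating thresholds amount to a small lookup. A minimal Python sketch with a hypothetical function name; it assumes the caller has already found the matching ClinVar P/LP variant and, if there are several, passes the highest star rating (how Exomiser chooses between multiple matches is not stated here).

```python
from typing import Optional


def ps1_strength(matching_plp_stars: Optional[int]) -> Optional[str]:
    """Map the ClinVar star rating of a P/LP variant with the same amino acid
    change to a PS1 strength. None means no such ClinVar variant exists."""
    if matching_plp_stars is None:
        return None
    if matching_plp_stars >= 2:
        return "Strong"
    if matching_plp_stars == 1:
        return "Moderate"
    # Zero stars: the ClinVar record has no star rating.
    return "Supporting"
```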
 PM4
 ---
 Stop-loss variants and in-frame insertions or deletions not previously assigned a `PVS1` criterion are assigned `PM4`.
 
+PM5
+---
+Novel missense changes at an amino acid residue where a different ClinVar P/LP missense variant has previously been
+reported will be assigned `PM5` with a strength of `Moderate` for those with >= 2 stars or `Supporting` otherwise.
+
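A minimal Python sketch of the `PM5` check, for illustration only. The `ClinVarMissense` record and `pm5_strength` function are hypothetical. Two assumptions are made: "novel" is read as "this exact amino acid change is not already reported as P/LP" (that case is `PS1` territory), and the highest star rating among the other changes at the residue decides the strength.

```python
from typing import List, NamedTuple, Optional


class ClinVarMissense(NamedTuple):
    # Hypothetical record for a ClinVar P/LP missense variant.
    protein_position: int
    alt_amino_acid: str
    stars: int


def pm5_strength(position: int, alt_amino_acid: str,
                 plp_missense: List[ClinVarMissense]) -> Optional[str]:
    """Return the PM5 strength for a missense change, or None if PM5 does not apply."""
    same_residue = [r for r in plp_missense if r.protein_position == position]
    if not same_residue:
        return None
    # The identical change already reported is a PS1 case, so PM5 is not assigned.
    if any(r.alt_amino_acid == alt_amino_acid for r in same_residue):
        return None
    best_stars = max(r.stars for r in same_residue)
    return "Moderate" if best_stars >= 2 else "Supporting"
```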
 PP3 / BP4
 ---------
 If REVEL is chosen as a pathogenicity predictor for missense variants, `PP3` and `BP4` are assigned using the modifiers
@@ -46,6 +57,16 @@ Note that this suggests the use of modifiers up to Strong in the case of pathoge
 Otherwise, an ensemble-based approach will be used for other pathogenicity predictors as per the original 2015 guidelines.
 It should be noted we found better performance using the REVEL-based approach when testing against the 100K genomes data.
 
+Functional Data
+===============
+PM1
+---
+Missense variants and in-frame indels are assigned `PM1` if the surrounding region of 25 nucleotides either side of the
+variant contains at least 4 reported P/LP variants in ClinVar and no B/LB variants. If the number of P/LP variants is
+greater than the number of VUS in the region the strength will be `Moderate`, but regions where P/LP <= VUS
+(and no B/LB) will have the strength downgraded to `Supporting`.
+
+
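The `PM1` region rule can be sketched in Python as below. This is an illustration under stated assumptions, not Exomiser's code: `pm1_strength` is a hypothetical name, the 25 nt window is treated as inclusive, all ClinVar positions are assumed to be on the same contig as the variant, the caller is assumed to have already restricted input to missense variants and in-frame indels, and the variant's own ClinVar record (if any) is not excluded from the counts.

```python
from typing import Iterable, Optional, Tuple

WINDOW_NT = 25  # nucleotides either side of the variant
MIN_PLP = 4     # minimum number of P/LP ClinVar variants in the window


def pm1_strength(variant_pos: int,
                 clinvar: Iterable[Tuple[int, str]]) -> Optional[str]:
    """clinvar holds (position, significance) pairs where significance is
    'P/LP', 'VUS' or 'B/LB'. Returns the PM1 strength or None."""
    counts = {"P/LP": 0, "VUS": 0, "B/LB": 0}
    for pos, significance in clinvar:
        # Only count records inside the window with a recognised significance.
        if abs(pos - variant_pos) <= WINDOW_NT and significance in counts:
            counts[significance] += 1
    if counts["P/LP"] < MIN_PLP or counts["B/LB"] > 0:
        return None
    return "Moderate" if counts["P/LP"] > counts["VUS"] else "Supporting"
```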
 Segregation Data
 ================
 BS4

diff --git a/docs/advanced_analysis.rst b/docs/advanced_analysis.rst
@@ -107,12 +107,7 @@ requires anything different, it is possible to manually define the data sources
         TOPMED,
         UK10K,
 
-        ESP_AFRICAN_AMERICAN, ESP_EUROPEAN_AMERICAN, ESP_ALL,
-
-        EXAC_AFRICAN_INC_AFRICAN_AMERICAN, EXAC_AMERICAN,
-        EXAC_SOUTH_ASIAN, EXAC_EAST_ASIAN,
-        EXAC_FINNISH, EXAC_NON_FINNISH_EUROPEAN,
-        EXAC_OTHER,
+        ESP_AA, ESP_EA, ESP_ALL,
 
         GNOMAD_E_AFR,
         GNOMAD_E_AMR,
@@ -208,25 +203,18 @@ Here you can specify which variant frequency databases you want to use. You can
 array format as the HPO IDs.
 
 The data sources used are from `1000 genomes <http://www.1000genomes.org>`_ (via DBSNP), `DBSNP <https://www.ncbi.nlm.nih.gov/projects/SNP/>`_,
-`ESP <https://evs.gs.washington.edu/EVS/>`_, `ExAC, gnomAD exomes and gnomAD genomes <https://gnomad.broadinstitute.org/about>`_,
-`UK10K <https://www.uk10k.org/>`_ (via DBSNP), `TOPMed <https://topmed.nhlbi.nih.gov/>`_ (via DBSNP).
+`ESP <https://evs.gs.washington.edu/EVS/>`_, `gnomAD exomes and gnomAD genomes <https://gnomad.broadinstitute.org/about>`_,
+`UK10K <https://www.uk10k.org/>`_ (via DBSNP), `TOPMed <https://topmed.nhlbi.nih.gov/>`_ (via DBSNP).
+
+As of the 2402 data release the ExAC source has been removed, as the ExAC data is included in gnomAD 2.1+.
 
 DBSNP:
     ``THOUSAND_GENOMES``,
     ``UK10K``,
     ``TOPMED``
 
 ESP:
-    ``ESP_AFRICAN_AMERICAN``, ``ESP_EUROPEAN_AMERICAN``, ``ESP_ALL``
-
-ExAC:
-    ``EXAC_AFRICAN_INC_AFRICAN_AMERICAN``,
-    ``EXAC_AMERICAN``,
-    ``EXAC_SOUTH_ASIAN``,
-    ``EXAC_EAST_ASIAN``,
-    ``EXAC_FINNISH``,
-    ``EXAC_NON_FINNISH_EUROPEAN``,
-    ``EXAC_OTHER``
+    ``ESP_AA``, ``ESP_EA``, ``ESP_ALL``
 
 gnomAD exomes:
     ``GNOMAD_E_AFR``,
@@ -235,21 +223,26 @@ gnomAD exomes:
     ``GNOMAD_E_EAS``,
     ``GNOMAD_E_FIN``,
     ``GNOMAD_E_NFE``,
+    ``GNOMAD_E_MID``,
     ``GNOMAD_E_OTH``,
     ``GNOMAD_E_SAS``,
 
 gnomAD genomes:
     ``GNOMAD_G_AFR``,
     ``GNOMAD_G_AMR``,
+    ``GNOMAD_G_AMI``,
     ``GNOMAD_G_ASJ``,
     ``GNOMAD_G_EAS``,
     ``GNOMAD_G_FIN``,
     ``GNOMAD_G_NFE``,
+    ``GNOMAD_G_MID``,
     ``GNOMAD_G_OTH``,
     ``GNOMAD_G_SAS``
 
-We recommend using all databases if the proband population background is unknown, although removing the ``GNOMAD_E_ASJ``
-and ``GNOMAD_G_ASJ``, unless your proband is known to come from an Ashkenazi population e.g.
+If the proband's population background is unknown, we recommend using all databases except the ``ASJ``, ``AMI``,
+``FIN``, ``MID`` and ``OTH`` populations, as these are small or founder populations which are likely to have
+artificially high allele frequencies for some relevant variants. These populations are not included when assessing
+the population frequency for the ACMG assignments, even if they are used in filtering.
 
 .. code-block:: yaml
 
@@ -258,29 +251,24 @@ and ``GNOMAD_G_ASJ``, unless your proband is known to come from an Ashkenazi pop
       TOPMED,
       UK10K,
 
-      ESP_AFRICAN_AMERICAN, ESP_EUROPEAN_AMERICAN, ESP_ALL,
-
-      EXAC_AFRICAN_INC_AFRICAN_AMERICAN, EXAC_AMERICAN,
-      EXAC_SOUTH_ASIAN, EXAC_EAST_ASIAN,
-      EXAC_FINNISH, EXAC_NON_FINNISH_EUROPEAN,
-      EXAC_OTHER,
+      ESP_AA, ESP_EA, ESP_ALL,
 
       GNOMAD_E_AFR,
       GNOMAD_E_AMR,
-      #        GNOMAD_E_ASJ,
+      # GNOMAD_E_ASJ,
       GNOMAD_E_EAS,
-      GNOMAD_E_FIN,
+      # GNOMAD_E_FIN,
       GNOMAD_E_NFE,
-      GNOMAD_E_OTH,
+      # GNOMAD_E_OTH,
       GNOMAD_E_SAS,
 
       GNOMAD_G_AFR,
       GNOMAD_G_AMR,
-      #        GNOMAD_G_ASJ,
+      # GNOMAD_G_ASJ,
       GNOMAD_G_EAS,
-      GNOMAD_G_FIN,
+      # GNOMAD_G_FIN,
       GNOMAD_G_NFE,
-      GNOMAD_G_OTH,
+      # GNOMAD_G_OTH,
       GNOMAD_G_SAS
     ]
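Excluding these populations from the ACMG frequency assessment can be sketched as follows. The function is hypothetical and the source-name parsing relies only on the ``GNOMAD_<E|G>_<POP>`` naming listed above; Exomiser's internal logic may differ. It shows why an elevated frequency in, say, ``GNOMAD_E_ASJ`` does not drive the ACMG maximum frequency even when that source is enabled for filtering.

```python
from typing import Dict, Optional, Tuple

# Populations the text above says are ignored when assessing ACMG population frequency.
EXCLUDED_POPULATIONS = {"ASJ", "AMI", "FIN", "MID", "OTH"}


def acmg_max_frequency(freqs: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return the (source, frequency) pair with the highest frequency,
    ignoring the excluded populations, or None if nothing is eligible."""
    eligible = {
        source: freq
        for source, freq in freqs.items()
        # Population code is the last underscore-separated part, e.g. GNOMAD_E_ASJ -> ASJ.
        if source.rsplit("_", 1)[-1] not in EXCLUDED_POPULATIONS
    }
    if not eligible:
        return None
    best = max(eligible, key=lambda source: eligible[source])
    return best, eligible[best]
```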
 
@@ -289,14 +277,27 @@ and ``GNOMAD_G_ASJ``, unless your proband is known to come from an Ashkenazi pop
 
 pathogenicitySources:
 ---------------------
-Possible pathogenicitySources: ``POLYPHEN``, ``MUTATION_TASTER``, ``SIFT``, ``REVEL``, ``MVP``, ``CADD``, ``REMM``. ``REMM`` is trained on
+Possible pathogenicitySources: ``POLYPHEN``, ``MUTATION_TASTER``, ``SIFT``, ``REVEL``, ``MVP``, ``ALPHA_MISSENSE``,
+``SPLICE_AI`` (derived from gnomAD 4.0, so only available for hg38), ``CADD``, ``REMM``. ``REMM`` is trained on
 non-coding regulatory regions. **WARNING** if you enable ``CADD``, ensure that you have downloaded and installed the CADD
 tabix files and updated their location in the ``application.properties`` (see :ref:`cadd-install`). Exomiser will not run
 without this.
 
 We recommend using either ``[REVEL, MVP]`` **OR** ``[POLYPHEN, MUTATION_TASTER, SIFT]`` as REVEL and MVP are newer
 predictors which have been shown to have better performance and are more nuanced. Mixing them with Polyphen2,
-MutationTaster or SIFT will give worse performance.
+MutationTaster or SIFT will give worse performance. In testing on solved GEL cases, combining AlphaMissense with MVP
+slightly increased performance. We advise testing on local cohorts to assess local performance.
+
+`REVEL scores are freely available for non-commercial use. For other uses, please contact Weiva Sieh.`
+
+`AlphaMissense Database Copyright (2023) DeepMind Technologies Limited. All predictions are provided for non-commercial
+research use only under CC BY-NC-SA license. Researchers interested in predictions not yet provided, and for
+non-commercial use, can send an expression of interest to [email protected].`
+
+`SpliceAI source code is provided under the GPLv3 license. SpliceAI includes several third party packages provided under
+other open source licenses, please see NOTICE for additional details. The trained models used by SpliceAI (located in
+this package at spliceai/models) are provided under the CC BY NC 4.0 license for academic and non-commercial use; other
+use requires a commercial license from Illumina, Inc.`
 
 .. code-block:: yaml
 

diff --git a/docs/input_files_and_options.rst b/docs/input_files_and_options.rst
@@ -94,6 +94,14 @@ setting the analysisMode option to PASS_ONLY. This will also aid your ability to
 
 Analyses can be run in batch mode. Simply put the path to each analysis file in the batch file - one file path per line.
 
+.. important::
+
+    The exome and genome analyses found in the `test-analysis-exome.yml` and `test-analysis-genome.yml` files are
+    recommended for use in most situations, and removing steps from the analysis is likely to negatively impact
+    performance. It is *strongly* recommended to test any changes against the standard setup on the example samples and
+    your own solved cases to check the impact of any changes you might want to make.
+
+
 .. parsed-literal::
 
     java -jar exomiser-cli-|version|.jar --analysis-batch examples/test-analysis-batch.txt

diff --git a/docs/installation.rst b/docs/installation.rst
@@ -5,9 +5,9 @@ Installation
 Software and Hardware requirements
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-- Minimum 8/16GB RAM (For an exome analysis of a 30,000 variant sample 4GB RAM should suffice. For a genome analysis of a 4,400,000 variant sample 12GB RAM should suffice.)
+- Minimum 8/16GB RAM (For an exome analysis of a 30,000 variant sample 4GB RAM should suffice. For a genome analysis of a 4,400,000 variant sample 8GB RAM should suffice.)
 - Any 64-bit operating system
-- Java 11 or above
+- Java 17 or above
 - At least 100GB free disk space (SSD preferred for best performance)
 - An internet connection is not required to run the Exomiser, although network access will be required if accessing a networked database (optional).
 - By default the Exomiser is completely self-contained and is able to run on standard consumer laptops.
@@ -30,7 +30,7 @@ Windows install
 1. Install `7-Zip <http://www.7-zip.org>`_ for unzipping the archive files. The built-in archiving software has issues extracting the zip files.
 2. Download the data and distribution files from https://data.monarchinitiative.org/exomiser/latest
 3. Extract the distribution files by right-clicking exomiser-cli-|version|-distribution.zip and selecting 7-Zip > Extract Here
-4. Extract the data files (e.g. 2109_phenotype.zip, 2109_hg19.zip) by right-clicking the archive and selecting 7-Zip > Extract files... into the exomiser data directory. By default exomiser expects this to be 'exomiser-cli-\ |version|\/data', but this can be changed in the ``application.properties``
+4. Extract the data files (e.g. 2402_phenotype.zip, 2402_hg19.zip) by right-clicking the archive and selecting 7-Zip > Extract files... into the exomiser data directory. By default exomiser expects this to be 'exomiser-cli-\ |version|\/data', but this can be changed in the ``application.properties``
 5. cd exomiser-cli-|version|
 6. java -Xmx4g -jar exomiser-cli-|version|.jar --analysis examples/test-analysis-exome.yml
 
@@ -44,18 +44,18 @@ The following shell script should work-
     # download the distribution (won't take long)
     wget https://data.monarchinitiative.org/exomiser/latest/exomiser-cli-\ |version|\-distribution.zip
     # download the data (this is ~80GB and will take a while). If you only require a single assembly, only download the relevant file.
-    wget https://data.monarchinitiative.org/exomiser/latest/2202_hg19.zip
-    wget https://data.monarchinitiative.org/exomiser/latest/2202_hg38.zip
-    wget https://data.monarchinitiative.org/exomiser/latest/2202_phenotype.zip
+    wget https://data.monarchinitiative.org/exomiser/latest/2402_hg19.zip
+    wget https://data.monarchinitiative.org/exomiser/latest/2402_hg38.zip
+    wget https://data.monarchinitiative.org/exomiser/latest/2402_phenotype.zip
 
     # unzip the distribution and data files - this will create a directory called 'exomiser-cli-|version|' in the current working directory
     unzip exomiser-cli-|version|-distribution.zip
-    unzip 2202_*.zip -d exomiser-cli-|version|/data
+    unzip 2402_*.zip -d exomiser-cli-|version|/data
 
     # Check the application.properties are pointing to the correct versions
-    # exomiser.hg19.data-version=2202
-    # exomiser.hg38.data-version=2202
-    # exomiser.phenotype.data-version=2202
+    # exomiser.hg19.data-version=2402
+    # exomiser.hg38.data-version=2402
+    # exomiser.phenotype.data-version=2402
 
     # run a test exome analysis
     cd exomiser-cli-|version|
@@ -155,7 +155,7 @@ with
 
     exomiser.data-directory=/full/path/to/alternative/data/directory
 
-For example, assuming you unzipped the contents of the `2202_hg38.zip` data file into `/data/exomiser-data`:
+For example, assuming you unzipped the contents of the `2402_hg38.zip` data file into `/data/exomiser-data`:
 
 .. parsed-literal::
 
@@ -167,9 +167,9 @@ where the contents of `exomiser-data` looks something like this:
 
     $ tree -L 1 /data/exomiser-data/
         /data/exomiser-data/
-        ├── 2202_hg19
-        ├── 2202_hg38
-        ├── 2202_phenotype
+        ├── 2402_hg19
+        ├── 2402_hg38
+        ├── 2402_phenotype
         ├── cadd
         └── remm
 
@@ -182,45 +182,41 @@ the ``application.properties`` to contain this:
 .. code-block:: yaml
 
     ### hg19 assembly ###
-    exomiser.hg19.data-version=2109
-    exomiser.hg19.variant-white-list-path=2109_hg19_clinvar_whitelist.tsv.gz
+    exomiser.hg19.data-version=2402
 
     ### phenotypes ###
-    exomiser.phenotype.data-version=2109
+    exomiser.phenotype.data-version=2402
 
 
 For a GRCh38/hg38 only setup:
 
 .. code-block:: yaml
 
     ### hg38 assembly ###
-    exomiser.hg38.data-version=2109
-    exomiser.hg38.variant-white-list-path=2109_hg38_clinvar_whitelist.tsv.gz
+    exomiser.hg38.data-version=2402
 
     ### phenotypes ###
-    exomiser.phenotype.data-version=2109
+    exomiser.phenotype.data-version=2402
 
 
 Or an install supporting both assemblies:
 
 .. code-block:: yaml
 
     ### hg19 assembly ###
-    exomiser.hg19.data-version=2109
-    exomiser.hg19.variant-white-list-path=2109_hg19_clinvar_whitelist.tsv.gz
+    exomiser.hg19.data-version=2402
 
     ### hg38 assembly ###
-    exomiser.hg38.data-version=2109
-    exomiser.hg38.variant-white-list-path=2109_hg38_clinvar_whitelist.tsv.gz
+    exomiser.hg38.data-version=2402
 
     ### phenotypes ###
-    exomiser.phenotype.data-version=2109
+    exomiser.phenotype.data-version=2402
 
 
 *n.b.* each assembly will require approximately 1GB RAM to load. Attempting to analyse a VCF called against an
 unsupported/unloaded assembly will result in an unrecoverable error being thrown.
 
-Notice here that we are loading a whitelist created from ClinVar data. Exomiser will consider any variant on the whitelist
+By default, Exomiser uses a whitelist created from ClinVar data. Exomiser will consider any variant on the whitelist
 to be maximally pathogenic, regardless of the underlying data (*e.g.* variant effect, allele frequency, predicted pathogenicity)
 and will always include these in the results.
 

diff --git a/docs/result_interpretation.rst b/docs/result_interpretation.rst
@@ -75,7 +75,7 @@ was (1) or wasn't (0) used for calculating the EXOMISER_GENE_COMBINED_SCORE and
 
 .. code-block:: tsv
 
-    #RANK	ID	GENE_SYMBOL	ENTREZ_GENE_ID	MOI	P-VALUE	EXOMISER_GENE_COMBINED_SCORE	EXOMISER_GENE_PHENO_SCORE	EXOMISER_GENE_VARIANT_SCORE	EXOMISER_VARIANT_SCORE	CONTRIBUTING_VARIANT	WHITELIST_VARIANT	VCF_ID	RS_ID	CONTIG	START	END	REF	ALT	CHANGE_LENGTH	QUAL	FILTER	GENOTYPE	FUNCTIONAL_CLASS	HGVS	EXOMISER_ACMG_CLASSIFICATION	EXOMISER_ACMG_EVIDENCE	EXOMISER_ACMG_DISEASE_ID	EXOMISER_ACMG_DISEASE_NAME	CLINVAR_ALLELE_ID	CLINVAR_PRIMARY_INTERPRETATION	CLINVAR_STAR_RATING	GENE_CONSTRAINT_LOEUF	GENE_CONSTRAINT_LOEUF_LOWER	GENE_CONSTRAINT_LOEUF_UPPER	MAX_FREQ_SOURCE	MAX_FREQ	ALL_FREQ	MAX_PATH_SOURCE	MAX_PATH	ALL_PATH
+    #RANK	ID	GENE_SYMBOL	ENTREZ_GENE_ID	MOI	P-VALUE	EXOMISER_GENE_COMBINED_SCORE	EXOMISER_GENE_PHENO_SCORE	EXOMISER_GENE_VARIANT_SCORE	EXOMISER_VARIANT_SCORE	CONTRIBUTING_VARIANT	WHITELIST_VARIANT	VCF_ID	RS_ID	CONTIG	START	END	REF	ALT	CHANGE_LENGTH	QUAL	FILTER	GENOTYPE	FUNCTIONAL_CLASS	HGVS	EXOMISER_ACMG_CLASSIFICATION	EXOMISER_ACMG_EVIDENCE	EXOMISER_ACMG_DISEASE_ID	EXOMISER_ACMG_DISEASE_NAME	CLINVAR_VARIANT_ID	CLINVAR_PRIMARY_INTERPRETATION	CLINVAR_STAR_RATING	GENE_CONSTRAINT_LOEUF	GENE_CONSTRAINT_LOEUF_LOWER	GENE_CONSTRAINT_LOEUF_UPPER	MAX_FREQ_SOURCE	MAX_FREQ	ALL_FREQ	MAX_PATH_SOURCE	MAX_PATH	ALL_PATH
     1	10-123256215-T-G_AD	FGFR2	2263	AD	0.0000	0.9981	1.0000	1.0000	1.0000	1	1		rs121918506	10	123256215	123256215	T	G	0	100.0000	PASS	1|0	missense_variant	FGFR2:ENST00000346997.2:c.1688A>C:p.(Glu563Ala)	LIKELY_PATHOGENIC	PM2,PP3_Strong,PP4,PP5	OMIM:123150	Jackson-Weiss syndrome	28333	LIKELY_PATHOGENIC	1	0.13692	0.074	0.27				REVEL	0.965	REVEL=0.965,MVP=0.9517972
     2	6-132203615-G-A_AD	ENPP1	5167	AD	0.0049	0.8690	0.5773	0.9996	0.9996	1	0		rs770775549	6	132203615	132203615	G	A	0	922.9800	PASS	0/1	splice_donor_variant	ENPP1:ENST00000360971.2:c.2230+1G>A:p.?	UNCERTAIN_SIGNIFICANCE	PVS1_Strong	OMIM:615522	Cole disease		NOT_PROVIDED	0	0.41042	0.292	0.586	GNOMAD_E_SAS	0.0032486517	TOPMED=7.556E-4,EXAC_NON_FINNISH_EUROPEAN=0.0014985314,GNOMAD_E_NFE=0.0017907989,GNOMAD_E_SAS=0.0032486517
     //
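Because the output is plain tab-separated text, it can be loaded with Python's standard `csv` module. A minimal sketch with hypothetical function names, assuming a single header line that starts with `#` as in the example above; `parse_key_values` splits the comma-separated ``ALL_FREQ`` / ``ALL_PATH`` fields into a dictionary.

```python
import csv
import io
from typing import Dict, List


def read_exomiser_tsv(text: str) -> List[Dict[str, str]]:
    """Parse Exomiser variant TSV output into one dict per row, keyed by column name."""
    lines = text.splitlines()
    # The header starts with '#' (e.g. '#RANK'); drop it so the first column is 'RANK'.
    if lines and lines[0].startswith("#"):
        lines[0] = lines[0][1:]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return list(reader)


def parse_key_values(field: str) -> Dict[str, float]:
    """Parse ALL_FREQ / ALL_PATH values such as 'REVEL=0.965,MVP=0.9517972'."""
    if not field:
        return {}
    pairs = (item.split("=", 1) for item in field.split(","))
    return {key: float(value) for key, value in pairs}
```

For example, ``parse_key_values(row["ALL_PATH"])`` on the first row above would give the REVEL and MVP scores keyed by source.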

diff --git a/docs/running.rst b/docs/running.rst
@@ -90,7 +90,7 @@ or
     only recognizes class file versions up to 52.0
 
 
-You are running an older unsupported version of Java. Exomiser requires java version 11 or higher. This can be checked by running:
+You are running an older unsupported version of Java. Exomiser requires Java 17 or higher. This can be checked by running:
 
 .. code-block:: console
 
@@ -100,9 +100,9 @@ You should see something like this in response:
 
 .. code-block:: console
 
-    openjdk version "11.0.11" 2021-04-20
-    OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
-    OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing)
+    openjdk version "17.0.9" 2023-10-17
+    OpenJDK Runtime Environment (build 17.0.9+9-Ubuntu-122.04)
+    OpenJDK 64-Bit Server VM (build 17.0.9+9-Ubuntu-122.04, mixed mode, sharing)
 
 
-Versions lower than 11 (e.g. 1.5, 1.6, 1.7, 1.8, 9, 10) will not run exomiser, so you will need to install the latest java version.
+Versions lower than 17 (e.g. 1.5, 1.6, 1.7, 1.8, 9, 10, 11) will not run Exomiser, so you will need to install Java 17 or later.
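If you want to script this check, for example in a setup script, a minimal Python sketch is shown below. The function names are hypothetical; it assumes the ``version "..."`` format shown above and handles the legacy ``1.x`` numbering used by Java 8 and earlier.

```python
import re
from typing import Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the Java major version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier use the legacy "1.x" scheme, e.g. "1.8.0_292" is Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def meets_exomiser_requirement(version_output: str, minimum: int = 17) -> bool:
    """True if the reported Java major version is at least `minimum`."""
    major = java_major_version(version_output)
    return major is not None and major >= minimum
```

Note that ``java -version`` writes to standard error, so the text to pass in could be captured with ``subprocess.run(["java", "-version"], capture_output=True, text=True).stderr``.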