Commit

Update README.md
Robert Butler authored Mar 30, 2018
1 parent f8e9e43 commit 68c0bce
Showing 1 changed file with 27 additions and 24 deletions.
51 changes: 27 additions & 24 deletions README.md
@@ -6,16 +6,18 @@
This project takes variants as input and queries NCBI E-utilities to generate ClinVar Variation Report<sup>1</sup> scoring metrics. The overall goal is to generate useful annotations for batches of variants to inform clinical interpretation. The metrics include:

* Clinotator Raw Score - A weighted metric of pathogenicity based on submitter type, assertion type and assertion age.
* Average Clinical Assertion Age - The average age of clinical assertions made about a variant.
* Clinotator Weighted Significance - A predicted clinical significance based on prediction intervals of the Clinotator Raw Score.
* Reclassification Recommendation - A ranking of the impact of reclassification based on the Clinotator Weighted Significance.
* Average Clinical Assertion Age - Clinical assertions with criteria provided are counted, and their average age is calculated.
* Clinotator Predicted Significance - A predicted clinical significance based on the weighted distribution of all two-star variants in ClinVar with two or more clinical assertions.
* Reclassification Recommendation - A ranking of the impact of reclassification based on the Clinotator Predicted Significance.

These and other stats are returned on a per variant basis, in a table, and additionally as vcf annotations if a vcf file is provided. See below for more information.

## Code Example

```bash
usage: clinotator.py [-h] [--log] [--long-log] [-o prefix] [--version] -e EMAIL -t {vid,rsid,vcf} file [file ...]
usage: clinotator.py [-h] [--log] [--long-log] [-o prefix] [--version] -e
EMAIL -t {vid,rsid,vcf}
file [file ...]

Clinical interpretation of ambiguous ClinVar annotations

@@ -30,7 +32,7 @@ optional arguments:
--version show program's version number and exit
required arguments:
-e EMAIL NCBI requires an email for querying their databases
-e EMAIL NCBI requires an email for database queries
-t {vid,rsid,vcf} vid - ClinVar Variation ID list
rsid - dbSNP rsID list
vcf - vcf file (output vcf generated)
@@ -42,15 +44,15 @@ Three required bits of information: **(1)** the type of input file, **(2)** the
### Optional Arguments
Additional arguments include a log file (clinotator.log) and specification of the output file prefix (the default is clinotator).
Additional arguments include a log file (--log), a more detailed log file (--long-log) and specification of the output file prefix (-o; the default is clinotator). Help (-h) and version (--version) messages are also available.
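
For scripted batch runs, the required and optional arguments above can be assembled programmatically. The following is a minimal sketch, not part of Clinotator itself: `build_clinotator_command` is a hypothetical helper, the email address and the `batch1` prefix are placeholders, and only flags shown in the usage message are used. It builds the argument list without running anything.

```python
import shlex


def build_clinotator_command(email, input_type, files, prefix=None,
                             log=False, long_log=False):
    """Assemble an argument list for clinotator.py (hypothetical helper)."""
    # The usage message lists exactly these three input types for -t.
    if input_type not in ("vid", "rsid", "vcf"):
        raise ValueError("input_type must be one of: vid, rsid, vcf")
    if not files:
        raise ValueError("at least one input file is required")

    cmd = ["clinotator.py", "-e", email, "-t", input_type]
    if prefix is not None:
        cmd += ["-o", prefix]      # output file prefix
    if log:
        cmd.append("--log")        # write a log file
    if long_log:
        cmd.append("--long-log")   # write a more detailed log file
    cmd += list(files)             # positional input files go last
    return cmd


cmd = build_clinotator_command("user@example.org", "vid", ["test.vid"],
                               prefix="batch1", log=True)
# Show the command as it would be typed in a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to `subprocess.run` once clinotator.py is on the PATH.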
## Motivation
While ClinVar has become an indispensable resource for clinical variant interpretation, that can be a double-edged sword. Often the sheer wealth of types of information can make it difficult to make a final interpretation. The sophisticated architecture of the records also makes programmatic analysis of batches of variants challenging. This software filters information types for each variant to focus on clinical assertions being made about the variant. It generates several metrics by which to gauge the robustness of the overall clinical assertion and measure the variance in interpretation. This can be done by batches of Variation IDs, batches of dbSNP rsIDs or by analysis and annotation of .vcf files. The hope is that this will help identify variants that are candidates for reclassification, and prioritize variants for further research.
While ClinVar has become an indispensable resource for clinical variant interpretation, its sophisticated structure presents a steep learning curve. The sheer depth of information types it provides can make high-throughput analysis of variants difficult. Clinotator is a fast, lightweight tool that extracts important aspects of criteria-based clinical assertions and uses them to generate several metrics for assessing the strength and consistency of the evidence supporting a variant's clinical significance. Clinical assertions are weighted by significance type, age of submission and submitter expertise category to filter outdated or incomplete assertions that otherwise confound interpretation. This can be accomplished in batches: either lists of Variation IDs or dbSNP rsIDs, or vcf files, which are additionally annotated. Clinotator identifies problem variants in minutes on a personal computer, without extensive computational effort. With the rapidly growing body of variant evidence, most submitters and researchers have limited resources to devote to variant curation. Clinotator provides efficient, systematic prioritization of discordant variants in need of reclassification. The hope is that this tool can inform ClinVar curation and encourage submitters to keep their clinical assertions current by focusing their efforts. Additionally, researchers can use the new metrics to analyze variants of interest in pursuit of new insights into pathogenicity.
## Installation
Implemented in Python (tested on 2.7.12 and >=3.4). You can `git clone` or download the zipfile and unpack as you like. Add the location to your ~/.bash_profile or:
Implemented in Python (tested on 2.7.12 and >=3.4). You can `git clone` or download the zipfile and unpack. Add the folder location to your ~/.bash_profile or:
```
export PATH=$PATH:path/to/folder/Clinotator/clinotator
@@ -66,7 +68,9 @@ clinotator.py -t vid -e [email protected] test.vid
Should produce the following warnings and a clinotator.test.tsv file:
```
Going to download record 1 to 13
INFO:root:Starting on test.vid
INFO:root:Going to download record 1 to 13
INFO:root:Download time: 0.026888549999997242 min, Batches run -> 1
WARNING:root:128294 has a missing assertion date!
WARNING:root:128297 has a missing assertion date!
WARNING:root:ClinVar significance for 3521 does not include B,B/LB,LB,US,LP,LP/P,P
@@ -97,23 +101,23 @@ Numpy *should* work >= 1.9.0 and pandas >= 0.20.0, but install more recent versi
<dl>
<dt>ClinVar Clinical Significance (CVCS)</dt>
<dd>Clinical significance reported by ClinVar.<sup>2</sup> Ratings metrics are based on the five ACMG/AMP recommended classifications for Mendelian disorders: Benign, Likely benign, Uncertain significance, Likely pathogenic and Pathogenic. Other Clinical significance values are reported, but not factored into the Clinotator metrics.</dd>
<dd>Clinical significance reported by ClinVar.<sup>2</sup> Ratings metrics are based on the five ACMG/AMP recommended <sup>3</sup> classifications for Mendelian disorders: Benign, Likely benign, Uncertain significance, Likely pathogenic and Pathogenic. Other Clinical significance values are reported, but not factored into the Clinotator metrics.</dd>
</dl>
<dl>
<dt>ClinVar Stars (CVSZ)</dt>
<dd>Star rating given by ClinVar. Ranges from zero to four.<sup>3</sup></dd>
<dd>Star rating given by ClinVar. Ranges from zero to four.<sup>4</sup></dd>
</dl>
<dl>
<dt>ClinVar Number of Clinical Assertions (CVNA)</dt>
<dd>The number of ClinVar submissions possessing a clinical assertion (with criteria provided). This measure excludes submissions without assertion criteria, including "literature reviews", which are a type of evidence as opposed to an assertion. Additionally, submitter assertions without defined criteria are omitted. Most assertions with criteria meet or exceed the guidelines put forth by the American College of Medical Genetics and Genomics (ACMG) in 2013 and amended in 2015.<sup>4,5</sup></dd>
<dd>The number of ClinVar submissions possessing a clinical assertion (with criteria provided). This measure excludes submissions without assertion criteria, including "literature reviews", which are a type of evidence as opposed to an assertion. Additionally, submitter assertions without defined criteria are omitted. Most assertions with criteria meet or exceed the guidelines put forth by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) in 2015.<sup>3</sup></dd>
</dl>
<dl>
<dt>ClinVar Conditions/Diseases (CVDS)</dt>
<dd>Conditions reported to be associated with this variant.</dd>
</dl>
<dl>
<dt>ClinVar Last Evaluated (CVLE)</dt>
<dd>The date the clinical significance of the variation report was last evaluated. Note this is not the date the variation report was last updated, but the date in the \<ClinicalAssertionList\> field of the ClinVar xml connected to the Review Status.</dd>
<dd>The date the clinical significance of the variation report was last evaluated. Note this is not the date the variation report was last updated, but the date in the &lt;ClinicalAssertionList&gt; field of the ClinVar XML connected to the Review Status.</dd>
</dl>
<dl>
<dt>ClinVar Variant Type (CVVT)</dt>
@@ -125,22 +129,22 @@ Numpy *should* work >= 1.9.0 and pandas >= 0.20.0, but install more recent versi
<dl>
<dt>Clinotator Raw Score (CTRS)</dt>
<dd>A weighted metric of pathogenicity based on submitter type, assertion type and assertion age. The type of submitter is weighted based on expertise, with regular clinical assertions unweighted at 1.00, expert reviewers receiving a 1.10 and practice guidelines receiving a score of 1.25.
The age of the assertion is weighted as new data is incorporated into assertions as well as previous data, creating a larger set of evidence over time. For the first two years, there is no weight, then there is a 10% reduction in weight per year through 6 years, at which point the penalty stays at a static 50% weight thereafter.
The age of the assertion is penalized because newer assertions incorporate new data in addition to previous data, creating a larger set of evidence over time. For the first two years there is no penalty; after that, the weight is reduced by 10% per year through 6 years, at which point the penalty stays at a static 50% reduction thereafter.
The assertion type is the largest weight, with values of: Benign(B) = -6, Likely benign(LB) = -3, Uncertain significance(US) = -0.3, Likely pathogenic(LP) = 3 and Pathogenic(P) = 6. For more information on the weighting decisions, see our publication.<sup>6</sup></dd>
The assertion type is the largest weight, with values of: Benign(B) = -6, Likely benign(LB) = -3, Uncertain significance(US) = -0.3, Likely pathogenic(LP) = 3 and Pathogenic(P) = 6. For more information on the weighting decisions, see our publication.<sup>5</sup></dd>
</dl>
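
The weights described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Clinotator's actual implementation: the README does not say how the three weights combine, so the sketch assumes they are multiplied per assertion and summed per variant. It also reads the age schedule as whole-year steps (0.9, 0.8, 0.7, 0.6 for years 3 through 6, then 0.5). The names `age_weight`, `raw_score` and the submitter labels are hypothetical.

```python
import math

# Assertion-type weights as listed in the README.
SIGNIFICANCE_WEIGHT = {"B": -6.0, "LB": -3.0, "US": -0.3, "LP": 3.0, "P": 6.0}

# Submitter-type weights as listed in the README (the labels are assumptions).
SUBMITTER_WEIGHT = {"clinical": 1.00, "expert": 1.10, "guideline": 1.25}


def age_weight(age_years):
    """Age penalty under one possible reading of the README's schedule."""
    if age_years <= 2:
        return 1.0          # no penalty for the first two years
    if age_years > 6:
        return 0.5          # static 50% reduction after six years
    # 10% reduction for each started year beyond the second.
    return (10 - (math.ceil(age_years) - 2)) / 10


def raw_score(assertions):
    """Sum of weighted assertions (assumed combination rule).

    Each assertion is a (significance, submitter_type, age_years) tuple.
    """
    return sum(
        SIGNIFICANCE_WEIGHT[sig] * SUBMITTER_WEIGHT[sub] * age_weight(age)
        for sig, sub, age in assertions
    )


example = [("P", "clinical", 1.0), ("LP", "expert", 4.0), ("US", "clinical", 8.0)]
print(round(raw_score(example), 2))
```

Under these assumptions a recent Pathogenic assertion dominates the score, while an eight-year-old Uncertain significance assertion contributes almost nothing.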
<dl>
<dt>Average Clinical Assertion Age (CTAA)</dt>
<dd>As described above, the clinical assertions with criteria provided are counted, and their average age is calculated.</dd>
</dl>
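
As a rough illustration of the average age calculation, the sketch below averages assertion ages in years relative to a reference date. It is a sketch under assumptions, not the project's code: `average_assertion_age` is a hypothetical name, ages are measured in 365.25-day years, and assertions with a missing date (passed as `None`) are simply skipped, which may differ from how Clinotator treats them.

```python
from datetime import date


def average_assertion_age(assertion_dates, today):
    """Mean age in years of dated assertions, or None if none are dated."""
    # Assumption: undated assertions are left out of the average.
    dated = [d for d in assertion_dates if d is not None]
    if not dated:
        return None
    ages = [(today - d).days / 365.25 for d in dated]
    return sum(ages) / len(ages)


dates = [date(2016, 3, 30), date(2014, 3, 30), None]
print(average_assertion_age(dates, today=date(2018, 3, 30)))
```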
<dl>
<dt>Clinotator Predicted Significance (CTPS)</dt>
<dd>This is a *predicted* clinical significance based on the weighted distribution of all variants in ClinVar with two or more clinical assertions (as of a Clinotator version release date). The ratings are calculated as previously described, on nonparametric prediction intervals with a given confidence of classification. See Figure 1 in our publication for details.<sup>6</sup></dd>
<dd>This is a *predicted* clinical significance based on the weighted distribution of all variants in ClinVar with two or more clinical assertions (as of a Clinotator version release date). The ratings are calculated as previously described, on nonparametric prediction intervals with a given confidence of classification. See Figure 1 in our publication for details.<sup>5</sup></dd>
</dl>
<dl>
<dt>Clinotator Reclassification Recommendation (CTRR)</dt>
<dd>This field ranks reclassification priority based on the difference between the CVCS and the CTPS. This field only includes the seven values of clinical significance associated with Mendelian diseases (B, B/LB, LB, US/CI, LP, LP/P, P). For the purposes of reclassification, "Conflicting interpretations of pathogenicity" is scored the same as Uncertain significance.</dd>
<dd>This field ranks reclassification priority based on the difference between the CVCS and the CTPS. This field only includes the seven values of ClinVar clinical significance associated with Mendelian diseases (B, B/LB, LB, US/CI, LP, LP/P, P). For the purposes of reclassification, "Conflicting interpretations of pathogenicity" is scored the same as Uncertain significance.</dd>
</dl>
* 0 - Reclassification unlikely, consistent identity or insufficient information for a recommendation
@@ -149,15 +153,15 @@ The assertion type is that largest weight, with values of: Benign(B) = -6, Likel
* 3 - High priority reclassification, significant change in clinical impact
<dl>
<dd>For a detailed decision tree, see Figure 2 in our publication.<sup>6</sup></dd>
<dd>For a detailed decision schema, see Figure 2 in our publication.<sup>5</sup></dd>
</dl>
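
Once the output table exists, the reclassification ranking can be used to shortlist variants. The sketch below is illustrative only: it assumes the tab-separated output has columns named after the field abbreviations (`VID` and `CTRR` here are assumed column names), and the sample rows are made up for the demonstration rather than taken from ClinVar. `high_priority_rows` is a hypothetical helper.

```python
import csv
import io


def high_priority_rows(tsv_text, minimum=2):
    """Return rows whose CTRR is an integer at or above `minimum`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row for row in reader
        # Skip rows where CTRR is not a plain number (e.g. a placeholder).
        if row["CTRR"].isdigit() and int(row["CTRR"]) >= minimum
    ]


# Made-up sample table; real output columns and values may differ.
SAMPLE_TSV = (
    "VID\tCVCS\tCTRR\n"
    "1001\tUS\t0\n"
    "1002\tLP\t3\n"
    "1003\tB\t.\n"
    "1004\tLB\t2\n"
)

for row in high_priority_rows(SAMPLE_TSV):
    print(row["VID"], row["CTRR"])
```

For a real run, the text would be read from the generated .tsv file instead of the inline sample.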
## Citation
Citation
'''
BibTeX format citation
BibTeX format citation coming soon
'''
@@ -182,7 +186,6 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.
<sup>1</sup> https://www.ncbi.nlm.nih.gov/clinvar/docs/variation_report/
<sup>2</sup> https://www.ncbi.nlm.nih.gov/clinvar/docs/clinsig/
<sup>3</sup> https://www.ncbi.nlm.nih.gov/clinvar/docs/review_status/
<sup>4</sup> Green, R. C., et al. (2013). "ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing." Genet Med 15(7): 565-574.
<sup>5</sup> Richards, S., et al. (2015). "Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology." Genet Med 17(5): 405-424.
<sup>6</sup> Our paper
<sup>3</sup> Richards, S., et al. (2015). "Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology." Genet Med 17(5): 405-424.
<sup>4</sup> https://www.ncbi.nlm.nih.gov/clinvar/docs/review_status/
<sup>5</sup> Our paper, coming soon
