Update README.md
Robert Butler authored Mar 8, 2020
1 parent feafd2f commit 185d4c9
Showing 1 changed file with 20 additions and 15 deletions.
35 changes: 20 additions & 15 deletions README.md
@@ -3,7 +3,7 @@

# Clinotator
## Synopsis
## Note: Clinotator is currently down, as the ClinVar data at NCBI has changed format. Currently working on a fix.
### Note: Clinotator v1.4.0 represents a major change for the new xml structure of ClinVar. Several options and descriptions may have changed below.
### Clinical interpretation of ambiguous ClinVar annotations

This project takes variants as input and queries NCBI eutilities to generate ClinVar Variation Report<sup>1</sup> scoring metrics. The overall goal is to generate annotations of use for given batches of variants to inform clinical interpretation. The metrics include:
@@ -59,32 +59,37 @@ While ClinVar has become an indispensable resource for clinical variant interpre
## Installation
Implemented in python (tested on 2.7.15 and >=3.5). You can `git clone` or download the zipfile and unpack. Add the folder location to your ~/.bash_profile or:
Implemented in Python (tested on 2.7.15 and >=3.5). You can `git clone` the repository or download the zipfile and unpack it. Add the folder location to your ~/.bash_profile or run `export PATH=$PATH:path/to/folder/Clinotator/clinotator`
If you have Anaconda/Miniconda installed on your system, a self-contained installation can be accomplished quickly with no other modifications. The following will find your conda envs folder, create a clinotator conda environment, and install the repo in it (with a symlink that exposes the script only when the conda environment is active).
```
CONDA_ENV="$(conda info --base)/envs/clinotator"
# python=3.7 is the conda package spec for the interpreter
conda create -y -n clinotator python=3.7 biopython pandas
# clone into a Clinotator subfolder so the paths used below exist
git clone https://github.com/rbutleriii/Clinotator.git "$CONDA_ENV/Clinotator"
ln -s "$CONDA_ENV/Clinotator/clinotator/clinotator.py" "$CONDA_ENV/bin/clinotator.py"
echo "the path to the test folder is: $CONDA_ENV/Clinotator/test"
```
export PATH=$PATH:path/to/folder/Clinotator/clinotator
```
Examples of each input file type are provided in the test subfolder. For instance:
```
cd path/to/Clinotator/test
conda activate clinotator
cd $CONDA_ENV/Clinotator/test
clinotator.py -t vid -e [email protected] test.vid
```
Should produce the following warnings and a clinotator.test.tsv file:
```
INFO:root:Run date: 2020-03-07
INFO:root:Starting on test.vid
INFO:root:Going to download record 1 to 13
INFO:root:Download time: 0.026888549999997242 min, Batches run -> 1
WARNING:root:128294 has a missing assertion date!
WARNING:root:128297 has a missing assertion date!
WARNING:root:ClinVar significance for 3521 does not include B,B/LB,LB,US,LP,LP/P,P
INFO:root:Download time: 0.03847670988337389 min, Batches run -> 1
WARNING:root:VID: 55794 does not have valid clinical assertions!
INFO:root:file written to clinotator.test.tsv
```
The warnings, as well as some additional information can be stored in the log file with `--log`. `--long-log` will store detailed debugging information, but the file will be larger than the output tsv file. Both log files append information, so batch runs or especially large lists of variants may result in large file sizes.
The warnings, as well as some additional information, can be stored in the log file with `--log`. `--long-log` will store detailed debugging information, but the file will be larger than the output tsv file. Both log files append information, so batch runs or especially large lists of variants may result in large log files. Use the output prefix to separate log files for different batches.
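To illustrate why appended logs grow across runs and how a per-batch prefix keeps them apart, here is a minimal Python sketch. It is not Clinotator's own logging code: the function name `make_batch_logger` and the `<prefix>.log` file naming are assumptions made for illustration only.

```python
import logging
import os


def make_batch_logger(prefix, log_dir="."):
    """Return a logger that appends to '<prefix>.log' (hypothetical naming)."""
    logger = logging.getLogger("batch." + prefix)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages out of the root logger
    if not logger.handlers:  # avoid attaching a second handler on repeat calls
        path = os.path.join(log_dir, prefix + ".log")
        # mode="a" appends, so every run with the same prefix grows this file
        handler = logging.FileHandler(path, mode="a")
        handler.setFormatter(
            logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
        logger.addHandler(handler)
    return logger
```

Calling `make_batch_logger("batch_a")` and `make_batch_logger("batch_b")` writes to two separate files, so neither batch inflates the other's log.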
### Dependencies
@@ -104,9 +109,9 @@ Numpy *should* work >= 1.16.4 and pandas >= 0.24.2, but install more recent vers
### Memory/System requirements
Clinotator was designed in a Linux environment and implemented in Python (2.7 or >=3.5), and can run in similar OSX and possibly Windows Python environments. It can be run on a personal computer with relatively modest system requirements; a minimum of 2GB available RAM.
Clinotator was designed in a Linux environment and implemented in Python (2.7.15 or >=3.5), and can run in similar OSX and possibly Windows Python environments. It can be run on a personal computer with relatively modest system requirements: a minimum of 2GB available RAM.
As Clinotator keeps the NCBI xml results in memory, there is a significant memory usage. At the time of writing, the entire ClinVar xml set is approaching 6GB. Loading the entire set into memory is doable with at least 8GB of memory, though it is recommended that you batch your queries in this rare case. More typical usage for subsets of ClinVar or batch vcf annotations should not pose a memory issue.
As Clinotator keeps the NCBI xml results in memory, memory usage can be significant. Loading the entire ClinVar database into memory is theoretically doable with at least 16GB of memory, though it is recommended that you batch your queries in this rare case. More typical usage for subsets of ClinVar or batch vcf annotations should not pose a memory issue.
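One way to batch a very large query is to split the identifier list into fixed-size chunks before running each chunk separately. The sketch below is only an illustration of that idea: the helper name `batch_ids` and the batch size are assumptions, not part of Clinotator.

```python
def batch_ids(ids, batch_size):
    """Yield consecutive slices of `ids`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


# Example: seven hypothetical variation IDs split into batches of three.
for batch in batch_ids([101, 102, 103, 104, 105, 106, 107], 3):
    print(batch)
```

Each batch could then be written to its own input file and run with its own output prefix, so only one batch of xml results is held in memory at a time.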
## Details on metrics
@@ -126,19 +131,19 @@ As Clinotator keeps the NCBI xml results in memory, there is a significant memor
</dl>
<dl>
<dt>ClinVar Conditions/Diseases (CVDS)</dt>
<dd>Conditions reported to be associated with this variant.</dd>
<dd>Conditions reported to be associated with this variant. Per the new ClinVar database format, this information is limited to LinkOuts to other data (OMIM, MedGen, etc). Descriptions of the conditions are no longer stored in the main XML file.</dd>
</dl>
<dl>
<dt>ClinVar Alternate Allele (CVAL)</dt>
<dd>The alternate allele connected with ClinVar variation report.</dd>
<dd>The alternate allele connected with the ClinVar variation report. Note that these are left-aligned allele definitions, so deletions and insertions include the preceding base (CAA --> C and A --> AC).</dd>
</dl>
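
The left-aligned convention above can be made concrete with a short Python sketch. This illustrates the general convention (a shared leading base anchors the event) and is not taken from Clinotator's code; the function name `describe_allele` and its return labels are assumptions.

```python
def describe_allele(ref, alt):
    """Classify a left-aligned ref/alt pair and return (kind, changed_bases)."""
    if len(ref) == 1 and len(alt) == 1:
        return ("substitution", alt) if ref != alt else ("no change", "")
    if ref[0] == alt[0]:  # the shared preceding base anchors the event
        if len(alt) == 1:
            return ("deletion", ref[1:])   # bases removed after the anchor
        if len(ref) == 1:
            return ("insertion", alt[1:])  # bases added after the anchor
    return ("complex", alt)


print(describe_allele("CAA", "C"))  # the deletion example from the text
print(describe_allele("A", "AC"))   # the insertion example from the text
```

For `CAA --> C` the anchor base is `C` and the deleted bases are `AA`; for `A --> AC` the anchor is `A` and the inserted base is `C`.
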
<dl>
<dt>ClinVar Last Evaluated (CVLE)</dt>
<dd>The date the clinical significance of the variation report was last evaluated. Note this is not the date the variation report was last updated, but the date in the <ClinicalAssertionList> field of the ClinVar xml connected to the Review Status.</dd>
</dl>
<dl>
<dt>ClinVar Variant Type (CVVT)</dt>
<dd>The type of variation in ClinVar. Currently defined as either "Simple" with a single AlleleID or "Haplotype" if multiple AlleleIDs are involved.</dd>
<dd>The type of variation in ClinVar. The newer ClinVar format defines "copy number gain", "copy number loss", "deletion", "duplication", "indel", "insertion", "inversion", "microsatellite" and "single nucleotide variant" for variants with a single AlleleID, or "Haplotype" if multiple AlleleIDs are involved.</dd>
</dl>
### Clinotator Metrics