Commit

Merge branch 'master' into feature/haploid
chaklim committed Dec 22, 2019
2 parents 221dce2 + 8f5bf05 commit 21ed5de
Showing 1 changed file with 6 additions and 13 deletions.
19 changes: 6 additions & 13 deletions README.md
@@ -1,12 +1,12 @@
# Clair - Yet another deep neural network based variant caller
[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause) [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/clair/README.html)
Contact: Ruibang Luo
Email: [email protected]
[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause) [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/clair/README.html) \
Contact: Ruibang Luo \
Email: [email protected]

## Introduction
Single-molecule sequencing technologies have emerged in recent years and revolutionized structural variant calling, complex genome assembly, and epigenetic mark detection. However, the lack of a highly accurate small variant caller has limited the new technologies from being more widely used. In this study, we present Clair, the successor to Clairvoyante, a program for fast and accurate germline small variant calling, using single molecule sequencing data. For ONT data, Clair achieves the best precision, recall and speed as compared to several competing programs, including Clairvoyante, Longshot and Medaka. Through studying the missed variants and benchmarking intentionally overfitted models, we found that Clair may be approaching the limit of possible accuracy for germline small variant calling using pileup data and deep neural networks.

This is the formal release of Clair (Clair v2, Dec 2019). You can find the experimental Clair v1 (Jan 2019) at [https://github.com/aquaskyline/Clair](https://github.com/aquaskyline/Clair). The preprint of Clair v2 is available in [bioAxiv](https://biorxiv.org/cgi/content/short/865782v1).
This is the formal release of Clair (Clair v2, Dec 2019). You can find the experimental Clair v1 (Jan 2019) at [https://github.com/aquaskyline/Clair](https://github.com/aquaskyline/Clair). The preprint of Clair v2 is available in [bioRxiv](https://www.biorxiv.org/content/10.1101/865782v2).

---

@@ -38,10 +38,7 @@ pypy3 -m pip install blosc intervaltree
pip install numpy blosc intervaltree tensorflow==1.13.2 pysam matplotlib
conda install -c anaconda pigz
conda install -c conda-forge parallel zstd
conda install -c bioconda samtools vcflib

# install vcftools
sudo apt-get install vcftools
conda install -c bioconda samtools vcflib bcftools

# clone Clair
git clone --depth=1 https://github.com/HKU-BAL/Clair.git
@@ -78,9 +75,6 @@ conda config --add channels conda-forge
conda create -n clair-env -c bioconda clair
conda activate clair-env

# install vcftools
sudo apt-get install vcftools

# store clair.py PATH into $CLAIR variable
CLAIR=`which clair.py`

@@ -250,7 +244,7 @@ cat command.sh | parallel -j4
for i in OUTPUT_PREFIX.*.vcf; do if ! [ -z "$(tail -c 1 "$i")" ]; then echo "$i"; fi ; done | grep -f - command.sh | sh

# concatenate vcf files and sort the variants called
vcfcat ${OUTPUT_PREFIX}.*.vcf | vcf-sort -c | bgziptabix snp_and_indel.vcf.gz
vcfcat ${OUTPUT_PREFIX}.*.vcf | bcftools sort -m 2G | bgziptabix snp_and_indel.vcf.gz
```
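
The rerun one-liner in the block above relies on one test: a VCF file whose last byte is not a newline was probably cut off mid-write. A minimal sketch of that test on its own is shown here; `list_truncated` is a hypothetical helper name used only for illustration and is not part of Clair.

```shell
#!/bin/sh
# Sketch only: `list_truncated` is a hypothetical helper, not part of Clair.
# Print every given file whose last byte is not a newline.
list_truncated() {
    for f in "$@"; do
        # Command substitution strips a trailing newline, so the result is
        # non-empty only when the final byte is something other than "\n".
        # An empty file also yields an empty string and is not reported.
        if [ -n "$(tail -c 1 "$f")" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `list_truncated "${OUTPUT_PREFIX}".*.vcf` would print the suspicious files. The check can only see files that exist, so a job that never wrote any output is not detected this way.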

#### Note
@@ -262,7 +256,6 @@ vcfcat ${OUTPUT_PREFIX}.*.vcf | vcf-sort -c | bgziptabix snp_and_indel.vcf.gz
* If you are working on non-human BAM file (e.g. bacteria), please use `--includingAllContigs` option to include all contigs
* `CUDA_VISIBLE_DEVICES=""` makes GPUs invisible to Clair so that it uses the CPU for variant calling. Note that unless you run `commands.sh` serially, you cannot use a GPU, because a single running copy of Clair occupies all available memory of a GPU. Because the bottleneck of `callVarBam` is the `CreateTensor` script, which runs on the CPU, GPU acceleration brings only a small gain (roughly 15% faster). If your system has multiple GPU cards and you want to use them for variant calling, you can split `commands.sh` into parts and run each part after first running `export CUDA_VISIBLE_DEVICES="$i"`, where `$i` is an integer, starting from 0, that identifies the GPU to use.
* `vcfcat` and `bgziptabix` commands are from [vcflib](https://github.com/vcflib/vcflib), and are installed by default using option 2 (conda) or option 3 (docker).
* `vcf-sort` command is from [vcftools](https://github.com/vcftools/vcftools)
* Please also check the notes in the above sections for other considerations.
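
The multi-GPU note above can be sketched as follows. This is an illustration under assumptions, not a tested recipe from the Clair authors: `split_round_robin` and `run_parts_on_gpus` are hypothetical helper names, the part-file prefix is arbitrary, and the lines in the command file are assumed not to set `CUDA_VISIBLE_DEVICES` themselves (an inline assignment in a command would override the value set here).

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and not part of Clair.

# Deal the lines of FILE ($1) into N ($2) files named PREFIX.0 .. PREFIX.(N-1),
# where PREFIX is $3. Line k goes to part (k - 1) mod N.
split_round_robin() {
    awk -v n="$2" -v prefix="$3" '{ print > (prefix "." ((NR - 1) % n)) }' "$1"
}

# Run PREFIX.i ($1 is PREFIX, $2 is N) with only GPU i visible.
# Parts run concurrently; commands inside one part run one after another,
# so each GPU hosts a single copy of Clair at a time.
run_parts_on_gpus() {
    i=0
    while [ "$i" -lt "$2" ]; do
        # A part is missing when the command file has fewer lines than N.
        if [ -f "$1.$i" ]; then
            CUDA_VISIBLE_DEVICES="$i" sh "$1.$i" &
        fi
        i=$((i + 1))
    done
    wait
}
```

With two GPUs, one possible invocation is `split_round_robin command.sh 2 command.part` followed by `run_parts_on_gpus command.part 2`. Round-robin dealing is used instead of contiguous chunks so that neighbouring regions, which often have similar run times, are spread across the cards.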

---