Commit
Yunxi Liu committed Nov 20, 2023
1 parent 01f556e, commit ef87fd0
Showing 19 changed files with 655 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,13 +8,27 @@ Wastewater monitoring is an important tool that can complement clinical testing | |
2. enables fast queries of mutation combinations against all publicly available SARS-CoV-2 genomes | ||
3. improving understanding of SARS-CoV-2 intrahost evolution and transmission events at a large scale | ||
|
||
It is highly recommended that wastewater samples be processed with [QuaID](https://gitlab.com/treangenlab/quaid), a novel bioinformatics tool for VoC detection based on quasi-unique mutations that is being developed by [Treangenlab](https://gitlab.com/treangenlab). | ||
It is highly recommended that wastewater samples be processed with [QuaID](https://gitlab.com/treangenlab/quaid), a novel bioinformatics tool for VoC detection based on quasi-unique mutations that is being developed by [Treangenlab](https://gitlab.com/treangenlab). The current version of Crykey is v1.0. | ||
|
||
## System requirements | ||
|
||
Crykey is supported on Linux. The machine must have enough RAM to load the Crykey classification database: a standard database built from publicly available SARS-CoV-2 genomes up to Jan 10, 2023 takes more than 17 GB. This tool (version 1.0) has been tested on Linux (Ubuntu 18.04.5 LTS). No non-standard hardware is required. | ||
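As a rough pre-flight check for the memory requirement above, the following sketch reads the `MemTotal` entry of Linux `/proc/meminfo`-style text and compares it with the quoted database size. The function names and the 2 GiB headroom are illustrative assumptions and are not part of Crykey.

```python
REQUIRED_GIB = 17  # database size quoted in the README


def mem_total_gib(meminfo_text: str) -> float:
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB (KiB)
            return kib / (1024 ** 2)
    raise ValueError("MemTotal not found")


def has_enough_ram(meminfo_text: str,
                   required_gib: float = REQUIRED_GIB,
                   headroom_gib: float = 2.0) -> bool:
    """True if total RAM covers the database plus an assumed safety margin."""
    return mem_total_gib(meminfo_text) >= required_gib + headroom_gib
```

On a Linux host, the text can be obtained with `open("/proc/meminfo").read()` and passed to `has_enough_ram`.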
|
||
## Installation | ||
|
||
To install Crykey, clone the GitHub repository. It is highly recommended that the dependencies be installed in a clean conda environment; to create one, follow the [conda user guide](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). After installing the required dependencies and downloading the pre-built database, the software is ready to use. Typical install time is no longer than 30 minutes. | ||
|
||
``` | ||
git clone [email protected]:treangenlab/crykey.git | ||
cd crykey | ||
./install.sh | ||
``` | ||
|
||
## 3rd party software requirements | ||
|
||
Below is the list of 3rd party software requirements. All required software can be installed via Miniconda after adding `bioconda` to the list of channels. The version in parentheses is the version currently tested. | ||
|
||
* vdb (2.7) | ||
* vdb (2.7) (database building only) | ||
* samtools (1.7) | ||
* SnpEff | ||
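A quick way to confirm that the tools listed above are visible in the active environment is to look them up on `PATH`. This is a minimal sketch; the executable names in `REQUIRED_TOOLS` are assumptions and may differ between installs.

```python
import shutil

# Executable names are assumptions; adjust them to match the local install.
REQUIRED_TOOLS = {"samtools": "samtools", "SnpEff": "snpEff", "vdb": "vdb"}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools whose executable is not found on PATH."""
    return sorted(name for name, exe in tools.items() if which(exe) is None)
```

Calling `missing_tools()` inside the conda environment returns an empty list when every executable resolves.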
|
||
|
@@ -169,6 +183,15 @@ Based on such information, you could determine which of the co-occurring SNVs qu | |
* have a sufficient number of supporting reads, | ||
* the occurrence in the database should be low; in other words, the cryptic lineage should be novel or at least rare in the database. | ||
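The two criteria above can be expressed as a simple filter. This is only a sketch: the `Candidate` fields, in particular `database_count`, and both thresholds are hypothetical and are not Crykey defaults.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    mutations: str       # e.g. "C6402T;G6456A"
    support_dp: int      # reads carrying every SNV in the set
    database_count: int  # hypothetical: database genomes with the same combination


def is_cryptic_candidate(c: Candidate, min_support: int = 5,
                         max_database_count: int = 10) -> bool:
    """Apply both criteria; the default thresholds are illustrative only."""
    return c.support_dp >= min_support and c.database_count <= max_database_count


def filter_candidates(candidates, **thresholds):
    """Keep only the candidates that pass both checks."""
    return [c for c in candidates if is_cryptic_candidate(c, **thresholds)]
```

Suitable thresholds depend on sequencing depth and on the size of the database.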
|
||
## Demo Run | ||
|
||
The following is a demo run with the test data we provide. The expected output is stored in `demo/test_output`. The test run should take less than 10 minutes to finish. Most of the time is spent loading the database, so running multiple samples as a batch significantly increases the efficiency of the tool. | ||
``` | ||
python crykey_wastewater.py -i demo/test_metadata.tsv -r demo/SARS-CoV-2-reference.fasta -d [PATH_TO_CRYKEY_DATABASE] -o [PATH_TO_OUTPUT_DIRECTORY] | ||
python crykey_query.py -d [PATH_TO_CRYKEY_DATABASE] -o [PATH_TO_OUTPUT_DIRECTORY] | ||
``` | ||
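For scripted use, the two demo commands can be wrapped in Python. This sketch only reuses the script names and the `-i`, `-r`, `-d`, `-o` flags shown above; `build_demo_commands` and `run_demo` are hypothetical helper names.

```python
import subprocess
import sys


def build_demo_commands(database, output_dir, python=sys.executable):
    """Return the two demo command lines as argument lists."""
    wastewater = [
        python, "crykey_wastewater.py",
        "-i", "demo/test_metadata.tsv",
        "-r", "demo/SARS-CoV-2-reference.fasta",
        "-d", database,
        "-o", output_dir,
    ]
    query = [python, "crykey_query.py", "-d", database, "-o", output_dir]
    return [wastewater, query]


def run_demo(database, output_dir):
    """Run both steps in order, stopping at the first failure."""
    for command in build_demo_commands(database, output_dir):
        subprocess.run(command, check=True)
```

`run_demo` is meant to be called from the repository root so that the relative `demo/` paths resolve.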
|
||
|
||
## Manuscript | ||
|
||
You can find the manuscript describing QuaID and corresponding results at [doi.org/10.1101/2023.06.16.23291524](https://www.medrxiv.org/content/10.1101/2023.06.16.23291524v1). |
@@ -0,0 +1,51 @@ | ||
##fileformat=VCFv4.0 | ||
##fileDate=20210806 | ||
##source=lofreq call -f SARS-CoV-2-reference.fasta --call-indels -o Variant-calling-LoFreq/HHD0802/76-1.clean.vcf Variant-calling-LoFreq/HHD0802/76-1.clean.indelqual.bam | ||
##reference=SARS-CoV-2-reference.fasta | ||
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw Depth"> | ||
##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency"> | ||
##INFO=<ID=SB,Number=1,Type=Integer,Description="Phred-scaled strand bias at this position"> | ||
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Counts for ref-forward bases, ref-reverse, alt-forward and alt-reverse bases"> | ||
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL."> | ||
##INFO=<ID=CONSVAR,Number=0,Type=Flag,Description="Indicates that the variant is a consensus variant (as opposed to a low frequency variant)."> | ||
##INFO=<ID=HRUN,Number=1,Type=Integer,Description="Homopolymer length to the right of report indel position"> | ||
##FILTER=<ID=min_dp_10,Description="Minimum Coverage 10"> | ||
##FILTER=<ID=sb_fdr,Description="Strand-Bias Multiple Testing Correction: fdr corr. pvalue > 0.001000"> | ||
##FILTER=<ID=min_snvqual_59,Description="Minimum SNV Quality (Phred) 59"> | ||
##FILTER=<ID=min_indelqual_38,Description="Minimum Indel Quality (Phred) 38"> | ||
#CHROM POS ID REF ALT QUAL FILTER INFO | ||
NC_045512.2 210 . G T 1607 PASS DP=43;AF=1.000000;SB=0;DP4=0,0,17,26 | ||
NC_045512.2 241 . C T 1813 PASS DP=50;AF=0.980000;SB=0;DP4=0,0,23,26 | ||
NC_045512.2 320 . C T 114 PASS DP=55;AF=0.163636;SB=20;DP4=21,25,0,9 | ||
NC_045512.2 727 . T A 288 PASS DP=27;AF=0.925926;SB=0;DP4=1,0,25,0 | ||
NC_045512.2 1133 . A T 61 PASS DP=184;AF=0.027174;SB=0;DP4=95,84,3,2 | ||
NC_045512.2 5669 . T C 1172 PASS DP=104;AF=0.423077;SB=0;DP4=37,23,27,17 | ||
NC_045512.2 5744 . T C 65 PASS DP=96;AF=0.041667;SB=4;DP4=50,42,1,3 | ||
NC_045512.2 5777 . C T 168 PASS DP=96;AF=0.093750;SB=0;DP4=51,36,5,4 | ||
NC_045512.2 6402 . C T 5573 PASS DP=157;AF=0.993631;SB=0;DP4=0,0,70,86 | ||
NC_045512.2 6456 . G A 231 PASS DP=120;AF=0.108333;SB=1;DP4=47,60,5,8 | ||
NC_045512.2 6478 . T C 1041 PASS DP=96;AF=0.406250;SB=1;DP4=26,31,16,23 | ||
NC_045512.2 10029 . C T 507 PASS DP=15;AF=1.000000;SB=0;DP4=0,0,7,8 | ||
NC_045512.2 12926 . A AC 41 PASS DP=266;AF=0.007519;SB=0;DP4=138,127,1,1;INDEL;HRUN=3 | ||
NC_045512.2 16466 . C T 661 PASS DP=19;AF=1.000000;SB=0;DP4=0,0,7,12 | ||
NC_045512.2 17122 . G T 7008 PASS DP=1405;AF=0.226335;SB=1;DP4=681,406,203,115 | ||
NC_045512.2 17135 . C T 1695 PASS DP=1535;AF=0.071661;SB=0;DP4=870,554,68,42 | ||
NC_045512.2 17285 . C T 108 PASS DP=1986;AF=0.008560;SB=8;DP4=1107,861,13,4 | ||
NC_045512.2 17518 . CT C 48 PASS DP=82;AF=0.024390;SB=0;DP4=45,35,1,1;INDEL;HRUN=2 | ||
NC_045512.2 18636 . G A 89 PASS DP=223;AF=0.035874;SB=12;DP4=138,77,8,0 | ||
NC_045512.2 23403 . A G 429 PASS DP=12;AF=1.000000;SB=0;DP4=0,0,5,7 | ||
NC_045512.2 24863 . C T 14056 PASS DP=392;AF=0.997449;SB=0;DP4=0,0,182,209 | ||
NC_045512.2 25174 . A C 94 PASS DP=139;AF=0.064748;SB=1;DP4=59,71,5,4 | ||
NC_045512.2 25469 . C T 6969 PASS DP=193;AF=0.994819;SB=0;DP4=0,1,88,104 | ||
NC_045512.2 26767 . T C 643 PASS DP=20;AF=1.000000;SB=0;DP4=0,0,8,12 | ||
NC_045512.2 27131 . C T 22273 PASS DP=616;AF=0.995130;SB=0;DP4=1,1,261,352 | ||
NC_045512.2 27176 . T C 105 PASS DP=484;AF=0.026860;SB=26;DP4=185,286,0,13 | ||
NC_045512.2 28247 . AGATTTC A 12960 PASS DP=402;AF=0.990050;SB=1;DP4=8,15,163,235;INDEL;HRUN=1 | ||
NC_045512.2 28270 . TA T 44158 PASS DP=1150;AF=0.988696;SB=2;DP4=7,7,681,456;INDEL;HRUN=4 | ||
NC_045512.2 28372 . TG T 40 PASS DP=1443;AF=0.002079;SB=0;DP4=722,905,1,2;INDEL;HRUN=4 | ||
NC_045512.2 28432 . C T 2435 PASS DP=949;AF=0.154900;SB=9;DP4=231,569,52,95 | ||
NC_045512.2 29029 . T C 112 PASS DP=143;AF=0.055944;SB=12;DP4=84,51,8,0 | ||
NC_045512.2 29039 . A T 182 PASS DP=157;AF=0.070064;SB=20;DP4=90,56,11,0 | ||
NC_045512.2 29049 . G A 166 PASS DP=171;AF=0.070175;SB=2;DP4=98,61,9,3 | ||
NC_045512.2 29711 . G T 113 PASS DP=19;AF=0.210526;SB=0;DP4=8,7,2,2 | ||
NC_045512.2 29742 . G T 522 PASS DP=17;AF=0.941176;SB=0;DP4=0,1,6,10 |
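The INFO column of these LoFreq records can be unpacked with a few lines of Python. In the records above, the reported `AF` matches (alt-forward + alt-reverse) / `DP`, which the sketch below recomputes from `DP4`. It assumes tab-delimited fields, as in standard VCF; `parse_info` and `alt_fraction` are illustrative names.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; flags such as INDEL map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def alt_fraction(info: str) -> float:
    """Recompute the alternate-allele fraction from DP4 and DP."""
    fields = parse_info(info)
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in fields["DP4"].split(","))
    return (alt_fwd + alt_rev) / int(fields["DP"])


record = "NC_045512.2\t320\t.\tC\tT\t114\tPASS\tDP=55;AF=0.163636;SB=20;DP4=21,25,0,9"
info = record.split("\t")[7]
print(round(alt_fraction(info), 6), parse_info(info)["AF"])  # 0.163636 0.163636
```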
@@ -0,0 +1,2 @@ | ||
Sample_Collection_Date WWTP Sorted_BAM VCF | ||
01012023 Test demo/test_data.bam demo/test_data.vcf |
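A metadata file with this layout can be read and checked before a run. The sketch assumes the columns are tab-separated, as the `.tsv` extension suggests; `read_metadata` is a hypothetical helper.

```python
import csv
import io

EXPECTED_COLUMNS = ["Sample_Collection_Date", "WWTP", "Sorted_BAM", "VCF"]


def read_metadata(handle):
    """Read the metadata rows, failing early if a column is missing."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


demo = (
    "Sample_Collection_Date\tWWTP\tSorted_BAM\tVCF\n"
    "01012023\tTest\tdemo/test_data.bam\tdemo/test_data.vcf\n"
)
rows = read_metadata(io.StringIO(demo))
```

Reading the date as a string keeps the leading zero of `01012023`. The demo output file names and rows show `1012023`, which would be consistent with the date being handled as an integer at some point; that is an observation from the files here, not a statement about Crykey's internals.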
Binary file added: demo/test_output/cryptic_alignment/cryptic_reads_1012023_Test.bam.bai (+128 Bytes)
@@ -0,0 +1,4 @@ | ||
Date,Site,Nt Mutations,AA Mutations,Support DP,Total DP,Combined Freq | ||
1012023,Test,C6402T;G6456A,ORF1a:P2046L;ORF1a:C2064Y,7,60,0.11666666666666667 | ||
1012023,Test,C27131T;T27176C,M:N203N;M:A218A,13,289,0.04498269896193772 | ||
1012023,Test,A29039T;G29049A,N:K256*;N:R259Q,7,97,0.07216494845360824 |
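In the three rows above, `Combined Freq` equals `Support DP` divided by `Total DP` (for example, 7 / 60). A small consistency check along those lines could look like this; `check_combined_freq` is a hypothetical helper.

```python
import csv
import io
import math


def check_combined_freq(handle, rel_tol=1e-9):
    """Return the mutation sets whose Combined Freq differs from Support DP / Total DP."""
    mismatches = []
    for row in csv.DictReader(handle):
        expected = int(row["Support DP"]) / int(row["Total DP"])
        if not math.isclose(expected, float(row["Combined Freq"]), rel_tol=rel_tol):
            mismatches.append(row["Nt Mutations"])
    return mismatches


demo_csv = """Date,Site,Nt Mutations,AA Mutations,Support DP,Total DP,Combined Freq
1012023,Test,C6402T;G6456A,ORF1a:P2046L;ORF1a:C2064Y,7,60,0.11666666666666667
1012023,Test,C27131T;T27176C,M:N203N;M:A218A,13,289,0.04498269896193772
1012023,Test,A29039T;G29049A,N:K256*;N:R259Q,7,97,0.07216494845360824
"""
print(check_combined_freq(io.StringIO(demo_csv)))  # []
```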
@@ -0,0 +1,27 @@ | ||
M04488:8:000000000-G8LL4:1:2103:13396:8558 | ||
M04488:8:000000000-G8LL4:1:2102:9412:26067 | ||
M04488:8:000000000-G8LL4:1:1102:12015:16214 | ||
M04488:8:000000000-G8LL4:1:2104:20712:21847 | ||
M04488:8:000000000-G8LL4:1:1103:22919:9482 | ||
M04488:8:000000000-G8LL4:1:1104:12896:21967 | ||
M04488:8:000000000-G8LL4:1:2104:15016:25238 | ||
M04488:8:000000000-G8LL4:1:1102:14601:5730 | ||
M04488:8:000000000-G8LL4:1:1102:18604:17868 | ||
M04488:8:000000000-G8LL4:1:1102:15226:19521 | ||
M04488:8:000000000-G8LL4:1:1102:14848:6489 | ||
M04488:8:000000000-G8LL4:1:1102:24859:19429 | ||
M04488:8:000000000-G8LL4:1:1102:11148:6398 | ||
M04488:8:000000000-G8LL4:1:1102:13910:12657 | ||
M04488:8:000000000-G8LL4:1:1102:26239:14246 | ||
M04488:8:000000000-G8LL4:1:1102:28000:17740 | ||
M04488:8:000000000-G8LL4:1:1102:8701:16578 | ||
M04488:8:000000000-G8LL4:1:1102:16810:5541 | ||
M04488:8:000000000-G8LL4:1:1102:9024:3703 | ||
M04488:8:000000000-G8LL4:1:1102:17949:24318 | ||
M04488:8:000000000-G8LL4:1:1103:16919:12562 | ||
M04488:8:000000000-G8LL4:1:1104:11272:15582 | ||
M04488:8:000000000-G8LL4:1:2103:26012:23845 | ||
M04488:8:000000000-G8LL4:1:1101:14763:11854 | ||
M04488:8:000000000-G8LL4:1:1103:5630:13033 | ||
M04488:8:000000000-G8LL4:1:2104:21013:17941 | ||
M04488:8:000000000-G8LL4:1:1103:27288:9477 |
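The identifiers above have the shape of standard Illumina read names (instrument, run number, flowcell ID, lane, tile, x, y). Assuming that layout, they can be split into named fields as follows; `ReadName` and `parse_read_name` are illustrative names.

```python
from typing import NamedTuple


class ReadName(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int


def parse_read_name(name: str) -> ReadName:
    """Split a colon-separated Illumina read name into its seven fields."""
    instrument, run, flowcell, lane, tile, x, y = name.strip().split(":")
    return ReadName(instrument, int(run), flowcell, int(lane), int(tile), int(x), int(y))


read = parse_read_name("M04488:8:000000000-G8LL4:1:2103:13396:8558")
print(read.flowcell, read.tile)  # 000000000-G8LL4 2103
```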