Duplicate VCF lines from HG002 BAM #111

Open
themkdemiiir opened this issue Feb 8, 2024 · 1 comment


@themkdemiiir

Hello,

I tested your tool on the AshkenazimTrio data and noticed that the vcf_id is shared by BND pairs with different quality, and that there are duplicate VCF lines.
The reference files I used:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/

The BAM and index files I used:

https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams/

The duplicate VCF lines are below. I can also share the full VCF file if you would like. Thanks.

chr5	58813706	SV_736_1	N	]chr5:58813779]N	50	PASS	SVTYPE=BND;REGIONA=58813706,58813943;REGIONB=58813723,58813779;LFA=0,0;LFB=0,0;LTE=0,0;CTG=GGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAGATATTATTTAACTCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAATGCACACATACAGGCAATCAGGAATGCAGAAATGAATTTACCAAGTTACAAAATGGGTTAACACCCATGGAGCAAGAATCAGATGCATGCCACCAAACACAATTTATTGGCATTTCTTTCTATTTGCAAGAACTTGTATTATTATTGGTTTTCCACCACCTAC	GT:CN:COV:DV:RV:LQ:RR:DR	0/1:2:62.02100840336134,59.54054054054054,59.0:0:0:0.0,0.0:36,46:36,39
chr5	58813706	SV_1066_1	N	]chr5:58813779]N	50	PASS	SVTYPE=BND;REGIONA=58813706,58813755;REGIONB=58813502,58813779;LFA=0,0;LFB=0,0;LTE=0,0;CTG=CCCCCCATGGATCTTTCTACACGCGCGGGGTTGGGTATCTTCTGTGTGCACACTGCTCACCCCCCGTTCTCATAGACAGGTTGTCTAGTCACTCCAAGCACATGCCTTCCTTAGCCATTGTATTGTTAAGTTTTTATGTTTTATTTATATTTATATTTATATATATATATATATATATATATATATATATATATATACACATACACACATATACATATGGTAGAACCACAGCTTTTATCCAAATATAAAATAAACACATGTCAAAGATATTATTTAACTCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAGATATTATTTAAATCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCAGAGGCACAAGTGGATACTCAGTGAACGCAA	GT:CN:COV:DV:RV:LQ:RR:DR	0/1:2:59.32,59.54054054054054,61.881294964028775:0:0:0.0,0.0:36,46:36,39
chr5	58813779	SV_736_2	N	N[chr5:58813706[	50	PASS	SVTYPE=BND;REGIONA=58813706,58813943;REGIONB=58813723,58813779;LFA=0,0;LFB=0,0;LTE=0,0;CTG=GGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAGATATTATTTAACTCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAATGCACACATACAGGCAATCAGGAATGCAGAAATGAATTTACCAAGTTACAAAATGGGTTAACACCCATGGAGCAAGAATCAGATGCATGCCACCAAACACAATTTATTGGCATTTCTTTCTATTTGCAAGAACTTGTATTATTATTGGTTTTCCACCACCTAC	GT:CN:COV:DV:RV:LQ:RR:DR	0/1:2:62.02100840336134,59.54054054054054,59.0:0:0:0.0,0.0:36,46:36,39
chr5	58813779	SV_1066_2	N	N[chr5:58813706[	50	PASS	SVTYPE=BND;REGIONA=58813706,58813755;REGIONB=58813502,58813779;LFA=0,0;LFB=0,0;LTE=0,0;CTG=CCCCCCATGGATCTTTCTACACGCGCGGGGTTGGGTATCTTCTGTGTGCACACTGCTCACCCCCCGTTCTCATAGACAGGTTGTCTAGTCACTCCAAGCACATGCCTTCCTTAGCCATTGTATTGTTAAGTTTTTATGTTTTATTTATATTTATATTTATATATATATATATATATATATATATATATATATATATACACATACACACATATACATATGGTAGAACCACAGCTTTTATCCAAATATAAAATAAACACATGTCAAAGATATTATTTAACTCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAGATATTATTTAAATCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCAGAGGCACAAGTGGATACTCAGTGAACGCAA	GT:CN:COV:DV:RV:LQ:RR:DR	0/1:2:59.32,59.54054054054054,61.881294964028775:0:0:0.0,0.0:36,46:36,39
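
For anyone who wants to check their own output for the same pattern, here is a minimal sketch. It assumes that "duplicate" means two records with the same CHROM, POS, REF and ALT but different IDs, as in the four lines above. The function name `find_duplicate_records` is hypothetical and this is not part of TIDDIT; it only reads tab-separated VCF text.

```python
from collections import defaultdict


def find_duplicate_records(vcf_lines):
    """Return {(chrom, pos, ref, alt): [ids]} for keys seen in more than one record.

    Assumption: records count as duplicates when CHROM, POS, REF and ALT
    all match, regardless of ID, INFO or sample columns.
    """
    groups = defaultdict(list)
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vcf_id, ref, alt = fields[:5]
        groups[(chrom, int(pos), ref, alt)].append(vcf_id)
    # Keep only the keys that occur in two or more records.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    # Shortened copies of the records above (INFO and sample columns trimmed).
    example = [
        "chr5\t58813706\tSV_736_1\tN\t]chr5:58813779]N\t50\tPASS\tSVTYPE=BND",
        "chr5\t58813706\tSV_1066_1\tN\t]chr5:58813779]N\t50\tPASS\tSVTYPE=BND",
        "chr5\t58813779\tSV_736_2\tN\tN[chr5:58813706[\t50\tPASS\tSVTYPE=BND",
        "chr5\t58813779\tSV_1066_2\tN\tN[chr5:58813706[\t50\tPASS\tSVTYPE=BND",
    ]
    for key, ids in find_duplicate_records(example).items():
        print(key, ids)
```

To run it on a real file, pass an open file handle, for example `find_duplicate_records(open("calls.vcf"))`, where `calls.vcf` is a placeholder for an uncompressed VCF.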
@J35P312
Member

J35P312 commented Feb 8, 2024

Hello!
Thanks for testing TIDDIT! These are interesting examples. I see that they are detected through de novo assembly only, and that they are called based on two distinct contigs... I will have a look and see if I can make TIDDIT collapse these kinds of calls.

Best regards
Jesper
