Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variants missing in parallel mode #123

Open
oleraj opened this issue Dec 9, 2021 · 2 comments
Open

Variants missing in parallel mode #123

oleraj opened this issue Dec 9, 2021 · 2 comments

Comments

@oleraj
Copy link

oleraj commented Dec 9, 2021

Hi,

We're looking for somatic/mosaic variants in exome data and there's a particular variant of interest that is not showing up when I run lofreq in parallel mode, however it does show up when I used lofreq in single-threaded mode. Single-threaded mode takes 10x longer to run so I'd like to use parallelization but I'm uncertain why this variant is omitted during the merging step.

My settings are

lofreq call-parallel --pp-threads 16 --ref human_g1k_v37_decoy_plus.fasta --bed Targets_BED.bed -s --call-indels -S gnomad.exomes.r2.0.1.sites.vt.fix.AF001.vcf.gz -o ${prefix}.lofreq.vcf indelqual/${p}.bam

Interestingly, when I check the intermediate files during the run, I do see the variant:

$ tabix /hpcdata/scratch/15010036.1.all.q/lofreq2_call_paralleljumki42g/21.vcf.gz 4:55594084-55594084
4	55594084	.	A	G	92	PASS	DP=40;AF=0.150000;SB=0;DP4=25,9,5,1	

But in the final filtered file the variant is absent.

Also, if I run the analysis in single-threaded mode for the whole exome, or manually for just for this region of interest, the variant shows up in the results, e.g.,

lofreq call --ref $ref -s -r 4:55594084-55594084 --call-indels -S gnomad.exomes.r2.0.1.sites.vt.fix.AF001.vcf.gz $bam

It seems this is not the only variant. I ran the same BAM file with different threads and it seems the more threads I use the fewer the variants remaining in the VCF file:

1 thread – 103737 variants
4 threads – 103601
16 threads – 103601
30 threads – 103597

I tried this first with v.2.1.3.1 but also tried with v.2.1.5 and I see the same thing.

Any idea why some variants would be missing in parallel mode that are present in single-threaded mode?

Thanks!

Andrew

@reesea22
Copy link

Hi,

We were having the same issue. In order to get variants using call-parallel we first had to index the .bam input that we used using samtools index, i.e.,

samtools index <bam file>
lofreq call-parallel --pp-threads <threads> --call-indels -f <reference fasta> <bam file> -o <vcf file>

Hope this helps,
Amy

@oleraj
Copy link
Author

oleraj commented Feb 18, 2022

Hi @reesea22 thanks for the comment. Unfortunately the BAM files are already indexed so I don't think that will solve my issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants