Low truvari benchmark scores #3

riyasj327 · 2024-05-16T19:43:14Z

Hi,

I am trying to run truvari on the unphased variants VCF from hapdiff (produced from the haplotype-resolved HG002 assemblies) against the HG002 benchmarking VCF. The commands I used and the links to the public datasets are given below:

Hapdiff command:
singularity exec --bind $DD_DIR hapdiff_0.9.sif hapdiff.py --reference $DD_DIR/chm13_v2.fa --pat $DD_DIR/hg002v1.0.1.pat.fasta.gz --mat $DD_DIR/hg002v1.0.1.mat.fasta.gz --out-dir $DD_DIR/hapdiff -t 20

Links to the pat and mat assemblies:
hg002v1.0.1.pat.fasta.gz - https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.1.pat.fasta.gz
hg002v1.0.1.mat.fasta.gz - https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.1.mat.fasta.gz

Truvari command:
truvari bench -b CHM13v2.0_HG2-T2TQ100-V1.0.vcf.gz -c /projects/rsaju_prj/LongReadAssembly-test/hapdiff/hapdiff/hapdiff_unphased.vcf.gz -o output/

Links to the base dataset:
base dataset - https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_HG002_DraftBenchmark_defrabbV0.015-20240215/CHM13v2.0_HG2-T2TQ100-V1.0.vcf.gz
comparison dataset - produced by the hapdiff using the command above

Unfortunately, the precision, recall and F1 scores are low (~0.5), when I expected them to be around 0.9. I tried using the latest HG002 benchmark and the good-quality haplotype-resolved HG002 assemblies available. The summary.json produced by truvari is attached to this issue.
summary.json

Any idea what is going on and why the scores are so low? Any insights on this would be really helpful!

Thanks,
Riya

mikolmogorov · 2024-05-28T13:31:35Z

Hello,

Sorry for the late response! I don't see anything wrong with your command lines, so it is hard to tell why the scores are lower. I would try to look into the fp / fn calls output by truvari and see if you can spot any patterns there. You can also look into the hapdup assembly alignments in these regions along with the hapdiff VCF calls.
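One way to look for patterns in the fp / fn calls is to tally them by SV type and size. The following is only a rough sketch, not part of truvari or hapdiff: the helper names (`open_vcf`, `size_bin`, `tally`) are hypothetical, the size bins are arbitrary, and it assumes the records carry `SVTYPE` / `SVLEN` INFO tags (falling back to the REF/ALT length difference when `SVLEN` is missing).

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def size_bin(svlen):
    """Assign an SV length to a coarse bin; the bin edges are an arbitrary choice."""
    n = abs(svlen)
    if n < 50:
        return "<50"
    if n < 300:
        return "50-299"
    if n < 1000:
        return "300-999"
    return ">=1000"


def tally(lines):
    """Count VCF records per (SVTYPE, size bin) from an iterable of text lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alt, info = fields[3], fields[4], fields[7]
        tags = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        svtype = tags.get("SVTYPE", "NA")
        try:
            svlen = int(tags["SVLEN"].split(",")[0])
        except (KeyError, ValueError):
            # Assumption: without a usable SVLEN, use the allele length difference.
            svlen = len(alt) - len(ref)
        counts[(svtype, size_bin(svlen))] += 1
    return counts


# Example usage (paths assume the bench output directory from the command above):
# with open_vcf("output/fn.vcf.gz") as handle:
#     for key, n in sorted(tally(handle).items()):
#         print(key, n)
```

Comparing the fn and fp tallies side by side may show whether the misses concentrate in a particular size range or SV type.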

oneillkza · 2024-05-28T17:21:38Z

Thanks @fenderglass !

Yep, when we looked into the FP/FN calls, it looked like a lot of those were due to tandem repeat arrays just being represented slightly differently. (This may in part be due to actual errors in the gold standard, since it's a newer one based on the HPRC T2T assemblies and hasn't been as well curated as the old hg19 gold standard.)

Anyway, the solution seemed to be to slightly relax some of the parameters in Truvari. @riyasj327, could you please post the full Truvari call you ended up using?

(The numbers we are now getting show F-measures around 0.8 for sniffles2, hapdiff and pav, with slight differences in FP/FN tradeoff between the methods).

mikolmogorov · 2024-05-28T23:09:28Z

Good to know - 0.8 seems closer to the expected number, if you are including SDs / pericentromeric regions.

riyasj327 · 2024-05-29T16:46:05Z

Thanks @fenderglass and @oneillkza!

Here are the final truvari commands we are using now:

truvari bench -b CHM13v2.0_HG2-T2TQ100-V1.0.vcf.gz -c hapdiff_unphased.vcf.gz -o truvari --pctseq 0.5 --pctsize 0.5 -r 2000 --chunksize 2000 --passonly --includebed CHM13v2.0_HG2-T2TQ100-V1.0_stvar.benchmark.bed

truvari refine --regions truvari/candidate.refine.bed --reference chm13_v2.fa --recount --use-region-coords truvari/

The new Truvari scores:
"TP-base": 18421,
"TP-comp": 18421,
"FP": 726,
"FN": 7969,
"precision": 0.9620828328197629,
"recall": 0.6980295566502464,
"f1": 0.8090563717416607,
"base cnt": 26390,
"comp cnt": 19147
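As a sanity check, the precision, recall and F1 values above can be recomputed from the reported counts. This is a minimal sketch using the usual definitions (precision from TP-comp and FP, recall from TP-base and FN, F1 as their harmonic mean); the function name `bench_metrics` is hypothetical and is not part of truvari.

```python
def bench_metrics(tp_base, tp_comp, fp, fn):
    """Recompute precision, recall and F1 from benchmark counts."""
    precision = tp_comp / (tp_comp + fp)  # fraction of comparison calls that matched
    recall = tp_base / (tp_base + fn)     # fraction of baseline calls that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the summary above.
precision, recall, f1 = bench_metrics(tp_base=18421, tp_comp=18421, fp=726, fn=7969)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

With these counts, recall is the limiting factor: the 7969 FN calls far outnumber the 726 FP calls.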

mikolmogorov · 2024-06-10T21:35:23Z

Thanks for the info!

riyasj327 changed the title ~~Low~~ Low truvari benchmark scores May 16, 2024
