SNP calling/filtering: how to modify the ratio of reads to call a SNP and percentage of isolates containing a SNP #12

Open
butterbee opened this issue Jan 13, 2025 · 5 comments

Comments

@butterbee

How can I set the parameters so that a SNP is called only when it passes a particular ratio of reads and is present in only a certain percentage of isolates in a given dataset? I'm not sure what the vSNP3 default values for these parameters are. If there are any, how can I change them when running step1 or step2?

@stuber
Contributor

stuber commented Jan 13, 2025

When running step 2, there are three threshold parameters you may be interested in changing:

-w QUAL_THRESHOLD, --qual_threshold QUAL_THRESHOLD
Optional: Minimum QUAL threshold for calling a SNP
-x N_THRESHOLD, --n_threshold N_THRESHOLD
Optional: Minimum N threshold. SNPs between this and qual_threshold are reported as N
-y MQ_THRESHOLD, --mq_threshold MQ_THRESHOLD
Optional: At least one position per group must have this minimum MQ threshold to be called.

Default values:
-w [150] --> SNP: QUAL >150
-x [50] --> N: QUAL 50-150
-y [56] --> MQ: >56
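
As a rough illustration of how the defaults above partition calls, here is a minimal Python sketch. It is not vSNP's actual code: the function names are hypothetical, the handling of values exactly at a threshold is an assumption, and what happens to positions below the N threshold is not stated in this thread, so the sketch simply labels them "not called".

```python
def classify_qual(qual, qual_threshold=150, n_threshold=50):
    """Label one QUAL value using the -w and -x defaults quoted above.

    Assumption: boundary values follow "SNP: QUAL >150" and "N: QUAL 50-150"
    literally, so exactly 150 is reported as N.
    """
    if qual > qual_threshold:
        return "SNP"
    if qual >= n_threshold:
        return "N"
    # Assumption: the thread does not say how lower values are treated.
    return "not called"


def group_passes_mq(mq_values, mq_threshold=56):
    """True if at least one position in the group exceeds the MQ threshold (-y)."""
    return any(mq > mq_threshold for mq in mq_values)


if __name__ == "__main__":
    for q in (300, 150, 75, 20):
        print(q, classify_qual(q))
    print(group_passes_mq([40, 60, 52]))
```

Passing a different `qual_threshold` or `n_threshold` to the function mirrors what changing `-w` or `-x` would do in step 2.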

@butterbee
Author

Thanks for the clarification. This is useful in setting up the quality threshold in step2.
Regarding SNP filtering, I'm struggling to understand how we can adjust the parameters to include SNPs that are present in only 90% (for example) of isolates. Can we make such modifications in step1?
Also, can you clarify whether the final SNP alignment produced by step2 is the core SNP alignment (SNPs that are present in all the isolates)?
Thanks!

@stuber
Contributor

stuber commented Jan 13, 2025

There is no way to select a percentage of isolates. This is not within vSNP's scope. However, after running step 2 and examining the output SNP table, if you see a group of SNPs in the table that are being called for a subgroup of samples, a position in that group of SNPs can be selected to be used as a defining SNP. Defining SNPs are found in the *define_filter.xlsx dependency file. You can locate this file by showing your reference type locations with the command: vsnp3_path_adder.py -s

There is additional information on adding a defining SNP here:
https://github.com/USDA-VS/vSNP/blob/master/docs/detailed_usage.md#adding-new-groups-or-subgroups

The final SNP alignment is not a core SNP alignment. The SNP alignments output for the designated groups only include those SNPs that are parsimony informative. When looking at a group, if the same SNP has occurred in all samples within that group, it will not be shown in the SNP table. vSNP was designed to show differences between datasets. As datasets increase and new outbreaks emerge, new defining SNPs are used to group samples into relatively small subsets so the focus can be on SNP changes specific to an outbreak.
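
The within-group filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not vSNP's implementation: the function name, the table layout (position mapped to per-sample alleles), and the sample names and positions are all hypothetical. It implements just the rule stated here, that a position where every sample in the group carries the same call is dropped.

```python
def drop_shared_positions(table):
    """Keep only positions where samples in the group differ.

    table maps position -> {sample_name: allele}. A position where all
    samples share one allele carries no within-group signal and is removed.
    """
    return {
        pos: calls
        for pos, calls in table.items()
        if len(set(calls.values())) > 1
    }


if __name__ == "__main__":
    # Hypothetical group of three samples at two positions.
    group = {
        1057: {"sample_A": "T", "sample_B": "T", "sample_C": "T"},
        2210: {"sample_A": "G", "sample_B": "A", "sample_C": "G"},
    }
    kept = drop_shared_positions(group)
    print(sorted(kept))
```

In this toy group, position 1057 is shared by all three samples and is removed, while position 2210 differs between samples and is kept.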

@butterbee
Author

Thank you very much for the clarification. It sounds like the SNP table can be used to identify SNPs that define subgroups or strains in a given dataset when mapped against the same reference genome. On the other hand, would it be possible to use the define filter xlsx to mask SNPs in certain regions, e.g. repeat regions?

I'm interested in building a maximum likelihood phylogeny from a core SNP alignment and computing the genetic distance between strains as SNP distance. Is there a way to use the VCF files (with zero coverage positions) generated in step1 to produce a core SNP alignment and, subsequently, a phylogenetic tree?
Regarding the VCF files with zero coverage, these files seem to contain a large number of variants. Any recommendations on filtering only the high-quality SNPs out of those VCF files?
Thanks!

@stuber
Contributor

stuber commented Jan 15, 2025 via email


2 participants