strange behaviour on a test example #11
Comments
Looking at the code, it seems that a GLF file should contain only the genotype likelihoods, without the first 6 columns of SNP information. Is that correct? After removing those columns, both machines give the same results, with posteriors set to (0.333 0.333 0.333) for all SNPs (independently of the number of flanking SNPs), which is also what we see on real data. Since the genotype likelihoods are all zero, we would expect the posteriors to reflect the prior (the haplotype frequencies in the reference panel), and those are not equiprobable. It's almost as if the reference panel is not considered at all, or something else is going on.
Ok, I've now fixed the GLF file, but as I said, the result still doesn't look correct. This is meant as a small test case for the results we are getting on real data, so until we understand this, we are stuck.
The problem is related to the 0s I put in the GLF file. I used them because the genotype likelihoods are missing. What is the correct way of specifying missingness?
I apologize. The variable naming was a mess. It really should have been called g_log_penetrance, not g_snp_penetrance. Hence, for the GLF input, the penetrance for a missing genotype in probability space is .33 .33 .33 and the log penetrance is -1.09 -1.09 -1.09. Probabilities always add to one. You can edit the code to read the input in either probability space or log space. Right now it assumes the input is in log space.
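To illustrate the two conventions described in this comment, here is a minimal Python sketch of how a missing genotype could be encoded in probability space and in log space. It is not taken from the project's code, and the function name `missing_penetrance` is hypothetical. It does not settle which space the program actually reads.

```python
import math

def missing_penetrance(log_space=False):
    """Return an uninformative triple for the three genotypes at one SNP.

    Hypothetical helper for illustration only, not part of the project.
    """
    p = 1.0 / 3.0  # equal weight on each genotype, so the triple sums to one
    value = math.log(p) if log_space else p
    return [value, value, value]

prob = missing_penetrance()                # probability space
logp = missing_penetrance(log_space=True)  # log space, about -1.0986 each

print(prob)
print(logp)
```

Exponentiating the log-space triple recovers the probability-space triple, so both encodings carry the same (lack of) information.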
Here is an indication of what I am getting at: [garyc@ssb202q-1 src]$ find . -name '*.cpp'|xargs grep g_snp_penetrance
Be sure the input matrix in the GLF file has TOTAL_SNPS rows and TOTAL_PERSONS * 3 columns. Thanks.
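The dimension requirement above can be checked before running the program. The following Python sketch assumes the GLF file is whitespace-delimited numeric text, which this thread does not confirm. The names `parse_glf` and `check_glf_shape` are hypothetical.

```python
def parse_glf(text):
    """Parse whitespace-delimited numeric text into a list of rows.

    Assumption: one SNP per line, values separated by spaces or tabs.
    """
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]

def check_glf_shape(matrix, total_snps, total_persons):
    """Raise ValueError unless matrix is total_snps x (total_persons * 3)."""
    expected_cols = total_persons * 3  # three genotype values per person
    if len(matrix) != total_snps:
        raise ValueError(
            f"expected {total_snps} rows, found {len(matrix)}")
    for i, row in enumerate(matrix, start=1):
        if len(row) != expected_cols:
            raise ValueError(
                f"row {i}: expected {expected_cols} columns, found {len(row)}")
    return True

# Two SNPs, one person: a 2 x 3 matrix.
example = "0.333 0.333 0.333\n0.333 0.333 0.333\n"
check_glf_shape(parse_glf(example), total_snps=2, total_persons=1)
```

A file that still carries extra leading columns of SNP information would fail this check on its first row, because each row would be wider than TOTAL_PERSONS * 3.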
Even if I replace all the 0s in the GLF file with -1.09, I still get that all posteriors are set to .333. If instead I replace them with a positive value (say 1, or 10), I get more reasonable looking values. So something's not quite right with the handling of missingness.
I just went through the code. I forgot that the input is in probabilities after all, not log-probabilities. That at least makes it easier for the user. Also, the method needs some information that resembles the template haplotypes, i.e. some of the SNPs should at least match the genotypes of the true template haplotype pair. With .3 for everything, I get equal probability for every template haplotype pair. Can you provide an updated common.30.glf with more informative genotype likelihoods?
Ok, but if the SNPs are missing, shouldn't the posterior reflect the haplotype frequencies? Those are not equiprobable, so I think that the solution you are getting is still incorrect.
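The expectation stated here follows from Bayes' rule: when the likelihood is the same for every genotype, it cancels in the normalization and the posterior equals the prior. The Python sketch below shows this for a single SNP. The prior values are hypothetical (Hardy-Weinberg proportions for an allele frequency of 0.2) and are not taken from the reference panel. The sketch says nothing about how the program itself combines the panel with the likelihoods.

```python
def genotype_posterior(prior, likelihood):
    """Posterior over three genotypes: prior times likelihood, normalized."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        # An all-zero likelihood leaves nothing to normalize.
        raise ValueError("likelihood is zero for every genotype")
    return [u / total for u in unnormalized]

# Hypothetical prior: (1 - q)^2, 2q(1 - q), q^2 with q = 0.2.
prior = [0.64, 0.32, 0.04]

# Uninformative (missing) likelihood: equal weight on each genotype.
flat = [1.0 / 3.0] * 3

posterior = genotype_posterior(prior, flat)
print(posterior)  # matches the prior up to floating-point rounding
```

Under this reasoning, a posterior of (0.333 0.333 0.333) would only be expected if the prior itself were uniform.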
That's strange. I just tested with a new GLF file, and these are excerpts:
[garyc@ssb202q-1 tests]$ head uninf.30.glf POSTERIORS
==> uninf.30.glf <==
==> POSTERIORS <==
On 06/01/2015 02:39 PM, Marco Colombo wrote:
Yes, I get the same values. I've fixed the example to have .333 everywhere instead of 0s (2e6ac8a).
In bd46c4e I've committed a small test example to understand why we always seem to get equiprobable posteriors (0.333 0.333 0.333). Setting the flanking_snps option to 6 or more (the maximum allowed here is 10) produces those posteriors. For 5 or fewer, we start seeing different values.
To complicate matters, the same example produces different results depending on the machine I'm using. Building with debug_posterior set to true, on one machine I get this output:
and on another I get
Could you take a look to see what's going on?