strange behaviour on a test example #11

mcol · 2015-05-28T14:54:35Z

In bd46c4e I've committed a small test example to understand why we always seem to get equiprobable posteriors (0.333 0.333 0.333). Varying the flanking_snps option to 6 or more (maximum allowed is 10 here), produces those posteriors. For 5 or less, we start seeing different values.

To complicate matters, the same example produces different results according to the machine I'm using. Building with the debug_posterior set to true, on one machine I get this output:

CPU_POSTERIOR: 0 0 0.444444 0.444444 0.111111
CPU_POSTERIOR: 1 0 0.694444 0.277778 0.0277778

and on another I get

CPU_POSTERIOR: 0 0 0.333333 0.333333 0.333333
CPU_POSTERIOR: 1 0 0.333333 0.333333 0.333333

Could you have a try to see what's going on?

The text was updated successfully, but these errors were encountered:

mcol · 2015-05-28T17:30:40Z

Looking at the code, it seems that a glf should only contain the genotype likelihoods without the first 6 columns of snp information. Is that correct?

Removing those, now both machines get the same results, with posteriors set to (0.333 0.333 0.333) for all snps (independently of the number of flanking snps), which is what we see also on real data. We would expect that, since the genotype likelihoods are all zero, then the posteriors should reflect the prior (haplotype frequencies in the reference panel), and those are not equiprobable. It's almost as if the reference panel is actually not considered, or something else is going on.

mcol · 2015-05-29T09:32:09Z

Ok, I've now fixed the glf file, but as I said, the result still doesn't look correct. This should be considered as a small test case for what the results we are getting on real data, so until we understand this, we are stuck.

mcol · 2015-05-29T11:06:01Z

The problem is related to the fact that I put 0s in the glf file: the reason for this is that the genotype likelihoods are missing. What is the correct way of specifying missingness?

gchen98 · 2015-05-29T13:18:02Z

I apologize. The variable naming was a mess. It really should have been called g_log_penetrance, not g_snp_penetrance. Hence, for the GLF input, penetrance for a missing genotype in probability space is .33 .33 .33 and the log penetrance is -1.09 -1.09 -1.09. Probabilities always add to one. So you can either edit the code to read in things in probability space or log space. Right now it assumes input is in log space.

gchen98 · 2015-05-29T13:19:18Z

Here is in indication of what I am getting at:

[garyc@ssb202q-1 src]$ find . -name '.cpp'|xargs grep g_snp_penetrance
./init/denovo_glf.cpp: io_manager->read_input(g_snp_penetrance,this->g_people,this->g_snps);
./init/guide_glf.cpp: io_manager->read_input(ref_haplotype,g_snp_penetrance,this->informative_snp,this->g_people,this->g_snps, this->ref_haplotypes);
./init/guide_glf_opencl.cpp: writeToBuffer(buffer_snp_penetrance,g_people_geno_dim_g_snps,g_snp_penetrance,"buffer_snp_penetrance");
./init/denovo_glf_opencl.cpp: writeToBuffer(buffer_snp_penetrance,g_people_geno_dim_g_snps,g_snp_penetrance,"buffer_snp_penetrance");
./penetrance/denovo_glf.cpp: logpenetrance_new[parent] = g_snp_penetrance[i(geno_dim*
./penetrance/denovo_glf.cpp: logpenetrance_old[parent] = g_snp_penetrance[i_(geno_dim_
./penetrance/denovo_glf.cpp: float logpenetrance_new = g_snp_penetrance[i_(geno_dim_
./penetrance/denovo_glf.cpp: logpenetrance_old = g_snp_penetrance[i_(geno_dim_
./penetrance/guide_glf.cpp: logpenetrance[parent]+=g_snp_penetrance[i*
./penetrance/guide_glf.cpp: g_snp_penetrance[i*
./penetrance/guide_glf.cpp: logpenetrance+=g_snp_penetrance[i_(geno_dim_g_snps)+geno_dim*(g_left_marker+extended_snp_mapping[l]) + n];
[garyc@ssb202q-1 src]$

gchen98 · 2015-05-29T17:54:40Z

Be sure the input matrix of the GLF file is of dimension TOTAL_SNPS rows and TOTAL_PERSONS * 3 columns. Thanks.

mcol · 2015-06-01T10:14:31Z

Even if I replace all the 0s in the glf file with -1.09, I still get that all posteriors are set to .333. If instead I replace them with a positive value (say 1, or 10), I get more reasonable looking values. So something's not quite right with the handling of missingness.

gchen98 · 2015-06-01T19:27:33Z

I just went through the code. I forgot that it was probabilities after all, and not log-prob. That makes it easier at least for the user. Also, the method will need to have some information that resembles the template haplotypes. i.e. some of the SNPs should at least match the true template haplotype pair's genotypes. With .3 for everything, I see that I get equal probability for any given template haplotype pair. Can you provide an updated common.30.glf with more informative genotype likelihoods?

mcol · 2015-06-01T21:39:28Z

Ok, but if the SNPs are missing, shouldn't the posterior reflect the haplotype frequencies? Those are not equiprobable, so I think that the solution you are getting is still incorrect.

gchen98 · 2015-06-01T22:08:25Z

That's strange. I just tested with a new GLF file and these are excerpts
from my input and output:

[garyc@ssb202q-1 tests]$ head uninf.30.glf POSTERIORS

==> uninf.30.glf <==
.33 .33 .33
.33 .33 .33
.33 .33 .33
.33 .33 .33
.33 .33 .33
.33 .33 .33
.33 .33 .33
.33 .33 .33
.33 .33 .33
.33 .33 .33

==> POSTERIORS <==
0 0.444444 0.444444 0.111111
1 0.694444 0.277778 0.0277778
2 0.183673 0.489796 0.326531
3 0.734694 0.244898 0.0204082
4 0.734694 0.244898 0.0204082
5 0.734694 0.244898 0.0204082
6 0.25 0.5 0.25
7 1 1e-10 1e-10
8 1 1e-10 1e-10
9 0.25 0.5 0.25
[garyc@ssb202q-1 tests]$

On 06/01/2015 02:39 PM, Marco Colombo wrote:

Ok, but if the SNPs are missing, shouldn't the posterior reflect the
haplotype frequencies? Those are not equiprobable, so I think that the
solution you are getting is still incorrect.

—
Reply to this email directly or view it on GitHub
#11 (comment).

mcol · 2015-06-02T13:38:44Z

Yes, I get the same values. I've fixed the example to have .333 everywhere instead of 0s (2e6ac8a).

mcol mentioned this issue Jun 1, 2015

wrong haplotype frequencies #14

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

strange behaviour on a test example #11

strange behaviour on a test example #11

mcol commented May 28, 2015

mcol commented May 28, 2015

mcol commented May 29, 2015

mcol commented May 29, 2015

gchen98 commented May 29, 2015

gchen98 commented May 29, 2015

gchen98 commented May 29, 2015

mcol commented Jun 1, 2015

gchen98 commented Jun 1, 2015

mcol commented Jun 1, 2015

gchen98 commented Jun 1, 2015

mcol commented Jun 2, 2015

strange behaviour on a test example #11

strange behaviour on a test example #11

Comments

mcol commented May 28, 2015

mcol commented May 28, 2015

mcol commented May 29, 2015

mcol commented May 29, 2015

gchen98 commented May 29, 2015

gchen98 commented May 29, 2015

gchen98 commented May 29, 2015

mcol commented Jun 1, 2015

gchen98 commented Jun 1, 2015

mcol commented Jun 1, 2015

gchen98 commented Jun 1, 2015

mcol commented Jun 2, 2015