wrong haplotype frequencies #14

Closed
mcol opened this issue Jun 1, 2015 · 4 comments
Comments

@mcol
Collaborator

mcol commented Jun 1, 2015

I'm using the common.30 example as discussed in issue #11, and added some printouts in 20913ec (to activate, set debug_haplotype to true and recompile). The output contains the following:

Tallied up counts for each ref long haplotype group. There are 6 groups.
0010 count: 23
0001 count: 1
1010 count: 1
0000 count: 59
1000 count: 7
0110 count: 9
Sorted long haplotype groups by frequency. There are 6 sorted groups. 
0010 count: 1
1010 count: 1
0001 count: 1
0000 count: 1
1000 count: 1
0110 count: 1

Note how the counts are correct in the first block but are set to 1 in the second block. Further down the printout of haplotype frequencies is the following:

STORING 6 TRUE HAPLOTYPES AND FREQS:
0:0000,0.166667
1:0110,0.166667
2:1000,0.166667
3:0010,0.166667
4:1010,0.166667
5:0001,0.166667

So it seems that we are indeed losing the correct haplotype frequencies.
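To make the discrepancy concrete, here is a small sketch that turns the tallied counts from the first block into frequencies and compares them with what results when every count is reset to 1. The function name `frequencies` is made up for illustration and is not taken from the program's source.

```python
# Counts copied from the "Tallied up counts" block of the debug output.
counts = {"0010": 23, "0001": 1, "1010": 1, "0000": 59, "1000": 7, "0110": 9}


def frequencies(counts):
    """Normalise a {haplotype: count} mapping so the values sum to 1."""
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


# Frequencies implied by the tallied counts.
weighted = frequencies(counts)
# Frequencies obtained when every group count is 1, as in the sorted block.
flattened = frequencies({hap: 1 for hap in counts})

for hap in sorted(counts, key=counts.get, reverse=True):
    print(f"{hap}  tallied={weighted[hap]:.6f}  all-ones={flattened[hap]:.6f}")
```

With all counts equal to 1, each of the 6 groups gets 1/6, which matches the 0.166667 values in the STORING printout.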

@gchen98
Owner

gchen98 commented Jun 1, 2015

Marco

As a sanity check, I made an artificial GLF file: I took the first columns of common.30.hap and summed them. If geno==0 I output .98 .01 .01; if geno==1, .01 .98 .01; if geno==2, .01 .01 .98. I should recover a match to the first two haplotypes in the POSTERIORS file, and indeed this is what I saw. So in real life we should see some intermediate accuracy if only some of the 30 SNPs are informative. Let me know how this works out for you.
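The sanity check above can be sketched roughly as follows. This is only an illustration: the helper names `glf_triple` and `glf_from_haplotypes` are hypothetical, the two input haplotypes are placeholder strings rather than the actual contents of common.30.hap, and the real file layouts are not reproduced here.

```python
def glf_triple(geno, hi=0.98, lo=0.01):
    """Return the three genotype likelihoods for a genotype of 0, 1 or 2."""
    if geno not in (0, 1, 2):
        raise ValueError(f"genotype must be 0, 1 or 2, got {geno!r}")
    row = [lo, lo, lo]
    row[geno] = hi  # put most of the mass on the observed genotype
    return row


def glf_from_haplotypes(hap_a, hap_b):
    """Sum two 0/1 haplotypes allele by allele and emit one triple per SNP."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same SNPs")
    return [glf_triple(int(a) + int(b)) for a, b in zip(hap_a, hap_b)]


# Placeholder haplotypes (assumed input, one character per SNP).
for row in glf_from_haplotypes("0010", "0110"):
    print(" ".join(f"{p:.2f}" for p in row))
```

Each printed line holds the likelihoods for genotypes 0, 1 and 2 at one SNP, in that order.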

@gchen98
Owner

gchen98 commented Jun 2, 2015

Actually I think this is working right. In the second step the program builds sub-haplotypes from the intersection of the chip SNPs and the ref haplotype SNPs, and then computes the frequencies of those sub-haplotypes. Try this: copy your current bim and glf files and, in the copies, delete the second and third rows. Edit your settings XML to point to the new bim and glf, then look at the frequencies; they should no longer be uniform.
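A rough sketch of the intersection idea, under stated assumptions: the four digits of each haplotype in the debug output are taken to correspond to the four bim rows in order, and each distinct long haplotype group is counted once (as the all-ones counts in the sorted block suggest). Neither assumption is confirmed by the source, and `sub_haplotype_freqs` is a made-up name, not a function from the program.

```python
from collections import Counter


def sub_haplotype_freqs(haplotypes, keep, weights=None):
    """Project each haplotype onto the SNP positions in `keep` and
    return the relative frequency of every resulting sub-haplotype.

    If `weights` is None each haplotype counts once; otherwise it is a
    {haplotype: weight} mapping (an optional alternative, also assumed).
    """
    tally = Counter()
    for hap in haplotypes:
        sub = "".join(hap[i] for i in keep)
        tally[sub] += 1 if weights is None else weights[hap]
    total = sum(tally.values())
    return {sub: n / total for sub, n in tally.items()}


# The six long haplotype groups from the debug output.
groups = ["0010", "0001", "1010", "0000", "1000", "0110"]

# All four rows present: every group is its own sub-haplotype.
print(sub_haplotype_freqs(groups, keep=[0, 1, 2, 3]))

# Second and third rows deleted: only positions 0 and 3 remain.
print(sub_haplotype_freqs(groups, keep=[0, 3]))
```

Under these assumptions the first call gives 1/6 for each of the six groups, while the second collapses them into three sub-haplotypes with unequal frequencies.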

@gchen98
Owner

gchen98 commented Jun 2, 2015

The basic issue here is that I forgot that in Version 2.0 I made things easier, so the user no longer has to manually fill in .3 .3 .3 for the non-genotyped SNPs. They only need to fill in the rows in BIM and GLF for the SNPs that are on the chip, and the software is supposed to figure out the rest using the intersection idea for making sub-haplotypes that I described above.

@mcol
Collaborator Author

mcol commented Jun 2, 2015

Thanks for the explanation, this is reassuring.

@mcol mcol closed this as completed Jun 2, 2015