Uncertain Calls Despite Known Relationships #5
Comments
Hi, the "uncertain" classification means that the corrected kinship coefficient falls within the variation of the corrected kinship distribution of unrelated individuals at the given SD. The 6SD certainty is an extremely strict criterion. In the manuscript we used the 6SD threshold for co-analysing ~2000 samples with very heterogeneous genome structure. This is overkill for analysing individuals where the reference matches and all individuals have very similar genome structure. Accordingly, if your test individuals come from the same population structure and you are sure that the reference is OK, you can lower the SD threshold for higher sensitivity. Even a 3SD threshold should be strict in such a case. The only drawback is that if a few sample pairs have specific population structure that is restricted to them (e.g. a 5% Han component in only 2-3 samples while all the rest of the 60 samples have European-only genomes), then the IBS that is shared between these few samples will not be "regressed out" by the PCA. So it is crucial that you include some reference data that also represents this minor component. In this case it is valid to lower the SD threshold for distinguishing related individuals from unrelated individuals. Even though the classification says "uncertain" at the given threshold, the corrected kinship coefficients are still estimated as the most likely kinship coefficients. The output also contains the 6SD threshold that is representative of the overlapping marker fraction that exists between the actual pairs.
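To make the effect of the SD threshold concrete, here is a minimal Python sketch of the idea described above. It is not correctKIN's actual code: the function name `classify_pairs`, the pair labels, and all coefficient values are made up for illustration, and it assumes a single global cutoff of mean + n x SD over the unrelated distribution, whereas the tool reports a threshold that depends on the overlapping marker fraction of each pair.

```python
import statistics

def classify_pairs(corrected_kin, unrelated_kin, n_sd=6.0):
    """Label pairs against a cutoff of mean + n_sd * SD of the unrelated pairs.

    Hypothetical helper for illustration only; not part of correctKIN.
    """
    mu = statistics.mean(unrelated_kin)
    sd = statistics.stdev(unrelated_kin)
    cutoff = mu + n_sd * sd
    labels = {
        pair: ("related" if coeff > cutoff else "uncertain")
        for pair, coeff in corrected_kin.items()
    }
    return labels, cutoff

# Made-up corrected kinship coefficients of pairs assumed to be unrelated.
unrelated = [0.00, 0.01, -0.01, 0.02, -0.02, 0.005, -0.005, 0.015, -0.015, 0.0]

# Made-up corrected kinship coefficients of the pairs under test.
pairs = {"A-B": 0.25, "C-D": 0.06, "E-F": 0.01}

labels_6sd, cutoff_6sd = classify_pairs(pairs, unrelated, n_sd=6.0)
labels_3sd, cutoff_3sd = classify_pairs(pairs, unrelated, n_sd=3.0)

for pair in pairs:
    print(pair, pairs[pair], labels_6sd[pair], labels_3sd[pair])
```

With these toy numbers, lowering the threshold from 6SD to 3SD only changes the label of the borderline pair; the coefficient itself is the same in both runs, which is why an "uncertain" label does not invalidate the estimated coefficient.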
Thank you very much for your detailed response! To provide more context on the dataset I am working with: I have around 10 ancient individuals from the UK confirmed to be related and part of the same lineage using other kinship software, while the remaining ~50 individuals are also from the UK, share the same ancestry background, and are unrelated. All the data were generated using the same enrichment technology, targeting 1240k SNPs across the genome. I lowered the SD threshold as you suggested, which did result in more kinship calls, but these were mostly for higher-degree relationships (4th and 5th degrees). However, the classifications for lower-degree relationships (1st, 2nd, and 3rd degrees), which are my primary focus, remained unchanged. Additionally, some relationships detected by other software were not identified, regardless of the SD threshold I used. Do you have any additional recommendations or insights into why this might be happening?
Hi! I have been running correctKIN with previously published data, which includes over 60 ancient individuals from the UK, some of whom are related. I have followed the "Extended Tool Documentation on the Usage of correctKIN Tools" and also performed my own pseudohaploidization process. However, most pairs are ultimately classified as uncertain, and I can only determine one 1st-degree and one 2nd-degree relationship pair, even though I know there are more relationships that have been confirmed using other software. Any chance you know what I might be doing wrong?