Uncertain Calls Despite Known Relationships #5

Open
xavierrocarada opened this issue Nov 25, 2024 · 2 comments

Comments

@xavierrocarada

Hi! I have been running correctKIN on previously published data, which includes over 60 ancient individuals from the UK, some of whom are related. I followed the "Extended Tool Documentation on the Usage of correctKIN Tools" and also performed my own pseudohaploidization process. However, most pairs are ultimately classified as uncertain, and I can only determine one 1st-degree and one 2nd-degree relationship pair, even though I know there are more relationships that have been confirmed using other software. Any chance you know what I might be doing wrong?

@zmaroti
Owner

zmaroti commented Nov 26, 2024

Hi,

The "uncertain" classification means that the corrected kinship coefficient falls within the variation of the corrected kinship distribution of unrelated individuals at the given SD. A 6SD certainty is an extremely strict criterion. In the manuscript we used the 6SD threshold for co-analysing ~2000 samples with very heterogeneous genome structure. This is overkill when analysing individuals for which the reference matches and all individuals have very similar genome structure.

Accordingly, if your test individuals are from the same population structure and you are sure that the reference is OK, you can lower the SD threshold for higher sensitivity. Even a 3SD threshold should be strict in such a case. The only drawback is that if some sample pairs have a specific population structure that is restricted to them (e.g. a 5% Han component in only 2-3 samples while the rest of the 60 samples have European-only genomes), then the IBS that is shared between these few samples will not be "regressed out" by the PCA. So it is crucial that you include some reference data that also represents this minor component. In that case it is valid to lower the SD threshold to separate related individuals from unrelated ones.

Even though the classification says "uncertain" at the given threshold, the corrected kinship coefficients are still reported as the most likely kinship coefficient estimates. The output also contains the 6SD threshold that corresponds to the overlapping marker fraction of each actual pair.
filterRelates also outputs some statistics: the mean, median, 95% confidence interval, and the applied SD threshold for each marker overlap fraction bin. These indicate the confidence limit of detection at a given marker overlap fraction in your data set (any corrected kinship coefficient below the 6SD threshold at the given marker overlap will be classified as uncertain).
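
To make the thresholding idea concrete, here is a minimal Python sketch. It is not correctKIN's actual code: the function names, the `mean + n_sd * SD` form of the threshold, and the numbers are all assumptions for illustration only. It shows how the same corrected coefficient can be "uncertain" at 6SD and yet pass at 3SD when it is compared against the unrelated distribution of one marker overlap bin.

```python
from statistics import mean, stdev


def sd_threshold(unrelated_coeffs, n_sd):
    """Assumed threshold form: mean + n_sd * SD of the unrelated pairs."""
    return mean(unrelated_coeffs) + n_sd * stdev(unrelated_coeffs)


def classify(corrected_coeff, unrelated_coeffs, n_sd=6.0):
    """Label a pair 'uncertain' if its coefficient does not exceed the threshold."""
    if corrected_coeff <= sd_threshold(unrelated_coeffs, n_sd):
        return "uncertain"
    return "related"


# Made-up corrected coefficients of unrelated pairs in one marker overlap bin.
unrelated = [-0.02, -0.01, 0.0, 0.01, 0.02]

# Hypothetical corrected coefficient of one test pair.
pair_coeff = 0.0625

for n_sd in (6.0, 3.0):
    print(n_sd, round(sd_threshold(unrelated, n_sd), 4),
          classify(pair_coeff, unrelated, n_sd))
```

With these made-up values the pair falls below the 6SD threshold but above the 3SD one, which is the sensitivity gain described above. The wider the unrelated distribution in a bin (for example at low marker overlap), the higher the threshold becomes.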

@xavierrocarada
Author

Thank you very much for your detailed response! To provide more context on the dataset I am working with: I have around 10 ancient individuals from the UK confirmed to be related and part of the same lineage using other kinship software, while the remaining ~50 individuals are also from the UK, share the same ancestry background, and are unrelated. All the data were generated using the same enrichment technology, targeting 1240k SNPs across the genome.

I lowered the SD threshold as you suggested, which did result in more kinship calls, but these were mostly for higher-degree relationships (4th and 5th degrees). However, the classifications for lower-degree relationships (1st, 2nd, and 3rd degrees), which are my primary focus, remained unchanged. Additionally, some relationships detected by other software were not identified, regardless of the SD threshold I used.

Do you have any additional recommendations or insights into why this might be happening?
