Less of an issue and more of a question: looking at the HLA genotypes reported by arcasHLA, I noticed that the output files contain no explicit confidence score. Instead, they report the percentage of reads explained and the abundance of each HLA allele. As far as I can tell, the explained-reads percentage is the closest thing to a confidence score, but I was wondering how it is calculated.
I have a few samples that have several HLA-A alleles. The abundances vary, but for the sake of example, the top alleles are:

| Allele | Abundance |
|---|---|
| A*02:844 | 50% |
| A*01:11N | 25% |
| A*01:37:01 | 13% |
| A*01:370 | 10% |

Meanwhile, for this locus, the top 10 are all variants of HLA-A2. However, the most likely genotype combines the top allele with the third most abundant one, [A*02:844, A*01:37:01], which explains about 97% of the reads. Why is this the case? Taken at face value, wouldn't the top two alleles by abundance explain most of the reads, or am I misunderstanding what the abundance represents? Also, with the top ten alleles all being HLA-A2, why isn't HLA-A2 the only allele present? In other words, how can we be confident that there are two different alleles? Finally, why would it return the top four alleles by abundance when each locus should have only two? I can understand a classification error, but again, the allele with the second-highest abundance isn't considered in the top-scoring allele pair.
One final note: in every case, the percentages for the two alleles are very close, differing by at most 1%. These may be naive questions, but I am having trouble interpreting these percentages and the output overall, and the previous publications are vague, specifically about the explained-reads percentage and the ambiguity in the abundances.
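A toy sketch may help frame the abundance-versus-explained-reads question. It rests on two assumptions that are not confirmed by arcasHLA's documentation: that abundances come from an EM step that splits multi-mapping reads among compatible alleles, and that a pair's "explained" percentage is the fraction of reads compatible with at least one allele in the pair. The read classes, counts, and function names (`em_abundances`, `explained_fraction`, `best_pair`) are all hypothetical. Under those assumptions, an allele can rank second by abundance because it shares many reads with the top allele, yet add little new coverage when paired with it.

```python
from itertools import combinations

# Hypothetical toy data (invented counts): each key is the set of alleles a read
# is compatible with, similar to a pseudoalignment equivalence class.
READ_CLASSES = {
    frozenset({"A*02:844"}): 30,
    frozenset({"A*02:844", "A*01:11N"}): 45,
    frozenset({"A*01:11N"}): 10,
    frozenset({"A*01:37:01"}): 15,
}


def em_abundances(classes, n_iter=200):
    """Estimate allele abundances by EM, splitting each shared read class
    among its alleles in proportion to the current abundance estimates."""
    alleles = sorted(set().union(*classes))
    total = sum(classes.values())
    theta = {a: 1.0 / len(alleles) for a in alleles}  # uniform starting point
    for _ in range(n_iter):
        counts = dict.fromkeys(alleles, 0.0)
        for cls, n in classes.items():
            norm = sum(theta[a] for a in cls)
            if norm == 0:
                continue
            for a in cls:
                counts[a] += n * theta[a] / norm  # E-step: fractional assignment
        theta = {a: c / total for a, c in counts.items()}  # M-step: renormalize
    return theta


def explained_fraction(pair, classes):
    """Fraction of reads compatible with at least one allele in the pair."""
    total = sum(classes.values())
    covered = sum(n for cls, n in classes.items() if cls & set(pair))
    return covered / total


def best_pair(classes):
    """Heterozygous pair that explains the largest fraction of reads."""
    alleles = sorted(set().union(*classes))
    return max(combinations(alleles, 2), key=lambda p: explained_fraction(p, classes))


if __name__ == "__main__":
    abundances = em_abundances(READ_CLASSES)
    for allele, ab in sorted(abundances.items(), key=lambda kv: -kv[1]):
        print(f"{allele}\t{ab:.1%}")
    pair = best_pair(READ_CLASSES)
    print(pair, f"{explained_fraction(pair, READ_CLASSES):.0%}")
```

With these invented counts, A*01:11N (about 21%) outranks A*01:37:01 (15%) by abundance, but pairing A*02:844 with A*01:37:01 explains 90% of reads versus 85% with A*01:11N, because most A*01:11N reads are also compatible with A*02:844. The sketch does not model how arcasHLA decides between a homozygous and a heterozygous call.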