Potentially a serious issue with Eigenstrat to Plink conversions #76
In Admixtools you should think of column 5 as the count allele, not necessarily the reference allele.
Similarly, PLINK also uses this column as the count allele, so an individual homozygous for the count allele has genotype 2.
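The count-allele convention can be made concrete. Each Eigenstrat .geno entry (and, per this comment, the PLINK genotype code) is the number of copies of the column-5 allele, with 9 for missing. Swapping columns 5 and 6 is therefore only safe if every genotype g is recoded as 2 - g at the same time. A minimal Python sketch, assuming genotypes are already parsed into integers; `flip_genotype` and `flip_snp` are hypothetical helper names:

```python
MISSING = 9  # Eigenstrat .geno uses 9 for a missing genotype


def flip_genotype(g):
    """Recode one genotype after swapping the count allele (column 5) with column 6.

    g counts copies of the column-5 allele (0, 1 or 2); missing stays missing.
    """
    if g == MISSING:
        return MISSING
    if g not in (0, 1, 2):
        raise ValueError(f"unexpected genotype code: {g}")
    return 2 - g


def flip_snp(alleles, genotypes):
    """Swap the two allele columns of a SNP record and recode its genotypes to match."""
    a5, a6 = alleles
    return (a6, a5), [flip_genotype(g) for g in genotypes]


if __name__ == "__main__":
    # One SNP, three individuals: homozygous G, heterozygous, missing
    print(flip_snp(("G", "A"), [2, 1, 9]))  # (('A', 'G'), [0, 1, 9])
```

Applying `flip_snp` twice returns the original record, so a consistent swap of alleles and genotypes loses nothing; swapping only the allele columns is what silently inverts the data.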
If you then use PLINK merge with datasets where other alleles were chosen as the count allele, there may be trouble.
(I am not a PLINK expert.)
A trick: if possible, always include the human reference in a genotype dataset.
It makes sorting out these troubles much easier.
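One way to apply this trick, assuming the dataset includes a pseudo-individual that carries the human reference allele at every site (a hypothetical setup): that individual's genotype shows which column holds the reference, and the SNP can be flipped when needed. A minimal sketch; `orient_by_reference_sample` and `ref_index` are hypothetical names:

```python
MISSING = 9  # Eigenstrat .geno code for a missing genotype


def orient_by_reference_sample(a5, a6, genotypes, ref_index):
    """Put the human-reference allele in column 5, using a reference pseudo-sample.

    genotypes: copies of the column-5 allele per individual (0/1/2, 9 = missing).
    ref_index: position of the (hypothetical) reference pseudo-sample, which
    should be homozygous: 2 if column 5 is the reference allele, 0 if column 6 is.
    Returns None when the pseudo-sample is missing or heterozygous.
    """
    ref_g = genotypes[ref_index]
    if ref_g == 2:  # column 5 already holds the reference allele
        return a5, a6, list(genotypes)
    if ref_g == 0:  # reference allele sits in column 6: swap and recode
        return a6, a5, [g if g == MISSING else 2 - g for g in genotypes]
    return None  # 1 or 9: cannot tell, inspect this SNP by hand


if __name__ == "__main__":
    # Last individual is the reference pseudo-sample, homozygous for A here
    print(orient_by_reference_sample("G", "A", [2, 1, 9, 0], ref_index=3))
    # ('A', 'G', [0, 1, 9, 2])
```

Returning None instead of guessing keeps ambiguous sites visible; a heterozygous "reference" genotype usually means something upstream went wrong.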
By the way, PLINK often chooses the count allele as the "majority allele", guaranteeing that different datasets use different conventions.
I do not consider this an Eigenstrat bug; it would be a bug if convertf Eigenstrat -> PLINK -> Eigenstrat were not the identity map.
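The identity-map criterion can be checked mechanically once both sets of Eigenstrat files are loaded. A minimal sketch, assuming six-column .snp files and text (not packed) .geno files read as lists of lines; the function names are hypothetical and convertf itself is not called here:

```python
def parse_snp_line(line):
    """Split a six-column Eigenstrat .snp line into typed fields."""
    snp_id, chrom, genpos, physpos, a5, a6 = line.split()
    # Genetic position as a float, so 0.02013 and 0.020130 compare equal
    return snp_id, chrom, float(genpos), int(physpos), a5, a6


def is_identity_roundtrip(snp_before, snp_after, geno_before, geno_after):
    """True if a round trip left every SNP record and genotype row unchanged.

    snp_*: lists of .snp lines; geno_*: lists of Eigenstrat .geno lines
    (one line per SNP, one character per individual).
    """
    if len(snp_before) != len(snp_after) or len(geno_before) != len(geno_after):
        return False
    for a, b in zip(snp_before, snp_after):
        if parse_snp_line(a) != parse_snp_line(b):
            return False  # id, position or allele columns changed
    return all(g1.strip() == g2.strip() for g1, g2 in zip(geno_before, geno_after))
```

The inputs could come from, for example, `open(path).read().splitlines()` on the original and round-tripped files. Chromosome labels are compared as strings, so a round trip that rewrote a label (say X as 23) would be reported as a difference.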
Nick
In reply to geneanalyst (Thu, Apr 1, 2021, 6:40 PM).
I'm not referring to the .geno or .bed files, where the count alleles are stored. I'm referring to the .snp and .bim files. For example, in the .bim:
1 rs3094315 0.02013 752566 G A
The trouble is that Admixtools looks at this .snp and considers G as REF and A as ALT in the first row. So if this converted Plink file is merged with other Plink files that did not originate as Eigenstrat, some positions will have the correct REF/ALT and others will have REF/ALT switched. Does this present any issues when the data are converted back to Eigenstrat? In other words, does it matter to Admixtools whether some positions have the correct REF/ALT and others have REF/ALT reversed?
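For reference, the .bim row quoted above has six whitespace-separated fields: chromosome, SNP id, genetic position, base-pair position, then PLINK's allele 1 (col 5) and allele 2 (col 6). The row itself carries no REF/ALT flag, so which column holds the reference depends on how the file was produced. A minimal parser; `BimRow` and `parse_bim_line` are hypothetical names:

```python
from typing import NamedTuple


class BimRow(NamedTuple):
    chrom: str
    snp_id: str
    genpos: float
    physpos: int
    col5: str  # PLINK allele 1
    col6: str  # PLINK allele 2


def parse_bim_line(line: str) -> BimRow:
    """Split one whitespace-delimited PLINK .bim line into its six fields."""
    chrom, snp_id, genpos, physpos, a1, a2 = line.split()
    return BimRow(chrom, snp_id, float(genpos), int(physpos), a1, a2)


if __name__ == "__main__":
    row = parse_bim_line("1 rs3094315 0.02013 752566 G A")
    print(row.snp_id, row.col5, row.col6)  # rs3094315 G A
```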
This issue affects conversions from Eigenstrat to Plink using convertf and par.PED.PACKEDPED. As you're aware, the Eigenstrat .snp file follows the VCF convention of listing the REF allele (col 5) before the ALT allele (col 6).
However, the Plink .bim file does the opposite: ALT is listed in col 5 and REF in col 6.
It seems that convertf is not aware of this Plink .bim convention, because after I convert files from Eigenstrat .geno to Plink .bed using par.PED.PACKEDPED, ALT is still listed in col 6 and REF in col 5 of the .bim file obtained from the .snp.
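For context, a convertf parameter file for this kind of Eigenstrat-to-PACKEDPED conversion, modelled on the par.PED.PACKEDPED example, might look roughly like the sketch below. The file names are hypothetical, and it assumes convertf recognises the Eigenstrat input from the input files themselves; it would be run as `convertf -p <parfile>`.

```
genotypename:    data.geno
snpname:         data.snp
indivname:       data.ind
outputformat:    PACKEDPED
genotypeoutname: data.bed
snpoutname:      data.bim
indivoutname:    data.fam
```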
So when this file is merged with other Plink files, the final merged file is thoroughly mixed up, with REF in col 5 at some positions and in col 6 at others. This may not cause issues in Plink for minor allele frequency calculations, but once the data are converted back to Eigenstrat .geno, it may lead to flawed downstream analysis.
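A mix like the one described can be measured directly if a per-SNP reference allele is available. A minimal sketch; `classify_orientation` is a hypothetical helper, and `ref_alleles` is an assumed lookup table (SNP id -> reference allele) that neither convertf nor PLINK produces:

```python
def classify_orientation(bim_lines, ref_alleles):
    """Tally where the reference allele sits in each row of a (merged) .bim file.

    bim_lines: .bim lines (chrom, id, genetic pos, bp, column 5, column 6).
    ref_alleles: hypothetical dict mapping SNP id -> reference allele.
    """
    counts = {"ref_in_col5": 0, "ref_in_col6": 0, "no_match": 0, "unknown_snp": 0}
    for line in bim_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        snp_id, a5, a6 = fields[1], fields[4], fields[5]
        ref = ref_alleles.get(snp_id)
        if ref is None:
            counts["unknown_snp"] += 1
        elif ref == a5:
            counts["ref_in_col5"] += 1
        elif ref == a6:
            counts["ref_in_col6"] += 1
        else:
            counts["no_match"] += 1  # e.g. opposite strand or different alleles
    return counts


if __name__ == "__main__":
    merged = ["1 snp1 0 1000 G A", "1 snp2 0 2000 T C", "1 snp3 0 3000 A C"]
    refs = {"snp1": "G", "snp2": "C", "snp3": "G"}
    print(classify_orientation(merged, refs))
    # {'ref_in_col5': 1, 'ref_in_col6': 1, 'no_match': 1, 'unknown_snp': 0}
```

Nonzero counts in both of the first two buckets indicate the kind of mix described here; `no_match` rows need separate inspection.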
My question: would it cause any issues for the Admixtools code if REF is in col 5 at some positions and in col 6 at others (and, correspondingly, ALT is in col 6 at some positions and in col 5 at others)?
I'm quite certain this has gone unnoticed by most researchers converting files back and forth.