Alias not mapped if gene is correct #14
Comments
If I understand correctly, AAVS1 is a virus integration site within the PPP1R12C gene, and Cpamd8/Mug2 are both synonyms for a protein-coding gene. In the AAVS1/PPP1R12C case they are different things, and don't need to be "corrected" to show both. In the second case, my reading of https://www.alliancegenome.org/gene/MGI:99836 is that both are correct, so expansion would be informational but not necessary. If I'm right about these cases, I would be inclined to keep the current default behavior. But you could add a flag like |
@lwaldron Yes, it makes sense. I also prefer the current behavior: as long as the input is an approved symbol, providing additional aliases seems to be an unnecessary complication. It would be fine to add a |
Yes, this makes sense for the cases that I have presented. I was just worried about other such cases if present and also as the second reviewer mentioned, it could be that an older dataset is referring to an alias of the gene which is now also an approved symbol. In any case, |
Hi, I am quite the n00b around here, so first of all, please accept my apologies if I am writing in the wrong place. I don't think I should open a new issue, as my question is related to this one. Thanks in advance!
EDIT: After working a little more with the package, I may have found a real issue. When a list of symbols contains two entries for "2-Mar", they can stand for MARC2 or MARCH2, whose currently approved HGNC symbols are MTARC2 and MARCHF2, respectively. However, findExcelGeneSymbols() maps both only to MTARC2.
EDIT2: After further inspection, I found that the output gene list of findExcelGeneSymbols() actually changes only the first "2-Mar" to MTARC2; the other one remains "2-Mar".... |
Just a quick reply in generalities; I can provide more specific code if
helpful. There isn't a 1:1 relationship between aliased and current
symbols, but if you are not too concerned about losing some genes, you might
choose to eliminate genes that mapped to multiple symbols (those with "///"
in the Suggested.Symbol column). Or you might split the "///" onto multiple
rows, so that an ambiguous original symbol gets split into all of its
possible symbols. Finally, if you know which chromosome the genes are
on, providing the chromosome argument can help resolve those ambiguities.
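
A rough base-R sketch of the first two options, in case it helps. The `mapped` data frame below is typed in by hand to mimic the shape of `checkGeneSymbols()` output (columns `x`, `Approved`, `Suggested.Symbol`); the object names are made up for illustration and nothing here calls the package itself.

```r
# Hand-built stand-in for checkGeneSymbols() output (illustrative only).
mapped <- data.frame(
  x = c("2-Mar", "TP53"),
  Approved = c(FALSE, TRUE),
  Suggested.Symbol = c("MARCHF2 /// MTARC2", "TP53"),
  stringsAsFactors = FALSE
)

# Option 1: drop every row whose suggestion is ambiguous (contains "///").
is_ambiguous <- grepl("///", mapped$Suggested.Symbol, fixed = TRUE)
unambiguous <- mapped[!is_ambiguous, ]

# Option 2: split "///" so each candidate symbol gets its own row.
pieces <- strsplit(mapped$Suggested.Symbol, "\\s*///\\s*")
expanded <- data.frame(
  x = rep(mapped$x, lengths(pieces)),
  Approved = rep(mapped$Approved, lengths(pieces)),
  Suggested.Symbol = unlist(pieces),
  stringsAsFactors = FALSE
)
```

Option 1 loses the ambiguous genes entirely, while option 2 keeps every candidate at the cost of duplicating the original symbol across rows, so which one fits depends on how the downstream analysis treats duplicates.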
|
I see what you mean @Rendan86:
> checkGeneSymbols("2-Mar")
Maps last updated on: Thu Mar 25 08:36:49 2021
      x Approved   Suggested.Symbol
1 2-Mar    FALSE MARCHF2 /// MTARC2
Warning messages:
1: In checkGeneSymbols("2-Mar") :
Human gene symbols should be all upper-case except for the 'orf' in open reading frames. The case of some letters was corrected.
2: In checkGeneSymbols("2-Mar") : x contains non-approved gene symbols
> findExcelGeneSymbols("2-Mar")
[1] "MTARC2"
Warning message:
In findExcelGeneSymbols("2-Mar") :
Transmogrified gene symbols found. Returning the following corrections: 2-Mar to MTARC2
>
|
Example
According to me the desired behaviour for a unit test should be this:
I can apply the fix if you think this issue needs fixing. We'll also need to change some unit tests for this to apply.