-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Translation Issue arising from isambiguous(DNA_Gap) == false
#277
Comments
Dear @chelate Thank you for the bug report. However, the translation of gap nucleotides was definitely a bug. I've fixed this in #278 , after which trying to translate a sequence with gaps will throw an exception. There are two alternative behaviours:
Implementing it as throwing an error is more conservative in the sense that it'll always be legal to change it to a non-throwing behaviour later. |
Sorry if I'm annoying, but I have some input on this issue. I agree now that "-" is not ambiguous. It would make much more sense to translate "---" to AA_Gap. I'd argue that "---" is how you'd express AA_Gap in standard code. If you are working with nt and align them in frame, these gaps will show up as "---" and there's really no other way to express them. These are the ubiquitous result of aligning to reference sequences for variable length proteins. The only ambiguous thing is when your alignment assigns some nucleotides to dangling out of frame like "--G". In my case this has happened and that's how I found this bug. Erroring then is reasonable (, but conservative and annoying in practice). I'd prefer AA_X with a warning like "translating out-of-frame alignment as an ambiguous amino acid AA_X" it just means that somebody allowed out of frame alignments upstream in the pipeline from me, or that there was an insertion at the PCR level, depends what the pipeline looks like. Sorry I don't know how to use git or I'd pr this myself. shame |
So I would expect
however, in fact "DNA_Gap" is NOT ambiguous. This has some surprising implications, such as that gaps code for alanine.
I'd strongly suggest either making gaps ambiguous, or possibly mapping to an amino acid gap (is there such a thing?). Maybe I'm just bad at bioinformatics, but I think some data does not distinguish between "gaps" (as in deletions) and missing coverage?
The text was updated successfully, but these errors were encountered: