Support all protein coding biotypes #169

Open
joaoe opened this issue Oct 4, 2016 · 2 comments

joaoe commented Oct 4, 2016

Currently, after the biotype cleanup, only the biotype "protein_coding" is used in the check in Transcript.is_protein_coding().

The list at http://www.ensembl.org/Help/Glossary?id=275 confuses me a bit, since it includes nontranslating_CDS and polymorphic_pseudogene.

Perhaps the list in Transcript.is_protein_coding() should be extended to include IG_gene, TR_gene, non_stop_decay and nonsense_mediated_decay in addition to protein_coding?
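
For illustration, the proposed extension could look roughly like the sketch below. This is not pyensembl's actual implementation: the constant `PROTEIN_CODING_BIOTYPES` and the standalone function are hypothetical, and the biotype strings are copied from the proposal above, so they may not match the exact strings used in a given Ensembl annotation release.

```python
# Hypothetical sketch of an extended biotype check; not pyensembl's real code.
# Biotype names are taken verbatim from the proposal in this issue and are
# assumptions about what the annotation actually contains.
PROTEIN_CODING_BIOTYPES = frozenset({
    "protein_coding",
    "IG_gene",
    "TR_gene",
    "non_stop_decay",
    "nonsense_mediated_decay",
})


def is_protein_coding(biotype, allowed=PROTEIN_CODING_BIOTYPES):
    """Return True if `biotype` is in the allowed set of coding biotypes."""
    return biotype in allowed


print(is_protein_coding("protein_coding"))           # True
print(is_protein_coding("nonsense_mediated_decay"))  # True
print(is_protein_coding("polymorphic_pseudogene"))   # False
```

Keeping the set as a parameter would let callers opt in to a broader or narrower definition without changing the default behaviour.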

iskandr (Contributor) commented Oct 5, 2016

The trouble with BCR and TCR genes is that they don't actually code for anything before recombination by RAG. The types which seem possibly more interesting for effect prediction are (1) non-stop decay and NMD genes, (2) polymorphic pseudogenes. However, they're both trickier to handle since you're probably predicting an effect which won't manifest in much or any actual protein product.

joaoe (Author) commented Oct 5, 2016

> which won't manifest in much or any actual protein product.

The same could be said for regular coding transcripts that are not expressed :) Perhaps if something is added by openvax/varcode#195 then this issue can be ignored.
