Annotate all transcripts #195
Comments
What kinds of annotations are you interested in for non-coding transcripts? We've been fairly narrowly focused on coding effects, so I don't know what you can say about a non-coding transcript.
I'm interested in comparing tools and getting them to perform as similarly as possible. I'm not that interested in non-coding transcripts myself, though other people might be. But by checking for biotype = "protein_coding" you're skipping a bunch of coding biotypes. If there were a generic API for people to pick whichever transcripts they want, then I guess varcode would become more useful.
If a transcript is already annotated as triggering NMD due to an early stop codon, is it useful to predict some other effect in its protein sequence (e.g. single amino acid substitution)? It might be, but I can't currently think of the use-case. I can try adding a parameter for a set of biotypes on which we perform predictions, but it's not clear to me that those predictions will always be meaningful.
Everything you said is quite valid. But that's not the issue. The issue is just letting the user pick and choose which transcripts/biotypes he/she wants, while keeping the current behavior as the default. Right now, IG* and TR* transcripts are ignored too. For instance https://github.com/joaoe/varcode/commit/fe02769f199f9e6c6d2a6e8075786cd2a19d2f89
One of the differences between VEP and varcode is that VEP is happy to annotate ALL transcripts in its database, including pseudogenes. Varcode limits itself to transcripts for which `Transcript.is_protein_coding` returns True. Also see openvax/pyensembl#169. With VEP, it is therefore up to the developer/user to filter out which transcripts he/she finds useful.
I'd like there to be a way to tell varcode which biotypes should be accepted. One possibility would be an optional callback ('transcript_filter') passed to `predict_variant_effects()`, which returns `False` or `True` depending on whether a transcript should be annotated. The developer/user would then implement his/her own filtering logic, perhaps even filtering transcripts by ID. That way, uninteresting transcripts can be skipped (saving time and CPU cycles) and non-coding transcripts of interest can be returned.

Another challenge is that VEP also annotates incomplete transcripts. But supporting this might be a bit more laborious. Perhaps something for a different task.
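To make the 'transcript_filter' idea concrete, here is a minimal sketch of how such a callback could behave. It does not use varcode or pyensembl: the `Transcript` class is a stand-in with only the two fields needed, `select_transcripts` is a hypothetical helper (not an existing varcode function), and the transcript IDs are placeholders. The default path keeps the current protein-coding-only behavior.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Transcript:
    """Stand-in for a pyensembl transcript; only the fields used here."""
    transcript_id: str
    biotype: str

    @property
    def is_protein_coding(self) -> bool:
        return self.biotype == "protein_coding"


TranscriptFilter = Callable[[Transcript], bool]


def select_transcripts(
    transcripts: Iterable[Transcript],
    transcript_filter: Optional[TranscriptFilter] = None,
) -> List[Transcript]:
    """Hypothetical helper: keep only transcripts accepted by the filter.

    With no filter given, fall back to the current default
    (protein-coding transcripts only).
    """
    if transcript_filter is None:
        transcript_filter = lambda t: t.is_protein_coding
    return [t for t in transcripts if transcript_filter(t)]


def coding_including_ig_tr(transcript: Transcript) -> bool:
    """Example user filter: protein_coding plus IG_*_gene / TR_*_gene biotypes."""
    biotype = transcript.biotype
    is_ig_or_tr_gene = biotype.startswith(("IG_", "TR_")) and biotype.endswith("_gene")
    return biotype == "protein_coding" or is_ig_or_tr_gene


# Placeholder IDs, for illustration only.
transcripts = [
    Transcript("ENST_A", "protein_coding"),
    Transcript("ENST_B", "IG_V_gene"),
    Transcript("ENST_C", "TR_J_gene"),
    Transcript("ENST_D", "processed_pseudogene"),
    Transcript("ENST_E", "IG_V_pseudogene"),
]

default_ids = [t.transcript_id for t in select_transcripts(transcripts)]
custom_ids = [
    t.transcript_id for t in select_transcripts(transcripts, coding_including_ig_tr)
]
```

A callback is more general than a fixed set of biotypes, since the same hook can also filter by transcript ID or any other attribute, and skipped transcripts never reach the effect-prediction step.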
Thoughts?