Variant class should have get_preferred_transcript() #79
Comments
@pnrobinson I'm thinking about the cardinality of this attribute. We can use MANE as the source of the preferred transcript. However, per project info, the MANE attribute is only available for the hg38 build.
Next, not all genes have a MANE transcript: there are ~200 protein-coding genes with no MANE (e.g. MT-ATP6, NKRF). It's hard to know if this becomes an issue. So, I think |
Probably. On the other hand, we could choose the longest transcript to be preferred if there is no MANE. The user should be able to change the preferred transcript anyway and we need to present a table with a summary of all of this right after we set up the cohort? |
OK, so to allow a custom preferred transcript, we will need a dedicated component in the functional annotation step. I propose taking a mapping from gene symbol to transcript accession, something like:

    pref_tx_acc_ids = {
        'FBN1': 'NM_000138.5',
        'MAPK8IP3': 'NM_001318852.2'
    }

Then, we need the following sub-task to resolve the preferred transcript for any gene we encounter in the functional annotation workflow:
Now, what is "longest"? I would suggest the transcript with the greatest count of coding bases. In case of tie, choose the tx with longer UTRs. In case of tie, choose the tx with the lowest tx accession ID when comparing the accession alphabetically. Thanks to the workflow, we will always have a preferred transcript, so the cardinality can be |
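A minimal sketch of how the resolution order discussed in this thread could look: a user-supplied accession first, then MANE, then the "longest" transcript with the tie-breaks described above. The `Tx` record, its field names, the `is_mane` flag and the function `resolve_preferred_tx` are all hypothetical and are not the project's API. Falling through when the user-supplied accession is not found, and applying the same ordering when a gene has more than one MANE transcript, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Optional


@dataclass(frozen=True)
class Tx:
    """Hypothetical transcript record (not the project's actual class)."""
    accession: str          # e.g. 'NM_000138.5'
    coding_bases: int       # count of coding bases
    utr_bases: int          # combined length of the 5' and 3' UTRs
    is_mane: bool = False   # assumed flag; MANE is only defined for hg38


def resolve_preferred_tx(
    gene_symbol: str,
    transcripts: Iterable[Tx],
    pref_tx_acc_ids: Optional[Mapping[str, str]] = None,
) -> Tx:
    """Pick one preferred transcript for a gene (illustrative sketch)."""
    txs = list(transcripts)
    if not txs:
        raise ValueError(f"No transcripts available for {gene_symbol}")

    # 1. A user-supplied accession wins if it matches a known transcript.
    #    Assumption: an unknown accession falls through to the defaults.
    wanted = (pref_tx_acc_ids or {}).get(gene_symbol)
    if wanted is not None:
        for tx in txs:
            if tx.accession == wanted:
                return tx

    # 2. Restrict to MANE transcripts when the gene has any.
    mane = [tx for tx in txs if tx.is_mane]
    candidates = mane if mane else txs

    # 3. "Longest": most coding bases, then longest UTRs, then the
    #    alphabetically lowest accession as the final tie-break.
    return min(
        candidates,
        key=lambda tx: (-tx.coding_bases, -tx.utr_bases, tx.accession),
    )
```

Called with the mapping from the earlier comment, for example `resolve_preferred_tx('FBN1', fbn1_txs, pref_tx_acc_ids)`, this returns exactly one transcript whenever the gene has at least one, which is the property the cardinality question depends on.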
This is about making sure that
We need to use the |