Variant class should have get_preferred_transcript() #79

pnrobinson · 2023-10-10T21:43:42Z

No description provided.

ielis · 2023-10-23T02:08:51Z

@pnrobinson I'm thinking about the cardinality of this attribute.

We can use MANE as the source of the preferred transcript. However, per the project info, MANE is only available for the GRCh38 (hg38) build:

The MANE project is only being completed for human genes on GRCh38. There is no plan to retroactively add this data to our archived GRCh37 gene annotation.

Next, not all genes have a MANE transcript: there are ~200 protein-coding genes with no MANE (e.g. MT-ATP6, NKRF). It's hard to know whether this will become an issue.

So, I think the cardinality should be 0..1.

pnrobinson · 2023-10-23T05:07:48Z

Probably. On the other hand, we could choose the longest transcript as the preferred one if there is no MANE. The user should be able to change the preferred transcript anyway. Do we also need to present a table summarizing all of this right after we set up the cohort?

ielis · 2023-10-23T13:30:17Z

OK, so to allow a custom preferred transcript, we will need a dedicated component in the functional annotation step.

I propose taking a mapping from gene symbol to transcript accession, something like:

pref_tx_acc_ids = {
  'FBN1': 'NM_000138.5',
  'MAPK8IP3': 'NM_001318852.2'
}

Then, we need the following sub-task to resolve the preferred transcript for any gene we encounter in the functional annotation workflow:

1. Check if the gene symbol is in pref_tx_acc_ids and, if so, use the accession to figure out the "preferred" status.
2. If there is no corresponding entry in pref_tx_acc_ids, fall back to the MANE transcript.
3. If there is no MANE transcript in the response, use the "longest" transcript.
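
The fallback order above could be sketched roughly as follows. This is only an illustration: `Tx`, its fields, and `resolve_preferred` are hypothetical names rather than existing code, the "longest" step is reduced to a placeholder on coding length, and falling through to MANE when the user's accession is not found is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class Tx:
    """Hypothetical stand-in for one transcript from the annotation response."""
    accession: str
    is_mane: bool
    cds_length: int  # number of coding bases


def resolve_preferred(
    symbol: str,
    txs: Sequence[Tx],
    pref_tx_acc_ids: Mapping[str, str],
) -> Tx:
    """Pick one transcript for `symbol`: user's choice, then MANE, then longest."""
    if not txs:
        raise ValueError(f"No transcripts available for {symbol}")

    # 1. The user's mapping wins if the requested accession is present.
    wanted = pref_tx_acc_ids.get(symbol)
    if wanted is not None:
        for tx in txs:
            if tx.accession == wanted:
                return tx
        # Assumption: an unknown accession falls through instead of raising.

    # 2. Otherwise use the MANE transcript, if the gene has one.
    for tx in txs:
        if tx.is_mane:
            return tx

    # 3. Otherwise use the "longest" transcript (placeholder: most coding bases).
    return max(txs, key=lambda tx: tx.cds_length)
```

Because step 3 always returns something for a non-empty transcript list, every gene ends up with exactly one preferred transcript.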

Now, what is "longest"? I would suggest the transcript with the greatest count of coding bases. In case of a tie, choose the transcript with the longer UTRs. If there is still a tie, choose the transcript whose accession ID comes first alphabetically.
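
One way to express this ordering is a single composite sort key. The sketch below is illustrative only: `TxLengths` and `longest_tx` are made-up names, "longer UTRs" is assumed to mean the combined 5' and 3' UTR length, and "alphabetically" is taken as plain string comparison.

```python
from typing import NamedTuple, Sequence


class TxLengths(NamedTuple):
    """Hypothetical per-transcript summary; field names are illustrative."""
    accession: str
    cds_length: int  # number of coding bases
    utr_length: int  # assumed: 5' UTR + 3' UTR bases combined


def longest_tx(txs: Sequence[TxLengths]) -> TxLengths:
    """Most coding bases, then longest UTRs, then alphabetically first accession."""
    if not txs:
        raise ValueError("Need at least one transcript")
    # Negating the lengths lets `min` prefer larger values,
    # while the accession is still compared in ascending order.
    return min(txs, key=lambda tx: (-tx.cds_length, -tx.utr_length, tx.accession))
```

Note that plain string comparison does not order accessions numerically (for example, an ID ending in `10` sorts before one ending in `9`), so the last tie-break is deterministic but otherwise arbitrary.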

Thanks to this workflow, we will always have a preferred transcript, so the cardinality can be 1..1.

ielis · 2023-12-05T19:03:26Z

This is about making sure that TranscriptAnnotation.is_preferred is set with a good value.

There should be just one preferred transcript for a gene.
The selection of the preferred transcript happens during the functional annotation.
The selection follows the rules above (first check the user's input dict, then MANE, then length).
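
A small sanity check for the "just one preferred transcript per gene" rule might look like the sketch below. The `(gene_symbol, tx_accession, is_preferred)` triples are a hypothetical stand-in for whatever the functional annotation step actually produces, and the function name is made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def genes_without_single_preferred(
    annotations: Iterable[Tuple[str, str, bool]],
) -> List[str]:
    """Return gene symbols whose number of preferred transcripts is not exactly 1.

    Each item is a hypothetical (gene_symbol, tx_accession, is_preferred) triple.
    """
    seen = set()
    preferred = Counter()
    for symbol, _tx_acc, is_preferred in annotations:
        seen.add(symbol)
        if is_preferred:
            preferred[symbol] += 1
    # Counter yields 0 for genes that never had a preferred transcript.
    return sorted(s for s in seen if preferred[s] != 1)
```

An empty result means every gene seen in the annotations has exactly one preferred transcript.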

We need to use is_preferred in the downstream analysis. Currently, some predicates need an explicit transcript ID, which is not ideal.
The simplest way to approach this is to show the user how to review the results of the functional annotation, choose a transcript for a gene and then use it in the downstream g2p analysis.
We need to show how this is done in the notebooks/docs.

ielis added the enhancement New feature or request label Oct 16, 2023

ielis added this to the Manuscript milestone Nov 16, 2023

lnrekerle self-assigned this Nov 30, 2023

This was referenced Jan 9, 2024

Develop strategy for handling errors in the input data #127

Closed

Formalize handling and reporting of input errors #128

Merged

ielis linked a pull request Jan 16, 2024 that will close this issue

Formalize handling and reporting of input errors #128

Merged

ielis removed a link to a pull request Jan 16, 2024

Formalize handling and reporting of input errors #128

Merged

ielis closed this as completed in #128 Jan 17, 2024
