Transcript sources and isoforms #459

14zac2 · 2024-10-31T17:22:01Z

Hello!

I have noticed that when running Mikado, it tends to prefer transcripts from certain sources (found by grepping the different alias transcripts in the final GFF). For example, Mikado generally prefers transcripts from StringTie, followed by LiftOff, then BRAKER3, then TOGA. StringTie and LiftOff often contribute close to 10,000 transcripts each to the final GFF, whereas BRAKER3 and TOGA are often around or below 1,000. I thought this might be because StringTie and LiftOff provide more unique transcripts as input than the other sources, but the difference in the number of unique input transcripts is not large enough to explain it. Do you have any idea why Mikado might be preferring these two transcript sources over the others?
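
For reference, the per-source tally described above can be sketched roughly as follows. This is a minimal sketch, not Mikado's own tooling: it assumes the final GFF3 carries an `alias` attribute on transcript features and that each alias starts with a source label. The prefixes `st_`, `lo_`, `br_` and `tg_` and the function name `count_sources` are hypothetical and would need to match the labels used in the actual run.

```python
from collections import Counter

# Hypothetical label prefixes; replace with the labels used in your own run.
PREFIXES = {"st_": "StringTie", "lo_": "LiftOff", "br_": "BRAKER3", "tg_": "TOGA"}

def count_sources(gff_lines, prefixes=PREFIXES, feature_types=("mRNA", "transcript")):
    """Count transcript features by the label prefix of their alias attribute."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] not in feature_types:
            continue  # only well-formed transcript-level rows
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(item.split("=", 1) for item in fields[8].split(";") if "=" in item)
        alias = attrs.get("alias", "")
        label = next(
            (name for prefix, name in prefixes.items() if alias.startswith(prefix)),
            "other",
        )
        counts[label] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Usage: python count_sources.py final.gff3
    with open(sys.argv[1]) as handle:
        for source, n in count_sources(handle).most_common():
            print(f"{source}\t{n}")
```

Running the same tally on the input files gives a per-source baseline to compare the output counts against.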

Also, I was comparing the effects of BLAST and Diamond, and found that no matter which database is used, the BLAST-based run produces a final GFF with many more isoforms than the Diamond-based run. I tried Diamond in "ultra sensitive" mode and got the same result. I notice that BLAST always finds hits for more of the candidate transcripts, and that the parameter "-max_target_seqs 5" restricts the number of hits per transcript ID to five in Diamond but not in BLAST (sometimes a single transcript ID has up to 20 hits). I am not sure how much this matters.
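
One way to check whether the extra BLAST rows are additional targets or additional alignments to the same targets is to count both per query. The sketch below assumes tabular output (outfmt 6) with the default column order, where column 1 is the query ID and column 2 is the subject ID. In that format each row is one alignment, so a query can have more rows than distinct targets. The function name `hits_per_query` is made up for illustration.

```python
from collections import defaultdict

def hits_per_query(tab_lines):
    """Return {query: (n_rows, n_distinct_subjects)} from tabular search output."""
    rows = defaultdict(int)
    subjects = defaultdict(set)
    for line in tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        rows[query] += 1              # one row per reported alignment
        subjects[query].add(subject)  # distinct target sequences
    return {q: (rows[q], len(subjects[q])) for q in rows}

if __name__ == "__main__":
    import sys
    # Usage: python hits_per_query.py results.tsv
    with open(sys.argv[1]) as handle:
        for query, (n_rows, n_subj) in hits_per_query(handle).items():
            if n_rows > n_subj:
                print(f"{query}\trows={n_rows}\tdistinct_subjects={n_subj}")
```

If the queries with 20 rows still have at most five distinct subjects, the target limit is being applied and the extra rows are multiple alignments per target.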

I'd love your thoughts on both of these observations!

Many thanks,
Zoe
