Major update v0.3 - new deconvolution and differential mods analysis

@drewjbeh drewjbeh released this 26 Jan 15:45
· 303 commits to master since this release

Major update to mim-seq core algorithms

New deconvolution

  • Deconvolution now assesses all mismatches between parent and children together, instead of individual mismatches one at a time. This makes all unique tRNA sequences distinguishable from one another, offering full resolution within the tRNA transcriptome.
  • From the full set of mismatches, a minimal set is defined that is sufficient for resolution of each transcript.
  • The minimal set is chosen as close to the 3' end as possible, to account for coverage drop-off towards the 5' end caused by modification-induced stops to reverse transcription (RT).
  • A new parameter, --deconv-cov-ratio, sets a threshold for the drop in coverage caused by RT stops, which can make some reads difficult to assign to their correct transcript. For example, if many reads (say 60%) end at position 26 in a reference tRNA because of an m2,2G modification at that position, but a mismatch at position 13 is needed to assign reads to a child transcript within the cluster, then those reads cannot be assigned to the child and stay assigned to the parent.
    • In this case, --deconv-cov-ratio can be set to 0.5 (i.e. a drop in coverage of 50% between the 3' end and the required mismatch). This marks the parent and the specific sequence as not deconvoluted (as 60% of the reads end before the mismatch), and both are excluded from modification analysis (and table outputs) as well as from DESeq2 analysis of differential expression.
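
The coverage check in the example above can be sketched in a few lines of Python. This is only an illustration, not the mim-tRNAseq implementation: the function name is_deconvoluted, the per-position coverage list, and the exact definition of the ratio (coverage at the mismatch divided by coverage at the 3' end) are assumptions made for this sketch.

```python
def is_deconvoluted(coverage, mismatch_pos, cov_ratio=0.5):
    """Return True if enough reads reach the distinguishing mismatch.

    coverage     -- read depth per position, ordered 5' -> 3' (hypothetical input)
    mismatch_pos -- 1-based position of the mismatch needed to assign reads
    cov_ratio    -- assumed meaning of --deconv-cov-ratio: the minimum fraction
                    of 3'-end coverage that must remain at the mismatch
    """
    cov_3prime = coverage[-1]
    if cov_3prime == 0:
        return False  # no reads at all, nothing can be assigned
    return coverage[mismatch_pos - 1] / cov_3prime >= cov_ratio


# Worked example from the text: a 76 nt tRNA with 1000 reads at the 3' end,
# 60% of which stop at position 26, leaving 400 reads covering positions 1-25.
coverage = [400] * 25 + [1000] * 51

print(is_deconvoluted(coverage, mismatch_pos=13, cov_ratio=0.5))  # False (400/1000 = 0.4)
```

With a threshold of 0.5 the parent and child in this example would be flagged as not deconvoluted; a more permissive threshold such as 0.3 would accept them.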

Differential modification analysis

  • In experiments with more than one condition, conditions are compared pairwise to assess significant differences in modification status.
  • To achieve this, the proportions of modified and unmodified nucleotides at each position of each tRNA are used to calculate log odds ratios (logOR). These are then tested for significance with a chi-squared test and corrected for multiple testing using the false discovery rate (FDR).
  • See the Methods section of the original paper for a detailed explanation:
    Behrens et al. (2021) High-resolution quantitative profiling of tRNA abundance and modification status in eukaryotes by mim-tRNAseq. Mol. Cell.
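
The statistics described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the code used by mim-tRNAseq: it works on modified/unmodified read counts for one position in two conditions, adds a 0.5 pseudocount to keep the log odds ratio finite, uses a 2x2 chi-squared test without continuity correction, and applies Benjamini-Hochberg as the FDR procedure. All function names are hypothetical.

```python
import math


def log_odds_ratio(mod_a, unmod_a, mod_b, unmod_b, pseudo=0.5):
    """logOR of being modified in condition A relative to condition B.

    The pseudocount is an assumption of this sketch (avoids division by zero).
    """
    odds_a = (mod_a + pseudo) / (unmod_a + pseudo)
    odds_b = (mod_b + pseudo) / (unmod_b + pseudo)
    return math.log(odds_a / odds_b)


def chi2_2x2(mod_a, unmod_a, mod_b, unmod_b):
    """Pearson chi-squared statistic and p-value for a 2x2 table (1 d.o.f.)."""
    n = mod_a + unmod_a + mod_b + unmod_b
    denom = (mod_a + unmod_a) * (mod_b + unmod_b) * (mod_a + mod_b) * (unmod_a + unmod_b)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (mod_a * unmod_b - unmod_a * mod_b) ** 2 / denom
    # For one degree of freedom the survival function is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts (modified, unmodified) for three positions in conditions A and B.
tables = [(30, 70, 60, 40), (50, 50, 52, 48), (10, 90, 25, 75)]
log_ors = [log_odds_ratio(*t) for t in tables]
pvals = [chi2_2x2(*t)[1] for t in tables]
padj = benjamini_hochberg(pvals)
```

A negative logOR here means the position is less modified in condition A than in condition B; positions would then be called significant by thresholding the adjusted p-values.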

Small bugfixes and changes

  • --min-cov now accepts either an integer or a fraction for filtering low-coverage tRNAs. An integer (as before) is a total read coverage threshold for the tRNA; a fraction between 0 and 1 is the proportion of reads mapped to that tRNA relative to all tRNA-mapped reads (0.0005 is a recommended starting value for testing).
  • Data are now automatically retrieved and parsed from the Modomics API. A local Modomics file can be used instead with --local-modomics.
  • --double-cca mode enables assessment of tRNA ends with 3'-CCACCA. This end addition is common in marking tRNAs for degradation. In this mode, CCA analysis files represent proportions of the second CCA end addition, not the first; i.e. the proportion of 3'-CCA is representative of 3'-CCACCA, 3'-CC represents 3'-CCACC, and so on. "Absent" indicates all reads without a second CCA end (which may include anything from a full single 3'-CCA down to an absent single CCA).
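
The relabelling used in --double-cca mode can be illustrated with a short Python sketch. It is a simplification for illustration only: the function name classify_double_cca is hypothetical, and plain suffix matching on the read sequence is an assumption of this sketch; the actual pipeline may determine 3' ends differently (for example from alignments).

```python
def classify_double_cca(read_seq):
    """Label a read by how much of a second CCA it carries at its 3' end.

    Labels follow the convention described above: "CCA" stands for 3'-CCACCA,
    "CC" for 3'-CCACC, "C" for 3'-CCAC, and "Absent" for everything else.
    """
    seq = read_seq.upper()
    # Check the longest suffix first so a full CCACCA is never labelled as partial.
    for suffix, label in (("CCACCA", "CCA"), ("CCACC", "CC"), ("CCAC", "C")):
        if seq.endswith(suffix):
            return label
    return "Absent"


# Hypothetical read ends: the same body with different 3' additions.
body = "GCATTG"
for tail in ("CCACCA", "CCACC", "CCAC", "CCA", "CC", ""):
    print(tail or "(none)", "->", classify_double_cca(body + tail))
```

Reads with only a single CCA, a partial single CCA, or no CCA at all fall into "Absent", matching the description of that category above.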