Releases: nedialkova-lab/mim-tRNAseq

Updated differential expression heatmaps, logOR filtering, and bugfixes

30 Apr 13:43

New features and bugfixes

Differential expression heatmaps

  • New heatmaps in the DESeq2 folders show scaled expression (normalised DESeq2 read counts) and log2 fold change values for all conditions versus the control condition, for significantly differentially expressed genes (padj <= 0.05).
  • A line plot of baseMean expression is included on the far right.
  • See DE_isodecodersScaled_hm.pdf and DE_anticodonScaled_hm.pdf in /DESeq2/isodecoder and /DESeq2/anticodon, respectively.

Normalised counts outputs

  • Tables of DESeq2 normalised counts for isodecoders and anticodons now output to /counts folder.
  • These are the same values present in the last columns of all DESeq2 diffexpr-results.csv files, which means they also exclude undeconvoluted clusters.

Bugfixes

  • Fixed ZeroDivisionError in mmQuant.py for 0 coverage at cluster positions - see issue #7
  • Fixed plotting errors for very sparse data in modPlot.py - see issue #6

Major update v0.3 - new deconvolution and differential mods analysis

26 Jan 15:45

Major update to mim-seq core algorithms

New deconvolution

  • Deconvolution now assesses all mismatches between parent and children instead of individual mismatches one-at-a-time. This makes all unique tRNA sequences distinguishable from each other offering full resolution within the tRNA transcriptome.
  • From the full set of mismatches, a minimal set is defined that is sufficient for resolution of each transcript.
  • The minimal set is chosen at the most 3' position possible to account for coverage drop-off close to the 5' end due to modification-induced stops to RT.
  • A new parameter, --deconv-cov-ratio, allows the user to set a threshold for the drop in coverage due to stops that render some reads difficult to assign to their correct transcript. For example, if many reads (say 60%) end at position 26 in a reference tRNA due to m2,2G modification here, but a mismatch at position 13 is needed to assign this read to a child transcript within the cluster, then many reads will not be able to be assigned and will stay assigned to the parent.
    • In this case, the --deconv-cov-ratio can be set to 0.5 (i.e. a drop in coverage of 50% from the 3' end to the required mismatch in question). This will mark the parent and the specific sequence as not deconvoluted (as 60% of the reads end before the mismatch), and these will be excluded from modification analysis (and table outputs) as well as DESeq2 analysis of differential expression.
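
The coverage check described above can be sketched roughly as follows. This is only an illustration of the idea, not the mim-tRNAseq implementation: the function name `is_deconvolutable`, the dictionary-based coverage input, and the inclusive comparison against the threshold are all assumptions.

```python
def is_deconvolutable(coverage, three_prime_pos, mismatch_pos, cov_ratio=0.5):
    """Return True if enough reads reach the distinguishing mismatch.

    coverage: dict mapping position -> read coverage (hypothetical input format).
    cov_ratio: maximum tolerated fractional drop in coverage between the
    3' end and the mismatch position (mirrors the idea of --deconv-cov-ratio).
    """
    cov_3p = coverage[three_prime_pos]
    if cov_3p == 0:
        return False  # no reads at the 3' end, so nothing can be assigned
    drop = 1 - coverage[mismatch_pos] / cov_3p
    return drop <= cov_ratio


# Example from the text: 60% of reads end at position 26, so only 40% of
# the 3' coverage reaches the mismatch at position 13.
coverage = {76: 1000, 26: 1000, 13: 400}
print(is_deconvolutable(coverage, 76, 13, cov_ratio=0.5))
```

With a threshold of 0.5 the 60% drop exceeds the limit, so the parent and child in this example would be treated as not deconvoluted.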

Differential modification analysis

  • In experiments with more than 1 condition, conditions will be compared pairwise to assess significant differences in modification status.
  • To achieve this, proportions of modified and unmodified nucleotides at each position for each tRNA are used for the calculation of log Odds Ratios (logOR). These are then tested for significance with a Chi-Squared test, and corrected for multiple testing using FDR.
  • See the original paper Methods section for detailed explanation.
    Behrens et al. (2021) High-resolution quantitative profiling of tRNA abundance and modification status in eukaryotes by mim-tRNAseq. Mol. Cell.
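
As a rough illustration of the statistics named above, the sketch below computes a log odds ratio, a chi-squared p-value for a 2x2 table, and a multiple-testing correction for one modified position. Several details are assumptions rather than statements about mim-tRNAseq: the log base (2), the pseudocount, the direction of the comparison, the absence of a continuity correction, and the use of Benjamini-Hochberg as the FDR procedure. All function names are hypothetical; the paper's Methods section remains the authoritative description.

```python
import math


def log_odds_ratio(mod_a, unmod_a, mod_b, unmod_b, pseudo=0.5):
    """Log2 odds ratio of modification in condition B versus condition A.

    A pseudocount (an assumption) avoids division by zero for empty cells.
    """
    a, b, c, d = (x + pseudo for x in (mod_a, unmod_a, mod_b, unmod_b))
    return math.log2((c / d) / (a / b))


def chi2_pvalue_2x2(mod_a, unmod_a, mod_b, unmod_b):
    """Pearson chi-squared test (1 degree of freedom, no continuity correction)."""
    a, b, c, d = mod_a, unmod_a, mod_b, unmod_b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a zero margin means the test is not informative
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-squared distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In practice one p-value would be computed per position per tRNA for each pair of conditions, and the correction applied across that whole set.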

Small bugfixes and changes

  • --min-cov now accepts either an integer or a fraction for filtering low coverage tRNAs. As before, an integer represents a total read coverage threshold for the tRNA. A fraction between 0 and 1 represents the proportion of reads mapped to that tRNA relative to all tRNA-mapped reads (0.0005 is a recommended starting value for testing).
  • Automatic connection to, and parsing of data from, the Modomics API. A local Modomics file can be used with --local-modomics
  • --double-cca mode enables assessment of tRNA ends with 3'CCACCA. This end addition is common in marking tRNAs for degradation. In this mode, CCA analysis files will represent proportions of the second CCA end addition not the first. I.e. the proportion of 3'-CCA is representative of 3'-CCACCA, 3'-CC represents 3'CCACC, and so on. "Absent" indicates all those reads without a second CCA end (which may include those with full single 3'-CCA down to absent single CCA).
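
The two interpretations of --min-cov described in this list could be expressed along these lines. This is a sketch under stated assumptions: the function name `passes_min_cov` is hypothetical, and whether the real comparison is inclusive, or how a value of exactly 1 is treated, is not stated in these notes.

```python
def passes_min_cov(trna_reads, total_trna_reads, min_cov):
    """Apply a --min-cov style threshold to one tRNA.

    A value strictly between 0 and 1 is treated as a fraction of all
    tRNA-mapped reads; any other value is treated as an absolute read count.
    """
    if 0 < min_cov < 1:
        return trna_reads >= min_cov * total_trna_reads
    return trna_reads >= min_cov


# With 1,000,000 tRNA-mapped reads, 0.0005 corresponds to about 500 reads
print(passes_min_cov(600, 1_000_000, 0.0005))
print(passes_min_cov(400, 1_000_000, 0.0005))
```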

additionalMods reduction, improved predicted mods discovery and isodecoder name shortening

08 Jun 13:26

Updates

  • Reduced additional mods file to contain only inosine 34. This allows mim-tRNAseq to discover and annotate other mods on its own and reduces misalignment, e.g. human Asp-GTC-3
  • New mods discovered after round 1 of alignment are then subtracted from predictedMods so that their validity can be rechecked after alignment improvements in round 2. Only these are then added to predictedMods
  • Isodecoder names now exclude 2nd number in name (except "chr" containing names) in all outputs and plots
  • Stops and misinc heatmaps no longer plot gap information ("-")
  • knownMods.csv renamed to allMods.csv as it includes Modomics, additionalMods, and predictedMods sites

Bugfixes

  • allMods.csv positions now 1-based
  • --misinc-thresh now used for heatmap row annotations (i.e. stop and misinc site counts)
  • Duplicates in countTable are no longer removed, as a pandas issue was removing non-duplicated rows

Minor bugfixes and updates

08 Apr 12:32

Updates

  • Additional mitochondrial modifications from Clark 2016 added to additionalMods.txt
  • GSNAP index names match input files not experiment name flag
  • DESeq2 threshold for significance now adjusted p < 0.01
  • readthroughTable proportion now 1 - stop proportion at position, reflecting actual readthrough
  • DESeq2 single replicates now get normalised count outputs using estimateSizeFactors()
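
For readers unfamiliar with estimateSizeFactors(), DESeq2's default is the median-of-ratios method. The pure-Python sketch below illustrates that calculation on a small count table; it is not code from mim-tRNAseq, which relies on DESeq2 itself, and the function names and list-of-rows input format are assumptions made for the example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per gene), each a list of per-sample counts.
    Raises statistics.StatisticsError if every gene contains a zero count.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue  # genes with a zero count have no finite geometric mean
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalise(counts):
    """Divide each sample's counts by that sample's size factor."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]


# Second sample sequenced at twice the depth of the first
counts = [[10, 20], [100, 200], [5, 10]]
print(size_factors(counts))
print(normalise(counts))
```

Because size factors depend only on the count matrix, they can be estimated without replicates, which is why normalised counts remain available for single-replicate designs.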

Bugfixes

  • Strict handling of additionalMods location type (i.e. mito or cytosolic) when adding to modified position index
  • Correct handling of insertions between cluster parent and child: modified sites after insertions in child need position correction by number of insertions (i.e. all mods after insertions are decreased to adjust for parent insertions)
  • Indel handling for discrepancies between GSNAP and usearch. In some cases the two algorithms chose different, closely spaced positions to introduce an insertion, affecting downstream cluster splitting and mod position analysis
  • Fixed isodecoders added to unique_isodecoderMMs based on insertions that weren't unique to one isodecoder only
  • Adjust position of mismatch between parent and child if there are preceding insertions so that the correct position in the child is used for storing the identity of the mismatch
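
The insertion corrections in this list amount to shifting a position by the number of insertions that precede it. A minimal sketch of that arithmetic is given below; the function name, the 1-based coordinates, and the choice to count only insertions strictly before a position are assumptions, and the actual bookkeeping in mim-tRNAseq may differ.

```python
from bisect import bisect_left


def correct_positions(positions, insertion_positions):
    """Shift each position down by the number of insertions before it."""
    ins = sorted(insertion_positions)
    # bisect_left gives the count of insertions strictly smaller than pos
    return [pos - bisect_left(ins, pos) for pos in positions]


# Two insertions at 20 and 47: a site at 9 is untouched, a site at 26
# moves by one, and a site at 58 moves by two.
print(correct_positions([9, 26, 58], [20, 47]))
```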

Bugfixes to dependencies and new PyPI package

19 Mar 09:56

Bugfixes

  • Changes to dependency versions
  • Needed new version to create new distribution for PyPI

PyPI package and automated species data

18 Mar 17:22

New features

  • --species (or -s) flag to specify one of a few built-in species datasets. This removes the need to specify tRNA references, tRNAscan out files, and mitochondrial tRNA sequences
  • Files for species without built-in data can still be specified using -t, -o and -m
  • Package now available on PyPI and installable using pip
  • mods/readthroughTable.csv is now an additional output which gives the proportion of reads that stop at a given site relative to the total reads at that site (as opposed to RTstopTable.csv, which gives the proportion of total reads for the tRNA that stop at a site)
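
The difference between the two stop proportions can be shown with a small sketch. It assumes, purely for illustration, that every read starts at the 3' end of the tRNA and is described by the 1-based position where it stops, so the reads covering a site are those that stop at or 5' of it. The function name and output format are hypothetical.

```python
from collections import Counter


def stop_tables(read_stop_positions, length):
    """Return (per_tRNA, per_site) stop proportions for each position.

    per_tRNA: stops at the site / all reads for the tRNA (RTstopTable-like).
    per_site: stops at the site / reads covering the site (readthroughTable-like).
    """
    stops = Counter(read_stop_positions)
    total = len(read_stop_positions)
    per_trna, per_site = {}, {}
    covering = 0  # reads that reach the current position from the 3' end
    for pos in range(1, length + 1):
        covering += stops[pos]
        per_trna[pos] = stops[pos] / total if total else 0.0
        per_site[pos] = stops[pos] / covering if covering else 0.0
    return per_trna, per_site


# 10 reads: 4 stop at position 5, 4 at position 3, 2 reach position 1
per_trna, per_site = stop_tables([1, 1, 3, 3, 3, 3, 5, 5, 5, 5], length=5)
print(per_trna[3], per_site[3])
```

At position 3, four of ten reads stop (0.4 of the tRNA total), but only six reads reach that site, so the per-site proportion is 4/6. A later release, listed above, changes the readthroughTable value to 1 minus this per-site stop proportion.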

Bugfixes to v0.2

07 Feb 15:50

Bug fixes

  • --cluster-id 1 now functions correctly in terms of producing Isodecoder_counts.txt
  • Disabled clustering also functions correctly to produce both counts outputs

New features

  • Isodecoder and anticodon level heatmaps of vst normalised tRNA expression

Stable v0.2 release

18 Dec 09:54
b83d61e

Major update to modification analysis

mim-tRNAseq now performs modification analysis per unique tRNA sequence instead of per cluster. This is achieved by analysing mismatches between cluster members and using unique mismatches to characterise unique tRNA sequences that can be split from the cluster member. This was previously done to split overall read counts, but now this is performed before modification analysis so that each read can be assigned to a unique tRNA sequence group. Each read is then assessed for stops and modifications after assignment to its new group.
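
The idea of splitting reads by unique mismatches can be sketched as below. This is a simplified illustration under several assumptions: sequences are the same length with no indels, reads are already aligned to the cluster parent at a known 1-based start, and a single unique mismatch is enough to assign a read. The function names and toy sequences are hypothetical and do not come from mim-tRNAseq.

```python
def unique_mismatches(parent, children):
    """Map each child name to the (position, base) mismatches only it carries."""
    per_child = {
        name: {(i, b) for i, (a, b) in enumerate(zip(parent, seq), start=1) if a != b}
        for name, seq in children.items()
    }
    unique = {}
    for name, mms in per_child.items():
        others = set().union(*(m for n, m in per_child.items() if n != name))
        unique[name] = mms - others
    return unique


def assign_read(read, read_start, parent_name, unique):
    """Assign a read to the child whose unique mismatch it carries, else the parent."""
    for name, mms in unique.items():
        for pos, base in mms:
            idx = pos - read_start  # offset of this position within the read
            if 0 <= idx < len(read) and read[idx] == base:
                return name
    return parent_name


parent = "GCATTGGTGG"
children = {"child1": "GCATCGGTGG", "child2": "GCATTGGTAG"}
unique = unique_mismatches(parent, children)
print(assign_read("TGGTAG", 5, "parent", unique))
print(assign_read("GTGG", 7, "parent", unique))
```

The second read does not cover the position that distinguishes child1 and matches the parent base at the position that distinguishes child2, so it stays with the parent. Once reads are grouped this way, stops and misincorporations can be tallied per unique sequence rather than per cluster.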

Other new features

  • New predicted modifications and inosines output to mods/predictedMods.csv. This contains predicted sites for each sample run, with canonical position numbering and proportions of each nucleotide misincorporated for easier annotation of new detected mods.

Minor bug-fixes

  • S. pombe reference tRNA names altered for consistent naming in output plots
  • Mitochondrial coverage plots now have legends with two columns so that all items are visible in output PDF

First stable release

06 Nov 09:48
v0.1

First release, version 0.1