You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In my data, I have two DRACH sites that are at the start and end of an RNA sequence (i.e., two bases from the start of the sequence and right before the poly A tail), which don't show up on the m6ANet output. I tried running this data through EpiNano for comparison and they pop up, however, EpiNano appears to be less accurate overall than m6ANet. If I look at these strands in IGV, there are reads that sequenced all the way to the beginning/end, so there are reads within these regions for m6ANet to pull data from.
On a similar note, the Workman et al. 2019 paper (see here: https://www.nature.com/articles/s41592-019-0617-2) studying RNA sequencing via nanopore notes that some sequences in their runs appear "truncated", which may be a limitation of Nanopore sequencing. Given this, I'm curious how m6ANet handles DRACH sites in regions that have less coverage than other sites. If a single read does not map all the way to a DRACH site near the end of the transcript, or there is a "deletion" in the middle of the transcript where the DRACH site lies, does m6ANet disregard those reads during its probability calculation (i.e., only using data from reads that mapped all the way through these sites)?
Thanks!
The text was updated successfully, but these errors were encountered:
Currently, m6Anet excludes sites with less than 20 reads and that might be the reason as to why they don't show up on the m6Anet output. I have not really tested the accuracy of m6Anet for sites with less than 20 reads but technically, as long as they can be mapped, m6Anet can at the very least output single molecular probability for these reads although the current m6Anet version does not support this yet. If you are really interested to retrieve those reads, you can supply --min_segment_count argument during dataprep (the default is 20 but you can set it lower to 1 for example) so that data.info and data.json will contain those reads with lower coverage.
Afterward, during inference, you need to change some hard-coded values in m6anet/scripts/inference.py (DEFAULT_MIN_READS is set to 20, you need to change this value so that the dataloader will not exclude site will fewer than 20 reads), and also within run_inference function in m6anet/utils/inference_utils.py, you will need to change the hard-coded number in calculate_site_proba from 20 to something else. We will do some benchmarking in the near future and hopefully, we can provide recommendations on the number of reads and allow users to specify this in the future.
Regarding your question about deletion in the middle of the transcript, this will really depend on the ability of nanopolish eventalign to segment the transcript. Based on my understanding, nanopolish eventalign might still segment the "deleted" DRACH segments if it thinks that the deletion is purely due to basecalling error, but I am not too sure if this is accurate. Regardless, if such reads are segmented by nanopolish eventalign, then m6Anet will likely be able to process those reads
Hello,
In my data, I have two DRACH sites that are at the start and end of an RNA sequence (i.e., two bases from the start of the sequence and right before the poly A tail), which don't show up on the m6ANet output. I tried running this data through EpiNano for comparison and they pop up, however, EpiNano appears to be less accurate overall than m6ANet. If I look at these strands in IGV, there are reads that sequenced all the way to the beginning/end, so there are reads within these regions for m6ANet to pull data from.
On a similar note, the Workman et al. 2019 paper (see here: https://www.nature.com/articles/s41592-019-0617-2) studying RNA sequencing via nanopore notes that some sequences in their runs appear "truncated", which may be a limitation of Nanopore sequencing. Given this, I'm curious how m6ANet handles DRACH sites in regions that have less coverage than other sites. If a single read does not map all the way to a DRACH site near the end of the transcript, or there is a "deletion" in the middle of the transcript where the DRACH site lies, does m6ANet disregard those reads during its probability calculation (i.e., only using data from reads that mapped all the way through these sites)?
Thanks!
The text was updated successfully, but these errors were encountered: