Releases: ksahlin/isONcorrect

v0.1.3.5

06 Sep 23:34

This release restructures the folders to work well with Python's newer recommended way of building packages for PyPI (.toml files).

In essence:

  • A build configuration file, pyproject.toml, was added to the repo.
  • A src/isoncorrect folder was created to replace the previous modules folder.
  • The scripts run_isoncorrect and isONcorrect were placed in the src/isoncorrect folder and given .py file endings so that they behave as modules of the isoncorrect library.
  • The build now generates the executables run_isoncorrect and isONcorrect automatically from the run_isoncorrect.py and isONcorrect.py modules, using the main() function in each file as the entry point.
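
As a rough illustration of the entry-point pattern described above, the sketch below shows a module that exposes a main() function a build tool can wrap into a command-line executable. In a pyproject.toml this is typically declared in the [project.scripts] table, with a line of the form run_isoncorrect = "isoncorrect.run_isoncorrect:main"; the exact declaration used in this repo is not shown in these notes, so treat that and the option handling below as assumptions rather than the actual isONcorrect code.

```python
# Hypothetical module, e.g. src/isoncorrect/run_isoncorrect.py.
# Only the shape matters: a module-level main() that the generated
# executable calls. The real script's options are not reproduced here.
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="run_isoncorrect")
    # --k and --w echo the option names mentioned in these release notes;
    # everything else about this parser is illustrative.
    parser.add_argument("--k", type=int, default=9, help="k-mer size")
    parser.add_argument("--w", type=int, default=20, help="window size")
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:], which is what happens
    # when the installed executable invokes main() without arguments.
    args = build_parser().parse_args(argv)
    print(f"k={args.k} w={args.w}")
    return 0
```

Because the installed executable imports the module and calls main() directly, no `if __name__ == "__main__":` guard is needed for the command to work after installation.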

The new structure requires isONcorrect to be installed with a package manager (conda or pip).

For development (running directly from the downloaded GitHub source), one needs to temporarily change line 21 in isONcorrect.py from "from isoncorrect import create_augmented_reference, help_functions, correct_seqs" to "import create_augmented_reference, help_functions, correct_seqs".
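
One possible way to avoid editing the import line by hand is an import fallback that tries the installed package path first and the plain module name second. This is only a sketch of that general technique, not something this release does; the helper name import_first is hypothetical.

```python
import importlib


def import_first(*candidates):
    """Return the first module in `candidates` that can be imported."""
    last_error = None
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as err:
            # Remember the failure and try the next candidate name.
            last_error = err
    raise ImportError(f"none of {candidates} could be imported") from last_error
```

Under this assumption, a line such as help_functions = import_first("isoncorrect.help_functions", "help_functions") would resolve to the packaged module after a pip/conda install and to the sibling file when run from inside the source folder.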

(The version number had to be bumped several increments after several unsuccessful attempts to get the new build to install properly.)

v0.1.0

06 Sep 16:55

This version adds the following over previous versions:

  • An over-correction checker: the original read and the corrected read are aligned, and any structural over-corrections are removed. Such events should be rare: we never observed any with the previous defaults --k 9 --w 10, but rare occurrences appeared with the new defaults --k 9 --w 20 introduced in v0.0.8. This should now be fixed. The check adds negligible time (~1-2%) to the overall runtime.
  • Better (sparser) minimizer sampling in poly-A/C/G/T regions with two new rules: (1) sample the last minimizer in case of ties, and (2) do not resample a minimizer if the last minimizer is still in the window. This greatly reduces repetitive anchors in poly-regions and improves runtime for instances where long poly regions are frequent.
  • Related to point 2: an upper limit on how repetitive a paired-minimizer anchor can be in the data (at most 10x the number of reads). I have not yet observed such cases in ONT data, but I am setting this limit just in case, as it happened for some degenerate PacBio reads (for which isONcorrect typically does not need to be run anyway).
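
The two sampling rules and the repetitiveness cap can be sketched as follows. This is a minimal illustration under stated assumptions, not the isONcorrect implementation: k-mers are ordered lexicographically, rule 2 is read as "keep the previously sampled position while it is still inside the window and still equals the window minimum", and anchors above the cap are simply dropped. The function names are hypothetical.

```python
from collections import Counter


def sample_minimizers(seq, k, w):
    """Return (position, kmer) minimizers over windows of w consecutive k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    sampled = []
    last_pos = -1  # position of the most recently sampled minimizer
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        smallest = min(window)
        # Rule 2 (assumed reading): do not resample while the last sampled
        # minimizer is still in the window and is still a minimum.
        if last_pos >= start and kmers[last_pos] == smallest:
            continue
        # Rule 1: on ties, take the last (rightmost) occurrence in the window.
        offset = w - 1 - window[::-1].index(smallest)
        last_pos = start + offset
        sampled.append((last_pos, smallest))
    return sampled


def drop_repetitive_anchors(anchors_per_read, max_fold=10):
    """Drop anchors seen more than max_fold * (number of reads) times in total."""
    limit = max_fold * len(anchors_per_read)
    counts = Counter(a for anchors in anchors_per_read for a in anchors)
    return [[a for a in anchors if counts[a] <= limit]
            for anchors in anchors_per_read]
```

On a homopolymer such as "A" * 30 with k=3 and w=5, this sketch samples one position per window length instead of one per base, which is the sparsening effect the notes describe.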