Unfolding
Buildable source tarball: wfmash-v0.18.0.tar.gz
Improving mapping in complex regions, debugging recursive patching, and other fun.
-
Recursive Inversion Patching:
- Implemented recursive patching for inversions, completing the "multipatch" functionality.
- This allows for more accurate alignment of complex genomic regions with inversions.
-
SAM Output for Multipatch Alignments:
- Added support for SAM output format for multipatch alignments.
- Ensures consistent representation of complex alignments across different output formats.
-
Orientation-Consistent Alignments:
- Improved alignment consistency across all orientations of reference-query pairs.
- Enhances reliability and reproducibility of alignment results.
-
Optimized Inversion Patching:
- Implemented a bound on the maximum score for inverted patches.
- Allows for early termination of alignment when the inverted patch is worse than the forward alignment.
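The idea of bounding the inverted patch by the forward score can be sketched as follows. This is only an illustration: it uses plain edit distance rather than wfmash's wavefront alignment, and `revcomp`, `bounded_edit_distance`, and `patch_orientation` are hypothetical names, not functions from the codebase.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bounded_edit_distance(a: str, b: str, max_score: int):
    """Edit distance between a and b, or None if it exceeds max_score.

    The minimum of each DP row never decreases from one row to the next,
    so once a whole row is above the bound the alignment can be abandoned.
    """
    if abs(len(a) - len(b)) > max_score:
        return None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        if min(cur) > max_score:
            return None  # early termination: the bound cannot be met
        prev = cur
    return prev[-1] if prev[-1] <= max_score else None


def patch_orientation(target: str, query: str):
    """Choose a forward or inverted patch (assumption: ties go to forward)."""
    # Edit distance is at most the longer length, so this never returns None.
    forward = bounded_edit_distance(target, query, max(len(target), len(query)))
    inverted = None
    if forward > 0:
        # The inverted patch only matters if it strictly beats the forward one.
        inverted = bounded_edit_distance(target, revcomp(query), forward - 1)
    if inverted is not None:
        return ("inverted", inverted)
    return ("forward", forward)
```

With a tight bound, a poor inverted candidate is rejected after a few rows instead of being aligned in full.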
-
Dynamic Multi-Producer Alignment Module:
- Rewrote the alignment module to support multiple producers filling the work queue.
- Dynamically handles memory issues, improving efficiency and scalability.
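A multi-producer work queue of this kind can be sketched roughly as below. This is a generic pattern, not the wfmash implementation: the bounded queue as a way to cap memory is an assumption, and `run_pipeline`, `max_pending`, and the `upper()` stand-in for alignment are all hypothetical.

```python
import queue
import threading


def run_pipeline(batches, n_consumers=2, max_pending=4):
    """Feed work from several producers through a bounded queue to consumers."""
    work = queue.Queue(maxsize=max_pending)  # bounded: put() blocks when full
    results, lock = [], threading.Lock()

    def producer(items):
        for item in items:
            work.put(item)  # blocks while consumers are behind, capping memory

    def consumer():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this consumer
                break
            aligned = item.upper()  # stand-in for the real alignment step
            with lock:
                results.append(aligned)

    consumers = [threading.Thread(target=consumer) for _ in range(n_consumers)]
    producers = [threading.Thread(target=producer, args=(b,)) for b in batches]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:
        work.put(None)  # one sentinel per consumer, after all producers finish
    for t in consumers:
        t.join()
    return sorted(results)
```

Sending the sentinels only after every producer has been joined ensures that no consumer exits while work is still being queued.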
-
Overlap Filtering in Plane Sweep Algorithm:
- Implemented an overlap filter to prevent keeping suboptimal mappings.
- New CLI option:
-O, --overlap-threshold <F>
- Sets the overlap fraction F above which a mapping is dropped when it overlaps a higher-scoring mapping.
- Default value is 0.5.
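The effect of the threshold can be illustrated with a simplified filter. This is a quadratic sketch on a single coordinate axis, not the plane sweep used in wfmash. It assumes the overlap is measured as a fraction of the lower-scoring mapping's length. `Mapping`, `overlap_fraction`, and `filter_overlaps` are hypothetical names.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int    # inclusive start
    end: int      # exclusive end, assumed greater than start
    score: float


def overlap_fraction(a: Mapping, b: Mapping) -> float:
    """Fraction of a's length that is covered by b."""
    shared = max(0, min(a.end, b.end) - max(a.start, b.start))
    return shared / (a.end - a.start)


def filter_overlaps(mappings: List[Mapping], threshold: float = 0.5) -> List[Mapping]:
    """Drop mappings that overlap a higher-scoring mapping by more than threshold."""
    kept: List[Mapping] = []
    # Visit the best mappings first so each candidate is only compared
    # against mappings that score at least as high.
    for m in sorted(mappings, key=lambda m: m.score, reverse=True):
        if all(overlap_fraction(m, k) <= threshold for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)
```

For example, a mapping covering 40-120 that shares 60 of its 80 bases with a better mapping covering 0-100 has an overlap fraction of 0.75 and is dropped at the default of 0.5.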
-
Long Mapping Fragmentation:
- Enabled breaking of long mappings into smaller fragments at junction points.
- Junctions are defined by four consecutive segments, allowing for more precise breakpoint detection around structural variations.
- New CLI option:
-P, --max-mapping-length <N>
- Sets the maximum length of a single mapping before breaking.
- Default value is 1M (1 million bases).
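As a rough sketch of how a length cap splits a chained mapping, the function below cuts a chain of consecutive segments whenever the next segment would push the fragment past the limit. It deliberately ignores the four-segment junction rule described above and cuts only on length, so it is an assumption-laden simplification. `fragment_chain` is a hypothetical name.

```python
def fragment_chain(segments, max_len=1_000_000):
    """Split a chain of (start, end) segments into fragments spanning at most max_len.

    Segments are assumed to be sorted and non-overlapping. A single segment
    longer than max_len is kept whole, since this sketch never cuts inside one.
    """
    fragments, current = [], []
    for seg in segments:
        # Start a new fragment if adding this segment would exceed the cap.
        if current and seg[1] - current[0][0] > max_len:
            fragments.append(current)
            current = []
        current.append(seg)
    if current:
        fragments.append(current)
    # Report each fragment by its overall start and end coordinates.
    return [(f[0][0], f[-1][1]) for f in fragments]
```

With the default cap, four chained segments spanning 0 to 1.5 Mb in steps of 400 kb, 400 kb, 400 kb, and 300 kb come out as two fragments, 0-800 kb and 800 kb-1.5 Mb.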
-
Improved Handling of Satellite Sequences:
- The combination of overlap filtering, mapping fragmentation, and recursive patching significantly improves wfmash's ability to handle satellite sequences.
- These changes address common performance issues and mapping problems associated with highly repetitive regions.
- Users should expect better accuracy and efficiency when aligning genomes with abundant satellite sequences.
-
Performance Improvements:
- Various optimizations and code refactoring for better overall performance.
-
Bug Fixes and Minor Enhancements:
- Multiple bug fixes and small improvements throughout the codebase.
This release improves wfmash's handling of complex genomic structures, including satellite sequences, and makes results consistent across output formats and reference-query orientations. The new -O and -P options give users more control over which overlapping mappings are kept and where long mappings are broken, which matters most in regions with inversions, structural variation, and repeats.
What's Changed
Full Changelog: v0.17.0...v0.18.0