Releases · waveygang/wfmash

13 Apr 16:34

AndreaGuarracino

v0.8.2

ad8aeba

wfmash 0.8.2 - Pasticcione

Buildable Source Tarball: wfmash-v0.8.2.tar.gz

This introduces:

updates in how wfmash is compiled/built to ensure greater inter-system compatibility;
adaptive penalties for the alignment, with more permissive wflambda/wflign parameters.


28 Mar 12:40

AndreaGuarracino

v0.8.1

c746c73

wfmash 0.8.1 - Divergenza

Buildable Source Tarball: wfmash-v0.8.1.tar.gz

This release:

fixes a bug in mapping filtering for short sequences;
sets the default segment size (-s) to 10 kbp;
uses fixed alignment penalties regardless of the requested mapping identity (-p); this strongly reduces the runtime and leads to much more compressed representations of the alignments between sequences.


23 Mar 16:29

ekg

v0.8.0

756508c

pensiero divergente

Buildable Source Tarball: wfmash-v0.8.0.tar.gz

wfmash is now substantially better at mapping and alignment at very high sequence divergences. This involves many changes relative to v0.7.0.

mashmap3

The mapping module has been largely rewritten to allow mappings to span large structural variation. We now apply multiple merging passes in 2D over the query/target mapping matrix (mashmap2 used a 1D approach in the query). The first pass unites mappings found within 2x the segment length (wfmash -s). Subsequently, multiple rounds of greedy merging and plane-sweep filtering merge the closest mappings on a near diagonal within a given chaining gap (wfmash -c). Finally, we filter the mappings at 5x the segment length (wfmash -l) rather than 3x as in previous releases.
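As a rough illustration of the merge-then-filter idea described above, here is a minimal Python sketch. It is not the wfmash implementation: the `Mapping` class, `chain_mappings`, and `filter_by_length` are hypothetical names, only forward-strand mappings are handled, and a single greedy pass stands in for the multiple merging and plane-sweep rounds.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """A forward-strand mapping between a query interval and a target interval."""
    q_start: int
    q_end: int
    t_start: int
    t_end: int


def chain_mappings(mappings, max_gap):
    """Greedily merge mappings whose gap is <= max_gap in BOTH query and target."""
    chains = []
    for m in sorted(mappings, key=lambda x: (x.q_start, x.t_start)):
        for c in chains:
            # Overlapping intervals count as a gap of zero.
            q_gap = max(0, m.q_start - c.q_end)
            t_gap = max(0, m.t_start - c.t_end)
            # Require the new mapping to lie downstream in the target as well.
            collinear = m.t_start >= c.t_start
            if collinear and q_gap <= max_gap and t_gap <= max_gap:
                c.q_end = max(c.q_end, m.q_end)
                c.t_end = max(c.t_end, m.t_end)
                break
        else:
            # No compatible chain found: start a new one (copy, so inputs stay untouched).
            chains.append(Mapping(m.q_start, m.q_end, m.t_start, m.t_end))
    return chains


def filter_by_length(chains, min_len):
    """Keep only chains whose query span is at least min_len."""
    return [c for c in chains if c.q_end - c.q_start >= min_len]


segment = 1000  # assumed segment length for the example
example = [
    Mapping(0, 1000, 0, 1000),
    Mapping(1500, 2500, 1600, 2600),
    Mapping(3000, 4000, 3100, 4100),
    Mapping(4500, 5500, 4700, 5700),
    Mapping(50000, 51000, 90000, 91000),  # isolated, likely spurious
]
merged = chain_mappings(example, max_gap=2 * segment)
kept = filter_by_length(merged, min_len=5 * segment)
print(kept)
```

In this toy input the four nearby mappings collapse into one chain that passes the 5x length filter, while the isolated mapping is discarded.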

The updated mapping merging also allows us to make a sparser first mapping step, as segment mapping drop-outs can be spanned using this approach. This allows us to use relatively sparse minimizer selection, which reduces the number of candidate (usually erroneous) mappings to consider.

We have also applied world minimizers, which are unbiased and faster to compute than window minimizers. To ensure efficient performance, we implement a much stronger filter on repetitive minimizers, filtering out the top 0.5% of most-frequent minimizers, which is now configurable with wfmash -H.
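The frequency filter can be pictured with a short Python sketch. This is an illustration under stated assumptions, not wfmash code: `filter_frequent_minimizers` is a hypothetical helper, and "top 0.5%" is interpreted here as the top 0.5% of distinct minimizer hashes ranked by occurrence count, which may differ from how wfmash defines the threshold.

```python
from collections import Counter


def filter_frequent_minimizers(minimizers, top_fraction=0.005):
    """Drop every occurrence of the most frequent distinct minimizers.

    Assumption: `top_fraction` is applied to the number of DISTINCT
    minimizer hashes, ranked from most to least frequent.
    """
    counts = Counter(minimizers)
    n_drop = round(len(counts) * top_fraction)
    dropped = {h for h, _ in counts.most_common(n_drop)}
    return [h for h in minimizers if h not in dropped]


# Toy input: one highly repetitive hash (7) among 199 unique ones.
hashes = [7] * 50 + list(range(1000, 1199))
filtered = filter_frequent_minimizers(hashes)
print(len(hashes), len(filtered))
```

Removing the repetitive hash leaves the unique ones untouched, which is why a stricter filter mostly costs sensitivity in repeats rather than elsewhere.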

divergence-adaptive wflign

This release also features improvements to the base-level alignment that are essential for sensitive alignment at high divergence. We now rely more heavily on the wflign matrix, which leads to a more complete exploration of alignment possibilities. Alignment parameters (such as the dynamic programming scoring for WFA, the maximum sketch distance to evaluate a local alignment, and the maximum allowed alignment score) are now set as a function of the mashmap-based identity.

testing approach

To develop this release, we tested on sequence collections with up to 30% divergence. We ensured that adjustments worked on a series of test cases drawn from humans, yeast, E. coli, potato, and fish, including a scale-up test to an all-vs-all alignment of 45 fish assemblies.

user considerations

In contrast to previous versions, wfmash v0.8.0 is less sensitive to particular segment length settings. The meaning of -p, or the minimum pairwise identity of the mappings, is also somewhat softened, because mappings can now span very large gaps, up to --chain-gap which defaults to 100x the segment length. Very long segment lengths of 50-100kb are probably less necessary, and we're seeing good performance at 5kb to 20kb segment lengths.
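To make the multipliers concrete, the following Python sketch computes the length thresholds that follow from a given segment length, using only the ratios stated in these notes (2x for the first merge, 5x for the minimum mapping length, 100x for the default chain gap). The function name `derived_thresholds` and the dictionary keys are hypothetical, chosen for illustration.

```python
def derived_thresholds(segment_length):
    """Length thresholds implied by the multipliers quoted in these release notes."""
    return {
        "first_merge_distance": 2 * segment_length,  # initial merging pass
        "min_mapping_length": 5 * segment_length,    # final length filter
        "chain_gap": 100 * segment_length,           # default --chain-gap
    }


for s in (5_000, 20_000):
    print(s, derived_thresholds(s))
```

With a 5 kb segment length, mappings can already be chained across gaps of up to 500 kb, which is part of why very long segment lengths are less necessary.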

The increase in the minimum mapping length filter (from 3x to 5x the segment length) reflects both the increased sensitivity and the potential for erroneous mappings that these changes introduce.

An additional concern is that users seeking to map against extremely repetitive sequences may need to set -H lower. Increasing -s can also span gaps caused by repeats and derive alignments for them. Alignments that focus strongly on repetitive regions may still need special parameter tuning. The default settings are now focused on obtaining reasonable homology maps for pangenome and pan-clade alignment problems.

visualization of wflign alignment matrix

Parameter tuning was assisted by visualizations of the wflign (high-order, over 256bp wfmash -W-length segments) alignment matrix. These show regions compared using k-mer Jaccard similarity in gray, successful alignments in green, and failed alignments in blue.

Two 1Mbp regions of yeast genomes:

... and the full alignment matrix (pafplot):

Two fish chromosomes at ~25% pairwise divergence in aligned regions (wfmash -p 70 -s 20k).

And a few alignments through human lipoprotein A (LPA):


09 Sep 16:06

AndreaGuarracino

v0.7.0

a438d5d

wfmash 0.7.0 - Educazione

Buildable Source Tarball: wfmash-v0.7.0.tar.gz

This release introduces a large number of updates:

the mapping parameters (window size and kmer size) are adaptive with respect to the requested segment identity;
the alignment parameters (mismatch/gap penalties and the max mash distance heuristic) are adaptive with respect to the estimated identity for each mapping region;
WFA was updated to the latest WFAv2, which includes important memory usage optimizations;
wflign / wflambda are upgraded to WFAv2, leading to a strong reduction of the memory usage;
alignment accuracy is improved during the patching, thanks to the reduced memory usage of the new WFAv2;
robin-hood structures are applied, to improve runtimes;
matches and (part of the) mismatches are cached in wflign, improving the runtime at the cost of a small memory overhead;
pure-WFA alignment is performed for short sequences (and short mapping regions in long sequences);
ends-free WFA for head/tail patching, replacing edlib;
fixed a reduction bug in the WFA library;
input PAF from other aligners is supported.


26 Jul 10:17

AndreaGuarracino

v0.6.1

3b786e0

wfmash 0.6.1 - Handy

Buildable Source Tarball: wfmash-v0.6.1.tar.gz

This (little) release includes:

handy parameters (#89);
a buildable source tarball;
a little compiling fix.


16 Jul 17:13

ekg

v0.6

059448c

sparsify and use low-memory WFA

Here, we sparsify the wflign problem, and then patch through the gaps using a low-memory version of WFA (cheers @smarco!).


17 May 09:52

ekg

v0.5

37b9e71

sensitive mapping and stable wflign-ing

A number of changes in wfmash have completed the alignment patching in wflign, rendering it stable and memory-thrifty enough to safely apply to large genomes. The mapping in general has also been improved by targeting a smaller windowSize parameter, and capping it at 256 to avoid confusion when mapping large segments.


22 Apr 22:33

ekg

v0.4

1e10586

wavefront inception: the alignment patching

With this version, alignments are patched with WFA (for unaligned regions where the short axis is up to 8kb) and edlib (for very short unaligned stretches). Edlib in semiglobal mode is used to patch up the heads and tails of the alignments. Previous versions had significant dropouts in alignments, but with these changes the issue is largely resolved.


15 Jan 14:09

ekg

v0.3.1

8ff9d7d

wavefront inception: the trace-merging

This point release updates wflign to emit a single merged alignment for each mapping. The output is compact and ready for eventual adaptation to SAM output.


11 Jan 13:55

ekg

v0.3

dd8799a

wavefront inception

wfmash is now sync'ed with edyeet, and an update to wflign lets us use WFA to obtain base-level alignment with affine gap costs. This is more biologically plausible than the edit-distance-based alignment provided by edlib.

Alignment runtime increases by 2-3x, depending on divergence rate given by -p[%], --map-pct-id=[%], with higher thresholds experiencing lower relative slowdown.

wfmash uses both wavefronts and mash distance (locality sensitive hashing) in two contexts. For mapping, it uses MashMap2's algorithm. For base-level alignment, it uses wflign, which is WFλ with λ = WFA guided heuristically with mash distance.
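For readers unfamiliar with mash distance, the sketch below shows the standard Mash formula, D = -(1/k) ln(2j / (1 + j)), where j is the Jaccard similarity of the two k-mer sets. It is a simplified illustration: it uses exact k-mer sets rather than the MinHash sketches used in practice, the helper names are hypothetical, and wfmash's own computation may differ in detail.

```python
import math


def kmer_set(seq, k):
    """All distinct k-mers of `seq` (empty set if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k):
    """Exact Jaccard similarity of the k-mer sets of sequences a and b."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return len(ka & kb) / len(union)


def mash_distance(j, k):
    """Mash distance for Jaccard similarity j and k-mer size k, clamped to [0, 1]."""
    if j <= 0.0:
        return 1.0
    d = -math.log(2.0 * j / (1.0 + j)) / k
    return min(1.0, max(0.0, d))


j = jaccard("ACGTACGTACGGT", "ACGTACGTACGCT", k=4)
print(j, mash_distance(j, k=4))
```

Identical sequences give a distance of 0 and sequences with no shared k-mers give 1, so the value can be read as an approximate per-base divergence.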
