feat: Reduce kmer spacing for short sequences #1242
This is a good heuristic to avoid failed seed matching in short sequences (~1% of RSV A sequences are only ~200 bp long).
With the default kmer spacing of 50, a 180 bp sequence gets only 3 kmers, which is not robust.
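A minimal sketch of the idea, not the actual implementation in this PR: cap the spacing so that roughly `MIN_KMER_NUM` kmers fit into the query. The function name `adjusted_kmer_spacing`, its signature, and the value `10` for `MIN_KMER_NUM` are assumptions made for illustration only.

```rust
/// Minimum number of seed kmers we would like per query.
/// The value 10 is an assumption for this sketch, not taken from the PR.
const MIN_KMER_NUM: usize = 10;

/// Hypothetical helper: returns a kmer spacing that never exceeds
/// `default_spacing` and is reduced for short sequences so that about
/// `MIN_KMER_NUM` kmers fit into `seq_len`.
fn adjusted_kmer_spacing(seq_len: usize, default_spacing: usize) -> usize {
    // Largest spacing that still allows MIN_KMER_NUM kmers; at least 1
    // so that very short sequences do not end up with a spacing of 0.
    let max_spacing = (seq_len / MIN_KMER_NUM).max(1);
    default_spacing.min(max_spacing)
}
```

Under these assumptions, a 180 bp sequence with a default spacing of 50 would get a spacing of 18, while a genome-length sequence keeps the default of 50.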
I put the parameter adjustment next to the other short-sequence heuristic, but this means we have to make params mutable. It might be better to move the parameter adjustment further up or down the stack. Thoughts, @ivan-aksamentov?
We may or may not want to expose the const `MIN_KMER_NUM` as a CLI arg. I don't think users need to adjust it, so we can get away with hard-coding it for now. See https://neherlab.slack.com/archives/C015PFP5V44/p1693408681322229