MRG: add capacity for skipmer sketching and search #531

Merged
merged 37 commits into from
Dec 21, 2024

bluegenes
Contributor

@bluegenes bluegenes commented Nov 20, 2024

Skipmers are something we've considered adding for quite some time: a DNA k-mer variant that roughly allows "mismatches", which increases entropy and sensitivity.

Over in sourmash-bio/sourmash#3395, I added skipm1n3 and skipm2n3 moltypes, as well as code in SeqToHashes to build them. In sourmash-bio/sourmash#3446 I also added capacity for the sourmash Python functions to read skipmer sigs, so sig cat, sig summarize, etc. should now work.

There are two types of skipmers available: keep-2, skip-1 ("skipm2n3") and keep-1, skip-2 ("skipm1n3"). To sketch with skipmers, specify skipm2n3 or skipm1n3 in the parameter string. The skipmer ksize is the "final" length of the k-mer, i.e. the number of bases kept after skipping. For example, with ksize 3, the sequence ACTAG produces two skipmers for m2n3: ACA and CTG.
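To make the keep/skip pattern concrete, here is a minimal Python sketch that reproduces the ACTAG example. It is an illustration only, not the SeqToHashes code in sourmash: the function name `skipmers` and its parameters `m` (bases kept per cycle) and `n` (cycle length) are hypothetical, and it assumes each skipmer starts at the beginning of a cycle and that every start position is used. It also skips hashing and reverse complements.

```python
def skipmers(seq, ksize, m, n):
    """Return the skipmers of `seq`, keeping the first m bases of every n-base cycle.

    `ksize` is the final skipmer length (number of bases kept).
    Hypothetical helper for illustration; not the sourmash implementation.
    """
    # Offsets (relative to the start position) of the bases that are kept.
    offsets = []
    i = 0
    while len(offsets) < ksize:
        if i % n < m:
            offsets.append(i)
        i += 1

    # Number of sequence bases one skipmer spans, including skipped ones.
    span = offsets[-1] + 1

    # Slide one base at a time; sequences shorter than `span` yield nothing.
    return [
        "".join(seq[start + off] for off in offsets)
        for start in range(len(seq) - span + 1)
    ]


m2n3 = skipmers("ACTAG", ksize=3, m=2, n=3)      # keep-2, skip-1
m1n3 = skipmers("ACGTACGTA", ksize=3, m=1, n=3)  # keep-1, skip-2
print(m2n3, m1n3)
```

Under these assumptions, a ksize-3 m2n3 skipmer spans 4 bases of the input and a ksize-3 m1n3 skipmer spans 7, so skipmers cover a longer stretch of sequence than an ordinary k-mer of the same ksize.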

example sketching commands:

manysketch:

sourmash scripts manysketch -p skipm2n3,k=21,scaled=1000 ms.csv -o output.zip

singlesketch:

sourmash scripts singlesketch -p skipm2n3,k=21,scaled=1000 myfile.fasta -o myfile.sig.gz

-o myfile.zip also works

Skipmer References:

ref #549

@bluegenes bluegenes changed the base branch from main to integrate-buildutils November 20, 2024 01:06
Base automatically changed from integrate-buildutils to main November 20, 2024 22:28
@bluegenes bluegenes changed the title EXP: skipmer sketching MRG: add capacity for skipmer sketching and search Dec 20, 2024
@bluegenes
Contributor Author

bluegenes commented Dec 20, 2024

prior to merge (or at least release), need to switch to sourmash 0.18.0 in Cargo.toml

@ctb ctb changed the base branch from main to update_smash December 21, 2024 14:49
Base automatically changed from update_smash to main December 21, 2024 19:02
@ctb ctb enabled auto-merge (squash) December 21, 2024 19:03
@ctb ctb merged commit 6a3c49a into main Dec 21, 2024
3 checks passed
@ctb ctb deleted the try-skipmers branch December 21, 2024 19:11
@ctb ctb mentioned this pull request Dec 21, 2024