Commit

finish things off?
ctb committed Oct 1, 2023
1 parent 1d1e460 commit 4ed0367
Showing 3 changed files with 7 additions and 8 deletions.
2 changes: 1 addition & 1 deletion doc/command-line.md
@@ -387,7 +387,7 @@ assembled into contigs, the unweighted number would approximate the
 number of bases from the contigs that would match perfectly to at
 least one genome in the reference database. More practically,
 the abundance-weighted number is less sensitive to sequencing errors.
-See @CTB classifying signatures or FAQ for more information here!
+See [classifying signatures](classifying-signatures.md#abundance-weighting) or [the FAQ](faq.md) for more information!
 
 The command line option `--threshold-bp` sets the threshold below
 which matches are no longer reported; by default, this is set to
5 changes: 2 additions & 3 deletions doc/faq.md
@@ -51,12 +51,11 @@ The other drawback is that FracMinHash sketches _don't work well_ for
 very small sequences. Our default parameter choice for DNA
 (scaled=1000) works well for finding 10 kb or larger matches between
 sequences - some simple Poisson matching math suggests that about
-99.98% of 10kb overlaps will be found with scaled=1000. @CTB verify I
-think this is 5kb.
+99.98% of 5kb overlaps will be found with scaled=1000.
 
 ## How can I better understand FracMinHash and sourmash intuitively?
 
-@@ tutorial on k-mers
+Please see [the k-mers and minhash tutorial](kmers-and-minhash.ipynb).
 
 ## What papers should I read to better understand the FracMinHash approach used by sourmash?
 
8 changes: 4 additions & 4 deletions doc/new.md
@@ -15,14 +15,14 @@ You might try sourmash if you want to -
 * taxonomically classify genomes or metagenomes against NCBI and/or GTDB;
 * search thousands of metagenomes with a query genome or sequence
 
-Underneath, sourmash uses [FracMinHash sketches](@@) for fast and
+Underneath, sourmash uses [FracMinHash sketches](https://www.biorxiv.org/content/10.1101/2022.01.11.475838) for fast and
 lightweight sequence comparison; FracMinHash builds on
-[MinHash sketching](@@wikipedia) to support both Jaccard similarity
+[MinHash sketching](https://en.wikipedia.org/wiki/MinHash) to support both Jaccard similarity
 _and_ containment analyses with k-mers. This significantly expands
 the range of operations that can be done quickly and in low
 memory. sourmash also implements a number of new and powerful analysis
-techniques, including [minimum metagenome covers](@@) and [alignment-free ANI
-estimation](@@).
+techniques, including minimum metagenome covers and alignment-free ANI
+estimation.
 
 sourmash is inspired by [mash](https://mash.readthedocs.io), and
 supports most mash analyses. sourmash also implements an expanded set

0 comments on commit 4ed0367