diff --git a/doc/command-line.md b/doc/command-line.md
index c5a536682d..bfd44b0d26 100644
--- a/doc/command-line.md
+++ b/doc/command-line.md
@@ -387,7 +387,7 @@ assembled into contigs, the unweighted number would approximate
 the number of bases from the contigs that would match perfectly to at
 least one genome in the reference database. More practically, the
 abundance-weighted number is less sensitive to sequencing errors.
-See @CTB classifying signatures or FAQ for more information here!
+See [classifying signatures](classifying-signatures.md#abundance-weighting) or [the FAQ](faq.md) for more information!
 
 The command line option `--threshold-bp` sets the threshold below
 which matches are no longer reported; by default, this is set to
diff --git a/doc/faq.md b/doc/faq.md
index 2f1e0fecd6..e54eff9a0c 100644
--- a/doc/faq.md
+++ b/doc/faq.md
@@ -51,12 +51,11 @@ The other drawback is that FracMinHash sketches _don't work well_ for
 very small sequences. Our default parameter choice for DNA
 (scaled=1000) works well for finding 10 kb or larger matches between
 sequences - some simple Poisson matching math suggests that about
-99.98% of 10kb overlaps will be found with scaled=1000. @CTB verify I
-think this is 5kb.
+99.98% of 5kb overlaps will be found with scaled=1000.
 
 ## How can I better understand FracMinHash and sourmash intuitively?
 
-@@ tutorial on k-mers
+Please see [the k-mers and minhash tutorial](kmers-and-minhash.ipynb).
 
 ## What papers should I read to better understand the FracMinHash approach used by sourmash?
 
diff --git a/doc/new.md b/doc/new.md
index 29cd529779..e73d0d51a9 100644
--- a/doc/new.md
+++ b/doc/new.md
@@ -15,14 +15,14 @@ You might try sourmash if you want to -
 * taxonomically classify genomes or metagenomes against NCBI and/or GTDB;
 * search thousands of metagenomes with a query genome or sequence
 
-Underneath, sourmash uses [FracMinHash sketches](@@) for fast and
+Underneath, sourmash uses [FracMinHash sketches](https://www.biorxiv.org/content/10.1101/2022.01.11.475838) for fast and
 lightweight sequence comparison; FracMinHash builds on
-[MinHash sketching](@@wikipedia) to support both Jaccard similarity
+[MinHash sketching](https://en.wikipedia.org/wiki/MinHash) to support both Jaccard similarity
 _and_ containment analyses with k-mers. This significantly expands
 the range of operations that can be done quickly and in low
 memory. sourmash also implements a number of new and powerful analysis
-techniques, including [minimum metagenome covers](@@) and [alignment-free ANI
-estimation](@@).
+techniques, including minimum metagenome covers and alignment-free ANI
+estimation.
 
 sourmash is inspired by [mash](https://mash.readthedocs.io), and
 supports most mash analyses. sourmash also implements an expanded set