Skip to content

Commit

Permalink
MRG: update with misc animal genomes (#3422)
Browse files Browse the repository at this point in the history
Adds common hosts and also hg38.

Tackles #2717

<img width="667" alt="Screenshot 2024-12-07 at 9 17 40 AM"
src="https://github.com/user-attachments/assets/bfeff595-1759-4569-8adb-1e950f75a03e">

## Rendered preview:
  • Loading branch information
ctb authored Dec 8, 2024
1 parent 0f48dc5 commit 7e744e7
Showing 1 changed file with 19 additions and 0 deletions.
19 changes: 19 additions & 0 deletions doc/databases.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,25 @@ The databases do not need to be unpacked or prepared in any way after download.

You can verify that they've been successfully downloaded (and view database properties such as `ksize` and `scaled`) with `sourmash sig summarize <output>`.

## Sketches for human and animal genomes

These sketches are of the latest releases of a number of animal
genomes. Among other uses, they can be used to detect host
contamination in microbial metagenomes.

Each file includes sketches at k=21, k=31, and k=51, at a scaled of
1000, and is about 110 MB.

* Human (hg38) - [hg38.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/hg38.sig.zip)
* Cow (bosTau9) - [bosTau9.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/bosTau9.sig.zip)
* Dog (canFam6) - [canFam6.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/canFam6.sig.zip)
* Horse (equCab3) - [equCab3.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/equCab3.sig.zip)
* Cat (felCat9) - [felCat9.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/felCat9.sig.zip)
* Chicken (galGAl6) - [galGal6.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/galGal6.sig.zip)
* Mouse (mm39) - [mm39.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/mm39.sig.zip)
* Goat (oviAri4) - [oviAri4.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/oviAri4.sig.zip)
* Pig (susCr11) - [susScr11.sig.zip](https://farm.cse.ucdavis.edu/~ctbrown/sourmash-db/host/susScr11.sig.zip)

## GTDB R08-RS214 - DNA databases

[GTDB R08-RS214](https://forum.gtdb.ecogenomic.org/t/announcing-gtdb-r08-rs214/456) consists of 402,709 genomes organized into 85,205 species clusters.
Expand Down

0 comments on commit 7e744e7

Please sign in to comment.