DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high-performance analysis of big sequence data. The key features are:
- Pairwise alignment of proteins and translated DNA at 500x to 20,000x the speed of BLAST.
- Frameshift alignments for long read analysis.
- Low resource requirements, suitable for running on standard desktops or laptops.
- Various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification.
Stay up to date with new developments by following me on Twitter.
Please read the manual for detailed installation and usage instructions. The following is a quick example of setting up and using the program on Linux.
The software may be installed by downloading the precompiled binary for immediate use:
wget http://github.com/bbuchfink/diamond/releases/download/v0.9.29/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz
The extracted diamond binary file should be moved to a directory contained in your executable search path (PATH environment variable).
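After moving the binary, it is worth confirming that it can actually be found through PATH. A minimal sketch using Python's standard `shutil.which`, which performs the same lookup as the shell; the helper name `find_diamond` is hypothetical and not part of DIAMOND. From a shell, `command -v diamond` does the same check.

```python
import shutil
from typing import Optional


def find_diamond(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the `diamond` executable, or None if absent.

    search_path defaults to the PATH environment variable; an explicit
    os.pathsep-separated list of directories can be passed instead.
    """
    return shutil.which("diamond", path=search_path)


if __name__ == "__main__":
    location = find_diamond()
    if location is None:
        print("diamond not found: move the binary into a directory listed in PATH")
    else:
        print(f"diamond found at {location}")
```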
To run an alignment task, we assume a protein database file in FASTA format named nr.faa and a file of DNA reads to be aligned named reads.fna.
To set up a reference database for DIAMOND, execute the makedb command with the following command line:
$ diamond makedb --in nr.faa -d nr
This creates a binary DIAMOND database file with the specified name (nr.dmnd). The alignment task may then be started using the blastx command:
$ diamond blastx -d nr -q reads.fna -o matches.m8
The output file is specified with the -o option and is here named matches.m8. By default, it is written in BLAST tabular format.
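Because the tabular output is plain tab-separated text, it is easy to post-process. The sketch below assumes the standard 12 BLAST tabular columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and keeps the best-scoring hit for each read; `Hit`, `read_hits` and `best_hit_per_query` are hypothetical names, not part of DIAMOND.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Hit:
    """One row of BLAST tabular output (the 12 standard columns)."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def read_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield a Hit for each non-empty tab-separated line, e.g. from matches.m8."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        q, s, pid, ln, mm, go, qs, qe, ss, se, ev, bs = fields
        yield Hit(q, s, float(pid), int(ln), int(mm), int(go),
                  int(qs), int(qe), int(ss), int(se), float(ev), float(bs))


def best_hit_per_query(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the hit with the highest bit score for each query sequence."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.qseqid)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.qseqid] = hit
    return best
```

Usage: `with open("matches.m8") as fh: best = best_hit_per_query(read_hits(fh))`. Reading line by line keeps memory use flat even for large result files.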
Notes:
- The program may use quite a lot of memory and temporary disk space. Should the program fail due to running out of either one, you need to set a lower value for the block size parameter -b (see the manual).
- The default (fast) mode was mainly designed for short reads. For longer sequences, the sensitive modes (options --sensitive or --more-sensitive) are recommended.
- The runtime of the program is not linear in the size of the query file, and the program is much more efficient for large query files (> 1 million sequences) than for smaller ones.
- Low complexity masking is applied to the query and reference sequences by default. Masked residues appear in the output as X.
- The default e-value cutoff of DIAMOND is 0.001, while that of BLAST is 10, so by default the program searches much more stringently than BLAST and does not report weak hits.
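The options named in these notes (-b, --sensitive, --more-sensitive and the e-value cutoff -e) can be combined on a single blastx command line. The Python sketch below assembles such a command as an argument list for `subprocess`; the helper names `blastx_command` and `run` are hypothetical, and any concrete values (such as a block size of 1.0) are illustrative, not recommendations from the manual.

```python
import subprocess
from typing import List, Optional

# Sensitivity modes from the notes; None keeps the default (fast) mode.
_SENSITIVITY_FLAGS = {
    None: [],
    "sensitive": ["--sensitive"],
    "more-sensitive": ["--more-sensitive"],
}


def blastx_command(db: str, query: str, out: str,
                   sensitivity: Optional[str] = None,
                   block_size: Optional[float] = None,
                   evalue: Optional[float] = None) -> List[str]:
    """Build a `diamond blastx` argument list.

    sensitivity: None, "sensitive" or "more-sensitive" (for longer sequences).
    block_size:  value for -b; lower it if memory or temporary disk runs out.
    evalue:      value for -e; DIAMOND's default is 0.001, BLAST's is 10.
    """
    if sensitivity not in _SENSITIVITY_FLAGS:
        raise ValueError(f"unknown sensitivity mode: {sensitivity!r}")
    cmd = ["diamond", "blastx", "-d", db, "-q", query, "-o", out]
    cmd += _SENSITIVITY_FLAGS[sensitivity]
    if block_size is not None:
        cmd += ["-b", str(block_size)]
    if evalue is not None:
        cmd += ["-e", str(evalue)]
    return cmd


def run(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

For example, `run(blastx_command("nr", "reads.fna", "matches.m8", sensitivity="sensitive", evalue=10))` aligns longer reads in sensitive mode while reporting hits up to BLAST's default e-value cutoff. Building an argument list rather than a single shell string avoids quoting problems with file names.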
The preferred support channel is the Diamond community website. It provides a platform for users to exchange their experiences and get support directly from the developer. You may also use the GitHub issue tracker or send inquiries by email.
DIAMOND is developed by Benjamin Buchfink at the Detlef Weigel lab, Max Planck Institute for Developmental Biology, Tübingen, Germany.
Publication:
- Buchfink B, Xie C, Huson DH, "Fast and sensitive protein alignment using DIAMOND", Nature Methods 12, 59-60 (2015). doi:10.1038/nmeth.3176