This tool is designed to classify metagenomic sequences (marker genes, genomes, and amplicon reads) using a hierarchical taxonomic classifier.

See the wiki for more information.

Dependencies

The stag classifier requires:

Python 3.7 (or higher)
HMMER3 (or Infernal)
Easel (link)
seqtk
prodigal (to predict genes in genomes)
Python libraries:
- numpy
- pandas
- scikit-learn (imported as sklearn)
- h5py = 2.10.0

If you have conda, you can install all the dependencies listed in conda_env_stag.yaml. See the Installation wiki for more info.
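
If you are not using the conda environment, a quick check of the Python side of the dependency list can save a failed run later. The snippet below is a minimal sketch, not part of stag: the function names `missing_modules` and `python_is_recent_enough` are made up for illustration, and the module names are the import names of the libraries listed above. It does not check the h5py version or the external tools (HMMER3, Easel, seqtk, prodigal).

```python
import importlib.util
import sys

# Import names of the Python libraries from the dependency list.
# "sklearn" is the import name of the scikit-learn package.
REQUIRED_MODULES = ["numpy", "pandas", "sklearn", "h5py"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent_enough(minimum=(3, 7)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_is_recent_enough():
        print("Python 3.7 or higher is required")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python libraries: " + ", ".join(missing))
    else:
        print("All listed Python libraries were found")
```

An empty result from `missing_modules(REQUIRED_MODULES)` only means the modules can be located; it does not guarantee that the installed versions are compatible.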

Installation

git clone https://github.com/zellerlab/stag.git
cd stag
# optional: create the conda environment with all dependencies
conda env create -f conda_env_stag.yaml
python setup.py bdist_wheel
pip install --no-deps --force-reinstall dist/*.whl

Note: the following examples assume that the stag script is on the system PATH.

Execution

# if the conda environment was created
conda activate stag
# test the installation
stag test

Taxonomically annotate gene sequences

Given a fasta file (let's say unknown_seq.fasta), you can find the taxonomy annotation of these sequences using:

stag classify -d test_db.stagDB -i unknown_seq.fasta

The output is:

sequence	taxonomy
geneA	d__Bacteria;p__Firmicutes;c__Bacilli;o__Staphylococcales;f__Staphylococcaceae;g__Staphylococcus
geneB	d__Bacteria
geneC	d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria
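
For downstream analysis it is often convenient to split each lineage into its ranks. The following is a minimal sketch, assuming the output is tab-separated with a header line as shown above and that each level has the form `<prefix>__<name>`. The helper names `parse_lineage` and `parse_classify_output` are hypothetical and not part of stag, and the `s` (species) prefix is an assumption, since it does not appear in the example.

```python
# Rank prefixes seen in the example output; "s" (species) is an assumption.
RANK_PREFIXES = {
    "d": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_lineage(lineage):
    """Split 'd__Bacteria;p__Firmicutes' into {'domain': 'Bacteria', ...}."""
    ranks = {}
    for level in lineage.strip().split(";"):
        if not level:
            continue
        prefix, separator, name = level.partition("__")
        if not separator:
            raise ValueError("unexpected taxonomy level: " + repr(level))
        # Unknown prefixes are kept as-is rather than dropped.
        ranks[RANK_PREFIXES.get(prefix, prefix)] = name
    return ranks


def parse_classify_output(text):
    """Map each sequence ID to its parsed lineage, skipping the header line."""
    annotations = {}
    for line in text.strip().splitlines()[1:]:
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        annotations[fields[0]] = parse_lineage(fields[1])
    return annotations


example = (
    "sequence\ttaxonomy\n"
    "geneA\td__Bacteria;p__Firmicutes;c__Bacilli\n"
    "geneB\td__Bacteria\n"
)
for sequence_id, ranks in parse_classify_output(example).items():
    print(sequence_id, ranks)
```

Sequences that could only be assigned at a high level, such as geneB, simply yield a shorter dictionary.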

You can either create a database (see Create a database), or use one that we already compiled:

For marker genes
For 16S amplicon data

Taxonomically annotate genomes

Given a fasta file (let's say unknown_genome.fasta), you can find the taxonomy annotation of this genome with:

stag classify_genome -i unknown_genome.fasta -d gtdb_30.stagDB -o res_dir

The output is saved in the directory res_dir. Inside you will find the file genome_annotation with the annotation in the same format as in the gene classification. More information on the other files can be found here.

To classify multiple genomes, you can use:

stag classify_genome -D all/genomes/dir -d gtdb_30.stagDB -o res_dir

Here all/genomes/dir is a directory; every fasta file inside it will be classified.
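
When many genomes are classified at once, a quick summary of how deep each annotation goes can be useful. The sketch below assumes that genome_annotation is tab-separated with a header line, as in the gene classification output; this is inferred from the description above and is not verified against the tool. The function name `deepest_assignments` is hypothetical.

```python
import csv
from pathlib import Path


def deepest_assignments(lines, skip_header=True):
    """Map each genome ID to the last (deepest) level of its lineage.

    `lines` is any iterable of tab-separated lines, e.g. an open file handle.
    """
    reader = csv.reader(lines, delimiter="\t")
    if skip_header:
        next(reader, None)  # assumed header: sequence<TAB>taxonomy
    deepest = {}
    for row in reader:
        if len(row) < 2:
            continue
        levels = [level for level in row[1].split(";") if level]
        deepest[row[0]] = levels[-1] if levels else ""
    return deepest


if __name__ == "__main__":
    # res_dir is the output directory passed to -o in the examples above.
    annotation_file = Path("res_dir") / "genome_annotation"
    if annotation_file.exists():
        with open(annotation_file, newline="") as handle:
            for genome, level in deepest_assignments(handle).items():
                print(genome, level, sep="\t")
```

If the file turns out to have no header line, pass `skip_header=False` so that the first genome is not dropped.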

Finally, you can find some databases to classify genomes (gtdb_30.stagDB in the examples) here.
