MeTAfisher is a tool for retrieving type II toxin-antitoxin (TA) systems from genomic sequences.
- Clone the metafisher repository:
git clone https://github.com/JeanMainguy/MeTAfisher
cd MeTAfisher/
- Create an environment with python3 and HMMER installed. You can create this environment using conda:
conda env create -f env/metafisher.yml
conda activate metafisher
- Run metafisher on the genome of Desulfovibrio vulgaris:
./metafisher/metafisher.py --gff data_test/GCF_000070465.1/GCF_000070465.1_ASM7046v1_genomic.gff.gz \
--faa data_test/GCF_000070465.1/GCF_000070465.1_ASM7046v1_protein.faa.gz \
--outdir metafisher_results -v
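If the run above fails because a tool cannot be found, a quick check of the active environment can help. The following is a minimal sketch, not part of metafisher: `check_tools` is a hypothetical helper name, and `hmmsearch` is assumed to be the HMMER program that needs to be on the PATH.

```shell
# Report which of the given command-line tools are missing from PATH.
# Prints one missing tool name per line; returns non-zero if any is missing.
# (check_tools is a hypothetical helper, not shipped with metafisher.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# python3 and hmmsearch (shipped with HMMER) are the tools named in the steps above.
check_tools python3 hmmsearch || echo "some required tools are missing from PATH" >&2
```

Run it after `conda activate metafisher`; no output means both tools were found.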
This program is released as open source software under the terms of the MIT License.
To identify potential toxin and antitoxin genes, MeTAfisher uses a list of domains known to be specific to TA systems. These domains are searched for in the protein sequences with HMMER. In addition to the domain search, potential TA genes can be identified by a diamond search against all sequences of TADB (https://bioinfo-mml.sjtu.edu.cn/TADB2).
To use the diamond search strategy, a diamond database of the TADB sequences needs to be created.
The toxin and antitoxin protein sequences can be downloaded from the TADB website: https://bioinfo-mml.sjtu.edu.cn/TADB2/download.html
- Download TADB protein sequences
wget https://bioinfo-mml.sjtu.edu.cn/TADB2/download/TADB2/20171013/protein/type_II_pro_T.fas
wget https://bioinfo-mml.sjtu.edu.cn/TADB2/download/TADB2/20171013/protein/type_II_pro_AT.fas
- Concatenate the FASTA files and build the diamond database
mkdir TA_data
cat type_II_pro_T.fas type_II_pro_AT.fas > TA_data/type_II_TA.fasta
diamond makedb --in TA_data/type_II_TA.fasta -d TA_data/type_II_TA
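Before relying on the database, it can be worth confirming that the concatenated FASTA file holds every record from both downloads. The sketch below is an optional sanity check, not part of metafisher; `count_fasta` and `check_concat` are hypothetical helper names. It also catches the case where the first file lacks a trailing newline, which would merge the header of the second file into the preceding sequence line.

```shell
# Count FASTA records (header lines starting with '>') in a file.
count_fasta() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Succeed only if the combined file ($3) has as many records
# as the toxin file ($1) and the antitoxin file ($2) together.
check_concat() {
    toxins=$(count_fasta "$1")
    antitoxins=$(count_fasta "$2")
    combined=$(count_fasta "$3")
    [ "$((toxins + antitoxins))" -eq "$combined" ]
}

# Usage with the files from the steps above:
# check_concat type_II_pro_T.fas type_II_pro_AT.fas TA_data/type_II_TA.fasta \
#     && echo "concatenation OK"
```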
- Generate stat files
These files are needed to score the potential TA systems. The script computes how often a domain is associated with another one in a TA system of TADB.
python metafisher/compute_tadb_stat.py --toxin_faa type_II_pro_T.fas --antitoxin_faa type_II_pro_AT.fas -v
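To illustrate what "how often a domain is associated with another one" means, here is a minimal sketch of pair counting. It is not the implementation of `compute_tadb_stat.py`: the input format (one tab-separated toxin domain and antitoxin domain per TA pair) and the helper name `count_domain_pairs` are assumptions made for this example.

```shell
# Count how often each (toxin domain, antitoxin domain) pair occurs.
# Input ($1): tab-separated lines "toxin_domain<TAB>antitoxin_domain"
#             (hypothetical format, chosen for this illustration only).
# Output:     "toxin_domain<TAB>antitoxin_domain<TAB>count", sorted by pair.
count_domain_pairs() {
    awk -F '\t' 'NF == 2 { pairs[$1 "\t" $2]++ }
        END { for (p in pairs) print p "\t" pairs[p] }' "$1" | LC_ALL=C sort
}

# Usage, with a hypothetical pairs.tsv:
# count_domain_pairs pairs.tsv
```

A pair that appears often in TADB would then support a higher score for a candidate system carrying the same two domains.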
- Run metafisher with the diamond search strategy:
./metafisher/metafisher.py --gff data_test/GCF_000070465.1/GCF_000070465.1_ASM7046v1_genomic.gff.gz \
--faa data_test/GCF_000070465.1/GCF_000070465.1_ASM7046v1_protein.faa.gz \
--outdir metafisher_results \
--diamond_db TA_data/type_II_TA.dmnd -v