go-FAnnoT is a functional annotation transfer tool based on protein homology. Our motivations for developing this tool were manifold:
- Defining a precise strategy to build reference datasets. Most of the time, transfer tools use the annotation of a single closely related species as the reference, thereby copying its possible errors. While reference proteins must be adapted to the target organism, a more robust strategy is required to ensure the quality of functional annotations.
- Evaluating homology from a global alignment rather than a local one. Most existing tools identify matches on the basis of a BLAST search. Unfortunately, a BLAST (local) alignment is not sufficient to measure homology, so matched sequences should be realigned with a global alignment tool.
- Allowing flexible threshold settings. In addition to the choice of reference datasets, homology thresholds should depend on the organism to annotate. For example, it may be necessary to lower the thresholds for species that have no closely related species in the reference databases.
- Standardizing functional annotations in sequence files. This is critical to facilitate comparisons between annotations.
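To make the local-versus-global point above concrete, here is a minimal Go sketch (not go-FAnnoT's own code) that computes percent identity over the full length of a global alignment, counting gap columns in the denominator, which is how EMBOSS needle reports its Identity line. The function name `globalIdentity`, the toy sequences and the 70% threshold are illustrative assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// globalIdentity returns the percentage of identical, non-gap columns over the
// whole length of an alignment. Gap columns ('-') count in the denominator,
// so poorly aligned or unaligned regions lower the score.
func globalIdentity(a, b string) (float64, error) {
	if len(a) != len(b) {
		return 0, fmt.Errorf("aligned sequences differ in length: %d vs %d", len(a), len(b))
	}
	if len(a) == 0 {
		return 0, errors.New("empty alignment")
	}
	matches := 0
	for i := 0; i < len(a); i++ {
		if a[i] == b[i] && a[i] != '-' {
			matches++
		}
	}
	return 100 * float64(matches) / float64(len(a)), nil
}

func main() {
	// Toy global alignment of a query against a reference protein.
	query := "MKV-LAQW"
	ref := "MKVALSQF"

	// Identity restricted to the first three columns, i.e. what a local
	// aligner might report if it only kept this short, perfectly matching block.
	local, err := globalIdentity(query[:3], ref[:3])
	if err != nil {
		panic(err)
	}
	global, err := globalIdentity(query, ref)
	if err != nil {
		panic(err)
	}

	// Hypothetical, organism-dependent threshold chosen for illustration only.
	const minIdentity = 70.0
	fmt.Printf("identity over local block: %.1f%%\n", local)
	fmt.Printf("identity over global alignment: %.1f%%\n", global)
	fmt.Println("passes threshold:", global >= minIdentity)
}
```

Here the short block matches perfectly (100.0%), but over the whole alignment identity drops to 62.5%, so the hit fails the illustrative threshold; lowering `minIdentity` for a distant organism would change that outcome.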
Hence, go-FAnnoT broadly consists of the following steps:
- Extracting reference datasets from rich, high-quality databases. We decided to use UniProt and TrEMBL.
- Building a hierarchy between the different reference datasets.
- Defining rules (different levels of homology) to transfer annotations.
- Processing each input protein iteratively against each dataset until a suitable annotation is found.
- (Optional) Completing annotations with InterProScan functional domain predictions.
- Producing standardized functional annotations.
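The steps above can be sketched in Go as a nested search: reference datasets in priority order, then homology rules from strictest to most permissive, stopping at the first acceptable hit. This is a minimal illustration, not go-FAnnoT's implementation: the `Rule`, `Hit` and `Dataset` types, the thresholds and the dataset names are hypothetical, hits are assumed to be already realigned, and the loop order (dataset hierarchy before rule strictness) is an assumption.

```go
package main

import "fmt"

// Rule is a hypothetical homology level: a hit must reach both thresholds.
type Rule struct {
	Name        string
	MinIdentity float64 // % identity over the global alignment
	MinCoverage float64 // % of the query length covered by the alignment
}

// Hit is a hypothetical, already realigned match against one reference protein.
type Hit struct {
	RefID    string
	Product  string
	Identity float64
	Coverage float64
}

// Dataset is a hypothetical reference dataset holding precomputed hits per query ID.
type Dataset struct {
	Name string
	Hits map[string][]Hit
}

// annotate walks the datasets in priority order and, inside each dataset,
// the rules from strictest to most permissive. The first hit that satisfies
// a rule is returned together with the "dataset/rule" level that accepted it.
func annotate(query string, datasets []Dataset, rules []Rule) (Hit, string, bool) {
	for _, ds := range datasets {
		for _, r := range rules {
			for _, h := range ds.Hits[query] {
				if h.Identity >= r.MinIdentity && h.Coverage >= r.MinCoverage {
					return h, ds.Name + "/" + r.Name, true
				}
			}
		}
	}
	return Hit{}, "", false
}

func main() {
	rules := []Rule{
		{Name: "strong", MinIdentity: 80, MinCoverage: 80},
		{Name: "weak", MinIdentity: 40, MinCoverage: 60},
	}
	datasets := []Dataset{
		{Name: "swissprot_subset", Hits: map[string][]Hit{
			"q1": {{RefID: "ref_A", Product: "Hexokinase", Identity: 85, Coverage: 95}},
		}},
		{Name: "trembl_subset", Hits: map[string][]Hit{
			"q1": {{RefID: "ref_B", Product: "Putative hexokinase", Identity: 99, Coverage: 99}},
			"q2": {{RefID: "ref_C", Product: "Uncharacterized protein", Identity: 45, Coverage: 70}},
		}},
	}
	for _, q := range []string{"q1", "q2", "q3"} {
		if h, level, ok := annotate(q, datasets, rules); ok {
			fmt.Printf("%s -> %s (%s) via %s\n", q, h.Product, h.RefID, level)
		} else {
			fmt.Printf("%s -> no annotation\n", q)
		}
	}
}
```

With these toy data, q1 is annotated from the SwissProt subset at the strong level even though the TrEMBL hit scores higher, q2 falls through to the TrEMBL subset at the weak level, and q3 stays unannotated.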
Our tool has been designed to use UniProt databases (SwissProt or TrEMBL). The complete SwissProt database can be downloaded here (choose the file uniprot_sprot.dat.gz).
Concerning TrEMBL, it is recommended to download only a subset of the database, as the complete one is too large. Taxon-level subsets are available here.
To run go-FAnnoT, the NCBI-BLAST+ tool suite and NEEDLE (from the EMBOSS tool suite) must be available in the system PATH. There are several ways to do so:
- Use a conda environment with these two tools.
- (Or) Install these tools. Binaries are available at the following URLs:
- NCBI-BLAST+
- EMBOSS
- (Or, for Linux only) Most recent distributions provide these tools directly in their repositories:
# Example with Ubuntu
apt-get install ncbi-blast+ emboss
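Whichever installation route is chosen, it can be useful to check that the required executables are actually visible in PATH before a run. The sketch below uses Go's standard `os/exec.LookPath`; the helper name `missingTools` is hypothetical, and the list of executables (`makeblastdb` and `blastp` from NCBI-BLAST+, `needle` from EMBOSS) is an assumption about which programs go-FAnnoT calls.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// missingTools returns the names that cannot be resolved through PATH.
func missingTools(names []string) []string {
	var missing []string
	for _, n := range names {
		if _, err := exec.LookPath(n); err != nil {
			missing = append(missing, n)
		}
	}
	return missing
}

func main() {
	// Assumed executables: makeblastdb/blastp (NCBI-BLAST+) and needle (EMBOSS).
	required := []string{"makeblastdb", "blastp", "needle"}
	if m := missingTools(required); len(m) > 0 {
		fmt.Fprintf(os.Stderr, "missing from PATH: %v\n", m)
		os.Exit(1)
	}
	fmt.Println("all required tools found in PATH")
}
```

Running this inside an activated conda environment is a quick way to confirm that the environment, rather than a system-wide install, provides the tools.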
To build the project, you will need to install Go (see instructions here).
Then clone this repository:
git clone https://github.com/hdevillers/go-fannot.git
Enter the go-fannot directory and build the project with the following make commands:
cd go-fannot
make
make test
On Linux and macOS, the binary can be installed by running make install with administrator rights. The default installation path is /usr/local/bin/. A different installation path can be specified as follows:
make install -prefix my/install/path
Precompiled binaries for all platforms will be available soon.