Thanks to recent developments in genomic sequencing technologies, the number of protein sequences in public databases is growing rapidly. To exploit this data more fully, protein sequences need to be annotated with functional properties such as Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. The UniProt Knowledgebase (UniProtKB) is currently the largest and most comprehensive resource for protein sequence and annotation data. In the March 2018 release of UniProtKB, some 556,000 sequences are manually curated, while over 111 million sequences lack functional annotations. The ability to automatically annotate protein sequences in UniProtKB/TrEMBL, the non-reviewed UniProt sequence repository, would be a major step towards bridging the gap between annotated and unannotated protein sequences.
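As a small illustration of what an EC annotation looks like to a program, the sketch below splits an EC number into its four hierarchical levels (class, subclass, sub-subclass, serial number). The names `ECNumber`, `parse_ec`, and `depth` are hypothetical helpers written for this README, not part of any library listed here, and the parser assumes the common convention of writing unspecified levels as `-`.

```python
from typing import NamedTuple, Optional


class ECNumber(NamedTuple):
    """Four-level EC number; None marks a level left unspecified ('-')."""
    ec_class: int
    subclass: Optional[int]
    sub_subclass: Optional[int]
    serial: Optional[int]


def parse_ec(ec: str) -> ECNumber:
    """Parse strings such as 'EC 3.4.21.4' or '1.1.-.-' (hypothetical helper).

    Assumption: preliminary serials such as 'n1' are not handled and
    raise ValueError, as do strings without exactly four levels.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {ec!r}")
    levels = [None if p == "-" else int(p) for p in parts]
    if levels[0] is None:
        raise ValueError(f"the top-level class must be specified: {ec!r}")
    # Once a level is unspecified, every deeper level must be too.
    seen_gap = False
    for level in levels:
        if level is None:
            seen_gap = True
        elif seen_gap:
            raise ValueError(f"specified level after '-' in {ec!r}")
    return ECNumber(*levels)


def depth(ec: ECNumber) -> int:
    """Number of specified levels (1 to 4)."""
    return sum(level is not None for level in ec)


if __name__ == "__main__":
    full = parse_ec("EC 3.4.21.4")
    partial = parse_ec("1.1.-.-")
    print(full, depth(full))
    print(partial, depth(partial))
```

Keeping unspecified levels as `None` lets partial annotations such as `1.1.-.-` be stored alongside complete ones, which is useful when predictions are only trusted down to a certain level of the hierarchy.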
Here, I assemble a curated list of resources for research on protein function annotation.
- UniProt Knowledgebase (UniProtKB)
- Protein Data Bank (PDB)
- Protein Domain Database
- Sequence Similarity Based
- GrAPFI 2020
- ECPred 2018
- DEEPre 2018
- COFACTOR 2017
- SVMProt 2016
- EC-BLAST 2014
- EzyPred 2007
- [Graph Based Automatic Protein Function Annotation Improved by Semantic Similarity]
- [Predicting human protein function with multi-task deep neural networks]
- [Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction]
- [Protein function prediction using text-based features extracted from the biomedical literature: the CAFA challenge]
- [MS-kNN: protein function prediction by integrating multiple data sources]
- [A regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data]
- [Predicting gene function using similarity learning]
- [Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae]
- [Prediction of protein function from protein sequence and structure]
- [Predicting protein function from sequence and structural data]
- [Automatic annotation of protein function]
- [Automatic annotation of protein function based on family identification]
- [Annotating proteins by mining protein interaction networks]
- [ProFAT: a web-based tool for the functional annotation of protein sequences]
- [An efficient method for protein function annotation based on multilayer protein networks]
- [Functional Annotation of Proteins using Domain Embedding based Sequence Classification]
- [Accurate prediction of protein enzymatic class by N-to-1 Neural Networks]
- [Efficiency analysis of KNN and minimum distance-based classifiers in enzyme family prediction]
- [ECS: an automatic enzyme classifier based on functional domain composition]
- [Predicting enzyme subclass by functional domain composition and pseudo amino acid composition]
- [Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method]
- [EFICAz2.5: application of a high-precision enzyme function predictor to 396 proteomes]