About ProteinFP
A comprehensive platform for protein function prediction using sequence similarity and phylogenetic analysis.
ProteinFP provides researchers with accessible, accurate tools for predicting protein functions. By combining traditional sequence similarity methods with phylogenetic analysis, we help bridge the gap between sequence data and functional annotation.
As genomic sequencing continues to outpace experimental characterization, computational function prediction becomes increasingly important for understanding biology and developing new therapeutics.
Protein Function Prediction (PFP)
Extends PSI-BLAST by extracting GO annotations from distantly related sequences (E-values up to 125) and scoring them with contextual associations between GO terms. Each retrieved sequence is weighted by its E-value, so weak hits add less noise. Ranked best in the CASP7 function prediction category.
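The following is a minimal sketch of this kind of scoring. It assumes a score of the form score(f_a) = sum over hits i and their GO annotations f_j of (-log(E_i) + b) * P(f_a | f_j), where P(f_a | f_j) represents the contextual association between GO terms. The function name pfp_scores, the input layout, the base-10 logarithm, and the default b are illustrative assumptions. This is not the PFP server's code.

```python
import math
from collections import defaultdict


def pfp_scores(hits, annotations, association, b=2.0):
    """Score candidate GO terms from PSI-BLAST hits (illustrative sketch).

    hits:        list of (sequence_id, e_value) pairs from the search
    annotations: dict sequence_id -> set of GO terms annotated to that sequence
    association: dict f_j -> {f_a: P(f_a | f_j)}, self-pairs excluded
    b:           hypothetical offset constant, not the published value;
                 weights turn negative once the E-value exceeds 10**b
    """
    scores = defaultdict(float)
    for seq_id, e_value in hits:
        # Floor the E-value so that 0.0 (reported for near-identical hits)
        # does not give an infinite weight; base-10 log is an assumption.
        weight = -math.log10(max(e_value, 1e-300)) + b
        for f_j in annotations.get(seq_id, set()):
            # The annotated term itself counts fully, i.e. P(f_j | f_j) = 1 ...
            scores[f_j] += weight
            # ... and it also passes weight to contextually associated terms.
            for f_a, prob in association.get(f_j, {}).items():
                scores[f_a] += weight * prob
    # Highest-scoring terms first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


hits = [("P1", 1e-30), ("P2", 10.0)]
annotations = {"P1": {"GO:A"}, "P2": {"GO:B"}}
association = {"GO:A": {"GO:B": 0.5}}
print(pfp_scores(hits, annotations, association))
```

With b = 2, the strong hit P1 contributes about 32 to GO:A and half of that to the associated GO:B. The weak hit P2 adds only about 1 to GO:B.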
Phylogenetic PFP (Phylo-PFP)
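Improves PFP by re-ranking retrieved sequences using phylogenetic distance. Trees are built from a MUSCLE multiple sequence alignment, with pairwise distances from PHYLIP PROTDIST (Jones-Taylor-Thornton model) and topology from PHYLIP neighbor-joining. Each sequence is weighted by ELE = (-log(E-value) + b) / d, where d is its phylogenetic distance to the query and b is a constant offset.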
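The ELE weighting can be sketched directly from the formula above. The names ele_weight and rerank, the default b, the base-10 logarithm, and the min_distance floor are all hypothetical. The distances are taken as given inputs rather than computed from a MUSCLE/PHYLIP tree.

```python
import math


def ele_weight(e_value, distance, b=2.0, min_distance=1e-3):
    """ELE = (-log(E-value) + b) / phylogenetic_distance (illustrative sketch).

    b and min_distance are hypothetical defaults; min_distance keeps a hit at
    zero tree distance from the query from causing a division by zero.
    """
    evidence = -math.log10(max(e_value, 1e-300))  # base-10 log assumed
    return (evidence + b) / max(distance, min_distance)


def rerank(hits, b=2.0):
    """hits: list of (sequence_id, e_value, distance_to_query); best first."""
    return sorted(hits, key=lambda h: ele_weight(h[1], h[2], b), reverse=True)


hits = [("S1", 1e-10, 2.0), ("S2", 1e-5, 0.5)]
print([seq_id for seq_id, _, _ in rerank(hits)])  # ['S2', 'S1']
```

S2 has the weaker E-value but sits much closer to the query in the tree, so ELE ranks it first (14 vs. 6).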
Extended Similarity Group (ESG)
Performs recursive PSI-BLAST searches to build a protein similarity graph, weighting each retrieved sequence by its relative -log(E-value) at each level of the search. Ranked 4th of 54 groups for Molecular Function (MF) predictions in CAFA.
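The following is a rough sketch of the two-level similarity graph. It assumes weights are normalized -log(E-value) within each search, and that a level-2 hit contributes its level-1 parent's weight multiplied by its own relative weight. The combination rule, the E < 1 cutoff, and the search callable (a stand-in for PSI-BLAST) are assumptions for illustration. This is not ESG's published scoring.

```python
import math
from collections import defaultdict


def relative_weights(hits):
    """Turn {hit_id: e_value} into weights proportional to -log(E-value)."""
    # Keep only hits with E < 1 so every -log10(E) is positive (assumed cutoff).
    raw = {s: -math.log10(max(e, 1e-300)) for s, e in hits.items() if e < 1.0}
    total = sum(raw.values())
    return {s: v / total for s, v in raw.items()} if total > 0 else {}


def esg_scores(query, search, annotations):
    """Two-level similarity-graph scoring (illustrative sketch).

    search:      callable sequence_id -> {hit_id: e_value}; a stand-in for
                 running PSI-BLAST with that sequence as the query
    annotations: dict sequence_id -> set of GO terms
    """
    scores = defaultdict(float)
    for s1, w1 in relative_weights(search(query)).items():
        for term in annotations.get(s1, set()):
            scores[term] += w1
        # Second level: search again from s1, ignoring s1 itself and the query.
        level2 = {s: e for s, e in search(s1).items() if s not in (s1, query)}
        for s2, w2 in relative_weights(level2).items():
            for term in annotations.get(s2, set()):
                scores[term] += w1 * w2  # product along the path: an assumption
    return dict(scores)


graph = {"Q": {"A": 1e-20, "B": 1e-10}, "A": {"C": 1e-8, "D": 1e-2}}
annotations = {"A": {"GO:1"}, "B": {"GO:2"}, "C": {"GO:2"}, "D": {"GO:1"}}
print(esg_scores("Q", lambda s: graph.get(s, {}), annotations))
```

Here A receives 2/3 of the level-1 weight (20 of 30 in -log10 units) and B receives 1/3. The second-level hits C and D then split A's share 0.8 to 0.2.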
Domain-PFP
A self-supervised method that learns functional representations of protein domains from domain-GO co-occurrences and associations. In its published evaluation it significantly outperformed state-of-the-art predictors.
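The raw signal Domain-PFP learns from is how often domains and GO terms co-occur on the same protein. The snippet below computes only that statistic, as a conditional probability P(GO term | domain). It is not the self-supervised embedding model itself. The function name and toy identifiers are hypothetical.

```python
from collections import Counter
from itertools import product


def domain_go_association(proteins):
    """Estimate P(GO term | domain) from annotated proteins.

    proteins: iterable of (domains, go_terms) pairs for individual proteins.
    Returns {(domain, go_term): fraction of proteins with the domain that
    also carry the GO term}.
    """
    domain_counts = Counter()
    pair_counts = Counter()
    for domains, go_terms in proteins:
        # Use sets so each (domain, term) pair is counted once per protein.
        domains, go_terms = set(domains), set(go_terms)
        domain_counts.update(domains)
        pair_counts.update(product(domains, go_terms))
    return {(d, g): n / domain_counts[d] for (d, g), n in pair_counts.items()}


# Toy data with hypothetical identifiers.
proteins = [
    ({"PF_A"}, {"GO_X"}),
    ({"PF_A", "PF_B"}, {"GO_X", "GO_Y"}),
    ({"PF_B"}, {"GO_Y"}),
]
assoc = domain_go_association(proteins)
print(assoc[("PF_A", "GO_X")], assoc[("PF_A", "GO_Y")])  # 1.0 0.5
```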
Our predictions are based on the Gene Ontology (GO), a comprehensive framework for describing gene and protein functions. GO organizes biological knowledge into three categories:
Molecular Function
What the protein does at the molecular level
Biological Process
The biological pathways and processes the protein takes part in
Cellular Component
Where in the cell the protein is located
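Each GO term belongs to exactly one of these three categories, so predictions are usually handled per category (MF, BP, CC). The following is a small sketch of that split, assuming a term-to-namespace lookup is available. The dictionary below is a hypothetical three-term excerpt, and split_by_category is an illustrative name.

```python
from collections import defaultdict

# Hypothetical excerpt of a term -> namespace lookup; a real pipeline would
# read the namespace of every term from the GO ontology file.
NAMESPACE = {
    "GO:0005524": "molecular_function",  # ATP binding
    "GO:0006412": "biological_process",  # translation
    "GO:0005634": "cellular_component",  # nucleus
}


def split_by_category(predictions, namespace=NAMESPACE):
    """Group {go_id: score} predictions by GO category (namespace)."""
    grouped = defaultdict(dict)
    for go_id, score in predictions.items():
        # Terms missing from the lookup are collected under "unknown".
        grouped[namespace.get(go_id, "unknown")][go_id] = score
    return dict(grouped)


print(split_by_category({"GO:0005524": 0.9, "GO:0005634": 0.4}))
```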
Enhanced automated function prediction using distantly related sequences and contextual association by PFP
Hawkins T, Luban S, Kihara D. (2006) Protein Science, 15(6):1550-1556.
PFP: Automated prediction of Gene Ontology functional annotations with confidence scores using protein sequence data
Hawkins T, Chitale M, Luban S, Kihara D. (2009) Proteins, 74(3):566-582.
PFP/ESG: automated protein function prediction servers enhanced with Gene Ontology visualization tool
Khan IK, Wei Q, Chitale M, Kihara D. (2015) Bioinformatics, 31(2):271-272.
ESG: extended similarity group method for automated protein function prediction
Chitale M, Hawkins T, Park C, Kihara D. (2009) Bioinformatics, 25(14):1739-1745.
Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences
Jain A, Kihara D. (2019) Bioinformatics, 35(5):753-759.
Domain-PFP allows protein function prediction using function-aware domain embedding representations
Ibtehaz N, Kagaya Y, Kihara D. (2023) Communications Biology, 6:1103.
CASP7 (2007)
PFP ranked best in the function prediction category
CAFA (2013)
ESG ranked 4th in Molecular Function predictions (54 participating groups)
CAFA2 (2016)
Phylo-PFP outperformed all top methods, achieving the highest Fmax in MF (0.606), BP (0.380), and CC (0.506)
CAFA3
Domain-PFP achieved the best overall performance among the top teams
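The Fmax values above follow the CAFA protein-centric definition. For each decision threshold τ, precision is averaged over proteins with at least one prediction scoring at or above τ, and recall is averaged over all benchmark proteins. Fmax is the largest harmonic mean of the two across thresholds. The following is a minimal sketch, assuming GO term sets are already propagated to their ancestors and thresholds step by 0.01. It is not the official CAFA evaluation code.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax (illustrative sketch).

    predictions: {protein: {go_term: score in [0, 1]}}
    truth:       {protein: non-empty set of true go_terms} for benchmark proteins
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for tau in thresholds:
        precisions, recall_sum = [], 0.0
        for protein, true_terms in truth.items():
            predicted = {g for g, s in predictions.get(protein, {}).items() if s >= tau}
            hits = len(predicted & true_terms)
            if predicted:  # precision only over proteins with a prediction
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)  # recall over all proteins
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


preds = {"P1": {"A": 0.9, "B": 0.4, "D": 0.8}, "P2": {"C": 0.6}}
truth = {"P1": {"A", "B"}, "P2": {"C"}}
print(round(fmax(preds, truth), 3))  # 0.909
```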
These tools were developed by the Kihara Bioinformatics Laboratory at Purdue University, Department of Computer Science and Department of Biological Sciences.