About ProteinFP

A comprehensive platform for protein function prediction using sequence similarity and phylogenetic analysis.

Our Mission

ProteinFP provides researchers with accessible, accurate tools for predicting protein functions. By combining traditional sequence similarity methods with phylogenetic analysis, we help bridge the gap between sequence data and functional annotation.

As genomic sequencing continues to outpace experimental characterization, computational function prediction becomes increasingly important for understanding biology and developing new therapeutics.

The Tools

PFP (Protein Function Prediction)

Extends PSI-BLAST by extracting GO annotations from distantly similar sequences (E-value up to 125) and applying contextual associations of GO terms. Sequences are weighted by E-values to reduce noise. Ranked best in CASP7 function prediction.
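
The E-value weighting described above can be illustrated with a short Python sketch. It is a simplified illustration, not the PFP implementation: each hit adds -log10(E-value) + b to every GO term it carries, and the contextual GO-term associations are left out. The function names, the toy hits, and the choice of b (set here so that a hit exactly at the E-value cutoff of 125 gets zero weight) are assumptions made for this example.

```python
import math
from collections import defaultdict

E_VALUE_CUTOFF = 125.0  # cutoff quoted in the text above

# Assumption for this sketch: the shift b is chosen so that a hit
# exactly at the cutoff receives weight 0. The published value may differ.
B = math.log10(E_VALUE_CUTOFF)


def hit_weight(e_value, b=B):
    """Weight of a single hit: -log10(E-value) + b."""
    # Clamp tiny E-values so that an E-value reported as 0.0 does not break log10.
    return -math.log10(max(e_value, 1e-300)) + b


def score_go_terms(hits, e_cutoff=E_VALUE_CUTOFF):
    """Sum hit weights per GO term.

    hits: iterable of (e_value, set_of_go_terms) pairs.
    Returns a list of (go_term, score) sorted by descending score.
    """
    scores = defaultdict(float)
    for e_value, go_terms in hits:
        if e_value > e_cutoff:
            continue  # beyond the cutoff, the hit is ignored
        weight = hit_weight(e_value)
        for term in go_terms:
            scores[term] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy hits (illustrative E-values and annotations only).
hits = [
    (1e-30, {"GO:0003824"}),
    (5.0, {"GO:0003824", "GO:0005515"}),
    (100.0, {"GO:0005515"}),
    (500.0, {"GO:0016787"}),
]
ranked = score_go_terms(hits)
```

In this toy input the hit at E-value 500 is discarded by the cutoff, while the weak hits at E-values 5 and 100 still contribute, only with much smaller weights than the strong hit.
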

~40 s processing · Larger coverage

Phylogenetic PFP (Phylo-PFP)

Improves PFP by reranking sequences using phylogenetic distance. Constructs trees from MUSCLE alignments with PHYLIP neighbor-joining (PROTDIST distances under the Jones-Taylor-Thornton model). Sequences are weighted by the ELE score: (-log(E-value) + b) / phylogenetic_distance, where b is a constant offset.
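
As a worked illustration of the ELE formula above, the following Python sketch reranks a handful of hits. It only evaluates the formula as written; tree construction with MUSCLE and PHYLIP is not shown. The value b = 2.0, the small floor on the distance (to avoid division by zero), the function names, and the toy hits are all assumptions for this example.

```python
import math


def ele_weight(e_value, phylo_distance, b=2.0, min_distance=1e-6):
    """ELE = (-log10(E-value) + b) / phylogenetic_distance.

    b=2.0 and min_distance are assumptions made for this sketch.
    """
    similarity = -math.log10(max(e_value, 1e-300)) + b
    return similarity / max(phylo_distance, min_distance)


def rerank(hits):
    """Sort hits by descending ELE.

    hits: list of (sequence_id, e_value, phylo_distance) tuples.
    """
    return sorted(hits, key=lambda h: ele_weight(h[1], h[2]), reverse=True)


# Toy hits: (id, E-value, phylogenetic distance to the query).
hits = [
    ("seqA", 1e-10, 2.0),
    ("seqB", 1e-4, 0.5),
    ("seqC", 1e-20, 4.0),
]
order = [h[0] for h in rerank(hits)]
```

Ranked by E-value alone, the toy hits would come out as seqC, seqA, seqB. With the distance in the denominator, seqB moves to the top because it is phylogenetically closest to the query.
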

Top CAFA2 performer · Better specificity

ESG (Extended Similarity Group)

Performs recursive PSI-BLAST searches building a protein similarity graph. Assigns weights based on relative -log(E-value) at each iteration level. Ranked 4th in CAFA for MF predictions among 54 groups.
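
The recursive weighting can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the ESG server code: a dictionary stands in for PSI-BLAST, weights at each level are the hits' -log10(E-value) + b values normalized to sum to 1, and a hypothetical mixing parameter nu balances a hit's own annotation against the scores from the next level. The names, b = 2.0, and nu = 0.5 are chosen only for illustration.

```python
import math
from collections import defaultdict


def relative_weights(hits, b=2.0):
    """Normalize -log10(E-value) + b over the hits of one search.

    hits: list of (protein_id, e_value). Returns {protein_id: weight}
    with weights summing to 1. b=2.0 is an assumption for this sketch.
    """
    raw = {pid: -math.log10(max(e, 1e-300)) + b for pid, e in hits}
    raw = {pid: w for pid, w in raw.items() if w > 0}
    total = sum(raw.values())
    return {pid: w / total for pid, w in raw.items()} if total > 0 else {}


def esg_scores(protein, search, annotations, depth=2, nu=0.5):
    """Score GO terms for `protein` by searching recursively to `depth` levels.

    search: callable returning a list of (hit_id, e_value) for a protein.
    annotations: {protein_id: set of GO terms}.
    nu: hypothetical weight of a hit's own annotation versus deeper levels.
    """
    scores = defaultdict(float)
    for hit, weight in relative_weights(search(protein)).items():
        own = {term: 1.0 for term in annotations.get(hit, ())}
        if depth > 1:
            deeper = esg_scores(hit, search, annotations, depth - 1, nu)
            terms = set(own) | set(deeper)
            hit_scores = {
                t: nu * own.get(t, 0.0) + (1 - nu) * deeper.get(t, 0.0)
                for t in terms
            }
        else:
            hit_scores = own
        for term, value in hit_scores.items():
            scores[term] += weight * value
    return dict(scores)


# Toy similarity graph standing in for PSI-BLAST results.
TOY_SEARCH = {
    "query": [("A", 1e-8), ("B", 1e-2)],
    "A": [("C", 1e-6)],
    "B": [("C", 1e-3), ("D", 1e-1)],
}
annotations = {
    "A": {"GO:0003824"},
    "B": {"GO:0005515"},
    "C": {"GO:0003824"},
    "D": {"GO:0005515"},
}
scores = esg_scores("query", lambda p: TOY_SEARCH.get(p, []), annotations)
```

Because the weights are normalized at every level, the scores in this toy case sum to 1, and the term supported by the stronger first-level hit receives most of the weight.
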

~7.5 min processing · Better specificity

Domain-PFP

A self-supervised method that learns functional representations of protein domains from domain-GO co-occurrences and associations. Significantly outperformed state-of-the-art predictors.
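
To make the idea of domain-GO co-occurrence concrete, the Python sketch below counts how often each domain appears together with each GO term in a set of annotated proteins and turns the counts into conditional frequencies. It covers only this counting step; the self-supervised embedding learning of Domain-PFP is not reproduced here. The domain identifiers (DOM_A, DOM_B), function names, and toy proteins are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(proteins):
    """Count domain occurrences and (domain, GO term) co-occurrences.

    proteins: iterable of (domains, go_terms) pairs, each a set of IDs.
    """
    pair_counts = Counter()
    domain_counts = Counter()
    for domains, go_terms in proteins:
        domain_counts.update(domains)
        pair_counts.update(product(domains, go_terms))
    return pair_counts, domain_counts


def go_given_domain(pair_counts, domain_counts):
    """Fraction of proteins carrying a domain that also carry a GO term."""
    return {
        (domain, term): n / domain_counts[domain]
        for (domain, term), n in pair_counts.items()
    }


# Toy annotated proteins with hypothetical domain identifiers.
proteins = [
    ({"DOM_A"}, {"GO:0003824"}),
    ({"DOM_A", "DOM_B"}, {"GO:0003824", "GO:0005515"}),
    ({"DOM_B"}, {"GO:0005515"}),
]
pair_counts, domain_counts = cooccurrence_counts(proteins)
freq = go_given_domain(pair_counts, domain_counts)
```

In this toy set every protein containing DOM_A is annotated with GO:0003824, so that pair has frequency 1.0, while DOM_A and GO:0005515 co-occur in only half of the DOM_A proteins.
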

Competitive in CAFA3 · <5 min processing

Gene Ontology

Our predictions are based on the Gene Ontology (GO), a comprehensive framework for describing gene and protein functions. GO organizes biological knowledge into three categories:

Molecular Function

What the protein does at the molecular level

Biological Process

The biological pathways and processes involved

Cellular Component

Where in the cell the protein is located

Publications

Enhanced automated function prediction using distantly related sequences and contextual association by PFP

Hawkins T, Luban S, Kihara D. (2006) Protein Science, 15(6):1550-1556.

PFP: Automated prediction of Gene Ontology functional annotations with confidence scores using protein sequence data

Hawkins T, Chitale M, Luban S, Kihara D. (2009) Proteins, 74(3):566-582.

PFP/ESG: automated protein function prediction servers enhanced with Gene Ontology visualization tool

Khan IK, Wei Q, Chitale M, Kihara D. (2015) Bioinformatics, 31(2):271-272.

ESG: extended similarity group method for automated protein function prediction

Chitale M, Hawkins T, Park C, Kihara D. (2009) Bioinformatics, 25(14):1739-1745.

Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences

Jain A, Kihara D. (2019) Bioinformatics, 35(5):753-759.

Domain-PFP allows protein function prediction using function-aware domain embedding representations

Ibtehaz N, Kagaya Y, Kihara D. (2023) Communications Biology, 6:1103.

Recognition

CASP7 (2007)

PFP ranked best in function prediction category

CAFA (2013)

ESG ranked 4th in Molecular Function predictions (54 participating groups)

CAFA2 (2016)

Phylo-PFP outperformed all top methods, with the highest Fmax in MF (0.606), BP (0.380), and CC (0.506)

CAFA3

Domain-PFP achieved the best overall performance among top teams
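
For readers unfamiliar with the Fmax values quoted for CAFA2 above, the Python sketch below computes a protein-centric Fmax in the way the metric is commonly defined: at each score threshold, precision is averaged over proteins with at least one prediction, recall is averaged over all proteins, and Fmax is the best harmonic mean across thresholds. It is a minimal sketch on toy data, not the official CAFA evaluation code. The function name, the threshold grid, and the inputs are assumptions, and every protein is assumed to have at least one true term.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax.

    predictions: {protein: {term: score in [0, 1]}}
    truth: {protein: non-empty set of true terms}
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            correct = len(predicted & true_terms)
            if predicted:
                # Precision only counts proteins with a prediction at this threshold.
                precisions.append(correct / len(predicted))
            recall_sum += correct / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data with placeholder term names.
truth = {"P1": {"a", "b"}, "P2": {"c"}}
predictions = {
    "P1": {"a": 0.9, "b": 0.4, "x": 0.3},
    "P2": {"c": 0.8, "y": 0.6},
}
score = fmax(predictions, truth)
```
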

From Kihara Lab

These tools were developed by the Kihara Bioinformatics Laboratory at Purdue University (Department of Computer Science and Department of Biological Sciences).