Documentation

Getting Started

Learn how to use the ProteinFP tools for protein function prediction.

What is ProteinFP?

ProteinFP is a web-based platform for predicting protein functions using Gene Ontology (GO) terms. It provides four complementary prediction methods:

  • PFP - Baseline prediction using PSI-BLAST sequence similarity
  • Phylo-PFP - Enhanced prediction with phylogenetic distance weighting
  • ESG - Extended similarity group for distant homolog discovery
  • Domain-PFP - Self-supervised domain embedding representations

Input Format

All tools accept protein sequences in FASTA format:

>protein_name optional description
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLML
SPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPS
WPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFC

Requirements

  • Header line starting with >
  • Standard amino acid letters
  • Minimum 10 residues

Accepted Characters

  • A-Z amino acid codes
  • X for unknown residues
  • * for stop codons
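
The requirements and accepted characters above can be checked locally before submitting a sequence. The following is a minimal sketch, not ProteinFP's own validator: the function names parse_fasta and validate_record are hypothetical, and treating lowercase letters as uppercase and counting every character toward the 10-residue minimum are assumptions.

```python
import re

MIN_RESIDUES = 10                      # "Minimum 10 residues"
ALLOWED = re.compile(r"^[A-Z*]+$")     # A-Z codes (X included) plus '*'


def validate_record(header, sequence):
    """Raise ValueError if one record breaks the documented rules."""
    if not ALLOWED.match(sequence):
        bad = sorted(set(re.sub(r"[A-Z*]", "", sequence)))
        raise ValueError(f"{header!r}: unsupported characters {bad}")
    # Assumption: every character, including X and '*', counts as a residue.
    if len(sequence) < MIN_RESIDUES:
        raise ValueError(f"{header!r}: fewer than {MIN_RESIDUES} residues")


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before a '>' header line")
        else:
            # Assumption: lowercase input is normalized to uppercase.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    if not records:
        raise ValueError("no FASTA records found")
    for name, seq in records:
        validate_record(name, seq)
    return records
```

Calling parse_fasta on the example shown earlier would return a single record whose header is "protein_name optional description" and whose sequence is the three lines joined together.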

Output Format

Results are provided for three GO categories:

Molecular Function (MF)

Biochemical activities of the protein

Biological Process (BP)

Larger biological goals the protein contributes to

Cellular Component (CC)

Cellular locations where the protein is active

Each GO term prediction includes a probability score from 0 to 1, where higher values indicate greater confidence in the prediction.
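
As an illustration of working with these scores, the sketch below keeps predictions at or above a cutoff and groups them by GO category. It assumes each prediction is available as a (GO ID, category code, score) tuple; that layout, the function name group_confident, and the 0.5 default threshold are hypothetical and do not describe ProteinFP's actual output files.

```python
from collections import defaultdict

# The three GO categories described above, keyed by their abbreviations.
CATEGORIES = {
    "MF": "Molecular Function",
    "BP": "Biological Process",
    "CC": "Cellular Component",
}


def group_confident(predictions, threshold=0.5):
    """Group (go_id, category, score) tuples by category.

    Only predictions with score >= threshold are kept, and each
    category's terms are sorted from highest to lowest score.
    """
    grouped = defaultdict(list)
    for go_id, category, score in predictions:
        if category not in CATEGORIES:
            raise ValueError(f"unknown GO category: {category!r}")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {go_id}: {score}")
        if score >= threshold:
            grouped[category].append((go_id, score))
    for terms in grouped.values():
        terms.sort(key=lambda term: term[1], reverse=True)
    return dict(grouped)


# Illustrative input only: the GO IDs are real terms, the scores are made up.
example = [
    ("GO:0003677", "MF", 0.91),  # DNA binding
    ("GO:0006915", "BP", 0.42),  # apoptotic process
    ("GO:0005634", "CC", 0.77),  # nucleus
]
confident = group_confident(example, threshold=0.5)
```

A suitable threshold depends on the method and on how many false positives are acceptable, so the default here is only a placeholder.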