Functional Scoring Methods

Functional scoring addresses a critical bottleneck in phosphoproteomics: fewer than 3% of the roughly 200,000 cataloged human phosphosites have a known biological function, making it essential to computationally prioritize sites likely to be functionally important.

Key Methods

funscoR

  • Paper: Nature Biotechnology, 2020
  • Code: github.com/evocellnet/funscoR
  • Approach: Random forest classifier predicting functional phosphosites using 59 features spanning evolutionary conservation, structural context, regulatory annotations, and proteomics evidence. Trained on PhosphoSitePlus regulatory sites as positives.
  • Key innovation: Largest feature set for functional scoring; systematic integration of orthogonal evidence types; provides continuous scores rather than binary classification.
  • Strengths: Comprehensive feature integration; well-calibrated probability scores; open-source R package; covers both Ser/Thr and Tyr sites.
  • Limitations: Training set biased toward well-studied sites and pathways; conservation features less informative for lineage-specific regulation; random forest offers limited mechanistic insight into why a site scores highly.

MIMP

  • Paper: Nature Methods, 2015
  • Code: github.com/omarwagih/rmimp
  • Approach: Mutation Impact on Phosphorylation. Predicts whether somatic mutations near phosphosites create or destroy kinase recognition motifs, linking cancer mutations to altered phospho-signaling.
  • Key innovation: Directly connects the cancer genomics layer (somatic mutations) to the phosphoproteomics layer (kinase-substrate rewiring) by modeling motif disruption.
  • Strengths: Mechanistically interpretable; identifies gain-of-phosphorylation and loss-of-phosphorylation events; relevant for precision oncology.
  • Limitations: Only considers mutations within the linear motif window; misses allosteric or structural effects of distant mutations; requires high-quality kinase motif models.
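
The motif-disruption idea can be illustrated with a minimal sketch. It is not MIMP's actual statistical model or API: the position weight matrix, background frequency, threshold, and sequence windows below are all invented for illustration. The sketch scores a sequence window before and after a substitution and labels the change as a gain or loss of the motif.

```python
import math

# Everything below is a toy illustration: the motif model, its probabilities,
# the background frequency, and the threshold are invented, not taken from MIMP.
BACKGROUND = 0.05   # uniform background frequency over 20 amino acids
UNLISTED = 0.01     # probability for residues not listed at a modeled position
TOY_PWM = {         # offset from the phosphoacceptor -> residue probabilities
    -3: {"R": 0.60, "K": 0.20},
    -2: {"R": 0.50, "K": 0.20},
    0: {"S": 0.70, "T": 0.30},
}


def motif_score(window: str, center: int) -> float:
    """Sum of log2 odds (motif vs. background) over the modeled positions."""
    score = 0.0
    for offset, probs in TOY_PWM.items():
        index = center + offset
        if not 0 <= index < len(window):
            raise ValueError("window does not cover every modeled position")
        score += math.log2(probs.get(window[index], UNLISTED) / BACKGROUND)
    return score


def mutation_effect(window: str, center: int, mut_index: int,
                    mut_residue: str, threshold: float = 6.0):
    """Compare wild-type and mutant motif scores for one substitution."""
    mutant = window[:mut_index] + mut_residue + window[mut_index + 1:]
    wt, mt = motif_score(window, center), motif_score(mutant, center)
    if wt >= threshold > mt:
        label = "loss"      # motif present in wild type, disrupted by mutation
    elif wt < threshold <= mt:
        label = "gain"      # motif absent in wild type, created by mutation
    else:
        label = "neutral"   # mutation does not cross the threshold
    return label, mt - wt


# R -> G at position -3 of a hypothetical substrate window centered on S.
loss_label, loss_delta = mutation_effect("LRRASLG", center=4, mut_index=1, mut_residue="G")
# G -> R at the same position of a window that lacks the arginine.
gain_label, gain_delta = mutation_effect("LGRASLG", center=4, mut_index=1, mut_residue="R")
print(loss_label, round(loss_delta, 2))
print(gain_label, round(gain_delta, 2))
```

Because only positions inside the modeled window contribute to the score, a substitution outside it always comes out as "neutral", which is the limitation noted above.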

DeepMVP

  • Paper: Bioinformatics, 2024
  • Code: github.com/bzhanglab/DeepMVP
  • Approach: Deep learning model predicting the functional impact of mutations on phosphosites using protein language model embeddings and structural features from AlphaFold2-predicted structures.
  • Key innovation: Leverages protein language models (ESM-2) and AlphaFold2 structures to capture sequence and structural context that linear motif models cannot represent.
  • Strengths: Captures structural context that linear motif methods miss; benefits from recent advances in protein structure prediction; strong benchmark performance.
  • Limitations: Computationally heavier than motif-based methods; AlphaFold2 structure quality varies for disordered regions where many phosphosites reside; deep learning model interpretability is limited.

FuncPhos

  • Paper: Cell Reports, 2023
  • Code: github.com/evocellnet/funscoR (this is the funscoR repository listed above; a FuncPhos-specific repository is not given here)
  • Approach: Curated database and scoring framework classifying phosphosites by functional evidence type: enzymatic activity regulation, protein interactions, localization, stability, and other regulatory mechanisms.
  • Key innovation: Structured functional annotation taxonomy for phosphosites; distinguishes between different mechanisms of phospho-regulation rather than treating function as binary.
  • Strengths: Rich functional categorization; manually curated high-confidence annotations; useful for interpreting why a site matters, not just whether it matters.
  • Limitations: Limited coverage (curated databases are inherently small); labor-intensive to maintain and expand; primarily literature-derived with associated publication bias.
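
One way to picture a non-binary annotation scheme is as a small data structure. The sketch below is not FuncPhos's actual schema: the class names, the site identifiers, and the example annotations are hypothetical. It uses only the mechanism categories listed above and lets one site carry several of them.

```python
from dataclasses import dataclass
from enum import Enum


class Mechanism(Enum):
    """Mechanism categories as listed in the text; the names are illustrative."""
    ENZYMATIC_ACTIVITY = "enzymatic activity regulation"
    PROTEIN_INTERACTION = "protein interactions"
    LOCALIZATION = "localization"
    STABILITY = "stability"
    OTHER = "other regulatory mechanism"


@dataclass(frozen=True)
class SiteAnnotation:
    site: str               # hypothetical identifier such as "PROT_A:S10"
    mechanisms: frozenset   # a site may act through more than one mechanism


def group_by_mechanism(annotations):
    """Map each mechanism to the sites annotated with it, in input order."""
    grouped = {mechanism: [] for mechanism in Mechanism}
    for annotation in annotations:
        for mechanism in annotation.mechanisms:
            grouped[mechanism].append(annotation.site)
    return grouped


# Invented example annotations, not curated data.
annotations = [
    SiteAnnotation("PROT_A:S10", frozenset({Mechanism.ENZYMATIC_ACTIVITY})),
    SiteAnnotation("PROT_B:T55", frozenset({Mechanism.LOCALIZATION,
                                            Mechanism.STABILITY})),
]
by_mechanism = group_by_mechanism(annotations)
print(by_mechanism[Mechanism.STABILITY])
```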

The Functional Dark Matter Problem

The vast majority of detected phosphosites remain functionally uncharacterized. This "dark phosphoproteome" creates a fundamental challenge: phosphoproteomics experiments routinely quantify tens of thousands of sites, but interpretation frameworks built on known functions cover only a small fraction. Functional scoring methods attempt to triage this gap using predictive features.

Predictive Feature Categories

Three categories of features consistently contribute to functional phosphosite prediction:

  • Evolutionary conservation: Functionally important sites tend to be conserved across species. However, conservation alone has limited sensitivity because some regulatory phosphosites are lineage-specific adaptations.
  • Structural features: Sites in structured regions, near protein-protein interfaces, or in activation loops are more likely to be regulatory. AlphaFold2 has expanded structural coverage, but many phosphosites reside in intrinsically disordered regions with limited structural context.
  • Regulatory evidence: Stoichiometry (high occupancy suggests function), co-regulation with known functional sites, and responsiveness to perturbations all provide indirect evidence of functional importance.
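
Two of these features are simple enough to sketch directly. The functions below are simplified illustrations, not the definitions used by any of the tools above: conservation is taken as the fraction of aligned ortholog residues that are phosphoacceptors, and occupancy as a naive intensity ratio. Real stoichiometry estimation is considerably more involved. The function names and example inputs are assumptions.

```python
PHOSPHOACCEPTORS = frozenset("STY")


def acceptor_conservation(aligned_column: str) -> float:
    """Fraction of non-gap residues in one alignment column that are S, T, or Y.

    `aligned_column` holds one residue per ortholog at the aligned position of
    the phosphosite, with "-" marking a gap.
    """
    residues = [aa for aa in aligned_column.upper() if aa != "-"]
    if not residues:
        return 0.0
    return sum(aa in PHOSPHOACCEPTORS for aa in residues) / len(residues)


def occupancy(phospho_intensity: float, unmodified_intensity: float) -> float:
    """Naive occupancy: phosphorylated signal over total signal for one site."""
    total = phospho_intensity + unmodified_intensity
    return phospho_intensity / total if total > 0 else 0.0


# Invented inputs: five orthologs plus one gap, and arbitrary intensities.
conservation = acceptor_conservation("SSTSA-")
site_occupancy = occupancy(30.0, 70.0)
print(conservation, site_occupancy)
```

A lineage-specific regulatory site would score low on `acceptor_conservation` even if it is functional, which is why conservation is combined with other evidence rather than used alone.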

When to Use Functional Scoring Methods

Best for:

  • Prioritizing phosphosites for experimental validation from large-scale datasets
  • Filtering differentially phosphorylated sites to those most likely to be biologically meaningful
  • Connecting somatic mutations to phospho-signaling consequences in cancer genomics
  • Annotating phosphoproteomic atlases with functional confidence scores

Not ideal for:

  • Definitively establishing biological function (experimental validation remains necessary)
  • Sites in poorly conserved or taxonomically restricted proteins, where conservation features carry little signal
  • Replacing kinase-substrate inference (functional scoring asks "is this site important?" not "which kinase phosphorylates it?")
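
The first two "Best for" use cases amount to a filter-and-rank step. A minimal sketch follows, assuming each site already has a differential-abundance result and a functional score between 0 and 1. The field names, cutoffs, and site identifiers are hypothetical and would need tuning for a real dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteResult:
    site: str                # hypothetical identifier such as "PROT_A:S10"
    log2_fc: float           # log2 fold change between conditions
    adj_p: float             # multiple-testing-adjusted p-value
    functional_score: float  # assumed to lie in [0, 1], higher = more likely functional


def prioritize(sites, max_adj_p=0.05, min_abs_log2_fc=1.0, min_score=0.5):
    """Keep significantly regulated sites above a score cutoff, best score first."""
    hits = [
        s for s in sites
        if s.adj_p <= max_adj_p
        and abs(s.log2_fc) >= min_abs_log2_fc
        and s.functional_score >= min_score
    ]
    return sorted(hits, key=lambda s: s.functional_score, reverse=True)


# Invented example values, not real measurements.
results = [
    SiteResult("PROT_A:S10", 2.1, 0.001, 0.82),
    SiteResult("PROT_B:T55", 1.8, 0.010, 0.31),   # regulated but low score
    SiteResult("PROT_C:Y7", -1.5, 0.020, 0.64),
    SiteResult("PROT_D:S99", 0.4, 0.001, 0.90),   # high score but small change
    SiteResult("PROT_E:S3", 3.0, 0.200, 0.75),    # not significant
]
shortlist = prioritize(results)
print([s.site for s in shortlist])
```

The shortlist is a ranking for follow-up experiments, not evidence of function in itself.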