
Kinase-Substrate Inference Methods

Kinase-substrate inference is a central computational challenge in phosphoproteomics: linking observed phosphosite changes to the kinases likely responsible for them.

Motif-Based Methods

KinaseLibrary

  • Paper: Nature, 2023
  • Code: github.com/TheKinaseLibrary/kinase-library
  • Approach: Position-specific scoring matrices (PSSMs) derived from combinatorial peptide library profiling of 303 serine/threonine kinases. Scores substrates by sequence motif match.
  • Key innovation: Experimentally determined specificity profiles for the majority of the human Ser/Thr kinome, complementing curated kinase-substrate databases with systematic biochemical measurement.
  • Strengths: Largest single-source kinase specificity resource; high-quality biochemical data; directly interpretable scores.
  • Limitations: Ser/Thr kinases only (no tyrosine kinases); motif alone cannot account for subcellular co-localization or scaffolding; in vitro preferences may not fully reflect in vivo specificity.
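The PSSM scoring idea can be illustrated with a minimal Python sketch. The matrix `TOY_PSSM`, its values, and the helper names are hypothetical. This is not the kinase-library package API or its data. It follows the general scheme of multiplying per-position favorability values (summed here in log2 space) and then expressing a score as a percentile against a background set of scores.

```python
import math

# Hypothetical toy PSSM (not KinaseLibrary data): offset relative to the
# phosphoacceptor at position 0 -> residue -> favorability (1.0 = neutral).
TOY_PSSM = {
    -3: {"R": 4.0, "K": 2.0, "D": 0.5},
    -2: {"R": 2.0, "P": 0.5},
    1: {"P": 0.25, "L": 1.5},
}


def pssm_score(peptide: str, center: int, pssm: dict) -> float:
    """Sum of log2 favorabilities at the positions the matrix covers.

    Residues absent from a row count as neutral (1.0); offsets that fall
    outside the peptide are skipped.
    """
    score = 0.0
    for offset, row in pssm.items():
        pos = center + offset
        if 0 <= pos < len(peptide):
            score += math.log2(row.get(peptide[pos], 1.0))
    return score


def percentile_rank(score: float, background: list) -> float:
    """Percentage of background scores less than or equal to `score`."""
    return 100.0 * sum(b <= score for b in background) / len(background)


# The serine at index 3 of "RRASLG" scores higher than the one in "DPASPG".
print(pssm_score("RRASLG", 3, TOY_PSSM), pssm_score("DPASPG", 3, TOY_PSSM))
```

Ranking against a background set is one way to make scores from differently scaled matrices comparable across kinases.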

GPS 6.0

  • Paper: Nucleic Acids Research, 2023
  • Web: gps.biocuckoo.cn
  • Approach: Group-based prediction system using sequence patterns around phosphosites to assign kinase families, trained on curated kinase-substrate pairs.
  • Key innovation: Hierarchical prediction at kinase group, family, and individual kinase levels; long-standing resource with extensive validation.
  • Strengths: Covers both Ser/Thr and Tyr kinases; regularly updated; includes kinase group-level predictions when individual kinase assignment is uncertain.
  • Limitations: Training data bias toward well-studied kinases; sequence-only features.
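The group-based idea can be sketched as follows: a candidate site is scored by its similarity to the known sites of each kinase, and the sites are pooled at higher levels of the hierarchy. This is a simplified illustration, not the GPS 6.0 algorithm. A plain identity score stands in for a real amino-acid substitution matrix, and the kinases, group, and training peptides are hypothetical.

```python
# Hypothetical training data: kinase -> known phosphosite peptides (same length,
# phosphoacceptor at the centre), and a group made of those kinases.
KNOWN_SITES = {"KinA": ["RRASLG", "RKASVG"], "KinB": ["PLSPRK", "PTSPKK"]}
GROUPS = {"Group1": ["KinA", "KinB"]}


def substitution(a: str, b: str) -> int:
    # Stand-in for a substitution matrix: +1 for identical residues, else 0.
    return 1 if a == b else 0


def similarity(p: str, q: str) -> int:
    # Position-by-position substitution score between equal-length peptides.
    return sum(substitution(a, b) for a, b in zip(p, q))


def mean_similarity(query: str, sites: list) -> float:
    # Average similarity of the query to a set of known sites.
    return sum(similarity(query, s) for s in sites) / len(sites)


def hierarchical_scores(query: str) -> dict:
    """Score the query at the individual-kinase level and at the group level."""
    scores = {k: mean_similarity(query, sites) for k, sites in KNOWN_SITES.items()}
    for group, members in GROUPS.items():
        pooled = [s for m in members for s in KNOWN_SITES[m]]
        scores[group] = mean_similarity(query, pooled)
    return scores


print(hierarchical_scores("RRASIG"))
```

Reporting the pooled group score alongside the per-kinase scores mirrors the group-level predictions GPS offers when an individual kinase assignment is uncertain.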

iGPS

  • Paper: Nucleic Acids Research, 2015
  • Web: igps.biocuckoo.org
  • Approach: Integrates GPS motif predictions with protein-protein interaction networks to filter kinase-substrate pairs by physical interaction evidence.
  • Key innovation: Combining sequence specificity with network context to reduce false positives from motif-only approaches.
  • Strengths: More biologically realistic than motif-only methods; leverages interaction databases.
  • Limitations: Dependent on PPI network completeness; web server availability can be inconsistent.
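The filtering step that iGPS adds on top of motif predictions can be sketched as a join between predicted kinase-substrate pairs and a set of interaction edges. The protein names, scores, and the optional `min_score` cutoff below are hypothetical, and the real iGPS pipeline is not reproduced here.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    kinase: str
    substrate: str
    site: str
    motif_score: float


def filter_by_interaction(preds, interactions, min_score=0.0):
    """Keep motif predictions whose kinase and substrate proteins interact.

    interactions: iterable of (protein_a, protein_b) pairs, treated as undirected.
    """
    edges = {frozenset(pair) for pair in interactions}
    return [
        p
        for p in preds
        if p.motif_score >= min_score and frozenset((p.kinase, p.substrate)) in edges
    ]


# Hypothetical motif predictions and interaction evidence.
preds = [
    Prediction("KIN1", "SUB1", "S10", 0.9),
    Prediction("KIN1", "SUB2", "T5", 0.8),
    Prediction("KIN2", "SUB1", "S10", 0.4),
]
ppi = [("SUB1", "KIN1"), ("KIN2", "SUB1")]
print(filter_by_interaction(preds, ppi, min_score=0.5))
```

The second prediction is discarded despite a high motif score because no interaction links KIN1 and SUB2, which is the false-positive reduction the method relies on; it is also why missing PPI edges translate directly into missed true pairs.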

NetworKIN

  • Paper: Cell, 2007
  • Web: networkin.info
  • Approach: Combines linear motif scoring with network proximity in protein association networks (STRING) to predict kinase-substrate relationships.
  • Key innovation: Pioneered the integration of contextual network information with sequence motifs for kinase prediction.
  • Strengths: Foundational method; conceptually influential; uses network context to improve specificity.
  • Limitations: Aging resource with limited recent updates; network data can introduce biases from well-studied pathways.
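One way to express network proximity is the best-path confidence between a kinase and a substrate in a weighted association network, which is then combined with the motif score. The sketch below takes that approach under stated assumptions. Edge confidences lie in (0, 1], path confidence is the product of edge confidences (maximized with Dijkstra on negative log weights), and multiplying the motif score by path confidence is a hypothetical combination rule, not NetworKIN's published scoring.

```python
import heapq
import math


def build_graph(edges):
    """Undirected graph as {node: {neighbor: confidence}} from (a, b, conf) tuples."""
    graph = {}
    for a, b, conf in edges:
        graph.setdefault(a, {})[b] = conf
        graph.setdefault(b, {})[a] = conf
    return graph


def best_path_confidence(graph, source, target):
    """Maximum product of edge confidences over all paths (0.0 if unreachable).

    Maximizing a product of values in (0, 1] equals minimizing the sum of
    their negative logs, so a standard Dijkstra search applies.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-d)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nbr, conf in graph.get(node, {}).items():
            nd = d - math.log(conf)
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return 0.0


def combined_score(motif_score, graph, kinase, substrate):
    # Hypothetical combination rule: motif evidence scaled by network proximity.
    return motif_score * best_path_confidence(graph, kinase, substrate)


g = build_graph([("KIN1", "ADAPTOR", 0.9), ("ADAPTOR", "SUB1", 0.8), ("KIN1", "SUB1", 0.5)])
print(best_path_confidence(g, "KIN1", "SUB1"))  # indirect path beats the direct edge
```

Because heavily studied proteins tend to have more high-confidence edges, proximity scores like this one inherit the network's study bias.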

Enrichment-Based Methods

KSEAapp

  • Paper: Bioinformatics, 2017
  • Code: github.com/casecpb/KSEAapp
  • Approach: Kinase-substrate enrichment analysis. Tests whether substrates of a given kinase show coordinated changes in phosphorylation using a z-score statistic.
  • Key innovation: Simple, accessible enrichment framework analogous to GSEA but applied to kinase-substrate sets from PhosphoSitePlus.
  • Strengths: Easy to use; intuitive output; widely adopted; available as R package and web app.
  • Limitations: Relies heavily on PhosphoSitePlus coverage (biased toward well-studied kinases); treats all substrates equally regardless of motif confidence.
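The KSEA z-score compares the mean log2 fold change of a kinase's annotated substrates with the mean over all quantified sites: z = (mean substrate log2FC minus mean log2FC of all sites) × √m / SD of all sites, where m is the number of quantified substrates. A minimal Python version follows. The site IDs and values are hypothetical, using the sample standard deviation is an assumption, and the two-sided normal p-value is one common way to read the score rather than a description of what KSEAapp reports.

```python
import math
import statistics


def ksea_zscore(log2fc, substrates):
    """Return (z, m) for one kinase.

    log2fc: site -> log2 fold change for all quantified sites.
    substrates: annotated substrate sites of the kinase; unquantified ones are ignored.
    """
    subs = [log2fc[s] for s in substrates if s in log2fc]
    m = len(subs)
    all_vals = list(log2fc.values())
    if m == 0 or len(all_vals) < 2:
        return float("nan"), m
    mu, sd = statistics.mean(all_vals), statistics.stdev(all_vals)
    if sd == 0:
        return float("nan"), m  # no variation across sites
    z = (statistics.mean(subs) - mu) * math.sqrt(m) / sd
    return z, m


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical log2 fold changes; s1 and s2 are the kinase's annotated substrates.
log2fc = {"s1": 2.0, "s2": 1.0, "s3": 0.0, "s4": -1.0, "s5": -2.0}
z, m = ksea_zscore(log2fc, ["s1", "s2", "unquantified_site"])
print(z, m, two_sided_p(z))
```

Because the statistic grows with √m, kinases with many annotated substrates can reach large |z| from modest shifts, which is one reason PhosphoSitePlus coverage bias matters.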

KEA3

  • Paper: Nucleic Acids Research, 2021
  • Code: github.com/MaayanLab/KEA3web
  • Approach: Kinase enrichment analysis integrating substrate annotations from multiple databases and ranking kinases using a mean-rank aggregation across libraries.
  • Key innovation: Aggregates evidence from diverse kinase-substrate resources to produce consensus rankings more robust than any single source.
  • Strengths: Broad coverage by combining multiple databases; consensus ranking reduces single-source bias; web interface and API.
  • Limitations: Aggregation can obscure which evidence sources drive a result; enrichment-based approaches assume coordinated substrate regulation.
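The consensus idea can be illustrated by testing overlap in each library separately and then averaging ranks. The sketch assumes a one-sided Fisher's exact test (hypergeometric upper tail) per library and averages each kinase's rank over the libraries that contain it. The libraries, protein names, and background size are hypothetical, and this is not the KEA3 API.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population, K marked, n drawn)."""
    top = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def rank_library(query, library, background_size):
    """Rank kinases in one library by one-sided Fisher's exact p-value (1 = best)."""
    pvals = {
        kinase: hypergeom_sf(len(query & targets), background_size, len(targets), len(query))
        for kinase, targets in library.items()
    }
    ordered = sorted(pvals, key=pvals.get)
    return {kinase: rank for rank, kinase in enumerate(ordered, start=1)}


def mean_rank(query, libraries, background_size):
    """Average each kinase's rank over the libraries that contain it (lower is better)."""
    ranks = {}
    for library in libraries:
        for kinase, rank in rank_library(query, library, background_size).items():
            ranks.setdefault(kinase, []).append(rank)
    return {kinase: sum(r) / len(r) for kinase, r in ranks.items()}


# Hypothetical query proteins and two toy kinase-substrate libraries.
query = {"A", "B", "C"}
lib1 = {"K1": {"A", "B", "X"}, "K2": {"Y", "Z"}}
lib2 = {"K1": {"A", "C"}, "K2": {"B", "Q", "R"}}
print(mean_rank(query, [lib1, lib2], background_size=20))
```

Working with ranks rather than raw p-values keeps one large library from dominating the consensus, but the averaged rank no longer shows which library drove the result.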

PhosX

  • Paper: Bioinformatics, 2024
  • Code: github.com/alussana/PhosX
  • Approach: Combines motif scoring with enrichment analysis, using KinaseLibrary-derived PSSMs as the scoring backbone and testing for enrichment of high-scoring substrates among regulated sites.
  • Key innovation: Bridges motif-based and enrichment-based approaches; does not require pre-defined kinase-substrate databases.
  • Strengths: Database-free kinase activity inference; leverages high-quality KinaseLibrary motifs; performs well in benchmarks.
  • Limitations: Limited to Ser/Thr kinases (inheriting the KinaseLibrary scope); relatively new with less community adoption.
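A database-free enrichment test of this kind can be sketched as a GSEA-style running sum. Each site's step size is its motif score for the kinase being tested, and significance comes from a permutation test. This is a generic illustration under stated assumptions (nonnegative motif scores, sites ranked by fold change, site-order permutation). It is not the PhosX implementation or its exact statistic.

```python
import random


def running_sum_es(ranked_sites, weights):
    """GSEA-style enrichment score for one kinase.

    ranked_sites: site IDs ordered from most up- to most down-regulated.
    weights: site -> motif score for this kinase; sites with score <= 0 or
             absent are treated as non-targets.
    Returns the signed maximum deviation of the running sum from zero.
    """
    w = [max(weights.get(s, 0.0), 0.0) for s in ranked_sites]
    total_hit = sum(w)
    n_miss = sum(1 for x in w if x == 0.0)
    if total_hit == 0 or n_miss == 0:
        return 0.0
    running = best = 0.0
    for x in w:
        running += x / total_hit if x > 0 else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_pvalue(ranked_sites, weights, n_perm=1000, seed=0):
    """Compare the observed score with scores from randomly reordered site lists."""
    rng = random.Random(seed)
    observed = running_sum_es(ranked_sites, weights)
    shuffled = list(ranked_sites)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(running_sum_es(shuffled, weights)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical sites ranked by fold change and motif scores for one kinase.
ranked = ["a", "b", "c", "d", "e", "f"]
motif = {"a": 2.0, "b": 1.0, "f": 0.5}
print(permutation_pvalue(ranked, motif, n_perm=500))
```

A positive score means high-motif-score sites cluster among the up-regulated sites; no curated substrate list is needed, only per-site motif scores.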

INKA

  • Paper: Molecular Systems Biology, 2019
  • Web: inkascore.org
  • Approach: Integrative Inferred Kinase Activity scoring combining kinase-centric (activation loop phosphorylation) and substrate-centric (downstream target regulation) evidence.
  • Key innovation: Combines direct kinase phosphorylation evidence with indirect substrate-based inference for a more complete activity picture.
  • Strengths: Dual evidence streams increase confidence; captures kinases with few known substrates via their own phosphorylation.
  • Limitations: Requires detection of kinase activation loop peptides, which is not always achieved.
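The two evidence streams can be combined in many ways. The sketch below is a hypothetical illustration, not INKA's published formula. It sums abundances (for example, spectral counts) of phosphosites on the kinase itself and on its annotated substrates, then takes a geometric mean with a pseudocount so that a kinase with evidence in only one stream still receives a nonzero score.

```python
import math


def dual_evidence_scores(site_counts, kinase_sites, substrate_sites, pseudo=1.0):
    """Hypothetical per-kinase score combining two evidence streams.

    site_counts: phosphosite ID -> abundance in one sample (e.g., spectral count).
    kinase_sites: kinase -> phosphosites on the kinase itself (e.g., activation loop).
    substrate_sites: kinase -> phosphosites on its annotated substrates.
    """
    scores = {}
    for kinase in set(kinase_sites) | set(substrate_sites):
        kinase_centric = sum(site_counts.get(s, 0) for s in kinase_sites.get(kinase, ()))
        substrate_centric = sum(site_counts.get(s, 0) for s in substrate_sites.get(kinase, ()))
        # Pseudocount keeps single-stream evidence from collapsing to zero.
        scores[kinase] = math.sqrt((kinase_centric + pseudo) * (substrate_centric + pseudo))
    return scores


# Hypothetical counts: K1 has both streams, K2 only an undetected substrate,
# K3 only phosphorylation of its own activation loop.
counts = {"k1_loop": 3, "k3_loop": 3, "sub_a": 8, "sub_b": 0}
kinase_sites = {"K1": ["k1_loop"], "K3": ["k3_loop"]}
substrate_sites = {"K1": ["sub_a", "sub_b"], "K2": ["sub_b"]}
print(dual_evidence_scores(counts, kinase_sites, substrate_sites))
```

The geometric mean rewards kinases supported by both streams: for the same total evidence, a balanced split scores higher than evidence concentrated in one stream.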

Network-Based Methods

RoKAI

  • Paper: Nature Communications, 2021
  • Code: github.com/serhan-yilmaz/RoKAI
  • Approach: Robust Kinase Activity Inference propagates phosphosite quantifications across a functional network of phosphosites before performing kinase activity scoring. Uses network smoothing to share information between functionally related sites.
  • Key innovation: Network propagation step imputes missing phosphosite values and denoises observed measurements, improving downstream kinase activity estimates.
  • Strengths: Addresses missing values and measurement noise, both critical issues in phosphoproteomics; compatible with any downstream scoring method.
  • Limitations: Results depend on the quality of the phosphosite functional network; adds a layer of complexity.
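Network smoothing of this kind can be sketched as an iterative update. Each measured site is pulled toward the mean of its network neighbours, and each missing site is imputed from them. The update rule, the `alpha` weight, and the toy network are assumptions chosen for illustration. This is not RoKAI's exact formulation.

```python
def propagate(observed, neighbors, alpha=0.5, n_iter=200, tol=1e-10):
    """Smooth and impute phosphosite values over a functional network.

    observed: site -> measured value (sites absent here are treated as missing).
    neighbors: site -> list of linked sites; links should be listed in both directions.
    Each Jacobi-style sweep sets
      observed site: (1 - alpha) * measurement + alpha * mean(neighbours)
      missing site:  mean(neighbours)
    """
    sites = set(observed) | set(neighbors)
    sites |= {n for nbrs in neighbors.values() for n in nbrs}
    x = {s: observed.get(s, 0.0) for s in sites}
    for _ in range(n_iter):
        new = {}
        for s in sites:
            nbrs = neighbors.get(s, [])
            if not nbrs:
                new[s] = observed.get(s, 0.0)  # isolated site: keep measurement (or 0)
                continue
            nbr_mean = sum(x[n] for n in nbrs) / len(nbrs)
            if s in observed:
                new[s] = (1 - alpha) * observed[s] + alpha * nbr_mean
            else:
                new[s] = nbr_mean
        converged = all(abs(new[s] - x[s]) < tol for s in sites)
        x = new
        if converged:
            break
    return x


# Toy chain a - b - c with b unmeasured.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
smoothed = propagate({"a": 2.0, "c": 0.0}, neighbors)
print(smoothed)
```

The missing site b is imputed between its neighbours, while the measured sites a and c are shrunk toward each other. Larger `alpha` trusts the network more than the measurements, which is where the dependence on network quality enters.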

decoupleR

  • Paper: Bioinformatics Advances, 2022
  • Code: github.com/saezlab/decoupleR
  • Approach: Framework for inferring biological activities from omics data using multiple statistical methods (ULM, MLM, VIPER, GSEA). Applied to kinase activity inference using kinase-substrate prior knowledge networks.
  • Key innovation: Method-agnostic framework that runs multiple inference algorithms and provides consensus scores; part of the Saez-Rodriguez lab ecosystem.
  • Strengths: Flexible; supports multiple statistical backends; integrates with OmniPath for prior knowledge; available in both R and Python.
  • Limitations: Performance depends on quality of prior knowledge network; consensus approach may average out true signal in some cases.
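ULM, one of the methods listed for decoupleR, fits a linear model per kinase. The observed site-level values are regressed on that kinase's substrate weights (0 for non-substrates), and the t-statistic of the slope is used as the activity score. A dependency-free sketch of that idea follows. The site IDs and weights are hypothetical, and this does not call the decoupleR API.

```python
import math


def ulm_score(values, weights):
    """Slope t-statistic for values ~ intercept + weights (univariate linear model).

    values: site -> observed statistic (e.g., log2FC) for every measured site.
    weights: site -> weight of that site as a substrate of one kinase;
             sites not listed get weight 0.
    """
    sites = list(values)
    n = len(sites)
    if n < 3:
        return float("nan")  # need at least 3 points for a residual variance
    x = [weights.get(s, 0.0) for s in sites]
    y = [values[s] for s in sites]
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return float("nan")  # kinase weights do not vary across measured sites
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    if sse == 0:
        return math.copysign(math.inf, slope) if slope else float("nan")
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope / se


# Hypothetical log2 fold changes; s1 and s2 are annotated substrates of one kinase.
values = {"s1": 2.0, "s2": 1.5, "s3": 0.1, "s4": -0.2, "s5": 0.0}
weights = {"s1": 1.0, "s2": 1.0}
print(ulm_score(values, weights))
```

With binary weights this t-statistic equals a pooled two-sample t-test comparing substrates with non-substrates, which makes the score easy to interpret.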

KSTAR

  • Paper: Nature Communications, 2022
  • Code: github.com/NaegleLab/KSTAR
  • Approach: Uses random subsampling of kinase-substrate networks to generate activity predictions robust to database biases. Builds multiple pruned networks and aggregates predictions.
  • Key innovation: Explicitly addresses study bias in kinase-substrate databases by random network pruning, producing predictions less dominated by well-studied kinases.
  • Strengths: Addresses a fundamental bias problem; strong statistical framework; works with binary (regulated/not) input.
  • Limitations: Computationally intensive due to repeated network sampling; binary input discards quantitative fold-change information.
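The pruning-and-aggregation idea can be sketched in three steps. First, build several networks in which every kinase keeps the same number of randomly sampled candidate substrates. Next, run a hypergeometric test of each kinase's sampled substrates against the binary evidence set in every network. Finally, summarize with the median p-value. This is a simplified illustration under stated assumptions: sampling is uniform rather than score-weighted, only kinase-side degree is balanced, and the kinase and site names are hypothetical. It is not the KSTAR package API.

```python
import random
import statistics
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population, K marked, n drawn)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def pruned_network_activity(kinase_substrates, evidence, n_networks=50, per_kinase=5, seed=0):
    """Median hypergeometric p-value per kinase across randomly pruned networks.

    kinase_substrates: kinase -> set of candidate substrate site IDs.
    evidence: set of site IDs flagged in the sample (binary input).
    In every pruned network each kinase keeps at most `per_kinase` sites,
    so heavily annotated kinases cannot dominate through sheer substrate count.
    """
    rng = random.Random(seed)
    background = set().union(*kinase_substrates.values())
    hits = evidence & background
    pvals = {kinase: [] for kinase in kinase_substrates}
    for _ in range(n_networks):
        for kinase, subs in kinase_substrates.items():
            pruned = rng.sample(sorted(subs), min(per_kinase, len(subs)))
            overlap = len(hits.intersection(pruned))
            pvals[kinase].append(hypergeom_sf(overlap, len(background), len(pruned), len(hits)))
    return {kinase: statistics.median(p) for kinase, p in pvals.items()}


# Hypothetical candidate network and binary evidence.
network = {"K1": {"s1", "s2", "s3"}, "K2": {"s4", "s5", "s6"}}
observed = {"s1", "s2", "s3"}
print(pruned_network_activity(network, observed, per_kinase=2))
```

The nested loop over networks and kinases shows where the computational cost comes from, and the binary `evidence` set shows how fold-change magnitudes are discarded.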

benchmarKIN Findings

The benchmarKIN study (Bioinformatics, 2024) systematically compared kinase activity inference methods across perturbation datasets. Key findings:

  • Motif-based methods (KinaseLibrary, PhosX) and enrichment methods (KSEAapp) performed comparably on well-covered kinases.
  • No single method dominated across all scenarios.
  • Combining motif and enrichment approaches improved robustness.
  • All methods struggled with understudied kinases lacking sufficient substrate annotations.

When to Use Kinase-Substrate Inference Methods

Best for:

  • Identifying activated or inhibited kinases from phosphoproteomics data
  • Prioritizing kinase inhibitors for therapeutic targeting
  • Comparing kinase activity across cancer types or treatment conditions
  • Generating mechanistic hypotheses from phosphosite-level measurements

Not ideal for:

  • Predicting kinase activity for poorly characterized kinases with few known substrates
  • Replacing direct kinase activity assays for clinical decision-making
  • Inferring full signaling pathway topology (see pathway reconstruction methods)