Kinase-Substrate Inference Methods¶

Kinase-substrate inference is the central computational challenge in phosphoproteomics: connecting observed phosphosite changes to the kinases responsible for those changes.

Motif-Based Methods¶

KinaseLibrary¶

Paper: Nature, 2023
Code: github.com/TheKinaseLibrary/kinase-library
Approach: Position-specific scoring matrices (PSSMs) derived from combinatorial peptide library profiling of 303 serine/threonine kinases. Scores substrates by sequence motif match.
Key innovation: Experimentally determined specificity profiles for the majority of the human Ser/Thr kinome, replacing curated databases with systematic biochemical measurement.
Strengths: Largest single-source kinase specificity resource; high-quality biochemical data; directly interpretable scores.
Limitations: Ser/Thr kinases only (no tyrosine kinases); motif alone cannot account for subcellular co-localization or scaffolding; in vitro preferences may not fully reflect in vivo specificity.

GPS 6.0¶

Paper: Nucleic Acids Research, 2023
Code: gps.biocuckoo.cn
Approach: Group-based prediction system using sequence patterns around phosphosites to assign kinase families, trained on curated kinase-substrate pairs.
Key innovation: Hierarchical prediction at kinase group, family, and individual kinase levels; long-standing resource with extensive validation.
Strengths: Covers both Ser/Thr and Tyr kinases; regularly updated; includes kinase group-level predictions when individual kinase assignment is uncertain.
Limitations: Training data bias toward well-studied kinases; sequence-only features.

iGPS¶

Paper: Nucleic Acids Research, 2015
Code: igps.biocuckoo.org
Approach: Integrates GPS motif predictions with protein-protein interaction networks to filter kinase-substrate pairs by physical interaction evidence.
Key innovation: Combining sequence specificity with network context to reduce false positives from motif-only approaches.
Strengths: More biologically realistic than motif-only methods; leverages interaction databases.
Limitations: Dependent on PPI network completeness; web server availability can be inconsistent.

NetworKIN¶

Paper: Cell, 2007
Code: networkin.info
Approach: Combines linear motif scoring with network proximity in protein association networks (STRING) to predict kinase-substrate relationships.
Key innovation: Pioneered the integration of contextual network information with sequence motifs for kinase prediction.
Strengths: Foundational method; conceptually influential; uses network context to improve specificity.
Limitations: Aging resource with limited recent updates; network data can introduce biases from well-studied pathways.

Enrichment-Based Methods¶

KSEAapp¶

Paper: Bioinformatics, 2017
Code: github.com/casecpb/KSEAapp
Approach: Kinase-substrate enrichment analysis. Tests whether substrates of a given kinase show coordinated changes in phosphorylation using a z-score statistic.
Key innovation: Simple, accessible enrichment framework analogous to GSEA but applied to kinase-substrate sets from PhosphoSitePlus.
Strengths: Easy to use; intuitive output; widely adopted; available as R package and web app.
Limitations: Relies heavily on PhosphoSitePlus coverage (biased toward well-studied kinases); treats all substrates equally regardless of motif confidence.

KEA3¶

Paper: Nucleic Acids Research, 2021
Code: github.com/MaayanLab/KEA3web
Approach: Kinase enrichment analysis integrating substrate annotations from multiple databases and ranking kinases using a mean-rank aggregation across libraries.
Key innovation: Aggregates evidence from diverse kinase-substrate resources to produce consensus rankings more robust than any single source.
Strengths: Broad coverage by combining multiple databases; consensus ranking reduces single-source bias; web interface and API.
Limitations: Aggregation can obscure which evidence sources drive a result; enrichment-based approaches assume coordinated substrate regulation.

PhosX¶

Paper: Bioinformatics, 2024
Code: github.com/alussana/PhosX
Approach: Combines motif scoring with enrichment analysis, using KinaseLibrary-derived PSSMs as the scoring backbone and testing for enrichment of high-scoring substrates among regulated sites.
Key innovation: Bridges motif-based and enrichment-based approaches; does not require pre-defined kinase-substrate databases.
Strengths: Database-free kinase activity inference; leverages high-quality KinaseLibrary motifs; performs well in benchmarks.
Limitations: Limited to Ser/Thr kinases (inheriting the KinaseLibrary scope); relatively new with less community adoption.

INKA¶

Paper: Molecular & Cellular Proteomics, 2019
Web: inkascore.org
Approach: Integrative Inferred Kinase Activity scoring combining kinase-centric (activation loop phosphorylation) and substrate-centric (downstream target regulation) evidence.
Key innovation: Combines direct kinase phosphorylation evidence with indirect substrate-based inference for a more complete activity picture.
Strengths: Dual evidence streams increase confidence; captures kinases with few known substrates via their own phosphorylation.
Limitations: Requires detection of kinase activation loop peptides, which is not always achieved.

Network-Based Methods¶

RoKAI¶

Paper: Nucleic Acids Research, 2021
Code: github.com/serhan-yilmaz/RoKAI
Approach: Robust Kinase Activity Inference propagates phosphosite quantifications across a functional network of phosphosites before performing kinase activity scoring. Uses network smoothing to share information between functionally related sites.
Key innovation: Network propagation step imputes missing phosphosite values and denoises observed measurements, improving downstream kinase activity estimates.
Strengths: Addresses missing values and measurement noise -- critical issues in phosphoproteomics; compatible with any downstream scoring method.
Limitations: Results depend on the quality of the phosphosite functional network; adds a layer of complexity.

decoupleR¶

Paper: Bioinformatics Advances, 2022
Code: github.com/saezlab/decoupleR
Approach: Framework for inferring biological activities from omics data using multiple statistical methods (ULM, MLM, VIPER, GSEA). Applied to kinase activity inference using kinase-substrate prior knowledge networks.
Key innovation: Method-agnostic framework that runs multiple inference algorithms and provides consensus scores; part of the Saez-Rodriguez lab ecosystem.
Strengths: Flexible; supports multiple statistical backends; integrates with OmniPath for prior knowledge; available in both R and Python.
Limitations: Performance depends on quality of prior knowledge network; consensus approach may average out true signal in some cases.

KSTAR¶

Paper: Nature Communications, 2022
Code: github.com/NaegleLab/KSTAR
Approach: Uses random subsampling of kinase-substrate networks to generate activity predictions robust to database biases. Builds multiple pruned networks and aggregates predictions.
Key innovation: Explicitly addresses study bias in kinase-substrate databases by random network pruning, producing predictions less dominated by well-studied kinases.
Strengths: Addresses a fundamental bias problem; strong statistical framework; works with binary (regulated/not) input.
Limitations: Computationally intensive due to repeated network sampling; binary input discards quantitative fold-change information.

BenchmarKIN Findings¶

The benchmarKIN study (Bioinformatics, 2024) systematically compared kinase activity inference methods across perturbation datasets. Key findings: motif-based methods (KinaseLibrary, PhosX) and enrichment methods (KSEAapp) performed comparably on well-covered kinases; no single method dominated across all scenarios; combining motif and enrichment approaches improved robustness; all methods struggled with understudied kinases lacking sufficient substrate annotations.

When to Use Kinase-Substrate Inference Methods¶

Best for:

Identifying activated or inhibited kinases from phosphoproteomics data
Prioritizing kinase inhibitors for therapeutic targeting
Comparing kinase activity across cancer types or treatment conditions
Generating mechanistic hypotheses from phosphosite-level measurements

Not ideal for:

Predicting kinase activity for poorly characterized kinases with few known substrates
Replacing direct kinase activity assays for clinical decision-making
Inferring full signaling pathway topology (see pathway reconstruction methods)