Spatial Domain Identification

Pipeline question: What are the distinct tissue regions (spatial domains) defined by coherent gene expression patterns, and how do they relate to histological structures?

Overview

Spatial domain identification groups spots or cells into contiguous tissue regions that share similar transcriptomic profiles. Unlike standard clustering of dissociated cells, spatial domain methods must balance transcriptomic similarity with spatial coherence — two spots with identical expression profiles should be in the same domain only if they are spatially contiguous. This step reveals tissue architecture at a level that bridges molecular profiles and histological annotations.
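
The contiguity requirement above can be illustrated with a minimal sketch. It assumes that expression-only cluster labels and a spot adjacency list already exist (both are hypothetical inputs here) and splits each cluster into spatially connected pieces. It is a toy post-processing step, not how any of the methods below work internally.

```python
from collections import deque

def split_into_contiguous_domains(labels, neighbors):
    """Split each expression cluster into spatially connected components.

    labels:    one cluster id per spot (from any expression-only clustering)
    neighbors: neighbors[i] lists the indices of spots adjacent to spot i
    Two spots end up with the same domain id only if they share a cluster
    label AND are linked by a path of same-label neighbors.
    """
    domain = [-1] * len(labels)
    next_id = 0
    for start in range(len(labels)):
        if domain[start] != -1:
            continue  # already assigned during an earlier traversal
        domain[start] = next_id
        queue = deque([start])
        while queue:  # breadth-first search restricted to one label
            i = queue.popleft()
            for j in neighbors[i]:
                if domain[j] == -1 and labels[j] == labels[i]:
                    domain[j] = next_id
                    queue.append(j)
        next_id += 1
    return domain

# Six spots in a row; cluster "A" appears at both ends, separated by "B".
labels = ["A", "A", "B", "B", "A", "A"]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
domains = split_into_contiguous_domains(labels, neighbors)
print(domains)  # [0, 0, 1, 1, 2, 2]
```

The two "A" groups have identical labels but receive different domain ids because no path of "A" spots connects them.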

Domains vs. niches

Spatial domains are contiguous tissue regions with coherent expression. Cellular niches are local microenvironments defined by cell-type composition and interactions. See the awesome-spatial-omics-niche resource for niche-specific methods.

Key Methods

BANKSY

  • Paper: Nature Genetics, 2024
  • Code: github.com/prabhakarlab/Banksy_py
  • Key innovation: Augments each cell's features with the mean of its neighbors' expression and a gradient term computed with an azimuthal Gabor filter, then applies standard clustering. Simple, fast, and highly effective.
  • Strengths:
    • Conceptually simple: augment features, then cluster
    • Scalable to millions of cells
    • Works across sequencing-based and imaging-based platforms
  • Limitations:
    • Neighborhood radius is a key parameter that requires tuning
    • Does not explicitly model the number of domains
  • Technology compatibility: Visium, Visium HD, Xenium, MERFISH, CosMx, Slide-seq, Stereo-seq
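
The "augment features, then cluster" idea can be sketched in a few lines. This is a simplified illustration, not the Banksy_py API: it keeps only the neighbor-mean term (the gradient term is omitted), uses brute-force nearest neighbors, and the names `augment_with_neighbor_mean` and `lam` and the square-root weighting are assumptions made for this example.

```python
import math

def knn_indices(coords, i, k):
    """Indices of the k spatially nearest cells to cell i (excluding i)."""
    others = [j for j in range(len(coords)) if j != i]
    others.sort(key=lambda j: math.dist(coords[i], coords[j]))
    return others[:k]

def augment_with_neighbor_mean(expr, coords, k=2, lam=0.2):
    """Concatenate each cell's expression with its neighbors' mean expression.

    lam (hypothetical name) weights the neighbor block; lam=0 ignores space.
    """
    w_self, w_nbr = math.sqrt(1.0 - lam), math.sqrt(lam)
    augmented = []
    for i, own in enumerate(expr):
        nbrs = knn_indices(coords, i, k)
        nbr_mean = [sum(expr[j][g] for j in nbrs) / len(nbrs)
                    for g in range(len(own))]
        augmented.append([w_self * x for x in own] +
                         [w_nbr * x for x in nbr_mean])
    return augmented

# Four cells, two genes; the last cell sits far away from the others.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
expr = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
augmented = augment_with_neighbor_mean(expr, coords, k=2, lam=0.5)
print(len(augmented[0]))  # 4: two own-expression features + two neighbor features
```

The augmented rows can be passed to any standard clustering routine. The brute-force neighbor search is quadratic in the number of cells, so a real dataset would need a spatial index such as a KD-tree.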

SpaGCN

  • Paper: Nature Methods, 2021
  • Code: github.com/jianhuupenn/SpaGCN
  • Key innovation: Graph convolutional network that integrates gene expression, spatial location, and histology image features for domain identification.
  • Strengths:
    • Integrates histology information when available
    • Identifies spatially variable genes per domain
  • Limitations:
    • GCN training can be unstable with default hyperparameters
    • Limited scalability to very large datasets
  • Technology compatibility: Visium, ST

STAGATE

  • Paper: Nature Communications, 2022
  • Code: github.com/QIFEIDKN/STAGATE_pyG
  • Key innovation: Graph attention autoencoder that learns spatial domain-aware embeddings by selectively attending to informative spatial neighbors.
  • Strengths:
    • Attention mechanism identifies which neighbors matter most
    • Top performer in multiple benchmarks
    • Supports 3D spatial data
  • Limitations:
    • Requires GPU for practical runtimes
    • Sensitive to graph construction parameters
  • Technology compatibility: Visium, Slide-seq, Stereo-seq, MERFISH
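
To see why graph construction parameters matter, the following sketch builds a radius-based spatial neighbor graph on a toy grid and reports the mean number of neighbors for several cutoffs. It is a generic illustration, not STAGATE's own graph-building code; `radius_graph` and `mean_degree` are hypothetical helper names.

```python
import math

def radius_graph(coords, radius):
    """Undirected neighbor lists: i and j are linked if within `radius`."""
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= radius:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

def mean_degree(neighbors):
    """Average number of neighbors per spot."""
    return sum(len(nb) for nb in neighbors) / len(neighbors)

# Toy 4 x 4 square grid with unit spacing.
coords = [(x, y) for x in range(4) for y in range(4)]
for radius in (1.0, 1.5, 2.0):
    print(radius, mean_degree(radius_graph(coords, radius)))
# 1.0 3.0
# 1.5 5.25
# 2.0 7.25
```

A modest change in the cutoff more than doubles the neighborhood each spot attends over, which is one reason results should be checked across a few graph settings.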

BayesSpace

  • Paper: Nature Biotechnology, 2021
  • Code: github.com/edward130603/BayesSpace
  • Key innovation: Bayesian spatial clustering with Markov random field prior that encourages spatial smoothness, plus sub-spot resolution enhancement.
  • Strengths:
    • Principled Bayesian framework with uncertainty quantification
    • Sub-spot resolution enhancement for Visium
    • Statistically well-grounded
  • Limitations:
    • Designed for regular spot lattices (Visium's hexagonal grid and the square grid of the original ST platform); does not generalize to irregular coordinates
    • Computationally expensive MCMC inference
    • R/Bioconductor only
  • Technology compatibility: Visium, ST
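
The smoothness-encouraging prior can be illustrated with a Potts-style score, which rewards labelings in which neighboring spots agree. This is a minimal sketch of the general idea, not BayesSpace's implementation; the function name and the `gamma` value are assumptions for illustration.

```python
def potts_log_prior(labels, neighbors, gamma=2.0):
    """Unnormalized Potts log-prior.

    Returns gamma times the number of neighboring spot pairs that share a
    label, counting each undirected pair once.
    """
    agreeing_pairs = sum(
        1
        for i, nbrs in enumerate(neighbors)
        for j in nbrs
        if i < j and labels[i] == labels[j]
    )
    return gamma * agreeing_pairs

# Five spots in a chain: 0 - 1 - 2 - 3 - 4
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
smooth = [0, 0, 0, 1, 1]   # one boundary
noisy = [0, 1, 0, 1, 1]    # isolated label flips
print(potts_log_prior(smooth, neighbors))  # 6.0
print(potts_log_prior(noisy, neighbors))   # 2.0
```

In a full model this term is added to the expression log-likelihood, so a spot's label is pulled toward its neighbors' labels unless its expression argues strongly otherwise.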

GraphST

  • Paper: Nature Communications, 2023
  • Code: github.com/JinmiaoChenLab/GraphST
  • Key innovation: Graph self-supervised contrastive learning framework that learns spatial representations via data augmentation and contrastive objectives.
  • Strengths:
    • Self-supervised — no labels required
    • Consistently among the top performers across benchmarks
    • Also performs batch integration for multi-slice data
  • Limitations:
    • Multiple loss terms require careful balancing
    • GPU required
  • Technology compatibility: Visium, Slide-seq, Stereo-seq, MERFISH

BASS

  • Paper: Genome Biology, 2022
  • Code: github.com/zhengli09/BASS
  • Key innovation: Multi-scale Bayesian model that simultaneously identifies cell types and spatial domains in a unified framework.
  • Strengths:
    • Joint cell-type and domain inference
    • Handles multi-sample analysis natively
    • Principled uncertainty quantification
  • Limitations:
    • Computationally expensive for large datasets
    • Requires specifying number of cell types and domains
  • Technology compatibility: Visium, Slide-seq

SpaceFlow

  • Paper: Nature Communications, 2022
  • Code: github.com/hongleir/SpaceFlow
  • Key innovation: Deep graph network that learns pseudo-spatiotemporal embeddings, capturing both spatial domains and gradual spatial transitions.
  • Strengths:
    • Captures continuous spatial gradients, not just discrete domains
    • Regularization loss encourages spatially smooth embeddings
  • Limitations:
    • Pseudo-temporal interpretation can be misleading
    • Moderate scalability
  • Technology compatibility: Visium, Slide-seq

SEDR

  • Paper: Genome Medicine, 2024
  • Code: github.com/JinmiaoChenLab/SEDR
  • Key innovation: Variational graph autoencoder with deep embedding that disentangles spatial and transcriptomic features.
  • Strengths:
    • Deep generative model with clear latent space
    • Good performance on Visium benchmarks
  • Limitations:
    • Slower than non-deep methods
    • Limited validation on imaging-based platforms
  • Technology compatibility: Visium, Slide-seq

GASTON

  • Paper: Nature Methods, 2024
  • Code: github.com/raphael-group/GASTON
  • Key innovation: Learns a one-dimensional "isodepth" coordinate that varies smoothly across the tissue and models each gene's expression as a piecewise linear function of it, so spatial domains and spatially varying expression gradients are identified simultaneously.
  • Strengths:
    • Interpretable model connecting domains to gene programs
    • Captures within-domain expression gradients
  • Limitations:
    • Assumes piecewise linear spatial patterns
    • Python (PyTorch) implementation only; no R package
  • Technology compatibility: Visium, Slide-seq
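
A piecewise linear expression model can be sketched as follows. The example assumes a single spatial coordinate `d` with known domain boundaries; the function name, parameters, and numbers are hypothetical and are not taken from the GASTON code base.

```python
import bisect

def piecewise_linear_expression(d, breakpoints, slopes, intercepts):
    """Evaluate slopes[p] * d + intercepts[p] for the domain p containing d.

    breakpoints: sorted domain boundaries along the coordinate
                 (length = number of domains - 1)
    """
    p = bisect.bisect_right(breakpoints, d)  # index of the domain containing d
    return slopes[p] * d + intercepts[p]

# Three domains along the coordinate: d < 1, 1 <= d < 2, d >= 2.
breakpoints = [1.0, 2.0]
slopes = [0.0, 2.0, -1.0]       # within-domain gradients
intercepts = [1.0, 0.0, 5.0]

for d in (0.5, 1.5, 2.5):
    print(d, piecewise_linear_expression(d, breakpoints, slopes, intercepts))
# 0.5 1.0
# 1.5 3.0
# 2.5 2.5
```

Nonzero slopes represent within-domain gradients, while jumps in the fitted value at a breakpoint represent sharp domain boundaries.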

conST

  • Paper: bioRxiv, 2022
  • Code: github.com/ys-zong/conST
  • Key innovation: Contrastive learning on spatial transcriptomics graphs with self-supervised objectives, avoiding the need for domain labels.
  • Strengths:
    • Self-supervised contrastive framework
    • Integrates gene expression and morphological features
  • Limitations:
    • Similar to GraphST but with less community adoption
    • Augmentation strategies may not generalize across tissues
  • Technology compatibility: Visium

smoothclust

  • Paper: bioRxiv, 2024
  • Code: github.com/lmweber/smoothclust
  • Key innovation: Spatially-aware smoothing of expression data before standard clustering, achieving spatial coherence through preprocessing rather than custom models.
  • Strengths:
    • Simple approach compatible with existing clustering pipelines
    • No GPU or deep learning required
    • Bioconductor integration
  • Limitations:
    • Smoothing can blur genuine sharp boundaries
    • Less sophisticated than deep learning approaches
  • Technology compatibility: Visium, Visium HD; applicable in principle to any platform that provides spatial coordinates
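
The boundary-blurring limitation is easy to reproduce on a toy example. The sketch below applies a plain moving average to a one-dimensional step in expression; it is a generic illustration with a hypothetical `smooth_1d` helper, not the smoothclust implementation.

```python
def smooth_1d(values, window=1):
    """Replace each value by the mean over positions within `window` steps."""
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

# A sharp boundary between two regions along one tissue axis.
step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
smoothed = smooth_1d(step, window=1)
print([round(v, 2) for v in smoothed])  # [0.0, 0.0, 0.33, 0.67, 1.0, 1.0]
```

The two spots next to the boundary take intermediate values after smoothing, so a downstream clustering step may place the boundary less precisely or create a spurious transition cluster. Larger windows widen this effect.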

Vesalius

  • Paper: Molecular Systems Biology, 2022
  • Code: github.com/patrickCNMartin/Vesalius
  • Key innovation: Image-processing-inspired approach that converts gene expression to spatial "images" and applies morphological operations for territory detection.
  • Strengths:
    • Unique image-processing perspective on spatial domains
    • Identifies territory boundaries and gradients
  • Limitations:
    • Novel approach with limited benchmarking against deep methods
    • R implementation
  • Technology compatibility: Visium, Slide-seq

Benchmark Summary

Systematic benchmarks (e.g., from the STbenchmark and SpatialBenchmarking studies) consistently place GraphST and STAGATE as top performers for spatial domain identification on Visium data. BANKSY has emerged as a strong practical choice due to its simplicity, speed, and competitive accuracy. BayesSpace performs well on Visium but is strictly limited to grid-based data. GNN-based methods dominate overall, but their advantage diminishes on well-separated domains where simpler methods like BANKSY or smoothclust suffice.

Number of domains matters

Most methods require the number of domains (k) to be specified, and results are sensitive to this choice. Run each method over a range of k values and check how stable the resulting domains are. BayesSpace offers some guidance through its Bayesian framework, but no method reliably selects k automatically.
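
One way to assess stability is to repeat a method at a fixed k (for example, with different random seeds) and compare the resulting labelings with the adjusted Rand index (ARI). The sketch below is a small standard-library implementation; the label vectors are hard-coded stand-ins for real method output.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same spots."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("labelings must have equal length of at least 2")
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. every spot in a single domain
    return (sum_cells - expected) / (max_index - expected)

run_1 = [0, 0, 0, 1, 1, 1]
run_2 = [1, 1, 1, 0, 0, 0]   # same partition, label names swapped
run_3 = [0, 0, 1, 1, 2, 2]   # a different partition

print(adjusted_rand_index(run_1, run_2))            # 1.0
print(round(adjusted_rand_index(run_1, run_3), 3))  # 0.242
```

ARI ignores how labels are named, so it can compare runs directly. Values of k whose repeated runs give consistently high pairwise ARI are more trustworthy than values where the partition changes from run to run.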

When to Use What

| Your data | Your goal | Recommended | Why |
| --- | --- | --- | --- |
| Visium, quick analysis | Fast domain detection | BANKSY | Simple, fast, competitive accuracy |
| Visium, best accuracy | Benchmark-winning domains | GraphST or STAGATE | Top performers in systematic benchmarks |
| Visium, statistical rigor | Bayesian domain inference | BayesSpace | Uncertainty quantification, sub-spot resolution |
| Large dataset (>100K cells) | Scalable domains | BANKSY | Scales to millions without GPU |
| Multi-sample Visium | Cross-sample domain alignment | GraphST or BASS | Built-in batch integration |
| Imaging-based (MERFISH) | Cell-level domains | BANKSY or STAGATE | Validated on irregular cell coordinates |
| Continuous spatial gradients | Detect gradients, not just domains | SpaceFlow or GASTON | Model continuous spatial variation |

Technology Compatibility

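| Method | Visium | Visium HD | Xenium | MERFISH | CosMx | CODEX | Stereo-seq |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BANKSY | Yes | Yes | Yes | Yes | Yes | - | Yes |
| SpaGCN | Yes | - | - | - | - | - | - |
| STAGATE | Yes | - | - | Yes | - | - | Yes |
| BayesSpace | Yes | - | - | - | - | - | - |
| GraphST | Yes | - | - | Yes | - | - | Yes |
| BASS | Yes | - | - | - | - | - | - |
| SpaceFlow | Yes | - | - | - | - | - | - |
| SEDR | Yes | - | - | - | - | - | - |
| GASTON | Yes | - | - | - | - | - | - |
| conST | Yes | - | - | - | - | - | - |
| smoothclust | Yes | Yes | - | - | - | - | - |
| Vesalius | Yes | - | - | - | - | - | - |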