Spatial Domain Identification

Pipeline question: What are the distinct tissue regions (spatial domains) defined by coherent gene expression patterns, and how do they relate to histological structures?

Overview

Spatial domain identification groups spots or cells into spatially coherent tissue regions that share similar transcriptomic profiles. Unlike standard clustering of dissociated cells, spatial domain methods must balance transcriptomic similarity against spatial coherence. Two spots with identical expression profiles are not automatically placed in the same domain, because each spot's spatial context (the expression of its neighbors) also influences its assignment, and methods favor labelings in which neighboring spots share a domain. This step reveals tissue architecture at a level that bridges molecular profiles and histological annotations.
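To make "spatial coherence" concrete, the sketch below builds a k-nearest-neighbor graph on spot coordinates and scores a labeling by the fraction of (spot, neighbor) pairs that share a label. It is a minimal illustration, assuming toy grid coordinates; `spatial_knn` and `neighbor_agreement` are hypothetical helpers, not part of any package, and the brute-force distance matrix only suits small data.

```python
import numpy as np

def spatial_knn(coords, k):
    """Return an (n, k) array of each spot's k nearest spatial neighbors (self excluded)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)          # a spot is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def neighbor_agreement(labels, knn):
    """Fraction of (spot, neighbor) pairs whose domain labels match."""
    return float((labels[knn] == labels[:, None]).mean())

# Toy 10 x 10 grid split into a left and a right region
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
contiguous = (coords[:, 0] >= 5).astype(int)   # spatially coherent labeling
rng = np.random.default_rng(0)
scrambled = rng.permutation(contiguous)        # same label counts, no spatial structure

knn = spatial_knn(coords, k=4)
print(neighbor_agreement(contiguous, knn))
print(neighbor_agreement(scrambled, knn))
```

The contiguous labeling scores close to 1, while the scrambled one scores near 0.5, even though both contain the same number of spots per label.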

Domains vs. niches

Spatial domains are contiguous tissue regions with coherent expression. Cellular niches are local microenvironments defined by cell-type composition and interactions. See the awesome-spatial-omics-niche resource for niche-specific methods.

Key Methods

BANKSY

Paper: Nature Genetics, 2024
Code: github.com/prabhakarlab/Banksy_py
Key innovation: Augments each cell's expression features with the mean of its neighbors' expression and an azimuthal Gabor filter (AGF) term that captures local expression gradients, then applies standard clustering. The approach is simple, fast, and competitive in accuracy.
Strengths:
- Conceptually simple: augment features, then cluster
- Scalable to millions of cells
- Works across sequencing-based and imaging-based platforms
Limitations:
- Neighborhood radius is a key parameter that requires tuning
- Does not explicitly model the number of domains
Technology compatibility: Visium, Visium HD, Xenium, MERFISH, CosMx, Slide-seq, Stereo-seq
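The feature-augmentation idea can be sketched in NumPy. This is a simplified, hypothetical illustration rather than the Banksy_py API: it uses uniform weights over the k nearest neighbors, omits the AGF gradient term, and assumes a sqrt(1 - lambda) / sqrt(lambda) weighting between a cell's own features and its neighborhood features. In a full workflow the augmented matrix would then go through dimensionality reduction and standard clustering.

```python
import numpy as np

def banksy_like_features(X, coords, k=6, lam=0.2):
    """Concatenate each cell's own expression with its neighborhood mean.

    Simplified sketch: uniform weights over the k nearest neighbors; the
    azimuthal Gabor filter (gradient) term used by BANKSY is omitted.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)            # exclude the cell itself
    knn = np.argsort(dist, axis=1)[:, :k]     # (n_cells, k) neighbor indices
    neigh_mean = X[knn].mean(axis=1)          # (n_cells, n_genes)
    # lam sets how much the neighborhood contributes relative to the cell itself
    return np.hstack([np.sqrt(1.0 - lam) * X, np.sqrt(lam) * neigh_mean])

# Toy data: 20 x 20 grid; the right half shifts genes 0 and 2; Gaussian noise added
rng = np.random.default_rng(1)
xs, ys = np.meshgrid(np.arange(20), np.arange(20))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
region = (coords[:, 0] >= 10).astype(float)
X = region[:, None] * np.array([2.0, 0.0, 1.0]) + rng.normal(0.0, 1.0, size=(400, 3))

H = banksy_like_features(X, coords, k=6, lam=0.2)
print(H.shape)  # (400, 6)
```

Averaging over neighbors shrinks per-cell noise in the neighborhood block, so cells from the same region move closer together in the augmented space; `lam` trades this smoothing off against single-cell resolution.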

SpaGCN

Paper: Nature Methods, 2021
Code: github.com/jianhuupenn/SpaGCN
Key innovation: Graph convolutional network that integrates gene expression, spatial location, and histology image features for domain identification.
Strengths:
- Integrates histology information when available
- Identifies spatially variable genes per domain
Limitations:
- GCN training can be unstable with default hyperparameters
- Limited scalability to very large datasets
Technology compatibility: Visium, ST
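How SpaGCN folds histology into its graph can be sketched as follows, as we understand the paper: each spot's image color is collapsed into a third coordinate z, rescaled to be comparable with the x/y spread, and edge weights follow a Gaussian kernel in (x, y, z). The length scale l is chosen so that neighbors contribute, on average, a target amount p of weight per spot. The variance-weighted color average, the bisection search, and all function names are assumptions for illustration, not SpaGCN's API; the per-spot colors are random stand-ins, and only one propagation step is shown in place of the trained GCN and clustering.

```python
import numpy as np

def histology_z(rgb, x, y, s=1.0):
    """Collapse per-spot mean RGB into one coordinate scaled like x/y (sketch)."""
    var = rgb.var(axis=0)                     # weight channels by their variance
    z0 = (rgb * var).sum(axis=1) / var.sum()
    z0 = (z0 - z0.mean()) / z0.std()
    return z0 * max(x.std(), y.std()) * s     # s scales the histology influence

def gaussian_adjacency(x, y, z, l):
    pts = np.column_stack([x, y, z])
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * l ** 2))

def neighbor_weight(A):
    # Average off-diagonal weight per spot (the diagonal is exp(0) = 1)
    return (A.sum(axis=1) - 1.0).mean()

def search_l(x, y, z, p=0.5, lo=1e-3, hi=1e3, iters=60):
    """Geometric bisection for l; neighbor_weight grows monotonically with l."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if neighbor_weight(gaussian_adjacency(x, y, z, mid)) < p:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

rng = np.random.default_rng(0)
xs, ys = np.meshgrid(np.arange(15), np.arange(15))
x, y = xs.ravel().astype(float), ys.ravel().astype(float)
rgb = rng.uniform(0, 255, size=(x.size, 3))   # stand-in for per-spot image color
z = histology_z(rgb, x, y)
l = search_l(x, y, z, p=0.5)
A = gaussian_adjacency(x, y, z, l)

# One graph-convolution-style propagation: kernel-weighted average of expression
X = rng.poisson(2.0, size=(x.size, 50)).astype(float)
X_smooth = (A @ X) / A.sum(axis=1, keepdims=True)
print(round(float(l), 3), round(float(neighbor_weight(A)), 3))
```

Raising `s` makes spots with different colors look farther apart, so histology boundaries weaken the edges that cross them.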

STAGATE

Paper: Nature Communications, 2022
Code: github.com/QIFEIDKN/STAGATE_pyG
Key innovation: Graph attention autoencoder that learns spatial domain-aware embeddings by selectively attending to informative spatial neighbors.
Strengths:
- Attention mechanism identifies which neighbors matter most
- Top performer in multiple benchmarks
- Supports 3D spatial data
Limitations:
- Benefits substantially from a GPU for practical runtimes on larger datasets
- Sensitive to graph construction parameters
Technology compatibility: Visium, Slide-seq, Stereo-seq, MERFISH
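The attention step can be illustrated with a single untrained layer in NumPy. It follows the graph-attention form we understand STAGATE to inherit from the GATE autoencoder: a sigmoid score computed from the projected center and neighbor features, normalized with a softmax over each neighborhood. The weights are random, self-loops are included by assumption, and the helper names are hypothetical; STAGATE itself learns these parameters inside an autoencoder.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def attention_layer(H, neighbors, W, v_s, v_r):
    """One graph-attention aggregation step (untrained sketch).

    H: (n, d_in) features; neighbors[i]: index array of node i's neighbors
    (self included); W: (d_in, d_out); v_s, v_r: (d_out,) attention vectors.
    """
    Z = H @ W
    out = np.zeros_like(Z)
    alphas = []
    for i, nb in enumerate(neighbors):
        e = sigmoid(Z[i] @ v_s + Z[nb] @ v_r)   # one raw score per neighbor
        a = np.exp(e - e.max())
        a /= a.sum()                            # softmax within the neighborhood
        out[i] = a @ Z[nb]                      # attention-weighted average
        alphas.append(a)
    return out, alphas

rng = np.random.default_rng(0)
xs, ys = np.meshgrid(np.arange(8), np.arange(8))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
dist = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
neighbors = [np.flatnonzero(row <= 1.5) for row in dist]  # radius graph incl. self

H = rng.normal(size=(64, 20))
W = rng.normal(scale=0.1, size=(20, 5))
v_s, v_r = rng.normal(size=5), rng.normal(size=5)
out, alphas = attention_layer(H, neighbors, W, v_s, v_r)
print(out.shape, alphas[9].round(3))
```

Each row of `alphas` sums to 1, so the output is a convex combination of neighbor embeddings; training shifts that weight toward the neighbors that best help reconstruct expression.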

BayesSpace

Paper: Nature Biotechnology, 2021
Code: github.com/edward130603/BayesSpace
Key innovation: Bayesian spatial clustering with a Markov random field (Potts) prior that encourages neighboring spots to share a label, plus sub-spot resolution enhancement.
Strengths:
- Principled Bayesian framework with uncertainty quantification
- Sub-spot resolution enhancement for Visium
- Statistically well-grounded
Limitations:
- Designed for lattice-based platforms (the hexagonal Visium grid and the square grid of the original ST platform); does not generalize to irregular single-cell coordinates
- Computationally expensive MCMC inference
- R/Bioconductor only
Technology compatibility: Visium, ST
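To show what a Markov random field prior does, the sketch below swaps BayesSpace's MCMC sampler for iterated conditional modes (ICM), a simple deterministic optimizer of the same kind of objective: each spot's label balances a Gaussian data term against a Potts bonus `gamma` for every neighbor that shares the label. This is not BayesSpace's algorithm or API; the shared spherical variance, the 4-neighbor square lattice (closer to the ST grid than to Visium's hexagonal one), and the helper name are simplifying assumptions.

```python
import numpy as np

def icm_potts(X, init_labels, neighbors, gamma=2.0, n_iter=10):
    """Iterated conditional modes under a Potts prior (sketch, not BayesSpace).

    Each spot takes the label k minimising
        0.5 * ||x_i - mu_k||^2 / sigma2 - gamma * (number of neighbors labelled k)
    """
    labels = init_labels.copy()
    K = int(labels.max()) + 1
    mu = np.stack([X[labels == k].mean(axis=0) for k in range(K)])
    for _ in range(n_iter):
        for k in range(K):                       # update cluster means
            if np.any(labels == k):
                mu[k] = X[labels == k].mean(axis=0)
        sigma2 = ((X - mu[labels]) ** 2).mean()  # shared variance
        for i, nb in enumerate(neighbors):       # sequential label updates
            data_cost = 0.5 * ((X[i] - mu) ** 2).sum(axis=1) / sigma2
            same = np.bincount(labels[nb], minlength=K)
            labels[i] = np.argmin(data_cost - gamma * same)
    return labels

# Toy 20 x 20 lattice: left/right domains, one noisy feature (e.g. a PC score)
rng = np.random.default_rng(0)
xs, ys = np.meshgrid(np.arange(20), np.arange(20))
coords = np.column_stack([xs.ravel(), ys.ravel()])
truth = (coords[:, 0] >= 10).astype(int)
X = (2.0 * truth + rng.normal(0.0, 1.0, size=400))[:, None]

manhattan = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)
neighbors = [np.flatnonzero(row == 1) for row in manhattan]  # 4-neighbor lattice

init = (X[:, 0] > 1.0).astype(int)              # non-spatial thresholding
smoothed = icm_potts(X, init, neighbors, gamma=2.0)
print((init != truth).sum(), (smoothed != truth).sum())
```

Setting `gamma=0` recovers purely expression-based assignment; larger values trade fine detail for smoother domains.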

GraphST

Paper: Nature Communications, 2023
Code: github.com/JinmiaoChenLab/GraphST
Key innovation: Graph self-supervised contrastive learning framework that learns spatial representations via data augmentation and contrastive objectives.
Strengths:
- Self-supervised — no labels required
- Ranks among the top performers in several published benchmarks
- Also performs batch integration for multi-slice data
Limitations:
- Multiple loss terms require careful balancing
- GPU recommended for practical runtimes
Technology compatibility: Visium, Slide-seq, Stereo-seq, MERFISH
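The core of this kind of graph contrastive objective can be sketched in NumPy, in the Deep Graph Infomax style we understand GraphST to build on. A corrupted view is made by shuffling expression profiles across spots while keeping the graph fixed, and a bilinear discriminator should score each real embedding higher than its corrupted counterpart against the same local neighborhood summary. Everything here is an untrained, hypothetical illustration; GraphST's actual model combines this kind of objective with a reconstruction loss and learns its parameters by gradient descent.

```python
import numpy as np

def softplus(a):
    return np.logaddexp(0.0, a)

def gcn_encode(X, A_norm, W):
    return np.maximum(A_norm @ X @ W, 0.0)        # one ReLU graph-convolution layer

def local_summary(Z, A):
    s = (A @ Z) / A.sum(axis=1, keepdims=True)    # mean embedding of each neighborhood
    return 1.0 / (1.0 + np.exp(-s))               # squash with a sigmoid

def contrastive_loss(Z, Z_corrupt, S, B):
    """Binary cross-entropy: real pairs labelled 1, corrupted pairs labelled 0."""
    pos = np.einsum("ij,jk,ik->i", Z, B, S)       # bilinear score z_i^T B s_i
    neg = np.einsum("ij,jk,ik->i", Z_corrupt, B, S)
    # -log(sigmoid(pos)) = softplus(-pos); -log(1 - sigmoid(neg)) = softplus(neg)
    return float(softplus(-pos).mean() + softplus(neg).mean())

rng = np.random.default_rng(0)
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
dist = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
A = (dist <= 1.0).astype(float)                   # 4-neighbors plus self-loops
A_norm = A / A.sum(axis=1, keepdims=True)         # row-normalized propagation

X = rng.poisson(1.0, size=(100, 30)).astype(float)
W = rng.normal(scale=0.1, size=(30, 8))
B = rng.normal(scale=0.1, size=(8, 8))

Z = gcn_encode(X, A_norm, W)
Z_corrupt = gcn_encode(X[rng.permutation(100)], A_norm, W)  # shuffled features, same graph
S = local_summary(Z, A)
print(contrastive_loss(Z, Z_corrupt, S, B))
```

Training would adjust `W` and `B` to lower this loss, which pushes the embeddings to encode what distinguishes a spot in its true neighborhood from a randomly relocated profile.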

BASS

Paper: Genome Biology, 2022
Code: github.com/zhengli09/BASS
Key innovation: Multi-scale Bayesian model that simultaneously identifies cell types and spatial domains in a unified framework.
Strengths:
- Joint cell-type and domain inference
- Handles multi-sample analysis natively
- Principled uncertainty quantification
Limitations:
- Computationally expensive for large datasets
- Requires specifying number of cell types and domains
Technology compatibility: Visium, Slide-seq

SpaceFlow

Paper: Nature Communications, 2022
Code: github.com/hongleir/SpaceFlow
Key innovation: Deep graph network that learns pseudo-spatiotemporal embeddings, capturing both spatial domains and gradual spatial transitions.
Strengths:
- Captures continuous spatial gradients, not just discrete domains
- Regularization loss encourages spatially smooth embeddings
Limitations:
- Pseudo-temporal interpretation can be misleading
- Moderate scalability
Technology compatibility: Visium, Slide-seq

SEDR

Paper: Genome Medicine, 2024
Code: github.com/JinmiaoChenLab/SEDR
Key innovation: Combines a deep autoencoder for gene expression with a variational graph autoencoder over the spatial neighbor graph to learn a joint low-dimensional embedding of spatial and transcriptomic information.
Strengths:
- Deep generative model with clear latent space
- Good performance on Visium benchmarks
Limitations:
- Slower than non-deep methods
- Limited validation on imaging-based platforms
Technology compatibility: Visium, Slide-seq

GASTON

Paper: Nature Methods, 2024
Code: github.com/raphael-group/GASTON
Key innovation: Identifies spatial domains and spatially varying gene expression programs simultaneously by learning a one-dimensional "isodepth" coordinate that describes the tissue's topography, and modeling expression as piecewise linear functions of that coordinate.
Strengths:
- Interpretable model connecting domains to gene programs
- Captures within-domain expression gradients
Limitations:
- Assumes piecewise linear spatial patterns
- Relies on neural network training, which is non-convex, so results can depend on initialization
Technology compatibility: Visium, Slide-seq
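GASTON learns the isodepth and the domain boundaries with a neural network. The sketch below assumes both are already known and shows only the second half of the model: fitting a separate line (intercept and slope) of expression against the 1-D coordinate within each domain. The data, breakpoints, and function name are hypothetical.

```python
import numpy as np

def piecewise_linear_fit(d, y, breakpoints):
    """Fit y = a_k + b_k * d separately within each segment of d.

    d: (n,) 1-D coordinate (an isodepth stand-in, assumed given);
    breakpoints: sorted interior domain boundaries along d (assumed known).
    Returns the segment index of each spot and a (K, 2) array of (a_k, b_k).
    """
    seg = np.searchsorted(breakpoints, d, side="right")
    params = []
    for k in range(len(breakpoints) + 1):
        mask = seg == k
        design = np.column_stack([np.ones(mask.sum()), d[mask]])
        coef, *_ = np.linalg.lstsq(design, y[mask], rcond=None)
        params.append(coef)
    return seg, np.array(params)

# Toy gene: three domains along d with different slopes and jumps at boundaries
rng = np.random.default_rng(0)
d = rng.uniform(0.0, 10.0, size=600)
breaks = np.array([4.0, 7.0])
true = np.array([[1.0, 0.5], [5.0, -1.0], [-2.0, 0.3]])   # (intercept, slope)
k = np.searchsorted(breaks, d, side="right")
y = true[k, 0] + true[k, 1] * d + rng.normal(0.0, 0.1, size=d.size)

seg, params = piecewise_linear_fit(d, y, breaks)
print(params.round(2))
```

The per-domain slopes are the "within-domain gradients" listed under Strengths, while jumps in the fitted lines at the breakpoints mark sharp changes between domains.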

conST

Paper: Nature Communications, 2022
Code: github.com/ys-zong/conST
Key innovation: Contrastive learning on spatial transcriptomics graphs with self-supervised objectives, avoiding the need for domain labels.
Strengths:
- Self-supervised contrastive framework
- Integrates gene expression and morphological features
Limitations:
- Similar to GraphST but with less community adoption
- Augmentation strategies may not generalize across tissues
Technology compatibility: Visium

smoothclust

Paper: bioRxiv, 2024
Code: github.com/lmweber/smoothclust
Key innovation: Spatially aware smoothing of expression data before standard clustering, which achieves spatial coherence through preprocessing rather than a custom model.
Strengths:
- Simple approach compatible with existing clustering pipelines
- No GPU or deep learning required
- Bioconductor integration
Limitations:
- Smoothing can blur genuine sharp boundaries
- Less sophisticated than deep learning approaches
Technology compatibility: Visium, Visium HD; in principle any platform with spatial coordinates
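The smooth-then-cluster idea fits in a few lines. This sketch averages each spot's expression uniformly over all spots within a fixed radius and then runs a plain k-means. The uniform kernel, the radius, the minimal k-means, and the helper names are assumptions for illustration, not the smoothclust package's interface.

```python
import numpy as np

def smooth_expression(X, coords, radius):
    """Replace each spot's profile by the mean over spots within `radius` (incl. itself)."""
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    W = (d <= radius).astype(float)
    return (W @ X) / W.sum(axis=1, keepdims=True)

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's algorithm with random initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def error_rate(labels, truth):
    """Misclassification rate for two clusters, allowing for swapped label ids."""
    err = np.mean(labels != truth)
    return min(err, 1.0 - err)

rng = np.random.default_rng(0)
xs, ys = np.meshgrid(np.arange(20), np.arange(20))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
truth = (coords[:, 0] >= 10).astype(int)
X = truth[:, None] * 1.0 + rng.normal(0.0, 1.5, size=(400, 5))  # weak signal, heavy noise

raw = kmeans(X, 2)
smooth = kmeans(smooth_expression(X, coords, radius=1.5), 2)
print(error_rate(raw, truth), error_rate(smooth, truth))
```

Most remaining errors after smoothing sit next to the domain boundary, which is exactly the blurring noted under Limitations; a smaller radius keeps edges sharper but removes less noise.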

Vesalius

Paper: Nature Communications, 2024
Code: github.com/WJiangLab/Vesalius
Key innovation: Image-processing-inspired approach that converts gene expression to spatial "images" and applies morphological operations for territory detection.
Strengths:
- Unique image-processing perspective on spatial domains
- Identifies territory boundaries and gradients
Limitations:
- Novel approach with limited benchmarking against deep methods
- R implementation
Technology compatibility: Visium, Slide-seq
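The image-processing view can be illustrated by rasterizing a per-spot value (for example, one embedding dimension) onto a pixel grid and applying a 3 x 3 median filter, a standard denoising operation that removes isolated speckles while keeping straight territory edges. This is a generic sketch under that assumption, not Vesalius's pipeline, which as we understand it works on multi-channel embeddings and adds its own segmentation and territory-isolation steps; the function names are hypothetical.

```python
import numpy as np

def rasterize(values, col, row, shape):
    """Place one value per spot onto a 2-D pixel grid at integer (row, col)."""
    img = np.zeros(shape)
    img[row, col] = values
    return img

def median_filter3(img):
    """3 x 3 median filter with edge padding (pure NumPy)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    windows = np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])
    return np.median(windows, axis=0)

# Two territories (left = 0, right = 1) plus a few flipped "speckle" spots
xs, ys = np.meshgrid(np.arange(20), np.arange(20))
col, row = xs.ravel(), ys.ravel()
values = (col >= 10).astype(float)
speckles = [(3, 3), (12, 15), (16, 5)]          # (row, col) positions to flip
img = rasterize(values, col, row, (20, 20))
for r, c in speckles:
    img[r, c] = 1.0 - img[r, c]

territories = median_filter3(img) > 0.5
print(territories.sum(axis=0))   # number of "right territory" pixels per column
```

Because each speckle is outvoted by its eight neighbors, the filtered map matches the clean two-territory layout, while the boundary between columns 9 and 10 stays in place.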

Benchmark Summary

Systematic benchmarks (e.g., the STbenchmark and SpatialBenchmarking studies) frequently rank GraphST and STAGATE among the top performers for spatial domain identification on Visium data, although rankings vary with the dataset and metric. BANKSY has emerged as a strong practical choice because it is simple, fast, and competitive in accuracy. BayesSpace performs well on Visium but is limited to lattice-based (grid) data. GNN-based methods perform best overall, but their advantage shrinks on well-separated domains, where simpler methods such as BANKSY or smoothclust suffice.

Number of domains matters

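Most methods require specifying the number of domains (k), either directly or, for graph-based clustering such as Leiden, indirectly through a resolution parameter. Results are sensitive to this choice, so run several values of k and assess the stability of the resulting domains. BayesSpace offers some guidance by comparing model fit across candidate values of k, but no method reliably auto-selects k.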
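One way to act on this advice is to recluster at several values of k with different random seeds and measure how much the labelings agree, for example with the adjusted Rand index (ARI). The sketch below implements ARI from its contingency-table formula and uses a minimal k-means as a stand-in for whichever domain method is being tuned. The toy data and helper names are hypothetical, and a high score indicates reproducibility, not biological correctness.

```python
import numpy as np

def adjusted_rand_index(a, b):
    """ARI between two labelings, computed from their contingency table."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    C = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(C, (ai, bi), 1)

    def pairs(counts):
        return (counts * (counts - 1) / 2.0).sum()

    index = pairs(C)
    sum_a, sum_b = pairs(C.sum(axis=1)), pairs(C.sum(axis=0))
    expected = sum_a * sum_b / pairs(np.array([C.sum()]))
    max_index = 0.5 * (sum_a + sum_b)
    return float((index - expected) / (max_index - expected))

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's algorithm; stands in for any seed-dependent domain method."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def stability(X, k, seeds=range(5)):
    """Mean pairwise ARI across runs that differ only in their random seed."""
    runs = [kmeans(X, k, seed=s) for s in seeds]
    scores = [adjusted_rand_index(runs[i], runs[j])
              for i in range(len(runs)) for j in range(i + 1, len(runs))]
    return float(np.mean(scores))

# Toy embedding with three well-separated groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(100, 2)) for m in (0.0, 3.0, 6.0)])
for k in (2, 3, 4, 5):
    print(k, round(stability(X, k), 3))
```

ARI ignores label identities, so two runs that find the same domains under different label numbers still score 1.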

When to Use What

Your data	Your goal	Recommended	Why
Visium, quick analysis	Fast domain detection	BANKSY	Simple, fast, competitive accuracy
Visium, best accuracy	Benchmark-winning domains	GraphST or STAGATE	Top performers in systematic benchmarks
Visium, statistical rigor	Bayesian domain inference	BayesSpace	Uncertainty quantification, sub-spot resolution
Large dataset (>100K cells)	Scalable domains	BANKSY	Scales to millions without GPU
Multi-sample Visium	Cross-sample domain alignment	GraphST or BASS	Built-in multi-sample analysis
Imaging-based (MERFISH)	Cell-level domains	BANKSY or STAGATE	Validated on irregular cell coordinates
Continuous spatial gradients	Detect gradients, not just domains	SpaceFlow or GASTON	Model continuous spatial variation

Technology Compatibility

Method	Visium	Visium HD	Xenium	MERFISH	CosMx	CODEX	Stereo-seq
BANKSY	Yes	Yes	Yes	Yes	Yes	-	Yes
SpaGCN	Yes	-	-	-	-	-	-
STAGATE	Yes	-	-	Yes	-	-	Yes
BayesSpace	Yes	-	-	-	-	-	-
GraphST	Yes	-	-	Yes	-	-	Yes
BASS	Yes	-	-	-	-	-	-
SpaceFlow	Yes	-	-	-	-	-	-
SEDR	Yes	-	-	-	-	-	-
GASTON	Yes	-	-	-	-	-	-
conST	Yes	-	-	-	-	-	-
smoothclust	Yes	Yes	-	-	-	-	-
Vesalius	Yes	-	-	-	-	-	-