Analysis Pipeline Decision Tree

This is the most practically useful page on the site. It answers two questions: "I have data from technology X -- what tools should I use?" and "I want to answer question Y -- what category of tool do I need?"

Opinionated recommendations

These pipelines reflect a synthesis of benchmarks, community adoption, and practical experience as of early 2026. The field moves fast. Where a clear winner exists, it is named; where the choice is context-dependent, alternatives are listed with trade-offs.

Part 1: What technology do you have?

Use this table to jump to the technology-specific pipeline that matches your data.

Technology	Type	Resolution	Recommended pipeline
Visium	Sequencing	55 um (multi-cell)	Visium pipeline
Visium HD	Sequencing	2 um (bins)	Visium HD pipeline
Xenium	Imaging	Subcellular	Xenium pipeline
MERFISH / MERSCOPE	Imaging	Subcellular	MERFISH pipeline
CosMx SMI	Imaging	Subcellular	CosMx pipeline
CODEX / PhenoCycler	Protein	Subcellular	CODEX pipeline
Stereo-seq	Sequencing	~500 nm (bins)	Stereo-seq pipeline
Slide-seq V2	Sequencing	10 um	Slide-seq pipeline
GeoMx DSP	Sequencing/Protein	ROI-level	GeoMx pipeline
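
For scripted workflows, the table above can be encoded as a small lookup. The following is only a sketch: the dictionary layout, the alias list, and the function name `recommend_pipeline` are assumptions made for illustration, not part of any tool named on this page.

```python
# Illustrative lookup transcribed from the technology table above.
# The layout and all names here are hypothetical, chosen for this sketch.
PIPELINES = {
    "visium": ("Sequencing", "55 um (multi-cell)", "Visium pipeline"),
    "visiumhd": ("Sequencing", "2 um (bins)", "Visium HD pipeline"),
    "xenium": ("Imaging", "Subcellular", "Xenium pipeline"),
    "merfish": ("Imaging", "Subcellular", "MERFISH pipeline"),
    "cosmx": ("Imaging", "Subcellular", "CosMx pipeline"),
    "codex": ("Protein", "Subcellular", "CODEX pipeline"),
    "stereoseq": ("Sequencing", "~500 nm (bins)", "Stereo-seq pipeline"),
    "slideseq": ("Sequencing", "10 um", "Slide-seq pipeline"),
    "geomx": ("Sequencing/Protein", "ROI-level", "GeoMx pipeline"),
}

# Alternative product names that map onto the same pipeline.
ALIASES = {
    "merscope": "merfish",
    "cosmxsmi": "cosmx",
    "phenocycler": "codex",
    "slideseqv2": "slideseq",
    "geomxdsp": "geomx",
}


def recommend_pipeline(technology: str) -> str:
    """Return the pipeline name for a technology, ignoring case and punctuation."""
    key = "".join(ch for ch in technology.lower() if ch.isalnum())
    key = ALIASES.get(key, key)
    if key not in PIPELINES:
        raise KeyError(f"No pipeline listed for technology: {technology!r}")
    return PIPELINES[key][2]
```

Calling `recommend_pipeline("Visium HD")` returns the name of the matching section below; unknown technologies raise a `KeyError` rather than guessing.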

Visium pipeline

The most mature ecosystem. Spot-level resolution (55 um) means each spot captures 1-10 cells, making deconvolution essential for cell-type inference.

Raw data
  |
  v
SpaceRanger (10x) ---- alignment + counting
  |
  v
SpotClean ---- ambient RNA (spot-swapping) correction
  |
  v
Scanpy/Squidpy ---- standard QC, filtering, normalization, HVG selection
  |
  v
BayesSpace / STAGATE / SpaGCN ---- spatial domain identification
  |
  v
Cell2location / RCTD / Tangram ---- cell type deconvolution (requires scRNA-seq reference)
  |
  v
nnSVG / SPARK-X ---- spatially variable gene detection
  |
  v
CellChat v2 / LIANA+ ---- cell-cell communication inference
  |
  v
Squidpy / stLearn ---- neighborhood enrichment, spatial statistics

Key decisions

Deconvolution method: Cell2location is the benchmark leader for Visium but requires a good scRNA-seq reference. RCTD is faster and more robust to reference mismatches. Tangram works well when the reference is from the same tissue.
Spatial domains: BayesSpace is Visium-native and leverages the grid structure. STAGATE uses graph attention networks and generalizes better. SpaGCN is simpler but less accurate in benchmarks.
SVG detection: nnSVG is the current benchmark leader. SPARK-X is faster for large datasets. Avoid older methods like SpatialDE v1 (slow, inflated false positives).
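
To make "spatially variable" concrete, here is a minimal sketch of Moran's I, a classical spatial autocorrelation statistic. It is not nnSVG or SPARK-X, which use more elaborate models. The square grid with 4-neighbour adjacency is a simplifying assumption (Visium spots sit on a hexagonal lattice), and the function names are hypothetical.

```python
from itertools import product


def grid_neighbors(n_rows, n_cols):
    """4-neighbour adjacency on a square grid (a simplification of real layouts)."""
    nbrs = {}
    for r, c in product(range(n_rows), range(n_cols)):
        nbrs[(r, c)] = [
            (r + dr, c + dc)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
        ]
    return nbrs


def morans_i(values, nbrs):
    """Moran's I with binary weights: (N / W) * sum_ij w_ij d_i d_j / sum_i d_i^2."""
    n = len(values)
    mean = sum(values.values()) / n
    dev = {k: v - mean for k, v in values.items()}
    denom = sum(d * d for d in dev.values())
    if denom == 0:
        return 0.0  # constant expression carries no spatial signal
    w_total = sum(len(js) for js in nbrs.values())
    num = sum(dev[i] * dev[j] for i, js in nbrs.items() for j in js)
    return (n / w_total) * num / denom


nbrs = grid_neighbors(6, 6)
# A left-to-right gradient: neighbouring spots have similar values.
gradient = {(r, c): float(c) for r, c in nbrs}
# A checkerboard: every neighbour has the opposite value.
checker = {(r, c): float((r + c) % 2) for r, c in nbrs}

i_gradient = morans_i(gradient, nbrs)  # clearly positive
i_checker = morans_i(checker, nbrs)    # close to -1
```

A gene with a smooth gradient scores well above zero, while an alternating pattern scores near -1. Values near zero indicate no detectable spatial structure.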

Visium HD pipeline

Visium HD generates 2-um bins that must be aggregated before most analysis. The ecosystem is still maturing -- expect gaps and rough edges.

Raw data (2-um bins)
  |
  v
SpaceRanger (10x) ---- alignment + counting at 2-um and 8-um bins
  |
  v
Bin2Cell / ENACT ---- bin-to-cell aggregation using H&E image
  |
  v
Standard QC (Scanpy/Squidpy) ---- filtering, normalization
  |
  v
BANKSY / STAGATE ---- spatial domain identification (scalable methods required)
  |
  v
Cell2location / RCTD ---- deconvolution (if using 8-um bins without Bin2Cell)
  |
  v
nnSVG ---- spatially variable genes (subsample if needed for speed)
  |
  v
CellChat v2 / COMMOT ---- cell-cell communication

Visium HD is still emerging

Bin2Cell (Polański et al., 2024) and ENACT are the leading bin-to-cell methods but neither is fully benchmarked yet.
Many tools designed for Visium work on Visium HD 8-um bins but may need parameter tuning.
Memory and compute requirements are 10-100x larger than standard Visium. Dask-backed AnnData or SpatialData is often necessary.
The Visium HD deep read covers practical considerations in detail.
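
The simplest form of aggregation, summing 2-um bins into larger square bins, can be sketched as follows. This is plain grid binning under an assumed input layout (a dictionary keyed by integer bin row and column). It is not what Bin2Cell or ENACT do, since those use image-based segmentation. The function name is hypothetical.

```python
from collections import defaultdict


def aggregate_bins(bin_counts, factor=4):
    """Sum counts from fine bins into coarser square bins.

    bin_counts: {(row, col): {gene: count}} indexed on the 2-um grid.
    factor: fine bins per coarse bin side (4 turns 2-um bins into 8-um bins).
    """
    coarse = defaultdict(lambda: defaultdict(int))
    for (row, col), genes in bin_counts.items():
        key = (row // factor, col // factor)  # coarse bin containing this fine bin
        for gene, count in genes.items():
            coarse[key][gene] += count
    return {key: dict(genes) for key, genes in coarse.items()}


fine = {
    (0, 0): {"EPCAM": 1},
    (3, 3): {"EPCAM": 2, "PTPRC": 1},  # falls in the same 8-um bin as (0, 0)
    (4, 0): {"PTPRC": 5},              # falls in the next 8-um bin down
}
coarse = aggregate_bins(fine, factor=4)
```

Each 8-um bin pools up to 16 of the 2-um bins, which trades resolution for far fewer, less sparse observations.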

Xenium pipeline

Xenium provides molecule-level coordinates with pre-designed or custom panels. Segmentation is the critical first step.

Xenium Onboard Analysis output (transcripts + DAPI + cell boundaries)
  |
  v
Cellpose2 / Baysor ---- cell segmentation (or use 10x default boundaries as starting point)
  |
  v
SpatialData / Squidpy ---- data loading + QC
  |
  v
BANKSY / GraphST ---- spatial domain identification
  |
  v
STELLAR / Tangram ---- cell type annotation (transfer from scRNA-seq reference)
  |
  v
CellChat v2 / COMMOT ---- cell-cell communication
  |
  v
Squidpy ---- neighborhood enrichment, co-occurrence, spatial statistics

Key decisions

Segmentation: The 10x-provided cell boundaries are reasonable for many analyses. Re-segmentation with Cellpose2 (using DAPI + membrane stains) or Baysor (transcript-based, no image needed) can improve results in dense tissues.
Cell typing: With targeted panels (100-5000 genes), direct annotation via marker-based approaches may outperform reference-based transfer. STELLAR is designed for spatial data and handles novel cell types.
CCC (cell-cell communication): COMMOT uses optimal transport on spatial coordinates, which suits imaging-based data. CellChat v2 now supports spatial coordinates directly.
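
Whatever segmentation is chosen, its output is ultimately an assignment of transcripts to cells. The sketch below builds a cell-by-gene count table from such assignments and reports the fraction of transcripts that landed in a cell, a quick sanity check when comparing segmentations. The tuple layout and the use of `None` for unassigned transcripts are assumptions for this example, not the vendor file format.

```python
from collections import Counter, defaultdict


def build_cell_by_gene(transcripts):
    """Aggregate (gene, x, y, cell_id) records into per-cell gene counts.

    cell_id is None for transcripts outside every cell (an assumption here).
    Returns (counts, fraction_assigned).
    """
    counts = defaultdict(Counter)
    n_total = 0
    n_assigned = 0
    for gene, _x, _y, cell_id in transcripts:
        n_total += 1
        if cell_id is None:
            continue
        n_assigned += 1
        counts[cell_id][gene] += 1
    fraction = n_assigned / n_total if n_total else 0.0
    return {cell: dict(genes) for cell, genes in counts.items()}, fraction


transcripts = [
    ("EPCAM", 1.0, 2.0, "cell_1"),
    ("EPCAM", 1.5, 2.1, "cell_1"),
    ("CD3E", 5.0, 5.0, "cell_2"),
    ("CD3E", 9.0, 9.0, None),  # not inside any segmented cell
]
counts, fraction_assigned = build_cell_by_gene(transcripts)
```

A sharp drop in the assigned fraction after re-segmentation is worth investigating visually before continuing downstream.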

MERFISH / MERSCOPE pipeline

MERFISH/MERSCOPE data is structurally similar to Xenium but typically uses the Vizgen pipeline for initial processing.

MERSCOPE output (transcripts + cell boundaries)
  |
  v
Baysor / Cellpose2 ---- re-segmentation (Vizgen defaults are often suboptimal)
  |
  v
Squidpy / SpatialData ---- data loading + QC
  |
  v
GraphST / BANKSY ---- spatial domain identification
  |
  v
COMMOT ---- cell-cell communication (optimal transport on coordinates)
  |
  v
nnSVG / SPARK-X ---- spatially variable genes

MERSCOPE commercial status

Vizgen spent several years in patent litigation with 10x Genomics and announced a merger with Ultivue in late 2024. The MERSCOPE platform continues to operate but the long-term product roadmap is uncertain. Existing data and workflows remain valid.

CosMx pipeline

CosMx provides single-molecule imaging with both RNA and protein readouts. NanoString (now part of Bruker) provides initial cell segmentation.

CosMx output (transcripts + cell boundaries)
  |
  v
InSituType ---- probabilistic cell typing (NanoString's method, well-suited to CosMx data)
  |
  v
SpaGCN / BANKSY ---- spatial domain identification
  |
  v
CellChat v2 ---- cell-cell communication
  |
  v
Squidpy ---- spatial statistics and neighborhood analysis

CosMx considerations

InSituType is optimized for CosMx data and often outperforms general-purpose methods on this platform.
CosMx supports simultaneous RNA + protein measurement, but most computational tools do not yet handle multi-modal CosMx data natively. Process RNA and protein separately, then integrate.
NanoString was acquired by Bruker in 2024; platform support continues under the Bruker Spatial Biology brand.

CODEX / PhenoCycler pipeline

Protein-based spatial data requires a fundamentally different analytical approach -- no gene expression matrices, no deconvolution. Analysis centers on cell phenotyping from marker intensities and neighborhood structure.

PhenoCycler images (multichannel fluorescence)
  |
  v
Mesmer / DeepCell ---- cell segmentation from nuclear + membrane markers
  |
  v
Intensity extraction ---- per-cell marker expression matrix
  |
  v
FlowSOM / Leiden clustering ---- cell phenotyping from marker profiles
  |
  v
Neighborhood analysis (Squidpy / ATHENA) ---- cellular neighborhood identification
  |
  v
Spatial statistics ---- co-occurrence, interaction scores, community detection

Key decisions

Segmentation: Mesmer (DeepCell) is the benchmark leader for protein imaging data. It handles variable marker quality better than classical watershed approaches.
Phenotyping: Manual gating is still common but does not scale. FlowSOM with expert-guided metaclustering offers a good balance. Avoid over-clustering.
Neighborhood analysis: This is where the unique value of spatial proteomics lies. Squidpy provides neighborhood enrichment tests; ATHENA offers more sophisticated niche identification.
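
The intensity extraction step in the diagram above reduces to averaging each marker channel over the pixels of each segmented cell. A minimal sketch follows, assuming the segmentation is a 2D label mask (0 for background, positive integers for cells) and each channel is a 2D array of the same shape. Nested lists are used to keep it dependency-free; real pipelines would use array libraries. The function name is hypothetical.

```python
def mean_intensity_per_cell(mask, channels):
    """Compute the mean marker intensity for every labelled cell.

    mask: 2D list of ints, 0 = background, k > 0 = pixels of cell k.
    channels: {marker_name: 2D list of intensities, same shape as mask}.
    Returns {cell_label: {marker_name: mean_intensity}}.
    """
    sums = {}
    sizes = {}
    for r, row in enumerate(mask):
        for c, cell in enumerate(row):
            if cell == 0:
                continue  # skip background pixels
            sizes[cell] = sizes.get(cell, 0) + 1
            cell_sums = sums.setdefault(cell, {m: 0.0 for m in channels})
            for marker, image in channels.items():
                cell_sums[marker] += image[r][c]
    return {
        cell: {m: total / sizes[cell] for m, total in marker_sums.items()}
        for cell, marker_sums in sums.items()
    }


mask = [
    [1, 1, 0],
    [2, 2, 0],
]
channels = {
    "CD3": [[10, 20, 99], [0, 0, 99]],
    "CD20": [[0, 0, 99], [30, 50, 99]],
}
profiles = mean_intensity_per_cell(mask, channels)
```

The resulting per-cell marker matrix is what clustering methods such as FlowSOM or Leiden take as input, usually after a variance-stabilizing transform.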

Stereo-seq pipeline

Stereo-seq generates the largest spatial datasets (subcellular resolution across centimeter-scale tissue), requiring scalable tools throughout.

Raw data (~500 nm bins)
  |
  v
SAW (BGI pipeline) ---- alignment + counting
  |
  v
Bin aggregation (cell-bin or fixed-size bins) ---- typically 50-100 um bins for tissue-level, or cell-bin via segmentation
  |
  v
STAGATE / BANKSY ---- spatial domain identification (must handle millions of bins)
  |
  v
Cell2location / RCTD ---- deconvolution (if using larger bins)
  |
  v
nnSVG ---- spatially variable genes (subsampling likely necessary)

Scale challenges

Stereo-seq datasets can exceed 100 million bins. Most standard tools will fail without subsampling or chunked processing. STAGATE and BANKSY are among the few spatial domain methods that scale to this size. Use SpatialData or Dask-backed workflows.
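
A back-of-the-envelope calculation shows where these numbers come from. The sketch below assumes a 500 nm capture pitch, as in the table in Part 1, and a hypothetical 1 cm x 1 cm capture area. The function name and the chip size are assumptions for illustration.

```python
def grid_size(width_mm, height_mm, pitch_um=0.5, bin_size=1):
    """Number of bins covering a rectangular capture area.

    pitch_um: centre-to-centre distance between capture spots.
    bin_size: spots per bin side (bin_size=50 merges 50 x 50 spots).
    """
    side_um = pitch_um * bin_size
    n_cols = int(width_mm * 1000 // side_um)
    n_rows = int(height_mm * 1000 // side_um)
    return n_rows * n_cols


raw_spots = grid_size(10, 10)               # 20,000 x 20,000 spots
bin50 = grid_size(10, 10, bin_size=50)      # 400 x 400 bins of 25 um
```

At native resolution the assumed chip has 400 million positions, while 50 x 50 binning reduces that to 160,000, which is why bin size is the first decision in any Stereo-seq analysis.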

Slide-seq V2 pipeline

Slide-seq V2 achieves 10-um resolution on fresh-frozen tissue with bead-based capture. The pipeline is similar to Visium but with higher resolution and noisier per-bead data.

Puck data (bead x gene matrix + coordinates)
  |
  v
Standard QC (Scanpy) ---- filter low-quality beads, normalize
  |
  v
RCTD / Cell2location ---- deconvolution (RCTD was originally developed for Slide-seq)
  |
  v
BayesSpace / STAGATE ---- spatial domain identification
  |
  v
nnSVG / SPARK-X ---- spatially variable genes

Slide-seq context

RCTD was originally developed and validated on Slide-seq data, making it a natural first choice for deconvolution on this platform. Slide-seq is primarily an academic technology; commercial adoption is limited.
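
Because per-bead data is noisy, the QC step matters more here than on Visium. The sketch below shows the logic of filtering low-count beads and applying library-size normalization followed by a log transform, the same idea that Scanpy implements on full matrices. The threshold of 100 counts and the function name are assumptions for this example. Appropriate cutoffs depend on the puck.

```python
import math


def filter_and_normalize(beads, min_counts=100, target_sum=10_000):
    """Drop low-count beads, then scale each bead to target_sum and log1p.

    beads: {bead_id: {gene: raw_count}}.
    min_counts: hypothetical QC threshold on total counts per bead.
    """
    kept = {}
    for bead_id, genes in beads.items():
        total = sum(genes.values())
        if total == 0 or total < min_counts:
            continue  # too few molecules to be informative
        kept[bead_id] = {
            gene: math.log1p(count * target_sum / total)
            for gene, count in genes.items()
        }
    return kept


beads = {
    "bead_1": {"Gad1": 50, "Slc17a7": 150},
    "bead_2": {"Gad1": 3},  # fails the count threshold
}
normalized = filter_and_normalize(beads)
```

Deconvolution tools such as RCTD and Cell2location model raw counts, so this kind of normalization is typically applied only for clustering and visualization, with the raw matrix kept alongside.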

GeoMx DSP pipeline

GeoMx provides region-of-interest (ROI) level data, not single-cell. Analysis resembles bulk RNA-seq with spatial annotation rather than spatial transcriptomics proper.

GeoMx output (ROI x gene/protein matrix)
  |
  v
GeomxTools (NanoString R package) ---- QC, normalization
  |
  v
Standard differential expression (DESeq2 / limma) ---- ROI-level comparisons
  |
  v
Spatial context is metadata ---- annotate ROIs with tissue region, distance to feature, etc.

GeoMx limitations

GeoMx is not single-cell and not truly spatially resolved at the cellular level. It is best suited for clinical FFPE samples where other technologies fail, or for hypothesis-driven ROI comparisons. Do not apply single-cell spatial methods to GeoMx data.

Part 2: What question are you asking?

Use this table to identify the right category of tool for a biological question, then follow the link to the detailed methods page.

Question	Tool category	Top picks	Methods page
What cell types are present and where?	Deconvolution / Cell typing	Cell2location, RCTD, Tangram, InSituType	Deconvolution
Which genes vary spatially?	SVG detection	nnSVG, SPARK-X, SpatialDE2	Spatially Variable Genes
What are the tissue domains / regions?	Spatial domain identification	BANKSY, GraphST, STAGATE, BayesSpace	Spatial Domains
How do cells communicate?	Cell-cell communication	COMMOT, CellChat v2, LIANA+	Cell-Cell Communication
What are the cellular neighborhoods?	Niche analysis	Squidpy, ATHENA, Nicheformer	Spatial Domains
Are genes differentially expressed between regions?	Spatial DE	GLISS, DestDE, C-SIDE	Spatial DE
Where are transcripts within cells?	Subcellular analysis	Bento, Ficture, subcellular segmentation	Subcellular Analysis
How do cell states change across space?	Spatial trajectories	stLearn, SpaceFlow, CellRank	Spatial Trajectories
How do I segment cells from images?	Cell segmentation	Cellpose2, Baysor, Mesmer/DeepCell	Cell Segmentation
How do I integrate spatial with scRNA-seq?	Multi-modal integration	Tangram, gimVI, SpatialData	Multi-modal Integration
Can I use a foundation model?	Foundation models	scGPT, Geneformer, Nicheformer	Foundation Models
How do I align serial sections in 3D?	3D reconstruction	PASTE, STalign, GPSA	3D Reconstruction

Part 3: Common pitfalls

Mistakes that waste months

Skipping deconvolution on Visium data. Each 55-um spot contains multiple cells. Treating spots as cells leads to chimeric expression profiles and misleading downstream results.
Using SpatialDE v1 for SVG detection. It is slow, has inflated false positive rates, and has been outperformed by nnSVG and SPARK-X in recent benchmarks.
Ignoring segmentation quality on imaging data. Garbage segmentation propagates through every downstream step. Always visually inspect segmentation results on multiple tissue regions before proceeding.
Applying scRNA-seq CCC methods to spatial data without coordinates. CellChat v1 and CellPhoneDB do not use spatial coordinates. Use COMMOT, CellChat v2, or LIANA+ with spatial mode for coordinate-aware communication inference.
Running standard Visium tools on Visium HD without aggregation. Visium HD 2-um bins are not cells. Aggregate to 8-um bins or use Bin2Cell before applying standard pipelines.
Neglecting ambient RNA correction. Spatial platforms have significant ambient RNA contamination, especially Visium. SpotClean or similar correction should precede downstream analysis.
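
The first pitfall is easiest to see with a toy example. The sketch below treats a spot as a non-negative mixture of reference cell-type profiles and recovers the proportions with projected gradient descent on a least-squares objective. This is a deliberately simple stand-in for the idea of deconvolution, not the model used by Cell2location, RCTD, or Tangram. The reference profiles and the function name are invented for illustration.

```python
def deconvolve(spot, reference, n_iter=500):
    """Estimate cell-type proportions for one spot by non-negative least squares.

    spot: list of counts per gene.
    reference: {cell_type: list of mean expression per gene, same gene order}.
    Assumes at least one non-zero reference value.
    """
    types = list(reference)
    n_genes = len(spot)
    # Step size 1 / trace(R^T R) is small enough for gradient descent to converge.
    step = 1.0 / sum(v * v for t in types for v in reference[t])
    weights = [0.0] * len(types)
    for _ in range(n_iter):
        predicted = [
            sum(weights[k] * reference[t][g] for k, t in enumerate(types))
            for g in range(n_genes)
        ]
        residual = [predicted[g] - spot[g] for g in range(n_genes)]
        for k, t in enumerate(types):
            grad = sum(reference[t][g] * residual[g] for g in range(n_genes))
            weights[k] = max(0.0, weights[k] - step * grad)  # project onto w >= 0
    total = sum(weights)
    return {t: (weights[k] / total if total > 0 else 0.0) for k, t in enumerate(types)}


# Invented reference profiles over four genes.
reference = {
    "T cell": [10.0, 0.0, 1.0, 5.0],
    "Tumor": [0.0, 8.0, 1.0, 5.0],
}
# A spot containing three T cells and one tumor cell.
spot = [30.0, 8.0, 4.0, 20.0]
proportions = deconvolve(spot, reference)
```

The spot expresses both marker genes, so treating it as a single cell would suggest a hybrid cell type that does not exist. Deconvolution instead recovers a mixture of roughly three parts T cell to one part tumor.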

Part 4: Framework and infrastructure choices

Regardless of technology, spatial data benefits from standardized data structures and workflow frameworks.

Need	Recommended tool	Notes
Unified data structure	SpatialData	Handles images, points, shapes, and tables in a single object. Actively developed by the scverse consortium.
Spatial statistics toolkit	Squidpy	Neighborhood enrichment, co-occurrence, spatial autocorrelation, and more. Integrates with AnnData/SpatialData.
Visualization	Napari + napari-spatialdata	Interactive visualization of spatial data layers. Essential for QC and segmentation validation.
Scalable processing	Dask-backed AnnData	Required for Visium HD and Stereo-seq scale data.
Pipeline orchestration	Nextflow / Snakemake	For reproducible, end-to-end spatial analysis pipelines.
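
As an illustration of what a neighborhood enrichment statistic measures, the sketch below counts edges between two cell types in a neighbor graph and compares the count against label permutations, reporting a z-score. It shows the general permutation idea only. The graph construction, permutation count, and function names are assumptions and do not reproduce the Squidpy implementation.

```python
import random
from statistics import mean, pstdev


def count_pairs(labels, edges, a, b):
    """Count undirected edges joining one cell of type a and one of type b."""
    return sum(1 for i, j in edges if {labels[i], labels[j]} == {a, b})


def enrichment_z(labels, edges, a, b, n_perm=200, seed=0):
    """Z-score of the observed a-b edge count against shuffled labels."""
    rng = random.Random(seed)
    observed = count_pairs(labels, edges, a, b)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps type frequencies, breaks spatial structure
        null.append(count_pairs(shuffled, edges, a, b))
    sd = pstdev(null)
    return (observed - mean(null)) / sd if sd > 0 else 0.0


# Twenty cells on a line: ten of type A followed by ten of type B.
labels = ["A"] * 10 + ["B"] * 10
edges = [(i, i + 1) for i in range(19)]

z_ab = enrichment_z(labels, edges, "A", "B")  # negative: the types avoid each other
z_aa = enrichment_z(labels, edges, "A", "A")  # positive: A cells cluster together
```

Segregated cell types give a strongly negative cross-type score and positive same-type scores. Well-mixed tissue gives scores near zero.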