Analysis Pipeline Decision Tree
This is the most practically useful page on the site. It answers two questions: "I have data from technology X -- what tools should I use?" and "I want to answer question Y -- what category of tool do I need?"
Opinionated recommendations
These pipelines reflect a synthesis of benchmarks, community adoption, and practical experience as of early 2026. The field moves fast. Where a clear winner exists, it is named; where the choice is context-dependent, alternatives are listed with trade-offs.
Part 1: What technology do you have?
Use this table to jump to the technology-specific pipeline that matches your data.
| Technology | Type | Resolution | Recommended pipeline |
|---|---|---|---|
| Visium | Sequencing | 55 um (multi-cell) | Visium pipeline |
| Visium HD | Sequencing | 2 um (bins) | Visium HD pipeline |
| Xenium | Imaging | Subcellular | Xenium pipeline |
| MERFISH / MERSCOPE | Imaging | Subcellular | MERFISH pipeline |
| CosMx SMI | Imaging | Subcellular | CosMx pipeline |
| CODEX / PhenoCycler | Protein | Subcellular | CODEX pipeline |
| Stereo-seq | Sequencing | ~500 nm (bins) | Stereo-seq pipeline |
| Slide-seq V2 | Sequencing | 10 um | Slide-seq pipeline |
| GeoMx DSP | Sequencing/Protein | ROI-level | GeoMx pipeline |
Visium pipeline
The most mature ecosystem. Spot-level resolution (55 um) means each spot captures 1-10 cells, making deconvolution essential for cell-type inference.
Raw data
|
v
SpaceRanger (10x) ---- alignment + counting
|
v
SpotClean ---- spot-swapping / ambient RNA correction
|
v
Scanpy/Squidpy ---- standard QC, filtering, normalization, HVG selection
|
v
BayesSpace / STAGATE / SpaGCN ---- spatial domain identification
|
v
Cell2location / RCTD / Tangram ---- cell type deconvolution (requires scRNA-seq reference)
|
v
nnSVG / SPARK-X ---- spatially variable gene detection
|
v
CellChat v2 / LIANA+ ---- cell-cell communication inference
|
v
Squidpy / stLearn ---- neighborhood enrichment, spatial statistics
Key decisions
- Deconvolution method: Cell2location is the benchmark leader for Visium but requires a good scRNA-seq reference. RCTD is faster and more robust to reference mismatches. Tangram works well when the reference is from the same tissue.
- Spatial domains: BayesSpace is Visium-native and leverages the grid structure. STAGATE uses graph attention networks and generalizes better. SpaGCN is simpler but less accurate in benchmarks.
- SVG detection: nnSVG is the current benchmark leader. SPARK-X is faster for large datasets. Avoid older methods like SpatialDE v1 (slow, inflated false positives).
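Deconvolution is framed as a mixture problem. The toy sketch below shows why: it treats one spot's counts as a non-negative combination of cell-type signatures and solves for the weights with plain non-negative least squares (projected gradient descent). It is for intuition only. Cell2location fits a Bayesian model and RCTD a Poisson-based likelihood with platform-effect correction. The function name, signatures and counts below are hypothetical.

```python
def deconvolve_spot(signatures, spot, n_iter=2000):
    """Toy deconvolution of one spot by non-negative least squares.

    signatures: dict cell_type -> expected counts per gene (from a reference)
    spot: observed counts per gene for one spot, in the same gene order
    Returns dict cell_type -> estimated fraction of the fitted signal (sums to 1).
    """
    types = list(signatures)
    S = [signatures[t] for t in types]  # one row per cell type
    k_types, n_genes = len(types), len(spot)
    # Safe step size: the squared Frobenius norm bounds the largest eigenvalue of S S^T
    step = 1.0 / sum(x * x for row in S for x in row)
    w = [1.0] * k_types  # start from equal weights
    for _ in range(n_iter):
        # Predicted spot expression under the current weights
        pred = [sum(w[k] * S[k][g] for k in range(k_types)) for g in range(n_genes)]
        resid = [pred[g] - spot[g] for g in range(n_genes)]
        # Gradient of 0.5 * squared residual norm with respect to each weight
        grad = [sum(S[k][g] * resid[g] for g in range(n_genes)) for k in range(k_types)]
        # Gradient step, then project back onto the non-negative orthant
        w = [max(0.0, w[k] - step * grad[k]) for k in range(k_types)]
    total = sum(w)
    return {t: (w[k] / total if total > 0 else 0.0) for k, t in enumerate(types)}


reference = {"tumor": [10.0, 0.0, 1.0], "fibroblast": [0.0, 8.0, 1.0]}
spot_counts = [30.0, 8.0, 4.0]  # built as 3 x tumor + 1 x fibroblast
print(deconvolve_spot(reference, spot_counts))  # roughly {'tumor': 0.75, 'fibroblast': 0.25}
```

The weights are relative to signature magnitudes. Real methods therefore also model per-cell-type library size, count noise and per-spot capture efficiency before they report cell-type abundances.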
Visium HD pipeline
Visium HD generates 2-um bins that must be aggregated before most analysis. The ecosystem is still maturing -- expect gaps and rough edges.
Raw data (2-um bins)
|
v
SpaceRanger (10x) ---- alignment + counting at 2-um, 8-um, and 16-um bins
|
v
Bin2Cell / ENACT ---- bin-to-cell aggregation using H&E image
|
v
Standard QC (Scanpy/Squidpy) ---- filtering, normalization
|
v
BANKSY / STAGATE ---- spatial domain identification (scalable methods required)
|
v
Cell2location / RCTD ---- deconvolution (if using 8-um bins without Bin2Cell)
|
v
nnSVG ---- spatially variable genes (subsample if needed for speed)
|
v
CellChat v2 / COMMOT ---- cell-cell communication
Visium HD is still emerging
- Bin2Cell (Polański et al., 2024) and ENACT are the leading bin-to-cell methods, but neither has been fully benchmarked yet.
- Many tools designed for Visium work on Visium HD 8-um bins but may need parameter tuning.
- Memory and compute requirements are 10-100x larger than standard Visium. Dask-backed AnnData or SpatialData is often necessary.
- The Visium HD deep read covers practical considerations in detail.
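At its core, fixed-size aggregation of 2-um bins means integer division of the bin grid indices followed by summing counts. The sketch below assumes a hypothetical in-memory layout: a list of `(row, col, counts)` tuples, with `counts` as a gene-to-UMI dict. This is not Space Ranger's on-disk format. With `factor=4`, each output bin covers 4 x 4 fine bins, i.e. 8 um.

```python
from collections import Counter, defaultdict


def aggregate_bins(bins, factor=4):
    """Sum 2-um bin counts into coarser square bins of (2 * factor) um per side.

    bins: iterable of (row, col, counts); row/col are 2-um grid indices and
          counts maps gene -> UMI count (hypothetical input layout)
    Returns dict (coarse_row, coarse_col) -> dict gene -> summed UMI count.
    """
    merged = defaultdict(Counter)
    for row, col, counts in bins:
        # Integer division maps every fine bin to exactly one coarse bin
        merged[(row // factor, col // factor)].update(counts)
    return {key: dict(c) for key, c in merged.items()}


fine_bins = [
    (0, 0, {"EPCAM": 1}),
    (0, 3, {"EPCAM": 2, "VIM": 1}),
    (3, 3, {"VIM": 1}),
    (4, 0, {"CD3E": 1}),  # row 4 falls in the next 8-um bin
]
print(aggregate_bins(fine_bins))
# {(0, 0): {'EPCAM': 3, 'VIM': 2}, (1, 0): {'CD3E': 1}}
```

Cell-level methods such as Bin2Cell and ENACT roughly replace the integer-division key with a per-bin cell label derived from image segmentation. The summation step is conceptually the same.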
Xenium pipeline
Xenium provides molecule-level coordinates with pre-designed or custom panels. Segmentation is the critical first step.
Xenium Onboard Analysis output (transcripts + DAPI + cell boundaries; viewable in Xenium Explorer)
|
v
Cellpose2 / Baysor ---- cell segmentation (or use 10x default boundaries as starting point)
|
v
SpatialData / Squidpy ---- data loading + QC
|
v
BANKSY / GraphST ---- spatial domain identification
|
v
STELLAR / Tangram ---- cell type annotation (Tangram transfers labels from an scRNA-seq reference; STELLAR from an annotated spatial dataset)
|
v
CellChat v2 / COMMOT ---- cell-cell communication
|
v
Squidpy ---- neighborhood enrichment, co-occurrence, spatial statistics
Key decisions
- Segmentation: The 10x-provided cell boundaries are reasonable for many analyses. Re-segmentation with Cellpose2 (using DAPI + membrane stains) or Baysor (transcript-based, no image needed) can improve results in dense tissues.
- Cell typing: With targeted panels (100-5000 genes), direct annotation via marker-based approaches may outperform reference-based transfer. STELLAR is designed for spatial data and handles novel cell types.
- CCC: COMMOT infers communication with collective optimal transport on cell or spot coordinates, which suits single-cell-resolution imaging data. CellChat v2 now supports spatial coordinates directly.
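A minimal version of the marker-based annotation mentioned above scores each cell by the mean log-normalized expression of a few markers per type, then keeps the best-scoring type. The marker lists, the scale factor of 100 and the `min_score` cutoff are illustrative assumptions. Real panels need curated markers and a check for cells that score highly for several types.

```python
import math


def annotate_by_markers(cell_counts, markers, min_score=0.5, scale=100.0):
    """Label each cell with the type whose markers have the highest mean log1p expression.

    cell_counts: dict cell_id -> dict gene -> count
    markers: dict cell_type -> list of marker genes (hypothetical example lists)
    min_score: cells whose best score does not exceed this stay 'Unassigned'
    scale: counts are normalized to `scale` per cell before log1p (illustrative choice)
    """
    labels = {}
    for cell, counts in cell_counts.items():
        total = sum(counts.values()) or 1  # guard against empty cells
        best_type, best_score = "Unassigned", min_score
        for cell_type, genes in markers.items():
            score = sum(math.log1p(scale * counts.get(g, 0) / total) for g in genes) / len(genes)
            if score > best_score:
                best_type, best_score = cell_type, score
        labels[cell] = best_type
    return labels


markers = {"T cell": ["CD3E", "CD2"], "Epithelial": ["EPCAM", "KRT8"]}
cells = {
    "c1": {"CD3E": 5, "CD2": 3, "ACTB": 10},
    "c2": {"EPCAM": 8, "KRT8": 6, "ACTB": 12},
    "c3": {"ACTB": 20},
}
print(annotate_by_markers(cells, markers))
# {'c1': 'T cell', 'c2': 'Epithelial', 'c3': 'Unassigned'}
```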
MERFISH / MERSCOPE pipeline
MERFISH/MERSCOPE data is structurally similar to Xenium but typically uses the Vizgen pipeline for initial processing.
MERSCOPE output (transcripts + cell boundaries)
|
v
Baysor / Cellpose2 ---- re-segmentation (Vizgen defaults are often suboptimal)
|
v
Squidpy / SpatialData ---- data loading + QC
|
v
GraphST / BANKSY ---- spatial domain identification
|
v
COMMOT ---- cell-cell communication (optimal transport on coordinates)
|
v
nnSVG / SPARK-X ---- spatially variable genes
MERSCOPE commercial status
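MERSCOPE is developed and sold by Vizgen. The vendor landscape for imaging-based platforms has been consolidating (Bruker acquired NanoString in 2024), so confirm long-term instrument and reagent support before committing to a large study. Existing data and workflows remain valid.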
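Baysor and Cellpose2 are far more sophisticated, but a crude baseline is useful for sanity-checking vendor boundaries during the re-segmentation step in the MERSCOPE pipeline. The baseline assigns each transcript to the nearest nucleus centroid within a maximum distance. In the sketch below, the function name, the 10-um default radius and the toy coordinates are assumptions, not recommended settings.

```python
import math


def assign_to_nearest_nucleus(transcripts, centroids, max_dist=10.0):
    """Assign each transcript to the closest nucleus centroid within max_dist (um).

    transcripts: list of (x, y, gene)
    centroids: dict cell_id -> (x, y) nucleus centroid
    Returns dict cell_id -> list of genes; unassigned transcripts are stored under None.
    """
    assigned = {cell: [] for cell in centroids}
    assigned[None] = []
    for x, y, gene in transcripts:
        best_cell, best_d = None, max_dist
        for cell, (cx, cy) in centroids.items():
            d = math.hypot(x - cx, y - cy)
            if d <= best_d:  # closer than the current best and within max_dist
                best_cell, best_d = cell, d
        assigned[best_cell].append(gene)
    return assigned


nuclei = {"n1": (0.0, 0.0), "n2": (10.0, 10.0)}
molecules = [(1.0, 1.0, "GFAP"), (9.0, 9.0, "SNAP25"), (50.0, 50.0, "MBP")]
print(assign_to_nearest_nucleus(molecules, nuclei))
# {'n1': ['GFAP'], 'n2': ['SNAP25'], None: ['MBP']}
```

The double loop scales as transcripts x nuclei. For real datasets, a spatial index such as `scipy.spatial.cKDTree` does the same lookup efficiently via `query(..., distance_upper_bound=max_dist)`.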
CosMx pipeline
CosMx provides single-molecule imaging with both RNA and protein readouts. NanoString (now part of Bruker) provides initial cell segmentation.
CosMx output (transcripts + cell boundaries)
|
v
InSituType ---- probabilistic cell typing (NanoString's method, well-suited to CosMx data)
|
v
SpaGCN / BANKSY ---- spatial domain identification
|
v
CellChat v2 ---- cell-cell communication
|
v
Squidpy ---- spatial statistics and neighborhood analysis
CosMx considerations
- InSituType is optimized for CosMx data and often outperforms general-purpose methods on this platform.
- CosMx supports simultaneous RNA + protein measurement, but most computational tools do not yet handle multi-modal CosMx data natively. Process RNA and protein separately, then integrate.
- NanoString was acquired by Bruker in 2024; platform support continues under the Bruker Spatial Biology brand.
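InSituType's model is more elaborate: it uses a negative binomial likelihood and accounts for background counts. The core idea of likelihood-based cell typing can still be sketched with a Poisson model. Rescale each reference profile to the cell's total counts, compute the log-likelihood of the observed counts, and take the best type. The profiles, pseudocount and function name below are hypothetical.

```python
import math


def poisson_classify(counts, profiles, pseudocount=1e-3):
    """Likelihood-based cell typing of one cell under a simple Poisson model.

    counts: observed counts per gene for one cell
    profiles: dict cell_type -> mean expression per gene (same gene order)
    Returns (best_type, dict cell_type -> log-likelihood); best_type is None for empty cells.
    """
    total = sum(counts)
    if total == 0:
        return None, {}
    loglik = {}
    for cell_type, profile in profiles.items():
        rates = [p + pseudocount for p in profile]  # avoid zero expected rates
        scale = total / sum(rates)  # match the profile to this cell's library size
        ll = 0.0
        for x, r in zip(counts, rates):
            mu = r * scale
            ll += x * math.log(mu) - mu - math.lgamma(x + 1)  # Poisson log-pmf
        loglik[cell_type] = ll
    best = max(loglik, key=loglik.get)
    return best, loglik


profiles = {"Macrophage": [20.0, 1.0, 5.0], "T cell": [1.0, 15.0, 5.0]}
best, ll = poisson_classify([12, 1, 3], profiles)
print(best)  # Macrophage
```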
CODEX / PhenoCycler pipeline
Protein-based spatial data requires a fundamentally different analytical approach -- no gene expression matrices, no deconvolution. Analysis centers on cell phenotyping from marker intensities and neighborhood structure.
PhenoCycler images (multichannel fluorescence)
|
v
Mesmer / DeepCell ---- cell segmentation from nuclear + membrane markers
|
v
Intensity extraction ---- per-cell marker expression matrix
|
v
FlowSOM / Leiden clustering ---- cell phenotyping from marker profiles
|
v
Neighborhood analysis (Squidpy / ATHENA) ---- cellular neighborhood identification
|
v
Spatial statistics ---- co-occurrence, interaction scores, community detection
Key decisions
- Segmentation: Mesmer (DeepCell) is the benchmark leader for protein imaging data. It handles variable marker quality better than classical watershed approaches.
- Phenotyping: Manual gating is still common but does not scale. FlowSOM with expert-guided metaclustering offers a good balance. Avoid over-clustering.
- Neighborhood analysis: This is where the unique value of spatial proteomics lies. Squidpy provides neighborhood enrichment tests; ATHENA offers more sophisticated niche identification.
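Neighborhood enrichment asks whether cells of phenotype A sit next to phenotype B more often than expected if the phenotype labels were shuffled over the same positions. Squidpy's `squidpy.gr.nhood_enrichment` computes permutation-based z-scores on a precomputed spatial graph. The pure-Python sketch below re-implements that logic on a simple radius graph so the mechanics are visible. The radius, permutation count and toy layout are assumptions.

```python
import math
import random
from collections import Counter


def radius_edges(coords, radius):
    """Undirected edges between cells within `radius` of each other (brute force)."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= radius:
                edges.append((i, j))
    return edges


def pair_counts(edges, labels):
    """Count edges per unordered label pair, e.g. ('Immune', 'Tumor')."""
    return Counter(tuple(sorted((labels[i], labels[j]))) for i, j in edges)


def nhood_enrichment(coords, labels, radius, n_perms=500, seed=0):
    """Z-score of observed edge counts vs. a label-permutation null, per label pair."""
    rng = random.Random(seed)
    edges = radius_edges(coords, radius)
    observed = pair_counts(edges, labels)
    kinds = sorted(set(labels))
    pairs = [(a, b) for i, a in enumerate(kinds) for b in kinds[i:]]
    null = {p: [] for p in pairs}
    shuffled = list(labels)
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # positions stay fixed; phenotypes are permuted
        counts = pair_counts(edges, shuffled)
        for p in pairs:
            null[p].append(counts[p])
    z = {}
    for p in pairs:
        mean = sum(null[p]) / n_perms
        sd = math.sqrt(sum((v - mean) ** 2 for v in null[p]) / n_perms)
        z[p] = (observed[p] - mean) / sd if sd > 0 else 0.0
    return z


coords = [(float(x), 0.0) for x in range(10)] + [(float(x), 0.0) for x in range(100, 110)]
labels = ["Tumor"] * 10 + ["Immune"] * 10
z = nhood_enrichment(coords, labels, radius=1.5)
print(z[("Immune", "Tumor")] < 0, z[("Tumor", "Tumor")] > 0)  # True True
```

Positive z-scores indicate that two phenotypes co-localize more than chance. Negative z-scores indicate avoidance.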
Stereo-seq pipeline
Stereo-seq generates the largest spatial datasets (subcellular resolution across centimeter-scale tissue), requiring scalable tools throughout.
Raw data (~500 nm bins)
|
v
SAW (BGI pipeline) ---- alignment + counting
|
v
Bin aggregation (cell-bin or fixed-size bins) ---- typically bin50 or bin100 (about 25 or 50 um per side at ~500 nm pitch) for tissue-level, or cell-bin via segmentation
|
v
STAGATE / BANKSY ---- spatial domain identification (must handle millions of bins)
|
v
Cell2location / RCTD ---- deconvolution (if using larger bins)
|
v
nnSVG ---- spatially variable genes (subsampling likely necessary)
Scale challenges
Stereo-seq datasets can exceed 100 million bins. Most standard tools will fail without subsampling or chunked processing. STAGATE and BANKSY are among the few spatial domain methods that scale to this size. Use SpatialData or Dask-backed workflows.
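Stereo-seq bin names (bin50, bin100, and so on) count spots per side, so the physical bin size depends on the spot pitch. The helper below assumes the ~500 nm pitch quoted above; adjust `pitch_um` if your chip differs. It also makes the scale problem concrete: the function names are illustrative and the 1 cm x 1 cm area is an example, not a specific chip format.

```python
def bin_side_um(bin_n, pitch_um=0.5):
    """Side length in um of a binN square (N x N spots), assuming pitch_um spacing."""
    return bin_n * pitch_um


def n_bins(width_mm, height_mm, bin_n=1, pitch_um=0.5):
    """Approximate number of binN squares tiling a width_mm x height_mm capture area."""
    side = bin_side_um(bin_n, pitch_um)
    return int(width_mm * 1000 // side) * int(height_mm * 1000 // side)


for n in (1, 50, 100, 200):
    print(f"bin{n}: {bin_side_um(n):g} um per side, {n_bins(10, 10, n):,} bins on 1 cm x 1 cm")
# bin1: 0.5 um per side, 400,000,000 bins on 1 cm x 1 cm
# bin50: 25 um per side, 160,000 bins on 1 cm x 1 cm
# bin100: 50 um per side, 40,000 bins on 1 cm x 1 cm
# bin200: 100 um per side, 10,000 bins on 1 cm x 1 cm
```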
Slide-seq V2 pipeline
Slide-seq V2 achieves 10-um resolution on fresh-frozen tissue with bead-based capture. The pipeline is similar to Visium but with higher resolution and noisier per-bead data.
Puck data (bead x gene matrix + coordinates)
|
v
Standard QC (Scanpy) ---- filter low-quality beads, normalize
|
v
RCTD / Cell2location ---- deconvolution (RCTD was originally developed for Slide-seq)
|
v
BayesSpace / STAGATE ---- spatial domain identification
|
v
nnSVG / SPARK-X ---- spatially variable genes
Slide-seq context
RCTD was originally developed and validated on Slide-seq data, making it a natural first choice for deconvolution on this platform. Slide-seq is primarily an academic technology; commercial adoption is limited.
GeoMx DSP pipeline
GeoMx provides region-of-interest (ROI) level data, not single-cell. Analysis resembles bulk RNA-seq with spatial annotation rather than spatial transcriptomics proper.
GeoMx output (ROI x gene/protein matrix)
|
v
GeomxTools (NanoString R package) ---- QC, normalization
|
v
Standard differential expression (DESeq2 / limma) ---- ROI-level comparisons
|
v
Spatial context is metadata ---- annotate ROIs with tissue region, distance to feature, etc.
GeoMx limitations
GeoMx is not single-cell and not spatially resolved at the cellular level. It is best suited to hypothesis-driven ROI comparisons and to large clinical FFPE cohorts where ROI-level profiling is sufficient. Do not apply single-cell spatial methods to GeoMx data.
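Upper-quartile (Q3) normalization is a common choice for GeoMx ROI counts and is available in GeomxTools. The Python sketch below only mirrors the arithmetic of one common variant. It divides each ROI by its 75th-percentile count, then rescales to the geometric mean of all ROIs' Q3 values. It assumes segment and target QC have already been done; the ROI names and counts are invented.

```python
import math
import statistics


def q3_normalize(roi_counts):
    """Upper-quartile normalize dict roi -> list of gene counts (same gene order)."""
    q3 = {roi: statistics.quantiles(c, n=4, method="inclusive")[2] for roi, c in roi_counts.items()}
    if any(v <= 0 for v in q3.values()):
        raise ValueError("an ROI has Q3 = 0; filter low-signal ROIs before normalizing")
    # Rescale to the geometric mean of Q3 values so data stay on a count-like scale
    target = math.exp(sum(math.log(v) for v in q3.values()) / len(q3))
    return {roi: [x * target / q3[roi] for x in c] for roi, c in roi_counts.items()}


rois = {"tumor_1": [10, 20, 30, 40, 50], "stroma_1": [5, 10, 15, 20, 25]}
for roi, values in q3_normalize(rois).items():
    print(roi, [round(v, 2) for v in values])
# both ROIs become [7.07, 14.14, 21.21, 28.28, 35.36]
```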
Part 2: What question are you asking?
Use this table to identify the right category of tool for a biological question, then follow the link to the detailed methods page.
| Question | Tool category | Top picks | Methods page |
|---|---|---|---|
| What cell types are present and where? | Deconvolution / Cell typing | Cell2location, RCTD, Tangram, InSituType | Deconvolution |
| Which genes vary spatially? | SVG detection | nnSVG, SPARK-X, SpatialDE2 | Spatially Variable Genes |
| What are the tissue domains / regions? | Spatial domain identification | BANKSY, GraphST, STAGATE, BayesSpace | Spatial Domains |
| How do cells communicate? | Cell-cell communication | COMMOT, CellChat v2, LIANA+ | Cell-Cell Communication |
| What are the cellular neighborhoods? | Niche analysis | Squidpy, ATHENA, Nicheformer | Spatial Domains |
| Are genes differentially expressed between regions? | Spatial DE | GLISS, DestDE, C-SIDE | Spatial DE |
| Where are transcripts within cells? | Subcellular analysis | Bento, Ficture, subcellular segmentation | Subcellular Analysis |
| How do cell states change across space? | Spatial trajectories | stLearn, SpaceFlow, CellRank | Spatial Trajectories |
| How do I segment cells from images? | Cell segmentation | Cellpose2, Baysor, Mesmer/DeepCell | Cell Segmentation |
| How do I integrate spatial with scRNA-seq? | Multi-modal integration | Tangram, gimVI, SpatialData | Multi-modal Integration |
| Can I use a foundation model? | Foundation models | scGPT, Geneformer, Nicheformer | Foundation Models |
| How do I align serial sections in 3D? | 3D reconstruction | PASTE, STalign, GPSA | 3D Reconstruction |
Part 3: Common pitfalls
Mistakes that waste months
- Skipping deconvolution on Visium data. Each 55-um spot contains multiple cells. Treating spots as cells leads to chimeric expression profiles and misleading downstream results.
- Using SpatialDE v1 for SVG detection. It is slow, has inflated false positive rates, and is outperformed by nnSVG and SPARK-X in recent benchmarks.
- Ignoring segmentation quality on imaging data. Garbage segmentation propagates through every downstream step. Always visually inspect segmentation results on multiple tissue regions before proceeding.
- Applying scRNA-seq CCC methods to spatial data without coordinates. CellChat v1 and CellPhoneDB do not use spatial coordinates. Use COMMOT, CellChat v2, or LIANA+ with spatial mode for coordinate-aware communication inference.
- Running standard Visium tools on Visium HD without aggregation. Visium HD 2-um bins are not cells. Aggregate to 8-um bins or use Bin2Cell before applying standard pipelines.
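- Neglecting ambient RNA correction. Sequencing-based platforms, Visium in particular, show substantial ambient contamination and spot swapping (transcripts captured at spots other than the tissue location they came from). SpotClean or a similar correction should precede downstream analysis.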
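The coordinate-aware CCC pitfall is easy to see in a toy score. The score multiplies ligand expression in a sender by receptor expression in a receiver and counts only pairs within a distance cutoff. COMMOT solves a collective optimal-transport problem, and CellChat v2 applies spatial constraints within its own model. This sketch therefore only illustrates why coordinates matter and is not a substitute for either tool. The expression values and the 50-um cutoff are hypothetical.

```python
import math


def lr_score(cells, ligand, receptor, cutoff=None):
    """Toy ligand-receptor score summed over ordered sender -> receiver pairs.

    cells: list of dicts with "xy" (x, y) in um and "expr" (gene -> expression)
    cutoff: maximum sender-receiver distance in um; None ignores space entirely
    """
    total = 0.0
    for sender in cells:
        for receiver in cells:
            if sender is receiver:
                continue
            if cutoff is not None and math.dist(sender["xy"], receiver["xy"]) > cutoff:
                continue  # too far apart to count as an interaction
            total += sender["expr"].get(ligand, 0.0) * receiver["expr"].get(receptor, 0.0)
    return total


cells = [
    {"xy": (0, 0), "expr": {"CXCL12": 4.0}},
    {"xy": (20, 0), "expr": {"CXCR4": 2.0}},   # nearby receiver
    {"xy": (900, 0), "expr": {"CXCR4": 3.0}},  # distant receiver
]
print(lr_score(cells, "CXCL12", "CXCR4"))             # 20.0 (space ignored)
print(lr_score(cells, "CXCL12", "CXCR4", cutoff=50))  # 8.0 (distant cell excluded)
```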
Part 4: Framework and infrastructure choices
Regardless of technology, spatial data benefits from standardized data structures and workflow frameworks.
| Need | Recommended tool | Notes |
|---|---|---|
| Unified data structure | SpatialData | Handles images, points, shapes, and tables in a single object. Actively developed by the scverse consortium. |
| Spatial statistics toolkit | Squidpy | Neighborhood enrichment, co-occurrence, spatial autocorrelation, and more. Integrates with AnnData/SpatialData. |
| Visualization | Napari + napari-spatialdata | Interactive visualization of spatial data layers. Essential for QC and segmentation validation. |
| Scalable processing | Dask-backed AnnData | Required for Visium HD and Stereo-seq scale data. |
| Pipeline orchestration | Nextflow / Snakemake | For reproducible, end-to-end spatial analysis pipelines. |
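Squidpy exposes Moran's I, one of the spatial autocorrelation statistics listed in the table above, through `squidpy.gr.spatial_autocorr`. It is a quick first screen for spatial structure in a single gene. The pure-Python sketch below uses binary weights for neighbors within a radius and omits significance testing. Squidpy instead reads the weights from the spatial graph you build first. The radius and toy values are assumptions.

```python
import math


def morans_i(coords, values, radius):
    """Moran's I with binary weights: w_ij = 1 if i != j and distance <= radius."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0    # sum of w_ij * dev_i * dev_j
    w_sum = 0.0  # total weight W
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                num += dev[i] * dev[j]
                w_sum += 1.0
    denom = sum(d * d for d in dev)
    if w_sum == 0 or denom == 0:
        raise ValueError("need at least one neighbor pair and non-constant values")
    # I = (n / W) * num / denom, written as a single division to limit rounding
    return n * num / (w_sum * denom)


line = [(float(x), 0.0) for x in range(6)]
print(morans_i(line, [0, 0, 0, 1, 1, 1], radius=1.0))  # 0.6 (smooth gradient)
print(morans_i(line, [0, 1, 0, 1, 0, 1], radius=1.0))  # -1.0 (alternating pattern)
```

Values near +1 indicate spatially clustered expression, and values near -1 indicate alternating patterns. Under random placement, the expected value is -1/(n-1).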