CODEX Niche Analysis
This page documents how spatial niche analysis is performed on protein-based spatial data (CODEX, PhenoCycler, MIBI-TOF, IMC), based on a review of 10 representative papers. Since CODEX measures proteins (~30-60 markers) rather than transcripts, the niche analysis workflow differs fundamentally from RNA-based platforms.
The CODEX Workflow
The standard CODEX niche analysis pipeline has 5 steps:
- Cell Segmentation — identify individual cells from multiplexed images
- Cell Typing — classify cells by protein marker expression
- Niche Identification — define spatial neighborhoods by cell-type composition
- Niche Characterization — functional analysis via marker co-expression and spatial enrichment
- Clinical Correlation — link niches to patient outcomes or treatment response
No cell-cell communication step
Unlike RNA-based workflows, CODEX analysis does not include a ligand-receptor (L-R) communication step. Cell-cell communication (CCC) tools such as NicheCompass, CellChat, and COMMOT rely on gene-symbol L-R databases, which have no equivalent for protein panels. Functional characterization on protein data instead uses marker intensity analysis, cell-type interaction frequencies, and spatial enrichment.
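To make "spatial enrichment" concrete, here is a minimal sketch of one common form of it: count how often each pair of cell types appears as neighbors, then compare against a null built by shuffling the cell-type labels. The function names, the brute-force neighbor search, and the toy coordinates are all assumptions for illustration; none of the reviewed papers is claimed to use exactly this procedure.

```python
import numpy as np

def knn_indices(coords, k):
    """Indices of the k nearest other cells for every cell (brute force, toy-sized data only)."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a cell is not counted as its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def interaction_counts(nn_idx, cell_types, n_types):
    """Count directed (center type, neighbor type) pairs over a fixed neighbor graph."""
    counts = np.zeros((n_types, n_types), dtype=np.int64)
    center = np.repeat(cell_types, nn_idx.shape[1])
    neighbor = cell_types[nn_idx].ravel()
    np.add.at(counts, (center, neighbor), 1)
    return counts

def enrichment_zscores(coords, cell_types, n_types, k=10, n_perm=200, seed=0):
    """Z-score of observed pair counts against a label-permutation null."""
    rng = np.random.default_rng(seed)
    nn_idx = knn_indices(coords, k)
    observed = interaction_counts(nn_idx, cell_types, n_types)
    null = np.stack([
        interaction_counts(nn_idx, rng.permutation(cell_types), n_types)
        for _ in range(n_perm)
    ])
    return (observed - null.mean(axis=0)) / (null.std(axis=0) + 1e-9)

# Hypothetical toy tissue: types 0 and 1 are mixed on the left, type 2 sits alone on the right.
rng = np.random.default_rng(1)
coords = np.vstack([
    rng.uniform([0, 0], [1, 1], size=(150, 2)),
    rng.uniform([3, 0], [4, 1], size=(150, 2)),
])
cell_types = np.concatenate([rng.integers(0, 2, 150), np.full(150, 2)])

z = enrichment_zscores(coords, cell_types, n_types=3)
print(np.round(z, 1))
```

In this toy layout the type 2 with type 2 entry comes out strongly positive (self-clustering) and the entries pairing type 2 with the other types come out negative (avoidance), which is the kind of readout that replaces an L-R analysis on protein data.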
Tools at Each Step
| Step | Common Tools | Notes |
|---|---|---|
| Segmentation | Mesmer/DeepCell, CellProfiler, CODEX toolkit, Steinbock | Deep learning pipelines (Mesmer) now standard; older papers used CellProfiler |
| Cell Typing | FlowSOM, X-shift/VorteX, Leiden (scanpy) | FlowSOM and X-shift are cytometry-native; Leiden increasingly used |
| Niche ID | k-NN frequency vectors + K-means (Schurch-style), CellCharter, CytoCommunity, ColonyMap | k=10 KNN with cell-type frequency vectors is the dominant approach |
| Characterization | Marker co-expression, spatial enrichment, cell-cell interaction scoring, MISTy | No L-R tools; MISTy is the only computational niche characterization tool verified on protein data |
| Clinical Correlation | Kaplan-Meier, Cox regression, linear mixed models | Standard survival analysis |
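For the Clinical Correlation row, the first thing most papers show is a Kaplan-Meier curve per patient group. As a rough sketch of what that estimator computes, the snippet below implements the product-limit formula in plain Python. The follow-up times, the event flags, and the grouping of patients by niche enrichment are made up for illustration; a real analysis would normally use a survival library and add a log-rank test or Cox model.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each time with at least one event.
    """
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t (events and censorings).
        while i < len(order) and order[i][0] == t:
            n_events += order[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve

# Hypothetical cohorts split by whether a given niche is enriched in the tumor.
niche_high = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
niche_low = kaplan_meier([3, 4, 4, 9, 10], [1, 1, 1, 0, 1])

print(niche_high)
print(niche_low)
```

Censored patients leave the at-risk set without lowering the curve, which is why the estimator is preferred over a simple fraction-alive calculation.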
The Dominant Niche Method: k-NN Cellular Neighborhoods
Most of the reviewed papers (6/10) use a variant of the approach introduced by Schurch et al. (2020):
- Build a k-nearest-neighbor graph (typically k=10) from cell coordinates
- For each cell, compute a frequency vector: the proportion of each cell type among its k neighbors
- Cluster these frequency vectors (K-means or Leiden) into cellular neighborhoods (CNs)
This is a composition-based niche definition — niches are defined by the mix of cell types present, not by gene expression or signaling.
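The three steps above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the Schurch et al. code: the neighbor search is brute force, the K-means is a small hand-written Lloyd loop with a deterministic farthest-point start (a library implementation would be used in practice), the cell itself is excluded from its own window, and the coordinates and cell types are synthetic.

```python
import numpy as np

def knn_frequency_vectors(coords, cell_types, n_types, k=10):
    """For each cell, the fraction of each cell type among its k nearest neighbors."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)            # assumption: exclude the cell itself
    nn_idx = np.argsort(d2, axis=1)[:, :k]  # shape (n_cells, k)
    one_hot = np.eye(n_types)[cell_types]   # shape (n_cells, n_types)
    return one_hot[nn_idx].mean(axis=1)     # shape (n_cells, n_types)

def assign(X, centers):
    """Label each row of X with the index of its nearest center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)

def kmeans(X, n_clusters, n_iter=100):
    """Plain Lloyd iterations with a farthest-point start (a simplification)."""
    centers = [X[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers

# Synthetic tissue: types 0 and 1 intermixed on the left, type 2 alone on the right.
rng = np.random.default_rng(0)
coords = np.vstack([
    rng.uniform([0, 0], [1, 1], size=(200, 2)),
    rng.uniform([3, 0], [4, 1], size=(200, 2)),
])
cell_types = np.concatenate([rng.integers(0, 2, 200), np.full(200, 2)])

freqs = knn_frequency_vectors(coords, cell_types, n_types=3, k=10)
cn_labels, cn_centers = kmeans(freqs, n_clusters=2)
print(np.round(cn_centers, 2))
```

Each row of `cn_centers` is the average cell-type composition of one cellular neighborhood, which is what papers typically plot as a CN-by-cell-type heatmap. The number of clusters is a free parameter here, just as it is in the published pipelines.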
Computational tools that automate this:
| Tool | What It Adds Beyond k-NN + K-means | Paper-Tested On |
|---|---|---|
| CellCharter | Multi-scale GNN embeddings + GMM with automatic K selection | CODEX (CRC), IMC (breast), CosMx, MERSCOPE, Visium |
| CytoCommunity | Supervised GNN that uses condition labels to find disease-relevant niches | CODEX (CRC), MIBI-TOF (TNBC) |
| ColonyMap | Detects spatially assembled "colonies" (densely co-localized communities) | CODEX/PhenoCycler (SCLC) |
CellCharter is the recommended starting point
CellCharter is the most versatile option: it works on both RNA and protein platforms, handles multi-sample integration, and automatically selects the number of niches. It is also the tool used by other hackathon team members on Xenium, enabling direct cross-modality comparison.
Literature Review: 10 Representative Papers
Summary Table
| Paper | Platform | Cancer | Markers | Niche Method | Clinical Linkage | Journal |
|---|---|---|---|---|---|---|
| Goltsev et al. 2018 | CODEX | Mouse spleen | 30 | i-niche (Delaunay + K-means) | No (mouse) | Cell |
| Schurch et al. 2020 | CODEX | Colorectal | 56 | k=10 KNN -> 9 CNs | Yes (survival) | Cell |
| Risom et al. 2022 | MIBI-TOF | Breast (DCIS->IBC) | 37 | TME state mapping | Yes (recurrence) | Cell |
| Danenberg et al. 2022 | IMC | Breast (n=693) | 37 | Graph community detection | Yes (survival) | Nature Genetics |
| Hoch et al. 2022 | IMC | Melanoma | ~40 | Chemokine patch + TLS | Yes (immunotherapy) | Science Immunology |
| Wang et al. 2023 | IMC | TNBC (n=243) | 43 | Cell interaction scoring | Yes (immunotherapy) | Nature |
| CellCharter 2024 | CODEX/IMC | Lung, CRC | 30+ | GNN + GMM | Yes (prognosis) | Nature Genetics |
| CytoCommunity 2024 | CODEX + MIBI-TOF | CRC, TNBC | 56 | GNN graph partitioning | Yes (risk) | Nature Methods |
| Matusiak et al. 2024 | CODEX | Colon + breast | 36 | k=10 KNN -> 30 clusters | Yes (survival) | Cancer Discovery |
| Chen et al. 2025 | CODEX/PhenoCycler | SCLC | 35 | ColonyMap | Yes (immunotherapy) | Cancer Cell |
Paper Details
Goltsev et al. 2018: The CODEX Origin Paper
Citation: Goltsev Y, Samusik N, Kennedy-Darling J, et al. Deep profiling of mouse splenic architecture with CODEX multiplexed imaging. Cell 174: 968-981 (2018).
Platform: CODEX, 30 markers on mouse spleen (not cancer — included because it introduced the i-niche concept and computational framework that all subsequent CODEX papers build on).
Workflow:
- Segmentation: CellProfiler + custom pipeline
- Cell typing: X-shift/VorteX -> 27 phenotypic clusters
- Niche ID: Delaunay triangulation -> i-niche frequency vectors -> K-means (K=100)
- Characterization: Mapping i-niches to anatomical compartments; healthy vs. lupus comparison
Key finding: A cell's spatial neighborhood strongly influences its protein expression levels, independent of cell identity. This established that spatial context is a determinant of cell state.
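The Delaunay-based neighbor definition in the niche ID step differs from a fixed-k window: each cell's neighbors are the cells it shares a triangle edge with. A minimal sketch using `scipy.spatial.Delaunay` follows. It assumes SciPy is installed, uses a five-cell toy layout, and leaves the index cell out of its own frequency vector; whether to include it is a design choice, and this is not the original paper's code.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_frequency_vectors(coords, cell_types, n_types):
    """Fraction of each cell type among a cell's Delaunay neighbors."""
    tri = Delaunay(coords)
    # Neighbors of vertex i are indices[indptr[i]:indptr[i + 1]].
    indptr, indices = tri.vertex_neighbor_vertices
    counts = np.zeros((len(coords), n_types), dtype=float)
    for i in range(len(coords)):
        neighbors = indices[indptr[i]:indptr[i + 1]]
        counts[i] = np.bincount(cell_types[neighbors], minlength=n_types)
    totals = np.maximum(counts.sum(axis=1, keepdims=True), 1.0)  # guard isolated points
    return counts / totals

# Toy layout: four corner cells (types 0, 0, 1, 1) around one central cell (type 2).
coords = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
cell_types = np.array([0, 0, 1, 1, 2])

niche_vectors = delaunay_frequency_vectors(coords, cell_types, n_types=3)
print(niche_vectors[4])  # the central cell touches all four corners
```

The resulting vectors can then be clustered in the same way as k-NN frequency vectors. Unlike a fixed k, the number of Delaunay neighbors varies from cell to cell with local geometry.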
Schurch et al. 2020: Cellular Neighborhoods in CRC
Citation: Schurch CM, Bhate SS, Barlow GL, et al. Coordinated cellular neighborhoods orchestrate antitumoral immunity at the colorectal cancer invasive front. Cell 182: 1341-1359 (2020).
Platform: CODEX, 56 markers on colorectal cancer TMAs (140 regions, 35 patients).
Workflow:
- Segmentation: CODEX toolkit
- Cell typing: X-shift/VorteX -> 26 cell types
- Niche ID: k=10 KNN -> cell-type frequency vectors -> K-means -> 9 cellular neighborhoods
- Characterization: Tucker tensor decomposition, CN coupling/fragmentation metrics
- Clinical: Cox regression, Kaplan-Meier survival
Key finding: Nine conserved cellular neighborhoods identified. Two patient archetypes: Crohn's-like reaction (TLS-rich, better prognosis) vs. diffuse inflammatory infiltration (no TLS, worse). PD-1+CD4+ T cells in granulocyte neighborhoods predicted survival in high-risk patients.
The origin of CODEX niche analysis
This paper established the k=10 KNN -> frequency vector -> K-means pipeline that most subsequent CODEX papers follow. It remains one of the most cited niche papers in spatial proteomics.
Risom et al. 2022: DCIS-to-Invasive Breast Cancer Transition
Citation: Risom T, Glass DR, Averbukh I, et al. Transition to invasive breast cancer is associated with progressive changes in the structure and composition of tumor stroma. Cell 185: 299-310 (2022).
Platform: MIBI-TOF, 37 markers on breast cancer surgical resections (79 cases).
Workflow:
- Segmentation: Mesmer/DeepCell
- Cell typing: FlowSOM -> 16 cell classes
- Niche ID: Four TME states via spatial compartment mapping (myoepithelial, stromal, immune)
- Characterization: Myoepithelial disruption quantification, stromal activation markers
Key finding: Myoepithelial disruption was paradoxically more advanced in DCIS that did not progress to invasive cancer, suggesting protective remodeling. Spatial proteomic signature predicted invasive recurrence from DCIS.
Danenberg et al. 2022: Breast Cancer TME Structures (METABRIC)
Citation: Danenberg E, Bardwell H, Zanotelli VRT, et al. Breast tumor microenvironment structures are associated with genomic features and clinical outcome. Nature Genetics 54: 660-669 (2022).
Platform: IMC, 37 markers on 693 breast tumors from the METABRIC cohort.
Workflow:
- Segmentation: ilastik + CellProfiler
- Cell typing: FlowSOM -> epithelial, stromal, leukocyte lineages
- Niche ID: Graph community detection -> 10 recurrent multicellular structures
- Characterization: Enrichment for proliferative, regulatory, and exhausted immune markers
Key finding: Ten recurrent TME structures identified. A "suppressed expansion" structure (FoxP3+ Tregs + PD-1+ exhausted T cells + proliferating cells) predicted poor outcome in ER+ breast cancer, independently of genomic classifiers. Spatial architecture adds prognostic signal beyond PAM50 subtypes.
Hoch et al. 2022: Chemokine Niches in Melanoma
Citation: Hoch T, Schulz D, Eling N, et al. Multiplexed imaging mass cytometry of the chemokine milieus in melanoma characterizes features of the response to immunotherapy. Science Immunology 7: eabk1692 (2022).
Platform: IMC with protein + chemokine RNA co-detection (~40 markers), 69 metastatic melanoma patients.
Workflow:
- Segmentation: Steinbock
- Cell typing: Protein marker clustering -> CD8+ T cells, macrophages, B cells, tumor cells, etc.
- Niche ID: Chemokine "patch" spatial enrichment + TLS niche co-localization
- Characterization: RNA-protein co-detection attributed chemokine sources to specific cell types within niches
Key finding: TLS-like chemokine niches enriched for TCF7+ naive-like T cells predicted checkpoint blockade response. CXCL13+ exhausted T cells co-localized with CXCL9/10 patches in infiltrated tumors.
Wang et al. 2023: Spatial Predictors of Immunotherapy Response in TNBC
Citation: Wang XQ, Danenberg E, Huang CS, et al. Spatial predictors of immunotherapy response in triple-negative breast cancer. Nature 621: 868-876 (2023).
Platform: IMC, 43 markers on TNBC (243 patients from a randomized neoadjuvant ICB trial, with baseline + on-treatment + post-treatment biopsies).
Workflow:
- Segmentation: Semi-supervised epithelial/TME separation
- Cell typing: Expression clustering -> 17 epithelial + 20 TME phenotypes
- Niche ID: Cell density + phenotype-specific interaction scoring
- Characterization: Longitudinal spatial remodeling across 3 biopsy timepoints
Key finding: Proliferating CD8+TCF1+ T cells and MHCII+ cancer cells were the dominant baseline predictors of ICB response. Combining baseline and on-treatment spatial features improved response prediction. This is the most clinically direct protein spatial niche paper — it links spatial organization to treatment response in a randomized trial.
CellCharter 2024: Multi-Platform Niche Identification
Citation: Varrone M, Tavernari D, Santamaria-Martinez A, Walsh LA, Ciriello G. CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nature Genetics 56: 74-84 (2024).
Platform: Paper-tested on CODEX (CRC), IMC (breast cancer), CosMx, MERSCOPE, Visium.
Workflow:
- Niche ID: GNN embeddings of spatial neighborhoods -> GMM clustering with stability-based model selection
- Multi-scale: captures niches at different spatial resolutions simultaneously
- Cross-platform: same method on both RNA and protein data
Key finding (CODEX/IMC): Identified tumor-associated neutrophil niches co-localizing with hypoxia markers, associated with poor prognosis in lung cancer.
CytoCommunity 2024: Supervised Niche Discovery
Citation: Hu Y, Rong J, Xu Y, et al. Unsupervised and supervised discovery of tissue cellular neighborhoods from cell phenotypes. Nature Methods 21: 267-278 (2024).
Platform: Paper-tested on CODEX (CRC, 56 markers) and MIBI-TOF (TNBC).
Workflow:
- Niche ID: GNN-based graph partitioning; supervised mode uses sample-level condition labels to discover disease-relevant tissue cellular neighborhoods (TCNs)
- Characterization: Hypergeometric cell-type enrichment per TCN
Key finding: In high-risk CRC, identified granulocyte-enriched and CAF-enriched TCNs absent in low-risk tumors. TCN composition correlated with risk stratification and survival.
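The hypergeometric enrichment used to characterize each TCN asks a simple question: given how many cells of a type exist in the sample, is that type over-represented inside the niche? A minimal standard-library sketch of the one-sided test follows. The function name, argument names, and the cell counts are hypothetical, and the multiple-testing correction a real analysis needs across cell types and niches is omitted.

```python
from math import comb

def hypergeom_enrichment_p(n_total, n_type, n_niche, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_total:   all cells in the sample
    n_type:    cells of the cell type being tested
    n_niche:   cells assigned to the niche
    n_overlap: cells of that type found inside the niche
    """
    upper = min(n_type, n_niche)
    tail = sum(
        comb(n_type, x) * comb(n_total - n_type, n_niche - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_niche)

# Hypothetical counts: 20 cells, 5 granulocytes, a niche of 5 cells containing 4 of them.
p_value = hypergeom_enrichment_p(n_total=20, n_type=5, n_niche=5, n_overlap=4)
print(f"{p_value:.4f}")  # 76 / 15504, about 0.0049
```

Because it only needs cell-type labels and niche assignments, this test applies unchanged to CODEX, MIBI-TOF, and IMC data.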
Matusiak et al. 2024: Macrophage Niches in Colon and Breast Cancer
Citation: Matusiak M, Hickey JW, van IJzendoorn DGP, et al. Spatially segregated macrophage populations predict distinct outcomes in colon cancer. Cancer Discovery 14: 1418-1439 (2024).
Platform: CODEX, 36 markers on colon (32 cases) and breast (36 cases) cancer TMAs.
Workflow:
- Segmentation: CODEX toolkit
- Cell typing: Leiden clustering (scanpy) + manual annotation
- Niche ID: k=10 KNN -> 30 neighborhood clusters (Schurch-style)
- Characterization: Linear mixed-effects models; IHC and IF validation
Key finding: Five spatially distinct macrophage populations occupied conserved niche architectures: FOLR2+ in plasma cell niches, LYVE1+ in perivascular niches, SPP1+ in hypoxic zones, NLRP3+ with neutrophils. IL4I1+ macrophages predicted good outcomes; SPP1+ predicted poor outcomes in colon cancer.
Chen et al. 2025: Immune Colony Niches in SCLC
Citation: Chen H, Deng C, Gao J, et al. Integrative spatial analysis reveals tumor heterogeneity and immune colony niche related to clinical outcomes in small cell lung cancer. Cancer Cell 43: 519-536 (2025).
Platform: CODEX/PhenoCycler, 35 markers on SCLC (165 patients, 267 images, >9.3M cells).
Workflow:
- Segmentation: Deep learning pipeline
- Cell typing: Unsupervised clustering -> tumor subtypes, macrophages, T cells, NK/NKT, B cells, stroma
- Niche ID: ColonyMap, a novel algorithm detecting densely co-localized cellular communities
- Characterization: Multi-omics integration (genomics + CODEX)
Key finding: An antitumoral immune colony niche (MT2: macrophages + CD8+ T cells + NKT cells) predicted immunotherapy response and superior survival. ColonyMap-derived spatial features outperformed standard cell-type abundance metrics.
Patterns Across Papers
What's consistent
- Composition-based niche ID dominates: 8/10 papers define niches by cell-type composition of neighborhoods
- k=10 is the usual default: papers that build k-NN neighborhoods (Schurch-style) typically use the 10 nearest neighbors
- Clinical linkage is expected: 9/10 papers connect niches to patient outcomes or treatment response
- Segmentation is routine: deep learning (Mesmer) or platform-native tools handle it, and it is not treated as a bottleneck
- No CCC tools used: Zero papers apply ligand-receptor communication analysis to protein data
What varies
- Niche granularity: From 4 TME states (Risom) to 100 i-niches (Goltsev) — no consensus on optimal resolution
- Cell typing method: X-shift (older), FlowSOM (cytometry standard), Leiden (newer, scanpy-based)
- Characterization approach: Ranges from simple enrichment tests to tensor decomposition to longitudinal spatial remodeling
What's missing
- No standardized functional characterization: Unlike RNA platforms (where NicheCompass/Niche-DE provide structured analysis), protein niche characterization is ad hoc
- No cross-modality validation: Only CellCharter has been applied to both RNA and protein data from the same tissue
- Limited panel overlap: Each study uses a different antibody panel, making cross-study comparison difficult
Recommended CODEX Workflow for the Hackathon
Based on the literature review, the recommended workflow for CODEX data:
| Step | Tool | Rationale |
|---|---|---|
| Segmentation | Mesmer/DeepCell or platform-provided | Standard; deep learning preferred |
| Cell Typing | Leiden clustering (scanpy) | Consistent with team's RNA workflow; scanpy ecosystem |
| Niche ID | CellCharter | Paper-tested on CODEX; same tool as RNA team -> enables cross-modality comparison |
| Characterization | Marker intensity analysis + spatial enrichment + MISTy | MISTy is the only computational tool verified on protein spatial data for niche characterization |
| Clinical Correlation | Kaplan-Meier + Cox regression | Standard |
Cross-modality concordance
Using CellCharter on both CODEX and Xenium (from the same tissue in D1) enables the key hackathon question: do RNA niches and protein niches agree? This is directly modeled on CellCharter's own paper, which demonstrated cross-platform niche analysis.
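One simple way to put a number on "do RNA niches and protein niches agree?" is the adjusted Rand index (ARI) between the two sets of niche labels. The sketch below is standard-library only and rests on a strong assumption: that the same spatial units (registered cells, or bins on a shared grid) carry a niche label in both modalities. The variable names `codex_niches` and `xenium_niches` and their values are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same units."""
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions put everything in one group
    return (sum_pairs - expected) / (max_index - expected)

# Hypothetical niche labels for six matched spatial units.
codex_niches = [0, 0, 1, 1, 2, 2]
xenium_niches = ["a", "a", "b", "b", "c", "c"]  # same grouping, different names

ari_same = adjusted_rand_index(codex_niches, xenium_niches)
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(ari_same, ari_crossed)
```

ARI ignores label names, so niches do not need to be matched by hand first. Values near 1 indicate concordant partitions and values near 0 indicate agreement no better than chance.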
See also: Data Compatibility | Tool Selection | Literature Review | CellCharter Deep Read