16 KiB
Graph Construction & Spatial Analysis
Overview
PathML provides tools for constructing spatial graphs from tissue images to represent cellular and tissue-level relationships. Graph-based representations enable sophisticated spatial analysis, including neighborhood analysis, cell-cell interaction studies, and graph neural network applications. These graphs capture both morphological features and spatial topology for downstream computational analysis.
Graph Types
PathML supports construction of multiple graph types:
Cell Graphs
- Nodes represent individual cells
- Edges represent spatial proximity or biological interactions
- Node features include morphology, marker expression, cell type
- Suitable for single-cell spatial analysis
Tissue Graphs
- Nodes represent tissue regions or superpixels
- Edges represent spatial adjacency
- Node features include tissue composition, texture features
- Suitable for tissue-level spatial patterns
Spatial Transcriptomics Graphs
- Nodes represent spatial spots or cells
- Edges encode spatial relationships
- Node features include gene expression profiles
- Suitable for spatial omics analysis
Graph Construction Workflow
From Segmentation to Graphs
Convert nucleus or cell segmentation results into spatial graphs:
from pathml.graph import CellGraph
from pathml.preprocessing import Pipeline, SegmentMIF
import numpy as np
# 1. Perform cell segmentation
pipeline = Pipeline([
SegmentMIF(
nuclear_channel='DAPI',
cytoplasm_channel='CD45',
model='mesmer'
)
])
pipeline.run(slide)
# 2. Extract instance segmentation mask
inst_map = slide.masks['cell_segmentation']
# 3. Build cell graph
cell_graph = CellGraph.from_instance_map(
inst_map,
image=slide.image, # Optional: for extracting visual features
connectivity='delaunay', # 'knn', 'radius', or 'delaunay'
k=5, # For knn: number of neighbors
radius=50 # For radius: distance threshold in pixels
)
# 4. Access graph components
nodes = cell_graph.nodes # Node features
edges = cell_graph.edges # Edge list
adjacency = cell_graph.adjacency_matrix # Adjacency matrix
Connectivity Methods
K-Nearest Neighbors (KNN):
# Connect each cell to its k nearest neighbors
graph = CellGraph.from_instance_map(
inst_map,
connectivity='knn',
k=5 # Number of neighbors
)
- Fixed degree per node
- Captures local neighborhoods
- Simple and interpretable
Radius-based:
# Connect cells within a distance threshold
graph = CellGraph.from_instance_map(
inst_map,
connectivity='radius',
radius=100, # Maximum distance in pixels
distance_metric='euclidean' # or 'manhattan', 'chebyshev'
)
- Variable degree based on density
- Biologically motivated (interaction range)
- Captures physical proximity
Delaunay Triangulation:
# Connect cells using Delaunay triangulation
graph = CellGraph.from_instance_map(
inst_map,
connectivity='delaunay'
)
- Creates connected graph from spatial positions
- No isolated nodes (in convex hull)
- Captures spatial tessellation
Contact-based:
# Connect cells with touching boundaries
graph = CellGraph.from_instance_map(
inst_map,
connectivity='contact',
dilation=2 # Dilate boundaries to capture near-contacts
)
- Physical cell-cell contacts
- Most biologically direct
- Sparse edges for separated cells
Node Features
Morphological Features
Extract shape and size features for each cell:
from pathml.graph import extract_morphology_features
# Compute morphological features
morphology_features = extract_morphology_features(
inst_map,
features=[
'area', # Cell area in pixels
'perimeter', # Cell perimeter
'eccentricity', # Shape elongation
'solidity', # Convexity measure
'major_axis_length',
'minor_axis_length',
'orientation' # Cell orientation angle
]
)
# Add to graph
cell_graph.add_node_features(morphology_features, feature_names=['area', 'perimeter', ...])
Available morphological features:
- Area - Number of pixels
- Perimeter - Boundary length
- Eccentricity - 0 (circle) to 1 (line)
- Solidity - Area / convex hull area
- Circularity - 4π × area / perimeter²
- Major/Minor axis - Lengths of fitted ellipse axes
- Orientation - Angle of major axis
- Extent - Area / bounding box area
Intensity Features
Extract marker expression or intensity statistics:
from pathml.graph import extract_intensity_features
# Extract mean marker intensities per cell
intensity_features = extract_intensity_features(
inst_map,
image=multichannel_image, # Shape: (H, W, C)
channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
statistics=['mean', 'std', 'median', 'max']
)
# Add to graph
cell_graph.add_node_features(
intensity_features,
feature_names=['DAPI_mean', 'CD3_mean', ...]
)
Available statistics:
- mean - Average intensity
- median - Median intensity
- std - Standard deviation
- max - Maximum intensity
- min - Minimum intensity
- quantile_25/75 - Quartiles
Texture Features
Compute texture descriptors for each cell region:
from pathml.graph import extract_texture_features
# Haralick texture features
texture_features = extract_texture_features(
inst_map,
image=grayscale_image,
features='haralick', # or 'lbp', 'gabor'
distance=1,
angles=[0, np.pi/4, np.pi/2, 3*np.pi/4]
)
cell_graph.add_node_features(texture_features)
Cell Type Annotations
Add cell type labels from classification:
# From ML model predictions
cell_types = hovernet_type_predictions # Array of cell type IDs
cell_graph.add_node_features(
cell_types,
feature_names=['cell_type']
)
# One-hot encode cell types
cell_type_onehot = one_hot_encode(cell_types, num_classes=5)
cell_graph.add_node_features(
cell_type_onehot,
feature_names=['type_epithelial', 'type_inflammatory', ...]
)
Edge Features
Spatial Distance
Compute edge features based on spatial relationships:
from pathml.graph import compute_edge_distances
# Add pairwise distances as edge features
distances = compute_edge_distances(
cell_graph,
metric='euclidean' # or 'manhattan', 'chebyshev'
)
cell_graph.add_edge_features(distances, feature_names=['distance'])
Interaction Features
Model biological interactions between cell types:
from pathml.graph import compute_interaction_features
# Cell type co-occurrence along edges
interaction_features = compute_interaction_features(
cell_graph,
cell_types=cell_type_labels,
interaction_type='categorical' # or 'numerical'
)
cell_graph.add_edge_features(interaction_features)
Graph-Level Features
Aggregate features for entire graph:
from pathml.graph import compute_graph_features
# Topological features
graph_features = compute_graph_features(
cell_graph,
features=[
'num_nodes',
'num_edges',
'average_degree',
'clustering_coefficient',
'average_path_length',
'diameter'
]
)
# Cell composition features
composition = cell_graph.compute_cell_type_composition(
cell_type_labels,
normalize=True # Proportions
)
Spatial Analysis
Neighborhood Analysis
Analyze cell neighborhoods and microenvironments:
from pathml.graph import analyze_neighborhoods
# Characterize neighborhoods around each cell
neighborhoods = analyze_neighborhoods(
cell_graph,
cell_types=cell_type_labels,
radius=100, # Neighborhood radius
metrics=['diversity', 'density', 'composition']
)
# Neighborhood diversity (Shannon entropy)
diversity = neighborhoods['diversity']
# Cell type composition in each neighborhood
composition = neighborhoods['composition'] # (n_cells, n_cell_types)
Spatial Clustering
Identify spatial clusters of cell types:
from pathml.graph import spatial_clustering
import matplotlib.pyplot as plt
# Detect spatial clusters
clusters = spatial_clustering(
cell_graph,
cell_positions,
method='dbscan', # or 'kmeans', 'hierarchical'
eps=50, # DBSCAN: neighborhood radius
min_samples=10 # DBSCAN: minimum cluster size
)
# Visualize clusters
plt.scatter(
cell_positions[:, 0],
cell_positions[:, 1],
c=clusters,
cmap='tab20'
)
plt.title('Spatial Clusters')
plt.show()
Cell-Cell Interaction Analysis
Test for enrichment or depletion of cell type interactions:
from pathml.graph import cell_interaction_analysis
# Test for significant interactions
interaction_results = cell_interaction_analysis(
cell_graph,
cell_types=cell_type_labels,
method='permutation', # or 'expected'
n_permutations=1000,
significance_level=0.05
)
# Interaction scores (positive = attraction, negative = avoidance)
interaction_matrix = interaction_results['scores']
# Visualize with heatmap
import seaborn as sns
sns.heatmap(
interaction_matrix,
cmap='RdBu_r',
center=0,
xticklabels=cell_type_names,
yticklabels=cell_type_names
)
plt.title('Cell-Cell Interaction Scores')
plt.show()
Spatial Statistics
Compute spatial statistics and patterns:
from pathml.graph import spatial_statistics
# Ripley's K function for spatial point patterns
ripleys_k = spatial_statistics(
cell_positions,
cell_types=cell_type_labels,
statistic='ripleys_k',
radii=np.linspace(0, 200, 50)
)
# Nearest neighbor distances
nn_distances = spatial_statistics(
cell_positions,
statistic='nearest_neighbor',
by_cell_type=True
)
Integration with Graph Neural Networks
Convert to PyTorch Geometric Format
from pathml.graph import to_pyg
import torch
from torch_geometric.data import Data
# Convert to PyTorch Geometric Data object
pyg_data = cell_graph.to_pyg()
# Access components
x = pyg_data.x # Node features (n_nodes, n_features)
edge_index = pyg_data.edge_index # Edge connectivity (2, n_edges)
edge_attr = pyg_data.edge_attr # Edge features (n_edges, n_edge_features)
y = pyg_data.y # Graph-level label
pos = pyg_data.pos # Node positions (n_nodes, 2)
# Use with PyTorch Geometric
from torch_geometric.nn import GCNConv
class GNN(torch.nn.Module):
def __init__(self, in_channels, hidden_channels, out_channels):
super().__init__()
self.conv1 = GCNConv(in_channels, hidden_channels)
self.conv2 = GCNConv(hidden_channels, out_channels)
def forward(self, data):
x, edge_index = data.x, data.edge_index
x = self.conv1(x, edge_index).relu()
x = self.conv2(x, edge_index)
return x
model = GNN(in_channels=pyg_data.num_features, hidden_channels=64, out_channels=5)
output = model(pyg_data)
Graph Dataset for Multiple Slides
from pathml.graph import GraphDataset
from torch_geometric.loader import DataLoader
# Create dataset of graphs from multiple slides
graphs = []
for slide in slides:
# Build graph for each slide
cell_graph = CellGraph.from_instance_map(slide.inst_map, ...)
pyg_graph = cell_graph.to_pyg()
graphs.append(pyg_graph)
# Create DataLoader
loader = DataLoader(graphs, batch_size=32, shuffle=True)
# Train GNN
for batch in loader:
output = model(batch)
loss = criterion(output, batch.y)
loss.backward()
optimizer.step()
Visualization
Graph Visualization
import matplotlib.pyplot as plt
import networkx as nx
# Convert to NetworkX
nx_graph = cell_graph.to_networkx()
# Draw graph with cell positions as layout
pos = {i: cell_graph.positions[i] for i in range(len(cell_graph.nodes))}
plt.figure(figsize=(12, 12))
nx.draw_networkx(
nx_graph,
pos=pos,
node_color=cell_type_labels,
node_size=50,
cmap='tab10',
with_labels=False,
alpha=0.8
)
plt.axis('equal')
plt.title('Cell Graph')
plt.show()
Overlay on Tissue Image
from pathml.graph import visualize_graph_on_image
# Visualize graph overlaid on tissue
fig, ax = plt.subplots(figsize=(15, 15))
ax.imshow(tissue_image)
# Draw edges
for edge in cell_graph.edges:
node1, node2 = edge
pos1 = cell_graph.positions[node1]
pos2 = cell_graph.positions[node2]
ax.plot([pos1[0], pos2[0]], [pos1[1], pos2[1]], 'b-', alpha=0.3, linewidth=0.5)
# Draw nodes colored by type
for cell_type in np.unique(cell_type_labels):
mask = cell_type_labels == cell_type
positions = cell_graph.positions[mask]
ax.scatter(positions[:, 0], positions[:, 1], label=f'Type {cell_type}', s=20)
ax.legend()
ax.axis('off')
plt.title('Cell Graph on Tissue')
plt.show()
Complete Workflow Example
from pathml.core import SlideData, CODEXSlide
from pathml.preprocessing import Pipeline, CollapseRunsCODEX, SegmentMIF
from pathml.graph import CellGraph, extract_morphology_features, extract_intensity_features
import matplotlib.pyplot as plt
# 1. Load and preprocess slide
slide = CODEXSlide('path/to/codex', stain='IF')
pipeline = Pipeline([
CollapseRunsCODEX(z_slice=2),
SegmentMIF(
nuclear_channel='DAPI',
cytoplasm_channel='CD45',
model='mesmer'
)
])
pipeline.run(slide)
# 2. Build cell graph
inst_map = slide.masks['cell_segmentation']
cell_graph = CellGraph.from_instance_map(
inst_map,
image=slide.image,
connectivity='knn',
k=6
)
# 3. Extract features
# Morphological features
morph_features = extract_morphology_features(
inst_map,
features=['area', 'perimeter', 'eccentricity', 'solidity']
)
cell_graph.add_node_features(morph_features)
# Intensity features (marker expression)
intensity_features = extract_intensity_features(
inst_map,
image=slide.image,
channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
statistics=['mean', 'std']
)
cell_graph.add_node_features(intensity_features)
# 4. Spatial analysis
from pathml.graph import analyze_neighborhoods
neighborhoods = analyze_neighborhoods(
cell_graph,
cell_types=cell_type_predictions,
radius=100,
metrics=['diversity', 'composition']
)
# 5. Export for GNN
pyg_data = cell_graph.to_pyg()
# 6. Visualize
plt.figure(figsize=(15, 15))
plt.imshow(slide.image)
# Overlay graph
nx_graph = cell_graph.to_networkx()
pos = {i: cell_graph.positions[i] for i in range(cell_graph.num_nodes)}
nx.draw_networkx(
nx_graph,
pos=pos,
node_color=cell_type_predictions,
cmap='tab10',
node_size=30,
with_labels=False
)
plt.axis('off')
plt.title('Cell Graph with Spatial Neighborhood')
plt.show()
Performance Considerations
Large tissue sections:
- Build graphs tile-by-tile, then merge
- Use sparse adjacency matrices
- Leverage GPU for feature extraction
Memory efficiency:
- Store only necessary edge features
- Use int32/float32 instead of int64/float64
- Batch process multiple slides
Computational efficiency:
- Parallelize feature extraction across cells
- Use KNN for faster neighbor queries
- Cache computed features
Best Practices
-
Choose appropriate connectivity: KNN for uniform analysis, radius for physical interactions, contact for direct cell-cell communication
-
Normalize features: Scale morphological and intensity features for GNN compatibility
-
Handle edge effects: Exclude boundary cells or use tissue masks to define valid regions
-
Validate graph construction: Visualize graphs on small regions before large-scale processing
-
Combine multiple feature types: Morphology + intensity + texture provides rich representations
-
Consider tissue context: Tissue type affects appropriate graph parameters (connectivity, radius)
Common Issues and Solutions
Issue: Too many/few edges
- Adjust k (KNN) or radius (radius-based) parameters
- Verify pixel-to-micron conversion for biological relevance
Issue: Memory errors with large graphs
- Process tiles separately and merge graphs
- Use sparse matrix representations
- Reduce edge features to essential ones
Issue: Missing cells at tissue boundaries
- Apply edge_correction parameter
- Use tissue masks to exclude invalid regions
Issue: Inconsistent feature scales
- Normalize features:
(x - mean) / std - Use robust scaling for outliers
Additional Resources
- PathML Graph API: https://pathml.readthedocs.io/en/latest/api_graph_reference.html
- PyTorch Geometric: https://pytorch-geometric.readthedocs.io/
- NetworkX: https://networkx.org/
- Spatial Statistics: Baddeley et al., "Spatial Point Patterns: Methodology and Applications with R"