# Graph Construction & Spatial Analysis

## Overview

PathML provides tools for constructing spatial graphs from tissue images to represent cellular and tissue-level relationships. Graph-based representations enable sophisticated spatial analysis, including neighborhood analysis, cell-cell interaction studies, and graph neural network applications. These graphs capture both morphological features and spatial topology for downstream computational analysis.

## Graph Types

PathML supports construction of multiple graph types:

### Cell Graphs
- Nodes represent individual cells
- Edges represent spatial proximity or biological interactions
- Node features include morphology, marker expression, cell type
- Suitable for single-cell spatial analysis

### Tissue Graphs
- Nodes represent tissue regions or superpixels
- Edges represent spatial adjacency
- Node features include tissue composition, texture features
- Suitable for tissue-level spatial patterns

### Spatial Transcriptomics Graphs
- Nodes represent spatial spots or cells
- Edges encode spatial relationships
- Node features include gene expression profiles
- Suitable for spatial omics analysis

## Graph Construction Workflow

### From Segmentation to Graphs

Convert nucleus or cell segmentation results into spatial graphs:

```python
from pathml.graph import CellGraph
from pathml.preprocessing import Pipeline, SegmentMIF
import numpy as np

# 1. Perform cell segmentation
# (`slide` is assumed to be an already-loaded multiplexed slide)
pipeline = Pipeline([
    SegmentMIF(
        nuclear_channel='DAPI',
        cytoplasm_channel='CD45',
        model='mesmer'
    )
])
pipeline.run(slide)

# 2. Extract instance segmentation mask
inst_map = slide.masks['cell_segmentation']

# 3. Build cell graph
cell_graph = CellGraph.from_instance_map(
    inst_map,
    image=slide.image,        # Optional: for extracting visual features
    connectivity='delaunay',  # 'knn', 'radius', 'delaunay', or 'contact'
    k=5,                      # For knn: number of neighbors
    radius=50                 # For radius: distance threshold in pixels
)

# 4. Access graph components
nodes = cell_graph.nodes                 # Node features
edges = cell_graph.edges                 # Edge list
adjacency = cell_graph.adjacency_matrix  # Adjacency matrix
```

### Connectivity Methods

**K-Nearest Neighbors (KNN):**
```python
# Connect each cell to its k nearest neighbors
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='knn',
    k=5  # Number of neighbors
)
```
- Fixed out-degree (k) per node
- Captures local neighborhoods
- Simple and interpretable

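
For intuition, the k-nearest-neighbor rule can be sketched with NumPy alone. This is an illustrative brute-force version, not PathML's implementation: `knn_edges` and the sample centroids are hypothetical, and a k-d tree would be a better fit for whole-slide cell counts.

```python
import numpy as np

def knn_edges(centroids, k):
    """Return directed (source, target) pairs linking each point to its k nearest neighbors."""
    centroids = np.asarray(centroids, dtype=float)
    # Pairwise Euclidean distances, shape (n, n)
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cell is never its own neighbor
    # Indices of the k smallest distances in each row
    neighbors = np.argsort(dist, axis=1)[:, :k]
    sources = np.repeat(np.arange(len(centroids)), k)
    return np.stack([sources, neighbors.ravel()], axis=1)

# Hypothetical centroids: three clustered cells and one distant cell
centroids = np.array([[0, 0], [1, 0], [0, 1], [10, 10]])
edges = knn_edges(centroids, k=2)
```

Every node has out-degree `k`, but the distant cell at (10, 10) receives no incoming edges, which is one reason KNN graphs are often symmetrized before analysis.
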
**Radius-based:**
```python
# Connect cells within a distance threshold
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='radius',
    radius=100,                  # Maximum distance in pixels
    distance_metric='euclidean'  # or 'manhattan', 'chebyshev'
)
```
- Variable degree based on density
- Biologically motivated (interaction range)
- Captures physical proximity

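
Because the radius is given in pixels, a biologically meaningful interaction range has to be converted from microns first. The sketch below shows that conversion and the thresholding step with NumPy; `radius_edges` and the `microns_per_pixel` value are assumptions for illustration, not part of PathML.

```python
import numpy as np

def radius_edges(centroids, radius):
    """Return undirected edges (i < j) between points at most `radius` apart."""
    centroids = np.asarray(centroids, dtype=float)
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Upper triangle only, so each pair is reported once and self-pairs are skipped
    i, j = np.nonzero(np.triu(dist <= radius, k=1))
    return np.stack([i, j], axis=1)

# Hypothetical scan resolution: convert a 30 micron range to pixels
microns_per_pixel = 0.5
radius_px = 30.0 / microns_per_pixel  # 60 pixels

centroids = np.array([[0, 0], [30, 0], [0, 40], [200, 200]])
edges = radius_edges(centroids, radius=radius_px)
```

The three nearby cells are fully connected, while the cell at (200, 200) stays isolated. Node degree therefore follows local density, unlike KNN.
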
**Delaunay Triangulation:**
```python
# Connect cells using Delaunay triangulation
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='delaunay'
)
```
- Creates connected graph from spatial positions
- No isolated nodes (all cells lie within the triangulated convex hull)
- Captures spatial tessellation

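
A Delaunay edge list can be derived from cell centroids with SciPy's `scipy.spatial.Delaunay`. This is a standalone sketch rather than PathML's own code path, and `delaunay_edges` is a hypothetical helper.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_edges(centroids):
    """Return the unique undirected edges of the Delaunay triangulation."""
    tri = Delaunay(np.asarray(centroids, dtype=float))
    edges = set()
    for a, b, c in tri.simplices:
        # Each triangle contributes its three sides
        for u, v in ((a, b), (b, c), (a, c)):
            edges.add((int(min(u, v)), int(max(u, v))))
    return sorted(edges)

# Four corner cells and one interior cell
centroids = np.array([[0, 0], [4, 0], [0, 3], [4, 3], [2, 1]])
edges = delaunay_edges(centroids)
```

Here the interior cell is linked to all four corners and the corners are linked along the hull, so no node is left isolated. On real tissue, hull edges can span long distances, so a maximum edge length filter is a common follow-up step.
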
**Contact-based:**
```python
# Connect cells with touching boundaries
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='contact',
    dilation=2  # Dilate boundaries to capture near-contacts
)
```
- Physical cell-cell contacts
- Most biologically direct
- Sparse edges for separated cells

## Node Features

### Morphological Features

Extract shape and size features for each cell:

```python
from pathml.graph import extract_morphology_features

# Compute morphological features
morphology_features = extract_morphology_features(
    inst_map,
    features=[
        'area',          # Cell area in pixels
        'perimeter',     # Cell perimeter
        'eccentricity',  # Shape elongation
        'solidity',      # Convexity measure
        'major_axis_length',
        'minor_axis_length',
        'orientation'    # Cell orientation angle
    ]
)

# Add to graph
cell_graph.add_node_features(morphology_features, feature_names=['area', 'perimeter', ...])
```

**Available morphological features:**
- **Area** - Number of pixels
- **Perimeter** - Boundary length
- **Eccentricity** - 0 (circle) to 1 (line)
- **Solidity** - Area / convex hull area
- **Circularity** - 4π × area / perimeter²
- **Major/Minor axis** - Lengths of fitted ellipse axes
- **Orientation** - Angle of major axis
- **Extent** - Area / bounding box area

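
The ratio-based descriptors above are easy to sanity-check on ideal shapes. The following worked example evaluates the circularity and extent formulas from the list; the helper names are illustrative only.

```python
import math

def circularity(area, perimeter):
    """4π × area / perimeter²: 1 for a perfect circle, lower for irregular shapes."""
    return 4 * math.pi * area / perimeter ** 2

def extent(area, bbox_height, bbox_width):
    """Fraction of the bounding box covered by the shape."""
    return area / (bbox_height * bbox_width)

r = 10.0
circle_circularity = circularity(math.pi * r ** 2, 2 * math.pi * r)  # ideal circle
square_circularity = circularity(100.0, 40.0)                        # 10 x 10 square
circle_extent = extent(math.pi * r ** 2, 2 * r, 2 * r)               # circle in its bounding box
```

An ideal circle scores 1 and a square scores π/4 (about 0.785). On pixelated masks the perimeter is only an estimate, so measured circularity can deviate from these ideal values.
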
### Intensity Features

Extract marker expression or intensity statistics:

```python
from pathml.graph import extract_intensity_features

# Extract per-cell marker intensity statistics
intensity_features = extract_intensity_features(
    inst_map,
    image=multichannel_image,  # Shape: (H, W, C)
    channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
    statistics=['mean', 'std', 'median', 'max']
)

# Add to graph
cell_graph.add_node_features(
    intensity_features,
    feature_names=['DAPI_mean', 'CD3_mean', ...]
)
```

**Available statistics:**
- **mean** - Average intensity
- **median** - Median intensity
- **std** - Standard deviation
- **max** - Maximum intensity
- **min** - Minimum intensity
- **quantile_25/75** - Quartiles

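
To make the per-cell aggregation concrete, here is a minimal NumPy sketch of the `mean` statistic computed from an instance map and an (H, W, C) image. It assumes label 0 is background and is not PathML's implementation; `mean_intensity_per_cell` is a hypothetical name.

```python
import numpy as np

def mean_intensity_per_cell(inst_map, image):
    """Mean intensity of every channel for each label in inst_map (0 = background)."""
    labels = inst_map.ravel()
    n_labels = int(labels.max()) + 1
    counts = np.bincount(labels, minlength=n_labels)
    means = np.zeros((n_labels, image.shape[-1]))
    for c in range(image.shape[-1]):
        # Sum the channel's pixel values per label, then divide by pixel count
        sums = np.bincount(labels, weights=image[..., c].ravel(), minlength=n_labels)
        means[:, c] = sums / np.maximum(counts, 1)
    return means[1:]  # drop the background row; row i describes label i + 1

# Toy 2 x 3 image with two cells and two channels
inst_map = np.array([[0, 1, 1],
                     [2, 2, 0]])
image = np.stack([
    np.array([[9, 2, 4], [10, 20, 9]]),
    np.array([[0, 1, 1], [3, 5, 0]]),
], axis=-1).astype(float)
features = mean_intensity_per_cell(inst_map, image)
```

The result has one row per cell and one column per channel, which matches the layout expected when node features are stacked into a matrix.
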
### Texture Features

Compute texture descriptors for each cell region:

```python
import numpy as np
from pathml.graph import extract_texture_features

# Haralick texture features
texture_features = extract_texture_features(
    inst_map,
    image=grayscale_image,
    features='haralick',  # or 'lbp', 'gabor'
    distance=1,
    angles=[0, np.pi/4, np.pi/2, 3*np.pi/4]
)

cell_graph.add_node_features(texture_features)
```

### Cell Type Annotations

Add cell type labels from classification:

```python
import numpy as np

# From ML model predictions
cell_types = hovernet_type_predictions  # Array of integer cell type IDs

cell_graph.add_node_features(
    cell_types,
    feature_names=['cell_type']
)

# One-hot encode cell types (IDs must be integers in 0..num_classes-1)
num_classes = 5
cell_type_onehot = np.eye(num_classes)[cell_types]
cell_graph.add_node_features(
    cell_type_onehot,
    feature_names=['type_epithelial', 'type_inflammatory', ...]
)
```

## Edge Features

### Spatial Distance

Compute edge features based on spatial relationships:

```python
from pathml.graph import compute_edge_distances

# Add pairwise distances as edge features
distances = compute_edge_distances(
    cell_graph,
    metric='euclidean'  # or 'manhattan', 'chebyshev'
)

cell_graph.add_edge_features(distances, feature_names=['distance'])
```

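
The three metrics differ only in how the coordinate differences along an edge are combined. A small NumPy sketch, with the hypothetical helper `edge_lengths`, makes the distinction explicit:

```python
import numpy as np

def edge_lengths(positions, edges, metric="euclidean"):
    """Length of each edge (pair of node indices) under the chosen metric."""
    positions = np.asarray(positions, dtype=float)
    edges = np.asarray(edges)
    delta = np.abs(positions[edges[:, 0]] - positions[edges[:, 1]])
    if metric == "euclidean":
        return np.sqrt((delta ** 2).sum(axis=1))  # straight-line distance
    if metric == "manhattan":
        return delta.sum(axis=1)                  # sum of axis-wise offsets
    if metric == "chebyshev":
        return delta.max(axis=1)                  # largest axis-wise offset
    raise ValueError(f"unknown metric: {metric}")

positions = [[0, 0], [3, 4], [6, 8]]
edges = [[0, 1], [0, 2]]
euclidean = edge_lengths(positions, edges, "euclidean")
manhattan = edge_lengths(positions, edges, "manhattan")
chebyshev = edge_lengths(positions, edges, "chebyshev")
```

For these edges the Euclidean lengths are 5 and 10, the Manhattan lengths 7 and 14, and the Chebyshev lengths 4 and 8.
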
### Interaction Features

Model biological interactions between cell types:

```python
from pathml.graph import compute_interaction_features

# Cell type co-occurrence along edges
interaction_features = compute_interaction_features(
    cell_graph,
    cell_types=cell_type_labels,
    interaction_type='categorical'  # or 'numerical'
)

cell_graph.add_edge_features(interaction_features)
```

## Graph-Level Features

Aggregate features over the entire graph:

```python
from pathml.graph import compute_graph_features

# Topological features
graph_features = compute_graph_features(
    cell_graph,
    features=[
        'num_nodes',
        'num_edges',
        'average_degree',
        'clustering_coefficient',
        'average_path_length',
        'diameter'
    ]
)

# Cell composition features
composition = cell_graph.compute_cell_type_composition(
    cell_type_labels,
    normalize=True  # Proportions
)
```

## Spatial Analysis

### Neighborhood Analysis

Analyze cell neighborhoods and microenvironments:

```python
from pathml.graph import analyze_neighborhoods

# Characterize neighborhoods around each cell
neighborhoods = analyze_neighborhoods(
    cell_graph,
    cell_types=cell_type_labels,
    radius=100,  # Neighborhood radius
    metrics=['diversity', 'density', 'composition']
)

# Neighborhood diversity (Shannon entropy)
diversity = neighborhoods['diversity']

# Cell type composition in each neighborhood
composition = neighborhoods['composition']  # (n_cells, n_cell_types)
```

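
The diversity metric is the Shannon entropy of the cell type proportions among a cell's neighbors. The sketch below computes composition and entropy from an adjacency matrix with NumPy. It assumes integer type IDs and uses graph neighbors rather than a radius; `neighborhood_diversity` is a hypothetical helper, not the PathML function.

```python
import numpy as np

def neighborhood_diversity(adjacency, cell_types, n_types):
    """Per-cell neighbor composition and its Shannon entropy (in nats)."""
    adjacency = np.asarray(adjacency)
    one_hot = np.eye(n_types)[np.asarray(cell_types)]  # (n_cells, n_types)
    counts = adjacency @ one_hot                        # neighbor counts per type
    totals = counts.sum(axis=1, keepdims=True)
    composition = np.divide(counts, totals,
                            out=np.zeros_like(counts), where=totals > 0)
    # Treat 0 * log(0) as 0 by only taking logs of positive proportions
    logs = np.log(composition, out=np.zeros_like(composition), where=composition > 0)
    return composition, -(composition * logs).sum(axis=1)

adjacency = np.array([[0, 1, 1, 1],
                      [1, 0, 0, 0],
                      [1, 0, 0, 1],
                      [1, 0, 1, 0]])
cell_types = [0, 0, 1, 1]
composition, diversity = neighborhood_diversity(adjacency, cell_types, n_types=2)
```

Cell 1 has a single neighbor type and therefore zero entropy, while cells 2 and 3 see an even two-type mix and reach the two-class maximum of ln 2.
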
### Spatial Clustering

Identify spatial clusters of cell types:

```python
from pathml.graph import spatial_clustering
import matplotlib.pyplot as plt

# Detect spatial clusters
# (cell_positions: (n_cells, 2) array of centroid coordinates)
clusters = spatial_clustering(
    cell_graph,
    cell_positions,
    method='dbscan',  # or 'kmeans', 'hierarchical'
    eps=50,           # DBSCAN: neighborhood radius
    min_samples=10    # DBSCAN: minimum cluster size
)

# Visualize clusters
plt.scatter(
    cell_positions[:, 0],
    cell_positions[:, 1],
    c=clusters,
    cmap='tab20'
)
plt.title('Spatial Clusters')
plt.show()
```

### Cell-Cell Interaction Analysis

Test for enrichment or depletion of cell type interactions:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from pathml.graph import cell_interaction_analysis

# Test for significant interactions
interaction_results = cell_interaction_analysis(
    cell_graph,
    cell_types=cell_type_labels,
    method='permutation',  # or 'expected'
    n_permutations=1000,
    significance_level=0.05
)

# Interaction scores (positive = attraction, negative = avoidance)
interaction_matrix = interaction_results['scores']

# Visualize with heatmap
# (cell_type_names: list of labels, one per cell type)
sns.heatmap(
    interaction_matrix,
    cmap='RdBu_r',
    center=0,
    xticklabels=cell_type_names,
    yticklabels=cell_type_names
)
plt.title('Cell-Cell Interaction Scores')
plt.show()
```

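
The idea behind a permutation test is to compare the observed number of edges between two cell types with the counts obtained after shuffling the type labels over a fixed graph. The following is a minimal sketch of that idea for a single type pair, assuming an undirected edge list; `interaction_zscore` is hypothetical and the scoring used by PathML may differ.

```python
import numpy as np

def interaction_zscore(edges, cell_types, type_a, type_b, n_permutations=1000, seed=0):
    """z-score and one-sided p-value of the a-b edge count against shuffled labels."""
    edges = np.asarray(edges)
    cell_types = np.asarray(cell_types)
    rng = np.random.default_rng(seed)

    def count(labels):
        # Count edges whose endpoints carry the two types, in either order
        src, dst = labels[edges[:, 0]], labels[edges[:, 1]]
        hits = ((src == type_a) & (dst == type_b)) | ((src == type_b) & (dst == type_a))
        return int(hits.sum())

    observed = count(cell_types)
    # Null distribution: same graph, labels randomly reassigned to nodes
    null = np.array([count(rng.permutation(cell_types)) for _ in range(n_permutations)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)  # enrichment test
    z = (observed - null.mean()) / (null.std() + 1e-12)
    return observed, z, p_value

# Chain of six cells where the two types are spatially segregated
edges = [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]]
cell_types = [0, 0, 0, 1, 1, 1]
observed, z, p_value = interaction_zscore(edges, cell_types, type_a=0, type_b=1)
```

Only one edge crosses between the two types, fewer than shuffled labels produce on average, so the z-score is negative. This matches the convention above, where negative scores indicate avoidance.
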
### Spatial Statistics

Compute spatial statistics and patterns:

```python
import numpy as np
from pathml.graph import spatial_statistics

# Ripley's K function for spatial point patterns
ripleys_k = spatial_statistics(
    cell_positions,
    cell_types=cell_type_labels,
    statistic='ripleys_k',
    radii=np.linspace(0, 200, 50)
)

# Nearest neighbor distances
nn_distances = spatial_statistics(
    cell_positions,
    statistic='nearest_neighbor',
    by_cell_type=True
)
```

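
Ripley's K counts, for each radius r, how many other points fall within distance r of a typical point, scaled by the point density. A naive estimator without edge correction can be sketched as follows; it is an illustration with a hypothetical function name, and production analyses normally add an edge correction.

```python
import numpy as np

def ripleys_k(points, radii, area):
    """Naive Ripley's K estimate (no edge correction) for each radius."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self-pairs
    # K(r) = area / (n * (n - 1)) * number of ordered pairs with distance <= r
    pair_counts = np.array([(dist <= r).sum() for r in radii])
    return area * pair_counts / (n * (n - 1))

# Four points on the corners of a unit square
points = [[0, 0], [1, 0], [0, 1], [1, 1]]
k_values = ripleys_k(points, radii=[0.5, 1.0, 1.5], area=1.0)
```

Under complete spatial randomness K(r) is approximately πr², so estimates above that curve suggest clustering at scale r and estimates below it suggest dispersion.
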
## Integration with Graph Neural Networks

### Convert to PyTorch Geometric Format

```python
import torch
from torch_geometric.data import Data

# Convert to PyTorch Geometric Data object
pyg_data = cell_graph.to_pyg()

# Access components
x = pyg_data.x                    # Node features (n_nodes, n_features)
edge_index = pyg_data.edge_index  # Edge connectivity (2, n_edges)
edge_attr = pyg_data.edge_attr    # Edge features (n_edges, n_edge_features)
y = pyg_data.y                    # Graph-level label
pos = pyg_data.pos                # Node positions (n_nodes, 2)

# Use with PyTorch Geometric
from torch_geometric.nn import GCNConv

class GNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x  # Node-level outputs (n_nodes, out_channels)

model = GNN(in_channels=pyg_data.num_features, hidden_channels=64, out_channels=5)
output = model(pyg_data)
```

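
PyTorch Geometric represents an undirected graph by listing both directions of every edge in `edge_index`. If an edge list is built by hand, it can be converted as in this NumPy sketch; `to_edge_index` is a hypothetical helper, not part of PathML or PyTorch Geometric.

```python
import numpy as np

def to_edge_index(undirected_edges):
    """Convert (n_edges, 2) undirected pairs into a (2, 2 * n_edges) directed layout."""
    edges = np.asarray(undirected_edges, dtype=np.int64)
    # Add the reversed copy of every edge, then drop any duplicates
    both = np.concatenate([edges, edges[:, ::-1]], axis=0)
    both = np.unique(both, axis=0)
    return both.T

edge_index = to_edge_index([[0, 1], [1, 2]])
```

The resulting int64 array can be wrapped with `torch.from_numpy(edge_index)` to obtain the long tensor that graph convolution layers expect.
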
### Graph Dataset for Multiple Slides

```python
from pathml.graph import CellGraph
from torch_geometric.loader import DataLoader

# Create dataset of graphs from multiple slides
graphs = []
for slide in slides:
    # Build graph for each slide
    cell_graph = CellGraph.from_instance_map(slide.inst_map, ...)
    pyg_graph = cell_graph.to_pyg()
    graphs.append(pyg_graph)

# Create DataLoader
loader = DataLoader(graphs, batch_size=32, shuffle=True)

# Train GNN
# `model`, `criterion`, and `optimizer` are assumed to be defined already.
# For graph-level labels in batch.y, the model must pool node embeddings
# into one prediction per graph (e.g., with global_mean_pool).
for batch in loader:
    optimizer.zero_grad()  # Clear gradients from the previous step
    output = model(batch)
    loss = criterion(output, batch.y)
    loss.backward()
    optimizer.step()
```

## Visualization

### Graph Visualization

```python
import matplotlib.pyplot as plt
import networkx as nx

# Convert to NetworkX
nx_graph = cell_graph.to_networkx()

# Draw graph with cell positions as layout
pos = {i: cell_graph.positions[i] for i in range(len(cell_graph.nodes))}

plt.figure(figsize=(12, 12))
nx.draw_networkx(
    nx_graph,
    pos=pos,
    node_color=cell_type_labels,
    node_size=50,
    cmap='tab10',
    with_labels=False,
    alpha=0.8
)
plt.axis('equal')
plt.title('Cell Graph')
plt.show()
```

### Overlay on Tissue Image

```python
import matplotlib.pyplot as plt
import numpy as np

# Visualize graph overlaid on tissue
fig, ax = plt.subplots(figsize=(15, 15))
ax.imshow(tissue_image)

# Draw edges
for edge in cell_graph.edges:
    node1, node2 = edge
    pos1 = cell_graph.positions[node1]
    pos2 = cell_graph.positions[node2]
    ax.plot([pos1[0], pos2[0]], [pos1[1], pos2[1]], 'b-', alpha=0.3, linewidth=0.5)

# Draw nodes colored by type
for cell_type in np.unique(cell_type_labels):
    mask = cell_type_labels == cell_type
    positions = cell_graph.positions[mask]
    ax.scatter(positions[:, 0], positions[:, 1], label=f'Type {cell_type}', s=20)

ax.legend()
ax.axis('off')
plt.title('Cell Graph on Tissue')
plt.show()
```

## Complete Workflow Example

```python
from pathml.core import SlideData, CODEXSlide
from pathml.preprocessing import Pipeline, CollapseRunsCODEX, SegmentMIF
from pathml.graph import CellGraph, extract_morphology_features, extract_intensity_features
import matplotlib.pyplot as plt
import networkx as nx

# 1. Load and preprocess slide
slide = CODEXSlide('path/to/codex', stain='IF')

pipeline = Pipeline([
    CollapseRunsCODEX(z_slice=2),
    SegmentMIF(
        nuclear_channel='DAPI',
        cytoplasm_channel='CD45',
        model='mesmer'
    )
])
pipeline.run(slide)

# 2. Build cell graph
inst_map = slide.masks['cell_segmentation']
cell_graph = CellGraph.from_instance_map(
    inst_map,
    image=slide.image,
    connectivity='knn',
    k=6
)

# 3. Extract features
# Morphological features
morph_features = extract_morphology_features(
    inst_map,
    features=['area', 'perimeter', 'eccentricity', 'solidity']
)
cell_graph.add_node_features(morph_features)

# Intensity features (marker expression)
intensity_features = extract_intensity_features(
    inst_map,
    image=slide.image,
    channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
    statistics=['mean', 'std']
)
cell_graph.add_node_features(intensity_features)

# 4. Spatial analysis
# (cell_type_predictions: per-cell type IDs from an upstream classifier)
from pathml.graph import analyze_neighborhoods

neighborhoods = analyze_neighborhoods(
    cell_graph,
    cell_types=cell_type_predictions,
    radius=100,
    metrics=['diversity', 'composition']
)

# 5. Export for GNN
pyg_data = cell_graph.to_pyg()

# 6. Visualize
plt.figure(figsize=(15, 15))
plt.imshow(slide.image)

# Overlay graph
nx_graph = cell_graph.to_networkx()
pos = {i: cell_graph.positions[i] for i in range(cell_graph.num_nodes)}
nx.draw_networkx(
    nx_graph,
    pos=pos,
    node_color=cell_type_predictions,
    cmap='tab10',
    node_size=30,
    with_labels=False
)
plt.axis('off')
plt.title('Cell Graph with Spatial Neighborhood')
plt.show()
```

## Performance Considerations

**Large tissue sections:**
- Build graphs tile-by-tile, then merge
- Use sparse adjacency matrices
- Leverage GPU for feature extraction

**Memory efficiency:**
- Store only necessary edge features
- Use int32/float32 instead of int64/float64
- Batch process multiple slides

**Computational efficiency:**
- Parallelize feature extraction across cells
- Use KNN for faster neighbor queries
- Cache computed features

## Best Practices

1. **Choose appropriate connectivity:** KNN for uniform analysis, radius for physical interactions, contact for direct cell-cell communication

2. **Normalize features:** Scale morphological and intensity features for GNN compatibility

3. **Handle edge effects:** Exclude boundary cells or use tissue masks to define valid regions

4. **Validate graph construction:** Visualize graphs on small regions before large-scale processing

5. **Combine multiple feature types:** Morphology + intensity + texture provides rich representations

6. **Consider tissue context:** Tissue type affects appropriate graph parameters (connectivity, radius)

## Common Issues and Solutions

**Issue: Too many/few edges**
- Adjust the `k` (KNN) or `radius` (radius-based) parameter
- Verify pixel-to-micron conversion for biological relevance

**Issue: Memory errors with large graphs**
- Process tiles separately and merge graphs
- Use sparse matrix representations
- Reduce edge features to essential ones

**Issue: Missing cells at tissue boundaries**
- Apply the `edge_correction` parameter
- Use tissue masks to exclude invalid regions

**Issue: Inconsistent feature scales**
- Normalize features: `(x - mean) / std`
- Use robust scaling for outliers

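
Both scaling options for inconsistent feature scales can be written in a few lines of NumPy. This sketch assumes a node feature matrix of shape (n_cells, n_features); the function names and the sample values are illustrative.

```python
import numpy as np

def zscore(features, eps=1e-8):
    """Standardize each column: (x - mean) / std."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)  # eps guards against constant columns

def robust_scale(features, eps=1e-8):
    """Center each column on its median and divide by its interquartile range."""
    median = np.median(features, axis=0)
    q25, q75 = np.percentile(features, [25, 75], axis=0)
    return (features - median) / (q75 - q25 + eps)

# Columns: area (with one outlier cell) and eccentricity
features = np.array([[100.0, 0.2],
                     [120.0, 0.4],
                     [110.0, 0.3],
                     [900.0, 0.3]])
standardized = zscore(features)
robust = robust_scale(features)
```

The outlier area of 900 inflates the mean and standard deviation used by `zscore`, whereas the median and interquartile range used by `robust_scale` are largely unaffected by it.
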
## Additional Resources

- **PathML Graph API:** https://pathml.readthedocs.io/en/latest/api_graph_reference.html
- **PyTorch Geometric:** https://pytorch-geometric.readthedocs.io/
- **NetworkX:** https://networkx.org/
- **Spatial Statistics:** Baddeley et al., "Spatial Point Patterns: Methodology and Applications with R"