# Graph Construction & Spatial Analysis

## Overview

PathML provides tools for constructing spatial graphs from tissue images to represent cellular and tissue-level relationships. Graph-based representations support spatial analyses such as neighborhood analysis, cell-cell interaction studies, and graph neural network applications. These graphs capture both morphological features and spatial topology for downstream computational analysis.

## Graph Types

PathML supports construction of multiple graph types:

### Cell Graphs
- Nodes represent individual cells
- Edges represent spatial proximity or biological interactions
- Node features include morphology, marker expression, cell type
- Suitable for single-cell spatial analysis

### Tissue Graphs
- Nodes represent tissue regions or superpixels
- Edges represent spatial adjacency
- Node features include tissue composition, texture features
- Suitable for tissue-level spatial patterns

### Spatial Transcriptomics Graphs
- Nodes represent spatial spots or cells
- Edges encode spatial relationships
- Node features include gene expression profiles
- Suitable for spatial omics analysis

## Graph Construction Workflow

### From Segmentation to Graphs

Convert nucleus or cell segmentation results into spatial graphs:

```python
from pathml.graph import CellGraph
from pathml.preprocessing import Pipeline, SegmentMIF
import numpy as np

# 1. Perform cell segmentation
pipeline = Pipeline([
    SegmentMIF(
        nuclear_channel='DAPI',
        cytoplasm_channel='CD45',
        model='mesmer'
    )
])
slide.run(pipeline)  # PathML runs a pipeline through the slide object

# 2. Extract instance segmentation mask
inst_map = slide.masks['cell_segmentation']

# 3. Build cell graph
cell_graph = CellGraph.from_instance_map(
    inst_map,
    image=slide.image,        # Optional: for extracting visual features
    connectivity='delaunay',  # 'knn', 'radius', or 'delaunay'
    k=5,                      # Only used when connectivity='knn': number of neighbors
    radius=50                 # Only used when connectivity='radius': distance threshold in pixels
)

# 4. Access graph components
nodes = cell_graph.nodes                 # Node features
edges = cell_graph.edges                 # Edge list
adjacency = cell_graph.adjacency_matrix  # Adjacency matrix
```

### Connectivity Methods

**K-Nearest Neighbors (KNN):**
```python
# Connect each cell to its k nearest neighbors
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='knn',
    k=5  # Number of neighbors
)
```
- Every node has exactly k outgoing edges (total degree can still vary if edges are made undirected)
- Captures local neighborhoods
- Simple and interpretable

**Radius-based:**
```python
# Connect cells within a distance threshold
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='radius',
    radius=100,                  # Maximum distance in pixels
    distance_metric='euclidean'  # or 'manhattan', 'chebyshev'
)
```
- Variable degree based on local cell density
- Biologically motivated (interaction range)
- Captures physical proximity

**Delaunay Triangulation:**
```python
# Connect cells using Delaunay triangulation
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='delaunay'
)
```
- Creates a connected graph from spatial positions
- No isolated nodes, provided there are at least three non-collinear cells
- Captures spatial tessellation

**Contact-based:**
```python
# Connect cells with touching boundaries
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='contact',
    dilation=2  # Dilate boundaries to capture near-contacts
)
```
- Physical cell-cell contacts
- Most biologically direct
- Sparse edges for separated cells

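The KNN and radius rules above reduce to simple operations on cell centroids. The following is a minimal NumPy sketch, independent of PathML's own API: `pairwise_distances`, `knn_edges`, and `radius_edges` are hypothetical helper names, and the dense distance matrix is an assumption that only suits a few thousand cells (a spatial index would be needed beyond that).

```python
import numpy as np

def pairwise_distances(points):
    """Dense Euclidean distance matrix for an (n, 2) array of centroids."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_edges(points, k):
    """Directed edges (i, j) from each point i to its k nearest neighbours."""
    dist = pairwise_distances(points)
    np.fill_diagonal(dist, np.inf)          # a cell is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    sources = np.repeat(np.arange(len(points)), k)
    return np.stack([sources, neighbours.ravel()], axis=1)

def radius_edges(points, radius):
    """Undirected edges (i, j), i < j, for pairs at most `radius` apart."""
    dist = pairwise_distances(points)
    i, j = np.nonzero(np.triu(dist <= radius, k=1))
    return np.stack([i, j], axis=1)

# Three cells close together and one far away (hypothetical centroids)
centroids = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
knn = knn_edges(centroids, k=2)
near = radius_edges(centroids, radius=1.5)
```

In this toy input the distant cell has no edges in the radius graph, while the KNN graph still gives it two outgoing edges. That is the practical difference between density-dependent and fixed-degree connectivity.
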
## Node Features

### Morphological Features

Extract shape and size features for each cell:

```python
from pathml.graph import extract_morphology_features

# Features to compute; the same names label the node features below
morphology_feature_names = [
    'area',               # Cell area in pixels
    'perimeter',          # Cell perimeter
    'eccentricity',       # Shape elongation
    'solidity',           # Convexity measure
    'major_axis_length',
    'minor_axis_length',
    'orientation'         # Cell orientation angle
]

# Compute morphological features
morphology_features = extract_morphology_features(
    inst_map,
    features=morphology_feature_names
)

# Add to graph
cell_graph.add_node_features(morphology_features, feature_names=morphology_feature_names)
```

**Available morphological features:**
- **Area** - Number of pixels
- **Perimeter** - Boundary length
- **Eccentricity** - 0 (circle) to 1 (line)
- **Solidity** - Area / convex hull area
- **Circularity** - 4π × area / perimeter²
- **Major/Minor axis** - Lengths of fitted ellipse axes
- **Orientation** - Angle of major axis
- **Extent** - Area / bounding box area

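To make the definitions of area, extent, and circularity concrete, here is a small NumPy sketch that computes them directly from a label mask. It does not use PathML; `basic_shape_features` and `circularity` are hypothetical helpers, and label 0 is assumed to be background.

```python
import numpy as np

def basic_shape_features(inst_map):
    """Return {label: (area, extent)} for every non-zero label in inst_map."""
    features = {}
    for label in np.unique(inst_map):
        if label == 0:                      # 0 is assumed to be background
            continue
        rows, cols = np.nonzero(inst_map == label)
        area = rows.size                    # number of pixels in the cell
        bbox_area = (rows.max() - rows.min() + 1) * (cols.max() - cols.min() + 1)
        features[int(label)] = (int(area), float(area / bbox_area))
    return features

def circularity(area, perimeter):
    """4 * pi * area / perimeter**2; equals 1 for a perfect circle."""
    return 4.0 * np.pi * area / perimeter ** 2

inst_map = np.zeros((6, 6), dtype=int)
inst_map[0:2, 0:3] = 1                      # 2x3 rectangle fills its bounding box
inst_map[3, 3] = inst_map[4, 4] = 2         # two diagonal pixels in a 2x2 box
shape_features = basic_shape_features(inst_map)
unit_circle = circularity(area=np.pi, perimeter=2 * np.pi)
```

The rectangle has extent 1.0 because it fills its bounding box, while the diagonal pair has extent 0.5. Perimeter is left out of the mask-based helper because its value depends on the boundary-tracing convention used.
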
### Intensity Features

Extract marker expression or intensity statistics:

```python
from pathml.graph import extract_intensity_features

# Extract per-cell marker intensity statistics
intensity_features = extract_intensity_features(
    inst_map,
    image=multichannel_image,  # Shape: (H, W, C)
    channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
    statistics=['mean', 'std', 'median', 'max']
)

# Add to graph
cell_graph.add_node_features(
    intensity_features,
    feature_names=['DAPI_mean', 'CD3_mean', ...]
)
```

**Available statistics:**
- **mean** - Average intensity
- **median** - Median intensity
- **std** - Standard deviation
- **max** - Maximum intensity
- **min** - Minimum intensity
- **quantile_25/75** - Quartiles

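As an illustration of what a per-cell `mean` statistic involves, the sketch below averages each channel over each labelled cell with `np.bincount`. It is a standalone NumPy example, not PathML's implementation; `mean_intensity_per_cell` is a hypothetical name and label 0 is assumed to be background.

```python
import numpy as np

def mean_intensity_per_cell(inst_map, image):
    """Mean of each channel over each labelled cell.

    inst_map: (H, W) non-negative integer labels, 0 = background (assumed).
    image:    (H, W, C) intensities.
    Returns an (n_labels + 1, C) array; row i holds the means for label i.
    """
    labels = inst_map.ravel()
    n_bins = int(labels.max()) + 1
    counts = np.bincount(labels, minlength=n_bins).astype(float)
    counts[counts == 0] = np.nan            # labels that never occur give NaN
    means = np.empty((n_bins, image.shape[-1]))
    for c in range(image.shape[-1]):
        sums = np.bincount(labels, weights=image[..., c].ravel(), minlength=n_bins)
        means[:, c] = sums / counts
    return means

inst_map = np.array([[1, 1, 0],
                     [2, 2, 2]])
image = np.stack([np.array([[2.0, 4.0, 9.0], [1.0, 1.0, 1.0]]),
                  np.array([[0.0, 0.0, 5.0], [3.0, 6.0, 9.0]])], axis=-1)
cell_means = mean_intensity_per_cell(inst_map, image)
```

Row 0 of the result describes the background and would normally be dropped before the values are attached to graph nodes.
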
### Texture Features

Compute texture descriptors for each cell region:

```python
import numpy as np
from pathml.graph import extract_texture_features

# Haralick texture features
texture_features = extract_texture_features(
    inst_map,
    image=grayscale_image,
    features='haralick',  # or 'lbp', 'gabor'
    distance=1,
    angles=[0, np.pi/4, np.pi/2, 3*np.pi/4]
)

cell_graph.add_node_features(texture_features)
```

### Cell Type Annotations

Add cell type labels from classification:

```python
import numpy as np

# From ML model predictions
cell_types = hovernet_type_predictions  # Array of integer cell type IDs

cell_graph.add_node_features(
    cell_types,
    feature_names=['cell_type']
)

# One-hot encode cell types (assumes integer IDs in the range 0..4)
cell_type_onehot = np.eye(5)[cell_types]
cell_graph.add_node_features(
    cell_type_onehot,
    feature_names=['type_epithelial', 'type_inflammatory', ...]
)
```

## Edge Features

### Spatial Distance

Compute edge features based on spatial relationships:

```python
from pathml.graph import compute_edge_distances

# Add pairwise distances as edge features
distances = compute_edge_distances(
    cell_graph,
    metric='euclidean'  # or 'manhattan', 'chebyshev'
)

cell_graph.add_edge_features(distances, feature_names=['distance'])
```

### Interaction Features

Model biological interactions between cell types:

```python
from pathml.graph import compute_interaction_features

# Cell type co-occurrence along edges
interaction_features = compute_interaction_features(
    cell_graph,
    cell_types=cell_type_labels,
    interaction_type='categorical'  # or 'numerical'
)

cell_graph.add_edge_features(interaction_features)
```

## Graph-Level Features

Aggregate features over the entire graph:

```python
from pathml.graph import compute_graph_features

# Topological features
graph_features = compute_graph_features(
    cell_graph,
    features=[
        'num_nodes',
        'num_edges',
        'average_degree',
        'clustering_coefficient',
        'average_path_length',
        'diameter'
    ]
)

# Cell composition features
composition = cell_graph.compute_cell_type_composition(
    cell_type_labels,
    normalize=True  # Return proportions instead of counts
)
```

## Spatial Analysis

### Neighborhood Analysis

Analyze cell neighborhoods and microenvironments:

```python
from pathml.graph import analyze_neighborhoods

# Characterize neighborhoods around each cell
neighborhoods = analyze_neighborhoods(
    cell_graph,
    cell_types=cell_type_labels,
    radius=100,  # Neighborhood radius
    metrics=['diversity', 'density', 'composition']
)

# Neighborhood diversity (Shannon entropy)
diversity = neighborhoods['diversity']

# Cell type composition in each neighborhood
composition = neighborhoods['composition']  # (n_cells, n_cell_types)
```

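The composition and Shannon-entropy diversity metrics can be sketched in a few lines of NumPy. This version defines a cell's neighborhood by its graph edges rather than by a radius, which is an assumption made to keep the example short; `neighborhood_composition` and `shannon_diversity` are hypothetical helper names, not PathML functions.

```python
import numpy as np

def neighborhood_composition(edges, cell_types, n_types):
    """Fraction of each cell type among every cell's graph neighbours.

    edges:      (n_edges, 2) undirected edge list of node indices.
    cell_types: (n_cells,) integer type IDs in [0, n_types).
    """
    counts = np.zeros((len(cell_types), n_types))
    for i, j in edges:
        counts[i, cell_types[j]] += 1       # j is a neighbour of i
        counts[j, cell_types[i]] += 1       # and i is a neighbour of j
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def shannon_diversity(composition):
    """Shannon entropy (in nats) of each row; 0 * log(0) is treated as 0."""
    logs = np.log(composition, out=np.zeros_like(composition), where=composition > 0)
    return -(composition * logs).sum(axis=1)

edges = np.array([[0, 1], [0, 2], [1, 2]])
cell_types = np.array([0, 1, 1])
composition = neighborhood_composition(edges, cell_types, n_types=2)
diversity = shannon_diversity(composition)
```

Cell 0 sees only type-1 neighbours, so its diversity is zero. Cells 1 and 2 each see one neighbour of each type, which gives the two-type maximum of ln 2.
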
### Spatial Clustering

Identify spatial clusters of cell types:

```python
from pathml.graph import spatial_clustering
import matplotlib.pyplot as plt

# Detect spatial clusters
clusters = spatial_clustering(
    cell_graph,
    cell_positions,
    method='dbscan',  # or 'kmeans', 'hierarchical'
    eps=50,           # DBSCAN: neighborhood radius
    min_samples=10    # DBSCAN: minimum cluster size
)

# Visualize clusters
plt.scatter(
    cell_positions[:, 0],
    cell_positions[:, 1],
    c=clusters,
    cmap='tab20'
)
plt.title('Spatial Clusters')
plt.show()
```

### Cell-Cell Interaction Analysis

Test for enrichment or depletion of cell type interactions:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from pathml.graph import cell_interaction_analysis

# Test for significant interactions
interaction_results = cell_interaction_analysis(
    cell_graph,
    cell_types=cell_type_labels,
    method='permutation',  # or 'expected'
    n_permutations=1000,
    significance_level=0.05
)

# Interaction scores (positive = attraction, negative = avoidance)
interaction_matrix = interaction_results['scores']

# Visualize with heatmap
sns.heatmap(
    interaction_matrix,
    cmap='RdBu_r',
    center=0,
    xticklabels=cell_type_names,
    yticklabels=cell_type_names
)
plt.title('Cell-Cell Interaction Scores')
plt.show()
```

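A permutation test of this kind compares the observed number of edges between each pair of cell types with the counts obtained after shuffling the type labels over the fixed graph. The sketch below is one plausible formulation in NumPy, not PathML's implementation; `count_type_pairs` and `interaction_zscores` are hypothetical helpers, and the z-score is just one common choice of interaction score.

```python
import numpy as np

def count_type_pairs(edges, cell_types, n_types):
    """Symmetric (n_types, n_types) matrix of edge counts between types."""
    counts = np.zeros((n_types, n_types))
    a, b = cell_types[edges[:, 0]], cell_types[edges[:, 1]]
    np.add.at(counts, (a, b), 1)
    np.add.at(counts, (b, a), 1)            # count each undirected edge both ways
    return counts

def interaction_zscores(edges, cell_types, n_types, n_permutations=1000, seed=0):
    """Z-score of observed type-pair counts against label-shuffled graphs."""
    rng = np.random.default_rng(seed)
    observed = count_type_pairs(edges, cell_types, n_types)
    null = np.stack([
        count_type_pairs(edges, rng.permutation(cell_types), n_types)
        for _ in range(n_permutations)
    ])
    std = null.std(axis=0)
    std[std == 0] = np.nan                  # undefined when the null never varies
    return (observed - null.mean(axis=0)) / std

# Two fully connected groups of four cells joined by a single edge
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i + 4, j + 4) for i, j in clique_a]
edges = np.array(clique_a + clique_b + [(3, 4)])
cell_types = np.array([0, 0, 0, 0, 1, 1, 1, 1])
z = interaction_zscores(edges, cell_types, n_types=2, n_permutations=200)
```

Because the two types occupy separate cliques, the same-type entries come out positive (attraction) and the cross-type entry negative (avoidance). Shuffling labels while keeping the edges fixed preserves the tissue's geometry, so the test isolates how the types are arranged on it.
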
### Spatial Statistics

Compute spatial statistics and patterns:

```python
import numpy as np
from pathml.graph import spatial_statistics

# Ripley's K function for spatial point patterns
ripleys_k = spatial_statistics(
    cell_positions,
    cell_types=cell_type_labels,
    statistic='ripleys_k',
    radii=np.linspace(0, 200, 50)
)

# Nearest neighbor distances
nn_distances = spatial_statistics(
    cell_positions,
    statistic='nearest_neighbor',
    by_cell_type=True
)
```

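For reference, Ripley's K at radius r can be estimated as the area divided by n(n - 1), multiplied by the number of ordered pairs of distinct points lying within distance r of each other. The NumPy sketch below implements this naive estimator without edge correction, which is a simplifying assumption; `ripleys_k` here is a standalone hypothetical function, and the point pattern is synthetic.

```python
import numpy as np

def ripleys_k(points, radii, area):
    """Naive Ripley's K estimate without edge correction.

    K(r) = area / (n * (n - 1)) * #{ordered pairs i != j with d_ij <= r}
    """
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[~np.eye(n, dtype=bool)]          # drop self-distances
    counts = np.array([(pair_dist <= r).sum() for r in radii])
    return area * counts / (n * (n - 1))

# Two points exactly 5 units apart: no pairs within r=1, both ordered pairs within r=5
tiny_k = ripleys_k(np.array([[0.0, 0.0], [3.0, 4.0]]), [1, 5], area=10.0)

# Synthetic, spatially random pattern in a 1000 x 1000 pixel window
rng = np.random.default_rng(0)
points = rng.uniform(0, 1000, size=(500, 2))
radii = np.linspace(0, 200, 50)
k_values = ripleys_k(points, radii, area=1000 * 1000)
```

Under complete spatial randomness K(r) is close to πr², so estimates well above that curve suggest clustering and estimates below it suggest regular spacing. Without edge correction, the estimate is biased downward at larger radii because neighbours outside the window are never counted.
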
## Integration with Graph Neural Networks

### Convert to PyTorch Geometric Format

```python
import torch
from torch_geometric.nn import GCNConv

# Convert to PyTorch Geometric Data object
pyg_data = cell_graph.to_pyg()

# Access components
x = pyg_data.x                    # Node features (n_nodes, n_features)
edge_index = pyg_data.edge_index  # Edge connectivity (2, n_edges)
edge_attr = pyg_data.edge_attr    # Edge features (n_edges, n_edge_features)
y = pyg_data.y                    # Graph-level label
pos = pyg_data.pos                # Node positions (n_nodes, 2)

# Use with PyTorch Geometric
class GNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x  # Node-level outputs: (n_nodes, out_channels)

model = GNN(in_channels=pyg_data.num_features, hidden_channels=64, out_channels=5)
output = model(pyg_data)
```

### Graph Dataset for Multiple Slides

```python
from pathml.graph import CellGraph
from torch_geometric.loader import DataLoader

# Create dataset of graphs from multiple slides
graphs = []
for slide in slides:
    # Build graph for each slide
    cell_graph = CellGraph.from_instance_map(slide.inst_map, ...)
    pyg_graph = cell_graph.to_pyg()
    graphs.append(pyg_graph)

# Create DataLoader
loader = DataLoader(graphs, batch_size=32, shuffle=True)

# Train GNN
# Assumes `model`, `criterion`, and `optimizer` are already defined, and that
# `model` pools node embeddings into one prediction per graph so that
# `output` lines up with the graph-level labels in `batch.y`.
for batch in loader:
    optimizer.zero_grad()  # Clear gradients left over from the previous step
    output = model(batch)
    loss = criterion(output, batch.y)
    loss.backward()
    optimizer.step()
```

## Visualization

### Graph Visualization

```python
import matplotlib.pyplot as plt
import networkx as nx

# Convert to NetworkX
nx_graph = cell_graph.to_networkx()

# Draw graph with cell positions as layout
pos = {i: cell_graph.positions[i] for i in range(len(cell_graph.nodes))}

plt.figure(figsize=(12, 12))
nx.draw_networkx(
    nx_graph,
    pos=pos,
    node_color=cell_type_labels,
    node_size=50,
    cmap='tab10',
    with_labels=False,
    alpha=0.8
)
plt.axis('equal')
plt.title('Cell Graph')
plt.show()
```

### Overlay on Tissue Image

```python
import matplotlib.pyplot as plt
import numpy as np

# Visualize graph overlaid on tissue
fig, ax = plt.subplots(figsize=(15, 15))
ax.imshow(tissue_image)

# Draw edges
for edge in cell_graph.edges:
    node1, node2 = edge
    pos1 = cell_graph.positions[node1]
    pos2 = cell_graph.positions[node2]
    ax.plot([pos1[0], pos2[0]], [pos1[1], pos2[1]], 'b-', alpha=0.3, linewidth=0.5)

# Draw nodes colored by type
for cell_type in np.unique(cell_type_labels):
    mask = cell_type_labels == cell_type
    positions = cell_graph.positions[mask]
    ax.scatter(positions[:, 0], positions[:, 1], label=f'Type {cell_type}', s=20)

ax.legend()
ax.axis('off')
plt.title('Cell Graph on Tissue')
plt.show()
```

## Complete Workflow Example

```python
from pathml.core import CODEXSlide
from pathml.preprocessing import Pipeline, CollapseRunsCODEX, SegmentMIF
from pathml.graph import (
    CellGraph,
    analyze_neighborhoods,
    extract_intensity_features,
    extract_morphology_features,
)
import matplotlib.pyplot as plt
import networkx as nx

# 1. Load and preprocess slide
slide = CODEXSlide('path/to/codex', stain='IF')

pipeline = Pipeline([
    CollapseRunsCODEX(z_slice=2),
    SegmentMIF(
        nuclear_channel='DAPI',
        cytoplasm_channel='CD45',
        model='mesmer'
    )
])
slide.run(pipeline)  # PathML runs a pipeline through the slide object

# 2. Build cell graph
inst_map = slide.masks['cell_segmentation']
cell_graph = CellGraph.from_instance_map(
    inst_map,
    image=slide.image,
    connectivity='knn',
    k=6
)

# 3. Extract features
# Morphological features
morph_features = extract_morphology_features(
    inst_map,
    features=['area', 'perimeter', 'eccentricity', 'solidity']
)
cell_graph.add_node_features(morph_features)

# Intensity features (marker expression)
intensity_features = extract_intensity_features(
    inst_map,
    image=slide.image,
    channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
    statistics=['mean', 'std']
)
cell_graph.add_node_features(intensity_features)

# 4. Spatial analysis
# `cell_type_predictions` is assumed to come from a separate cell type classifier
neighborhoods = analyze_neighborhoods(
    cell_graph,
    cell_types=cell_type_predictions,
    radius=100,
    metrics=['diversity', 'composition']
)

# 5. Export for GNN
pyg_data = cell_graph.to_pyg()

# 6. Visualize
plt.figure(figsize=(15, 15))
plt.imshow(slide.image)

# Overlay graph
nx_graph = cell_graph.to_networkx()
pos = {i: cell_graph.positions[i] for i in range(cell_graph.num_nodes)}
nx.draw_networkx(
    nx_graph,
    pos=pos,
    node_color=cell_type_predictions,
    cmap='tab10',
    node_size=30,
    with_labels=False
)
plt.axis('off')
plt.title('Cell Graph with Spatial Neighborhood')
plt.show()
```

## Performance Considerations

**Large tissue sections:**
- Build graphs tile-by-tile, then merge
- Use sparse adjacency matrices
- Leverage GPU for feature extraction

**Memory efficiency:**
- Store only necessary edge features
- Use int32/float32 instead of int64/float64
- Batch process multiple slides

**Computational efficiency:**
- Parallelize feature extraction across cells
- Use KNN for faster neighbor queries
- Cache computed features

## Best Practices

1. **Choose appropriate connectivity:** KNN for uniform analysis, radius for physical interactions, contact for direct cell-cell communication

2. **Normalize features:** Scale morphological and intensity features for GNN compatibility

3. **Handle edge effects:** Exclude boundary cells or use tissue masks to define valid regions

4. **Validate graph construction:** Visualize graphs on small regions before large-scale processing

5. **Combine multiple feature types:** Morphology + intensity + texture provides rich representations

6. **Consider tissue context:** Tissue type affects appropriate graph parameters (connectivity, radius)

## Common Issues and Solutions

**Issue: Too many/few edges**
- Adjust k (KNN) or radius (radius-based) parameters
- Verify pixel-to-micron conversion for biological relevance

**Issue: Memory errors with large graphs**
- Process tiles separately and merge graphs
- Use sparse matrix representations
- Reduce edge features to essential ones

**Issue: Missing cells at tissue boundaries**
- Apply edge_correction parameter
- Use tissue masks to exclude invalid regions

**Issue: Inconsistent feature scales**
- Normalize features: `(x - mean) / std`
- Use robust scaling for outliers

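The two scaling options for inconsistent feature scales can be sketched as follows. This is a plain NumPy illustration with hypothetical helper names (`standardize`, `robust_scale`) and made-up feature values; the choice of median and interquartile range for the robust variant is an assumption, since the text does not name a specific method.

```python
import numpy as np

def standardize(features):
    """Column-wise z-score: (x - mean) / std, leaving constant columns at 0."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std = np.where(std == 0, 1.0, std)      # avoid division by zero
    return (features - mean) / std

def robust_scale(features):
    """Column-wise (x - median) / IQR, which is less sensitive to outliers."""
    median = np.median(features, axis=0)
    q1, q3 = np.percentile(features, [25, 75], axis=0)
    iqr = np.where(q3 - q1 == 0, 1.0, q3 - q1)
    return (features - median) / iqr

# Columns: area in pixels (with one outlier) and eccentricity
features = np.array([[100.0, 0.2], [120.0, 0.4], [110.0, 0.3], [5000.0, 0.3]])
z = standardize(features)
r = robust_scale(features)
```

With the outlier present, z-scoring squeezes the three typical areas into nearly the same value, while the robust version keeps them spread out. Whichever scaler is used, its statistics should be computed on training data only and then reused for validation and test graphs.
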
## Additional Resources

- **PathML Graph API:** https://pathml.readthedocs.io/en/latest/api_graph_reference.html
- **PyTorch Geometric:** https://pytorch-geometric.readthedocs.io/
- **NetworkX:** https://networkx.org/
- **Spatial Statistics:** Baddeley et al., "Spatial Point Patterns: Methodology and Applications with R"