# Graph Construction & Spatial Analysis
## Overview
PathML provides tools for constructing spatial graphs from tissue images to represent cellular and tissue-level relationships. Graph-based representations enable sophisticated spatial analysis, including neighborhood analysis, cell-cell interaction studies, and graph neural network applications. These graphs capture both morphological features and spatial topology for downstream computational analysis.
## Graph Types
PathML supports construction of multiple graph types:
### Cell Graphs
- Nodes represent individual cells
- Edges represent spatial proximity or biological interactions
- Node features include morphology, marker expression, cell type
- Suitable for single-cell spatial analysis
### Tissue Graphs
- Nodes represent tissue regions or superpixels
- Edges represent spatial adjacency
- Node features include tissue composition, texture features
- Suitable for tissue-level spatial patterns
### Spatial Transcriptomics Graphs
- Nodes represent spatial spots or cells
- Edges encode spatial relationships
- Node features include gene expression profiles
- Suitable for spatial omics analysis
## Graph Construction Workflow
### From Segmentation to Graphs
Convert nucleus or cell segmentation results into spatial graphs:
```python
from pathml.graph import CellGraph
from pathml.preprocessing import Pipeline, SegmentMIF
import numpy as np
# 1. Perform cell segmentation (`slide` is an already-loaded slide object)
pipeline = Pipeline([
    SegmentMIF(
        nuclear_channel='DAPI',
        cytoplasm_channel='CD45',
        model='mesmer'
    )
])
pipeline.run(slide)
# 2. Extract instance segmentation mask
inst_map = slide.masks['cell_segmentation']
# 3. Build cell graph
cell_graph = CellGraph.from_instance_map(
    inst_map,
    image=slide.image,        # Optional: for extracting visual features
    connectivity='delaunay',  # 'knn', 'radius', 'delaunay', or 'contact'
    k=5,                      # For knn: number of neighbors
    radius=50                 # For radius: distance threshold in pixels
)
# 4. Access graph components
nodes = cell_graph.nodes                 # Node features
edges = cell_graph.edges                 # Edge list
adjacency = cell_graph.adjacency_matrix  # Adjacency matrix
```
### Connectivity Methods
**K-Nearest Neighbors (KNN):**
```python
# Connect each cell to its k nearest neighbors
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='knn',
    k=5  # Number of neighbors
)
```
- Fixed number of outgoing edges (k) per node; the relation is not symmetric, so undirected degree can still vary
- Captures local neighborhoods
- Simple and interpretable
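For intuition, k-nearest-neighbor connectivity can be sketched in plain Python. This is a dependency-free illustration over hypothetical centroid coordinates, not PathML's implementation; `knn_edges` is a name introduced here.

```python
import math

def knn_edges(centroids, k):
    """Return directed edges (i, j) linking each point to its k nearest neighbors."""
    edges = []
    for i, p in enumerate(centroids):
        # Sort all other points by Euclidean distance to point i
        others = sorted(
            (j for j in range(len(centroids)) if j != i),
            key=lambda j: math.dist(p, centroids[j]),
        )
        edges.extend((i, j) for j in others[:k])
    return edges

# Hypothetical cell centroids in pixel coordinates (x, y)
centroids = [(0, 0), (1, 0), (0, 2), (10, 10)]
edges = knn_edges(centroids, k=2)
```

The outlier at `(10, 10)` links to two neighbors, but no other cell links back to it. If an undirected graph is needed, add the reverse of every edge. The all-pairs sort is quadratic in the number of cells, so a spatial index is preferable for whole-slide data.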
**Radius-based:**
```python
# Connect cells within a distance threshold
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='radius',
    radius=100,                  # Maximum distance in pixels
    distance_metric='euclidean'  # or 'manhattan', 'chebyshev'
)
```
- Variable degree based on density
- Biologically motivated (interaction range)
- Captures physical proximity
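The effect of the distance metric on radius-based connectivity can be seen in a small sketch. The helper names `pairwise_distance` and `radius_edges` and the centroid values are hypothetical, and the code is an illustration rather than the library's implementation.

```python
import math

def pairwise_distance(p, q, metric="euclidean"):
    """Distance between two 2D points under the chosen metric."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    if metric == "euclidean":
        return math.hypot(dx, dy)
    if metric == "manhattan":
        return dx + dy
    if metric == "chebyshev":
        return max(dx, dy)
    raise ValueError(f"unknown metric: {metric}")

def radius_edges(centroids, radius, metric="euclidean"):
    """Return undirected edges (i, j), i < j, for pairs within `radius`."""
    n = len(centroids)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if pairwise_distance(centroids[i], centroids[j], metric) <= radius
    ]

# Hypothetical centroids; the last cell is far from all others
centroids = [(0, 0), (3, 4), (6, 8), (50, 50)]
edges_euclid = radius_edges(centroids, radius=5)
edges_manhattan = radius_edges(centroids, radius=5, metric="manhattan")
```

With the same threshold, the Manhattan metric yields no edges here because the 3-4-5 offsets sum to 7. The distant cell stays isolated under both metrics, which is the variable-degree behavior noted above.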
**Delaunay Triangulation:**
```python
# Connect cells using Delaunay triangulation
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='delaunay'
)
```
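- Creates a connected graph from spatial positions
- No isolated nodes: every cell is a vertex of the triangulation
- Captures spatial tessellation, but can produce long edges across empty regions near the convex hull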
**Contact-based:**
```python
# Connect cells with touching boundaries
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='contact',
    dilation=2  # Dilate boundaries to capture near-contacts
)
```
- Physical cell-cell contacts
- Most biologically direct
- Sparse edges for separated cells
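Contact-based connectivity reduces to finding pairs of different labels on adjacent pixels. The following is a minimal sketch on a toy label grid, assuming 0 marks background and using 4-connectivity; `contact_edges` is a hypothetical helper, not a PathML function.

```python
def contact_edges(inst_map):
    """Return sorted label pairs whose pixels are 4-connected neighbors."""
    edges = set()
    rows, cols = len(inst_map), len(inst_map[0])
    for r in range(rows):
        for c in range(cols):
            a = inst_map[r][c]
            if a == 0:
                continue  # Background never forms an edge
            # Checking right and down covers every adjacent pixel pair once
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols:
                    b = inst_map[rr][cc]
                    if b != 0 and b != a:
                        edges.add((min(a, b), max(a, b)))
    return sorted(edges)

# Toy instance map: cells 1 and 2 touch, cell 3 is one pixel away from cell 2
inst_map = [
    [1, 1, 2, 0, 3],
    [1, 1, 2, 0, 3],
    [0, 0, 0, 0, 3],
]
edges = contact_edges(inst_map)
```

Cells 2 and 3 are separated by a single background column, so they are not connected. Dilating the masks before the adjacency check is what would close such near-contacts.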
## Node Features
### Morphological Features
Extract shape and size features for each cell:
```python
from pathml.graph import extract_morphology_features
# Compute morphological features
morphology_features = extract_morphology_features(
    inst_map,
    features=[
        'area',          # Cell area in pixels
        'perimeter',     # Cell perimeter
        'eccentricity',  # Shape elongation
        'solidity',      # Convexity measure
        'major_axis_length',
        'minor_axis_length',
        'orientation'    # Cell orientation angle
    ]
)
# Add to graph
cell_graph.add_node_features(morphology_features, feature_names=['area', 'perimeter', ...])
```
**Available morphological features:**
- **Area** - Number of pixels
- **Perimeter** - Boundary length
- **Eccentricity** - 0 (circle) to 1 (line)
- **Solidity** - Area / convex hull area
- **Circularity** - 4π × area / perimeter²
- **Major/Minor axis** - Lengths of fitted ellipse axes
- **Orientation** - Angle of major axis
- **Extent** - Area / bounding box area
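The ratio-based features in this list follow directly from their definitions. As a small worked sketch (function names are hypothetical, and the inputs are idealized shapes rather than pixel masks):

```python
import math

def circularity(area, perimeter):
    """4 * pi * area / perimeter**2; equals 1.0 for a perfect circle."""
    return 4 * math.pi * area / perimeter ** 2

def extent(area, bbox_height, bbox_width):
    """Fraction of the bounding box covered by the object."""
    return area / (bbox_height * bbox_width)

# Ideal circle of radius 10
r = 10.0
circle_circularity = circularity(math.pi * r**2, 2 * math.pi * r)
# 10 x 10 square: area 100, perimeter 40
square_circularity = circularity(area=100.0, perimeter=40.0)
square_extent = extent(area=100.0, bbox_height=10, bbox_width=10)
```

The circle scores 1 and the square scores π/4 (about 0.785). On real segmentation masks, perimeter estimates depend on the pixel grid, so measured circularity for round cells will deviate somewhat from 1.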
### Intensity Features
Extract marker expression or intensity statistics:
```python
from pathml.graph import extract_intensity_features
# Extract mean marker intensities per cell
intensity_features = extract_intensity_features(
    inst_map,
    image=multichannel_image,  # Shape: (H, W, C)
    channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
    statistics=['mean', 'std', 'median', 'max']
)
# Add to graph
cell_graph.add_node_features(
    intensity_features,
    feature_names=['DAPI_mean', 'CD3_mean', ...]
)
```
**Available statistics:**
- **mean** - Average intensity
- **median** - Median intensity
- **std** - Standard deviation
- **max** - Maximum intensity
- **min** - Minimum intensity
- **quantile_25/75** - Quartiles
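Conceptually, each statistic is computed over the pixels that share a cell label. A minimal single-channel sketch using only the standard library is shown below; `per_cell_intensity` and the toy arrays are hypothetical, and a real pipeline would vectorize this with array operations.

```python
import statistics
from collections import defaultdict

def per_cell_intensity(inst_map, channel):
    """Group pixel intensities by cell label (0 = background) and summarize them."""
    pixels = defaultdict(list)
    for label_row, value_row in zip(inst_map, channel):
        for label, value in zip(label_row, value_row):
            if label != 0:
                pixels[label].append(value)
    return {
        label: {
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
            "std": statistics.pstdev(values),  # Population standard deviation
            "min": min(values),
            "max": max(values),
        }
        for label, values in pixels.items()
    }

# Toy 2 x 3 label map and matching single-channel image
inst_map = [[1, 1, 0], [2, 2, 2]]
channel = [[10, 20, 99], [5, 5, 8]]
stats = per_cell_intensity(inst_map, channel)
```

The background pixel with value 99 is ignored, so bright debris outside the segmented cells does not affect per-cell features.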
### Texture Features
Compute texture descriptors for each cell region:
```python
import numpy as np
from pathml.graph import extract_texture_features
# Haralick texture features
texture_features = extract_texture_features(
    inst_map,
    image=grayscale_image,
    features='haralick',  # or 'lbp', 'gabor'
    distance=1,
    angles=[0, np.pi/4, np.pi/2, 3*np.pi/4]
)
cell_graph.add_node_features(texture_features)
```
### Cell Type Annotations
Add cell type labels from classification:
```python
# From ML model predictions
cell_types = hovernet_type_predictions  # Array of integer cell type IDs
cell_graph.add_node_features(
    cell_types,
    feature_names=['cell_type']
)
# One-hot encode cell types
# (one_hot_encode is assumed to be a user-supplied helper)
cell_type_onehot = one_hot_encode(cell_types, num_classes=5)
cell_graph.add_node_features(
    cell_type_onehot,
    feature_names=['type_epithelial', 'type_inflammatory', ...]
)
```
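The `one_hot_encode` call above is not defined in this document. One possible dependency-free definition, offered as an assumption about its intended behavior, is:

```python
def one_hot_encode(labels, num_classes):
    """Map integer class IDs in [0, num_classes) to one-hot rows."""
    encoded = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside [0, {num_classes})")
        row = [0] * num_classes
        row[label] = 1
        encoded.append(row)
    return encoded

# Hypothetical predictions for four cells and three classes
cell_types = [0, 2, 1, 2]
onehot = one_hot_encode(cell_types, num_classes=3)
```

With NumPy and an integer array of labels, `np.eye(num_classes)[cell_types]` produces the same encoding as a float array.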
## Edge Features
### Spatial Distance
Compute edge features based on spatial relationships:
```python
from pathml.graph import compute_edge_distances
# Add pairwise distances as edge features
distances = compute_edge_distances(
    cell_graph,
    metric='euclidean'  # or 'manhattan', 'chebyshev'
)
cell_graph.add_edge_features(distances, feature_names=['distance'])
```
### Interaction Features
Model biological interactions between cell types:
```python
from pathml.graph import compute_interaction_features
# Cell type co-occurrence along edges
interaction_features = compute_interaction_features(
    cell_graph,
    cell_types=cell_type_labels,
    interaction_type='categorical'  # or 'numerical'
)
cell_graph.add_edge_features(interaction_features)
```
## Graph-Level Features
Aggregate features for entire graph:
```python
from pathml.graph import compute_graph_features
# Topological features
graph_features = compute_graph_features(
    cell_graph,
    features=[
        'num_nodes',
        'num_edges',
        'average_degree',
        'clustering_coefficient',
        'average_path_length',
        'diameter'
    ]
)
# Cell composition features
composition = cell_graph.compute_cell_type_composition(
    cell_type_labels,
    normalize=True  # Proportions
)
```
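The simplest of these graph-level quantities can be derived from an edge list and a label list alone. The sketch below uses hypothetical helpers (`graph_summary`, `composition`) and toy data, and it assumes an undirected graph with each edge listed once.

```python
from collections import Counter

def graph_summary(num_nodes, edges):
    """Basic topology statistics for an undirected edge list."""
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return {
        "num_nodes": num_nodes,
        "num_edges": len(edges),
        # Each undirected edge contributes to the degree of two nodes
        "average_degree": 2 * len(edges) / num_nodes if num_nodes else 0.0,
        "num_isolated": sum(1 for n in range(num_nodes) if degree[n] == 0),
    }

def composition(cell_types, normalize=True):
    """Count (or proportion) of each cell type."""
    counts = Counter(cell_types)
    total = len(cell_types)
    return {t: (c / total if normalize else c) for t, c in sorted(counts.items())}

# Toy graph: a triangle plus one isolated cell
edges = [(0, 1), (1, 2), (0, 2)]
summary = graph_summary(num_nodes=4, edges=edges)
proportions = composition(["tumor", "tumor", "immune", "stroma"])
```

Path-based features such as diameter and average path length are undefined on disconnected graphs, so checking `num_isolated` (or the number of connected components) first avoids surprises.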
## Spatial Analysis
### Neighborhood Analysis
Analyze cell neighborhoods and microenvironments:
```python
from pathml.graph import analyze_neighborhoods
# Characterize neighborhoods around each cell
neighborhoods = analyze_neighborhoods(
    cell_graph,
    cell_types=cell_type_labels,
    radius=100,  # Neighborhood radius
    metrics=['diversity', 'density', 'composition']
)
# Neighborhood diversity (Shannon entropy)
diversity = neighborhoods['diversity']
# Cell type composition in each neighborhood
composition = neighborhoods['composition']  # (n_cells, n_cell_types)
```
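Neighborhood diversity as Shannon entropy can be illustrated on a toy graph. This sketch defines a neighborhood as the direct graph neighbors of each cell rather than a fixed radius, which is a simplifying assumption; `neighborhood_diversity` is a hypothetical helper.

```python
import math
from collections import Counter

def neighborhood_diversity(edges, cell_types):
    """Shannon entropy (in nats) of the neighbor cell types of each node."""
    neighbors = {i: [] for i in range(len(cell_types))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    diversity = []
    for i in range(len(cell_types)):
        counts = Counter(cell_types[j] for j in neighbors[i])
        total = sum(counts.values())
        # H = sum over types of p * log(1 / p); an empty neighborhood gives 0
        entropy = sum((c / total) * math.log(total / c) for c in counts.values())
        diversity.append(entropy)
    return diversity

# Toy graph with three cell types
cell_types = ["T", "B", "T", "tumor"]
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
diversity = neighborhood_diversity(edges, cell_types)
```

Cell 0 sees three different types in equal proportion and reaches the maximum entropy of log 3, while cell 1 sees only T cells and scores 0.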
### Spatial Clustering
Identify spatial clusters of cell types:
```python
from pathml.graph import spatial_clustering
import matplotlib.pyplot as plt
# Detect spatial clusters
clusters = spatial_clustering(
    cell_graph,
    cell_positions,
    method='dbscan',  # or 'kmeans', 'hierarchical'
    eps=50,           # DBSCAN: neighborhood radius
    min_samples=10    # DBSCAN: minimum cluster size
)
# Visualize clusters
plt.scatter(
    cell_positions[:, 0],
    cell_positions[:, 1],
    c=clusters,
    cmap='tab20'
)
plt.title('Spatial Clusters')
plt.show()
```
### Cell-Cell Interaction Analysis
Test for enrichment or depletion of cell type interactions:
```python
from pathml.graph import cell_interaction_analysis
import matplotlib.pyplot as plt
import seaborn as sns
# Test for significant interactions
interaction_results = cell_interaction_analysis(
    cell_graph,
    cell_types=cell_type_labels,
    method='permutation',  # or 'expected'
    n_permutations=1000,
    significance_level=0.05
)
# Interaction scores (positive = attraction, negative = avoidance)
interaction_matrix = interaction_results['scores']
# Visualize with heatmap
sns.heatmap(
    interaction_matrix,
    cmap='RdBu_r',
    center=0,
    xticklabels=cell_type_names,
    yticklabels=cell_type_names
)
plt.title('Cell-Cell Interaction Scores')
plt.show()
```
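The idea behind a permutation test for interactions is to keep the graph fixed, shuffle the cell type labels, and compare the observed number of edges between two types with the shuffled distribution. The sketch below is one common formulation, written with hypothetical helper names; it is not necessarily how the function above computes its scores.

```python
import random
import statistics

def count_pairs(edges, labels, a, b):
    """Number of edges joining a cell of type a to a cell of type b."""
    return sum(1 for i, j in edges if {labels[i], labels[j]} == {a, b})

def interaction_zscore(edges, labels, a, b, n_permutations=1000, seed=0):
    """Z-score of the observed a-b edge count against label-shuffled graphs."""
    rng = random.Random(seed)
    observed = count_pairs(edges, labels, a, b)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # Keep the graph fixed, permute cell types
        null_counts.append(count_pairs(edges, shuffled, a, b))
    mean = statistics.fmean(null_counts)
    std = statistics.pstdev(null_counts)
    return (observed - mean) / std if std > 0 else 0.0

# Hypothetical chain of 10 cells: 0-1-2-...-9
edges = [(i, i + 1) for i in range(9)]
mixed = ["T", "tumor"] * 5              # Types alternate along the chain
segregated = ["T"] * 5 + ["tumor"] * 5  # Types form two blocks
z_mixed = interaction_zscore(edges, mixed, "T", "tumor")
z_segregated = interaction_zscore(edges, segregated, "T", "tumor")
```

The alternating arrangement gives a positive score (more T-tumor contacts than expected by chance) and the block arrangement gives a negative one. Shuffling labels preserves both the graph structure and the cell type proportions, which is why it is a reasonable null model for this question.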
### Spatial Statistics
Compute spatial statistics and patterns:
```python
import numpy as np
from pathml.graph import spatial_statistics
# Ripley's K function for spatial point patterns
ripleys_k = spatial_statistics(
    cell_positions,
    cell_types=cell_type_labels,
    statistic='ripleys_k',
    radii=np.linspace(0, 200, 50)
)
# Nearest neighbor distances
nn_distances = spatial_statistics(
    cell_positions,
    statistic='nearest_neighbor',
    by_cell_type=True
)
```
## Integration with Graph Neural Networks
### Convert to PyTorch Geometric Format
```python
import torch
from torch_geometric.nn import GCNConv
# Convert to PyTorch Geometric Data object
pyg_data = cell_graph.to_pyg()
# Access components
x = pyg_data.x                    # Node features (n_nodes, n_features)
edge_index = pyg_data.edge_index  # Edge connectivity (2, n_edges)
edge_attr = pyg_data.edge_attr    # Edge features (n_edges, n_edge_features)
y = pyg_data.y                    # Graph-level label
pos = pyg_data.pos                # Node positions (n_nodes, 2)
# Use with PyTorch Geometric
class GNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)
    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x
model = GNN(in_channels=pyg_data.num_features, hidden_channels=64, out_channels=5)
output = model(pyg_data)  # Per-node outputs, shape (n_nodes, 5)
```
### Graph Dataset for Multiple Slides
```python
from pathml.graph import CellGraph
from torch_geometric.loader import DataLoader
# Create dataset of graphs from multiple slides
graphs = []
for slide in slides:
    # Build graph for each slide
    cell_graph = CellGraph.from_instance_map(slide.inst_map, ...)
    pyg_graph = cell_graph.to_pyg()
    graphs.append(pyg_graph)
# Create DataLoader
loader = DataLoader(graphs, batch_size=32, shuffle=True)
# Train GNN (`model`, `criterion`, and `optimizer` are assumed to be defined;
# for graph-level labels the model needs a pooling step so its output matches batch.y)
for batch in loader:
    optimizer.zero_grad()  # Clear gradients from the previous step
    output = model(batch)
    loss = criterion(output, batch.y)
    loss.backward()
    optimizer.step()
```
## Visualization
### Graph Visualization
```python
import matplotlib.pyplot as plt
import networkx as nx
# Convert to NetworkX
nx_graph = cell_graph.to_networkx()
# Draw graph with cell positions as layout
pos = {i: cell_graph.positions[i] for i in range(len(cell_graph.nodes))}
plt.figure(figsize=(12, 12))
nx.draw_networkx(
    nx_graph,
    pos=pos,
    node_color=cell_type_labels,
    node_size=50,
    cmap='tab10',
    with_labels=False,
    alpha=0.8
)
plt.axis('equal')
plt.title('Cell Graph')
plt.show()
```
### Overlay on Tissue Image
```python
import matplotlib.pyplot as plt
import numpy as np
# Visualize graph overlaid on tissue
fig, ax = plt.subplots(figsize=(15, 15))
ax.imshow(tissue_image)
# Draw edges
for node1, node2 in cell_graph.edges:
    pos1 = cell_graph.positions[node1]
    pos2 = cell_graph.positions[node2]
    ax.plot([pos1[0], pos2[0]], [pos1[1], pos2[1]], 'b-', alpha=0.3, linewidth=0.5)
# Draw nodes colored by type
for cell_type in np.unique(cell_type_labels):
    mask = cell_type_labels == cell_type
    positions = cell_graph.positions[mask]
    ax.scatter(positions[:, 0], positions[:, 1], label=f'Type {cell_type}', s=20)
ax.legend()
ax.axis('off')
ax.set_title('Cell Graph on Tissue')
plt.show()
```
## Complete Workflow Example
```python
from pathml.core import CODEXSlide
from pathml.preprocessing import Pipeline, CollapseRunsCODEX, SegmentMIF
from pathml.graph import (
    CellGraph,
    analyze_neighborhoods,
    extract_intensity_features,
    extract_morphology_features,
)
import matplotlib.pyplot as plt
import networkx as nx
# 1. Load and preprocess slide
slide = CODEXSlide('path/to/codex', stain='IF')
pipeline = Pipeline([
    CollapseRunsCODEX(z_slice=2),
    SegmentMIF(
        nuclear_channel='DAPI',
        cytoplasm_channel='CD45',
        model='mesmer'
    )
])
pipeline.run(slide)
# 2. Build cell graph
inst_map = slide.masks['cell_segmentation']
cell_graph = CellGraph.from_instance_map(
    inst_map,
    image=slide.image,
    connectivity='knn',
    k=6
)
# 3. Extract features
# Morphological features
morph_features = extract_morphology_features(
    inst_map,
    features=['area', 'perimeter', 'eccentricity', 'solidity']
)
cell_graph.add_node_features(morph_features)
# Intensity features (marker expression)
intensity_features = extract_intensity_features(
    inst_map,
    image=slide.image,
    channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
    statistics=['mean', 'std']
)
cell_graph.add_node_features(intensity_features)
# 4. Spatial analysis
# (`cell_type_predictions` is assumed to come from a separate cell classifier)
neighborhoods = analyze_neighborhoods(
    cell_graph,
    cell_types=cell_type_predictions,
    radius=100,
    metrics=['diversity', 'composition']
)
# 5. Export for GNN
pyg_data = cell_graph.to_pyg()
# 6. Visualize
plt.figure(figsize=(15, 15))
plt.imshow(slide.image)
# Overlay graph
nx_graph = cell_graph.to_networkx()
pos = {i: cell_graph.positions[i] for i in range(cell_graph.num_nodes)}
nx.draw_networkx(
    nx_graph,
    pos=pos,
    node_color=cell_type_predictions,
    cmap='tab10',
    node_size=30,
    with_labels=False
)
plt.axis('off')
plt.title('Cell Graph with Spatial Neighborhood')
plt.show()
```
## Performance Considerations
**Large tissue sections:**
- Build graphs tile-by-tile, then merge
- Use sparse adjacency matrices
- Leverage GPU for feature extraction
**Memory efficiency:**
- Store only necessary edge features
- Use int32/float32 instead of int64/float64
- Batch process multiple slides
**Computational efficiency:**
- Parallelize feature extraction across cells
- Prefer KNN connectivity when speed matters: neighbor queries with a fixed k are typically faster than dense radius searches
- Cache computed features
## Best Practices
1. **Choose appropriate connectivity:** KNN for uniform analysis, radius for physical interactions, contact for direct cell-cell communication
2. **Normalize features:** Scale morphological and intensity features for GNN compatibility
3. **Handle edge effects:** Exclude boundary cells or use tissue masks to define valid regions
4. **Validate graph construction:** Visualize graphs on small regions before large-scale processing
5. **Combine multiple feature types:** Morphology + intensity + texture provides rich representations
6. **Consider tissue context:** Tissue type affects appropriate graph parameters (connectivity, radius)
## Common Issues and Solutions
**Issue: Too many/few edges**
- Adjust k (KNN) or radius (radius-based) parameters
- Verify pixel-to-micron conversion for biological relevance
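A distance threshold chosen in microns has to be converted to pixels before it is passed as a radius. A minimal sketch follows; the resolution of 0.5 µm per pixel is a hypothetical value, and the real one should be read from the slide metadata.

```python
def microns_to_pixels(distance_um, microns_per_pixel):
    """Convert a physical distance in microns to a distance in pixels."""
    if microns_per_pixel <= 0:
        raise ValueError("microns_per_pixel must be positive")
    return distance_um / microns_per_pixel

# Hypothetical: a 50 µm interaction range at 0.5 µm per pixel
radius_px = microns_to_pixels(distance_um=50, microns_per_pixel=0.5)
```

Here a 50 µm range corresponds to 100 pixels. The same pixel radius applied to a slide scanned at a different resolution would represent a different physical distance.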
**Issue: Memory errors with large graphs**
- Process tiles separately and merge graphs
- Use sparse matrix representations
- Reduce edge features to essential ones
**Issue: Missing cells at tissue boundaries**
- Apply edge_correction parameter
- Use tissue masks to exclude invalid regions
**Issue: Inconsistent feature scales**
- Normalize features: `(x - mean) / std`
- Use robust scaling for outliers
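Both scaling options can be sketched with the standard library. The helper names and the example areas are hypothetical; robust scaling is taken here to mean centering on the median and dividing by the interquartile range.

```python
import statistics

def zscore(values):
    """Standardize values: (x - mean) / std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std > 0 else 0.0 for v in values]

def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr if iqr > 0 else 0.0 for v in values]

# Hypothetical cell areas with one outlier (e.g., a merged segmentation)
areas = [100, 110, 120, 130, 1000]
standardized = zscore(areas)
robust = robust_scale(areas)
```

With z-scoring, the single outlier inflates the standard deviation and squeezes the four typical cells into a narrow band. Robust scaling keeps them spread between -1 and 0.5 and leaves the outlier clearly separated.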
## Additional Resources
- **PathML Graph API:** https://pathml.readthedocs.io/en/latest/api_graph_reference.html
- **PyTorch Geometric:** https://pytorch-geometric.readthedocs.io/
- **NetworkX:** https://networkx.org/
- **Spatial Statistics:** Baddeley et al., "Spatial Point Patterns: Methodology and Applications with R"