Initial commit

2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions
--- a/skills/pathml/references/graphs.md
+++ b/skills/pathml/references/graphs.md
@@ -0,0 +1,653 @@
+# Graph Construction & Spatial Analysis
+
+## Overview
+
+PathML provides tools for constructing spatial graphs from tissue images to represent cellular and tissue-level relationships. Graph-based representations enable sophisticated spatial analysis, including neighborhood analysis, cell-cell interaction studies, and graph neural network applications. These graphs capture both morphological features and spatial topology for downstream computational analysis.
+
+## Graph Types
+
+PathML supports construction of multiple graph types:
+
+### Cell Graphs
+- Nodes represent individual cells
+- Edges represent spatial proximity or biological interactions
+- Node features include morphology, marker expression, cell type
+- Suitable for single-cell spatial analysis
+
+### Tissue Graphs
+- Nodes represent tissue regions or superpixels
+- Edges represent spatial adjacency
+- Node features include tissue composition, texture features
+- Suitable for tissue-level spatial patterns
+
+### Spatial Transcriptomics Graphs
+- Nodes represent spatial spots or cells
+- Edges encode spatial relationships
+- Node features include gene expression profiles
+- Suitable for spatial omics analysis
+
+## Graph Construction Workflow
+
+### From Segmentation to Graphs
+
+Convert nucleus or cell segmentation results into spatial graphs:
+
+```python
+from pathml.graph import CellGraph
+from pathml.preprocessing import Pipeline, SegmentMIF
+import numpy as np
+
+# 1. Perform cell segmentation
+pipeline = Pipeline([
+    SegmentMIF(
+        nuclear_channel='DAPI',
+        cytoplasm_channel='CD45',
+        model='mesmer'
+    )
+])
+pipeline.run(slide)
+
+# 2. Extract instance segmentation mask
+inst_map = slide.masks['cell_segmentation']
+
+# 3. Build cell graph
+cell_graph = CellGraph.from_instance_map(
+    inst_map,
+    image=slide.image,  # Optional: for extracting visual features
+    connectivity='delaunay',  # 'knn', 'radius', or 'delaunay'
+    k=5,  # For knn: number of neighbors
+    radius=50  # For radius: distance threshold in pixels
+)
+
+# 4. Access graph components
+nodes = cell_graph.nodes  # Node features
+edges = cell_graph.edges  # Edge list
+adjacency = cell_graph.adjacency_matrix  # Adjacency matrix
+```
+
+### Connectivity Methods
+
+**K-Nearest Neighbors (KNN):**
+```python
+# Connect each cell to its k nearest neighbors
+graph = CellGraph.from_instance_map(
+    inst_map,
+    connectivity='knn',
+    k=5  # Number of neighbors
+)
+```
+- Fixed degree per node
+- Captures local neighborhoods
+- Simple and interpretable
+
+**Radius-based:**
+```python
+# Connect cells within a distance threshold
+graph = CellGraph.from_instance_map(
+    inst_map,
+    connectivity='radius',
+    radius=100,  # Maximum distance in pixels
+    distance_metric='euclidean'  # or 'manhattan', 'chebyshev'
+)
+```
+- Variable degree based on density
+- Biologically motivated (interaction range)
+- Captures physical proximity
+
+**Delaunay Triangulation:**
+```python
+# Connect cells using Delaunay triangulation
+graph = CellGraph.from_instance_map(
+    inst_map,
+    connectivity='delaunay'
+)
+```
+- Creates connected graph from spatial positions
+- No isolated nodes (in convex hull)
+- Captures spatial tessellation
+
+**Contact-based:**
+```python
+# Connect cells with touching boundaries
+graph = CellGraph.from_instance_map(
+    inst_map,
+    connectivity='contact',
+    dilation=2  # Dilate boundaries to capture near-contacts
+)
+```
+- Physical cell-cell contacts
+- Most biologically direct
+- Sparse edges for separated cells
+
+## Node Features
+
+### Morphological Features
+
+Extract shape and size features for each cell:
+
+```python
+from pathml.graph import extract_morphology_features
+
+# Compute morphological features
+morphology_features = extract_morphology_features(
+    inst_map,
+    features=[
+        'area',  # Cell area in pixels
+        'perimeter',  # Cell perimeter
+        'eccentricity',  # Shape elongation
+        'solidity',  # Convexity measure
+        'major_axis_length',
+        'minor_axis_length',
+        'orientation'  # Cell orientation angle
+    ]
+)
+
+# Add to graph
+cell_graph.add_node_features(morphology_features, feature_names=['area', 'perimeter', ...])
+```
+
+**Available morphological features:**
+- **Area** - Number of pixels
+- **Perimeter** - Boundary length
+- **Eccentricity** - 0 (circle) to 1 (line)
+- **Solidity** - Area / convex hull area
+- **Circularity** - 4π × area / perimeter²
+- **Major/Minor axis** - Lengths of fitted ellipse axes
+- **Orientation** - Angle of major axis
+- **Extent** - Area / bounding box area
+
+### Intensity Features
+
+Extract marker expression or intensity statistics:
+
+```python
+from pathml.graph import extract_intensity_features
+
+# Extract mean marker intensities per cell
+intensity_features = extract_intensity_features(
+    inst_map,
+    image=multichannel_image,  # Shape: (H, W, C)
+    channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
+    statistics=['mean', 'std', 'median', 'max']
+)
+
+# Add to graph
+cell_graph.add_node_features(
+    intensity_features,
+    feature_names=['DAPI_mean', 'CD3_mean', ...]
+)
+```
+
+**Available statistics:**
+- **mean** - Average intensity
+- **median** - Median intensity
+- **std** - Standard deviation
+- **max** - Maximum intensity
+- **min** - Minimum intensity
+- **quantile_25/75** - Quartiles
+
+### Texture Features
+
+Compute texture descriptors for each cell region:
+
+```python
+from pathml.graph import extract_texture_features
+
+# Haralick texture features
+texture_features = extract_texture_features(
+    inst_map,
+    image=grayscale_image,
+    features='haralick',  # or 'lbp', 'gabor'
+    distance=1,
+    angles=[0, np.pi/4, np.pi/2, 3*np.pi/4]
+)
+
+cell_graph.add_node_features(texture_features)
+```
+
+### Cell Type Annotations
+
+Add cell type labels from classification:
+
+```python
+# From ML model predictions
+cell_types = hovernet_type_predictions  # Array of cell type IDs
+
+cell_graph.add_node_features(
+    cell_types,
+    feature_names=['cell_type']
+)
+
+# One-hot encode cell types
+cell_type_onehot = one_hot_encode(cell_types, num_classes=5)
+cell_graph.add_node_features(
+    cell_type_onehot,
+    feature_names=['type_epithelial', 'type_inflammatory', ...]
+)
+```
+
+## Edge Features
+
+### Spatial Distance
+
+Compute edge features based on spatial relationships:
+
+```python
+from pathml.graph import compute_edge_distances
+
+# Add pairwise distances as edge features
+distances = compute_edge_distances(
+    cell_graph,
+    metric='euclidean'  # or 'manhattan', 'chebyshev'
+)
+
+cell_graph.add_edge_features(distances, feature_names=['distance'])
+```
+
+### Interaction Features
+
+Model biological interactions between cell types:
+
+```python
+from pathml.graph import compute_interaction_features
+
+# Cell type co-occurrence along edges
+interaction_features = compute_interaction_features(
+    cell_graph,
+    cell_types=cell_type_labels,
+    interaction_type='categorical'  # or 'numerical'
+)
+
+cell_graph.add_edge_features(interaction_features)
+```
+
+## Graph-Level Features
+
+Aggregate features for entire graph:
+
+```python
+from pathml.graph import compute_graph_features
+
+# Topological features
+graph_features = compute_graph_features(
+    cell_graph,
+    features=[
+        'num_nodes',
+        'num_edges',
+        'average_degree',
+        'clustering_coefficient',
+        'average_path_length',
+        'diameter'
+    ]
+)
+
+# Cell composition features
+composition = cell_graph.compute_cell_type_composition(
+    cell_type_labels,
+    normalize=True  # Proportions
+)
+```
+
+## Spatial Analysis
+
+### Neighborhood Analysis
+
+Analyze cell neighborhoods and microenvironments:
+
+```python
+from pathml.graph import analyze_neighborhoods
+
+# Characterize neighborhoods around each cell
+neighborhoods = analyze_neighborhoods(
+    cell_graph,
+    cell_types=cell_type_labels,
+    radius=100,  # Neighborhood radius
+    metrics=['diversity', 'density', 'composition']
+)
+
+# Neighborhood diversity (Shannon entropy)
+diversity = neighborhoods['diversity']
+
+# Cell type composition in each neighborhood
+composition = neighborhoods['composition']  # (n_cells, n_cell_types)
+```
+
+### Spatial Clustering
+
+Identify spatial clusters of cell types:
+
+```python
+from pathml.graph import spatial_clustering
+import matplotlib.pyplot as plt
+
+# Detect spatial clusters
+clusters = spatial_clustering(
+    cell_graph,
+    cell_positions,
+    method='dbscan',  # or 'kmeans', 'hierarchical'
+    eps=50,  # DBSCAN: neighborhood radius
+    min_samples=10  # DBSCAN: minimum cluster size
+)
+
+# Visualize clusters
+plt.scatter(
+    cell_positions[:, 0],
+    cell_positions[:, 1],
+    c=clusters,
+    cmap='tab20'
+)
+plt.title('Spatial Clusters')
+plt.show()
+```
+
+### Cell-Cell Interaction Analysis
+
+Test for enrichment or depletion of cell type interactions:
+
+```python
+from pathml.graph import cell_interaction_analysis
+
+# Test for significant interactions
+interaction_results = cell_interaction_analysis(
+    cell_graph,
+    cell_types=cell_type_labels,
+    method='permutation',  # or 'expected'
+    n_permutations=1000,
+    significance_level=0.05
+)
+
+# Interaction scores (positive = attraction, negative = avoidance)
+interaction_matrix = interaction_results['scores']
+
+# Visualize with heatmap
+import seaborn as sns
+sns.heatmap(
+    interaction_matrix,
+    cmap='RdBu_r',
+    center=0,
+    xticklabels=cell_type_names,
+    yticklabels=cell_type_names
+)
+plt.title('Cell-Cell Interaction Scores')
+plt.show()
+```
+
+### Spatial Statistics
+
+Compute spatial statistics and patterns:
+
+```python
+from pathml.graph import spatial_statistics
+
+# Ripley's K function for spatial point patterns
+ripleys_k = spatial_statistics(
+    cell_positions,
+    cell_types=cell_type_labels,
+    statistic='ripleys_k',
+    radii=np.linspace(0, 200, 50)
+)
+
+# Nearest neighbor distances
+nn_distances = spatial_statistics(
+    cell_positions,
+    statistic='nearest_neighbor',
+    by_cell_type=True
+)
+```
+
+## Integration with Graph Neural Networks
+
+### Convert to PyTorch Geometric Format
+
+```python
+from pathml.graph import to_pyg
+import torch
+from torch_geometric.data import Data
+
+# Convert to PyTorch Geometric Data object
+pyg_data = cell_graph.to_pyg()
+
+# Access components
+x = pyg_data.x  # Node features (n_nodes, n_features)
+edge_index = pyg_data.edge_index  # Edge connectivity (2, n_edges)
+edge_attr = pyg_data.edge_attr  # Edge features (n_edges, n_edge_features)
+y = pyg_data.y  # Graph-level label
+pos = pyg_data.pos  # Node positions (n_nodes, 2)
+
+# Use with PyTorch Geometric
+from torch_geometric.nn import GCNConv
+
+class GNN(torch.nn.Module):
+    def __init__(self, in_channels, hidden_channels, out_channels):
+        super().__init__()
+        self.conv1 = GCNConv(in_channels, hidden_channels)
+        self.conv2 = GCNConv(hidden_channels, out_channels)
+
+    def forward(self, data):
+        x, edge_index = data.x, data.edge_index
+        x = self.conv1(x, edge_index).relu()
+        x = self.conv2(x, edge_index)
+        return x
+
+model = GNN(in_channels=pyg_data.num_features, hidden_channels=64, out_channels=5)
+output = model(pyg_data)
+```
+
+### Graph Dataset for Multiple Slides
+
+```python
+from pathml.graph import GraphDataset
+from torch_geometric.loader import DataLoader
+
+# Create dataset of graphs from multiple slides
+graphs = []
+for slide in slides:
+    # Build graph for each slide
+    cell_graph = CellGraph.from_instance_map(slide.inst_map, ...)
+    pyg_graph = cell_graph.to_pyg()
+    graphs.append(pyg_graph)
+
+# Create DataLoader
+loader = DataLoader(graphs, batch_size=32, shuffle=True)
+
+# Train GNN
+for batch in loader:
+    output = model(batch)
+    loss = criterion(output, batch.y)
+    loss.backward()
+    optimizer.step()
+```
+
+## Visualization
+
+### Graph Visualization
+
+```python
+import matplotlib.pyplot as plt
+import networkx as nx
+
+# Convert to NetworkX
+nx_graph = cell_graph.to_networkx()
+
+# Draw graph with cell positions as layout
+pos = {i: cell_graph.positions[i] for i in range(len(cell_graph.nodes))}
+
+plt.figure(figsize=(12, 12))
+nx.draw_networkx(
+    nx_graph,
+    pos=pos,
+    node_color=cell_type_labels,
+    node_size=50,
+    cmap='tab10',
+    with_labels=False,
+    alpha=0.8
+)
+plt.axis('equal')
+plt.title('Cell Graph')
+plt.show()
+```
+
+### Overlay on Tissue Image
+
+```python
+from pathml.graph import visualize_graph_on_image
+
+# Visualize graph overlaid on tissue
+fig, ax = plt.subplots(figsize=(15, 15))
+ax.imshow(tissue_image)
+
+# Draw edges
+for edge in cell_graph.edges:
+    node1, node2 = edge
+    pos1 = cell_graph.positions[node1]
+    pos2 = cell_graph.positions[node2]
+    ax.plot([pos1[0], pos2[0]], [pos1[1], pos2[1]], 'b-', alpha=0.3, linewidth=0.5)
+
+# Draw nodes colored by type
+for cell_type in np.unique(cell_type_labels):
+    mask = cell_type_labels == cell_type
+    positions = cell_graph.positions[mask]
+    ax.scatter(positions[:, 0], positions[:, 1], label=f'Type {cell_type}', s=20)
+
+ax.legend()
+ax.axis('off')
+plt.title('Cell Graph on Tissue')
+plt.show()
+```
+
+## Complete Workflow Example
+
+```python
+from pathml.core import SlideData, CODEXSlide
+from pathml.preprocessing import Pipeline, CollapseRunsCODEX, SegmentMIF
+from pathml.graph import CellGraph, extract_morphology_features, extract_intensity_features
+import matplotlib.pyplot as plt
+
+# 1. Load and preprocess slide
+slide = CODEXSlide('path/to/codex', stain='IF')
+
+pipeline = Pipeline([
+    CollapseRunsCODEX(z_slice=2),
+    SegmentMIF(
+        nuclear_channel='DAPI',
+        cytoplasm_channel='CD45',
+        model='mesmer'
+    )
+])
+pipeline.run(slide)
+
+# 2. Build cell graph
+inst_map = slide.masks['cell_segmentation']
+cell_graph = CellGraph.from_instance_map(
+    inst_map,
+    image=slide.image,
+    connectivity='knn',
+    k=6
+)
+
+# 3. Extract features
+# Morphological features
+morph_features = extract_morphology_features(
+    inst_map,
+    features=['area', 'perimeter', 'eccentricity', 'solidity']
+)
+cell_graph.add_node_features(morph_features)
+
+# Intensity features (marker expression)
+intensity_features = extract_intensity_features(
+    inst_map,
+    image=slide.image,
+    channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
+    statistics=['mean', 'std']
+)
+cell_graph.add_node_features(intensity_features)
+
+# 4. Spatial analysis
+from pathml.graph import analyze_neighborhoods
+
+neighborhoods = analyze_neighborhoods(
+    cell_graph,
+    cell_types=cell_type_predictions,
+    radius=100,
+    metrics=['diversity', 'composition']
+)
+
+# 5. Export for GNN
+pyg_data = cell_graph.to_pyg()
+
+# 6. Visualize
+plt.figure(figsize=(15, 15))
+plt.imshow(slide.image)
+
+# Overlay graph
+nx_graph = cell_graph.to_networkx()
+pos = {i: cell_graph.positions[i] for i in range(cell_graph.num_nodes)}
+nx.draw_networkx(
+    nx_graph,
+    pos=pos,
+    node_color=cell_type_predictions,
+    cmap='tab10',
+    node_size=30,
+    with_labels=False
+)
+plt.axis('off')
+plt.title('Cell Graph with Spatial Neighborhood')
+plt.show()
+```
+
+## Performance Considerations
+
+**Large tissue sections:**
+- Build graphs tile-by-tile, then merge
+- Use sparse adjacency matrices
+- Leverage GPU for feature extraction
+
+**Memory efficiency:**
+- Store only necessary edge features
+- Use int32/float32 instead of int64/float64
+- Batch process multiple slides
+
+**Computational efficiency:**
+- Parallelize feature extraction across cells
+- Use KNN for faster neighbor queries
+- Cache computed features
+
+## Best Practices
+
+1. **Choose appropriate connectivity:** KNN for uniform analysis, radius for physical interactions, contact for direct cell-cell communication
+
+2. **Normalize features:** Scale morphological and intensity features for GNN compatibility
+
+3. **Handle edge effects:** Exclude boundary cells or use tissue masks to define valid regions
+
+4. **Validate graph construction:** Visualize graphs on small regions before large-scale processing
+
+5. **Combine multiple feature types:** Morphology + intensity + texture provides rich representations
+
+6. **Consider tissue context:** Tissue type affects appropriate graph parameters (connectivity, radius)
+
+## Common Issues and Solutions
+
+**Issue: Too many/few edges**
+- Adjust k (KNN) or radius (radius-based) parameters
+- Verify pixel-to-micron conversion for biological relevance
+
+**Issue: Memory errors with large graphs**
+- Process tiles separately and merge graphs
+- Use sparse matrix representations
+- Reduce edge features to essential ones
+
+**Issue: Missing cells at tissue boundaries**
+- Apply edge_correction parameter
+- Use tissue masks to exclude invalid regions
+
+**Issue: Inconsistent feature scales**
+- Normalize features: `(x - mean) / std`
+- Use robust scaling for outliers
+
+## Additional Resources
+
+- **PathML Graph API:** https://pathml.readthedocs.io/en/latest/api_graph_reference.html
+- **PyTorch Geometric:** https://pytorch-geometric.readthedocs.io/
+- **NetworkX:** https://networkx.org/
+- **Spatial Statistics:** Baddeley et al., "Spatial Point Patterns: Methodology and Applications with R"