# Graph Construction & Spatial Analysis

## Overview

PathML provides tools for constructing spatial graphs from tissue images to represent cellular and tissue-level relationships. Graph-based representations enable sophisticated spatial analysis, including neighborhood analysis, cell-cell interaction studies, and graph neural network applications. These graphs capture both morphological features and spatial topology for downstream computational analysis.

## Graph Types

PathML supports construction of multiple graph types:

### Cell Graphs
- Nodes represent individual cells
- Edges represent spatial proximity or biological interactions
- Node features include morphology, marker expression, cell type
- Suitable for single-cell spatial analysis

### Tissue Graphs
- Nodes represent tissue regions or superpixels
- Edges represent spatial adjacency
- Node features include tissue composition, texture features
- Suitable for tissue-level spatial patterns

### Spatial Transcriptomics Graphs
- Nodes represent spatial spots or cells
- Edges encode spatial relationships
- Node features include gene expression profiles
- Suitable for spatial omics analysis

## Graph Construction Workflow

### From Segmentation to Graphs

Convert nucleus or cell segmentation results into spatial graphs:

```python
from pathml.graph import CellGraph
from pathml.preprocessing import Pipeline, SegmentMIF
import numpy as np

# 1. Perform cell segmentation
pipeline = Pipeline([
    SegmentMIF(
        nuclear_channel='DAPI',
        cytoplasm_channel='CD45',
        model='mesmer'
    )
])
pipeline.run(slide)

# 2. Extract instance segmentation mask
inst_map = slide.masks['cell_segmentation']

# 3. Build cell graph
cell_graph = CellGraph.from_instance_map(
    inst_map,
    image=slide.image,  # Optional: for extracting visual features
    connectivity='delaunay',  # 'knn', 'radius', or 'delaunay'
    k=5,  # For knn: number of neighbors
    radius=50  # For radius: distance threshold in pixels
)

# 4. Access graph components
nodes = cell_graph.nodes  # Node features
edges = cell_graph.edges  # Edge list
adjacency = cell_graph.adjacency_matrix  # Adjacency matrix
```

### Connectivity Methods

**K-Nearest Neighbors (KNN):**
```python
# Connect each cell to its k nearest neighbors
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='knn',
    k=5  # Number of neighbors
)
```
- Fixed degree per node
- Captures local neighborhoods
- Simple and interpretable

**Radius-based:**
```python
# Connect cells within a distance threshold
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='radius',
    radius=100,  # Maximum distance in pixels
    distance_metric='euclidean'  # or 'manhattan', 'chebyshev'
)
```
- Variable degree based on density
- Biologically motivated (interaction range)
- Captures physical proximity

**Delaunay Triangulation:**
```python
# Connect cells using Delaunay triangulation
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='delaunay'
)
```
- Creates connected graph from spatial positions
- No isolated nodes (in convex hull)
- Captures spatial tessellation

**Contact-based:**
```python
# Connect cells with touching boundaries
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='contact',
    dilation=2  # Dilate boundaries to capture near-contacts
)
```
- Physical cell-cell contacts
- Most biologically direct
- Sparse edges for separated cells

## Node Features

### Morphological Features

Extract shape and size features for each cell:

```python
from pathml.graph import extract_morphology_features

# Compute morphological features
morphology_features = extract_morphology_features(
    inst_map,
    features=[
        'area',  # Cell area in pixels
        'perimeter',  # Cell perimeter
        'eccentricity',  # Shape elongation
        'solidity',  # Convexity measure
        'major_axis_length',
        'minor_axis_length',
        'orientation'  # Cell orientation angle
    ]
)

# Add to graph
cell_graph.add_node_features(morphology_features, feature_names=['area', 'perimeter', ...])
```

**Available morphological features:**
- **Area** - Number of pixels
- **Perimeter** - Boundary length
- **Eccentricity** - 0 (circle) to 1 (line)
- **Solidity** - Area / convex hull area
- **Circularity** - 4π × area / perimeter²
- **Major/Minor axis** - Lengths of fitted ellipse axes
- **Orientation** - Angle of major axis
- **Extent** - Area / bounding box area

### Intensity Features

Extract marker expression or intensity statistics:

```python
from pathml.graph import extract_intensity_features

# Extract mean marker intensities per cell
intensity_features = extract_intensity_features(
    inst_map,
    image=multichannel_image,  # Shape: (H, W, C)
    channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
    statistics=['mean', 'std', 'median', 'max']
)

# Add to graph
cell_graph.add_node_features(
    intensity_features,
    feature_names=['DAPI_mean', 'CD3_mean', ...]
)
```

**Available statistics:**
- **mean** - Average intensity
- **median** - Median intensity
- **std** - Standard deviation
- **max** - Maximum intensity
- **min** - Minimum intensity
- **quantile_25/75** - Quartiles

### Texture Features

Compute texture descriptors for each cell region:

```python
from pathml.graph import extract_texture_features

# Haralick texture features
texture_features = extract_texture_features(
    inst_map,
    image=grayscale_image,
    features='haralick',  # or 'lbp', 'gabor'
    distance=1,
    angles=[0, np.pi/4, np.pi/2, 3*np.pi/4]
)

cell_graph.add_node_features(texture_features)
```

### Cell Type Annotations

Add cell type labels from classification:

```python
# From ML model predictions
cell_types = hovernet_type_predictions  # Array of cell type IDs

cell_graph.add_node_features(
    cell_types,
    feature_names=['cell_type']
)

# One-hot encode cell types
cell_type_onehot = one_hot_encode(cell_types, num_classes=5)
cell_graph.add_node_features(
    cell_type_onehot,
    feature_names=['type_epithelial', 'type_inflammatory', ...]
)
```

## Edge Features

### Spatial Distance

Compute edge features based on spatial relationships:

```python
from pathml.graph import compute_edge_distances

# Add pairwise distances as edge features
distances = compute_edge_distances(
    cell_graph,
    metric='euclidean'  # or 'manhattan', 'chebyshev'
)

cell_graph.add_edge_features(distances, feature_names=['distance'])
```

### Interaction Features

Model biological interactions between cell types:

```python
from pathml.graph import compute_interaction_features

# Cell type co-occurrence along edges
interaction_features = compute_interaction_features(
    cell_graph,
    cell_types=cell_type_labels,
    interaction_type='categorical'  # or 'numerical'
)

cell_graph.add_edge_features(interaction_features)
```

## Graph-Level Features

Aggregate features for entire graph:

```python
from pathml.graph import compute_graph_features

# Topological features
graph_features = compute_graph_features(
    cell_graph,
    features=[
        'num_nodes',
        'num_edges',
        'average_degree',
        'clustering_coefficient',
        'average_path_length',
        'diameter'
    ]
)

# Cell composition features
composition = cell_graph.compute_cell_type_composition(
    cell_type_labels,
    normalize=True  # Proportions
)
```

## Spatial Analysis

### Neighborhood Analysis

Analyze cell neighborhoods and microenvironments:

```python
from pathml.graph import analyze_neighborhoods

# Characterize neighborhoods around each cell
neighborhoods = analyze_neighborhoods(
    cell_graph,
    cell_types=cell_type_labels,
    radius=100,  # Neighborhood radius
    metrics=['diversity', 'density', 'composition']
)

# Neighborhood diversity (Shannon entropy)
diversity = neighborhoods['diversity']

# Cell type composition in each neighborhood
composition = neighborhoods['composition']  # (n_cells, n_cell_types)
```

### Spatial Clustering

Identify spatial clusters of cell types:

```python
from pathml.graph import spatial_clustering
import matplotlib.pyplot as plt

# Detect spatial clusters
clusters = spatial_clustering(
    cell_graph,
    cell_positions,
    method='dbscan',  # or 'kmeans', 'hierarchical'
    eps=50,  # DBSCAN: neighborhood radius
    min_samples=10  # DBSCAN: minimum cluster size
)

# Visualize clusters
plt.scatter(
    cell_positions[:, 0],
    cell_positions[:, 1],
    c=clusters,
    cmap='tab20'
)
plt.title('Spatial Clusters')
plt.show()
```

### Cell-Cell Interaction Analysis

Test for enrichment or depletion of cell type interactions:

```python
from pathml.graph import cell_interaction_analysis

# Test for significant interactions
interaction_results = cell_interaction_analysis(
    cell_graph,
    cell_types=cell_type_labels,
    method='permutation',  # or 'expected'
    n_permutations=1000,
    significance_level=0.05
)

# Interaction scores (positive = attraction, negative = avoidance)
interaction_matrix = interaction_results['scores']

# Visualize with heatmap
import seaborn as sns
sns.heatmap(
    interaction_matrix,
    cmap='RdBu_r',
    center=0,
    xticklabels=cell_type_names,
    yticklabels=cell_type_names
)
plt.title('Cell-Cell Interaction Scores')
plt.show()
```

### Spatial Statistics

Compute spatial statistics and patterns:

```python
from pathml.graph import spatial_statistics

# Ripley's K function for spatial point patterns
ripleys_k = spatial_statistics(
    cell_positions,
    cell_types=cell_type_labels,
    statistic='ripleys_k',
    radii=np.linspace(0, 200, 50)
)

# Nearest neighbor distances
nn_distances = spatial_statistics(
    cell_positions,
    statistic='nearest_neighbor',
    by_cell_type=True
)
```

## Integration with Graph Neural Networks

### Convert to PyTorch Geometric Format

```python
from pathml.graph import to_pyg
import torch
from torch_geometric.data import Data

# Convert to PyTorch Geometric Data object
pyg_data = cell_graph.to_pyg()

# Access components
x = pyg_data.x  # Node features (n_nodes, n_features)
edge_index = pyg_data.edge_index  # Edge connectivity (2, n_edges)
edge_attr = pyg_data.edge_attr  # Edge features (n_edges, n_edge_features)
y = pyg_data.y  # Graph-level label
pos = pyg_data.pos  # Node positions (n_nodes, 2)

# Use with PyTorch Geometric
from torch_geometric.nn import GCNConv

class GNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x

model = GNN(in_channels=pyg_data.num_features, hidden_channels=64, out_channels=5)
output = model(pyg_data)
```

### Graph Dataset for Multiple Slides

```python
from pathml.graph import GraphDataset
from torch_geometric.loader import DataLoader

# Create dataset of graphs from multiple slides
graphs = []
for slide in slides:
    # Build graph for each slide
    cell_graph = CellGraph.from_instance_map(slide.inst_map, ...)
    pyg_graph = cell_graph.to_pyg()
    graphs.append(pyg_graph)

# Create DataLoader
loader = DataLoader(graphs, batch_size=32, shuffle=True)

# Train GNN
for batch in loader:
    output = model(batch)
    loss = criterion(output, batch.y)
    loss.backward()
    optimizer.step()
```

## Visualization

### Graph Visualization

```python
import matplotlib.pyplot as plt
import networkx as nx

# Convert to NetworkX
nx_graph = cell_graph.to_networkx()

# Draw graph with cell positions as layout
pos = {i: cell_graph.positions[i] for i in range(len(cell_graph.nodes))}

plt.figure(figsize=(12, 12))
nx.draw_networkx(
    nx_graph,
    pos=pos,
    node_color=cell_type_labels,
    node_size=50,
    cmap='tab10',
    with_labels=False,
    alpha=0.8
)
plt.axis('equal')
plt.title('Cell Graph')
plt.show()
```

### Overlay on Tissue Image

```python
from pathml.graph import visualize_graph_on_image

# Visualize graph overlaid on tissue
fig, ax = plt.subplots(figsize=(15, 15))
ax.imshow(tissue_image)

# Draw edges
for edge in cell_graph.edges:
    node1, node2 = edge
    pos1 = cell_graph.positions[node1]
    pos2 = cell_graph.positions[node2]
    ax.plot([pos1[0], pos2[0]], [pos1[1], pos2[1]], 'b-', alpha=0.3, linewidth=0.5)

# Draw nodes colored by type
for cell_type in np.unique(cell_type_labels):
    mask = cell_type_labels == cell_type
    positions = cell_graph.positions[mask]
    ax.scatter(positions[:, 0], positions[:, 1], label=f'Type {cell_type}', s=20)

ax.legend()
ax.axis('off')
plt.title('Cell Graph on Tissue')
plt.show()
```

## Complete Workflow Example

```python
from pathml.core import SlideData, CODEXSlide
from pathml.preprocessing import Pipeline, CollapseRunsCODEX, SegmentMIF
from pathml.graph import CellGraph, extract_morphology_features, extract_intensity_features
import matplotlib.pyplot as plt

# 1. Load and preprocess slide
slide = CODEXSlide('path/to/codex', stain='IF')

pipeline = Pipeline([
    CollapseRunsCODEX(z_slice=2),
    SegmentMIF(
        nuclear_channel='DAPI',
        cytoplasm_channel='CD45',
        model='mesmer'
    )
])
pipeline.run(slide)

# 2. Build cell graph
inst_map = slide.masks['cell_segmentation']
cell_graph = CellGraph.from_instance_map(
    inst_map,
    image=slide.image,
    connectivity='knn',
    k=6
)

# 3. Extract features
# Morphological features
morph_features = extract_morphology_features(
    inst_map,
    features=['area', 'perimeter', 'eccentricity', 'solidity']
)
cell_graph.add_node_features(morph_features)

# Intensity features (marker expression)
intensity_features = extract_intensity_features(
    inst_map,
    image=slide.image,
    channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
    statistics=['mean', 'std']
)
cell_graph.add_node_features(intensity_features)

# 4. Spatial analysis
from pathml.graph import analyze_neighborhoods

neighborhoods = analyze_neighborhoods(
    cell_graph,
    cell_types=cell_type_predictions,
    radius=100,
    metrics=['diversity', 'composition']
)

# 5. Export for GNN
pyg_data = cell_graph.to_pyg()

# 6. Visualize
plt.figure(figsize=(15, 15))
plt.imshow(slide.image)

# Overlay graph
nx_graph = cell_graph.to_networkx()
pos = {i: cell_graph.positions[i] for i in range(cell_graph.num_nodes)}
nx.draw_networkx(
    nx_graph,
    pos=pos,
    node_color=cell_type_predictions,
    cmap='tab10',
    node_size=30,
    with_labels=False
)
plt.axis('off')
plt.title('Cell Graph with Spatial Neighborhood')
plt.show()
```

## Performance Considerations

**Large tissue sections:**
- Build graphs tile-by-tile, then merge
- Use sparse adjacency matrices
- Leverage GPU for feature extraction

**Memory efficiency:**
- Store only necessary edge features
- Use int32/float32 instead of int64/float64
- Batch process multiple slides

**Computational efficiency:**
- Parallelize feature extraction across cells
- Use KNN for faster neighbor queries
- Cache computed features

## Best Practices

1. **Choose appropriate connectivity:** KNN for uniform analysis, radius for physical interactions, contact for direct cell-cell communication

2. **Normalize features:** Scale morphological and intensity features for GNN compatibility

3. **Handle edge effects:** Exclude boundary cells or use tissue masks to define valid regions

4. **Validate graph construction:** Visualize graphs on small regions before large-scale processing

5. **Combine multiple feature types:** Morphology + intensity + texture provides rich representations

6. **Consider tissue context:** Tissue type affects appropriate graph parameters (connectivity, radius)

## Common Issues and Solutions

**Issue: Too many/few edges**
- Adjust k (KNN) or radius (radius-based) parameters
- Verify pixel-to-micron conversion for biological relevance

**Issue: Memory errors with large graphs**
- Process tiles separately and merge graphs
- Use sparse matrix representations
- Reduce edge features to essential ones

**Issue: Missing cells at tissue boundaries**
- Apply edge_correction parameter
- Use tissue masks to exclude invalid regions

**Issue: Inconsistent feature scales**
- Normalize features: `(x - mean) / std`
- Use robust scaling for outliers

## Additional Resources

- **PathML Graph API:** https://pathml.readthedocs.io/en/latest/api_graph_reference.html
- **PyTorch Geometric:** https://pytorch-geometric.readthedocs.io/
- **NetworkX:** https://networkx.org/
- **Spatial Statistics:** Baddeley et al., "Spatial Point Patterns: Methodology and Applications with R"