zhongwei/gh-k-dense-ai-claude-scientific-skills-scientific-skills

Zhongwei Li f0bd18fb4e Initial commit

2025-11-30 08:30:10 +08:00

Graph Construction & Spatial Analysis

Overview

PathML provides tools for constructing spatial graphs from tissue images to represent cellular and tissue-level relationships. Graph-based representations enable sophisticated spatial analysis, including neighborhood analysis, cell-cell interaction studies, and graph neural network applications. These graphs capture both morphological features and spatial topology for downstream computational analysis.

Graph Types

PathML supports construction of multiple graph types:

Cell Graphs

Nodes represent individual cells
Edges represent spatial proximity or biological interactions
Node features include morphology, marker expression, cell type
Suitable for single-cell spatial analysis

Tissue Graphs

Nodes represent tissue regions or superpixels
Edges represent spatial adjacency
Node features include tissue composition, texture features
Suitable for tissue-level spatial patterns

Spatial Transcriptomics Graphs

Nodes represent spatial spots or cells
Edges encode spatial relationships
Node features include gene expression profiles
Suitable for spatial omics analysis

Graph Construction Workflow

From Segmentation to Graphs

Convert nucleus or cell segmentation results into spatial graphs:

from pathml.graph import CellGraph
from pathml.preprocessing import Pipeline, SegmentMIF
import numpy as np

# 1. Perform cell segmentation
pipeline = Pipeline([
    SegmentMIF(
        nuclear_channel='DAPI',
        cytoplasm_channel='CD45',
        model='mesmer'
    )
])
pipeline.run(slide)

# 2. Extract instance segmentation mask
inst_map = slide.masks['cell_segmentation']

# 3. Build cell graph
cell_graph = CellGraph.from_instance_map(
    inst_map,
    image=slide.image,  # Optional: for extracting visual features
    connectivity='delaunay',  # 'knn', 'radius', or 'delaunay'
    k=5,  # For knn: number of neighbors
    radius=50  # For radius: distance threshold in pixels
)

# 4. Access graph components
nodes = cell_graph.nodes  # Node features
edges = cell_graph.edges  # Edge list
adjacency = cell_graph.adjacency_matrix  # Adjacency matrix

Connectivity Methods

K-Nearest Neighbors (KNN):

# Connect each cell to its k nearest neighbors
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='knn',
    k=5  # Number of neighbors
)

Fixed out-degree (k) per node; total degree can exceed k if edges are symmetrized
Captures local neighborhoods
Simple and interpretable
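
To make the KNN rule concrete, here is a minimal NumPy sketch that is independent of PathML and does not reproduce its implementation. The helper name `knn_edges` and the synthetic centroids are assumptions made for illustration; the brute-force distance matrix is only suitable for small inputs.

```python
import numpy as np

def knn_edges(points: np.ndarray, k: int) -> np.ndarray:
    """Return an (n_points * k, 2) array of directed edges (source, neighbor)."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    # Brute-force pairwise Euclidean distances: O(n^2) memory, small inputs only.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is never its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    sources = np.repeat(np.arange(n), k)
    return np.column_stack([sources, neighbors.ravel()])

# Synthetic cell centroids in pixel coordinates (hypothetical data).
rng = np.random.default_rng(0)
centroids = rng.uniform(0, 500, size=(20, 2))
edges = knn_edges(centroids, k=5)
out_degree = np.bincount(edges[:, 0], minlength=len(centroids))
in_degree = np.bincount(edges[:, 1], minlength=len(centroids))
```

Every cell has exactly k outgoing edges, while the in-degree differs from cell to cell, which is why a symmetrized KNN graph no longer has a fixed degree.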

Radius-based:

# Connect cells within a distance threshold
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='radius',
    radius=100,  # Maximum distance in pixels
    distance_metric='euclidean'  # or 'manhattan', 'chebyshev'
)

Variable degree based on density
Biologically motivated (interaction range)
Captures physical proximity
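
The density dependence of radius-based connectivity can be seen in a small NumPy sketch. The function `radius_edges` and the toy coordinates are assumptions for illustration, not part of PathML.

```python
import numpy as np

def radius_edges(points: np.ndarray, radius: float) -> np.ndarray:
    """Return undirected edges (i < j) between points at most `radius` apart."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep the strict upper triangle so each pair appears once and self-pairs are dropped.
    i, j = np.nonzero(np.triu(dist <= radius, k=1))
    return np.column_stack([i, j])

# A tight cluster of four cells plus one distant cell (hypothetical pixel coordinates).
centroids = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [300, 300]], dtype=float)
edges = radius_edges(centroids, radius=20)
degree = np.bincount(edges.ravel(), minlength=len(centroids))
```

The clustered cells each end up with degree 3, while the distant cell has degree 0. Unlike KNN, a radius graph can therefore contain isolated nodes in sparse regions.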

Delaunay Triangulation:

# Connect cells using Delaunay triangulation
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='delaunay'
)

Creates connected graph from spatial positions
No isolated nodes, although cells on the convex hull can be linked by long edges
Captures spatial tessellation
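
As a sketch of how Delaunay edges can be derived from centroids, the following uses `scipy.spatial.Delaunay` directly. It assumes SciPy is installed and is not a description of PathML's internals; `delaunay_edges` is a hypothetical helper.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_edges(points: np.ndarray) -> np.ndarray:
    """Return the unique undirected edges of the Delaunay triangulation."""
    tri = Delaunay(points)
    edges = set()
    for a, b, c in tri.simplices:
        # Each triangle contributes its three sides; the set removes shared sides.
        for u, v in ((a, b), (b, c), (a, c)):
            edges.add((int(min(u, v)), int(max(u, v))))
    return np.array(sorted(edges))

# Synthetic cell centroids (hypothetical data).
rng = np.random.default_rng(0)
centroids = rng.uniform(0, 500, size=(30, 2))
edges = delaunay_edges(centroids)
degree = np.bincount(edges.ravel(), minlength=len(centroids))
```

Because every point belongs to at least one triangle, each node has degree of at least 2. A common follow-up step is to drop edges longer than a biologically plausible distance, since hull edges can span large gaps.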

Contact-based:

# Connect cells with touching boundaries
graph = CellGraph.from_instance_map(
    inst_map,
    connectivity='contact',
    dilation=2  # Dilate boundaries to capture near-contacts
)

Physical cell-cell contacts
Most biologically direct
Sparse edges for separated cells
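
A contact test on an instance map can be sketched by comparing each pixel with its right and lower neighbor. This is an illustrative assumption about how contact could be detected (4-connectivity, label 0 as background); it omits the dilation step mentioned above, and `contact_pairs` is a hypothetical name.

```python
import numpy as np

def contact_pairs(label_map: np.ndarray) -> set:
    """Return the set of (id_a, id_b) pairs, id_a < id_b, whose pixels touch."""
    pairs = set()
    # Horizontal neighbors, then vertical neighbors.
    for a, b in ((label_map[:, :-1], label_map[:, 1:]),
                 (label_map[:-1, :], label_map[1:, :])):
        touching = (a != b) & (a > 0) & (b > 0)  # ignore background (label 0)
        for u, v in zip(a[touching], b[touching]):
            pairs.add((int(min(u, v)), int(max(u, v))))
    return pairs

# Cells 1 and 2 share a boundary; cell 3 is separated by a background column.
inst = np.array([
    [1, 1, 2, 2, 0, 3],
    [1, 1, 2, 2, 0, 3],
    [0, 0, 0, 0, 0, 3],
])
pairs = contact_pairs(inst)
```

Only the pair (1, 2) is reported here. Dilating the labels before the comparison would also link cells 2 and 3, which is the near-contact behavior the `dilation` parameter is described as providing.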

Node Features

Morphological Features

Extract shape and size features for each cell:

from pathml.graph import extract_morphology_features

# Compute morphological features
morphology_features = extract_morphology_features(
    inst_map,
    features=[
        'area',  # Cell area in pixels
        'perimeter',  # Cell perimeter
        'eccentricity',  # Shape elongation
        'solidity',  # Convexity measure
        'major_axis_length',
        'minor_axis_length',
        'orientation'  # Cell orientation angle
    ]
)

# Add to graph
cell_graph.add_node_features(morphology_features, feature_names=['area', 'perimeter', ...])

Available morphological features:

Area - Number of pixels
Perimeter - Boundary length
Eccentricity - 0 (circle) to 1 (line)
Solidity - Area / convex hull area
Circularity - 4π × area / perimeter²
Major/Minor axis - Lengths of fitted ellipse axes
Orientation - Angle of major axis
Extent - Area / bounding box area
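
Several of these definitions can be checked by hand on a binary mask. The sketch below is a NumPy-only illustration with hypothetical helper names; it computes eccentricity from the second central moments of the pixel coordinates, which is one common convention and may differ from the method PathML uses.

```python
import numpy as np

def shape_features(mask: np.ndarray) -> dict:
    """Area, extent, and moment-based eccentricity of a single binary mask."""
    rows, cols = np.nonzero(mask)
    area = rows.size
    if area < 2:
        raise ValueError("mask must contain at least two pixels")
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    extent = area / (height * width)  # area / bounding box area
    # Eigenvalues of the coordinate covariance give the squared axis scales
    # of the ellipse with the same second moments.
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(np.vstack([rows, cols]))))
    ratio = eigvals[0] / eigvals[1] if eigvals[1] > 0 else 1.0
    eccentricity = np.sqrt(max(0.0, 1.0 - ratio))
    return {"area": int(area), "extent": float(extent),
            "eccentricity": float(eccentricity)}

def circularity(area: float, perimeter: float) -> float:
    """4 * pi * area / perimeter^2; equals 1 for an ideal circle."""
    return 4 * np.pi * area / perimeter ** 2

square = np.zeros((20, 20), dtype=bool)
square[5:15, 5:15] = True
bar = np.zeros((30, 30), dtype=bool)
bar[10:14, 5:25] = True

square_features = shape_features(square)
bar_features = shape_features(bar)
```

The square has eccentricity near 0 and extent 1, while the elongated bar has eccentricity close to 1. For an ideal square, `circularity` gives π/4, so values below 1 indicate departure from a circle.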

Intensity Features

Extract marker expression or intensity statistics:

from pathml.graph import extract_intensity_features

# Extract mean marker intensities per cell
intensity_features = extract_intensity_features(
    inst_map,
    image=multichannel_image,  # Shape: (H, W, C)
    channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
    statistics=['mean', 'std', 'median', 'max']
)

# Add to graph
cell_graph.add_node_features(
    intensity_features,
    feature_names=['DAPI_mean', 'CD3_mean', ...]
)

Available statistics:

mean - Average intensity
median - Median intensity
std - Standard deviation
max - Maximum intensity
min - Minimum intensity
quantile_25/75 - Quartiles
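
Per-cell mean intensity can be computed without looping over cells by using `np.bincount` with weights. The sketch below assumes an integer instance map with 0 as background and an (H, W, C) image; `mean_intensity_per_cell` is a hypothetical helper, not a PathML function.

```python
import numpy as np

def mean_intensity_per_cell(inst_map: np.ndarray, image: np.ndarray) -> np.ndarray:
    """Return an (n_cells, C) array; row i holds the channel means of label i + 1."""
    labels = inst_map.ravel()
    n_labels = int(labels.max()) + 1
    counts = np.bincount(labels, minlength=n_labels)
    means = np.zeros((n_labels, image.shape[-1]))
    for c in range(image.shape[-1]):
        # Sum of channel c over the pixels of each label.
        sums = np.bincount(labels, weights=image[..., c].ravel(), minlength=n_labels)
        means[:, c] = sums / np.maximum(counts, 1)  # avoid division by zero
    return means[1:]  # drop the background row

inst_map = np.array([[1, 1, 0],
                     [2, 2, 2]])
image = np.stack([
    np.array([[2.0, 4.0, 9.0], [1.0, 1.0, 1.0]]),  # channel 0
    np.array([[0.0, 0.0, 5.0], [3.0, 6.0, 9.0]]),  # channel 1
], axis=-1)
features = mean_intensity_per_cell(inst_map, image)
```

Other statistics such as the median or quantiles do not decompose into sums, so they typically require grouping pixels per label instead.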

Texture Features

Compute texture descriptors for each cell region:

import numpy as np
from pathml.graph import extract_texture_features

# Haralick texture features
texture_features = extract_texture_features(
    inst_map,
    image=grayscale_image,
    features='haralick',  # or 'lbp', 'gabor'
    distance=1,
    angles=[0, np.pi/4, np.pi/2, 3*np.pi/4]
)

cell_graph.add_node_features(texture_features)

Cell Type Annotations

Add cell type labels from classification:

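import numpy as np

# From ML model predictions
cell_types = hovernet_type_predictions  # Array of integer cell type IDs

cell_graph.add_node_features(
    cell_types,
    feature_names=['cell_type']
)

# One-hot encode cell types (row lookup in an identity matrix;
# assumes IDs are integers in the range 0..num_classes-1)
num_classes = 5
cell_type_onehot = np.eye(num_classes, dtype=np.float32)[cell_types]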
cell_graph.add_node_features(
    cell_type_onehot,
    feature_names=['type_epithelial', 'type_inflammatory', ...]
)

Edge Features

Spatial Distance

Compute edge features based on spatial relationships:

from pathml.graph import compute_edge_distances

# Add pairwise distances as edge features
distances = compute_edge_distances(
    cell_graph,
    metric='euclidean'  # or 'manhattan', 'chebyshev'
)

cell_graph.add_edge_features(distances, feature_names=['distance'])
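
The three metrics named above differ only in how the coordinate offsets along each edge are reduced. A minimal NumPy sketch, using made-up positions and an edge list rather than a PathML graph object:

```python
import numpy as np

# Hypothetical node positions (x, y) and an undirected edge list.
positions = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
edges = np.array([[0, 1], [1, 2], [0, 2]])

# Offset vector along each edge.
delta = positions[edges[:, 0]] - positions[edges[:, 1]]

euclidean = np.linalg.norm(delta, axis=1)  # straight-line distance
manhattan = np.abs(delta).sum(axis=1)      # sum of absolute offsets
chebyshev = np.abs(delta).max(axis=1)      # largest absolute offset
```

For these points the Euclidean distances are 5, 5, and 10, the Manhattan distances 7, 7, and 14, and the Chebyshev distances 4, 4, and 8.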

Interaction Features

Model biological interactions between cell types:

from pathml.graph import compute_interaction_features

# Cell type co-occurrence along edges
interaction_features = compute_interaction_features(
    cell_graph,
    cell_types=cell_type_labels,
    interaction_type='categorical'  # or 'numerical'
)

cell_graph.add_edge_features(interaction_features)

Graph-Level Features

Aggregate features for entire graph:

from pathml.graph import compute_graph_features

# Topological features
graph_features = compute_graph_features(
    cell_graph,
    features=[
        'num_nodes',
        'num_edges',
        'average_degree',
        'clustering_coefficient',
        'average_path_length',
        'diameter'
    ]
)

# Cell composition features
composition = cell_graph.compute_cell_type_composition(
    cell_type_labels,
    normalize=True  # Proportions
)

Spatial Analysis

Neighborhood Analysis

Analyze cell neighborhoods and microenvironments:

from pathml.graph import analyze_neighborhoods

# Characterize neighborhoods around each cell
neighborhoods = analyze_neighborhoods(
    cell_graph,
    cell_types=cell_type_labels,
    radius=100,  # Neighborhood radius
    metrics=['diversity', 'density', 'composition']
)

# Neighborhood diversity (Shannon entropy)
diversity = neighborhoods['diversity']

# Cell type composition in each neighborhood
composition = neighborhoods['composition']  # (n_cells, n_cell_types)
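
To illustrate what the composition and diversity metrics measure, the sketch below computes them directly from positions and integer type labels. It assumes that a neighborhood includes the cell itself and uses the natural logarithm; both are choices made here for illustration, and the helper names are hypothetical.

```python
import numpy as np

def neighborhood_composition(positions, cell_types, n_types, radius):
    """Return (n_cells, n_types) proportions of each type within `radius`."""
    diff = positions[:, None, :] - positions[None, :, :]
    within = np.sqrt((diff ** 2).sum(axis=-1)) <= radius  # includes the cell itself
    onehot = np.eye(n_types)[cell_types]
    counts = within.astype(float) @ onehot
    return counts / counts.sum(axis=1, keepdims=True)

def shannon_entropy(composition):
    """Shannon entropy per row, treating 0 * log(0) as 0."""
    safe = np.where(composition > 0, composition, 1.0)  # log(1) = 0
    return -(composition * np.log(safe)).sum(axis=1)

# Four mixed cells close together and one isolated cell (hypothetical data).
positions = np.array([[0, 0], [5, 0], [0, 5], [5, 5], [500, 500]], dtype=float)
cell_types = np.array([0, 0, 1, 1, 0])
composition = neighborhood_composition(positions, cell_types, n_types=2, radius=20)
diversity = shannon_entropy(composition)
```

The four mixed cells have an even two-type composition and therefore entropy ln 2, while the isolated cell sees only its own type and has entropy 0.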

Spatial Clustering

Identify spatial clusters of cell types:

from pathml.graph import spatial_clustering
import matplotlib.pyplot as plt

# Detect spatial clusters
clusters = spatial_clustering(
    cell_graph,
    cell_positions,
    method='dbscan',  # or 'kmeans', 'hierarchical'
    eps=50,  # DBSCAN: neighborhood radius
    min_samples=10  # DBSCAN: minimum cluster size
)

# Visualize clusters
plt.scatter(
    cell_positions[:, 0],
    cell_positions[:, 1],
    c=clusters,
    cmap='tab20'
)
plt.title('Spatial Clusters')
plt.show()

Cell-Cell Interaction Analysis

Test for enrichment or depletion of cell type interactions:

import matplotlib.pyplot as plt
from pathml.graph import cell_interaction_analysis

# Test for significant interactions
interaction_results = cell_interaction_analysis(
    cell_graph,
    cell_types=cell_type_labels,
    method='permutation',  # or 'expected'
    n_permutations=1000,
    significance_level=0.05
)

# Interaction scores (positive = attraction, negative = avoidance)
interaction_matrix = interaction_results['scores']

# Visualize with heatmap
import seaborn as sns
sns.heatmap(
    interaction_matrix,
    cmap='RdBu_r',
    center=0,
    xticklabels=cell_type_names,
    yticklabels=cell_type_names
)
plt.title('Cell-Cell Interaction Scores')
plt.show()
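
The permutation approach can be sketched as follows: count how often each pair of types shares an edge, shuffle the type labels while keeping the graph fixed, and express the observed count as a z-score against the shuffled counts. This is a generic illustration under those assumptions, with a hypothetical function name, and not necessarily the scoring that `cell_interaction_analysis` applies.

```python
import numpy as np

def interaction_zscores(edges, cell_types, n_types, n_permutations=1000, seed=0):
    """Z-scores of type-pair edge counts against a label-shuffling null model."""
    rng = np.random.default_rng(seed)

    def count_pairs(labels):
        counts = np.zeros((n_types, n_types))
        a, b = labels[edges[:, 0]], labels[edges[:, 1]]
        np.add.at(counts, (a, b), 1)
        np.add.at(counts, (b, a), 1)  # keep the matrix symmetric
        return counts

    observed = count_pairs(cell_types)
    null = np.stack([count_pairs(rng.permutation(cell_types))
                     for _ in range(n_permutations)])
    std = null.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant entries
    return (observed - null.mean(axis=0)) / std

# Two segregated rings of ten cells each, joined by a single cross edge.
ring = np.array([[i, (i + 1) % 10] for i in range(10)])
edges = np.vstack([ring, ring + 10, [[0, 10]]])
cell_types = np.array([0] * 10 + [1] * 10)
z = interaction_zscores(edges, cell_types, n_types=2, n_permutations=200)
```

In this segregated example the same-type entries are strongly positive and the cross-type entry is negative, matching the attraction and avoidance reading of the score matrix.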

Spatial Statistics

Compute spatial statistics and patterns:

import numpy as np
from pathml.graph import spatial_statistics

# Ripley's K function for spatial point patterns
ripleys_k = spatial_statistics(
    cell_positions,
    cell_types=cell_type_labels,
    statistic='ripleys_k',
    radii=np.linspace(0, 200, 50)
)

# Nearest neighbor distances
nn_distances = spatial_statistics(
    cell_positions,
    statistic='nearest_neighbor',
    by_cell_type=True
)
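
For reference, a naive estimator of Ripley's K is K(r) = A / (n(n - 1)) × (number of ordered pairs i ≠ j with distance ≤ r), where A is the area of the study region. The sketch below implements only this uncorrected form; the function name is hypothetical, and production analyses usually add an edge correction.

```python
import numpy as np

def ripley_k(points: np.ndarray, radii: np.ndarray, area: float) -> np.ndarray:
    """Naive Ripley's K estimate (no edge correction) at each radius."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[~np.eye(n, dtype=bool)]  # all ordered pairs with i != j
    counts = np.array([(pair_dist <= r).sum() for r in radii])
    return area * counts / (n * (n - 1))

# Four cells on the corners of a unit square inside a region of area 4.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
radii = np.array([0.5, 1.0, 1.5])
k_values = ripley_k(points, radii, area=4.0)
```

Under complete spatial randomness K(r) is approximately πr², so estimates above that curve suggest clustering at scale r and estimates below it suggest dispersion.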

Integration with Graph Neural Networks

Convert to PyTorch Geometric Format

import torch

# Convert to PyTorch Geometric Data object
pyg_data = cell_graph.to_pyg()

# Access components
x = pyg_data.x  # Node features (n_nodes, n_features)
edge_index = pyg_data.edge_index  # Edge connectivity (2, n_edges)
edge_attr = pyg_data.edge_attr  # Edge features (n_edges, n_edge_features)
y = pyg_data.y  # Graph-level label
pos = pyg_data.pos  # Node positions (n_nodes, 2)

# Use with PyTorch Geometric
from torch_geometric.nn import GCNConv

class GNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x

model = GNN(in_channels=pyg_data.num_features, hidden_channels=64, out_channels=5)
output = model(pyg_data)
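
PyTorch Geometric stores connectivity as an integer array of shape (2, n_edges), and an undirected graph is represented by listing each edge in both directions. The NumPy sketch below shows that layout for a plain edge list; `to_edge_index` is a hypothetical helper and not the conversion that `to_pyg()` performs.

```python
import numpy as np

def to_edge_index(undirected_edges) -> np.ndarray:
    """Convert (E, 2) undirected edges to a (2, 2 * E) bidirectional edge index."""
    edges = np.asarray(undirected_edges, dtype=np.int64)
    both = np.concatenate([edges, edges[:, ::-1]], axis=0)  # add reversed copies
    return np.ascontiguousarray(both.T)

edge_index = to_edge_index([[0, 1], [1, 2]])
```

The result can be wrapped with `torch.from_numpy(edge_index)` to obtain the long tensor that graph convolution layers expect.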

Graph Dataset for Multiple Slides

from pathml.graph import CellGraph
from torch_geometric.loader import DataLoader

# Create dataset of graphs from multiple slides
graphs = []
for slide in slides:
    # Build graph for each slide
    cell_graph = CellGraph.from_instance_map(slide.inst_map, ...)
    pyg_graph = cell_graph.to_pyg()
    graphs.append(pyg_graph)

# Create DataLoader
loader = DataLoader(graphs, batch_size=32, shuffle=True)

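# Train GNN (model, criterion, and optimizer are assumed to be defined elsewhere)
# Note: batch.y holds graph-level labels, so node-level outputs must be pooled
# per graph (e.g., with global_mean_pool) before the loss is computed.
model.train()
for batch in loader:
    optimizer.zero_grad()  # Clear gradients left over from the previous step
    output = model(batch)
    loss = criterion(output, batch.y)
    loss.backward()
    optimizer.step()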

Visualization

Graph Visualization

import matplotlib.pyplot as plt
import networkx as nx

# Convert to NetworkX
nx_graph = cell_graph.to_networkx()

# Draw graph with cell positions as layout
pos = {i: cell_graph.positions[i] for i in range(len(cell_graph.nodes))}

plt.figure(figsize=(12, 12))
nx.draw_networkx(
    nx_graph,
    pos=pos,
    node_color=cell_type_labels,
    node_size=50,
    cmap='tab10',
    with_labels=False,
    alpha=0.8
)
plt.axis('equal')
plt.title('Cell Graph')
plt.show()

Overlay on Tissue Image

import matplotlib.pyplot as plt
import numpy as np

# Visualize graph overlaid on tissue
fig, ax = plt.subplots(figsize=(15, 15))
ax.imshow(tissue_image)

# Draw edges
for edge in cell_graph.edges:
    node1, node2 = edge
    pos1 = cell_graph.positions[node1]
    pos2 = cell_graph.positions[node2]
    ax.plot([pos1[0], pos2[0]], [pos1[1], pos2[1]], 'b-', alpha=0.3, linewidth=0.5)

# Draw nodes colored by type
for cell_type in np.unique(cell_type_labels):
    mask = cell_type_labels == cell_type
    positions = cell_graph.positions[mask]
    ax.scatter(positions[:, 0], positions[:, 1], label=f'Type {cell_type}', s=20)

ax.legend()
ax.axis('off')
plt.title('Cell Graph on Tissue')
plt.show()

Complete Workflow Example

from pathml.core import CODEXSlide
from pathml.preprocessing import Pipeline, CollapseRunsCODEX, SegmentMIF
from pathml.graph import CellGraph, extract_morphology_features, extract_intensity_features
import matplotlib.pyplot as plt
import networkx as nx

# 1. Load and preprocess slide
slide = CODEXSlide('path/to/codex', stain='IF')

pipeline = Pipeline([
    CollapseRunsCODEX(z_slice=2),
    SegmentMIF(
        nuclear_channel='DAPI',
        cytoplasm_channel='CD45',
        model='mesmer'
    )
])
pipeline.run(slide)

# 2. Build cell graph
inst_map = slide.masks['cell_segmentation']
cell_graph = CellGraph.from_instance_map(
    inst_map,
    image=slide.image,
    connectivity='knn',
    k=6
)

# 3. Extract features
# Morphological features
morph_features = extract_morphology_features(
    inst_map,
    features=['area', 'perimeter', 'eccentricity', 'solidity']
)
cell_graph.add_node_features(morph_features)

# Intensity features (marker expression)
intensity_features = extract_intensity_features(
    inst_map,
    image=slide.image,
    channel_names=['DAPI', 'CD3', 'CD4', 'CD8', 'CD20'],
    statistics=['mean', 'std']
)
cell_graph.add_node_features(intensity_features)

# 4. Spatial analysis
# (cell_type_predictions: per-cell type labels from a classifier, assumed to be available)
from pathml.graph import analyze_neighborhoods

neighborhoods = analyze_neighborhoods(
    cell_graph,
    cell_types=cell_type_predictions,
    radius=100,
    metrics=['diversity', 'composition']
)

# 5. Export for GNN
pyg_data = cell_graph.to_pyg()

# 6. Visualize
plt.figure(figsize=(15, 15))
plt.imshow(slide.image)

# Overlay graph
nx_graph = cell_graph.to_networkx()
pos = {i: cell_graph.positions[i] for i in range(cell_graph.num_nodes)}
nx.draw_networkx(
    nx_graph,
    pos=pos,
    node_color=cell_type_predictions,
    cmap='tab10',
    node_size=30,
    with_labels=False
)
plt.axis('off')
plt.title('Cell Graph with Spatial Neighborhood')
plt.show()

Performance Considerations

Large tissue sections:

Build graphs tile-by-tile, then merge
Use sparse adjacency matrices
Leverage GPU for feature extraction
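
Merging tile-level graphs mostly comes down to shifting positions by each tile's origin and shifting node indices by the number of nodes already merged. The sketch below assumes each tile is given as (positions, edges, origin), a format invented here for illustration. It does not create edges between cells in different tiles, which would need overlapping tiles or a second pass along tile borders.

```python
import numpy as np

def merge_tile_graphs(tiles):
    """Merge per-tile graphs into one set of global positions and edges.

    tiles: list of (positions, edges, origin), where positions is (n, 2) in
    tile coordinates, edges is (m, 2) with tile-local node indices, and
    origin is the (x, y) offset of the tile in the slide.
    """
    all_positions, all_edges = [], []
    node_offset = 0
    for positions, edges, origin in tiles:
        all_positions.append(np.asarray(positions) + np.asarray(origin))
        all_edges.append(np.asarray(edges) + node_offset)  # re-index nodes globally
        node_offset += len(positions)
    return np.vstack(all_positions), np.vstack(all_edges)

tile_a = (np.array([[1, 1], [2, 2]]), np.array([[0, 1]]), (0, 0))
tile_b = (np.array([[0, 0], [5, 5], [3, 3]]), np.array([[0, 1], [1, 2]]), (100, 0))
positions, edges = merge_tile_graphs([tile_a, tile_b])
```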

Memory efficiency:

Store only necessary edge features
Use int32/float32 instead of int64/float64
Batch process multiple slides

Computational efficiency:

Parallelize feature extraction across cells
Use KNN for faster neighbor queries
Cache computed features

Best Practices

Choose appropriate connectivity: KNN for uniform analysis, radius for physical interactions, contact for direct cell-cell communication
Normalize features: Scale morphological and intensity features for GNN compatibility
Handle edge effects: Exclude boundary cells or use tissue masks to define valid regions
Validate graph construction: Visualize graphs on small regions before large-scale processing
Combine multiple feature types: Morphology + intensity + texture provides rich representations
Consider tissue context: Tissue type affects appropriate graph parameters (connectivity, radius)

Common Issues and Solutions

Issue: Too many/few edges

Adjust k (KNN) or radius (radius-based) parameters
Verify pixel-to-micron conversion for biological relevance
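
Since the radius parameter is given in pixels, a biologically motivated distance has to be divided by the image resolution first. A small sketch of that arithmetic, where the 0.25 microns-per-pixel value is only an example and the function name is hypothetical:

```python
def microns_to_pixels(distance_um: float, microns_per_pixel: float) -> float:
    """Convert a physical distance in microns to a distance in pixels."""
    if microns_per_pixel <= 0:
        raise ValueError("microns_per_pixel must be positive")
    return distance_um / microns_per_pixel

# A 50 micron interaction range at an example resolution of 0.25 microns per pixel.
radius_px = microns_to_pixels(50.0, microns_per_pixel=0.25)
```

Here a 50 micron range corresponds to 200 pixels, so a radius of 50 passed directly in pixels would cover only a quarter of the intended distance.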

Issue: Memory errors with large graphs

Process tiles separately and merge graphs
Use sparse matrix representations
Reduce edge features to essential ones

Issue: Missing cells at tissue boundaries

Apply edge_correction parameter
Use tissue masks to exclude invalid regions

Issue: Inconsistent feature scales

Normalize features: (x - mean) / std
Use robust scaling for outliers
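
Both scaling options can be written in a few lines of NumPy. In this sketch, robust scaling is taken to mean centering on the median and dividing by the interquartile range, which is one common definition; the function names and the toy feature matrix are assumptions.

```python
import numpy as np

def zscore(features: np.ndarray) -> np.ndarray:
    """Standardize each column to zero mean and unit standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / np.where(std > 0, std, 1.0)

def robust_scale(features: np.ndarray) -> np.ndarray:
    """Center each column on its median and divide by its interquartile range."""
    median = np.median(features, axis=0)
    q25, q75 = np.percentile(features, [25, 75], axis=0)
    iqr = q75 - q25
    return (features - median) / np.where(iqr > 0, iqr, 1.0)

# The first column contains an outlier (100); the second does not.
features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [100.0, 50.0]])
standardized = zscore(features)
robust = robust_scale(features)
```

With z-scoring, the outlier inflates the standard deviation and compresses the other four values of the first column into a narrow range. With robust scaling they keep a spread comparable to the second column, and only the outlier itself maps to a large value.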

Additional Resources

PathML Graph API: https://pathml.readthedocs.io/en/latest/api_graph_reference.html
PyTorch Geometric: https://pytorch-geometric.readthedocs.io/
NetworkX: https://networkx.org/
Spatial Statistics: Baddeley et al., "Spatial Point Patterns: Methodology and Applications with R"
