Models and Architectures
Overview
TorchDrug provides a comprehensive collection of pre-built model architectures for various graph-based learning tasks. This reference catalogs all available models with their characteristics, use cases, and implementation details.
Graph Neural Networks
GCN (Graph Convolutional Network)
Type: Spatial message passing
Paper: Semi-Supervised Classification with Graph Convolutional Networks (Kipf & Welling, 2017)
Characteristics:
- Simple and efficient aggregation
- Normalized adjacency matrix convolution
- Works well for homophilic graphs
- Good baseline for many tasks
Best For:
- Initial experiments and baselines
- When computational efficiency is important
- Graphs with clear local structure
Parameters:
- input_dim: Node feature dimension
- hidden_dims: List of hidden layer dimensions
- edge_input_dim: Edge feature dimension (optional)
- batch_norm: Apply batch normalization
- activation: Activation function (relu, elu, etc.)
- dropout: Dropout rate
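A minimal instantiation and smoke-test sketch, assuming TorchDrug's `models.GCN`; the caffeine SMILES and layer sizes are arbitrary placeholders, and `dropout` is omitted in case your installed release does not expose it as a constructor argument:

```python
from torchdrug import data, models

# Build a one-molecule batch to exercise the forward pass (caffeine)
mol = data.Molecule.from_smiles("Cn1cnc2c1c(=O)n(C)c(=O)n2C")
batch = data.Molecule.pack([mol])

model = models.GCN(input_dim=mol.node_feature.shape[-1],
                   hidden_dims=[256, 256, 256],
                   batch_norm=True, activation="relu")
output = model(batch, batch.node_feature.float())
print(output["graph_feature"].shape)  # (1, 256)
print(output["node_feature"].shape)   # (num_atoms, 256)
```

The same `model(graph, graph.node_feature.float())` calling convention applies to the other GNN encoders below.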
Use Cases:
- Molecular property prediction
- Citation network classification
- Social network analysis
GAT (Graph Attention Network)
Type: Attention-based message passing
Paper: Graph Attention Networks (Veličković et al., 2018)
Characteristics:
- Learns attention weights for neighbors
- Different importance for different neighbors
- Multi-head attention for robustness
- Handles varying node degrees naturally
Best For:
- When neighbor importance varies
- Heterogeneous graphs
- Interpretable predictions
Parameters:
- input_dim, hidden_dims: Standard dimensions
- num_heads: Number of attention heads
- negative_slope: LeakyReLU slope
- concat: Concatenate or average multi-head outputs
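A minimal sketch with placeholder dimensions; note that in the TorchDrug releases I am aware of, the multi-head count is passed as `num_head` (singular), so verify the keyword against your installed version:

```python
from torchdrug import models

# Placeholder input_dim; in practice use dataset.node_feature_dim
model = models.GAT(input_dim=69, hidden_dims=[256, 256],
                   num_head=4, negative_slope=0.2, batch_norm=True)
```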
Use Cases:
- Protein-protein interaction prediction
- Molecule generation with attention to reactive sites
- Knowledge graph reasoning with relation importance
GIN (Graph Isomorphism Network)
Type: Maximally powerful message passing
Paper: How Powerful are Graph Neural Networks? (Xu et al., 2019)
Characteristics:
- As expressive as the 1-WL test, the theoretical upper bound for message-passing GNNs
- Injective aggregation function
- Can distinguish graph structures GCN cannot
- Often best performance on molecular tasks
Best For:
- Molecular property prediction (state-of-the-art)
- Tasks requiring structural discrimination
- Graph classification
Parameters:
- input_dim, hidden_dims: Standard dimensions
- edge_input_dim: Include edge features
- batch_norm: Typically set to true
- readout: Graph pooling ("sum", "mean", "max")
- eps: Learnable or fixed epsilon
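A typical molecular-property configuration, with placeholder feature dimensions (take them from `dataset.node_feature_dim` and `dataset.edge_feature_dim` in practice):

```python
from torchdrug import models

model = models.GIN(input_dim=69, edge_input_dim=18,
                   hidden_dims=[256, 256, 256, 256],
                   batch_norm=True, readout="sum", learn_eps=True)
```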
Use Cases:
- Drug property prediction (BBBP, HIV, etc.)
- Molecular generation
- Reaction prediction
RGCN (Relational Graph Convolutional Network)
Type: Multi-relational message passing
Paper: Modeling Relational Data with Graph Convolutional Networks (Schlichtkrull et al., 2018)
Characteristics:
- Handles multiple edge/relation types
- Relation-specific weight matrices
- Basis decomposition for parameter efficiency
- Essential for knowledge graphs
Best For:
- Knowledge graph reasoning
- Heterogeneous molecular graphs
- Multi-relational data
Parameters:
- num_relation: Number of relation types
- hidden_dims: Layer dimensions
- num_bases: Basis decomposition (reduces parameter count)
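A minimal sketch sticking to the core arguments; `num_relation` should come from the data (e.g. the number of bond types for molecules), and `num_bases` is left out in case your installed release does not expose it:

```python
from torchdrug import models

# Placeholder dims; num_relation=4 would cover single/double/triple/aromatic bonds
model = models.RGCN(input_dim=69, hidden_dims=[256, 256, 256],
                    num_relation=4, batch_norm=True)
```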
Use Cases:
- Knowledge graph completion
- Retrosynthesis (different bond types)
- Protein interaction networks
MPNN (Message Passing Neural Network)
Type: General message passing framework
Paper: Neural Message Passing for Quantum Chemistry (Gilmer et al., 2017)
Characteristics:
- Flexible message and update functions
- Edge features in message computation
- GRU updates for node hidden states
- Set2Set readout for graph representation
Best For:
- Quantum chemistry predictions
- Tasks with important edge information
- When node states evolve over multiple iterations
Parameters:
- input_dim, hidden_dim: Feature dimensions
- edge_input_dim: Edge feature dimension
- num_layer: Message passing iterations
- num_mlp_layer: MLP layers in the message function
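A minimal sketch with placeholder dimensions; unlike the list-based encoders above, MPNN keeps a single hidden size across its message passing iterations:

```python
from torchdrug import models

model = models.MPNN(input_dim=69, hidden_dim=256, edge_input_dim=18,
                    num_layer=3, num_mlp_layer=2)
```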
Use Cases:
- QM9 quantum property prediction
- Molecular dynamics
- 3D conformation-aware tasks
SchNet (Continuous-Filter Convolutional Network)
Type: 3D geometry-aware convolution
Paper: SchNet: A Continuous-Filter Convolutional Neural Network for Modeling Quantum Interactions (Schütt et al., 2017)
Characteristics:
- Operates on 3D atomic coordinates
- Continuous filter convolutions
- Rotation and translation invariant
- Excellent for quantum chemistry
Best For:
- 3D molecular structure tasks
- Quantum property prediction
- Protein structure analysis
- Energy and force prediction
Parameters:
- input_dim: Atom features
- hidden_dims: Layer dimensions
- num_gaussian: RBF basis functions for distances
- cutoff: Interaction cutoff distance
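A minimal sketch with placeholder dimensions; SchNet reads 3D coordinates from the graph, so the input molecules must carry conformers:

```python
from torchdrug import models

# cutoff is in the same unit as the atomic coordinates (typically angstroms)
model = models.SchNet(input_dim=69, hidden_dims=[256, 256, 256],
                      num_gaussian=100, cutoff=5.0)
```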
Use Cases:
- QM9 property prediction
- Molecular dynamics simulations
- Protein-ligand binding with structures
- Crystal property prediction
ChebNet (Chebyshev Spectral CNN)
Type: Spectral convolution
Paper: Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (Defferrard et al., 2016)
Characteristics:
- Spectral graph convolution
- Chebyshev polynomial approximation
- Captures global graph structure
- Computationally efficient
Best For:
- Tasks requiring global information
- When graph Laplacian is informative
- Theoretical analysis
Parameters:
- input_dim, hidden_dims: Dimensions
- num_cheb: Order of Chebyshev polynomial
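A minimal sketch with placeholder dimensions; the polynomial order is listed above as num_cheb, but in the TorchDrug release I am aware of the keyword is simply `k`, so check your installed version:

```python
from torchdrug import models

model = models.ChebNet(input_dim=69, hidden_dims=[256, 256], k=3)
```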
Use Cases:
- Citation network classification
- Brain network analysis
- Signal processing on graphs
NFP (Neural Fingerprint)
Type: Molecular fingerprint learning
Paper: Convolutional Networks on Graphs for Learning Molecular Fingerprints (Duvenaud et al., 2015)
Characteristics:
- Learns differentiable molecular fingerprints
- Alternative to hand-crafted fingerprints (ECFP)
- Circular convolutions like ECFP
- Interpretable learned features
Best For:
- Molecular similarity learning
- Property prediction with limited data
- When interpretability is important
Parameters:
- input_dim, output_dim: Feature dimensions
- hidden_dims: Layer dimensions
- num_layer: Circular convolution depth
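A minimal sketch with placeholder dimensions; `output_dim` sets the fingerprint length (2048 mirrors common ECFP settings):

```python
from torchdrug import models

model = models.NeuralFingerprint(input_dim=69, output_dim=2048,
                                 hidden_dims=[256, 256])
```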
Use Cases:
- Virtual screening
- Molecular similarity search
- QSAR modeling
Protein-Specific Models
GearNet (Geometry-Aware Relational Graph Network)
Type: Protein structure encoder
Paper: Protein Representation Learning by Geometric Structure Pretraining (Zhang et al., 2023)
Characteristics:
- Incorporates 3D geometric information
- Multiple edge types (sequential, spatial, KNN)
- Designed specifically for proteins
- State-of-the-art on protein tasks
Best For:
- Protein structure prediction
- Protein function prediction
- Protein-protein interaction
- Any task with protein 3D structures
Parameters:
- input_dim: Residue features
- hidden_dims: Layer dimensions
- num_relation: Edge types (sequence, radius, KNN)
- edge_input_dim: Geometric features (distances, angles)
- batch_norm: Typically true
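A sketch mirroring the GearNet-Edge configuration from the TorchDrug tutorial (21 residue types, 7 relation types); the input protein graphs must be built with a matching graph-construction step that adds the sequential, radius, and KNN edges:

```python
from torchdrug import models

model = models.GearNet(input_dim=21,
                       hidden_dims=[512, 512, 512, 512, 512, 512],
                       num_relation=7, edge_input_dim=59, num_angle_bin=8,
                       batch_norm=True, short_cut=True, readout="sum")
```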
Use Cases:
- Enzyme function prediction (EnzymeCommission)
- Protein fold recognition
- Contact prediction
- Binding site identification
ESM (Evolutionary Scale Modeling)
Type: Protein language model (transformer)
Paper: Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences (Rives et al., 2021)
Characteristics:
- Pre-trained on 250M+ protein sequences
- Captures evolutionary and structural information
- Transformer architecture
- Transfer learning for downstream tasks
Best For:
- Any sequence-based protein task
- When no structure available
- Transfer learning with limited data
Variants:
- ESM-1b: 650M parameters
- ESM-2: Multiple sizes (8M to 15B parameters)
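A minimal loading sketch; the weight directory is a placeholder, and TorchDrug downloads the pre-trained checkpoint into it on first use:

```python
from torchdrug import models

model = models.ESM(path="~/esm-model-weights/", model="ESM-1b")
```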
Use Cases:
- Protein function prediction
- Variant effect prediction
- Protein design
- Structure prediction (ESMFold)
ProteinBERT
Type: Masked language model for proteins
Characteristics:
- BERT-style pre-training
- Masked amino acid prediction
- Bidirectional context
- Good for sequence-based tasks
Use Cases:
- Function annotation
- Subcellular localization
- Stability prediction
ProteinCNN / ProteinResNet
Type: Convolutional networks for sequences
Characteristics:
- 1D convolutions on sequences
- Local pattern recognition
- Faster than transformers
- Good for motif detection
Use Cases:
- Binding site prediction
- Secondary structure prediction
- Domain identification
ProteinLSTM
Type: Recurrent network for sequences
Characteristics:
- Bidirectional LSTM
- Captures long-range dependencies
- Sequential processing
- Good baseline for sequence tasks
Use Cases:
- Order prediction
- Sequential annotation
- Time-series protein data
Knowledge Graph Models
TransE (Translation Embedding)
Type: Translation-based embedding
Paper: Translating Embeddings for Modeling Multi-relational Data (Bordes et al., 2013)
Characteristics:
- h + r ≈ t (head + relation ≈ tail)
- Simple and interpretable
- Works well for 1-to-1 relations
- Memory efficient
Best For:
- Large knowledge graphs
- Initial experiments
- Interpretable embeddings
Parameters:
- num_entity, num_relation: Graph size
- embedding_dim: Embedding dimension (typically 50-500)
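A minimal sketch pairing TransE with a bundled benchmark; `tasks.KnowledgeGraphCompletion` wires up negative sampling and ranking metrics:

```python
from torchdrug import datasets, models, tasks

dataset = datasets.FB15k237("~/kg-datasets/")
model = models.TransE(num_entity=dataset.num_entity,
                      num_relation=dataset.num_relation,
                      embedding_dim=256)
task = tasks.KnowledgeGraphCompletion(model, num_negative=128)
```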
RotatE (Rotation Embedding)
Type: Rotation in complex space
Paper: RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space (Sun et al., 2019)
Characteristics:
- Relations as rotations in complex space
- Handles symmetric, antisymmetric, inverse, composition
- State-of-the-art on many benchmarks
Best For:
- Most knowledge graph tasks
- Complex relation patterns
- When accuracy is critical
Parameters:
- num_entity, num_relation: Graph size
- embedding_dim: Must be even (complex embeddings)
- max_score: Score clipping value
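A minimal sketch; the entity and relation counts below match FB15k-237, and the even `embedding_dim` is required because half the dimensions hold real parts and half hold imaginary parts:

```python
from torchdrug import models

model = models.RotatE(num_entity=14541, num_relation=237,
                      embedding_dim=1024, max_score=9)
```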
DistMult
Type: Bilinear model
Characteristics:
- Symmetric relation modeling
- Fast and efficient
- Cannot model antisymmetric relations
Best For:
- Symmetric relations (e.g., "similar to")
- When speed is critical
- Large-scale graphs
ComplEx
Type: Complex-valued embeddings
Characteristics:
- Handles asymmetric and symmetric relations
- Better than DistMult for most graphs
- Good balance of expressiveness and efficiency
Best For:
- General knowledge graph completion
- Mixed relation types
- When RotatE is too complex
SimplE
Type: Enhanced embedding model
Characteristics:
- Two embeddings per entity (canonical + inverse)
- Fully expressive
- Slightly more parameters than ComplEx
Best For:
- When full expressiveness needed
- Inverse relations are important
Generative Models
GraphAutoregressiveFlow
Type: Normalizing flow for molecules
Characteristics:
- Exact likelihood computation
- Invertible transformations
- Stable training (no adversarial objective)
- Conditional generation support
Best For:
- Molecular generation
- Density estimation
- Interpolation between molecules
Parameters:
- input_dim: Atom features
- hidden_dims: Coupling layer dimensions
- num_flow: Number of flow transformations
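A condensed sketch following the TorchDrug molecule-generation tutorial: an RGCN backbone over bond types, Gaussian priors over the atom and bond vocabularies, and separate node and edge flows (the vocabulary sizes are placeholders for a kekulized dataset such as ZINC250k):

```python
import torch
from torchdrug import layers, models

num_atom_type = 9        # placeholder atom vocabulary size
num_bond_type = 3 + 1    # bond types plus one "no edge" class

backbone = models.RGCN(input_dim=num_atom_type, num_relation=3,
                       hidden_dims=[256, 256, 256], batch_norm=True)
node_prior = layers.distribution.IndependentGaussian(
    torch.zeros(num_atom_type), torch.ones(num_atom_type))
edge_prior = layers.distribution.IndependentGaussian(
    torch.zeros(num_bond_type), torch.ones(num_bond_type))

node_flow = models.GraphAF(backbone, node_prior, num_layer=12)
edge_flow = models.GraphAF(backbone, edge_prior, use_edge=True, num_layer=12)
```

The two flows are then combined in `tasks.AutoregressiveGeneration` for training and sampling.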
Use Cases:
- De novo drug design
- Chemical space exploration
- Property-targeted generation
Pre-training Models
InfoGraph
Type: Contrastive learning
Characteristics:
- Maximizes mutual information
- Graph-level and node-level contrast
- Unsupervised pre-training
- Good for small datasets
Use Cases:
- Pre-train molecular encoders
- Few-shot learning
- Transfer learning
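A minimal pre-training sketch with placeholder dimensions: InfoGraph wraps an existing encoder, and `tasks.Unsupervised` trains it without labels so the wrapped GIN can be reused downstream:

```python
from torchdrug import models, tasks

gin = models.GIN(input_dim=69, hidden_dims=[256, 256, 256, 256],
                 batch_norm=True, readout="mean")
model = models.InfoGraph(gin, separate_model=False)
task = tasks.Unsupervised(model)
```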
MultiviewContrast
Type: Multi-view contrastive learning for proteins
Characteristics:
- Contrasts different views of proteins
- Geometric pre-training
- Uses 3D structure information
- Excellent for protein models
Use Cases:
- Pre-train GearNet on protein structures
- Transfer to property prediction
- Limited labeled data scenarios
Model Selection Guide
By Task Type
Molecular Property Prediction:
- GIN (first choice)
- GAT (interpretability)
- SchNet (3D available)
Protein Tasks:
- ESM (sequence only)
- GearNet (structure available)
- ProteinBERT (sequence, lighter than ESM)
Knowledge Graphs:
- RotatE (best performance)
- ComplEx (good balance)
- TransE (large graphs, efficiency)
Molecular Generation:
- GraphAutoregressiveFlow (exact likelihood)
- GCPN with GIN backbone (property optimization)
Retrosynthesis:
- GIN (synthon completion)
- RGCN (center identification with bond types)
By Dataset Size
Small (< 1k samples):
- Use pre-trained models (ESM for proteins)
- Simpler architectures (GCN, ProteinCNN)
- Heavy regularization
Medium (1k-100k samples):
- GIN for molecules
- GAT for interpretability
- Standard training
Large (> 100k samples):
- Any model works
- Deeper architectures
- Can train from scratch
By Computational Budget
Low:
- GCN (simplest)
- DistMult (KG)
- ProteinLSTM
Medium:
- GIN
- GAT
- ComplEx
High:
- ESM (large)
- SchNet (3D)
- RotatE with high dim
Implementation Tips
- Start Simple: Begin with GCN or GIN baseline
- Use Pre-trained: ESM for proteins, InfoGraph for molecules
- Tune Depth: 3-5 layers typically sufficient
- Batch Normalization: Usually helps (except KG embeddings)
- Residual Connections: Important for deep networks
- Readout Function: "mean" usually works well
- Edge Features: Include when available (bonds, distances)
- Regularization: Dropout, weight decay, early stopping
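As a concrete starting point, a configuration that reflects these tips: moderate depth, batch normalization, residual connections, mean readout, and edge features (the dimensions are placeholders; take them from `dataset.node_feature_dim` and `dataset.edge_feature_dim`):

```python
from torchdrug import models

model = models.GIN(input_dim=69, edge_input_dim=18,
                   hidden_dims=[256, 256, 256, 256],  # 4 layers
                   batch_norm=True,    # usually helps for GNN encoders
                   short_cut=True,     # residual connections
                   readout="mean")     # robust default readout
```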