zhongwei/gh-k-dense-ai-claude-scientific-skills-scientific-skills

Zhongwei Li f0bd18fb4e Initial commit

2025-11-30 08:30:10 +08:00

DiffDock Configuration Parameters Reference

This reference lists the DiffDock inference configuration parameters and command-line options, with their default values.

Model & Checkpoint Settings

Model Paths

--model_dir: Directory containing the score model checkpoint
- Default: ./workdir/v1.1/score_model
- DiffDock-L model (current default)
--confidence_model_dir: Directory containing the confidence model checkpoint
- Default: ./workdir/v1.1/confidence_model
--ckpt: Name of the score model checkpoint file
- Default: best_ema_inference_epoch_model.pt
--confidence_ckpt: Name of the confidence model checkpoint file
- Default: best_model_epoch75.pt
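
Before a long run it can help to confirm that both checkpoint files are where these defaults point. The sketch below assumes each checkpoint lives at the directory joined with the file name; the helper names `checkpoint_path` and `missing_checkpoints` are hypothetical and not part of DiffDock.

```python
from pathlib import Path

# Defaults copied from the list above.
MODEL_DIR = "./workdir/v1.1/score_model"
CKPT = "best_ema_inference_epoch_model.pt"
CONFIDENCE_MODEL_DIR = "./workdir/v1.1/confidence_model"
CONFIDENCE_CKPT = "best_model_epoch75.pt"


def checkpoint_path(model_dir, ckpt):
    """Join a checkpoint directory and file name (assumed on-disk layout)."""
    return Path(model_dir) / ckpt


def missing_checkpoints(pairs):
    """Return the checkpoint paths from (directory, file name) pairs that are not files."""
    paths = [checkpoint_path(model_dir, ckpt) for model_dir, ckpt in pairs]
    return [path for path in paths if not path.is_file()]


if __name__ == "__main__":
    pairs = [(MODEL_DIR, CKPT), (CONFIDENCE_MODEL_DIR, CONFIDENCE_CKPT)]
    for path in missing_checkpoints(pairs):
        print(f"missing checkpoint: {path}")
```

An empty result means both files were found relative to the current working directory.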

Model Version Flags

--old_score_model: Use original DiffDock model instead of DiffDock-L
- Default: false (uses DiffDock-L)
--old_filtering_model: Use legacy confidence filtering approach
- Default: true

Input/Output Options

Input Specification

--protein_path: Path to protein PDB file
- Example: --protein_path protein.pdb
- Alternative to --protein_sequence
--protein_sequence: Amino acid sequence for ESMFold folding
- Automatically generates protein structure from sequence
- Alternative to --protein_path
--ligand: Ligand specification (SMILES string or file path)
- SMILES string: --ligand "COc(cc1)ccc1C#N"
- File path: --ligand ligand.sdf (.sdf and .mol2 files are accepted)
--protein_ligand_csv: CSV file for batch processing
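- Required columns: complex_name, protein_path, ligand_description, protein_sequence
- Per row, fill in either protein_path or protein_sequence (they are alternatives, as with the single-complex flags)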
- Example: --protein_ligand_csv data/protein_ligand_example.csv
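
A batch file with the four columns above can be produced with Python's standard `csv` module. This is a minimal sketch: the file name `batch_input.csv`, the complex names, and the short sequence are made up for illustration, and leaving the unused protein column empty is an assumption about how the alternatives are expressed.

```python
import csv

COLUMNS = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]


def write_batch_csv(path, rows):
    """Write row dicts to a batch CSV; columns missing from a row are left empty."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)


rows = [
    # Structure given as a PDB file, ligand given as a SMILES string.
    {"complex_name": "example_pdb", "protein_path": "protein.pdb",
     "ligand_description": "COc(cc1)ccc1C#N"},
    # No structure file: a hypothetical sequence to fold, ligand given as a file.
    {"complex_name": "example_seq", "ligand_description": "ligand.sdf",
     "protein_sequence": "MKTAYIAKQR"},
]

if __name__ == "__main__":
    write_batch_csv("batch_input.csv", rows)
```

The resulting file can then be passed with --protein_ligand_csv batch_input.csv.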

Output Control

--out_dir: Output directory for predictions
- Example: --out_dir results/user_predictions/
--save_visualisation: Export predicted molecules as SDF files
- Enables visualization of results
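
For scripted runs, the input and output flags above can be assembled into an argument list before being handed to `subprocess.run`. The sketch below only builds the list and prints it. `build_inference_command` is a hypothetical helper, and rejecting calls that pass both or neither protein input is this sketch's own rule, not documented DiffDock behaviour.

```python
import shlex


def build_inference_command(out_dir, ligand, protein_path=None, protein_sequence=None,
                            config="default_inference_args.yaml"):
    """Build the argument list for a single-complex run (illustrative only)."""
    # Assumption: exactly one of the two protein inputs is supplied.
    if (protein_path is None) == (protein_sequence is None):
        raise ValueError("provide exactly one of protein_path or protein_sequence")
    cmd = ["python", "-m", "inference", "--config", config]
    if protein_path is not None:
        cmd += ["--protein_path", protein_path]
    else:
        cmd += ["--protein_sequence", protein_sequence]
    cmd += ["--ligand", ligand, "--out_dir", out_dir]
    return cmd


cmd = build_inference_command("results/user_predictions/", "COc(cc1)ccc1C#N",
                              protein_path="protein.pdb")
# shlex.join quotes the SMILES string so the line is safe to paste into a shell.
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems with SMILES strings, which often contain parentheses and `#`.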

Inference Parameters

Diffusion Steps

--inference_steps: Number of planned inference iterations
- Default: 20
- Higher values may improve accuracy but increase runtime
--actual_steps: Number of diffusion steps actually executed (by default one fewer than --inference_steps)
- Default: 19
--no_final_step_noise: Omit noise at the final diffusion step
- Default: true
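
One way to read the two step counts together: --inference_steps fixes how the time axis is discretised, and --actual_steps says how many of those steps are run. The sketch below illustrates that reading with a plain linear schedule. The linear spacing and both function names are assumptions for illustration; the default schedule type is expbeta, which spaces the steps differently.

```python
def linear_t_schedule(inference_steps):
    """Times from 1 down towards 0, one per planned step (illustrative linear spacing)."""
    return [1 - i / inference_steps for i in range(inference_steps)]


def executed_times(inference_steps=20, actual_steps=19):
    """Keep only the first actual_steps entries of the planned schedule."""
    if actual_steps > inference_steps:
        raise ValueError("actual_steps cannot exceed inference_steps")
    return linear_t_schedule(inference_steps)[:actual_steps]


times = executed_times()
print(len(times), round(times[0], 2), round(times[-1], 2))  # prints: 19 1.0 0.1
```

Under this reading, the defaults skip the last and smallest step of the planned schedule.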

Sampling Settings

--samples_per_complex: Number of samples to generate per complex
- Default: 10
- More samples provide better coverage but increase computation
--sigma_schedule: Noise schedule type
- Default: expbeta (exponential-beta)
--initial_noise_std_proportion: Initial noise standard deviation scaling
- Default: 1.46

Temperature Parameters

Sampling Temperatures (control the diversity of predictions)

--temp_sampling_tr: Translation sampling temperature
- Default: 1.17
--temp_sampling_rot: Rotation sampling temperature
- Default: 2.06
--temp_sampling_tor: Torsion sampling temperature
- Default: 7.04

Psi Temperatures

--temp_psi_tr: Translation psi temperature
- Default: 0.73
--temp_psi_rot: Rotation psi temperature
- Default: 0.90
--temp_psi_tor: Torsion psi temperature
- Default: 0.59

Sigma Data Temperatures

--temp_sigma_data_tr: Translation data distribution scaling
- Default: 0.93
--temp_sigma_data_rot: Rotation data distribution scaling
- Default: 0.75
--temp_sigma_data_tor: Torsion data distribution scaling
- Default: 0.69
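
The nine temperature flags follow one naming pattern: three settings (sampling, psi, sigma_data) for each of three components (tr, rot, tor). A small table in code makes the defaults easier to compare and override together. `ComponentTemperatures` and `temperature_flags` are hypothetical names used only for this sketch; the values are the defaults listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentTemperatures:
    sampling: float
    psi: float
    sigma_data: float


# Defaults copied from the lists above, keyed by component suffix.
DEFAULT_TEMPERATURES = {
    "tr": ComponentTemperatures(sampling=1.17, psi=0.73, sigma_data=0.93),
    "rot": ComponentTemperatures(sampling=2.06, psi=0.90, sigma_data=0.75),
    "tor": ComponentTemperatures(sampling=7.04, psi=0.59, sigma_data=0.69),
}


def temperature_flags(temps=DEFAULT_TEMPERATURES):
    """Expand the table into command-line flags using the documented names."""
    flags = []
    for component, values in temps.items():
        flags += [
            f"--temp_sampling_{component}", str(values.sampling),
            f"--temp_psi_{component}", str(values.psi),
            f"--temp_sigma_data_{component}", str(values.sigma_data),
        ]
    return flags


if __name__ == "__main__":
    print(" ".join(temperature_flags()))
```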

Processing Options

Performance

--batch_size: Processing batch size
- Default: 10
- Larger values increase throughput but require more memory
--tqdm: Enable progress bar visualization
- Useful for monitoring long-running jobs
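
To reason about memory and throughput, it can help to see how --samples_per_complex and --batch_size interact. The sketch assumes the samples for one complex are split into consecutive batches of at most batch_size; that splitting rule and the `batch_sizes` helper are illustrative assumptions, not taken from the DiffDock source.

```python
def batch_sizes(samples_per_complex=10, batch_size=10):
    """Sizes of the batches needed to cover all samples of one complex."""
    if samples_per_complex < 0 or batch_size < 1:
        raise ValueError("samples_per_complex must be >= 0 and batch_size >= 1")
    full, rest = divmod(samples_per_complex, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


print(batch_sizes())        # defaults: a single batch of 10
print(batch_sizes(40, 16))  # two full batches and one partial batch
```

With the defaults, all ten samples fit in one batch. Raising --samples_per_complex without raising --batch_size adds batches rather than memory.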

Protein Structure

--chain_cutoff: Maximum number of protein chains to process
- Example: --chain_cutoff 10
- Useful for large multi-chain complexes
--esm_embeddings_path: Path to pre-computed ESM2 protein embeddings
- Speeds up inference by reusing embeddings
- Optional optimization

Dataset Options

--split: Dataset split to use (train/test/val)
- Used for evaluation on standard benchmarks

Advanced Flags

Debugging & Testing

--no_model: Disable model inference (debugging)
- Default: false
--no_random: Disable randomization
- Default: false
- Useful for reproducibility testing

Alternative Sampling

--ode: Use ODE solver instead of SDE
- Default: false
- Deterministic update rule (no noise injected at each step), as an alternative to the default SDE sampling
--different_schedules: Use different noise schedules per component
- Default: false

Error Handling

--limit_failures: Maximum allowed failures before stopping
- Default: 5
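
A failure limit of this kind is typically a counter around the per-complex loop. The sketch below shows one plausible behaviour, and it is not DiffDock's implementation: `run_with_failure_limit` and the `dock` callable are hypothetical, and stopping once the count exceeds the limit (rather than when it reaches it) is an assumption.

```python
def run_with_failure_limit(complexes, dock, limit_failures=5):
    """Run dock(name) for each complex, stopping early after too many failures.

    Returns (results, failed), where failed lists the names that raised.
    """
    results, failed = {}, []
    for name in complexes:
        try:
            results[name] = dock(name)
        except Exception:
            failed.append(name)
            # Assumption: up to limit_failures failures are tolerated.
            if len(failed) > limit_failures:
                break
    return results, failed


def flaky_dock(name):
    """Stand-in for a docking call; fails for names starting with 'bad'."""
    if name.startswith("bad"):
        raise RuntimeError(f"could not process {name}")
    return f"pose for {name}"


results, failed = run_with_failure_limit(["ok1", "bad1", "ok2"], flaky_dock)
print(sorted(results), failed)
```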

Configuration File

All parameters can be specified in a YAML configuration file (typically default_inference_args.yaml) or overridden via command line:

python -m inference --config default_inference_args.yaml --samples_per_complex 20

Command-line arguments take precedence over configuration file values.
