DiffDock Configuration Parameters Reference

This document provides comprehensive details on all DiffDock configuration parameters and command-line options.

Model & Checkpoint Settings

Model Paths

  • --model_dir: Directory containing the score model checkpoint

    • Default: ./workdir/v1.1/score_model
    • Points to the DiffDock-L model, the current default
  • --confidence_model_dir: Directory containing the confidence model checkpoint

    • Default: ./workdir/v1.1/confidence_model
  • --ckpt: Name of the score model checkpoint file

    • Default: best_ema_inference_epoch_model.pt
  • --confidence_ckpt: Name of the confidence model checkpoint file

    • Default: best_model_epoch75.pt

Model Version Flags

  • --old_score_model: Use original DiffDock model instead of DiffDock-L

    • Default: false (uses DiffDock-L)
  • --old_filtering_model: Use legacy confidence filtering approach

    • Default: true

Input/Output Options

Input Specification

  • --protein_path: Path to protein PDB file

    • Example: --protein_path protein.pdb
    • Alternative to --protein_sequence
  • --protein_sequence: Amino acid sequence for ESMFold folding

    • Automatically generates protein structure from sequence
    • Alternative to --protein_path
  • --ligand: Ligand specification (SMILES string or file path)

    • SMILES string: --ligand "COc(cc1)ccc1C#N"
    • File path: --ligand ligand.sdf (a .mol2 file is also accepted)
  • --protein_ligand_csv: CSV file for batch processing

    • Required columns: complex_name, protein_path, ligand_description, protein_sequence
    • Example: --protein_ligand_csv data/protein_ligand_example.csv
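
A batch file with the four columns listed above can be written with Python's csv module. This is a minimal sketch, not part of DiffDock: the helper name write_batch_csv, the complex names, and the file paths are hypothetical. Leaving protein_sequence empty when protein_path is given is an assumption based on the two being described as alternatives.

```python
import csv

# Column names as listed in this reference.
COLUMNS = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]


def write_batch_csv(path, rows):
    """Write a list of dicts to a CSV file with the four required columns.

    Keys missing from a row are written as empty cells (assumption: an
    empty cell means "not provided").
    """
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical inputs, for illustration only.
example_rows = [
    {"complex_name": "complex_a", "protein_path": "protein.pdb",
     "ligand_description": "COc(cc1)ccc1C#N"},
    {"complex_name": "complex_b", "protein_path": "protein.pdb",
     "ligand_description": "ligand.sdf"},
]
```

Calling write_batch_csv("my_batch.csv", example_rows) would produce a file that can then be passed as --protein_ligand_csv my_batch.csv.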

Output Control

  • --out_dir: Output directory for predictions

    • Example: --out_dir results/user_predictions/
  • --save_visualisation: Export predicted molecules as SDF files

    • Enables visualization of results
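
For a single complex, the input and output flags above combine into one command line. The sketch below only assembles the argument list; the helper name build_inference_command is hypothetical, and the flag names and example values are the ones shown in this reference.

```python
import shlex


def build_inference_command(protein_path, ligand, out_dir,
                            config="default_inference_args.yaml"):
    """Return the argument list for a single-complex inference run."""
    return [
        "python", "-m", "inference",
        "--config", config,
        "--protein_path", protein_path,
        "--ligand", ligand,
        "--out_dir", out_dir,
    ]


cmd = build_inference_command(
    "protein.pdb", "COc(cc1)ccc1C#N", "results/user_predictions/"
)
# shlex.join quotes the SMILES string so the shell does not interpret
# the parentheses or the '#' character.
print(shlex.join(cmd))
```

Passing the list itself to subprocess.run avoids shell quoting altogether, which matters because SMILES strings often contain characters the shell treats specially.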

Inference Parameters

Diffusion Steps

  • --inference_steps: Number of denoising steps in the planned diffusion schedule

    • Default: 20
    • Higher values may improve accuracy but increase runtime
  • --actual_steps: Number of scheduled steps that are actually executed

    • Default: 19
  • --no_final_step_noise: Omit noise at the final diffusion step

    • Default: true
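
The relationship between the two step counts can be pictured as a schedule of --inference_steps time points, of which only some are run. The sketch below is an illustration under stated assumptions, not DiffDock's code: it assumes evenly spaced times from 1.0 towards 0.0 and assumes the executed steps are the first --actual_steps entries. The function names are hypothetical, and the default expbeta schedule may space the points differently.

```python
def time_schedule(inference_steps):
    """Evenly spaced diffusion times starting at 1.0 and stopping before 0.0."""
    return [1.0 - i / inference_steps for i in range(inference_steps)]


def executed_times(inference_steps=20, actual_steps=19):
    """Times of the steps that are run: the first actual_steps entries."""
    return time_schedule(inference_steps)[:actual_steps]


times = executed_times()
print(len(times), times[0], round(times[-1], 2))  # prints: 19 1.0 0.1
```

Under these assumptions, the defaults plan 20 steps and run 19 of them, so the last scheduled step (at time 0.05) is skipped.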

Sampling Settings

  • --samples_per_complex: Number of samples to generate per complex

    • Default: 10
    • More samples provide better coverage but increase computation
  • --sigma_schedule: Noise schedule type

    • Default: expbeta (exponential-beta)
  • --initial_noise_std_proportion: Initial noise standard deviation scaling

    • Default: 1.46

Temperature Parameters

Sampling Temperatures (control the diversity of predictions)

  • --temp_sampling_tr: Translation sampling temperature

    • Default: 1.17
  • --temp_sampling_rot: Rotation sampling temperature

    • Default: 2.06
  • --temp_sampling_tor: Torsion sampling temperature

    • Default: 7.04

Psi Angle Temperatures

  • --temp_psi_tr: Translation psi temperature

    • Default: 0.73
  • --temp_psi_rot: Rotation psi temperature

    • Default: 0.90
  • --temp_psi_tor: Torsion psi temperature

    • Default: 0.59

Sigma Data Temperatures

  • --temp_sigma_data_tr: Translation data distribution scaling

    • Default: 0.93
  • --temp_sigma_data_rot: Rotation data distribution scaling

    • Default: 0.75
  • --temp_sigma_data_tor: Torsion data distribution scaling

    • Default: 0.69
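
The nine temperature flags follow one naming pattern: a setting name (temp_sampling, temp_psi, temp_sigma_data) followed by a component suffix (tr, rot, tor). The sketch below groups the rounded defaults listed above by component and flattens them into command-line arguments. The helper name temperature_flags and the override value in the usage note are hypothetical.

```python
# Defaults copied from the lists above, grouped by component.
TEMPERATURE_DEFAULTS = {
    "tr":  {"temp_sampling": 1.17, "temp_psi": 0.73, "temp_sigma_data": 0.93},
    "rot": {"temp_sampling": 2.06, "temp_psi": 0.90, "temp_sigma_data": 0.75},
    "tor": {"temp_sampling": 7.04, "temp_psi": 0.59, "temp_sigma_data": 0.69},
}


def temperature_flags(overrides=None):
    """Flatten per-component settings into CLI arguments.

    overrides maps a component ("tr", "rot", "tor") to the settings to
    change; everything else keeps the default listed above.
    """
    # Copy the defaults so the module-level table is never modified.
    settings = {comp: dict(vals) for comp, vals in TEMPERATURE_DEFAULTS.items()}
    for component, values in (overrides or {}).items():
        settings[component].update(values)

    args = []
    for component, values in settings.items():
        for name, value in values.items():
            args += [f"--{name}_{component}", str(value)]
    return args
```

For example, temperature_flags({"tor": {"temp_sampling": 5.0}}) returns all nine flags with --temp_sampling_tor set to 5.0 and the rest at their defaults.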

Processing Options

Performance

  • --batch_size: Processing batch size

    • Default: 10
    • Larger values increase throughput but require more memory
  • --tqdm: Enable progress bar visualization

    • Useful for monitoring long-running jobs
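
As a rough guide to how --batch_size interacts with --samples_per_complex, the sketch below counts how many batches are needed per complex. It assumes the samples for one complex are split into groups of at most batch_size, which this reference does not state explicitly; the function name is hypothetical.

```python
import math


def batches_per_complex(samples_per_complex=10, batch_size=10):
    """Number of batches needed to cover all samples of one complex."""
    return math.ceil(samples_per_complex / batch_size)


print(batches_per_complex())        # prints: 1
print(batches_per_complex(40, 10))  # prints: 4
```

Under this assumption, raising --samples_per_complex without raising --batch_size increases the number of batches rather than peak memory.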

Protein Structure

  • --chain_cutoff: Maximum number of protein chains to process

    • Example: --chain_cutoff 10
    • Useful for large multi-chain complexes
  • --esm_embeddings_path: Path to pre-computed ESM2 protein embeddings

    • Speeds up inference by reusing embeddings
    • Optional optimization

Dataset Options

  • --split: Dataset split to use (train/test/val)

    • Used for evaluation on standard benchmarks

Advanced Flags

Debugging & Testing

  • --no_model: Disable model inference (debugging)

    • Default: false
  • --no_random: Disable randomization

    • Default: false
    • Useful for reproducibility testing

Alternative Sampling

  • --ode: Use ODE solver instead of SDE

    • Default: false
    • Deterministic alternative to the default stochastic sampler
  • --different_schedules: Use different noise schedules per component

    • Default: false

Error Handling

  • --limit_failures: Maximum allowed failures before stopping

    • Default: 5

Configuration File

All parameters can be specified in a YAML configuration file (typically default_inference_args.yaml) or overridden via command line:

python -m inference --config default_inference_args.yaml --samples_per_complex 20

Command-line arguments take precedence over configuration file values.