2025-11-30 08:30:10 +08:00


# DiffDock Configuration Parameters Reference
This document summarizes DiffDock configuration parameters and command-line options.
## Model & Checkpoint Settings
### Model Paths
- **`--model_dir`**: Directory containing the score model checkpoint
  - Default: `./workdir/v1.1/score_model`
  - Points to the DiffDock-L model, the current default
- **`--confidence_model_dir`**: Directory containing the confidence model checkpoint
  - Default: `./workdir/v1.1/confidence_model`
- **`--ckpt`**: File name of the score model checkpoint inside `--model_dir`
  - Default: `best_ema_inference_epoch_model.pt`
- **`--confidence_ckpt`**: File name of the confidence model checkpoint inside `--confidence_model_dir`
  - Default: `best_model_epoch75.pt`
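
As a pre-flight check before a long run, the sketch below reports whether the checkpoint files named by the defaults above exist. It assumes DiffDock loads each checkpoint from `<model_dir>/<ckpt>` and `<confidence_model_dir>/<confidence_ckpt>`; the `check_checkpoints` helper is hypothetical and not part of DiffDock.

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_checkpoints DIR CKPT [DIR CKPT ...]: hypothetical helper that prints each
# expected "<dir>/<ckpt>" path and returns the number of missing files.
check_checkpoints() {
  local missing=0 path
  while (( $# >= 2 )); do
    path="$1/$2"
    if [[ -f "$path" ]]; then
      echo "found:   $path"
    else
      echo "missing: $path"
      missing=$((missing + 1))
    fi
    shift 2
  done
  return "$missing"
}

# Defaults from this reference: score model, then confidence model.
check_checkpoints \
  ./workdir/v1.1/score_model best_ema_inference_epoch_model.pt \
  ./workdir/v1.1/confidence_model best_model_epoch75.pt \
  || echo "some checkpoints are missing; check --model_dir/--ckpt and --confidence_model_dir/--confidence_ckpt"
```

Because the return value is the number of missing files, the same helper can gate a batch script with a plain `if`.
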
### Model Version Flags
- **`--old_score_model`**: Use the original DiffDock score model instead of DiffDock-L
  - Default: `false` (uses DiffDock-L)
- **`--old_filtering_model`**: Use the legacy confidence filtering model
  - Default: `true`
## Input/Output Options
### Input Specification
- **`--protein_path`**: Path to the protein PDB file
  - Example: `--protein_path protein.pdb`
  - Alternative to `--protein_sequence`
- **`--protein_sequence`**: Amino acid sequence to fold with ESMFold
  - The protein structure is predicted from the sequence, so no PDB file is needed
  - Alternative to `--protein_path`
- **`--ligand`**: Ligand specification (SMILES string or file path)
  - SMILES string: `--ligand "COc(cc1)ccc1C#N"`
  - File path: `--ligand ligand.sdf` (a `.mol2` file also works)
- **`--protein_ligand_csv`**: CSV file for batch processing
  - Required columns: `complex_name`, `protein_path`, `ligand_description`, `protein_sequence`
  - Example: `--protein_ligand_csv data/protein_ligand_example.csv`
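
A batch file for `--protein_ligand_csv` can be written with a heredoc. The complex names, paths, and sequence below are placeholders, and the sketch assumes that an empty `protein_path` cell makes DiffDock fold `protein_sequence` with ESMFold, mirroring the single-complex flags.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder rows: complex_a uses a PDB file, complex_b leaves protein_path
# empty and supplies a (placeholder) sequence for ESMFold instead.
cat > my_complexes.csv <<'EOF'
complex_name,protein_path,ligand_description,protein_sequence
complex_a,data/complex_a/protein.pdb,COc(cc1)ccc1C#N,
complex_b,,ligand_b.sdf,MKTAYIAKQRQISFVKSHFSRQ
EOF

# Check that the header lists the required columns in order.
expected='complex_name,protein_path,ligand_description,protein_sequence'
header="$(head -n 1 my_complexes.csv)"
if [[ "$header" != "$expected" ]]; then
  echo "unexpected header: $header" >&2
  exit 1
fi
echo "$(( $(wc -l < my_complexes.csv) - 1 )) complexes listed"

# Then, from the DiffDock repository root:
#   python -m inference --config default_inference_args.yaml \
#     --protein_ligand_csv my_complexes.csv --out_dir results/user_predictions/
```
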
### Output Control
- **`--out_dir`**: Output directory for predictions
  - Example: `--out_dir results/user_predictions/`
- **`--save_visualisation`**: Export predicted molecules as SDF files
  - Enables visualization of results
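
SMILES strings often contain parentheses, which bash treats as syntax, and brackets, which it can treat as glob patterns. A minimal sketch for a single complex therefore builds the command as a bash array with the SMILES in single quotes. The protein path and the `RUN` toggle are hypothetical; the script only prints the quoted command unless `RUN=1` is set, in which case it should be run from the DiffDock repository root.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder inputs; replace with your own files and ligand.
protein="data/my_target/protein.pdb"
smiles='COc(cc1)ccc1C#N'   # single quotes keep ( ) and # literal
out_dir="results/user_predictions/"

# One array element per argument, so nothing is re-split or globbed.
cmd=(python -m inference
  --config default_inference_args.yaml
  --protein_path "$protein"
  --ligand "$smiles"
  --out_dir "$out_dir")

# Print the shell-quoted command for inspection.
printf '%q ' "${cmd[@]}"
echo

# Execute only when RUN=1 (hypothetical toggle).
if [[ "${RUN:-0}" == 1 ]]; then
  "${cmd[@]}"
fi
```
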
## Inference Parameters
### Diffusion Steps
- **`--inference_steps`**: Number of denoising steps in the diffusion schedule
  - Default: `20`
  - Higher values may improve accuracy but increase runtime
- **`--actual_steps`**: Number of denoising steps actually executed (may be fewer than `--inference_steps`)
  - Default: `19`
- **`--no_final_step_noise`**: Omit noise at the final diffusion step
  - Default: `true`
### Sampling Settings
- **`--samples_per_complex`**: Number of samples to generate per complex
  - Default: `10`
  - More samples cover more candidate poses but increase computation
- **`--sigma_schedule`**: Noise schedule type
  - Default: `expbeta` (exponential-beta)
- **`--initial_noise_std_proportion`**: Scaling factor for the initial noise standard deviation
  - Default: `1.46`
### Temperature Parameters
#### Sampling Temperatures (control the diversity of predictions)
- **`--temp_sampling_tr`**: Translation sampling temperature
  - Default: `1.17`
- **`--temp_sampling_rot`**: Rotation sampling temperature
  - Default: `2.06`
- **`--temp_sampling_tor`**: Torsion sampling temperature
  - Default: `7.04`
#### Psi Temperatures
- **`--temp_psi_tr`**: Translation psi temperature
  - Default: `0.73`
- **`--temp_psi_rot`**: Rotation psi temperature
  - Default: `0.90`
- **`--temp_psi_tor`**: Torsion psi temperature
  - Default: `0.59`
#### Sigma Data Temperatures
- **`--temp_sigma_data_tr`**: Translation data distribution scaling
  - Default: `0.93`
- **`--temp_sigma_data_rot`**: Rotation data distribution scaling
  - Default: `0.75`
- **`--temp_sigma_data_tor`**: Torsion data distribution scaling
  - Default: `0.69`
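
The defaults listed above are shown to two decimal places and may be rounded from the values stored in the YAML file. The sketch below prints the exact temperature and noise settings from a config file; it assumes the file stores each parameter as a flat, top-level `key: value` line, and `show_sampling_settings` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

# show_sampling_settings [CONFIG]: hypothetical helper that prints the temp_*,
# initial_noise_std_proportion, and sigma_schedule lines, sorted by name.
show_sampling_settings() {
  local config="${1:-default_inference_args.yaml}"
  if [[ ! -f "$config" ]]; then
    echo "config not found: $config" >&2
    return 1
  fi
  grep -E '^(temp_|initial_noise_std_proportion:|sigma_schedule:)' "$config" | sort
}

# Run from the DiffDock repository root, or pass another config path.
show_sampling_settings default_inference_args.yaml || true
```

Sorting by key groups the `temp_psi_*`, `temp_sampling_*`, and `temp_sigma_data_*` families together, which makes it easy to compare two configs with `diff`.
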
## Processing Options
### Performance
- **`--batch_size`**: Processing batch size
  - Default: `10`
  - Larger values increase throughput but require more memory
- **`--tqdm`**: Show a progress bar
  - Useful for monitoring long-running jobs
### Protein Structure
- **`--chain_cutoff`**: Maximum number of protein chains to process
  - Example: `--chain_cutoff 10`
  - Useful for large multi-chain complexes
- **`--esm_embeddings_path`**: Path to precomputed ESM2 protein embeddings
  - Optional; speeds up inference by reusing embeddings instead of recomputing them
### Dataset Options
- **`--split`**: Dataset split to use (train/test/val)
  - Used for evaluation on standard benchmarks
## Advanced Flags
### Debugging & Testing
- **`--no_model`**: Disable model inference (for debugging)
  - Default: `false`
- **`--no_random`**: Disable randomization
  - Default: `false`
  - Useful for reproducibility testing
### Alternative Sampling
- **`--ode`**: Sample with an ODE solver instead of the default SDE sampler
  - Default: `false`
- **`--different_schedules`**: Use different noise schedules per component
  - Default: `false`
### Error Handling
- **`--limit_failures`**: Maximum number of failures allowed before stopping
  - Default: `5`
## Configuration File
All parameters can be specified in a YAML configuration file (typically `default_inference_args.yaml`) or overridden via command line:
```bash
python -m inference --config default_inference_args.yaml --samples_per_complex 20
```
Command-line arguments take precedence over configuration file values.
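
To keep a reusable record of a custom run, one option is to edit a copy of the YAML file instead of passing many flags. The sketch below assumes the config stores each parameter as a flat, top-level `key: value` line; `set_yaml_key` and `my_inference_args.yaml` are hypothetical names, and the edit mirrors the `--samples_per_complex 20` override in the command above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# set_yaml_key FILE KEY VALUE: hypothetical helper that rewrites a flat,
# top-level "KEY: ..." line. Nested YAML and values containing "|" are not handled.
set_yaml_key() {
  local file="$1" key="$2" value="$3" tmp
  if ! grep -q "^${key}:" "$file"; then
    echo "key not found: $key" >&2
    return 1
  fi
  tmp="$(mktemp)"
  sed "s|^${key}:.*|${key}: ${value}|" "$file" > "$tmp"
  mv "$tmp" "$file"
}

# Copy the shipped defaults so every other key keeps its value, then edit the copy.
if [[ -f default_inference_args.yaml ]]; then
  cp default_inference_args.yaml my_inference_args.yaml
  set_yaml_key my_inference_args.yaml samples_per_complex 20
  # then: python -m inference --config my_inference_args.yaml ...
fi
```

Refusing to add unknown keys catches typos such as `sample_per_complex` instead of silently writing a setting that nothing reads.
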