# DiffDock Configuration Parameters Reference

This document provides comprehensive details on all DiffDock configuration parameters and command-line options.

## Model & Checkpoint Settings

### Model Paths

- **`--model_dir`**: Directory containing the score model checkpoint
  - Default: `./workdir/v1.1/score_model`
  - This is the DiffDock-L model (the current default)

- **`--confidence_model_dir`**: Directory containing the confidence model checkpoint
  - Default: `./workdir/v1.1/confidence_model`

- **`--ckpt`**: Name of the score model checkpoint file
  - Default: `best_ema_inference_epoch_model.pt`

- **`--confidence_ckpt`**: Name of the confidence model checkpoint file
  - Default: `best_model_epoch75.pt`

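
The directory and file-name options above combine into one checkpoint path per model. The following is a minimal sketch for checking that both files exist before launching a run; it assumes each checkpoint file sits directly inside its model directory, and the variable names are illustrative rather than part of DiffDock.

```python
from pathlib import Path

# Default values copied from the parameter list above.
model_dir = Path("./workdir/v1.1/score_model")
ckpt = "best_ema_inference_epoch_model.pt"
confidence_model_dir = Path("./workdir/v1.1/confidence_model")
confidence_ckpt = "best_model_epoch75.pt"

# Assumption: the checkpoint file is located directly inside its directory.
score_checkpoint = model_dir / ckpt
confidence_checkpoint = confidence_model_dir / confidence_ckpt

# Report which checkpoint files are present on disk.
for path in (score_checkpoint, confidence_checkpoint):
    status = "found" if path.is_file() else "missing"
    print(f"{path}: {status}")
```

A "missing" result usually means the directory or file-name option needs adjusting before inference is started.
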
### Model Version Flags

- **`--old_score_model`**: Use original DiffDock model instead of DiffDock-L
  - Default: `false` (uses DiffDock-L)

- **`--old_filtering_model`**: Use legacy confidence filtering approach
  - Default: `true`

## Input/Output Options

### Input Specification

- **`--protein_path`**: Path to protein PDB file
  - Example: `--protein_path protein.pdb`
  - Alternative to `--protein_sequence`

- **`--protein_sequence`**: Amino acid sequence to fold with ESMFold
  - The protein structure is generated automatically from the sequence
  - Alternative to `--protein_path`

- **`--ligand`**: Ligand specification (SMILES string or file path)
  - SMILES string: `--ligand "COc(cc1)ccc1C#N"`
  - File path: `--ligand ligand.sdf` (`.mol2` files are also accepted)

- **`--protein_ligand_csv`**: CSV file for batch processing
  - Required columns: `complex_name`, `protein_path`, `ligand_description`, `protein_sequence`
  - Example: `--protein_ligand_csv data/protein_ligand_example.csv`

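
A batch CSV with the four required columns can be generated with Python's standard `csv` module. The sketch below is illustrative only: the complex names and the short amino acid sequence are hypothetical placeholders, and it assumes that the unused one of `protein_path` / `protein_sequence` is left empty in each row, mirroring how the two command-line options are alternatives.

```python
import csv
import io

# Column names taken from the --protein_ligand_csv description above.
COLUMNS = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]

rows = [
    # Structure supplied as a PDB file; the sequence column stays empty (assumption).
    {"complex_name": "complex_from_pdb", "protein_path": "protein.pdb",
     "ligand_description": "COc(cc1)ccc1C#N", "protein_sequence": ""},
    # Sequence supplied instead of a file; "MKTAYIAKQR" is a made-up placeholder.
    {"complex_name": "complex_from_sequence", "protein_path": "",
     "ligand_description": "ligand.sdf", "protein_sequence": "MKTAYIAKQR"},
]


def build_batch_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_batch_csv(rows)
print(csv_text)
```

Writing `csv_text` to a file gives something that can be passed to `--protein_ligand_csv`.
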
### Output Control

- **`--out_dir`**: Output directory for predictions
  - Example: `--out_dir results/user_predictions/`

- **`--save_visualisation`**: Export predicted molecules as SDF files
  - Enables visualization of results

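
The input and output options above can be combined into a single-complex invocation. The sketch below only assembles and prints the command line, using the example values from this section; it does not run DiffDock. It relies on `shlex.join` to show why the SMILES string needs quoting in a shell: parentheses and `#` are otherwise interpreted by the shell.

```python
import shlex

# Every flag and value below is taken from the examples in this document.
command = [
    "python", "-m", "inference",
    "--config", "default_inference_args.yaml",
    "--protein_path", "protein.pdb",
    "--ligand", "COc(cc1)ccc1C#N",
    "--out_dir", "results/user_predictions/",
]

# shlex.join quotes any argument containing shell-special characters.
command_line = shlex.join(command)
print(command_line)
```

Passing the `command` list to `subprocess.run` avoids shell quoting altogether, since each argument is delivered as-is.
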
## Inference Parameters

### Diffusion Steps

- **`--inference_steps`**: Number of steps in the planned inference schedule
  - Default: `20`
  - Higher values may improve accuracy but increase runtime

- **`--actual_steps`**: Number of diffusion steps actually executed
  - Default: `19`

- **`--no_final_step_noise`**: Omit noise at the final diffusion step
  - Default: `true`

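
One way to picture the difference between planned and executed steps is a time grid that is laid out for `--inference_steps` steps but only walked for the first `--actual_steps` of them. The sketch below assumes an evenly spaced grid running from t = 1 toward t = 0; that grid shape is an illustrative assumption, not a description of DiffDock's internal schedule.

```python
inference_steps = 20  # planned number of steps (default above)
actual_steps = 19     # steps actually executed (default above)

# Assumption: evenly spaced time points starting at t = 1 and stopping before t = 0.
planned_times = [1 - i / inference_steps for i in range(inference_steps)]

# Only the first `actual_steps` points of the planned grid are visited.
executed_times = planned_times[:actual_steps]
skipped_times = planned_times[actual_steps:]

print(f"planned: {len(planned_times)} steps")
print(f"executed: {len(executed_times)} steps")
print(f"skipped: {len(skipped_times)} step(s)")
```

Under this assumption, the defaults leave exactly one planned step unexecuted, the one closest to t = 0.
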
### Sampling Settings

- **`--samples_per_complex`**: Number of samples to generate per complex
  - Default: `10`
  - More samples provide better coverage but increase computation

- **`--sigma_schedule`**: Noise schedule type
  - Default: `expbeta` (exponential-beta)

- **`--initial_noise_std_proportion`**: Initial noise standard deviation scaling
  - Default: `1.46`

### Temperature Parameters

#### Sampling Temperatures

These control the diversity of predictions.

- **`--temp_sampling_tr`**: Translation sampling temperature
  - Default: `1.17`

- **`--temp_sampling_rot`**: Rotation sampling temperature
  - Default: `2.06`

- **`--temp_sampling_tor`**: Torsion sampling temperature
  - Default: `7.04`

#### Psi Angle Temperatures

- **`--temp_psi_tr`**: Translation psi temperature
  - Default: `0.73`

- **`--temp_psi_rot`**: Rotation psi temperature
  - Default: `0.90`

- **`--temp_psi_tor`**: Torsion psi temperature
  - Default: `0.59`

#### Sigma Data Temperatures

- **`--temp_sigma_data_tr`**: Translation data distribution scaling
  - Default: `0.93`

- **`--temp_sigma_data_rot`**: Rotation data distribution scaling
  - Default: `0.75`

- **`--temp_sigma_data_tor`**: Torsion data distribution scaling
  - Default: `0.69`

## Processing Options

### Performance

- **`--batch_size`**: Processing batch size
  - Default: `10`
  - Larger values increase throughput but require more memory

- **`--tqdm`**: Enable progress bar visualization
  - Useful for monitoring long-running jobs

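
The interaction between `--samples_per_complex` and `--batch_size` comes down to simple arithmetic. The sketch below assumes that the samples for one complex are split into chunks of at most `batch_size`; the helper `num_batches` is hypothetical and only illustrates the trade-off between memory use and the number of passes.

```python
import math


def num_batches(samples_per_complex: int, batch_size: int) -> int:
    """Number of chunks needed if samples are processed batch_size at a time."""
    return math.ceil(samples_per_complex / batch_size)


# With the defaults (10 samples, batch size 10) everything fits in one batch.
print(num_batches(10, 10))
# Doubling the samples at the same batch size doubles the number of batches.
print(num_batches(20, 10))
# A smaller batch size lowers peak memory at the cost of more batches.
print(num_batches(20, 4))
```

If memory is tight, lowering `--batch_size` keeps `--samples_per_complex` unchanged while processing fewer samples at once.
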
### Protein Structure

- **`--chain_cutoff`**: Maximum number of protein chains to process
  - Example: `--chain_cutoff 10`
  - Useful for large multi-chain complexes

- **`--esm_embeddings_path`**: Path to pre-computed ESM2 protein embeddings
  - Speeds up inference by reusing embeddings
  - Optional optimization

### Dataset Options

- **`--split`**: Dataset split to use (train/test/val)
  - Used for evaluation on standard benchmarks

## Advanced Flags

### Debugging & Testing

- **`--no_model`**: Disable model inference (debugging)
  - Default: `false`

- **`--no_random`**: Disable randomization
  - Default: `false`
  - Useful for reproducibility testing

### Alternative Sampling

- **`--ode`**: Use ODE solver instead of SDE
  - Default: `false`
  - Alternative sampling approach

- **`--different_schedules`**: Use different noise schedules per component
  - Default: `false`

### Error Handling

- **`--limit_failures`**: Maximum allowed failures before stopping
  - Default: `5`

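
A failure limit of this kind is commonly implemented as a counter that ends the run early. The sketch below shows the general pattern only; the function `run_with_failure_limit`, the job names, and the choice to stop once the count exceeds (rather than reaches) the limit are all assumptions for illustration, not DiffDock's actual behavior.

```python
def run_with_failure_limit(jobs, limit_failures=5):
    """Run (name, callable) pairs in order; stop once failures exceed the limit."""
    results = []
    failures = 0
    for name, job in jobs:
        try:
            results.append((name, job()))
        except Exception as exc:
            failures += 1
            print(f"{name} failed (failure {failures}): {exc}")
            # Assumption: the run stops when the count exceeds the limit.
            if failures > limit_failures:
                print("Failure limit exceeded, stopping early.")
                break
    return results, failures


def ok():
    return "docked"


def broken():
    raise ValueError("could not parse ligand")


# Hypothetical jobs: two of the four complexes fail.
jobs = [("complex_a", ok), ("complex_b", broken),
        ("complex_c", broken), ("complex_d", ok)]
results, failures = run_with_failure_limit(jobs, limit_failures=1)
print(results, failures)
```

With a limit of 1, the second failure ends the run, so `complex_d` is never attempted.
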
## Configuration File

All parameters can be specified in a YAML configuration file (typically `default_inference_args.yaml`) or overridden via command line:

```bash
python -m inference --config default_inference_args.yaml --samples_per_complex 20
```

Command-line arguments take precedence over configuration file values.
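
The precedence rule stated above can be pictured as a dictionary merge. This is a minimal sketch of the rule as described in this section, not DiffDock's loading code: `resolve_args` is a hypothetical helper, the configuration is represented as a plain dictionary instead of a parsed YAML file, and a command-line value of `None` stands for "flag not given".

```python
def resolve_args(config_values, cli_values):
    """Start from the config file values, then apply explicit CLI values."""
    resolved = dict(config_values)
    for key, value in cli_values.items():
        # None means the flag was not passed, so the config value is kept.
        if value is not None:
            resolved[key] = value
    return resolved


# Values as they might appear in the YAML file (defaults from this document).
config_values = {"samples_per_complex": 10, "inference_steps": 20, "batch_size": 10}
# Only --samples_per_complex was given on the command line.
cli_values = {"samples_per_complex": 20, "inference_steps": None}

resolved = resolve_args(config_values, cli_values)
print(resolved)
```

Here `samples_per_complex` ends up as 20, matching the command shown above, while the other settings keep their configuration file values.
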