# Datamol Fragments and Scaffolds Reference

## Scaffolds Module (`datamol.scaffold`)

Scaffolds represent the core structure of molecules, useful for identifying structural families and analyzing structure-activity relationships (SAR).

### Murcko Scaffolds

#### `dm.to_scaffold_murcko(mol)`
Extract the Bemis-Murcko scaffold (molecular framework) of a molecule.
- **Method**: Removes side chains, retaining ring systems and the linkers that connect them
- **Returns**: Molecule object representing the scaffold
- **Use case**: Identify core structures across compound series
- **Example**:
```python
import datamol as dm

mol = dm.to_mol("c1ccc(cc1)CCN")  # Phenethylamine
scaffold = dm.to_scaffold_murcko(mol)
scaffold_smiles = dm.to_smiles(scaffold)
# Returns: 'c1ccccc1' (the ethylamine chain is a side chain, not a
# ring-to-ring linker, so only the benzene ring remains)
```

**Workflow for scaffold analysis**:
```python
from collections import Counter

import datamol as dm

# Extract scaffolds from a compound library
# (`mols` is assumed to be a list of valid molecule objects)
scaffolds = [dm.to_scaffold_murcko(mol) for mol in mols]
scaffold_smiles = [dm.to_smiles(s) for s in scaffolds]

# Count scaffold frequency
scaffold_counts = Counter(scaffold_smiles)
most_common = scaffold_counts.most_common(10)
```
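
The scaffold counts can be condensed into a few library-level numbers. The sketch below works on plain scaffold SMILES strings and needs only the standard library. The helper name `scaffold_summary` and the `example_smiles` list are made up for illustration and are not part of datamol.

```python
from collections import Counter


def scaffold_summary(scaffold_smiles):
    """Summarize scaffold diversity for a list of scaffold SMILES strings."""
    counts = Counter(scaffold_smiles)
    n_mols = len(scaffold_smiles)
    if n_mols == 0:
        return {"n_scaffolds": 0, "n_singletons": 0, "top_fraction": 0.0}
    # Count of the single most frequent scaffold
    _, top_count = counts.most_common(1)[0]
    return {
        "n_scaffolds": len(counts),
        "n_singletons": sum(1 for c in counts.values() if c == 1),
        "top_fraction": top_count / n_mols,
    }


# Hand-written scaffold SMILES, for illustration only
example_smiles = ["c1ccccc1", "c1ccccc1", "c1ccncc1", "C1CCCCC1", "c1ccccc1"]
summary = scaffold_summary(example_smiles)
print(summary)
```

A high `top_fraction` suggests the library is dominated by one chemical series, while many singletons point to a structurally diverse set.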

### Fuzzy Scaffolds

#### `dm.scaffold.fuzzy_scaffolding(mols, ...)`
Generate fuzzy scaffolds for a collection of molecules, with enforceable groups that must appear in the core.
- **Purpose**: More flexible scaffold definition that keeps user-specified functional groups in the core
- **Use case**: Custom scaffold definitions beyond Murcko rules

### Applications

**Scaffold-based splitting** (for ML model validation):
```python
# Group compounds by scaffold SMILES
# (`mols` and `scaffolds` are parallel lists, as in the scaffold workflow)
scaffold_to_mols = {}
for mol, scaffold in zip(mols, scaffolds):
    smi = dm.to_smiles(scaffold)
    if smi not in scaffold_to_mols:
        scaffold_to_mols[smi] = []
    scaffold_to_mols[smi].append(mol)

# Ensure train/test sets have different scaffolds by assigning
# each scaffold group, as a whole, to exactly one of the two sets
```
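
One way to carry out the final step is to hand out whole scaffold groups, smallest first, until the test set reaches its target size. The following is a minimal sketch under that assumption, not a datamol function. `scaffold_split`, its parameters, and the example IDs are hypothetical, and the items can be molecules, IDs, or row indices.

```python
import random


def scaffold_split(scaffold_to_items, test_fraction=0.2, seed=0):
    """Assign whole scaffold groups to train or test so no scaffold is shared."""
    groups = list(scaffold_to_items.items())
    # Shuffle reproducibly so ties between equally sized groups are broken at random
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Stable sort: largest groups first, shuffled order kept among ties
    groups.sort(key=lambda kv: len(kv[1]), reverse=True)

    n_total = sum(len(items) for _, items in groups)
    test_budget = test_fraction * n_total
    train, test = [], []
    # Walk from the smallest group upward, filling the test set while it fits
    for smi, items in reversed(groups):
        if len(test) + len(items) <= test_budget:
            test.extend((smi, item) for item in items)
        else:
            train.extend((smi, item) for item in items)
    return train, test


# Hypothetical molecule IDs grouped by scaffold SMILES
groups = {
    "c1ccccc1": ["m1", "m2", "m3", "m4", "m5", "m6"],
    "c1ccncc1": ["m7", "m8"],
    "C1CCCCC1": ["m9"],
    "c1ccoc1": ["m10"],
}
train, test = scaffold_split(groups, test_fraction=0.3)
print(len(train), len(test))
```

Filling the test set from the rarest scaffolds makes the evaluation harder than a random split, which is usually the point: the model is scored on chemotypes it has not seen.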

**SAR analysis**:
```python
import numpy as np

# Group by scaffold and analyze activity (`get_activity` is a user-supplied lookup)
for scaffold_smi, molecules in scaffold_to_mols.items():
    activities = [get_activity(mol) for mol in molecules]
    print(f"Scaffold: {scaffold_smi}, Mean activity: {np.mean(activities)}")
```
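
A mean over one or two compounds says little about a scaffold. As a hedged extension of the loop above, the sketch below skips small groups and reports the spread as well as the mean, using only the standard library. `summarize_activity`, `min_size`, and the activity values are invented for illustration.

```python
from statistics import mean, stdev


def summarize_activity(scaffold_to_activities, min_size=2):
    """Return (scaffold, n, mean, stdev) rows for groups with enough members."""
    rows = []
    for smi, values in scaffold_to_activities.items():
        if len(values) < min_size:
            continue  # too few compounds to characterize this scaffold
        # stdev needs at least two values; fall back to 0.0 otherwise
        spread = stdev(values) if len(values) > 1 else 0.0
        rows.append((smi, len(values), mean(values), spread))
    # Most active scaffolds first
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up activity values (e.g. pIC50), for illustration only
example = {
    "c1ccccc1": [6.0, 7.0, 8.0],
    "c1ccncc1": [5.0, 5.5],
    "C1CCCCC1": [9.1],
}
rows = summarize_activity(example)
for smi, n, avg, sd in rows:
    print(f"{smi}: n={n}, mean={avg:.2f}, sd={sd:.2f}")
```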

---

## Fragments Module (`datamol.fragment`)

Molecular fragmentation breaks molecules into smaller pieces based on chemical rules, useful for fragment-based drug design and substructure analysis.

### BRICS Fragmentation

#### `dm.fragment.brics(mol, ...)`
Fragment a molecule using BRICS (Breaking of Retrosynthetically Interesting Chemical Substructures).
- **Method**: Dissects based on 16 chemically meaningful bond types
- **Consideration**: Takes the chemical environment and surrounding substructures of each bond into account
- **Returns**: Collection of fragments. Depending on the datamol version these may be molecule objects rather than SMILES strings, so convert with `dm.to_smiles` before comparing or counting
- **Use case**: Retrosynthetic analysis, fragment-based design
- **Example**:
```python
import datamol as dm

mol = dm.to_mol("c1ccccc1CCN")
fragments = dm.fragment.brics(mol)
# Written as SMILES, the fragments look like '[16*]c1ccccc1' and '[8*]CCN'
# [16*], [8*], etc. are dummy atoms marking attachment points;
# the number encodes the BRICS environment at the broken bond
```
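
The attachment-point labels can be read straight from a fragment SMILES string with a regular expression. This is a small standard-library sketch: `attachment_labels` is a hypothetical helper, and the input strings are written by hand in the RDKit-style BRICS notation rather than taken from a real run.

```python
import re

# Matches labelled dummy atoms such as [1*], [8*] or [16*]
ATTACHMENT_RE = re.compile(r"\[(\d+)\*\]")


def attachment_labels(fragment_smiles):
    """Return the integer labels of all attachment points in a fragment SMILES."""
    return [int(label) for label in ATTACHMENT_RE.findall(fragment_smiles)]


for smi in ["[16*]c1ccccc1", "[8*]CCN", "[16*]c1ccc([16*])cc1", "CCO"]:
    print(smi, attachment_labels(smi))
```

A fragment with no labels, such as the last string, has no open attachment points and therefore corresponds to a complete molecule.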

### RECAP Fragmentation

#### `dm.fragment.recap(mol, ...)`
Fragment a molecule using RECAP (Retrosynthetic Combinatorial Analysis Procedure).
- **Method**: Dissects based on 11 predefined bond types
- **Rules**:
  - Leaves alkyl groups smaller than 5 carbons intact
  - Preserves cyclic bonds
- **Returns**: Collection of fragments (convert molecule objects with `dm.to_smiles` if SMILES strings are needed)
- **Use case**: Combinatorial library design
- **Example**:
```python
# Benzanilide: the amide bond is one of the RECAP bond types
# (a plain alkylbenzene such as "CCCCCc1ccccc1" has no RECAP-cleavable bond)
mol = dm.to_mol("O=C(Nc1ccccc1)c1ccccc1")
fragments = dm.fragment.recap(mol)
```

### MMPA Fragmentation

#### `dm.fragment.mmpa_frag(mol, ...)`
Fragment a molecule for Matched Molecular Pair Analysis (MMPA).
- **Purpose**: Generate fragments suitable for identifying matched molecular pairs
- **Use case**: Analyzing how small structural changes affect properties
- **Example**:
```python
mol = dm.to_mol("c1ccccc1CCN")
fragments = dm.fragment.mmpa_frag(mol)
# Used to find pairs of molecules differing by a single transformation
```
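
To illustrate the pairing step, the sketch below indexes single-cut results by their core and reports molecules that share a core but carry different substituents. The input layout (a dict of `(core, substituent)` SMILES pairs per molecule) is an assumption made for this example, not the return format of `mmpa_frag`, and the cuts are written by hand.

```python
from collections import defaultdict
from itertools import combinations


def find_matched_pairs(cuts):
    """Find molecule pairs that share a core but differ in one substituent.

    `cuts` maps a molecule ID to a list of (core, substituent) SMILES pairs,
    one per single-bond cut. This layout is an assumption of the sketch.
    """
    by_core = defaultdict(list)
    for mol_id, pairs in cuts.items():
        for core, substituent in pairs:
            by_core[core].append((mol_id, substituent))

    matched = []
    for core, members in by_core.items():
        for (id_a, sub_a), (id_b, sub_b) in combinations(members, 2):
            if id_a != id_b and sub_a != sub_b:
                # Record the transformation as "old substituent>>new substituent"
                matched.append((id_a, id_b, core, f"{sub_a}>>{sub_b}"))
    return matched


# Hand-written cuts for toluene, anisole and chlorobenzene (illustrative only)
cuts = {
    "toluene": [("[*]c1ccccc1", "[*]C")],
    "anisole": [("[*]c1ccccc1", "[*]OC"), ("[*]Oc1ccccc1", "[*]C")],
    "chlorobenzene": [("[*]c1ccccc1", "[*]Cl")],
}
pairs = find_matched_pairs(cuts)
for pair in pairs:
    print(pair)
```

Each reported pair can then be joined with measured property differences to estimate the effect of the transformation.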

### Comparison of Methods

| Method | Bond Types | Preserves Cycles | Best For |
|--------|-----------|------------------|----------|
| BRICS | 16 | Yes | Retrosynthetic analysis, fragment recombination |
| RECAP | 11 | Yes | Combinatorial library design |
| MMPA | Variable | Depends | Structure-activity relationship analysis |

### Fragmentation Workflow

```python
import re
from collections import Counter

import datamol as dm

def fragment_smiles(mol):
    # Normalize BRICS output to a set of SMILES strings, whether the
    # fragments come back as molecule objects or as strings
    frags = dm.fragment.brics(mol)
    return {f if isinstance(f, str) else dm.to_smiles(f) for f in frags}

# 1. Fragment a molecule
mol = dm.to_mol("CC(=O)Oc1ccccc1C(=O)O")  # Aspirin
brics_frags = dm.fragment.brics(mol)
recap_frags = dm.fragment.recap(mol)

# 2. Analyze fragment frequency across a library
# (`molecule_library` is assumed to be a list of molecule objects)
all_fragments = []
for lib_mol in molecule_library:
    all_fragments.extend(fragment_smiles(lib_mol))

# 3. Identify common fragments
fragment_counts = Counter(all_fragments)
common_fragments = fragment_counts.most_common(20)

# 4. Convert fragments back to molecules (remove attachment points)
def clean_fragment(frag_smiles):
    # Replace every attachment point marker ([*], [1*], [2*], [16*], ...)
    # with a hydrogen, then parse the capped fragment
    clean = re.sub(r"\[\d*\*\]", "[H]", frag_smiles)
    return dm.to_mol(clean)
```
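
Raw counts grow with library size, which makes them hard to compare between libraries. A normalized alternative is the fraction of molecules that contain each fragment. The sketch below assumes the fragments are already available as SMILES strings, one list per molecule. `fragment_document_frequency` is a hypothetical helper and the `library` data is hand-written.

```python
from collections import Counter


def fragment_document_frequency(fragments_per_mol):
    """Fraction of molecules that contain each fragment at least once."""
    n_mols = len(fragments_per_mol)
    if n_mols == 0:
        return {}
    counts = Counter()
    for frags in fragments_per_mol:
        counts.update(set(frags))  # count each fragment once per molecule
    return {frag: n / n_mols for frag, n in counts.items()}


# Illustrative fragment lists, not real fragmentation output
library = [
    ["[16*]c1ccccc1", "[8*]CCN"],
    ["[16*]c1ccccc1", "[16*]c1ccccc1", "[8*]CC"],
    ["[8*]CCN"],
    ["[16*]c1ccccc1"],
]
freq = fragment_document_frequency(library)
for frag, fraction in sorted(freq.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{frag}: {fraction:.2f}")
```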

### Advanced: Fragment-Based Virtual Screening

```python
def fragment_smiles(mol):
    # Normalize BRICS output to a set of SMILES strings so that
    # set operations compare fragments by structure
    frags = dm.fragment.brics(mol)
    return {f if isinstance(f, str) else dm.to_smiles(f) for f in frags}

# Build a fragment library from known actives
active_fragments = set()
for active_mol in active_compounds:
    active_fragments.update(fragment_smiles(active_mol))

# Screen compounds for the presence of active fragments
def score_by_fragments(mol, fragment_set):
    mol_frags = fragment_smiles(mol)
    if not mol_frags:
        return 0.0  # no fragments, nothing to match
    overlap = mol_frags.intersection(fragment_set)
    return len(overlap) / len(mol_frags)

# Score the screening library
scores = [score_by_fragments(mol, active_fragments) for mol in screening_lib]
```
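
The score above pools the fragments of all actives into one set. A possible variation, offered here as a sketch and not as part of the original workflow, compares the query against each active separately with a Tanimoto (Jaccard) coefficient and keeps the best match. Both function names and the example fragment sets are hypothetical, and the fragments are assumed to be SMILES strings.

```python
def fragment_tanimoto(frags_a, frags_b):
    """Tanimoto (Jaccard) similarity between two collections of fragment SMILES."""
    set_a, set_b = set(frags_a), set(frags_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty fragment sets: similarity defined as 0 here
    return len(set_a & set_b) / len(union)


def best_match_score(query_frags, reference_frag_sets):
    """Highest similarity between a query and any reference fragment set."""
    return max(
        (fragment_tanimoto(query_frags, ref) for ref in reference_frag_sets),
        default=0.0,
    )


# Hand-written fragment sets, for illustration only
actives = [
    {"[16*]c1ccccc1", "[8*]CCN", "[5*]N[5*]"},
    {"[5*]N[5*]", "[1*]C(C)=O"},
]
query = {"[8*]CCN", "[5*]N[5*]"}
score = best_match_score(query, actives)
print(round(score, 3))
```

Unlike the pooled overlap, this score penalizes fragments present in the active but missing from the query, so it favors compounds that resemble one specific active as a whole.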

### Key Concepts

- **Attachment Points**: Marked with [1*], [2*], etc. in fragment SMILES
- **Retrosynthetic**: Fragmentation mimics synthetic disconnections
- **Chemically Meaningful**: Breaks occur at typical synthetic bonds
- **Recombination**: Fragments can theoretically be recombined into valid molecules