Initial commit
This commit is contained in:
195
skills/datamol/references/descriptors_viz.md
Normal file
195
skills/datamol/references/descriptors_viz.md
Normal file
@@ -0,0 +1,195 @@
|
||||
# Datamol Descriptors and Visualization Reference
|
||||
|
||||
## Descriptors Module (`datamol.descriptors`)
|
||||
|
||||
The descriptors module provides tools for computing molecular properties and descriptors.
|
||||
|
||||
### Specialized Descriptor Functions
|
||||
|
||||
#### `dm.descriptors.n_aromatic_atoms(mol)`
|
||||
Calculate the number of aromatic atoms.
|
||||
- **Returns**: Integer count
|
||||
- **Use case**: Aromaticity analysis
|
||||
|
||||
#### `dm.descriptors.n_aromatic_atoms_proportion(mol)`
|
||||
Calculate ratio of aromatic atoms to total heavy atoms.
|
||||
- **Returns**: Float between 0 and 1
|
||||
- **Use case**: Quantifying aromatic character
|
||||
|
||||
#### `dm.descriptors.n_charged_atoms(mol)`
|
||||
Count atoms with nonzero formal charge.
|
||||
- **Returns**: Integer count
|
||||
- **Use case**: Charge distribution analysis
|
||||
|
||||
#### `dm.descriptors.n_rigid_bonds(mol)`
|
||||
Count non-rotatable bonds (neither single bonds nor ring bonds).
|
||||
- **Returns**: Integer count
|
||||
- **Use case**: Molecular flexibility assessment
|
||||
|
||||
#### `dm.descriptors.n_stereo_centers(mol)`
|
||||
Count stereogenic centers (chiral centers).
|
||||
- **Returns**: Integer count
|
||||
- **Use case**: Stereochemistry analysis
|
||||
|
||||
#### `dm.descriptors.n_stereo_centers_unspecified(mol)`
|
||||
Count stereocenters lacking stereochemical specification.
|
||||
- **Returns**: Integer count
|
||||
- **Use case**: Identifying incomplete stereochemistry
|
||||
|
||||
### Batch Descriptor Computation
|
||||
|
||||
#### `dm.descriptors.compute_many_descriptors(mol, properties_fn=None, add_properties=True)`
|
||||
Compute multiple molecular properties for a single molecule.
|
||||
- **Parameters**:
|
||||
- `properties_fn`: Custom list of descriptor functions
|
||||
- `add_properties`: Include additional computed properties
|
||||
- **Returns**: Dictionary of descriptor name → value pairs
|
||||
- **Default descriptors include**:
|
||||
- Molecular weight, LogP, number of H-bond donors/acceptors
|
||||
- Aromatic atoms, stereocenters, rotatable bonds
|
||||
- TPSA (Topological Polar Surface Area)
|
||||
- Ring count, heteroatom count
|
||||
- **Example**:
|
||||
```python
|
||||
mol = dm.to_mol("CCO")
|
||||
descriptors = dm.descriptors.compute_many_descriptors(mol)
|
||||
# Returns: {'mw': 46.07, 'logp': -0.03, 'hbd': 1, 'hba': 1, ...}
|
||||
```
|
||||
|
||||
#### `dm.descriptors.batch_compute_many_descriptors(mols, properties_fn=None, add_properties=True, n_jobs=1, batch_size=None, progress=False)`
|
||||
Compute descriptors for multiple molecules in parallel.
|
||||
- **Parameters**:
|
||||
- `mols`: List of molecules
|
||||
- `n_jobs`: Number of parallel jobs (-1 for all cores)
|
||||
- `batch_size`: Chunk size for parallel processing
|
||||
- `progress`: Show progress bar
|
||||
- **Returns**: Pandas DataFrame with one row per molecule
|
||||
- **Example**:
|
||||
```python
|
||||
mols = [dm.to_mol(smi) for smi in smiles_list]
|
||||
df = dm.descriptors.batch_compute_many_descriptors(
|
||||
mols,
|
||||
n_jobs=-1,
|
||||
progress=True
|
||||
)
|
||||
```
|
||||
|
||||
### RDKit Descriptor Access
|
||||
|
||||
#### `dm.descriptors.any_rdkit_descriptor(name)`
|
||||
Retrieve any descriptor function from RDKit by name.
|
||||
- **Parameters**: `name` - Descriptor function name (e.g., 'MolWt', 'TPSA')
|
||||
- **Returns**: RDKit descriptor function
|
||||
- **Available descriptors**: From `rdkit.Chem.Descriptors` and `rdkit.Chem.rdMolDescriptors`
|
||||
- **Example**:
|
||||
```python
|
||||
tpsa_fn = dm.descriptors.any_rdkit_descriptor('TPSA')
|
||||
tpsa_value = tpsa_fn(mol)
|
||||
```
|
||||
|
||||
### Common Use Cases
|
||||
|
||||
**Drug-likeness Filtering (Lipinski's Rule of Five)**:
|
||||
```python
|
||||
descriptors = dm.descriptors.compute_many_descriptors(mol)
|
||||
is_druglike = (
|
||||
descriptors['mw'] <= 500 and
|
||||
descriptors['logp'] <= 5 and
|
||||
descriptors['hbd'] <= 5 and
|
||||
descriptors['hba'] <= 10
|
||||
)
|
||||
```
|
||||
|
||||
**ADME Property Analysis**:
|
||||
```python
|
||||
df = dm.descriptors.batch_compute_many_descriptors(compound_library)
|
||||
# Filter by TPSA for blood-brain barrier penetration
|
||||
bbb_candidates = df[df['tpsa'] < 90]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Visualization Module (`datamol.viz`)
|
||||
|
||||
The viz module provides tools for rendering molecules and conformers as images.
|
||||
|
||||
### Main Visualization Function
|
||||
|
||||
#### `dm.viz.to_image(mols, legends=None, n_cols=4, use_svg=False, mol_size=(200, 200), highlight_atom=None, highlight_bond=None, outfile=None, max_mols=None, copy=True, indices=False, ...)`
|
||||
Generate image grid from molecules.
|
||||
- **Parameters**:
|
||||
- `mols`: Single molecule or list of molecules
|
||||
- `legends`: String or list of strings as labels (one per molecule)
|
||||
- `n_cols`: Number of molecules per row (default: 4)
|
||||
- `use_svg`: Output SVG format (True) or PNG (False, default)
|
||||
- `mol_size`: Tuple (width, height) or single int for square images
|
||||
- `highlight_atom`: Atom indices to highlight (list or dict)
|
||||
- `highlight_bond`: Bond indices to highlight (list or dict)
|
||||
- `outfile`: Save path (local or remote, supports fsspec)
|
||||
- `max_mols`: Maximum number of molecules to display
|
||||
- `indices`: Draw atom indices on structures (default: False)
|
||||
- `align`: Align molecules using MCS (Maximum Common Substructure)
|
||||
- **Returns**: Image object (can be displayed in Jupyter) or saves to file
|
||||
- **Example**:
|
||||
```python
|
||||
# Basic grid
|
||||
dm.viz.to_image(mols[:10], legends=[dm.to_smiles(m) for m in mols[:10]])
|
||||
|
||||
# Save to file
|
||||
dm.viz.to_image(mols, outfile="molecules.png", n_cols=5)
|
||||
|
||||
# Highlight substructure
|
||||
dm.viz.to_image(mol, highlight_atom=[0, 1, 2], highlight_bond=[0, 1])
|
||||
|
||||
# Aligned visualization
|
||||
dm.viz.to_image(mols, align=True, legends=activity_labels)
|
||||
```
|
||||
|
||||
### Conformer Visualization
|
||||
|
||||
#### `dm.viz.conformers(mol, n_confs=None, align_conf=True, n_cols=3, sync_views=True, remove_hs=True, ...)`
|
||||
Display multiple conformers in grid layout.
|
||||
- **Parameters**:
|
||||
- `mol`: Molecule with embedded conformers
|
||||
- `n_confs`: Number or list of conformer indices to display (None = all)
|
||||
- `align_conf`: Align conformers for comparison (default: True)
|
||||
- `n_cols`: Grid columns (default: 3)
|
||||
- `sync_views`: Synchronize 3D views when interactive (default: True)
|
||||
- `remove_hs`: Remove hydrogens for clarity (default: True)
|
||||
- **Returns**: Grid of conformer visualizations
|
||||
- **Use case**: Comparing conformational diversity
|
||||
- **Example**:
|
||||
```python
|
||||
mol_3d = dm.conformers.generate(mol, n_confs=20)
|
||||
dm.viz.conformers(mol_3d, n_confs=10, align_conf=True)
|
||||
```
|
||||
|
||||
### Circle Grid Visualization
|
||||
|
||||
#### `dm.viz.circle_grid(center_mol, circle_mols, mol_size=200, circle_margin=50, act_mapper=None, ...)`
|
||||
Create concentric ring visualization with central molecule.
|
||||
- **Parameters**:
|
||||
- `center_mol`: Molecule at center
|
||||
- `circle_mols`: List of molecule lists (one list per ring)
|
||||
- `mol_size`: Image size per molecule
|
||||
- `circle_margin`: Spacing between rings (default: 50)
|
||||
- `act_mapper`: Activity mapping dictionary for color-coding
|
||||
- **Returns**: Circular grid image
|
||||
- **Use case**: Visualizing molecular neighborhoods, SAR analysis, similarity networks
|
||||
- **Example**:
|
||||
```python
|
||||
# Show a reference molecule surrounded by similar compounds
|
||||
dm.viz.circle_grid(
|
||||
center_mol=reference,
|
||||
circle_mols=[nearest_neighbors, second_tier]
|
||||
)
|
||||
```
|
||||
|
||||
### Visualization Best Practices
|
||||
|
||||
1. **Use legends for clarity**: Always label molecules with SMILES, IDs, or activity values
|
||||
2. **Align related molecules**: Use `align=True` in `to_image()` for SAR analysis
|
||||
3. **Adjust grid size**: Set `n_cols` based on molecule count and display width
|
||||
4. **Use SVG for publications**: Set `use_svg=True` for scalable vector graphics
|
||||
5. **Highlight substructures**: Use `highlight_atom` and `highlight_bond` to emphasize features
|
||||
6. **Save large grids**: Use `outfile` parameter to save rather than display in memory
|
||||
Reference in New Issue
Block a user