7.2 KiB
Datamol Descriptors and Visualization Reference
Descriptors Module (datamol.descriptors)
The descriptors module provides tools for computing molecular properties and descriptors.
Specialized Descriptor Functions
dm.descriptors.n_aromatic_atoms(mol)
Calculate the number of aromatic atoms.
- Returns: Integer count
- Use case: Aromaticity analysis
dm.descriptors.n_aromatic_atoms_proportion(mol)
Calculate ratio of aromatic atoms to total heavy atoms.
- Returns: Float between 0 and 1
- Use case: Quantifying aromatic character
dm.descriptors.n_charged_atoms(mol)
Count atoms with nonzero formal charge.
- Returns: Integer count
- Use case: Charge distribution analysis
dm.descriptors.n_rigid_bonds(mol)
Count non-rotatable bonds (neither single bonds nor ring bonds).
- Returns: Integer count
- Use case: Molecular flexibility assessment
dm.descriptors.n_stereo_centers(mol)
Count stereogenic centers (chiral centers).
- Returns: Integer count
- Use case: Stereochemistry analysis
dm.descriptors.n_stereo_centers_unspecified(mol)
Count stereocenters lacking stereochemical specification.
- Returns: Integer count
- Use case: Identifying incomplete stereochemistry
Batch Descriptor Computation
dm.descriptors.compute_many_descriptors(mol, properties_fn=None, add_properties=True)
Compute multiple molecular properties for a single molecule.
- Parameters:
properties_fn: Custom list of descriptor functionsadd_properties: Include additional computed properties
- Returns: Dictionary of descriptor name → value pairs
- Default descriptors include:
- Molecular weight, LogP, number of H-bond donors/acceptors
- Aromatic atoms, stereocenters, rotatable bonds
- TPSA (Topological Polar Surface Area)
- Ring count, heteroatom count
- Example:
mol = dm.to_mol("CCO") descriptors = dm.descriptors.compute_many_descriptors(mol) # Returns: {'mw': 46.07, 'logp': -0.03, 'hbd': 1, 'hba': 1, ...}
dm.descriptors.batch_compute_many_descriptors(mols, properties_fn=None, add_properties=True, n_jobs=1, batch_size=None, progress=False)
Compute descriptors for multiple molecules in parallel.
- Parameters:
mols: List of moleculesn_jobs: Number of parallel jobs (-1 for all cores)batch_size: Chunk size for parallel processingprogress: Show progress bar
- Returns: Pandas DataFrame with one row per molecule
- Example:
mols = [dm.to_mol(smi) for smi in smiles_list] df = dm.descriptors.batch_compute_many_descriptors( mols, n_jobs=-1, progress=True )
RDKit Descriptor Access
dm.descriptors.any_rdkit_descriptor(name)
Retrieve any descriptor function from RDKit by name.
- Parameters:
name- Descriptor function name (e.g., 'MolWt', 'TPSA') - Returns: RDKit descriptor function
- Available descriptors: From
rdkit.Chem.Descriptorsandrdkit.Chem.rdMolDescriptors - Example:
tpsa_fn = dm.descriptors.any_rdkit_descriptor('TPSA') tpsa_value = tpsa_fn(mol)
Common Use Cases
Drug-likeness Filtering (Lipinski's Rule of Five):
descriptors = dm.descriptors.compute_many_descriptors(mol)
is_druglike = (
descriptors['mw'] <= 500 and
descriptors['logp'] <= 5 and
descriptors['hbd'] <= 5 and
descriptors['hba'] <= 10
)
ADME Property Analysis:
df = dm.descriptors.batch_compute_many_descriptors(compound_library)
# Filter by TPSA for blood-brain barrier penetration
bbb_candidates = df[df['tpsa'] < 90]
Visualization Module (datamol.viz)
The viz module provides tools for rendering molecules and conformers as images.
Main Visualization Function
dm.viz.to_image(mols, legends=None, n_cols=4, use_svg=False, mol_size=(200, 200), highlight_atom=None, highlight_bond=None, outfile=None, max_mols=None, copy=True, indices=False, ...)
Generate image grid from molecules.
- Parameters:
mols: Single molecule or list of moleculeslegends: String or list of strings as labels (one per molecule)n_cols: Number of molecules per row (default: 4)use_svg: Output SVG format (True) or PNG (False, default)mol_size: Tuple (width, height) or single int for square imageshighlight_atom: Atom indices to highlight (list or dict)highlight_bond: Bond indices to highlight (list or dict)outfile: Save path (local or remote, supports fsspec)max_mols: Maximum number of molecules to displayindices: Draw atom indices on structures (default: False)align: Align molecules using MCS (Maximum Common Substructure)
- Returns: Image object (can be displayed in Jupyter) or saves to file
- Example:
# Basic grid dm.viz.to_image(mols[:10], legends=[dm.to_smiles(m) for m in mols[:10]]) # Save to file dm.viz.to_image(mols, outfile="molecules.png", n_cols=5) # Highlight substructure dm.viz.to_image(mol, highlight_atom=[0, 1, 2], highlight_bond=[0, 1]) # Aligned visualization dm.viz.to_image(mols, align=True, legends=activity_labels)
Conformer Visualization
dm.viz.conformers(mol, n_confs=None, align_conf=True, n_cols=3, sync_views=True, remove_hs=True, ...)
Display multiple conformers in grid layout.
- Parameters:
mol: Molecule with embedded conformersn_confs: Number or list of conformer indices to display (None = all)align_conf: Align conformers for comparison (default: True)n_cols: Grid columns (default: 3)sync_views: Synchronize 3D views when interactive (default: True)remove_hs: Remove hydrogens for clarity (default: True)
- Returns: Grid of conformer visualizations
- Use case: Comparing conformational diversity
- Example:
mol_3d = dm.conformers.generate(mol, n_confs=20) dm.viz.conformers(mol_3d, n_confs=10, align_conf=True)
Circle Grid Visualization
dm.viz.circle_grid(center_mol, circle_mols, mol_size=200, circle_margin=50, act_mapper=None, ...)
Create concentric ring visualization with central molecule.
- Parameters:
center_mol: Molecule at centercircle_mols: List of molecule lists (one list per ring)mol_size: Image size per moleculecircle_margin: Spacing between rings (default: 50)act_mapper: Activity mapping dictionary for color-coding
- Returns: Circular grid image
- Use case: Visualizing molecular neighborhoods, SAR analysis, similarity networks
- Example:
# Show a reference molecule surrounded by similar compounds dm.viz.circle_grid( center_mol=reference, circle_mols=[nearest_neighbors, second_tier] )
Visualization Best Practices
- Use legends for clarity: Always label molecules with SMILES, IDs, or activity values
- Align related molecules: Use
align=Trueinto_image()for SAR analysis - Adjust grid size: Set
n_colsbased on molecule count and display width - Use SVG for publications: Set
use_svg=Truefor scalable vector graphics - Highlight substructures: Use
highlight_atomandhighlight_bondto emphasize features - Save large grids: Use
outfileparameter to save rather than display in memory