Files
gh-k-dense-ai-claude-scient…/skills/datamol/references/descriptors_viz.md
2025-11-30 08:30:10 +08:00

7.2 KiB

Datamol Descriptors and Visualization Reference

Descriptors Module (datamol.descriptors)

The descriptors module provides tools for computing molecular properties and descriptors.

Specialized Descriptor Functions

dm.descriptors.n_aromatic_atoms(mol)

Calculate the number of aromatic atoms.

  • Returns: Integer count
  • Use case: Aromaticity analysis

dm.descriptors.n_aromatic_atoms_proportion(mol)

Calculate ratio of aromatic atoms to total heavy atoms.

  • Returns: Float between 0 and 1
  • Use case: Quantifying aromatic character

dm.descriptors.n_charged_atoms(mol)

Count atoms with nonzero formal charge.

  • Returns: Integer count
  • Use case: Charge distribution analysis

dm.descriptors.n_rigid_bonds(mol)

Count non-rotatable bonds (neither single bonds nor ring bonds).

  • Returns: Integer count
  • Use case: Molecular flexibility assessment

dm.descriptors.n_stereo_centers(mol)

Count stereogenic centers (chiral centers).

  • Returns: Integer count
  • Use case: Stereochemistry analysis

dm.descriptors.n_stereo_centers_unspecified(mol)

Count stereocenters lacking stereochemical specification.

  • Returns: Integer count
  • Use case: Identifying incomplete stereochemistry

Batch Descriptor Computation

dm.descriptors.compute_many_descriptors(mol, properties_fn=None, add_properties=True)

Compute multiple molecular properties for a single molecule.

  • Parameters:
    • properties_fn: Custom list of descriptor functions
    • add_properties: Include additional computed properties
  • Returns: Dictionary of descriptor name → value pairs
  • Default descriptors include:
    • Molecular weight, LogP, number of H-bond donors/acceptors
    • Aromatic atoms, stereocenters, rotatable bonds
    • TPSA (Topological Polar Surface Area)
    • Ring count, heteroatom count
  • Example:
    mol = dm.to_mol("CCO")
    descriptors = dm.descriptors.compute_many_descriptors(mol)
    # Returns: {'mw': 46.07, 'logp': -0.03, 'hbd': 1, 'hba': 1, ...}
    

dm.descriptors.batch_compute_many_descriptors(mols, properties_fn=None, add_properties=True, n_jobs=1, batch_size=None, progress=False)

Compute descriptors for multiple molecules in parallel.

  • Parameters:
    • mols: List of molecules
    • n_jobs: Number of parallel jobs (-1 for all cores)
    • batch_size: Chunk size for parallel processing
    • progress: Show progress bar
  • Returns: Pandas DataFrame with one row per molecule
  • Example:
    mols = [dm.to_mol(smi) for smi in smiles_list]
    df = dm.descriptors.batch_compute_many_descriptors(
        mols,
        n_jobs=-1,
        progress=True
    )
    

RDKit Descriptor Access

dm.descriptors.any_rdkit_descriptor(name)

Retrieve any descriptor function from RDKit by name.

  • Parameters: name - Descriptor function name (e.g., 'MolWt', 'TPSA')
  • Returns: RDKit descriptor function
  • Available descriptors: From rdkit.Chem.Descriptors and rdkit.Chem.rdMolDescriptors
  • Example:
    tpsa_fn = dm.descriptors.any_rdkit_descriptor('TPSA')
    tpsa_value = tpsa_fn(mol)
    

Common Use Cases

Drug-likeness Filtering (Lipinski's Rule of Five):

descriptors = dm.descriptors.compute_many_descriptors(mol)
is_druglike = (
    descriptors['mw'] <= 500 and
    descriptors['logp'] <= 5 and
    descriptors['hbd'] <= 5 and
    descriptors['hba'] <= 10
)

ADME Property Analysis:

df = dm.descriptors.batch_compute_many_descriptors(compound_library)
# Filter by TPSA for blood-brain barrier penetration
bbb_candidates = df[df['tpsa'] < 90]

Visualization Module (datamol.viz)

The viz module provides tools for rendering molecules and conformers as images.

Main Visualization Function

dm.viz.to_image(mols, legends=None, n_cols=4, use_svg=False, mol_size=(200, 200), highlight_atom=None, highlight_bond=None, outfile=None, max_mols=None, copy=True, indices=False, ...)

Generate image grid from molecules.

  • Parameters:
    • mols: Single molecule or list of molecules
    • legends: String or list of strings as labels (one per molecule)
    • n_cols: Number of molecules per row (default: 4)
    • use_svg: Output SVG format (True) or PNG (False, default)
    • mol_size: Tuple (width, height) or single int for square images
    • highlight_atom: Atom indices to highlight (list or dict)
    • highlight_bond: Bond indices to highlight (list or dict)
    • outfile: Save path (local or remote, supports fsspec)
    • max_mols: Maximum number of molecules to display
    • indices: Draw atom indices on structures (default: False)
    • align: Align molecules using MCS (Maximum Common Substructure)
  • Returns: Image object (can be displayed in Jupyter) or saves to file
  • Example:
    # Basic grid
    dm.viz.to_image(mols[:10], legends=[dm.to_smiles(m) for m in mols[:10]])
    
    # Save to file
    dm.viz.to_image(mols, outfile="molecules.png", n_cols=5)
    
    # Highlight substructure
    dm.viz.to_image(mol, highlight_atom=[0, 1, 2], highlight_bond=[0, 1])
    
    # Aligned visualization
    dm.viz.to_image(mols, align=True, legends=activity_labels)
    

Conformer Visualization

dm.viz.conformers(mol, n_confs=None, align_conf=True, n_cols=3, sync_views=True, remove_hs=True, ...)

Display multiple conformers in grid layout.

  • Parameters:
    • mol: Molecule with embedded conformers
    • n_confs: Number or list of conformer indices to display (None = all)
    • align_conf: Align conformers for comparison (default: True)
    • n_cols: Grid columns (default: 3)
    • sync_views: Synchronize 3D views when interactive (default: True)
    • remove_hs: Remove hydrogens for clarity (default: True)
  • Returns: Grid of conformer visualizations
  • Use case: Comparing conformational diversity
  • Example:
    mol_3d = dm.conformers.generate(mol, n_confs=20)
    dm.viz.conformers(mol_3d, n_confs=10, align_conf=True)
    

Circle Grid Visualization

dm.viz.circle_grid(center_mol, circle_mols, mol_size=200, circle_margin=50, act_mapper=None, ...)

Create concentric ring visualization with central molecule.

  • Parameters:
    • center_mol: Molecule at center
    • circle_mols: List of molecule lists (one list per ring)
    • mol_size: Image size per molecule
    • circle_margin: Spacing between rings (default: 50)
    • act_mapper: Activity mapping dictionary for color-coding
  • Returns: Circular grid image
  • Use case: Visualizing molecular neighborhoods, SAR analysis, similarity networks
  • Example:
    # Show a reference molecule surrounded by similar compounds
    dm.viz.circle_grid(
        center_mol=reference,
        circle_mols=[nearest_neighbors, second_tier]
    )
    

Visualization Best Practices

  1. Use legends for clarity: Always label molecules with SMILES, IDs, or activity values
  2. Align related molecules: Use align=True in to_image() for SAR analysis
  3. Adjust grid size: Set n_cols based on molecule count and display width
  4. Use SVG for publications: Set use_svg=True for scalable vector graphics
  5. Highlight substructures: Use highlight_atom and highlight_bond to emphasize features
  6. Save large grids: Use outfile parameter to save rather than display in memory