Initial commit
This commit is contained in:
763
skills/rdkit/SKILL.md
Normal file
763
skills/rdkit/SKILL.md
Normal file
@@ -0,0 +1,763 @@
|
||||
---
|
||||
name: rdkit
|
||||
description: "Cheminformatics toolkit for fine-grained molecular control. SMILES/SDF parsing, descriptors (MW, LogP, TPSA), fingerprints, substructure search, 2D/3D generation, similarity, reactions. For standard workflows with simpler interface, use datamol (wrapper around RDKit). Use rdkit for advanced control, custom sanitization, specialized algorithms."
|
||||
---
|
||||
|
||||
# RDKit Cheminformatics Toolkit
|
||||
|
||||
## Overview
|
||||
|
||||
RDKit is a comprehensive cheminformatics library providing Python APIs for molecular analysis and manipulation. This skill provides guidance for reading/writing molecular structures, calculating descriptors, fingerprinting, substructure searching, chemical reactions, 2D/3D coordinate generation, and molecular visualization. Use this skill for drug discovery, computational chemistry, and cheminformatics research tasks.
|
||||
|
||||
## Core Capabilities
|
||||
|
||||
### 1. Molecular I/O and Creation
|
||||
|
||||
**Reading Molecules:**
|
||||
|
||||
Read molecular structures from various formats:
|
||||
|
||||
```python
|
||||
from rdkit import Chem
|
||||
|
||||
# From SMILES strings
|
||||
mol = Chem.MolFromSmiles('Cc1ccccc1') # Returns Mol object or None
|
||||
|
||||
# From MOL files
|
||||
mol = Chem.MolFromMolFile('path/to/file.mol')
|
||||
|
||||
# From MOL blocks (string data)
|
||||
mol = Chem.MolFromMolBlock(mol_block_string)
|
||||
|
||||
# From InChI
|
||||
mol = Chem.MolFromInchi('InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H')
|
||||
```
|
||||
|
||||
**Writing Molecules:**
|
||||
|
||||
Convert molecules to text representations:
|
||||
|
||||
```python
|
||||
# To canonical SMILES
|
||||
smiles = Chem.MolToSmiles(mol)
|
||||
|
||||
# To MOL block
|
||||
mol_block = Chem.MolToMolBlock(mol)
|
||||
|
||||
# To InChI
|
||||
inchi = Chem.MolToInchi(mol)
|
||||
```
|
||||
|
||||
**Batch Processing:**
|
||||
|
||||
For processing multiple molecules, use Supplier/Writer objects:
|
||||
|
||||
```python
|
||||
# Read SDF files
|
||||
suppl = Chem.SDMolSupplier('molecules.sdf')
|
||||
for mol in suppl:
|
||||
if mol is not None: # Check for parsing errors
|
||||
# Process molecule
|
||||
pass
|
||||
|
||||
# Read SMILES files
|
||||
suppl = Chem.SmilesMolSupplier('molecules.smi', titleLine=False)
|
||||
|
||||
# For large files or compressed data
|
||||
with gzip.open('molecules.sdf.gz') as f:
|
||||
suppl = Chem.ForwardSDMolSupplier(f)
|
||||
for mol in suppl:
|
||||
# Process molecule
|
||||
pass
|
||||
|
||||
# Multithreaded processing for large datasets
|
||||
suppl = Chem.MultithreadedSDMolSupplier('molecules.sdf')
|
||||
|
||||
# Write molecules to SDF
|
||||
writer = Chem.SDWriter('output.sdf')
|
||||
for mol in molecules:
|
||||
writer.write(mol)
|
||||
writer.close()
|
||||
```
|
||||
|
||||
**Important Notes:**
|
||||
- All `MolFrom*` functions return `None` on failure with error messages
|
||||
- Always check for `None` before processing molecules
|
||||
- Molecules are automatically sanitized on import (validates valence, perceives aromaticity)
|
||||
|
||||
### 2. Molecular Sanitization and Validation
|
||||
|
||||
RDKit automatically sanitizes molecules during parsing, executing 13 steps including valence checking, aromaticity perception, and chirality assignment.
|
||||
|
||||
**Sanitization Control:**
|
||||
|
||||
```python
|
||||
# Disable automatic sanitization
|
||||
mol = Chem.MolFromSmiles('C1=CC=CC=C1', sanitize=False)
|
||||
|
||||
# Manual sanitization
|
||||
Chem.SanitizeMol(mol)
|
||||
|
||||
# Detect problems before sanitization
|
||||
problems = Chem.DetectChemistryProblems(mol)
|
||||
for problem in problems:
|
||||
print(problem.GetType(), problem.Message())
|
||||
|
||||
# Partial sanitization (skip specific steps)
|
||||
from rdkit.Chem import rdMolStandardize
|
||||
Chem.SanitizeMol(mol, sanitizeOps=Chem.SANITIZE_ALL ^ Chem.SANITIZE_PROPERTIES)
|
||||
```
|
||||
|
||||
**Common Sanitization Issues:**
|
||||
- Atoms with explicit valence exceeding maximum allowed will raise exceptions
|
||||
- Invalid aromatic rings will cause kekulization errors
|
||||
- Radical electrons may not be properly assigned without explicit specification
|
||||
|
||||
### 3. Molecular Analysis and Properties
|
||||
|
||||
**Accessing Molecular Structure:**
|
||||
|
||||
```python
|
||||
# Iterate atoms and bonds
|
||||
for atom in mol.GetAtoms():
|
||||
print(atom.GetSymbol(), atom.GetIdx(), atom.GetDegree())
|
||||
|
||||
for bond in mol.GetBonds():
|
||||
print(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondType())
|
||||
|
||||
# Ring information
|
||||
ring_info = mol.GetRingInfo()
|
||||
ring_info.NumRings()
|
||||
ring_info.AtomRings() # Returns tuples of atom indices
|
||||
|
||||
# Check if atom is in ring
|
||||
atom = mol.GetAtomWithIdx(0)
|
||||
atom.IsInRing()
|
||||
atom.IsInRingSize(6) # Check for 6-membered rings
|
||||
|
||||
# Find smallest set of smallest rings (SSSR)
|
||||
from rdkit.Chem import GetSymmSSSR
|
||||
rings = GetSymmSSSR(mol)
|
||||
```
|
||||
|
||||
**Stereochemistry:**
|
||||
|
||||
```python
|
||||
# Find chiral centers
|
||||
from rdkit.Chem import FindMolChiralCenters
|
||||
chiral_centers = FindMolChiralCenters(mol, includeUnassigned=True)
|
||||
# Returns list of (atom_idx, chirality) tuples
|
||||
|
||||
# Assign stereochemistry from 3D coordinates
|
||||
from rdkit.Chem import AssignStereochemistryFrom3D
|
||||
AssignStereochemistryFrom3D(mol)
|
||||
|
||||
# Check bond stereochemistry
|
||||
bond = mol.GetBondWithIdx(0)
|
||||
stereo = bond.GetStereo() # STEREONONE, STEREOZ, STEREOE, etc.
|
||||
```
|
||||
|
||||
**Fragment Analysis:**
|
||||
|
||||
```python
|
||||
# Get disconnected fragments
|
||||
frags = Chem.GetMolFrags(mol, asMols=True)
|
||||
|
||||
# Fragment on specific bonds
|
||||
from rdkit.Chem import FragmentOnBonds
|
||||
frag_mol = FragmentOnBonds(mol, [bond_idx1, bond_idx2])
|
||||
|
||||
# Count ring systems
|
||||
from rdkit.Chem.Scaffolds import MurckoScaffold
|
||||
scaffold = MurckoScaffold.GetScaffoldForMol(mol)
|
||||
```
|
||||
|
||||
### 4. Molecular Descriptors and Properties
|
||||
|
||||
**Basic Descriptors:**
|
||||
|
||||
```python
|
||||
from rdkit.Chem import Descriptors
|
||||
|
||||
# Molecular weight
|
||||
mw = Descriptors.MolWt(mol)
|
||||
exact_mw = Descriptors.ExactMolWt(mol)
|
||||
|
||||
# LogP (lipophilicity)
|
||||
logp = Descriptors.MolLogP(mol)
|
||||
|
||||
# Topological polar surface area
|
||||
tpsa = Descriptors.TPSA(mol)
|
||||
|
||||
# Number of hydrogen bond donors/acceptors
|
||||
hbd = Descriptors.NumHDonors(mol)
|
||||
hba = Descriptors.NumHAcceptors(mol)
|
||||
|
||||
# Number of rotatable bonds
|
||||
rot_bonds = Descriptors.NumRotatableBonds(mol)
|
||||
|
||||
# Number of aromatic rings
|
||||
aromatic_rings = Descriptors.NumAromaticRings(mol)
|
||||
```
|
||||
|
||||
**Batch Descriptor Calculation:**
|
||||
|
||||
```python
|
||||
# Calculate all descriptors at once
|
||||
all_descriptors = Descriptors.CalcMolDescriptors(mol)
|
||||
# Returns dictionary: {'MolWt': 180.16, 'MolLogP': 1.23, ...}
|
||||
|
||||
# Get list of available descriptor names
|
||||
descriptor_names = [desc[0] for desc in Descriptors._descList]
|
||||
```
|
||||
|
||||
**Lipinski's Rule of Five:**
|
||||
|
||||
```python
|
||||
# Check drug-likeness
|
||||
mw = Descriptors.MolWt(mol) <= 500
|
||||
logp = Descriptors.MolLogP(mol) <= 5
|
||||
hbd = Descriptors.NumHDonors(mol) <= 5
|
||||
hba = Descriptors.NumHAcceptors(mol) <= 10
|
||||
|
||||
is_drug_like = mw and logp and hbd and hba
|
||||
```
|
||||
|
||||
### 5. Fingerprints and Molecular Similarity
|
||||
|
||||
**Fingerprint Types:**
|
||||
|
||||
```python
|
||||
from rdkit.Chem import AllChem, RDKFingerprint
|
||||
from rdkit.Chem.AtomPairs import Pairs, Torsions
|
||||
from rdkit.Chem import MACCSkeys
|
||||
|
||||
# RDKit topological fingerprint
|
||||
fp = Chem.RDKFingerprint(mol)
|
||||
|
||||
# Morgan fingerprints (circular fingerprints, similar to ECFP)
|
||||
fp = AllChem.GetMorganFingerprint(mol, radius=2)
|
||||
fp_bits = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
|
||||
|
||||
# MACCS keys (166-bit structural key)
|
||||
fp = MACCSkeys.GenMACCSKeys(mol)
|
||||
|
||||
# Atom pair fingerprints
|
||||
fp = Pairs.GetAtomPairFingerprint(mol)
|
||||
|
||||
# Topological torsion fingerprints
|
||||
fp = Torsions.GetTopologicalTorsionFingerprint(mol)
|
||||
|
||||
# Avalon fingerprints (if available)
|
||||
from rdkit.Avalon import pyAvalonTools
|
||||
fp = pyAvalonTools.GetAvalonFP(mol)
|
||||
```
|
||||
|
||||
**Similarity Calculation:**
|
||||
|
||||
```python
|
||||
from rdkit import DataStructs
|
||||
|
||||
# Calculate Tanimoto similarity
|
||||
fp1 = AllChem.GetMorganFingerprintAsBitVect(mol1, radius=2)
|
||||
fp2 = AllChem.GetMorganFingerprintAsBitVect(mol2, radius=2)
|
||||
similarity = DataStructs.TanimotoSimilarity(fp1, fp2)
|
||||
|
||||
# Calculate similarity for multiple molecules
|
||||
similarities = DataStructs.BulkTanimotoSimilarity(fp1, [fp2, fp3, fp4])
|
||||
|
||||
# Other similarity metrics
|
||||
dice = DataStructs.DiceSimilarity(fp1, fp2)
|
||||
cosine = DataStructs.CosineSimilarity(fp1, fp2)
|
||||
```
|
||||
|
||||
**Clustering and Diversity:**
|
||||
|
||||
```python
|
||||
# Butina clustering based on fingerprint similarity
|
||||
from rdkit.ML.Cluster import Butina
|
||||
|
||||
# Calculate distance matrix
|
||||
dists = []
|
||||
fps = [AllChem.GetMorganFingerprintAsBitVect(mol, 2) for mol in mols]
|
||||
for i in range(len(fps)):
|
||||
sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
|
||||
dists.extend([1-sim for sim in sims])
|
||||
|
||||
# Cluster with distance cutoff
|
||||
clusters = Butina.ClusterData(dists, len(fps), distThresh=0.3, isDistData=True)
|
||||
```
|
||||
|
||||
### 6. Substructure Searching and SMARTS
|
||||
|
||||
**Basic Substructure Matching:**
|
||||
|
||||
```python
|
||||
# Define query using SMARTS
|
||||
query = Chem.MolFromSmarts('[#6]1:[#6]:[#6]:[#6]:[#6]:[#6]:1') # Benzene ring
|
||||
|
||||
# Check if molecule contains substructure
|
||||
has_match = mol.HasSubstructMatch(query)
|
||||
|
||||
# Get all matches (returns tuple of tuples with atom indices)
|
||||
matches = mol.GetSubstructMatches(query)
|
||||
|
||||
# Get only first match
|
||||
match = mol.GetSubstructMatch(query)
|
||||
```
|
||||
|
||||
**Common SMARTS Patterns:**
|
||||
|
||||
```python
|
||||
# Primary alcohols
|
||||
primary_alcohol = Chem.MolFromSmarts('[CH2][OH1]')
|
||||
|
||||
# Carboxylic acids
|
||||
carboxylic_acid = Chem.MolFromSmarts('C(=O)[OH]')
|
||||
|
||||
# Amides
|
||||
amide = Chem.MolFromSmarts('C(=O)N')
|
||||
|
||||
# Aromatic heterocycles
|
||||
aromatic_n = Chem.MolFromSmarts('[nR]') # Aromatic nitrogen in ring
|
||||
|
||||
# Macrocycles (rings > 12 atoms)
|
||||
macrocycle = Chem.MolFromSmarts('[r{12-}]')
|
||||
```
|
||||
|
||||
**Matching Rules:**
|
||||
- Unspecified properties in query match any value in target
|
||||
- Hydrogens are ignored unless explicitly specified
|
||||
- Charged query atom won't match uncharged target atom
|
||||
- Aromatic query atom won't match aliphatic target atom (unless query is generic)
|
||||
|
||||
### 7. Chemical Reactions
|
||||
|
||||
**Reaction SMARTS:**
|
||||
|
||||
```python
|
||||
from rdkit.Chem import AllChem
|
||||
|
||||
# Define reaction using SMARTS: reactants >> products
|
||||
rxn = AllChem.ReactionFromSmarts('[C:1]=[O:2]>>[C:1][O:2]') # Ketone reduction
|
||||
|
||||
# Apply reaction to molecules
|
||||
reactants = (mol1,)
|
||||
products = rxn.RunReactants(reactants)
|
||||
|
||||
# Products is tuple of tuples (one tuple per product set)
|
||||
for product_set in products:
|
||||
for product in product_set:
|
||||
# Sanitize product
|
||||
Chem.SanitizeMol(product)
|
||||
```
|
||||
|
||||
**Reaction Features:**
|
||||
- Atom mapping preserves specific atoms between reactants and products
|
||||
- Dummy atoms in products are replaced by corresponding reactant atoms
|
||||
- "Any" bonds inherit bond order from reactants
|
||||
- Chirality preserved unless explicitly changed
|
||||
|
||||
**Reaction Similarity:**
|
||||
|
||||
```python
|
||||
# Generate reaction fingerprints
|
||||
fp = AllChem.CreateDifferenceFingerprintForReaction(rxn)
|
||||
|
||||
# Compare reactions
|
||||
similarity = DataStructs.TanimotoSimilarity(fp1, fp2)
|
||||
```
|
||||
|
||||
### 8. 2D and 3D Coordinate Generation
|
||||
|
||||
**2D Coordinate Generation:**
|
||||
|
||||
```python
|
||||
from rdkit.Chem import AllChem
|
||||
|
||||
# Generate 2D coordinates for depiction
|
||||
AllChem.Compute2DCoords(mol)
|
||||
|
||||
# Align molecule to template structure
|
||||
template = Chem.MolFromSmiles('c1ccccc1')
|
||||
AllChem.Compute2DCoords(template)
|
||||
AllChem.GenerateDepictionMatching2DStructure(mol, template)
|
||||
```
|
||||
|
||||
**3D Coordinate Generation and Conformers:**
|
||||
|
||||
```python
|
||||
# Generate single 3D conformer using ETKDG
|
||||
AllChem.EmbedMolecule(mol, randomSeed=42)
|
||||
|
||||
# Generate multiple conformers
|
||||
conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=10, randomSeed=42)
|
||||
|
||||
# Optimize geometry with force field
|
||||
AllChem.UFFOptimizeMolecule(mol) # UFF force field
|
||||
AllChem.MMFFOptimizeMolecule(mol) # MMFF94 force field
|
||||
|
||||
# Optimize all conformers
|
||||
for conf_id in conf_ids:
|
||||
AllChem.MMFFOptimizeMolecule(mol, confId=conf_id)
|
||||
|
||||
# Calculate RMSD between conformers
|
||||
from rdkit.Chem import AllChem
|
||||
rms = AllChem.GetConformerRMS(mol, conf_id1, conf_id2)
|
||||
|
||||
# Align molecules
|
||||
AllChem.AlignMol(probe_mol, ref_mol)
|
||||
```
|
||||
|
||||
**Constrained Embedding:**
|
||||
|
||||
```python
|
||||
# Embed with part of molecule constrained to specific coordinates
|
||||
AllChem.ConstrainedEmbed(mol, core_mol)
|
||||
```
|
||||
|
||||
### 9. Molecular Visualization
|
||||
|
||||
**Basic Drawing:**
|
||||
|
||||
```python
|
||||
from rdkit.Chem import Draw
|
||||
|
||||
# Draw single molecule to PIL image
|
||||
img = Draw.MolToImage(mol, size=(300, 300))
|
||||
img.save('molecule.png')
|
||||
|
||||
# Draw to file directly
|
||||
Draw.MolToFile(mol, 'molecule.png')
|
||||
|
||||
# Draw multiple molecules in grid
|
||||
mols = [mol1, mol2, mol3, mol4]
|
||||
img = Draw.MolsToGridImage(mols, molsPerRow=2, subImgSize=(200, 200))
|
||||
```
|
||||
|
||||
**Highlighting Substructures:**
|
||||
|
||||
```python
|
||||
# Highlight substructure match
|
||||
query = Chem.MolFromSmarts('c1ccccc1')
|
||||
match = mol.GetSubstructMatch(query)
|
||||
|
||||
img = Draw.MolToImage(mol, highlightAtoms=match)
|
||||
|
||||
# Custom highlight colors
|
||||
highlight_colors = {atom_idx: (1, 0, 0) for atom_idx in match} # Red
|
||||
img = Draw.MolToImage(mol, highlightAtoms=match,
|
||||
highlightAtomColors=highlight_colors)
|
||||
```
|
||||
|
||||
**Customizing Visualization:**
|
||||
|
||||
```python
|
||||
from rdkit.Chem.Draw import rdMolDraw2D
|
||||
|
||||
# Create drawer with custom options
|
||||
drawer = rdMolDraw2D.MolDraw2DCairo(300, 300)
|
||||
opts = drawer.drawOptions()
|
||||
|
||||
# Customize options
|
||||
opts.addAtomIndices = True
|
||||
opts.addStereoAnnotation = True
|
||||
opts.bondLineWidth = 2
|
||||
|
||||
# Draw molecule
|
||||
drawer.DrawMolecule(mol)
|
||||
drawer.FinishDrawing()
|
||||
|
||||
# Save to file
|
||||
with open('molecule.png', 'wb') as f:
|
||||
f.write(drawer.GetDrawingText())
|
||||
```
|
||||
|
||||
**Jupyter Notebook Integration:**
|
||||
|
||||
```python
|
||||
# Enable inline display in Jupyter
|
||||
from rdkit.Chem.Draw import IPythonConsole
|
||||
|
||||
# Customize default display
|
||||
IPythonConsole.ipython_useSVG = True # Use SVG instead of PNG
|
||||
IPythonConsole.molSize = (300, 300) # Default size
|
||||
|
||||
# Molecules now display automatically
|
||||
mol # Shows molecule image
|
||||
```
|
||||
|
||||
**Visualizing Fingerprint Bits:**
|
||||
|
||||
```python
|
||||
# Show what molecular features a fingerprint bit represents
|
||||
from rdkit.Chem import Draw
|
||||
|
||||
# For Morgan fingerprints
|
||||
bit_info = {}
|
||||
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, bitInfo=bit_info)
|
||||
|
||||
# Draw environment for specific bit
|
||||
img = Draw.DrawMorganBit(mol, bit_id, bit_info)
|
||||
```
|
||||
|
||||
### 10. Molecular Modification
|
||||
|
||||
**Adding/Removing Hydrogens:**
|
||||
|
||||
```python
|
||||
# Add explicit hydrogens
|
||||
mol_h = Chem.AddHs(mol)
|
||||
|
||||
# Remove explicit hydrogens
|
||||
mol = Chem.RemoveHs(mol_h)
|
||||
```
|
||||
|
||||
**Kekulization and Aromaticity:**
|
||||
|
||||
```python
|
||||
# Convert aromatic bonds to alternating single/double
|
||||
Chem.Kekulize(mol)
|
||||
|
||||
# Set aromaticity
|
||||
Chem.SetAromaticity(mol)
|
||||
```
|
||||
|
||||
**Replacing Substructures:**
|
||||
|
||||
```python
|
||||
# Replace substructure with another structure
|
||||
query = Chem.MolFromSmarts('c1ccccc1') # Benzene
|
||||
replacement = Chem.MolFromSmiles('C1CCCCC1') # Cyclohexane
|
||||
|
||||
new_mol = Chem.ReplaceSubstructs(mol, query, replacement)[0]
|
||||
```
|
||||
|
||||
**Neutralizing Charges:**
|
||||
|
||||
```python
|
||||
# Remove formal charges by adding/removing hydrogens
|
||||
from rdkit.Chem.MolStandardize import rdMolStandardize
|
||||
|
||||
# Using Uncharger
|
||||
uncharger = rdMolStandardize.Uncharger()
|
||||
mol_neutral = uncharger.uncharge(mol)
|
||||
```
|
||||
|
||||
### 11. Working with Molecular Hashes and Standardization
|
||||
|
||||
**Molecular Hashing:**
|
||||
|
||||
```python
|
||||
from rdkit.Chem import rdMolHash
|
||||
|
||||
# Generate Murcko scaffold hash
|
||||
scaffold_hash = rdMolHash.MolHash(mol, rdMolHash.HashFunction.MurckoScaffold)
|
||||
|
||||
# Canonical SMILES hash
|
||||
canonical_hash = rdMolHash.MolHash(mol, rdMolHash.HashFunction.CanonicalSmiles)
|
||||
|
||||
# Regioisomer hash (ignores stereochemistry)
|
||||
regio_hash = rdMolHash.MolHash(mol, rdMolHash.HashFunction.Regioisomer)
|
||||
```
|
||||
|
||||
**Randomized SMILES:**
|
||||
|
||||
```python
|
||||
# Generate random SMILES representations (for data augmentation)
|
||||
from rdkit.Chem import MolToRandomSmilesVect
|
||||
|
||||
random_smiles = MolToRandomSmilesVect(mol, numSmiles=10, randomSeed=42)
|
||||
```
|
||||
|
||||
### 12. Pharmacophore and 3D Features
|
||||
|
||||
**Pharmacophore Features:**
|
||||
|
||||
```python
|
||||
from rdkit.Chem import ChemicalFeatures
|
||||
from rdkit import RDConfig
|
||||
import os
|
||||
|
||||
# Load feature factory
|
||||
fdef_path = os.path.join(RDConfig.RDDataDir, 'BaseFeatures.fdef')
|
||||
factory = ChemicalFeatures.BuildFeatureFactory(fdef_path)
|
||||
|
||||
# Get pharmacophore features
|
||||
features = factory.GetFeaturesForMol(mol)
|
||||
|
||||
for feat in features:
|
||||
print(feat.GetFamily(), feat.GetType(), feat.GetAtomIds())
|
||||
```
|
||||
|
||||
## Common Workflows
|
||||
|
||||
### Drug-likeness Analysis
|
||||
|
||||
```python
|
||||
from rdkit import Chem
|
||||
from rdkit.Chem import Descriptors
|
||||
|
||||
def analyze_druglikeness(smiles):
|
||||
mol = Chem.MolFromSmiles(smiles)
|
||||
if mol is None:
|
||||
return None
|
||||
|
||||
# Calculate Lipinski descriptors
|
||||
results = {
|
||||
'MW': Descriptors.MolWt(mol),
|
||||
'LogP': Descriptors.MolLogP(mol),
|
||||
'HBD': Descriptors.NumHDonors(mol),
|
||||
'HBA': Descriptors.NumHAcceptors(mol),
|
||||
'TPSA': Descriptors.TPSA(mol),
|
||||
'RotBonds': Descriptors.NumRotatableBonds(mol)
|
||||
}
|
||||
|
||||
# Check Lipinski's Rule of Five
|
||||
results['Lipinski'] = (
|
||||
results['MW'] <= 500 and
|
||||
results['LogP'] <= 5 and
|
||||
results['HBD'] <= 5 and
|
||||
results['HBA'] <= 10
|
||||
)
|
||||
|
||||
return results
|
||||
```
|
||||
|
||||
### Similarity Screening
|
||||
|
||||
```python
|
||||
from rdkit import Chem
|
||||
from rdkit.Chem import AllChem
|
||||
from rdkit import DataStructs
|
||||
|
||||
def similarity_screen(query_smiles, database_smiles, threshold=0.7):
|
||||
query_mol = Chem.MolFromSmiles(query_smiles)
|
||||
query_fp = AllChem.GetMorganFingerprintAsBitVect(query_mol, 2)
|
||||
|
||||
hits = []
|
||||
for idx, smiles in enumerate(database_smiles):
|
||||
mol = Chem.MolFromSmiles(smiles)
|
||||
if mol:
|
||||
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2)
|
||||
sim = DataStructs.TanimotoSimilarity(query_fp, fp)
|
||||
if sim >= threshold:
|
||||
hits.append((idx, smiles, sim))
|
||||
|
||||
return sorted(hits, key=lambda x: x[2], reverse=True)
|
||||
```
|
||||
|
||||
### Substructure Filtering
|
||||
|
||||
```python
|
||||
from rdkit import Chem
|
||||
|
||||
def filter_by_substructure(smiles_list, pattern_smarts):
|
||||
query = Chem.MolFromSmarts(pattern_smarts)
|
||||
|
||||
hits = []
|
||||
for smiles in smiles_list:
|
||||
mol = Chem.MolFromSmiles(smiles)
|
||||
if mol and mol.HasSubstructMatch(query):
|
||||
hits.append(smiles)
|
||||
|
||||
return hits
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Error Handling
|
||||
|
||||
Always check for `None` when parsing molecules:
|
||||
|
||||
```python
|
||||
mol = Chem.MolFromSmiles(smiles)
|
||||
if mol is None:
|
||||
print(f"Failed to parse: {smiles}")
|
||||
continue
|
||||
```
|
||||
|
||||
### Performance Optimization
|
||||
|
||||
**Use binary formats for storage:**
|
||||
|
||||
```python
|
||||
import pickle
|
||||
|
||||
# Pickle molecules for fast loading
|
||||
with open('molecules.pkl', 'wb') as f:
|
||||
pickle.dump(mols, f)
|
||||
|
||||
# Load pickled molecules (much faster than reparsing)
|
||||
with open('molecules.pkl', 'rb') as f:
|
||||
mols = pickle.load(f)
|
||||
```
|
||||
|
||||
**Use bulk operations:**
|
||||
|
||||
```python
|
||||
# Calculate fingerprints for all molecules at once
|
||||
fps = [AllChem.GetMorganFingerprintAsBitVect(mol, 2) for mol in mols]
|
||||
|
||||
# Use bulk similarity calculations
|
||||
similarities = DataStructs.BulkTanimotoSimilarity(fps[0], fps[1:])
|
||||
```
|
||||
|
||||
### Thread Safety
|
||||
|
||||
RDKit operations are generally thread-safe for:
|
||||
- Molecule I/O (SMILES, mol blocks)
|
||||
- Coordinate generation
|
||||
- Fingerprinting and descriptors
|
||||
- Substructure searching
|
||||
- Reactions
|
||||
- Drawing
|
||||
|
||||
**Not thread-safe:** MolSuppliers when accessed concurrently.
|
||||
|
||||
### Memory Management
|
||||
|
||||
For large datasets:
|
||||
|
||||
```python
|
||||
# Use ForwardSDMolSupplier to avoid loading entire file
|
||||
with open('large.sdf') as f:
|
||||
suppl = Chem.ForwardSDMolSupplier(f)
|
||||
for mol in suppl:
|
||||
# Process one molecule at a time
|
||||
pass
|
||||
|
||||
# Use MultithreadedSDMolSupplier for parallel processing
|
||||
suppl = Chem.MultithreadedSDMolSupplier('large.sdf', numWriterThreads=4)
|
||||
```
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
1. **Forgetting to check for None:** Always validate molecules after parsing
|
||||
2. **Sanitization failures:** Use `DetectChemistryProblems()` to debug
|
||||
3. **Missing hydrogens:** Use `AddHs()` when calculating properties that depend on hydrogen
|
||||
4. **2D vs 3D:** Generate appropriate coordinates before visualization or 3D analysis
|
||||
5. **SMARTS matching rules:** Remember that unspecified properties match anything
|
||||
6. **Thread safety with MolSuppliers:** Don't share supplier objects across threads
|
||||
|
||||
## Resources
|
||||
|
||||
### references/
|
||||
|
||||
This skill includes detailed API reference documentation:
|
||||
|
||||
- `api_reference.md` - Comprehensive listing of RDKit modules, functions, and classes organized by functionality
|
||||
- `descriptors_reference.md` - Complete list of available molecular descriptors with descriptions
|
||||
- `smarts_patterns.md` - Common SMARTS patterns for functional groups and structural features
|
||||
|
||||
Load these references when needing specific API details, parameter information, or pattern examples.
|
||||
|
||||
### scripts/
|
||||
|
||||
Example scripts for common RDKit workflows:
|
||||
|
||||
- `molecular_properties.py` - Calculate comprehensive molecular properties and descriptors
|
||||
- `similarity_search.py` - Perform fingerprint-based similarity screening
|
||||
- `substructure_filter.py` - Filter molecules by substructure patterns
|
||||
|
||||
These scripts can be executed directly or used as templates for custom workflows.
|
||||
Reference in New Issue
Block a user