RDKit API Reference
This document provides a comprehensive reference for RDKit's Python API, organized by functionality.
Core Module: rdkit.Chem
The fundamental module for working with molecules.
Molecule I/O
Reading Molecules:
Chem.MolFromSmiles(smiles, sanitize=True) - Parse SMILES string
Chem.MolFromSmarts(smarts) - Parse SMARTS pattern
Chem.MolFromMolFile(filename, sanitize=True, removeHs=True) - Read MOL file
Chem.MolFromMolBlock(molblock, sanitize=True, removeHs=True) - Parse MOL block string
Chem.MolFromMol2File(filename, sanitize=True, removeHs=True) - Read MOL2 file
Chem.MolFromMol2Block(molblock, sanitize=True, removeHs=True) - Parse MOL2 block
Chem.MolFromPDBFile(filename, sanitize=True, removeHs=True) - Read PDB file
Chem.MolFromPDBBlock(pdbblock, sanitize=True, removeHs=True) - Parse PDB block
Chem.MolFromInchi(inchi, sanitize=True, removeHs=True) - Parse InChI string
Chem.MolFromSequence(seq, sanitize=True) - Create molecule from peptide sequence
Writing Molecules:
Chem.MolToSmiles(mol, isomericSmiles=True, canonical=True) - Convert to SMILES
Chem.MolToSmarts(mol, isomericSmiles=True) - Convert to SMARTS (the keyword is named isomericSmiles here as well)
Chem.MolToMolBlock(mol, includeStereo=True, confId=-1) - Convert to MOL block
Chem.MolToMolFile(mol, filename, includeStereo=True, confId=-1) - Write MOL file
Chem.MolToPDBBlock(mol, confId=-1) - Convert to PDB block
Chem.MolToPDBFile(mol, filename, confId=-1) - Write PDB file
Chem.MolToInchi(mol, options='') - Convert to InChI
Chem.MolToInchiKey(mol, options='') - Generate InChI key
Chem.MolToSequence(mol) - Convert to peptide sequence
Batch I/O:
Chem.SDMolSupplier(filename, sanitize=True, removeHs=True) - SDF file reader
Chem.ForwardSDMolSupplier(fileobj, sanitize=True, removeHs=True) - Forward-only SDF reader
Chem.MultithreadedSDMolSupplier(filename, numWriterThreads=1) - Parallel SDF reader
Chem.SmilesMolSupplier(filename, delimiter=' ', titleLine=True) - SMILES file reader
Chem.SDWriter(filename) - SDF file writer
Chem.SmilesWriter(filename, delimiter=' ', includeHeader=True) - SMILES file writer
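As a minimal sketch of the readers and writers listed above, the following parses a SMILES string, round-trips it through a MOL block, and shows the `None` return value on a parse failure. The example molecules are illustrative choices, not part of the reference.

```python
from rdkit import Chem

# Parse a SMILES string; MolFromSmiles returns None when parsing fails.
mol = Chem.MolFromSmiles("OCC")  # ethanol, written non-canonically
canonical = Chem.MolToSmiles(mol)

# Round-trip through a MOL block held in memory (no file on disk needed).
molblock = Chem.MolToMolBlock(mol)
mol_again = Chem.MolFromMolBlock(molblock)

# An unclosed ring bond makes this SMILES unparsable, so the result is None.
bad = Chem.MolFromSmiles("C1CC")

print(canonical, mol_again.GetNumAtoms(), bad)
```

Checking for `None` before using a parsed molecule is the usual guard when reading untrusted input, including molecules yielded by the batch suppliers.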
Molecular Manipulation
Sanitization:
Chem.SanitizeMol(mol, sanitizeOps=SANITIZE_ALL, catchErrors=False) - Sanitize molecule
Chem.DetectChemistryProblems(mol, sanitizeOps=SANITIZE_ALL) - Detect sanitization issues
Chem.AssignStereochemistry(mol, cleanIt=True, force=False) - Assign stereochemistry
Chem.FindPotentialStereo(mol) - Find potential stereocenters
Chem.AssignStereochemistryFrom3D(mol, confId=-1) - Assign stereo from 3D coords
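The sketch below illustrates one way to inspect a structure that would fail sanitization: parse with `sanitize=False`, then ask `DetectChemistryProblems` what is wrong. The input SMILES (a neutral nitrogen with four bonds) is an assumption chosen to trigger a valence error.

```python
from rdkit import Chem

# Skip sanitization at parse time so the invalid structure can be inspected.
mol = Chem.MolFromSmiles("CN(C)(C)C", sanitize=False)  # neutral N with four bonds

# Each reported problem carries a type name and a human-readable message.
problems = Chem.DetectChemistryProblems(mol)
for problem in problems:
    print(problem.GetType(), problem.Message())

# SanitizeMol raises on the same molecule unless catchErrors=True is passed.
try:
    Chem.SanitizeMol(mol)
    sanitized = True
except Chem.MolSanitizeException:
    sanitized = False
```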
Hydrogen Management:
Chem.AddHs(mol, explicitOnly=False, addCoords=False) - Add explicit hydrogens
Chem.RemoveHs(mol, implicitOnly=False, updateExplicitCount=False) - Remove hydrogens
Chem.RemoveAllHs(mol) - Remove all hydrogens
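A short sketch of the hydrogen functions above, using ethanol as an illustrative input. Both calls return a new molecule and leave the argument unchanged.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")   # hydrogens are implicit after parsing
mol_h = Chem.AddHs(mol)           # new molecule with explicit H atoms
mol_noh = Chem.RemoveHs(mol_h)    # new molecule with the H atoms removed again

print(mol.GetNumAtoms(), mol_h.GetNumAtoms(), mol_noh.GetNumAtoms())
```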
Aromaticity:
Chem.SetAromaticity(mol, model=AROMATICITY_RDKIT) - Set aromaticity model
Chem.Kekulize(mol, clearAromaticFlags=False) - Kekulize aromatic bonds
Chem.SetConjugation(mol) - Set conjugation flags
Fragments:
Chem.GetMolFrags(mol, asMols=False, sanitizeFrags=True) - Get disconnected fragments
Chem.FragmentOnBonds(mol, bondIndices, addDummies=True) - Fragment on specific bonds
Chem.ReplaceSubstructs(mol, query, replacement, replaceAll=False) - Replace substructures
Chem.DeleteSubstructs(mol, query, onlyFrags=False) - Delete substructures
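As an illustration of `GetMolFrags`, this sketch splits a salt into its disconnected components and keeps the largest one. Picking the largest fragment by heavy-atom count is an assumption made for the example, not a rule from the reference.

```python
from rdkit import Chem

salt = Chem.MolFromSmiles("CC(=O)[O-].[Na+]")  # sodium acetate

# asMols=True returns each disconnected component as its own molecule.
frags = Chem.GetMolFrags(salt, asMols=True)

# Keep the component with the most heavy atoms (illustrative heuristic).
largest = max(frags, key=lambda m: m.GetNumHeavyAtoms())
largest_smiles = Chem.MolToSmiles(largest)
print(len(frags), largest_smiles)
```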
Stereochemistry:
Chem.FindMolChiralCenters(mol, includeUnassigned=False, useLegacyImplementation=False) - Find chiral centers
Chem.FindPotentialStereo(mol, cleanIt=True) - Find potential stereocenters
Substructure Searching
Basic Matching:
mol.HasSubstructMatch(query, useChirality=False) - Check for substructure match
mol.GetSubstructMatch(query, useChirality=False) - Get first match
mol.GetSubstructMatches(query, uniquify=True, useChirality=False) - Get all matches
mol.GetSubstructMatches(query, maxMatches=1000) - Limit number of matches
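A minimal sketch of the matching calls above: a SMARTS query for a carboxylic acid is run against aspirin. The molecule and pattern are illustrative; matches come back as tuples of atom indices in query-atom order.

```python
from rdkit import Chem

aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
acid = Chem.MolFromSmarts("C(=O)[OH]")  # carbonyl carbon, carbonyl O, hydroxyl O

has_acid = aspirin.HasSubstructMatch(acid)
first = aspirin.GetSubstructMatch(acid)      # empty tuple when nothing matches
matches = aspirin.GetSubstructMatches(acid)  # tuple of index tuples

print(has_acid, first, matches)
```

The ester group is not reported because its singly bonded oxygen carries no hydrogen, so only the free acid matches.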
Molecular Properties
Atom Methods:
atom.GetSymbol() - Atomic symbol
atom.GetAtomicNum() - Atomic number
atom.GetDegree() - Number of bonds
atom.GetTotalDegree() - Including hydrogens
atom.GetFormalCharge() - Formal charge
atom.GetNumRadicalElectrons() - Radical electrons
atom.GetIsAromatic() - Aromaticity flag
atom.GetHybridization() - Hybridization (SP, SP2, SP3, etc.)
atom.GetIdx() - Atom index
atom.IsInRing() - In any ring
atom.IsInRingSize(size) - In ring of specific size
atom.GetChiralTag() - Chirality tag
Bond Methods:
bond.GetBondType() - Bond type (SINGLE, DOUBLE, TRIPLE, AROMATIC)
bond.GetBeginAtomIdx() - Starting atom index
bond.GetEndAtomIdx() - Ending atom index
bond.GetIsConjugated() - Conjugation flag
bond.GetIsAromatic() - Aromaticity flag
bond.IsInRing() - In any ring
bond.GetStereo() - Stereochemistry (STEREONONE, STEREOZ, STEREOE, etc.)
Molecule Methods:
mol.GetNumAtoms(onlyExplicit=True) - Number of atoms
mol.GetNumHeavyAtoms() - Number of heavy atoms
mol.GetNumBonds() - Number of bonds
mol.GetAtoms() - Iterator over atoms
mol.GetBonds() - Iterator over bonds
mol.GetAtomWithIdx(idx) - Get specific atom
mol.GetBondWithIdx(idx) - Get specific bond
mol.GetRingInfo() - Ring information object
Ring Information:
Chem.GetSymmSSSR(mol) - Get a symmetrized smallest set of smallest rings
Chem.GetSSSR(mol) - Get a smallest set of smallest rings (not symmetrized, so it can differ from GetSymmSSSR)
ring_info.NumRings() - Number of rings
ring_info.AtomRings() - Tuples of atom indices in rings
ring_info.BondRings() - Tuples of bond indices in rings
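The atom, bond, and ring accessors above can be combined as in the following sketch, which uses pyridine as an illustrative molecule. Atom indices follow the order of atoms in the input SMILES.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccncc1")  # pyridine

# Per-atom and per-bond properties via the iterators.
symbols = [atom.GetSymbol() for atom in mol.GetAtoms()]
aromatic_bonds = sum(1 for bond in mol.GetBonds() if bond.GetIsAromatic())

# Ring membership comes from the RingInfo object.
ring_info = mol.GetRingInfo()
ring_sizes = [len(ring) for ring in ring_info.AtomRings()]

print(symbols, aromatic_bonds, ring_info.NumRings(), ring_sizes)
```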
rdkit.Chem.AllChem
Extended chemistry functionality.
2D/3D Coordinate Generation
AllChem.Compute2DCoords(mol, canonOrient=True, clearConfs=True) - Generate 2D coordinates
AllChem.EmbedMolecule(mol, maxAttempts=0, randomSeed=-1, useRandomCoords=False) - Generate 3D conformer
AllChem.EmbedMultipleConfs(mol, numConfs=10, maxAttempts=0, randomSeed=-1) - Generate multiple conformers
AllChem.ConstrainedEmbed(mol, core, useTethers=True) - Constrained embedding
AllChem.GenerateDepictionMatching2DStructure(mol, reference, refPatt=None) - Align to template
Force Field Optimization
AllChem.UFFOptimizeMolecule(mol, maxIters=200, confId=-1) - UFF optimization
AllChem.MMFFOptimizeMolecule(mol, maxIters=200, confId=-1, mmffVariant='MMFF94') - MMFF optimization
AllChem.UFFGetMoleculeForceField(mol, confId=-1) - Get UFF force field object
AllChem.MMFFGetMoleculeForceField(mol, pyMMFFMolProperties, confId=-1) - Get MMFF force field
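A common pattern, sketched here under the assumption that a single conformer is enough, is to add hydrogens, embed, and then relax the geometry with a force field. The seed value is arbitrary and only makes the embedding reproducible.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Explicit hydrogens are normally added before generating 3D coordinates.
mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))

# EmbedMolecule returns the new conformer ID, or -1 if embedding failed.
conf_id = AllChem.EmbedMolecule(mol, randomSeed=42)

status = None
if conf_id != -1:
    # 0 = converged, 1 = more iterations needed, -1 = force field setup failed.
    status = AllChem.MMFFOptimizeMolecule(mol, maxIters=200)

print(conf_id, status, mol.GetNumConformers())
```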
Conformer Analysis
AllChem.GetConformerRMS(mol, confId1, confId2, prealigned=False) - Calculate RMSD
AllChem.GetConformerRMSMatrix(mol, prealigned=False) - RMSD matrix
AllChem.AlignMol(prbMol, refMol, prbCid=-1, refCid=-1) - Align molecules
AllChem.AlignMolConformers(mol) - Align all conformers
Reactions
AllChem.ReactionFromSmarts(smarts, useSmiles=False) - Create reaction from SMARTS
reaction.RunReactants(reactants) - Apply reaction
reaction.RunReactant(reactant, reactionIdx) - Apply to specific reactant
AllChem.CreateDifferenceFingerprintForReaction(reaction) - Reaction fingerprint
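The sketch below applies a reaction SMARTS for amide formation to acetic acid and methylamine. The reaction definition and reactants are illustrative. `RunReactants` returns a tuple of product sets, and the products are not sanitized, so the example sanitizes each one before writing SMILES.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Atom maps (:1, :2, :3) tell RDKit which reactant atoms end up in the product.
rxn = AllChem.ReactionFromSmarts("[C:1](=[O:2])O.[N:3]>>[C:1](=[O:2])[N:3]")

acid = Chem.MolFromSmiles("CC(=O)O")
amine = Chem.MolFromSmiles("NC")

# One product set per way the reactant templates match the inputs.
product_sets = rxn.RunReactants((acid, amine))

products = []
for product_set in product_sets:
    for product in product_set:
        Chem.SanitizeMol(product)  # reaction products come back unsanitized
        products.append(Chem.MolToSmiles(product))

print(products)
```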
Fingerprints
AllChem.GetMorganFingerprint(mol, radius, useFeatures=False) - Morgan fingerprint
AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=2048) - Morgan bit vector
AllChem.GetHashedMorganFingerprint(mol, radius, nBits=2048) - Hashed Morgan
AllChem.GetErGFingerprint(mol) - ErG fingerprint
rdkit.Chem.Descriptors
Molecular descriptor calculations.
Common Descriptors
Descriptors.MolWt(mol) - Molecular weight
Descriptors.ExactMolWt(mol) - Exact molecular weight
Descriptors.HeavyAtomMolWt(mol) - Heavy atom molecular weight
Descriptors.MolLogP(mol) - LogP (lipophilicity)
Descriptors.MolMR(mol) - Molar refractivity
Descriptors.TPSA(mol) - Topological polar surface area
Descriptors.NumHDonors(mol) - Hydrogen bond donors
Descriptors.NumHAcceptors(mol) - Hydrogen bond acceptors
Descriptors.NumRotatableBonds(mol) - Rotatable bonds
Descriptors.NumAromaticRings(mol) - Aromatic rings
Descriptors.NumSaturatedRings(mol) - Saturated rings
Descriptors.NumAliphaticRings(mol) - Aliphatic rings
Descriptors.NumAromaticHeterocycles(mol) - Aromatic heterocycles
Descriptors.NumRadicalElectrons(mol) - Radical electrons
Descriptors.NumValenceElectrons(mol) - Valence electrons
Batch Calculation
Descriptors.CalcMolDescriptors(mol) - Calculate all descriptors as dictionary
Descriptor Lists
Descriptors._descList - List of (name, function) tuples for all descriptors
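As a small sketch of the descriptor functions, the following computes a handful of values for ethanol and lists the descriptor names registered in `Descriptors._descList`. The dictionary keys are labels chosen for this example.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles("CCO")  # ethanol

# Individual descriptors are plain functions that take a molecule.
summary = {
    "MolWt": Descriptors.MolWt(mol),
    "LogP": Descriptors.MolLogP(mol),
    "TPSA": Descriptors.TPSA(mol),
    "HBD": Descriptors.NumHDonors(mol),
    "HBA": Descriptors.NumHAcceptors(mol),
}

# _descList pairs each descriptor name with the function that computes it.
all_names = [name for name, _func in Descriptors._descList]

print(summary)
print(len(all_names), "descriptors available")
```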
rdkit.Chem.Draw
Molecular visualization.
Image Generation
Draw.MolToImage(mol, size=(300,300), kekulize=True, wedgeBonds=True, highlightAtoms=None) - Generate PIL image
Draw.MolToFile(mol, filename, size=(300,300), kekulize=True, wedgeBonds=True) - Save to file
Draw.MolsToGridImage(mols, molsPerRow=3, subImgSize=(200,200), legends=None) - Grid of molecules
Draw.MolsMatrixToGridImage(molsMatrix, subImgSize=(200,200), legendsMatrix=None) - Grid from a nested list (one inner list per row)
Draw.ReactionToImage(rxn, subImgSize=(200,200)) - Reaction image
Fingerprint Visualization
Draw.DrawMorganBit(mol, bitId, bitInfo, whichExample=0) - Visualize Morgan bit
Draw.DrawMorganBits(tpls, molsPerRow=3) - Multiple Morgan bits; tpls is a list of (mol, bitId, bitInfo) tuples
Draw.DrawRDKitBit(mol, bitId, bitInfo, whichExample=0) - Visualize RDKit bit
IPython Integration
Draw.IPythonConsole - Module for Jupyter integration
Draw.IPythonConsole.ipython_useSVG - Use SVG (True) or PNG (False)
Draw.IPythonConsole.molSize - Default molecule image size
Drawing Options
rdMolDraw2D.MolDrawOptions() - Get drawing options object
  .addAtomIndices - Show atom indices
  .addBondIndices - Show bond indices
  .addStereoAnnotation - Show stereochemistry
  .bondLineWidth - Line width
  .highlightBondWidthMultiplier - Highlight width
  .minFontSize - Minimum font size
  .maxFontSize - Maximum font size
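The drawing options above are typically set on a drawer object before rendering. The sketch below uses `rdMolDraw2D.MolDraw2DSVG` and `rdMolDraw2D.PrepareAndDrawMolecule`, which are not listed in this reference and are assumed here as one way to obtain an SVG string without an image backend.

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D

mol = Chem.MolFromSmiles("c1ccccc1O")  # phenol

# Create an SVG drawer and adjust its options before drawing.
drawer = rdMolDraw2D.MolDraw2DSVG(300, 300)
options = drawer.drawOptions()
options.addAtomIndices = True
options.bondLineWidth = 2

# PrepareAndDrawMolecule computes 2D coordinates if needed, then draws.
rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol)
drawer.FinishDrawing()
svg = drawer.GetDrawingText()  # the SVG document as a string

print(len(svg), "characters of SVG")
```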
rdkit.Chem.rdMolDescriptors
Additional descriptor calculations.
rdMolDescriptors.CalcNumRings(mol) - Number of rings
rdMolDescriptors.CalcNumAromaticRings(mol) - Aromatic rings
rdMolDescriptors.CalcNumAliphaticRings(mol) - Aliphatic rings
rdMolDescriptors.CalcNumSaturatedRings(mol) - Saturated rings
rdMolDescriptors.CalcNumHeterocycles(mol) - Heterocycles
rdMolDescriptors.CalcNumAromaticHeterocycles(mol) - Aromatic heterocycles
rdMolDescriptors.CalcNumSpiroAtoms(mol) - Spiro atoms
rdMolDescriptors.CalcNumBridgeheadAtoms(mol) - Bridgehead atoms
rdMolDescriptors.CalcFractionCSP3(mol) - Fraction of sp3 carbons
rdMolDescriptors.CalcLabuteASA(mol) - Labute accessible surface area
rdMolDescriptors.CalcTPSA(mol) - TPSA
rdMolDescriptors.CalcMolFormula(mol) - Molecular formula
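A brief sketch of a few `rdMolDescriptors` calls, using toluene as an illustrative input (seven carbons, one of them sp3).

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

mol = Chem.MolFromSmiles("Cc1ccccc1")  # toluene

formula = rdMolDescriptors.CalcMolFormula(mol)
num_rings = rdMolDescriptors.CalcNumRings(mol)
num_aromatic = rdMolDescriptors.CalcNumAromaticRings(mol)
frac_sp3 = rdMolDescriptors.CalcFractionCSP3(mol)  # sp3 carbons / all carbons

print(formula, num_rings, num_aromatic, round(frac_sp3, 3))
```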
rdkit.Chem.Scaffolds
Scaffold analysis.
Murcko Scaffolds
MurckoScaffold.GetScaffoldForMol(mol) - Get Murcko scaffold
MurckoScaffold.MakeScaffoldGeneric(mol) - Generic scaffold
Chem.MurckoDecompose(mol) - Murcko decomposition (returns the scaffold; exposed on rdkit.Chem rather than on MurckoScaffold)
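The scaffold functions can be sketched as follows, with paracetamol as an illustrative input. The Murcko scaffold keeps ring systems and linkers; the generic scaffold additionally turns every atom into carbon and every bond into a single bond.

```python
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

mol = Chem.MolFromSmiles("CC(=O)Nc1ccc(O)cc1")  # paracetamol

# Strip side chains, keeping the ring system.
scaffold = MurckoScaffold.GetScaffoldForMol(mol)
scaffold_smiles = Chem.MolToSmiles(scaffold)

# Reduce the scaffold to an all-carbon, all-single-bond framework.
generic = MurckoScaffold.MakeScaffoldGeneric(scaffold)
generic_smiles = Chem.MolToSmiles(generic)

print(scaffold_smiles, generic_smiles)
```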
rdkit.Chem.rdMolHash
Molecular hashing and standardization.
rdMolHash.MolHash(mol, hashFunction) - Generate hash
rdMolHash.HashFunction.AnonymousGraph - Anonymized structure
rdMolHash.HashFunction.CanonicalSmiles - Canonical SMILES
rdMolHash.HashFunction.ElementGraph - Element graph
rdMolHash.HashFunction.MurckoScaffold - Murcko scaffold
rdMolHash.HashFunction.Regioisomer - Regioisomer (no stereo)
rdMolHash.HashFunction.NetCharge - Net charge
rdMolHash.HashFunction.HetAtomProtomer - Heteroatom protomer
rdMolHash.HashFunction.HetAtomTautomer - Heteroatom tautomer
rdkit.Chem.MolStandardize
Molecule standardization.
rdMolStandardize.Normalize(mol) - Normalize functional groups
rdMolStandardize.Reionize(mol) - Fix ionization state
rdMolStandardize.RemoveFragments(mol) - Remove small fragments
rdMolStandardize.Cleanup(mol) - Standard cleanup (remove hydrogens, disconnect metals, normalize, reionize)
rdMolStandardize.Uncharger() - Create uncharger object
  .uncharge(mol) - Remove charges
rdMolStandardize.TautomerEnumerator() - Enumerate tautomers
  .Enumerate(mol) - Generate tautomers
  .Canonicalize(mol) - Get canonical tautomer
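A minimal standardization sketch, assuming the goal is a neutral parent form: clean up the input, neutralize charges with an `Uncharger`, and pick a canonical tautomer. The input molecules are illustrative.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Clean up an anion, then neutralize it.
acetate = Chem.MolFromSmiles("CC(=O)[O-]")
cleaned = rdMolStandardize.Cleanup(acetate)
neutral = rdMolStandardize.Uncharger().uncharge(cleaned)
neutral_smiles = Chem.MolToSmiles(neutral)

# Choose a canonical tautomer for a molecule with tautomeric forms.
hydroxypyridine = Chem.MolFromSmiles("Oc1ccccn1")
enumerator = rdMolStandardize.TautomerEnumerator()
canonical_tautomer = enumerator.Canonicalize(hydroxypyridine)

print(neutral_smiles, Chem.MolToSmiles(canonical_tautomer))
```

Each step returns a new molecule, so the original inputs remain available for comparison.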
rdkit.DataStructs
Fingerprint similarity and operations.
Similarity Metrics
DataStructs.TanimotoSimilarity(fp1, fp2) - Tanimoto coefficient
DataStructs.DiceSimilarity(fp1, fp2) - Dice coefficient
DataStructs.CosineSimilarity(fp1, fp2) - Cosine similarity
DataStructs.SokalSimilarity(fp1, fp2) - Sokal similarity
DataStructs.KulczynskiSimilarity(fp1, fp2) - Kulczynski similarity
DataStructs.McConnaugheySimilarity(fp1, fp2) - McConnaughey similarity
Bulk Operations
DataStructs.BulkTanimotoSimilarity(fp, fps) - Tanimoto for list of fingerprints
DataStructs.BulkDiceSimilarity(fp, fps) - Dice for list
DataStructs.BulkCosineSimilarity(fp, fps) - Cosine for list
Distance Metrics
DataStructs.TanimotoSimilarity(fp1, fp2, returnDistance=True) - 1 - Tanimoto
DataStructs.DiceSimilarity(fp1, fp2, returnDistance=True) - 1 - Dice
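The similarity functions operate on fingerprint objects. This sketch builds Morgan bit vectors with `rdFingerprintGenerator` (an assumption about which fingerprint to use; any bit vector works) and compares one query against a small illustrative list.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

smiles = ["CCO", "CCCO", "c1ccccc1"]  # ethanol, propanol, benzene
mols = [Chem.MolFromSmiles(s) for s in smiles]

# One generator can be reused for every molecule.
generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fps = [generator.GetFingerprint(m) for m in mols]

# Pairwise similarity of a fingerprint with itself.
self_similarity = DataStructs.TanimotoSimilarity(fps[0], fps[0])

# Bulk similarity of the first fingerprint against the rest (returns a list).
similarities = DataStructs.BulkTanimotoSimilarity(fps[0], fps[1:])
distances = [1.0 - s for s in similarities]

print(self_similarity, similarities, distances)
```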
rdkit.Chem.AtomPairs
Atom pair fingerprints.
Pairs.GetAtomPairFingerprint(mol, minLength=1, maxLength=30) - Atom pair fingerprint
Pairs.GetAtomPairFingerprintAsBitVect(mol, minLength=1, maxLength=30, nBits=2048) - As bit vector
Pairs.GetHashedAtomPairFingerprint(mol, nBits=2048, minLength=1, maxLength=30) - Hashed version
rdkit.Chem.Torsions
Topological torsion fingerprints.
Torsions.GetTopologicalTorsionFingerprint(mol, targetSize=4) - Torsion fingerprint
Torsions.GetTopologicalTorsionFingerprintAsIntVect(mol, targetSize=4) - As int vector
Torsions.GetHashedTopologicalTorsionFingerprint(mol, nBits=2048, targetSize=4) - Hashed version
rdkit.Chem.MACCSkeys
MACCS structural keys.
MACCSkeys.GenMACCSKeys(mol) - Generate the 166 MACCS keys (returned as a 167-bit vector; bit 0 is unused)
rdkit.Chem.ChemicalFeatures
Pharmacophore features.
ChemicalFeatures.BuildFeatureFactory(featureFile) - Create feature factory
factory.GetFeaturesForMol(mol) - Get pharmacophore features
feature.GetFamily() - Feature family (Donor, Acceptor, etc.)
feature.GetType() - Feature type
feature.GetAtomIds() - Atoms involved in feature
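A sketch of the feature factory workflow, assuming the `BaseFeatures.fdef` definition file that ships in RDKit's data directory (`RDConfig.RDDataDir`) is available in the installation. The molecule is an illustrative choice.

```python
import os

from rdkit import Chem, RDConfig
from rdkit.Chem import ChemicalFeatures

# Assumption: the stock feature definitions are installed with RDKit.
fdef_path = os.path.join(RDConfig.RDDataDir, "BaseFeatures.fdef")
factory = ChemicalFeatures.BuildFeatureFactory(fdef_path)

mol = Chem.MolFromSmiles("OCc1ccccc1CN")
features = factory.GetFeaturesForMol(mol)

# Each feature reports its family, its specific type, and the atoms involved.
for feature in features:
    print(feature.GetFamily(), feature.GetType(), feature.GetAtomIds())

families = {feature.GetFamily() for feature in features}
```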
rdkit.ML.Cluster.Butina
Clustering algorithms.
Butina.ClusterData(distances, nPts, distThresh, isDistData=True) - Butina clustering
  Returns tuple of tuples with cluster members
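With `isDistData=True`, `ClusterData` expects the lower triangle of the distance matrix as a flat list. The sketch below builds that list from Tanimoto distances between Morgan fingerprints; the molecules, fingerprint type, and the 0.6 threshold are assumptions made for illustration.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator
from rdkit.ML.Cluster import Butina

smiles = ["CCO", "CCCO", "CCCCO", "c1ccccc1", "Cc1ccccc1"]
mols = [Chem.MolFromSmiles(s) for s in smiles]

generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fps = [generator.GetFingerprint(m) for m in mols]

# Flattened lower triangle: for each i, the distances to all j < i.
distances = []
for i in range(1, len(fps)):
    sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
    distances.extend(1.0 - s for s in sims)

# Each cluster is a tuple of molecule indices.
clusters = Butina.ClusterData(distances, len(fps), 0.6, isDistData=True)

for cluster in clusters:
    print([smiles[idx] for idx in cluster])
```

Every input index appears in exactly one cluster, so singletons show up as one-element tuples.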
rdkit.Chem.rdFingerprintGenerator
Modern fingerprint generation API (RDKit 2020.09+).
rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048) - Morgan generator
rdFingerprintGenerator.GetRDKitFPGenerator(minPath=1, maxPath=7, fpSize=2048) - RDKit FP generator
rdFingerprintGenerator.GetAtomPairGenerator(minDistance=1, maxDistance=30) - Atom pair generator
generator.GetFingerprint(mol) - Generate fingerprint
generator.GetCountFingerprint(mol) - Count-based fingerprint
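As a short sketch of the generator API, one generator object can produce both the bit-vector and the count-based form of the same fingerprint. Aspirin is used as an illustrative input.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

# Bit vector: each position records only presence or absence.
bit_fp = generator.GetFingerprint(mol)

# Count vector: each position records how often the environment occurs.
count_fp = generator.GetCountFingerprint(mol)
nonzero = count_fp.GetNonzeroElements()  # dict of {position: count}

print(bit_fp.GetNumBits(), bit_fp.GetNumOnBits(), len(nonzero))
```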
Common Parameters
Sanitization Operations
SANITIZE_NONE - No sanitization
SANITIZE_ALL - All operations (default)
SANITIZE_CLEANUP - Basic cleanup
SANITIZE_PROPERTIES - Calculate properties
SANITIZE_SYMMRINGS - Symmetrize rings
SANITIZE_KEKULIZE - Kekulize aromatic rings
SANITIZE_FINDRADICALS - Find radical electrons
SANITIZE_SETAROMATICITY - Set aromaticity
SANITIZE_SETCONJUGATION - Set conjugation
SANITIZE_SETHYBRIDIZATION - Set hybridization
SANITIZE_CLEANUPCHIRALITY - Cleanup chirality
Bond Types
BondType.SINGLE - Single bond
BondType.DOUBLE - Double bond
BondType.TRIPLE - Triple bond
BondType.AROMATIC - Aromatic bond
BondType.DATIVE - Dative bond
BondType.UNSPECIFIED - Unspecified
Hybridization
HybridizationType.S - S
HybridizationType.SP - SP
HybridizationType.SP2 - SP2
HybridizationType.SP3 - SP3
HybridizationType.SP3D - SP3D
HybridizationType.SP3D2 - SP3D2
Chirality
ChiralType.CHI_UNSPECIFIED - Unspecified
ChiralType.CHI_TETRAHEDRAL_CW - Clockwise
ChiralType.CHI_TETRAHEDRAL_CCW - Counter-clockwise
Installation
# Using conda (recommended)
conda install -c conda-forge rdkit
# Using pip
pip install rdkit
Importing
# Core functionality
from rdkit import Chem
from rdkit.Chem import AllChem
# Descriptors
from rdkit.Chem import Descriptors
# Drawing
from rdkit.Chem import Draw
# Similarity
from rdkit import DataStructs