# RDKit Molecular Descriptors Reference

Complete reference for molecular descriptors available in RDKit's `Descriptors` module.

## Usage

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles('CCO')

# Calculate individual descriptor
mw = Descriptors.MolWt(mol)

# Calculate all descriptors at once
all_desc = Descriptors.CalcMolDescriptors(mol)
```

## Molecular Weight and Mass

### MolWt

Average molecular weight of the molecule.

```python
Descriptors.MolWt(mol)
```

### ExactMolWt

Exact (monoisotopic) molecular weight, computed from the most abundant isotope of each element unless an atom carries an explicit isotope label.

```python
Descriptors.ExactMolWt(mol)
```

### HeavyAtomMolWt

Average molecular weight ignoring hydrogens.

```python
Descriptors.HeavyAtomMolWt(mol)
```

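To make the difference between the three mass descriptors concrete, here is a small sketch that evaluates them for ethanol (`CCO`). The molecule is only an illustrative choice, and the approximate values in the comments follow from standard atomic masses.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles('CCO')  # ethanol, C2H6O

avg_mw = Descriptors.MolWt(mol)             # average atomic masses, about 46.07
exact_mw = Descriptors.ExactMolWt(mol)      # most abundant isotopes, about 46.04
heavy_mw = Descriptors.HeavyAtomMolWt(mol)  # C and O only, about 40.02

print(f"MolWt={avg_mw:.3f}  ExactMolWt={exact_mw:.4f}  HeavyAtomMolWt={heavy_mw:.3f}")
```

The exact mass is the one to compare against a mass spectrometry peak, while the average weight is what you would use for stoichiometry.
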
## Lipophilicity

### MolLogP

Wildman-Crippen LogP (octanol-water partition coefficient).

```python
Descriptors.MolLogP(mol)
```

### MolMR

Wildman-Crippen molar refractivity.

```python
Descriptors.MolMR(mol)
```

## Polar Surface Area

### TPSA

Topological polar surface area (TPSA) based on fragment contributions.

```python
Descriptors.TPSA(mol)
```

### LabuteASA

Labute's Approximate Surface Area (ASA).

```python
Descriptors.LabuteASA(mol)
```

## Hydrogen Bonding

### NumHDonors

Number of hydrogen bond donors (N-H and O-H).

```python
Descriptors.NumHDonors(mol)
```

### NumHAcceptors

Number of hydrogen bond acceptors (N and O).

```python
Descriptors.NumHAcceptors(mol)
```

### NOCount

Number of N and O atoms.

```python
Descriptors.NOCount(mol)
```

### NHOHCount

Number of N-H and O-H bonds.

```python
Descriptors.NHOHCount(mol)
```

## Atom Counts

### HeavyAtomCount

Number of heavy atoms (non-hydrogen).

```python
Descriptors.HeavyAtomCount(mol)
```

### NumHeteroatoms

Number of heteroatoms (non-C and non-H).

```python
Descriptors.NumHeteroatoms(mol)
```

### NumValenceElectrons

Total number of valence electrons.

```python
Descriptors.NumValenceElectrons(mol)
```

### NumRadicalElectrons

Number of radical electrons.

```python
Descriptors.NumRadicalElectrons(mol)
```

## Ring Descriptors

### RingCount

Number of rings.

```python
Descriptors.RingCount(mol)
```

### NumAromaticRings

Number of aromatic rings.

```python
Descriptors.NumAromaticRings(mol)
```

### NumSaturatedRings

Number of saturated rings.

```python
Descriptors.NumSaturatedRings(mol)
```

### NumAliphaticRings

Number of aliphatic (non-aromatic) rings.

```python
Descriptors.NumAliphaticRings(mol)
```

### NumAromaticCarbocycles

Number of aromatic carbocycles (rings with only carbons).

```python
Descriptors.NumAromaticCarbocycles(mol)
```

### NumAromaticHeterocycles

Number of aromatic heterocycles (rings with heteroatoms).

```python
Descriptors.NumAromaticHeterocycles(mol)
```

### NumSaturatedCarbocycles

Number of saturated carbocycles.

```python
Descriptors.NumSaturatedCarbocycles(mol)
```

### NumSaturatedHeterocycles

Number of saturated heterocycles.

```python
Descriptors.NumSaturatedHeterocycles(mol)
```

### NumAliphaticCarbocycles

Number of aliphatic carbocycles.

```python
Descriptors.NumAliphaticCarbocycles(mol)
```

### NumAliphaticHeterocycles

Number of aliphatic heterocycles.

```python
Descriptors.NumAliphaticHeterocycles(mol)
```

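As a quick illustration of how the ring counts relate to one another, the sketch below evaluates a few of them for indole and cyclohexane. The helper name `ring_profile` and the two molecules are illustrative choices, not part of RDKit.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

def ring_profile(smiles):
    """Return a few ring counts for a SMILES string (illustrative helper)."""
    mol = Chem.MolFromSmiles(smiles)
    return {
        'RingCount': Descriptors.RingCount(mol),
        'NumAromaticRings': Descriptors.NumAromaticRings(mol),
        'NumAromaticCarbocycles': Descriptors.NumAromaticCarbocycles(mol),
        'NumAromaticHeterocycles': Descriptors.NumAromaticHeterocycles(mol),
        'NumSaturatedRings': Descriptors.NumSaturatedRings(mol),
    }

indole = ring_profile('c1ccc2[nH]ccc2c1')  # fused benzene and pyrrole rings
cyclohexane = ring_profile('C1CCCCC1')     # one fully saturated carbocycle
print(indole)
print(cyclohexane)
```

Indole has two aromatic rings that split into one carbocycle and one heterocycle, while cyclohexane contributes a single saturated ring and no aromatic ones.
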
## Rotatable Bonds

### NumRotatableBonds

Number of rotatable bonds (flexibility).

```python
Descriptors.NumRotatableBonds(mol)
```

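## Aromatic Atoms

### NumAromaticAtoms

Number of aromatic atoms. The `Descriptors` module does not provide a `NumAromaticAtoms` function, so count the aromatic atoms directly from the molecule.

```python
# Count atoms flagged as aromatic by RDKit's aromaticity perception
sum(atom.GetIsAromatic() for atom in mol.GetAtoms())
```
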
## Fraction Descriptors

### FractionCSP3

Fraction of carbons that are sp3 hybridized. Note the capitalization: the RDKit function is `FractionCSP3`.

```python
Descriptors.FractionCSP3(mol)
```

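A short worked example may help here. RDKit spells this function `FractionCSP3`, and the sketch below applies it to three illustrative molecules whose sp3 carbon fractions are easy to verify by hand.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

examples = {
    'ethanol': 'CCO',        # both carbons are sp3
    'benzene': 'c1ccccc1',   # no sp3 carbons
    'toluene': 'Cc1ccccc1',  # one sp3 carbon out of seven
}

fsp3 = {name: Descriptors.FractionCSP3(Chem.MolFromSmiles(smi))
        for name, smi in examples.items()}
print(fsp3)
```
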
## Complexity Descriptors

### BertzCT

Bertz complexity index.

```python
Descriptors.BertzCT(mol)
```

### Ipc

Information content (complexity measure).

```python
Descriptors.Ipc(mol)
```

## Kappa Shape Indices

Molecular shape descriptors based on graph invariants.

### Kappa1

First kappa shape index.

```python
Descriptors.Kappa1(mol)
```

### Kappa2

Second kappa shape index.

```python
Descriptors.Kappa2(mol)
```

### Kappa3

Third kappa shape index.

```python
Descriptors.Kappa3(mol)
```

## Chi Connectivity Indices

Molecular connectivity indices.

### Chi0, Chi1

Simple chi connectivity indices. The `Descriptors` module provides only orders 0 and 1 in this unmodified form; orders 2 through 4 are available as the `n` and `v` variants.

```python
Descriptors.Chi0(mol)
Descriptors.Chi1(mol)
```

### Chi0n, Chi1n, Chi2n, Chi3n, Chi4n

Valence-modified chi connectivity indices.

```python
Descriptors.Chi0n(mol)
Descriptors.Chi1n(mol)
Descriptors.Chi2n(mol)
Descriptors.Chi3n(mol)
Descriptors.Chi4n(mol)
```

### Chi0v, Chi1v, Chi2v, Chi3v, Chi4v

Valence chi connectivity indices.

```python
Descriptors.Chi0v(mol)
Descriptors.Chi1v(mol)
Descriptors.Chi2v(mol)
Descriptors.Chi3v(mol)
Descriptors.Chi4v(mol)
```

## Hall-Kier Alpha

### HallKierAlpha

Hall-Kier alpha value, the correction term used in the kappa shape indices to account for atom size and hybridization relative to sp3 carbon.

```python
Descriptors.HallKierAlpha(mol)
```

## Balaban's J Index

### BalabanJ

Balaban's J index (branching descriptor).

```python
Descriptors.BalabanJ(mol)
```

## EState Indices

Electrotopological state indices.

### MaxEStateIndex

Maximum E-state value.

```python
Descriptors.MaxEStateIndex(mol)
```

### MinEStateIndex

Minimum E-state value.

```python
Descriptors.MinEStateIndex(mol)
```

### MaxAbsEStateIndex

Maximum absolute E-state value.

```python
Descriptors.MaxAbsEStateIndex(mol)
```

### MinAbsEStateIndex

Minimum absolute E-state value.

```python
Descriptors.MinAbsEStateIndex(mol)
```

## Partial Charges

### MaxPartialCharge

Maximum partial charge.

```python
Descriptors.MaxPartialCharge(mol)
```

### MinPartialCharge

Minimum partial charge.

```python
Descriptors.MinPartialCharge(mol)
```

### MaxAbsPartialCharge

Maximum absolute partial charge.

```python
Descriptors.MaxAbsPartialCharge(mol)
```

### MinAbsPartialCharge

Minimum absolute partial charge.

```python
Descriptors.MinAbsPartialCharge(mol)
```

## Fingerprint Density

Measures the density of molecular fingerprints.

### FpDensityMorgan1

Morgan fingerprint density at radius 1.

```python
Descriptors.FpDensityMorgan1(mol)
```

### FpDensityMorgan2

Morgan fingerprint density at radius 2.

```python
Descriptors.FpDensityMorgan2(mol)
```

### FpDensityMorgan3

Morgan fingerprint density at radius 3.

```python
Descriptors.FpDensityMorgan3(mol)
```

## PEOE VSA Descriptors

Partial Equalization of Orbital Electronegativities (PEOE) VSA descriptors.

### PEOE_VSA1 through PEOE_VSA14

MOE-type descriptors using partial charges and surface area contributions.

```python
Descriptors.PEOE_VSA1(mol)
# ... through PEOE_VSA14
```

## SMR VSA Descriptors

Molecular refractivity VSA descriptors.

### SMR_VSA1 through SMR_VSA10

MOE-type descriptors using MR contributions and surface area.

```python
Descriptors.SMR_VSA1(mol)
# ... through SMR_VSA10
```

## SLogP VSA Descriptors

LogP VSA descriptors.

### SlogP_VSA1 through SlogP_VSA12

MOE-type descriptors using LogP contributions and surface area. Note that the RDKit function names use a lowercase `l` (`SlogP_VSA`).

```python
Descriptors.SlogP_VSA1(mol)
# ... through SlogP_VSA12
```

## EState VSA Descriptors

### EState_VSA1 through EState_VSA11

MOE-type descriptors using E-state indices and surface area.

```python
Descriptors.EState_VSA1(mol)
# ... through EState_VSA11
```

## VSA Descriptors

van der Waals surface area descriptors.

### VSA_EState1 through VSA_EState10

EState VSA descriptors.

```python
Descriptors.VSA_EState1(mol)
# ... through VSA_EState10
```

## BCUT Descriptors

Burden-CAS-University of Texas eigenvalue descriptors.

### BCUT2D_MWHI

Highest eigenvalue of Burden matrix weighted by molecular weight.

```python
Descriptors.BCUT2D_MWHI(mol)
```

### BCUT2D_MWLOW

Lowest eigenvalue of Burden matrix weighted by molecular weight.

```python
Descriptors.BCUT2D_MWLOW(mol)
```

### BCUT2D_CHGHI

Highest eigenvalue weighted by partial charges.

```python
Descriptors.BCUT2D_CHGHI(mol)
```

### BCUT2D_CHGLO

Lowest eigenvalue weighted by partial charges.

```python
Descriptors.BCUT2D_CHGLO(mol)
```

### BCUT2D_LOGPHI

Highest eigenvalue weighted by LogP.

```python
Descriptors.BCUT2D_LOGPHI(mol)
```

### BCUT2D_LOGPLOW

Lowest eigenvalue weighted by LogP.

```python
Descriptors.BCUT2D_LOGPLOW(mol)
```

### BCUT2D_MRHI

Highest eigenvalue weighted by molar refractivity.

```python
Descriptors.BCUT2D_MRHI(mol)
```

### BCUT2D_MRLOW

Lowest eigenvalue weighted by molar refractivity.

```python
Descriptors.BCUT2D_MRLOW(mol)
```

## Autocorrelation Descriptors

### AUTOCORR2D

2D autocorrelation descriptors (if enabled). These indices measure how atomic properties are distributed across topological distances in the molecular graph.

## MQN Descriptors

Molecular Quantum Numbers - 42 simple descriptors.

### mqn1 through mqn42

Integer descriptors counting various molecular features. They are not included in the output of `Descriptors.CalcMolDescriptors`; compute them with `rdMolDescriptors.MQNs_`, which returns all 42 values as a list.

```python
from rdkit.Chem import rdMolDescriptors

# Returns a list of 42 integers (mqn1 ... mqn42, in order)
mqns = rdMolDescriptors.MQNs_(mol)
```

## QED

### qed

Quantitative Estimate of Drug-likeness.

```python
Descriptors.qed(mol)
```

## Lipinski's Rule of Five

Check drug-likeness using Lipinski's criteria:

```python
def lipinski_rule_of_five(mol):
    mw = Descriptors.MolWt(mol) <= 500
    logp = Descriptors.MolLogP(mol) <= 5
    hbd = Descriptors.NumHDonors(mol) <= 5
    hba = Descriptors.NumHAcceptors(mol) <= 10
    return mw and logp and hbd and hba
```

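The function above passes a molecule only when all four criteria hold. In practice the rule is often applied with a tolerance of one violation, so it can be useful to report which criteria fail. The following is a minimal sketch of that idea; `lipinski_violations` is a hypothetical helper, and the two test molecules are illustrative.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

def lipinski_violations(mol):
    """List which Rule-of-Five criteria a molecule breaks (hypothetical helper)."""
    checks = {
        'MW': Descriptors.MolWt(mol) <= 500,
        'LogP': Descriptors.MolLogP(mol) <= 5,
        'HBD': Descriptors.NumHDonors(mol) <= 5,
        'HBA': Descriptors.NumHAcceptors(mol) <= 10,
    }
    # Keep only the names of the criteria that failed
    return [name for name, passed in checks.items() if not passed]

aspirin = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')
tetracontane = Chem.MolFromSmiles('C' * 40)  # long, greasy alkane

print(lipinski_violations(aspirin))       # no criteria broken
print(lipinski_violations(tetracontane))  # too heavy and too lipophilic
```

A caller can then decide on its own threshold, for example accepting molecules with `len(lipinski_violations(mol)) <= 1`.
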
## Batch Descriptor Calculation

Calculate all descriptors at once:

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles('CCO')

# Get all descriptors as dictionary
all_descriptors = Descriptors.CalcMolDescriptors(mol)

# Access specific descriptor
mw = all_descriptors['MolWt']
logp = all_descriptors['MolLogP']

# Get list of available descriptor names
descriptor_names = [desc[0] for desc in Descriptors._descList]
```

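When processing many molecules, the same call can be wrapped in a loop that skips inputs RDKit cannot parse. The sketch below assumes a plain list of SMILES strings; `describe_many` is a hypothetical helper name, and `'not_a_smiles'` is a deliberately invalid entry.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

def describe_many(smiles_list):
    """Map each parsable SMILES to its descriptor dict; skip unparsable ones."""
    results = {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # RDKit returns None when parsing fails
            continue
        results[smi] = Descriptors.CalcMolDescriptors(mol)
    return results

table = describe_many(['CCO', 'c1ccccc1', 'not_a_smiles'])
for smi, desc in table.items():
    print(smi, round(desc['MolWt'], 2), round(desc['TPSA'], 2))
```

Keying the result by SMILES keeps the example short; a real pipeline would more likely carry an identifier alongside each molecule.
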
## Descriptor Categories Summary

1. **Physicochemical**: MolWt, MolLogP, MolMR, TPSA
2. **Topological**: BertzCT, BalabanJ, Kappa indices
3. **Electronic**: Partial charges, E-state indices
4. **Shape**: Kappa indices, BCUT descriptors
5. **Connectivity**: Chi indices
6. **2D Fingerprints**: FpDensity descriptors
7. **Atom counts**: Heavy atoms, heteroatoms, rings
8. **Drug-likeness**: QED, Lipinski parameters
9. **Flexibility**: NumRotatableBonds, HallKierAlpha
10. **Surface area**: VSA-based descriptors

## Common Use Cases

### Drug-likeness Screening

```python
def screen_druglikeness(mol):
    return {
        'MW': Descriptors.MolWt(mol),
        'LogP': Descriptors.MolLogP(mol),
        'HBD': Descriptors.NumHDonors(mol),
        'HBA': Descriptors.NumHAcceptors(mol),
        'TPSA': Descriptors.TPSA(mol),
        'RotBonds': Descriptors.NumRotatableBonds(mol),
        'AromaticRings': Descriptors.NumAromaticRings(mol),
        'QED': Descriptors.qed(mol)
    }
```

### Lead-like Filtering

```python
def is_leadlike(mol):
    mw = 250 <= Descriptors.MolWt(mol) <= 350
    logp = Descriptors.MolLogP(mol) <= 3.5
    rot_bonds = Descriptors.NumRotatableBonds(mol) <= 7
    return mw and logp and rot_bonds
```

### Diversity Analysis

```python
def molecular_complexity(mol):
    return {
        'BertzCT': Descriptors.BertzCT(mol),
        'NumRings': Descriptors.RingCount(mol),
        'NumRotBonds': Descriptors.NumRotatableBonds(mol),
        'FractionCsp3': Descriptors.FractionCSP3(mol),
        'NumAromaticRings': Descriptors.NumAromaticRings(mol)
    }
```

## Tips

1. **Use batch calculation** for multiple descriptors to avoid redundant computations
2. **Check for None** - `Chem.MolFromSmiles` returns `None` for input it cannot parse, so validate molecules before computing descriptors
3. **Normalize descriptors** for machine learning applications
4. **Select relevant descriptors** - not all 200+ descriptors are useful for every task
5. **Consider 3D descriptors** separately (require 3D coordinates)
6. **Validate ranges** - check if descriptor values are in expected ranges
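
As an illustration of the normalization tip, here is a minimal sketch that z-score standardizes a small descriptor table using only the standard library. The molecules and the descriptor subset are arbitrary choices for the example, and a real workflow would more likely use a library scaler.

```python
import statistics
from rdkit import Chem
from rdkit.Chem import Descriptors

smiles = ['CCO', 'c1ccccc1', 'CC(=O)O']  # illustrative molecules
names = ['MolWt', 'TPSA']                # illustrative descriptor subset

# Build one row of raw descriptor values per molecule
rows = []
for smi in smiles:
    mol = Chem.MolFromSmiles(smi)
    rows.append([getattr(Descriptors, name)(mol) for name in names])

# Standardize each column to zero mean and unit (population) standard deviation
columns = list(zip(*rows))
means = [statistics.fmean(col) for col in columns]
stdevs = [statistics.pstdev(col) for col in columns]
scaled = [[(value - m) / s for value, m, s in zip(row, means, stdevs)]
          for row in rows]
print(scaled)
```

This sketch assumes every column has non-zero variance; a constant column would need to be dropped or guarded against before dividing. For model training, the means and standard deviations should come from the training set only.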