Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions

View File

@@ -0,0 +1,595 @@
# RDKit Molecular Descriptors Reference
Complete reference for molecular descriptors available in RDKit's `Descriptors` module.
## Usage
```python
from rdkit import Chem
from rdkit.Chem import Descriptors
mol = Chem.MolFromSmiles('CCO')
# Calculate individual descriptor
mw = Descriptors.MolWt(mol)
# Calculate all descriptors at once
all_desc = Descriptors.CalcMolDescriptors(mol)
```
## Molecular Weight and Mass
### MolWt
Average molecular weight of the molecule.
```python
Descriptors.MolWt(mol)
```
### ExactMolWt
Exact molecular weight using isotopic composition.
```python
Descriptors.ExactMolWt(mol)
```
### HeavyAtomMolWt
Average molecular weight ignoring hydrogens.
```python
Descriptors.HeavyAtomMolWt(mol)
```
## Lipophilicity
### MolLogP
Wildman-Crippen LogP (octanol-water partition coefficient).
```python
Descriptors.MolLogP(mol)
```
### MolMR
Wildman-Crippen molar refractivity.
```python
Descriptors.MolMR(mol)
```
## Polar Surface Area
### TPSA
Topological polar surface area (TPSA) based on fragment contributions.
```python
Descriptors.TPSA(mol)
```
### LabuteASA
Labute's Approximate Surface Area (ASA).
```python
Descriptors.LabuteASA(mol)
```
## Hydrogen Bonding
### NumHDonors
Number of hydrogen bond donors (N-H and O-H).
```python
Descriptors.NumHDonors(mol)
```
### NumHAcceptors
Number of hydrogen bond acceptors (N and O).
```python
Descriptors.NumHAcceptors(mol)
```
### NOCount
Number of N and O atoms.
```python
Descriptors.NOCount(mol)
```
### NHOHCount
Number of N-H and O-H bonds.
```python
Descriptors.NHOHCount(mol)
```
## Atom Counts
### HeavyAtomCount
Number of heavy atoms (non-hydrogen).
```python
Descriptors.HeavyAtomCount(mol)
```
### NumHeteroatoms
Number of heteroatoms (non-C and non-H).
```python
Descriptors.NumHeteroatoms(mol)
```
### NumValenceElectrons
Total number of valence electrons.
```python
Descriptors.NumValenceElectrons(mol)
```
### NumRadicalElectrons
Number of radical electrons.
```python
Descriptors.NumRadicalElectrons(mol)
```
## Ring Descriptors
### RingCount
Number of rings.
```python
Descriptors.RingCount(mol)
```
### NumAromaticRings
Number of aromatic rings.
```python
Descriptors.NumAromaticRings(mol)
```
### NumSaturatedRings
Number of saturated rings.
```python
Descriptors.NumSaturatedRings(mol)
```
### NumAliphaticRings
Number of aliphatic (non-aromatic) rings.
```python
Descriptors.NumAliphaticRings(mol)
```
### NumAromaticCarbocycles
Number of aromatic carbocycles (rings with only carbons).
```python
Descriptors.NumAromaticCarbocycles(mol)
```
### NumAromaticHeterocycles
Number of aromatic heterocycles (rings with heteroatoms).
```python
Descriptors.NumAromaticHeterocycles(mol)
```
### NumSaturatedCarbocycles
Number of saturated carbocycles.
```python
Descriptors.NumSaturatedCarbocycles(mol)
```
### NumSaturatedHeterocycles
Number of saturated heterocycles.
```python
Descriptors.NumSaturatedHeterocycles(mol)
```
### NumAliphaticCarbocycles
Number of aliphatic carbocycles.
```python
Descriptors.NumAliphaticCarbocycles(mol)
```
### NumAliphaticHeterocycles
Number of aliphatic heterocycles.
```python
Descriptors.NumAliphaticHeterocycles(mol)
```
## Rotatable Bonds
### NumRotatableBonds
Number of rotatable bonds (flexibility).
```python
Descriptors.NumRotatableBonds(mol)
```
## Aromatic Atoms
### NumAromaticAtoms
Number of aromatic atoms.
```python
Descriptors.NumAromaticAtoms(mol)
```
## Fraction Descriptors
### FractionCsp3
Fraction of carbons that are sp3 hybridized.
```python
Descriptors.FractionCsp3(mol)
```
## Complexity Descriptors
### BertzCT
Bertz complexity index.
```python
Descriptors.BertzCT(mol)
```
### Ipc
Information content (complexity measure).
```python
Descriptors.Ipc(mol)
```
## Kappa Shape Indices
Molecular shape descriptors based on graph invariants.
### Kappa1
First kappa shape index.
```python
Descriptors.Kappa1(mol)
```
### Kappa2
Second kappa shape index.
```python
Descriptors.Kappa2(mol)
```
### Kappa3
Third kappa shape index.
```python
Descriptors.Kappa3(mol)
```
## Chi Connectivity Indices
Molecular connectivity indices.
### Chi0, Chi1, Chi2, Chi3, Chi4
Simple chi connectivity indices.
```python
Descriptors.Chi0(mol)
Descriptors.Chi1(mol)
Descriptors.Chi2(mol)
Descriptors.Chi3(mol)
Descriptors.Chi4(mol)
```
### Chi0n, Chi1n, Chi2n, Chi3n, Chi4n
Valence-modified chi connectivity indices.
```python
Descriptors.Chi0n(mol)
Descriptors.Chi1n(mol)
Descriptors.Chi2n(mol)
Descriptors.Chi3n(mol)
Descriptors.Chi4n(mol)
```
### Chi0v, Chi1v, Chi2v, Chi3v, Chi4v
Valence chi connectivity indices.
```python
Descriptors.Chi0v(mol)
Descriptors.Chi1v(mol)
Descriptors.Chi2v(mol)
Descriptors.Chi3v(mol)
Descriptors.Chi4v(mol)
```
## Hall-Kier Alpha
### HallKierAlpha
Hall-Kier alpha value (molecular flexibility).
```python
Descriptors.HallKierAlpha(mol)
```
## Balaban's J Index
### BalabanJ
Balaban's J index (branching descriptor).
```python
Descriptors.BalabanJ(mol)
```
## EState Indices
Electrotopological state indices.
### MaxEStateIndex
Maximum E-state value.
```python
Descriptors.MaxEStateIndex(mol)
```
### MinEStateIndex
Minimum E-state value.
```python
Descriptors.MinEStateIndex(mol)
```
### MaxAbsEStateIndex
Maximum absolute E-state value.
```python
Descriptors.MaxAbsEStateIndex(mol)
```
### MinAbsEStateIndex
Minimum absolute E-state value.
```python
Descriptors.MinAbsEStateIndex(mol)
```
## Partial Charges
### MaxPartialCharge
Maximum partial charge.
```python
Descriptors.MaxPartialCharge(mol)
```
### MinPartialCharge
Minimum partial charge.
```python
Descriptors.MinPartialCharge(mol)
```
### MaxAbsPartialCharge
Maximum absolute partial charge.
```python
Descriptors.MaxAbsPartialCharge(mol)
```
### MinAbsPartialCharge
Minimum absolute partial charge.
```python
Descriptors.MinAbsPartialCharge(mol)
```
## Fingerprint Density
Measures the density of molecular fingerprints.
### FpDensityMorgan1
Morgan fingerprint density at radius 1.
```python
Descriptors.FpDensityMorgan1(mol)
```
### FpDensityMorgan2
Morgan fingerprint density at radius 2.
```python
Descriptors.FpDensityMorgan2(mol)
```
### FpDensityMorgan3
Morgan fingerprint density at radius 3.
```python
Descriptors.FpDensityMorgan3(mol)
```
## PEOE VSA Descriptors
Partial Equalization of Orbital Electronegativities (PEOE) VSA descriptors.
### PEOE_VSA1 through PEOE_VSA14
MOE-type descriptors using partial charges and surface area contributions.
```python
Descriptors.PEOE_VSA1(mol)
# ... through PEOE_VSA14
```
## SMR VSA Descriptors
Molecular refractivity VSA descriptors.
### SMR_VSA1 through SMR_VSA10
MOE-type descriptors using MR contributions and surface area.
```python
Descriptors.SMR_VSA1(mol)
# ... through SMR_VSA10
```
## SLogP VSA Descriptors
LogP VSA descriptors.
### SLogP_VSA1 through SLogP_VSA12
MOE-type descriptors using LogP contributions and surface area.
```python
Descriptors.SLogP_VSA1(mol)
# ... through SLogP_VSA12
```
## EState VSA Descriptors
### EState_VSA1 through EState_VSA11
MOE-type descriptors using E-state indices and surface area.
```python
Descriptors.EState_VSA1(mol)
# ... through EState_VSA11
```
## VSA Descriptors
van der Waals surface area descriptors.
### VSA_EState1 through VSA_EState10
EState VSA descriptors.
```python
Descriptors.VSA_EState1(mol)
# ... through VSA_EState10
```
## BCUT Descriptors
Burden-CAS-University of Texas eigenvalue descriptors.
### BCUT2D_MWHI
Highest eigenvalue of Burden matrix weighted by molecular weight.
```python
Descriptors.BCUT2D_MWHI(mol)
```
### BCUT2D_MWLOW
Lowest eigenvalue of Burden matrix weighted by molecular weight.
```python
Descriptors.BCUT2D_MWLOW(mol)
```
### BCUT2D_CHGHI
Highest eigenvalue weighted by partial charges.
```python
Descriptors.BCUT2D_CHGHI(mol)
```
### BCUT2D_CHGLO
Lowest eigenvalue weighted by partial charges.
```python
Descriptors.BCUT2D_CHGLO(mol)
```
### BCUT2D_LOGPHI
Highest eigenvalue weighted by LogP.
```python
Descriptors.BCUT2D_LOGPHI(mol)
```
### BCUT2D_LOGPLOW
Lowest eigenvalue weighted by LogP.
```python
Descriptors.BCUT2D_LOGPLOW(mol)
```
### BCUT2D_MRHI
Highest eigenvalue weighted by molar refractivity.
```python
Descriptors.BCUT2D_MRHI(mol)
```
### BCUT2D_MRLOW
Lowest eigenvalue weighted by molar refractivity.
```python
Descriptors.BCUT2D_MRLOW(mol)
```
## Autocorrelation Descriptors
### AUTOCORR2D
2D autocorrelation descriptors (if enabled).
Various autocorrelation indices measuring spatial distribution of properties.
## MQN Descriptors
Molecular Quantum Numbers - 42 simple descriptors.
### mqn1 through mqn42
Integer descriptors counting various molecular features.
```python
# Access via CalcMolDescriptors
desc = Descriptors.CalcMolDescriptors(mol)
mqns = {k: v for k, v in desc.items() if k.startswith('mqn')}
```
## QED
### qed
Quantitative Estimate of Drug-likeness.
```python
Descriptors.qed(mol)
```
## Lipinski's Rule of Five
Check drug-likeness using Lipinski's criteria:
```python
def lipinski_rule_of_five(mol):
mw = Descriptors.MolWt(mol) <= 500
logp = Descriptors.MolLogP(mol) <= 5
hbd = Descriptors.NumHDonors(mol) <= 5
hba = Descriptors.NumHAcceptors(mol) <= 10
return mw and logp and hbd and hba
```
## Batch Descriptor Calculation
Calculate all descriptors at once:
```python
from rdkit import Chem
from rdkit.Chem import Descriptors
mol = Chem.MolFromSmiles('CCO')
# Get all descriptors as dictionary
all_descriptors = Descriptors.CalcMolDescriptors(mol)
# Access specific descriptor
mw = all_descriptors['MolWt']
logp = all_descriptors['MolLogP']
# Get list of available descriptor names
from rdkit.Chem import Descriptors
descriptor_names = [desc[0] for desc in Descriptors._descList]
```
## Descriptor Categories Summary
1. **Physicochemical**: MolWt, MolLogP, MolMR, TPSA
2. **Topological**: BertzCT, BalabanJ, Kappa indices
3. **Electronic**: Partial charges, E-state indices
4. **Shape**: Kappa indices, BCUT descriptors
5. **Connectivity**: Chi indices
6. **2D Fingerprints**: FpDensity descriptors
7. **Atom counts**: Heavy atoms, heteroatoms, rings
8. **Drug-likeness**: QED, Lipinski parameters
9. **Flexibility**: NumRotatableBonds, HallKierAlpha
10. **Surface area**: VSA-based descriptors
## Common Use Cases
### Drug-likeness Screening
```python
def screen_druglikeness(mol):
return {
'MW': Descriptors.MolWt(mol),
'LogP': Descriptors.MolLogP(mol),
'HBD': Descriptors.NumHDonors(mol),
'HBA': Descriptors.NumHAcceptors(mol),
'TPSA': Descriptors.TPSA(mol),
'RotBonds': Descriptors.NumRotatableBonds(mol),
'AromaticRings': Descriptors.NumAromaticRings(mol),
'QED': Descriptors.qed(mol)
}
```
### Lead-like Filtering
```python
def is_leadlike(mol):
mw = 250 <= Descriptors.MolWt(mol) <= 350
logp = Descriptors.MolLogP(mol) <= 3.5
rot_bonds = Descriptors.NumRotatableBonds(mol) <= 7
return mw and logp and rot_bonds
```
### Diversity Analysis
```python
def molecular_complexity(mol):
return {
'BertzCT': Descriptors.BertzCT(mol),
'NumRings': Descriptors.RingCount(mol),
'NumRotBonds': Descriptors.NumRotatableBonds(mol),
'FractionCsp3': Descriptors.FractionCsp3(mol),
'NumAromaticRings': Descriptors.NumAromaticRings(mol)
}
```
## Tips
1. **Use batch calculation** for multiple descriptors to avoid redundant computations
2. **Check for None** - some descriptors may return None for invalid molecules
3. **Normalize descriptors** for machine learning applications
4. **Select relevant descriptors** - not all 200+ descriptors are useful for every task
5. **Consider 3D descriptors** separately (require 3D coordinates)
6. **Validate ranges** - check if descriptor values are in expected ranges