Initial commit
skills/rdkit/references/descriptors_reference.md
# RDKit Molecular Descriptors Reference

Reference for commonly used molecular descriptors in RDKit's `Descriptors` module.

## Usage

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles('CCO')

# Calculate an individual descriptor
mw = Descriptors.MolWt(mol)

# Calculate all descriptors at once (returns a dict keyed by descriptor name)
all_desc = Descriptors.CalcMolDescriptors(mol)
```

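`Chem.MolFromSmiles` returns `None` when a SMILES string cannot be parsed, and descriptor functions fail on `None`. A minimal sketch of a guard follows; the helper name `safe_mol_wt` is hypothetical and not part of RDKit.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def safe_mol_wt(smiles):
    """Return MolWt for a SMILES string, or None if it cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit signals a parse failure by returning None
        return None
    return Descriptors.MolWt(mol)


ethanol_mw = safe_mol_wt('CCO')   # valid SMILES
broken_mw = safe_mol_wt('C1CC')   # unclosed ring, cannot be parsed
```

The same pattern applies to any descriptor: parse first, check for `None`, then compute.
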
## Molecular Weight and Mass

### MolWt
Average molecular weight of the molecule, based on standard atomic weights.
```python
Descriptors.MolWt(mol)
```

### ExactMolWt
Exact (monoisotopic) mass, computed from the most abundant isotope of each element unless an isotope is set explicitly on an atom.
```python
Descriptors.ExactMolWt(mol)
```

### HeavyAtomMolWt
Average molecular weight ignoring hydrogens.
```python
Descriptors.HeavyAtomMolWt(mol)
```

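To make the difference between the three mass descriptors concrete, the sketch below evaluates them for ethanol. The variable names are illustrative only.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles('CCO')  # ethanol, C2H6O

average_mw = Descriptors.MolWt(mol)         # abundance-weighted atomic weights
exact_mw = Descriptors.ExactMolWt(mol)      # monoisotopic mass
heavy_mw = Descriptors.HeavyAtomMolWt(mol)  # average weight without hydrogens

# The six hydrogens account for the gap between MolWt and HeavyAtomMolWt
hydrogen_contribution = average_mw - heavy_mw
```

For mass spectrometry work `ExactMolWt` is usually the relevant value, while `MolWt` is the one used in rules such as Lipinski's.
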
## Lipophilicity

### MolLogP
Wildman-Crippen LogP (octanol-water partition coefficient).
```python
Descriptors.MolLogP(mol)
```

### MolMR
Wildman-Crippen molar refractivity.
```python
Descriptors.MolMR(mol)
```

## Polar Surface Area

### TPSA
Topological polar surface area (TPSA) based on fragment contributions.
```python
Descriptors.TPSA(mol)
```

### LabuteASA
Labute's Approximate Surface Area (ASA).
```python
Descriptors.LabuteASA(mol)
```

## Hydrogen Bonding

### NumHDonors
Number of hydrogen bond donors (N-H and O-H).
```python
Descriptors.NumHDonors(mol)
```

### NumHAcceptors
Number of hydrogen bond acceptors (N and O).
```python
Descriptors.NumHAcceptors(mol)
```

### NOCount
Number of N and O atoms.
```python
Descriptors.NOCount(mol)
```

### NHOHCount
Number of N-H and O-H bonds.
```python
Descriptors.NHOHCount(mol)
```

## Atom Counts

### HeavyAtomCount
Number of heavy atoms (non-hydrogen).
```python
Descriptors.HeavyAtomCount(mol)
```

### NumHeteroatoms
Number of heteroatoms (non-C and non-H).
```python
Descriptors.NumHeteroatoms(mol)
```

### NumValenceElectrons
Total number of valence electrons.
```python
Descriptors.NumValenceElectrons(mol)
```

### NumRadicalElectrons
Number of radical electrons.
```python
Descriptors.NumRadicalElectrons(mol)
```

## Ring Descriptors

### RingCount
Number of rings.
```python
Descriptors.RingCount(mol)
```

### NumAromaticRings
Number of aromatic rings.
```python
Descriptors.NumAromaticRings(mol)
```

### NumSaturatedRings
Number of saturated rings.
```python
Descriptors.NumSaturatedRings(mol)
```

### NumAliphaticRings
Number of aliphatic (non-aromatic) rings.
```python
Descriptors.NumAliphaticRings(mol)
```

### NumAromaticCarbocycles
Number of aromatic carbocycles (rings with only carbons).
```python
Descriptors.NumAromaticCarbocycles(mol)
```

### NumAromaticHeterocycles
Number of aromatic heterocycles (rings with heteroatoms).
```python
Descriptors.NumAromaticHeterocycles(mol)
```

### NumSaturatedCarbocycles
Number of saturated carbocycles.
```python
Descriptors.NumSaturatedCarbocycles(mol)
```

### NumSaturatedHeterocycles
Number of saturated heterocycles.
```python
Descriptors.NumSaturatedHeterocycles(mol)
```

### NumAliphaticCarbocycles
Number of aliphatic carbocycles.
```python
Descriptors.NumAliphaticCarbocycles(mol)
```

### NumAliphaticHeterocycles
Number of aliphatic heterocycles.
```python
Descriptors.NumAliphaticHeterocycles(mol)
```

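The ring descriptors are easier to tell apart on concrete molecules. The following sketch profiles indole (two fused aromatic rings, one containing nitrogen) and cyclohexane (one saturated carbocycle). The helper `ring_profile` is a hypothetical name introduced for illustration.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def ring_profile(smiles):
    """Collect a few ring descriptors for one molecule into a dict."""
    mol = Chem.MolFromSmiles(smiles)
    return {
        'RingCount': Descriptors.RingCount(mol),
        'NumAromaticRings': Descriptors.NumAromaticRings(mol),
        'NumAliphaticRings': Descriptors.NumAliphaticRings(mol),
        'NumSaturatedRings': Descriptors.NumSaturatedRings(mol),
        'NumAromaticCarbocycles': Descriptors.NumAromaticCarbocycles(mol),
        'NumAromaticHeterocycles': Descriptors.NumAromaticHeterocycles(mol),
    }


indole = ring_profile('c1ccc2[nH]ccc2c1')
cyclohexane = ring_profile('C1CCCCC1')
```

Indole contributes one aromatic carbocycle (the benzene ring) and one aromatic heterocycle (the pyrrole ring), while cyclohexane counts as both aliphatic and saturated.
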
## Rotatable Bonds

### NumRotatableBonds
Number of rotatable bonds (flexibility).
```python
Descriptors.NumRotatableBonds(mol)
```

## Aromatic Atoms

### Aromatic atom count
Number of aromatic atoms. The `Descriptors` module does not appear to expose a `NumAromaticAtoms` function, so count the atoms flagged as aromatic directly.
```python
sum(atom.GetIsAromatic() for atom in mol.GetAtoms())
```

## Fraction Descriptors

### FractionCSP3
Fraction of carbons that are sp3 hybridized. Note the capitalization: the function is `FractionCSP3`.
```python
Descriptors.FractionCSP3(mol)
```

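As a quick worked example of the sp3 carbon fraction, the sketch below evaluates `Descriptors.FractionCSP3` (the spelling exposed by the module) on three small molecules. The dictionary names are illustrative.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

examples = {
    'ethanol': 'CCO',        # both carbons are sp3
    'toluene': 'Cc1ccccc1',  # one sp3 carbon out of seven
    'benzene': 'c1ccccc1',   # no sp3 carbons
}

fsp3 = {
    name: Descriptors.FractionCSP3(Chem.MolFromSmiles(smiles))
    for name, smiles in examples.items()
}
```
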
## Complexity Descriptors

### BertzCT
Bertz complexity index.
```python
Descriptors.BertzCT(mol)
```

### Ipc
Information content of the coefficients of the characteristic polynomial of the adjacency matrix of the hydrogen-suppressed molecular graph (a complexity measure). Values can become very large for big molecules.
```python
Descriptors.Ipc(mol)
```

## Kappa Shape Indices

Molecular shape descriptors based on graph invariants.

### Kappa1
First kappa shape index.
```python
Descriptors.Kappa1(mol)
```

### Kappa2
Second kappa shape index.
```python
Descriptors.Kappa2(mol)
```

### Kappa3
Third kappa shape index.
```python
Descriptors.Kappa3(mol)
```

## Chi Connectivity Indices

Molecular connectivity indices.

### Chi0, Chi1
Simple chi connectivity indices. The `Descriptors` module provides only orders 0 and 1 in this form; orders 2 to 4 exist as the `n` and `v` variants.
```python
Descriptors.Chi0(mol)
Descriptors.Chi1(mol)
```

### Chi0n, Chi1n, Chi2n, Chi3n, Chi4n
Variants of the valence chi indices that use each atom's number of valence electrons (`nVal`) instead of the Hall-Kier valence term.
```python
Descriptors.Chi0n(mol)
Descriptors.Chi1n(mol)
Descriptors.Chi2n(mol)
Descriptors.Chi3n(mol)
Descriptors.Chi4n(mol)
```

### Chi0v, Chi1v, Chi2v, Chi3v, Chi4v
Valence chi connectivity indices.
```python
Descriptors.Chi0v(mol)
Descriptors.Chi1v(mol)
Descriptors.Chi2v(mol)
Descriptors.Chi3v(mol)
Descriptors.Chi4v(mol)
```

## Hall-Kier Alpha

### HallKierAlpha
Hall-Kier alpha value, a correction for atom size and hybridization that enters the kappa shape indices.
```python
Descriptors.HallKierAlpha(mol)
```

## Balaban's J Index

### BalabanJ
Balaban's J index (branching descriptor).
```python
Descriptors.BalabanJ(mol)
```

## EState Indices

Electrotopological state indices.

### MaxEStateIndex
Maximum E-state value.
```python
Descriptors.MaxEStateIndex(mol)
```

### MinEStateIndex
Minimum E-state value.
```python
Descriptors.MinEStateIndex(mol)
```

### MaxAbsEStateIndex
Maximum absolute E-state value.
```python
Descriptors.MaxAbsEStateIndex(mol)
```

### MinAbsEStateIndex
Minimum absolute E-state value.
```python
Descriptors.MinAbsEStateIndex(mol)
```

## Partial Charges

These descriptors are computed from Gasteiger partial charges and can return NaN when charges cannot be assigned to an atom.

### MaxPartialCharge
Maximum partial charge.
```python
Descriptors.MaxPartialCharge(mol)
```

### MinPartialCharge
Minimum partial charge.
```python
Descriptors.MinPartialCharge(mol)
```

### MaxAbsPartialCharge
Maximum absolute partial charge.
```python
Descriptors.MaxAbsPartialCharge(mol)
```

### MinAbsPartialCharge
Minimum absolute partial charge.
```python
Descriptors.MinAbsPartialCharge(mol)
```

## Fingerprint Density

Number of distinct Morgan fingerprint features per heavy atom.

### FpDensityMorgan1
Morgan fingerprint density at radius 1.
```python
Descriptors.FpDensityMorgan1(mol)
```

### FpDensityMorgan2
Morgan fingerprint density at radius 2.
```python
Descriptors.FpDensityMorgan2(mol)
```

### FpDensityMorgan3
Morgan fingerprint density at radius 3.
```python
Descriptors.FpDensityMorgan3(mol)
```

## PEOE VSA Descriptors

Partial Equalization of Orbital Electronegativities (PEOE) VSA descriptors.

### PEOE_VSA1 through PEOE_VSA14
MOE-type descriptors using partial charges and surface area contributions.
```python
Descriptors.PEOE_VSA1(mol)
# ... through PEOE_VSA14
```

## SMR VSA Descriptors

Molecular refractivity VSA descriptors.

### SMR_VSA1 through SMR_VSA10
MOE-type descriptors using MR contributions and surface area.
```python
Descriptors.SMR_VSA1(mol)
# ... through SMR_VSA10
```

## SLogP VSA Descriptors

LogP VSA descriptors.

### SLogP_VSA1 through SLogP_VSA12
MOE-type descriptors using LogP contributions and surface area.
```python
Descriptors.SLogP_VSA1(mol)
# ... through SLogP_VSA12
```

## EState VSA Descriptors

### EState_VSA1 through EState_VSA11
MOE-type descriptors using E-state indices and surface area: each one sums the VSA contributions of atoms whose E-state index falls in a given bin.
```python
Descriptors.EState_VSA1(mol)
# ... through EState_VSA11
```

## VSA Descriptors

Descriptors binned by van der Waals surface area (VSA) contribution.

### VSA_EState1 through VSA_EState10
The reverse binning of `EState_VSA`: each one sums the E-state indices of atoms whose VSA contribution falls in a given bin.
```python
Descriptors.VSA_EState1(mol)
# ... through VSA_EState10
```

## BCUT Descriptors

Burden-CAS-University of Texas eigenvalue descriptors.

### BCUT2D_MWHI
Highest eigenvalue of the Burden matrix weighted by atomic mass.
```python
Descriptors.BCUT2D_MWHI(mol)
```

### BCUT2D_MWLOW
Lowest eigenvalue of the Burden matrix weighted by atomic mass.
```python
Descriptors.BCUT2D_MWLOW(mol)
```

### BCUT2D_CHGHI
Highest eigenvalue weighted by partial charges.
```python
Descriptors.BCUT2D_CHGHI(mol)
```

### BCUT2D_CHGLO
Lowest eigenvalue weighted by partial charges.
```python
Descriptors.BCUT2D_CHGLO(mol)
```

### BCUT2D_LOGPHI
Highest eigenvalue weighted by LogP.
```python
Descriptors.BCUT2D_LOGPHI(mol)
```

### BCUT2D_LOGPLOW
Lowest eigenvalue weighted by LogP.
```python
Descriptors.BCUT2D_LOGPLOW(mol)
```

### BCUT2D_MRHI
Highest eigenvalue weighted by molar refractivity.
```python
Descriptors.BCUT2D_MRHI(mol)
```

### BCUT2D_MRLOW
Lowest eigenvalue weighted by molar refractivity.
```python
Descriptors.BCUT2D_MRLOW(mol)
```

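## Autocorrelation Descriptors

### AUTOCORR2D
2D autocorrelation descriptors. They are not part of the default descriptor list; in recent RDKit releases they can be enabled with `Descriptors.setupAUTOCorrDescriptors()`.
Each index measures how an atomic property is distributed over topological distances in the molecular graph.
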
## MQN Descriptors

Molecular Quantum Numbers: 42 simple descriptors.

### mqn1 through mqn42
Integer descriptors counting various molecular features.
```python
# MQNs are not returned by CalcMolDescriptors; compute them with rdMolDescriptors
from rdkit.Chem import rdMolDescriptors
mqn_values = rdMolDescriptors.MQNs_(mol)  # list of 42 integers
mqns = {f'mqn{i}': v for i, v in enumerate(mqn_values, start=1)}
```

## QED

### qed
Quantitative Estimate of Drug-likeness.
```python
Descriptors.qed(mol)
```

## Lipinski's Rule of Five

Check drug-likeness using Lipinski's criteria:

```python
def lipinski_rule_of_five(mol):
    """Return True if the molecule satisfies all four Lipinski criteria."""
    mw = Descriptors.MolWt(mol) <= 500
    logp = Descriptors.MolLogP(mol) <= 5
    hbd = Descriptors.NumHDonors(mol) <= 5
    hba = Descriptors.NumHAcceptors(mol) <= 10
    return mw and logp and hbd and hba
```

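A common variant of the rule tolerates a single violation instead of requiring all four criteria. The sketch below reports which criteria are violated so the caller can apply either policy. The function name `lipinski_violations` and the one-violation threshold are assumptions made for illustration.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def lipinski_violations(mol):
    """Return the list of Lipinski criteria that the molecule violates."""
    checks = {
        'MW > 500': Descriptors.MolWt(mol) > 500,
        'LogP > 5': Descriptors.MolLogP(mol) > 5,
        'HBD > 5': Descriptors.NumHDonors(mol) > 5,
        'HBA > 10': Descriptors.NumHAcceptors(mol) > 10,
    }
    return [name for name, violated in checks.items() if violated]


aspirin = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')
long_alkane = Chem.MolFromSmiles('C' * 40)  # C40H82: heavy and very lipophilic

aspirin_violations = lipinski_violations(aspirin)
alkane_violations = lipinski_violations(long_alkane)

# Assumed policy: accept molecules with at most one violation
aspirin_ok = len(aspirin_violations) <= 1
alkane_ok = len(alkane_violations) <= 1
```

Returning the violated criteria rather than a bare boolean also makes it easier to report why a compound was rejected.
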
## Batch Descriptor Calculation

Calculate all descriptors at once:

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles('CCO')

# Get all descriptors as a dictionary keyed by descriptor name
all_descriptors = Descriptors.CalcMolDescriptors(mol)

# Access specific descriptors
mw = all_descriptors['MolWt']
logp = all_descriptors['MolLogP']

# Get the list of available descriptor names
# (_descList holds (name, function) pairs)
descriptor_names = [desc[0] for desc in Descriptors._descList]
```

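When only a handful of descriptors is needed for many molecules, it can be cheaper to call just those functions. The sketch below builds a small table as a list of dicts; `descriptor_table` is a hypothetical helper, and it relies on `Descriptors._descList`, which is a private attribute and may change between releases.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Map descriptor names to functions; _descList holds (name, function) pairs
DESCRIPTOR_FUNCS = dict(Descriptors._descList)


def descriptor_table(smiles_list, names):
    """Compute the named descriptors for each parseable SMILES string."""
    rows = []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:  # skip unparseable input
            continue
        row = {'smiles': smiles}
        for name in names:
            row[name] = DESCRIPTOR_FUNCS[name](mol)
        rows.append(row)
    return rows


table = descriptor_table(
    ['CCO', 'c1ccccc1', 'C1CC'],  # the last entry is deliberately invalid
    ['MolWt', 'TPSA', 'NumRotatableBonds'],
)
```

The resulting list of dicts can be passed straight to a DataFrame constructor or written out with the `csv` module.
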
## Descriptor Categories Summary

1. **Physicochemical**: MolWt, MolLogP, MolMR, TPSA
2. **Topological**: BertzCT, BalabanJ, BCUT descriptors
3. **Electronic**: Partial charges, E-state indices
4. **Shape**: Kappa indices, HallKierAlpha
5. **Connectivity**: Chi indices
6. **2D Fingerprints**: FpDensity descriptors
7. **Atom counts**: Heavy atoms, heteroatoms, rings
8. **Drug-likeness**: QED, Lipinski parameters
9. **Flexibility**: NumRotatableBonds
10. **Surface area**: VSA-based descriptors

## Common Use Cases

### Drug-likeness Screening

```python
def screen_druglikeness(mol):
    """Collect the descriptors most often used in drug-likeness screens."""
    return {
        'MW': Descriptors.MolWt(mol),
        'LogP': Descriptors.MolLogP(mol),
        'HBD': Descriptors.NumHDonors(mol),
        'HBA': Descriptors.NumHAcceptors(mol),
        'TPSA': Descriptors.TPSA(mol),
        'RotBonds': Descriptors.NumRotatableBonds(mol),
        'AromaticRings': Descriptors.NumAromaticRings(mol),
        'QED': Descriptors.qed(mol)
    }
```

### Lead-like Filtering

```python
def is_leadlike(mol):
    """Return True if the molecule falls inside a lead-like property window."""
    mw = 250 <= Descriptors.MolWt(mol) <= 350
    logp = Descriptors.MolLogP(mol) <= 3.5
    rot_bonds = Descriptors.NumRotatableBonds(mol) <= 7
    return mw and logp and rot_bonds
```

### Diversity Analysis

```python
def molecular_complexity(mol):
    """Collect descriptors that summarize structural complexity."""
    return {
        'BertzCT': Descriptors.BertzCT(mol),
        'NumRings': Descriptors.RingCount(mol),
        'NumRotBonds': Descriptors.NumRotatableBonds(mol),
        'FractionCSP3': Descriptors.FractionCSP3(mol),
        'NumAromaticRings': Descriptors.NumAromaticRings(mol)
    }
```

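In practice these filters are applied to a whole library rather than a single molecule. The sketch below repeats the lead-like thresholds from this section and applies them to a list of SMILES strings; `filter_leadlike` is a hypothetical helper and the three input molecules are arbitrary examples.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def is_leadlike(mol):
    """Lead-like window: 250 <= MW <= 350, LogP <= 3.5, rotatable bonds <= 7."""
    mw = 250 <= Descriptors.MolWt(mol) <= 350
    logp = Descriptors.MolLogP(mol) <= 3.5
    rot_bonds = Descriptors.NumRotatableBonds(mol) <= 7
    return mw and logp and rot_bonds


def filter_leadlike(smiles_list):
    """Keep the SMILES strings that parse and pass the lead-like filter."""
    kept = []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is not None and is_leadlike(mol):
            kept.append(smiles)
    return kept


library = [
    'CCO',                               # ethanol: far below the MW window
    'Cc1cc(NS(=O)(=O)c2ccc(N)cc2)no1',   # sulfamethoxazole: MW about 253
    'C1CC',                              # invalid SMILES, skipped
]
leadlike = filter_leadlike(library)
```
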
## Tips

1. **Use batch calculation** (`CalcMolDescriptors`) when you need many descriptors for the same molecule
2. **Check for None** - `Chem.MolFromSmiles` returns `None` for unparseable input, and some descriptors (for example the partial-charge ones) can return NaN
3. **Normalize descriptors** for machine learning applications
4. **Select relevant descriptors** - not all 200+ descriptors are useful for every task
5. **Consider 3D descriptors** separately (require 3D coordinates)
6. **Validate ranges** - check if descriptor values are in expected ranges

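The normalization and validation tips can be combined in a few lines. The sketch below drops rows containing non-finite values and then z-scores each descriptor column using only the standard library. The helper `zscore_columns`, the choice of descriptors, and the mapping of constant columns to `0.0` are assumptions for illustration; a real pipeline would more likely use a dedicated scaler.

```python
import math
import statistics

from rdkit import Chem
from rdkit.Chem import Descriptors


def zscore_columns(rows, names):
    """Standardize each named column to zero mean and unit variance."""
    scaled = [dict(row) for row in rows]
    for name in names:
        values = [row[name] for row in rows]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        for row in scaled:
            # Constant columns carry no information; map them to 0.0
            row[name] = (row[name] - mean) / std if std > 0 else 0.0
    return scaled


names = ['MolWt', 'MolLogP', 'TPSA']
rows = []
for smiles in ['CCO', 'CCCCO', 'c1ccccc1O']:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        continue
    values = {name: getattr(Descriptors, name)(mol) for name in names}
    # Keep only rows where every descriptor is a finite number
    if all(math.isfinite(v) for v in values.values()):
        rows.append(values)

scaled = zscore_columns(rows, names)
```

All three example molecules carry a single hydroxyl group, so their TPSA values coincide and that column is mapped to zero, which illustrates why constant descriptors are usually removed before modeling.
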