Common SMARTS Patterns for RDKit
This document provides a collection of commonly used SMARTS patterns for substructure searching in RDKit.
Functional Groups
Alcohols
# Primary alcohol
'[CH2][OH1]'
# Secondary alcohol
'[CH1]([OH1])([C])[C]'
# Tertiary alcohol
'[C]([OH1])([C])([C])[C]'
# Any alcohol
'[OH1][C]'
# Phenol
'c[OH1]'
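As a minimal sketch of how the alcohol patterns might be applied, the snippet below labels a molecule by the patterns it matches. The `ALCOHOL_PATTERNS` dictionary and the `alcohol_classes` helper are illustrative names, not part of RDKit.

```python
from rdkit import Chem

# Patterns copied from the list above; the dictionary name is illustrative.
ALCOHOL_PATTERNS = {
    "primary": Chem.MolFromSmarts("[CH2][OH1]"),
    "tertiary": Chem.MolFromSmarts("[C]([OH1])([C])([C])[C]"),
    "phenol": Chem.MolFromSmarts("c[OH1]"),
}


def alcohol_classes(smiles):
    """Return the sorted names of the alcohol patterns found in a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return sorted(
        name
        for name, query in ALCOHOL_PATTERNS.items()
        if mol.HasSubstructMatch(query)
    )


print(alcohol_classes("CCO"))        # ethanol
print(alcohol_classes("CC(C)(C)O"))  # tert-butanol
print(alcohol_classes("Oc1ccccc1"))  # phenol
```

Methanol is not reported as primary by this sketch, because `[CH2]` requires exactly two hydrogens on the carbon and methanol's carbon has three.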
Aldehydes and Ketones
# Aldehyde
'[CH1](=O)'
# Ketone (carbonyl carbon bonded to two carbons)
'[#6][CX3](=O)[#6]'
# Any carbonyl
'[C](=O)'
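A carbonyl pattern with only one carbon neighbor, such as '[C](=O)[C]', also matches aldehydes and carboxylic acids. The sketch below compares it with a stricter pattern that requires two carbon neighbors on the carbonyl carbon. The `loose`, `strict`, and `has_match` names are illustrative.

```python
from rdkit import Chem

loose = Chem.MolFromSmarts("[C](=O)[C]")          # carbonyl with at least one carbon neighbor
strict = Chem.MolFromSmarts("[#6][CX3](=O)[#6]")  # carbonyl carbon flanked by two carbons


def has_match(query, smiles):
    """Return True if the query pattern occurs in the molecule."""
    return Chem.MolFromSmiles(smiles).HasSubstructMatch(query)


examples = [("acetone", "CC(C)=O"), ("acetaldehyde", "CC=O"), ("acetic acid", "CC(=O)O")]
for name, smiles in examples:
    print(name, has_match(loose, smiles), has_match(strict, smiles))
```

Only acetone satisfies the stricter pattern, while all three molecules satisfy the loose one.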
Carboxylic Acids and Derivatives
# Carboxylic acid
'C(=O)[OH1]'
'[CX3](=O)[OX2H1]' # More specific
# Ester
'C(=O)O[C]'
'[CX3](=O)[OX2][C]' # More specific
# Amide
'C(=O)N'
'[CX3](=O)[NX3]' # More specific
# Acyl chloride
'C(=O)Cl'
# Anhydride
'C(=O)OC(=O)'
Amines
# Primary amine
'[NH2][C]'
# Secondary amine
'[NH1]([C])[C]'
# Tertiary amine
'[N]([C])([C])[C]'
# Aromatic amine (aniline)
'c[NH2]'
# Any amine
'[NX3]'
Ethers
# Aliphatic ether
'[C][O][C]'
# Aromatic ether
'c[O][C,c]'
Halides
# Alkyl halide
'[C][F,Cl,Br,I]'
# Aryl halide
'c[F,Cl,Br,I]'
# Specific halides
'[C]F' # Fluoride
'[C]Cl' # Chloride
'[C]Br' # Bromide
'[C]I' # Iodide
Nitriles and Nitro Groups
# Nitrile
'C#N'
# Nitro group
'[N+](=O)[O-]'
# Nitro on aromatic
'c[N+](=O)[O-]'
Thiols and Sulfides
# Thiol
'[C][SH1]'
# Sulfide
'[C][S][C]'
# Disulfide
'[C][S][S][C]'
# Sulfoxide
'[C][S](=O)[C]'
# Sulfone
'[C][S](=O)(=O)[C]'
Ring Systems
Simple Rings
# Benzene ring
'c1ccccc1'
'[#6]1:[#6]:[#6]:[#6]:[#6]:[#6]:1' # Explicit atoms
# Cyclohexane
'C1CCCCC1'
# Cyclopentane
'C1CCCC1'
# Any 3-membered ring
'[r3]'
# Any 4-membered ring
'[r4]'
# Any 5-membered ring
'[r5]'
# Any 6-membered ring
'[r6]'
# Any 7-membered ring
'[r7]'
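In RDKit, the `r<n>` primitive tests the size of the smallest ring an atom belongs to, so atoms shared by two fused rings only match the smaller size. The following sketch illustrates this on indane; the `atoms_matching` helper is an illustrative name.

```python
from rdkit import Chem


def atoms_matching(smiles, smarts):
    """Return the sorted atom indices that match a single-atom SMARTS."""
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmarts(smarts)
    return sorted(idx for (idx,) in mol.GetSubstructMatches(query))


indane = "c1ccc2CCCc2c1"  # benzene fused to a five-membered ring
print(atoms_matching(indane, "[r5]"))  # the three CH2 atoms plus the two fusion atoms
print(atoms_matching(indane, "[r6]"))  # only the four non-fused benzene carbons
```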
Aromatic Rings
# Aromatic carbon in ring
'[cR]'
# Aromatic nitrogen in ring (pyridine, etc.)
'[nR]'
# Aromatic oxygen in ring (furan, etc.)
'[oR]'
# Aromatic sulfur in ring (thiophene, etc.)
'[sR]'
# Any six-membered aromatic ring
'a1aaaaa1'
Heterocycles
# Pyridine
'n1ccccc1'
# Pyrrole
'n1cccc1'
# Furan
'o1cccc1'
# Thiophene
's1cccc1'
# Imidazole
'n1cncc1'
# Pyrimidine
'n1cnccc1'
# Thiazole
'n1ccsc1'
# Oxazole
'n1ccoc1'
Fused Rings
# Naphthalene
'c1ccc2ccccc2c1'
# Indole
'c1ccc2[nH]ccc2c1'
# Quinoline
'n1cccc2ccccc12'
# Benzimidazole
'c1ccc2[nH]cnc2c1'
# Purine
'n1cnc2ncnc2c1'
Macrocycles
# Rings with 8 or more atoms
'[r{8-}]'
# Rings with 9-15 atoms
'[r{9-15}]'
# Rings with 12 or more atoms (macrocycles)
'[r{12-}]'
Specific Structural Features
Aliphatic vs Aromatic
# Aliphatic carbon
'[C]'
# Aromatic carbon
'[c]'
# Aliphatic carbon in ring
'[CR]'
# Aromatic carbon (alternative)
'[cR]'
Stereochemistry
# Tetrahedral center with counterclockwise chirality
'[C@]'
# Tetrahedral center with clockwise chirality
'[C@@]'
# Any chiral center
'[C@,C@@]'
# E double bond
'C/C=C/C'
# Z double bond
'C/C=C\\C'
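RDKit ignores stereochemistry during substructure matching unless `useChirality=True` is passed. The sketch below shows the difference on a stereocenter with four distinct heavy-atom neighbors; the variable names are illustrative.

```python
from rdkit import Chem

query = Chem.MolFromSmarts("F[C@](Cl)(Br)I")
same = Chem.MolFromSmiles("F[C@](Cl)(Br)I")     # same handedness as the query
mirror = Chem.MolFromSmiles("F[C@@](Cl)(Br)I")  # opposite handedness

# Default matching ignores the chiral tag, so the mirror image still matches.
print(mirror.HasSubstructMatch(query))
# With useChirality=True only the matching enantiomer is accepted.
print(same.HasSubstructMatch(query, useChirality=True))
print(mirror.HasSubstructMatch(query, useChirality=True))
```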
Hybridization
# SP hybridization (triple bond)
'[CX2]'
# SP2 hybridization (double bond or aromatic)
'[CX3]'
# SP3 hybridization (single bonds)
'[CX4]'
Charge
# Charge of +1 ('[+]' is shorthand for '[+1]')
'[+]'
# Charge of -1 ('[-]' is shorthand for '[-1]')
'[-]'
# Specific charge
'[+1]'
'[-1]'
'[+2]'
# Positively charged nitrogen
'[N+]'
# Negatively charged oxygen
'[O-]'
# Carboxylate anion
'C(=O)[O-]'
# Ammonium cation
'[N+]([C])([C])([C])[C]'
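As an illustrative sketch, the charge patterns can be checked against a zwitterion, where both a cationic and an anionic atom are present. The `matched_symbols` helper is a hypothetical name.

```python
from rdkit import Chem

glycine_zwitterion = Chem.MolFromSmiles("[NH3+]CC(=O)[O-]")


def matched_symbols(mol, smarts):
    """Return the element symbol of the first atom in each match."""
    query = Chem.MolFromSmarts(smarts)
    return [
        mol.GetAtomWithIdx(match[0]).GetSymbol()
        for match in mol.GetSubstructMatches(query)
    ]


print(matched_symbols(glycine_zwitterion, "[+]"))  # the ammonium nitrogen
print(matched_symbols(glycine_zwitterion, "[-]"))  # the carboxylate oxygen
print(len(glycine_zwitterion.GetSubstructMatches(Chem.MolFromSmarts("C(=O)[O-]"))))
```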
Pharmacophore Features
Hydrogen Bond Donors
# Hydroxyl
'[OH]'
# Amine
'[NH,NH2]'
# Amide NH (nitrogen carries at least one hydrogen)
'[NX3;H1,H2][CX3](=O)'
# Any H-bond donor
'[OH,NH,NH2,NH3+]'
Hydrogen Bond Acceptors
# Carbonyl oxygen
'[O]=[C,S,P]'
# Ether oxygen
'[OX2]'
# Ester oxygen
'C(=O)[O]'
# Nitrogen acceptor (crude: neutral nitrogen with no attached hydrogens)
'[N;H0;+0]'
# Any H-bond acceptor
'[O,N]'
Hydrophobic Groups
# Alkyl chain (4+ carbons)
'CCCC'
# Branched alkyl
'C(C)(C)C'
# Aromatic rings (hydrophobic)
'c1ccccc1'
Aromatic Interactions
# Benzene for pi-pi stacking
'c1ccccc1'
# Any six-membered aromatic ring (carbocycle or heterocycle) for pi-pi
'[a]1[a][a][a][a][a]1'
# Any aromatic ring atom
'[aR]'
Drug-like Fragments
Lipinski Fragments
# Aromatic ring with substituents
'c1cc(*)ccc1'
# Aliphatic chain
'CCCC'
# Ether linkage
'[C][O][C]'
# Amine (basic center)
'[N]([C])([C])'
Common Scaffolds
# Benzamide
'c1ccccc1C(=O)N'
# Sulfonamide
'S(=O)(=O)N'
# Urea
'[N][C](=O)[N]'
# Guanidine
'[N]C(=[N])[N]'
# Phosphate (fully deprotonated form)
'P(=O)([O-])([O-])[O-]'
Privileged Structures
# Biphenyl
'c1ccccc1-c2ccccc2'
# Benzopyran (saturated form: chroman)
'c1ccc2OCCCc2c1'
# Piperazine
'N1CCNCC1'
# Piperidine
'N1CCCCC1'
# Morpholine
'N1CCOCC1'
Reactive Groups
Electrophiles
# Acyl chloride
'C(=O)Cl'
# Alkyl halide
'[C][Cl,Br,I]'
# Epoxide
'C1OC1'
# Michael acceptor
'C=C[C](=O)'
Nucleophiles
# Primary amine
'[NH2][C]'
# Thiol
'[SH][C]'
# Alcohol
'[OH][C]'
Toxicity Alerts (PAINS)
# Rhodanine (2-thioxothiazolidin-4-one)
'S1C(=S)NC(=O)C1'
# Catechol
'c1ccc([OH])c([OH])c1'
# Quinone
'O=C1C=CC(=O)C=C1'
# Hydroquinone (aromatic atoms; a Kekule form with uppercase C will not match)
'[OH]c1ccc([OH])cc1'
# Alkyl halide (reactive)
'[C][I,Br]'
# Michael acceptor (reactive)
'C=CC(=O)[C,N]'
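Uppercase atoms in SMARTS are aliphatic, so a Kekule-style ring pattern does not match a ring that RDKit has perceived as aromatic. The sketch below shows this for hydroquinone; the variable names are illustrative.

```python
from rdkit import Chem

# Both SMILES describe hydroquinone; RDKit perceives the ring as aromatic in each case.
hydroquinone = Chem.MolFromSmiles("Oc1ccc(O)cc1")
hydroquinone_kekule_input = Chem.MolFromSmiles("OC1=CC=C(O)C=C1")

kekule_style = Chem.MolFromSmarts("OC1=CC=C(O)C=C1")       # aliphatic C, explicit double bonds
aromatic_style = Chem.MolFromSmarts("[OH]c1ccc([OH])cc1")  # aromatic c

print(hydroquinone.HasSubstructMatch(kekule_style))                 # no match
print(hydroquinone.HasSubstructMatch(aromatic_style))               # match
print(hydroquinone_kekule_input.HasSubstructMatch(aromatic_style))  # match
```

The quinone pattern above is unaffected, since the quinone ring is not aromatic.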
Metal Binding
# Carboxylate (metal chelator)
'C(=O)[O-]'
# Hydroxamic acid
'C(=O)N[OH]'
# Catechol (iron chelator)
'c1c([OH])c([OH])ccc1'
# Thiol (metal binding)
'[SH]'
# Histidine-like (metal binding)
'c1ncnc1'
Size and Complexity Filters
# Long aliphatic chains (>6 carbons)
'CCCCCCC'
# Highly branched (quaternary carbon)
'C(C)(C)(C)C'
# Multiple rings
'[R]!@[R]' # Two rings joined by a non-ring bond
# Spiro center (approximate: carbon with four ring bonds, in exactly two SSSR rings)
'[CX4;x4;R2]'
Special Patterns
Atom Counts
# Any atom
'[*]'
# Heavy atom (not H)
'[!#1]'
# Carbon
'[C,c]'
# Heteroatom (neither carbon nor hydrogen)
'[!#6;!#1]'
# Halogen
'[F,Cl,Br,I]'
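Atomic-number primitives such as `#6` and `#1` cover both aliphatic and aromatic atoms, which makes them a safe choice for counting. A minimal sketch, with `HEAVY`, `HETERO`, and `count_atoms` as illustrative names:

```python
from rdkit import Chem

HEAVY = Chem.MolFromSmarts("[!#1]")       # any atom that is not hydrogen
HETERO = Chem.MolFromSmarts("[!#6;!#1]")  # neither carbon nor hydrogen


def count_atoms(smiles):
    """Return (heavy atom count, heteroatom count) for a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    return len(mol.GetSubstructMatches(HEAVY)), len(mol.GetSubstructMatches(HETERO))


print(count_atoms("CC(=O)Nc1ccc(O)cc1"))  # paracetamol
print(count_atoms("c1ccncc1"))            # pyridine
```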
Bond Types
# Single bond
'C-C'
# Double bond
'C=C'
# Triple bond
'C#C'
# Aromatic bond
'c:c'
# Any bond
'C~C'
Ring Membership
# In any ring
'[R]'
# Not in ring
'[!R]'
# In exactly one ring
'[R1]'
# In exactly two rings
'[R2]'
# Ring bond
'[R]@[R]'
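The bond primitive `@` matches ring bonds and `!@` matches non-ring bonds, which helps separate linked rings from fused rings. The sketch below is one way to illustrate this; `count_matches` is a hypothetical helper.

```python
from rdkit import Chem


def count_matches(smiles, smarts):
    """Return the number of unique substructure matches."""
    mol = Chem.MolFromSmiles(smiles)
    return len(mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))


biphenyl = "c1ccccc1-c1ccccc1"
naphthalene = "c1ccc2ccccc2c1"

print(count_matches(biphenyl, "[R]!@[R]"))     # one linker bond between the rings
print(count_matches(naphthalene, "[R]!@[R]"))  # no linker bond in a fused system
print(count_matches(naphthalene, "[R2]"))      # the two fusion atoms
print(count_matches(biphenyl, "[R2]"))         # no atom sits in two rings
```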
Degree and Connectivity
# Total degree 1 (terminal atom)
'[D1]'
# Total degree 2 (chain)
'[D2]'
# Total degree 3 (branch point)
'[D3]'
# Total degree 4 (highly branched)
'[D4]'
# Carbon bonded to at least 2 other carbons
'[C]([C])[C]'
Usage Examples
from rdkit import Chem
# Create SMARTS query
pattern = Chem.MolFromSmarts('[CH2][OH1]') # Primary alcohol
# Search molecule
mol = Chem.MolFromSmiles('CCO')
matches = mol.GetSubstructMatches(pattern)
# Multiple patterns
patterns = {
    'alcohol': '[OH1][C]',
    'amine': '[NH2,NH1][C]',
    'carboxylic_acid': 'C(=O)[OH1]',
}
# Check for functional groups
for name, smarts in patterns.items():
    query = Chem.MolFromSmarts(smarts)
    if mol.HasSubstructMatch(query):
        print(f"Found {name}")
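`Chem.MolFromSmarts` returns `None` for a SMARTS string it cannot parse, so it can help to compile and validate all patterns once before searching. The following is a sketch under that assumption; `compile_patterns`, `find_groups`, and `QUERIES` are illustrative names.

```python
from rdkit import Chem


def compile_patterns(smarts_by_name):
    """Compile SMARTS strings once, failing early on any invalid pattern."""
    compiled = {}
    for name, smarts in smarts_by_name.items():
        query = Chem.MolFromSmarts(smarts)
        if query is None:
            raise ValueError(f"Invalid SMARTS for {name!r}: {smarts!r}")
        compiled[name] = query
    return compiled


def find_groups(smiles, compiled):
    """Return the sorted names of all compiled patterns found in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return sorted(name for name, query in compiled.items() if mol.HasSubstructMatch(query))


QUERIES = compile_patterns({
    "alcohol": "[OH1][C]",
    "amine": "[NH2,NH1][C]",
    "carboxylic_acid": "C(=O)[OH1]",
})

print(find_groups("CC(=O)O", QUERIES))  # acetic acid
print(find_groups("CCN", QUERIES))      # ethylamine
```

Acetic acid is reported under both `alcohol` and `carboxylic_acid`, because the generic `[OH1][C]` pattern does not exclude acid hydroxyls.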
Tips for Writing SMARTS
- Be specific when needed: Use atom properties [CX3] instead of just [C]
- Mind the case, not the brackets: C and [C] both mean aliphatic carbon; lowercase c is aromatic carbon
- Consider aromaticity: lowercase letters (c, n, o) are aromatic
- Check ring membership: [R] for in-ring, [!R] for not in-ring
- Use recursive SMARTS: $(...) for complex patterns
- Test patterns: Always validate SMARTS on known molecules
- Start simple: Build complex patterns incrementally
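As an example of the recursive SMARTS tip, the sketch below matches hydroxyl groups that are not part of a carboxylic acid by excluding the `O-C=O` environment with `!$(...)`. The pattern and the `count_hydroxyls` helper are illustrative, not a standard definition.

```python
from rdkit import Chem

# Hydroxyl oxygen (two connections, one hydrogen) that is NOT attached to a C=O carbon.
NON_ACID_HYDROXYL = Chem.MolFromSmarts("[OX2H1;!$(OC=O)]")


def count_hydroxyls(smiles):
    """Count hydroxyl groups that are not carboxylic acid OH groups."""
    mol = Chem.MolFromSmiles(smiles)
    return len(mol.GetSubstructMatches(NON_ACID_HYDROXYL))


print(count_hydroxyls("CCO"))          # ethanol
print(count_hydroxyls("CC(=O)O"))      # acetic acid
print(count_hydroxyls("CC(O)C(=O)O"))  # lactic acid
```

Checking the pattern on ethanol, acetic acid, and lactic acid also follows the tip about validating SMARTS on known molecules.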
Common SMARTS Syntax
- [C]: Aliphatic carbon
- [c]: Aromatic carbon
- [CX4]: Carbon with 4 connections (typically sp3)
- [CX3]: Carbon with 3 connections (typically sp2)
- [CX2]: Carbon with 2 connections (typically sp)
- [CH3]: Methyl group
- [R]: In ring
- [r6]: In 6-membered ring
- [r{5-7}]: In 5, 6, or 7-membered ring
- [D2]: Degree 2 (2 explicit neighbors)
- [+]: Charge of +1
- [-]: Charge of -1
- [!C]: Not aliphatic carbon (use [!#6] to exclude all carbon)
- [#6]: Element with atomic number 6 (carbon)
- ~: Any bond type
- -: Single bond
- =: Double bond
- #: Triple bond
- :: Aromatic bond
- @: Counterclockwise chirality
- @@: Clockwise chirality