ChEMBL Web Services API Reference
Overview
ChEMBL is a manually curated database of bioactive molecules with drug-like properties maintained by the European Bioinformatics Institute (EBI). It contains information about compounds, targets, assays, bioactivity data, and approved drugs.
The ChEMBL database contains:
- Over 2 million compound records
- Over 1.4 million assay records
- Over 19 million activity values
- Information on 13,000+ drug targets
- Data on 16,000+ approved drugs and clinical candidates
Python Client Installation
pip install chembl_webresource_client
Key Resources and Endpoints
ChEMBL provides access to 30+ specialized endpoints:
Core Data Types
- molecule - Compound structures, properties, and synonyms
- target - Protein and non-protein biological targets
- activity - Bioassay measurement results
- assay - Experimental assay details
- drug - Approved pharmaceutical information
- mechanism - Drug mechanism of action data
- document - Literature sources and references
- cell_line - Cell line information
- tissue - Tissue types
- protein_class - Protein classification
- target_component - Target component details
- compound_structural_alert - Structural alerts for toxicity
Query Patterns and Filters
Filter Operators
The API supports Django-style filter operators:
- __exact - Exact match
- __iexact - Case-insensitive exact match
- __contains - Contains substring
- __icontains - Case-insensitive contains
- __startswith - Starts with prefix
- __endswith - Ends with suffix
- __gt - Greater than
- __gte - Greater than or equal
- __lt - Less than
- __lte - Less than or equal
- __range - Value in range
- __in - Value in list
- __isnull - Is null/not null
- __regex - Regular expression match
- __search - Full text search
Example Filter Queries
Molecular weight filtering (in these examples, molecules refers to new_client.molecule):
molecules.filter(molecule_properties__mw_freebase__lte=300)
Name pattern matching:
molecules.filter(pref_name__endswith='nib')
Multiple conditions:
molecules.filter(
    molecule_properties__mw_freebase__lte=300,
    pref_name__endswith='nib'
)
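To make the operator syntax concrete, the sketch below shows how the same filters could be written as query parameters on a REST URL. It assumes the data API lives under https://www.ebi.ac.uk/chembl/api/data and that a `.json` suffix selects JSON output; `build_filter_url` is a hypothetical helper written for this illustration, not part of the Python client.

```python
from urllib.parse import urlencode

# Assumed base URL of the ChEMBL data API.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_filter_url(resource, fmt="json", **filters):
    """Return a REST URL for `resource` with Django-style filters applied.

    Hypothetical helper for illustration; it only builds the URL and
    does not send any request.
    """
    # Sort the filters so the resulting URL is deterministic.
    query = urlencode(sorted(filters.items()))
    url = f"{BASE_URL}/{resource}.{fmt}"
    return f"{url}?{query}" if query else url


url = build_filter_url(
    "molecule",
    molecule_properties__mw_freebase__lte=300,
    pref_name__endswith="nib",
)
print(url)
```

Each keyword argument becomes one `field__operator=value` pair, which is the same naming the client's filter() method accepts.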
Chemical Structure Searches
Substructure Search
Search for compounds containing a specific substructure using SMILES:
from chembl_webresource_client.new_client import new_client
substructure = new_client.substructure
results = substructure.filter(smiles='CC(=O)Oc1ccccc1C(=O)O')
Similarity Search
Find compounds similar to a query structure. The similarity argument is the minimum similarity score, given as a percentage:
similarity = new_client.similarity
results = similarity.filter(smiles='CC(=O)Oc1ccccc1C(=O)O', similarity=85)
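To give a feel for what the threshold means, here is a minimal sketch of a Tanimoto coefficient computed over sets of fingerprint on-bits. It assumes the similarity score is a Tanimoto coefficient expressed as a percentage; the fingerprint type used by the service is not stated here, and the bit sets below are toy values, not real fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two collections of fingerprint on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Toy on-bit indices for illustration only.
query_bits = {1, 4, 7, 9}
candidate_bits = {1, 4, 8, 9, 12}

score = tanimoto(query_bits, candidate_bits)  # 3 shared bits / 6 distinct bits
passes_cutoff = score * 100 >= 85             # compare against similarity=85
print(score, passes_cutoff)
```

Under this assumption, raising the cutoff from 70 to 85 simply narrows the result set to candidates that share a larger fraction of their fingerprint bits with the query.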
Common Data Retrieval Patterns
Get Molecule by ChEMBL ID
molecule = new_client.molecule.get('CHEMBL25')
Get Target Information
target = new_client.target.get('CHEMBL240')
Get Activity Data
activities = new_client.activity.filter(
    target_chembl_id='CHEMBL240',
    standard_type='IC50',
    standard_value__lte=100
)
Get Drug Information
drug = new_client.drug.get('CHEMBL1234')
Response Formats
The API supports multiple response formats:
- JSON (default)
- XML
- YAML
Caching and Performance
The Python client automatically caches results locally:
- Default cache duration: 24 hours
- Cache location: Local file system
- Lazy evaluation: Queries execute only when data is accessed
Configuration Settings
from chembl_webresource_client.settings import Settings
# Disable caching
Settings.Instance().CACHING = False
# Adjust cache expiration (in seconds)
Settings.Instance().CACHE_EXPIRE = 86400 # 24 hours
# Set timeout
Settings.Instance().TIMEOUT = 30
# Set retries
Settings.Instance().TOTAL_RETRIES = 3
Molecular Properties
Common molecular properties available:
- mw_freebase - Molecular weight
- alogp - Calculated LogP
- hba - Hydrogen bond acceptors
- hbd - Hydrogen bond donors
- psa - Polar surface area
- rtb - Rotatable bonds
- ro3_pass - Rule of 3 compliance
- num_ro5_violations - Lipinski rule of 5 violations
- cx_most_apka - Most acidic pKa
- cx_most_bpka - Most basic pKa
- molecular_species - Molecular species
- full_mwt - Full molecular weight
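As an illustration of how these properties combine, the sketch below recounts Lipinski rule-of-5 violations from a molecule_properties dictionary. It uses the textbook thresholds (weight over 500, LogP over 5, more than 5 donors, more than 10 acceptors) and assumes mw_freebase is the weight to test; ChEMBL's own num_ro5_violations may be computed differently. `count_ro5_violations` and the sample values are hypothetical.

```python
def count_ro5_violations(props):
    """Count Lipinski rule-of-5 violations in a molecule_properties dict.

    Values are passed through float() because they may arrive as strings.
    Missing (None) values are not handled in this sketch.
    """
    checks = [
        float(props["mw_freebase"]) > 500,  # molecular weight above 500
        float(props["alogp"]) > 5,          # calculated LogP above 5
        float(props["hbd"]) > 5,            # more than 5 H-bond donors
        float(props["hba"]) > 10,           # more than 10 H-bond acceptors
    ]
    return sum(checks)


# Illustrative values only, not taken from the database.
small_molecule = {"mw_freebase": "180.16", "alogp": "1.31", "hba": 3, "hbd": 1}
large_molecule = {"mw_freebase": "720.5", "alogp": "6.2", "hba": 12, "hbd": 2}

print(count_ro5_violations(small_molecule))
print(count_ro5_violations(large_molecule))
```

In practice it is usually simpler to filter on num_ro5_violations directly; recomputing is mainly useful for checking a custom threshold.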
Bioactivity Data Fields
Key bioactivity fields:
- standard_type - Activity type (IC50, Ki, Kd, EC50, etc.)
- standard_value - Numerical activity value
- standard_units - Units (nM, uM, etc.)
- pchembl_value - Normalized activity value (-log scale)
- activity_comment - Activity annotations
- data_validity_comment - Data validity flags
- potential_duplicate - Duplicate flag
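The relationship between standard_value and pchembl_value can be sketched as a negative base-10 logarithm of the molar concentration. The helper below assumes the input is already in nM; `pchembl_from_nm` is a hypothetical name, and the service only assigns pchembl_value to certain activity types, which this sketch does not check.

```python
import math


def pchembl_from_nm(value_nm):
    """Convert a concentration in nM to a -log10(molar) value."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm).
    return 9.0 - math.log10(value_nm)


for nm in (1, 100, 10000):
    print(nm, "nM ->", pchembl_from_nm(nm))
```

Read this way, a cutoff of 100 nM on standard_value corresponds to a pchembl_value of 7 or higher, provided standard_units is nM.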
Target Information Fields
Target data includes:
- target_chembl_id - ChEMBL target identifier
- pref_name - Preferred target name
- target_type - Type (PROTEIN, ORGANISM, etc.)
- organism - Target organism
- tax_id - NCBI taxonomy ID
- target_components - Component details
Advanced Query Examples
Find Kinase Inhibitors
# Get kinase targets
targets = new_client.target.filter(
    target_type='SINGLE PROTEIN',
    pref_name__icontains='kinase'
)
# Get activities for these targets
activities = new_client.activity.filter(
    target_chembl_id__in=[t['target_chembl_id'] for t in targets],
    standard_type='IC50',
    standard_value__lte=100
)
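A name search for 'kinase' can match many targets, so the list passed to target_chembl_id__in may grow large. One possible way to keep each request small is to split the identifiers into batches and issue one query per batch. The sketch below shows only the batching step; `chunked`, the batch size, and the placeholder identifiers are assumptions made for illustration.

```python
def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Placeholder identifiers standing in for real target_chembl_id values.
target_ids = [f"ID_{n}" for n in range(1, 8)]

for batch in chunked(target_ids, 3):
    # With the client, each batch could be passed as
    # new_client.activity.filter(target_chembl_id__in=batch, ...)
    print(batch)
```

The batch size is a tuning choice; smaller batches mean more requests but shorter query strings.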
Retrieve Drug Mechanisms
mechanisms = new_client.mechanism.filter(
    molecule_chembl_id='CHEMBL25'
)
Get Compound Bioactivities
activities = new_client.activity.filter(
    molecule_chembl_id='CHEMBL25',
    pchembl_value__isnull=False
)
Image Generation
ChEMBL can generate SVG images of molecular structures:
from chembl_webresource_client.new_client import new_client
image = new_client.image
image.set_format('svg')  # request SVG output
svg = image.get('CHEMBL25')
Pagination
Results are paginated automatically. To iterate through all results:
activities = new_client.activity.filter(target_chembl_id='CHEMBL240')
for activity in activities:
    print(activity)
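For readers calling the REST API without the client, the idea behind automatic pagination can be sketched as a generator that follows each page's link to the next one. The sketch assumes a JSON layout in which every page carries a page_meta object whose next field is null on the last page, with the records stored under a plural key such as activities. `iter_records` and the fake pages are hypothetical and run entirely offline.

```python
def iter_records(fetch_page, first_url, key):
    """Yield records from every page, following page_meta['next'] links.

    `fetch_page` is any callable that maps a URL to a decoded JSON page.
    """
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page[key]
        url = page["page_meta"]["next"]  # None on the last page ends the loop


# Fake two-page response used in place of real network calls.
FAKE_PAGES = {
    "/activity?offset=0": {
        "activities": [{"activity_id": 1}, {"activity_id": 2}],
        "page_meta": {"next": "/activity?offset=2"},
    },
    "/activity?offset=2": {
        "activities": [{"activity_id": 3}],
        "page_meta": {"next": None},
    },
}

records = list(iter_records(FAKE_PAGES.__getitem__, "/activity?offset=0", "activities"))
print([r["activity_id"] for r in records])
```

Because the generator is lazy, stopping the loop early avoids fetching the remaining pages.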
Error Handling
Common errors:
- 404: Resource not found
- 503: Service temporarily unavailable
- Timeout: Request took too long
The client automatically retries failed requests, up to the number of attempts set by TOTAL_RETRIES.
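When a call still fails after the client's own retries, or when the REST API is used directly, an extra retry layer with exponential backoff is one option. The sketch below is a generic wrapper, not the client's internal mechanism; `call_with_retries`, its defaults, and the simulated failure are assumptions made for illustration.

```python
import time


def call_with_retries(func, retries=3, base_delay=0.5,
                      retry_on=(TimeoutError, ConnectionError),
                      sleep=time.sleep):
    """Call `func`, retrying on selected exceptions with doubling delays."""
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * 2 ** attempt)


# Simulated request that times out twice and then succeeds.
state = {"calls": 0}


def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_retries(flaky_request, sleep=delays.append)
print(result, state["calls"], delays)
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; in real use the default time.sleep applies the delays.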
Rate Limiting
ChEMBL has fair usage policies:
- Be respectful with query frequency
- Use caching to minimize repeated requests
- Consider bulk downloads for large datasets
Additional Resources
- Official API documentation: https://www.ebi.ac.uk/chembl/api/data/docs
- Python client GitHub: https://github.com/chembl/chembl_webresource_client
- ChEMBL interface docs: https://chembl.gitbook.io/chembl-interface-documentation/
- Example notebooks: https://github.com/chembl/notebooks