# ChEMBL Web Services API Reference ## Overview ChEMBL is a manually curated database of bioactive molecules with drug-like properties maintained by the European Bioinformatics Institute (EBI). It contains information about compounds, targets, assays, bioactivity data, and approved drugs. The ChEMBL database contains: - Over 2 million compound records - Over 1.4 million assay records - Over 19 million activity values - Information on 13,000+ drug targets - Data on 16,000+ approved drugs and clinical candidates ## Python Client Installation ```bash pip install chembl_webresource_client ``` ## Key Resources and Endpoints ChEMBL provides access to 30+ specialized endpoints: ### Core Data Types - **molecule** - Compound structures, properties, and synonyms - **target** - Protein and non-protein biological targets - **activity** - Bioassay measurement results - **assay** - Experimental assay details - **drug** - Approved pharmaceutical information - **mechanism** - Drug mechanism of action data - **document** - Literature sources and references - **cell_line** - Cell line information - **tissue** - Tissue types - **protein_class** - Protein classification - **target_component** - Target component details - **compound_structural_alert** - Structural alerts for toxicity ## Query Patterns and Filters ### Filter Operators The API supports Django-style filter operators: - `__exact` - Exact match - `__iexact` - Case-insensitive exact match - `__contains` - Contains substring - `__icontains` - Case-insensitive contains - `__startswith` - Starts with prefix - `__endswith` - Ends with suffix - `__gt` - Greater than - `__gte` - Greater than or equal - `__lt` - Less than - `__lte` - Less than or equal - `__range` - Value in range - `__in` - Value in list - `__isnull` - Is null/not null - `__regex` - Regular expression match - `__search` - Full text search ### Example Filter Queries **Molecular weight filtering:** ```python molecules.filter(molecule_properties__mw_freebase__lte=300) ``` **Name pattern matching:** ```python molecules.filter(pref_name__endswith='nib') ``` **Multiple conditions:** ```python molecules.filter( molecule_properties__mw_freebase__lte=300, pref_name__endswith='nib' ) ``` ## Chemical Structure Searches ### Substructure Search Search for compounds containing a specific substructure using SMILES: ```python from chembl_webresource_client.new_client import new_client similarity = new_client.similarity results = similarity.filter(smiles='CC(=O)Oc1ccccc1C(=O)O', similarity=70) ``` ### Similarity Search Find compounds similar to a query structure: ```python similarity = new_client.similarity results = similarity.filter(smiles='CC(=O)Oc1ccccc1C(=O)O', similarity=85) ``` ## Common Data Retrieval Patterns ### Get Molecule by ChEMBL ID ```python molecule = new_client.molecule.get('CHEMBL25') ``` ### Get Target Information ```python target = new_client.target.get('CHEMBL240') ``` ### Get Activity Data ```python activities = new_client.activity.filter( target_chembl_id='CHEMBL240', standard_type='IC50', standard_value__lte=100 ) ``` ### Get Drug Information ```python drug = new_client.drug.get('CHEMBL1234') ``` ## Response Formats The API supports multiple response formats: - JSON (default) - XML - YAML ## Caching and Performance The Python client automatically caches results locally: - **Default cache duration**: 24 hours - **Cache location**: Local file system - **Lazy evaluation**: Queries execute only when data is accessed ### Configuration Settings ```python from chembl_webresource_client.settings import Settings # Disable caching Settings.Instance().CACHING = False # Adjust cache expiration (in seconds) Settings.Instance().CACHE_EXPIRE = 86400 # 24 hours # Set timeout Settings.Instance().TIMEOUT = 30 # Set retries Settings.Instance().TOTAL_RETRIES = 3 ``` ## Molecular Properties Common molecular properties available: - `mw_freebase` - Molecular weight - `alogp` - Calculated LogP - `hba` - Hydrogen bond acceptors - `hbd` - Hydrogen bond donors - `psa` - Polar surface area - `rtb` - Rotatable bonds - `ro3_pass` - Rule of 3 compliance - `num_ro5_violations` - Lipinski rule of 5 violations - `cx_most_apka` - Most acidic pKa - `cx_most_bpka` - Most basic pKa - `molecular_species` - Molecular species - `full_mwt` - Full molecular weight ## Bioactivity Data Fields Key bioactivity fields: - `standard_type` - Activity type (IC50, Ki, Kd, EC50, etc.) - `standard_value` - Numerical activity value - `standard_units` - Units (nM, uM, etc.) - `pchembl_value` - Normalized activity value (-log scale) - `activity_comment` - Activity annotations - `data_validity_comment` - Data validity flags - `potential_duplicate` - Duplicate flag ## Target Information Fields Target data includes: - `target_chembl_id` - ChEMBL target identifier - `pref_name` - Preferred target name - `target_type` - Type (PROTEIN, ORGANISM, etc.) - `organism` - Target organism - `tax_id` - NCBI taxonomy ID - `target_components` - Component details ## Advanced Query Examples ### Find Kinase Inhibitors ```python # Get kinase targets targets = new_client.target.filter( target_type='SINGLE PROTEIN', pref_name__icontains='kinase' ) # Get activities for these targets activities = new_client.activity.filter( target_chembl_id__in=[t['target_chembl_id'] for t in targets], standard_type='IC50', standard_value__lte=100 ) ``` ### Retrieve Drug Mechanisms ```python mechanisms = new_client.mechanism.filter( molecule_chembl_id='CHEMBL25' ) ``` ### Get Compound Bioactivities ```python activities = new_client.activity.filter( molecule_chembl_id='CHEMBL25', pchembl_value__isnull=False ) ``` ## Image Generation ChEMBL can generate SVG images of molecular structures: ```python from chembl_webresource_client.new_client import new_client image = new_client.image svg = image.get('CHEMBL25') ``` ## Pagination Results are paginated automatically. To iterate through all results: ```python activities = new_client.activity.filter(target_chembl_id='CHEMBL240') for activity in activities: print(activity) ``` ## Error Handling Common errors: - **404**: Resource not found - **503**: Service temporarily unavailable - **Timeout**: Request took too long The client automatically retries failed requests based on `TOTAL_RETRIES` setting. ## Rate Limiting ChEMBL has fair usage policies: - Be respectful with query frequency - Use caching to minimize repeated requests - Consider bulk downloads for large datasets ## Additional Resources - Official API documentation: https://www.ebi.ac.uk/chembl/api/data/docs - Python client GitHub: https://github.com/chembl/chembl_webresource_client - ChEMBL interface docs: https://chembl.gitbook.io/chembl-interface-documentation/ - Example notebooks: https://github.com/chembl/notebooks