2025-11-30 08:30:10 +08:00

ChEMBL Web Services API Reference

Overview

ChEMBL is a manually curated database of bioactive molecules with drug-like properties, maintained by the European Bioinformatics Institute (EMBL-EBI). It covers compounds, targets, assays, bioactivity measurements, and approved drugs.

The ChEMBL database contains (approximate figures; exact counts vary by release):

  • Over 2 million compound records
  • Over 1.4 million assay records
  • Over 19 million activity values
  • Information on 13,000+ drug targets
  • Data on 16,000+ approved drugs and clinical candidates

Python Client Installation

pip install chembl_webresource_client

Key Resources and Endpoints

ChEMBL provides access to 30+ specialized endpoints:

Core Data Types

  • molecule - Compound structures, properties, and synonyms
  • target - Protein and non-protein biological targets
  • activity - Bioassay measurement results
  • assay - Experimental assay details
  • drug - Approved pharmaceutical information
  • mechanism - Drug mechanism of action data
  • document - Literature sources and references
  • cell_line - Cell line information
  • tissue - Tissue types
  • protein_class - Protein classification
  • target_component - Target component details
  • compound_structural_alert - Structural alerts for toxicity

Query Patterns and Filters

Filter Operators

The API supports Django-style filter operators:

  • __exact - Exact match
  • __iexact - Case-insensitive exact match
  • __contains - Contains substring
  • __icontains - Case-insensitive contains
  • __startswith - Starts with prefix
  • __endswith - Ends with suffix
  • __gt - Greater than
  • __gte - Greater than or equal
  • __lt - Less than
  • __lte - Less than or equal
  • __range - Value in range
  • __in - Value in list
  • __isnull - Is null/not null
  • __regex - Regular expression match
  • __search - Full text search

Example Filter Queries

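Molecular weight filtering (the examples in this section assume molecules = new_client.molecule, with new_client imported from chembl_webresource_client.new_client):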

molecules.filter(molecule_properties__mw_freebase__lte=300)

Name pattern matching:

molecules.filter(pref_name__endswith='nib')

Multiple conditions:

molecules.filter(
    molecule_properties__mw_freebase__lte=300,
    pref_name__endswith='nib'
)
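Each filter keyword corresponds to a query parameter on the REST endpoint. The sketch below builds such a URL by hand for the combined filter above, without contacting the server. build_filter_url is a hypothetical helper written for this illustration and is not part of chembl_webresource_client; the base URL is the public ChEMBL web services root.

```python
from urllib.parse import urlencode

# Public root of the ChEMBL REST web services.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_filter_url(resource, **filters):
    """Return a REST URL with Django-style filters as query parameters.

    Hypothetical helper for illustration only.
    """
    # Sort the parameters so the resulting URL is deterministic.
    query = urlencode(sorted(filters.items()))
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{query}" if query else url


url = build_filter_url(
    "molecule",
    molecule_properties__mw_freebase__lte=300,
    pref_name__endswith="nib",
)
print(url)
```

Seeing the raw URL can help when debugging a filter that returns unexpected results, since the same URL can be opened in a browser.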

Chemical Structure Searches

Search for compounds containing a specific substructure using SMILES:

from chembl_webresource_client.new_client import new_client
# Substructure search uses the substructure resource, not similarity
substructure = new_client.substructure
results = substructure.filter(smiles='CC(=O)Oc1ccccc1C(=O)O')

Find compounds similar to a query structure (similarity is a Tanimoto threshold given as a percentage):

similarity = new_client.similarity
results = similarity.filter(smiles='CC(=O)Oc1ccccc1C(=O)O', similarity=85)

Common Data Retrieval Patterns

Get Molecule by ChEMBL ID

molecule = new_client.molecule.get('CHEMBL25')

Get Target Information

target = new_client.target.get('CHEMBL240')

Get Activity Data

activities = new_client.activity.filter(
    target_chembl_id='CHEMBL240',
    standard_type='IC50',
    standard_value__lte=100
)

Get Drug Information

drug = new_client.drug.get('CHEMBL1234')

Response Formats

The API supports multiple response formats:

  • JSON (default)
  • XML
  • YAML
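When calling the REST service directly, one way to select a format is a file extension on the resource URL. The following is a minimal sketch under that assumption; record_url is a hypothetical helper, not a client function.

```python
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"
SUPPORTED_FORMATS = ("json", "xml", "yaml")


def record_url(resource, chembl_id, fmt="json"):
    """Build the URL of a single record in the requested format.

    Hypothetical helper; assumes the format is chosen by file extension.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE_URL}/{resource}/{chembl_id}.{fmt}"


print(record_url("molecule", "CHEMBL25"))
print(record_url("molecule", "CHEMBL25", fmt="xml"))
```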

Caching and Performance

The Python client automatically caches results locally:

  • Default cache duration: 24 hours
  • Cache location: Local file system
  • Lazy evaluation: Queries execute only when data is accessed
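To make the lazy-evaluation and caching behaviour concrete, here is a toy model of the pattern. LazyQuery and fake_fetch are hypothetical stand-ins written for this illustration; they are not the client's classes and no network request is made.

```python
class LazyQuery:
    """Toy stand-in for a lazily evaluated, cached query (not the client's class)."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that performs the (simulated) request
        self._cache = None    # filled on first access

    def __iter__(self):
        # The request runs only the first time the data is needed.
        if self._cache is None:
            self._cache = list(self._fetch())
        return iter(self._cache)


calls = []


def fake_fetch():
    """Simulated request that records each time it is invoked."""
    calls.append("request")
    return [{"molecule_chembl_id": "CHEMBL25"}]


query = LazyQuery(fake_fetch)
calls_before = len(calls)   # building the query triggers no request
first = list(query)         # first access runs the request
second = list(query)        # second access is served from the cache
print(calls_before, len(calls))
```

The practical consequence is that building a filter is cheap, while the first loop or index access is where the request time is spent.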

Configuration Settings

from chembl_webresource_client.settings import Settings

# Disable caching
Settings.Instance().CACHING = False

# Adjust cache expiration (in seconds)
Settings.Instance().CACHE_EXPIRE = 86400  # 24 hours

# Set timeout
Settings.Instance().TIMEOUT = 30

# Set retries
Settings.Instance().TOTAL_RETRIES = 3

Molecular Properties

Common molecular properties available:

  • mw_freebase - Molecular weight of the parent (free base) form
  • alogp - Calculated LogP
  • hba - Hydrogen bond acceptors
  • hbd - Hydrogen bond donors
  • psa - Polar surface area
  • rtb - Rotatable bonds
  • ro3_pass - Rule of 3 compliance
  • num_ro5_violations - Lipinski rule of 5 violations
  • cx_most_apka - Most acidic pKa
  • cx_most_bpka - Most basic pKa
  • molecular_species - Molecular species
  • full_mwt - Molecular weight of the full compound, including any salts

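As an illustration of working with these fields, the sketch below counts Lipinski rule-of-five violations from a molecule_properties dictionary. It applies the textbook thresholds to mw_freebase, alogp, hba, and hbd, so the result may not match the stored num_ro5_violations exactly. The function name and the sample values are assumptions for the example.

```python
def count_ro5_violations(props):
    """Count rule-of-five violations in a molecule_properties-style dict.

    Values may arrive as strings, so each one is converted to float.
    Missing (None) values are skipped rather than counted as violations.
    """
    limits = {"mw_freebase": 500.0, "alogp": 5.0, "hba": 10.0, "hbd": 5.0}
    violations = 0
    for field, limit in limits.items():
        value = props.get(field)
        if value is not None and float(value) > limit:
            violations += 1
    return violations


# Illustrative values for a small, drug-like molecule.
sample = {"mw_freebase": "180.16", "alogp": "1.31", "hba": 3, "hbd": 1}
print(count_ro5_violations(sample))  # 0
```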
Bioactivity Data Fields

Key bioactivity fields:

  • standard_type - Activity type (IC50, Ki, Kd, EC50, etc.)
  • standard_value - Numerical activity value
  • standard_units - Units (nM, uM, etc.)
  • pchembl_value - Normalized activity value: -log10 of the molar IC50, XC50, EC50, AC50, Ki, Kd, or Potency
  • activity_comment - Activity annotations
  • data_validity_comment - Data validity flags
  • potential_duplicate - Duplicate flag
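The relationship between standard_value and a -log10 molar scale can be sketched as follows, assuming the value is expressed in nM. pchembl_from_nm is a hypothetical helper; ChEMBL only assigns pchembl_value to records that meet its own criteria, so this is a unit conversion, not a reimplementation of that logic.

```python
import math


def pchembl_from_nm(value_nm):
    """Convert a concentration in nM to the -log10 molar scale.

    -log10(value_nm * 1e-9) is computed as 9 - log10(value_nm).
    """
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    return 9.0 - math.log10(value_nm)


print(round(pchembl_from_nm(100), 2))  # 100 nM -> 7.0
```

On this scale, the standard_value__lte=100 filter used elsewhere in this reference corresponds to a value of 7 or higher for nM data.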

Target Information Fields

Target data includes:

  • target_chembl_id - ChEMBL target identifier
  • pref_name - Preferred target name
  • target_type - Type (SINGLE PROTEIN, PROTEIN COMPLEX, ORGANISM, etc.)
  • organism - Target organism
  • tax_id - NCBI taxonomy ID
  • target_components - Component details

Advanced Query Examples

Find Kinase Inhibitors

# Get kinase targets
targets = new_client.target.filter(
    target_type='SINGLE PROTEIN',
    pref_name__icontains='kinase'
)

# Get activities for these targets
activities = new_client.activity.filter(
    target_chembl_id__in=[t['target_chembl_id'] for t in targets],
    standard_type='IC50',
    standard_value__lte=100
)
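A name search for 'kinase' can match many targets, which makes the __in list above long. One possible way to keep each request small is to split the identifiers into batches and query per batch. The helper below is a generic sketch; chunked and the placeholder IDs are assumptions for the example, and the client call is shown only as a comment.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder identifiers standing in for the target_chembl_id values.
target_ids = [f"CHEMBL{n}" for n in range(1, 8)]
batches = list(chunked(target_ids, 3))
print(batches)

# Each batch could then be passed to the activity filter, for example:
# new_client.activity.filter(target_chembl_id__in=batch, standard_type='IC50')
```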

Retrieve Drug Mechanisms

mechanisms = new_client.mechanism.filter(
    molecule_chembl_id='CHEMBL25'
)

Get Compound Bioactivities

activities = new_client.activity.filter(
    molecule_chembl_id='CHEMBL25',
    pchembl_value__isnull=False
)
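Once activity records are retrieved, they are plain dictionaries and can be summarised with standard Python. The sketch below computes the median pchembl_value per standard_type. The sample records are made up for the example, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_pchembl_by_type(records):
    """Return {standard_type: median pchembl_value}, skipping null values."""
    grouped = defaultdict(list)
    for record in records:
        value = record.get("pchembl_value")
        if value is None:
            continue
        # Values may be strings in the response, so convert explicitly.
        grouped[record.get("standard_type")].append(float(value))
    return {kind: median(values) for kind, values in grouped.items()}


# Made-up records shaped like activity results.
sample_records = [
    {"standard_type": "IC50", "pchembl_value": "5.10"},
    {"standard_type": "IC50", "pchembl_value": "6.30"},
    {"standard_type": "IC50", "pchembl_value": "5.70"},
    {"standard_type": "Ki", "pchembl_value": "7.00"},
    {"standard_type": "Ki", "pchembl_value": None},
]
summary = median_pchembl_by_type(sample_records)
print(summary)
```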

Image Generation

ChEMBL can generate SVG images of molecular structures:

from chembl_webresource_client.new_client import new_client
image = new_client.image
image.set_format('svg')  # request SVG output explicitly
svg = image.get('CHEMBL25')

Pagination

Results are paginated, and the client fetches further pages automatically as you iterate. To iterate through all results:

activities = new_client.activity.filter(target_chembl_id='CHEMBL240')
for activity in activities:
    print(activity)
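Behind that loop, paginated responses carry a page_meta block whose next entry points to the following page. The sketch below models following those links with simulated pages so that it runs offline. iter_records, the fake URLs, and the records are assumptions for the example; the client's real implementation may differ.

```python
def iter_records(fetch_page, first_url, key):
    """Yield every record, following page_meta['next'] until it is None."""
    url = first_url
    while url is not None:
        page = fetch_page(url)
        yield from page[key]
        url = page["page_meta"]["next"]


# Simulated two-page response; URLs and records are made up for the example.
FAKE_PAGES = {
    "/activity?offset=0": {
        "activities": [{"activity_id": 1}, {"activity_id": 2}],
        "page_meta": {"next": "/activity?offset=2"},
    },
    "/activity?offset=2": {
        "activities": [{"activity_id": 3}],
        "page_meta": {"next": None},
    },
}

records = list(
    iter_records(FAKE_PAGES.__getitem__, "/activity?offset=0", "activities")
)
print([record["activity_id"] for record in records])  # [1, 2, 3]
```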

Error Handling

Common errors:

  • 404: Resource not found
  • 503: Service temporarily unavailable
  • Timeout: Request took too long

The client automatically retries failed requests, up to the number of attempts set by the TOTAL_RETRIES setting.
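For errors that persist beyond the client's own retries, application code can add its own retry loop. The following is a generic sketch of retrying with exponential backoff; call_with_retries and the simulated flaky function are hypothetical, and real code should catch narrower exception types than Exception.

```python
import time


def call_with_retries(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on failure with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            # Out of retries: re-raise the last error to the caller.
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)


attempts = {"count": 0}


def flaky():
    """Simulated request that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_retries(flaky, retries=3, base_delay=0.5, sleep=delays.append)
print(result, delays)  # ok [0.5, 1.0]
```

In practice func would wrap a client call, such as a lambda that evaluates a filter and converts the result to a list.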

Rate Limiting

ChEMBL has fair usage policies:

  • Be respectful with query frequency
  • Use caching to minimize repeated requests
  • Consider bulk downloads for large datasets

Additional Resources