Initial commit

2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions
--- a/skills/chembl-database/references/api_reference.md
+++ b/skills/chembl-database/references/api_reference.md
@@ -0,0 +1,272 @@
+# ChEMBL Web Services API Reference
+
+## Overview
+
+ChEMBL is a manually curated database of bioactive molecules with drug-like properties, maintained by the European Bioinformatics Institute (EMBL-EBI). It covers compounds, targets, assays, bioactivity measurements, and approved drugs.
+
+The ChEMBL database contains:
+- Over 2 million compound records
+- Over 1.4 million assay records
+- Over 19 million activity values
+- Information on 13,000+ drug targets
+- Data on 16,000+ approved drugs and clinical candidates
+
+## Python Client Installation
+
+```bash
+pip install chembl_webresource_client
+```
+
+## Key Resources and Endpoints
+
+ChEMBL provides access to 30+ specialized endpoints:
+
+### Core Data Types
+
+- **molecule** - Compound structures, properties, and synonyms
+- **target** - Protein and non-protein biological targets
+- **activity** - Bioassay measurement results
+- **assay** - Experimental assay details
+- **drug** - Approved drug information
+- **mechanism** - Drug mechanism of action data
+- **document** - Literature sources and references
+- **cell_line** - Cell line information
+- **tissue** - Tissue types
+- **protein_class** - Protein classification
+- **target_component** - Target component details
+- **compound_structural_alert** - Structural alerts for toxicity
+
+## Query Patterns and Filters
+
+### Filter Operators
+
+The API supports Django-style filter operators:
+
+- `__exact` - Exact match
+- `__iexact` - Case-insensitive exact match
+- `__contains` - Contains substring
+- `__icontains` - Case-insensitive contains
+- `__startswith` - Starts with prefix
+- `__endswith` - Ends with suffix
+- `__gt` - Greater than
+- `__gte` - Greater than or equal
+- `__lt` - Less than
+- `__lte` - Less than or equal
+- `__range` - Value in range
+- `__in` - Value in list
+- `__isnull` - Is null/not null
+- `__regex` - Regular expression match
+- `__search` - Full text search
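+
+For readers calling the REST API directly, the sketch below shows how such filter keywords would map onto query-string parameters. The helper `build_filter_url` is hypothetical, and the base URL is an assumption based on the public service address; the code only builds a URL and sends no request.
+
+```python
+from urllib.parse import urlencode
+
+# Assumed base URL of the public ChEMBL web services.
+BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"
+
+
+def build_filter_url(resource, fmt="json", **filters):
+    """Build a REST URL for `resource` from Django-style filter keywords."""
+    # Sort the keys so the generated URL is deterministic.
+    query = urlencode(sorted(filters.items()))
+    return f"{BASE_URL}/{resource}.{fmt}?{query}"
+
+
+url = build_filter_url(
+    "molecule",
+    molecule_properties__mw_freebase__lte=300,
+    pref_name__endswith="nib",
+)
+print(url)
+```
+
+Each keyword is a field path plus an operator suffix, so combining conditions amounts to adding more parameters.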
+
+### Example Filter Queries
+
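+These examples assume `molecules = new_client.molecule`, with `new_client` imported from `chembl_webresource_client.new_client`.
+
+**Molecular weight filtering:**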
+```python
+molecules.filter(molecule_properties__mw_freebase__lte=300)
+```
+
+**Name pattern matching:**
+```python
+molecules.filter(pref_name__endswith='nib')
+```
+
+**Multiple conditions:**
+```python
+molecules.filter(
+    molecule_properties__mw_freebase__lte=300,
+    pref_name__endswith='nib'
+)
+```
+
+## Chemical Structure Searches
+
+### Substructure Search
+Search for compounds containing a specific substructure using SMILES:
+
+```python
+from chembl_webresource_client.new_client import new_client
+substructure = new_client.substructure
+results = substructure.filter(smiles='CC(=O)Oc1ccccc1C(=O)O')
+```
+
+### Similarity Search
+Find compounds whose Tanimoto similarity to a query structure meets a percentage threshold (85 in this example):
+
+```python
+similarity = new_client.similarity
+results = similarity.filter(smiles='CC(=O)Oc1ccccc1C(=O)O', similarity=85)
+```
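+
+To make the threshold concrete, here is a minimal sketch of a Tanimoto coefficient computed on fingerprints represented as sets of "on" bit positions. The fingerprints are toy values, not real ChEMBL fingerprints, and the function `tanimoto` is illustrative rather than part of the client.
+
+```python
+def tanimoto(bits_a, bits_b):
+    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
+    a, b = set(bits_a), set(bits_b)
+    union = a | b
+    if not union:
+        return 0.0  # convention chosen here for two empty fingerprints
+    return len(a & b) / len(union)
+
+
+# Toy fingerprints (made-up bit positions).
+query = {1, 4, 7, 9, 12}
+candidate = {1, 4, 7, 9, 15}
+
+score = tanimoto(query, candidate) * 100  # express as a percentage
+print(f"{score:.1f}% -> passes an 85% cutoff: {score >= 85}")
+```
+
+Raising the threshold keeps only closer analogues; lowering it returns a broader, noisier set.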
+
+## Common Data Retrieval Patterns
+
+### Get Molecule by ChEMBL ID
+```python
+molecule = new_client.molecule.get('CHEMBL25')
+```
+
+### Get Target Information
+```python
+target = new_client.target.get('CHEMBL240')
+```
+
+### Get Activity Data
+```python
+activities = new_client.activity.filter(
+    target_chembl_id='CHEMBL240',
+    standard_type='IC50',
+    standard_value__lte=100
+)
+```
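+
+`standard_value__lte=100` compares the number only, so it can help to confirm units on the client side. Below is a minimal sketch over already-fetched records, assuming each record is a dict with `standard_value`, `standard_units`, and `standard_relation` keys and that numeric values may arrive as strings. The helper name and the sample records are made up.
+
+```python
+def potent_nm_records(records, max_nm=100.0):
+    """Keep records with an exact ('=') measurement in nM at or below `max_nm`."""
+    kept = []
+    for rec in records:
+        value = rec.get("standard_value")
+        if value is None:
+            continue  # no numeric measurement recorded
+        if rec.get("standard_units") != "nM":
+            continue  # skip other units instead of guessing a conversion
+        if rec.get("standard_relation") != "=":
+            continue  # skip censored values such as '>' or '<'
+        if float(value) <= max_nm:
+            kept.append(rec)
+    return kept
+
+
+# Made-up records for illustration only.
+sample = [
+    {"molecule_chembl_id": "CHEMBL_A", "standard_value": "12.5",
+     "standard_units": "nM", "standard_relation": "="},
+    {"molecule_chembl_id": "CHEMBL_B", "standard_value": "50",
+     "standard_units": "ug.mL-1", "standard_relation": "="},
+    {"molecule_chembl_id": "CHEMBL_C", "standard_value": "80",
+     "standard_units": "nM", "standard_relation": ">"},
+]
+print([r["molecule_chembl_id"] for r in potent_nm_records(sample)])
+```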
+
+### Get Drug Information
+```python
+drug = new_client.drug.get('CHEMBL25')
+```
+
+## Response Formats
+
+The API supports multiple response formats:
+- JSON (default)
+- XML
+- YAML
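+
+When calling the REST API directly, the format is typically selected with a file-style suffix on the resource path. A small sketch follows, assuming the public base URL and a hypothetical helper `record_url`; no request is sent.
+
+```python
+BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"  # assumed public base URL
+SUPPORTED_FORMATS = ("json", "xml", "yaml")
+
+
+def record_url(resource, chembl_id, fmt="json"):
+    """Return the URL of a single record in the requested response format."""
+    if fmt not in SUPPORTED_FORMATS:
+        raise ValueError(f"unsupported format: {fmt!r}")
+    return f"{BASE_URL}/{resource}/{chembl_id}.{fmt}"
+
+
+for fmt in SUPPORTED_FORMATS:
+    print(record_url("molecule", "CHEMBL25", fmt))
+```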
+
+## Caching and Performance
+
+The Python client automatically caches results locally:
+- **Default cache duration**: 24 hours
+- **Cache location**: Local file system
+- **Lazy evaluation**: Queries execute only when data is accessed
+
+### Configuration Settings
+
+```python
+from chembl_webresource_client.settings import Settings
+
+# Disable caching
+Settings.Instance().CACHING = False
+
+# Adjust cache expiration (in seconds)
+Settings.Instance().CACHE_EXPIRE = 86400  # 24 hours
+
+# Set timeout
+Settings.Instance().TIMEOUT = 30
+
+# Set retries
+Settings.Instance().TOTAL_RETRIES = 3
+```
+
+## Molecular Properties
+
+Common molecular properties available:
+
+- `mw_freebase` - Molecular weight of the parent compound (free base)
+- `alogp` - Calculated LogP
+- `hba` - Hydrogen bond acceptors
+- `hbd` - Hydrogen bond donors
+- `psa` - Polar surface area
+- `rtb` - Rotatable bonds
+- `ro3_pass` - Rule of 3 compliance
+- `num_ro5_violations` - Lipinski rule of 5 violations
+- `cx_most_apka` - Most acidic pKa
+- `cx_most_bpka` - Most basic pKa
+- `molecular_species` - Molecular species
+- `full_mwt` - Molecular weight of the full compound, including any salts
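+
+As an illustration of how these fields combine, the sketch below counts Lipinski rule-of-five violations from a `molecule_properties`-style dict. It is a simplified check that may not match ChEMBL's stored `num_ro5_violations` exactly, and it assumes all four properties are present; records with missing values would need extra handling. The example values are invented.
+
+```python
+def count_ro5_violations(props):
+    """Count Lipinski rule-of-five violations from a molecule_properties dict."""
+    checks = [
+        float(props["mw_freebase"]) > 500,  # molecular weight above 500 Da
+        float(props["alogp"]) > 5,          # calculated logP above 5
+        int(props["hbd"]) > 5,              # more than 5 H-bond donors
+        int(props["hba"]) > 10,             # more than 10 H-bond acceptors
+    ]
+    return sum(checks)
+
+
+# Illustrative values only (some as strings, since numeric fields may be serialized that way).
+example = {"mw_freebase": "612.7", "alogp": "5.8", "hbd": 2, "hba": 9}
+print(count_ro5_violations(example))
+```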
+
+## Bioactivity Data Fields
+
+Key bioactivity fields:
+
+- `standard_type` - Activity type (IC50, Ki, Kd, EC50, etc.)
+- `standard_value` - Numerical activity value
+- `standard_units` - Units (nM, uM, etc.)
+- `pchembl_value` - Activity on a negative log10 molar scale (e.g. an IC50 of 10 nM gives 8)
+- `activity_comment` - Activity annotations
+- `data_validity_comment` - Data validity flags
+- `potential_duplicate` - Duplicate flag
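+
+The relationship between a concentration in nM and the negative log scale can be worked through in a few lines. This is a sketch of the arithmetic only; the function name is hypothetical, and ChEMBL itself assigns `pchembl_value` only to certain activity types, so a value computed this way is not guaranteed to exist in the database.
+
+```python
+import math
+
+
+def pchembl_from_nm(value_nm):
+    """Convert a concentration in nM to -log10 of the molar concentration."""
+    if value_nm <= 0:
+        raise ValueError("concentration must be positive")
+    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm)
+    return 9.0 - math.log10(value_nm)
+
+
+for nm in (1, 10, 100, 1000):
+    print(f"{nm:>5} nM -> {pchembl_from_nm(nm):.2f}")
+```
+
+A tenfold drop in concentration raises the value by one unit, which is why higher values indicate more potent compounds.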
+
+## Target Information Fields
+
+Target data includes:
+
+- `target_chembl_id` - ChEMBL target identifier
+- `pref_name` - Preferred target name
+- `target_type` - Type (SINGLE PROTEIN, ORGANISM, etc.)
+- `organism` - Target organism
+- `tax_id` - NCBI taxonomy ID
+- `target_components` - Component details
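+
+A common follow-up is pulling protein accessions out of `target_components`. The sketch below assumes each component is a dict with an `accession` key, which should be checked against a real response; the record shown is made up and the helper name is hypothetical.
+
+```python
+def component_accessions(target):
+    """Collect accession strings from a target record's components."""
+    accessions = []
+    for component in target.get("target_components") or []:
+        accession = component.get("accession")
+        if accession:
+            accessions.append(accession)
+    return accessions
+
+
+# Made-up record shaped like the fields listed above; not real ChEMBL data.
+example_target = {
+    "target_chembl_id": "CHEMBL_EXAMPLE",
+    "pref_name": "Example kinase",
+    "target_type": "SINGLE PROTEIN",
+    "organism": "Homo sapiens",
+    "tax_id": 9606,
+    "target_components": [
+        {"accession": "P00000", "component_type": "PROTEIN"},
+        {"accession": None, "component_type": "PROTEIN"},
+    ],
+}
+print(component_accessions(example_target))
+```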
+
+## Advanced Query Examples
+
+### Find Kinase Inhibitors
+```python
+# Get kinase targets
+targets = new_client.target.filter(
+    target_type='SINGLE PROTEIN',
+    pref_name__icontains='kinase'
+)
+
+# Get activities for these targets
+activities = new_client.activity.filter(
+    target_chembl_id__in=[t['target_chembl_id'] for t in targets],
+    standard_type='IC50',
+    standard_value__lte=100
+)
+```
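+
+A name search for "kinase" can match many targets, which makes the `__in` list long. One option is to split the IDs into batches and issue one activity query per batch. The helper `batched` and the batch size are assumptions for illustration, and the IDs are placeholders.
+
+```python
+def batched(items, size):
+    """Yield consecutive lists of at most `size` items."""
+    if size < 1:
+        raise ValueError("size must be at least 1")
+    items = list(items)
+    for start in range(0, len(items), size):
+        yield items[start:start + size]
+
+
+# Placeholder IDs for illustration.
+target_ids = [f"CHEMBL{n}" for n in range(1, 8)]
+for batch in batched(target_ids, 3):
+    # Each batch could be passed as target_chembl_id__in=batch.
+    print(batch)
+```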
+
+### Retrieve Drug Mechanisms
+```python
+mechanisms = new_client.mechanism.filter(
+    molecule_chembl_id='CHEMBL25'
+)
+```
+
+### Get Compound Bioactivities
+```python
+activities = new_client.activity.filter(
+    molecule_chembl_id='CHEMBL25',
+    pchembl_value__isnull=False
+)
+```
+
+## Image Generation
+
+ChEMBL can generate SVG images of molecular structures:
+
+```python
+from chembl_webresource_client.new_client import new_client
+image = new_client.image
+image.set_format('svg')  # request SVG output explicitly
+svg = image.get('CHEMBL25')
+```
+
+## Pagination
+
+The client fetches result pages lazily as you iterate, so a plain loop walks every matching record:
+
+```python
+activities = new_client.activity.filter(target_chembl_id='CHEMBL240')
+for activity in activities:
+    print(activity)
+```
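+
+For context, this is roughly what offset/limit paging looks like when done by hand. The sketch assumes each JSON page carries a `page_meta` object with `limit`, `offset`, `next`, and `total_count`, plus a list of records under a resource-specific key. `fake_fetch` is a stand-in for an HTTP call and serves made-up data.
+
+```python
+def iter_records(fetch_page, key, limit=20):
+    """Yield every record by following offset/limit pages until none remain."""
+    offset = 0
+    while True:
+        page = fetch_page(offset=offset, limit=limit)
+        yield from page[key]
+        meta = page["page_meta"]
+        offset += limit
+        if meta.get("next") is None or offset >= meta["total_count"]:
+            break
+
+
+def fake_fetch(offset, limit):
+    """Stand-in for an HTTP call; serves slices of 45 made-up records."""
+    data = [{"activity_id": i} for i in range(45)]
+    chunk = data[offset:offset + limit]
+    has_next = offset + limit < len(data)
+    return {
+        "activities": chunk,
+        "page_meta": {
+            "limit": limit,
+            "offset": offset,
+            "total_count": len(data),
+            "next": "next-page-url" if has_next else None,
+        },
+    }
+
+
+records = list(iter_records(fake_fetch, "activities"))
+print(len(records))
+```
+
+Because records are yielded one page at a time, a caller can stop early without fetching the remaining pages.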
+
+## Error Handling
+
+Common errors:
+- **404**: Resource not found
+- **503**: Service temporarily unavailable
+- **Timeout**: Request took too long
+
+The client automatically retries failed requests based on the `TOTAL_RETRIES` setting.
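+
+Application code may still want its own retry layer around a longer workflow. The following is a generic exponential-backoff sketch, not the client's internal retry logic; `TransientError` is a hypothetical stand-in for a 503 response or a timeout, and the delays are arbitrary.
+
+```python
+import time
+
+
+class TransientError(Exception):
+    """Stand-in for a 503 response or a timeout."""
+
+
+def call_with_retries(func, retries=3, base_delay=1.0, sleep=time.sleep):
+    """Call `func`, retrying on TransientError with exponential backoff."""
+    for attempt in range(retries + 1):
+        try:
+            return func()
+        except TransientError:
+            if attempt == retries:
+                raise  # out of retries; surface the last error
+            sleep(base_delay * 2 ** attempt)
+
+
+# Demo: fails twice, then succeeds. Sleep is stubbed out to keep it instant.
+attempts = {"count": 0}
+
+
+def flaky():
+    attempts["count"] += 1
+    if attempts["count"] < 3:
+        raise TransientError("service unavailable")
+    return "ok"
+
+
+delays = []
+result = call_with_retries(flaky, sleep=delays.append)
+print(result, delays)
+```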
+
+## Rate Limiting
+
+ChEMBL has fair usage policies:
+- Be respectful with query frequency
+- Use caching to minimize repeated requests
+- Consider bulk downloads for large datasets
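+
+One way to keep query frequency modest in a script is to enforce a minimum gap between requests. The class below is a minimal sketch; the 0.2 second interval is an arbitrary assumption, not a published ChEMBL limit, and the loop only prints instead of sending requests.
+
+```python
+import time
+
+
+class Throttle:
+    """Enforce a minimum interval between successive calls to wait()."""
+
+    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
+        self.min_interval = min_interval
+        self._clock = clock
+        self._sleep = sleep
+        self._last = None
+
+    def wait(self):
+        now = self._clock()
+        if self._last is not None:
+            remaining = self.min_interval - (now - self._last)
+            if remaining > 0:
+                self._sleep(remaining)
+                now = self._clock()
+        self._last = now
+
+
+throttle = Throttle(min_interval=0.2)
+for chembl_id in ("CHEMBL25", "CHEMBL240"):
+    throttle.wait()  # pause here before each request would be sent
+    print("would query", chembl_id)
+```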
+
+## Additional Resources
+
+- Official API documentation: https://www.ebi.ac.uk/chembl/api/data/docs
+- Python client GitHub: https://github.com/chembl/chembl_webresource_client
+- ChEMBL interface docs: https://chembl.gitbook.io/chembl-interface-documentation/
+- Example notebooks: https://github.com/chembl/notebooks