# DrugBank Data Access

## Authentication and Setup

### Account Creation

DrugBank requires user authentication to access data:

1. Create account at go.drugbank.com
2. Accept the license agreement (free for academic use, paid for commercial)
3. Obtain username and password credentials

### Credential Management

**Environment Variables (Recommended)**

```bash
export DRUGBANK_USERNAME="your_username"
export DRUGBANK_PASSWORD="your_password"
```

**Configuration File**

Create `~/.config/drugbank.ini`:

```ini
[drugbank]
username = your_username
password = your_password
```

**Direct Specification**

```python
from drugbank_downloader import download_drugbank

# Pass credentials directly (not recommended for production)
path = download_drugbank(username="user", password="pass")
```

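The three credential sources above can be combined into a single lookup. The following is a minimal sketch using only the standard library; `resolve_credentials` is a hypothetical helper written for illustration, not part of drugbank-downloader, and the precedence shown (explicit arguments, then environment variables, then the INI file) is an assumption about a sensible order rather than a description of the package's internals.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    username: Optional[str] = None,
    password: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    config_path: Optional[Path] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: return (username, password) from the first complete source."""
    env = os.environ if env is None else env
    config_path = config_path or Path.home() / ".config" / "drugbank.ini"

    # 1. Explicit arguments take priority.
    if username and password:
        return username, password

    # 2. Environment variables.
    env_user = env.get("DRUGBANK_USERNAME")
    env_pass = env.get("DRUGBANK_PASSWORD")
    if env_user and env_pass:
        return env_user, env_pass

    # 3. The [drugbank] section of the INI file.
    # interpolation=None keeps "%" characters in passwords from being misread.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path)  # a missing file is skipped without raising
    if parser.has_section("drugbank"):
        section = parser["drugbank"]
        if section.get("username") and section.get("password"):
            return section["username"], section["password"]

    raise RuntimeError("No DrugBank credentials found")
```

Failing with an explicit error when no source is complete makes a misconfigured environment visible before any download is attempted.
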
## Python Package Installation

### drugbank-downloader

Primary tool for programmatic access:

```bash
pip install drugbank-downloader
```

**Requirements:** Python >=3.9

### Optional Dependencies

```bash
pip install bioversions  # For automatic latest version detection
pip install lxml         # For XML parsing optimization
```

## Data Download Methods

### Download Full Database

```python
from drugbank_downloader import download_drugbank

# Download specific version
path = download_drugbank(version='5.1.7')
# Returns: ~/.data/drugbank/5.1.7/full database.xml.zip

# Download latest version (requires bioversions)
path = download_drugbank()
```

### Custom Storage Location

```python
from drugbank_downloader import download_drugbank

# Custom prefix for storage
path = download_drugbank(prefix=['custom', 'location', 'drugbank'])
# Stores at: ~/.data/custom/location/drugbank/[version]/
```

### Verify Download

```python
import os

# `path` is the value returned by download_drugbank()
if os.path.exists(path):
    size_mb = os.path.getsize(path) / (1024 * 1024)
    print(f"Downloaded successfully: {size_mb:.1f} MB")
```

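A file-size check confirms that something was written, but not that the archive is intact. As a complementary sketch, the standard `zipfile` module can verify the checksums of every member. `check_zip_archive` is a hypothetical helper name introduced here for illustration.

```python
import zipfile
from pathlib import Path
from typing import List


def check_zip_archive(path: Path) -> List[str]:
    """Return the member names of a readable archive; raise ValueError if it is damaged."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a valid zip archive")
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and returns the first name
        # whose CRC check fails, or None if all members are intact.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise ValueError(f"Corrupted member in archive: {bad_member}")
        return archive.namelist()
```

Calling `check_zip_archive(path)` on the path returned by `download_drugbank()` would list the archive contents or fail early on a truncated download. Note that this reads the whole archive, so it can take a while on the full database.
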
## Working with Downloaded Data

### Open Zipped XML Without Extraction

```python
from drugbank_downloader import open_drugbank
import xml.etree.ElementTree as ET

# Open file directly from zip
with open_drugbank() as file:
    tree = ET.parse(file)
    root = tree.getroot()
```

### Parse XML Tree

```python
from drugbank_downloader import parse_drugbank, get_drugbank_root

# Get parsed tree
tree = parse_drugbank()

# Get root element directly
root = get_drugbank_root()
```

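Once a root element is available, the drug entries can be walked with namespace-aware queries. The sketch below runs against a small hand-written fragment that mimics the layout of the DrugBank XML, so it does not need a download. The namespace URI and the element names (`drug`, `drugbank-id`, `name`) are assumptions about the schema; confirm them against the root tag of your own file. `iter_drugs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the DrugBank XML schema; check the root tag of your file.
NS = {"db": "http://www.drugbank.ca"}

# Hand-written fragment for illustration only, not real DrugBank output.
SAMPLE_XML = """
<drugbank xmlns="http://www.drugbank.ca">
  <drug type="biotech">
    <drugbank-id primary="true">DB00001</drugbank-id>
    <name>Lepirudin</name>
  </drug>
  <drug type="small molecule">
    <drugbank-id primary="true">DB00316</drugbank-id>
    <name>Acetaminophen</name>
  </drug>
</drugbank>
"""


def iter_drugs(root):
    """Yield (primary_id, name, type) for each top-level drug entry."""
    # A direct-child path skips any <drug> elements nested deeper in the tree.
    for drug in root.findall("db:drug", NS):
        primary_id = drug.findtext("db:drugbank-id[@primary='true']", namespaces=NS)
        name = drug.findtext("db:name", namespaces=NS)
        yield primary_id, name, drug.get("type")


root = ET.fromstring(SAMPLE_XML.strip())
records = list(iter_drugs(root))
for primary_id, name, drug_type in records:
    print(primary_id, name, drug_type)
```

With real data, the same function could be called on the element returned by `get_drugbank_root()`.
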
### CLI Usage

```bash
# Download using command line
drugbank_downloader --username USER --password PASS

# Download latest version
drugbank_downloader
```

## Data Formats and Versions

### Available Formats

- **XML**: Primary format, most comprehensive data
- **JSON**: Available via API (requires separate API key)
- **CSV/TSV**: Export from web interface or parse XML
- **SQL**: Database dumps available for download

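For the "parse XML" route to CSV/TSV, records extracted from the XML can be serialized with the standard `csv` module. This is a minimal sketch; `records_to_csv` and the column names `drugbank_id` and `name` are hypothetical choices for illustration, and the two rows are hand-written examples.

```python
import csv
import io
from typing import Dict, List


def records_to_csv(records: List[Dict[str, str]], fieldnames: List[str], delimiter: str = ",") -> str:
    """Serialize a list of row dictionaries to delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Illustrative rows; in practice these would come from the parsed XML.
records = [
    {"drugbank_id": "DB00001", "name": "Lepirudin"},
    {"drugbank_id": "DB00316", "name": "Acetaminophen"},
]
tsv_text = records_to_csv(records, ["drugbank_id", "name"], delimiter="\t")
print(tsv_text)
```

Using `csv.DictWriter` rather than manual string joins means that names containing the delimiter or quotes are escaped correctly.
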
### Version Management

```python
from pathlib import Path

from drugbank_downloader import download_drugbank

# Specify exact version for reproducibility
path = download_drugbank(version='5.1.10')

# List cached versions
drugbank_dir = Path.home() / '.data' / 'drugbank'
if drugbank_dir.exists():
    versions = [d.name for d in drugbank_dir.iterdir() if d.is_dir()]
    print(f"Cached versions: {versions}")
```

### Version History

- **Version 6.0** (2024): Latest release, expanded drug entries
- **Version 5.1.x** (2019-2023): Incremental updates
- **Version 5.0** (2017): ~9,591 drug entries
- **Version 4.0** (2014): Added metabolite structures
- **Version 3.0** (2011): Added transporter and pathway data
- **Version 2.0** (2009): Added interactions and ADMET

## API Access

### REST API Endpoints

```python
import requests

# Query by DrugBank ID
drug_id = "DB00001"
url = f"https://go.drugbank.com/drugs/{drug_id}.json"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# A timeout keeps the call from hanging indefinitely on network problems
response = requests.get(url, headers=headers, timeout=30)
if response.status_code == 200:
    drug_data = response.json()
else:
    print(f"Request failed with status {response.status_code}")
```

### Rate Limits

- **Development Key**: 3,000 requests/month
- **Production Key**: Custom limits based on license
- **Best Practice**: Cache results locally to minimize API calls

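The local-caching advice above can be sketched as a small on-disk JSON cache that only calls the API on a miss. `cached_fetch` is a hypothetical helper, and the `fetch` argument stands in for whatever function performs the real request. The sketch assumes that keys such as DrugBank IDs are safe to use as file names.

```python
import json
from pathlib import Path
from typing import Callable


def cached_fetch(key: str, fetch: Callable[[str], dict], cache_dir: Path) -> dict:
    """Return the cached JSON for `key`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{key}.json"

    # Cache hit: no request is made, so the call does not count against the quota.
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))

    # Cache miss: fetch once and store the result for later calls.
    data = fetch(key)
    cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Because entries never expire in this sketch, the cache directory would need to be cleared (or keyed by date or version) when fresher data is required.
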
### Regional Scoping

DrugBank API is scoped by region:

- **USA**: FDA-approved drugs
- **Canada**: Health Canada-approved drugs
- **EU**: EMA-approved drugs

Specify region in API requests when applicable.

## Data Caching Strategy

### Intermediate Results

```python
import pickle
from pathlib import Path

from drugbank_downloader import get_drugbank_root

# Cache parsed data
cache_file = Path("drugbank_parsed.pkl")

if cache_file.exists():
    with open(cache_file, 'rb') as f:
        data = pickle.load(f)
else:
    # Parse and process
    root = get_drugbank_root()
    # process_drugbank_data is a placeholder for your own processing function
    data = process_drugbank_data(root)

    # Save cache
    with open(cache_file, 'wb') as f:
        pickle.dump(data, f)
```

### Version-Specific Caching

```python
from pathlib import Path

version = "5.1.10"
cache_file = Path(f"drugbank_{version}_processed.pkl")
# Ensures cache invalidation when version changes
```

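The two caching patterns above can be combined into one function that keys the pickle file by version. This is a sketch under stated assumptions: `load_or_build` is a hypothetical helper, and `build` stands in for any zero-argument callable that parses and processes the data.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_build(version: str, build: Callable[[], Any], cache_dir: Path = Path(".")) -> Any:
    """Load processed data for `version` from disk, building and saving it if absent."""
    cache_file = cache_dir / f"drugbank_{version}_processed.pkl"

    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)

    # A different version string maps to a different file, so stale data is never reused.
    data = build()
    with cache_file.open("wb") as f:
        pickle.dump(data, f)
    return data
```

Only load pickle files that you created yourself, since unpickling untrusted data can execute arbitrary code.
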
## Troubleshooting

### Common Issues

**Authentication Failures**
- Verify credentials are correct
- Check license agreement is accepted
- Ensure account has not expired

**Download Failures**
- Check internet connectivity
- Verify sufficient disk space (~1-2 GB needed)
- Try specifying an older version if latest fails

**Parsing Errors**
- Ensure complete download (check file size)
- Verify XML is not corrupted
- Use lxml parser for better error handling

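To check whether the XML inside the archive is corrupted without loading the whole tree into memory, the file can be streamed through an incremental parser. The sketch below uses the standard library's `iterparse` rather than lxml, and assumes the archive holds a single XML member. `xml_in_zip_is_well_formed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path


def xml_in_zip_is_well_formed(zip_path: Path) -> bool:
    """Stream the first archive member through an XML parser; False if it cannot be parsed."""
    try:
        with zipfile.ZipFile(zip_path) as archive:
            names = archive.namelist()
            if not names:
                return False
            with archive.open(names[0]) as handle:
                for _event, element in ET.iterparse(handle, events=("end",)):
                    element.clear()  # release parsed content to keep memory use low
        return True
    except (zipfile.BadZipFile, ET.ParseError):
        # BadZipFile: not a readable archive; ParseError: malformed or truncated XML
        return False
```

A `False` result suggests deleting the cached file and downloading again.
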
### Error Handling

```python
from drugbank_downloader import download_drugbank
import logging

logging.basicConfig(level=logging.INFO)

try:
    path = download_drugbank()
    print(f"Success: {path}")
except Exception as e:
    print(f"Download failed: {e}")
    # Fallback: specify older stable version
    path = download_drugbank(version='5.1.7')
```

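For transient network problems, retrying before falling back to another version is often enough. The following is a minimal sketch of retry with exponential backoff; `retry` is a hypothetical helper, and the attempt count and delays are arbitrary illustrative defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on any exception with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A download could then be wrapped as `retry(lambda: download_drugbank(version="5.1.10"))`. The `sleep` parameter exists so that the waiting behavior can be replaced in tests.
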
## Best Practices

1. **Version Specification**: Always specify exact version for reproducible research
2. **Credential Security**: Use environment variables, never hardcode credentials
3. **Caching**: Cache intermediate processing results to avoid re-parsing
4. **Documentation**: Document which DrugBank version was used in analysis
5. **License Compliance**: Ensure proper licensing for your use case
6. **Local Storage**: Keep local copies to reduce download frequency
7. **Error Handling**: Implement robust error handling for network issues