5.9 KiB
5.9 KiB
DrugBank Data Access
Authentication and Setup
Account Creation
DrugBank requires user authentication to access data:
- Create account at go.drugbank.com
- Accept the license agreement (free for academic use, paid for commercial)
- Obtain username and password credentials
Credential Management
Environment Variables (Recommended)
export DRUGBANK_USERNAME="your_username"
export DRUGBANK_PASSWORD="your_password"
Configuration File
Create ~/.config/drugbank.ini:
[drugbank]
username = your_username
password = your_password
Direct Specification
# Pass credentials directly (not recommended for production)
download_drugbank(username="user", password="pass")
Python Package Installation
drugbank-downloader
Primary tool for programmatic access:
pip install drugbank-downloader
Requirements: Python >=3.9
Optional Dependencies
pip install bioversions # For automatic latest version detection
pip install lxml # For XML parsing optimization
Data Download Methods
Download Full Database
from drugbank_downloader import download_drugbank
# Download specific version
path = download_drugbank(version='5.1.7')
# Returns: ~/.data/drugbank/5.1.7/full database.xml.zip
# Download latest version (requires bioversions)
path = download_drugbank()
Custom Storage Location
# Custom prefix for storage
path = download_drugbank(prefix=['custom', 'location', 'drugbank'])
# Stores at: ~/.data/custom/location/drugbank/[version]/
Verify Download
import os
if os.path.exists(path):
size_mb = os.path.getsize(path) / (1024 * 1024)
print(f"Downloaded successfully: {size_mb:.1f} MB")
Working with Downloaded Data
Open Zipped XML Without Extraction
from drugbank_downloader import open_drugbank
import xml.etree.ElementTree as ET
# Open file directly from zip
with open_drugbank() as file:
tree = ET.parse(file)
root = tree.getroot()
Parse XML Tree
from drugbank_downloader import parse_drugbank, get_drugbank_root
# Get parsed tree
tree = parse_drugbank()
# Get root element directly
root = get_drugbank_root()
CLI Usage
# Download using command line
drugbank_downloader --username USER --password PASS
# Download latest version
drugbank_downloader
Data Formats and Versions
Available Formats
- XML: Primary format, most comprehensive data
- JSON: Available via API (requires separate API key)
- CSV/TSV: Export from web interface or parse XML
- SQL: Database dumps available for download
Version Management
# Specify exact version for reproducibility
path = download_drugbank(version='5.1.10')
# List cached versions
from pathlib import Path
drugbank_dir = Path.home() / '.data' / 'drugbank'
if drugbank_dir.exists():
versions = [d.name for d in drugbank_dir.iterdir() if d.is_dir()]
print(f"Cached versions: {versions}")
Version History
- Version 6.0 (2024): Latest release, expanded drug entries
- Version 5.1.x (2019-2023): Incremental updates
- Version 5.0 (2017): ~9,591 drug entries
- Version 4.0 (2014): Added metabolite structures
- Version 3.0 (2011): Added transporter and pathway data
- Version 2.0 (2009): Added interactions and ADMET
API Access
REST API Endpoints
import requests
# Query by DrugBank ID
drug_id = "DB00001"
url = f"https://go.drugbank.com/drugs/{drug_id}.json"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
response = requests.get(url, headers=headers)
if response.status_code == 200:
drug_data = response.json()
Rate Limits
- Development Key: 3,000 requests/month
- Production Key: Custom limits based on license
- Best Practice: Cache results locally to minimize API calls
Regional Scoping
DrugBank API is scoped by region:
- USA: FDA-approved drugs
- Canada: Health Canada-approved drugs
- EU: EMA-approved drugs
Specify region in API requests when applicable.
Data Caching Strategy
Intermediate Results
import pickle
from pathlib import Path
# Cache parsed data
cache_file = Path("drugbank_parsed.pkl")
if cache_file.exists():
with open(cache_file, 'rb') as f:
data = pickle.load(f)
else:
# Parse and process
root = get_drugbank_root()
data = process_drugbank_data(root)
# Save cache
with open(cache_file, 'wb') as f:
pickle.dump(data, f)
Version-Specific Caching
version = "5.1.10"
cache_file = Path(f"drugbank_{version}_processed.pkl")
# Ensures cache invalidation when version changes
Troubleshooting
Common Issues
Authentication Failures
- Verify credentials are correct
- Check license agreement is accepted
- Ensure account has not expired
Download Failures
- Check internet connectivity
- Verify sufficient disk space (~1-2 GB needed)
- Try specifying an older version if latest fails
Parsing Errors
- Ensure complete download (check file size)
- Verify XML is not corrupted
- Use lxml parser for better error handling
Error Handling
from drugbank_downloader import download_drugbank
import logging
logging.basicConfig(level=logging.INFO)
try:
path = download_drugbank()
print(f"Success: {path}")
except Exception as e:
print(f"Download failed: {e}")
# Fallback: specify older stable version
path = download_drugbank(version='5.1.7')
Best Practices
- Version Specification: Always specify exact version for reproducible research
- Credential Security: Use environment variables, never hardcode credentials
- Caching: Cache intermediate processing results to avoid re-parsing
- Documentation: Document which DrugBank version was used in analysis
- License Compliance: Ensure proper licensing for your use case
- Local Storage: Keep local copies to reduce download frequency
- Error Handling: Implement robust error handling for network issues