# DrugBank Data Access ## Authentication and Setup ### Account Creation DrugBank requires user authentication to access data: 1. Create account at go.drugbank.com 2. Accept the license agreement (free for academic use, paid for commercial) 3. Obtain username and password credentials ### Credential Management **Environment Variables (Recommended)** ```bash export DRUGBANK_USERNAME="your_username" export DRUGBANK_PASSWORD="your_password" ``` **Configuration File** Create `~/.config/drugbank.ini`: ```ini [drugbank] username = your_username password = your_password ``` **Direct Specification** ```python # Pass credentials directly (not recommended for production) download_drugbank(username="user", password="pass") ``` ## Python Package Installation ### drugbank-downloader Primary tool for programmatic access: ```bash pip install drugbank-downloader ``` **Requirements:** Python >=3.9 ### Optional Dependencies ```bash pip install bioversions # For automatic latest version detection pip install lxml # For XML parsing optimization ``` ## Data Download Methods ### Download Full Database ```python from drugbank_downloader import download_drugbank # Download specific version path = download_drugbank(version='5.1.7') # Returns: ~/.data/drugbank/5.1.7/full database.xml.zip # Download latest version (requires bioversions) path = download_drugbank() ``` ### Custom Storage Location ```python # Custom prefix for storage path = download_drugbank(prefix=['custom', 'location', 'drugbank']) # Stores at: ~/.data/custom/location/drugbank/[version]/ ``` ### Verify Download ```python import os if os.path.exists(path): size_mb = os.path.getsize(path) / (1024 * 1024) print(f"Downloaded successfully: {size_mb:.1f} MB") ``` ## Working with Downloaded Data ### Open Zipped XML Without Extraction ```python from drugbank_downloader import open_drugbank import xml.etree.ElementTree as ET # Open file directly from zip with open_drugbank() as file: tree = ET.parse(file) root = tree.getroot() ``` ### Parse XML Tree ```python from drugbank_downloader import parse_drugbank, get_drugbank_root # Get parsed tree tree = parse_drugbank() # Get root element directly root = get_drugbank_root() ``` ### CLI Usage ```bash # Download using command line drugbank_downloader --username USER --password PASS # Download latest version drugbank_downloader ``` ## Data Formats and Versions ### Available Formats - **XML**: Primary format, most comprehensive data - **JSON**: Available via API (requires separate API key) - **CSV/TSV**: Export from web interface or parse XML - **SQL**: Database dumps available for download ### Version Management ```python # Specify exact version for reproducibility path = download_drugbank(version='5.1.10') # List cached versions from pathlib import Path drugbank_dir = Path.home() / '.data' / 'drugbank' if drugbank_dir.exists(): versions = [d.name for d in drugbank_dir.iterdir() if d.is_dir()] print(f"Cached versions: {versions}") ``` ### Version History - **Version 6.0** (2024): Latest release, expanded drug entries - **Version 5.1.x** (2019-2023): Incremental updates - **Version 5.0** (2017): ~9,591 drug entries - **Version 4.0** (2014): Added metabolite structures - **Version 3.0** (2011): Added transporter and pathway data - **Version 2.0** (2009): Added interactions and ADMET ## API Access ### REST API Endpoints ```python import requests # Query by DrugBank ID drug_id = "DB00001" url = f"https://go.drugbank.com/drugs/{drug_id}.json" headers = {"Authorization": "Bearer YOUR_API_KEY"} response = requests.get(url, headers=headers) if response.status_code == 200: drug_data = response.json() ``` ### Rate Limits - **Development Key**: 3,000 requests/month - **Production Key**: Custom limits based on license - **Best Practice**: Cache results locally to minimize API calls ### Regional Scoping DrugBank API is scoped by region: - **USA**: FDA-approved drugs - **Canada**: Health Canada-approved drugs - **EU**: EMA-approved drugs Specify region in API requests when applicable. ## Data Caching Strategy ### Intermediate Results ```python import pickle from pathlib import Path # Cache parsed data cache_file = Path("drugbank_parsed.pkl") if cache_file.exists(): with open(cache_file, 'rb') as f: data = pickle.load(f) else: # Parse and process root = get_drugbank_root() data = process_drugbank_data(root) # Save cache with open(cache_file, 'wb') as f: pickle.dump(data, f) ``` ### Version-Specific Caching ```python version = "5.1.10" cache_file = Path(f"drugbank_{version}_processed.pkl") # Ensures cache invalidation when version changes ``` ## Troubleshooting ### Common Issues **Authentication Failures** - Verify credentials are correct - Check license agreement is accepted - Ensure account has not expired **Download Failures** - Check internet connectivity - Verify sufficient disk space (~1-2 GB needed) - Try specifying an older version if latest fails **Parsing Errors** - Ensure complete download (check file size) - Verify XML is not corrupted - Use lxml parser for better error handling ### Error Handling ```python from drugbank_downloader import download_drugbank import logging logging.basicConfig(level=logging.INFO) try: path = download_drugbank() print(f"Success: {path}") except Exception as e: print(f"Download failed: {e}") # Fallback: specify older stable version path = download_drugbank(version='5.1.7') ``` ## Best Practices 1. **Version Specification**: Always specify exact version for reproducible research 2. **Credential Security**: Use environment variables, never hardcode credentials 3. **Caching**: Cache intermediate processing results to avoid re-parsing 4. **Documentation**: Document which DrugBank version was used in analysis 5. **License Compliance**: Ensure proper licensing for your use case 6. **Local Storage**: Keep local copies to reduce download frequency 7. **Error Handling**: Implement robust error handling for network issues