zhongwei/gh-k-dense-ai-claude-scientific-skills-scientific-skills

Zhongwei Li f0bd18fb4e Initial commit

2025-11-30 08:30:10 +08:00

DrugBank Data Access

Authentication and Setup

Account Creation

DrugBank requires user authentication to access data:

Create an account at go.drugbank.com
Accept the license agreement (free for academic use, paid for commercial use)
Obtain your username and password

Credential Management

Environment Variables (Recommended)

export DRUGBANK_USERNAME="your_username"
export DRUGBANK_PASSWORD="your_password"
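A minimal sketch of reading these variables from Python and failing early when one is missing. The helper name `get_drugbank_credentials` is hypothetical; only the two variable names come from the lines above.

```python
import os


def get_drugbank_credentials(env=None):
    """Return (username, password) from the environment, or raise if either is unset."""
    env = os.environ if env is None else env
    required = ("DRUGBANK_USERNAME", "DRUGBANK_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["DRUGBANK_USERNAME"], env["DRUGBANK_PASSWORD"]
```

The returned pair can be passed on as `download_drugbank(username=..., password=...)` when explicit credentials are needed.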

Configuration File

Create ~/.config/drugbank.ini:

[drugbank]
username = your_username
password = your_password
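To check that a file in this layout parses as expected, it can be read with the standard-library `configparser`. This is a sketch only: the function name `read_drugbank_config` is hypothetical, and the default path simply mirrors the location given above.

```python
import configparser
from pathlib import Path


def read_drugbank_config(path=None):
    """Read username and password from the [drugbank] section of an INI file."""
    path = Path(path) if path else Path.home() / ".config" / "drugbank.ini"
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse
    if not parser.read(path):
        raise FileNotFoundError(f"No readable config file at {path}")
    section = parser["drugbank"]  # KeyError if the section is absent
    return section["username"], section["password"]
```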

Direct Specification

from drugbank_downloader import download_drugbank

# Pass credentials directly (not recommended for production)
path = download_drugbank(username="user", password="pass")

Python Package Installation

drugbank-downloader

Primary tool for programmatic access:

pip install drugbank-downloader

Requirements: Python >=3.9

Optional Dependencies

pip install bioversions  # For automatic latest version detection
pip install lxml  # For XML parsing optimization

Data Download Methods

Download Full Database

from drugbank_downloader import download_drugbank

# Download specific version
path = download_drugbank(version='5.1.7')
# Returns: ~/.data/drugbank/5.1.7/full database.xml.zip

# Download latest version (requires bioversions)
path = download_drugbank()

Custom Storage Location

# Custom prefix for storage
path = download_drugbank(prefix=['custom', 'location', 'drugbank'])
# Stores at: ~/.data/custom/location/drugbank/[version]/

Verify Download

import os
if os.path.exists(path):
    size_mb = os.path.getsize(path) / (1024 * 1024)
    print(f"Downloaded successfully: {size_mb:.1f} MB")
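A size check alone will not catch a truncated or corrupted archive. As a hedged sketch, the standard-library `zipfile` module can verify the CRC of every member; the helper name `check_zip` is hypothetical.

```python
import zipfile


def check_zip(path):
    """Return the archive's member names if every CRC verifies; raise ValueError otherwise."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a valid zip archive")
    with zipfile.ZipFile(path) as archive:
        bad_member = archive.testzip()  # first corrupt member name, or None
        if bad_member is not None:
            raise ValueError(f"Corrupt member in archive: {bad_member}")
        return archive.namelist()
```

Calling `check_zip(path)` on the downloaded file also shows which XML file the archive contains.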

Working with Downloaded Data

Open Zipped XML Without Extraction

from drugbank_downloader import open_drugbank
import xml.etree.ElementTree as ET

# Open file directly from zip
with open_drugbank() as file:
    tree = ET.parse(file)
    root = tree.getroot()
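Parsing the whole file with `ET.parse` holds the full tree in memory. When that is a concern, a streaming pass with `ET.iterparse` is one alternative. The sketch below assumes the DrugBank XML namespace is `http://www.drugbank.ca` and that `<drug>` elements may also appear nested inside a record (for example in pathway listings), so it yields only direct children of the root. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.drugbank.ca}"  # assumed namespace; confirm against your file


def iter_drug_names(file):
    """Yield the <name> of each top-level <drug>, releasing memory as it goes."""
    depth = 0
    for event, elem in ET.iterparse(file, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # depth == 1 after closing means the element is a direct child of the root
        if depth == 1 and elem.tag == NS + "drug":
            yield elem.findtext(NS + "name")
            elem.clear()  # drop the processed record's children
```

Any binary file object works as input, including the handle returned by `open_drugbank()`.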

Parse XML Tree

from drugbank_downloader import parse_drugbank, get_drugbank_root

# Get parsed tree
tree = parse_drugbank()

# Get root element directly
root = get_drugbank_root()
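Once a root element is available, records can be pulled out with namespace-aware lookups. The sketch below assumes the namespace `http://www.drugbank.ca` and the element names `drug`, `drugbank-id` (with a `primary="true"` attribute), and `name`. Check these against your downloaded file. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"db": "http://www.drugbank.ca"}  # assumed namespace; confirm against your file


def primary_ids_and_names(root):
    """Return a list of (primary DrugBank ID, name) for each top-level drug."""
    rows = []
    for drug in root.findall("db:drug", NS):
        primary = drug.find("db:drugbank-id[@primary='true']", NS)
        drug_id = primary.text if primary is not None else None
        rows.append((drug_id, drug.findtext("db:name", namespaces=NS)))
    return rows
```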

CLI Usage

# Download using command line
drugbank_downloader --username USER --password PASS

# Download latest version
drugbank_downloader

Data Formats and Versions

Available Formats

XML: Primary format, most comprehensive data
JSON: Available via API (requires separate API key)
CSV/TSV: Export from web interface or parse XML
SQL: Database dumps available for download

Version Management

# Specify exact version for reproducibility
path = download_drugbank(version='5.1.10')

# List cached versions
from pathlib import Path
drugbank_dir = Path.home() / '.data' / 'drugbank'
if drugbank_dir.exists():
    versions = [d.name for d in drugbank_dir.iterdir() if d.is_dir()]
    print(f"Cached versions: {versions}")

Version History

Version 6.0 (2024): Latest release, expanded drug entries
Version 5.1.x (2019-2023): Incremental updates
Version 5.0 (2017): ~9,591 drug entries
Version 4.0 (2014): Added metabolite structures
Version 3.0 (2011): Added transporter and pathway data
Version 2.0 (2009): Added interactions and ADMET

API Access

REST API Endpoints

import requests

# Query by DrugBank ID
drug_id = "DB00001"
url = f"https://go.drugbank.com/drugs/{drug_id}.json"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # Fail loudly instead of leaving drug_data undefined
drug_data = response.json()

Rate Limits

Development Key: 3,000 requests/month
Production Key: Custom limits based on license
Best Practice: Cache results locally to minimize API calls
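One way to follow the caching advice is to store each response as a JSON file keyed by DrugBank ID, so a repeated lookup never counts against the monthly quota. This is a sketch: `cached_lookup`, the `fetch` callable, and the cache directory name are all hypothetical, and `fetch` stands in for whatever function performs the real API request.

```python
import json
from pathlib import Path


def cached_lookup(drug_id, fetch, cache_dir="drugbank_api_cache"):
    """Return data for drug_id, calling fetch(drug_id) only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{drug_id}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    data = fetch(drug_id)  # the only place a network request would happen
    cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data
```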

Regional Scoping

DrugBank API is scoped by region:

USA: FDA-approved drugs
Canada: Health Canada-approved drugs
EU: EMA-approved drugs

Specify region in API requests when applicable.

Data Caching Strategy

Intermediate Results

import pickle
from pathlib import Path

from drugbank_downloader import get_drugbank_root

# Cache parsed data
cache_file = Path("drugbank_parsed.pkl")

if cache_file.exists():
    with open(cache_file, 'rb') as f:
        data = pickle.load(f)
else:
    # Parse and process
    root = get_drugbank_root()
    # process_drugbank_data is a placeholder for your own extraction function
    data = process_drugbank_data(root)

    # Save cache
    with open(cache_file, 'wb') as f:
        pickle.dump(data, f)

Version-Specific Caching

from pathlib import Path

version = "5.1.10"
cache_file = Path(f"drugbank_{version}_processed.pkl")
# A new version yields a new file name, so a stale cache is never reused

Troubleshooting

Common Issues

Authentication Failures

Verify credentials are correct
Check license agreement is accepted
Ensure account has not expired

Download Failures

Check internet connectivity
Verify sufficient disk space (~1-2 GB needed)
Try specifying an older version if latest fails

Parsing Errors

Ensure complete download (check file size)
Verify XML is not corrupted
Use lxml parser for better error handling

Error Handling

from drugbank_downloader import download_drugbank
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

try:
    path = download_drugbank()
    logger.info("Success: %s", path)
except Exception as e:
    logger.warning("Download failed: %s", e)
    # Fallback: specify older stable version
    path = download_drugbank(version='5.1.7')
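For transient network failures, retrying with a growing delay is a common complement to a version fallback. The helper below is a generic sketch, not part of drugbank-downloader. The name `with_retries` and its parameters are hypothetical, and the injectable `sleep` exists only to make the helper easy to test.

```python
import time


def with_retries(action, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call action() up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

A download could then be wrapped as `with_retries(lambda: download_drugbank(version='5.1.7'))`.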

Best Practices

Version Specification: Always specify exact version for reproducible research
Credential Security: Use environment variables, never hardcode credentials
Caching: Cache intermediate processing results to avoid re-parsing
Documentation: Document which DrugBank version was used in analysis
License Compliance: Ensure proper licensing for your use case
Local Storage: Keep local copies to reduce download frequency
Error Handling: Implement robust error handling for network issues
