Initial commit
skills/drugbank-database/references/data-access.md
# DrugBank Data Access

## Authentication and Setup

### Account Creation
DrugBank requires user authentication to access data:
1. Create account at go.drugbank.com
2. Accept the license agreement (free for academic use, paid for commercial)
3. Obtain username and password credentials

### Credential Management

**Environment Variables (Recommended)**
```bash
export DRUGBANK_USERNAME="your_username"
export DRUGBANK_PASSWORD="your_password"
```

**Configuration File**
Create `~/.config/drugbank.ini`:
```ini
[drugbank]
username = your_username
password = your_password
```

**Direct Specification**
```python
from drugbank_downloader import download_drugbank

# Pass credentials directly (not recommended for production)
download_drugbank(username="user", password="pass")
```

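If your own scripts need the same credentials (for example, to pass them explicitly), the lookup order above can be sketched as follows. `load_drugbank_credentials` is a hypothetical helper written for illustration, not part of drugbank-downloader. It assumes the environment variable names and the INI layout shown above.

```python
import configparser
import os
from pathlib import Path


def load_drugbank_credentials(config_path=None, environ=None):
    """Return (username, password), preferring environment variables.

    Hypothetical helper for illustration; not part of drugbank-downloader.
    """
    environ = os.environ if environ is None else environ
    username = environ.get("DRUGBANK_USERNAME")
    password = environ.get("DRUGBANK_PASSWORD")
    if username and password:
        return username, password

    # Fall back to the INI file described above.
    path = Path(config_path) if config_path else Path.home() / ".config" / "drugbank.ini"
    # interpolation=None so that a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is skipped without raising
    if parser.has_section("drugbank"):
        username = username or parser.get("drugbank", "username", fallback=None)
        password = password or parser.get("drugbank", "password", fallback=None)

    if not (username and password):
        raise RuntimeError("DrugBank credentials not found in environment or config file")
    return username, password
```

Raising early with a clear message tends to be easier to debug than a failed download later in a pipeline.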
## Python Package Installation

### drugbank-downloader
Primary tool for programmatic access:
```bash
pip install drugbank-downloader
```

**Requirements:** Python >=3.9

### Optional Dependencies
```bash
pip install bioversions  # For automatic latest version detection
pip install lxml         # For XML parsing optimization
```

## Data Download Methods

### Download Full Database
```python
from drugbank_downloader import download_drugbank

# Download specific version
path = download_drugbank(version='5.1.7')
# Returns: ~/.data/drugbank/5.1.7/full database.xml.zip

# Download latest version (requires bioversions)
path = download_drugbank()
```

### Custom Storage Location
```python
# Custom prefix for storage
path = download_drugbank(prefix=['custom', 'location', 'drugbank'])
# Stores at: ~/.data/custom/location/drugbank/[version]/
```

### Verify Download
```python
import os

if os.path.exists(path):
    size_mb = os.path.getsize(path) / (1024 * 1024)
    print(f"Downloaded successfully: {size_mb:.1f} MB")
```

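A file-size check confirms that something was written, but not that the archive is intact. As a minimal sketch using only the standard library, the CRC check below is one way to catch a truncated or corrupted download. `check_zip_archive` is a hypothetical helper name, not part of drugbank-downloader.

```python
import zipfile


def check_zip_archive(path):
    """Return the archive's member names if every CRC matches; raise ValueError otherwise."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a valid zip archive")
    with zipfile.ZipFile(path) as archive:
        bad_member = archive.testzip()  # None when all members pass the CRC check
        if bad_member is not None:
            raise ValueError(f"Corrupt member in archive: {bad_member}")
        return archive.namelist()
```

Calling `check_zip_archive(path)` on the value returned by `download_drugbank()` should list the XML member inside the zip.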
## Working with Downloaded Data

### Open Zipped XML Without Extraction
```python
from drugbank_downloader import open_drugbank
import xml.etree.ElementTree as ET

# Open file directly from zip
with open_drugbank() as file:
    tree = ET.parse(file)
    root = tree.getroot()
```

### Parse XML Tree
```python
from drugbank_downloader import parse_drugbank, get_drugbank_root

# Get parsed tree
tree = parse_drugbank()

# Get root element directly
root = get_drugbank_root()
```

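Once a root element is available, the usual next step is to walk the `drug` elements. The sketch below runs on a tiny inline sample so that it is self-contained. It assumes the XML uses the default namespace `http://www.drugbank.ca` and marks the main identifier with `primary="true"`; check `root.tag` on your own download to confirm. `iter_drug_summaries` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; verify against root.tag in your own file.
NS = {"db": "http://www.drugbank.ca"}

# Tiny stand-in for the real file, for illustration only.
SAMPLE = """<drugbank xmlns="http://www.drugbank.ca">
  <drug type="biotech">
    <drugbank-id primary="true">DB00001</drugbank-id>
    <drugbank-id>BTD00024</drugbank-id>
    <name>Lepirudin</name>
  </drug>
</drugbank>"""


def iter_drug_summaries(root):
    """Yield one small dict per top-level <drug> element."""
    for drug in root.findall("db:drug", NS):
        yield {
            "id": drug.findtext("db:drugbank-id[@primary='true']", namespaces=NS),
            "name": drug.findtext("db:name", namespaces=NS),
            "type": drug.get("type"),
        }


root = ET.fromstring(SAMPLE)  # with real data: root = get_drugbank_root()
summaries = list(iter_drug_summaries(root))
print(summaries)
```

Forgetting the namespace is a common reason `findall("drug")` returns an empty list on namespaced XML.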
### CLI Usage
```bash
# Download using command line
drugbank_downloader --username USER --password PASS

# Download latest version
drugbank_downloader
```

## Data Formats and Versions

### Available Formats
- **XML**: Primary format, most comprehensive data
- **JSON**: Available via API (requires separate API key)
- **CSV/TSV**: Export from web interface or parse XML
- **SQL**: Database dumps available for download

### Version Management
```python
from pathlib import Path

from drugbank_downloader import download_drugbank

# Specify exact version for reproducibility
path = download_drugbank(version='5.1.10')

# List cached versions
drugbank_dir = Path.home() / '.data' / 'drugbank'
if drugbank_dir.exists():
    versions = [d.name for d in drugbank_dir.iterdir() if d.is_dir()]
    print(f"Cached versions: {versions}")
```

### Version History
- **Version 6.0** (2024): Latest release, expanded drug entries
- **Version 5.1.x** (2019-2023): Incremental updates
- **Version 5.0** (2017): ~9,591 drug entries
- **Version 4.0** (2014): Added metabolite structures
- **Version 3.0** (2011): Added transporter and pathway data
- **Version 2.0** (2009): Added interactions and ADMET

## API Access

### REST API Endpoints
```python
import requests

# Query by DrugBank ID
drug_id = "DB00001"
url = f"https://go.drugbank.com/drugs/{drug_id}.json"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# A timeout keeps the call from hanging indefinitely on network problems
response = requests.get(url, headers=headers, timeout=30)
if response.status_code == 200:
    drug_data = response.json()
else:
    print(f"Request failed with status {response.status_code}")
```

### Rate Limits
- **Development Key**: 3,000 requests/month
- **Production Key**: Custom limits based on license
- **Best Practice**: Cache results locally to minimize API calls

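The caching advice can be sketched as a small on-disk cache keyed by DrugBank ID, so that a repeated lookup does not count against the monthly quota. `cached_fetch` and its `fetch` callback are hypothetical names for illustration. `fetch` stands for whatever function performs the real API request and returns JSON-serializable data.

```python
import json
from pathlib import Path


def cached_fetch(drug_id, fetch, cache_dir="drugbank_api_cache"):
    """Return the record for drug_id, calling fetch(drug_id) only on a cache miss."""
    cache_path = Path(cache_dir) / f"{drug_id}.json"
    if cache_path.exists():
        # Cache hit: no API call is made.
        return json.loads(cache_path.read_text(encoding="utf-8"))

    record = fetch(drug_id)  # cache miss: one real request
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(record), encoding="utf-8")
    return record
```

Storing one file per ID keeps the cache easy to inspect and to delete selectively when an entry needs refreshing.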
### Regional Scoping
The DrugBank API is scoped by region:
- **USA**: FDA-approved drugs
- **Canada**: Health Canada-approved drugs
- **EU**: EMA-approved drugs

Specify the region in API requests when applicable.

## Data Caching Strategy

### Intermediate Results
```python
import pickle
from pathlib import Path

from drugbank_downloader import get_drugbank_root

# Cache parsed data
cache_file = Path("drugbank_parsed.pkl")

if cache_file.exists():
    with open(cache_file, 'rb') as f:
        data = pickle.load(f)
else:
    # Parse and process
    root = get_drugbank_root()
    # process_drugbank_data is a placeholder for your own extraction function
    data = process_drugbank_data(root)

    # Save cache
    with open(cache_file, 'wb') as f:
        pickle.dump(data, f)
```

### Version-Specific Caching
```python
from pathlib import Path

version = "5.1.10"
cache_file = Path(f"drugbank_{version}_processed.pkl")
# A new version gets a new file name, so a stale cache is never reused
```

## Troubleshooting

### Common Issues

**Authentication Failures**
- Verify credentials are correct
- Check license agreement is accepted
- Ensure account has not expired

**Download Failures**
- Check internet connectivity
- Verify sufficient disk space (~1-2 GB needed)
- Try specifying an older version if latest fails

**Parsing Errors**
- Ensure complete download (check file size)
- Verify XML is not corrupted
- Use lxml parser for better error handling

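To check whether the XML itself is corrupted, one option is to stream-parse the zipped member and report the first parse error. This is a sketch using the standard library only. `xml_member_is_well_formed` is a hypothetical helper, and defaulting to the first archive member is an assumption about the zip layout.

```python
import xml.etree.ElementTree as ET
import zipfile


def xml_member_is_well_formed(zip_path, member=None):
    """Stream-parse one XML member of a zip archive; return (ok, message)."""
    with zipfile.ZipFile(zip_path) as archive:
        name = member or archive.namelist()[0]  # assumption: first member is the XML
        with archive.open(name) as stream:
            try:
                for _event, element in ET.iterparse(stream, events=("end",)):
                    element.clear()  # drop element contents to limit memory use
            except ET.ParseError as exc:
                # The message includes the line and column of the failure.
                return False, f"{name}: {exc}"
    return True, f"{name}: well-formed"
```

Because `iterparse` reads incrementally, the check does not need to hold the whole document's contents in memory at once.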
### Error Handling
```python
from drugbank_downloader import download_drugbank
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

try:
    path = download_drugbank()
    logger.info("Success: %s", path)
except Exception as e:
    logger.warning("Download failed: %s", e)
    # Fallback: specify older stable version
    path = download_drugbank(version='5.1.7')
```

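For transient network failures, a retry with exponential backoff is a common complement to the fallback above. The sketch below is generic and does not rely on any drugbank-downloader behavior. `retry` is a hypothetical helper, and `attempts` is assumed to be at least 1.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or attempts run out, doubling the delay each time."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)


# Example use (requires drugbank-downloader and valid credentials):
# path = retry(lambda: download_drugbank(version="5.1.10"))
```

Catching every exception also retries failures that will not fix themselves, such as bad credentials, so narrowing the `except` clause to network errors may be preferable in practice.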
## Best Practices

1. **Version Specification**: Always specify exact version for reproducible research
2. **Credential Security**: Use environment variables, never hardcode credentials
3. **Caching**: Cache intermediate processing results to avoid re-parsing
4. **Documentation**: Document which DrugBank version was used in analysis
5. **License Compliance**: Ensure proper licensing for your use case
6. **Local Storage**: Keep local copies to reduce download frequency
7. **Error Handling**: Implement robust error handling for network issues

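One way to act on the documentation practice is to write a small provenance record next to the analysis outputs. The sketch below is an illustration only. `write_provenance`, the output file name, and the record fields are assumptions, not an established convention.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def write_provenance(data_path, version, out_path="drugbank_provenance.json"):
    """Record the DrugBank version and a SHA-256 of the downloaded file as JSON."""
    digest = hashlib.sha256()
    with open(data_path, "rb") as f:
        # Hash in 1 MiB chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)

    record = {
        "source": "DrugBank",
        "version": version,
        "file": Path(data_path).name,
        "sha256": digest.hexdigest(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    Path(out_path).write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record
```

The checksum lets a later reader confirm that the file they have matches the one used in the analysis.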