# OpenFDA API Basics

This reference covers the essentials of using the openFDA API: authentication, rate limits, query syntax, response handling, and best practices.

## Getting Started

### Base URL

All openFDA API endpoints follow this structure:

```
https://api.fda.gov/{category}/{endpoint}.json
```

Examples:

- `https://api.fda.gov/drug/event.json`
- `https://api.fda.gov/device/510k.json`
- `https://api.fda.gov/food/enforcement.json`

### HTTPS Required

**All requests must use HTTPS.** Plain HTTP requests are not accepted.

## Authentication

### API Key Registration

openFDA can be used without an API key, but registering for a free key is strongly recommended because it raises your daily request quota.

**Registration**: Sign up at https://open.fda.gov/apis/authentication/

**Benefits of an API key**:

- Higher daily quota (120,000 requests per day instead of 1,000; the per-minute limit of 240 is the same either way)
- Better suited to production applications
- No cost

### Using Your API Key

Include your API key in requests using either of two methods.

**Method 1: Query Parameter (Recommended)**

```python
import requests

api_key = "YOUR_API_KEY_HERE"
url = "https://api.fda.gov/drug/event.json"

params = {
    "api_key": api_key,
    "search": "patient.drug.medicinalproduct:aspirin",
    "limit": 10
}

response = requests.get(url, params=params)
```

**Method 2: Basic Authentication**

Pass the key as the username with an empty password:

```python
import requests

api_key = "YOUR_API_KEY_HERE"
url = "https://api.fda.gov/drug/event.json"

params = {
    "search": "patient.drug.medicinalproduct:aspirin",
    "limit": 10
}

response = requests.get(url, params=params, auth=(api_key, ''))
```
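
Hard-coding the key makes it easy to leak through version control. Here is a minimal sketch that reads the key from an environment variable and adds `api_key` only when a key is available, so the same code also runs unauthenticated. The variable name `OPENFDA_API_KEY` is chosen here for illustration and is not something openFDA requires. `load_api_key` and `build_params` are hypothetical helpers.

```python
import os


def load_api_key(env_var="OPENFDA_API_KEY"):
    """Return the API key from the environment, or None if unset or blank.

    OPENFDA_API_KEY is a hypothetical variable name used for illustration.
    """
    key = os.environ.get(env_var, "").strip()
    return key or None


def build_params(search, limit=10, api_key=None):
    """Build a query-parameter dict, including api_key only when provided."""
    params = {"search": search, "limit": limit}
    if api_key:
        params["api_key"] = api_key
    return params


# Falls back to unauthenticated requests when no key is configured
params = build_params("patient.drug.medicinalproduct:aspirin", api_key=load_api_key())
```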

## Rate Limits

### Current Limits

| Status | Requests per Minute | Requests per Day |
|--------|---------------------|------------------|
| **Without API Key** | 240 per IP address | 1,000 per IP address |
| **With API Key** | 240 per key | 120,000 per key |

Without a key, the daily cap is the binding constraint. At the full per-minute rate, 1,000 requests are used up in just over four minutes.

### Rate Limit Headers

Responses may include rate limit information in headers such as the ones below. Use `.get()` so that a missing header yields `None` instead of raising an error:

```python
import requests

url = "https://api.fda.gov/drug/event.json"
params = {"search": "patient.drug.medicinalproduct:aspirin", "limit": 1}

response = requests.get(url, params=params)

print(f"Rate limit: {response.headers.get('X-RateLimit-Limit')}")
print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
print(f"Reset time: {response.headers.get('X-RateLimit-Reset')}")
```
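
Header values arrive as strings. This small sketch converts the headers named above to integers and returns `None` for any header that is absent or non-numeric. `parse_rate_limit_headers` is a hypothetical helper. It accepts any mapping with `.get()`, including `response.headers` from `requests`.

```python
def parse_rate_limit_headers(headers):
    """Extract rate limit headers as ints (None when missing or not numeric)."""
    names = {
        "limit": "X-RateLimit-Limit",
        "remaining": "X-RateLimit-Remaining",
        "reset": "X-RateLimit-Reset",
    }
    parsed = {}
    for key, header in names.items():
        value = headers.get(header)
        try:
            parsed[key] = int(value) if value is not None else None
        except ValueError:
            parsed[key] = None  # Present but not an integer
    return parsed
```

For example, a caller could pause when `remaining` drops to zero.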

### Handling Rate Limits

When you exceed a rate limit, the API returns:

- **Status code**: `429 Too Many Requests`
- **Error message**: a body indicating that the rate limit was exceeded

**Best practice**: retry with exponential backoff.

```python
import time

import requests


def query_with_rate_limit_handling(url, params, max_retries=3):
    """Query the API, retrying with exponential backoff on HTTP 429."""
    for attempt in range(max_retries):
        response = requests.get(url, params=params, timeout=30)
        if response.status_code == 429:
            if attempt == max_retries - 1:
                break  # No retries left; don't sleep before giving up
            wait_time = (2 ** attempt) * 60  # 60 s, 120 s, 240 s, ...
            print(f"Rate limit hit. Waiting {wait_time} seconds...")
            time.sleep(wait_time)
            continue
        response.raise_for_status()  # Other HTTP errors propagate immediately
        return response.json()

    raise RuntimeError("Max retries exceeded")
```
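
When several workers share a key, a fixed backoff schedule can make them all retry at the same moment. A common refinement is "full jitter": sleep for a random fraction of the exponential delay, capped at a maximum. This technique is swapped in here and is not something openFDA prescribes. `backoff_delay` is a hypothetical helper. The `rng` parameter exists only so that the randomness can be pinned in tests.

```python
import random


def backoff_delay(attempt, base=60.0, cap=600.0, rng=random.random):
    """Return a full-jitter delay in seconds for a 0-based retry attempt.

    The ceiling grows as base * 2**attempt and is clipped at cap; the
    delay is drawn uniformly from [0, ceiling).
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

To use it, replace the fixed `wait_time` in the loop above with `backoff_delay(attempt)`.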

### Increasing Limits

If your application needs higher limits, contact the openFDA team through their website with details about your use case.

## Query Syntax

### Basic Structure

Query parameters are appended to the endpoint URL after `?`:

```
?api_key=YOUR_KEY&parameter=value&parameter2=value2
```

Parameters are separated by ampersands (`&`).
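
The examples in this document write spaces inside `search` as `+`, which is how they appear in a raw URL. When you pass a `params` dict to an HTTP library instead, the library URL-encodes each value for you. A literal `+` is percent-encoded as `%2B` rather than being treated as a space. The standard library's `urllib.parse.urlencode`, which `requests` also uses for dict params, shows the difference:

```python
from urllib.parse import urlencode

# Spaces in a value are form-encoded as "+", which the server decodes back to spaces
with_spaces = urlencode({"search": "patient.drug.medicinalproduct:aspirin AND occurcountry:ca"})

# A literal "+" in a value is escaped to "%2B", so it no longer acts as a space
with_plus = urlencode({"search": "patient.drug.medicinalproduct:aspirin+AND+occurcountry:ca"})

print(with_spaces)
print(with_plus)
```

If a `+`-style query sent through `params` returns unexpected results, try writing the operators with spaces (`aspirin AND occurcountry:ca`) and let the encoder produce the `+` itself.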

### Search Parameter

The `search` parameter is the primary way to filter results.

**Basic format**:

```
search=field:value
```

**Example**:

```python
params = {
    "api_key": api_key,
    "search": "patient.drug.medicinalproduct:aspirin"
}
```

### Search Operators

#### AND Operator

Combines conditions; all of them must match:

```python
# Find aspirin adverse events that occurred in Canada
params = {
    "search": "patient.drug.medicinalproduct:aspirin+AND+occurcountry:ca"
}
```

#### OR Operator

Matches if either condition is true. Space-separated terms inside parentheses are combined with OR implicitly:

```python
# Find aspirin OR ibuprofen
params = {
    "search": "patient.drug.medicinalproduct:(aspirin ibuprofen)"
}
```

Or explicitly:

```python
params = {
    "search": "patient.drug.medicinalproduct:aspirin+OR+patient.drug.medicinalproduct:ibuprofen"
}
```

#### NOT Operator

Excludes matching records:

```python
# Events that record a country other than the United States
params = {
    "search": "_exists_:occurcountry+AND+NOT+occurcountry:us"
}
```

#### Wildcards

Use an asterisk (`*`) for partial matching. Trailing wildcards (`met*`) are the typical use. A leading wildcard such as `*cillin*` can be slow and may not behave as expected on every field, so verify it against the API before relying on it:

```python
# Any drug starting with "met"
params = {
    "search": "patient.drug.medicinalproduct:met*"
}

# Any drug containing "cillin"
params = {
    "search": "patient.drug.medicinalproduct:*cillin*"
}
```

#### Exact Phrase Matching

Wrap multi-word values in double quotes to match the phrase exactly:

```python
# MedDRA preferred terms are clinical terms ("myocardial infarction", not "heart attack")
params = {
    "search": 'patient.reaction.reactionmeddrapt:"myocardial infarction"'
}
```

#### Range Queries

Search within inclusive ranges using `[low TO high]`:

```python
# Date range (YYYYMMDD format)
params = {
    "search": "receivedate:[20200101+TO+20201231]"
}

# Numeric range
params = {
    "search": "patient.patientonsetage:[18+TO+65]"
}

# Open-ended range: 65 and older
params = {
    "search": "patient.patientonsetage:[65+TO+*]"
}
```

`patient.patientonsetage` is stored alongside a unit field (`patient.patientonsetageunit`). A numeric age range can therefore mix years with other units unless you also filter on the unit.
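
Building range clauses by hand is error-prone when the dates come from code. Here is a minimal sketch using a hypothetical `range_clause` helper. It formats `date` objects as `YYYYMMDD` and uses `*` for an open end. It writes the `TO` separator with spaces, on the assumption that the string is passed through a `params` dict whose encoder turns spaces into `+`.

```python
from datetime import date


def range_clause(field, low=None, high=None):
    """Return an inclusive range clause like 'field:[low TO high]'.

    date values are formatted as YYYYMMDD; None becomes the open-ended '*'.
    """
    def fmt(value):
        if value is None:
            return "*"
        if isinstance(value, date):
            return value.strftime("%Y%m%d")
        return str(value)

    return f"{field}:[{fmt(low)} TO {fmt(high)}]"


print(range_clause("receivedate", date(2020, 1, 1), date(2020, 12, 31)))
print(range_clause("patient.patientonsetage", 65))
```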

#### Field Existence

Check whether a field is present in a record:

```python
# Records that have a patient age
params = {
    "search": "_exists_:patient.patientonsetage"
}

# Records missing patient age
params = {
    "search": "_missing_:patient.patientonsetage"
}
```
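
The operators above compose into longer queries. Below is a small sketch of hypothetical helpers; `term`, `any_of`, `all_of`, and `negate` are illustrative names, not part of any openFDA client. They quote multi-word values and join clauses with spaces, again assuming the result is sent through a `params` dict.

```python
def term(field, value):
    """field:value, quoting values that contain whitespace (exact phrase)."""
    value = str(value)
    if any(ch.isspace() for ch in value):
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{field}:{value}"


def any_of(*clauses):
    """OR the clauses together inside parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses):
    """AND the clauses together."""
    return " AND ".join(clauses)


def negate(clause):
    """Prefix a clause with NOT."""
    return f"NOT {clause}"


query = all_of(
    any_of(term("patient.drug.medicinalproduct", "aspirin"),
           term("patient.drug.medicinalproduct", "ibuprofen")),
    term("patient.reaction.reactionmeddrapt", "myocardial infarction"),
    negate(term("occurcountry", "us")),
)
print(query)
```

This builds a query similar to the one under Complex Boolean Logic, without hand-placing every operator.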

### Limit Parameter

Controls how many records to return per request (1 to 1000; the default is 1):

```python
params = {
    "search": "...",
    "limit": 100
}
```

**Maximum**: 1000 results per request

### Skip Parameter

For pagination, skip the first N results:

```python
# Get results 101-200
params = {
    "search": "...",
    "limit": 100,
    "skip": 100
}
```

openFDA caps how far `skip` can reach (25,000 at the time of writing; check the current documentation). `limit`/`skip` paging therefore cannot walk arbitrarily deep into very large result sets. Narrow the search, for example by date range, to stay under the cap.

**Pagination Example**:

```python
import time

import requests


def get_all_results(url, search_query, api_key, max_results=5000):
    """Retrieve up to max_results records by paging with limit/skip."""
    all_results = []
    skip = 0
    limit = 100

    while len(all_results) < max_results:
        params = {
            "api_key": api_key,
            "search": search_query,
            "limit": limit,
            "skip": skip
        }

        response = requests.get(url, params=params, timeout=30)
        if response.status_code == 404:
            break  # openFDA reports "no matches" as a 404
        response.raise_for_status()
        data = response.json()

        results = data.get("results", [])
        if not results:
            break

        all_results.extend(results)

        if len(results) < limit:
            break  # Last page reached

        skip += limit
        time.sleep(0.25)  # Stay well under the per-minute limit

    return all_results[:max_results]
```
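
The paging arithmetic can be kept separate from the HTTP calls. Here is a sketch of a hypothetical generator that yields the `skip` values needed to cover `total` records, where `total` could come from `meta.results.total` on a first request. It stops at a skip ceiling. The default `max_skip=25000` is an assumption based on openFDA's documented limit at the time of writing, so verify it against the current docs.

```python
def iter_offsets(total, limit=100, max_skip=25000):
    """Yield skip offsets 0, limit, 2*limit, ... covering `total` records.

    Offsets beyond max_skip are not yielded, since the API would reject them.
    """
    skip = 0
    while skip < total and skip <= max_skip:
        yield skip
        skip += limit


print(list(iter_offsets(250)))
```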

### Count Parameter

Aggregate matching records by a field and return a count per value instead of individual records:

```python
# Count events by country
params = {
    "search": "patient.drug.medicinalproduct:aspirin",
    "count": "occurcountry"
}
```

**Response format**:

```json
{
  "results": [
    {"term": "us", "count": 12543},
    {"term": "ca", "count": 3421},
    {"term": "gb", "count": 2156}
  ]
}
```
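
Count results come back as a list of `term`/`count` pairs. Here is a minimal sketch of a hypothetical helper that turns a parsed count response into a dict for direct lookups, using the sample response above as input.

```python
def counts_to_dict(data):
    """Map each term to its count from a parsed count response."""
    return {row["term"]: row["count"] for row in data.get("results", [])}


sample = {
    "results": [
        {"term": "us", "count": 12543},
        {"term": "ca", "count": 3421},
        {"term": "gb", "count": 2156},
    ]
}
counts = counts_to_dict(sample)
print(counts["ca"])  # 3421
```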

#### Exact Counting

Add the `.exact` suffix to count whole field values rather than individual words. This matters for multi-word fields:

```python
# Count exact reaction terms (not individual words)
params = {
    "search": "patient.drug.medicinalproduct:aspirin",
    "count": "patient.reaction.reactionmeddrapt.exact"
}
```

- **Without `.exact`**: counts individual words ("myocardial" and "infarction" are tallied separately)
- **With `.exact`**: counts complete values ("myocardial infarction" is tallied as one term)

### Sort Parameter

Sort results by a field:

```python
# Sort by date, newest first
params = {
    "search": "...",
    "sort": "receivedate:desc"
}

# Sort by date, oldest first
params = {
    "search": "...",
    "sort": "receivedate:asc"
}
```

## Response Format

### Standard Response Structure

```json
{
  "meta": {
    "disclaimer": "...",
    "terms": "...",
    "license": "...",
    "last_updated": "2024-01-15",
    "results": {
      "skip": 0,
      "limit": 10,
      "total": 15234
    }
  },
  "results": [
    {
      // Individual result record
    },
    {
      // Another result record
    }
  ]
}
```

### Response Fields

- **meta**: metadata about the query and results
  - `disclaimer`: legal disclaimer
  - `terms`: terms of use URL
  - `license`: data license information
  - `last_updated`: date the data was last updated
  - `results.skip`: number of records skipped
  - `results.limit`: maximum records returned per page
  - `results.total`: total matching records (may be approximate for large result sets)
- **results**: array of matching records
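
`meta.results.total`, together with the page size, tells you how many requests a full download would take. Here is a small sketch of a hypothetical helper, using the sample `meta` block above. Integer ceiling division avoids floating-point rounding.

```python
def page_count(meta, limit):
    """Number of requests of size `limit` needed to fetch meta.results.total records."""
    total = meta["results"]["total"]
    return (total + limit - 1) // limit  # Ceiling division


meta = {"last_updated": "2024-01-15", "results": {"skip": 0, "limit": 10, "total": 15234}}
print(page_count(meta, 10))    # 1524
print(page_count(meta, 1000))  # 16
```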

### Empty Results

When no records match, openFDA does not return an empty `results` array. It responds with HTTP `404` and a `NOT_FOUND` error body, so treat a 404 from a search as "no results" rather than as a failure:

```json
{
  "error": {
    "code": "NOT_FOUND",
    "message": "No matches found!"
  }
}
```

### Error Response

When an error occurs, the body contains an `error` object:

```json
{
  "error": {
    "code": "INVALID_QUERY",
    "message": "Detailed error message"
  }
}
```

**Common errors** (branch on the HTTP status code; the exact `code` strings can vary, so treat the ones below as examples):

- `404` (`NOT_FOUND`): no records matched the query
- `400` (`INVALID_QUERY`): malformed search query
- `429` (`RATE_LIMIT_EXCEEDED`): too many requests
- `401` (`UNAUTHORIZED`): invalid API key
- `500` (`SERVER_ERROR`): internal server error
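
Because "no matches" arrives as a 404, it helps to separate it from real failures before deciding whether to retry. Here is a sketch of a hypothetical classifier that takes a status code and a parsed JSON body, so it can be tested without network access. It returns an empty list for a 404 and raises distinct exception types for retryable (429, 5xx) and permanent errors. `APIError`, `RetryableAPIError`, and `extract_results` are illustrative names.

```python
class APIError(Exception):
    """Non-retryable API error (bad query, bad key, ...)."""


class RetryableAPIError(APIError):
    """Error worth retrying after a delay (rate limit or server error)."""


def extract_results(status_code, body):
    """Return the results list, [] for a no-match 404, or raise for other errors."""
    if status_code == 200:
        return body.get("results", [])
    if status_code == 404:
        return []  # openFDA reports "no matches" as a 404
    error = body.get("error", {})
    message = f"HTTP {status_code} {error.get('code', '?')}: {error.get('message', '')}"
    if status_code == 429 or status_code >= 500:
        raise RetryableAPIError(message)
    raise APIError(message)
```

With `requests`, call it as `extract_results(response.status_code, response.json())`. A mistyped endpoint path would also produce a 404, so check the URL if every query comes back empty.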

## Advanced Techniques

### Nested Field Queries

Fields inside nested objects are addressed with dot notation. They combine with top-level fields like any other term:

```python
# Aspirin adverse event reports whose seriousness criteria include death
params = {
    "search": "patient.drug.medicinalproduct:aspirin+AND+seriousnessdeath:1"
}
```

### Multiple Field Search

Search for the same value across several fields:

```python
# Match the drug name in either the reported product name or the openFDA brand name
params = {
    "search": "(patient.drug.medicinalproduct:aspirin+OR+patient.drug.openfda.brand_name:aspirin)"
}
```

### Complex Boolean Logic

Group clauses with parentheses to combine operators:

```python
# (Aspirin OR ibuprofen) AND myocardial infarction AND NOT United States
params = {
    "search": '(patient.drug.medicinalproduct:aspirin+OR+patient.drug.medicinalproduct:ibuprofen)+AND+patient.reaction.reactionmeddrapt:"myocardial infarction"+AND+NOT+occurcountry:us'
}
```

### Counting with Filters

Combine `search` and `count` to count within a subset:

```python
# Count reactions for serious events only
params = {
    "search": "serious:1",
    "count": "patient.reaction.reactionmeddrapt.exact"
}
```

## Best Practices

### 1. Query Efficiency

**DO**:

- Use specific field searches
- Filter before counting
- Use exact matching (`.exact` fields, quoted phrases) where possible
- Implement pagination for large datasets

**DON'T**:

- Use overly broad wildcards (e.g., `search=*`)
- Request more data than needed
- Skip error handling
- Ignore rate limits

### 2. Error Handling

Always handle common errors:

```python
import requests


def safe_api_call(url, params):
    """Call the FDA API; return parsed JSON, or an {"error": ...} dict on failure."""
    try:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        status = e.response.status_code
        if status == 404:
            return {"error": "No results found"}
        elif status == 429:
            return {"error": "Rate limit exceeded"}
        elif status == 400:
            return {"error": "Invalid query"}
        else:
            return {"error": f"HTTP error: {e}"}
    except requests.exceptions.ConnectionError:
        return {"error": "Connection failed"}
    except requests.exceptions.Timeout:
        return {"error": "Request timeout"}
    except requests.exceptions.RequestException as e:
        return {"error": f"Request error: {e}"}
```

### 3. Data Validation

Validate and clean input before building queries:

```python
import re
from datetime import datetime


def clean_search_term(term):
    """Trim whitespace and escape double quotes so the term can sit inside a quoted phrase."""
    term = term.strip()
    term = term.replace('"', '\\"')  # Escape embedded quotes
    return term


def validate_date(date_str):
    """Validate that date_str is a real calendar date in YYYYMMDD format."""
    if not re.fullmatch(r"\d{8}", date_str):
        raise ValueError("Date must be in YYYYMMDD format")
    datetime.strptime(date_str, "%Y%m%d")  # Raises ValueError for dates like 20230231
    return date_str
```

### 4. Caching

Cache frequently accessed data:

```python
import hashlib
import json
import time
from pathlib import Path

import requests


class FDACache:
    """Simple file-based cache for FDA API responses."""

    def __init__(self, cache_dir="fda_cache", ttl=3600):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.ttl = ttl  # Time to live in seconds

    def _get_cache_key(self, url, params):
        """Generate a cache key from the URL and params."""
        cache_string = f"{url}_{json.dumps(params, sort_keys=True)}"
        return hashlib.md5(cache_string.encode()).hexdigest()

    def get(self, url, params):
        """Return the cached response if present and not expired, else None."""
        key = self._get_cache_key(url, params)
        cache_file = self.cache_dir / f"{key}.json"

        if cache_file.exists():
            age = time.time() - cache_file.stat().st_mtime
            if age < self.ttl:
                with open(cache_file, "r") as f:
                    return json.load(f)

        return None

    def set(self, url, params, data):
        """Cache response data."""
        key = self._get_cache_key(url, params)
        cache_file = self.cache_dir / f"{key}.json"

        with open(cache_file, "w") as f:
            json.dump(data, f)


# Usage
cache = FDACache(ttl=3600)  # 1 hour cache


def cached_api_call(url, params):
    """API call with caching; only successful responses are cached."""
    cached = cache.get(url, params)
    if cached is not None:  # An empty dict is still a valid cached value
        return cached

    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    data = response.json()

    cache.set(url, params, data)
    return data
```

### 5. Rate Limit Management

Track request timestamps and throttle on the client side. This sketch enforces only the per-minute limit within a single process; the daily quota needs separate tracking:

```python
import time
from collections import deque

import requests


class RateLimiter:
    """Enforce a per-minute request limit within a single process."""

    def __init__(self, max_per_minute=240):
        self.max_per_minute = max_per_minute
        self.requests = deque()  # Timestamps of recent requests

    def wait_if_needed(self):
        """Sleep if necessary to stay under the per-minute limit."""
        now = time.time()

        # Drop timestamps older than the 60-second window
        while self.requests and now - self.requests[0] > 60:
            self.requests.popleft()

        # At the limit: wait until the oldest request leaves the window
        if len(self.requests) >= self.max_per_minute:
            sleep_time = 60 - (now - self.requests[0])
            if sleep_time > 0:
                time.sleep(sleep_time)
            self.requests.popleft()

        self.requests.append(time.time())


# Usage
rate_limiter = RateLimiter(max_per_minute=240)


def rate_limited_request(url, params):
    """Make a request with client-side rate limiting."""
    rate_limiter.wait_if_needed()
    return requests.get(url, params=params, timeout=30)
```

## Common Query Patterns

### Pattern 1: Time-based Analysis

```python
from datetime import datetime, timedelta

# Get events received in the last 30 days
end_date = datetime.now()
start_date = end_date - timedelta(days=30)

params = {
    "search": f"receivedate:[{start_date.strftime('%Y%m%d')}+TO+{end_date.strftime('%Y%m%d')}]",
    "limit": 1000
}
```

openFDA data is refreshed periodically rather than in real time, so a window ending today may return few or no records; compare it with `meta.last_updated`.

### Pattern 2: Top N Analysis

```python
# Get the 10 most common reactions for a drug
params = {
    "search": "patient.drug.medicinalproduct:aspirin",
    "count": "patient.reaction.reactionmeddrapt.exact",
    "limit": 10
}
```

### Pattern 3: Comparative Analysis

```python
import requests

url = "https://api.fda.gov/drug/event.json"

# Compare the top reactions reported for two drugs
drugs = ["aspirin", "ibuprofen"]
results = {}

for drug in drugs:
    params = {
        "search": f"patient.drug.medicinalproduct:{drug}",
        "count": "patient.reaction.reactionmeddrapt.exact",
        "limit": 10
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    results[drug] = response.json()
```
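
Raw count lists for different drugs are hard to compare side by side. Here is a sketch of a hypothetical helper that pivots the `results` dict built above into one row per reaction term. A `0` appears where a term was not in a drug's top-N list, which means "not in the top N", not necessarily "never reported". The sample numbers are made up for illustration.

```python
def compare_counts(results_by_drug):
    """Pivot {drug: count_response} into {term: {drug: count}}.

    Terms missing from a drug's top-N list get 0 for that drug.
    """
    drugs = list(results_by_drug)
    table = {}
    for drug, data in results_by_drug.items():
        for row in data.get("results", []):
            table.setdefault(row["term"], {d: 0 for d in drugs})[drug] = row["count"]
    return table


# Made-up counts, for illustration only
sample = {
    "aspirin": {"results": [{"term": "nausea", "count": 50}, {"term": "dizziness", "count": 30}]},
    "ibuprofen": {"results": [{"term": "nausea", "count": 40}]},
}
print(compare_counts(sample))
```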

## Additional Resources

- **openFDA Homepage**: https://open.fda.gov/
- **API Documentation**: https://open.fda.gov/apis/
- **Interactive API Explorer**: https://open.fda.gov/apis/try-the-api/
- **Terms of Service**: https://open.fda.gov/terms/
- **GitHub**: https://github.com/FDA/openfda
- **Status Page**: Check for API outages and maintenance

## Support

For questions or issues:

- **GitHub Issues**: https://github.com/FDA/openfda/issues
- **Email**: open-fda@fda.hhs.gov
- **Discussion Forum**: Check GitHub discussions