Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions

View File

@@ -0,0 +1,687 @@
# OpenFDA API Basics
This reference provides comprehensive information about using the openFDA API, including authentication, rate limits, query syntax, and best practices.
## Getting Started
### Base URL
All openFDA API endpoints follow this structure:
```
https://api.fda.gov/{category}/{endpoint}.json
```
Examples:
- `https://api.fda.gov/drug/event.json`
- `https://api.fda.gov/device/510k.json`
- `https://api.fda.gov/food/enforcement.json`
### HTTPS Required
**All requests must use HTTPS**. HTTP requests are not accepted and will fail.
## Authentication
### API Key Registration
While openFDA can be used without an API key, registering for a free API key is strongly recommended for higher rate limits.
**Registration**: Visit https://open.fda.gov/apis/authentication/ to sign up
**Benefits of API Key**:
- Higher rate limits (240 req/min, 120,000 req/day)
- Better for production applications
- No additional cost
### Using Your API Key
Include your API key in requests using one of two methods:
**Method 1: Query Parameter (Recommended)**
```python
import requests
api_key = "YOUR_API_KEY_HERE"
url = "https://api.fda.gov/drug/event.json"
params = {
"api_key": api_key,
"search": "patient.drug.medicinalproduct:aspirin",
"limit": 10
}
response = requests.get(url, params=params)
```
**Method 2: Basic Authentication**
```python
import requests
api_key = "YOUR_API_KEY_HERE"
url = "https://api.fda.gov/drug/event.json"
params = {
"search": "patient.drug.medicinalproduct:aspirin",
"limit": 10
}
response = requests.get(url, params=params, auth=(api_key, ''))
```
## Rate Limits
### Current Limits
| Status | Requests per Minute | Requests per Day |
|--------|-------------------|------------------|
| **Without API Key** | 240 per IP address | 1,000 per IP address |
| **With API Key** | 240 per key | 120,000 per key |
### Rate Limit Headers
The API returns rate limit information in response headers:
```python
response = requests.get(url, params=params)
print(f"Rate limit: {response.headers.get('X-RateLimit-Limit')}")
print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
print(f"Reset time: {response.headers.get('X-RateLimit-Reset')}")
```
### Handling Rate Limits
When you exceed rate limits, the API returns:
- **Status Code**: `429 Too Many Requests`
- **Error Message**: Indicates rate limit exceeded
**Best Practice**: Implement exponential backoff:
```python
import requests
import time
def query_with_rate_limit_handling(url, params, max_retries=3):
"""Query API with automatic rate limit handling."""
for attempt in range(max_retries):
try:
response = requests.get(url, params=params)
response.raise_for_status()
return response.json()
except requests.exceptions.HTTPError as e:
if response.status_code == 429:
# Rate limit exceeded
wait_time = (2 ** attempt) * 60 # Exponential backoff
print(f"Rate limit hit. Waiting {wait_time} seconds...")
time.sleep(wait_time)
else:
raise
raise Exception("Max retries exceeded")
```
### Increasing Limits
For applications requiring higher limits, contact the openFDA team through their website with details about your use case.
## Query Syntax
### Basic Structure
Queries use this format:
```
?api_key=YOUR_KEY&parameter=value&parameter2=value2
```
Parameters are separated by ampersands (`&`).
### Search Parameter
The `search` parameter is the primary way to filter results.
**Basic Format**:
```
search=field:value
```
**Example**:
```python
params = {
"api_key": api_key,
"search": "patient.drug.medicinalproduct:aspirin"
}
```
### Search Operators
#### AND Operator
Combines multiple conditions (both must be true):
```python
# Find aspirin adverse events in Canada
params = {
"search": "patient.drug.medicinalproduct:aspirin+AND+occurcountry:ca"
}
```
#### OR Operator
Either condition can be true (OR is implicit with space):
```python
# Find aspirin OR ibuprofen
params = {
"search": "patient.drug.medicinalproduct:(aspirin ibuprofen)"
}
```
Or explicitly:
```python
params = {
"search": "patient.drug.medicinalproduct:aspirin+OR+patient.drug.medicinalproduct:ibuprofen"
}
```
#### NOT Operator
Exclude results:
```python
# Events NOT in the United States
params = {
"search": "_exists_:occurcountry+AND+NOT+occurcountry:us"
}
```
#### Wildcards
Use asterisk (`*`) for partial matching:
```python
# Any drug starting with "met"
params = {
"search": "patient.drug.medicinalproduct:met*"
}
# Any drug containing "cillin"
params = {
"search": "patient.drug.medicinalproduct:*cillin*"
}
```
#### Exact Phrase Matching
Use quotes for exact phrases:
```python
params = {
"search": 'patient.reaction.reactionmeddrapt:"heart attack"'
}
```
#### Range Queries
Search within ranges:
```python
# Date range (YYYYMMDD format)
params = {
"search": "receivedate:[20200101+TO+20201231]"
}
# Numeric range
params = {
"search": "patient.patientonsetage:[18+TO+65]"
}
# Open-ended ranges
params = {
"search": "patient.patientonsetage:[65+TO+*]" # 65 and older
}
```
#### Field Existence
Check if a field exists:
```python
# Records that have a patient age
params = {
"search": "_exists_:patient.patientonsetage"
}
# Records missing patient age
params = {
"search": "_missing_:patient.patientonsetage"
}
```
### Limit Parameter
Controls how many results to return (1-1000, default 1):
```python
params = {
"search": "...",
"limit": 100
}
```
**Maximum**: 1000 results per request
### Skip Parameter
For pagination, skip the first N results:
```python
# Get results 101-200
params = {
"search": "...",
"limit": 100,
"skip": 100
}
```
**Pagination Example**:
```python
def get_all_results(url, search_query, api_key, max_results=5000):
"""Retrieve results with pagination."""
all_results = []
skip = 0
limit = 100
while len(all_results) < max_results:
params = {
"api_key": api_key,
"search": search_query,
"limit": limit,
"skip": skip
}
response = requests.get(url, params=params)
data = response.json()
if "results" not in data or len(data["results"]) == 0:
break
all_results.extend(data["results"])
if len(data["results"]) < limit:
break # No more results
skip += limit
time.sleep(0.25) # Rate limiting courtesy
return all_results[:max_results]
```
### Count Parameter
Aggregate and count results by a field (instead of returning individual records):
```python
# Count events by country
params = {
"search": "patient.drug.medicinalproduct:aspirin",
"count": "occurcountry"
}
```
**Response Format**:
```json
{
"results": [
{"term": "us", "count": 12543},
{"term": "ca", "count": 3421},
{"term": "gb", "count": 2156}
]
}
```
#### Exact Counting
Add `.exact` suffix for exact phrase counting (especially important for multi-word fields):
```python
# Count exact reaction terms (not individual words)
params = {
"search": "patient.drug.medicinalproduct:aspirin",
"count": "patient.reaction.reactionmeddrapt.exact"
}
```
**Without `.exact`**: Counts individual words
**With `.exact`**: Counts complete phrases
### Sort Parameter
Sort results by field:
```python
# Sort by date, newest first
params = {
"search": "...",
"sort": "receivedate:desc"
}
# Sort by date, oldest first
params = {
"search": "...",
"sort": "receivedate:asc"
}
```
## Response Format
### Standard Response Structure
```json
{
"meta": {
"disclaimer": "...",
"terms": "...",
"license": "...",
"last_updated": "2024-01-15",
"results": {
"skip": 0,
"limit": 10,
"total": 15234
}
},
"results": [
{
// Individual result record
},
{
// Another result record
}
]
}
```
### Response Fields
- **meta**: Metadata about the query and results
- `disclaimer`: Important legal disclaimer
- `terms`: Terms of use URL
- `license`: Data license information
- `last_updated`: When data was last updated
- `results.skip`: Number of skipped results
- `results.limit`: Maximum results per page
- `results.total`: Total matching results (may be approximate for large result sets)
- **results**: Array of matching records
### Empty Results
When no results match:
```json
{
"meta": {...},
"results": []
}
```
### Error Response
When an error occurs:
```json
{
"error": {
"code": "INVALID_QUERY",
"message": "Detailed error message"
}
}
```
**Common Error Codes**:
- `NOT_FOUND`: No results found (404)
- `INVALID_QUERY`: Malformed search query (400)
- `RATE_LIMIT_EXCEEDED`: Too many requests (429)
- `UNAUTHORIZED`: Invalid API key (401)
- `SERVER_ERROR`: Internal server error (500)
## Advanced Techniques
### Nested Field Queries
Query nested objects:
```python
# Drug adverse events where serious outcome is death
params = {
"search": "serious:1+AND+seriousnessdeath:1"
}
```
### Multiple Field Search
Search across multiple fields:
```python
# Search drug name in multiple fields
params = {
"search": "(patient.drug.medicinalproduct:aspirin+OR+patient.drug.openfda.brand_name:aspirin)"
}
```
### Complex Boolean Logic
Combine multiple operators:
```python
# (Aspirin OR Ibuprofen) AND (Heart Attack) AND NOT (US)
params = {
"search": "(patient.drug.medicinalproduct:aspirin+OR+patient.drug.medicinalproduct:ibuprofen)+AND+patient.reaction.reactionmeddrapt:*heart*attack*+AND+NOT+occurcountry:us"
}
```
### Counting with Filters
Count within a specific subset:
```python
# Count reactions for serious events only
params = {
"search": "serious:1",
"count": "patient.reaction.reactionmeddrapt.exact"
}
```
## Best Practices
### 1. Query Efficiency
**DO**:
- Use specific field searches
- Filter before counting
- Use exact match when possible
- Implement pagination for large datasets
**DON'T**:
- Use overly broad wildcards (e.g., `search=*`)
- Request more data than needed
- Skip error handling
- Ignore rate limits
### 2. Error Handling
Always handle common errors:
```python
def safe_api_call(url, params):
"""Safely call FDA API with comprehensive error handling."""
try:
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
return response.json()
except requests.exceptions.HTTPError as e:
if response.status_code == 404:
return {"error": "No results found"}
elif response.status_code == 429:
return {"error": "Rate limit exceeded"}
elif response.status_code == 400:
return {"error": "Invalid query"}
else:
return {"error": f"HTTP error: {e}"}
except requests.exceptions.ConnectionError:
return {"error": "Connection failed"}
except requests.exceptions.Timeout:
return {"error": "Request timeout"}
except requests.exceptions.RequestException as e:
return {"error": f"Request error: {e}"}
```
### 3. Data Validation
Validate and clean data:
```python
def clean_search_term(term):
"""Clean and prepare search term."""
# Remove special characters that break queries
term = term.replace('"', '\\"') # Escape quotes
term = term.strip()
return term
def validate_date(date_str):
"""Validate date format (YYYYMMDD)."""
import re
if not re.match(r'^\d{8}$', date_str):
raise ValueError("Date must be in YYYYMMDD format")
return date_str
```
### 4. Caching
Implement caching for frequently accessed data:
```python
import json
from pathlib import Path
import hashlib
import time
class FDACache:
"""Simple file-based cache for FDA API responses."""
def __init__(self, cache_dir="fda_cache", ttl=3600):
self.cache_dir = Path(cache_dir)
self.cache_dir.mkdir(exist_ok=True)
self.ttl = ttl # Time to live in seconds
def _get_cache_key(self, url, params):
"""Generate cache key from URL and params."""
cache_string = f"{url}_{json.dumps(params, sort_keys=True)}"
return hashlib.md5(cache_string.encode()).hexdigest()
def get(self, url, params):
"""Get cached response if available and not expired."""
key = self._get_cache_key(url, params)
cache_file = self.cache_dir / f"{key}.json"
if cache_file.exists():
# Check if expired
age = time.time() - cache_file.stat().st_mtime
if age < self.ttl:
with open(cache_file, 'r') as f:
return json.load(f)
return None
def set(self, url, params, data):
"""Cache response data."""
key = self._get_cache_key(url, params)
cache_file = self.cache_dir / f"{key}.json"
with open(cache_file, 'w') as f:
json.dump(data, f)
# Usage
cache = FDACache(ttl=3600) # 1 hour cache
def cached_api_call(url, params):
"""API call with caching."""
# Check cache
cached = cache.get(url, params)
if cached:
return cached
# Make request
response = requests.get(url, params=params)
data = response.json()
# Cache result
cache.set(url, params, data)
return data
```
### 5. Rate Limit Management
Track and respect rate limits:
```python
import time
from collections import deque
class RateLimiter:
"""Track and enforce rate limits."""
def __init__(self, max_per_minute=240):
self.max_per_minute = max_per_minute
self.requests = deque()
def wait_if_needed(self):
"""Wait if necessary to stay under rate limit."""
now = time.time()
# Remove requests older than 1 minute
while self.requests and now - self.requests[0] > 60:
self.requests.popleft()
# Check if at limit
if len(self.requests) >= self.max_per_minute:
sleep_time = 60 - (now - self.requests[0])
if sleep_time > 0:
time.sleep(sleep_time)
self.requests.popleft()
self.requests.append(time.time())
# Usage
rate_limiter = RateLimiter(max_per_minute=240)
def rate_limited_request(url, params):
"""Make request with rate limiting."""
rate_limiter.wait_if_needed()
return requests.get(url, params=params)
```
## Common Query Patterns
### Pattern 1: Time-based Analysis
```python
# Get events from last 30 days
from datetime import datetime, timedelta
end_date = datetime.now()
start_date = end_date - timedelta(days=30)
params = {
"search": f"receivedate:[{start_date.strftime('%Y%m%d')}+TO+{end_date.strftime('%Y%m%d')}]",
"limit": 1000
}
```
### Pattern 2: Top N Analysis
```python
# Get top 10 most common reactions for a drug
params = {
"search": "patient.drug.medicinalproduct:aspirin",
"count": "patient.reaction.reactionmeddrapt.exact",
"limit": 10
}
```
### Pattern 3: Comparative Analysis
```python
# Compare two drugs
drugs = ["aspirin", "ibuprofen"]
results = {}
for drug in drugs:
params = {
"search": f"patient.drug.medicinalproduct:{drug}",
"count": "patient.reaction.reactionmeddrapt.exact",
"limit": 10
}
results[drug] = requests.get(url, params=params).json()
```
## Additional Resources
- **openFDA Homepage**: https://open.fda.gov/
- **API Documentation**: https://open.fda.gov/apis/
- **Interactive API Explorer**: https://open.fda.gov/apis/try-the-api/
- **Terms of Service**: https://open.fda.gov/terms/
- **GitHub**: https://github.com/FDA/openfda
- **Status Page**: Check for API outages and maintenance
## Support
For questions or issues:
- **GitHub Issues**: https://github.com/FDA/openfda/issues
- **Email**: open-fda@fda.hhs.gov
- **Discussion Forum**: Check GitHub discussions