16 KiB
OpenFDA API Basics
This reference provides comprehensive information about using the openFDA API, including authentication, rate limits, query syntax, and best practices.
Getting Started
Base URL
All openFDA API endpoints follow this structure:
https://api.fda.gov/{category}/{endpoint}.json
Examples:
https://api.fda.gov/drug/event.jsonhttps://api.fda.gov/device/510k.jsonhttps://api.fda.gov/food/enforcement.json
HTTPS Required
All requests must use HTTPS. HTTP requests are not accepted and will fail.
Authentication
API Key Registration
While openFDA can be used without an API key, registering for a free API key is strongly recommended for higher rate limits.
Registration: Visit https://open.fda.gov/apis/authentication/ to sign up
Benefits of API Key:
- Higher rate limits (240 req/min, 120,000 req/day)
- Better for production applications
- No additional cost
Using Your API Key
Include your API key in requests using one of two methods:
Method 1: Query Parameter (Recommended)
import requests
api_key = "YOUR_API_KEY_HERE"
url = "https://api.fda.gov/drug/event.json"
params = {
"api_key": api_key,
"search": "patient.drug.medicinalproduct:aspirin",
"limit": 10
}
response = requests.get(url, params=params)
Method 2: Basic Authentication
import requests
api_key = "YOUR_API_KEY_HERE"
url = "https://api.fda.gov/drug/event.json"
params = {
"search": "patient.drug.medicinalproduct:aspirin",
"limit": 10
}
response = requests.get(url, params=params, auth=(api_key, ''))
Rate Limits
Current Limits
| Status | Requests per Minute | Requests per Day |
|---|---|---|
| Without API Key | 240 per IP address | 1,000 per IP address |
| With API Key | 240 per key | 120,000 per key |
Rate Limit Headers
The API returns rate limit information in response headers:
response = requests.get(url, params=params)
print(f"Rate limit: {response.headers.get('X-RateLimit-Limit')}")
print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
print(f"Reset time: {response.headers.get('X-RateLimit-Reset')}")
Handling Rate Limits
When you exceed rate limits, the API returns:
- Status Code:
429 Too Many Requests - Error Message: Indicates rate limit exceeded
Best Practice: Implement exponential backoff:
import requests
import time
def query_with_rate_limit_handling(url, params, max_retries=3):
"""Query API with automatic rate limit handling."""
for attempt in range(max_retries):
try:
response = requests.get(url, params=params)
response.raise_for_status()
return response.json()
except requests.exceptions.HTTPError as e:
if response.status_code == 429:
# Rate limit exceeded
wait_time = (2 ** attempt) * 60 # Exponential backoff
print(f"Rate limit hit. Waiting {wait_time} seconds...")
time.sleep(wait_time)
else:
raise
raise Exception("Max retries exceeded")
Increasing Limits
For applications requiring higher limits, contact the openFDA team through their website with details about your use case.
Query Syntax
Basic Structure
Queries use this format:
?api_key=YOUR_KEY¶meter=value¶meter2=value2
Parameters are separated by ampersands (&).
Search Parameter
The search parameter is the primary way to filter results.
Basic Format:
search=field:value
Example:
params = {
"api_key": api_key,
"search": "patient.drug.medicinalproduct:aspirin"
}
Search Operators
AND Operator
Combines multiple conditions (both must be true):
# Find aspirin adverse events in Canada
params = {
"search": "patient.drug.medicinalproduct:aspirin+AND+occurcountry:ca"
}
OR Operator
Either condition can be true (OR is implicit with space):
# Find aspirin OR ibuprofen
params = {
"search": "patient.drug.medicinalproduct:(aspirin ibuprofen)"
}
Or explicitly:
params = {
"search": "patient.drug.medicinalproduct:aspirin+OR+patient.drug.medicinalproduct:ibuprofen"
}
NOT Operator
Exclude results:
# Events NOT in the United States
params = {
"search": "_exists_:occurcountry+AND+NOT+occurcountry:us"
}
Wildcards
Use asterisk (*) for partial matching:
# Any drug starting with "met"
params = {
"search": "patient.drug.medicinalproduct:met*"
}
# Any drug containing "cillin"
params = {
"search": "patient.drug.medicinalproduct:*cillin*"
}
Exact Phrase Matching
Use quotes for exact phrases:
params = {
"search": 'patient.reaction.reactionmeddrapt:"heart attack"'
}
Range Queries
Search within ranges:
# Date range (YYYYMMDD format)
params = {
"search": "receivedate:[20200101+TO+20201231]"
}
# Numeric range
params = {
"search": "patient.patientonsetage:[18+TO+65]"
}
# Open-ended ranges
params = {
"search": "patient.patientonsetage:[65+TO+*]" # 65 and older
}
Field Existence
Check if a field exists:
# Records that have a patient age
params = {
"search": "_exists_:patient.patientonsetage"
}
# Records missing patient age
params = {
"search": "_missing_:patient.patientonsetage"
}
Limit Parameter
Controls how many results to return (1-1000, default 1):
params = {
"search": "...",
"limit": 100
}
Maximum: 1000 results per request
Skip Parameter
For pagination, skip the first N results:
# Get results 101-200
params = {
"search": "...",
"limit": 100,
"skip": 100
}
Pagination Example:
def get_all_results(url, search_query, api_key, max_results=5000):
"""Retrieve results with pagination."""
all_results = []
skip = 0
limit = 100
while len(all_results) < max_results:
params = {
"api_key": api_key,
"search": search_query,
"limit": limit,
"skip": skip
}
response = requests.get(url, params=params)
data = response.json()
if "results" not in data or len(data["results"]) == 0:
break
all_results.extend(data["results"])
if len(data["results"]) < limit:
break # No more results
skip += limit
time.sleep(0.25) # Rate limiting courtesy
return all_results[:max_results]
Count Parameter
Aggregate and count results by a field (instead of returning individual records):
# Count events by country
params = {
"search": "patient.drug.medicinalproduct:aspirin",
"count": "occurcountry"
}
Response Format:
{
"results": [
{"term": "us", "count": 12543},
{"term": "ca", "count": 3421},
{"term": "gb", "count": 2156}
]
}
Exact Counting
Add .exact suffix for exact phrase counting (especially important for multi-word fields):
# Count exact reaction terms (not individual words)
params = {
"search": "patient.drug.medicinalproduct:aspirin",
"count": "patient.reaction.reactionmeddrapt.exact"
}
Without .exact: Counts individual words
With .exact: Counts complete phrases
Sort Parameter
Sort results by field:
# Sort by date, newest first
params = {
"search": "...",
"sort": "receivedate:desc"
}
# Sort by date, oldest first
params = {
"search": "...",
"sort": "receivedate:asc"
}
Response Format
Standard Response Structure
{
"meta": {
"disclaimer": "...",
"terms": "...",
"license": "...",
"last_updated": "2024-01-15",
"results": {
"skip": 0,
"limit": 10,
"total": 15234
}
},
"results": [
{
// Individual result record
},
{
// Another result record
}
]
}
Response Fields
-
meta: Metadata about the query and results
disclaimer: Important legal disclaimerterms: Terms of use URLlicense: Data license informationlast_updated: When data was last updatedresults.skip: Number of skipped resultsresults.limit: Maximum results per pageresults.total: Total matching results (may be approximate for large result sets)
-
results: Array of matching records
Empty Results
When no results match:
{
"meta": {...},
"results": []
}
Error Response
When an error occurs:
{
"error": {
"code": "INVALID_QUERY",
"message": "Detailed error message"
}
}
Common Error Codes:
NOT_FOUND: No results found (404)INVALID_QUERY: Malformed search query (400)RATE_LIMIT_EXCEEDED: Too many requests (429)UNAUTHORIZED: Invalid API key (401)SERVER_ERROR: Internal server error (500)
Advanced Techniques
Nested Field Queries
Query nested objects:
# Drug adverse events where serious outcome is death
params = {
"search": "serious:1+AND+seriousnessdeath:1"
}
Multiple Field Search
Search across multiple fields:
# Search drug name in multiple fields
params = {
"search": "(patient.drug.medicinalproduct:aspirin+OR+patient.drug.openfda.brand_name:aspirin)"
}
Complex Boolean Logic
Combine multiple operators:
# (Aspirin OR Ibuprofen) AND (Heart Attack) AND NOT (US)
params = {
"search": "(patient.drug.medicinalproduct:aspirin+OR+patient.drug.medicinalproduct:ibuprofen)+AND+patient.reaction.reactionmeddrapt:*heart*attack*+AND+NOT+occurcountry:us"
}
Counting with Filters
Count within a specific subset:
# Count reactions for serious events only
params = {
"search": "serious:1",
"count": "patient.reaction.reactionmeddrapt.exact"
}
Best Practices
1. Query Efficiency
DO:
- Use specific field searches
- Filter before counting
- Use exact match when possible
- Implement pagination for large datasets
DON'T:
- Use overly broad wildcards (e.g.,
search=*) - Request more data than needed
- Skip error handling
- Ignore rate limits
2. Error Handling
Always handle common errors:
def safe_api_call(url, params):
"""Safely call FDA API with comprehensive error handling."""
try:
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
return response.json()
except requests.exceptions.HTTPError as e:
if response.status_code == 404:
return {"error": "No results found"}
elif response.status_code == 429:
return {"error": "Rate limit exceeded"}
elif response.status_code == 400:
return {"error": "Invalid query"}
else:
return {"error": f"HTTP error: {e}"}
except requests.exceptions.ConnectionError:
return {"error": "Connection failed"}
except requests.exceptions.Timeout:
return {"error": "Request timeout"}
except requests.exceptions.RequestException as e:
return {"error": f"Request error: {e}"}
3. Data Validation
Validate and clean data:
def clean_search_term(term):
"""Clean and prepare search term."""
# Remove special characters that break queries
term = term.replace('"', '\\"') # Escape quotes
term = term.strip()
return term
def validate_date(date_str):
"""Validate date format (YYYYMMDD)."""
import re
if not re.match(r'^\d{8}$', date_str):
raise ValueError("Date must be in YYYYMMDD format")
return date_str
4. Caching
Implement caching for frequently accessed data:
import json
from pathlib import Path
import hashlib
import time
class FDACache:
"""Simple file-based cache for FDA API responses."""
def __init__(self, cache_dir="fda_cache", ttl=3600):
self.cache_dir = Path(cache_dir)
self.cache_dir.mkdir(exist_ok=True)
self.ttl = ttl # Time to live in seconds
def _get_cache_key(self, url, params):
"""Generate cache key from URL and params."""
cache_string = f"{url}_{json.dumps(params, sort_keys=True)}"
return hashlib.md5(cache_string.encode()).hexdigest()
def get(self, url, params):
"""Get cached response if available and not expired."""
key = self._get_cache_key(url, params)
cache_file = self.cache_dir / f"{key}.json"
if cache_file.exists():
# Check if expired
age = time.time() - cache_file.stat().st_mtime
if age < self.ttl:
with open(cache_file, 'r') as f:
return json.load(f)
return None
def set(self, url, params, data):
"""Cache response data."""
key = self._get_cache_key(url, params)
cache_file = self.cache_dir / f"{key}.json"
with open(cache_file, 'w') as f:
json.dump(data, f)
# Usage
cache = FDACache(ttl=3600) # 1 hour cache
def cached_api_call(url, params):
"""API call with caching."""
# Check cache
cached = cache.get(url, params)
if cached:
return cached
# Make request
response = requests.get(url, params=params)
data = response.json()
# Cache result
cache.set(url, params, data)
return data
5. Rate Limit Management
Track and respect rate limits:
import time
from collections import deque
class RateLimiter:
"""Track and enforce rate limits."""
def __init__(self, max_per_minute=240):
self.max_per_minute = max_per_minute
self.requests = deque()
def wait_if_needed(self):
"""Wait if necessary to stay under rate limit."""
now = time.time()
# Remove requests older than 1 minute
while self.requests and now - self.requests[0] > 60:
self.requests.popleft()
# Check if at limit
if len(self.requests) >= self.max_per_minute:
sleep_time = 60 - (now - self.requests[0])
if sleep_time > 0:
time.sleep(sleep_time)
self.requests.popleft()
self.requests.append(time.time())
# Usage
rate_limiter = RateLimiter(max_per_minute=240)
def rate_limited_request(url, params):
"""Make request with rate limiting."""
rate_limiter.wait_if_needed()
return requests.get(url, params=params)
Common Query Patterns
Pattern 1: Time-based Analysis
# Get events from last 30 days
from datetime import datetime, timedelta
end_date = datetime.now()
start_date = end_date - timedelta(days=30)
params = {
"search": f"receivedate:[{start_date.strftime('%Y%m%d')}+TO+{end_date.strftime('%Y%m%d')}]",
"limit": 1000
}
Pattern 2: Top N Analysis
# Get top 10 most common reactions for a drug
params = {
"search": "patient.drug.medicinalproduct:aspirin",
"count": "patient.reaction.reactionmeddrapt.exact",
"limit": 10
}
Pattern 3: Comparative Analysis
# Compare two drugs
drugs = ["aspirin", "ibuprofen"]
results = {}
for drug in drugs:
params = {
"search": f"patient.drug.medicinalproduct:{drug}",
"count": "patient.reaction.reactionmeddrapt.exact",
"limit": 10
}
results[drug] = requests.get(url, params=params).json()
Additional Resources
- openFDA Homepage: https://open.fda.gov/
- API Documentation: https://open.fda.gov/apis/
- Interactive API Explorer: https://open.fda.gov/apis/try-the-api/
- Terms of Service: https://open.fda.gov/terms/
- GitHub: https://github.com/FDA/openfda
- Status Page: Check for API outages and maintenance
Support
For questions or issues:
- GitHub Issues: https://github.com/FDA/openfda/issues
- Email: open-fda@fda.hhs.gov
- Discussion Forum: Check GitHub discussions