# OpenFDA API Basics

This reference covers the essentials of using the openFDA API: authentication, rate limits, query syntax, and best practices.

## Getting Started

### Base URL

All openFDA API endpoints follow this structure:

```
https://api.fda.gov/{category}/{endpoint}.json
```

Examples:
- `https://api.fda.gov/drug/event.json`
- `https://api.fda.gov/device/510k.json`
- `https://api.fda.gov/food/enforcement.json`

### HTTPS Required

**All requests must use HTTPS**. HTTP requests are not accepted and will fail.

## Authentication

### API Key Registration

While openFDA can be used without an API key, registering for a free API key is strongly recommended because it raises the daily request limit.

**Registration**: Visit https://open.fda.gov/apis/authentication/ to sign up

**Benefits of API Key**:
- Higher daily limit (120,000 req/day instead of 1,000; the per-minute limit stays at 240)
- Better for production applications
- No additional cost

### Using Your API Key

Include your API key in requests using one of two methods:

**Method 1: Query Parameter (Recommended)**

```python
import requests

api_key = "YOUR_API_KEY_HERE"
url = "https://api.fda.gov/drug/event.json"

params = {
    "api_key": api_key,
    "search": "patient.drug.medicinalproduct:aspirin",
    "limit": 10
}

response = requests.get(url, params=params)
```

**Method 2: Basic Authentication**

```python
import requests

api_key = "YOUR_API_KEY_HERE"
url = "https://api.fda.gov/drug/event.json"

params = {
    "search": "patient.drug.medicinalproduct:aspirin",
    "limit": 10
}

# The API key is sent as the username; the password is left empty
response = requests.get(url, params=params, auth=(api_key, ''))
```

## Rate Limits

### Current Limits

| Status | Requests per Minute | Requests per Day |
|--------|-------------------|------------------|
| **Without API Key** | 240 per IP address | 1,000 per IP address |
| **With API Key** | 240 per key | 120,000 per key |

### Rate Limit Headers

The API returns rate limit information in response headers:

```python
response = requests.get(url, params=params)

# .get() returns None for any header the server did not send
print(f"Rate limit: {response.headers.get('X-RateLimit-Limit')}")
print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
print(f"Reset time: {response.headers.get('X-RateLimit-Reset')}")
```

### Handling Rate Limits

When you exceed rate limits, the API returns:
- **Status Code**: `429 Too Many Requests`
- **Error Message**: Indicates rate limit exceeded

**Best Practice**: Implement exponential backoff:

```python
import time

import requests


def query_with_rate_limit_handling(url, params, max_retries=3):
    """Query API with automatic rate limit handling."""
    for attempt in range(max_retries):
        try:
            response = requests.get(url, params=params, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.HTTPError as e:
            if e.response is not None and e.response.status_code == 429:
                # Rate limit exceeded: back off for 60s, 120s, 240s, ...
                wait_time = (2 ** attempt) * 60
                print(f"Rate limit hit. Waiting {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                raise
    raise Exception("Max retries exceeded")
```

### Increasing Limits

For applications requiring higher limits, contact the openFDA team through their website with details about your use case.

## Query Syntax

### Basic Structure

Queries use this format:

```
?api_key=YOUR_KEY&parameter=value&parameter2=value2
```

Parameters are separated by ampersands (`&`).

### Search Parameter

The `search` parameter is the primary way to filter results.
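
To make the query string structure concrete, here is a minimal sketch using only Python's standard library. It shows how a dictionary of parameters becomes a string joined by `&`, and how spaces and literal `+` characters in a `search` value are encoded. The key and search values are placeholders, and `build_query_url` is a hypothetical helper written for this illustration, not part of openFDA or `requests`.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.fda.gov/drug/event.json"


def build_query_url(base_url, params):
    """Join parameters with '&' and percent-encode their values.

    Hypothetical helper for illustration only.
    """
    return f"{base_url}?{urlencode(params)}"


url = build_query_url(BASE_URL, {
    "api_key": "YOUR_KEY",  # placeholder, not a real key
    "search": "patient.drug.medicinalproduct:aspirin",
    "limit": 10,
})
print(url)

# A space in a value is encoded as '+', while a literal '+' becomes '%2B'.
print(urlencode({"search": "a b"}))  # search=a+b
print(urlencode({"search": "a+b"}))  # search=a%2Bb
```

`requests` applies the same form encoding to a `params` dictionary, so a `search` value written with spaces reaches the server as `+`-separated terms.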
**Basic Format**:

```
search=field:value
```

**Example**:

```python
params = {
    "api_key": api_key,
    "search": "patient.drug.medicinalproduct:aspirin"
}
```

### Search Operators

The examples in this section write `AND`, `OR`, `NOT`, and `TO` with spaces because the values are passed through the `params` argument of `requests`, which encodes each space as `+` in the URL. A literal `+` in the value would be sent as `%2B` instead and would not act as a separator.

#### AND Operator

Combines multiple conditions (both must be true):

```python
# Find aspirin adverse events in Canada
params = {
    "search": "patient.drug.medicinalproduct:aspirin AND occurcountry:ca"
}
```

#### OR Operator

Either condition can be true (OR is implicit with space):

```python
# Find aspirin OR ibuprofen
params = {
    "search": "patient.drug.medicinalproduct:(aspirin ibuprofen)"
}
```

Or explicitly:

```python
params = {
    "search": "patient.drug.medicinalproduct:aspirin OR patient.drug.medicinalproduct:ibuprofen"
}
```

#### NOT Operator

Exclude results:

```python
# Events NOT in the United States
params = {
    "search": "_exists_:occurcountry AND NOT occurcountry:us"
}
```

#### Wildcards

Use asterisk (`*`) for partial matching:

```python
# Any drug starting with "met"
params = {
    "search": "patient.drug.medicinalproduct:met*"
}

# Any drug containing "cillin"
params = {
    "search": "patient.drug.medicinalproduct:*cillin*"
}
```

#### Exact Phrase Matching

Use quotes for exact phrases:

```python
params = {
    "search": 'patient.reaction.reactionmeddrapt:"heart attack"'
}
```

#### Range Queries

Search within ranges:

```python
# Date range (YYYYMMDD format)
params = {
    "search": "receivedate:[20200101 TO 20201231]"
}

# Numeric range
params = {
    "search": "patient.patientonsetage:[18 TO 65]"
}

# Open-ended ranges
params = {
    "search": "patient.patientonsetage:[65 TO *]"  # 65 and older
}
```

#### Field Existence

Check if a field exists:

```python
# Records that have a patient age
params = {
    "search": "_exists_:patient.patientonsetage"
}

# Records missing patient age
params = {
    "search": "_missing_:patient.patientonsetage"
}
```

### Limit Parameter

Controls how many results to return (1-1000, default 1):

```python
params = {
    "search": "...",
    "limit": 100
}
```

**Maximum**: 1000 results per request

### Skip Parameter

For pagination, skip the first N results:

```python
# Get results 101-200
params = {
    "search": "...",
    "limit": 100,
    "skip": 100
}
```

**Pagination Example**:

```python
import time

import requests


def get_all_results(url, search_query, api_key, max_results=5000):
    """Retrieve results with pagination."""
    all_results = []
    skip = 0
    limit = 100

    while len(all_results) < max_results:
        params = {
            "api_key": api_key,
            "search": search_query,
            "limit": limit,
            "skip": skip
        }

        response = requests.get(url, params=params, timeout=30)
        data = response.json()

        # Error bodies (e.g. no matches) carry no "results" key
        if "results" not in data or len(data["results"]) == 0:
            break

        all_results.extend(data["results"])

        if len(data["results"]) < limit:
            break  # No more results

        skip += limit
        time.sleep(0.25)  # Rate limiting courtesy

    return all_results[:max_results]
```

### Count Parameter

Aggregate and count results by a field (instead of returning individual records):

```python
# Count events by country
params = {
    "search": "patient.drug.medicinalproduct:aspirin",
    "count": "occurcountry"
}
```

**Response Format**:

```json
{
  "results": [
    {"term": "us", "count": 12543},
    {"term": "ca", "count": 3421},
    {"term": "gb", "count": 2156}
  ]
}
```

#### Exact Counting

Add the `.exact` suffix for exact phrase counting (especially important for multi-word fields):

```python
# Count exact reaction terms (not individual words)
params = {
    "search": "patient.drug.medicinalproduct:aspirin",
    "count": "patient.reaction.reactionmeddrapt.exact"
}
```

**Without `.exact`**: Counts individual words

**With `.exact`**: Counts complete phrases

### Sort Parameter

Sort results by field:

```python
# Sort by date, newest first
params = {
    "search": "...",
    "sort": "receivedate:desc"
}

# Sort by date, oldest first
params = {
    "search": "...",
    "sort": "receivedate:asc"
}
```

## Response Format

### Standard Response Structure

```json
{
  "meta": {
    "disclaimer": "...",
    "terms": "...",
    "license": "...",
    "last_updated": "2024-01-15",
    "results": {
      "skip": 0,
      "limit": 10,
      "total": 15234
    }
  },
  "results": [
    {
      // Individual result record
    },
    {
      // Another result record
    }
  ]
}
```

### Response Fields

- **meta**: Metadata about the query and results
  - `disclaimer`: Important legal disclaimer
  - `terms`: Terms of use URL
  - `license`: Data license information
  - `last_updated`: When data was last updated
  - `results.skip`: Number of skipped results
  - `results.limit`: Maximum results per page
  - `results.total`: Total matching results (may be approximate for large result sets)
- **results**: Array of matching records

### Empty Results

When no results match:

```json
{
  "meta": {...},
  "results": []
}
```

In practice, a query with no matches usually comes back as a 404 with a `NOT_FOUND` error body (see the next section), so client code should handle both cases.

### Error Response

When an error occurs:

```json
{
  "error": {
    "code": "INVALID_QUERY",
    "message": "Detailed error message"
  }
}
```

**Common Error Codes**:
- `NOT_FOUND`: No results found (404)
- `INVALID_QUERY`: Malformed search query (400)
- `RATE_LIMIT_EXCEEDED`: Too many requests (429)
- `UNAUTHORIZED`: Invalid API key (401)
- `SERVER_ERROR`: Internal server error (500)

The exact `code` strings may differ from those listed here, so branch on the HTTP status code rather than on the string.

## Advanced Techniques

### Nested Field Queries

Query nested objects:

```python
# Drug adverse events where serious outcome is death
params = {
    "search": "serious:1 AND seriousnessdeath:1"
}
```

### Multiple Field Search

Search across multiple fields:

```python
# Search drug name in multiple fields
params = {
    "search": "(patient.drug.medicinalproduct:aspirin OR patient.drug.openfda.brand_name:aspirin)"
}
```

### Complex Boolean Logic

Combine multiple operators:

```python
# (Aspirin OR Ibuprofen) AND (Heart Attack) AND NOT (US)
params = {
    "search": "(patient.drug.medicinalproduct:aspirin OR patient.drug.medicinalproduct:ibuprofen) AND patient.reaction.reactionmeddrapt:*heart*attack* AND NOT occurcountry:us"
}
```

### Counting with Filters

Count within a specific subset:

```python
# Count reactions for serious events only
params = {
    "search": "serious:1",
    "count": "patient.reaction.reactionmeddrapt.exact"
}
```

## Best Practices

### 1. Query Efficiency

**DO**:
- Use specific field searches
- Filter before counting
- Use exact match when possible
- Implement pagination for large datasets

**DON'T**:
- Use overly broad wildcards (e.g., `search=*`)
- Request more data than needed
- Skip error handling
- Ignore rate limits

### 2. Error Handling

Always handle common errors:

```python
import requests


def safe_api_call(url, params):
    """Safely call FDA API with comprehensive error handling."""
    try:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        # e.response is the response that triggered raise_for_status()
        status = e.response.status_code
        if status == 404:
            return {"error": "No results found"}
        elif status == 429:
            return {"error": "Rate limit exceeded"}
        elif status == 400:
            return {"error": "Invalid query"}
        else:
            return {"error": f"HTTP error: {e}"}
    except requests.exceptions.ConnectionError:
        return {"error": "Connection failed"}
    except requests.exceptions.Timeout:
        return {"error": "Request timeout"}
    except requests.exceptions.RequestException as e:
        return {"error": f"Request error: {e}"}
```

### 3. Data Validation

Validate and clean data:

```python
import re


def clean_search_term(term):
    """Clean and prepare search term."""
    # Escape double quotes so they do not terminate a quoted phrase
    term = term.replace('"', '\\"')
    term = term.strip()
    return term


def validate_date(date_str):
    """Validate date format (YYYYMMDD). Checks the shape only, not the calendar."""
    if not re.match(r'^\d{8}$', date_str):
        raise ValueError("Date must be in YYYYMMDD format")
    return date_str
```

### 4. Caching

Implement caching for frequently accessed data:

```python
import hashlib
import json
import time
from pathlib import Path

import requests


class FDACache:
    """Simple file-based cache for FDA API responses."""

    def __init__(self, cache_dir="fda_cache", ttl=3600):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(exist_ok=True)
        self.ttl = ttl  # Time to live in seconds

    def _get_cache_key(self, url, params):
        """Generate cache key from URL and params."""
        cache_string = f"{url}_{json.dumps(params, sort_keys=True)}"
        return hashlib.md5(cache_string.encode()).hexdigest()

    def get(self, url, params):
        """Get cached response if available and not expired."""
        key = self._get_cache_key(url, params)
        cache_file = self.cache_dir / f"{key}.json"

        if cache_file.exists():
            # Check if expired
            age = time.time() - cache_file.stat().st_mtime
            if age < self.ttl:
                with open(cache_file, 'r') as f:
                    return json.load(f)

        return None

    def set(self, url, params, data):
        """Cache response data."""
        key = self._get_cache_key(url, params)
        cache_file = self.cache_dir / f"{key}.json"

        with open(cache_file, 'w') as f:
            json.dump(data, f)


# Usage
cache = FDACache(ttl=3600)  # 1 hour cache


def cached_api_call(url, params):
    """API call with caching."""
    # Check cache; compare against None so an empty payload still counts as a hit
    cached = cache.get(url, params)
    if cached is not None:
        return cached

    # Make request; raise on HTTP errors so error bodies are never cached
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    data = response.json()

    # Cache result
    cache.set(url, params, data)

    return data
```

### 5. Rate Limit Management

Track and respect rate limits:

```python
import time
from collections import deque

import requests


class RateLimiter:
    """Track and enforce rate limits."""

    def __init__(self, max_per_minute=240):
        self.max_per_minute = max_per_minute
        self.requests = deque()

    def wait_if_needed(self):
        """Wait if necessary to stay under rate limit."""
        now = time.time()

        # Remove requests older than 1 minute
        while self.requests and now - self.requests[0] > 60:
            self.requests.popleft()

        # Check if at limit
        if len(self.requests) >= self.max_per_minute:
            sleep_time = 60 - (now - self.requests[0])
            if sleep_time > 0:
                time.sleep(sleep_time)
            self.requests.popleft()

        self.requests.append(time.time())


# Usage
rate_limiter = RateLimiter(max_per_minute=240)


def rate_limited_request(url, params):
    """Make request with rate limiting."""
    rate_limiter.wait_if_needed()
    return requests.get(url, params=params, timeout=30)
```

## Common Query Patterns

### Pattern 1: Time-based Analysis

```python
# Get events from last 30 days
from datetime import datetime, timedelta

end_date = datetime.now()
start_date = end_date - timedelta(days=30)

params = {
    "search": f"receivedate:[{start_date.strftime('%Y%m%d')} TO {end_date.strftime('%Y%m%d')}]",
    "limit": 1000
}
```

### Pattern 2: Top N Analysis

```python
# Get top 10 most common reactions for a drug
params = {
    "search": "patient.drug.medicinalproduct:aspirin",
    "count": "patient.reaction.reactionmeddrapt.exact",
    "limit": 10
}
```

### Pattern 3: Comparative Analysis

```python
import requests

# Compare two drugs
url = "https://api.fda.gov/drug/event.json"
drugs = ["aspirin", "ibuprofen"]
results = {}

for drug in drugs:
    params = {
        "search": f"patient.drug.medicinalproduct:{drug}",
        "count": "patient.reaction.reactionmeddrapt.exact",
        "limit": 10
    }
    results[drug] = requests.get(url, params=params, timeout=30).json()
```

## Additional Resources

- **openFDA Homepage**: https://open.fda.gov/
- **API Documentation**: https://open.fda.gov/apis/
- **Interactive API Explorer**: https://open.fda.gov/apis/try-the-api/
- **Terms of Service**: https://open.fda.gov/terms/
- **GitHub**: https://github.com/FDA/openfda
- **Status Page**: Check for API outages and maintenance

## Support

For questions or issues:
- **GitHub Issues**: https://github.com/FDA/openfda/issues
- **Email**: open-fda@fda.hhs.gov
- **Discussion Forum**: Check GitHub discussions
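
## Example: Composing a Search String

As a closing illustration of the query syntax described in this reference, the sketch below assembles a `search` string from separate conditions, optional exclusions, and an optional `receivedate` range. `build_search` is a hypothetical helper written for this example, not part of openFDA or any library. It joins terms with spaces on the assumption that the HTTP client URL-encodes the value before sending it.

```python
from datetime import date


def build_search(conditions, exclude=None, date_range=None):
    """Combine field:value conditions into one search string.

    conditions: list of "field:value" strings, joined with AND.
    exclude:    optional list of "field:value" strings, each added as AND NOT.
    date_range: optional (start, end) tuple of date objects for receivedate.
    """
    if not conditions:
        raise ValueError("At least one condition is required")

    parts = list(conditions)
    if date_range is not None:
        start, end = date_range
        if start > end:
            raise ValueError("Start date must not be after end date")
        # Dates are formatted as YYYYMMDD, the format used for range queries
        parts.append(f"receivedate:[{start:%Y%m%d} TO {end:%Y%m%d}]")

    search = " AND ".join(parts)
    for term in exclude or []:
        search += f" AND NOT {term}"
    return search


search = build_search(
    ["patient.drug.medicinalproduct:aspirin", "serious:1"],
    exclude=["occurcountry:us"],
    date_range=(date(2020, 1, 1), date(2020, 12, 31)),
)
params = {"search": search, "limit": 100}
print(params["search"])
```

Keeping the conditions as a list until the last step makes it easier to validate each piece (for example, the date order) before any request is sent.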