Zhongwei Li f0bd18fb4e Initial commit

2025-11-30 08:30:10 +08:00

OpenFDA API Basics

This reference covers how to use the openFDA API: authentication, rate limits, query syntax, response format, and best practices.

Getting Started

Base URL

All openFDA API endpoints follow this structure:

https://api.fda.gov/{category}/{endpoint}.json

Examples:

https://api.fda.gov/drug/event.json
https://api.fda.gov/device/510k.json
https://api.fda.gov/food/enforcement.json
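
The URL pattern above can be captured in a small helper. This is a minimal sketch; the function name build_endpoint_url is hypothetical and not part of any openFDA client library.

```python
def build_endpoint_url(category: str, endpoint: str) -> str:
    """Return the openFDA URL for a category/endpoint pair."""
    base = "https://api.fda.gov"
    # Strip stray slashes so "drug/" and "/event" still produce a clean URL
    return f"{base}/{category.strip('/')}/{endpoint.strip('/')}.json"


print(build_endpoint_url("drug", "event"))
print(build_endpoint_url("device", "510k"))
```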

HTTPS Required

All requests must use HTTPS. HTTP requests are not accepted and will fail.

Authentication

API Key Registration

openFDA can be used without an API key, but registering for a free key is strongly recommended because it raises the daily request limit.

Registration: Visit https://open.fda.gov/apis/authentication/ to sign up

Benefits of API Key:

Higher daily limit (120,000 req/day per key instead of 1,000 per IP address; the per-minute limit stays at 240)
Better for production applications
No additional cost

Using Your API Key

Include your API key in requests using one of two methods:

Method 1: Query Parameter (Recommended)

import requests

api_key = "YOUR_API_KEY_HERE"
url = "https://api.fda.gov/drug/event.json"

params = {
    "api_key": api_key,
    "search": "patient.drug.medicinalproduct:aspirin",
    "limit": 10
}

response = requests.get(url, params=params)

Method 2: Basic Authentication

import requests

api_key = "YOUR_API_KEY_HERE"
url = "https://api.fda.gov/drug/event.json"

params = {
    "search": "patient.drug.medicinalproduct:aspirin",
    "limit": 10
}

response = requests.get(url, params=params, auth=(api_key, ''))
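
To keep the key out of source code, one option is to read it from the environment and add it to the query parameters only when it is set. The sketch below assumes an environment variable named OPENFDA_API_KEY; both that name and the helper with_api_key are illustrative choices, not openFDA conventions.

```python
import os


def with_api_key(params: dict, env_var: str = "OPENFDA_API_KEY") -> dict:
    """Return a copy of params, adding api_key only if the variable is set."""
    merged = dict(params)  # Copy so the caller's dict is not mutated
    key = os.environ.get(env_var)
    if key:
        merged["api_key"] = key
    return merged


params = with_api_key({
    "search": "patient.drug.medicinalproduct:aspirin",
    "limit": 10,
})
```

When the variable is unset, the request simply goes out without a key and falls under the keyless limits.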

Rate Limits

Current Limits

Status	Requests per Minute	Requests per Day
Without API Key	240 per IP address	1,000 per IP address
With API Key	240 per key	120,000 per key

Rate Limit Headers

The API returns rate limit information in response headers:

response = requests.get(url, params=params)

print(f"Rate limit: {response.headers.get('X-RateLimit-Limit')}")
print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
print(f"Reset time: {response.headers.get('X-RateLimit-Reset')}")

Handling Rate Limits

When you exceed rate limits, the API returns:

Status Code: 429 Too Many Requests
Error Message: Indicates rate limit exceeded

Best Practice: Implement exponential backoff:

import requests
import time

def query_with_rate_limit_handling(url, params, max_retries=3):
    """Query API, retrying with exponential backoff on HTTP 429."""
    for attempt in range(max_retries):
        try:
            response = requests.get(url, params=params, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.HTTPError as e:
            # Only rate-limit errors are retried; anything else propagates
            if e.response is None or e.response.status_code != 429:
                raise
            if attempt == max_retries - 1:
                break  # No retries left, so skip the final sleep
            wait_time = (2 ** attempt) * 60  # 60s, 120s, 240s, ...
            print(f"Rate limit hit. Waiting {wait_time} seconds...")
            time.sleep(wait_time)
    raise RuntimeError("Max retries exceeded")

Increasing Limits

For applications requiring higher limits, contact the openFDA team through their website with details about your use case.

Query Syntax

Basic Structure

Queries use this format:

?api_key=YOUR_KEY&parameter=value&parameter2=value2

Parameters are separated by ampersands (&).
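
In a hand-written URL, spaces inside a search expression are written as +. When a params dict is URL-encoded instead, a space becomes + and a literal + becomes %2B, which no longer means "space". The sketch below uses the standard library's urlencode to show the difference; requests is understood to apply equivalent encoding to a params dict, so search strings passed that way should contain literal spaces.

```python
from urllib.parse import urlencode

# Literal spaces: encoded as "+", which the server reads back as spaces
spaced = urlencode({"search": "patient.drug.medicinalproduct:aspirin AND occurcountry:ca"})

# Literal "+" characters: encoded as "%2B", so the operators are no longer space-separated
plussed = urlencode({"search": "patient.drug.medicinalproduct:aspirin+AND+occurcountry:ca"})

print(spaced)   # search=patient.drug.medicinalproduct%3Aaspirin+AND+occurcountry%3Aca
print(plussed)  # search=patient.drug.medicinalproduct%3Aaspirin%2BAND%2Boccurcountry%3Aca
```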

Search Parameter

The search parameter is the primary way to filter results.

Basic Format:

search=field:value

Example:

params = {
    "api_key": api_key,
    "search": "patient.drug.medicinalproduct:aspirin"
}

Search Operators

AND Operator

Combines multiple conditions (both must be true):

# Find aspirin adverse events in Canada.
# Use literal spaces here: requests encodes a space as "+" and a literal "+" as "%2B".
params = {
    "search": "patient.drug.medicinalproduct:aspirin AND occurcountry:ca"
}

OR Operator

Either condition can be true. Terms separated by a space are combined with an implicit OR:

# Find aspirin OR ibuprofen
params = {
    "search": "patient.drug.medicinalproduct:(aspirin ibuprofen)"
}

Or explicitly:

params = {
    "search": "patient.drug.medicinalproduct:aspirin OR patient.drug.medicinalproduct:ibuprofen"
}

NOT Operator

Exclude results:

# Events NOT in the United States
params = {
    "search": "_exists_:occurcountry AND NOT occurcountry:us"
}

Wildcards

Use asterisk (*) for partial matching:

# Any drug starting with "met"
params = {
    "search": "patient.drug.medicinalproduct:met*"
}

# Any drug containing "cillin"
params = {
    "search": "patient.drug.medicinalproduct:*cillin*"
}

Exact Phrase Matching

Use quotes for exact phrases:

params = {
    "search": 'patient.reaction.reactionmeddrapt:"heart attack"'
}

Range Queries

Search within ranges:

# Date range (YYYYMMDD format)
params = {
    "search": "receivedate:[20200101 TO 20201231]"
}

# Numeric range
params = {
    "search": "patient.patientonsetage:[18 TO 65]"
}

# Open-ended ranges
params = {
    "search": "patient.patientonsetage:[65 TO *]"  # 65 and older
}
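
Date ranges are easy to get wrong by hand, so a small helper can format them. This is a sketch only: date_range_clause is a hypothetical name, and it writes literal spaces around TO on the assumption that the string is passed through a requests params dict, which encodes each space as +.

```python
from datetime import date


def date_range_clause(field: str, start: date, end: date) -> str:
    """Build an inclusive range clause such as field:[20200101 TO 20201231]."""
    if start > end:
        raise ValueError("start must not be after end")
    # date objects accept strftime-style format specs inside f-strings
    return f"{field}:[{start:%Y%m%d} TO {end:%Y%m%d}]"


params = {
    "search": date_range_clause("receivedate", date(2020, 1, 1), date(2020, 12, 31)),
}
print(params["search"])  # receivedate:[20200101 TO 20201231]
```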

Field Existence

Check if a field exists:

# Records that have a patient age
params = {
    "search": "_exists_:patient.patientonsetage"
}

# Records missing patient age
params = {
    "search": "_missing_:patient.patientonsetage"
}

Limit Parameter

Controls how many results to return (1-1000, default 1):

params = {
    "search": "...",
    "limit": 100
}

Maximum: 1000 results per request

Skip Parameter

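For pagination, skip the first N results. openFDA caps skip at 25,000, so deeper result sets need narrower searches (for example, splitting by date range):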

# Get results 101-200
params = {
    "search": "...",
    "limit": 100,
    "skip": 100
}

Pagination Example:

import requests
import time

def get_all_results(url, search_query, api_key, max_results=5000):
    """Retrieve results with pagination."""
    all_results = []
    skip = 0
    limit = 100

    while len(all_results) < max_results:
        params = {
            "api_key": api_key,
            "search": search_query,
            "limit": limit,
            "skip": skip
        }

        response = requests.get(url, params=params, timeout=30)
        if response.status_code == 404:
            break  # "No results found" is reported as a 404
        response.raise_for_status()  # Surface other HTTP errors instead of parsing them
        data = response.json()

        if "results" not in data or len(data["results"]) == 0:
            break

        all_results.extend(data["results"])

        if len(data["results"]) < limit:
            break  # No more results

        skip += limit
        time.sleep(0.25)  # Rate limiting courtesy

    return all_results[:max_results]
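
When the total number of matches is already known (for instance from meta.results.total in a first response), the skip and limit values for every page can be planned up front. The generator below is a sketch of that arithmetic; page_windows is a hypothetical helper and makes no network calls.

```python
def page_windows(total: int, page_size: int = 100, max_results: int = 5000):
    """Yield (skip, limit) pairs covering up to max_results of total records."""
    wanted = min(total, max_results)
    for skip in range(0, wanted, page_size):
        # The last page may be shorter than page_size
        yield skip, min(page_size, wanted - skip)


print(list(page_windows(250)))  # [(0, 100), (100, 100), (200, 50)]
```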

Count Parameter

Aggregate and count results by a field (instead of returning individual records):

# Count events by country
params = {
    "search": "patient.drug.medicinalproduct:aspirin",
    "count": "occurcountry"
}

Response Format:

{
  "results": [
    {"term": "us", "count": 12543},
    {"term": "ca", "count": 3421},
    {"term": "gb", "count": 2156}
  ]
}
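
A count response in this shape can be reduced to (term, count) pairs for further processing. The sketch below reuses the sample response shown above; top_terms is a hypothetical helper name.

```python
def top_terms(count_response: dict, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent (term, count) pairs from a count response."""
    rows = count_response.get("results", [])
    pairs = [(row["term"], row["count"]) for row in rows]
    # Sort explicitly rather than relying on the order the API returned
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


sample = {
    "results": [
        {"term": "us", "count": 12543},
        {"term": "ca", "count": 3421},
        {"term": "gb", "count": 2156},
    ]
}
print(top_terms(sample, n=2))  # [('us', 12543), ('ca', 3421)]
```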

Exact Counting

Add .exact suffix for exact phrase counting (especially important for multi-word fields):

# Count exact reaction terms (not individual words)
params = {
    "search": "patient.drug.medicinalproduct:aspirin",
    "count": "patient.reaction.reactionmeddrapt.exact"
}

Without .exact: Counts individual words
With .exact: Counts complete phrases

Sort Parameter

Sort results by field:

# Sort by date, newest first
params = {
    "search": "...",
    "sort": "receivedate:desc"
}

# Sort by date, oldest first
params = {
    "search": "...",
    "sort": "receivedate:asc"
}

Response Format

Standard Response Structure

{
  "meta": {
    "disclaimer": "...",
    "terms": "...",
    "license": "...",
    "last_updated": "2024-01-15",
    "results": {
      "skip": 0,
      "limit": 10,
      "total": 15234
    }
  },
  "results": [
    {
      // Individual result record
    },
    {
      // Another result record
    }
  ]
}

Response Fields

meta: Metadata about the query and results
- disclaimer: Important legal disclaimer
- terms: Terms of use URL
- license: Data license information
- last_updated: When data was last updated
- results.skip: Number of skipped results
- results.limit: Maximum results per page
- results.total: Total matching results (may be approximate for large result sets)
results: Array of matching records
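
The paging fields under meta.results are enough to decide whether another page exists. A minimal sketch, assuming the response structure shown above; has_more_pages is a hypothetical helper.

```python
def has_more_pages(response: dict) -> bool:
    """Return True if records remain beyond the current page."""
    paging = response.get("meta", {}).get("results", {})
    skip = paging.get("skip", 0)
    limit = paging.get("limit", 0)
    total = paging.get("total", 0)
    return skip + limit < total


sample = {"meta": {"results": {"skip": 0, "limit": 10, "total": 15234}}, "results": []}
print(has_more_pages(sample))  # True
```

Because total may be approximate for large result sets, a check like this is best treated as a hint alongside the "short page" test rather than a guarantee.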

Empty Results

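When no results match, the results array is empty. A search with no matches can also surface as a 404 with the NOT_FOUND code listed under Common Error Codes, so handle both cases: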

{
  "meta": {...},
  "results": []
}

Error Response

When an error occurs:

{
  "error": {
    "code": "INVALID_QUERY",
    "message": "Detailed error message"
  }
}

Common Error Codes:

NOT_FOUND: No results found (404)
INVALID_QUERY: Malformed search query (400)
RATE_LIMIT_EXCEEDED: Too many requests (429)
UNAUTHORIZED: Invalid API key (401)
SERVER_ERROR: Internal server error (500)
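
An error payload in the shape shown above can be turned into a single log-friendly line. This is a sketch under the assumption that the body follows that structure; describe_error is a hypothetical helper, and it falls back to placeholders when a field is missing.

```python
def describe_error(status_code: int, payload: dict) -> str:
    """Summarise an error response as 'HTTP <status> [<code>]: <message>'."""
    error = payload.get("error", {})
    code = error.get("code", "UNKNOWN")
    message = error.get("message", "no message provided")
    return f"HTTP {status_code} [{code}]: {message}"


payload = {"error": {"code": "INVALID_QUERY", "message": "Detailed error message"}}
print(describe_error(400, payload))  # HTTP 400 [INVALID_QUERY]: Detailed error message
```

Branching on the HTTP status code rather than the code string keeps calling code robust if the exact strings differ from the list above.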

Advanced Techniques

Nested Field Queries

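Nested objects are addressed with dot notation (for example, patient.drug.medicinalproduct); top-level fields need no prefix:

# Drug adverse events where serious outcome is death
params = {
    "search": "serious:1 AND seriousnessdeath:1"
}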

Multiple Field Search

Search across multiple fields:

# Search drug name in multiple fields
params = {
    "search": "(patient.drug.medicinalproduct:aspirin OR patient.drug.openfda.brand_name:aspirin)"
}

Complex Boolean Logic

Combine multiple operators:

# (Aspirin OR Ibuprofen) AND (Heart Attack) AND NOT (US)
params = {
    "search": "(patient.drug.medicinalproduct:aspirin OR patient.drug.medicinalproduct:ibuprofen) AND patient.reaction.reactionmeddrapt:*heart*attack* AND NOT occurcountry:us"
}
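
Long boolean expressions are easier to read and maintain when assembled from small pieces. The helpers below (term, any_of, all_of, negate) are hypothetical names for a minimal sketch; they join clauses with literal spaces on the assumption that the result is passed through a requests params dict, and they quote multi-word values as exact phrases rather than using wildcards.

```python
def term(field: str, value: str) -> str:
    """Build field:value, quoting the value when it contains a space."""
    if " " in value:
        value = f'"{value}"'
    return f"{field}:{value}"


def any_of(*clauses: str) -> str:
    """Join clauses with OR, parenthesised so the group nests safely."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND."""
    return " AND ".join(clauses)


def negate(clause: str) -> str:
    """Prefix a clause with NOT."""
    return f"NOT {clause}"


query = all_of(
    any_of(
        term("patient.drug.medicinalproduct", "aspirin"),
        term("patient.drug.medicinalproduct", "ibuprofen"),
    ),
    term("patient.reaction.reactionmeddrapt", "heart attack"),
    negate(term("occurcountry", "us")),
)
params = {"search": query}
print(query)
```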

Counting with Filters

Count within a specific subset:

# Count reactions for serious events only
params = {
    "search": "serious:1",
    "count": "patient.reaction.reactionmeddrapt.exact"
}

Best Practices

1. Query Efficiency

DO:

Use specific field searches
Filter before counting
Use exact match when possible
Implement pagination for large datasets

DON'T:

Use overly broad wildcards (e.g., search=*)
Request more data than needed
Skip error handling
Ignore rate limits

2. Error Handling

Always handle common errors:

import requests

def safe_api_call(url, params):
    """Safely call FDA API with comprehensive error handling."""
    try:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        status = e.response.status_code  # Response attached to the raised error
        if status == 404:
            return {"error": "No results found"}
        elif status == 429:
            return {"error": "Rate limit exceeded"}
        elif status == 400:
            return {"error": "Invalid query"}
        else:
            return {"error": f"HTTP error: {e}"}
    # Timeout is checked first: a connect timeout is both a Timeout and a ConnectionError
    except requests.exceptions.Timeout:
        return {"error": "Request timeout"}
    except requests.exceptions.ConnectionError:
        return {"error": "Connection failed"}
    except requests.exceptions.RequestException as e:
        return {"error": f"Request error: {e}"}

3. Data Validation

Validate and clean data:

import re
from datetime import datetime

def clean_search_term(term):
    """Clean and prepare search term."""
    term = term.strip()
    # Escape double quotes so they cannot end a quoted phrase early
    return term.replace('"', '\\"')

def validate_date(date_str):
    """Validate date format (YYYYMMDD) and that it is a real calendar date."""
    if not re.fullmatch(r'[0-9]{8}', date_str):
        raise ValueError("Date must be in YYYYMMDD format")
    # Rejects impossible dates such as 20201345, which the regex alone accepts
    datetime.strptime(date_str, "%Y%m%d")
    return date_str

4. Caching

Implement caching for frequently accessed data:

import json
from pathlib import Path
import hashlib
import time

class FDACache:
    """Simple file-based cache for FDA API responses."""

    def __init__(self, cache_dir="fda_cache", ttl=3600):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(exist_ok=True)
        self.ttl = ttl  # Time to live in seconds

    def _get_cache_key(self, url, params):
        """Generate cache key from URL and params."""
        cache_string = f"{url}_{json.dumps(params, sort_keys=True)}"
        return hashlib.md5(cache_string.encode()).hexdigest()

    def get(self, url, params):
        """Get cached response if available and not expired."""
        key = self._get_cache_key(url, params)
        cache_file = self.cache_dir / f"{key}.json"

        if cache_file.exists():
            # Check if expired
            age = time.time() - cache_file.stat().st_mtime
            if age < self.ttl:
                with open(cache_file, 'r') as f:
                    return json.load(f)

        return None

    def set(self, url, params, data):
        """Cache response data."""
        key = self._get_cache_key(url, params)
        cache_file = self.cache_dir / f"{key}.json"

        with open(cache_file, 'w') as f:
            json.dump(data, f)

# Usage
cache = FDACache(ttl=3600)  # 1 hour cache

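def cached_api_call(url, params):
    """API call with caching."""
    # Check cache (compare with None so an empty cached payload still counts as a hit)
    cached = cache.get(url, params)
    if cached is not None:
        return cached

    # Make request
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # Do not cache error responses
    data = response.json()

    # Cache result
    cache.set(url, params, data)

    return data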

5. Rate Limit Management

Track and respect rate limits:

import time
from collections import deque

class RateLimiter:
    """Track and enforce rate limits."""

    def __init__(self, max_per_minute=240):
        self.max_per_minute = max_per_minute
        self.requests = deque()

    def wait_if_needed(self):
        """Wait if necessary to stay under rate limit."""
        now = time.time()

        # Remove requests older than 1 minute
        while self.requests and now - self.requests[0] > 60:
            self.requests.popleft()

        # Check if at limit
        if len(self.requests) >= self.max_per_minute:
            sleep_time = 60 - (now - self.requests[0])
            if sleep_time > 0:
                time.sleep(sleep_time)
            self.requests.popleft()

        self.requests.append(time.time())

# Usage
rate_limiter = RateLimiter(max_per_minute=240)

def rate_limited_request(url, params):
    """Make request with rate limiting."""
    rate_limiter.wait_if_needed()
    return requests.get(url, params=params)

Common Query Patterns

Pattern 1: Time-based Analysis

# Get events from the last 30 days
from datetime import datetime, timedelta

end_date = datetime.now()
start_date = end_date - timedelta(days=30)

params = {
    "search": f"receivedate:[{start_date.strftime('%Y%m%d')} TO {end_date.strftime('%Y%m%d')}]",
    "limit": 1000
}

Pattern 2: Top N Analysis

# Get top 10 most common reactions for a drug
params = {
    "search": "patient.drug.medicinalproduct:aspirin",
    "count": "patient.reaction.reactionmeddrapt.exact",
    "limit": 10
}

Pattern 3: Comparative Analysis

# Compare two drugs
drugs = ["aspirin", "ibuprofen"]
results = {}

for drug in drugs:
    params = {
        "search": f"patient.drug.medicinalproduct:{drug}",
        "count": "patient.reaction.reactionmeddrapt.exact",
        "limit": 10
    }
    results[drug] = requests.get(url, params=params).json()
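
Once both count responses are in hand, the comparison itself is a dictionary join on the term field. The sketch below works on the results arrays only; shared_terms is a hypothetical helper, and the sample numbers are made-up placeholders, not real FDA counts.

```python
def shared_terms(results_a: list[dict], results_b: list[dict]) -> dict[str, tuple[int, int]]:
    """Map each term present in both lists to its (count_a, count_b) pair."""
    counts_b = {row["term"]: row["count"] for row in results_b}
    return {
        row["term"]: (row["count"], counts_b[row["term"]])
        for row in results_a
        if row["term"] in counts_b
    }


# Placeholder data in the count-response shape (illustrative values only)
drug_a = [{"term": "NAUSEA", "count": 30}, {"term": "HEADACHE", "count": 20}]
drug_b = [{"term": "NAUSEA", "count": 12}, {"term": "DIZZINESS", "count": 7}]

print(shared_terms(drug_a, drug_b))  # {'NAUSEA': (30, 12)}
```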

Additional Resources

openFDA Homepage: https://open.fda.gov/
API Documentation: https://open.fda.gov/apis/
Interactive API Explorer: https://open.fda.gov/apis/try-the-api/
Terms of Service: https://open.fda.gov/terms/
GitHub: https://github.com/FDA/openfda
Status Page: Check for API outages and maintenance

Support

For questions or issues:

GitHub Issues: https://github.com/FDA/openfda/issues
Email: open-fda@fda.hhs.gov
Discussion Forum: Check GitHub discussions
