Initial commit

2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions
--- a/skills/fda-database/references/api_basics.md
+++ b/skills/fda-database/references/api_basics.md
@@ -0,0 +1,687 @@
+# OpenFDA API Basics
+
+This reference provides comprehensive information about using the openFDA API, including authentication, rate limits, query syntax, and best practices.
+
+## Getting Started
+
+### Base URL
+
+All openFDA API endpoints follow this structure:
+```
+https://api.fda.gov/{category}/{endpoint}.json
+```
+
+Examples:
+- `https://api.fda.gov/drug/event.json`
+- `https://api.fda.gov/device/510k.json`
+- `https://api.fda.gov/food/enforcement.json`
+
+### HTTPS Required
+
+**All requests must use HTTPS**. HTTP requests are not accepted and will fail.
+
+## Authentication
+
+### API Key Registration
+
+While openFDA can be used without an API key, registering for a free API key is strongly recommended for higher rate limits.
+
+**Registration**: Visit https://open.fda.gov/apis/authentication/ to sign up
+
+**Benefits of API Key**:
+- Higher rate limits (240 req/min, 120,000 req/day)
+- Better for production applications
+- No additional cost
+
+### Using Your API Key
+
+Include your API key in requests using one of two methods:
+
+**Method 1: Query Parameter (Recommended)**
+```python
+import requests
+
+api_key = "YOUR_API_KEY_HERE"
+url = "https://api.fda.gov/drug/event.json"
+
+params = {
+    "api_key": api_key,
+    "search": "patient.drug.medicinalproduct:aspirin",
+    "limit": 10
+}
+
+response = requests.get(url, params=params)
+```
+
+**Method 2: Basic Authentication**
+```python
+import requests
+
+api_key = "YOUR_API_KEY_HERE"
+url = "https://api.fda.gov/drug/event.json"
+
+params = {
+    "search": "patient.drug.medicinalproduct:aspirin",
+    "limit": 10
+}
+
+response = requests.get(url, params=params, auth=(api_key, ''))
+```
+
+## Rate Limits
+
+### Current Limits
+
+| Status | Requests per Minute | Requests per Day |
+|--------|-------------------|------------------|
+| **Without API Key** | 240 per IP address | 1,000 per IP address |
+| **With API Key** | 240 per key | 120,000 per key |
+
+### Rate Limit Headers
+
+The API returns rate limit information in response headers:
+```python
+response = requests.get(url, params=params)
+
+print(f"Rate limit: {response.headers.get('X-RateLimit-Limit')}")
+print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
+print(f"Reset time: {response.headers.get('X-RateLimit-Reset')}")
+```
+
+### Handling Rate Limits
+
+When you exceed rate limits, the API returns:
+- **Status Code**: `429 Too Many Requests`
+- **Error Message**: Indicates rate limit exceeded
+
+**Best Practice**: Implement exponential backoff:
+```python
+import requests
+import time
+
+def query_with_rate_limit_handling(url, params, max_retries=3):
+    """Query API with automatic rate limit handling."""
+    for attempt in range(max_retries):
+        try:
+            response = requests.get(url, params=params)
+            response.raise_for_status()
+            return response.json()
+        except requests.exceptions.HTTPError as e:
+            if response.status_code == 429:
+                # Rate limit exceeded
+                wait_time = (2 ** attempt) * 60  # Exponential backoff
+                print(f"Rate limit hit. Waiting {wait_time} seconds...")
+                time.sleep(wait_time)
+            else:
+                raise
+    raise Exception("Max retries exceeded")
+```
+
+### Increasing Limits
+
+For applications requiring higher limits, contact the openFDA team through their website with details about your use case.
+
+## Query Syntax
+
+### Basic Structure
+
+Queries use this format:
+```
+?api_key=YOUR_KEY&parameter=value&parameter2=value2
+```
+
+Parameters are separated by ampersands (`&`).
+
+### Search Parameter
+
+The `search` parameter is the primary way to filter results.
+
+**Basic Format**:
+```
+search=field:value
+```
+
+**Example**:
+```python
+params = {
+    "api_key": api_key,
+    "search": "patient.drug.medicinalproduct:aspirin"
+}
+```
+
+### Search Operators
+
+#### AND Operator
+Combines multiple conditions (both must be true):
+```python
+# Find aspirin adverse events in Canada
+params = {
+    "search": "patient.drug.medicinalproduct:aspirin+AND+occurcountry:ca"
+}
+```
+
+#### OR Operator
+Either condition can be true (OR is implicit with space):
+```python
+# Find aspirin OR ibuprofen
+params = {
+    "search": "patient.drug.medicinalproduct:(aspirin ibuprofen)"
+}
+```
+
+Or explicitly:
+```python
+params = {
+    "search": "patient.drug.medicinalproduct:aspirin+OR+patient.drug.medicinalproduct:ibuprofen"
+}
+```
+
+#### NOT Operator
+Exclude results:
+```python
+# Events NOT in the United States
+params = {
+    "search": "_exists_:occurcountry+AND+NOT+occurcountry:us"
+}
+```
+
+#### Wildcards
+Use asterisk (`*`) for partial matching:
+```python
+# Any drug starting with "met"
+params = {
+    "search": "patient.drug.medicinalproduct:met*"
+}
+
+# Any drug containing "cillin"
+params = {
+    "search": "patient.drug.medicinalproduct:*cillin*"
+}
+```
+
+#### Exact Phrase Matching
+Use quotes for exact phrases:
+```python
+params = {
+    "search": 'patient.reaction.reactionmeddrapt:"heart attack"'
+}
+```
+
+#### Range Queries
+Search within ranges:
+```python
+# Date range (YYYYMMDD format)
+params = {
+    "search": "receivedate:[20200101+TO+20201231]"
+}
+
+# Numeric range
+params = {
+    "search": "patient.patientonsetage:[18+TO+65]"
+}
+
+# Open-ended ranges
+params = {
+    "search": "patient.patientonsetage:[65+TO+*]"  # 65 and older
+}
+```
+
+#### Field Existence
+Check if a field exists:
+```python
+# Records that have a patient age
+params = {
+    "search": "_exists_:patient.patientonsetage"
+}
+
+# Records missing patient age
+params = {
+    "search": "_missing_:patient.patientonsetage"
+}
+```
+
+### Limit Parameter
+
+Controls how many results to return (1-1000, default 1):
+```python
+params = {
+    "search": "...",
+    "limit": 100
+}
+```
+
+**Maximum**: 1000 results per request
+
+### Skip Parameter
+
+For pagination, skip the first N results:
+```python
+# Get results 101-200
+params = {
+    "search": "...",
+    "limit": 100,
+    "skip": 100
+}
+```
+
+**Pagination Example**:
+```python
+def get_all_results(url, search_query, api_key, max_results=5000):
+    """Retrieve results with pagination."""
+    all_results = []
+    skip = 0
+    limit = 100
+
+    while len(all_results) < max_results:
+        params = {
+            "api_key": api_key,
+            "search": search_query,
+            "limit": limit,
+            "skip": skip
+        }
+
+        response = requests.get(url, params=params)
+        data = response.json()
+
+        if "results" not in data or len(data["results"]) == 0:
+            break
+
+        all_results.extend(data["results"])
+
+        if len(data["results"]) < limit:
+            break  # No more results
+
+        skip += limit
+        time.sleep(0.25)  # Rate limiting courtesy
+
+    return all_results[:max_results]
+```
+
+### Count Parameter
+
+Aggregate and count results by a field (instead of returning individual records):
+```python
+# Count events by country
+params = {
+    "search": "patient.drug.medicinalproduct:aspirin",
+    "count": "occurcountry"
+}
+```
+
+**Response Format**:
+```json
+{
+  "results": [
+    {"term": "us", "count": 12543},
+    {"term": "ca", "count": 3421},
+    {"term": "gb", "count": 2156}
+  ]
+}
+```
+
+#### Exact Counting
+
+Add `.exact` suffix for exact phrase counting (especially important for multi-word fields):
+```python
+# Count exact reaction terms (not individual words)
+params = {
+    "search": "patient.drug.medicinalproduct:aspirin",
+    "count": "patient.reaction.reactionmeddrapt.exact"
+}
+```
+
+**Without `.exact`**: Counts individual words
+**With `.exact`**: Counts complete phrases
+
+### Sort Parameter
+
+Sort results by field:
+```python
+# Sort by date, newest first
+params = {
+    "search": "...",
+    "sort": "receivedate:desc"
+}
+
+# Sort by date, oldest first
+params = {
+    "search": "...",
+    "sort": "receivedate:asc"
+}
+```
+
+## Response Format
+
+### Standard Response Structure
+
+```json
+{
+  "meta": {
+    "disclaimer": "...",
+    "terms": "...",
+    "license": "...",
+    "last_updated": "2024-01-15",
+    "results": {
+      "skip": 0,
+      "limit": 10,
+      "total": 15234
+    }
+  },
+  "results": [
+    {
+      // Individual result record
+    },
+    {
+      // Another result record
+    }
+  ]
+}
+```
+
+### Response Fields
+
+- **meta**: Metadata about the query and results
+  - `disclaimer`: Important legal disclaimer
+  - `terms`: Terms of use URL
+  - `license`: Data license information
+  - `last_updated`: When data was last updated
+  - `results.skip`: Number of skipped results
+  - `results.limit`: Maximum results per page
+  - `results.total`: Total matching results (may be approximate for large result sets)
+
+- **results**: Array of matching records
+
+### Empty Results
+
+When no results match:
+```json
+{
+  "meta": {...},
+  "results": []
+}
+```
+
+### Error Response
+
+When an error occurs:
+```json
+{
+  "error": {
+    "code": "INVALID_QUERY",
+    "message": "Detailed error message"
+  }
+}
+```
+
+**Common Error Codes**:
+- `NOT_FOUND`: No results found (404)
+- `INVALID_QUERY`: Malformed search query (400)
+- `RATE_LIMIT_EXCEEDED`: Too many requests (429)
+- `UNAUTHORIZED`: Invalid API key (401)
+- `SERVER_ERROR`: Internal server error (500)
+
+## Advanced Techniques
+
+### Nested Field Queries
+
+Query nested objects:
+```python
+# Drug adverse events where serious outcome is death
+params = {
+    "search": "serious:1+AND+seriousnessdeath:1"
+}
+```
+
+### Multiple Field Search
+
+Search across multiple fields:
+```python
+# Search drug name in multiple fields
+params = {
+    "search": "(patient.drug.medicinalproduct:aspirin+OR+patient.drug.openfda.brand_name:aspirin)"
+}
+```
+
+### Complex Boolean Logic
+
+Combine multiple operators:
+```python
+# (Aspirin OR Ibuprofen) AND (Heart Attack) AND NOT (US)
+params = {
+    "search": "(patient.drug.medicinalproduct:aspirin+OR+patient.drug.medicinalproduct:ibuprofen)+AND+patient.reaction.reactionmeddrapt:*heart*attack*+AND+NOT+occurcountry:us"
+}
+```
+
+### Counting with Filters
+
+Count within a specific subset:
+```python
+# Count reactions for serious events only
+params = {
+    "search": "serious:1",
+    "count": "patient.reaction.reactionmeddrapt.exact"
+}
+```
+
+## Best Practices
+
+### 1. Query Efficiency
+
+**DO**:
+- Use specific field searches
+- Filter before counting
+- Use exact match when possible
+- Implement pagination for large datasets
+
+**DON'T**:
+- Use overly broad wildcards (e.g., `search=*`)
+- Request more data than needed
+- Skip error handling
+- Ignore rate limits
+
+### 2. Error Handling
+
+Always handle common errors:
+```python
+def safe_api_call(url, params):
+    """Safely call FDA API with comprehensive error handling."""
+    try:
+        response = requests.get(url, params=params, timeout=30)
+        response.raise_for_status()
+        return response.json()
+    except requests.exceptions.HTTPError as e:
+        if response.status_code == 404:
+            return {"error": "No results found"}
+        elif response.status_code == 429:
+            return {"error": "Rate limit exceeded"}
+        elif response.status_code == 400:
+            return {"error": "Invalid query"}
+        else:
+            return {"error": f"HTTP error: {e}"}
+    except requests.exceptions.ConnectionError:
+        return {"error": "Connection failed"}
+    except requests.exceptions.Timeout:
+        return {"error": "Request timeout"}
+    except requests.exceptions.RequestException as e:
+        return {"error": f"Request error: {e}"}
+```
+
+### 3. Data Validation
+
+Validate and clean data:
+```python
+def clean_search_term(term):
+    """Clean and prepare search term."""
+    # Remove special characters that break queries
+    term = term.replace('"', '\\"')  # Escape quotes
+    term = term.strip()
+    return term
+
+def validate_date(date_str):
+    """Validate date format (YYYYMMDD)."""
+    import re
+    if not re.match(r'^\d{8}$', date_str):
+        raise ValueError("Date must be in YYYYMMDD format")
+    return date_str
+```
+
+### 4. Caching
+
+Implement caching for frequently accessed data:
+```python
+import json
+from pathlib import Path
+import hashlib
+import time
+
+class FDACache:
+    """Simple file-based cache for FDA API responses."""
+
+    def __init__(self, cache_dir="fda_cache", ttl=3600):
+        self.cache_dir = Path(cache_dir)
+        self.cache_dir.mkdir(exist_ok=True)
+        self.ttl = ttl  # Time to live in seconds
+
+    def _get_cache_key(self, url, params):
+        """Generate cache key from URL and params."""
+        cache_string = f"{url}_{json.dumps(params, sort_keys=True)}"
+        return hashlib.md5(cache_string.encode()).hexdigest()
+
+    def get(self, url, params):
+        """Get cached response if available and not expired."""
+        key = self._get_cache_key(url, params)
+        cache_file = self.cache_dir / f"{key}.json"
+
+        if cache_file.exists():
+            # Check if expired
+            age = time.time() - cache_file.stat().st_mtime
+            if age < self.ttl:
+                with open(cache_file, 'r') as f:
+                    return json.load(f)
+
+        return None
+
+    def set(self, url, params, data):
+        """Cache response data."""
+        key = self._get_cache_key(url, params)
+        cache_file = self.cache_dir / f"{key}.json"
+
+        with open(cache_file, 'w') as f:
+            json.dump(data, f)
+
+# Usage
+cache = FDACache(ttl=3600)  # 1 hour cache
+
+def cached_api_call(url, params):
+    """API call with caching."""
+    # Check cache
+    cached = cache.get(url, params)
+    if cached:
+        return cached
+
+    # Make request
+    response = requests.get(url, params=params)
+    data = response.json()
+
+    # Cache result
+    cache.set(url, params, data)
+
+    return data
+```
+
+### 5. Rate Limit Management
+
+Track and respect rate limits:
+```python
+import time
+from collections import deque
+
+class RateLimiter:
+    """Track and enforce rate limits."""
+
+    def __init__(self, max_per_minute=240):
+        self.max_per_minute = max_per_minute
+        self.requests = deque()
+
+    def wait_if_needed(self):
+        """Wait if necessary to stay under rate limit."""
+        now = time.time()
+
+        # Remove requests older than 1 minute
+        while self.requests and now - self.requests[0] > 60:
+            self.requests.popleft()
+
+        # Check if at limit
+        if len(self.requests) >= self.max_per_minute:
+            sleep_time = 60 - (now - self.requests[0])
+            if sleep_time > 0:
+                time.sleep(sleep_time)
+            self.requests.popleft()
+
+        self.requests.append(time.time())
+
+# Usage
+rate_limiter = RateLimiter(max_per_minute=240)
+
+def rate_limited_request(url, params):
+    """Make request with rate limiting."""
+    rate_limiter.wait_if_needed()
+    return requests.get(url, params=params)
+```
+
+## Common Query Patterns
+
+### Pattern 1: Time-based Analysis
+```python
+# Get events from last 30 days
+from datetime import datetime, timedelta
+
+end_date = datetime.now()
+start_date = end_date - timedelta(days=30)
+
+params = {
+    "search": f"receivedate:[{start_date.strftime('%Y%m%d')}+TO+{end_date.strftime('%Y%m%d')}]",
+    "limit": 1000
+}
+```
+
+### Pattern 2: Top N Analysis
+```python
+# Get top 10 most common reactions for a drug
+params = {
+    "search": "patient.drug.medicinalproduct:aspirin",
+    "count": "patient.reaction.reactionmeddrapt.exact",
+    "limit": 10
+}
+```
+
+### Pattern 3: Comparative Analysis
+```python
+# Compare two drugs
+drugs = ["aspirin", "ibuprofen"]
+results = {}
+
+for drug in drugs:
+    params = {
+        "search": f"patient.drug.medicinalproduct:{drug}",
+        "count": "patient.reaction.reactionmeddrapt.exact",
+        "limit": 10
+    }
+    results[drug] = requests.get(url, params=params).json()
+```
+
+## Additional Resources
+
+- **openFDA Homepage**: https://open.fda.gov/
+- **API Documentation**: https://open.fda.gov/apis/
+- **Interactive API Explorer**: https://open.fda.gov/apis/try-the-api/
+- **Terms of Service**: https://open.fda.gov/terms/
+- **GitHub**: https://github.com/FDA/openfda
+- **Status Page**: Check for API outages and maintenance
+
+## Support
+
+For questions or issues:
+- **GitHub Issues**: https://github.com/FDA/openfda/issues
+- **Email**: open-fda@fda.hhs.gov
+- **Discussion Forum**: Check GitHub discussions