Files
2025-11-30 08:30:10 +08:00

473 lines
13 KiB
Markdown

# FDA Other Databases - Substances and NSDE
This reference covers FDA substance-related and other specialized API endpoints accessible through openFDA.
## Overview
The FDA maintains additional databases for substance-level information that is precise to the molecular level. These databases support regulatory activities across drugs, biologics, devices, foods, and cosmetics.
## Available Endpoints
### 1. Substance Data
**Endpoint**: `https://api.fda.gov/other/substance.json`
**Purpose**: Access substance information that is precise to the molecular level for internal and external use. This includes information about active pharmaceutical ingredients, excipients, and other substances used in FDA-regulated products.
**Data Source**: FDA Global Substance Registration System (GSRS)
**Key Fields**:
- `uuid` - Unique substance identifier (UUID)
- `approvalID` - FDA Unique Ingredient Identifier (UNII)
- `approved` - Approval date
- `substanceClass` - Type of substance (chemical, protein, nucleic acid, polymer, etc.)
- `names` - Array of substance names
- `names.name` - Name text
- `names.type` - Name type (systematic, brand, common, etc.)
- `names.preferred` - Whether preferred name
- `codes` - Array of substance codes
- `codes.code` - Code value
- `codes.codeSystem` - Code system (CAS, ECHA, EINECS, etc.)
- `codes.type` - Code type
- `relationships` - Array of substance relationships
- `relationships.type` - Relationship type (ACTIVE MOIETY, METABOLITE, IMPURITY, etc.)
- `relationships.relatedSubstance` - Related substance reference
- `moieties` - Molecular moieties
- `properties` - Array of physicochemical properties
- `properties.name` - Property name
- `properties.value` - Property value
- `properties.propertyType` - Property type
- `structure` - Chemical structure information
- `structure.smiles` - SMILES notation
- `structure.inchi` - InChI string
- `structure.inchiKey` - InChI key
- `structure.formula` - Molecular formula
- `structure.molecularWeight` - Molecular weight
- `modifications` - Structural modifications (for proteins, etc.)
- `protein` - Protein-specific information
- `protein.subunits` - Protein subunits
- `protein.sequenceType` - Sequence type
- `nucleicAcid` - Nucleic acid information
- `nucleicAcid.subunits` - Sequence subunits
- `polymer` - Polymer information
- `mixture` - Mixture components
- `mixture.components` - Component substances
- `tags` - Substance tags
- `references` - Literature references
**Substance Classes**:
- **Chemical** - Small molecules with defined chemical structure
- **Protein** - Proteins and peptides
- **Nucleic Acid** - DNA, RNA, oligonucleotides
- **Polymer** - Polymeric substances
- **Structurally Diverse** - Complex mixtures, botanicals
- **Mixture** - Defined mixtures
- **Concept** - Abstract concepts (e.g., groups)
**Common Use Cases**:
- Active ingredient identification
- Molecular structure lookup
- UNII code resolution
- Chemical identifier mapping (CAS to UNII, etc.)
- Substance relationship analysis
- Excipient identification
- Botanical substance information
- Protein and biologic characterization
**Example Queries**:
```python
import requests
api_key = "YOUR_API_KEY"
url = "https://api.fda.gov/other/substance.json"
# Look up substance by UNII code
params = {
"api_key": api_key,
"search": "approvalID:R16CO5Y76E", # Aspirin UNII
"limit": 1
}
response = requests.get(url, params=params)
data = response.json()
```
```python
# Search by substance name
params = {
"api_key": api_key,
"search": "names.name:acetaminophen",
"limit": 5
}
```
```python
# Find substances by CAS number
params = {
"api_key": api_key,
"search": "codes.code:50-78-2", # Aspirin CAS
"limit": 1
}
```
```python
# Get chemical substances only
params = {
"api_key": api_key,
"search": "substanceClass:chemical",
"limit": 100
}
```
```python
# Search by molecular formula
params = {
"api_key": api_key,
"search": "structure.formula:C8H9NO2", # Acetaminophen
"limit": 10
}
```
```python
# Find protein substances
params = {
"api_key": api_key,
"search": "substanceClass:protein",
"limit": 50
}
```
### 2. NSDE (National Substance Database Entry)
**Endpoint**: `https://api.fda.gov/other/nsde.json`
**Purpose**: Access historical substance data from legacy National Drug Code (NDC) directory entries. This endpoint provides substance information as it appears in historical drug product listings.
**Note**: This database is primarily for historical reference. For current substance information, use the Substance Data endpoint.
**Key Fields**:
- `proprietary_name` - Product proprietary name
- `nonproprietary_name` - Nonproprietary name
- `dosage_form` - Dosage form
- `route` - Route of administration
- `company_name` - Company name
- `substance_name` - Substance name
- `active_numerator_strength` - Active ingredient strength (numerator)
- `active_ingred_unit` - Active ingredient unit
- `pharm_classes` - Pharmacological classes
- `dea_schedule` - DEA controlled substance schedule
**Common Use Cases**:
- Historical drug formulation research
- Legacy system integration
- Historical substance name mapping
- Pharmaceutical history research
**Example Queries**:
```python
# Search by substance name
params = {
"api_key": api_key,
"search": "substance_name:ibuprofen",
"limit": 20
}
response = requests.get("https://api.fda.gov/other/nsde.json", params=params)
```
```python
# Find controlled substances by DEA schedule
params = {
"api_key": api_key,
"search": "dea_schedule:CII",
"limit": 50
}
```
## Integration Tips
### UNII to CAS Mapping
```python
def get_substance_identifiers(unii, api_key):
"""
Get all identifiers for a substance given its UNII code.
Args:
unii: FDA Unique Ingredient Identifier
api_key: FDA API key
Returns:
Dictionary with substance identifiers
"""
import requests
url = "https://api.fda.gov/other/substance.json"
params = {
"api_key": api_key,
"search": f"approvalID:{unii}",
"limit": 1
}
response = requests.get(url, params=params)
data = response.json()
if "results" not in data or len(data["results"]) == 0:
return None
substance = data["results"][0]
identifiers = {
"unii": substance.get("approvalID"),
"uuid": substance.get("uuid"),
"preferred_name": None,
"cas_numbers": [],
"other_codes": {}
}
# Extract names
if "names" in substance:
for name in substance["names"]:
if name.get("preferred"):
identifiers["preferred_name"] = name.get("name")
break
if not identifiers["preferred_name"] and len(substance["names"]) > 0:
identifiers["preferred_name"] = substance["names"][0].get("name")
# Extract codes
if "codes" in substance:
for code in substance["codes"]:
code_system = code.get("codeSystem", "").upper()
code_value = code.get("code")
if "CAS" in code_system:
identifiers["cas_numbers"].append(code_value)
else:
if code_system not in identifiers["other_codes"]:
identifiers["other_codes"][code_system] = []
identifiers["other_codes"][code_system].append(code_value)
return identifiers
```
### Chemical Structure Lookup
```python
def get_chemical_structure(substance_name, api_key):
"""
Get chemical structure information for a substance.
Args:
substance_name: Name of the substance
api_key: FDA API key
Returns:
Dictionary with structure information
"""
import requests
url = "https://api.fda.gov/other/substance.json"
params = {
"api_key": api_key,
"search": f"names.name:{substance_name}",
"limit": 1
}
response = requests.get(url, params=params)
data = response.json()
if "results" not in data or len(data["results"]) == 0:
return None
substance = data["results"][0]
if "structure" not in substance:
return None
structure = substance["structure"]
return {
"smiles": structure.get("smiles"),
"inchi": structure.get("inchi"),
"inchi_key": structure.get("inchiKey"),
"formula": structure.get("formula"),
"molecular_weight": structure.get("molecularWeight"),
"substance_class": substance.get("substanceClass")
}
```
### Substance Relationship Mapping
```python
def get_substance_relationships(unii, api_key):
"""
Get all related substances (metabolites, active moieties, etc.).
Args:
unii: FDA Unique Ingredient Identifier
api_key: FDA API key
Returns:
Dictionary organizing relationships by type
"""
import requests
url = "https://api.fda.gov/other/substance.json"
params = {
"api_key": api_key,
"search": f"approvalID:{unii}",
"limit": 1
}
response = requests.get(url, params=params)
data = response.json()
if "results" not in data or len(data["results"]) == 0:
return None
substance = data["results"][0]
relationships = {}
if "relationships" in substance:
for rel in substance["relationships"]:
rel_type = rel.get("type")
if rel_type not in relationships:
relationships[rel_type] = []
related = {
"uuid": rel.get("relatedSubstance", {}).get("uuid"),
"unii": rel.get("relatedSubstance", {}).get("approvalID"),
"name": rel.get("relatedSubstance", {}).get("refPname")
}
relationships[rel_type].append(related)
return relationships
```
### Active Ingredient Extraction
```python
def find_active_ingredients_by_product(product_name, api_key):
"""
Find active ingredients in a drug product.
Args:
product_name: Drug product name
api_key: FDA API key
Returns:
List of active ingredient UNIIs and names
"""
import requests
# First search drug label database
label_url = "https://api.fda.gov/drug/label.json"
label_params = {
"api_key": api_key,
"search": f"openfda.brand_name:{product_name}",
"limit": 1
}
response = requests.get(label_url, params=label_params)
data = response.json()
if "results" not in data or len(data["results"]) == 0:
return None
label = data["results"][0]
# Extract UNIIs from openfda section
active_ingredients = []
if "openfda" in label:
openfda = label["openfda"]
# Get UNIIs
unii_list = openfda.get("unii", [])
generic_names = openfda.get("generic_name", [])
for i, unii in enumerate(unii_list):
ingredient = {"unii": unii}
if i < len(generic_names):
ingredient["name"] = generic_names[i]
# Get additional substance info
substance_info = get_substance_identifiers(unii, api_key)
if substance_info:
ingredient.update(substance_info)
active_ingredients.append(ingredient)
return active_ingredients
```
## Best Practices
1. **Use UNII as primary identifier** - Most consistent across FDA databases
2. **Map between identifier systems** - CAS, UNII, InChI Key for cross-referencing
3. **Handle substance variations** - Different salt forms, hydrates have different UNIIs
4. **Check substance class** - Different classes have different data structures
5. **Validate chemical structures** - SMILES and InChI should be verified
6. **Consider substance relationships** - Active moiety vs. salt form matters
7. **Use preferred names** - More consistent than trade names
8. **Cache substance data** - Substance information changes infrequently
9. **Cross-reference with other endpoints** - Link substances to drugs/products
10. **Handle mixture components** - Complex products have multiple components
## UNII System
The FDA Unique Ingredient Identifier (UNII) system provides:
- **Unique identifiers** - Each substance gets one UNII
- **Substance specificity** - Different forms (salts, hydrates) get different UNIIs
- **Global recognition** - Used internationally
- **Stability** - UNIIs don't change once assigned
- **Free access** - No licensing required
**UNII Format**: 10-character alphanumeric code (e.g., `R16CO5Y76E`)
## Substance Classes Explained
### Chemical
- Traditional small molecule drugs
- Have defined molecular structure
- Include organic and inorganic compounds
- SMILES, InChI, molecular formula available
### Protein
- Polypeptides and proteins
- Sequence information available
- May have post-translational modifications
- Includes antibodies, enzymes, hormones
### Nucleic Acid
- DNA and RNA sequences
- Oligonucleotides
- Antisense, siRNA, mRNA
- Sequence data available
### Polymer
- Synthetic and natural polymers
- Structural repeat units
- Molecular weight distributions
- Used as excipients and active ingredients
### Structurally Diverse
- Complex natural products
- Botanical extracts
- Materials without single molecular structure
- Characterized by source and composition
### Mixture
- Defined combinations of substances
- Fixed or variable composition
- Each component trackable
## Additional Resources
- FDA Substance Registration System: https://fdasis.nlm.nih.gov/srs/
- UNII Search: https://precision.fda.gov/uniisearch
- OpenFDA Other APIs: https://open.fda.gov/apis/other/
- API Basics: See `api_basics.md` in this references directory
- Python examples: See `scripts/fda_substance_query.py`