# FDA Other Databases - Substances and NSDE

This reference covers the substance-related and other specialized API endpoints available through openFDA.

## Overview

The FDA maintains additional databases that describe substances down to the molecular level. These databases support regulatory activities across drugs, biologics, devices, foods, and cosmetics.

## Available Endpoints

### 1. Substance Data

**Endpoint**: `https://api.fda.gov/other/substance.json`

**Purpose**: Access substance information that is precise to the molecular level for internal and external use. This includes information about active pharmaceutical ingredients, excipients, and other substances used in FDA-regulated products.

**Data Source**: FDA Global Substance Registration System (GSRS)

**Key Fields**:
- `uuid` - Unique substance identifier (UUID)
- `approvalID` - FDA Unique Ingredient Identifier (UNII)
- `approved` - Approval date
- `substanceClass` - Type of substance (chemical, protein, nucleic acid, polymer, etc.)
- `names` - Array of substance names
  - `names.name` - Name text
  - `names.type` - Name type (systematic, brand, common, etc.)
  - `names.preferred` - Whether this is the preferred name
- `codes` - Array of substance codes
  - `codes.code` - Code value
  - `codes.codeSystem` - Code system (CAS, ECHA, EINECS, etc.)
  - `codes.type` - Code type
- `relationships` - Array of substance relationships
  - `relationships.type` - Relationship type (ACTIVE MOIETY, METABOLITE, IMPURITY, etc.)
  - `relationships.relatedSubstance` - Related substance reference
- `moieties` - Molecular moieties
- `properties` - Array of physicochemical properties
  - `properties.name` - Property name
  - `properties.value` - Property value
  - `properties.propertyType` - Property type
- `structure` - Chemical structure information
  - `structure.smiles` - SMILES notation
  - `structure.inchi` - InChI string
  - `structure.inchiKey` - InChI key
  - `structure.formula` - Molecular formula
  - `structure.molecularWeight` - Molecular weight
- `modifications` - Structural modifications (for proteins, etc.)
- `protein` - Protein-specific information
  - `protein.subunits` - Protein subunits
  - `protein.sequenceType` - Sequence type
- `nucleicAcid` - Nucleic acid information
  - `nucleicAcid.subunits` - Sequence subunits
- `polymer` - Polymer information
- `mixture` - Mixture components
  - `mixture.components` - Component substances
- `tags` - Substance tags
- `references` - Literature references

**Substance Classes**:
- **Chemical** - Small molecules with defined chemical structure
- **Protein** - Proteins and peptides
- **Nucleic Acid** - DNA, RNA, oligonucleotides
- **Polymer** - Polymeric substances
- **Structurally Diverse** - Complex mixtures, botanicals
- **Mixture** - Defined mixtures
- **Concept** - Abstract concepts (e.g., groups)

**Common Use Cases**:
- Active ingredient identification
- Molecular structure lookup
- UNII code resolution
- Chemical identifier mapping (CAS to UNII, etc.)
- Substance relationship analysis
- Excipient identification
- Botanical substance information
- Protein and biologic characterization

**Example Queries**:

```python
import requests

api_key = "YOUR_API_KEY"
url = "https://api.fda.gov/other/substance.json"

# Look up substance by UNII code
params = {
    "api_key": api_key,
    "search": "approvalID:R16CO5Y76E",  # Aspirin UNII
    "limit": 1
}

response = requests.get(url, params=params)
data = response.json()
```

```python
# Search by substance name
params = {
    "api_key": api_key,
    "search": "names.name:acetaminophen",
    "limit": 5
}
```

```python
# Find substances by CAS number
params = {
    "api_key": api_key,
    "search": "codes.code:50-78-2",  # Aspirin CAS
    "limit": 1
}
```

```python
# Get chemical substances only
params = {
    "api_key": api_key,
    "search": "substanceClass:chemical",
    "limit": 100
}
```

```python
# Search by molecular formula
params = {
    "api_key": api_key,
    "search": "structure.formula:C8H9NO2",  # Acetaminophen
    "limit": 10
}
```

```python
# Find protein substances
params = {
    "api_key": api_key,
    "search": "substanceClass:protein",
    "limit": 50
}
```

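The queries above all follow the same `field:value` pattern, so the search string can be assembled by a small helper. The following is a minimal sketch, not part of openFDA or `requests`: `build_search` is a hypothetical name, and it assumes openFDA's query conventions that a multi-word value is wrapped in double quotes to match it as a phrase and that terms are combined with `AND`.

```python
def build_search(terms):
    """Build an openFDA-style search string from a {field: value} mapping.

    Hypothetical helper for illustration only.
    """
    parts = []
    for field, value in terms.items():
        text = str(value)
        # Wrap values containing whitespace in quotes so they match as a phrase.
        if any(ch.isspace() for ch in text):
            text = f'"{text}"'
        parts.append(f"{field}:{text}")
    # Multiple terms are joined with AND.
    return " AND ".join(parts)


params = {
    "api_key": "YOUR_API_KEY",
    "search": build_search({
        "substanceClass": "chemical",
        "names.name": "acetylsalicylic acid",
    }),
    "limit": 5,
}
print(params["search"])
```

When the dictionary is passed as `params=` to `requests.get`, the spaces around `AND` are URL-encoded, so the string does not need to be encoded by hand.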
### 2. NSDE (NDC SPL Data Elements)

**Endpoint**: `https://api.fda.gov/other/nsde.json`

**Purpose**: Access historical substance data from legacy National Drug Code (NDC) directory entries. This endpoint provides substance information as it appears in historical drug product listings.

**Note**: This database is primarily for historical reference. For current substance information, use the Substance Data endpoint.

**Key Fields**:
- `proprietary_name` - Product proprietary name
- `nonproprietary_name` - Nonproprietary name
- `dosage_form` - Dosage form
- `route` - Route of administration
- `company_name` - Company name
- `substance_name` - Substance name
- `active_numerator_strength` - Active ingredient strength (numerator)
- `active_ingred_unit` - Active ingredient unit
- `pharm_classes` - Pharmacological classes
- `dea_schedule` - DEA controlled substance schedule

**Common Use Cases**:
- Historical drug formulation research
- Legacy system integration
- Historical substance name mapping
- Pharmaceutical history research

**Example Queries**:

```python
# Search by substance name
params = {
    "api_key": api_key,
    "search": "substance_name:ibuprofen",
    "limit": 20
}

response = requests.get("https://api.fda.gov/other/nsde.json", params=params)
```

```python
# Find controlled substances by DEA schedule
params = {
    "api_key": api_key,
    "search": "dea_schedule:CII",
    "limit": 50
}
```

## Integration Tips

### UNII to CAS Mapping

```python
import requests


def get_substance_identifiers(unii, api_key):
    """
    Get all identifiers for a substance given its UNII code.

    Args:
        unii: FDA Unique Ingredient Identifier
        api_key: FDA API key

    Returns:
        Dictionary with substance identifiers, or None if nothing matched
    """
    url = "https://api.fda.gov/other/substance.json"
    params = {
        "api_key": api_key,
        "search": f"approvalID:{unii}",
        "limit": 1
    }

    response = requests.get(url, params=params)
    data = response.json()

    # An empty or missing "results" list means no substance matched
    if "results" not in data or len(data["results"]) == 0:
        return None

    substance = data["results"][0]

    identifiers = {
        "unii": substance.get("approvalID"),
        "uuid": substance.get("uuid"),
        "preferred_name": None,
        "cas_numbers": [],
        "other_codes": {}
    }

    # Extract names: use the preferred name, else fall back to the first one
    names = substance.get("names") or []
    for name in names:
        if name.get("preferred"):
            identifiers["preferred_name"] = name.get("name")
            break
    if not identifiers["preferred_name"] and names:
        identifiers["preferred_name"] = names[0].get("name")

    # Extract codes, separating CAS numbers from all other code systems
    for code in substance.get("codes") or []:
        # "or ''" guards against a codeSystem that is present but null
        code_system = (code.get("codeSystem") or "").upper()
        code_value = code.get("code")

        if "CAS" in code_system:
            identifiers["cas_numbers"].append(code_value)
        else:
            identifiers["other_codes"].setdefault(code_system, []).append(code_value)

    return identifiers
```

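CAS Registry Numbers carry a check digit, so the values collected in `cas_numbers` can be sanity-checked locally before they are used for cross-referencing. The sketch below is an illustration rather than part of the original reference: `is_valid_cas` is a hypothetical name, and it implements the standard CAS rule, where the digits before the check digit are weighted 1, 2, 3, ... from right to left and the sum modulo 10 must equal the check digit.

```python
import re

# CAS format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas_number):
    """Return True if the string has CAS format and a correct check digit."""
    match = CAS_PATTERN.match(cas_number or "")
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


print(is_valid_cas("50-78-2"))   # aspirin CAS number used earlier in this reference
print(is_valid_cas("50-78-3"))   # same number with a wrong check digit
```

A passing check only shows that the number is well formed; it does not confirm that the number is assigned to the substance in question.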
### Chemical Structure Lookup

```python
import requests


def get_chemical_structure(substance_name, api_key):
    """
    Get chemical structure information for a substance.

    Args:
        substance_name: Name of the substance
        api_key: FDA API key

    Returns:
        Dictionary with structure information, or None if the substance
        was not found or has no structure record
    """
    url = "https://api.fda.gov/other/substance.json"
    params = {
        "api_key": api_key,
        # Quote the name so multi-word names are searched as one phrase
        "search": f'names.name:"{substance_name}"',
        "limit": 1
    }

    response = requests.get(url, params=params)
    data = response.json()

    if "results" not in data or len(data["results"]) == 0:
        return None

    substance = data["results"][0]

    # Only some substance classes (e.g. chemicals) carry a structure record
    if "structure" not in substance:
        return None

    structure = substance["structure"]

    return {
        "smiles": structure.get("smiles"),
        "inchi": structure.get("inchi"),
        "inchi_key": structure.get("inchiKey"),
        "formula": structure.get("formula"),
        "molecular_weight": structure.get("molecularWeight"),
        "substance_class": substance.get("substanceClass")
    }
```

### Substance Relationship Mapping

```python
import requests


def get_substance_relationships(unii, api_key):
    """
    Get all related substances (metabolites, active moieties, etc.).

    Args:
        unii: FDA Unique Ingredient Identifier
        api_key: FDA API key

    Returns:
        Dictionary organizing relationships by type
    """
    url = "https://api.fda.gov/other/substance.json"
    params = {
        "api_key": api_key,
        "search": f"approvalID:{unii}",
        "limit": 1
    }

    response = requests.get(url, params=params)
    data = response.json()

    if "results" not in data or len(data["results"]) == 0:
        return None

    substance = data["results"][0]

    relationships = {}

    if "relationships" in substance:
        for rel in substance["relationships"]:
            rel_type = rel.get("type")
            if rel_type not in relationships:
                relationships[rel_type] = []

            # Keep a compact summary of the related substance
            related_substance = rel.get("relatedSubstance", {})
            related = {
                "uuid": related_substance.get("uuid"),
                "unii": related_substance.get("approvalID"),
                "name": related_substance.get("refPname")
            }
            relationships[rel_type].append(related)

    return relationships
```

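The dictionary returned by `get_substance_relationships` is keyed by relationship type, so picking out one kind of relationship, such as the active moiety, is a simple filter. A minimal sketch follows; `related_by_type` is a hypothetical helper, and the sample data is made up purely to show the shape of the input.

```python
def related_by_type(relationships, wanted_type):
    """Return related substances whose relationship type matches, ignoring case."""
    if not relationships:
        return []
    wanted = wanted_type.strip().upper()
    matches = []
    for rel_type, related in relationships.items():
        # rel_type can be None if a relationship had no "type" field.
        if rel_type and rel_type.strip().upper() == wanted:
            matches.extend(related)
    return matches


# Made-up sample in the shape produced by get_substance_relationships().
sample = {
    "ACTIVE MOIETY": [{"uuid": "uuid-1", "unii": "EXAMPLE001", "name": "EXAMPLE PARENT"}],
    "METABOLITE": [{"uuid": "uuid-2", "unii": "EXAMPLE002", "name": "EXAMPLE METABOLITE"}],
}
for item in related_by_type(sample, "active moiety"):
    print(item["unii"], item["name"])
```

Returning an empty list for a missing type, or for the `None` that `get_substance_relationships` returns when nothing matched, keeps calling code free of special cases.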
### Active Ingredient Extraction

```python
import requests


def find_active_ingredients_by_product(product_name, api_key):
    """
    Find active ingredients in a drug product.

    Uses get_substance_identifiers() from the UNII to CAS Mapping section.

    Args:
        product_name: Drug product name
        api_key: FDA API key

    Returns:
        List of active ingredient UNIIs and names, or None if no label matched
    """
    # First search the drug label database
    label_url = "https://api.fda.gov/drug/label.json"
    label_params = {
        "api_key": api_key,
        # Quote the name so multi-word brand names are searched as one phrase
        "search": f'openfda.brand_name:"{product_name}"',
        "limit": 1
    }

    response = requests.get(label_url, params=label_params)
    data = response.json()

    if "results" not in data or len(data["results"]) == 0:
        return None

    label = data["results"][0]

    # Extract UNIIs from the openfda section
    active_ingredients = []

    if "openfda" in label:
        openfda = label["openfda"]

        unii_list = openfda.get("unii", [])
        generic_names = openfda.get("generic_name", [])

        for i, unii in enumerate(unii_list):
            ingredient = {"unii": unii}
            # Pairing by position assumes the two lists are aligned,
            # which may not hold for every label
            if i < len(generic_names):
                ingredient["name"] = generic_names[i]

            # Get additional substance info (one extra request per UNII)
            substance_info = get_substance_identifiers(unii, api_key)
            if substance_info:
                ingredient.update(substance_info)

            active_ingredients.append(ingredient)

    return active_ingredients
```

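The function above issues one substance lookup per UNII, and the same UNIIs tend to recur across products. Since substance records change infrequently, repeated lookups can be served from memory. The sketch below shows one way to do this with the standard library's `functools.lru_cache`; `make_cached_lookup` is a hypothetical wrapper, and the stand-in `fake_fetch` replaces the real network call so the example runs offline.

```python
from functools import lru_cache


def make_cached_lookup(fetch, maxsize=256):
    """Wrap a one-argument lookup function in an in-memory LRU cache."""
    @lru_cache(maxsize=maxsize)
    def cached(unii):
        return fetch(unii)
    return cached


# Stand-in for a network call; records how often it is really invoked.
calls = []

def fake_fetch(unii):
    calls.append(unii)
    return {"unii": unii}


lookup = make_cached_lookup(fake_fetch)
lookup("R16CO5Y76E")
lookup("R16CO5Y76E")  # served from the cache, fake_fetch is not called again
print(len(calls))  # 1
```

With the real function, the wrapper could be built as `make_cached_lookup(lambda unii: get_substance_identifiers(unii, api_key))`. Note that the cache hands back the same dictionary object on every hit, so callers should copy it before modifying it, and that "not found" results (`None`) are cached as well.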
## Best Practices

1. **Use UNII as primary identifier** - Most consistent across FDA databases
2. **Map between identifier systems** - Use CAS, UNII, and InChI Key for cross-referencing
3. **Handle substance variations** - Different salt forms and hydrates have different UNIIs
4. **Check substance class** - Different classes have different data structures
5. **Validate chemical structures** - SMILES and InChI should be verified
6. **Consider substance relationships** - Active moiety vs. salt form matters
7. **Use preferred names** - More consistent than trade names
8. **Cache substance data** - Substance information changes infrequently
9. **Cross-reference with other endpoints** - Link substances to drugs/products
10. **Handle mixture components** - Complex products have multiple components

## UNII System

The FDA Unique Ingredient Identifier (UNII) system provides:
- **Unique identifiers** - Each substance gets one UNII
- **Substance specificity** - Different forms (salts, hydrates) get different UNIIs
- **Global recognition** - Used internationally
- **Stability** - UNIIs don't change once assigned
- **Free access** - No licensing required

**UNII Format**: 10-character alphanumeric code (e.g., `R16CO5Y76E`)

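Because the format is fixed, user-supplied UNIIs can be screened before a request is sent. The following is a minimal sketch under stated assumptions: `normalize_unii` is a hypothetical helper, and it assumes UNIIs are written with upper-case letters and digits only. It checks the shape of the code, not whether the code has actually been assigned to a substance.

```python
import re

# Ten characters, each an upper-case letter or a digit.
UNII_PATTERN = re.compile(r"[A-Z0-9]{10}")


def normalize_unii(value):
    """Return the trimmed, upper-cased UNII if it has the expected shape, else None."""
    if not isinstance(value, str):
        return None
    candidate = value.strip().upper()
    return candidate if UNII_PATTERN.fullmatch(candidate) else None


print(normalize_unii(" r16co5y76e "))  # R16CO5Y76E
print(normalize_unii("R16CO5Y76"))     # None (only nine characters)
```

Normalizing case and whitespace up front also makes the value safe to use as a cache key.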
## Substance Classes Explained

### Chemical
- Traditional small molecule drugs
- Have defined molecular structure
- Include organic and inorganic compounds
- SMILES, InChI, molecular formula available

### Protein
- Polypeptides and proteins
- Sequence information available
- May have post-translational modifications
- Includes antibodies, enzymes, hormones

### Nucleic Acid
- DNA and RNA sequences
- Oligonucleotides
- Antisense, siRNA, mRNA
- Sequence data available

### Polymer
- Synthetic and natural polymers
- Structural repeat units
- Molecular weight distributions
- Used as excipients and active ingredients

### Structurally Diverse
- Complex natural products
- Botanical extracts
- Materials without single molecular structure
- Characterized by source and composition

### Mixture
- Defined combinations of substances
- Fixed or variable composition
- Each component trackable

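Since each class stores its details under a different key, code that processes mixed results usually needs to branch on `substanceClass` first. The sketch below maps a class to the record key listed under Key Fields (`structure`, `protein`, `nucleicAcid`, `polymer`, `mixture`). The helper name `get_class_details` is hypothetical, and the exact strings the API returns in `substanceClass` are an assumption here, which is why the value is normalized before the lookup.

```python
# Class name (lower-cased, letters and digits only) -> key holding its details.
CLASS_DETAIL_KEYS = {
    "chemical": "structure",
    "protein": "protein",
    "nucleicacid": "nucleicAcid",
    "polymer": "polymer",
    "mixture": "mixture",
}


def get_class_details(substance):
    """Return (key, details) for the class-specific part of a substance record."""
    raw_class = substance.get("substanceClass") or ""
    # Normalize so "Nucleic Acid" and "nucleicAcid" compare equal.
    normalized = "".join(ch for ch in raw_class.lower() if ch.isalnum())
    key = CLASS_DETAIL_KEYS.get(normalized)
    if key is None:
        # Classes without a listed detail key (e.g. concept) fall through here.
        return None, None
    return key, substance.get(key)


record = {"substanceClass": "chemical", "structure": {"formula": "C9H8O4"}}
key, details = get_class_details(record)
print(key, details)
```

Structurally diverse and concept substances are not mapped because this reference does not list a detail key for them; they return `(None, None)`.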
## Additional Resources

- FDA Substance Registration System: https://fdasis.nlm.nih.gov/srs/
- UNII Search: https://precision.fda.gov/uniisearch
- openFDA Other APIs: https://open.fda.gov/apis/other/
- API Basics: See `api_basics.md` in this references directory
- Python examples: See `scripts/fda_substance_query.py`