PyHealth Medical Code Translation
Overview
Healthcare data uses multiple coding systems and standards. PyHealth's MedCode module enables translation and mapping between medical coding systems through ontology lookups and cross-system mappings.
Core Classes
InnerMap
Handles within-system ontology lookups and hierarchical navigation.
Key Capabilities:
- Code lookup with attributes (names, descriptions)
- Ancestor/descendant hierarchy traversal
- Code standardization and conversion
- Parent-child relationship navigation
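To make the hierarchy operations concrete, here is a minimal sketch of ancestor and descendant traversal over a parent-pointer table. It does not use PyHealth: the `PARENT` table and the helpers `ancestors` and `descendants` are hypothetical names chosen for illustration, and the table holds only a small hand-written subset of an ICD-9-CM-like hierarchy.

```python
# Toy parent-pointer table (hypothetical subset, child -> parent).
PARENT = {
    "428.0": "428",
    "428.1": "428",
    "428": "420-429",
    "420-429": "390-459",
}


def ancestors(code):
    """Return the ancestors of `code`, nearest parent first."""
    chain = []
    while code in PARENT:
        code = PARENT[code]
        chain.append(code)
    return chain


def descendants(code):
    """Return every code below `code`, sorted alphabetically."""
    found = []
    for child, parent in PARENT.items():
        if parent == code:
            found.append(child)
            found.extend(descendants(child))
    return sorted(found)


print(ancestors("428.0"))      # ['428', '420-429', '390-459']
print(descendants("420-429"))  # ['428', '428.0', '428.1']
```

A parent-pointer table keeps ancestor lookups cheap; a real ontology would likely also index children to avoid scanning the whole table for descendants.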
CrossMap
Manages cross-system mappings between different coding standards.
Key Capabilities:
- Translation between coding systems
- Many-to-many relationship handling
- Hierarchical level specification (for medications)
- Bidirectional mapping support
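The many-to-many and bidirectional capabilities can be pictured with a plain dictionary of lists. The sketch below is not PyHealth code: `FORWARD`, `invert`, and the placeholder codes are assumptions made for illustration, and inverting a table is only one possible way to obtain the reverse direction.

```python
from collections import defaultdict

# Toy forward mapping: each source code maps to a list of target codes.
# The codes are placeholders, not entries from a real mapping file.
FORWARD = {
    "A1": ["X"],
    "A2": ["X", "Y"],
    "A3": ["Y"],
}


def invert(mapping):
    """Build the reverse mapping (target -> sorted list of sources)."""
    reverse = defaultdict(set)
    for source, targets in mapping.items():
        for target in targets:
            reverse[target].add(source)
    return {target: sorted(sources) for target, sources in reverse.items()}


REVERSE = invert(FORWARD)
print(FORWARD["A2"])  # ['X', 'Y']
print(REVERSE["X"])   # ['A1', 'A2']
```

Because one source can have several targets and one target several sources, both directions return lists rather than single codes.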
Supported Coding Systems
Diagnosis Codes
ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification)
- Legacy diagnosis coding system
- Hierarchical structure with 3-5 digit codes
- Used in US healthcare until the transition to ICD-10 in October 2015
- Usage:
from pyhealth.medcode import InnerMap
icd9_map = InnerMap.load("ICD9CM")
ICD-10-CM (International Classification of Diseases, 10th Revision, Clinical Modification)
- Current diagnosis coding standard
- Alphanumeric codes (3-7 characters)
- More granular than ICD-9
- Usage:
from pyhealth.medcode import InnerMap
icd10_map = InnerMap.load("ICD10CM")
CCSCM (Clinical Classifications Software for ICD-CM)
- Groups ICD codes into clinically meaningful categories
- Reduces dimensionality for analysis
- Single-level and multi-level hierarchies
- Usage:
from pyhealth.medcode import CrossMap
icd_to_ccs = CrossMap.load("ICD9CM", "CCSCM")
Procedure Codes
ICD-9-PROC (ICD-9 Procedure Codes)
- Inpatient procedure classification
- 3-4 digit numeric codes
- Legacy system (used in the US until October 2015)
- Usage:
from pyhealth.medcode import InnerMap
icd9proc_map = InnerMap.load("ICD9PROC")
ICD-10-PROC (ICD-10 Procedure Coding System)
- Current procedural coding standard
- 7-character alphanumeric codes
- More detailed than ICD-9-PROC
- Usage:
from pyhealth.medcode import InnerMap
icd10proc_map = InnerMap.load("ICD10PROC")
CCSPROC (Clinical Classifications Software for Procedures)
- Groups procedure codes into categories
- Simplifies procedure analysis
- Usage:
from pyhealth.medcode import CrossMap
proc_to_ccs = CrossMap.load("ICD9PROC", "CCSPROC")
Medication Codes
NDC (National Drug Code)
- US FDA drug identification system
- 10 or 11-digit codes
- Product-level specificity (manufacturer, strength, package)
- Usage:
from pyhealth.medcode import InnerMap
ndc_map = InnerMap.load("NDC")
RxNorm
- Standardized drug terminology
- Normalized drug names and relationships
- Links multiple drug vocabularies
- Usage:
from pyhealth.medcode import CrossMap
ndc_to_rxnorm = CrossMap.load("NDC", "RXNORM")
ATC (Anatomical Therapeutic Chemical Classification)
- WHO drug classification system
- 5-level hierarchy:
  - Level 1: Anatomical main group (1 letter)
  - Level 2: Therapeutic subgroup (2 digits)
  - Level 3: Pharmacological subgroup (1 letter)
  - Level 4: Chemical subgroup (1 letter)
  - Level 5: Chemical substance (2 digits)
- Example: "C03CA01" = Furosemide
  - C = Cardiovascular system
  - C03 = Diuretics
  - C03C = High-ceiling diuretics
  - C03CA = Sulfonamides
  - C03CA01 = Furosemide
- Usage:
from pyhealth.medcode import CrossMap
ndc_to_atc = CrossMap.load("NDC", "ATC")
# The ATC level is passed to the target code system through target_kwargs
atc_codes = ndc_to_atc.map("00074-3799-13", target_kwargs={"level": 3})  # Get ATC level 3
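Because each ATC level adds a fixed number of characters, a code at any level is a prefix of the full 7-character code. The sketch below shows that relationship in plain Python. `ATC_LEVEL_LENGTHS` and `atc_prefix` are hypothetical names, not part of PyHealth, and the function assumes its input is already a full level-5 code.

```python
# Cumulative code length at each ATC level:
# 1 letter, +2 digits, +1 letter, +1 letter, +2 digits.
ATC_LEVEL_LENGTHS = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def atc_prefix(code, level):
    """Truncate a full 7-character ATC code to the given level (1-5)."""
    if level not in ATC_LEVEL_LENGTHS:
        raise ValueError(f"ATC level must be 1-5, got {level}")
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    return code[:ATC_LEVEL_LENGTHS[level]]


for level in range(1, 6):
    print(level, atc_prefix("C03CA01", level))
```

For furosemide this reproduces the breakdown listed above: C, C03, C03C, C03CA, C03CA01.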
Common Operations
InnerMap Operations
1. Code Lookup
from pyhealth.medcode import InnerMap
icd9_map = InnerMap.load("ICD9CM")
info = icd9_map.lookup("428.0") # Heart failure
# Returns: name, description, additional attributes
2. Ancestor Traversal
# Get all parent codes in hierarchy
ancestors = icd9_map.get_ancestors("428.0")
# Returns: ["428", "420-429", "390-459"]
3. Descendant Traversal
# Get all child codes
descendants = icd9_map.get_descendants("428")
# Returns: ["428.0", "428.1", "428.2", ...]
4. Code Standardization
# Normalize code format
standard_code = icd9_map.standardize("4280") # Returns "428.0"
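As an illustration of what standardization involves for ICD-9-CM diagnosis codes, the sketch below inserts the decimal point into an undotted code. It is a stand-alone approximation rather than PyHealth's implementation: `insert_icd9_decimal` is a hypothetical helper, and the rule it applies (four characters before the point for E codes, three for everything else) is stated here as an assumption.

```python
def insert_icd9_decimal(code):
    """Insert the decimal point into an undotted ICD-9-CM diagnosis code.

    Assumed rule: E codes keep four characters before the point;
    all other codes (numeric and V codes) keep three.
    """
    code = code.strip().upper()
    if "." in code:
        return code  # already dotted
    head = 4 if code.startswith("E") else 3
    if len(code) <= head:
        return code  # category-level code, no decimal part
    return code[:head] + "." + code[head:]


for raw in ["4280", "428.0", "428", "V1582", "E8809"]:
    print(raw, "->", insert_icd9_decimal(raw))
```

Trailing digits are kept as written, so under this rule "42800" becomes "428.00" rather than "428.0".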
CrossMap Operations
1. Direct Translation
from pyhealth.medcode import CrossMap
# ICD-9-CM to CCS
icd_to_ccs = CrossMap.load("ICD9CM", "CCSCM")
ccs_codes = icd_to_ccs.map("41401")  # Coronary atherosclerosis of native coronary artery
# Returns: ["101"]  # CCS category for coronary atherosclerosis
2. Hierarchical Drug Mapping
# NDC to ATC at different levels
ndc_to_atc = CrossMap.load("NDC", "ATC")
# Get specific ATC level
atc_level_1 = ndc_to_atc.map("00074-3799-13", target_kwargs={"level": 1})  # Anatomical group
atc_level_3 = ndc_to_atc.map("00074-3799-13", target_kwargs={"level": 3})  # Pharmacological
atc_level_5 = ndc_to_atc.map("00074-3799-13", target_kwargs={"level": 5})  # Chemical substance
3. Bidirectional Mapping
# Map in either direction
rxnorm_to_ndc = CrossMap.load("RXNORM", "NDC")
ndc_codes = rxnorm_to_ndc.map("197381") # Get all NDC codes for RxNorm
Workflow Examples
Example 1: Standardize and Group Diagnoses
from pyhealth.medcode import InnerMap, CrossMap
# Load maps
icd9_map = InnerMap.load("ICD9CM")
icd_to_ccs = CrossMap.load("ICD9CM", "CCSCM")
# Process diagnosis codes
raw_codes = ["4280", "428.0"]
standardized = [icd9_map.standardize(code) for code in raw_codes]
# Both become "428.0"
ccs_categories = [icd_to_ccs.map(code)[0] for code in standardized]
# Both map to CCS category "108" (Heart failure)
Example 2: Drug Classification Analysis
from pyhealth.medcode import CrossMap
# Map NDC to ATC for drug class analysis
ndc_to_atc = CrossMap.load("NDC", "ATC")
patient_drugs = ["00074-3799-13", "00074-7286-01", "00456-0765-01"]
# Get therapeutic subgroups (ATC level 2)
drug_classes = []
for ndc in patient_drugs:
    atc_codes = ndc_to_atc.map(ndc, target_kwargs={"level": 2})
    if atc_codes:  # skip NDCs with no ATC mapping
        drug_classes.append(atc_codes[0])
# Analyze drug class distribution
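One simple way to carry out the distribution analysis is to count how often each class occurs. The sketch below uses `collections.Counter` on a hard-coded list that stands in for `drug_classes`. The ATC level-2 codes in it are illustrative values, not output from a real mapping.

```python
from collections import Counter

# Hypothetical ATC level-2 codes, one per prescription, standing in for the
# `drug_classes` list built in the example above.
drug_classes = ["C03", "C03", "A10", "N02", "A10", "C03"]

counts = Counter(drug_classes)
total = sum(counts.values())

# Report classes from most to least frequent with their share of prescriptions.
for atc_class, n in counts.most_common():
    print(f"{atc_class}: {n} ({n / total:.0%})")
```

Counting at level 2 keeps the number of distinct classes small; repeating the count at level 3 or 4 would give a finer breakdown at the cost of sparser categories.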
Example 3: ICD-9 to ICD-10 Migration
from pyhealth.medcode import CrossMap
# Load ICD-9 to ICD-10 mapping
icd9_to_icd10 = CrossMap.load("ICD9CM", "ICD10CM")
# Convert historical ICD-9 codes
icd9_code = "428.0"
icd10_codes = icd9_to_icd10.map(icd9_code)
# Returns: ["I50.9", "I50.1", ...] # Multiple possible ICD-10 codes
# Handle one-to-many mappings
for icd10_code in icd10_codes:
    print(f"ICD-9 {icd9_code} -> ICD-10 {icd10_code}")
Integration with Datasets
Code mappings can be applied directly to the events loaded through PyHealth datasets:
from pyhealth.datasets import MIMIC4Dataset
from pyhealth.medcode import CrossMap
# Load dataset
dataset = MIMIC4Dataset(root="/path/to/data")
# Load code mapping
icd_to_ccs = CrossMap.load("ICD10CM", "CCSCM")
# Process patient diagnoses
for patient in dataset.iter_patients():
    for visit in patient.visits:
        diagnosis_events = [e for e in visit.events if e.vocabulary == "ICD10CM"]
        for event in diagnosis_events:
            ccs_codes = icd_to_ccs.map(event.code)
            print(f"Diagnosis {event.code} -> CCS {ccs_codes}")
Use Cases
Clinical Research
- Standardize diagnoses across different coding systems
- Group related conditions for cohort identification
- Harmonize multi-site studies with different standards
Drug Safety Analysis
- Classify medications by therapeutic class
- Identify drug-drug interactions at class level
- Analyze polypharmacy patterns
Healthcare Analytics
- Reduce diagnosis/procedure dimensionality
- Create meaningful clinical categories
- Enable longitudinal analysis across coding system changes
Machine Learning
- Create consistent feature representations
- Handle vocabulary mismatch in training/test data
- Generate hierarchical embeddings
Best Practices
- Always standardize codes before mapping to ensure consistent format
- Handle one-to-many mappings appropriately (some codes map to multiple targets)
- Specify ATC level explicitly when mapping drugs to avoid ambiguity
- Use CCS categories to reduce diagnosis/procedure dimensionality
- Validate mappings, since some codes have no direct translation
- Document code versions (ICD-9 vs ICD-10) to maintain data provenance
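The advice on one-to-many mappings and validation can be combined into a small coverage check that sorts codes into unmapped, uniquely mapped, and ambiguous groups. The sketch below is independent of PyHealth: `mapping_report` is a hypothetical helper, `lookup` can be any callable that returns a list of target codes, and the toy table uses placeholder codes.

```python
def mapping_report(codes, lookup):
    """Split codes into unmapped, uniquely mapped, and ambiguous groups.

    `lookup` is any callable that returns a list of target codes and
    either returns an empty list or raises KeyError for unknown codes.
    """
    report = {"unmapped": [], "unique": [], "ambiguous": []}
    for code in codes:
        try:
            targets = lookup(code)
        except KeyError:
            targets = []
        if not targets:
            report["unmapped"].append(code)
        elif len(targets) == 1:
            report["unique"].append(code)
        else:
            report["ambiguous"].append(code)
    return report


# Toy mapping table used in place of a loaded CrossMap (placeholder codes).
toy_table = {"A1": ["X"], "A2": ["X", "Y"]}
report = mapping_report(["A1", "A2", "ZZ"], lambda code: toy_table[code])
print(report)
```

With PyHealth, the `map` method of a loaded CrossMap could presumably be passed as `lookup`. How unknown codes are reported there is not stated in this document, which is why the sketch tolerates both an empty list and a `KeyError`.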