2025-11-30 08:30:10 +08:00


PyHealth Medical Code Translation

Overview

Healthcare data is recorded in many different coding systems and standards. PyHealth's medcode module (pyhealth.medcode) translates between them with two kinds of lookup: within-system ontology lookups (InnerMap) and cross-system mappings (CrossMap).

Core Classes

InnerMap

Handles within-system ontology lookups and hierarchical navigation.

Key Capabilities:

  • Code lookup with attributes (names, descriptions)
  • Ancestor/descendant hierarchy traversal
  • Code standardization and conversion
  • Parent-child relationship navigation

CrossMap

Manages cross-system mappings between different coding standards.

Key Capabilities:

  • Translation between coding systems
  • Many-to-many relationship handling
  • Hierarchical level specification (for medications)
  • Bidirectional mapping support
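Conceptually, a cross-system mapping is a table of (source, target) pairs grouped by source code. The sketch below is a hypothetical in-memory illustration, not PyHealth's storage format, and the codes "A1", "B2", "X", and "Y" are placeholders. It shows why lookups return lists (one source code can have several targets) and how the reverse direction can be derived by swapping each pair.

```python
from collections import defaultdict

# Hypothetical placeholder pairs; PyHealth ships its own mapping tables.
PAIRS = [
    ("A1", "X"),
    ("A1", "Y"),  # one source code, two targets
    ("B2", "X"),  # two source codes share the target "X"
]


def build_mapping(pairs):
    """Group (source, target) pairs into source -> sorted list of targets."""
    mapping = defaultdict(set)
    for source, target in pairs:
        mapping[source].add(target)
    return {code: sorted(targets) for code, targets in mapping.items()}


def invert(pairs):
    """Build the reverse direction by swapping each (source, target) pair."""
    return build_mapping((target, source) for source, target in pairs)


forward = build_mapping(PAIRS)
reverse = invert(PAIRS)
print(forward)  # {'A1': ['X', 'Y'], 'B2': ['X']}
print(reverse)  # {'X': ['A1', 'B2'], 'Y': ['A1']}
```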

Supported Coding Systems

Diagnosis Codes

ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification)

  • Legacy diagnosis coding system
  • Hierarchical codes of 3-5 characters: numeric codes plus supplementary V codes and external-cause E codes
  • Used in US healthcare until ICD-10-CM replaced it on October 1, 2015
  • Usage:
      from pyhealth.medcode import InnerMap
      icd9_map = InnerMap.load("ICD9CM")
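ICD-9-CM diagnosis codes come in three shapes: numeric codes (three digits, then up to two decimal digits), V codes (V plus two digits, then up to two decimal digits), and E codes (E plus three digits, then at most one decimal digit). The regex sketch below checks only that dotted shape; it does not confirm that a code exists in the official code set, and the helper name is hypothetical.

```python
import re

# Dotted ICD-9-CM diagnosis code shapes (format check only, not existence):
#   ddd[.d[d]]   numeric codes, e.g. "428.0"
#   Vdd[.d[d]]   supplementary V codes, e.g. "V45.81"
#   Eddd[.d]     external-cause E codes, e.g. "E849.0"
ICD9CM_PATTERN = re.compile(r"(\d{3}(\.\d{1,2})?|V\d{2}(\.\d{1,2})?|E\d{3}(\.\d)?)")


def looks_like_icd9cm(code: str) -> bool:
    """Hypothetical helper: True if `code` has the dotted ICD-9-CM diagnosis shape."""
    return ICD9CM_PATTERN.fullmatch(code.strip().upper()) is not None
```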

ICD-10-CM (International Classification of Diseases, 10th Revision, Clinical Modification)

  • Current diagnosis coding standard
  • Alphanumeric codes (3-7 characters)
  • More granular than ICD-9
  • Usage:
      from pyhealth.medcode import InnerMap
      icd10_map = InnerMap.load("ICD10CM")
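An ICD-10-CM code starts with a letter, a digit, and an alphanumeric character, and may continue after a decimal point with up to four more characters. A minimal sketch, assuming codes may arrive with or without the dot, inserts the dot after the third character and then checks the shape. The helper name is hypothetical, and PyHealth's own standardization may handle edge cases differently.

```python
import re

# Letter, digit, alphanumeric; then optionally a dot and 1-4 alphanumerics.
ICD10CM_PATTERN = re.compile(r"[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?")


def standardize_icd10cm(code: str) -> str:
    """Hypothetical helper: insert the dot after the 3rd character, then validate."""
    code = code.strip().upper().replace(".", "")
    if len(code) > 3:
        code = f"{code[:3]}.{code[3:]}"
    if ICD10CM_PATTERN.fullmatch(code) is None:
        raise ValueError(f"not an ICD-10-CM-shaped code: {code!r}")
    return code
```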

CCSCM (Clinical Classifications Software for ICD-CM)

  • Groups ICD codes into clinically meaningful categories
  • Reduces dimensionality for analysis
  • Single-level and multi-level hierarchies
  • Usage:
      from pyhealth.medcode import CrossMap
      icd_to_ccs = CrossMap.load("ICD9CM", "CCSCM")
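Grouping shrinks the feature space because many ICD codes share one CCS category. The sketch below uses a tiny hand-written dictionary as a stand-in for a loaded ICD-9-CM to CCS CrossMap; its three entries are an illustrative subset and should be checked against the CCS tables before reuse.

```python
# Illustrative subset of an ICD-9-CM -> CCS table (stand-in for a loaded CrossMap;
# verify entries against the real CCS files before reuse).
ICD9_TO_CCS = {
    "428.0": ["108"],
    "428.1": ["108"],
    "414.01": ["101"],
}

visit_codes = ["428.0", "428.1", "414.01", "428.0"]

# Keep every target, in case a code maps to several categories.
ccs_codes = {ccs for code in visit_codes for ccs in ICD9_TO_CCS.get(code, [])}

print(len(set(visit_codes)), "distinct ICD codes ->", len(ccs_codes), "CCS categories")
# 3 distinct ICD codes -> 2 CCS categories
```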

Procedure Codes

ICD-9-PROC (ICD-9 Procedure Codes)

  • Inpatient procedure classification
  • 3-4 digit numeric codes
  • Legacy system, replaced by ICD-10-PCS in the US on October 1, 2015
  • Usage:
      from pyhealth.medcode import InnerMap
      icd9proc_map = InnerMap.load("ICD9PROC")
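ICD-9 procedure codes place the decimal point after the second digit (for example 36.15), whereas diagnosis codes place it after the third. A minimal sketch of that normalization for 3-4 digit input; the helper name is hypothetical.

```python
def standardize_icd9proc(code: str) -> str:
    """Hypothetical helper: insert the dot after the 2nd digit of an ICD-9 procedure code."""
    digits = code.strip().replace(".", "")
    if not digits.isdigit() or len(digits) not in (3, 4):
        raise ValueError(f"expected a 3-4 digit ICD-9 procedure code, got {code!r}")
    return f"{digits[:2]}.{digits[2:]}"
```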

ICD-10-PROC (ICD-10 Procedure Coding System)

  • Current procedural coding standard
  • 7-character alphanumeric codes
  • More detailed than ICD-9-PROC
  • Usage:
      from pyhealth.medcode import InnerMap
      icd10proc_map = InnerMap.load("ICD10PROC")
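Each of the seven ICD-10-PCS characters is drawn from the digits 0-9 and the letters A-Z except I and O, which are excluded to avoid confusion with 1 and 0. A format check is therefore a single regex. The sketch below (hypothetical helper name) validates shape only, not membership in the current code set.

```python
import re

# Seven characters from 0-9 and A-Z, excluding the letters I and O.
ICD10PCS_PATTERN = re.compile(r"[0-9A-HJ-NP-Z]{7}")


def looks_like_icd10pcs(code: str) -> bool:
    """Hypothetical helper: True if `code` has the 7-character ICD-10-PCS shape."""
    return ICD10PCS_PATTERN.fullmatch(code.strip().upper()) is not None
```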

CCSPROC (Clinical Classifications Software for Procedures)

  • Groups procedure codes into categories
  • Simplifies procedure analysis
  • Usage:
      from pyhealth.medcode import CrossMap
      proc_to_ccs = CrossMap.load("ICD9PROC", "CCSPROC")

Medication Codes

NDC (National Drug Code)

  • US FDA drug identification system
  • 10 or 11-digit codes
  • Product-level specificity (manufacturer, strength, package)
  • Usage:
      from pyhealth.medcode import InnerMap
      ndc_map = InnerMap.load("NDC")
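The 10-digit NDC printed on packaging uses one of three hyphenated labeler-product-package layouts (4-4-2, 5-3-2, or 5-4-1), while the 11-digit form used in billing pads every segment to 5-4-2 with leading zeros. A minimal sketch of that conversion follows. The helper name is hypothetical, and whether a given mapping table expects hyphenated or 11-digit codes is an assumption to check against your data. Hyphens are required because an unhyphenated 10-digit NDC does not reveal which segment needs padding.

```python
def ndc_to_11_digits(ndc: str) -> str:
    """Hypothetical helper: hyphenated NDC (4-4-2, 5-3-2, 5-4-1, 5-4-2) -> 11 digits."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected a hyphenated NDC such as '00074-3799-13', got {ndc!r}")
    labeler, product, package = parts
    if len(labeler) > 5 or len(product) > 4 or len(package) > 2:
        raise ValueError(f"segment too long for the 5-4-2 layout: {ndc!r}")
    if len(labeler + product + package) not in (10, 11):
        raise ValueError(f"an NDC has 10 or 11 digits: {ndc!r}")
    # Left-pad each segment with zeros so the result is always 5-4-2.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)
```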

RxNorm

  • Standardized drug terminology
  • Normalized drug names and relationships
  • Links multiple drug vocabularies
  • Usage:
      from pyhealth.medcode import CrossMap
      ndc_to_rxnorm = CrossMap.load("NDC", "RXNORM")

ATC (Anatomical Therapeutic Chemical Classification)

  • WHO drug classification system
  • 5-level hierarchy:
    • Level 1: Anatomical main group (1 letter)
    • Level 2: Therapeutic subgroup (2 digits)
    • Level 3: Pharmacological subgroup (1 letter)
    • Level 4: Chemical subgroup (1 letter)
    • Level 5: Chemical substance (2 digits)
  • Example: "C03CA01" = Furosemide
    • C = Cardiovascular system
    • C03 = Diuretics
    • C03C = High-ceiling diuretics
    • C03CA = Sulfonamides
    • C03CA01 = Furosemide
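Because each ATC level extends the previous one by a fixed number of characters, the code at level k is a prefix of the full code: 1, 3, 4, 5, and 7 characters for levels 1 to 5. A minimal sketch of that truncation, using the furosemide example above (hypothetical helper, not PyHealth's implementation):

```python
# Prefix length of an ATC code at each hierarchy level.
ATC_LEVEL_LENGTHS = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def atc_at_level(code: str, level: int) -> str:
    """Hypothetical helper: truncate a full 7-character ATC code to `level`."""
    if level not in ATC_LEVEL_LENGTHS:
        raise ValueError(f"ATC level must be 1-5, got {level}")
    return code[: ATC_LEVEL_LENGTHS[level]]


for level in range(1, 6):
    print(level, atc_at_level("C03CA01", level))
# 1 C
# 2 C03
# 3 C03C
# 4 C03CA
# 5 C03CA01
```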

Usage:

from pyhealth.medcode import CrossMap
ndc_to_atc = CrossMap.load("NDC", "ATC")
atc_codes = ndc_to_atc.map("00074-3799-13", target_kwargs={"level": 3})  # Get ATC level 3

Common Operations

InnerMap Operations

1. Code Lookup

from pyhealth.medcode import InnerMap

icd9_map = InnerMap.load("ICD9CM")
name = icd9_map.lookup("428.0")  # "Congestive heart failure, unspecified"
# Returns the code's name by default; pass an attribute name as the
# second argument to retrieve other stored attributes

2. Ancestor Traversal

# Get all parent codes in hierarchy
ancestors = icd9_map.get_ancestors("428.0")
# Returns: ["428", "420-429.99", "390-459.99", "001-999.99"]
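Ancestor traversal amounts to following parent links until the root is reached. The sketch below uses a hypothetical child-to-parent dictionary shaped like part of the ICD-9-CM hierarchy (range labels simplified); it illustrates the idea and is not PyHealth's internal representation.

```python
# Hypothetical child -> parent links (range labels simplified for illustration).
PARENT = {
    "428.0": "428",
    "428.1": "428",
    "428": "420-429",
    "420-429": "390-459",
}


def get_ancestors(code: str) -> list[str]:
    """Walk parent links from `code` up to the root, nearest ancestor first."""
    ancestors = []
    while code in PARENT:
        code = PARENT[code]
        ancestors.append(code)
    return ancestors


print(get_ancestors("428.0"))  # ['428', '420-429', '390-459']
```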

3. Descendant Traversal

# Get all child codes
descendants = icd9_map.get_descendants("428")
# Returns: ["428.0", "428.1", "428.2", ...]
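Descendant traversal is the reverse walk: invert the parent links into child lists, then collect every node reachable from the starting code. Another self-contained toy sketch over a hypothetical hierarchy:

```python
from collections import defaultdict

# Hypothetical child -> parent links, inverted below into parent -> children.
PARENT = {"428.0": "428", "428.1": "428", "428.2": "428", "428.20": "428.2"}

CHILDREN = defaultdict(list)
for child, parent in PARENT.items():
    CHILDREN[parent].append(child)


def get_descendants(code: str) -> list[str]:
    """Return every code below `code` using an iterative depth-first search."""
    descendants, stack = [], list(CHILDREN[code])
    while stack:
        node = stack.pop()
        descendants.append(node)
        stack.extend(CHILDREN[node])
    return descendants


print(sorted(get_descendants("428")))  # ['428.0', '428.1', '428.2', '428.20']
```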

4. Code Standardization

# Normalize code format
standard_code = icd9_map.standardize("4280")  # Returns "428.0"
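The conversion from "4280" to "428.0" is positional: ICD-9-CM diagnosis codes take the decimal point after the third character, or after the fourth for E codes. A minimal sketch of that rule (hypothetical helper; PyHealth's own standardize may cover more edge cases):

```python
def standardize_icd9cm(code: str) -> str:
    """Hypothetical helper: insert the ICD-9-CM decimal point if it is missing."""
    code = code.strip().upper()
    if "." in code:
        return code  # already dotted
    split = 4 if code.startswith("E") else 3  # E codes have 4 characters before the dot
    return code if len(code) <= split else f"{code[:split]}.{code[split:]}"
```

The rule is purely positional, so a zero-padded input such as "42800" becomes "428.00", which is not a valid ICD-9-CM code, rather than "428.0".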

CrossMap Operations

1. Direct Translation

from pyhealth.medcode import CrossMap

# ICD-9-CM to CCS
icd_to_ccs = CrossMap.load("ICD9CM", "CCSCM")
ccs_codes = icd_to_ccs.map("414.01")  # Coronary atherosclerosis of native coronary artery
# Returns: ["101"]  # CCS category "Coronary atherosclerosis and other heart disease"

2. Hierarchical Drug Mapping

# NDC to ATC at different levels
ndc_to_atc = CrossMap.load("NDC", "ATC")

# Get specific ATC level
atc_level_1 = ndc_to_atc.map("00074-3799-13", target_kwargs={"level": 1})  # Anatomical group
atc_level_3 = ndc_to_atc.map("00074-3799-13", target_kwargs={"level": 3})  # Pharmacological
atc_level_5 = ndc_to_atc.map("00074-3799-13", target_kwargs={"level": 5})  # Chemical substance

3. Bidirectional Mapping

# Each direction is a separate CrossMap: load it with source and target swapped
rxnorm_to_ndc = CrossMap.load("RXNORM", "NDC")
ndc_codes = rxnorm_to_ndc.map("197381")  # Get all NDC codes for RxNorm

Workflow Examples

Example 1: Standardize and Group Diagnoses

from pyhealth.medcode import InnerMap, CrossMap

# Load maps
icd9_map = InnerMap.load("ICD9CM")
icd_to_ccs = CrossMap.load("ICD9CM", "CCSCM")

# Process diagnosis codes
raw_codes = ["4280", "428.0"]

standardized = [icd9_map.standardize(code) for code in raw_codes]
# Both become "428.0"

ccs_categories = [icd_to_ccs.map(code) for code in standardized]
# Each maps to ["108"], the CCS category for congestive heart failure (nonhypertensive)

Example 2: Drug Classification Analysis

from collections import Counter

from pyhealth.medcode import CrossMap

# Map NDC to ATC for drug class analysis
ndc_to_atc = CrossMap.load("NDC", "ATC")

patient_drugs = ["00074-3799-13", "00074-7286-01", "00456-0765-01"]

# Get therapeutic subgroups (ATC level 2)
drug_classes = []
for ndc in patient_drugs:
    atc_codes = ndc_to_atc.map(ndc, target_kwargs={"level": 2})
    # An NDC can map to several ATC classes (or none); count each class once per drug
    drug_classes.extend(set(atc_codes))

# Analyze drug class distribution
class_counts = Counter(drug_classes)

Example 3: ICD-9 to ICD-10 Migration

from pyhealth.medcode import CrossMap

# Load ICD-9 to ICD-10 mapping
icd9_to_icd10 = CrossMap.load("ICD9CM", "ICD10CM")

# Convert historical ICD-9 codes
icd9_code = "428.0"
icd10_codes = icd9_to_icd10.map(icd9_code)
# Returns a list of ICD-10-CM codes; one ICD-9-CM code may map to several

# Handle one-to-many mappings
for icd10_code in icd10_codes:
    print(f"ICD-9 {icd9_code} -> ICD-10 {icd10_code}")

Integration with Datasets

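CrossMap objects can be applied directly to the codes of events loaded by a PyHealth dataset. The loop below follows the PyHealth 1.x dataset API (patients stored in a dict, visits accessed by index); dataset arguments and table names differ between releases:

from pyhealth.datasets import MIMIC4Dataset
from pyhealth.medcode import CrossMap

# Load dataset with the diagnosis table
dataset = MIMIC4Dataset(root="/path/to/data", tables=["diagnoses_icd"])

# Load code mapping
icd_to_ccs = CrossMap.load("ICD10CM", "CCSCM")

# Process patient diagnoses (MIMIC-IV contains both ICD-9-CM and ICD-10-CM codes)
for patient in dataset.patients.values():
    for i in range(len(patient)):
        visit = patient[i]
        for event in visit.get_event_list("diagnoses_icd"):
            if event.vocabulary != "ICD10CM":
                continue
            ccs_codes = icd_to_ccs.map(event.code)
            print(f"Diagnosis {event.code} -> CCS {ccs_codes}")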

Use Cases

Clinical Research

  • Standardize diagnoses across different coding systems
  • Group related conditions for cohort identification
  • Harmonize multi-site studies with different standards

Drug Safety Analysis

  • Classify medications by therapeutic class
  • Identify drug-drug interactions at class level
  • Analyze polypharmacy patterns

Healthcare Analytics

  • Reduce diagnosis/procedure dimensionality
  • Create meaningful clinical categories
  • Enable longitudinal analysis across coding system changes

Machine Learning

  • Create consistent feature representations
  • Handle vocabulary mismatch in training/test data
  • Generate hierarchical embeddings
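One common way to turn grouped codes into consistent model inputs is a multi-hot vector over a fixed vocabulary built from the training data, with codes unseen in training dropped instead of creating new columns. A minimal pure-Python sketch; the function names and the CCS-style codes are hypothetical.

```python
def build_vocab(visits: list[list[str]]) -> dict[str, int]:
    """Hypothetical helper: assign a column index to every code seen in training."""
    codes = sorted({code for visit in visits for code in visit})
    return {code: i for i, code in enumerate(codes)}


def multi_hot(visit: list[str], vocab: dict[str, int]) -> list[int]:
    """Hypothetical helper: encode one visit as 0/1; codes missing from `vocab` are ignored."""
    vector = [0] * len(vocab)
    for code in visit:
        if code in vocab:
            vector[vocab[code]] = 1
    return vector


train_visits = [["108", "101"], ["108"]]
vocab = build_vocab(train_visits)        # {'101': 0, '108': 1}
print(multi_hot(["108", "999"], vocab))  # [0, 1]
```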

Best Practices

  1. Always standardize codes before mapping to ensure consistent format
  2. Handle one-to-many mappings appropriately (some codes map to multiple targets)
  3. Specify ATC level explicitly when mapping drugs to avoid ambiguity
  4. Use CCS categories to reduce diagnosis/procedure dimensionality
  5. Validate mappings, since some codes have no direct translation
  6. Document code versions (ICD-9 vs ICD-10) to maintain data provenance
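Before relying on a mapping, it helps to count how many codes map to nothing, to exactly one target, or to several. The sketch below accepts any callable that returns a list of targets, so a loaded CrossMap's map method could be passed in, assuming it returns an empty list for unknown codes (if your version raises instead, wrap the call). A hypothetical dict-backed stand-in keeps the example self-contained.

```python
from typing import Callable, Dict, List


def audit_mapping(codes: List[str], map_fn: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Partition source codes by how many distinct targets `map_fn` returns."""
    report = {"unmapped": [], "one_to_one": [], "one_to_many": []}
    for code in codes:
        n_targets = len(set(map_fn(code)))
        if n_targets == 0:
            report["unmapped"].append(code)
        elif n_targets == 1:
            report["one_to_one"].append(code)
        else:
            report["one_to_many"].append(code)
    return report


# Hypothetical stand-in for a loaded CrossMap.
toy_table = {"A1": ["X"], "B2": ["X", "Y"]}
print(audit_mapping(["A1", "B2", "C3"], lambda c: toy_table.get(c, [])))
# {'unmapped': ['C3'], 'one_to_one': ['A1'], 'one_to_many': ['B2']}
```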