gh-policyengine-policyengin…/skills/policyengine-code-style-skill/SKILL.md
2025-11-30 08:47:54 +08:00


name: policyengine-code-style
description: PolicyEngine code writing style guide - formula optimization, direct returns, eliminating unnecessary variables

PolicyEngine Code Writing Style Guide

Essential patterns for writing clean, efficient PolicyEngine formulas.

Core Principles

  1. Eliminate unnecessary intermediate variables
  2. Use direct parameter/variable access
  3. Return directly when possible
  4. Combine boolean logic
  5. Use correct period access (period vs period.this_year)
  6. NO hardcoded values - use parameters or constants

Pattern 1: Direct Parameter Access

Bad - Unnecessary intermediate variable

def formula(spm_unit, period, parameters):
    countable = spm_unit("tn_tanf_countable_resources", period)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    resource_limit = p.amount  # ❌ Unnecessary
    return countable <= resource_limit

Good - Direct access

def formula(spm_unit, period, parameters):
    countable = spm_unit("tn_tanf_countable_resources", period)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    return countable <= p.amount

Pattern 2: Direct Return

Bad - Unnecessary result variable

def formula(spm_unit, period, parameters):
    assets = spm_unit("spm_unit_assets", period.this_year)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    vehicle_exemption = p.vehicle_exemption  # ❌ Unnecessary
    countable = max_(assets - vehicle_exemption, 0)  # ❌ Unnecessary
    return countable

Good - Direct return

def formula(spm_unit, period, parameters):
    assets = spm_unit("spm_unit_assets", period.this_year)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    return max_(assets - p.vehicle_exemption, 0)

Pattern 3: Combined Boolean Logic

Bad - Too many intermediate booleans

def formula(spm_unit, period, parameters):
    person = spm_unit.members
    age = person("age", period.this_year)
    is_disabled = person("is_disabled", period.this_year)

    caretaker_is_60_or_older = spm_unit.any(age >= 60)  # ❌ Unnecessary
    caretaker_is_disabled = spm_unit.any(is_disabled)   # ❌ Unnecessary
    eligible = caretaker_is_60_or_older | caretaker_is_disabled  # ❌ Unnecessary

    return eligible

Good - Combined logic

def formula(spm_unit, period, parameters):
    person = spm_unit.members
    age = person("age", period.this_year)
    is_disabled = person("is_disabled", period.this_year)

    return spm_unit.any((age >= 60) | is_disabled)
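
The combined form is safe because an "any" reduction distributes over "or": a unit has a member who is 60 or older or disabled exactly when it has a 60-or-older member or a disabled member. A minimal pure-Python sketch of that equivalence follows; `unit_any` is a hypothetical stand-in for `spm_unit.any`, not the PolicyEngine API, and the sample data is made up.

```python
# Hypothetical stand-in for spm_unit.any: reduce member-level booleans
# to one boolean per unit. unit_ids[i] is the unit that member i belongs to.
def unit_any(unit_ids, member_flags):
    result = {}
    for unit, flag in zip(unit_ids, member_flags):
        result[unit] = result.get(unit, False) or flag
    return result


# Made-up sample data: five people in three units.
unit_ids = [0, 0, 1, 1, 2]
age = [62, 30, 45, 12, 25]
is_disabled = [False, False, True, False, False]

is_60_or_older = [a >= 60 for a in age]

# Two separate reductions, combined afterwards (the "bad" shape).
any_older = unit_any(unit_ids, is_60_or_older)
any_disabled = unit_any(unit_ids, is_disabled)
separate = {u: any_older[u] or any_disabled[u] for u in any_older}

# Member-level flags combined first, then one reduction (the "good" shape).
combined = unit_any(
    unit_ids, [o or d for o, d in zip(is_60_or_older, is_disabled)]
)

print(separate == combined)
```

Both shapes give the same per-unit result, so the shorter one costs nothing in correctness.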

Pattern 4: Period Access - period vs period.this_year

Bad - Wrong period access

def formula(person, period, parameters):
    # MONTH formula accessing YEAR variables
    age = person("age", period)  # ❌ Gives age/12 = 2.5 "monthly age"
    assets = person("assets", period)  # ❌ Gives assets/12
    monthly_income = person("employment_income", period.this_year) / MONTHS_IN_YEAR  # ❌ Redundant

    return (age >= 18) & (assets < 10000) & (monthly_income < 2000)

Good - Correct period access

def formula(person, period, parameters):
    # MONTH formula accessing YEAR variables
    age = person("age", period.this_year)  # ✅ Gets actual age (30)
    assets = person("assets", period.this_year)  # ✅ Gets actual assets ($10,000)
    monthly_income = person("employment_income", period)  # ✅ Auto-converts to monthly

    p = parameters(period).gov.program.eligibility
    return (
        (age >= p.age_min) & (age <= p.age_max) &
        (assets < p.asset_limit) & (monthly_income < p.income_threshold)
    )

Rule:

  • Income/flows → Use period (want monthly from annual)
  • Age/assets/counts/booleans → Use period.this_year (don't divide by 12)
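
The rule can be illustrated with a toy model of the auto-conversion described above. This is only a sketch: `get_value` is a hypothetical helper that mimics "a YEAR-defined value requested for a month is divided by 12", and it is not how PolicyEngine implements period handling.

```python
MONTHS_IN_YEAR = 12


# Toy model only: a YEAR-defined value as seen from the requested period.
def get_value(annual_value, requested_period):
    if requested_period == "month":
        # Mimics the auto-conversion: the annual amount is spread over 12 months.
        return annual_value / MONTHS_IN_YEAR
    # "this_year": the full annual value, untouched.
    return annual_value


age = 30
employment_income = 24_000

monthly_age = get_value(age, "month")             # 2.5, meaningless for a stock
actual_age = get_value(age, "this_year")          # 30
monthly_income = get_value(employment_income, "month")  # 2000.0, what a flow wants

print(monthly_age, actual_age, monthly_income)
```

Dividing is the right behavior for a flow such as income and the wrong one for a stock such as age, which is why the two groups use different period arguments.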

Pattern 5: No Hardcoded Values

Bad - Hardcoded numbers

def formula(spm_unit, period, parameters):
    size = spm_unit.nb_persons()
    capped_size = min_(size, 10)  # ❌ Hardcoded

    person = spm_unit.members
    age = person("age", period.this_year)
    income = person("income", period) / 12  # ❌ Use MONTHS_IN_YEAR

    # ❌ Hardcoded thresholds
    if age >= 18 and age <= 65 and income < 2000:
        return True

Good - Parameterized

def formula(spm_unit, period, parameters):
    p = parameters(period).gov.program
    capped_size = min_(spm_unit.nb_persons(), p.max_unit_size)  # ✅

    person = spm_unit.members
    age = person("age", period.this_year)
    monthly_income = person("income", period)  # ✅ Auto-converts (no manual /12)

    age_eligible = (age >= p.age_min) & (age <= p.age_max)  # ✅
    income_eligible = monthly_income < p.income_threshold  # ✅

    return age_eligible & income_eligible

Pattern 6: Streamline Variable Access

Bad - Redundant steps

def formula(spm_unit, period, parameters):
    unit_size = spm_unit.nb_persons()  # ❌ Unnecessary
    max_size = 10  # ❌ Hardcoded
    capped_size = min_(unit_size, max_size)

    p = parameters(period).gov.states.tn.dhs.tanf.benefit
    spa = p.standard_payment_amount[capped_size]  # ❌ Unnecessary
    dgpa = p.differential_grant_payment_amount[capped_size]  # ❌ Unnecessary

    eligible = spm_unit("eligible_for_dgpa", period)
    return where(eligible, dgpa, spa)

Good - Streamlined

def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.tn.dhs.tanf.benefit
    capped_size = min_(spm_unit.nb_persons(), p.max_unit_size)
    eligible = spm_unit("eligible_for_dgpa", period)

    return where(
        eligible,
        p.differential_grant_payment_amount[capped_size],
        p.standard_payment_amount[capped_size]
    )
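
The cap matters because a payment table only has entries up to some largest unit size. A scalar sketch of the same lookup is shown below; the table contents and `MAX_UNIT_SIZE` are made-up stand-ins for the parameter values (`p.max_unit_size` and the two payment schedules), not real Tennessee amounts.

```python
# Hypothetical stand-ins for parameter values; the amounts are invented.
MAX_UNIT_SIZE = 3
STANDARD_PAYMENT = {1: 100, 2: 150, 3: 200}
DIFFERENTIAL_PAYMENT = {1: 120, 2: 180, 3: 240}


def benefit(unit_size, eligible_for_dgpa):
    # Units larger than the table's last row use the last row.
    capped_size = min(unit_size, MAX_UNIT_SIZE)
    table = DIFFERENTIAL_PAYMENT if eligible_for_dgpa else STANDARD_PAYMENT
    return table[capped_size]


print(benefit(5, False))  # a five-person unit falls back to the size-3 row
print(benefit(2, True))
```

Without the cap, a five-person unit would index past the end of the table.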

When to Keep Intermediate Variables

Keep when value is used multiple times

def formula(tax_unit, period, parameters):
    p = parameters(period).gov.irs.credits
    filing_status = tax_unit("filing_status", period)

    # ✅ Used multiple times - keep as variable
    threshold = p.phase_out.start[filing_status]

    income = tax_unit("adjusted_gross_income", period)
    excess = max_(0, income - threshold)
    reduction = (excess / p.phase_out.width) * threshold

    return max_(0, threshold - reduction)

Keep when calculation is complex

def formula(spm_unit, period, parameters):
    p = parameters(period).gov.program
    gross_earned = spm_unit("gross_earned_income", period)

    # ✅ Complex multi-step calculation - break it down
    work_expense_deduction = min_(gross_earned * p.work_expense_rate, p.work_expense_max)
    after_work_expense = gross_earned - work_expense_deduction

    earned_disregard = after_work_expense * p.earned_disregard_rate
    countable_earned = after_work_expense - earned_disregard

    dependent_care = spm_unit("dependent_care_expenses", period)

    return max_(0, countable_earned - dependent_care)
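
Named steps also make the calculation easy to check by hand. The sketch below replays the same steps on plain numbers; the rates, the cap, and the income figures are hypothetical values chosen for easy arithmetic, not real program parameters.

```python
# Hypothetical parameter values, chosen only for easy arithmetic.
WORK_EXPENSE_RATE = 0.2
WORK_EXPENSE_MAX = 90
EARNED_DISREGARD_RATE = 0.5


def countable_income(gross_earned, dependent_care):
    # Step 1: work expense deduction, capped at the maximum.
    work_expense_deduction = min(
        gross_earned * WORK_EXPENSE_RATE, WORK_EXPENSE_MAX
    )
    after_work_expense = gross_earned - work_expense_deduction

    # Step 2: disregard a share of what remains.
    earned_disregard = after_work_expense * EARNED_DISREGARD_RATE
    countable_earned = after_work_expense - earned_disregard

    # Step 3: subtract dependent care, floored at zero.
    return max(0, countable_earned - dependent_care)


# 1000 gross: deduction min(200, 90) = 90, leaving 910;
# half is disregarded, leaving 455; minus 100 care = 355.
print(countable_income(1000, 100))
```

Each intermediate name corresponds to one line of the hand calculation, which is the readability the guide is protecting here.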

Complete Example: Before vs After

Before - Multiple Issues

def formula(person, period, parameters):
    # Wrong period access
    age = person("age", period)  # ❌ age/12
    assets = person("assets", period)  # ❌ assets/12
    annual_income = person("employment_income", period.this_year)
    monthly_income = annual_income / 12  # ❌ Use MONTHS_IN_YEAR

    # Hardcoded values
    min_age = 18  # ❌
    max_age = 64  # ❌
    asset_limit = 10000  # ❌
    income_limit = 2000  # ❌

    # Unnecessary intermediate variables
    age_check = (age >= min_age) & (age <= max_age)
    asset_check = assets <= asset_limit
    income_check = monthly_income <= income_limit
    eligible = age_check & asset_check & income_check

    return eligible

After - Clean and Correct

def formula(person, period, parameters):
    p = parameters(period).gov.program.eligibility

    # Correct period access
    age = person("age", period.this_year)
    assets = person("assets", period.this_year)
    monthly_income = person("employment_income", period)

    # Direct return with combined logic
    return (
        (age >= p.age_min) & (age <= p.age_max) &
        (assets <= p.asset_limit) &
        (monthly_income <= p.income_threshold)
    )

Pattern 7: Minimal Comments

Code Should Be Self-Documenting

Variable names and structure, not comments, should explain the code.

Bad - Verbose explanatory comments

def formula(spm_unit, period, parameters):
    # Wisconsin disregards all earned income of dependent children (< 18)
    # Calculate earned income for adults only
    is_adult = spm_unit.members("age", period.this_year) >= 18  # Hard-coded!
    adult_earned = spm_unit.sum(
        spm_unit.members("tanf_gross_earned_income", period) * is_adult
    )

    # All unearned income is counted (including children's)
    gross_unearned = add(spm_unit, period, ["tanf_gross_unearned_income"])

    # NOTE: Wisconsin disregards many additional income sources that
    # are not separately tracked in PolicyEngine (educational aid, etc.)
    return max_(adult_earned + gross_unearned, 0)

Good - Clean self-documenting code

def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.wi.dcf.tanf.income

    is_adult = spm_unit.members("age", period.this_year) >= p.adult_age_threshold
    adult_earned = spm_unit.sum(
        spm_unit.members("tanf_gross_earned_income", period) * is_adult
    )
    gross_unearned = add(spm_unit, period, ["tanf_gross_unearned_income"])
    child_support = add(spm_unit, period, ["child_support_received"])

    return max_(adult_earned + gross_unearned - child_support, 0)

Comment Rules

  1. NO comments explaining what code does - variable names should be clear
  2. OK: Brief NOTE about PolicyEngine limitations (one line):
    # NOTE: Time limit cannot be tracked in PolicyEngine
    
  3. NO multi-line explanations of what the code calculates

Quick Checklist

Before finalizing code:

  • No hardcoded numbers (use parameters or constants like MONTHS_IN_YEAR)
  • Correct period access:
    • Income/flows use period
    • Age/assets/counts/booleans use period.this_year
  • No single-use intermediate variables
  • Direct parameter access (p.amount not amount = p.amount)
  • Direct returns when possible
  • Combined boolean logic when possible
  • Minimal comments (code should be self-documenting)
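
The first checklist item can be partly automated. The sketch below uses Python's standard `ast` module to list numeric literals in a formula's source. Treating `0` and `1` as acceptable is an assumption made here (the guide's own good examples use `max_(..., 0)`), and `find_hardcoded_numbers` is a hypothetical helper, not part of any PolicyEngine tooling.

```python
import ast

# Assumption: 0 and 1 are tolerated (e.g. as the floor in max_(x, 0)).
ALLOWED = {0, 1}


def find_hardcoded_numbers(source):
    """Return sorted (line, value) pairs for numeric literals outside ALLOWED."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)  # True/False are ints in Python
            and node.value not in ALLOWED
        ):
            hits.append((node.lineno, node.value))
    return sorted(hits)


sample = '''
def formula(person, period, parameters):
    age = person("age", period.this_year)
    income = person("income", period) / 12
    return (age >= 18) & (income < 2000)
'''

hits = find_hardcoded_numbers(sample)
print(hits)
```

A hit is only a candidate for review: the script cannot tell a policy threshold from a legitimate constant, so a person still decides whether a parameter is needed.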

Key Takeaways

  1. Less is more - Eliminate unnecessary variables
  2. Direct is better - Access parameters and return directly
  3. Combine when logical - Group related boolean conditions
  4. Keep when needed - Complex calculations and reused values deserve variables
  5. Period matters - Use correct period access to avoid auto-conversion bugs

Related Skills

  • policyengine-period-patterns-skill - Deep dive on period handling
  • policyengine-implementation-patterns-skill - Variable structure and patterns
  • policyengine-vectorization-skill - NumPy operations and vectorization

For Agents

When writing or reviewing formulas:

  1. Scan for single-use variables - eliminate them
  2. Check period access - ensure correct for variable type
  3. Look for hardcoded values - parameterize them
  4. Identify redundant steps - streamline them
  5. Consider readability - keep complex calculations clear
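
As one way to approach step 1, the sketch below flags the specific shape called out in Pattern 1: a name that merely aliases an attribute such as `p.amount` and is then read exactly once. It is a rough heuristic built on the standard `ast` module; `single_use_aliases` is a hypothetical helper, and it deliberately ignores assignments whose attribute chain starts from a call, such as `p = parameters(period).gov...`.

```python
import ast
from collections import Counter


def is_plain_attribute(node):
    """True for chains like p.amount or p.a.b whose root is a bare name."""
    if not isinstance(node, ast.Attribute):
        return False
    while isinstance(node, ast.Attribute):
        node = node.value
    return isinstance(node, ast.Name)


def single_use_aliases(source):
    """Names assigned from a plain attribute access and read exactly once."""
    tree = ast.parse(source)
    reads = Counter(
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    )
    found = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and is_plain_attribute(node.value)
            and reads[node.targets[0].id] == 1
        ):
            found.append(node.targets[0].id)
    return sorted(found)


bad = '''
def formula(spm_unit, period, parameters):
    countable = spm_unit("tn_tanf_countable_resources", period)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    resource_limit = p.amount
    return countable <= resource_limit
'''

good = '''
def formula(spm_unit, period, parameters):
    countable = spm_unit("tn_tanf_countable_resources", period)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    return countable <= p.amount
'''

print(single_use_aliases(bad))
print(single_use_aliases(good))
```

The heuristic is intentionally narrow so that it does not flag `countable` or `p`, which the guide's own good examples keep; step 5 still requires human judgment.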