zhongwei/gh-policyengine-policyengine-claude-complete

Zhongwei Li bd35f442d8 Initial commit

2025-11-30 08:47:54 +08:00

name: policyengine-implementation-patterns
description: PolicyEngine implementation patterns - variable creation, no hard-coding principle, federal/state separation, metadata standards

PolicyEngine Implementation Patterns

Essential patterns for implementing government benefit program rules in PolicyEngine.

PolicyEngine Architecture Constraints

What CANNOT Be Simulated (Single-Period Limitation)

CRITICAL: PolicyEngine uses single-period simulation architecture

The following CANNOT be implemented and should be SKIPPED when found in documentation:

1. Time Limits and Lifetime Counters

Cannot simulate:

ANY lifetime benefit limits (X months total)
ANY time windows (X months within Y period)
Benefit clocks and countable months
Cumulative time tracking

Why: Requires tracking benefit history across multiple periods. PolicyEngine simulates one period at a time with no state persistence.

What to do: Document in comments but DON'T parameterize or implement:

# NOTE: [State] has [X]-month lifetime limit on [Program] benefits
# This cannot be simulated in PolicyEngine's single-period architecture

2. Work History Requirements

Cannot simulate:

"Must have worked 6 of last 12 months"
"Averaged 30 hours/week over past quarter"
Prior employment verification
Work participation rate tracking

Why: Requires historical data from previous periods.

3. Waiting Periods and Benefit Delays

Cannot simulate:

"3-month waiting period for new residents"
"Benefits start month after application"
Retroactive eligibility
Benefit recertification cycles

Why: Requires tracking application dates and eligibility history.

4. Progressive Sanctions and Penalties

Cannot simulate:

"First violation: 1-month sanction, Second: 3-month, Third: permanent"
Graduated penalties
Strike systems

Why: Requires tracking violation history.

5. Asset Spend-Down Over Time

Cannot simulate:

Medical spend-down across months
Resource depletion tracking
Accumulated medical expenses

Why: Requires tracking expenses and resources across periods.

What CAN Be Simulated (With Caveats)

PolicyEngine CAN simulate point-in-time eligibility and benefits:

✅ Current month income limits
✅ Current month resource limits
✅ Current benefit calculations
✅ Current household composition
✅ Current deductions and disregards

Time-Limited Benefits That Affect Current Calculations

Special Case: Time-limited deductions/disregards

When a deduction or disregard is only available for X months:

DO implement the deduction (assume it applies)
DO add a comment explaining the time limitation
DON'T try to track or enforce the time limit

Example:

class state_tanf_countable_earned_income(Variable):
    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states.xx.tanf.income
        earned = spm_unit("tanf_gross_earned_income", period)

        # NOTE: In reality, this 75% disregard only applies for first 4 months
        # of employment. PolicyEngine cannot track employment duration, so we
        # apply the disregard assuming the household qualifies.
        # Actual rule: [State Code Citation]
        disregard_rate = p.earned_income_disregard_rate  # 0.75

        return earned * (1 - disregard_rate)

Rule: If a provision requires history or future tracking, it CANNOT be fully simulated. Implement the point-in-time part and document the limitation in a comment.

Critical Principles

1. ZERO Hard-Coded Values

Every numeric value MUST be parameterized

❌ FORBIDDEN:
return where(eligible, 1000, 0)     # Hard-coded 1000
age < 15                             # Hard-coded 15
benefit = income * 0.33              # Hard-coded 0.33
(month >= 10) | (month <= 3)         # Hard-coded months

✅ REQUIRED:
return where(eligible, p.maximum_benefit, 0)
age < p.age_threshold.minor_child
benefit = income * p.benefit_rate
(month >= p.season.start_month) | (month <= p.season.end_month)

Acceptable literals:

0, 1, -1 for basic math
12 for month conversion (/ 12, * 12)
Array indices when structure is known
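
A rough way to audit a formula for stray numeric literals is to walk its syntax tree. The sketch below uses only Python's standard `ast` module; the function name `find_hard_coded_numbers` and the allowed set are illustrative assumptions, not part of PolicyEngine. It would also flag legitimate array indices, so treat its output as a prompt for review rather than a verdict.

```python
import ast

# Literals the rules above accept. -1 parses as unary minus applied to 1,
# so allowing 1 covers it as well.
ALLOWED_LITERALS = {0, 1, 12}


def find_hard_coded_numbers(source: str) -> list:
    """Return sorted (line, value) pairs for numeric literals outside the allowed set."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Constant):
            continue
        value = node.value
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if value not in ALLOWED_LITERALS:
            findings.append((node.lineno, value))
    return sorted(findings)


sample = """
benefit = income * 0.33
monthly = annual / 12
floor = max(benefit - 1, 0)
"""
flagged = find_hard_coded_numbers(sample)
print(flagged)
```

Only the `0.33` on line 2 of the sample is reported; the `12`, `1`, and `0` pass as acceptable literals.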

2. No Placeholder Implementations

Delete the file rather than leave placeholders

❌ NEVER:
def formula(entity, period, parameters):
    # TODO: Implement
    return 75  # Placeholder

✅ ALWAYS:
# Complete implementation or no file at all

Variable Implementation Standards

Variable Metadata Format

Follow established patterns:

class il_tanf_countable_earned_income(Variable):
    value_type = float
    entity = SPMUnit
    definition_period = MONTH
    label = "Illinois TANF countable earned income"
    unit = USD
    reference = "https://www.law.cornell.edu/regulations/illinois/..."
    defined_for = StateCode.IL

    # Use adds for simple sums
    adds = ["il_tanf_earned_income_after_disregard"]

Key rules:

✅ Use full URL in reference (clickable)
❌ Don't use documentation field
❌ Don't use statute citations without URLs

When to Use `adds` vs `formula`

Use adds when:

Just summing variables
Passing through a single variable
No transformations needed

✅ BEST - Simple sum:
class tanf_gross_income(Variable):
    adds = ["employment_income", "self_employment_income"]

Use formula when:

Applying transformations
Conditional logic
Calculations needed

✅ CORRECT - Need logic:
def formula(entity, period, parameters):
    income = add(entity, period, ["income1", "income2"])
    return max_(0, income)  # Need max_

TANF Countable Income Pattern

Critical: Verify Calculation Order from Legal Code

MOST IMPORTANT: Always check the state's legal code or policy manual for the exact calculation order. The pattern below is typical but not universal.

The Typical Pattern:

Apply deductions/disregards to earned income only
Use max_() to prevent negative earned income
Add unearned income (which typically has no deductions)

This pattern is based on how MOST TANF programs work, but you MUST verify with the specific state's legal code.

❌ WRONG - Applying deductions to total income

def formula(spm_unit, period, parameters):
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    deductions = spm_unit("tanf_earned_income_deductions", period)

    # ❌ WRONG: Deductions applied to total income
    total_income = gross_earned + unearned
    countable = total_income - deductions

    return max_(countable, 0)

Why this is wrong:

Deductions should ONLY reduce earned income
Unearned income (SSI, child support, etc.) is not subject to work expense deductions
This incorrectly reduces unearned income when earned income is low

Example error:

Earned: $100, Unearned: $500, Deductions: $200
Wrong result: max_($100 + $500 - $200, 0) = $400 (reduces unearned!)
Correct result: max_($100 - $200, 0) + $500 = $500

✅ CORRECT - Apply deductions to earned only, then add unearned

def formula(spm_unit, period, parameters):
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    deductions = spm_unit("tanf_earned_income_deductions", period)

    # ✅ CORRECT: Deductions applied to earned only, then add unearned
    return max_(gross_earned - deductions, 0) + unearned
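
The example figures above (earned $100, unearned $500, deductions $200) can be checked with plain Python scalars. This is only an arithmetic sketch: the two function names are hypothetical, and the built-in `max` stands in for the vectorized `max_` used in real formulas.

```python
def countable_income_wrong(earned, unearned, deductions):
    # Deductions are subtracted from the combined total,
    # so they can eat into unearned income.
    return max(earned + unearned - deductions, 0)


def countable_income_correct(earned, unearned, deductions):
    # Deductions can only reduce earned income to zero;
    # unearned income is added afterwards, untouched.
    return max(earned - deductions, 0) + unearned


wrong = countable_income_wrong(100, 500, 200)
correct = countable_income_correct(100, 500, 200)
print(wrong, correct)  # 400 500
```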

Pattern Variations

With multiple deduction steps:

def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.xx.tanf.income
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)

    # Step 1: Apply work expense deduction
    work_expense = min_(gross_earned * p.work_expense_rate, p.work_expense_max)
    after_work_expense = max_(gross_earned - work_expense, 0)

    # Step 2: Apply earnings disregard
    earnings_disregard = after_work_expense * p.disregard_rate
    countable_earned = max_(after_work_expense - earnings_disregard, 0)

    # Step 3: Add unearned (no deductions applied)
    return countable_earned + unearned
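
To see how the steps interact, the same sequence can be traced with scalars. The rates and cap below are made-up illustration values, not any state's actual parameters, and the function is a plain-Python stand-in for the formula above.

```python
def countable_income(gross_earned, unearned, work_expense_rate,
                     work_expense_max, disregard_rate):
    # Step 1: work expense deduction, capped at a maximum.
    work_expense = min(gross_earned * work_expense_rate, work_expense_max)
    after_work_expense = max(gross_earned - work_expense, 0)

    # Step 2: percentage disregard applied to what remains.
    earnings_disregard = after_work_expense * disregard_rate
    countable_earned = max(after_work_expense - earnings_disregard, 0)

    # Step 3: unearned income is added with no deductions.
    return countable_earned + unearned


# Hypothetical values: 20% work expense capped at $90, 50% disregard.
result = countable_income(
    gross_earned=1000,
    unearned=200,
    work_expense_rate=0.2,
    work_expense_max=90,
    disregard_rate=0.5,
)
print(result)
```

With these inputs the cap binds (20% of $1,000 exceeds $90), leaving $910, half of which is disregarded, so the order of the steps changes the result.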

With disregard percentage (simplified):

def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.xx.tanf.income
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)

    # Apply disregard to earned (keep 33% = disregard 67%)
    countable_earned = gross_earned * (1 - p.earned_disregard_rate)

    return max_(countable_earned, 0) + unearned

When Unearned Income HAS Deductions

Some states DO have unearned income deductions (rare). Handle separately:

def formula(spm_unit, period, parameters):
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    gross_unearned = spm_unit("tanf_gross_unearned_income", period)
    earned_deductions = spm_unit("tanf_earned_income_deductions", period)
    unearned_deductions = spm_unit("tanf_unearned_income_deductions", period)

    # Apply each type of deduction to its respective income type
    countable_earned = max_(gross_earned - earned_deductions, 0)
    countable_unearned = max_(gross_unearned - unearned_deductions, 0)

    return countable_earned + countable_unearned

Quick Reference

Standard TANF pattern:

Countable Income = max_(Earned - Earned Deductions, 0) + Unearned

NOT:

❌ max_(Earned + Unearned - Deductions, 0)
❌ max_(Earned - Deductions + Unearned, 0)  # Same error: excess deductions reduce Unearned

Federal/State Separation

Federal Parameters

Location: /parameters/gov/{agency}/

Base formulas and methodologies
National standards
Required elements

State Parameters

Location: /parameters/gov/states/{state}/

State-specific thresholds
Implementation choices
Scale factors

# Federal: parameters/gov/hhs/fpg/base.yaml
first_person: 14_580

# State: parameters/gov/states/ca/scale_factor.yaml
fpg_multiplier: 2.0  # 200% of FPG

Code Reuse Patterns

Avoid Duplication - Create Intermediate Variables

❌ ANTI-PATTERN: Copy-pasting calculations

# File 1: calculates income after deduction
def formula(household, period, parameters):
    gross = add(household, period, ["income"])
    deduction = p.deduction * household.nb_persons()
    return max_(gross - deduction, 0)

# File 2: DUPLICATES same calculation
def formula(household, period, parameters):
    gross = add(household, period, ["income"])  # Copy-pasted
    deduction = p.deduction * household.nb_persons()  # Copy-pasted
    after_deduction = max_(gross - deduction, 0)  # Copy-pasted
    return after_deduction < p.threshold

✅ CORRECT: Reuse existing variables

# File 2: reuses calculation
def formula(household, period, parameters):
    countable_income = household("program_countable_income", period)
    return countable_income < p.threshold

When to create intermediate variables:

Same calculation in 2+ places
Logic exceeds 5 lines
Reference implementations have a similar variable

TANF-Specific Patterns

Study Reference Implementations First

MANDATORY before implementing any TANF:

DC TANF: /variables/gov/states/dc/dhs/tanf/
IL TANF: /variables/gov/states/il/dhs/tanf/
TX TANF: /variables/gov/states/tx/hhs/tanf/

Learn from them:

Variable organization
Naming conventions
Code reuse patterns
When to use adds vs formula

Standard TANF Structure

tanf/
├── eligibility/
│   ├── demographic_eligible.py
│   ├── income_eligible.py
│   └── eligible.py
├── income/
│   ├── earned/
│   ├── unearned/
│   └── countable_income.py
└── [state]_tanf.py

Simplified TANF Rules

For simplified implementations:

DON'T create state-specific versions of:

Demographic eligibility (use federal)
Immigration eligibility (use federal)
Income sources (use federal baseline)

❌ DON'T CREATE:
ca_tanf_demographic_eligible_person.py
ca_tanf_gross_earned_income.py
parameters/.../income/sources/earned.yaml

✅ DO USE:
# Federal demographic eligibility
is_demographic_tanf_eligible
# Federal income aggregation
tanf_gross_earned_income

Avoiding Unnecessary Wrapper Variables (CRITICAL)

Golden Rule: Only create a state variable if you're adding state-specific logic to it!

Understand WHY Variables Exist, Not Just WHAT

When studying reference implementations:

Note which variables they have
READ THE CODE inside each variable
Ask: "Does this variable have state-specific logic?"
If it just returns federal baseline → DON'T copy it

Variable Creation Decision Tree

Before creating ANY state-specific variable, ask:

Does federal baseline already calculate this?
Does my state do it DIFFERENTLY than federal?
Can I write the difference in 1+ lines of state-specific logic?
Will this calculation be used in 2+ other variables? (Code reuse exception)

Decision:

If YES/NO/NO/NO → DON'T create the variable, use federal directly
If YES/YES/YES/NO → CREATE the variable with state logic
If YES/NO/NO/YES → CREATE as intermediate variable for code reuse (see exception below)
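
The three documented outcomes can be written as a small function, which makes the precedence between the questions explicit. This is one possible reading of the decision tree, not an official rule set: the function name and return strings are hypothetical, and the final branch (no federal baseline at all) is an assumption, since the list above does not cover it.

```python
def state_variable_decision(federal_exists, state_differs,
                            expressible_in_code, reused_elsewhere):
    """Map the four yes/no answers to a decision string."""
    if state_differs and expressible_in_code:
        return "create: state-specific logic"
    if reused_elsewhere:
        return "create: intermediate variable for reuse"
    if federal_exists:
        return "skip: use the federal variable directly"
    # ASSUMPTION: not covered by the documented cases.
    return "create: no federal baseline covers this"


print(state_variable_decision(True, False, False, False))
print(state_variable_decision(True, True, True, False))
print(state_variable_decision(True, False, False, True))
```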

EXCEPTION: Code Reuse Justifies Intermediate Variables

Even without state-specific logic, create a variable if the SAME calculation is used in multiple places.

❌ Bad - Duplicating calculation across variables:

# Variable 1 - Income eligibility
class mo_tanf_income_eligible(Variable):
    def formula(spm_unit, period, parameters):
        # Duplicated calculation
        gross = add(spm_unit, period, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])
        return gross <= p.income_limit

# Variable 2 - Countable income
class mo_tanf_countable_income(Variable):
    def formula(spm_unit, period, parameters):
        # SAME calculation repeated!
        gross = add(spm_unit, period, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])
        deductions = spm_unit("mo_tanf_deductions", period)
        return max_(gross - deductions, 0)

# Variable 3 - Need standard
class mo_tanf_need_standard(Variable):
    def formula(spm_unit, period, parameters):
        # SAME calculation AGAIN!
        gross = add(spm_unit, period, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])
        return where(gross < p.threshold, p.high, p.low)

✅ Good - Extract into reusable intermediate variable:

# Intermediate variable - used in multiple places
class mo_tanf_gross_income(Variable):
    adds = ["tanf_gross_earned_income", "tanf_gross_unearned_income"]

# Variable 1 - Reuses intermediate
class mo_tanf_income_eligible(Variable):
    def formula(spm_unit, period, parameters):
        gross = spm_unit("mo_tanf_gross_income", period)  # Reuse
        return gross <= p.income_limit

# Variable 2 - Reuses intermediate
class mo_tanf_countable_income(Variable):
    def formula(spm_unit, period, parameters):
        gross = spm_unit("mo_tanf_gross_income", period)  # Reuse
        deductions = spm_unit("mo_tanf_deductions", period)
        return max_(gross - deductions, 0)

# Variable 3 - Reuses intermediate
class mo_tanf_need_standard(Variable):
    def formula(spm_unit, period, parameters):
        gross = spm_unit("mo_tanf_gross_income", period)  # Reuse
        return where(gross < p.threshold, p.high, p.low)

When to create intermediate variables for reuse:

✅ Same calculation appears in 2+ variables
✅ Represents a meaningful concept (e.g., "gross income", "net resources")
✅ Simplifies maintenance (change once vs many places)
✅ Follows DRY (Don't Repeat Yourself) principle

When NOT to create (still a wrapper):

❌ Only used in ONE place
❌ Just passes through another variable unchanged
❌ Adds indirection without code reuse benefit

Red Flags for Unnecessary Wrapper Variables

❌ INVALID - Pure wrapper, no state logic:
class in_tanf_assistance_unit_size(Variable):
    def formula(spm_unit, period):
        return spm_unit("spm_unit_size", period)  # Just returns federal

❌ INVALID - Aggregation without transformation:
class in_tanf_countable_unearned_income(Variable):
    def formula(tax_unit, period):
        return tax_unit.sum(tax_unit.members("tanf_gross_unearned_income", period))

❌ INVALID - Pass-through with no modification:
class in_tanf_gross_income(Variable):
    def formula(entity, period):
        return entity("tanf_gross_income", period)

Examples of VALID State Variables

✅ VALID - Has state-specific disregard:
class in_tanf_countable_earned_income(Variable):
    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states["in"].tanf.income  # "in" is a Python keyword, so use item access
        earned = spm_unit("tanf_gross_earned_income", period)
        return earned * (1 - p.earned_income_disregard_rate)  # STATE LOGIC

✅ VALID - Uses state-specific limits:
class in_tanf_income_eligible(Variable):
    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states["in"].tanf  # "in" is a Python keyword, so use item access
        income = spm_unit("tanf_countable_income", period)
        size = spm_unit("spm_unit_size", period.this_year)
        limit = p.income_limit[min_(size, p.max_household_size)]  # STATE PARAMS
        return income <= limit

✅ VALID - IL has different counting rules:
class il_tanf_assistance_unit_size(Variable):
    adds = [
        "il_tanf_payment_eligible_child",  # STATE-SPECIFIC
        "il_tanf_payment_eligible_parent",  # STATE-SPECIFIC
    ]

State Variables to AVOID Creating

For TANF implementations:

❌ DON'T create these (use federal directly):

state_tanf_assistance_unit_size (unless different counting rules like IL)
state_tanf_countable_unearned_income (unless state has disregards)
state_tanf_gross_income (just use federal baseline)
Any variable that's just return entity("federal_variable", period)

✅ DO create these (when state has unique rules):

state_tanf_countable_earned_income (if unique disregard %)
state_tanf_income_eligible (state income limits)
state_tanf_maximum_benefit (state payment standards)
state_tanf (final benefit calculation)

Demographic Eligibility Pattern

Option 1: Use Federal (Simplified)

class ca_tanf_eligible(Variable):
    def formula(spm_unit, period, parameters):
        # Use federal variable
        has_eligible = spm_unit.any(
            spm_unit.members("is_demographic_tanf_eligible", period)
        )
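        # Assumed name for the state's income test variable
        income_eligible = spm_unit("ca_tanf_income_eligible", period)
        return has_eligible & income_eligible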

Option 2: State-Specific (Different thresholds)

class ca_tanf_demographic_eligible_person(Variable):
    def formula(person, period, parameters):
        p = parameters(period).gov.states.ca.tanf
        age = person("age", period.this_year)  # NOT monthly_age

        age_limit = where(
            person("is_full_time_student", period),
            p.age_threshold.student,
            p.age_threshold.minor_child
        )
        return age < age_limit

Common Implementation Patterns

Income Eligibility

class program_income_eligible(Variable):
    value_type = bool
    entity = SPMUnit
    definition_period = MONTH

    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states.xx.program
        income = spm_unit("program_countable_income", period)
        size = spm_unit("spm_unit_size", period.this_year)

        # Get threshold from parameters
        threshold = p.income_limit[min_(size, p.max_household_size)]
        return income <= threshold
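
The `min_(size, p.max_household_size)` lookup caps the unit size so that larger units fall back to the last row of the limit table. A scalar sketch of that behavior follows; the dictionary and constant stand in for parameters, their values are invented for illustration, and in a real variable they would live in parameter files rather than in code.

```python
# Hypothetical monthly income limits keyed by unit size.
INCOME_LIMIT_BY_SIZE = {1: 500, 2: 650, 3: 800}
MAX_HOUSEHOLD_SIZE = 3  # largest size with its own row in the table


def income_eligible(countable_income, unit_size):
    # Units larger than the table use the limit for the largest listed size.
    capped_size = min(unit_size, MAX_HOUSEHOLD_SIZE)
    return countable_income <= INCOME_LIMIT_BY_SIZE[capped_size]


print(income_eligible(700, 5))  # size 5 is capped to 3, limit 800
print(income_eligible(700, 2))  # limit 650
```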

Benefit Calculation

class program_benefit(Variable):
    value_type = float
    entity = SPMUnit
    definition_period = MONTH
    unit = USD

    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states.xx.program
        eligible = spm_unit("program_eligible", period)

        # Calculate benefit amount
        base = p.benefit_schedule.base_amount
        adjustment = p.benefit_schedule.adjustment_rate
        size = spm_unit("spm_unit_size", period.this_year)

        amount = base + (size - 1) * adjustment
        return where(eligible, amount, 0)
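
As a quick check of the schedule arithmetic, the same calculation can be run on scalars. The base amount and per-person adjustment below are placeholder figures, and the conditional expression plays the role of `where`.

```python
def monthly_benefit(eligible, unit_size, base_amount, adjustment_rate):
    # Base amount for the first person plus an increment per additional member.
    amount = base_amount + (unit_size - 1) * adjustment_rate
    return amount if eligible else 0


# Hypothetical schedule: $300 base, $75 per additional person.
print(monthly_benefit(True, 3, base_amount=300, adjustment_rate=75))
print(monthly_benefit(False, 3, base_amount=300, adjustment_rate=75))
```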

Using Scale Parameters

def formula(entity, period, parameters):
    p = parameters(period).gov.states.az.program
    federal_p = parameters(period).gov.hhs.fpg

    # Federal base with state scale
    size = entity("household_size", period.this_year)
    fpg = federal_p.first_person + federal_p.additional * (size - 1)
    state_scale = p.income_limit_scale  # State scale parameter; reuse it if one already exists
    income_limit = fpg * state_scale
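
A worked instance of the federal-base-times-state-scale calculation, using the `14_580` first-person amount and the `2.0` multiplier from the parameter examples earlier in this document. The additional-person amount is an assumed figure supplied only to make the arithmetic concrete, and the function name is hypothetical.

```python
def fpg_income_limit(size, first_person, additional_person, state_scale):
    # Federal guideline for the unit size, then the state's multiplier.
    fpg = first_person + additional_person * (size - 1)
    return fpg * state_scale


limit = fpg_income_limit(
    size=3,
    first_person=14_580,
    additional_person=5_140,  # ASSUMPTION: illustrative per-person increment
    state_scale=2.0,
)
print(limit)
```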

Variable Creation Checklist

Before creating any variable:

Check if it already exists
Use standard demographic variables (age, is_disabled)
Reuse federal calculations where applicable
Check for an existing household_income variable before creating a new income variable
Look for existing intermediate variables
Study reference implementations

Quality Standards

Complete Implementation Requirements

All values from parameters (no hard-coding)
Complete formula logic
Proper entity aggregation
Correct period handling
Meaningful variable names
Proper metadata

Anti-Patterns to Avoid

Copy-pasting logic between files
Hard-coding any numeric values
Creating duplicate income variables
State-specific versions of federal rules
Placeholder TODOs in production code

Parameter-to-Variable Mapping Requirements

Every Parameter Must Have a Variable

CRITICAL: Complete implementation means every parameter is used!

When you create parameters, you MUST create corresponding variables:

Parameter Type	Required Variable(s)
resources/limit	`state_program_resource_eligible`
income/limit	`state_program_income_eligible`
payment_standard	`state_program_maximum_benefit`
income/disregard	`state_program_countable_earned_income`
categorical/requirements	`state_program_categorically_eligible`

Complete Eligibility Formula

The main eligibility variable MUST combine ALL checks:

class state_program_eligible(Variable):
    def formula(spm_unit, period, parameters):
        income_eligible = spm_unit("state_program_income_eligible", period)
        resource_eligible = spm_unit("state_program_resource_eligible", period)  # DON'T FORGET!
        categorical = spm_unit("state_program_categorically_eligible", period)

        return income_eligible & resource_eligible & categorical

Common Implementation Failures:

❌ Created resource limit parameter but no resource_eligible variable
❌ Main eligible variable only checks income, ignores resources
❌ Parameters created but never referenced in any formula
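
Orphaned parameters of the kind listed above can be caught with a crude text search. The sketch below is an assumption-laden helper, not PolicyEngine tooling: it takes parameter leaf names and formula source strings as plain Python values and reports names that never appear. Substring matching can miss or over-match in real code, so it is a first pass only.

```python
def find_orphaned_parameters(parameter_names, formula_sources):
    """Return parameter names that appear in none of the formula sources."""
    combined = "\n".join(formula_sources)
    return [name for name in parameter_names if name not in combined]


# Hypothetical parameter leaf names and formula bodies.
parameters_defined = ["income_limit", "resource_limit", "payment_standard"]
formulas = [
    "return income <= p.income_limit",
    "return where(eligible, p.payment_standard, 0)",
]
orphans = find_orphaned_parameters(parameters_defined, formulas)
print(orphans)
```

Here `resource_limit` is reported because no formula references it, which mirrors the first failure in the list: a resource limit parameter with no `resource_eligible` variable.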

For Agents

When implementing variables:

Study reference implementations (DC, IL, TX TANF)
Never hard-code values - use parameters
Map every parameter to a variable - no orphaned parameters
Complete ALL eligibility checks - income AND resources AND categorical
Reuse existing variables - avoid duplication
Use adds when possible - cleaner than formula
Create intermediate variables for complex logic
Follow metadata standards exactly
Complete implementation or delete the file
