---
name: policyengine-implementation-patterns
description: PolicyEngine implementation patterns - variable creation, no hard-coding principle, federal/state separation, metadata standards
---
# PolicyEngine Implementation Patterns
Essential patterns for implementing government benefit program rules in PolicyEngine.
## PolicyEngine Architecture Constraints
### What CANNOT Be Simulated (Single-Period Limitation)
**CRITICAL: PolicyEngine uses single-period simulation architecture**
The following CANNOT be implemented and should be SKIPPED when found in documentation:
#### 1. Time Limits and Lifetime Counters
**Cannot simulate:**
- ANY lifetime benefit limits (X months total)
- ANY time windows (X months within Y period)
- Benefit clocks and countable months
- Cumulative time tracking
**Why:** Requires tracking benefit history across multiple periods. PolicyEngine simulates one period at a time with no state persistence.
**What to do:** Document in comments but DON'T parameterize or implement:
```python
# NOTE: [State] has [X]-month lifetime limit on [Program] benefits
# This cannot be simulated in PolicyEngine's single-period architecture
```
#### 2. Work History Requirements
**Cannot simulate:**
- "Must have worked 6 of last 12 months"
- "Averaged 30 hours/week over past quarter"
- Prior employment verification
- Work participation rate tracking
**Why:** Requires historical data from previous periods.
#### 3. Waiting Periods and Benefit Delays
**Cannot simulate:**
- "3-month waiting period for new residents"
- "Benefits start month after application"
- Retroactive eligibility
- Benefit recertification cycles
**Why:** Requires tracking application dates and eligibility history.
#### 4. Progressive Sanctions and Penalties
**Cannot simulate:**
- "First violation: 1-month sanction, Second: 3-month, Third: permanent"
- Graduated penalties
- Strike systems
**Why:** Requires tracking violation history.
#### 5. Asset Spend-Down Over Time
**Cannot simulate:**
- Medical spend-down across months
- Resource depletion tracking
- Accumulated medical expenses
**Why:** Requires tracking expenses and resources across periods.
### What CAN Be Simulated (With Caveats)
PolicyEngine CAN simulate point-in-time eligibility and benefits:
- ✅ Current month income limits
- ✅ Current month resource limits
- ✅ Current benefit calculations
- ✅ Current household composition
- ✅ Current deductions and disregards
### Time-Limited Benefits That Affect Current Calculations
**Special Case: Time-limited deductions/disregards**
When a deduction or disregard is only available for X months:
- **DO implement the deduction** (assume it applies)
- **DO add a comment** explaining the time limitation
- **DON'T try to track or enforce the time limit**
Example:
```python
class state_tanf_countable_earned_income(Variable):
    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states.xx.tanf.income
        earned = spm_unit("tanf_gross_earned_income", period)
        # NOTE: In reality, this 75% disregard only applies for the first 4 months
        # of employment. PolicyEngine cannot track employment duration, so we
        # apply the disregard assuming the household qualifies.
        # Actual rule: [State Code Citation]
        disregard_rate = p.earned_income_disregard_rate  # 0.75
        return earned * (1 - disregard_rate)
```
**Rule: If it requires history or future tracking, it CANNOT be fully simulated - but implement what we can and document limitations**
---
## Critical Principles
### 1. ZERO Hard-Coded Values
**Every numeric value MUST be parameterized**
```python
# FORBIDDEN:
return where(eligible, 1000, 0)  # Hard-coded 1000
age < 15  # Hard-coded 15
benefit = income * 0.33  # Hard-coded 0.33
(month >= 10) | (month <= 3)  # Hard-coded months

# REQUIRED:
return where(eligible, p.maximum_benefit, 0)
age < p.age_threshold.minor_child
benefit = income * p.benefit_rate
month >= p.season.start_month
```
**Acceptable literals:**
- `0`, `1`, `-1` for basic math
- `12` for month conversion (`/ 12`, `* 12`)
- Array indices when structure is known
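
A seasonal rule that wraps past December (for example, October through March) needs an `or` between the two month comparisons, because no month is both `>= 10` and `<= 3`. The sketch below shows one way to keep the month values in parameters while handling the wrap-around. It uses plain Python scalars and a `SimpleNamespace` as a hypothetical stand-in for a parameter node; `end_month` and the helper name `in_season` are assumptions for illustration, and a real formula would use vectorized operators.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a PolicyEngine parameter node; in a real formula
# these values would come from parameters(period), not from Python literals.
p = SimpleNamespace(
    season=SimpleNamespace(start_month=10, end_month=3),  # assumed names/values
)


def in_season(month, season):
    """Return True if `month` falls in a season that may wrap past December."""
    if season.start_month <= season.end_month:
        # Season within one calendar year, e.g. May through September
        return season.start_month <= month <= season.end_month
    # Season wraps around the year end, e.g. October through March
    return month >= season.start_month or month <= season.end_month


print([m for m in range(1, 13) if in_season(m, p.season)])
```
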
### 2. No Placeholder Implementations
**Delete the file rather than leave placeholders**
```python
# NEVER:
def formula(entity, period, parameters):
    # TODO: Implement
    return 75  # Placeholder

# ALWAYS:
# Complete implementation or no file at all
```
---
## Variable Implementation Standards
### Variable Metadata Format
Follow established patterns:
```python
class il_tanf_countable_earned_income(Variable):
    value_type = float
    entity = SPMUnit
    definition_period = MONTH
    label = "Illinois TANF countable earned income"
    unit = USD
    reference = "https://www.law.cornell.edu/regulations/illinois/..."
    defined_for = StateCode.IL

    # Use adds for simple sums
    adds = ["il_tanf_earned_income_after_disregard"]
```
**Key rules:**
- ✅ Use full URL in `reference` (clickable)
- ❌ Don't use `documentation` field
- ❌ Don't use statute citations without URLs
### When to Use `adds` vs `formula`
**Use `adds` when:**
- Just summing variables
- Passing through a single variable
- No transformations needed
```python
# BEST - Simple sum:
class tanf_gross_income(Variable):
    adds = ["employment_income", "self_employment_income"]
```
**Use `formula` when:**
- Applying transformations
- Conditional logic
- Calculations needed
```python
# CORRECT - Need logic:
def formula(entity, period, parameters):
    income = add(entity, period, ["income1", "income2"])
    return max_(0, income)  # Need max_
```
---
## TANF Countable Income Pattern
### Critical: Verify Calculation Order from Legal Code
**MOST IMPORTANT:** Always check the state's legal code or policy manual for the exact calculation order. The pattern below is typical but not universal.
**The Typical Pattern:**
1. Apply deductions/disregards to **earned income only**
2. Use `max_()` to prevent negative earned income
3. Add unearned income (which typically has no deductions)
**This pattern is based on how MOST TANF programs work, but you MUST verify with the specific state's legal code.**
### ❌ WRONG - Applying deductions to total income
```python
def formula(spm_unit, period, parameters):
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    deductions = spm_unit("tanf_earned_income_deductions", period)
    # ❌ WRONG: Deductions applied to total income
    total_income = gross_earned + unearned
    countable = total_income - deductions
    return max_(countable, 0)
```
**Why this is wrong:**
- Deductions should ONLY reduce earned income
- Unearned income (SSI, child support, etc.) is not subject to work expense deductions
- This incorrectly reduces unearned income when earned income is low
**Example error:**
- Earned: $100, Unearned: $500, Deductions: $200
- Wrong result: `max_($100 + $500 - $200, 0) = $400` (reduces unearned!)
- Correct result: `max_($100 - $200, 0) + $500 = $500`
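
The arithmetic in the example above can be checked with a few lines of plain Python. This is a minimal sketch using scalars and the built-in `max`; real PolicyEngine formulas operate on arrays with `max_`, and the two function names are illustrative only.

```python
def countable_income_wrong(earned, unearned, deductions):
    # Deductions taken from total income (the mistake described above)
    return max(earned + unearned - deductions, 0)


def countable_income_typical(earned, unearned, deductions):
    # Deductions reduce earned income only; unearned is added afterwards
    return max(earned - deductions, 0) + unearned


print(countable_income_wrong(100, 500, 200))    # 400
print(countable_income_typical(100, 500, 200))  # 500
```

The $100 gap is the part of the deduction that exceeded earned income and was incorrectly charged against unearned income.
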
### ✅ CORRECT - Apply deductions to earned only, then add unearned
```python
def formula(spm_unit, period, parameters):
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    deductions = spm_unit("tanf_earned_income_deductions", period)
    # ✅ CORRECT: Deductions applied to earned only, then add unearned
    return max_(gross_earned - deductions, 0) + unearned
```
### Pattern Variations
**With multiple deduction steps:**
```python
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.xx.tanf.income
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    # Step 1: Apply work expense deduction
    work_expense = min_(gross_earned * p.work_expense_rate, p.work_expense_max)
    after_work_expense = max_(gross_earned - work_expense, 0)
    # Step 2: Apply earnings disregard
    earnings_disregard = after_work_expense * p.disregard_rate
    countable_earned = max_(after_work_expense - earnings_disregard, 0)
    # Step 3: Add unearned (no deductions applied)
    return countable_earned + unearned
```
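
To see how the three steps interact, the following sketch traces one household through them in plain Python. All numbers (a 20% work expense rate capped at $90, a 50% disregard) are made-up illustration values, not any state's rules, and the function name is hypothetical.

```python
def multi_step_countable_income(
    gross_earned, unearned, work_expense_rate, work_expense_max, disregard_rate
):
    # Step 1: work expense deduction, capped at a maximum
    work_expense = min(gross_earned * work_expense_rate, work_expense_max)
    after_work_expense = max(gross_earned - work_expense, 0)
    # Step 2: percentage disregard on what remains
    earnings_disregard = after_work_expense * disregard_rate
    countable_earned = max(after_work_expense - earnings_disregard, 0)
    # Step 3: unearned income is added without deductions
    return countable_earned + unearned


# $600 earned: the work expense is capped at $90, leaving $510;
# half is disregarded, leaving $255; plus $100 unearned.
print(multi_step_countable_income(600, 100, 0.2, 90, 0.5))  # 355.0
```
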
**With disregard percentage (simplified):**
```python
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.xx.tanf.income
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    # Apply disregard to earned (keep 33% = disregard 67%)
    countable_earned = gross_earned * (1 - p.earned_disregard_rate)
    return max_(countable_earned, 0) + unearned
```
### When Unearned Income HAS Deductions
Some states DO have unearned income deductions (rare). Handle separately:
```python
def formula(spm_unit, period, parameters):
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    gross_unearned = spm_unit("tanf_gross_unearned_income", period)
    earned_deductions = spm_unit("tanf_earned_income_deductions", period)
    unearned_deductions = spm_unit("tanf_unearned_income_deductions", period)
    # Apply each type of deduction to its respective income type
    countable_earned = max_(gross_earned - earned_deductions, 0)
    countable_unearned = max_(gross_unearned - unearned_deductions, 0)
    return countable_earned + countable_unearned
```
### Quick Reference
**Standard TANF pattern:**
```
Countable Income = max_(Earned - Earned Deductions, 0) + Unearned
```
**NOT:**
```
❌ max_(Earned + Unearned - Deductions, 0)
❌ max_(Earned - Deductions + Unearned, 0)  # Same result as the line above: excess deductions reduce unearned
```
---
## Federal/State Separation
### Federal Parameters
Location: `/parameters/gov/{agency}/`
- Base formulas and methodologies
- National standards
- Required elements
### State Parameters
Location: `/parameters/gov/states/{state}/`
- State-specific thresholds
- Implementation choices
- Scale factors
```yaml
# Federal: parameters/gov/hhs/fpg/base.yaml
first_person: 14_580
# State: parameters/gov/states/ca/scale_factor.yaml
fpg_multiplier: 2.0 # 200% of FPG
```
---
## Code Reuse Patterns
### Avoid Duplication - Create Intermediate Variables
**❌ ANTI-PATTERN: Copy-pasting calculations**
```python
# File 1: calculates income after deduction
def formula(household, period, parameters):
    gross = add(household, period, ["income"])
    deduction = p.deduction * household.nb_persons()
    return max_(gross - deduction, 0)

# File 2: DUPLICATES same calculation
def formula(household, period, parameters):
    gross = add(household, period, ["income"])  # Copy-pasted
    deduction = p.deduction * household.nb_persons()  # Copy-pasted
    after_deduction = max_(gross - deduction, 0)  # Copy-pasted
    return after_deduction < p.threshold
```
**✅ CORRECT: Reuse existing variables**
```python
# File 2: reuses calculation
def formula(household, period, parameters):
    countable_income = household("program_countable_income", period)
    return countable_income < p.threshold
```
**When to create intermediate variables:**
- Same calculation in 2+ places
- Logic exceeds 5 lines
- Reference implementations have a similar variable
---
## TANF-Specific Patterns
### Study Reference Implementations First
**MANDATORY before implementing any TANF program:**
- DC TANF: `/variables/gov/states/dc/dhs/tanf/`
- IL TANF: `/variables/gov/states/il/dhs/tanf/`
- TX TANF: `/variables/gov/states/tx/hhs/tanf/`
**Learn from them:**
1. Variable organization
2. Naming conventions
3. Code reuse patterns
4. When to use `adds` vs `formula`
### Standard TANF Structure
```
tanf/
├── eligibility/
│ ├── demographic_eligible.py
│ ├── income_eligible.py
│ └── eligible.py
├── income/
│ ├── earned/
│ ├── unearned/
│ └── countable_income.py
└── [state]_tanf.py
```
### Simplified TANF Rules
For simplified implementations:
**DON'T create state-specific versions of:**
- Demographic eligibility (use federal)
- Immigration eligibility (use federal)
- Income sources (use federal baseline)
```python
# DON'T CREATE:
ca_tanf_demographic_eligible_person.py
ca_tanf_gross_earned_income.py
parameters/.../income/sources/earned.yaml

# DO USE:
# Federal demographic eligibility
is_demographic_tanf_eligible
# Federal income aggregation
tanf_gross_earned_income
```
### Avoiding Unnecessary Wrapper Variables (CRITICAL)
**Golden Rule: Only create a state variable if you're adding state-specific logic to it!**
#### Understand WHY Variables Exist, Not Just WHAT
When studying reference implementations:
1. **Note which variables they have**
2. **READ THE CODE inside each variable**
3. **Ask: "Does this variable have state-specific logic?"**
4. **If it just returns federal baseline → DON'T copy it**
#### Variable Creation Decision Tree
Before creating ANY state-specific variable, ask:
1. Does federal baseline already calculate this?
2. Does my state do it DIFFERENTLY than federal?
3. Can I write the difference in 1+ lines of state-specific logic?
4. **Will this calculation be used in 2+ other variables?** (Code reuse exception)
**Decision:**
- If YES/NO/NO/NO → **DON'T create the variable**, use federal directly
- If YES/YES/YES/NO → **CREATE the variable** with state logic
- If YES/NO/NO/YES → **CREATE as intermediate variable** for code reuse (see exception below)
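
The three documented outcomes can be written down as a small lookup function. This is a sketch for reasoning about the checklist, not part of PolicyEngine; the function name and return strings are hypothetical, and combinations the list above does not mention (for example, all four answers YES) are resolved here by assumption.

```python
def state_variable_decision(
    federal_exists, state_differs, has_state_logic, reused_elsewhere
):
    """Map the four checklist answers to one of the documented outcomes."""
    if state_differs and has_state_logic:
        # YES/YES/YES/NO in the list above
        return "create with state logic"
    if reused_elsewhere:
        # YES/NO/NO/YES: the code reuse exception
        return "create as intermediate variable"
    if federal_exists:
        # YES/NO/NO/NO: nothing state-specific to add
        return "use federal directly"
    # Not covered by the documented combinations (assumption)
    return "no documented rule"


print(state_variable_decision(True, False, False, False))
print(state_variable_decision(True, True, True, False))
print(state_variable_decision(True, False, False, True))
```
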
#### EXCEPTION: Code Reuse Justifies Intermediate Variables
**Even without state-specific logic, create a variable if the SAME calculation is used in multiple places.**
**Bad - Duplicating calculation across variables:**
```python
# Variable 1 - Income eligibility
class mo_tanf_income_eligible(Variable):
    def formula(spm_unit, period, parameters):
        # Duplicated calculation
        gross = add(spm_unit, period, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])
        return gross <= p.income_limit

# Variable 2 - Countable income
class mo_tanf_countable_income(Variable):
    def formula(spm_unit, period, parameters):
        # SAME calculation repeated!
        gross = add(spm_unit, period, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])
        deductions = spm_unit("mo_tanf_deductions", period)
        return max_(gross - deductions, 0)

# Variable 3 - Need standard
class mo_tanf_need_standard(Variable):
    def formula(spm_unit, period, parameters):
        # SAME calculation AGAIN!
        gross = add(spm_unit, period, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])
        return where(gross < p.threshold, p.high, p.low)
```
**Good - Extract into reusable intermediate variable:**
```python
# Intermediate variable - used in multiple places
class mo_tanf_gross_income(Variable):
    adds = ["tanf_gross_earned_income", "tanf_gross_unearned_income"]

# Variable 1 - Reuses intermediate
class mo_tanf_income_eligible(Variable):
    def formula(spm_unit, period, parameters):
        gross = spm_unit("mo_tanf_gross_income", period)  # Reuse
        return gross <= p.income_limit

# Variable 2 - Reuses intermediate
class mo_tanf_countable_income(Variable):
    def formula(spm_unit, period, parameters):
        gross = spm_unit("mo_tanf_gross_income", period)  # Reuse
        deductions = spm_unit("mo_tanf_deductions", period)
        return max_(gross - deductions, 0)

# Variable 3 - Reuses intermediate
class mo_tanf_need_standard(Variable):
    def formula(spm_unit, period, parameters):
        gross = spm_unit("mo_tanf_gross_income", period)  # Reuse
        return where(gross < p.threshold, p.high, p.low)
```
**When to create intermediate variables for reuse:**
- ✅ Same calculation appears in 2+ variables
- ✅ Represents a meaningful concept (e.g., "gross income", "net resources")
- ✅ Simplifies maintenance (change once vs many places)
- ✅ Follows DRY (Don't Repeat Yourself) principle
**When NOT to create (still a wrapper):**
- ❌ Only used in ONE place
- ❌ Just passes through another variable unchanged
- ❌ Adds indirection without code reuse benefit
#### Red Flags for Unnecessary Wrapper Variables
```python
# INVALID - Pure wrapper, no state logic:
class in_tanf_assistance_unit_size(Variable):
    def formula(spm_unit, period):
        return spm_unit("spm_unit_size", period)  # Just returns federal

# INVALID - Aggregation without transformation:
class in_tanf_countable_unearned_income(Variable):
    def formula(tax_unit, period):
        return tax_unit.sum(tax_unit.members("tanf_gross_unearned_income", period))

# INVALID - Pass-through with no modification:
class in_tanf_gross_income(Variable):
    def formula(entity, period):
        return entity("tanf_gross_income", period)
```
#### Examples of VALID State Variables
```python
# VALID - Has state-specific disregard:
class in_tanf_countable_earned_income(Variable):
    def formula(spm_unit, period, parameters):
        # "in" is a Python keyword, so use bracket access for Indiana
        p = parameters(period).gov.states["in"].tanf.income
        earned = spm_unit("tanf_gross_earned_income", period)
        return earned * (1 - p.earned_income_disregard_rate)  # STATE LOGIC

# VALID - Uses state-specific limits:
class in_tanf_income_eligible(Variable):
    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states["in"].tanf
        income = spm_unit("tanf_countable_income", period)
        size = spm_unit("spm_unit_size", period.this_year)
        limit = p.income_limit[min_(size, p.max_household_size)]  # STATE PARAMS
        return income <= limit

# VALID - IL has different counting rules:
class il_tanf_assistance_unit_size(Variable):
    adds = [
        "il_tanf_payment_eligible_child",  # STATE-SPECIFIC
        "il_tanf_payment_eligible_parent",  # STATE-SPECIFIC
    ]
```
#### State Variables to AVOID Creating
For TANF implementations:
**❌ DON'T create these (use federal directly):**
- `state_tanf_assistance_unit_size` (unless different counting rules like IL)
- `state_tanf_countable_unearned_income` (unless state has disregards)
- `state_tanf_gross_income` (just use federal baseline)
- Any variable that's just `return entity("federal_variable", period)`
**✅ DO create these (when state has unique rules):**
- `state_tanf_countable_earned_income` (if unique disregard %)
- `state_tanf_income_eligible` (state income limits)
- `state_tanf_maximum_benefit` (state payment standards)
- `state_tanf` (final benefit calculation)
### Demographic Eligibility Pattern
**Option 1: Use Federal (Simplified)**
```python
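class ca_tanf_eligible(Variable):
    def formula(spm_unit, period, parameters):
        # Use federal variable
        has_eligible = spm_unit.any(
            spm_unit.members("is_demographic_tanf_eligible", period)
        )
        # State income test (variable name is illustrative)
        income_eligible = spm_unit("ca_tanf_income_eligible", period)
        return has_eligible & income_eligible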
```
**Option 2: State-Specific (Different thresholds)**
```python
class ca_tanf_demographic_eligible_person(Variable):
    def formula(person, period, parameters):
        p = parameters(period).gov.states.ca.tanf
        age = person("age", period.this_year)  # NOT monthly_age
        age_limit = where(
            person("is_full_time_student", period),
            p.age_threshold.student,
            p.age_threshold.minor_child,
        )
        return age < age_limit
```
---
## Common Implementation Patterns
### Income Eligibility
```python
class program_income_eligible(Variable):
    value_type = bool
    entity = SPMUnit
    definition_period = MONTH

    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states.xx.program
        income = spm_unit("program_countable_income", period)
        size = spm_unit("spm_unit_size", period.this_year)
        # Get threshold from parameters
        threshold = p.income_limit[min_(size, p.max_household_size)]
        return income <= threshold
```
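
The `min_(size, p.max_household_size)` step caps the lookup key so that units larger than the biggest row in the schedule reuse the last row. A minimal plain-Python sketch of that idea follows; the dictionary and constant are hypothetical stand-ins for parameter values (in a real implementation they would live in parameter files, not in code), and the dollar amounts are invented.

```python
# Hypothetical monthly income limits by unit size (illustrative values only)
INCOME_LIMIT = {1: 500, 2: 650, 3: 800, 4: 950}
MAX_HOUSEHOLD_SIZE = 4  # largest size with its own entry in the schedule


def income_eligible(countable_income, size):
    # Sizes above the maximum fall back to the last row of the schedule
    capped_size = min(size, MAX_HOUSEHOLD_SIZE)
    return countable_income <= INCOME_LIMIT[capped_size]


print(income_eligible(900, 3))  # False: limit for size 3 is 800
print(income_eligible(900, 6))  # True: size 6 is capped to 4, limit 950
```
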
### Benefit Calculation
```python
class program_benefit(Variable):
    value_type = float
    entity = SPMUnit
    definition_period = MONTH
    unit = USD

    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states.xx.program
        eligible = spm_unit("program_eligible", period)
        # Calculate benefit amount
        base = p.benefit_schedule.base_amount
        adjustment = p.benefit_schedule.adjustment_rate
        size = spm_unit("spm_unit_size", period.this_year)
        amount = base + (size - 1) * adjustment
        return where(eligible, amount, 0)
```
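
As a quick check of the benefit arithmetic above, here is the same formula in scalar Python. The base amount of $300 and the $100 per-person adjustment are invented values, and `monthly_benefit` is a hypothetical helper rather than a PolicyEngine function.

```python
def monthly_benefit(eligible, size, base_amount, adjustment_rate):
    # Base amount for the first person plus a fixed increment per extra member
    amount = base_amount + (size - 1) * adjustment_rate
    return amount if eligible else 0


print(monthly_benefit(True, 3, base_amount=300, adjustment_rate=100))   # 500
print(monthly_benefit(False, 3, base_amount=300, adjustment_rate=100))  # 0
```
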
### Using Scale Parameters
```python
def formula(entity, period, parameters):
    p = parameters(period).gov.states.az.program
    federal_p = parameters(period).gov.hhs.fpg
    # Federal base with state scale
    size = entity("household_size", period.this_year)
    fpg = federal_p.first_person + federal_p.additional * (size - 1)
    state_scale = p.income_limit_scale  # Often exists
    income_limit = fpg * state_scale
    return income_limit
```
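
A worked example of the federal-base-times-state-scale calculation, in scalar Python. The `first_person` value of 14,580 and the 2.0 multiplier come from the YAML example earlier in this document; the per-additional-person amount of 5,140 is an assumption added for illustration, and `fpg_income_limit` is a hypothetical helper.

```python
def fpg_income_limit(size, first_person, additional, scale):
    # Federal poverty guideline for the unit, then the state's multiplier
    fpg = first_person + additional * (size - 1)
    return fpg * scale


# Unit of 3 at 200% of FPG: (14,580 + 2 * 5,140) * 2.0
limit = fpg_income_limit(size=3, first_person=14_580, additional=5_140, scale=2.0)
print(limit)  # 49720.0
```
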
---
## Variable Creation Checklist
Before creating any variable:
- [ ] Check if it already exists
- [ ] Use standard demographic variables (age, is_disabled)
- [ ] Reuse federal calculations where applicable
- [ ] Check for an existing income variable (e.g., `household_income`) before creating a new one
- [ ] Look for existing intermediate variables
- [ ] Study reference implementations
---
## Quality Standards
### Complete Implementation Requirements
- All values from parameters (no hard-coding)
- Complete formula logic
- Proper entity aggregation
- Correct period handling
- Meaningful variable names
- Proper metadata
### Anti-Patterns to Avoid
- Copy-pasting logic between files
- Hard-coding any numeric values
- Creating duplicate income variables
- State-specific versions of federal rules
- Placeholder TODOs in production code
---
## Parameter-to-Variable Mapping Requirements
### Every Parameter Must Have a Variable
**CRITICAL: Complete implementation means every parameter is used!**
When you create parameters, you MUST create corresponding variables:
| Parameter Type | Required Variable(s) |
|---------------|---------------------|
| resources/limit | `state_program_resource_eligible` |
| income/limit | `state_program_income_eligible` |
| payment_standard | `state_program_maximum_benefit` |
| income/disregard | `state_program_countable_earned_income` |
| categorical/requirements | `state_program_categorically_eligible` |
### Complete Eligibility Formula
The main eligibility variable MUST combine ALL checks:
```python
class state_program_eligible(Variable):
    def formula(spm_unit, period, parameters):
        income_eligible = spm_unit("state_program_income_eligible", period)
        resource_eligible = spm_unit("state_program_resource_eligible", period)  # DON'T FORGET!
        categorical = spm_unit("state_program_categorically_eligible", period)
        return income_eligible & resource_eligible & categorical
```
**Common Implementation Failures:**
- ❌ Created resource limit parameter but no resource_eligible variable
- ❌ Main eligible variable only checks income, ignores resources
- ❌ Parameters created but never referenced in any formula
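
One way to catch the last failure mode is a rough text search for parameter names across formula source code. The sketch below is a naive substring check under stated assumptions: the helper name, the parameter names, and the formula strings are all hypothetical, and it is not an existing PolicyEngine tool.

```python
def find_orphaned_parameters(parameter_names, formula_sources):
    """Return parameter names that no formula source text mentions."""
    return sorted(
        name
        for name in parameter_names
        if not any(name in source for source in formula_sources)
    )


# Hypothetical parameter names and formula bodies
parameter_names = ["income_limit", "resource_limit", "payment_standard"]
formula_sources = [
    "return income <= p.income_limit",
    "return where(eligible, p.payment_standard, 0)",
]

print(find_orphaned_parameters(parameter_names, formula_sources))
# ['resource_limit']
```

A match only shows that the name appears somewhere in the text, so a hit still needs a manual read to confirm the parameter is used correctly.
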
---
## For Agents
When implementing variables:
1. **Study reference implementations** (DC, IL, TX TANF)
2. **Never hard-code values** - use parameters
3. **Map every parameter to a variable** - no orphaned parameters
4. **Complete ALL eligibility checks** - income AND resources AND categorical
5. **Reuse existing variables** - avoid duplication
6. **Use `adds` when possible** - cleaner than formula
7. **Create intermediate variables** for complex logic
8. **Follow metadata standards** exactly
9. **Complete implementation** or delete the file