Initial commit

Zhongwei Li
2025-11-30 08:47:43 +08:00
commit 2e8d89fca3
41 changed files with 14051 additions and 0 deletions

@@ -0,0 +1,329 @@
---
name: policyengine-aggregation
description: PolicyEngine aggregation patterns - using adds attribute and add() function for summing variables across entities
---
# PolicyEngine Aggregation Patterns
Essential patterns for summing variables across entities in PolicyEngine.
## Quick Decision Guide
```
Is the variable ONLY a sum of other variables?
├─ YES → Use `adds` attribute (NO formula needed!)
│        adds = ["var1", "var2"]
└─ NO  → Use `add()` function in formula
         (when you need max_, where, conditions, etc.)
```
## Quick Reference
| Need | Use | Example |
|------|-----|---------|
| Simple sum | `adds` | `adds = ["var1", "var2"]` |
| Sum from parameters | `adds` | `adds = "gov.path.to.list"` |
| Sum + max_() | `add()` | `max_(0, add(...))` |
| Sum + where() | `add()` | `where(cond, add(...), 0)` |
| Sum + conditions | `add()` | `if cond: add(...)` |
| Count booleans | `adds` | `adds = ["is_eligible"]` |
---
## 1. `adds` Class Attribute (Preferred When Possible)
### When to Use
Use `adds` when a variable is **ONLY** the sum of other variables with **NO additional logic**.
### Syntax
```python
class variable_name(Variable):
    value_type = float
    entity = Entity
    definition_period = PERIOD

    # Option 1: List of variables
    adds = ["variable1", "variable2", "variable3"]

    # Option 2: Parameter tree path
    adds = "gov.path.to.parameter.list"
```
### Key Points
- ✅ No `formula()` method needed
- ✅ Automatically handles entity aggregation (person → household/tax_unit/spm_unit)
- ✅ Clean and declarative
### Example: Simple Income Sum
```python
class tanf_gross_earned_income(Variable):
    value_type = float
    entity = SPMUnit
    label = "TANF gross earned income"
    unit = USD
    definition_period = MONTH

    adds = ["employment_income", "self_employment_income"]
    # NO formula needed! Automatically:
    # 1. Gets each person's employment_income
    # 2. Gets each person's self_employment_income
    # 3. Sums all values across SPM unit members
```
### Example: Using Parameter List
```python
class income_tax_refundable_credits(Variable):
    value_type = float
    entity = TaxUnit
    definition_period = YEAR

    adds = "gov.irs.credits.refundable"
    # Parameter file contains list like:
    # - earned_income_tax_credit
    # - child_tax_credit
    # - additional_child_tax_credit
```
### Example: Counting Boolean Values
```python
class count_eligible_people(Variable):
    value_type = int
    entity = SPMUnit
    definition_period = YEAR

    adds = ["is_eligible_person"]
    # Automatically sums True (1) and False (0) across members
```
---
## 2. `add()` Function (When Logic Needed)
### When to Use
Use `add()` inside a `formula()` when you need:
- To apply `max_()`, `where()`, or conditions
- To combine with other operations
- To modify values before/after summing
### Syntax
```python
from policyengine_us.model_api import *


def formula(entity, period, parameters):
    result = add(entity, period, variable_list)
```
**Parameters:**
- `entity`: The entity to operate on
- `period`: The time period for calculation
- `variable_list`: List of variable names or parameter path
### Example: With max_() to Prevent Negatives
```python
class adjusted_earned_income(Variable):
    value_type = float
    entity = SPMUnit
    definition_period = MONTH

    def formula(spm_unit, period, parameters):
        # Need max_() to clip negative values
        gross = add(spm_unit, period, ["employment_income", "self_employment_income"])
        return max_(0, gross)  # Prevent negative income
```
### Example: With Additional Logic
```python
class household_benefits(Variable):
    value_type = float
    entity = Household
    definition_period = YEAR

    def formula(household, period, parameters):
        # Sum existing benefits
        BENEFITS = ["snap", "tanf", "ssi", "social_security"]
        existing = add(household, period, BENEFITS)
        # Add new benefit conditionally
        new_benefit = household("special_benefit", period)
        p = parameters(period).gov.special_benefit
        if p.include_in_total:
            return existing + new_benefit
        return existing
```
### Example: Building on Previous Variables
```python
class total_deductions(Variable):
    value_type = float
    entity = TaxUnit
    definition_period = YEAR

    def formula(tax_unit, period, parameters):
        p = parameters(period).gov.irs.deductions
        # Get standard deductions using parameter list
        standard = add(tax_unit, period, p.standard_items)
        # Apply phase-out logic
        income = tax_unit("adjusted_gross_income", period)
        phase_out_rate = p.phase_out_rate
        phase_out_start = p.phase_out_start
        reduction = max_(0, (income - phase_out_start) * phase_out_rate)
        return max_(0, standard - reduction)
```
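The phase-out arithmetic in the example above can be checked with plain numbers. This is a minimal scalar sketch, not PolicyEngine code: `phased_out_amount` and the sample figures are hypothetical, and Python's built-in `max` stands in for the vectorized `max_`.

```python
def phased_out_amount(base, income, phase_out_start, phase_out_rate):
    """Reduce `base` by a share of income above the phase-out start, floored at 0."""
    reduction = max(0.0, (income - phase_out_start) * phase_out_rate)
    return max(0.0, base - reduction)


# Illustrative figures only: base of 10,000, phase-out starts at 50,000, rate 25%.
below_start = phased_out_amount(10_000, 40_000, 50_000, 0.25)
partly_phased = phased_out_amount(10_000, 60_000, 50_000, 0.25)
fully_phased = phased_out_amount(10_000, 100_000, 50_000, 0.25)
print(below_start, partly_phased, fully_phased)
```

Both `max` calls matter: the inner one keeps the reduction at zero below the phase-out start, and the outer one keeps the result from going negative once the reduction exceeds the base amount.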
---
## 3. Common Anti-Patterns to Avoid
### ❌ NEVER: Manual Summing
```python
# WRONG - Never do this!
def formula(spm_unit, period, parameters):
    person = spm_unit.members
    employment = person("employment_income", period)
    self_emp = person("self_employment_income", period)
    return spm_unit.sum(employment + self_emp)  # ❌ BAD
```
### ✅ CORRECT: Use adds
```python
# RIGHT - Clean and simple
adds = ["employment_income", "self_employment_income"] # ✅ GOOD
```
### ❌ WRONG: Using add() When adds Suffices
```python
# WRONG - Unnecessary complexity
def formula(spm_unit, period, parameters):
    return add(spm_unit, period, ["income1", "income2"])  # ❌ Overkill
```
### ✅ CORRECT: Use adds
```python
# RIGHT - Simpler
adds = ["income1", "income2"] # ✅ GOOD
```
---
## 4. Entity Aggregation Explained
When using `adds` or `add()`, PolicyEngine automatically handles entity aggregation:
```python
class household_total_income(Variable):
    entity = Household  # Higher-level entity
    definition_period = YEAR

    adds = ["employment_income", "self_employment_income"]
    # employment_income is defined for Person (lower-level)
    # PolicyEngine automatically:
    # 1. Gets employment_income for each person in household
    # 2. Gets self_employment_income for each person
    # 3. Sums all values to household level
```
This works across all entity hierarchies:
- Person → Tax Unit
- Person → SPM Unit
- Person → Household
- Tax Unit → Household
- SPM Unit → Household
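As a rough mental model of the person-to-group aggregation described above, the following pure-Python sketch sums several person-level variables into group totals. It is not PolicyEngine's implementation; `aggregate_to_group` and the sample data are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def aggregate_to_group(person_values, person_group_ids):
    """Sum several person-level variables into per-group totals.

    person_values: dict mapping variable name -> list of per-person values
    person_group_ids: list giving each person's group (e.g. household) id
    """
    totals = defaultdict(float)
    for values in person_values.values():
        for group_id, value in zip(person_group_ids, values):
            totals[group_id] += value
    return dict(totals)


# Three people: the first two share household 0, the third is in household 1.
incomes = {
    "employment_income": [30_000.0, 10_000.0, 50_000.0],
    "self_employment_income": [0.0, 5_000.0, 2_000.0],
}
household_totals = aggregate_to_group(incomes, [0, 0, 1])
print(household_totals)
```

Boolean inputs behave the same way in this sketch, since `True` adds as 1 and `False` as 0, which is why a sum can double as a count.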
---
## 5. Parameter Lists
Parameters can define lists of variables to sum:
**Parameter file** (`gov/irs/credits/refundable.yaml`):
```yaml
description: List of refundable tax credits
values:
  2024-01-01:
    - earned_income_tax_credit
    - child_tax_credit
    - additional_child_tax_credit
```
**Usage in variable**:
```python
adds = "gov.irs.credits.refundable"
# Automatically sums all credits in the list
```
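One way to picture a dated parameter list is as a lookup from start date to variable names, followed by a plain sum. The sketch below is a conceptual model only, not how PolicyEngine resolves parameters: `variables_in_effect`, `sum_listed`, and the credit amounts are hypothetical, and returning an empty list before the first start date is an assumption of this sketch.

```python
from datetime import date

# Mirrors the YAML above: one list of variable names taking effect on 2024-01-01.
REFUNDABLE_CREDITS = {
    date(2024, 1, 1): [
        "earned_income_tax_credit",
        "child_tax_credit",
        "additional_child_tax_credit",
    ],
}


def variables_in_effect(dated_lists, on):
    """Return the list with the latest start date at or before `on`."""
    start_dates = [start for start in dated_lists if start <= on]
    if not start_dates:
        return []  # Assumption of this sketch: nothing applies before the first date.
    return dated_lists[max(start_dates)]


def sum_listed(values_by_variable, dated_lists, on):
    """Sum the values of every variable named in the list in effect on `on`."""
    return sum(values_by_variable[name] for name in variables_in_effect(dated_lists, on))


# Made-up amounts for a single tax unit.
credits = {
    "earned_income_tax_credit": 1_000,
    "child_tax_credit": 2_000,
    "additional_child_tax_credit": 500,
}
total_2024 = sum_listed(credits, REFUNDABLE_CREDITS, date(2024, 6, 1))
total_2023 = sum_listed(credits, REFUNDABLE_CREDITS, date(2023, 6, 1))
print(total_2024, total_2023)
```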
---
## 6. Decision Matrix
| Scenario | Solution | Code |
|----------|----------|------|
| Sum 2-3 variables | `adds` attribute | `adds = ["var1", "var2"]` |
| Sum many variables | Parameter list | `adds = "gov.path.list"` |
| Sum + prevent negatives | `add()` with `max_()` | `max_(0, add(...))` |
| Sum + conditional | `add()` with `where()` | `where(eligible, add(...), 0)` |
| Sum + phase-out | `add()` with calculation | `add(...) - reduction` |
| Count people/entities | `adds` with boolean | `adds = ["is_child"]` |
---
## 7. Key Principles
1. **Default to `adds` attribute** when variable is only a sum
2. **Use `add()` function** only when additional logic is needed
3. **Never manually sum** with `entity.sum(person(...) + person(...))`
4. **Let PolicyEngine handle** entity aggregation automatically
5. **Use parameter lists** for maintainable, configurable sums
---
## Related Skills
- **policyengine-period-patterns-skill**: For period conversion when summing across different time periods
- **policyengine-core-skill**: For understanding entity hierarchies and relationships
---
## For Agents
When implementing or reviewing code:
1. **Check if `adds` can be used** before writing a formula
2. **Prefer declarative over imperative** when possible
3. **Follow existing patterns** in the codebase
4. **Test entity aggregation** carefully in YAML tests
5. **Document parameter lists** clearly for `adds` references
---
## Common Use Cases
### Earned Income
```python
adds = ["employment_income", "self_employment_income"]
```
### Unearned Income
```python
adds = ["interest_income", "dividend_income", "rental_income"]
```
### Total Benefits
```python
adds = ["snap", "tanf", "wic", "ssi", "social_security"]
```
### Tax Credits
```python
adds = "gov.irs.credits.refundable"
```
### Counting Children
```python
adds = ["is_child"] # Returns count of children
```

@@ -0,0 +1,382 @@
---
name: policyengine-code-style
description: PolicyEngine code writing style guide - formula optimization, direct returns, eliminating unnecessary variables
---
# PolicyEngine Code Writing Style Guide
Essential patterns for writing clean, efficient PolicyEngine formulas.
## Core Principles
1. **Eliminate unnecessary intermediate variables**
2. **Use direct parameter/variable access**
3. **Return directly when possible**
4. **Combine boolean logic**
5. **Use correct period access** (period vs period.this_year)
6. **NO hardcoded values** - use parameters or constants
---
## Pattern 1: Direct Parameter Access
### ❌ Bad - Unnecessary intermediate variable
```python
def formula(spm_unit, period, parameters):
    countable = spm_unit("tn_tanf_countable_resources", period)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    resource_limit = p.amount  # ❌ Unnecessary
    return countable <= resource_limit
```
### ✅ Good - Direct access
```python
def formula(spm_unit, period, parameters):
    countable = spm_unit("tn_tanf_countable_resources", period)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    return countable <= p.amount
```
---
## Pattern 2: Direct Return
### ❌ Bad - Unnecessary result variable
```python
def formula(spm_unit, period, parameters):
    assets = spm_unit("spm_unit_assets", period.this_year)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    vehicle_exemption = p.vehicle_exemption  # ❌ Unnecessary
    countable = max_(assets - vehicle_exemption, 0)  # ❌ Unnecessary
    return countable
```
### ✅ Good - Direct return
```python
def formula(spm_unit, period, parameters):
    assets = spm_unit("spm_unit_assets", period.this_year)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    return max_(assets - p.vehicle_exemption, 0)
```
---
## Pattern 3: Combined Boolean Logic
### ❌ Bad - Too many intermediate booleans
```python
def formula(spm_unit, period, parameters):
    person = spm_unit.members
    age = person("age", period.this_year)
    is_disabled = person("is_disabled", period.this_year)
    caretaker_is_60_or_older = spm_unit.any(age >= 60)  # ❌ Unnecessary
    caretaker_is_disabled = spm_unit.any(is_disabled)  # ❌ Unnecessary
    eligible = caretaker_is_60_or_older | caretaker_is_disabled  # ❌ Unnecessary
    return eligible
```
### ✅ Good - Combined logic
```python
def formula(spm_unit, period, parameters):
    person = spm_unit.members
    age = person("age", period.this_year)
    is_disabled = person("is_disabled", period.this_year)
    return spm_unit.any((age >= 60) | is_disabled)
```
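The combined form is safe because "any member meets A, or any member meets B" is logically the same as "any member meets A or B". A small pure-Python check of that equivalence, using hypothetical helper names and the same age cutoff of 60 as the example above:

```python
from itertools import product


def separate_checks(ages, disabled):
    """Two group-level checks combined afterwards, as in the 'Bad' version."""
    return any(age >= 60 for age in ages) or any(disabled)


def combined_check(ages, disabled):
    """One group-level check over a per-person condition, as in the 'Good' version."""
    return any((age >= 60) or flag for age, flag in zip(ages, disabled))


# Every two-person unit with ages drawn from {30, 70} and any disability flags.
cases = [
    (list(ages), list(flags))
    for ages in product((30, 70), repeat=2)
    for flags in product((False, True), repeat=2)
]
results_match = all(separate_checks(a, f) == combined_check(a, f) for a, f in cases)
print(results_match)
```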
---
## Pattern 4: Period Access - period vs period.this_year
### ❌ Bad - Wrong period access
```python
def formula(person, period, parameters):
    # MONTH formula accessing YEAR variables
    age = person("age", period)  # ❌ Gives age/12 = 2.5 "monthly age"
    assets = person("assets", period)  # ❌ Gives assets/12
    monthly_income = person("employment_income", period.this_year) / MONTHS_IN_YEAR  # ❌ Redundant
    return (age >= 18) & (assets < 10000) & (monthly_income < 2000)
```
### ✅ Good - Correct period access
```python
def formula(person, period, parameters):
    # MONTH formula accessing YEAR variables
    age = person("age", period.this_year)  # ✅ Gets actual age (30)
    assets = person("assets", period.this_year)  # ✅ Gets actual assets ($10,000)
    monthly_income = person("employment_income", period)  # ✅ Auto-converts to monthly
    p = parameters(period).gov.program.eligibility
    # Parentheses are required for the expression to span multiple lines
    return (
        (age >= p.age_min) & (age <= p.age_max)
        & (assets < p.asset_limit) & (monthly_income < p.income_threshold)
    )
```
**Rule:**
- Income/flows → Use `period` (want monthly from annual)
- Age/assets/counts/booleans → Use `period.this_year` (don't divide by 12)
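The rule can be illustrated with a toy model of the auto-conversion this guide describes, where a YEAR-defined value requested for a month is divided by 12. `read_year_variable` is a hypothetical stand-in written for this sketch, not a PolicyEngine function.

```python
MONTHS_IN_YEAR = 12


def read_year_variable(annual_value, period_unit):
    """Toy model: a YEAR-defined value requested for a month is divided by 12."""
    if period_unit == "month":
        return annual_value / MONTHS_IN_YEAR
    return annual_value


age_wrong = read_year_variable(30, "month")  # like person("age", period)
age_right = read_year_variable(30, "year")  # like person("age", period.this_year)
monthly_income = read_year_variable(24_000, "month")  # conversion we actually want
print(age_wrong, age_right, monthly_income)
```

The division is helpful for flows such as income and harmful for stocks and attributes such as age or assets, which is the distinction the rule encodes.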
---
## Pattern 5: No Hardcoded Values
### ❌ Bad - Hardcoded numbers
```python
def formula(spm_unit, period, parameters):
    person = spm_unit.members
    size = spm_unit.nb_persons()
    capped_size = min_(size, 10)  # ❌ Hardcoded
    age = person("age", period.this_year)
    income = person("income", period) / 12  # ❌ Use MONTHS_IN_YEAR
    # ❌ Hardcoded thresholds
    if age >= 18 and age <= 65 and income < 2000:
        return True
```
### ✅ Good - Parameterized
```python
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.program
    person = spm_unit.members
    capped_size = min_(spm_unit.nb_persons(), p.max_unit_size)  # ✅
    age = person("age", period.this_year)
    monthly_income = person("income", period)  # ✅ Auto-converts (no manual /12)
    age_eligible = (age >= p.age_min) & (age <= p.age_max)  # ✅
    income_eligible = monthly_income < p.income_threshold  # ✅
    return age_eligible & income_eligible
```
---
## Pattern 6: Streamline Variable Access
### ❌ Bad - Redundant steps
```python
def formula(spm_unit, period, parameters):
    unit_size = spm_unit.nb_persons()  # ❌ Unnecessary
    max_size = 10  # ❌ Hardcoded
    capped_size = min_(unit_size, max_size)
    p = parameters(period).gov.states.tn.dhs.tanf.benefit
    spa = p.standard_payment_amount[capped_size]  # ❌ Unnecessary
    dgpa = p.differential_grant_payment_amount[capped_size]  # ❌ Unnecessary
    eligible = spm_unit("eligible_for_dgpa", period)
    return where(eligible, dgpa, spa)
```
### ✅ Good - Streamlined
```python
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.tn.dhs.tanf.benefit
    capped_size = min_(spm_unit.nb_persons(), p.max_unit_size)
    eligible = spm_unit("eligible_for_dgpa", period)
    return where(
        eligible,
        p.differential_grant_payment_amount[capped_size],
        p.standard_payment_amount[capped_size],
    )
```
---
## When to Keep Intermediate Variables
### ✅ Keep when value is used multiple times
```python
def formula(tax_unit, period, parameters):
    p = parameters(period).gov.irs.credits
    filing_status = tax_unit("filing_status", period)
    # ✅ Used multiple times - keep as variable
    threshold = p.phase_out.start[filing_status]
    income = tax_unit("adjusted_gross_income", period)
    excess = max_(0, income - threshold)
    reduction = (excess / p.phase_out.width) * threshold
    return max_(0, threshold - reduction)
```
### ✅ Keep when calculation is complex
```python
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.program
    gross_earned = spm_unit("gross_earned_income", period)
    # ✅ Complex multi-step calculation - break it down
    work_expense_deduction = min_(gross_earned * p.work_expense_rate, p.work_expense_max)
    after_work_expense = gross_earned - work_expense_deduction
    earned_disregard = after_work_expense * p.earned_disregard_rate
    countable_earned = after_work_expense - earned_disregard
    dependent_care = spm_unit("dependent_care_expenses", period)
    return max_(0, countable_earned - dependent_care)
```
---
## Complete Example: Before vs After
### ❌ Before - Multiple Issues
```python
def formula(person, period, parameters):
    # Wrong period access
    age = person("age", period)  # ❌ age/12
    assets = person("assets", period)  # ❌ assets/12
    annual_income = person("employment_income", period.this_year)
    monthly_income = annual_income / 12  # ❌ Use MONTHS_IN_YEAR
    # Hardcoded values
    min_age = 18  # ❌
    max_age = 64  # ❌
    asset_limit = 10000  # ❌
    income_limit = 2000  # ❌
    # Unnecessary intermediate variables
    age_check = (age >= min_age) & (age <= max_age)
    asset_check = assets <= asset_limit
    income_check = monthly_income <= income_limit
    eligible = age_check & asset_check & income_check
    return eligible
```
### ✅ After - Clean and Correct
```python
def formula(person, period, parameters):
    p = parameters(period).gov.program.eligibility
    # Correct period access
    age = person("age", period.this_year)
    assets = person("assets", period.this_year)
    monthly_income = person("employment_income", period)
    # Direct return with combined logic
    return (
        (age >= p.age_min) & (age <= p.age_max) &
        (assets <= p.asset_limit) &
        (monthly_income <= p.income_threshold)
    )
```
---
## Pattern 7: Minimal Comments
### Code Should Be Self-Documenting
**Variable names and structure should explain the code - not comments.**
### ❌ Bad - Verbose explanatory comments
```python
def formula(spm_unit, period, parameters):
    # Wisconsin disregards all earned income of dependent children (< 18)
    # Calculate earned income for adults only
    is_adult = spm_unit.members("age", period.this_year) >= 18  # Hard-coded!
    adult_earned = spm_unit.sum(
        spm_unit.members("tanf_gross_earned_income", period) * is_adult
    )
    # All unearned income is counted (including children's)
    gross_unearned = add(spm_unit, period, ["tanf_gross_unearned_income"])
    # NOTE: Wisconsin disregards many additional income sources that
    # are not separately tracked in PolicyEngine (educational aid, etc.)
    return max_(total_income - disregards, 0)
```
### ✅ Good - Clean self-documenting code
```python
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.wi.dcf.tanf.income
    is_adult = spm_unit.members("age", period.this_year) >= p.adult_age_threshold
    adult_earned = spm_unit.sum(
        spm_unit.members("tanf_gross_earned_income", period) * is_adult
    )
    gross_unearned = add(spm_unit, period, ["tanf_gross_unearned_income"])
    child_support = add(spm_unit, period, ["child_support_received"])
    return max_(adult_earned + gross_unearned - child_support, 0)
```
### Comment Rules
1. **NO comments explaining what code does** - variable names should be clear
2. **OK: Brief NOTE about PolicyEngine limitations** (one line):
```python
# NOTE: Time limit cannot be tracked in PolicyEngine
```
3. **NO multi-line explanations** of what the code calculates
---
## Quick Checklist
Before finalizing code:
- [ ] No hardcoded numbers (use parameters or constants like MONTHS_IN_YEAR)
- [ ] Correct period access:
- Income/flows use `period`
- Age/assets/counts/booleans use `period.this_year`
- [ ] No single-use intermediate variables
- [ ] Direct parameter access (`p.amount` not `amount = p.amount`)
- [ ] Direct returns when possible
- [ ] Combined boolean logic when possible
- [ ] Minimal comments (code should be self-documenting)
---
## Key Takeaways
1. **Less is more** - Eliminate unnecessary variables
2. **Direct is better** - Access parameters and return directly
3. **Combine when logical** - Group related boolean conditions
4. **Keep when needed** - Complex calculations and reused values deserve variables
5. **Period matters** - Use correct period access to avoid auto-conversion bugs
---
## Related Skills
- **policyengine-period-patterns-skill** - Deep dive on period handling
- **policyengine-implementation-patterns-skill** - Variable structure and patterns
- **policyengine-vectorization-skill** - NumPy operations and vectorization
---
## For Agents
When writing or reviewing formulas:
1. **Scan for single-use variables** - eliminate them
2. **Check period access** - ensure correct for variable type
3. **Look for hardcoded values** - parameterize them
4. **Identify redundant steps** - streamline them
5. **Consider readability** - keep complex calculations clear

@@ -0,0 +1,739 @@
---
name: policyengine-implementation-patterns
description: PolicyEngine implementation patterns - variable creation, no hard-coding principle, federal/state separation, metadata standards
---
# PolicyEngine Implementation Patterns
Essential patterns for implementing government benefit program rules in PolicyEngine.
## PolicyEngine Architecture Constraints
### What CANNOT Be Simulated (Single-Period Limitation)
**CRITICAL: PolicyEngine uses single-period simulation architecture**
The following CANNOT be implemented and should be SKIPPED when found in documentation:
#### 1. Time Limits and Lifetime Counters
**Cannot simulate:**
- ANY lifetime benefit limits (X months total)
- ANY time windows (X months within Y period)
- Benefit clocks and countable months
- Cumulative time tracking
**Why:** Requires tracking benefit history across multiple periods. PolicyEngine simulates one period at a time with no state persistence.
**What to do:** Document in comments but DON'T parameterize or implement:
```python
# NOTE: [State] has [X]-month lifetime limit on [Program] benefits
# This cannot be simulated in PolicyEngine's single-period architecture
```
#### 2. Work History Requirements
**Cannot simulate:**
- "Must have worked 6 of last 12 months"
- "Averaged 30 hours/week over past quarter"
- Prior employment verification
- Work participation rate tracking
**Why:** Requires historical data from previous periods.
#### 3. Waiting Periods and Benefit Delays
**Cannot simulate:**
- "3-month waiting period for new residents"
- "Benefits start month after application"
- Retroactive eligibility
- Benefit recertification cycles
**Why:** Requires tracking application dates and eligibility history.
#### 4. Progressive Sanctions and Penalties
**Cannot simulate:**
- "First violation: 1-month sanction, Second: 3-month, Third: permanent"
- Graduated penalties
- Strike systems
**Why:** Requires tracking violation history.
#### 5. Asset Spend-Down Over Time
**Cannot simulate:**
- Medical spend-down across months
- Resource depletion tracking
- Accumulated medical expenses
**Why:** Requires tracking expenses and resources across periods.
### What CAN Be Simulated (With Caveats)
PolicyEngine CAN simulate point-in-time eligibility and benefits:
- ✅ Current month income limits
- ✅ Current month resource limits
- ✅ Current benefit calculations
- ✅ Current household composition
- ✅ Current deductions and disregards
### Time-Limited Benefits That Affect Current Calculations
**Special Case: Time-limited deductions/disregards**
When a deduction or disregard is only available for X months:
- **DO implement the deduction** (assume it applies)
- **DO add a comment** explaining the time limitation
- **DON'T try to track or enforce the time limit**
Example:
```python
class state_tanf_countable_earned_income(Variable):
    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states.xx.tanf.income
        earned = spm_unit("tanf_gross_earned_income", period)
        # NOTE: In reality, this 75% disregard only applies for first 4 months
        # of employment. PolicyEngine cannot track employment duration, so we
        # apply the disregard assuming the household qualifies.
        # Actual rule: [State Code Citation]
        disregard_rate = p.earned_income_disregard_rate  # 0.75
        return earned * (1 - disregard_rate)
```
**Rule: If it requires history or future tracking, it CANNOT be fully simulated - but implement what we can and document limitations**
---
## Critical Principles
### 1. ZERO Hard-Coded Values
**Every numeric value MUST be parameterized**
```python
# FORBIDDEN:
return where(eligible, 1000, 0)  # Hard-coded 1000
age < 15  # Hard-coded 15
benefit = income * 0.33  # Hard-coded 0.33
month >= 10 or month <= 3  # Hard-coded months (a season wrapping the year end needs `or`)

# REQUIRED:
return where(eligible, p.maximum_benefit, 0)
age < p.age_threshold.minor_child
benefit = income * p.benefit_rate
month >= p.season.start_month
```
**Acceptable literals:**
- `0`, `1`, `-1` for basic math
- `12` for month conversion (`/ 12`, `* 12`)
- Array indices when structure is known
### 2. No Placeholder Implementations
**Delete the file rather than leave placeholders**
```python
# NEVER:
def formula(entity, period, parameters):
    # TODO: Implement
    return 75  # Placeholder

# ALWAYS:
# Complete implementation or no file at all
```
---
## Variable Implementation Standards
### Variable Metadata Format
Follow established patterns:
```python
class il_tanf_countable_earned_income(Variable):
    value_type = float
    entity = SPMUnit
    definition_period = MONTH
    label = "Illinois TANF countable earned income"
    unit = USD
    reference = "https://www.law.cornell.edu/regulations/illinois/..."
    defined_for = StateCode.IL

    # Use adds for simple sums
    adds = ["il_tanf_earned_income_after_disregard"]
```
**Key rules:**
- ✅ Use full URL in `reference` (clickable)
- ❌ Don't use `documentation` field
- ❌ Don't use statute citations without URLs
### When to Use `adds` vs `formula`
**Use `adds` when:**
- Just summing variables
- Passing through a single variable
- No transformations needed
```python
# BEST - Simple sum:
class tanf_gross_income(Variable):
    adds = ["employment_income", "self_employment_income"]
```
**Use `formula` when:**
- Applying transformations
- Conditional logic
- Calculations needed
```python
# CORRECT - Need logic:
def formula(entity, period, parameters):
    income = add(entity, period, ["income1", "income2"])
    return max_(0, income)  # Need max_
```
---
## TANF Countable Income Pattern
### Critical: Verify Calculation Order from Legal Code
**MOST IMPORTANT:** Always check the state's legal code or policy manual for the exact calculation order. The pattern below is typical but not universal.
**The Typical Pattern:**
1. Apply deductions/disregards to **earned income only**
2. Use `max_()` to prevent negative earned income
3. Add unearned income (which typically has no deductions)
**This pattern is based on how MOST TANF programs work, but you MUST verify with the specific state's legal code.**
### ❌ WRONG - Applying deductions to total income
```python
def formula(spm_unit, period, parameters):
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    deductions = spm_unit("tanf_earned_income_deductions", period)
    # ❌ WRONG: Deductions applied to total income
    total_income = gross_earned + unearned
    countable = total_income - deductions
    return max_(countable, 0)
```
**Why this is wrong:**
- Deductions should ONLY reduce earned income
- Unearned income (SSI, child support, etc.) is not subject to work expense deductions
- This incorrectly reduces unearned income when earned income is low
**Example error:**
- Earned: $100, Unearned: $500, Deductions: $200
- Wrong result: `max_($100 + $500 - $200, 0) = $400` (reduces unearned!)
- Correct result: `max_($100 - $200, 0) + $500 = $500`
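The example figures can be reproduced with a short scalar sketch. The two function names are hypothetical, and Python's built-in `max` stands in for the vectorized `max_`; the point is only to show where the two orderings diverge.

```python
def countable_income_wrong(earned, unearned, deductions):
    """Deductions taken from total income: excess deductions eat into unearned income."""
    return max(earned + unearned - deductions, 0)


def countable_income_typical(earned, unearned, deductions):
    """Deductions taken from earned income only, floored at 0, then unearned added."""
    return max(earned - deductions, 0) + unearned


# Figures from the example above: deductions exceed earned income.
wrong = countable_income_wrong(100, 500, 200)
typical = countable_income_typical(100, 500, 200)

# When deductions do not exceed earned income, the two orderings agree.
agree_wrong = countable_income_wrong(300, 500, 200)
agree_typical = countable_income_typical(300, 500, 200)
print(wrong, typical, agree_wrong, agree_typical)
```

Because the two forms only differ when deductions exceed earned income, a test case with low earned income is the one that distinguishes them.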
### ✅ CORRECT - Apply deductions to earned only, then add unearned
```python
def formula(spm_unit, period, parameters):
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    deductions = spm_unit("tanf_earned_income_deductions", period)
    # ✅ CORRECT: Deductions applied to earned only, then add unearned
    return max_(gross_earned - deductions, 0) + unearned
```
### Pattern Variations
**With multiple deduction steps:**
```python
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.xx.tanf.income
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    # Step 1: Apply work expense deduction
    work_expense = min_(gross_earned * p.work_expense_rate, p.work_expense_max)
    after_work_expense = max_(gross_earned - work_expense, 0)
    # Step 2: Apply earnings disregard
    earnings_disregard = after_work_expense * p.disregard_rate
    countable_earned = max_(after_work_expense - earnings_disregard, 0)
    # Step 3: Add unearned (no deductions applied)
    return countable_earned + unearned
```
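A worked scalar version of the three steps above may help when writing test cases. All rates and amounts here are made up for illustration, `countable_income` is a hypothetical helper, and built-in `min`/`max` stand in for `min_`/`max_`.

```python
def countable_income(gross_earned, unearned, work_expense_rate, work_expense_max, disregard_rate):
    """Scalar walk-through of: work expense deduction, earnings disregard, add unearned."""
    # Step 1: work expense deduction, capped at a maximum
    work_expense = min(gross_earned * work_expense_rate, work_expense_max)
    after_work_expense = max(gross_earned - work_expense, 0)
    # Step 2: earnings disregard on what remains
    earnings_disregard = after_work_expense * disregard_rate
    countable_earned = max(after_work_expense - earnings_disregard, 0)
    # Step 3: unearned income is added without deductions
    return countable_earned + unearned


# 1,000 earned, 300 unearned, 25% work expense capped at 120, 50% disregard.
example = countable_income(1_000, 300, 0.25, 120, 0.5)
print(example)
```

Here the cap binds (25% of 1,000 is 250, limited to 120), leaving 880; half of that is disregarded, giving 440 countable earned plus 300 unearned.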
**With disregard percentage (simplified):**
```python
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.xx.tanf.income
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    # Apply disregard to earned (keep 33% = disregard 67%)
    countable_earned = gross_earned * (1 - p.earned_disregard_rate)
    return max_(countable_earned, 0) + unearned
```
### When Unearned Income HAS Deductions
Some states DO have unearned income deductions (rare). Handle separately:
```python
def formula(spm_unit, period, parameters):
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    gross_unearned = spm_unit("tanf_gross_unearned_income", period)
    earned_deductions = spm_unit("tanf_earned_income_deductions", period)
    unearned_deductions = spm_unit("tanf_unearned_income_deductions", period)
    # Apply each type of deduction to its respective income type
    countable_earned = max_(gross_earned - earned_deductions, 0)
    countable_unearned = max_(gross_unearned - unearned_deductions, 0)
    return countable_earned + countable_unearned
```
### Quick Reference
**Standard TANF pattern:**
```
Countable Income = max_(Earned - Earned Deductions, 0) + Unearned
```
**NOT:**
```
❌ max_(Earned + Unearned - Deductions, 0)
❌ max_(Earned - Deductions + Unearned, 0) # Same as above: excess deductions offset unearned income
```
---
## Federal/State Separation
### Federal Parameters
Location: `/parameters/gov/{agency}/`
- Base formulas and methodologies
- National standards
- Required elements
### State Parameters
Location: `/parameters/gov/states/{state}/`
- State-specific thresholds
- Implementation choices
- Scale factors
```yaml
# Federal: parameters/gov/hhs/fpg/base.yaml
first_person: 14_580
# State: parameters/gov/states/ca/scale_factor.yaml
fpg_multiplier: 2.0 # 200% of FPG
```
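To show how the two layers combine, here is a minimal sketch using only the two values from the YAML above. The constant and function names are hypothetical, and the sketch covers a one-person unit only, since the per-person increment is not shown in the excerpt.

```python
# Values copied from the YAML excerpt above.
FEDERAL_FPG_FIRST_PERSON = 14_580  # federal: parameters/gov/hhs/fpg/base.yaml
STATE_FPG_MULTIPLIER = 2.0  # state: parameters/gov/states/ca/scale_factor.yaml


def income_limit_single_person(fpg_first_person, multiplier):
    """State income limit for a one-person unit: federal guideline times state multiplier."""
    return fpg_first_person * multiplier


limit = income_limit_single_person(FEDERAL_FPG_FIRST_PERSON, STATE_FPG_MULTIPLIER)
print(limit)
```

Keeping the guideline federal and the multiplier state-level means an update to either value touches exactly one parameter file.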
---
## Code Reuse Patterns
### Avoid Duplication - Create Intermediate Variables
**❌ ANTI-PATTERN: Copy-pasting calculations**
```python
# `p` is assumed to be bound to the program's parameters(period) node.
# File 1: calculates income after deduction
def formula(household, period, parameters):
    gross = add(household, period, ["income"])
    deduction = p.deduction * household.nb_persons()
    return max_(gross - deduction, 0)

# File 2: DUPLICATES same calculation
def formula(household, period, parameters):
    gross = add(household, period, ["income"])  # Copy-pasted
    deduction = p.deduction * household.nb_persons()  # Copy-pasted
    after_deduction = max_(gross - deduction, 0)  # Copy-pasted
    return after_deduction < p.threshold
```
**✅ CORRECT: Reuse existing variables**
```python
# File 2: reuses calculation
# `p` is assumed to be bound to the program's parameters(period) node.
def formula(household, period, parameters):
    countable_income = household("program_countable_income", period)
    return countable_income < p.threshold
```
**When to create intermediate variables:**
- Same calculation in 2+ places
- Logic exceeds 5 lines
- Reference implementations have a similar variable
---
## TANF-Specific Patterns
### Study Reference Implementations First
**MANDATORY before implementing any TANF:**
- DC TANF: `/variables/gov/states/dc/dhs/tanf/`
- IL TANF: `/variables/gov/states/il/dhs/tanf/`
- TX TANF: `/variables/gov/states/tx/hhs/tanf/`
**Learn from them:**
1. Variable organization
2. Naming conventions
3. Code reuse patterns
4. When to use `adds` vs `formula`
### Standard TANF Structure
```
tanf/
├── eligibility/
│ ├── demographic_eligible.py
│ ├── income_eligible.py
│ └── eligible.py
├── income/
│ ├── earned/
│ ├── unearned/
│ └── countable_income.py
└── [state]_tanf.py
```
### Simplified TANF Rules
For simplified implementations:
**DON'T create state-specific versions of:**
- Demographic eligibility (use federal)
- Immigration eligibility (use federal)
- Income sources (use federal baseline)
```python
# DON'T CREATE:
#   ca_tanf_demographic_eligible_person.py
#   ca_tanf_gross_earned_income.py
#   parameters/.../income/sources/earned.yaml

# DO USE (existing federal variables):
#   is_demographic_tanf_eligible  - federal demographic eligibility
#   tanf_gross_earned_income      - federal income aggregation
```
### Avoiding Unnecessary Wrapper Variables (CRITICAL)
**Golden Rule: Only create a state variable if you're adding state-specific logic to it!**
#### Understand WHY Variables Exist, Not Just WHAT
When studying reference implementations:
1. **Note which variables they have**
2. **READ THE CODE inside each variable**
3. **Ask: "Does this variable have state-specific logic?"**
4. **If it just returns federal baseline → DON'T copy it**
#### Variable Creation Decision Tree
Before creating ANY state-specific variable, ask:
1. Does federal baseline already calculate this?
2. Does my state do it DIFFERENTLY than federal?
3. Can I write the difference in 1+ lines of state-specific logic?
4. **Will this calculation be used in 2+ other variables?** (Code reuse exception)
**Decision:**
- If YES/NO/NO/NO → **DON'T create the variable**, use federal directly
- If YES/YES/YES/NO → **CREATE the variable** with state logic
- If YES/NO/NO/YES → **CREATE as intermediate variable** for code reuse (see exception below)
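The three outcomes listed above can be written as a small lookup, which makes it explicit that other answer combinations are not covered by the list. This is only a sketch of the checklist; `state_variable_decision` and its return strings are hypothetical.

```python
def state_variable_decision(federal_exists, state_differs, has_state_logic, reused_elsewhere):
    """Map the four yes/no answers to the outcome named in the decision list."""
    answers = (federal_exists, state_differs, has_state_logic, reused_elsewhere)
    if answers == (True, False, False, False):
        return "use the federal variable directly"
    if answers == (True, True, True, False):
        return "create the variable with state logic"
    if answers == (True, False, False, True):
        return "create an intermediate variable for reuse"
    # Any other combination is not addressed by the list and needs judgment.
    return "not covered by the decision list"


print(state_variable_decision(True, False, False, False))
print(state_variable_decision(True, False, False, True))
```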
#### EXCEPTION: Code Reuse Justifies Intermediate Variables
**Even without state-specific logic, create a variable if the SAME calculation is used in multiple places.**
**Bad - Duplicating calculation across variables:**
```python
# Variable 1 - Income eligibility
class mo_tanf_income_eligible(Variable):
def formula(spm_unit, period, parameters):
# Duplicated calculation
gross = add(spm_unit, period, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])
return gross <= p.income_limit
# Variable 2 - Countable income
class mo_tanf_countable_income(Variable):
def formula(spm_unit, period, parameters):
# SAME calculation repeated!
gross = add(spm_unit, period, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])
deductions = spm_unit("mo_tanf_deductions", period)
return max_(gross - deductions, 0)
# Variable 3 - Need standard
class mo_tanf_need_standard(Variable):
def formula(spm_unit, period, parameters):
# SAME calculation AGAIN!
gross = add(spm_unit, period, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])
return where(gross < p.threshold, p.high, p.low)
```
**Good - Extract into reusable intermediate variable:**
```python
# Intermediate variable - used in multiple places
class mo_tanf_gross_income(Variable):
adds = ["tanf_gross_earned_income", "tanf_gross_unearned_income"]
# Variable 1 - Reuses intermediate
class mo_tanf_income_eligible(Variable):
def formula(spm_unit, period, parameters):
gross = spm_unit("mo_tanf_gross_income", period) # Reuse
return gross <= p.income_limit
# Variable 2 - Reuses intermediate
class mo_tanf_countable_income(Variable):
def formula(spm_unit, period, parameters):
gross = spm_unit("mo_tanf_gross_income", period) # Reuse
deductions = spm_unit("mo_tanf_deductions", period)
return max_(gross - deductions, 0)
# Variable 3 - Reuses intermediate
class mo_tanf_need_standard(Variable):
def formula(spm_unit, period, parameters):
gross = spm_unit("mo_tanf_gross_income", period) # Reuse
return where(gross < p.threshold, p.high, p.low)
```
**When to create intermediate variables for reuse:**
- ✅ Same calculation appears in 2+ variables
- ✅ Represents a meaningful concept (e.g., "gross income", "net resources")
- ✅ Simplifies maintenance (change once vs many places)
- ✅ Follows DRY (Don't Repeat Yourself) principle
**When NOT to create (still a wrapper):**
- ❌ Only used in ONE place
- ❌ Just passes through another variable unchanged
- ❌ Adds indirection without code reuse benefit
#### Red Flags for Unnecessary Wrapper Variables
```python
# INVALID - Pure wrapper, no state logic:
class in_tanf_assistance_unit_size(Variable):
    def formula(spm_unit, period):
        return spm_unit("spm_unit_size", period)  # Just returns federal

# INVALID - Aggregation without transformation:
class in_tanf_countable_unearned_income(Variable):
    def formula(tax_unit, period):
        return tax_unit.sum(tax_unit.members("tanf_gross_unearned_income", period))

# INVALID - Pass-through with no modification:
class in_tanf_gross_income(Variable):
    def formula(entity, period):
        return entity("tanf_gross_income", period)
```
#### Examples of VALID State Variables
```python
# VALID - Has state-specific disregard:
class in_tanf_countable_earned_income(Variable):
    def formula(spm_unit, period, parameters):
        # "in" is a Python keyword, so Indiana needs bracket access
        p = parameters(period).gov.states["in"].tanf.income
        earned = spm_unit("tanf_gross_earned_income", period)
        return earned * (1 - p.earned_income_disregard_rate)  # STATE LOGIC

# VALID - Uses state-specific limits:
class in_tanf_income_eligible(Variable):
    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states["in"].tanf
        income = spm_unit("tanf_countable_income", period)
        size = spm_unit("spm_unit_size", period.this_year)
        limit = p.income_limit[min_(size, p.max_household_size)]  # STATE PARAMS
        return income <= limit

# VALID - IL has different counting rules:
class il_tanf_assistance_unit_size(Variable):
    adds = [
        "il_tanf_payment_eligible_child",  # STATE-SPECIFIC
        "il_tanf_payment_eligible_parent",  # STATE-SPECIFIC
    ]
```
#### State Variables to AVOID Creating
For TANF implementations:
**❌ DON'T create these (use federal directly):**
- `state_tanf_assistance_unit_size` (unless different counting rules like IL)
- `state_tanf_countable_unearned_income` (unless state has disregards)
- `state_tanf_gross_income` (just use federal baseline)
- Any variable that's just `return entity("federal_variable", period)`
**✅ DO create these (when state has unique rules):**
- `state_tanf_countable_earned_income` (if unique disregard %)
- `state_tanf_income_eligible` (state income limits)
- `state_tanf_maximum_benefit` (state payment standards)
- `state_tanf` (final benefit calculation)
### Demographic Eligibility Pattern
**Option 1: Use Federal (Simplified)**
```python
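class ca_tanf_eligible(Variable):
    def formula(spm_unit, period, parameters):
        # Use federal variable
        has_eligible = spm_unit.any(
            spm_unit.members("is_demographic_tanf_eligible", period)
        )
        # Assumes a state income test named ca_tanf_income_eligible exists
        income_eligible = spm_unit("ca_tanf_income_eligible", period)
        return has_eligible & income_eligible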
```
**Option 2: State-Specific (Different thresholds)**
```python
class ca_tanf_demographic_eligible_person(Variable):
def formula(person, period, parameters):
p = parameters(period).gov.states.ca.tanf
age = person("age", period.this_year) # NOT monthly_age
age_limit = where(
person("is_full_time_student", period),
p.age_threshold.student,
p.age_threshold.minor_child
)
return age < age_limit
```
---
## Common Implementation Patterns
### Income Eligibility
```python
class program_income_eligible(Variable):
value_type = bool
entity = SPMUnit
definition_period = MONTH
def formula(spm_unit, period, parameters):
p = parameters(period).gov.states.xx.program
income = spm_unit("program_countable_income", period)
size = spm_unit("spm_unit_size", period.this_year)
# Get threshold from parameters
threshold = p.income_limit[min_(size, p.max_household_size)]
return income <= threshold
```
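The `min_(size, p.max_household_size)` lookup caps the unit size at the largest size the parameter table defines, so larger units reuse the last row. A plain-Python sketch of that capping follows; the limits and the names `INCOME_LIMIT` and `MAX_HOUSEHOLD_SIZE` are made up for illustration and are not any state's real values.

```python
# Hypothetical monthly income limits keyed by unit size (illustrative only).
INCOME_LIMIT = {1: 500, 2: 650, 3: 800}
MAX_HOUSEHOLD_SIZE = 3  # largest size with its own row in the table


def income_limit_for(size):
    """Look up the limit, reusing the last row for larger units."""
    capped_size = min(size, MAX_HOUSEHOLD_SIZE)
    return INCOME_LIMIT[capped_size]


def income_eligible(income, size):
    """Income test against the capped lookup."""
    return income <= income_limit_for(size)


print(income_limit_for(5))      # 800: sizes above 3 use the size-3 row
print(income_eligible(700, 2))  # False: 700 exceeds the size-2 limit of 650
```

Without the cap, a unit of five would index a row that does not exist.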
### Benefit Calculation
```python
class program_benefit(Variable):
value_type = float
entity = SPMUnit
definition_period = MONTH
unit = USD
def formula(spm_unit, period, parameters):
p = parameters(period).gov.states.xx.program
eligible = spm_unit("program_eligible", period)
# Calculate benefit amount
base = p.benefit_schedule.base_amount
adjustment = p.benefit_schedule.adjustment_rate
size = spm_unit("spm_unit_size", period.this_year)
amount = base + (size - 1) * adjustment
return where(eligible, amount, 0)
```
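The benefit formula above is a base amount plus a per-person increment, zeroed out for ineligible units. As a scalar sketch with hypothetical numbers (a base of 300 and an increment of 75 are assumptions, not real payment standards):

```python
def benefit_amount(size, base_amount, adjustment_rate, eligible):
    """Base amount plus an increment for each member beyond the first."""
    amount = base_amount + (size - 1) * adjustment_rate
    return amount if eligible else 0


print(benefit_amount(3, 300, 75, True))   # 300 + 2 * 75 = 450
print(benefit_amount(3, 300, 75, False))  # 0 for an ineligible unit
```

In a real formula the same arithmetic runs on arrays, which is why the original uses `where(eligible, amount, 0)` instead of a Python conditional.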
### Using Scale Parameters
```python
def formula(entity, period, parameters):
    p = parameters(period).gov.states.az.program
    federal_p = parameters(period).gov.hhs.fpg
    # Federal base with state scale
    size = entity("household_size", period.this_year)
    fpg = federal_p.first_person + federal_p.additional * (size - 1)
    state_scale = p.income_limit_scale  # Often exists
    income_limit = fpg * state_scale
    return income_limit
```
---
## Variable Creation Checklist
Before creating any variable:
- [ ] Check if it already exists
- [ ] Use standard demographic variables (age, is_disabled)
- [ ] Reuse federal calculations where applicable
- [ ] Check for an existing income variable (e.g., `household_income`) before creating a new one
- [ ] Look for existing intermediate variables
- [ ] Study reference implementations
---
## Quality Standards
### Complete Implementation Requirements
- All values from parameters (no hard-coding)
- Complete formula logic
- Proper entity aggregation
- Correct period handling
- Meaningful variable names
- Proper metadata
### Anti-Patterns to Avoid
- Copy-pasting logic between files
- Hard-coding any numeric values
- Creating duplicate income variables
- State-specific versions of federal rules
- Placeholder TODOs in production code
---
## Parameter-to-Variable Mapping Requirements
### Every Parameter Must Have a Variable
**CRITICAL: Complete implementation means every parameter is used!**
When you create parameters, you MUST create corresponding variables:
| Parameter Type | Required Variable(s) |
|---------------|---------------------|
| resources/limit | `state_program_resource_eligible` |
| income/limit | `state_program_income_eligible` |
| payment_standard | `state_program_maximum_benefit` |
| income/disregard | `state_program_countable_earned_income` |
| categorical/requirements | `state_program_categorically_eligible` |
### Complete Eligibility Formula
The main eligibility variable MUST combine ALL checks:
```python
class state_program_eligible(Variable):
def formula(spm_unit, period, parameters):
income_eligible = spm_unit("state_program_income_eligible", period)
resource_eligible = spm_unit("state_program_resource_eligible", period) # DON'T FORGET!
categorical = spm_unit("state_program_categorically_eligible", period)
return income_eligible & resource_eligible & categorical
```
**Common Implementation Failures:**
- ❌ Created resource limit parameter but no resource_eligible variable
- ❌ Main eligible variable only checks income, ignores resources
- ❌ Parameters created but never referenced in any formula
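One way to catch the last failure is to compare the parameter names you created against the text of your formulas. The sketch below is a crude substring check with hypothetical names and inputs. It is not a PolicyEngine tool, and it would miss parameters reached through `adds` or dynamic paths.

```python
def find_orphaned_parameters(parameter_names, formula_sources):
    """Return parameter names that no formula source mentions."""
    combined = "\n".join(formula_sources)
    return sorted(name for name in parameter_names if name not in combined)


# Hypothetical parameter leaf names and formula text, for illustration only.
parameters_created = ["income_limit", "resource_limit", "payment_standard"]
formulas = [
    "return income <= p.income_limit",
    "return where(eligible, p.payment_standard, 0)",
]

print(find_orphaned_parameters(parameters_created, formulas))  # ['resource_limit']
```

Here `resource_limit` is reported because no formula mentions it, which is the "resource limit parameter but no resource_eligible variable" case.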
---
## For Agents
When implementing variables:
1. **Study reference implementations** (DC, IL, TX TANF)
2. **Never hard-code values** - use parameters
3. **Map every parameter to a variable** - no orphaned parameters
4. **Complete ALL eligibility checks** - income AND resources AND categorical
5. **Reuse existing variables** - avoid duplication
6. **Use `adds` when possible** - cleaner than formula
7. **Create intermediate variables** for complex logic
8. **Follow metadata standards** exactly
9. **Complete implementation** or delete the file


@@ -0,0 +1,440 @@
---
name: policyengine-parameter-patterns
description: PolicyEngine parameter patterns - YAML structure, naming conventions, metadata requirements, federal/state separation
---
# PolicyEngine Parameter Patterns
Comprehensive patterns for creating PolicyEngine parameter files.
## Critical: Required Structure
Every parameter MUST have this exact structure:
```yaml
description: [One sentence description].
values:
  YYYY-MM-DD: value
metadata:
  unit: [type]      # REQUIRED
  period: [period]  # REQUIRED
  label: [name]     # REQUIRED
  reference:        # REQUIRED
    - title: [source]
      href: [url]
```
**Missing ANY metadata field = validation error**
---
## 1. File Naming Conventions
### Study Reference Implementations First
Before naming, examine:
- DC TANF: `/parameters/gov/states/dc/dhs/tanf/`
- IL TANF: `/parameters/gov/states/il/dhs/tanf/`
- TX TANF: `/parameters/gov/states/tx/hhs/tanf/`
### Naming Patterns
**Dollar amounts → `/amount.yaml`**
```
income/deductions/work_expense/amount.yaml # $120
resources/limit/amount.yaml # $6,000
payment_standard/amount.yaml # $320
```
**Percentages/rates → `/rate.yaml` or `/percentage.yaml`**
```
income_limit/rate.yaml # 1.85 (185% FPL)
benefit_reduction/rate.yaml # 0.2 (20%)
income/disregard/percentage.yaml # 0.67 (67%)
```
**Thresholds → `/threshold.yaml`**
```
age_threshold/minor_child.yaml # 18
age_threshold/elderly.yaml # 60
income/threshold.yaml # 30_000
```
---
## 2. Description Field
### The ONLY Acceptable Formula
```yaml
description: [State] [verb] [category] to [this X] under the [Full Program Name] program.
```
**Components:**
1. **[State]**: Full state name (Indiana, Texas, California)
2. **[verb]**: ONLY use: limits, provides, sets, excludes, deducts, uses
3. **[category]**: What's being limited/provided (gross income, resources, payment standard)
4. **[this X]**: ALWAYS use generic placeholder
- `this amount` (for currency-USD)
- `this share` or `this percentage` (for rates/percentages)
- `this threshold` (for age/counts)
5. **[Full Program Name]**: ALWAYS spell out (Temporary Assistance for Needy Families, NOT TANF)
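The five components can be assembled mechanically. The helper below is a hypothetical sketch, not part of PolicyEngine: it takes the category and the "this X" placeholder together as one `body` string and rejects verbs outside the allowed list.

```python
ALLOWED_VERBS = {"limits", "provides", "sets", "excludes", "deducts", "uses"}


def build_description(state, verb, body, program):
    """Assemble '[State] [verb] [body] under the [program] program.'"""
    if verb not in ALLOWED_VERBS:
        raise ValueError(f"verb must be one of {sorted(ALLOWED_VERBS)}")
    return f"{state} {verb} {body} under the {program} program."


print(build_description(
    "Indiana",
    "limits",
    "gross income to this amount",
    "Temporary Assistance for Needy Families",
))
```

The printed sentence is the income-limit wording with the state filled in.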
### Copy These Exact Templates
**For income limits:**
```yaml
description: [State] limits gross income to this amount under the Temporary Assistance for Needy Families program.
```
**For resource limits:**
```yaml
description: [State] limits resources to this amount under the Temporary Assistance for Needy Families program.
```
**For payment standards:**
```yaml
description: [State] provides this amount as the payment standard under the Temporary Assistance for Needy Families program.
```
**For disregards:**
```yaml
description: [State] excludes this share of earnings from countable income under the Temporary Assistance for Needy Families program.
```
### Description Validation Checklist
Run this check on EVERY description:
```python
# Sketch of the validation checks
def validate_description(desc):
    placeholders = ('this amount', 'this share', 'this percentage', 'this threshold')
    checks = [
        desc.count('.') == 1,  # Exactly one sentence
        'TANF' not in desc,  # No acronyms
        'SNAP' not in desc,  # No acronyms
        any(placeholder in desc for placeholder in placeholders),  # Generic "this X"
        'under the' in desc and 'program' in desc,
        'by household size' not in desc,  # No explanatory text
        'based on' not in desc,  # No explanatory text
        'for eligibility' not in desc,  # Redundant
    ]
    return all(checks)
```
**CRITICAL: Always spell out full program names in descriptions!**
---
## 3. Values Section
### Format Rules
```yaml
values:
2024-01-01: 3_000 # Use underscores
# NOT: 3000
2024-01-01: 0.2 # Remove trailing zeros
# NOT: 0.20 or 0.200
2024-01-01: 2 # No decimals for integers
# NOT: 2.0 or 2.00
```
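These formatting rules are easy to apply with Python's format mini-language. The function below is an illustrative sketch (the name `format_parameter_value` is an assumption, not an existing helper): whole numbers are written without decimals, and thousands are grouped with underscores.

```python
def format_parameter_value(value):
    """Format a number the way the rules above ask for."""
    if float(value).is_integer():
        # Whole numbers: drop the decimal part, group thousands with "_".
        return f"{int(value):_}"
    # Fractions: Python's default float formatting already drops trailing zeros.
    return f"{value:_}"


print(format_parameter_value(3000))  # 3_000
print(format_parameter_value(0.20))  # 0.2
print(format_parameter_value(2.0))   # 2
```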
### Effective Dates
**Use exact dates from sources:**
```yaml
# If source says "effective July 1, 2023"
2023-07-01: value
# If source says "as of October 1"
2024-10-01: value
# NOT arbitrary dates:
2000-01-01: value # Shows no research
```
**Date format:** `YYYY-MM-01` (always use 01 for day)
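When a source spells the date out, converting it with the standard library avoids typos in the key. This is a minimal sketch that assumes English month names in the form "July 1, 2023"; the function name is hypothetical.

```python
from datetime import datetime


def effective_date_key(source_text):
    """Convert a date such as 'July 1, 2023' into a YYYY-MM-DD key."""
    parsed = datetime.strptime(source_text, "%B %d, %Y")
    return parsed.date().isoformat()


print(effective_date_key("July 1, 2023"))  # 2023-07-01
```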
---
## 4. Metadata Fields (ALL REQUIRED)
### unit
Common units:
- `currency-USD` - Dollar amounts
- `/1` - Rates, percentages (as decimals)
- `month` - Number of months
- `year` - Age in years
- `bool` - True/false
- `person` - Count of people
### period
- `year` - Annual values
- `month` - Monthly values
- `day` - Daily values
- `eternity` - Never changes
### label
Pattern: `[State] [PROGRAM] [description]`
```yaml
label: Montana TANF minor child age threshold
label: Illinois TANF earned income disregard rate
label: California SNAP resource limit
```
**Rules:**
- Spell out state name
- Abbreviate program (TANF, SNAP)
- No period at end
### reference
**Requirements:**
1. At least one source (prefer two)
2. Must contain the actual value
3. Legal codes need subsections
4. PDFs need page anchors
```yaml
# ✅ GOOD:
reference:
  - title: Idaho Admin Code 16.05.03.205(3)
    href: https://adminrules.idaho.gov/rules/current/16/160503.pdf#page=14
  - title: Idaho LIHEAP Guidelines, Section 3, page 8
    href: https://healthandwelfare.idaho.gov/guidelines.pdf#page=8

# ❌ BAD:
reference:
  - title: Federal LIHEAP regulations  # Too generic
    href: https://www.acf.hhs.gov/ocs  # No specific section
```
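Some of these requirements can be checked automatically. The sketch below covers only the mechanical ones (title present, href present, PDF links carrying a `#page=` anchor). Whether the source actually contains the value still needs a human reader. The function name and the dict shape are assumptions for illustration.

```python
def reference_problems(reference):
    """List problems with one reference entry (a dict with title and href)."""
    problems = []
    title = reference.get("title", "")
    href = reference.get("href", "")
    if not title:
        problems.append("missing title")
    if not href:
        problems.append("missing href")
    # PDFs should deep-link to the page that contains the value.
    if ".pdf" in href.lower() and "#page=" not in href:
        problems.append("PDF link has no #page= anchor")
    return problems


good = {
    "title": "Idaho Admin Code 16.05.03.205(3)",
    "href": "https://adminrules.idaho.gov/rules/current/16/160503.pdf#page=14",
}
print(reference_problems(good))  # []
```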
---
## 5. Federal/State Separation
### Federal Parameters
Location: `/parameters/gov/{agency}/{program}/`
```yaml
# parameters/gov/hhs/fpg/first_person.yaml
description: HHS sets this amount as the federal poverty guideline for one person.
```
### State Parameters
Location: `/parameters/gov/states/{state}/{agency}/{program}/`
```yaml
# parameters/gov/states/ca/dss/tanf/income_limit/rate.yaml
description: California uses this share of the federal poverty guideline as the income limit under the Temporary Assistance for Needy Families program.
```
---
## 6. Common Parameter Patterns
### Income Limits (as FPL multiplier)
```yaml
# income_limit/rate.yaml
description: State uses this multiplier of the federal poverty guideline for program income limits.
values:
2024-01-01: 1.85 # 185% FPL
metadata:
unit: /1
period: year
label: State PROGRAM income limit multiplier
```
### Benefit Amounts
```yaml
# payment_standard/amount.yaml
description: State provides this amount as the monthly program benefit.
values:
2024-01-01: 500
metadata:
unit: currency-USD
period: month
label: State PROGRAM payment standard amount
```
### Age Thresholds
```yaml
# age_threshold/minor_child.yaml
description: State defines minor children as under this age for program eligibility.
values:
2024-01-01: 18
metadata:
unit: year
period: eternity
label: State PROGRAM minor child age threshold
```
### Disregard Percentages
```yaml
# income/disregard/percentage.yaml
description: State excludes this share of earned income from program calculations.
values:
2024-01-01: 0.67 # 67%
metadata:
unit: /1
period: eternity
label: State PROGRAM earned income disregard percentage
```
---
## 7. Validation Checklist
Before creating parameters:
- [ ] Studied reference implementations (DC, IL, TX)
- [ ] All four metadata fields present
- [ ] Description is one complete sentence
- [ ] Values use underscore separators
- [ ] Trailing zeros removed from decimals
- [ ] References include subsections and page numbers
- [ ] Label follows naming pattern
- [ ] Effective date matches source document
---
## 8. Common Mistakes to Avoid
### Missing Metadata
```yaml
# ❌ WRONG - Missing required fields:
metadata:
  unit: currency-USD
  label: Benefit amount
  # Missing: period, reference
```
### Generic References
```yaml
# ❌ WRONG:
reference:
  - title: State TANF Manual
    href: https://state.gov/tanf

# ✅ CORRECT:
reference:
  - title: State TANF Manual Section 5.2, page 15
    href: https://state.gov/tanf-manual.pdf#page=15
```
### Arbitrary Dates
```yaml
# ❌ WRONG:
values:
  2000-01-01: 500  # Lazy default

# ✅ CORRECT:
values:
  2023-07-01: 500  # From source: "effective July 1, 2023"
```
---
## Real-World Examples from Production Code
**CRITICAL: Study actual parameter files, not just examples!**
Before writing ANY parameter:
1. Open and READ 3+ similar parameter files from TX/IL/DC
2. COPY their exact description pattern
3. Replace state name and specific details only
### Payment Standards
```yaml
# Texas (actual production)
description: Texas provides this amount as the payment standard under the Temporary Assistance for Needy Families program.
# Pennsylvania (actual production)
description: Pennsylvania limits TANF benefits to households with resources at or below this amount.
```
### Income Limits
```yaml
# Indiana (should be)
description: Indiana limits gross income to this amount under the Temporary Assistance for Needy Families program.
# Texas (actual production)
description: Texas limits countable resources to this amount under the Temporary Assistance for Needy Families program.
```
### Disregards
```yaml
# Indiana (should be)
description: Indiana excludes this share of earnings from countable income under the Temporary Assistance for Needy Families program.
# Texas (actual production)
description: Texas deducts this standard work expense amount from gross earned income for Temporary Assistance for Needy Families program calculations.
```
### Pattern Analysis
- **ALWAYS** spell out full program name
- Use "under the [Program] program" or "for [Program] program calculations"
- One simple verb (limits, provides, excludes, deducts)
- One "this X" placeholder
- NO extra explanation ("based on X", "This is Y")
### Common Description Mistakes to AVOID
**❌ WRONG - Using acronyms:**
```yaml
description: Indiana sets this gross income limit for TANF eligibility by household size.
# Problems: "TANF" not spelled out, unnecessary "by household size"
```
**✅ CORRECT:**
```yaml
description: Indiana limits gross income to this amount under the Temporary Assistance for Needy Families program.
```
**❌ WRONG - Adding explanatory text:**
```yaml
description: Indiana provides this payment standard amount based on household size.
# Problem: "based on household size" is unnecessary (evident from breakdown)
```
**✅ CORRECT:**
```yaml
description: Indiana provides this amount as the payment standard under the Temporary Assistance for Needy Families program.
```
**❌ WRONG - Missing program context:**
```yaml
description: Indiana sets the gross income limit.
# Problem: No program name, no "this amount"
```
**✅ CORRECT:**
```yaml
description: Indiana limits gross income to this amount under the Temporary Assistance for Needy Families program.
```
### Authoritative Source Requirements
**ONLY use official government sources:**
- ✅ State codes and administrative regulations
- ✅ Official state agency websites (.gov domains)
- ✅ Federal regulations (CFR, USC)
- ✅ State plans and official manuals (.gov PDFs)
**NEVER use:**
- ❌ Third-party guides (singlemotherguide.com, benefits.gov descriptions)
- ❌ Wikipedia
- ❌ Nonprofit summaries (unless no official source exists)
- ❌ News articles
---
## For Agents
When creating parameters:
1. **READ ACTUAL FILES** - Study TX/IL/DC parameter files, not just skill examples
2. **Include ALL metadata fields** - missing any causes errors
3. **Use exact effective dates** from sources
4. **Follow naming conventions** (amount/rate/threshold)
5. **Write simple descriptions** with "this" placeholders and full program names
6. **Include ONLY official government references** with subsections and pages
7. **Format values properly** (underscores, no trailing zeros)


@@ -0,0 +1,478 @@
---
name: policyengine-period-patterns
description: PolicyEngine period handling - converting between YEAR, MONTH definition periods and testing patterns
---
# PolicyEngine Period Patterns
Essential patterns for handling different definition periods (YEAR, MONTH) in PolicyEngine.
## Quick Reference
| From | To | Method | Example |
|------|-----|--------|---------|
| MONTH formula | YEAR variable | `period.this_year` | `age = person("age", period.this_year)` |
| YEAR formula | MONTH variable | `period.first_month` | `person("monthly_rent", period.first_month)` |
| Any | Year integer | `period.start.year` | `year = period.start.year` |
| Any | Month integer | `period.start.month` | `month = period.start.month` |
| Annual → Monthly | Divide by 12 | `/ MONTHS_IN_YEAR` | `monthly = annual / 12` |
| Monthly → Annual | Multiply by 12 | `* MONTHS_IN_YEAR` | `annual = monthly * 12` |
---
## 1. Definition Periods in PolicyEngine US
### Available Periods
- **YEAR**: Annual values (most common - 2,883 variables)
- **MONTH**: Monthly values (395 variables)
- **ETERNITY**: Never changes (1 variable - structural relationships)
**Note:** QUARTER is NOT used in PolicyEngine US
### Examples
```python
from policyengine_us.model_api import *
class annual_income(Variable):
definition_period = YEAR # Annual amount
class monthly_benefit(Variable):
definition_period = MONTH # Monthly amount
class is_head(Variable):
definition_period = ETERNITY # Never changes
```
---
## 2. The Golden Rule
**When accessing a variable with a different definition period than your formula, you must specify the target period explicitly.**
```python
# ✅ CORRECT - MONTH formula accessing YEAR variable
def formula(person, period, parameters):
age = person("age", period.this_year) # Gets actual age
# ❌ WRONG - Would get age/12
def formula(person, period, parameters):
age = person("age", period) # BAD: gives age divided by 12!
```
---
## 3. Common Patterns
### Pattern 1: MONTH Formula Accessing YEAR Variable
**Use Case**: Monthly benefits need annual demographic data
```python
class monthly_benefit_eligible(Variable):
value_type = bool
entity = Person
definition_period = MONTH # Monthly eligibility
def formula(person, period, parameters):
# Age is YEAR-defined, use period.this_year
age = person("age", period.this_year) # ✅ Gets full age
# is_pregnant is MONTH-defined, just use period
is_pregnant = person("is_pregnant", period) # ✅ Same period
return (age < 18) | is_pregnant
```
### Pattern 2: Accessing Stock Variables (Assets)
**Stock variables** (point-in-time values like assets) are typically YEAR-defined
```python
class tanf_countable_resources(Variable):
value_type = float
entity = SPMUnit
definition_period = MONTH # Monthly check
def formula(spm_unit, period, parameters):
# Assets are stocks (YEAR-defined)
cash = spm_unit("cash_assets", period.this_year) # ✅
vehicles = spm_unit("vehicles_value", period.this_year) # ✅
p = parameters(period).gov.tanf.resources
return cash + max_(0, vehicles - p.vehicle_exemption)
```
---
## 4. Understanding Auto-Conversion: When to Use `period` vs `period.this_year`
### The Key Question
**When accessing a YEAR variable from a MONTH formula, should the value be divided by 12?**
- **If YES** → Use `period` (let auto-conversion happen)
- **If NO** → Use `period.this_year` (prevent auto-conversion)
### When Auto-Conversion Makes Sense (Use `period`)
**Flow variables** where you want the monthly portion:
```python
class monthly_benefit(Variable):
definition_period = MONTH
def formula(person, period, parameters):
# ✅ Use period - want $2,000/month from $24,000/year
monthly_income = person("employment_income", period)
# Compare to monthly threshold
p = parameters(period).gov.program
return monthly_income < p.monthly_threshold
```
Why: If annual income is $24,000, you want $2,000/month for monthly eligibility checks.
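The key question can be reduced to a toy rule: split flows across months, leave everything else alone. The snippet below is only a model of that rule of thumb in plain Python. It is not how PolicyEngine implements conversion, and `monthly_view` is a hypothetical name.

```python
MONTHS_IN_YEAR = 12


def monthly_view(annual_value, is_flow):
    """Return the value a monthly formula should work with.

    Flows (such as income) are split across months; ages, assets,
    and counts are returned unchanged.
    """
    if is_flow:
        return annual_value / MONTHS_IN_YEAR
    return annual_value


print(monthly_view(24_000, is_flow=True))  # 2000.0 per month
print(monthly_view(30, is_flow=False))     # 30 (an age is not divided)
```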
### When Auto-Conversion Breaks Things (Use `period.this_year`)
**Stock variables and counts** where division by 12 is nonsensical:
**1. Age**
```python
# ❌ WRONG - gives age/12
age = person("age", period) # 30 years → 2.5 "monthly age" ???
# ✅ CORRECT - gives actual age
age = person("age", period.this_year) # 30 years
```
**2. Assets/Resources (Stocks)**
```python
# ❌ WRONG - gives assets/12
assets = spm_unit("spm_unit_assets", period) # $12,000 → $1,000 ???
# ✅ CORRECT - gives point-in-time value
assets = spm_unit("spm_unit_assets", period.this_year) # $12,000
```
**3. Counts (Household Size, Number of Children)**
```python
# ❌ WRONG - gives count/12
size = spm_unit("household_size", period) # 4 people → 0.33 people ???
# ✅ CORRECT - gives actual count
size = spm_unit("household_size", period.this_year) # 4 people
```
**4. Boolean/Enum Variables**
```python
# ❌ WRONG - weird fractional conversion
status = person("is_disabled", period)
# ✅ CORRECT - actual status
status = person("is_disabled", period.this_year)
```
### Decision Tree
```
Accessing YEAR variable from MONTH formula?
├─ Is it an INCOME or FLOW variable?
│ └─ YES → Use period (auto-convert to monthly) ✅
│ Example: employment_income, self_employment_income
└─ Is it AGE, ASSET, COUNT, or BOOLEAN?
└─ YES → Use period.this_year (prevent conversion) ✅
Examples: age, assets, household_size, is_disabled
```
### Complete Example
```python
class monthly_tanf_eligible(Variable):
value_type = bool
entity = Person
definition_period = MONTH
def formula(person, period, parameters):
# Age: Use period.this_year (don't want age/12)
age = person("age", period.this_year) # ✅
# Assets: Use period.this_year (don't want assets/12)
assets = person("assets", period.this_year) # ✅
# Income: Use period (DO want monthly income from annual)
monthly_income = person("employment_income", period) # ✅
p = parameters(period).gov.tanf.eligibility
age_eligible = (age >= 18) & (age <= 64)
asset_eligible = assets <= p.asset_limit
income_eligible = monthly_income <= p.monthly_income_limit
return age_eligible & asset_eligible & income_eligible
```
### Quick Reference for Auto-Conversion
| Variable Type | Use `period` | Use `period.this_year` | Why |
|--------------|-------------|----------------------|-----|
| Income (flow) | ✅ | ❌ | Want monthly portion |
| Age | ❌ | ✅ | Age/12 is meaningless |
| Assets/Resources (stock) | ❌ | ✅ | Point-in-time value |
| Household size/counts | ❌ | ✅ | Can't divide people |
| Boolean/status flags | ❌ | ✅ | True/12 is nonsense |
| Demographic attributes | ❌ | ✅ | Properties don't divide |
**Rule of thumb:** If dividing by 12 makes the value meaningless → use `period.this_year`
### Pattern 3: Converting Annual to Monthly
```python
class monthly_income_limit(Variable):
definition_period = MONTH
def formula(household, period, parameters):
# Get annual parameter
annual_limit = parameters(period).gov.program.annual_limit
# Convert to monthly
monthly_limit = annual_limit / MONTHS_IN_YEAR # ✅
return monthly_limit
```
### Pattern 4: Getting Period Components
```python
class federal_poverty_guideline(Variable):
    definition_period = MONTH

    def formula(entity, period, parameters):
        # Get year and month as integers
        year = period.start.year  # e.g., 2024
        month = period.start.month  # e.g., 1-12
        # FPG updates October 1st
        if month >= 10:
            instant_str = f"{year}-10-01"
        else:
            instant_str = f"{year - 1}-10-01"
        # Access parameters at specific date
        p_fpg = parameters(instant_str).gov.hhs.fpg
        return p_fpg.first_person / MONTHS_IN_YEAR
```
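The date logic in this pattern can be pulled into a pure function and checked on its own. This sketch follows the October 1 update rule used above; the function name `fpg_instant` is hypothetical.

```python
def fpg_instant(year, month):
    """Return the instant string for the guideline in force in a given month.

    Months from October onward use that year's October 1 value;
    earlier months use the previous year's.
    """
    effective_year = year if month >= 10 else year - 1
    return f"{effective_year}-10-01"


print(fpg_instant(2024, 10))  # 2024-10-01
print(fpg_instant(2024, 3))   # 2023-10-01
```

Isolating the rule this way makes the boundary months (September and October) easy to test without running a simulation.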
---
## 5. Parameter Access
### Standard Access
```python
def formula(entity, period, parameters):
# Parameters use current period
p = parameters(period).gov.program.benefit
return p.amount
```
### Specific Date Access
```python
def formula(entity, period, parameters):
# Access parameters at specific instant
p = parameters("2024-10-01").gov.hhs.fpg
return p.amount
```
**Important**: Never use `parameters(period.this_year)`. Read parameters at the formula's own `period`, or at an explicit instant string when a rule takes effect on a specific date.
---
## 6. Testing with Different Periods
### Critical Testing Rules
**For MONTH period tests** (`period: 2025-01`):
- **Input** YEAR variables as **annual amounts**
- **Output** YEAR variables show **monthly values** (÷12)
### Test Examples
**Example 1: Basic MONTH Test**
```yaml
- name: Monthly income test
period: 2025-01 # MONTH period
input:
people:
person1:
employment_income: 12_000 # Input: Annual
output:
employment_income: 1_000 # Output: Monthly (12_000/12)
```
**Example 2: Mixed Variables**
```yaml
- name: Eligibility with age and income
period: 2024-01 # MONTH period
input:
age: 30 # Age doesn't convert
employment_income: 24_000 # Annual input
output:
age: 30 # Age stays same
employment_income: 2_000 # Monthly output
monthly_eligible: true
```
**Example 3: YEAR Period Test**
```yaml
- name: Annual calculation
period: 2024 # YEAR period
input:
employment_income: 18_000 # Annual
output:
employment_income: 18_000 # Annual output
annual_tax: 2_000
```
### Testing Best Practices
1. **Always specify period explicitly**
2. **Input YEAR variables as annual amounts**
3. **Expect monthly output for YEAR variables in MONTH tests**
4. **Use underscore separators**: `12_000` not `12000`
5. **Add calculation comments** in integration tests
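Rules 2 and 3 can be worked out before writing a test. The helper below is a hypothetical sketch for YEAR-defined flow variables such as `employment_income`. It infers the test type from the period string: `YYYY` is treated as a YEAR test and `YYYY-MM` as a MONTH test.

```python
MONTHS_IN_YEAR = 12


def expected_flow_output(annual_input, test_period):
    """Expected output of a YEAR-defined flow variable in a YAML test."""
    is_month_test = len(test_period.split("-")) == 2
    if is_month_test:
        # MONTH test: the annual input is reported as a monthly share.
        return annual_input / MONTHS_IN_YEAR
    return annual_input


print(expected_flow_output(12_000, "2025-01"))  # 1000.0
print(expected_flow_output(18_000, "2024"))     # 18000
```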
---
## 7. Common Mistakes and Solutions
### ❌ Mistake 1: Not Using period.this_year
```python
# WRONG - From MONTH formula
def formula(person, period, parameters):
age = person("age", period) # Gets age/12!
# CORRECT
def formula(person, period, parameters):
age = person("age", period.this_year) # Gets actual age
```
### ❌ Mistake 2: Mixing Annual and Monthly
```python
# WRONG - Comparing different units
monthly_income = person("monthly_income", period)
annual_limit = parameters(period).gov.limit
eligible = monthly_income < annual_limit  # BAD comparison

# CORRECT - Convert to same units
monthly_income = person("monthly_income", period)
annual_limit = parameters(period).gov.limit
monthly_limit = annual_limit / MONTHS_IN_YEAR
eligible = monthly_income < monthly_limit  # Good comparison
```
### ❌ Mistake 3: Wrong Test Expectations
```yaml
# WRONG - Expecting annual in MONTH test
period: 2024-01
input:
employment_income: 12_000
output:
employment_income: 12_000 # Wrong!
# CORRECT
period: 2024-01
input:
employment_income: 12_000 # Annual input
output:
employment_income: 1_000 # Monthly output
```
---
## 8. Quick Patterns Cheat Sheet
### Accessing Variables
| Your Formula | Target Variable | Use |
|--------------|-----------------|-----|
| MONTH | YEAR | `period.this_year` |
| YEAR | MONTH | `period.first_month` |
| Any | ETERNITY | `period` |
### Common Variables That Need period.this_year
- `age`
- `household_size`, `spm_unit_size`
- `cash_assets`, `vehicles_value`
- `state_name`, `state_code`
- Any demographic variable
### Period Conversion
```python
# Annual to monthly
monthly = annual / MONTHS_IN_YEAR
# Monthly to annual
annual = monthly * MONTHS_IN_YEAR
# Get year/month numbers
year = period.start.year # 2024
month = period.start.month # 1-12
```
---
## 9. Real-World Example
```python
class tanf_income_eligible(Variable):
    value_type = bool
    entity = SPMUnit
    definition_period = MONTH  # Monthly eligibility

    def formula(spm_unit, period, parameters):
        # YEAR variables need period.this_year
        household_size = spm_unit("spm_unit_size", period.this_year)
        state = spm_unit.household("state_code", period.this_year)
        # MONTH variables use period
        gross_income = spm_unit("tanf_gross_income", period)
        # Parameters use period
        p = parameters(period).gov.states[state].tanf
        # Convert annual limit to monthly
        annual_limit = p.income_limit[household_size]
        monthly_limit = annual_limit / MONTHS_IN_YEAR
        return gross_income <= monthly_limit
```
---
## 10. Checklist for Period Handling
When writing a formula:
- [ ] Identify your formula's `definition_period`
- [ ] Check `definition_period` of accessed variables
- [ ] Use `period.this_year` for YEAR variables from MONTH formulas
- [ ] Use `period` for parameters (not `period.this_year`)
- [ ] Convert units when comparing (annual ↔ monthly)
- [ ] Test with appropriate period values
---
## Related Skills
- **policyengine-aggregation-skill**: For summing across entities with period handling
- **policyengine-core-skill**: For understanding variable and parameter systems
---
## For Agents
1. **Always check definition_period** before accessing variables
2. **Default to period.this_year** for demographic/stock variables from MONTH formulas
3. **Test thoroughly** - period mismatches cause subtle bugs
4. **Document period conversions** in comments
5. **Follow existing patterns** in similar variables

View File

@@ -0,0 +1,376 @@
---
name: policyengine-review-patterns
description: PolicyEngine code review patterns - validation checklist, common issues, review standards
---
# PolicyEngine Review Patterns
Comprehensive patterns for reviewing PolicyEngine implementations.
## Understanding WHY, Not Just WHAT
### Pattern Analysis Before Review
When reviewing implementations that reference other states:
**🔴 CRITICAL: Check WHY Variables Exist**
Before approving any state-specific variable, verify:
1. **Does it have state-specific logic?** - Read the formula
2. **Are state parameters used?** - Check for `parameters(period).gov.states.XX`
3. **Is there transformation beyond aggregation?** - Look for calculations
4. **Would removing it break functionality?** - Test dependencies
**Example Analysis:**
```python
# IL TANF has this variable:
class il_tanf_assistance_unit_size(Variable):
    adds = ["il_tanf_payment_eligible_child", "il_tanf_payment_eligible_parent"]
    # ✅ VALID: IL-specific eligibility rules

# But IN TANF shouldn't copy it blindly:
class in_tanf_assistance_unit_size(Variable):
    def formula(spm_unit, period, parameters):
        return spm_unit("spm_unit_size", period)
    # ❌ INVALID: No IN-specific logic, just wrapper
```
### Wrapper Variable Detection
**Red Flags - Variables that shouldn't exist:**
- Formula is just `return entity("federal_variable", period)`
- Aggregates federal baseline with no transformation
- No state parameters accessed
- Comment says "use federal" but creates variable anyway
**Action:** Request deletion of unnecessary wrapper variables
---
## Priority Review Checklist
### 🔴 CRITICAL - Automatic Failures
These issues will cause crashes or incorrect results:
#### 1. Vectorization Violations
```python
# ❌ FAILS:
if household("income", period) > 1000:  # Will crash with arrays
    return 500

# ✅ PASSES:
return where(household("income", period) > 1000, 500, 100)
```
#### 2. Hard-Coded Values
```python
# ❌ FAILS:
benefit = min_(income * 0.33, 500)  # Hard-coded 0.33 and 500

# ✅ PASSES:
benefit = min_(income * p.rate, p.maximum)
```
#### 3. Missing Parameter Sources
```yaml
❌ FAILS:
reference:
  - title: State website
    href: https://state.gov

✅ PASSES:
reference:
  - title: Idaho Admin Code 16.05.03.205(3)
    href: https://adminrules.idaho.gov/rules/current/16/160503.pdf#page=14
```
---
### 🟡 MAJOR - Must Fix
These affect accuracy or maintainability:
#### 4. Test Quality Issues
```yaml
❌ FAILS:
income: 50000 # No separator
✅ PASSES:
income: 50_000 # Proper formatting
```
#### 5. Calculation Accuracy
- Order of operations matches regulations
- Deductions applied in correct sequence
- Edge cases handled (negatives, zeros)
#### 6. Description Style
```yaml
❌ FAILS:
description: The amount of SNAP benefits # Wordy filler ("The amount of")
✅ PASSES:
description: SNAP benefits # Concise noun phrase
```
---
### 🟢 MINOR - Should Fix
These improve code quality:
#### 7. Code Organization
- One variable per file
- Proper use of `defined_for`
- Use of `adds` for simple sums
#### 8. Documentation
- Clear references to regulation sections
- Changelog entry present
---
## Common Issues Reference
### Documentation Issues
| Issue | Example | Fix |
|-------|---------|-----|
| No primary source | "See SNAP website" | Add USC/CFR citation |
| Wrong value | $198 vs $200 in source | Update parameter |
| Generic link | dol.gov | Link to specific regulation |
| Missing subsection | "7 CFR 273" | "7 CFR 273.9(d)(3)" |
### Code Issues
| Issue | Impact | Fix |
|-------|--------|-----|
| if-elif-else with data | Crashes microsim | Use where/select |
| Hard-coded values | Inflexible | Move to parameters |
| Missing defined_for | Inefficient | Add eligibility condition |
| Manual summing | Wrong pattern | Use adds attribute |
### Test Issues
| Issue | Example | Fix |
|-------|---------|-----|
| No separators | 100000 | 100_000 |
| No documentation | output: 500 | Add calculation comment |
| Wrong period | 2024-04 | Use 2024-01 or 2024 |
| Made-up variables | heating_expense | Use existing variables |
---
## Source Verification Process
### Step 1: Check Parameter Values
For each parameter file:
- [ ] Value matches source document
- [ ] Source is primary (statute > regulation > website)
- [ ] URL links to exact section with page anchor
- [ ] Effective dates correct
### Step 2: Validate References
**Primary sources (preferred):**
- USC (United States Code)
- CFR (Code of Federal Regulations)
- State statutes
- State admin codes
**Secondary sources (acceptable):**
- Official policy manuals
- State plan documents
**Not acceptable alone:**
- Websites without specific sections
- Summaries or fact sheets
- News articles
---
## Code Quality Checks
### Vectorization Scan
Search for these patterns:
```python
# Red flags that indicate scalar logic:
"if household"
"if person"
"elif"
"else:"
"and "  # should be &
"or "   # should be |
"not "  # should be ~
```
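The scan above can be automated with a few regular expressions. The sketch below is a heuristic, not a PolicyEngine tool: the function name and the patterns are assumptions built from the red-flag list, and every hit still needs a manual look because `if`/`else` on parameters or `period` is legitimate.

```python
import re

# Patterns derived from the red-flag list; the regexes are an assumption.
RED_FLAGS = {
    "if on entity data": re.compile(r"^\s*if\s+(household|person)\b"),
    "elif branch": re.compile(r"^\s*elif\b"),
    "else branch": re.compile(r"^\s*else\s*:"),
    "Python and/or/not": re.compile(r"\b(and|or|not)\b"),
}


def scan_for_scalar_logic(source: str) -> list[tuple[int, str]]:
    """Return (line_number, flag_name) pairs for lines worth a manual look."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # ignore trailing comments
        for name, pattern in RED_FLAGS.items():
            if pattern.search(code):
                hits.append((number, name))
    return hits


sample = '''income = household("income", period)
if household("income", period) > 1000:
    return 500
eligible = is_elderly or is_disabled
return where(income > 1000, 500, 100)
'''
hits = scan_for_scalar_logic(sample)
```

On the sample, only the `if household(...)` line and the line using Python's `or` are reported.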
### Hard-Coding Scan
Search for numeric literals:
```python
# Check for any number except:
# 0, 1, -1 (basic math)
# 12 (month conversion)
# Small indices (2, 3 for known structures)

# Flag anything like:
"0.5"
"100"
"0.33"
"65"  # unless it's a standard age
```
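A rough version of this scan, assuming it runs line by line over formula source. The helper name, the regex, and the exact allow-list are illustrative choices based on the exceptions listed above, not part of PolicyEngine.

```python
import re

# Literals the list above treats as acceptable; this set is an assumption.
ALLOWED = {"0", "1", "-1", "2", "3", "12"}

# A standalone number: not glued to an identifier or another numeric part.
NUMBER = re.compile(r"(?<![\w.])-?\d[\d_]*(?:\.\d+)?(?![\w.])")


def find_hard_coded_numbers(line: str) -> list[str]:
    """Return numeric literals in one line of code that are not in ALLOWED."""
    code = line.split("#", 1)[0]  # ignore trailing comments
    return [match for match in NUMBER.findall(code) if match not in ALLOWED]


flagged = find_hard_coded_numbers("benefit = min_(income * 0.33, 500)")
```

Each flagged literal is a candidate for a parameter; a reviewer still decides whether a value such as a standard age is acceptable.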
---
## Review Response Templates
### For Approval
```markdown
## PolicyEngine Review: APPROVED ✅
### Verification Summary
- ✅ All parameters trace to primary sources
- ✅ Code is properly vectorized
- ✅ Tests document calculations
- ✅ No hard-coded values
### Strengths
- Excellent USC/CFR citations
- Comprehensive test coverage
- Clear calculation logic
### Minor Suggestions (optional)
- Consider adding edge case for zero income
```
### For Changes Required
````markdown
## PolicyEngine Review: CHANGES REQUIRED ❌
### Critical Issues (Must Fix)
1. **Non-vectorized code** - lines 45-50
   ```python
   # Replace this:
   if income > threshold:
       benefit = high_amount
   # With this:
   benefit = where(income > threshold, high_amount, low_amount)
   ```
2. **Parameter value mismatch** - standard_deduction.yaml
   - Source shows $200, parameter has $198
   - Reference: 7 CFR 273.9(d)(1), page 5
### Major Issues (Should Fix)
3. **Missing primary source** - income_limit.yaml
   - Add statute/regulation citation
   - Current website link insufficient
Please address these issues and re-request review.
````
---
## Test Validation
### Check Test Structure
```yaml
# Verify proper format:
- name: Case 1, description. # Numbered case with period
  period: 2024-01 # Valid period (2024-01 or 2024)
  input:
    people:
      person1: # Generic names
        employment_income: 50_000 # Underscores
  output:
    # Calculation documented
    # Income: $50,000/year = $4,167/month
    program_benefit: 250
```
### Run Test Commands
```bash
# Unit tests
pytest policyengine_us/tests/policy/baseline/gov/
# Integration tests
policyengine-core test <path> -c policyengine_us
# Microsimulation
pytest policyengine_us/tests/microsimulation/
```
---
## Review Priorities by Context
### New Program Implementation
1. Parameter completeness
2. All documented scenarios tested
3. Eligibility paths covered
4. No hard-coded values
### Bug Fixes
1. Root cause addressed
2. No regression potential
3. Tests prevent recurrence
4. Vectorization maintained
### Refactoring
1. Functionality preserved
2. Tests still pass
3. Performance maintained
4. Code clarity improved
---
## Quick Review Checklist
**Parameters:**
- [ ] Values match sources
- [ ] References include subsections
- [ ] All metadata fields present
- [ ] Effective dates correct
**Variables:**
- [ ] Properly vectorized (no if-elif-else)
- [ ] No hard-coded values
- [ ] Uses existing variables
- [ ] Includes proper metadata
**Tests:**
- [ ] Proper period format
- [ ] Underscore separators
- [ ] Calculation comments
- [ ] Realistic scenarios
**Overall:**
- [ ] Changelog entry
- [ ] Code formatted
- [ ] Tests pass
- [ ] Documentation complete
---
## For Agents
When reviewing code:
1. **Check vectorization first** - crashes are worst
2. **Verify parameter sources** - accuracy critical
3. **Scan for hard-coding** - maintainability issue
4. **Validate test quality** - ensures correctness
5. **Run all tests** - catch integration issues
6. **Document issues clearly** - help fixes
7. **Provide fix examples** - speed resolution

View File

@@ -0,0 +1,412 @@
---
name: policyengine-testing-patterns
description: PolicyEngine testing patterns - YAML test structure, naming conventions, period handling, and quality standards
---
# PolicyEngine Testing Patterns
Comprehensive patterns and standards for creating PolicyEngine tests.
## Quick Reference
### File Structure
```
policyengine_us/tests/policy/baseline/gov/states/[state]/[agency]/[program]/
├── [variable_name].yaml # Unit test for specific variable
├── [another_variable].yaml # Another unit test
└── integration.yaml # Integration test (NEVER prefixed)
```
### Period Restrictions
- ✅ `2024-01` - First month only
- ✅ `2024` - Whole year
- ❌ `2024-04` - Other months NOT supported
- ❌ `2024-01-01` - Full dates NOT supported
### Naming Convention
- Files: `variable_name.yaml` (matches variable exactly)
- Integration: Always `integration.yaml` (never prefixed)
- Cases: `Case 1, description.` (numbered, comma, period)
- People: `person1`, `person2` (never descriptive names)
---
## 1. Test File Organization
### File Naming Rules
**Unit tests** - Named after the variable they test:
```
✅ CORRECT:
az_liheap_eligible.yaml # Tests az_liheap_eligible variable
az_liheap_benefit.yaml # Tests az_liheap_benefit variable
❌ WRONG:
test_az_liheap.yaml # Wrong prefix
liheap_tests.yaml # Wrong pattern
```
**Integration tests** - Always named `integration.yaml`:
```
✅ CORRECT:
integration.yaml # Standard name
❌ WRONG:
az_liheap_integration.yaml # Never prefix integration
program_integration.yaml # Never prefix integration
```
### Folder Structure
Follow state/agency/program hierarchy:
```
gov/
└── states/
    └── [state_code]/
        └── [agency]/
            └── [program]/
                ├── eligibility/
                │   └── income_eligible.yaml
                ├── income/
                │   └── countable_income.yaml
                └── integration.yaml
```
---
## 2. Period Format Restrictions
### Critical: Only Two Formats Supported
PolicyEngine test system ONLY supports:
- `2024-01` - First month of year
- `2024` - Whole year
**Never use:**
- `2024-04` - April (will fail)
- `2024-10` - October (will fail)
- `2024-01-01` - Full date (will fail)
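The two supported formats can be checked mechanically before a test file is committed. The helper below is a hypothetical sketch, not part of the PolicyEngine test runner; it only encodes the rule stated above.

```python
import re

# Accept only "YYYY" or "YYYY-01", per the restriction described above.
SUPPORTED_PERIOD = re.compile(r"\d{4}(-01)?")


def is_supported_test_period(period: str) -> bool:
    """Return True if the period string is a whole year or a first month."""
    return SUPPORTED_PERIOD.fullmatch(period) is not None


checks = {p: is_supported_test_period(p) for p in ["2024", "2024-01", "2024-04", "2024-01-01"]}
```

YAML loaders may read a bare `2024` as an integer, so convert with `str()` before calling the helper.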
### Handling Mid-Year Policy Changes
If policy changes April 1, 2024:
```yaml
# Option 1: Test with first month
period: 2024-01 # January 2024, before the April change takes effect
# Option 2: Test next year
period: 2025-01 # First supported period in which the new policy is definitely active
```
---
## 3. Test Naming Conventions
### Case Names
Use numbered cases with descriptions:
```yaml
✅ CORRECT:
- name: Case 1, single parent with one child.
- name: Case 2, two parents with two children.
- name: Case 3, income at threshold.
❌ WRONG:
- name: Single parent test
- name: Test case for family
- name: Case 1 - single parent # Wrong punctuation
```
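The case-name convention is regular enough to lint. A minimal sketch, assuming the rule is exactly "Case", a number, a comma, a description, and a trailing period; the function name is hypothetical.

```python
import re

# "Case <number>, <description>." with a comma after the number and a final period.
CASE_NAME = re.compile(r"Case [1-9]\d*, .+\.")


def is_valid_case_name(name: str) -> bool:
    """Return True if a test case name follows the numbered-case convention."""
    return CASE_NAME.fullmatch(name) is not None


good = is_valid_case_name("Case 1, single parent with one child.")
bad = is_valid_case_name("Case 1 - single parent")
```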
### Person Names
Use generic sequential names:
```yaml
✅ CORRECT:
people:
  person1:
    age: 30
  person2:
    age: 10
  person3:
    age: 8

❌ WRONG:
people:
  parent:
    age: 30
  child1:
    age: 10
```
### Output Format
Use simplified format without entity key:
```yaml
✅ CORRECT:
output:
  tx_tanf_eligible: true
  tx_tanf_benefit: 250

❌ WRONG:
output:
  tx_tanf_eligible:
    spm_unit: true # Don't nest under entity
```
---
## 4. Which Variables Need Tests
### Variables That DON'T Need Tests
Skip tests for simple composition variables using only `adds` or `subtracts`:
```python
# NO TEST NEEDED - just summing
class tx_tanf_countable_income(Variable):
    adds = ["earned_income", "unearned_income"]

# NO TEST NEEDED - simple arithmetic
class net_income(Variable):
    adds = ["gross_income"]
    subtracts = ["deductions"]
```
### Variables That NEED Tests
Create tests for variables with:
- Conditional logic (`where`, `select`, `if`)
- Calculations/transformations
- Business logic
- Deductions/disregards
- Eligibility determinations
```python
# NEEDS TEST - has logic
class tx_tanf_income_eligible(Variable):
    def formula(spm_unit, period, parameters):
        return where(enrolled, passes_test, other_test)
```
---
## 5. Period Conversion in Tests
### Critical Rule for MONTH Tests
When `period: 2025-01`:
- **Input**: YEAR variables as annual amounts
- **Output**: YEAR variables show monthly values (÷12)
```yaml
- name: Case 1, income conversion.
  period: 2025-01 # MONTH period
  input:
    people:
      person1:
        employment_income: 12_000 # Input: Annual
  output:
    employment_income: 1_000 # Output: Monthly (12_000/12)
```
---
## 6. Numeric Formatting
### Always Use Underscore Separators
```yaml
✅ CORRECT:
employment_income: 50_000
cash_assets: 1_500
❌ WRONG:
employment_income: 50000
cash_assets: 1500
```
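When test values are generated by a script, Python's format specification can produce the underscore-separated form directly. The helper name is illustrative.

```python
def with_underscores(value: int) -> str:
    """Format an integer with underscores as thousands separators."""
    return f"{value:_}"


formatted = [with_underscores(v) for v in (50000, 1500, 500)]
```

Values below 1,000 are left unchanged, so the helper is safe to apply to every integer in a generated test.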
---
## 7. Integration Test Quality Standards
### Inline Calculation Comments
Document every calculation step:
```yaml
- name: Case 2, earnings with deductions.
  period: 2025-01
  input:
    people:
      person1:
        employment_income: 3_000 # $250/month
  output:
    # Person-level arrays
    tx_tanf_gross_earned_income: [250, 0]
    # Person1: 3,000/12 = 250
    tx_tanf_earned_after_disregard: [86.67, 0]
    # Person1: 250 - 120 = 130
    # Disregard: 130/3 = 43.33
    # After: 130 - 43.33 = 86.67
```
### Comprehensive Scenarios
Include 5-7 scenarios covering:
1. Basic eligible case
2. Earnings with deductions
3. Edge case at threshold
4. Mixed enrollment status
5. Special circumstances (SSI, immigration)
6. Ineligible case
### Verify Intermediate Values
Check 8-10 values per test:
```yaml
output:
  # Income calculation chain
  program_gross_income: 250
  program_earned_after_disregard: 87.1
  program_deductions: 200
  program_countable_income: 0
  # Eligibility chain
  program_income_eligible: true
  program_resources_eligible: true
  program_eligible: true
  # Final benefit
  program_benefit: 320
```
---
## 8. Common Variables to Use
### Always Available
```yaml
# Demographics
age: 30
is_disabled: false
is_pregnant: false
# Income
employment_income: 50_000
self_employment_income: 10_000
social_security: 12_000
ssi: 9_000
# Benefits
snap: 200
tanf: 150
medicaid: true
# Location
state_code: CA
county_code: "06037" # String for FIPS
```
### Variables That DON'T Exist
Never use these (not in PolicyEngine):
- `heating_expense`
- `utility_expense`
- `utility_shut_off_notice`
- `past_due_balance`
- `bulk_fuel_amount`
- `weatherization_needed`
---
## 9. Enum Verification
### Always Check Actual Enum Values
Before using enums in tests:
```bash
# Find enum definition
grep -r "class ImmigrationStatus" --include="*.py"
```
```python
# Check actual values
class ImmigrationStatus(Enum):
    CITIZEN = "Citizen"
    LEGAL_PERMANENT_RESIDENT = "Legal Permanent Resident"  # NOT "PERMANENT_RESIDENT"
    REFUGEE = "Refugee"
```
```yaml
✅ CORRECT:
immigration_status: LEGAL_PERMANENT_RESIDENT
❌ WRONG:
immigration_status: PERMANENT_RESIDENT # Doesn't exist
```
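The same check can be done programmatically instead of by eye. This sketch uses Python's standard-library `Enum` with a stand-in copy of the three members shown above; PolicyEngine's own `Enum` class and the full member list may differ, so treat the class here as an assumption.

```python
from enum import Enum


class ImmigrationStatus(Enum):  # stand-in copy, not the real definition
    CITIZEN = "Citizen"
    LEGAL_PERMANENT_RESIDENT = "Legal Permanent Resident"
    REFUGEE = "Refugee"


def is_valid_member(enum_class, name: str) -> bool:
    """Return True if `name` is a member name (not a display value) of the enum."""
    return name in enum_class.__members__


valid = is_valid_member(ImmigrationStatus, "LEGAL_PERMANENT_RESIDENT")
invalid = is_valid_member(ImmigrationStatus, "PERMANENT_RESIDENT")
```

Tests reference the member name on the left of the `=`, so the display string `"Legal Permanent Resident"` would also be rejected by this check.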
---
## 10. Test Quality Checklist
Before submitting tests:
- [ ] All variables exist in PolicyEngine
- [ ] Period format is `2024-01` or `2024` only
- [ ] Numbers use underscore separators
- [ ] Integration tests have calculation comments
- [ ] 5-7 comprehensive scenarios in integration.yaml
- [ ] Enum values verified against actual definitions
- [ ] Output values realistic, not placeholders
- [ ] File names match variable names exactly
---
## Common Test Patterns
### Income Eligibility
```yaml
- name: Case 1, income exactly at threshold.
  period: 2024-01
  input:
    people:
      person1:
        employment_income: 30_360 # Annual limit
  output:
    program_income_eligible: true # At threshold = eligible
```
### Priority Groups
```yaml
- name: Case 2, elderly priority.
  period: 2024-01
  input:
    people:
      person1:
        age: 65
  output:
    program_priority_group: true
```
### Categorical Eligibility
```yaml
- name: Case 3, SNAP categorical.
  period: 2024-01
  input:
    spm_units:
      spm_unit:
        snap: 200 # Receives SNAP
  output:
    program_categorical_eligible: true
```
---
## For Agents
When creating tests:
1. **Check existing variables** before using any in tests
2. **Use only supported periods** (2024-01 or 2024)
3. **Document calculations** in integration tests
4. **Verify enum values** against actual code
5. **Follow naming conventions** exactly
6. **Include edge cases** at thresholds
7. **Test realistic scenarios** not placeholders

View File

@@ -0,0 +1,303 @@
---
name: policyengine-vectorization
description: PolicyEngine vectorization patterns - NumPy operations, where/select usage, avoiding scalar logic with arrays
---
# PolicyEngine Vectorization Patterns
Critical patterns for vectorized operations in PolicyEngine. Scalar logic with arrays will crash the microsimulation.
## The Golden Rule
**PolicyEngine processes multiple households simultaneously using NumPy arrays. NEVER use if-elif-else with entity data.**
---
## 1. Critical: What Will Crash
### ❌ NEVER: if-elif-else with Arrays
```python
# THIS WILL CRASH - household data is an array
def formula(household, period, parameters):
    income = household("income", period)
    if income > 1000:  # ❌ CRASH: "truth value of array is ambiguous"
        return 500
    else:
        return 100
```
### ✅ ALWAYS: Vectorized Operations
```python
# CORRECT - works with arrays
def formula(household, period, parameters):
    income = household("income", period)
    return where(income > 1000, 500, 100)  # ✅ Vectorized
```
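Both behaviours can be reproduced outside PolicyEngine with a three-element NumPy array. This sketch assumes the `where` used in formulas behaves like `np.where`; the income values are made up.

```python
import numpy as np

income = np.array([500, 1_500, 3_000])

# Scalar-style branching raises as soon as the array has more than one element.
try:
    if income > 1000:
        pass
    scalar_branch_failed = False
except ValueError:
    scalar_branch_failed = True

# The vectorized form evaluates the condition element by element.
benefit = np.where(income > 1000, 500, 100)
```

A single-household YAML test can hide this failure because a one-element array still has an unambiguous truth value, which is why the crash often first appears in microsimulation.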
---
## 2. Common Vectorization Patterns
### Pattern 1: Simple Conditions → `where()`
```python
# Instead of if-else
if age >= 65:
    amount = senior_amount
else:
    amount = regular_amount

# Use where()
amount = where(age >= 65, senior_amount, regular_amount)
```
### Pattern 2: Multiple Conditions → `select()`
```python
# Instead of if-elif-else
if age < 18:
    benefit = child_amount
elif age >= 65:
    benefit = senior_amount
else:
    benefit = adult_amount

# Use select()
benefit = select(
    [age < 18, age >= 65],
    [child_amount, senior_amount],
    default=adult_amount,
)
```
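A runnable version of the `select()` pattern, assuming it behaves like `np.select`. The ages and benefit amounts are placeholder values for illustration only.

```python
import numpy as np

age = np.array([10, 40, 70])
child_amount, adult_amount, senior_amount = 100, 200, 300  # made-up amounts

benefit = np.select(
    [age < 18, age >= 65],          # conditions, checked in order
    [child_amount, senior_amount],  # value used where the matching condition holds
    default=adult_amount,           # used where no condition holds
)
```

`np.select` takes the first condition that is true for each element, so the order of the condition list matters when conditions overlap, just as the order of `elif` branches does.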
### Pattern 3: Boolean Operations
```python
# Combining conditions
eligible = (age >= 18) & (income < threshold) # Use & not 'and'
eligible = (is_disabled | is_elderly) # Use | not 'or'
eligible = ~is_excluded # Use ~ not 'not'
```
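The operators above can be tried on plain NumPy arrays. The ages, incomes, and threshold below are made up; the point of the sketch is the parentheses, which are required because `&` and `|` bind more tightly than comparisons in Python.

```python
import numpy as np

age = np.array([16, 30, 70])
income = np.array([0, 50_000, 10_000])
threshold = 20_000

# Each comparison is parenthesized before combining.
adult_low_income = (age >= 18) & (income < threshold)
young_or_old = (age < 18) | (age >= 65)
working_age = ~young_or_old
```

Writing `age >= 18 & income < threshold` without parentheses would evaluate `18 & income` first and produce an error or a wrong result.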
### Pattern 4: Clipping Values
```python
# Instead of if for bounds checking
if amount < 0:
    amount = 0
elif amount > maximum:
    amount = maximum

# Use clip()
amount = clip(amount, 0, maximum)
# Or: amount = max_(0, min_(amount, maximum))
```
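The two spellings give the same result. This sketch assumes `clip`, `max_`, and `min_` correspond to `np.clip`, `np.maximum`, and `np.minimum`; the amounts are placeholders.

```python
import numpy as np

amount = np.array([-50.0, 120.0, 900.0])
maximum = 500.0

clipped = np.clip(amount, 0, maximum)
same = np.maximum(0, np.minimum(amount, maximum))
```

Note that `np.maximum` compares element by element, unlike Python's built-in `max`, which would try to reduce the array to a single value.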
---
## 3. When if-else IS Acceptable
### ✅ OK: Parameter-Only Conditions
```python
# OK - parameters are scalars, not arrays
def formula(entity, period, parameters):
    p = parameters(period).gov.program
    # This is fine - p.enabled is a scalar boolean
    if p.enabled:
        base = p.base_amount
    else:
        base = 0
    # But must vectorize when using entity data
    income = entity("income", period)
    return where(income < p.threshold, base, 0)
```
### ✅ OK: Control Flow (Not Data)
```python
# OK - controlling which calculation to use
def formula(entity, period, parameters):
    year = period.start.year
    if year >= 2024:
        # Use new formula (still vectorized)
        return entity("new_calculation", period)
    else:
        # Use old formula (still vectorized)
        return entity("old_calculation", period)
```
---
## 4. Common Vectorization Mistakes
### Mistake 1: Scalar Comparison with Array
```python
# ❌ WRONG:
if household("income", period) > 1000:
    ...  # Error: truth value of array is ambiguous

# ✅ CORRECT:
income = household("income", period)
high_income = income > 1000  # Boolean array
benefit = where(high_income, low_benefit, high_benefit)
```
### Mistake 2: Using Python's and/or/not
```python
# ❌ WRONG:
eligible = is_elderly or is_disabled  # Python's 'or'

# ✅ CORRECT:
eligible = is_elderly | is_disabled  # NumPy's '|'
```
### Mistake 3: Nested if Statements
```python
# ❌ WRONG:
if eligible:
    if income < threshold:
        return full_benefit
    else:
        return partial_benefit
else:
    return 0

# ✅ CORRECT:
return where(
    eligible,
    where(income < threshold, full_benefit, partial_benefit),
    0,
)
```
---
## 5. Advanced Patterns
### Pattern: Vectorized Lookup Tables
```python
# Instead of if-elif for ranges
if size == 1:
    amount = 100
elif size == 2:
    amount = 150
elif size == 3:
    amount = 190

# Using parameter brackets
amount = p.benefit_schedule.calc(size)

# Or using select
amounts = [100, 150, 190, 220, 250]
amount = select(
    [size == i for i in range(1, 6)],
    amounts[:5],
    default=amounts[-1],  # 5+ people
)
```
### Pattern: Accumulating Conditions
```python
# Building complex eligibility
income_eligible = income < p.income_threshold
resource_eligible = resources < p.resource_limit
demographic_eligible = (age < 18) | is_pregnant
# Combine with & (not 'and')
eligible = income_eligible & resource_eligible & demographic_eligible
```
### Pattern: Conditional Accumulation
```python
# Sum only for eligible members
person = household.members
is_eligible = person("is_eligible", period)
person_income = person("income", period)
# Only count income of eligible members
eligible_income = where(is_eligible, person_income, 0)
total = household.sum(eligible_income)
```
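The masking step can be illustrated with NumPy alone. In this sketch `np.bincount` stands in for `household.sum`, and the household ids, eligibility flags, and incomes are hypothetical; it is not how PolicyEngine implements the aggregation internally.

```python
import numpy as np

# Hypothetical data: five people spread across three households.
household_id = np.array([0, 0, 1, 2, 2])
is_eligible = np.array([True, False, True, True, True])
person_income = np.array([1_000.0, 2_000.0, 500.0, 300.0, 700.0])

# Zero out ineligible members first, then sum per household.
eligible_income = np.where(is_eligible, person_income, 0)
total = np.bincount(household_id, weights=eligible_income, minlength=3)
```

Masking before summing keeps the result at household length while excluding the second person's 2,000 from the first household's total.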
---
## 6. Performance Implications
### Why Vectorization Matters
- **Scalar logic**: Processes 1 household at a time → SLOW
- **Vectorized**: Processes 1000s of households simultaneously → FAST
```python
# Performance comparison
# SLOW (if it worked):
for household in households:
    if household.income > 1000:
        household.benefit = 500

# FAST:
benefits = where(incomes > 1000, 500, 100)  # All at once!
```
---
## 7. Testing for Vectorization Issues
### Signs Your Code Isn't Vectorized
**Error messages:**
- "The truth value of an array is ambiguous"
- "ValueError: The truth value of an array with more than one element"
**Performance:**
- Tests run slowly
- Microsimulation times out
### How to Test
```python
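import numpy as np


def formula_with_arrays(incomes):
    # Placeholder for the formula logic under test
    return np.where(incomes > 1000, 500, 100)


# Your formula should work with arrays
def test_vectorization():
    # Create array inputs
    incomes = np.array([500, 1500, 3000])
    # Should return array output
    benefits = formula_with_arrays(incomes)
    assert len(benefits) == 3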
```
---
## Quick Reference Card
| Operation | Scalar (WRONG) | Vectorized (CORRECT) |
|-----------|---------------|---------------------|
| Simple condition | `if x > 5:` | `where(x > 5, ...)` |
| Multiple conditions | `if-elif-else` | `select([...], [...])` |
| Boolean AND | `and` | `&` |
| Boolean OR | `or` | `\|` |
| Boolean NOT | `not` | `~` |
| Bounds checking | `if x < 0: x = 0` | `max_(0, x)` |
| Complex logic | Nested if | Nested where/select |
---
## For Agents
When implementing formulas:
1. **Never use if-elif-else** with entity data
2. **Always use where()** for simple conditions
3. **Use select()** for multiple conditions
4. **Use NumPy operators** (&, |, ~) not Python (and, or, not)
5. **Test with arrays** to ensure vectorization
6. **Parameter conditions** can use if-else (scalars)
7. **Entity data** must use vectorized operations