| name | description |
|---|---|
| policyengine-code-style | PolicyEngine code writing style guide - formula optimization, direct returns, eliminating unnecessary variables |
PolicyEngine Code Writing Style Guide
Essential patterns for writing clean, efficient PolicyEngine formulas.
Core Principles
- Eliminate unnecessary intermediate variables
- Use direct parameter/variable access
- Return directly when possible
- Combine boolean logic
- Use correct period access (period vs period.this_year)
- NO hardcoded values - use parameters or constants
Pattern 1: Direct Parameter Access
❌ Bad - Unnecessary intermediate variable
def formula(spm_unit, period, parameters):
    countable = spm_unit("tn_tanf_countable_resources", period)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    resource_limit = p.amount  # ❌ Unnecessary
    return countable <= resource_limit
✅ Good - Direct access
def formula(spm_unit, period, parameters):
    countable = spm_unit("tn_tanf_countable_resources", period)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    return countable <= p.amount
Pattern 2: Direct Return
❌ Bad - Unnecessary result variable
def formula(spm_unit, period, parameters):
    assets = spm_unit("spm_unit_assets", period.this_year)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    vehicle_exemption = p.vehicle_exemption  # ❌ Unnecessary
    countable = max_(assets - vehicle_exemption, 0)  # ❌ Unnecessary
    return countable
✅ Good - Direct return
def formula(spm_unit, period, parameters):
    assets = spm_unit("spm_unit_assets", period.this_year)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    return max_(assets - p.vehicle_exemption, 0)
Pattern 3: Combined Boolean Logic
❌ Bad - Too many intermediate booleans
def formula(spm_unit, period, parameters):
    person = spm_unit.members
    age = person("age", period.this_year)
    is_disabled = person("is_disabled", period.this_year)
    caretaker_is_60_or_older = spm_unit.any(age >= 60)  # ❌ Unnecessary
    caretaker_is_disabled = spm_unit.any(is_disabled)  # ❌ Unnecessary
    eligible = caretaker_is_60_or_older | caretaker_is_disabled  # ❌ Unnecessary
    return eligible
✅ Good - Combined logic
def formula(spm_unit, period, parameters):
    person = spm_unit.members
    age = person("age", period.this_year)
    is_disabled = person("is_disabled", period.this_year)
    return spm_unit.any((age >= 60) | is_disabled)
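Combining the two conditions inside a single any is safe because "any" distributes over "or". The following is a minimal pure-Python sketch of that equivalence, using plain lists as stand-ins for per-member arrays; the helper names and sample households are hypothetical and are not PolicyEngine API. It also shows that the same rewrite would not be valid for "and".

```python
# Hypothetical stand-ins: each household is a list of (age, is_disabled) members.
households = [
    [(62, False), (30, False)],  # one member aged 60+
    [(45, True), (12, False)],   # one disabled member
    [(62, False), (30, True)],   # 60+ and disabled, but different members
    [(40, False), (8, False)],   # neither condition holds
]


def separate_any_or(members):
    """any(age >= 60) OR any(is_disabled), as two separate reductions."""
    return any(age >= 60 for age, _ in members) or any(d for _, d in members)


def combined_any_or(members):
    """any((age >= 60) OR is_disabled), as a single reduction."""
    return any(age >= 60 or d for age, d in members)


def separate_any_and(members):
    """any(age >= 60) AND any(is_disabled): different members may qualify."""
    return any(age >= 60 for age, _ in members) and any(d for _, d in members)


def combined_any_and(members):
    """any((age >= 60) AND is_disabled): one member must satisfy both."""
    return any(age >= 60 and d for age, d in members)


or_results = [(separate_any_or(h), combined_any_or(h)) for h in households]
and_results = [(separate_any_and(h), combined_any_and(h)) for h in households]
```

Every pair in or_results matches, so the combined form is a pure simplification. The third household differs in and_results, which is why conditions joined by & should only be merged into one any when a single member is meant to satisfy both.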
Pattern 4: Period Access - period vs period.this_year
❌ Bad - Wrong period access
def formula(person, period, parameters):
    # MONTH formula accessing YEAR variables
    age = person("age", period)  # ❌ Gives age/12 = 2.5 "monthly age"
    assets = person("assets", period)  # ❌ Gives assets/12
    monthly_income = person("employment_income", period.this_year) / MONTHS_IN_YEAR  # ❌ Redundant
    return (age >= 18) & (assets < 10000) & (monthly_income < 2000)
✅ Good - Correct period access
def formula(person, period, parameters):
    # MONTH formula accessing YEAR variables
    age = person("age", period.this_year)  # ✅ Gets actual age (30)
    assets = person("assets", period.this_year)  # ✅ Gets actual assets ($10,000)
    monthly_income = person("employment_income", period)  # ✅ Auto-converts to monthly
    p = parameters(period).gov.program.eligibility
    # Parentheses are required for the expression to span multiple lines
    return (
        (age >= p.age_min) & (age <= p.age_max) &
        (assets < p.asset_limit) & (monthly_income < p.income_threshold)
    )
Rule:
- Income/flows → Use period (want monthly from annual)
- Age/assets/counts/booleans → Use period.this_year (don't divide by 12)
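The rule can be illustrated with a toy model of the auto-conversion. This sketch assumes the simple divide-by-12 behavior described in the comments above; the two function names are hypothetical stand-ins for the two ways of requesting a YEAR variable from a MONTH formula, not PolicyEngine functions.

```python
MONTHS_IN_YEAR = 12


def request_with_month_period(annual_value):
    """Toy model of person(var, period) in a MONTH formula:
    the annual value is split evenly across the months."""
    return annual_value / MONTHS_IN_YEAR


def request_with_this_year(annual_value):
    """Toy model of person(var, period.this_year): the annual value as-is."""
    return annual_value


annual_income = 24_000  # a flow: a monthly share is meaningful
age = 30                # a stock: a monthly share is meaningless

monthly_income = request_with_month_period(annual_income)
wrong_age = request_with_month_period(age)
right_age = request_with_this_year(age)

print(monthly_income, wrong_age, right_age)  # 2000.0 2.5 30
```

With wrong_age equal to 2.5, a check such as age >= 18 fails for a 30-year-old, which is the kind of silent bug the rule is meant to prevent.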
Pattern 5: No Hardcoded Values
❌ Bad - Hardcoded numbers
def formula(spm_unit, period, parameters):
    person = spm_unit.members
    size = spm_unit.nb_persons()
    capped_size = min_(size, 10)  # ❌ Hardcoded
    age = person("age", period.this_year)
    income = person("income", period) / 12  # ❌ Use MONTHS_IN_YEAR
    # ❌ Hardcoded thresholds
    if age >= 18 and age <= 65 and income < 2000:
        return True
✅ Good - Parameterized
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.program
    person = spm_unit.members
    capped_size = min_(spm_unit.nb_persons(), p.max_unit_size)  # ✅
    age = person("age", period.this_year)
    monthly_income = person("income", period)  # ✅ Auto-converts (no manual /12)
    age_eligible = (age >= p.age_min) & (age <= p.age_max)  # ✅
    income_eligible = monthly_income < p.income_threshold  # ✅
    return age_eligible & income_eligible
Pattern 6: Streamline Variable Access
❌ Bad - Redundant steps
def formula(spm_unit, period, parameters):
    unit_size = spm_unit.nb_persons()  # ❌ Unnecessary
    max_size = 10  # ❌ Hardcoded
    capped_size = min_(unit_size, max_size)
    p = parameters(period).gov.states.tn.dhs.tanf.benefit
    spa = p.standard_payment_amount[capped_size]  # ❌ Unnecessary
    dgpa = p.differential_grant_payment_amount[capped_size]  # ❌ Unnecessary
    eligible = spm_unit("eligible_for_dgpa", period)
    return where(eligible, dgpa, spa)
✅ Good - Streamlined
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.tn.dhs.tanf.benefit
    capped_size = min_(spm_unit.nb_persons(), p.max_unit_size)
    eligible = spm_unit("eligible_for_dgpa", period)
    return where(
        eligible,
        p.differential_grant_payment_amount[capped_size],
        p.standard_payment_amount[capped_size]
    )
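To see why the unit size is capped before the lookup, here is a scalar sketch of the same logic. The payment tables, the cap of 4, and the function name are all hypothetical illustration values, not Tennessee's actual amounts, and a plain if/else stands in for the vectorized where.

```python
# Hypothetical parameter values; real amounts live in the parameter files.
MAX_UNIT_SIZE = 4
STANDARD_PAYMENT = {1: 100, 2: 150, 3: 190, 4: 220}
DIFFERENTIAL_PAYMENT = {1: 140, 2: 200, 3: 250, 4: 290}


def payment(unit_size, eligible_for_dgpa):
    """Look up the payment for a unit, capping size at the largest table row."""
    capped_size = min(unit_size, MAX_UNIT_SIZE)
    table = DIFFERENTIAL_PAYMENT if eligible_for_dgpa else STANDARD_PAYMENT
    return table[capped_size]


large_unit = payment(7, eligible_for_dgpa=False)
small_unit = payment(2, eligible_for_dgpa=True)
```

Without the cap, a unit of 7 would have no matching row in either table; with it, the unit receives the amount for the largest defined size.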
When to Keep Intermediate Variables
✅ Keep when value is used multiple times
def formula(tax_unit, period, parameters):
    p = parameters(period).gov.irs.credits
    filing_status = tax_unit("filing_status", period)
    # ✅ Used multiple times - keep as variable
    threshold = p.phase_out.start[filing_status]
    income = tax_unit("adjusted_gross_income", period)
    excess = max_(0, income - threshold)
    reduction = (excess / p.phase_out.width) * threshold
    return max_(0, threshold - reduction)
✅ Keep when calculation is complex
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.program
    gross_earned = spm_unit("gross_earned_income", period)
    # ✅ Complex multi-step calculation - break it down
    work_expense_deduction = min_(gross_earned * p.work_expense_rate, p.work_expense_max)
    after_work_expense = gross_earned - work_expense_deduction
    earned_disregard = after_work_expense * p.earned_disregard_rate
    countable_earned = after_work_expense - earned_disregard
    dependent_care = spm_unit("dependent_care_expenses", period)
    return max_(0, countable_earned - dependent_care)
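Named steps also make a calculation like this easy to check by hand. The sketch below reruns the same steps with scalar inputs; the rate, cap, and dollar amounts are hypothetical values chosen for round numbers, and the built-in min and max stand in for min_ and max_.

```python
def countable_income(
    gross_earned,
    dependent_care,
    work_expense_rate,
    work_expense_max,
    earned_disregard_rate,
):
    """Scalar version of the multi-step calculation, one named step per line."""
    work_expense_deduction = min(gross_earned * work_expense_rate, work_expense_max)
    after_work_expense = gross_earned - work_expense_deduction
    earned_disregard = after_work_expense * earned_disregard_rate
    countable_earned = after_work_expense - earned_disregard
    return max(0, countable_earned - dependent_care)


# Hypothetical inputs: $1,000 earned, 20% work expense rate capped at $150,
# 50% earned income disregard.
typical = countable_income(1000, 100, 0.2, 150, 0.5)
floored = countable_income(1000, 600, 0.2, 150, 0.5)

print(typical, floored)  # 325.0 0
```

In the first case the deduction is capped at 150, leaving 850; half of that is disregarded, leaving 425; subtracting 100 of dependent care gives 325. In the second case the expenses exceed countable earnings, so the result is floored at 0.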
Complete Example: Before vs After
❌ Before - Multiple Issues
def formula(person, period, parameters):
    # Wrong period access
    age = person("age", period)  # ❌ age/12
    assets = person("assets", period)  # ❌ assets/12
    annual_income = person("employment_income", period.this_year)
    monthly_income = annual_income / 12  # ❌ Use MONTHS_IN_YEAR
    # Hardcoded values
    min_age = 18  # ❌
    max_age = 64  # ❌
    asset_limit = 10000  # ❌
    income_limit = 2000  # ❌
    # Unnecessary intermediate variables
    age_check = (age >= min_age) & (age <= max_age)
    asset_check = assets <= asset_limit
    income_check = monthly_income <= income_limit
    eligible = age_check & asset_check & income_check
    return eligible
✅ After - Clean and Correct
def formula(person, period, parameters):
    p = parameters(period).gov.program.eligibility
    # Correct period access
    age = person("age", period.this_year)
    assets = person("assets", period.this_year)
    monthly_income = person("employment_income", period)
    # Direct return with combined logic
    return (
        (age >= p.age_min) & (age <= p.age_max) &
        (assets <= p.asset_limit) &
        (monthly_income <= p.income_threshold)
    )
Pattern 7: Minimal Comments
Code Should Be Self-Documenting
Variable names and structure should explain the code - not comments.
❌ Bad - Verbose explanatory comments
def formula(spm_unit, period, parameters):
    # Wisconsin disregards all earned income of dependent children (< 18)
    # Calculate earned income for adults only
    is_adult = spm_unit.members("age", period.this_year) >= 18  # Hard-coded!
    adult_earned = spm_unit.sum(
        spm_unit.members("tanf_gross_earned_income", period) * is_adult
    )
    # All unearned income is counted (including children's)
    gross_unearned = add(spm_unit, period, ["tanf_gross_unearned_income"])
    child_support = add(spm_unit, period, ["child_support_received"])
    # NOTE: Wisconsin disregards many additional income sources that
    # are not separately tracked in PolicyEngine (educational aid, etc.)
    return max_(adult_earned + gross_unearned - child_support, 0)
✅ Good - Clean self-documenting code
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.wi.dcf.tanf.income
    is_adult = spm_unit.members("age", period.this_year) >= p.adult_age_threshold
    adult_earned = spm_unit.sum(
        spm_unit.members("tanf_gross_earned_income", period) * is_adult
    )
    gross_unearned = add(spm_unit, period, ["tanf_gross_unearned_income"])
    child_support = add(spm_unit, period, ["child_support_received"])
    return max_(adult_earned + gross_unearned - child_support, 0)
Comment Rules
- NO comments explaining what the code does - variable names should be clear
- OK: Brief NOTE about PolicyEngine limitations (one line):
  # NOTE: Time limit cannot be tracked in PolicyEngine
- NO multi-line explanations of what the code calculates
Quick Checklist
Before finalizing code:
- No hardcoded numbers (use parameters or constants like MONTHS_IN_YEAR)
- Correct period access:
  - Income/flows use period
  - Age/assets/counts/booleans use period.this_year
- No single-use intermediate variables
- Direct parameter access (p.amount, not amount = p.amount)
- Direct returns when possible
- Combined boolean logic when possible
- Minimal comments (code should be self-documenting)
Key Takeaways
- Less is more - Eliminate unnecessary variables
- Direct is better - Access parameters and return directly
- Combine when logical - Group related boolean conditions
- Keep when needed - Complex calculations and reused values deserve variables
- Period matters - Use correct period access to avoid auto-conversion bugs
Related Skills
- policyengine-period-patterns-skill - Deep dive on period handling
- policyengine-implementation-patterns-skill - Variable structure and patterns
- policyengine-vectorization-skill - NumPy operations and vectorization
For Agents
When writing or reviewing formulas:
- Scan for single-use variables - eliminate them
- Check period access - ensure correct for variable type
- Look for hardcoded values - parameterize them
- Identify redundant steps - streamline them
- Consider readability - keep complex calculations clear
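The first and third checks can be partly automated. The following is a rough sketch using Python's standard ast module; review_formula is a hypothetical helper, not part of any PolicyEngine tooling, and it only produces candidates for a reviewer to judge. It lists names that are assigned once and read once, plus non-zero numeric literals.

```python
import ast
from collections import Counter


def review_formula(source):
    """Return (single_use_names, numeric_literals) found in the given source."""
    tree = ast.parse(source)
    stores, loads = Counter(), Counter()
    literals = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                stores[node.id] += 1
            elif isinstance(node.ctx, ast.Load):
                loads[node.id] += 1
        elif isinstance(node, ast.Constant):
            value = node.value
            # bool is a subclass of int, so exclude it explicitly.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            # 0 is skipped because it is the usual floor in max_(x, 0).
            if is_number and value != 0:
                literals.append(value)
    single_use = sorted(
        name for name, count in stores.items() if count == 1 and loads[name] == 1
    )
    return single_use, literals


SAMPLE = '''
def formula(spm_unit, period, parameters):
    countable = spm_unit("tn_tanf_countable_resources", period)
    p = parameters(period).gov.states.tn.dhs.tanf.resource_limit
    resource_limit = p.amount
    return countable <= resource_limit
'''

single_use, literals = review_formula(SAMPLE)
print(single_use)  # ['countable', 'p', 'resource_limit']
print(literals)    # []
```

The scan flags countable and p as well as resource_limit, even though the patterns above keep the first two. Variables that hold a simulation lookup or a parameter node are conventionally retained, so the output is a starting list for review rather than a verdict.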