---
name: policyengine-vectorization
description: PolicyEngine vectorization patterns - NumPy operations, where/select usage, avoiding scalar logic with arrays
---

# PolicyEngine Vectorization Patterns

Critical patterns for vectorized operations in PolicyEngine. Scalar logic with arrays will crash the microsimulation.

## The Golden Rule

**PolicyEngine processes multiple households simultaneously using NumPy arrays. NEVER use if-elif-else with entity data.**

---

## 1. Critical: What Will Crash

### ❌ NEVER: if-elif-else with Arrays

```python
# THIS WILL CRASH - household data is an array
def formula(household, period, parameters):
    income = household("income", period)
    if income > 1000:  # ❌ CRASH: "truth value of array is ambiguous"
        return 500
    else:
        return 100
```

### ✅ ALWAYS: Vectorized Operations

```python
# CORRECT - works with arrays
def formula(household, period, parameters):
    income = household("income", period)
    return where(income > 1000, 500, 100)  # ✅ Vectorized
```

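The crash and the fix can be reproduced outside PolicyEngine with plain NumPy. This is a minimal sketch, assuming PolicyEngine's `where` behaves like `numpy.where`; the income values are made up for illustration.

```python
import numpy as np

# Hypothetical incomes for three households (illustrative values only).
income = np.array([500, 1500, 3000])

# A plain `if` asks NumPy to collapse the whole array into one True/False,
# which it refuses to do for arrays with more than one element.
try:
    if income > 1000:
        pass
    raised = False
except ValueError:
    raised = True

# np.where evaluates the condition element by element instead.
benefit = np.where(income > 1000, 500, 100)

print(raised)   # True
print(benefit)  # one value per household: 100, 500, 500
```
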
---

## 2. Common Vectorization Patterns

### Pattern 1: Simple Conditions → `where()`

```python
# Instead of if-else
# ❌ WRONG:
if age >= 65:
    amount = senior_amount
else:
    amount = regular_amount

# ✅ CORRECT:
amount = where(age >= 65, senior_amount, regular_amount)
```

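One caveat when replacing if-else with `where()`: NumPy computes both branches for every element before choosing between them. The sketch below uses plain NumPy and assumes PolicyEngine's `where` behaves like `numpy.where`. The variable names and values are hypothetical.

```python
import numpy as np

# Hypothetical per-household values (illustrative only).
income = np.array([0.0, 2000.0, 4000.0])
tax = np.array([0.0, 200.0, 1000.0])

# Writing where(income > 0, tax / income, 0) would still evaluate
# tax / income for the zero-income household and emit a divide warning.
# Swapping in a harmless denominator first keeps both branches safe.
safe_income = np.where(income > 0, income, 1.0)
effective_rate = np.where(income > 0, tax / safe_income, 0.0)

print(effective_rate)  # values: 0.0, 0.1, 0.25
```

The same idea applies to any branch that is only valid for some households: make the inputs safe first, then select.
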
### Pattern 2: Multiple Conditions → `select()`

```python
# Instead of if-elif-else
# ❌ WRONG:
if age < 18:
    benefit = child_amount
elif age >= 65:
    benefit = senior_amount
else:
    benefit = adult_amount

# ✅ CORRECT:
benefit = select(
    [age < 18, age >= 65],
    [child_amount, senior_amount],
    default=adult_amount
)
```

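A runnable version of this pattern, using `numpy.select` directly. This sketch assumes PolicyEngine's `select` behaves like `numpy.select`; the ages and amounts are hypothetical.

```python
import numpy as np

age = np.array([10, 30, 70])
child_amount, adult_amount, senior_amount = 100, 50, 80  # hypothetical amounts

benefit = np.select(
    [age < 18, age >= 65],
    [child_amount, senior_amount],
    default=adult_amount,
)
print(benefit)  # values: 100, 50, 80

# Conditions are checked in order and the FIRST match wins, like if-elif.
# Here the second condition can never be selected because every senior
# already matches the first one.
tier = np.select([age >= 18, age >= 65], [1, 2], default=0)
print(tier)  # values: 0, 1, 1
```

When conditions overlap, list the most specific one first, just as the most specific branch would come first in an if-elif chain.
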
### Pattern 3: Boolean Operations

```python
# Combining conditions
eligible = (age >= 18) & (income < threshold)  # Use & not 'and'
eligible = (is_disabled | is_elderly)          # Use | not 'or'
eligible = ~is_excluded                        # Use ~ not 'not'
```

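The parentheses in the first line are required, because `&` and `|` bind more tightly than comparison operators in Python. A small NumPy sketch with made-up values shows what happens without them:

```python
import numpy as np

# Hypothetical person-level values (illustrative only).
age = np.array([16, 40, 70])
income = np.array([900, 1200, 300])
threshold = 1000

# Correct: each comparison is parenthesized before combining.
eligible = (age >= 18) & (income < threshold)
print(eligible)  # values: False, False, True

# Without parentheses Python parses this as
#   age >= (18 & income) < threshold
# which is a chained comparison and fails on arrays.
try:
    age >= 18 & income < threshold
    failed = False
except (TypeError, ValueError):
    failed = True
print(failed)  # True
```
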
### Pattern 4: Clipping Values

```python
# Instead of if for bounds checking
# ❌ WRONG:
if amount < 0:
    amount = 0
elif amount > maximum:
    amount = maximum

# ✅ CORRECT:
amount = clip(amount, 0, maximum)
# Or: amount = max_(0, min_(amount, maximum))
```

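Both spellings give the same result. The sketch below checks this in plain NumPy, assuming `clip`, `max_` and `min_` correspond to `numpy.clip`, `numpy.maximum` and `numpy.minimum`. The amounts and caps are hypothetical.

```python
import numpy as np

amount = np.array([-50.0, 120.0, 900.0])
maximum = 500.0

clipped = np.clip(amount, 0, maximum)
nested = np.maximum(0, np.minimum(amount, maximum))
print(clipped)  # values: 0, 120, 500
print(nested)   # same values

# Bounds can also be arrays, giving each element its own cap.
caps = np.array([100.0, 100.0, 1000.0])
capped = np.clip(amount, 0, caps)
print(capped)  # values: 0, 100, 900
```
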
---

## 3. When if-else IS Acceptable

### ✅ OK: Parameter-Only Conditions

```python
# OK - parameters are scalars, not arrays
def formula(entity, period, parameters):
    p = parameters(period).gov.program

    # This is fine - p.enabled is a scalar boolean
    if p.enabled:
        base = p.base_amount
    else:
        base = 0

    # But must vectorize when using entity data
    income = entity("income", period)
    return where(income < p.threshold, base, 0)
```

### ✅ OK: Control Flow (Not Data)

```python
# OK - controlling which calculation to use
def formula(entity, period, parameters):
    year = period.start.year

    if year >= 2024:
        # Use new formula (still vectorized)
        return entity("new_calculation", period)
    else:
        # Use old formula (still vectorized)
        return entity("old_calculation", period)
```

---

## 4. Common Vectorization Mistakes

### Mistake 1: Scalar Comparison with Array

```python
# ❌ WRONG:
if household("income", period) > 1000:
    ...  # Error: truth value of array is ambiguous

# ✅ CORRECT:
income = household("income", period)
high_income = income > 1000  # Boolean array
benefit = where(high_income, low_benefit, high_benefit)
```

### Mistake 2: Using Python's and/or/not

```python
# ❌ WRONG:
eligible = is_elderly or is_disabled  # Python's 'or'

# ✅ CORRECT:
eligible = is_elderly | is_disabled  # NumPy's '|'
```

### Mistake 3: Nested if Statements

```python
# ❌ WRONG:
if eligible:
    if income < threshold:
        return full_benefit
    else:
        return partial_benefit
else:
    return 0

# ✅ CORRECT:
return where(
    eligible,
    where(income < threshold, full_benefit, partial_benefit),
    0
)
```

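Deeply nested `where()` calls can get hard to read. As a sketch in plain NumPy (assuming `where` and `select` match their NumPy counterparts), the same logic can be flattened into a single `select()` with combined conditions. All values here are hypothetical.

```python
import numpy as np

eligible = np.array([True, True, False])
income = np.array([500, 2000, 500])
threshold = 1000
full_benefit, partial_benefit = 300, 120  # hypothetical amounts

# Nested form, as in the pattern above.
nested = np.where(
    eligible,
    np.where(income < threshold, full_benefit, partial_benefit),
    0,
)

# Flat form: the first matching condition wins, so the more specific
# condition goes first and plain `eligible` catches the rest.
flat = np.select(
    [eligible & (income < threshold), eligible],
    [full_benefit, partial_benefit],
    default=0,
)

print(nested)  # values: 300, 120, 0
print(flat)    # same values
```
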
---

## 5. Advanced Patterns

### Pattern: Vectorized Lookup Tables

```python
# Instead of if-elif for ranges
# ❌ WRONG:
if size == 1:
    amount = 100
elif size == 2:
    amount = 150
elif size == 3:
    amount = 190

# ✅ Using parameter brackets
amount = p.benefit_schedule.calc(size)

# ✅ Or using select
amounts = [100, 150, 190, 220, 250]
amount = select(
    [size == i for i in range(1, 5)],  # sizes 1-4
    amounts[:4],
    default=amounts[-1]  # 5+ people
)
```

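When the lookup key is a small positive integer, direct array indexing is another option. This is a sketch in plain NumPy, not a PolicyEngine API. It assumes `size` is an integer array with values of at least 1, and the amounts are hypothetical.

```python
import numpy as np

# Hypothetical schedule: index 0 is size 1, the last entry covers 5+ people.
amounts = np.array([100, 150, 190, 220, 250])
size = np.array([1, 3, 5, 8])

# Cap the size at the table length, then shift to a zero-based index.
index = np.clip(size, 1, len(amounts)) - 1
amount = amounts[index]

print(amount)  # values: 100, 190, 250, 250
```

Parameter brackets remain preferable when the schedule lives in the parameter tree, since the amounts then stay out of the formula code.
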
### Pattern: Accumulating Conditions

```python
# Building complex eligibility
income_eligible = income < p.income_threshold
resource_eligible = resources < p.resource_limit
demographic_eligible = (age < 18) | is_pregnant

# Combine with & (not 'and')
eligible = income_eligible & resource_eligible & demographic_eligible
```

### Pattern: Conditional Accumulation

```python
# Sum only for eligible members
person = household.members
is_eligible = person("is_eligible", period)
person_income = person("income", period)

# Only count income of eligible members
eligible_income = where(is_eligible, person_income, 0)
total = household.sum(eligible_income)
```

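To see what the mask-then-sum step computes, here is a rough plain-NumPy analogue. It is only an illustration of the idea, not PolicyEngine's actual implementation of `household.sum`. The `household_id` array and all values are hypothetical.

```python
import numpy as np

# Hypothetical data: 5 people spread across 3 households.
household_id = np.array([0, 0, 1, 2, 2])
is_eligible = np.array([True, False, True, True, True])
person_income = np.array([1000.0, 400.0, 250.0, 600.0, 50.0])

# Zero out income for ineligible members (person-level array).
eligible_income = np.where(is_eligible, person_income, 0)

# Add up person-level values within each household (household-level array).
total = np.bincount(household_id, weights=eligible_income, minlength=3)

print(total)  # values: 1000, 250, 650
```
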
---

## 6. Performance Implications

### Why Vectorization Matters

- **Scalar logic**: Processes 1 household at a time → SLOW
- **Vectorized**: Processes 1000s of households simultaneously → FAST

```python
# Performance comparison
# ❌ SLOW (if it worked):
for household in households:
    if household.income > 1000:
        household.benefit = 500

# ✅ FAST:
benefits = where(incomes > 1000, 500, 100)  # All at once!
```

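The difference can be measured with a small timing sketch in plain NumPy. The array size and income range are arbitrary assumptions, and the timings depend on the machine, so no specific speedup is claimed here.

```python
import time
import numpy as np

# Hypothetical incomes for 200,000 households.
rng = np.random.default_rng(0)
incomes = rng.uniform(0, 5000, size=200_000)

# Scalar logic: one Python-level comparison per household.
start = time.perf_counter()
looped = np.array([500 if x > 1000 else 100 for x in incomes])
loop_seconds = time.perf_counter() - start

# Vectorized: a single call over the whole array.
start = time.perf_counter()
vectorized = np.where(incomes > 1000, 500, 100)
vector_seconds = time.perf_counter() - start

# Both approaches must agree; only the running time differs.
assert np.array_equal(looped, vectorized)
print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s")
```
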
---

## 7. Testing for Vectorization Issues

### Signs Your Code Isn't Vectorized

**Error messages:**
- "The truth value of an array is ambiguous"
- "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"

**Performance:**
- Tests run slowly
- Microsimulation times out

### How to Test

```python
import numpy as np

# Your formula should work with arrays
def test_vectorization():
    # Create array inputs
    incomes = np.array([500, 1500, 3000])

    # Should return array output
    # (formula_with_arrays is a placeholder for the formula under test)
    benefits = formula_with_arrays(incomes)
    assert len(benefits) == 3
```

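A self-contained variant of this test is sketched below. The `formula_with_arrays` body is a hypothetical stand-in, not a real PolicyEngine formula. Beyond checking the output length, it compares each element against the scalar rule applied one household at a time.

```python
import numpy as np

def formula_with_arrays(incomes):
    """Hypothetical stand-in for the formula under test."""
    return np.where(incomes > 1000, 500, 100)

def test_vectorization():
    incomes = np.array([500, 1500, 3000])
    benefits = formula_with_arrays(incomes)

    # One output per input household.
    assert benefits.shape == incomes.shape

    # Each element matches the scalar rule applied individually.
    expected = [500 if x > 1000 else 100 for x in incomes]
    assert benefits.tolist() == expected

test_vectorization()
```
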
---

## Quick Reference Card

| Operation | Scalar (WRONG) | Vectorized (CORRECT) |
|-----------|---------------|---------------------|
| Simple condition | `if x > 5:` | `where(x > 5, ...)` |
| Multiple conditions | `if-elif-else` | `select([...], [...])` |
| Boolean AND | `and` | `&` |
| Boolean OR | `or` | `\|` |
| Boolean NOT | `not` | `~` |
| Bounds checking | `if x < 0: x = 0` | `max_(0, x)` |
| Complex logic | Nested if | Nested where/select |

---

## For Agents

When implementing formulas:
1. **Never use if-elif-else** with entity data
2. **Always use where()** for simple conditions
3. **Use select()** for multiple conditions
4. **Use NumPy operators** (&, |, ~) not Python (and, or, not)
5. **Test with arrays** to ensure vectorization
6. **Parameter conditions** can use if-else (scalars)
7. **Entity data** must use vectorized operations