Initial commit
This commit is contained in:
303
skills/policyengine-vectorization-skill/SKILL.md
Normal file
303
skills/policyengine-vectorization-skill/SKILL.md
Normal file
@@ -0,0 +1,303 @@
|
||||
---
|
||||
name: policyengine-vectorization
|
||||
description: PolicyEngine vectorization patterns - NumPy operations, where/select usage, avoiding scalar logic with arrays
|
||||
---
|
||||
|
||||
# PolicyEngine Vectorization Patterns
|
||||
|
||||
Critical patterns for vectorized operations in PolicyEngine. Scalar logic with arrays will crash the microsimulation.
|
||||
|
||||
## The Golden Rule
|
||||
|
||||
**PolicyEngine processes multiple households simultaneously using NumPy arrays. NEVER use if-elif-else with entity data.**
|
||||
|
||||
---
|
||||
|
||||
## 1. Critical: What Will Crash
|
||||
|
||||
### ❌ NEVER: if-elif-else with Arrays
|
||||
|
||||
```python
|
||||
# THIS WILL CRASH - household data is an array
|
||||
def formula(household, period, parameters):
|
||||
income = household("income", period)
|
||||
if income > 1000: # ❌ CRASH: "truth value of array is ambiguous"
|
||||
return 500
|
||||
else:
|
||||
return 100
|
||||
```
|
||||
|
||||
### ✅ ALWAYS: Vectorized Operations
|
||||
|
||||
```python
|
||||
# CORRECT - works with arrays
|
||||
def formula(household, period, parameters):
|
||||
income = household("income", period)
|
||||
return where(income > 1000, 500, 100) # ✅ Vectorized
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Common Vectorization Patterns
|
||||
|
||||
### Pattern 1: Simple Conditions → `where()`
|
||||
|
||||
```python
|
||||
# Instead of if-else
|
||||
❌ if age >= 65:
|
||||
amount = senior_amount
|
||||
else:
|
||||
amount = regular_amount
|
||||
|
||||
✅ amount = where(age >= 65, senior_amount, regular_amount)
|
||||
```
|
||||
|
||||
### Pattern 2: Multiple Conditions → `select()`
|
||||
|
||||
```python
|
||||
# Instead of if-elif-else
|
||||
❌ if age < 18:
|
||||
benefit = child_amount
|
||||
elif age >= 65:
|
||||
benefit = senior_amount
|
||||
else:
|
||||
benefit = adult_amount
|
||||
|
||||
✅ benefit = select(
|
||||
[age < 18, age >= 65],
|
||||
[child_amount, senior_amount],
|
||||
default=adult_amount
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 3: Boolean Operations
|
||||
|
||||
```python
|
||||
# Combining conditions
|
||||
eligible = (age >= 18) & (income < threshold) # Use & not 'and'
|
||||
eligible = (is_disabled | is_elderly) # Use | not 'or'
|
||||
eligible = ~is_excluded # Use ~ not 'not'
|
||||
```
|
||||
|
||||
### Pattern 4: Clipping Values
|
||||
|
||||
```python
|
||||
# Instead of if for bounds checking
|
||||
❌ if amount < 0:
|
||||
amount = 0
|
||||
elif amount > maximum:
|
||||
amount = maximum
|
||||
|
||||
✅ amount = clip(amount, 0, maximum)
|
||||
# Or: amount = max_(0, min_(amount, maximum))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. When if-else IS Acceptable
|
||||
|
||||
### ✅ OK: Parameter-Only Conditions
|
||||
|
||||
```python
|
||||
# OK - parameters are scalars, not arrays
|
||||
def formula(entity, period, parameters):
|
||||
p = parameters(period).gov.program
|
||||
|
||||
# This is fine - p.enabled is a scalar boolean
|
||||
if p.enabled:
|
||||
base = p.base_amount
|
||||
else:
|
||||
base = 0
|
||||
|
||||
# But must vectorize when using entity data
|
||||
income = entity("income", period)
|
||||
return where(income < p.threshold, base, 0)
|
||||
```
|
||||
|
||||
### ✅ OK: Control Flow (Not Data)
|
||||
|
||||
```python
|
||||
# OK - controlling which calculation to use
|
||||
def formula(entity, period, parameters):
|
||||
year = period.start.year
|
||||
|
||||
if year >= 2024:
|
||||
# Use new formula (still vectorized)
|
||||
return entity("new_calculation", period)
|
||||
else:
|
||||
# Use old formula (still vectorized)
|
||||
return entity("old_calculation", period)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Common Vectorization Mistakes
|
||||
|
||||
### Mistake 1: Scalar Comparison with Array
|
||||
|
||||
```python
|
||||
❌ WRONG:
|
||||
if household("income", period) > 1000:
|
||||
# Error: truth value of array is ambiguous
|
||||
|
||||
✅ CORRECT:
|
||||
income = household("income", period)
|
||||
high_income = income > 1000 # Boolean array
|
||||
benefit = where(high_income, low_benefit, high_benefit)
|
||||
```
|
||||
|
||||
### Mistake 2: Using Python's and/or/not
|
||||
|
||||
```python
|
||||
❌ WRONG:
|
||||
eligible = is_elderly or is_disabled # Python's 'or'
|
||||
|
||||
✅ CORRECT:
|
||||
eligible = is_elderly | is_disabled # NumPy's '|'
|
||||
```
|
||||
|
||||
### Mistake 3: Nested if Statements
|
||||
|
||||
```python
|
||||
❌ WRONG:
|
||||
if eligible:
|
||||
if income < threshold:
|
||||
return full_benefit
|
||||
else:
|
||||
return partial_benefit
|
||||
else:
|
||||
return 0
|
||||
|
||||
✅ CORRECT:
|
||||
return where(
|
||||
eligible,
|
||||
where(income < threshold, full_benefit, partial_benefit),
|
||||
0
|
||||
)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Advanced Patterns
|
||||
|
||||
### Pattern: Vectorized Lookup Tables
|
||||
|
||||
```python
|
||||
# Instead of if-elif for ranges
|
||||
❌ if size == 1:
|
||||
amount = 100
|
||||
elif size == 2:
|
||||
amount = 150
|
||||
elif size == 3:
|
||||
amount = 190
|
||||
|
||||
✅ # Using parameter brackets
|
||||
amount = p.benefit_schedule.calc(size)
|
||||
|
||||
✅ # Or using select
|
||||
amounts = [100, 150, 190, 220, 250]
|
||||
amount = select(
|
||||
[size == i for i in range(1, 6)],
|
||||
amounts[:5],
|
||||
default=amounts[-1] # 5+ people
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern: Accumulating Conditions
|
||||
|
||||
```python
|
||||
# Building complex eligibility
|
||||
income_eligible = income < p.income_threshold
|
||||
resource_eligible = resources < p.resource_limit
|
||||
demographic_eligible = (age < 18) | is_pregnant
|
||||
|
||||
# Combine with & (not 'and')
|
||||
eligible = income_eligible & resource_eligible & demographic_eligible
|
||||
```
|
||||
|
||||
### Pattern: Conditional Accumulation
|
||||
|
||||
```python
|
||||
# Sum only for eligible members
|
||||
person = household.members
|
||||
is_eligible = person("is_eligible", period)
|
||||
person_income = person("income", period)
|
||||
|
||||
# Only count income of eligible members
|
||||
eligible_income = where(is_eligible, person_income, 0)
|
||||
total = household.sum(eligible_income)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Performance Implications
|
||||
|
||||
### Why Vectorization Matters
|
||||
|
||||
- **Scalar logic**: Processes 1 household at a time → SLOW
|
||||
- **Vectorized**: Processes 1000s of households simultaneously → FAST
|
||||
|
||||
```python
|
||||
# Performance comparison
|
||||
❌ SLOW (if it worked):
|
||||
for household in households:
|
||||
if household.income > 1000:
|
||||
household.benefit = 500
|
||||
|
||||
✅ FAST:
|
||||
benefits = where(incomes > 1000, 500, 100) # All at once!
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Testing for Vectorization Issues
|
||||
|
||||
### Signs Your Code Isn't Vectorized
|
||||
|
||||
**Error messages:**
|
||||
- "The truth value of an array is ambiguous"
|
||||
- "ValueError: The truth value of an array with more than one element"
|
||||
|
||||
**Performance:**
|
||||
- Tests run slowly
|
||||
- Microsimulation times out
|
||||
|
||||
### How to Test
|
||||
|
||||
```python
|
||||
# Your formula should work with arrays
|
||||
def test_vectorization():
|
||||
# Create array inputs
|
||||
incomes = np.array([500, 1500, 3000])
|
||||
|
||||
# Should return array output
|
||||
benefits = formula_with_arrays(incomes)
|
||||
assert len(benefits) == 3
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference Card
|
||||
|
||||
| Operation | Scalar (WRONG) | Vectorized (CORRECT) |
|
||||
|-----------|---------------|---------------------|
|
||||
| Simple condition | `if x > 5:` | `where(x > 5, ...)` |
|
||||
| Multiple conditions | `if-elif-else` | `select([...], [...])` |
|
||||
| Boolean AND | `and` | `&` |
|
||||
| Boolean OR | `or` | `\|` |
|
||||
| Boolean NOT | `not` | `~` |
|
||||
| Bounds checking | `if x < 0: x = 0` | `max_(0, x)` |
|
||||
| Complex logic | Nested if | Nested where/select |
|
||||
|
||||
---
|
||||
|
||||
## For Agents
|
||||
|
||||
When implementing formulas:
|
||||
1. **Never use if-elif-else** with entity data
|
||||
2. **Always use where()** for simple conditions
|
||||
3. **Use select()** for multiple conditions
|
||||
4. **Use NumPy operators** (&, |, ~) not Python (and, or, not)
|
||||
5. **Test with arrays** to ensure vectorization
|
||||
6. **Parameter conditions** can use if-else (scalars)
|
||||
7. **Entity data** must use vectorized operations
|
||||
Reference in New Issue
Block a user