Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:47:59 +08:00
commit 0e474aef89
20 changed files with 5748 additions and 0 deletions

277
skills/l0-skill/SKILL.md Normal file
View File

@@ -0,0 +1,277 @@
---
name: l0
description: L0 regularization for neural network sparsification and intelligent sampling - used in survey calibration
---
# L0 Regularization
L0 is a PyTorch implementation of L0 regularization for neural network sparsification and intelligent sampling, used in PolicyEngine's survey calibration pipeline.
## For Users 👥
### What is L0?
L0 regularization helps PolicyEngine create more efficient survey datasets by intelligently selecting which households to include in calculations.
**Impact you see:**
- Faster population impact calculations
- Smaller dataset sizes
- Maintained accuracy with fewer samples
**Behind the scenes:**
When PolicyEngine shows population-wide impacts, L0 helps select representative households from the full survey, reducing computation time while maintaining accuracy.
## For Analysts 📊
### What L0 Does
L0 provides intelligent sampling gates for:
- **Household selection** - Choose representative samples from CPS
- **Feature selection** - Identify important variables
- **Sparse weighting** - Create compact, efficient datasets
**Used in PolicyEngine for:**
- Survey calibration (via microcalibrate)
- Dataset sparsification in policyengine-us-data
- Efficient microsimulation
### Installation
```bash
pip install l0-python
```
### Quick Example: Sample Selection
```python
from l0 import SampleGate
# Select 1,000 households from 10,000
gate = SampleGate(n_samples=10000, target_samples=1000)
selected_data, indices = gate.select_samples(data)
# Gates learn which samples are most informative
```
### Integration with microcalibrate
```python
from l0 import HardConcrete
from microcalibrate import Calibration
# L0 gates for household selection
gates = HardConcrete(
len(household_weights),
temperature=0.25,
init_mean=0.999 # Start with most households
)
# Use in calibration
# microcalibrate applies gates during weight optimization
```
## For Contributors 💻
### Repository
**Location:** PolicyEngine/L0
**Clone:**
```bash
git clone https://github.com/PolicyEngine/L0
cd L0
```
### Current Implementation
**To see structure:**
```bash
tree l0/
# Key modules:
ls l0/
# - hard_concrete.py - Core L0 distribution
# - layers.py - L0Linear, L0Conv2d
# - gates.py - Sample/feature gates
# - penalties.py - L0/L2 penalty computation
# - temperature.py - Temperature scheduling
```
**To see specific implementations:**
```bash
# Hard Concrete distribution (core algorithm)
cat l0/hard_concrete.py
# Sample gates (used in calibration)
cat l0/gates.py
# Neural network layers
cat l0/layers.py
```
### Key Concepts
**Hard Concrete Distribution:**
- Differentiable approximation of L0 norm
- Allows gradient-based optimization
- Temperature controls sparsity level
**To see implementation:**
```bash
cat l0/hard_concrete.py
```
**Sample Gates:**
- Binary gates for sample selection
- Learn which samples are most informative
- Used in microcalibrate for household selection
**Feature Gates:**
- Select important features/variables
- Reduce dimensionality
- Maintain prediction accuracy
### Usage in PolicyEngine
**In microcalibrate (survey calibration):**
```python
from l0 import HardConcrete
# Create gates for household selection
gates = HardConcrete(
n_items=len(households),
temperature=0.25,
init_mean=0.999 # Start with almost all households
)
# Gates produce probabilities (0 to 1)
probs = gates()
# Apply to weights during calibration
masked_weights = weights * probs
```
**In policyengine-us-data:**
```bash
# See usage in data pipeline
grep -r "from l0 import" ../policyengine-us-data/
```
### Temperature Scheduling
**Controls sparsity over training:**
```python
from l0 import TemperatureScheduler, update_temperatures
scheduler = TemperatureScheduler(
initial_temp=1.0, # Start relaxed
final_temp=0.1, # End sparse
total_epochs=100
)
for epoch in range(100):
temp = scheduler.get_temperature(epoch)
update_temperatures(model, temp)
# ... training ...
```
**To see implementation:**
```bash
cat l0/temperature.py
```
### L0L2 Combined Penalty
**Prevents overfitting:**
```python
from l0 import compute_l0l2_penalty
# Combine L0 (sparsity) with L2 (regularization)
penalty = compute_l0l2_penalty(
model,
l0_lambda=1e-3, # Sparsity strength
l2_lambda=1e-4 # Weight regularization
)
loss = task_loss + penalty
```
### Testing
**Run tests:**
```bash
make test
# Or
pytest tests/ -v --cov=l0
```
**To see test patterns:**
```bash
cat tests/test_hard_concrete.py
cat tests/test_gates.py
```
## Advanced Usage
### Hybrid Gates (L0 + Random)
```python
from l0 import HybridGate
# Combine L0 selection with random sampling
hybrid = HybridGate(
n_items=10000,
l0_fraction=0.25, # 25% from L0
random_fraction=0.75, # 75% random
target_items=1000
)
selected, indices, types = hybrid.select(data)
```
### Feature Selection
```python
from l0 import FeatureGate
# Select top features
gate = FeatureGate(n_features=1000, max_features=50)
selected_data, feature_indices = gate.select_features(data)
# Get feature importance
importance = gate.get_feature_importance()
```
## Mathematical Background
**L0 norm:**
- Counts non-zero elements
- Non-differentiable (discontinuous)
- Hard to optimize directly
**Hard Concrete relaxation:**
- Continuous, differentiable approximation
- Enables gradient descent
- "Stretches" binary distribution to allow gradients
**Paper:**
Louizos, Welling, & Kingma (2017): "Learning Sparse Neural Networks through L0 Regularization"
https://arxiv.org/abs/1712.01312
## Related Packages
**Uses L0:**
- microcalibrate (survey weight calibration)
- policyengine-us-data (household selection)
**See also:**
- **microcalibrate-skill** - Survey calibration using L0
- **policyengine-us-data-skill** - Data pipeline integration
## Resources
**Repository:** https://github.com/PolicyEngine/L0
**Documentation:** https://policyengine.github.io/L0/
**Paper:** https://arxiv.org/abs/1712.01312
**PyPI:** https://pypi.org/project/l0-python/

View File

@@ -0,0 +1,404 @@
---
name: microcalibrate
description: Survey weight calibration to match population targets - used in policyengine-us-data for enhanced microdata
---
# MicroCalibrate
MicroCalibrate calibrates survey weights to match population targets, with L0 regularization for sparsity and automatic hyperparameter tuning.
## For Users 👥
### What is MicroCalibrate?
When you see PolicyEngine population impacts, the underlying data has been "calibrated" using MicroCalibrate to match official population statistics.
**What calibration does:**
- Adjusts survey weights to match known totals (population, income, employment)
- Creates representative datasets
- Reduces dataset size while maintaining accuracy
- Ensures PolicyEngine estimates match administrative data
**Example:**
- Census says US has 331 million people
- Survey has 100,000 households representing the population
- MicroCalibrate adjusts weights so survey totals match census totals
- Result: More accurate PolicyEngine calculations
## For Analysts 📊
### Installation
```bash
pip install microcalibrate
```
### What MicroCalibrate Does
**Calibration problem:**
You have survey data with initial weights, and you know certain population totals (benchmarks). Calibration adjusts weights so weighted survey totals match benchmarks.
**Example:**
```python
from microcalibrate import Calibration
import numpy as np
import pandas as pd
# Survey data (1,000 households)
weights = np.ones(1000) # Initial weights
# Estimates (how much each household contributes to targets)
estimate_matrix = pd.DataFrame({
'total_income': household_incomes, # Each household's income
'total_employed': household_employment # 1 if employed, 0 if not
})
# Known population targets (benchmarks)
targets = np.array([
50_000_000, # Total income in population
600, # Total employed people
])
# Calibrate
cal = Calibration(
weights=weights,
targets=targets,
estimate_matrix=estimate_matrix,
l0_lambda=0.01 # Sparsity penalty
)
# Optimize weights
new_weights = cal.calibrate(max_iter=1000)
# Check results
achieved = (estimate_matrix.values.T @ new_weights)
print(f"Target: {targets}")
print(f"Achieved: {achieved}")
print(f"Non-zero weights: {(new_weights > 0).sum()} / {len(weights)}")
```
### L0 Regularization for Sparsity
**Why sparsity matters:**
- Reduces dataset size (fewer households to simulate)
- Faster PolicyEngine calculations
- Easier to validate and understand
**L0 penalty:**
```python
# L0 encourages many weights to be exactly zero
cal = Calibration(
weights=weights,
targets=targets,
estimate_matrix=estimate_matrix,
l0_lambda=0.01 # Higher = more sparse
)
```
**To see impact:**
```python
# Without L0
cal_dense = Calibration(..., l0_lambda=0.0)
weights_dense = cal_dense.calibrate()
# With L0
cal_sparse = Calibration(..., l0_lambda=0.01)
weights_sparse = cal_sparse.calibrate()
print(f"Dense: {(weights_dense > 0).sum()} households")
print(f"Sparse: {(weights_sparse > 0).sum()} households")
# Sparse might use 60% fewer households while matching same targets
```
### Automatic Hyperparameter Tuning
**Find optimal l0_lambda:**
```python
from microcalibrate import tune_hyperparameters
# Find best l0_lambda using cross-validation
best_lambda, results = tune_hyperparameters(
weights=weights,
targets=targets,
estimate_matrix=estimate_matrix,
lambda_min=1e-4,
lambda_max=1e-1,
n_trials=50
)
print(f"Best lambda: {best_lambda}")
```
### Robustness Evaluation
**Test calibration stability:**
```python
from microcalibrate import evaluate_robustness
# Holdout validation
robustness = evaluate_robustness(
weights=weights,
targets=targets,
estimate_matrix=estimate_matrix,
l0_lambda=0.01,
n_folds=5
)
print(f"Mean error: {robustness['mean_error']}")
print(f"Std error: {robustness['std_error']}")
```
### Interactive Dashboard
**Visualize calibration:**
https://microcalibrate.vercel.app/
Features:
- Upload survey data
- Set targets
- Tune hyperparameters
- View results
- Download calibrated weights
## For Contributors 💻
### Repository
**Location:** PolicyEngine/microcalibrate
**Clone:**
```bash
git clone https://github.com/PolicyEngine/microcalibrate
cd microcalibrate
```
### Current Implementation
**To see structure:**
```bash
tree microcalibrate/
# Key modules:
ls microcalibrate/
# - calibration.py - Main Calibration class
# - hyperparameter_tuning.py - Optuna integration
# - evaluation.py - Robustness testing
# - target_analysis.py - Target diagnostics
```
**To see specific implementations:**
```bash
# Main calibration algorithm
cat microcalibrate/calibration.py
# Hyperparameter tuning
cat microcalibrate/hyperparameter_tuning.py
# Robustness evaluation
cat microcalibrate/evaluation.py
```
### Dependencies
**Required:**
- torch (PyTorch for optimization)
- l0-python (L0 regularization)
- optuna (hyperparameter tuning)
- numpy, pandas, tqdm
**To see all dependencies:**
```bash
cat pyproject.toml
```
### How MicroCalibrate Uses L0
```python
# Internal to microcalibrate
from l0 import HardConcrete
# Create gates for sample selection
gates = HardConcrete(
n_items=len(weights),
temperature=temperature,
init_mean=0.999
)
# Apply gates during optimization
effective_weights = weights * gates()
# L0 penalty encourages gates → 0 or 1
# Result: Many households get weight = 0 (sparse)
```
**To see L0 integration:**
```bash
grep -n "HardConcrete\|l0" microcalibrate/calibration.py
```
### Optimization Algorithm
**Iterative reweighting:**
1. Start with initial weights
2. Apply L0 gates (select samples)
3. Optimize to match targets
4. Apply penalty for sparsity
5. Iterate until convergence
**Loss function:**
```python
# Target matching loss
target_loss = sum((achieved_targets - desired_targets)^2)
# L0 penalty (number of non-zero weights)
l0_penalty = l0_lambda * count_nonzero(weights)
# Total loss
total_loss = target_loss + l0_penalty
```
### Testing
**Run tests:**
```bash
make test
# Or
pytest tests/ -v
```
**To see test patterns:**
```bash
cat tests/test_calibration.py
cat tests/test_hyperparameter_tuning.py
```
### Usage in policyengine-us-data
**To see how data pipeline uses microcalibrate:**
```bash
cd ../policyengine-us-data
# Find usage
grep -r "microcalibrate" policyengine_us_data/
grep -r "Calibration" policyengine_us_data/
```
## Common Patterns
### Pattern 1: Basic Calibration
```python
from microcalibrate import Calibration
cal = Calibration(
weights=initial_weights,
targets=benchmark_values,
estimate_matrix=contributions,
l0_lambda=0.01
)
calibrated_weights = cal.calibrate(max_iter=1000)
```
### Pattern 2: With Hyperparameter Tuning
```python
from microcalibrate import tune_hyperparameters, Calibration
# Find best lambda
best_lambda, results = tune_hyperparameters(
weights=weights,
targets=targets,
estimate_matrix=estimate_matrix
)
# Use best lambda
cal = Calibration(..., l0_lambda=best_lambda)
calibrated_weights = cal.calibrate()
```
### Pattern 3: Multi-Target Calibration
```python
# Multiple population targets
estimate_matrix = pd.DataFrame({
'total_population': population_counts,
'total_income': incomes,
'total_employed': employment_indicators,
'total_children': child_counts
})
targets = np.array([
331_000_000, # US population
15_000_000_000_000, # Total income
160_000_000, # Employed people
73_000_000 # Children
])
cal = Calibration(weights, targets, estimate_matrix, l0_lambda=0.01)
```
## Performance Considerations
**Calibration speed:**
- 1,000 households, 5 targets: ~1 second
- 100,000 households, 10 targets: ~30 seconds
- Depends on: dataset size, number of targets, l0_lambda
**Memory usage:**
- PyTorch tensors for optimization
- Scales linearly with dataset size
**To profile:**
```python
import time
start = time.time()
weights = cal.calibrate()
print(f"Calibration took {time.time() - start:.1f}s")
```
## Troubleshooting
**Common issues:**
**1. Calibration not converging:**
```python
# Try:
# - More iterations
# - Lower l0_lambda
# - Better initialization
cal = Calibration(..., l0_lambda=0.001) # Lower sparsity penalty
weights = cal.calibrate(max_iter=5000) # More iterations
```
**2. Targets not matching:**
```python
# Check achieved vs desired
achieved = (estimate_matrix.values.T @ weights)
error = np.abs(achieved - targets) / targets
print(f"Relative errors: {error}")
# If large errors, l0_lambda may be too high
```
**3. Too sparse (all weights zero):**
```python
# Lower l0_lambda
cal = Calibration(..., l0_lambda=0.0001)
```
## Related Skills
- **l0-skill** - Understanding L0 regularization
- **policyengine-us-data-skill** - How calibration fits in data pipeline
- **microdf-skill** - Working with calibrated survey data
## Resources
**Repository:** https://github.com/PolicyEngine/microcalibrate
**Dashboard:** https://microcalibrate.vercel.app/
**PyPI:** https://pypi.org/project/microcalibrate/
**Paper:** Louizos et al. (2017) on L0 regularization

View File

@@ -0,0 +1,329 @@
---
name: microdf
description: Weighted pandas DataFrames for survey microdata analysis - inequality, poverty, and distributional calculations
---
# MicroDF
MicroDF provides weighted pandas DataFrames and Series for analyzing survey microdata, with built-in support for inequality and poverty calculations.
## For Users 👥
### What is MicroDF?
When you see poverty rates, Gini coefficients, or distributional charts in PolicyEngine, those are calculated using MicroDF.
**MicroDF powers:**
- Poverty rate calculations (SPM)
- Inequality metrics (Gini coefficient)
- Income distribution analysis
- Weighted statistics from survey data
### Understanding the Metrics
**Gini coefficient:**
- Calculated using MicroDF from weighted income data
- Ranges from 0 (perfect equality) to 1 (perfect inequality)
- US typically around 0.48
**Poverty rates:**
- Calculated using MicroDF with weighted household data
- Compares income to poverty thresholds
- Accounts for household composition
**Percentiles:**
- MicroDF calculates weighted percentiles
- Shows income distribution (10th, 50th, 90th percentile)
## For Analysts 📊
### Installation
```bash
pip install microdf-python
```
### Quick Start
```python
import microdf as mdf
import pandas as pd
# Create sample data
df = pd.DataFrame({
'income': [10000, 20000, 30000, 40000, 50000],
'weights': [1, 2, 3, 2, 1]
})
# Create MicroDataFrame
mdf_df = mdf.MicroDataFrame(df, weights='weights')
# All operations are weight-aware
print(f"Weighted mean: ${mdf_df.income.mean():,.0f}")
print(f"Gini coefficient: {mdf_df.income.gini():.3f}")
```
### Common Operations
**Weighted statistics:**
```python
mdf_df.income.mean() # Weighted mean
mdf_df.income.median() # Weighted median
mdf_df.income.sum() # Weighted sum
mdf_df.income.std() # Weighted standard deviation
```
**Inequality metrics:**
```python
mdf_df.income.gini() # Gini coefficient
mdf_df.income.top_x_pct_share(10) # Top 10% share
mdf_df.income.top_x_pct_share(1) # Top 1% share
```
**Poverty analysis:**
```python
# Poverty rate (income < threshold)
poverty_rate = mdf_df.poverty_rate(
income_measure='income',
threshold=poverty_line
)
# Poverty gap (how far below threshold)
poverty_gap = mdf_df.poverty_gap(
income_measure='income',
threshold=poverty_line
)
# Deep poverty (income < 50% of threshold)
deep_poverty_rate = mdf_df.deep_poverty_rate(
income_measure='income',
threshold=poverty_line,
deep_poverty_line=0.5
)
```
**Quantiles:**
```python
# Deciles
mdf_df.income.decile_values()
# Quintiles
mdf_df.income.quintile_values()
# Custom quantiles
mdf_df.income.quantile(0.25) # 25th percentile
```
### MicroSeries
```python
# Extract a Series with weights
income_series = mdf_df.income # This is a MicroSeries
# MicroSeries operations
income_series.mean()
income_series.gini()
income_series.percentile(50)
```
### Working with PolicyEngine Results
```python
import microdf as mdf
from policyengine_us import Simulation
# Run simulation with axes (multiple households)
situation_with_axes = {...} # See policyengine-us-skill
sim = Simulation(situation=situation_with_axes)
# Get results as arrays
incomes = sim.calculate("household_net_income", 2024)
weights = sim.calculate("household_weight", 2024)
# Create MicroDataFrame
df = pd.DataFrame({'income': incomes, 'weight': weights})
mdf_df = mdf.MicroDataFrame(df, weights='weight')
# Calculate metrics
gini = mdf_df.income.gini()
poverty_rate = mdf_df.poverty_rate('income', threshold=15000)
print(f"Gini: {gini:.3f}")
print(f"Poverty rate: {poverty_rate:.1%}")
```
## For Contributors 💻
### Repository
**Location:** PolicyEngine/microdf
**Clone:**
```bash
git clone https://github.com/PolicyEngine/microdf
cd microdf
```
### Current Implementation
**To see current API:**
```bash
# Main classes
cat microdf/microframe.py # MicroDataFrame
cat microdf/microseries.py # MicroSeries
# Key modules
cat microdf/generic.py # Generic weighted operations
cat microdf/inequality.py # Gini, top shares
cat microdf/poverty.py # Poverty metrics
```
**To see all methods:**
```bash
# MicroDataFrame methods
grep "def " microdf/microframe.py
# MicroSeries methods
grep "def " microdf/microseries.py
```
### Testing
**To see test patterns:**
```bash
ls tests/
cat tests/test_microframe.py
```
**Run tests:**
```bash
make test
# Or
pytest tests/ -v
```
### Contributing
**Before contributing:**
1. Check if method already exists
2. Ensure it's weighted correctly
3. Add tests
4. Follow policyengine-standards-skill
**Common contributions:**
- New inequality metrics
- New poverty measures
- Performance optimizations
- Bug fixes
## Advanced Patterns
### Custom Aggregations
```python
# Define custom weighted aggregation
def weighted_operation(series, weights):
return (series * weights).sum() / weights.sum()
# Apply to MicroSeries
result = weighted_operation(mdf_df.income, mdf_df.weights)
```
### Groupby Operations
```python
# Group by with weights
grouped = mdf_df.groupby('state')
state_means = grouped.income.mean() # Weighted means by state
```
### Inequality Decomposition
**To see decomposition methods:**
```bash
grep -A 20 "def.*decomp" microdf/
```
## Integration Examples
### Example 1: PolicyEngine Blog Post Analysis
```python
# Pattern from PolicyEngine blog posts
import microdf as mdf
# Get simulation results
baseline_income = baseline_sim.calculate("household_net_income", 2024)
reform_income = reform_sim.calculate("household_net_income", 2024)
weights = baseline_sim.calculate("household_weight", 2024)
# Create MicroDataFrame
df = pd.DataFrame({
'baseline_income': baseline_income,
'reform_income': reform_income,
'weight': weights
})
mdf_df = mdf.MicroDataFrame(df, weights='weight')
# Calculate impacts
baseline_gini = mdf_df.baseline_income.gini()
reform_gini = mdf_df.reform_income.gini()
print(f"Gini change: {reform_gini - baseline_gini:+.4f}")
```
### Example 2: Poverty Analysis
```python
# Calculate poverty under baseline and reform
from policyengine_us import Simulation
baseline_sim = Simulation(situation=situation)
reform_sim = Simulation(situation=situation, reform=reform)
# Get incomes
baseline_income = baseline_sim.calculate("spm_unit_net_income", 2024)
reform_income = reform_sim.calculate("spm_unit_net_income", 2024)
spm_threshold = baseline_sim.calculate("spm_unit_poverty_threshold", 2024)
weights = baseline_sim.calculate("spm_unit_weight", 2024)
# Calculate poverty rates
df_baseline = mdf.MicroDataFrame(
pd.DataFrame({'income': baseline_income, 'threshold': spm_threshold, 'weight': weights}),
weights='weight'
)
poverty_baseline = (df_baseline.income < df_baseline.threshold).mean() # Weighted
# Similar for reform
print(f"Poverty reduction: {(poverty_baseline - poverty_reform):.1%}")
```
## Package Status
**Maturity:** Stable, production-ready
**API stability:** Stable (rarely breaking changes)
**Performance:** Optimized for large datasets
**To see version:**
```bash
pip show microdf-python
```
**To see changelog:**
```bash
cat CHANGELOG.md # In microdf repo
```
## Related Skills
- **policyengine-us-skill** - Generating data for microdf analysis
- **policyengine-analysis-skill** - Using microdf in policy analysis
- **policyengine-us-data-skill** - Data sources for microdf
## Resources
**Repository:** https://github.com/PolicyEngine/microdf
**PyPI:** https://pypi.org/project/microdf-python/
**Issues:** https://github.com/PolicyEngine/microdf/issues

View File

@@ -0,0 +1,415 @@
---
name: microimpute
description: ML-based variable imputation for survey data - used in policyengine-us-data to fill missing values
---
# MicroImpute
MicroImpute enables ML-based variable imputation through different statistical methods, with comparison and benchmarking capabilities.
## For Users 👥
### What is MicroImpute?
When PolicyEngine calculates population impacts, the underlying survey data has missing information. MicroImpute uses machine learning to fill in those gaps intelligently.
**What imputation does:**
- Fills missing data in surveys
- Uses machine learning to predict missing values
- Maintains statistical relationships
- Improves PolicyEngine accuracy
**Example:**
- Survey asks about income but not capital gains breakdown
- MicroImpute predicts short-term vs long-term capital gains
- Based on patterns from IRS data
- Result: More accurate tax calculations
**You benefit from imputation when:**
- PolicyEngine calculates capital gains tax accurately
- Benefits eligibility uses complete household information
- State-specific calculations have all needed data
## For Analysts 📊
### Installation
```bash
pip install microimpute
# With image export (for plots)
pip install microimpute[images]
```
### What MicroImpute Does
**Imputation problem:**
- Donor dataset has complete information (e.g., IRS tax records)
- Recipient dataset has missing variables (e.g., CPS survey)
- Imputation predicts missing values in recipient using donor patterns
**Methods available:**
- Linear regression
- Random forest
- Quantile forest (preserves full distribution)
- XGBoost
- Hot deck (traditional matching)
### Quick Example
```python
from microimpute import Imputer
import pandas as pd
# Donor data (complete)
donor = pd.DataFrame({
'income': [50000, 60000, 70000],
'age': [30, 40, 50],
'capital_gains': [5000, 8000, 12000] # Variable to impute
})
# Recipient data (missing capital_gains)
recipient = pd.DataFrame({
'income': [55000, 65000],
'age': [35, 45],
# capital_gains is missing
})
# Impute using quantile forest
imputer = Imputer(method='quantile_forest')
imputer.fit(
donor=donor,
donor_target='capital_gains',
common_vars=['income', 'age']
)
recipient_imputed = imputer.predict(recipient)
# Now recipient has predicted capital_gains
```
### Method Comparison
```python
from microimpute import compare_methods
# Compare different imputation methods
results = compare_methods(
donor=donor,
recipient=recipient,
target_var='capital_gains',
common_vars=['income', 'age'],
methods=['linear', 'random_forest', 'quantile_forest']
)
# Shows quantile loss for each method
print(results)
```
### Quantile Loss (Quality Metric)
**Why quantile loss:**
- Measures how well imputation preserves the distribution
- Not just mean accuracy, but full distribution shape
- Lower is better
**Interpretation:**
```python
# Quantile loss around 0.1 = good
# Quantile loss around 0.5 = poor
# Compare across methods to choose best
```
## For Contributors 💻
### Repository
**Location:** PolicyEngine/microimpute
**Clone:**
```bash
git clone https://github.com/PolicyEngine/microimpute
cd microimpute
```
### Current Implementation
**To see structure:**
```bash
tree microimpute/
# Key modules:
ls microimpute/
# - imputer.py - Main Imputer class
# - methods/ - Different imputation methods
# - comparison.py - Method benchmarking
# - utils/ - Utilities
```
**To see specific methods:**
```bash
# Quantile forest implementation
cat microimpute/methods/quantile_forest.py
# Random forest
cat microimpute/methods/random_forest.py
# Linear regression
cat microimpute/methods/linear.py
```
### Dependencies
**Required:**
- numpy, pandas (data handling)
- scikit-learn (ML models)
- quantile-forest (distributional imputation)
- optuna (hyperparameter tuning)
- statsmodels (statistical methods)
- scipy (statistical functions)
**To see all dependencies:**
```bash
cat pyproject.toml
```
### Adding New Imputation Methods
**Pattern:**
```python
# microimpute/methods/my_method.py
class MyMethodImputer:
def fit(self, X_train, y_train):
"""Train on donor data."""
# Fit your model
pass
def predict(self, X_test):
"""Impute on recipient data."""
# Return predictions
pass
def get_quantile_loss(self, X_val, y_val):
"""Compute validation loss."""
# Evaluate quality
pass
```
### Usage in policyengine-us-data
**To see how data pipeline uses microimpute:**
```bash
cd ../policyengine-us-data
# Find usage
grep -r "microimpute" policyengine_us_data/
grep -r "Imputer" policyengine_us_data/
```
**Typical workflow:**
1. Load CPS (has demographics, missing capital gains details)
2. Load IRS PUF (has complete tax data)
3. Use microimpute to predict missing CPS variables from PUF patterns
4. Validate imputation quality
5. Save enhanced dataset
### Testing
**Run tests:**
```bash
make test
# Or
pytest tests/ -v --cov=microimpute
```
**To see test patterns:**
```bash
cat tests/test_imputer.py
cat tests/test_methods.py
```
## Common Patterns
### Pattern 1: Basic Imputation
```python
from microimpute import Imputer
# Create imputer
imputer = Imputer(method='quantile_forest')
# Fit on donor (complete data)
imputer.fit(
donor=donor_df,
donor_target='target_variable',
common_vars=['age', 'income', 'state']
)
# Predict on recipient (missing target_variable)
recipient_imputed = imputer.predict(recipient_df)
```
### Pattern 2: Choosing Best Method
```python
from microimpute import compare_methods
# Test multiple methods
methods = ['linear', 'random_forest', 'quantile_forest', 'xgboost']
results = compare_methods(
donor=donor,
recipient=recipient,
target_var='target',
common_vars=common_vars,
methods=methods
)
# Use method with lowest quantile loss
best_method = results.sort_values('quantile_loss').iloc[0]['method']
```
### Pattern 3: Multiple Variable Imputation
```python
# Impute several variables
variables_to_impute = [
'short_term_capital_gains',
'long_term_capital_gains',
'qualified_dividends'
]
for var in variables_to_impute:
imputer = Imputer(method='quantile_forest')
imputer.fit(donor=irs_puf, donor_target=var, common_vars=common_vars)
cps[var] = imputer.predict(cps)
```
## Advanced Features
### Hyperparameter Tuning
**Built-in Optuna integration:**
```python
from microimpute import tune_hyperparameters
# Automatically find best hyperparameters
best_params, study = tune_hyperparameters(
donor=donor,
target_var='target',
common_vars=common_vars,
method='quantile_forest',
n_trials=100
)
# Use tuned parameters
imputer = Imputer(method='quantile_forest', **best_params)
```
### Cross-Validation
**Validate imputation quality:**
```python
from sklearn.model_selection import cross_val_score
# Split donor for validation
# Impute on validation set
# Measure accuracy
```
### Visualization
**Plot imputation results:**
```python
import plotly.express as px
# Compare imputed vs actual (on donor validation set)
fig = px.scatter(
x=actual_values,
y=imputed_values,
labels={'x': 'Actual', 'y': 'Imputed'}
)
fig.add_trace(px.line(x=[min, max], y=[min, max])) # 45-degree line
```
## Statistical Background
**Imputation preserves:**
- Marginal distributions (imputed variable distribution matches donor)
- Conditional relationships (imputation depends on common variables)
- Uncertainty (quantile methods preserve full distribution)
**Trade-offs:**
- **Linear:** Fast, but assumes linear relationships
- **Random forest:** Handles non-linearity, may overfit
- **Quantile forest:** Preserves full distribution, slower
- **XGBoost:** High accuracy, requires tuning
## Integration with PolicyEngine
**Full pipeline (policyengine-us-data):**
```
1. Load CPS survey data
2. microimpute: Fill missing variables from IRS PUF
3. microcalibrate: Adjust weights to match benchmarks
4. Validation: Check against administrative totals
5. Package: Distribute enhanced dataset
6. PolicyEngine: Use for population simulations
```
## Comparison to Other Methods
**MicroImpute vs traditional imputation:**
**Traditional (mean imputation):**
- Fast but destroys distribution
- All missing values get same value
- Underestimates variance
**MicroImpute (ML methods):**
- Preserves relationships
- Different predictions per record
- Maintains distribution shape
**Quantile forest advantage:**
- Predicts full conditional distribution
- Not just point estimates
- Can sample from predicted distribution
## Performance Tips
**For large datasets:**
```python
# Use random forest (faster than quantile forest)
imputer = Imputer(method='random_forest')
# Or subsample donor
donor_sample = donor.sample(n=10000, random_state=42)
imputer.fit(donor=donor_sample, ...)
```
**For high accuracy:**
```python
# Use quantile forest with tuning
best_params, _ = tune_hyperparameters(...)
imputer = Imputer(method='quantile_forest', **best_params)
```
## Related Skills
- **l0-skill** - Regularization techniques
- **microcalibrate-skill** - Survey calibration (next step after imputation)
- **policyengine-us-data-skill** - Complete data pipeline
- **microdf-skill** - Working with imputed/calibrated data
## Resources
**Repository:** https://github.com/PolicyEngine/microimpute
**PyPI:** https://pypi.org/project/microimpute/
**Documentation:** See README and docstrings in source

View File

@@ -0,0 +1,880 @@
---
name: policyengine-design
description: PolicyEngine visual identity - colors, fonts, logos, and branding for web apps, calculators, charts, and research
---
# PolicyEngine Design System
PolicyEngine's visual identity and branding guidelines for creating consistent user experiences across web apps, calculators, charts, and research outputs.
## For Users 👥
### PolicyEngine Visual Identity
**Brand colors:**
- **Teal** (#39C6C0) - Primary accent color (buttons, highlights, interactive elements)
- **Blue** (#2C6496) - Secondary color (links, charts, headers)
**Typography:**
- **Charts:** Roboto Serif
- **Web app:** System fonts (sans-serif)
- **Streamlit apps:** Default sans-serif
**Logo:**
- Used in charts (bottom right)
- Blue version for light backgrounds
- White version for dark backgrounds
### Recognizing PolicyEngine Content
**You can identify PolicyEngine content by:**
- Teal accent color (#39C6C0) on buttons and interactive elements
- Blue (#2C6496) in charts and links
- Roboto Serif font in charts
- PolicyEngine logo in chart footer
- Clean, minimal white backgrounds
- Data-focused, quantitative presentation
## For Analysts 📊
### Chart Branding
When creating charts for PolicyEngine analysis, follow these guidelines:
#### Color Palette
**Primary colors:**
```python
TEAL_ACCENT = "#39C6C0" # Primary color (teal)
BLUE_PRIMARY = "#2C6496" # Secondary color (blue)
DARK_GRAY = "#616161" # Text color
```
**Extended palette:**
```python
# Blues
BLUE = "#2C6496"
BLUE_LIGHT = "#D8E6F3"
BLUE_PRESSED = "#17354F"
BLUE_98 = "#F7FAFD"
DARK_BLUE_HOVER = "#1d3e5e"
DARKEST_BLUE = "#0C1A27"
# Teals
TEAL_ACCENT = "#39C6C0"
TEAL_LIGHT = "#F7FDFC"
TEAL_PRESSED = "#227773"
# Grays
DARK_GRAY = "#616161"
GRAY = "#808080"
MEDIUM_LIGHT_GRAY = "#BDBDBD"
MEDIUM_DARK_GRAY = "#D2D2D2"
LIGHT_GRAY = "#F2F2F2"
# Accents
WHITE = "#FFFFFF"
BLACK = "#000000"
DARK_RED = "#b50d0d" # For negative values
```
**See current colors:**
```bash
cat policyengine-app/src/style/colors.js
```
#### Plotly Chart Formatting
**Standard PolicyEngine chart:**
```python
import plotly.graph_objects as go
def format_fig(fig):
"""Format chart with PolicyEngine branding."""
fig.update_layout(
# Typography
font=dict(
family="Roboto Serif",
color="black"
),
# Background
plot_bgcolor="white",
template="plotly_white",
# Margins (leave room for logo)
margin=dict(
l=50,
r=100,
t=50,
b=120,
pad=4
),
# Chart size
height=600,
width=800,
)
# Add PolicyEngine logo (bottom right)
fig.add_layout_image(
dict(
source="https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/blue.png",
xref="paper",
yref="paper",
x=1.0,
y=-0.10,
sizex=0.10,
sizey=0.10,
xanchor="right",
yanchor="bottom"
)
)
# Clean modebar
fig.update_layout(
modebar=dict(
bgcolor="rgba(0,0,0,0)",
color="rgba(0,0,0,0)"
)
)
return fig
# Usage
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_data, y=y_data, line=dict(color=TEAL_ACCENT)))
fig = format_fig(fig)
```
**Current implementation:**
```bash
# See format_fig in action
cat givecalc/ui/visualization.py
cat policyengine-app/src/pages/policy/output/...
```
#### Chart Colors
**For line charts:**
- Primary line: Teal (#39C6C0) or Blue (#2C6496)
- Background lines: Light gray (rgb(180, 180, 180))
- Markers: Teal with 70% opacity
**For bar charts:**
- Positive values: Teal (#39C6C0)
- Negative values: Dark red (#b50d0d)
- Neutral: Gray
**For multiple series:**
Use variations of blue and teal, or discrete color scale:
```python
colors = ["#2C6496", "#39C6C0", "#17354F", "#227773"]
```
#### Typography
**Charts:**
```python
font=dict(family="Roboto Serif", size=14, color="black")
```
**Axis labels:**
```python
xaxis=dict(
title=dict(text="Label", font=dict(size=14)),
tickfont=dict(size=12)
)
```
**Load Roboto font:**
```python
# In Streamlit apps
st.markdown("""
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;500;700&display=swap" rel="stylesheet">
""", unsafe_allow_html=True)
```
### Streamlit App Branding
**Streamlit configuration (.streamlit/config.toml):**
```toml
[theme]
base = "light"
primaryColor = "#39C6C0" # Teal accent
backgroundColor = "#FFFFFF" # White background
secondaryBackgroundColor = "#F7FDFC" # Teal light
textColor = "#616161" # Dark gray
[client]
toolbarMode = "minimal"
```
**Current implementation:**
```bash
cat givecalc/.streamlit/config.toml
cat salt-amt-calculator/.streamlit/config.toml # Other calculators
```
### Logo Usage
**Logo URLs:**
```python
# Blue logo (for light backgrounds)
LOGO_BLUE = "https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/blue.png"
# White logo (for dark backgrounds)
LOGO_WHITE = "https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/white.png"
# SVG versions (scalable)
LOGO_BLUE_SVG = "https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/blue.svg"
LOGO_WHITE_SVG = "https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/white.svg"
```
**Logo placement in charts:**
- Bottom right corner
- 10% of chart width
- Slightly below bottom edge (y=-0.10)
**Current logos:**
```bash
ls policyengine-app/src/images/logos/policyengine/
```
### Complete Example: Branded Chart
```python
import plotly.graph_objects as go
# PolicyEngine colors
TEAL_ACCENT = "#39C6C0"
BLUE_PRIMARY = "#2C6496"
# Create chart
fig = go.Figure()
# Add data
fig.add_trace(go.Scatter(
x=incomes,
y=taxes,
mode='lines',
name='Tax liability',
line=dict(color=TEAL_ACCENT, width=3)
))
# Apply PolicyEngine branding
fig.update_layout(
# Typography
font=dict(family="Roboto Serif", size=14, color="black"),
# Title and labels
title="Tax liability by income",
xaxis_title="Income",
yaxis_title="Tax ($)",
# Formatting
xaxis_tickformat="$,.0f",
yaxis_tickformat="$,.0f",
# Appearance
plot_bgcolor="white",
template="plotly_white",
# Size and margins
height=600,
width=800,
margin=dict(l=50, r=100, t=50, b=120, pad=4)
)
# Add logo
fig.add_layout_image(
dict(
source="https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/blue.png",
xref="paper",
yref="paper",
x=1.0,
y=-0.10,
sizex=0.10,
sizey=0.10,
xanchor="right",
yanchor="bottom"
)
)
# Show
fig.show()
```
## For Contributors 💻
### Brand Assets
**Repository:** PolicyEngine/policyengine-app-v2 (current), PolicyEngine/policyengine-app (legacy)
**Logo files:**
```bash
# Logos in app (both v1 and v2 use same logos)
ls policyengine-app/src/images/logos/policyengine/
# - blue.png - For light backgrounds
# - white.png - For dark backgrounds
# - blue.svg - Scalable blue logo
# - white.svg - Scalable white logo
# - banners/ - Banner variations
# - profile/ - Profile/avatar versions
```
**Access logos:**
```bash
# View logo files (v1 repo has the assets)
cd policyengine-app/src/images/logos/policyengine/
ls -la
```
### Color Definitions
**⚠️ IMPORTANT: App V2 Transition**
PolicyEngine is transitioning to policyengine-app-v2 with updated design tokens. Use app-v2 colors for new projects.
**Current colors (policyengine-app-v2):**
```typescript
// policyengine-app-v2/app/src/designTokens/colors.ts
// Primary (teal) - 50 to 900 scale
primary[500]: "#319795" // Main teal
primary[400]: "#38B2AC" // Lighter teal
primary[600]: "#2C7A7B" // Darker teal
// Blue scale
blue[700]: "#026AA2" // Primary blue
blue[500]: "#0EA5E9" // Lighter blue
// Gray scale
gray[700]: "#344054" // Dark text
gray[100]: "#F2F4F7" // Light backgrounds
// Semantic
success: "#22C55E"
warning: "#FEC601"
error: "#EF4444"
// Background
background.primary: "#FFFFFF"
background.secondary: "#F5F9FF"
// Text
text.primary: "#000000"
text.secondary: "#5A5A5A"
```
**To see current design tokens:**
```bash
cat policyengine-app-v2/app/src/designTokens/colors.ts
cat policyengine-app-v2/app/src/styles/colors.ts # Mantine integration
```
**Legacy colors (policyengine-app - still used in some projects):**
```javascript
// policyengine-app/src/style/colors.js
TEAL_ACCENT = "#39C6C0" // Old teal (slightly different from v2)
BLUE = "#2C6496" // Old blue
DARK_GRAY = "#616161" // Old dark gray
```
**To see legacy colors:**
```bash
cat policyengine-app/src/style/colors.js
```
**Usage in React (app-v2):**
```typescript
import { colors } from 'designTokens';
<Button style={{ backgroundColor: colors.primary[500] }} />
<Text style={{ color: colors.text.primary }} />
```
**Usage in Python/Plotly (use legacy colors for now):**
```python
# For charts, continue using legacy colors until officially migrated
TEAL_ACCENT = "#39C6C0" # From original app
BLUE_PRIMARY = "#2C6496" # From original app
# Or use app-v2 colors
TEAL_PRIMARY = "#319795" # From app-v2
BLUE_PRIMARY_V2 = "#026AA2" # From app-v2
```
### Typography
**Fonts:**
**For charts (Plotly):**
```python
font=dict(family="Roboto Serif")
```
**For web apps:**
```javascript
// System font stack
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, ...
```
**Loading Google Fonts:**
```html
<!-- In Streamlit or HTML -->
<link href="https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;500;700&family=Roboto+Serif:wght@300;400;500;700&display=swap" rel="stylesheet">
```
### Chart Formatting Function
**Reference implementation:**
```bash
# GiveCalc format_fig function
cat givecalc/ui/visualization.py
# Shows:
# - Roboto Serif font
# - White background
# - Logo placement
# - Margin configuration
```
**Pattern to follow:**
```python
def format_fig(fig: go.Figure) -> go.Figure:
"""Format figure with PolicyEngine branding.
This function is used across PolicyEngine projects to ensure
consistent chart appearance.
"""
# Font
fig.update_layout(
font=dict(family="Roboto Serif", color="black")
)
# Background
fig.update_layout(
template="plotly_white",
plot_bgcolor="white"
)
# Size
fig.update_layout(height=600, width=800)
# Margins (room for logo)
fig.update_layout(
margin=dict(l=50, r=100, t=50, b=120, pad=4)
)
# Logo
fig.add_layout_image(
dict(
source="https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/blue.png",
xref="paper",
yref="paper",
x=1.0,
y=-0.10,
sizex=0.10,
sizey=0.10,
xanchor="right",
yanchor="bottom"
)
)
# Clean modebar
fig.update_layout(
modebar=dict(
bgcolor="rgba(0,0,0,0)",
color="rgba(0,0,0,0)"
)
)
return fig
```
### Streamlit Theme Configuration
**Standard .streamlit/config.toml:**
```toml
[theme]
base = "light"
primaryColor = "#39C6C0" # Teal accent
backgroundColor = "#FFFFFF" # White
secondaryBackgroundColor = "#F7FDFC" # Teal light
textColor = "#616161" # Dark gray
font = "sans serif"
[client]
toolbarMode = "minimal"
showErrorDetails = true
```
**Usage:**
```bash
# Create .streamlit directory in your project
mkdir .streamlit
# Copy configuration
cat > .streamlit/config.toml << 'EOF'
[theme]
base = "light"
primaryColor = "#39C6C0"
backgroundColor = "#FFFFFF"
secondaryBackgroundColor = "#F7FDFC"
textColor = "#616161"
font = "sans serif"
EOF
```
**Current examples:**
```bash
cat givecalc/.streamlit/config.toml
```
## Design Patterns by Project Type
### Streamlit Calculators (GiveCalc, SALT Calculator, etc.)
**Required branding:**
1. ✅ .streamlit/config.toml with PolicyEngine theme
2. ✅ Charts use format_fig() function with logo
3. ✅ Teal accent color for interactive elements
4. ✅ Roboto Serif for charts
**Example:**
```python
import streamlit as st
import plotly.graph_objects as go
# Constants
TEAL_ACCENT = "#39C6C0"
# Streamlit UI
st.title("Calculator Name") # Uses theme colors automatically
st.button("Calculate", type="primary") # Teal accent from theme
# Charts
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, line=dict(color=TEAL_ACCENT)))
fig = format_fig(fig) # Add branding
st.plotly_chart(fig)
```
### Jupyter Notebooks / Analysis Scripts
**Required branding:**
1. ✅ Charts use format_fig() with logo
2. ✅ PolicyEngine color palette
3. ✅ Roboto Serif font
**Example:**
```python
import plotly.graph_objects as go
TEAL_ACCENT = "#39C6C0"
BLUE_PRIMARY = "#2C6496"
fig = go.Figure()
fig.add_trace(go.Scatter(
x=data.income,
y=data.tax_change,
line=dict(color=TEAL_ACCENT, width=3)
))
fig.update_layout(
font=dict(family="Roboto Serif", size=14),
title="Tax impact by income",
xaxis_title="Income",
yaxis_title="Tax change ($)",
plot_bgcolor="white"
)
# Add logo
fig.add_layout_image(...)
```
### React App Components
**Color usage:**
```javascript
import colors from "style/colors";
// Interactive elements
<Button style={{ backgroundColor: colors.TEAL_ACCENT }}>
Click me
</Button>
// Links
<a style={{ color: colors.BLUE }}>Learn more</a>
// Text
<p style={{ color: colors.DARK_GRAY }}>Description</p>
```
**Current colors:**
```bash
cat policyengine-app/src/style/colors.js
```
## Visual Guidelines
### Chart Design Principles
1. **Minimal decoration** - Let data speak
2. **White backgrounds** - Clean, print-friendly
3. **Clear axis labels** - Always include units
4. **Formatted numbers** - Currency ($), percentages (%), etc.
5. **Logo inclusion** - Bottom right, never intrusive
6. **Consistent sizing** - 800x600 standard
7. **Roboto Serif** - Professional, readable font
### Color Usage Rules
**Primary actions:**
- Use TEAL_ACCENT (#39C6C0)
- Buttons, highlights, current selection
**Chart lines:**
- Primary data: TEAL_ACCENT or BLUE_PRIMARY
- Secondary data: BLUE_LIGHT or GRAY
- Negative values: DARK_RED (#b50d0d)
**Backgrounds:**
- Main: WHITE (#FFFFFF)
- Secondary: TEAL_LIGHT (#F7FDFC) or BLUE_98 (#F7FAFD)
- Plot area: WHITE
**Text:**
- Primary: BLACK (#000000)
- Secondary: DARK_GRAY (#616161)
- Muted: GRAY (#808080)
### Accessibility
**Color contrast requirements:**
- Text on background: 4.5:1 minimum (WCAG AA)
- DARK_GRAY on WHITE: ✅ Passes
- TEAL_ACCENT on WHITE: ✅ Passes for large text
- Use sufficient line weights for visibility
**Don't rely on color alone:**
- Use patterns or labels for different data series
- Ensure charts work in grayscale
## Common Branding Tasks
### Task 1: Create Branded Plotly Chart
1. **Define colors:**
```python
TEAL_ACCENT = "#39C6C0"
BLUE_PRIMARY = "#2C6496"
```
2. **Create chart:**
```python
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, line=dict(color=TEAL_ACCENT)))
```
3. **Apply branding:**
```python
fig = format_fig(fig) # See implementation above
```
### Task 2: Setup Streamlit Branding
1. **Create config directory:**
```bash
mkdir .streamlit
```
2. **Copy theme config:**
```bash
cat givecalc/.streamlit/config.toml > .streamlit/config.toml
```
3. **Verify in app:**
```python
import streamlit as st
st.button("Test", type="primary") # Should be teal
```
### Task 3: Brand Consistency Check
**Checklist:**
- [ ] Charts use Roboto Serif font
- [ ] Primary color is TEAL_ACCENT (#39C6C0)
- [ ] Secondary color is BLUE_PRIMARY (#2C6496)
- [ ] White backgrounds
- [ ] Logo in charts (bottom right)
- [ ] Currency formatted with $ and commas
- [ ] Percentages formatted with %
- [ ] Streamlit config.toml uses PolicyEngine theme
## Reference Implementations
### Excellent Examples
**Streamlit calculators:**
```bash
# GiveCalc - Complete example
cat givecalc/ui/visualization.py
cat givecalc/.streamlit/config.toml
# Other calculators
ls salt-amt-calculator/
ls ctc-calculator/
```
**Blog post charts:**
```bash
# Analysis with branded charts
cat policyengine-app/src/posts/articles/harris-eitc.md
cat policyengine-app/src/posts/articles/montana-tax-cuts-2026.md
```
**React app components:**
```bash
# Charts in app
cat policyengine-app/src/pages/policy/output/DistributionalImpact.jsx
```
### Don't Use These
**❌ Wrong colors:**
```python
# Don't use random colors
color = "#FF5733"
color = "red"
color = "green"
```
**❌ Wrong fonts:**
```python
# Don't use other fonts for charts
font = dict(family="Arial")
font = dict(family="Times New Roman")
```
**❌ Missing logo:**
```python
# Don't skip the logo in charts for publication
# All published charts should include PolicyEngine logo
```
## Assets and Resources
### Logo Files
**In policyengine-app repository:**
```bash
policyengine-app/src/images/logos/policyengine/
├── blue.png # Primary logo (light backgrounds)
├── white.png # Logo for dark backgrounds
├── blue.svg # Scalable blue logo
├── white.svg # Scalable white logo
├── banners/ # Banner variations
└── profile/ # Profile/avatar versions
```
**Raw URLs for direct use:**
```python
# Use these URLs in code
LOGO_URL = "https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/blue.png"
```
### Font Files
**Roboto (charts):**
- Google Fonts: https://fonts.google.com/specimen/Roboto
- Family: Roboto Serif
- Weights: 300 (light), 400 (regular), 500 (medium), 700 (bold)
**Loading:**
```html
<link href="https://fonts.googleapis.com/css2?family=Roboto+Serif:wght@300;400;500;700&display=swap" rel="stylesheet">
```
### Color Reference Files
**JavaScript (React app):**
```bash
cat policyengine-app/src/style/colors.js
```
**Python (calculators, analysis):**
```python
# Define in constants.py or at top of file
TEAL_ACCENT = "#39C6C0"
BLUE_PRIMARY = "#2C6496"
DARK_GRAY = "#616161"
WHITE = "#FFFFFF"
```
## Brand Evolution
**Current identity (2025):**
- Teal primary (#39C6C0)
- Blue secondary (#2C6496)
- Roboto Serif for charts
- Minimal, data-focused design
**If brand evolves:**
- Colors defined in policyengine-app/src/style/colors.js are source of truth
- Update this skill to point to current definitions
- Never hardcode - always reference colors.js
## Quick Reference
### Color Codes
| Color | Hex | Usage |
|-------|-----|-------|
| Teal Accent | #39C6C0 | Primary interactive elements |
| Blue Primary | #2C6496 | Secondary, links, charts |
| Dark Gray | #616161 | Body text |
| White | #FFFFFF | Backgrounds |
| Teal Light | #F7FDFC | Secondary backgrounds |
| Dark Red | #b50d0d | Negative values, errors |
### Font Families
| Context | Font |
|---------|------|
| Charts | Roboto Serif |
| Web app | System sans-serif |
| Streamlit | Default sans-serif |
| Code blocks | Monospace |
### Logo URLs
| Background | Format | URL |
|------------|--------|-----|
| Light | PNG | https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/blue.png |
| Light | SVG | https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/blue.svg |
| Dark | PNG | https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/white.png |
| Dark | SVG | https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/white.svg |
## Related Skills
- **policyengine-app-skill** - React component styling
- **policyengine-analysis-skill** - Chart creation patterns
- **policyengine-writing-skill** - Content style (complements visual style)
## Resources
**Brand assets:** PolicyEngine/policyengine-app/src/images/
**Color definitions:** PolicyEngine/policyengine-app/src/style/colors.js
**Examples:** givecalc, salt-amt-calculator, crfb-tob-impacts

View File

@@ -0,0 +1,768 @@
---
name: policyengine-standards
description: PolicyEngine coding standards, formatters, CI requirements, and development best practices
---
# PolicyEngine Standards Skill
Use this skill to ensure code meets PolicyEngine's development standards and passes CI checks.
## When to Use This Skill
- Before committing code to any PolicyEngine repository
- When CI checks fail with linting/formatting errors
- Setting up a new PolicyEngine repository
- Reviewing PRs for standard compliance
- When AI tools generate code that needs standardization
## Critical Requirements
### Python Version
⚠️ **MUST USE Python 3.13** - Do NOT downgrade to older versions
- Check version: `python --version`
- Use `pyproject.toml` to specify version requirements
### Command Execution
⚠️ **ALWAYS use `uv run` for Python commands** - Never use bare `python` or `pytest`
- ✅ Correct: `uv run python script.py`, `uv run pytest tests/`
- ❌ Wrong: `python script.py`, `pytest tests/`
- This ensures correct virtual environment and dependencies
### Documentation (Python Projects)
⚠️ **MUST USE Jupyter Book 2.0 (MyST-NB)** - NOT Jupyter Book 1.x
- Build docs: `myst build docs` (NOT `jb build`)
- Use MyST markdown syntax
## Before Committing - Checklist
1. **Write tests first** (TDD - see below)
2. **Format code**: `make format` or language-specific formatter
3. **Run tests**: `make test` to ensure all tests pass
4. **Check linting**: Ensure no linting errors
5. **Use config files**: Prefer config files over environment variables
6. **Reference issues**: Include "Fixes #123" in commit message
## Creating Pull Requests
### The CI Waiting Problem
**Common failure pattern:**
```
User: "Create a PR and mark it ready when CI passes"
Claude: "I've created the PR as draft. CI will take a while, I'll check back later..."
[Chat ends - Claude never checks back]
Result: PR stays in draft, user has to manually check CI and mark ready
```
### Solution: Use /create-pr Command
**When creating PRs, use the /create-pr command:**
```bash
/create-pr
```
**This command:**
- ✅ Creates PR as draft
- ✅ Actually waits for CI (polls every 15 seconds)
- ✅ Marks ready when CI passes
- ✅ Reports failures with details
- ✅ Handles timeouts gracefully
**Why this works:**
The command contains explicit polling logic that Claude executes, so it actually waits instead of giving up.
### If /create-pr is Not Available
**If the command isn't installed, implement the pattern directly:**
```bash
# 1. Create PR as draft
gh pr create --draft --title "Title" --body "Body"
PR_NUMBER=$(gh pr view --json number --jq '.number')
# 2. Wait for CI (ACTUALLY WAIT - don't give up!)
POLL_INTERVAL=15
ELAPSED=0
while true; do # No timeout - wait as long as needed
CHECKS=$(gh pr checks $PR_NUMBER --json status,conclusion)
TOTAL=$(echo "$CHECKS" | jq '. | length')
COMPLETED=$(echo "$CHECKS" | jq '[.[] | select(.status == "COMPLETED")] | length')
echo "[$ELAPSED s] CI: $COMPLETED/$TOTAL completed"
if [ "$COMPLETED" -eq "$TOTAL" ] && [ "$TOTAL" -gt 0 ]; then
FAILED=$(echo "$CHECKS" | jq '[.[] | select(.conclusion == "FAILURE")] | length')
if [ "$FAILED" -eq 0 ]; then
echo "✅ All CI passed! Marking ready..."
gh pr ready $PR_NUMBER
break
else
echo "❌ CI failed. PR remains draft."
gh pr checks $PR_NUMBER
break
fi
fi
sleep $POLL_INTERVAL
ELAPSED=$((ELAPSED + POLL_INTERVAL))
done
# Important: No timeout! Population simulations can take 30+ minutes.
```
### DO NOT Say "I'll Check Back Later"
**❌ WRONG:**
```
"I've created the PR as draft. CI checks will take a few minutes.
I'll check back later once they complete."
```
**Why wrong:** You cannot check back later. The chat session ends.
**✅ CORRECT:**
```
"I've created the PR as draft. Now polling CI status every 15 seconds..."
[Actually polls using while loop]
"CI checks completed. All passed! Marking PR as ready for review."
```
### When to Create Draft vs Ready
**Always create as draft when:**
- CI checks are configured
- User asks to wait for CI
- Making automated changes
- Unsure if CI will pass
**Create as ready only when:**
- User explicitly requests ready PR
- No CI configured
- CI already verified locally
### PR Workflow Standards
**Standard flow:**
```bash
# 1. Ensure branch is pushed
git push -u origin feature-branch
# 2. Create PR as draft
gh pr create --draft --title "..." --body "..."
# 3. Wait for CI (use polling loop - see pattern above)
# 4. If CI passes:
gh pr ready $PR_NUMBER
# 5. If CI fails:
echo "CI failed. PR remains draft. Fix issues and push again."
```
## Test-Driven Development (TDD)
PolicyEngine follows Test-Driven Development practices across all repositories.
### TDD Workflow
**1. Write test first (RED):**
```python
# tests/test_new_feature.py
def test_california_eitc_calculation():
"""Test California EITC for family with 2 children earning $30,000."""
situation = create_family(income=30000, num_children=2, state="CA")
sim = Simulation(situation=situation)
ca_eitc = sim.calculate("ca_eitc", 2024)[0]
# Test fails initially (feature not implemented yet)
assert ca_eitc == 3000, "CA EITC should be $3,000 for this household"
```
**2. Implement feature (GREEN):**
```python
# policyengine_us/variables/gov/states/ca/tax/income/credits/ca_eitc.py
class ca_eitc(Variable):
value_type = float
entity = TaxUnit
definition_period = YEAR
def formula(tax_unit, period, parameters):
# Implementation to make test pass
federal_eitc = tax_unit("eitc", period)
return federal_eitc * parameters(period).gov.states.ca.tax.eitc.match
```
**3. Refactor (REFACTOR):**
```python
# Clean up, optimize, add documentation
# All while tests continue to pass
```
### TDD Benefits
**Why PolicyEngine uses TDD:**
-**Accuracy** - Tests verify implementation matches regulations
-**Documentation** - Tests show expected behavior
-**Regression prevention** - Changes don't break existing features
-**Confidence** - Safe to refactor
-**Isolation** - Multi-agent workflow (test-creator and rules-engineer work separately)
### TDD in Multi-Agent Workflow
**Country model development:**
1. **@document-collector** gathers regulations
2. **@test-creator** writes tests from regulations (isolated, no implementation access)
3. **@rules-engineer** implements from regulations (isolated, no test access)
4. Both work from same source → tests verify implementation accuracy
**See policyengine-core-skill and country-models agents for details.**
### Test Examples
**Python (pytest):**
```python
def test_ctc_for_two_children():
"""Test CTC calculation for married couple with 2 children."""
situation = create_married_couple(
income_1=75000,
income_2=50000,
num_children=2,
child_ages=[5, 8]
)
sim = Simulation(situation=situation)
ctc = sim.calculate("ctc", 2024)[0]
assert ctc == 4000, "CTC should be $2,000 per child"
```
**React (Jest + RTL):**
```javascript
import { render, screen } from '@testing-library/react';
import TaxCalculator from './TaxCalculator';
test('displays calculated tax', () => {
render(<TaxCalculator income={50000} />);
// Test what user sees, not implementation
expect(screen.getByText(/\$5,000/)).toBeInTheDocument();
});
```
### Test Organization
**Python:**
```
tests/
├── test_variables/
│ ├── test_income.py
│ ├── test_deductions.py
│ └── test_credits.py
├── test_parameters/
└── test_simulations/
```
**React:**
```
src/
├── components/
│ └── TaxCalculator/
│ ├── TaxCalculator.jsx
│ └── TaxCalculator.test.jsx
```
### Running Tests
**Python:**
```bash
# All tests
make test
# With uv
uv run pytest tests/ -v
# Specific test
uv run pytest tests/test_credits.py::test_ctc_for_two_children -v
# With coverage
uv run pytest tests/ --cov=policyengine_us --cov-report=html
```
**React:**
```bash
# All tests
make test
# Watch mode
npm test -- --watch
# Specific test
npm test -- TaxCalculator.test.jsx
# Coverage
npm test -- --coverage
```
### Test Quality Standards
**Good tests:**
- ✅ Test behavior, not implementation
- ✅ Clear, descriptive names
- ✅ Single assertion per test (when possible)
- ✅ Include documentation (docstrings)
- ✅ Based on official regulations with citations
**Bad tests:**
- ❌ Testing private methods
- ❌ Mocking everything
- ❌ No assertion messages
- ❌ Magic numbers without explanation
### Example: TDD for New Feature
```python
# Step 1: Write test (RED)
def test_new_york_empire_state_child_credit():
"""Test NY Empire State Child Credit for family with 1 child.
Based on NY Tax Law Section 606(c-1).
Family earning $50,000 with 1 child under 4 should receive $330.
"""
situation = create_family(
income=50000,
num_children=1,
child_ages=[2],
state="NY"
)
sim = Simulation(situation=situation)
credit = sim.calculate("ny_empire_state_child_credit", 2024)[0]
assert credit == 330, "Should receive $330 for child under 4"
# Test fails - feature doesn't exist yet
# Step 2: Implement (GREEN)
# Create variable in policyengine_us/variables/gov/states/ny/...
# Test passes
# Step 3: Refactor
# Optimize, add documentation, maintain passing tests
```
## Python Standards
### Formatting
- **Formatter**: Black with 79-character line length
- **Command**: `make format` or `black . -l 79`
- **Check without changes**: `black . -l 79 --check`
```bash
# Format all Python files
make format
# Check if formatting is needed (CI-style)
black . -l 79 --check
```
### Code Style
```python
# Imports: Grouped and alphabetized
import os
import sys
from pathlib import Path # stdlib
import numpy as np
import pandas as pd # third-party
from policyengine_us import Simulation # local
# Naming conventions
class TaxCalculator: # CamelCase for classes
pass
def calculate_income_tax(income): # snake_case for functions
annual_income = income * 12 # snake_case for variables
return annual_income
# Type hints (recommended)
def calculate_tax(income: float, state: str) -> float:
"""Calculate state income tax.
Args:
income: Annual income in dollars
state: Two-letter state code
Returns:
Tax liability in dollars
"""
pass
# Error handling - catch specific exceptions
try:
result = simulation.calculate("income_tax", 2024)
except KeyError as e:
raise ValueError(f"Invalid variable name: {e}")
```
### Testing
```python
import pytest
def test_ctc_calculation():
"""Test Child Tax Credit calculation for family with 2 children."""
situation = create_family(income=50000, num_children=2)
sim = Simulation(situation=situation)
ctc = sim.calculate("ctc", 2024)[0]
assert ctc == 4000, "CTC should be $2000 per child"
```
**Run tests:**
```bash
# All tests
make test
# Or with uv
uv run pytest tests/ -v
# Specific test
uv run pytest tests/test_tax.py::test_ctc_calculation -v
# With coverage
uv run pytest tests/ --cov=policyengine_us --cov-report=html
```
## JavaScript/React Standards
### Formatting
- **Formatters**: Prettier + ESLint
- **Command**: `npm run lint -- --fix && npx prettier --write .`
- **CI Check**: `npm run lint -- --max-warnings=0`
```bash
# Format all files
make format
# Or manually
npm run lint -- --fix
npx prettier --write .
# Check if formatting is needed (CI-style)
npm run lint -- --max-warnings=0
```
### Code Style
```javascript
// Use functional components only (no class components)
import { useState, useEffect } from "react";
function TaxCalculator({ income, state }) {
const [tax, setTax] = useState(0);
useEffect(() => {
// Calculate tax when inputs change
calculateTax(income, state).then(setTax);
}, [income, state]);
return (
<div>
<p>Tax: ${tax.toLocaleString()}</p>
</div>
);
}
// File naming
// - Components: PascalCase.jsx (TaxCalculator.jsx)
// - Utilities: camelCase.js (formatCurrency.js)
// Environment config - use config file pattern
// src/config/environment.js
const config = {
API_URL: process.env.NODE_ENV === 'production'
? 'https://api.policyengine.org'
: 'http://localhost:5000'
};
export default config;
```
### React Component Size
- Keep components under 150 lines after formatting
- Extract complex logic into custom hooks
- Split large components into smaller ones
## Version Control Standards
### Changelog Management
**CRITICAL**: For PRs, ONLY modify `changelog_entry.yaml`. NEVER manually update `CHANGELOG.md` or `changelog.yaml`.
**Correct Workflow:**
1. Create `changelog_entry.yaml` at repository root:
```yaml
- bump: patch # or minor, major
changes:
added:
- Description of new feature
fixed:
- Description of bug fix
changed:
- Description of change
```
2. Commit ONLY `changelog_entry.yaml` with your code changes
3. GitHub Actions automatically updates `CHANGELOG.md` and `changelog.yaml` on merge
**DO NOT:**
- ❌ Run `make changelog` manually during PR creation
- ❌ Commit `CHANGELOG.md` or `changelog.yaml` in your PR
- ❌ Modify main changelog files directly
### Git Workflow
1. **Create branches on PolicyEngine repos, NOT forks**
- Forks cause CI failures due to missing secrets
- Request write access if needed
2. **Branch naming**: `feature-name` or `fix-issue-123`
3. **Commit messages**:
```
Add CTC reform analysis for CRFB report
- Implement household-level calculations
- Add state-by-state comparison
- Create visualizations
Fixes #123
```
4. **PR description**: Include "Fixes #123" to auto-close issues
### Common Git Pitfalls
**Never do these:**
- ❌ Force push to main/master
- ❌ Commit secrets or `.env` files
- ❌ Skip hooks with `--no-verify`
- ❌ Create versioned files (app_v2.py, component_new.jsx)
**Always do:**
- ✅ Fix original files in place
- ✅ Run formatters before pushing
- ✅ Reference issue numbers in commits
- ✅ Watch CI after filing PR
## Common AI Pitfalls
Since many PRs are AI-generated, watch for these common mistakes:
### 1. File Versioning
**❌ Wrong:**
```bash
# Creating new versions instead of fixing originals
app_new.py
app_v2.py
component_refactored.jsx
```
**✅ Correct:**
```bash
# Always modify the original file
app.py # Fixed in place
```
### 2. Formatter Not Run
**❌ Wrong:** Committing without formatting (main cause of CI failures)
**✅ Correct:**
```bash
# Python
make format
black . -l 79
# React
npm run lint -- --fix
npx prettier --write .
```
### 3. Environment Variables
**❌ Wrong:**
```javascript
// React env vars without REACT_APP_ prefix
const API_URL = process.env.API_URL; // Won't work!
```
**✅ Correct:**
```javascript
// Use config file pattern instead
import config from './config/environment';
const API_URL = config.API_URL;
```
### 4. Using Wrong Python Version
**❌ Wrong:** Downgrading to Python 3.10 or older
**✅ Correct:** Use Python 3.13 as specified in project requirements
### 5. Manual Changelog Updates
**❌ Wrong:** Running `make changelog` and committing `CHANGELOG.md`
**✅ Correct:** Only create `changelog_entry.yaml` in PR
## Repository Setup Patterns
### Python Package Structure
```
policyengine-package/
├── policyengine_package/
│ ├── __init__.py
│ ├── core/
│ ├── calculations/
│ └── utils/
├── tests/
│ ├── test_calculations.py
│ └── test_core.py
├── pyproject.toml
├── Makefile
├── CLAUDE.md
├── CHANGELOG.md
└── README.md
```
### React App Structure
```
policyengine-app/
├── src/
│ ├── components/
│ ├── pages/
│ ├── config/
│ │ └── environment.js
│ └── App.jsx
├── public/
├── package.json
├── .eslintrc.json
├── .prettierrc
└── README.md
```
## Makefile Commands
Standard commands across PolicyEngine repos:
```bash
make install # Install dependencies
make test # Run tests
make format # Format code
make changelog # Update changelog (automation only, not manual)
make debug # Start dev server (apps)
make build # Production build (apps)
```
## CI Stability
### Common CI Issues
**1. Fork PRs Fail**
- **Problem**: PRs from forks don't have access to repository secrets
- **Solution**: Create branches directly on PolicyEngine repos
**2. GitHub API Rate Limits**
- **Problem**: Smoke tests fail with 403 errors
- **Solution**: Re-run failed jobs (different runners have different limits)
**3. Linting Failures**
- **Problem**: Code not formatted before commit
- **Solution**: Always run `make format` before committing
**4. Test Failures in CI but Pass Locally**
- **Problem**: Missing `uv run` prefix
- **Solution**: Use `uv run pytest` instead of `pytest`
## Best Practices Checklist
### Code Quality
- [ ] Code formatted with Black (Python) or Prettier (JS)
- [ ] No linting errors
- [ ] All tests pass
- [ ] Type hints added (Python, where applicable)
- [ ] Docstrings for public functions/classes
- [ ] Error handling with specific exceptions
### Version Control
- [ ] Only `changelog_entry.yaml` created (not CHANGELOG.md)
- [ ] Commit message references issue number
- [ ] Branch created on PolicyEngine repo (not fork)
- [ ] No secrets or .env files committed
- [ ] Original files modified (no _v2 or _new files)
### Testing
- [ ] Tests written for new functionality
- [ ] Tests pass locally with `make test`
- [ ] Coverage maintained or improved
- [ ] Edge cases handled
### Documentation
- [ ] README updated if needed
- [ ] Code comments for complex logic
- [ ] API documentation updated if needed
- [ ] Examples provided for new features
## Quick Reference
### Format Commands by Language
**Python:**
```bash
make format # Format code
black . -l 79 --check # Check formatting
uv run pytest tests/ -v # Run tests
```
**React:**
```bash
make format # Format code
npm run lint -- --max-warnings=0 # Check linting
npm test # Run tests
```
### Pre-Commit Checklist
```bash
# 1. Format
make format
# 2. Test
make test
# 3. Check linting
# Python: black . -l 79 --check
# React: npm run lint -- --max-warnings=0
# 4. Stage and commit
git add .
git commit -m "Description
Fixes #123"
# 5. Push and watch CI
git push
```
## Resources
- **Main CLAUDE.md**: `/PolicyEngine/CLAUDE.md`
- **Python Style**: PEP 8, Black documentation
- **React Style**: Airbnb React/JSX Style Guide
- **Testing**: pytest documentation, Jest/RTL documentation
- **Writing Style**: See policyengine-writing-skill for blog posts, PR descriptions, and documentation
## Examples
See PolicyEngine repositories for examples of standard-compliant code:
- **policyengine-us**: Python package standards
- **policyengine-app**: React app standards
- **givecalc**: Streamlit app standards
- **crfb-tob-impacts**: Analysis repository standards

View File

@@ -0,0 +1,660 @@
---
name: policyengine-uk
description: PolicyEngine-UK tax and benefit microsimulation patterns, situation creation, and common workflows
---
# PolicyEngine-UK
PolicyEngine-UK models the UK tax and benefit system, including devolved variations for Scotland and Wales.
## For Users 👥
### What is PolicyEngine-UK?
PolicyEngine-UK is the "calculator" for UK taxes and benefits. When you use policyengine.org/uk, PolicyEngine-UK runs behind the scenes.
**What it models:**
**Direct taxes:**
- Income tax (UK-wide, Scottish, and Welsh variations)
- National Insurance (Classes 1, 2, 4)
- Capital gains tax
- Dividend tax
**Property and transaction taxes:**
- Council Tax
- Stamp Duty Land Tax (England/NI)
- Land and Buildings Transaction Tax (Scotland)
- Land Transaction Tax (Wales)
**Universal Credit:**
- Standard allowance
- Child elements
- Housing cost element
- Childcare costs element
- Carer element
- Work capability elements
**Legacy benefits (being phased out):**
- Working Tax Credit
- Child Tax Credit
- Income Support
- Income-based JSA/ESA
- Housing Benefit
**Other benefits:**
- Child Benefit
- Pension Credit
- Personal Independence Payment (PIP)
- Disability Living Allowance (DLA)
- Attendance Allowance
- State Pension
**See full list:** https://policyengine.org/uk/parameters
### Understanding Variables
When you see results in PolicyEngine, these are variables:
**Income variables:**
- `employment_income` - Gross employment earnings/salary
- `self_employment_income` - Self-employment profits
- `pension_income` - Private pension income
- `property_income` - Rental income
- `savings_interest_income` - Interest from savings
- `dividend_income` - Dividend income
**Tax variables:**
- `income_tax` - Total income tax liability
- `national_insurance` - Total NI contributions
- `council_tax` - Council tax liability
**Benefit variables:**
- `universal_credit` - Universal Credit amount
- `child_benefit` - Child Benefit amount
- `pension_credit` - Pension Credit amount
- `working_tax_credit` - Working Tax Credit (legacy)
- `child_tax_credit` - Child Tax Credit (legacy)
**Summary variables:**
- `household_net_income` - Income after taxes and benefits
- `disposable_income` - Income after taxes
- `equivalised_household_net_income` - Adjusted for household size
## For Analysts 📊
### Installation and Setup
```bash
# Install PolicyEngine-UK
pip install policyengine-uk
# Or with uv (recommended)
uv pip install policyengine-uk
```
### Quick Start
```python
from policyengine_uk import Simulation
# Create a household
situation = {
"people": {
"person": {
"age": {2025: 30},
"employment_income": {2025: 30000}
}
},
"benunits": {
"benunit": {
"members": ["person"]
}
},
"households": {
"household": {
"members": ["person"],
"region": {2025: "LONDON"}
}
}
}
# Calculate taxes and benefits
sim = Simulation(situation=situation)
income_tax = sim.calculate("income_tax", 2025)[0]
universal_credit = sim.calculate("universal_credit", 2025)[0]
print(f"Income tax: £{income_tax:,.0f}")
print(f"Universal Credit: £{universal_credit:,.0f}")
```
### Web App to Python
**Web app URL:**
```
policyengine.org/uk/household?household=12345
```
**Equivalent Python (conceptually):**
The household ID represents a situation dictionary. To replicate in Python, you'd create a similar situation.
### When to Use This Skill
- Creating household situations for tax/benefit calculations
- Running microsimulations with PolicyEngine-UK
- Analyzing policy reforms and their impacts
- Building tools that use PolicyEngine-UK (calculators, analysis notebooks)
- Debugging PolicyEngine-UK calculations
## For Contributors 💻
### Repository
**Location:** PolicyEngine/policyengine-uk
**To see current implementation:**
```bash
git clone https://github.com/PolicyEngine/policyengine-uk
cd policyengine-uk
# Explore structure
tree policyengine_uk/
```
**Key directories:**
```bash
ls policyengine_uk/
# - variables/ - Tax and benefit calculations
# - parameters/ - Policy rules (YAML)
# - reforms/ - Pre-defined reforms
# - tests/ - Test cases
```
## Core Concepts
### 1. Situation Dictionary Structure
PolicyEngine UK requires a nested dictionary defining household composition:
```python
situation = {
"people": {
"person_id": {
"age": {2025: 35},
"employment_income": {2025: 30000},
# ... other person attributes
}
},
"benunits": {
"benunit_id": {
"members": ["person_id", ...]
}
},
"households": {
"household_id": {
"members": ["person_id", ...],
"region": {2025: "SOUTH_EAST"}
}
}
}
```
**Key Rules:**
- All entities must have consistent member lists
- Use year keys for all values: `{2025: value}`
- Region must be one of the ITL 1 regions (see below)
- All monetary values in pounds (not pence)
- UK tax year runs April 6 to April 5 (but use calendar year in code)
**Important Entity Difference:**
- UK uses **benunits** (benefit units): a single adult OR couple + dependent children
- This is the assessment unit for most means-tested benefits
- Unlike US which uses families/marital_units/tax_units/spm_units
### 2. Creating Simulations
```python
from policyengine_uk import Simulation
# Create simulation from situation
simulation = Simulation(situation=situation)
# Calculate variables
income_tax = simulation.calculate("income_tax", 2025)
universal_credit = simulation.calculate("universal_credit", 2025)
household_net_income = simulation.calculate("household_net_income", 2025)
```
**Common Variables:**
**Income:**
- `employment_income` - Gross employment earnings
- `self_employment_income` - Self-employment profits
- `pension_income` - Private pension income
- `property_income` - Rental income
- `savings_interest_income` - Interest income
- `dividend_income` - Dividend income
- `miscellaneous_income` - Other income sources
**Tax Outputs:**
- `income_tax` - Total income tax liability
- `national_insurance` - Total NI contributions
- `council_tax` - Council tax liability
- `VAT` - Value Added Tax paid
**Benefits:**
- `universal_credit` - Universal Credit
- `child_benefit` - Child Benefit
- `pension_credit` - Pension Credit
- `working_tax_credit` - Working Tax Credit (legacy)
- `child_tax_credit` - Child Tax Credit (legacy)
- `personal_independence_payment` - PIP
- `attendance_allowance` - Attendance Allowance
- `state_pension` - State Pension
**Summary:**
- `household_net_income` - Income after taxes and benefits
- `disposable_income` - Income after taxes
- `equivalised_household_net_income` - Adjusted for household size
### 3. Using Axes for Parameter Sweeps
To vary a parameter across multiple values:
```python
situation = {
# ... normal situation setup ...
"axes": [[{
"name": "employment_income",
"count": 1001,
"min": 0,
"max": 100000,
"period": 2025
}]]
}
simulation = Simulation(situation=situation)
# Now calculate() returns arrays of 1001 values
incomes = simulation.calculate("employment_income", 2025) # Array of 1001 values
taxes = simulation.calculate("income_tax", 2025) # Array of 1001 values
```
**Important:** Remove axes before creating single-point simulations:
```python
situation_single = situation.copy()
situation_single.pop("axes", None)
simulation = Simulation(situation=situation_single)
```
### 4. Policy Reforms
```python
from policyengine_uk import Simulation
# Define a reform (modifies parameters)
reform = {
"gov.hmrc.income_tax.rates.uk.brackets[0].rate": {
"2025-01-01.2100-12-31": 0.25 # Increase basic rate to 25%
}
}
# Create simulation with reform
simulation = Simulation(situation=situation, reform=reform)
```
## Common Patterns
### Pattern 1: Single Person Household Calculation
```python
from policyengine_uk import Simulation
situation = {
"people": {
"person": {
"age": {2025: 30},
"employment_income": {2025: 30000}
}
},
"benunits": {
"benunit": {
"members": ["person"]
}
},
"households": {
"household": {
"members": ["person"],
"region": {2025: "LONDON"}
}
}
}
sim = Simulation(situation=situation)
income_tax = sim.calculate("income_tax", 2025)[0]
national_insurance = sim.calculate("national_insurance", 2025)[0]
universal_credit = sim.calculate("universal_credit", 2025)[0]
```
### Pattern 2: Couple with Children
```python
situation = {
"people": {
"parent_1": {
"age": {2025: 35},
"employment_income": {2025: 35000}
},
"parent_2": {
"age": {2025: 33},
"employment_income": {2025: 25000}
},
"child_1": {
"age": {2025: 8}
},
"child_2": {
"age": {2025: 5}
}
},
"benunits": {
"benunit": {
"members": ["parent_1", "parent_2", "child_1", "child_2"]
}
},
"households": {
"household": {
"members": ["parent_1", "parent_2", "child_1", "child_2"],
"region": {2025: "NORTH_WEST"}
}
}
}
sim = Simulation(situation=situation)
child_benefit = sim.calculate("child_benefit", 2025)[0]
universal_credit = sim.calculate("universal_credit", 2025)[0]
```
### Pattern 3: Marginal Tax Rate Analysis
```python
# Create baseline with axes varying income
situation_with_axes = {
"people": {
"person": {
"age": {2025: 30}
}
},
"benunits": {"benunit": {"members": ["person"]}},
"households": {
"household": {
"members": ["person"],
"region": {2025: "LONDON"}
}
},
"axes": [[{
"name": "employment_income",
"count": 1001,
"min": 0,
"max": 100000,
"period": 2025
}]]
}
sim = Simulation(situation=situation_with_axes)
incomes = sim.calculate("employment_income", 2025)
net_incomes = sim.calculate("household_net_income", 2025)
# Calculate marginal tax rate
import numpy as np
mtr = 1 - (np.gradient(net_incomes) / np.gradient(incomes))
```
### Pattern 4: Regional Comparison
```python
regions = ["LONDON", "SCOTLAND", "WALES", "NORTH_EAST"]
results = {}
for region in regions:
situation = create_situation(region=region, income=30000)
sim = Simulation(situation=situation)
results[region] = {
"income_tax": sim.calculate("income_tax", 2025)[0],
"national_insurance": sim.calculate("national_insurance", 2025)[0],
"total_tax": sim.calculate("income_tax", 2025)[0] +
sim.calculate("national_insurance", 2025)[0]
}
```
### Pattern 5: Policy Reform Impact
```python
from policyengine_uk import Microsimulation, Reform
# Define reform: Increase basic rate to 25%
class IncreaseBasicRate(Reform):
def apply(self):
def modify_parameters(parameters):
parameters.gov.hmrc.income_tax.rates.uk.brackets[0].rate.update(
period="year:2025:10", value=0.25
)
return parameters
self.modify_parameters(modify_parameters)
# Run microsimulation
baseline = Microsimulation()
reformed = Microsimulation(reform=IncreaseBasicRate)
# Calculate revenue impact
baseline_revenue = baseline.calc("income_tax", 2025).sum()
reformed_revenue = reformed.calc("income_tax", 2025).sum()
revenue_change = (reformed_revenue - baseline_revenue) / 1e9 # in billions
# Calculate household impact
baseline_net_income = baseline.calc("household_net_income", 2025)
reformed_net_income = reformed.calc("household_net_income", 2025)
```
## Helper Scripts
This skill includes helper scripts in the `scripts/` directory:
```python
from policyengine_uk_skills.situation_helpers import (
create_single_person,
create_couple,
create_family_with_children,
add_region
)
# Quick situation creation
situation = create_single_person(
income=30000,
region="LONDON",
age=30
)
# Create couple
situation = create_couple(
income_1=35000,
income_2=25000,
region="SCOTLAND"
)
```
## Common Pitfalls and Solutions
### Pitfall 1: Member Lists Out of Sync
**Problem:** Different entities have different members
```python
# WRONG
"benunits": {"benunit": {"members": ["parent"]}},
"households": {"household": {"members": ["parent", "child"]}}
```
**Solution:** Keep all entity member lists consistent:
```python
# CORRECT
all_members = ["parent", "child"]
"benunits": {"benunit": {"members": all_members}},
"households": {"household": {"members": all_members}}
```
### Pitfall 2: Forgetting Year Keys
**Problem:** `"age": 35` instead of `"age": {2025: 35}`
**Solution:** Always use year dictionary:
```python
"age": {2025: 35},
"employment_income": {2025: 30000}
```
### Pitfall 3: Wrong Region Format
**Problem:** Using lowercase or incorrect region names
**Solution:** Use uppercase ITL 1 region codes:
```python
# CORRECT regions:
"region": {2025: "LONDON"}
"region": {2025: "SCOTLAND"}
"region": {2025: "WALES"}
"region": {2025: "NORTH_EAST"}
"region": {2025: "SOUTH_EAST"}
```
### Pitfall 4: Axes Persistence
**Problem:** Axes remain in situation when creating single-point simulation
**Solution:** Remove axes before single-point simulation:
```python
situation_single = situation.copy()
situation_single.pop("axes", None)
```
### Pitfall 5: Missing Benunits
**Problem:** Forgetting to include benunits (benefit units)
**Solution:** Always include benunits in UK simulations:
```python
# UK requires benunits
situation = {
"people": {...},
"benunits": {"benunit": {"members": [...]}}, # Required!
"households": {...}
}
```
## Regions in PolicyEngine UK
UK uses ITL 1 (International Territorial Level 1, formerly NUTS 1) regions:
**Regions:**
- `NORTH_EAST` - North East England
- `NORTH_WEST` - North West England
- `YORKSHIRE` - Yorkshire and the Humber
- `EAST_MIDLANDS` - East Midlands
- `WEST_MIDLANDS` - West Midlands
- `EAST_OF_ENGLAND` - East of England
- `LONDON` - London
- `SOUTH_EAST` - South East England
- `SOUTH_WEST` - South West England
- `WALES` - Wales
- `SCOTLAND` - Scotland
- `NORTHERN_IRELAND` - Northern Ireland
**Regional Tax Variations:**
**Scotland:**
- Has devolved income tax with 6 bands (starter 19%, basic 20%, intermediate 21%, higher 42%, advanced 45%, top 47%)
- Scottish residents automatically calculated with Scottish rates
**Wales:**
- Has Welsh Rate of Income Tax (WRIT)
- Currently maintains parity with England/NI rates
**England/Northern Ireland:**
- Standard UK rates: basic 20%, higher 40%, additional 45%
## Key Parameters and Values (2025/26)
### Income Tax
- **Personal Allowance:** £12,570
- **Basic rate threshold:** £50,270
- **Higher rate threshold:** £125,140
- **Rates:** 20% (basic), 40% (higher), 45% (additional)
- **Personal allowance tapering:** £1 reduction for every £2 over £100,000
### National Insurance (Class 1)
- **Lower Earnings Limit:** £6,396/year
- **Primary Threshold:** £12,570/year
- **Upper Earnings Limit:** £50,270/year
- **Rates:** 12% (between primary and upper), 2% (above upper)
### Universal Credit
- **Standard allowance:** Varies by single/couple and age
- **Taper rate:** 55% (rate at which UC reduced as income increases)
- **Work allowance:** Amount you can earn before UC reduced
### Child Benefit
- **First child:** Higher rate
- **Subsequent children:** Lower rate
- **High Income Charge:** Tapered withdrawal starting at £60,000
## Version Compatibility
- Use `policyengine-uk>=1.0.0` for 2025 calculations
- Check version: `import policyengine_uk; print(policyengine_uk.__version__)`
- Different years may require different package versions
## Debugging Tips
1. **Enable tracing:**
```python
simulation.trace = True
result = simulation.calculate("variable_name", 2025)
```
2. **Check intermediate calculations:**
```python
gross_income = simulation.calculate("gross_income", 2025)
disposable_income = simulation.calculate("disposable_income", 2025)
```
3. **Verify situation structure:**
```python
import json
print(json.dumps(situation, indent=2))
```
4. **Test with PolicyEngine web app:**
- Go to policyengine.org/uk/household
- Enter same inputs
- Compare results
## Additional Resources
- **Documentation:** https://policyengine.org/uk/docs
- **API Reference:** https://github.com/PolicyEngine/policyengine-uk
- **Variable Explorer:** https://policyengine.org/uk/variables
- **Parameter Explorer:** https://policyengine.org/uk/parameters
## Examples Directory
See `examples/` for complete working examples:
- `single_person.yaml` - Single person household
- `couple.yaml` - Couple without children
- `family_with_children.yaml` - Family with dependents
- `universal_credit_sweep.yaml` - Analyzing UC with axes
## Key Differences from US System
1. **Benefit Units:** UK uses `benunits` (single/couple + children) instead of US multiple entity types
2. **Universal Credit:** Consolidated means-tested benefit (vs separate SNAP, TANF, etc. in US)
3. **National Insurance:** Separate from income tax with own thresholds (vs US Social Security tax)
4. **Devolved Taxes:** Scotland and Wales have different income tax rates
5. **Tax Year:** April 6 to April 5 (vs calendar year in US)
6. **No State Variation:** Council Tax is local, but most taxes/benefits are national (vs 50 US states)

View File

@@ -0,0 +1,29 @@
# Example: Couple without children in Scotland
# Person 1: £35,000, Age: 35
# Person 2: £25,000, Age: 33
people:
person_1:
age:
2025: 35
employment_income:
2025: 35000
person_2:
age:
2025: 33
employment_income:
2025: 25000
benunits:
benunit:
members:
- person_1
- person_2
households:
household:
members:
- person_1
- person_2
region:
2025: SCOTLAND

View File

@@ -0,0 +1,41 @@
# Example: Family with children in Wales
# Parent 1: £35,000, Age: 35
# Parent 2: £25,000, Age: 33
# Child 1: Age 8
# Child 2: Age 5
people:
parent_1:
age:
2025: 35
employment_income:
2025: 35000
parent_2:
age:
2025: 33
employment_income:
2025: 25000
child_1:
age:
2025: 8
child_2:
age:
2025: 5
benunits:
benunit:
members:
- parent_1
- parent_2
- child_1
- child_2
households:
household:
members:
- parent_1
- parent_2
- child_1
- child_2
region:
2025: WALES

View File

@@ -0,0 +1,21 @@
# Example: Single person household in London
# Income: £30,000, Age: 30
people:
person:
age:
2025: 30
employment_income:
2025: 30000
benunits:
benunit:
members:
- person
households:
household:
members:
- person
region:
2025: LONDON

View File

@@ -0,0 +1,38 @@
# Example: Analyzing Universal Credit with income variation
# Single parent with 2 children in North West
# Sweeps employment income from £0 to £50,000
people:
parent:
age:
2025: 30
child_1:
age:
2025: 8
child_2:
age:
2025: 5
benunits:
benunit:
members:
- parent
- child_1
- child_2
households:
household:
members:
- parent
- child_1
- child_2
region:
2025: NORTH_WEST
# Axes: Vary employment income from £0 to £50,000
axes:
- - name: employment_income
count: 1001
min: 0
max: 50000
period: 2025

View File

@@ -0,0 +1,339 @@
"""
Helper functions for creating PolicyEngine-UK situations.
These utilities simplify the creation of situation dictionaries
for common household configurations.
"""
CURRENT_YEAR = 2025
# UK ITL 1 regions
VALID_REGIONS = [
"NORTH_EAST",
"NORTH_WEST",
"YORKSHIRE",
"EAST_MIDLANDS",
"WEST_MIDLANDS",
"EAST_OF_ENGLAND",
"LONDON",
"SOUTH_EAST",
"SOUTH_WEST",
"WALES",
"SCOTLAND",
"NORTHERN_IRELAND"
]
def create_single_person(income, region="LONDON", age=30, **kwargs):
"""
Create a situation for a single person household.
Args:
income (float): Employment income
region (str): ITL 1 region (e.g., "LONDON", "SCOTLAND")
age (int): Person's age
**kwargs: Additional person attributes (e.g., self_employment_income)
Returns:
dict: PolicyEngine situation dictionary
"""
if region not in VALID_REGIONS:
raise ValueError(f"Invalid region. Must be one of: {', '.join(VALID_REGIONS)}")
person_attrs = {
"age": {CURRENT_YEAR: age},
"employment_income": {CURRENT_YEAR: income},
}
person_attrs.update({k: {CURRENT_YEAR: v} for k, v in kwargs.items()})
return {
"people": {"person": person_attrs},
"benunits": {"benunit": {"members": ["person"]}},
"households": {
"household": {
"members": ["person"],
"region": {CURRENT_YEAR: region}
}
}
}
def create_couple(
income_1, income_2=0, region="LONDON", age_1=35, age_2=35, **kwargs
):
"""
Create a situation for a couple without children.
Args:
income_1 (float): First person's employment income
income_2 (float): Second person's employment income
region (str): ITL 1 region
age_1 (int): First person's age
age_2 (int): Second person's age
**kwargs: Additional household attributes
Returns:
dict: PolicyEngine situation dictionary
"""
if region not in VALID_REGIONS:
raise ValueError(f"Invalid region. Must be one of: {', '.join(VALID_REGIONS)}")
members = ["person_1", "person_2"]
household_attrs = {
"members": members,
"region": {CURRENT_YEAR: region}
}
household_attrs.update({k: {CURRENT_YEAR: v} for k, v in kwargs.items()})
return {
"people": {
"person_1": {
"age": {CURRENT_YEAR: age_1},
"employment_income": {CURRENT_YEAR: income_1}
},
"person_2": {
"age": {CURRENT_YEAR: age_2},
"employment_income": {CURRENT_YEAR: income_2}
}
},
"benunits": {"benunit": {"members": members}},
"households": {"household": household_attrs}
}
def create_family_with_children(
parent_income,
num_children=1,
child_ages=None,
region="LONDON",
parent_age=35,
couple=False,
partner_income=0,
**kwargs
):
"""
Create a situation for a family with children.
Args:
parent_income (float): Primary parent's employment income
num_children (int): Number of children
child_ages (list): List of child ages (defaults to [5, 8, 12, ...])
region (str): ITL 1 region
parent_age (int): Parent's age
couple (bool): Whether this is a couple household
partner_income (float): Partner's income if couple
**kwargs: Additional household attributes
Returns:
dict: PolicyEngine situation dictionary
"""
if region not in VALID_REGIONS:
raise ValueError(f"Invalid region. Must be one of: {', '.join(VALID_REGIONS)}")
if child_ages is None:
child_ages = [5 + i * 3 for i in range(num_children)]
elif len(child_ages) != num_children:
raise ValueError("Length of child_ages must match num_children")
people = {
"parent": {
"age": {CURRENT_YEAR: parent_age},
"employment_income": {CURRENT_YEAR: parent_income}
}
}
members = ["parent"]
if couple:
people["partner"] = {
"age": {CURRENT_YEAR: parent_age},
"employment_income": {CURRENT_YEAR: partner_income}
}
members.append("partner")
for i, age in enumerate(child_ages):
child_id = f"child_{i+1}"
people[child_id] = {"age": {CURRENT_YEAR: age}}
members.append(child_id)
household_attrs = {
"members": members,
"region": {CURRENT_YEAR: region}
}
household_attrs.update({k: {CURRENT_YEAR: v} for k, v in kwargs.items()})
return {
"people": people,
"benunits": {"benunit": {"members": members}},
"households": {"household": household_attrs}
}
def add_income_sources(
situation,
person_id=None,
self_employment_income=0,
pension_income=0,
property_income=0,
savings_interest_income=0,
dividend_income=0,
miscellaneous_income=0
):
"""
Add additional income sources to a person in an existing situation.
Args:
situation (dict): Existing PolicyEngine situation
person_id (str): Person ID to add income to (defaults to first person)
self_employment_income (float): Self-employment income
pension_income (float): Private pension income
property_income (float): Rental income
savings_interest_income (float): Interest income
dividend_income (float): Dividend income
miscellaneous_income (float): Other income
Returns:
dict: Updated situation with additional income
"""
# Get person ID
if person_id is None:
person_id = list(situation["people"].keys())[0]
# Add income sources
if self_employment_income > 0:
situation["people"][person_id]["self_employment_income"] = {
CURRENT_YEAR: self_employment_income
}
if pension_income > 0:
situation["people"][person_id]["pension_income"] = {
CURRENT_YEAR: pension_income
}
if property_income > 0:
situation["people"][person_id]["property_income"] = {
CURRENT_YEAR: property_income
}
if savings_interest_income > 0:
situation["people"][person_id]["savings_interest_income"] = {
CURRENT_YEAR: savings_interest_income
}
if dividend_income > 0:
situation["people"][person_id]["dividend_income"] = {
CURRENT_YEAR: dividend_income
}
if miscellaneous_income > 0:
situation["people"][person_id]["miscellaneous_income"] = {
CURRENT_YEAR: miscellaneous_income
}
return situation
def add_axes(situation, variable_name, min_val, max_val, count=1001):
"""
Add axes to a situation for parameter sweeps.
Args:
situation (dict): Existing PolicyEngine situation
variable_name (str): Variable to vary (e.g., "employment_income")
min_val (float): Minimum value
max_val (float): Maximum value
count (int): Number of points (default: 1001)
Returns:
dict: Updated situation with axes
"""
situation["axes"] = [[{
"name": variable_name,
"count": count,
"min": min_val,
"max": max_val,
"period": CURRENT_YEAR
}]]
return situation
def set_region(situation, region):
"""
Set or change the region for a household.
Args:
situation (dict): Existing PolicyEngine situation
region (str): ITL 1 region (e.g., "LONDON", "SCOTLAND")
Returns:
dict: Updated situation
"""
if region not in VALID_REGIONS:
raise ValueError(f"Invalid region. Must be one of: {', '.join(VALID_REGIONS)}")
household_id = list(situation["households"].keys())[0]
situation["households"][household_id]["region"] = {CURRENT_YEAR: region}
return situation
def create_pensioner_household(
pension_income,
state_pension_income=0,
region="LONDON",
age=70,
couple=False,
partner_pension_income=0,
partner_age=68,
**kwargs
):
"""
Create a situation for a pensioner household.
Args:
pension_income (float): Private pension income
state_pension_income (float): State pension income
region (str): ITL 1 region
age (int): Pensioner's age
couple (bool): Whether this is a couple household
partner_pension_income (float): Partner's pension income if couple
partner_age (int): Partner's age if couple
**kwargs: Additional household attributes
Returns:
dict: PolicyEngine situation dictionary
"""
if region not in VALID_REGIONS:
raise ValueError(f"Invalid region. Must be one of: {', '.join(VALID_REGIONS)}")
people = {
"pensioner": {
"age": {CURRENT_YEAR: age},
"pension_income": {CURRENT_YEAR: pension_income},
"state_pension": {CURRENT_YEAR: state_pension_income}
}
}
members = ["pensioner"]
if couple:
people["partner"] = {
"age": {CURRENT_YEAR: partner_age},
"pension_income": {CURRENT_YEAR: partner_pension_income},
"state_pension": {CURRENT_YEAR: 0}
}
members.append("partner")
household_attrs = {
"members": members,
"region": {CURRENT_YEAR: region}
}
household_attrs.update({k: {CURRENT_YEAR: v} for k, v in kwargs.items()})
return {
"people": people,
"benunits": {"benunit": {"members": members}},
"households": {"household": household_attrs}
}

View File

@@ -0,0 +1,524 @@
---
name: policyengine-us
description: PolicyEngine-US tax and benefit microsimulation patterns, situation creation, and common workflows
---
# PolicyEngine-US
PolicyEngine-US models the US federal and state tax and benefit system.
## For Users 👥
### What is PolicyEngine-US?
PolicyEngine-US is the "calculator" for US taxes and benefits. When you use policyengine.org/us, PolicyEngine-US runs behind the scenes.
**What it models:**
**Federal taxes:**
- Income tax (with standard/itemized deductions)
- Payroll tax (Social Security, Medicare)
- Capital gains tax
**Federal benefits:**
- Earned Income Tax Credit (EITC)
- Child Tax Credit (CTC)
- SNAP (food stamps)
- WIC, ACA premium tax credits
- Social Security, SSI, TANF
**State programs (varies by state):**
- State income tax (all 50 states + DC)
- State EITC, CTC
- State-specific benefits
**See full list:** https://policyengine.org/us/parameters
### Understanding Variables
When you see results in PolicyEngine, these are variables:
**Income variables:**
- `employment_income` - W-2 wages
- `self_employment_income` - 1099 income
- `qualified_dividend_income` - Dividends
- `capital_gains` - Capital gains
**Tax variables:**
- `income_tax` - Federal income tax
- `state_income_tax` - State income tax
- `payroll_tax` - FICA taxes
**Benefit variables:**
- `eitc` - Earned Income Tax Credit
- `ctc` - Child Tax Credit
- `snap` - SNAP benefits
**Summary variables:**
- `household_net_income` - Income after taxes and benefits
- `household_tax` - Total taxes
- `household_benefits` - Total benefits
## For Analysts 📊
### Installation and Setup
```bash
# Install PolicyEngine-US
pip install policyengine-us
# Or with uv (recommended)
uv pip install policyengine-us
```
### Quick Start
```python
from policyengine_us import Simulation
# Create a household
situation = {
"people": {
"you": {
"age": {2024: 30},
"employment_income": {2024: 50000}
}
},
"families": {"family": {"members": ["you"]}},
"marital_units": {"marital_unit": {"members": ["you"]}},
"tax_units": {"tax_unit": {"members": ["you"]}},
"spm_units": {"spm_unit": {"members": ["you"]}},
"households": {
"household": {
"members": ["you"],
"state_name": {2024: "CA"}
}
}
}
# Calculate taxes and benefits
sim = Simulation(situation=situation)
income_tax = sim.calculate("income_tax", 2024)[0]
eitc = sim.calculate("eitc", 2024)[0]
print(f"Income tax: ${income_tax:,.0f}")
print(f"EITC: ${eitc:,.0f}")
```
### Web App to Python
**Web app URL:**
```
policyengine.org/us/household?household=12345
```
**Equivalent Python (conceptually):**
The household ID represents a situation dictionary. To replicate in Python, you'd create a similar situation.
### When to Use This Skill
- Creating household situations for tax/benefit calculations
- Running microsimulations with PolicyEngine-US
- Analyzing policy reforms and their impacts
- Building tools that use PolicyEngine-US (calculators, analysis notebooks)
- Debugging PolicyEngine-US calculations
## For Contributors 💻
### Repository
**Location:** PolicyEngine/policyengine-us
**To see current implementation:**
```bash
git clone https://github.com/PolicyEngine/policyengine-us
cd policyengine-us
# Explore structure
tree policyengine_us/
```
**Key directories:**
```bash
ls policyengine_us/
# - variables/ - Tax and benefit calculations
# - parameters/ - Policy rules (YAML)
# - reforms/ - Pre-defined reforms
# - tests/ - Test cases
```
## Core Concepts
### 1. Situation Dictionary Structure
PolicyEngine requires a nested dictionary defining household composition and characteristics:
```python
situation = {
"people": {
"person_id": {
"age": {2024: 35},
"employment_income": {2024: 50000},
# ... other person attributes
}
},
"families": {
"family_id": {"members": ["person_id", ...]}
},
"marital_units": {
"marital_unit_id": {"members": ["person_id", ...]}
},
"tax_units": {
"tax_unit_id": {"members": ["person_id", ...]}
},
"spm_units": {
"spm_unit_id": {"members": ["person_id", ...]}
},
"households": {
"household_id": {
"members": ["person_id", ...],
"state_name": {2024: "CA"}
}
}
}
```
**Key Rules:**
- All entities must have consistent member lists
- Use year keys for all values: `{2024: value}`
- State must be two-letter code (e.g., "CA", "NY", "TX")
- All monetary values in dollars (not cents)
### 2. Creating Simulations
```python
from policyengine_us import Simulation
# Create simulation from situation
simulation = Simulation(situation=situation)
# Calculate variables
income_tax = simulation.calculate("income_tax", 2024)
eitc = simulation.calculate("eitc", 2024)
household_net_income = simulation.calculate("household_net_income", 2024)
```
**Common Variables:**
**Income:**
- `employment_income` - W-2 wages
- `self_employment_income` - 1099/business income
- `qualified_dividend_income` - Qualified dividends
- `capital_gains` - Capital gains
- `interest_income` - Interest income
- `social_security` - Social Security benefits
- `pension_income` - Pension/retirement income
**Deductions:**
- `charitable_cash_donations` - Cash charitable giving
- `real_estate_taxes` - State and local property taxes
- `mortgage_interest` - Mortgage interest deduction
- `medical_expense` - Medical and dental expenses
- `casualty_loss` - Casualty and theft losses
**Tax Outputs:**
- `income_tax` - Total federal income tax
- `payroll_tax` - FICA taxes
- `state_income_tax` - State income tax
- `household_tax` - Total taxes (federal + state + local)
**Benefits:**
- `eitc` - Earned Income Tax Credit
- `ctc` - Child Tax Credit
- `snap` - SNAP benefits
- `household_benefits` - Total benefits
**Summary:**
- `household_net_income` - Income minus taxes plus benefits
### 3. Using Axes for Parameter Sweeps
To vary a parameter across multiple values:
```python
situation = {
# ... normal situation setup ...
"axes": [[{
"name": "employment_income",
"count": 1001,
"min": 0,
"max": 200000,
"period": 2024
}]]
}
simulation = Simulation(situation=situation)
# Now calculate() returns arrays of 1001 values
incomes = simulation.calculate("employment_income", 2024) # Array of 1001 values
taxes = simulation.calculate("income_tax", 2024) # Array of 1001 values
```
**Important:** Remove axes before creating single-point simulations:
```python
situation_single = situation.copy()
situation_single.pop("axes", None)
simulation = Simulation(situation=situation_single)
```
### 4. Policy Reforms
```python
from policyengine_us import Simulation
# Define a reform (modifies parameters)
reform = {
"gov.irs.credits.ctc.amount.base_amount": {
"2024-01-01.2100-12-31": 5000 # Increase CTC to $5000
}
}
# Create simulation with reform
simulation = Simulation(situation=situation, reform=reform)
```
## Common Patterns
### Pattern 1: Single Household Calculation
```python
from policyengine_us import Simulation
situation = {
"people": {
"parent": {
"age": {2024: 35},
"employment_income": {2024: 60000}
},
"child": {
"age": {2024: 5}
}
},
"families": {"family": {"members": ["parent", "child"]}},
"marital_units": {"marital_unit": {"members": ["parent"]}},
"tax_units": {"tax_unit": {"members": ["parent", "child"]}},
"spm_units": {"spm_unit": {"members": ["parent", "child"]}},
"households": {
"household": {
"members": ["parent", "child"],
"state_name": {2024: "NY"}
}
}
}
sim = Simulation(situation=situation)
income_tax = sim.calculate("income_tax", 2024)[0]
ctc = sim.calculate("ctc", 2024)[0]
```
### Pattern 2: Marginal Tax Rate Analysis
```python
# Create baseline with axes varying income
situation_with_axes = {
# ... situation setup ...
"axes": [[{
"name": "employment_income",
"count": 1001,
"min": 0,
"max": 200000,
"period": 2024
}]]
}
sim = Simulation(situation=situation_with_axes)
incomes = sim.calculate("employment_income", 2024)
taxes = sim.calculate("income_tax", 2024)
# Calculate marginal tax rate
import numpy as np
mtr = np.gradient(taxes) / np.gradient(incomes)
```
### Pattern 3: Charitable Donation Impact
```python
# Baseline (no donation)
situation_baseline = create_situation(income=100000, donation=0)
sim_baseline = Simulation(situation=situation_baseline)
tax_baseline = sim_baseline.calculate("income_tax", 2024)[0]
# With donation
situation_donation = create_situation(income=100000, donation=5000)
sim_donation = Simulation(situation=situation_donation)
tax_donation = sim_donation.calculate("income_tax", 2024)[0]
# Tax savings from donation
tax_savings = tax_baseline - tax_donation
effective_discount = tax_savings / 5000 # e.g., 0.24 = 24% discount
```
### Pattern 4: State Comparison
```python
states = ["CA", "NY", "TX", "FL"]
results = {}
for state in states:
situation = create_situation(state=state, income=75000)
sim = Simulation(situation=situation)
results[state] = {
"state_income_tax": sim.calculate("state_income_tax", 2024)[0],
"total_tax": sim.calculate("household_tax", 2024)[0]
}
```
## Helper Scripts
This skill includes helper scripts in the `scripts/` directory:
```python
from policyengine_skills.situation_helpers import (
create_single_filer,
create_married_couple,
create_family_with_children,
add_itemized_deductions
)
# Quick situation creation
situation = create_single_filer(
income=50000,
state="CA",
age=30
)
# Add deductions
situation = add_itemized_deductions(
situation,
charitable_donations=5000,
mortgage_interest=10000,
real_estate_taxes=8000
)
```
## Common Pitfalls and Solutions
### Pitfall 1: Member Lists Out of Sync
**Problem:** Different entities have different members
```python
# WRONG
"tax_units": {"tax_unit": {"members": ["parent"]}},
"households": {"household": {"members": ["parent", "child"]}}
```
**Solution:** Keep all entity member lists consistent:
```python
# CORRECT
all_members = ["parent", "child"]
"families": {"family": {"members": all_members}},
"tax_units": {"tax_unit": {"members": all_members}},
"households": {"household": {"members": all_members}}
```
### Pitfall 2: Forgetting Year Keys
**Problem:** `"age": 35` instead of `"age": {2024: 35}`
**Solution:** Always use year dictionary:
```python
"age": {2024: 35},
"employment_income": {2024: 50000}
```
### Pitfall 3: Net Taxes vs Gross Taxes
**Problem:** Forgetting to subtract benefits from taxes
**Solution:** Use proper calculation:
```python
# Net taxes (what household actually pays)
net_tax = sim.calculate("household_tax", 2024) - \
sim.calculate("household_benefits", 2024)
```
### Pitfall 4: Axes Persistence
**Problem:** Axes remain in situation when creating single-point simulation
**Solution:** Remove axes before single-point simulation:
```python
situation_single = situation.copy()
situation_single.pop("axes", None)
```
### Pitfall 5: State-Specific Variables
**Problem:** Using NYC-specific variables without `in_nyc: True`
**Solution:** Set NYC flag for NY residents in NYC:
```python
"households": {
"household": {
"state_name": {2024: "NY"},
"in_nyc": {2024: True} # Required for NYC taxes
}
}
```
## NYC Handling
For New York City residents:
```python
situation = {
# ... people setup ...
"households": {
"household": {
"members": ["person"],
"state_name": {2024: "NY"},
"in_nyc": {2024: True} # Enable NYC tax calculations
}
}
}
```
## Version Compatibility
- Always use `policyengine-us>=1.155.0` for 2024 calculations
- Check version: `import policyengine_us; print(policyengine_us.__version__)`
- Different years may require different package versions
## Debugging Tips
1. **Enable tracing:**
```python
simulation.trace = True
result = simulation.calculate("variable_name", 2024)
```
2. **Check intermediate calculations:**
```python
agi = simulation.calculate("adjusted_gross_income", 2024)
taxable_income = simulation.calculate("taxable_income", 2024)
```
3. **Verify situation structure:**
```python
import json
print(json.dumps(situation, indent=2))
```
4. **Test with PolicyEngine web app:**
- Go to policyengine.org/us/household
- Enter same inputs
- Compare results
## Additional Resources
- **Documentation:** https://policyengine.org/us/docs
- **API Reference:** https://github.com/PolicyEngine/policyengine-us
- **Example Notebooks:** https://github.com/PolicyEngine/analysis-notebooks
- **Variable Explorer:** https://policyengine.org/us/variables
## Examples Directory
See `examples/` for complete working examples:
- `single_filer.yaml` - Single person household
- `married_couple.yaml` - Married filing jointly
- `family_with_children.yaml` - Family with dependents
- `itemized_deductions.yaml` - Using itemized deductions
- `donation_sweep.yaml` - Analyzing donation impacts with axes

View File

@@ -0,0 +1,71 @@
# Example: Analyzing charitable donation impacts using axes
# Married couple with 2 children in New York
# Sweeps charitable donations from $0 to $50,000
people:
parent_1:
age:
2024: 35
employment_income:
2024: 100000
parent_2:
age:
2024: 35
employment_income:
2024: 50000
child_1:
age:
2024: 8
child_2:
age:
2024: 5
families:
family:
members:
- parent_1
- parent_2
- child_1
- child_2
marital_units:
marital_unit:
members:
- parent_1
- parent_2
- child_1
- child_2
tax_units:
tax_unit:
members:
- parent_1
- parent_2
- child_1
- child_2
spm_units:
spm_unit:
members:
- parent_1
- parent_2
- child_1
- child_2
households:
household:
members:
- parent_1
- parent_2
- child_1
- child_2
state_name:
2024: NY
# Axes: Vary charitable donations from $0 to $50,000
axes:
- - name: charitable_cash_donations
count: 1001
min: 0
max: 50000
period: 2024

View File

@@ -0,0 +1,38 @@
# Example: Single tax filer in California
# Income: $60,000, Age: 30, with charitable donations
people:
person:
age:
2024: 30
employment_income:
2024: 60000
charitable_cash_donations:
2024: 5000
families:
family:
members:
- person
marital_units:
marital_unit:
members:
- person
tax_units:
tax_unit:
members:
- person
spm_units:
spm_unit:
members:
- person
households:
household:
members:
- person
state_name:
2024: CA

View File

@@ -0,0 +1,257 @@
"""
Helper functions for creating PolicyEngine-US situations.
These utilities simplify the creation of situation dictionaries
for common household configurations.
"""
CURRENT_YEAR = 2024
def create_single_filer(income, state="CA", age=35, **kwargs):
"""
Create a situation for a single tax filer.
Args:
income (float): Employment income
state (str): Two-letter state code (e.g., "CA", "NY")
age (int): Person's age
**kwargs: Additional person attributes (e.g., self_employment_income)
Returns:
dict: PolicyEngine situation dictionary
"""
person_attrs = {
"age": {CURRENT_YEAR: age},
"employment_income": {CURRENT_YEAR: income},
}
person_attrs.update({k: {CURRENT_YEAR: v} for k, v in kwargs.items()})
return {
"people": {"person": person_attrs},
"families": {"family": {"members": ["person"]}},
"marital_units": {"marital_unit": {"members": ["person"]}},
"tax_units": {"tax_unit": {"members": ["person"]}},
"spm_units": {"spm_unit": {"members": ["person"]}},
"households": {
"household": {
"members": ["person"],
"state_name": {CURRENT_YEAR: state}
}
}
}
def create_married_couple(
income_1, income_2=0, state="CA", age_1=35, age_2=35, **kwargs
):
"""
Create a situation for a married couple filing jointly.
Args:
income_1 (float): First spouse's employment income
income_2 (float): Second spouse's employment income
state (str): Two-letter state code
age_1 (int): First spouse's age
age_2 (int): Second spouse's age
**kwargs: Additional household attributes
Returns:
dict: PolicyEngine situation dictionary
"""
members = ["spouse_1", "spouse_2"]
household_attrs = {
"members": members,
"state_name": {CURRENT_YEAR: state}
}
household_attrs.update({k: {CURRENT_YEAR: v} for k, v in kwargs.items()})
return {
"people": {
"spouse_1": {
"age": {CURRENT_YEAR: age_1},
"employment_income": {CURRENT_YEAR: income_1}
},
"spouse_2": {
"age": {CURRENT_YEAR: age_2},
"employment_income": {CURRENT_YEAR: income_2}
}
},
"families": {"family": {"members": members}},
"marital_units": {"marital_unit": {"members": members}},
"tax_units": {"tax_unit": {"members": members}},
"spm_units": {"spm_unit": {"members": members}},
"households": {"household": household_attrs}
}
def create_family_with_children(
parent_income,
num_children=1,
child_ages=None,
state="CA",
parent_age=35,
married=False,
spouse_income=0,
**kwargs
):
"""
Create a situation for a family with children.
Args:
parent_income (float): Primary parent's employment income
num_children (int): Number of children
child_ages (list): List of child ages (defaults to [5, 8, 12, ...])
state (str): Two-letter state code
parent_age (int): Parent's age
married (bool): Whether parents are married
spouse_income (float): Spouse's income if married
**kwargs: Additional household attributes
Returns:
dict: PolicyEngine situation dictionary
"""
if child_ages is None:
child_ages = [5 + i * 3 for i in range(num_children)]
elif len(child_ages) != num_children:
raise ValueError("Length of child_ages must match num_children")
people = {
"parent": {
"age": {CURRENT_YEAR: parent_age},
"employment_income": {CURRENT_YEAR: parent_income}
}
}
members = ["parent"]
if married:
people["spouse"] = {
"age": {CURRENT_YEAR: parent_age},
"employment_income": {CURRENT_YEAR: spouse_income}
}
members.append("spouse")
for i, age in enumerate(child_ages):
child_id = f"child_{i+1}"
people[child_id] = {"age": {CURRENT_YEAR: age}}
members.append(child_id)
household_attrs = {
"members": members,
"state_name": {CURRENT_YEAR: state}
}
household_attrs.update({k: {CURRENT_YEAR: v} for k, v in kwargs.items()})
return {
"people": people,
"families": {"family": {"members": members}},
"marital_units": {
"marital_unit": {
"members": members if married else ["parent"]
}
},
"tax_units": {"tax_unit": {"members": members}},
"spm_units": {"spm_unit": {"members": members}},
"households": {"household": household_attrs}
}
def add_itemized_deductions(
situation,
charitable_donations=0,
mortgage_interest=0,
real_estate_taxes=0,
medical_expenses=0,
casualty_losses=0
):
"""
Add itemized deductions to an existing situation.
Adds deductions to the first person in the situation.
Args:
situation (dict): Existing PolicyEngine situation
charitable_donations (float): Cash charitable contributions
mortgage_interest (float): Mortgage interest paid
real_estate_taxes (float): State and local property taxes
medical_expenses (float): Medical and dental expenses
casualty_losses (float): Casualty and theft losses
Returns:
dict: Updated situation with deductions
"""
# Get first person ID
first_person = list(situation["people"].keys())[0]
# Add deductions
if charitable_donations > 0:
situation["people"][first_person]["charitable_cash_donations"] = {
CURRENT_YEAR: charitable_donations
}
if mortgage_interest > 0:
situation["people"][first_person]["mortgage_interest"] = {
CURRENT_YEAR: mortgage_interest
}
if real_estate_taxes > 0:
situation["people"][first_person]["real_estate_taxes"] = {
CURRENT_YEAR: real_estate_taxes
}
if medical_expenses > 0:
situation["people"][first_person]["medical_expense"] = {
CURRENT_YEAR: medical_expenses
}
if casualty_losses > 0:
situation["people"][first_person]["casualty_loss"] = {
CURRENT_YEAR: casualty_losses
}
return situation
def add_axes(situation, variable_name, min_val, max_val, count=1001):
"""
Add axes to a situation for parameter sweeps.
Args:
situation (dict): Existing PolicyEngine situation
variable_name (str): Variable to vary (e.g., "employment_income")
min_val (float): Minimum value
max_val (float): Maximum value
count (int): Number of points (default: 1001)
Returns:
dict: Updated situation with axes
"""
situation["axes"] = [[{
"name": variable_name,
"count": count,
"min": min_val,
"max": max_val,
"period": CURRENT_YEAR
}]]
return situation
def set_state_nyc(situation, in_nyc=True):
"""
Set state to NY and configure NYC residence.
Args:
situation (dict): Existing PolicyEngine situation
in_nyc (bool): Whether household is in NYC
Returns:
dict: Updated situation
"""
household_id = list(situation["households"].keys())[0]
situation["households"][household_id]["state_name"] = {CURRENT_YEAR: "NY"}
situation["households"][household_id]["in_nyc"] = {CURRENT_YEAR: in_nyc}
return situation

View File

@@ -0,0 +1,526 @@
---
name: policyengine-writing
description: PolicyEngine writing style for blog posts, documentation, PR descriptions, and research reports - emphasizing active voice, quantitative precision, and neutral tone
---
# PolicyEngine Writing Skill
Use this skill when writing blog posts, documentation, PR descriptions, research reports, or any public-facing PolicyEngine content.
## When to Use This Skill
- Writing blog posts about policy analysis
- Creating PR descriptions
- Drafting documentation
- Writing research reports
- Composing social media posts
- Creating newsletters
- Writing README files
## Core Principles
PolicyEngine's writing emphasizes clarity, precision, and objectivity.
1. **Active voice** - Prefer active constructions over passive
2. **Direct and quantitative** - Use specific numbers, avoid vague adjectives/adverbs
3. **Sentence case** - Use sentence case for headings, not title case
4. **Neutral tone** - Describe what policies do, not whether they're good or bad
5. **Precise language** - Choose exact verbs over vague modifiers
## Active Voice
Active voice makes writing clearer and more direct.
**✅ Correct (Active):**
```
Harris proposes expanding the Earned Income Tax Credit
The reform reduces poverty by 3.2%
PolicyEngine projects higher costs than other organizations
We estimate the ten-year costs
The bill lowers the state's top income tax rate
Montana raises the EITC from 10% to 20%
```
**❌ Wrong (Passive):**
```
The Earned Income Tax Credit is proposed to be expanded by Harris
Poverty is reduced by 3.2% by the reform
Higher costs are projected by PolicyEngine
The ten-year costs are estimated
The state's top income tax rate is lowered by the bill
The EITC is raised from 10% to 20% by Montana
```
## Quantitative and Precise
Replace vague modifiers with specific numbers and measurements.
**✅ Correct (Quantitative):**
```
Costs the state $245 million
Benefits 77% of Montana residents
Lowers the Supplemental Poverty Measure by 0.8%
Raises net income by $252 in 2026
The reform affects 14.3 million households
Hours worked falls by 0.27%, or 411,000 full-time equivalent jobs
The top decile receives an average benefit of $1,033
PolicyEngine projects costs 40% higher than the Tax Foundation
```
**❌ Wrong (Vague adjectives/adverbs):**
```
Significantly costs the state
Benefits most Montana residents
Greatly lowers poverty
Substantially raises net income
The reform affects many households
Hours worked falls considerably
High earners receive large benefits
PolicyEngine projects much higher costs
```
## Sentence Case for Headings
Use sentence case (capitalize only the first word and proper nouns) for all headings.
**✅ Correct (Sentence case):**
```
## The proposal
## Nationwide impacts
## Household impacts
## Statewide impacts 2026
## Case study: the End Child Poverty Act
## Key findings
```
**❌ Wrong (Title case):**
```
## The Proposal
## Nationwide Impacts
## Household Impacts
## Statewide Impacts 2026
## Case Study: The End Child Poverty Act
## Key Findings
```
## Neutral, Objective Tone
Describe what policies do without value judgments. Let readers draw their own conclusions from the data.
**✅ Correct (Neutral):**
```
The reform reduces poverty by 3.2% and raises inequality by 0.16%
Single filers with earnings between $8,000 and $37,000 see their net incomes increase
The tax changes raise the net income of 75.9% of residents
PolicyEngine projects higher costs than other organizations
The top income decile receives 42% of total benefits
```
**❌ Wrong (Value judgments):**
```
The reform successfully reduces poverty by 3.2% but unfortunately raises inequality
Low-income workers finally see their net incomes increase
The tax changes benefit most residents
PolicyEngine provides more accurate cost estimates
The wealthiest households receive a disproportionate share of benefits
```
## Precise Verbs Over Adverbs
Choose specific verbs instead of generic verbs modified by adverbs.
**✅ Correct (Precise verbs):**
```
The bill lowers the top rate from 5.9% to 5.4%
The policy raises the maximum credit from $632 to $1,774
The reform increases the phase-in rate from 7.65% to 15.3%
This doubles Montana's EITC from 10% to 20%
The change eliminates the age cap
```
**❌ Wrong (Vague verbs + adverbs):**
```
The bill significantly changes the top rate
The policy substantially increases the maximum credit
The reform greatly boosts the phase-in rate
This dramatically expands Montana's EITC
The change completely removes the age cap
```
## Concrete Examples
Always include specific household examples with precise numbers.
**✅ Correct:**
```
For a single adult with no children and $10,000 of earnings, the tax provisions
increase their net income by $69 in 2026 and $68 in 2027, solely from the
doubled EITC match.
A single parent of two kids with an annual income of $50,000 will see a $252
increase to their net income: $179 from the expanded EITC, and $73 from the
lower bracket threshold.
A married couple with no dependents and $200,000 of earnings will see their
liability drop by $1,306 in 2027.
```
**❌ Wrong:**
```
Low-income workers see modest increases to their net income from the
expanded EITC.
Families with children benefit substantially from the tax changes.
High earners also see significant reductions in their tax liability.
```
## Tables and Data
Use tables liberally to present data clearly. Always include units and context.
**Example 1: Tax parameters over time**
| Year | Phase-in rate | Max credit | Phase-out start | Phase-out rate |
| ---- | ------------- | ---------- | --------------- | -------------- |
| 2025 | 15.3% | $1,774 | $13,706 | 15.3% |
| 2026 | 15.3% | $1,815 | $14,022 | 15.3% |
| 2027 | 15.3% | $1,852 | $14,306 | 15.3% |
**Example 2: Household impacts**
| Household composition | 2026 net income change | 2027 net income change |
| ------------------------------ | ---------------------- | ---------------------- |
| Single, no children, $10,000 | $66 | $68 |
| Single, two children, $50,000 | $252 | $266 |
| Married, no children, $200,000 | $853 | $1,306 |
**Example 3: Ten-year costs**
| Year | Federal cost ($ billions) |
| ------- | ------------------------- |
| 2025 | 14.3 |
| 2026 | 14.4 |
| 2027 | 14.7 |
| 2025-34 | 143.7 |
## Avoid Superlatives
Replace superlative claims with specific comparisons.
**✅ Correct:**
```
PolicyEngine projects costs 40% higher than the Tax Foundation
The top decile receives an average benefit of $1,033
The reform reduces child poverty by 3.2 percentage points
This represents Montana's largest income tax cut since 2021
```
**❌ Wrong:**
```
PolicyEngine provides the most accurate cost projections
The wealthiest households receive massive benefits
The reform dramatically slashes child poverty
This is Montana's largest income tax cut in history
```
## Structure and Organization
Follow a clear hierarchical structure with key findings up front.
**Standard blog post structure:**
```markdown
# Title (H1)
Opening paragraph states what happened and when, with a link to PolicyEngine.
Key results in [year]:
- Cost: $245 million
- Benefits: 77% of residents
- Poverty impact: Reduces SPM by 0.8%
- Inequality impact: Raises Gini by 0.16%
## The proposal (H2)
Detailed description of the policy changes, often with a table showing
the specific parameter values.
## Household impacts (H2)
Specific examples for representative household types.
### Example 1: Single filer (H3)
Detailed calculation...
### Example 2: Family with children (H3)
Detailed calculation...
## Statewide impacts (H2)
Population-level analysis with charts and tables.
### Budgetary impact (H3)
Cost/revenue estimates...
### Distributional impact (H3)
Winners/losers by income decile...
### Poverty and inequality (H3)
Impact on poverty rates and inequality measures...
## Methodology (H2)
Explanation of data sources, modeling approach, and caveats.
```
## Common Patterns
### Opening Paragraphs
State the facts directly with dates, actors, and actions:
```
[On April 28, 2025], Governor Greg Gianforte (R-MT) signed House Bill 337,
a bill that amends Montana's individual income tax code.
Vice President Harris proposes expanding the Earned Income Tax Credit (EITC)
for filers without qualifying dependents.
In her economic plan, Harris proposes to restore the expanded Earned Income
Tax Credit for workers without children to its level under the American
Rescue Plan Act in 2021.
```
### Key Findings Format
Lead with bullet points of quantitative results:
```
Key results in 2027:
- Costs the state $245 million
- Benefits 77% of Montana residents
- Lowers the Supplemental Poverty Measure by 0.8%
- Raises the Gini index by 0.16%
```
### Methodological Transparency
Always specify the model, version, and assumptions:
```
Based on static microsimulation modeling with PolicyEngine US (version 1.103.0),
we project the following economic impacts for 2025.
Assuming no behavioral responses, we project that the EITC expansion will cost
the federal government $14.3 billion in 2025.
Incorporating elasticities of labor supply used by the Congressional Budget Office
increases the reform's cost.
Over the ten-year budget window, this amounts to $143.7 billion.
```
### Household Examples
Always include the household composition, income, and specific dollar impacts:
```
For a single adult with no children and $10,000 of earnings, the tax provisions
increase their net income by $69 in 2026 and $68 in 2027.
A single parent of two kids with an annual income of $50,000 will see a $252
increase to their net income due to House Bill 337: $179 from the expanded EITC,
and $73 from the lower bracket threshold.
```
## Examples in Context
### Blog Post Opening
**✅ Correct:**
```
On April 28, 2025, Governor Gianforte signed House Bill 337, which lowers
Montana's top income tax rate from 5.9% to 5.4% and doubles the state EITC
from 10% to 20% of the federal credit.
Key results in 2027:
- Costs the state $245 million
- Benefits 77% of Montana residents
- Lowers the Supplemental Poverty Measure by 0.8%
- Raises the Gini index by 0.16%
Use PolicyEngine to view the full results or calculate the effect on your
household.
```
**❌ Wrong:**
```
On April 28, 2025, Governor Gianforte made history by signing an amazing new
tax cut bill that will dramatically help Montana families. House Bill 337
significantly reduces tax rates and greatly expands the EITC.
This groundbreaking reform will:
- Cost the state money
- Help most residents
- Reduce poverty substantially
- Impact inequality
Check out PolicyEngine to see how much you could save!
```
### PR Description
**✅ Correct:**
```
## Summary
This PR adds Claude Code plugin configuration to enable automated installation
of agents and skills for PolicyEngine development.
## Changes
- Add plugin auto-install configuration in .claude/settings.json
- Configure auto-install of country-models plugin from PolicyEngine/policyengine-claude
## Benefits
- Access to 15 specialized agents
- 3 slash commands (/encode-policy, /review-pr, /fix-pr)
- 2 skills (policyengine-us-skill, policyengine-standards-skill)
## Testing
After merging, team members trust the repo and the plugin auto-installs.
```
**❌ Wrong:**
```
## Summary
This amazing PR adds incredible new Claude Code plugin support that will
revolutionize PolicyEngine development!
## Changes
- Adds some configuration files
- Sets up plugins and stuff
## Benefits
- Gets you lots of cool new features
- Makes development much easier
- Provides great new tools
## Testing
Should work great once merged!
```
### Documentation
**✅ Correct:**
```
## Installation
Install PolicyEngine-US from PyPI:
```bash
pip install policyengine-us
```
This installs version 1.103.0 or later, which includes support for 2025
tax parameters.
```
**❌ Wrong:**
```
## Installation
Simply install PolicyEngine-US:
```bash
pip install policyengine-us
```
This will install the latest version with all the newest features!
```
## Special Cases
### Comparisons to Other Organizations
State facts neutrally without claiming superiority:
**✅ Correct:**
```
PolicyEngine projects higher costs than other organizations when considering
behavioral responses.
| Organization | Cost, 2025-2034 ($ billions) |
| ------------------------- | ---------------------------- |
| PolicyEngine (static) | 144 |
| PolicyEngine (dynamic) | 201 |
| Tax Foundation | 157 |
| Penn Wharton Budget Model | 135 |
```
**❌ Wrong:**
```
PolicyEngine provides more accurate estimates than other organizations.
Unlike other models that underestimate costs, PolicyEngine correctly accounts
for behavioral responses to project a more realistic $201 billion cost.
```
### Discussing Limitations
Acknowledge limitations directly without hedging:
**✅ Correct:**
```
## Caveats
The Current Population Survey has several limitations for tax microsimulation:
- Truncates high incomes for privacy, underestimating tax impacts on high earners
- Underestimates benefit receipt compared to administrative totals
- Reflects 2020 data with 2025 policy parameters
- Lacks detail for specific income types (assumes all capital gains are long-term)
```
**❌ Wrong:**
```
## Caveats
While our model is highly sophisticated, like all models it has some potential
limitations that users should be aware of:
- The data might not perfectly capture high incomes
- Benefits may be slightly underestimated
- We do our best to extrapolate older data to current years
```
## Writing Checklist
Before publishing, verify:
- [ ] Use active voice throughout
- [ ] Include specific numbers for all claims
- [ ] Use sentence case for all headings
- [ ] Maintain neutral, objective tone
- [ ] Choose precise verbs over vague adverbs
- [ ] Include concrete household examples
- [ ] Present data in tables
- [ ] Avoid all superlatives
- [ ] Structure with clear hierarchy
- [ ] Open with key quantitative findings
- [ ] Specify model version and assumptions
- [ ] Link to PolicyEngine when relevant
- [ ] Acknowledge limitations directly
## Resources
- **Example posts**: See `policyengine-app/src/posts/articles/` for reference implementations
- **PolicyEngine app**: https://policyengine.org for linking to analyses
- **Microsimulation docs**: https://policyengine.org/us/docs for methodology details