---
name: policyengine-analysis
description: Common analysis patterns for PolicyEngine research repositories (CRFB, newsletters, dashboards, impact studies)
---

# PolicyEngine Analysis

Patterns for creating policy impact analyses, dashboards, and research using PolicyEngine.

## For Users 👥

### What are Analysis Repositories?

Analysis repositories produce the research you see on PolicyEngine:

**Blog posts:**
- "How Montana's tax cuts affect poverty"
- "Harris EITC proposal costs and impacts"
- "UK Budget 2024 analysis"

**Dashboards:**
- State tax comparisons
- Policy proposal scorecards
- Interactive calculators (GiveCalc, SALT calculator)

**Research reports:**
- Distributional analyses for organizations
- Policy briefs for legislators
- Impact assessments

### How Analysis Works

1. **Define policy reform** using PolicyEngine parameters
2. **Create household examples** showing specific impacts
3. **Run population simulations** for aggregate effects
4. **Calculate distributional impacts** (who wins, who loses)
5. **Create visualizations** (charts, tables)
6. **Write report** following policyengine-writing-skill style
7. **Publish** to blog or share with stakeholders

### Reading PolicyEngine Analysis

**Key sections in typical analysis:**

**The proposal:**
- What the policy changes
- Specific parameter values

**Household impacts:**
- 3-5 example households
- Dollar amounts for each
- Charts showing impact across income range

**Statewide/national impacts:**
- Total cost or revenue
- Winners and losers by income decile
- Poverty and inequality effects

**See policyengine-writing-skill for writing conventions.**

## For Analysts 📊

### When to Use This Skill

- Creating policy impact analyses
- Building interactive dashboards with Streamlit/Plotly
- Writing analysis notebooks
- Calculating distributional impacts
- Comparing policy proposals
- Creating visualizations for research
- Publishing policy research

### Example Analysis Repositories

- `crfb-tob-impacts` - Policy impact analyses
- `newsletters` - Data-driven newsletters
- `2024-election-dashboard` - Policy comparison dashboards
- `marginal-child` - Specialized policy analyses
- `givecalc` - Charitable giving calculator

## Repository Structure

Standard analysis repository structure:

```
analysis-repo/
├── analysis.ipynb          # Main Jupyter notebook
├── app.py                  # Streamlit app (if applicable)
├── requirements.txt        # Python dependencies
├── README.md               # Documentation
├── data/                   # Data files (if needed)
├── outputs/                # Generated charts, tables
└── .streamlit/             # Streamlit config
    └── config.toml
```

## Common Analysis Patterns

The snippets in this section share the `reform` dictionary from Pattern 1 and call helper functions (`create_situation`, `create_family`) that each analysis repository is expected to define for itself.

### Pattern 1: Impact Analysis Across Income Distribution

```python
import pandas as pd
import numpy as np
from policyengine_us import Simulation

# Define reform
reform = {
    "gov.irs.credits.ctc.amount.base_amount": {
        "2024-01-01.2100-12-31": 5000
    }
}

# Analyze across income distribution
incomes = np.linspace(0, 200000, 101)
results = []

for income in incomes:
    # Baseline (create_situation is a repository-defined helper)
    situation = create_situation(income=income)
    sim_baseline = Simulation(situation=situation)
    tax_baseline = sim_baseline.calculate("income_tax", 2024)[0]

    # Reform
    sim_reform = Simulation(situation=situation, reform=reform)
    tax_reform = sim_reform.calculate("income_tax", 2024)[0]

    results.append({
        "income": income,
        "tax_baseline": tax_baseline,
        "tax_reform": tax_reform,
        "tax_change": tax_reform - tax_baseline
    })

df = pd.DataFrame(results)
```

### Pattern 2: Household-Level Case Studies

```python
# Define representative households
households = {
    "Single, No Children": {
        "income": 40000,
        "num_children": 0,
        "married": False
    },
    "Single Parent, 2 Children": {
        "income": 50000,
        "num_children": 2,
        "married": False
    },
    "Married, 2 Children": {
        "income": 100000,
        "num_children": 2,
        "married": True
    }
}

# Calculate impacts for each (create_family is a repository-defined helper)
case_studies = {}
for name, params in households.items():
    situation = create_family(**params)

    sim_baseline = Simulation(situation=situation)
    sim_reform = Simulation(situation=situation, reform=reform)

    case_studies[name] = {
        "baseline_tax": sim_baseline.calculate("income_tax", 2024)[0],
        "reform_tax": sim_reform.calculate("income_tax", 2024)[0],
        "ctc_baseline": sim_baseline.calculate("ctc", 2024)[0],
        "ctc_reform": sim_reform.calculate("ctc", 2024)[0]
    }

# One row per household, one column per metric
case_df = pd.DataFrame(case_studies).T
```

### Pattern 3: State-by-State Comparison

```python
states = ["CA", "NY", "TX", "FL", "PA", "OH", "IL", "MI"]

state_results = []
for state in states:
    situation = create_situation(income=75000, state=state)

    sim_baseline = Simulation(situation=situation)
    sim_reform = Simulation(situation=situation, reform=reform)

    # Calculate each value once and reuse it for the change
    baseline_net = sim_baseline.calculate("household_net_income", 2024)[0]
    reform_net = sim_reform.calculate("household_net_income", 2024)[0]

    state_results.append({
        "state": state,
        "baseline_net_income": baseline_net,
        "reform_net_income": reform_net,
        "change": reform_net - baseline_net
    })

state_df = pd.DataFrame(state_results)
```

### Pattern 4: Marginal Analysis (Winners/Losers)

```python
# Calculate across income range
situation_with_axes = {
    # ... setup ...
    "axes": [[{
        "name": "employment_income",
        "count": 1001,
        "min": 0,
        "max": 200000,
        "period": 2024
    }]]
}

sim_baseline = Simulation(situation=situation_with_axes)
sim_reform = Simulation(situation=situation_with_axes, reform=reform)

incomes = sim_baseline.calculate("employment_income", 2024)
baseline_net = sim_baseline.calculate("household_net_income", 2024)
reform_net = sim_reform.calculate("household_net_income", 2024)

gains = reform_net - baseline_net

# Identify winners and losers (shares of points on the income axis)
winners = gains > 0
losers = gains < 0
neutral = gains == 0

print(f"Winners: {winners.sum() / len(gains) * 100:.1f}%")
print(f"Losers: {losers.sum() / len(gains) * 100:.1f}%")
print(f"Neutral: {neutral.sum() / len(gains) * 100:.1f}%")
```

## Visualization Patterns

### Standard Plotly Configuration

```python
import plotly.graph_objects as go

# PolicyEngine brand colors
TEAL = "#39C6C0"
BLUE = "#2C6496"
DARK_GRAY = "#616161"

def create_pe_layout(title, xaxis_title, yaxis_title):
    """Create standard PolicyEngine chart layout."""
    return go.Layout(
        title=title,
        font=dict(family="Roboto Serif", size=14),
        plot_bgcolor="white",
        hovermode="x unified",
        xaxis=dict(
            title=xaxis_title,
            showgrid=True,
            gridcolor="lightgray",
            zeroline=True
        ),
        yaxis=dict(
            title=yaxis_title,
            showgrid=True,
            gridcolor="lightgray",
            zeroline=True
        )
    )

# Use in charts (df comes from Pattern 1)
fig = go.Figure(layout=create_pe_layout(
    "Tax Impact by Income",
    "Income",
    "Tax Change"
))
fig.add_trace(go.Scatter(x=df.income, y=df.tax_change, line=dict(color=TEAL)))
```

### Common Chart Types

**1. Line Chart (Impact by Income)**

```python
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=df.income,
    y=df.tax_change,
    mode='lines',
    name='Tax Change',
    line=dict(color=TEAL, width=3)
))
fig.update_layout(
    title="Tax Impact by Income Level",
    xaxis_title="Income",
    yaxis_title="Tax Change ($)",
    xaxis_tickformat="$,.0f",
    yaxis_tickformat="$,.0f"
)
```

**2. Bar Chart (State Comparison)**

```python
fig = go.Figure()
fig.add_trace(go.Bar(
    x=state_df.state,
    y=state_df.change,
    marker_color=TEAL
))
fig.update_layout(
    title="Net Income Change by State",
    xaxis_title="State",
    yaxis_title="Change ($)",
    yaxis_tickformat="$,.0f"
)
```

**3. Waterfall Chart (Budget Impact)**

```python
# baseline_revenue, credit_cost, and phaseout_revenue come from your own analysis
fig = go.Figure(go.Waterfall(
    x=["Baseline", "Tax Credit", "Phase-out", "Reform"],
    y=[baseline_revenue, credit_cost, phaseout_revenue, 0],
    measure=["absolute", "relative", "relative", "total"],
    connector={"line": {"color": "gray"}}
))
```

## Streamlit Dashboard Patterns

### Basic Streamlit Setup

```python
import streamlit as st
from policyengine_us import Simulation

# `reform` and `create_family` are assumed to be defined elsewhere in the app

st.set_page_config(page_title="Policy Analysis", layout="wide")

st.title("Policy Impact Calculator")

# User inputs
col1, col2, col3 = st.columns(3)
with col1:
    income = st.number_input("Income", value=60000, step=5000)
with col2:
    state = st.selectbox("State", ["CA", "NY", "TX", "FL"])
with col3:
    num_children = st.number_input("Children", value=0, min_value=0, max_value=10)

# Calculate
if st.button("Calculate"):
    situation = create_family(
        income=income,
        num_children=num_children,
        state=state
    )

    sim_baseline = Simulation(situation=situation)
    sim_reform = Simulation(situation=situation, reform=reform)

    tax_baseline = sim_baseline.calculate("income_tax", 2024)[0]
    tax_reform = sim_reform.calculate("income_tax", 2024)[0]
    change = tax_reform - tax_baseline

    # Display results
    col1, col2, col3 = st.columns(3)
    with col1:
        st.metric("Baseline Tax", f"${tax_baseline:,.0f}")
    with col2:
        st.metric("Reform Tax", f"${tax_reform:,.0f}")
    with col3:
        # Pass a numeric delta so a tax increase renders as a decrease;
        # a string such as "$-500" would not be read as negative.
        st.metric("Change", f"${change:,.0f}", delta=round(float(-change)))
```

### Interactive Chart with Streamlit

```python
# Create chart based on user inputs
# (income_max and selected_state are assumed to come from Streamlit widgets)
incomes = np.linspace(0, income_max, 1001)
results = []

# One simulation per income is slow; see Pitfall 2 for the axes-based alternative
for income in incomes:
    situation = create_situation(income=income, state=selected_state)
    sim = Simulation(situation=situation, reform=reform)
    results.append(sim.calculate("household_net_income", 2024)[0])

fig = go.Figure()
fig.add_trace(go.Scatter(x=incomes, y=results, line=dict(color=TEAL)))
st.plotly_chart(fig, use_container_width=True)
```

## Jupyter Notebook Best Practices

### Notebook Structure

```python
# Cell 1: Title and Description
"""
# Policy Analysis: [Policy Name]

**Date:** [Date]
**Author:** [Your Name]

## Summary
Brief description of the analysis and key findings.
"""

# Cell 2: Imports
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from policyengine_us import Simulation

# Cell 3: Configuration
YEAR = 2024
STATES = ["CA", "NY", "TX", "FL"]

# Cell 4+: Analysis sections with markdown headers
```

### Export Results

```python
import os

# Make sure the output directory exists
os.makedirs("outputs", exist_ok=True)

# Save DataFrame
df.to_csv("outputs/impact_analysis.csv", index=False)

# Save Plotly chart (static image export requires the kaleido package)
fig.write_html("outputs/chart.html")
fig.write_image("outputs/chart.png", width=1200, height=600)

# Save summary statistics (winners, losers, gains come from Pattern 4)
summary = {
    "total_winners": winners.sum(),
    "total_losers": losers.sum(),
    "avg_gain": gains[winners].mean(),
    "avg_loss": gains[losers].mean()
}
pd.DataFrame([summary]).to_csv("outputs/summary.csv", index=False)
```

## Repository-Specific Examples

This skill includes example templates in the `examples/` directory:

- `impact_analysis_template.ipynb` - Standard impact analysis
- `dashboard_template.py` - Streamlit dashboard
- `state_comparison.py` - State-by-state analysis
- `case_studies.py` - Household case studies
- `reform_definitions.py` - Common reform patterns

## Common Pitfalls

### Pitfall 1: Not Using Consistent Year

**Problem:** Mixing 2024 and 2025 calculations

**Solution:** Define a year constant at the top:

```python
CURRENT_YEAR = 2024

# Use everywhere
simulation.calculate("income_tax", CURRENT_YEAR)
```

### Pitfall 2: Inefficient Simulations

**Problem:** Creating a new simulation for each income level

**Solution:** Use axes for efficiency:

```python
# SLOW: one simulation per income level
results = []
for income in incomes:
    situation = create_situation(income=income)
    sim = Simulation(situation=situation)
    results.append(sim.calculate("income_tax", 2024)[0])

# FAST: one simulation that varies income along an axis
situation_with_axes = create_situation_with_axes(incomes)
sim = Simulation(situation=situation_with_axes)
results = sim.calculate("income_tax", 2024)  # Array of all results
```

### Pitfall 3: Forgetting to Compare Baseline and Reform

**Problem:** Only showing reform results

**Solution:** Always show both:

```python
baseline_tax = sim_baseline.calculate("income_tax", 2024)
reform_tax = sim_reform.calculate("income_tax", 2024)

results = {
    "baseline": baseline_tax,
    "reform": reform_tax,
    "change": reform_tax - baseline_tax
}
```

## PolicyEngine API Usage

For larger-scale analyses, use the PolicyEngine API:

```python
import requests

def calculate_via_api(situation, policy_id):
    """Calculate using PolicyEngine API.

    policy_id is the ID of the baseline or reform policy to apply;
    the caller is responsible for supplying it.
    """
    url = "https://api.policyengine.org/us/calculate"

    payload = {
        "household": situation,
        "policy_id": policy_id
    }

    response = requests.post(url, json=payload, timeout=60)
    response.raise_for_status()  # Fail loudly on HTTP errors
    return response.json()
```

## Testing Analysis Code

```python
import pytest
from policyengine_us import Simulation

# `create_family` and `reform` are assumed to be importable from the analysis code

def test_reform_increases_ctc():
    """Test that reform increases CTC as expected."""
    situation = create_family(income=50000, num_children=2)

    sim_baseline = Simulation(situation=situation)
    sim_reform = Simulation(situation=situation, reform=reform)

    ctc_baseline = sim_baseline.calculate("ctc", 2024)[0]
    ctc_reform = sim_reform.calculate("ctc", 2024)[0]

    assert ctc_reform > ctc_baseline, "Reform should increase CTC"
    # Compare floats with a tolerance rather than exact equality
    assert ctc_reform == pytest.approx(5000 * 2), "CTC should be $5000 per child"
```

## Documentation Standards

### README Template

````markdown
# [Analysis Name]

## Overview
Brief description of the analysis.

## Key Findings
- Finding 1
- Finding 2
- Finding 3

## Methodology
Explanation of approach and data sources.

## How to Run

```bash
pip install -r requirements.txt
python app.py  # or jupyter notebook analysis.ipynb
```

## Outputs
- `outputs/chart1.png` - Description
- `outputs/results.csv` - Description

## Contact
PolicyEngine Team - hello@policyengine.org
````

## Additional Resources

- **PolicyEngine API Docs:** https://policyengine.org/us/api
- **Analysis Examples:** https://github.com/PolicyEngine/analysis-notebooks
- **Streamlit Docs:** https://docs.streamlit.io
- **Plotly Docs:** https://plotly.com/python/
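
The analysis patterns in this skill call `create_family`, `create_situation`, and `create_situation_with_axes` without defining them. The following is a minimal sketch of what such helpers might look like, assuming the usual PolicyEngine US household layout (`people`, `families`, `marital_units`, `tax_units`, `spm_units`, `households`). The function signatures, entity names, default ages, and default state are illustrative assumptions, not part of any PolicyEngine package, and the axes helper assumes the incomes are evenly spaced.

```python
def create_family(income, num_children=0, married=False, state="CA", year=2024):
    """Build a situation dict for one family (hypothetical helper)."""
    period = str(year)
    # The first adult earns all of the income (illustrative assumption).
    people = {
        "adult_1": {"age": {period: 40}, "employment_income": {period: income}}
    }
    marital_units = {"adults": {"members": ["adult_1"]}}
    if married:
        people["adult_2"] = {"age": {period: 40}, "employment_income": {period: 0}}
        marital_units["adults"]["members"].append("adult_2")
    # Each child gets its own marital unit so every person belongs to one.
    for i in range(num_children):
        child = f"child_{i + 1}"
        people[child] = {"age": {period: 8}}
        marital_units[f"{child}_marital_unit"] = {"members": [child]}
    members = list(people)
    return {
        "people": people,
        "families": {"family": {"members": members}},
        "marital_units": marital_units,
        "tax_units": {"tax_unit": {"members": members}},
        "spm_units": {"spm_unit": {"members": members}},
        "households": {
            "household": {"members": members, "state_name": {period: state}}
        },
    }


def create_situation(income, state="CA", year=2024):
    """Single adult with no children (hypothetical helper)."""
    return create_family(income=income, state=state, year=year)


def create_situation_with_axes(incomes, state="CA", year=2024):
    """Single adult whose employment income varies along an axis.

    Assumes `incomes` is an evenly spaced sequence, so only its minimum,
    maximum, and length are needed to describe the axis.
    """
    situation = create_situation(income=0, state=state, year=year)
    situation["axes"] = [[{
        "name": "employment_income",
        "count": len(incomes),
        "min": float(min(incomes)),
        "max": float(max(incomes)),
        "period": year,
    }]]
    return situation
```

With helpers along these lines, a call such as `create_family(income=50000, num_children=2)` returns a dictionary that can be passed to `Simulation(situation=...)`. Real analyses will usually need more inputs (ages, filing details, other income sources), so treat this as a starting point rather than a complete household builder.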
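
The winners/losers and export snippets compute `gains[winners].mean()` directly, which yields `nan` (with a NumPy warning) when a reform produces no winners or no losers. The sketch below shows one way to summarize gains defensively. The function name `summarize_gains` and the `tolerance` parameter are hypothetical, and the function assumes a non-empty array. When the gains come from an income axis, the percentages are shares of points on that axis, not shares of the population.

```python
import numpy as np


def summarize_gains(gains, tolerance=0.0):
    """Summarize winners and losers from an array of net income changes.

    Changes within +/- tolerance count as neutral. Assumes gains is non-empty.
    """
    gains = np.asarray(gains, dtype=float)
    winners = gains > tolerance
    losers = gains < -tolerance
    neutral = ~(winners | losers)

    def mean_or_zero(values):
        # Avoid nan (and a RuntimeWarning) when a group is empty.
        return float(values.mean()) if values.size else 0.0

    return {
        "pct_winners": float(winners.mean() * 100),
        "pct_losers": float(losers.mean() * 100),
        "pct_neutral": float(neutral.mean() * 100),
        "avg_gain": mean_or_zero(gains[winners]),
        "avg_loss": mean_or_zero(gains[losers]),
    }
```

The returned dictionary can be written out the same way as the summary in the export snippet, for example with `pd.DataFrame([summarize_gains(gains)]).to_csv(...)`.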