Initial commit

2025-11-30 08:47:43 +08:00
commit 2e8d89fca3
41 changed files with 14051 additions and 0 deletions
--- a/commands/create-pr.md
+++ b/commands/create-pr.md
@@ -0,0 +1,328 @@
+---
+description: Create PR as draft, wait for CI to pass, then mark ready for review (solves the "I'll check back later" problem)
+---
+
+# Creating PR with CI Verification
+
+**IMPORTANT:** This command solves the common problem where Claude creates a PR, says "I'll wait for CI", then gives up. This command ACTUALLY waits for CI completion.
+
+## Workflow
+
+This command will:
+1. ✅ Create PR as draft
+2. ✅ Wait for CI checks to complete (actually wait, not give up)
+3. ✅ Mark as ready for review if CI passes
+4. ✅ Report failures with links if CI fails
+
+## Step 1: Determine Current Branch and Changes
+
+```bash
+# Get current branch
+CURRENT_BRANCH=$(git branch --show-current)
+
+# Get base branch (usually main or master)
+BASE_BRANCH=$(git remote show origin | grep 'HEAD branch' | cut -d' ' -f5)
+
+# Verify we're not on main/master
+if [[ "$CURRENT_BRANCH" == "main" || "$CURRENT_BRANCH" == "master" ]]; then
+  echo "ERROR: Cannot create PR from main/master branch"
+  echo "Create a feature branch first: git checkout -b feature-name"
+  exit 1
+fi
+
+# Check if branch is pushed
+git rev-parse --verify origin/$CURRENT_BRANCH > /dev/null 2>&1
+if [ $? -ne 0 ]; then
+  echo "Branch not pushed yet. Pushing now..."
+  git push -u origin $CURRENT_BRANCH
+fi
+```
+
+## Step 2: Gather PR Context
+
+Run in parallel to gather information:
+
+```bash
+# 1. Git status
+git status
+
+# 2. Diff since divergence from base
+git diff ${BASE_BRANCH}...HEAD
+
+# 3. Commit history for this branch
+git log ${BASE_BRANCH}..HEAD --oneline
+
+# 4. Check if PR already exists
+gh pr view --json number,state,isDraft 2>/dev/null
+```
+
+**If PR already exists:**
+```bash
+EXISTING_PR=$(gh pr view --json number --jq '.number')
+echo "PR #$EXISTING_PR already exists for this branch"
+echo "Use /review-pr $EXISTING_PR or /fix-pr $EXISTING_PR instead"
+exit 0
+```
+
+## Step 3: Create PR Title and Body
+
+Based on the changes (read from git diff and commit history):
+
+**Title:** Concise summary of changes (50 chars max)
+
+**Body format:**
+```markdown
+## Summary
+- Bullet point 1
+- Bullet point 2
+
+## Changes
+- Specific change 1
+- Specific change 2
+
+## Testing
+- Test approach
+- Verification steps
+
+## Checklist
+- [ ] Tests pass locally
+- [ ] Code formatted (make format)
+- [ ] Changelog entry created (if applicable)
+
+---
+
+🤖 Generated with [Claude Code](https://claude.com/claude-code)
+
+Co-Authored-By: Claude <noreply@anthropic.com>
+```
+
+## Step 4: Create PR as Draft
+
+**CRITICAL:** Always create as draft initially.
+
+```bash
+# Create PR as draft
+gh pr create \
+  --title "PR Title Here" \
+  --body "$(cat <<'EOF'
+PR body here
+EOF
+)" \
+  --draft
+
+# Get PR number
+PR_NUMBER=$(gh pr view --json number --jq '.number')
+echo "Created draft PR #$PR_NUMBER"
+echo "URL: $(gh pr view --json url --jq '.url')"
+```
+
+## Step 5: Wait for CI Checks (DO NOT GIVE UP!)
+
+**This is where Claude usually fails. DO NOT say "I'll check back later".**
+
+**Instead, ACTUALLY wait using a polling loop with NO timeout:**
+
+```bash
+echo "Waiting for CI checks to complete..."
+echo "Typical CI times by repository:"
+echo "  - policyengine-app: 3-6 minutes"
+echo "  - policyengine-uk: 9-12 minutes"
+echo "  - policyengine-api: 30 minutes"
+echo "  - policyengine-us: 60-75 minutes (longest)"
+echo ""
+echo "I will poll every 15 seconds until all checks complete. No time limit."
+
+POLL_INTERVAL=15  # Check every 15 seconds
+ELAPSED=0
+
+while true; do
+  # Get check status
+  CHECKS_JSON=$(gh pr checks $PR_NUMBER --json name,status,conclusion)
+
+  # Count checks
+  TOTAL=$(echo "$CHECKS_JSON" | jq '. | length')
+
+  if [ "$TOTAL" -eq 0 ]; then
+    echo "No CI checks found yet. Waiting for checks to start..."
+    sleep $POLL_INTERVAL
+    ELAPSED=$((ELAPSED + POLL_INTERVAL))
+    continue
+  fi
+
+  # Count by status
+  COMPLETED=$(echo "$CHECKS_JSON" | jq '[.[] | select(.status == "COMPLETED")] | length')
+  FAILED=$(echo "$CHECKS_JSON" | jq '[.[] | select(.conclusion == "FAILURE")] | length')
+
+  echo "[$ELAPSED s] CI Checks: $COMPLETED/$TOTAL completed, $FAILED failed"
+
+  # If all completed
+  if [ "$COMPLETED" -eq "$TOTAL" ]; then
+    if [ "$FAILED" -eq 0 ]; then
+      echo "✅ All CI checks passed after $ELAPSED seconds!"
+      break
+    else
+      echo "❌ Some CI checks failed after $ELAPSED seconds."
+      # Show failures
+      gh pr checks $PR_NUMBER
+      break
+    fi
+  fi
+
+  # Continue waiting (no timeout!)
+  sleep $POLL_INTERVAL
+  ELAPSED=$((ELAPSED + POLL_INTERVAL))
+done
+```
+
+**Important:** No timeout means we actually wait as long as needed. Population simulations can take 30+ minutes.
+
+## Step 6: Mark Ready for Review (If CI Passed)
+
+```bash
+if [ "$FAILED" -eq 0 ] && [ "$COMPLETED" -eq "$TOTAL" ]; then
+  echo "Marking PR as ready for review..."
+  gh pr ready $PR_NUMBER
+
+  echo ""
+  echo "✅ PR #$PR_NUMBER is ready for review!"
+  echo "URL: $(gh pr view $PR_NUMBER --json url --jq '.url')"
+  echo ""
+  echo "All CI checks passed:"
+  gh pr checks $PR_NUMBER
+else
+  echo ""
+  echo "⚠️ PR remains as draft due to CI issues."
+  echo "URL: $(gh pr view $PR_NUMBER --json url --jq '.url')"
+  echo ""
+  echo "CI status:"
+  gh pr checks $PR_NUMBER
+  echo ""
+  echo "To fix and retry:"
+  echo "1. Fix the failing checks"
+  echo "2. Push changes: git push"
+  echo "3. Run this command again to re-check CI"
+fi
+```
+
+
+## Key Differences from Default Behavior
+
+**Old way (Claude gives up):**
+```
+Claude: "I've created the PR. CI will take a while, I'll check back later..."
+[Chat ends, Claude never checks back]
+User: Has to manually check CI and mark ready
+```
+
+**New way (this command actually waits):**
+```
+Claude: "I've created the PR as draft. Now waiting for CI..."
+[Actually polls CI status every 15 seconds]
+Claude: "CI passed! Marking as ready for review."
+[PR is ready, user doesn't have to do anything]
+```
+
+## Error Handling
+
+**If CI fails:**
+```bash
+echo "❌ CI checks failed. Here are the failures:"
+gh pr checks $PR_NUMBER
+
+echo ""
+echo "Common fixes:"
+echo "- Linting errors: make format && git push"
+echo "- Test failures: See logs at PR URL"
+echo "- Use /fix-pr $PR_NUMBER to attempt automated fixes"
+```
+
+**If no CI configured:**
+```bash
+if [ "$TOTAL" -eq 0 ] && [ $ELAPSED -gt 60 ]; then
+  echo "⚠️ No CI checks detected after 60 seconds."
+  echo "Repository may not have CI configured."
+  echo "Marking PR as ready for manual review."
+  gh pr ready $PR_NUMBER
+fi
+```
+
+## Usage Examples
+
+**Basic:**
+```bash
+# Make changes, commit
+git add .
+git commit -m "Add feature"
+
+# Create PR (command waits for CI)
+/create-pr
+```
+
+**With custom title:**
+```bash
+# Command can accept optional arguments
+/create-pr "Add California EITC implementation"
+```
+
+**Checking existing PR:**
+```bash
+# If PR exists, command tells you and suggests alternatives
+/create-pr
+# Output: "PR #123 exists. Use /review-pr 123 or /fix-pr 123"
+```
+
+## Integration with Other Commands
+
+**After /encode-policy or /fix-pr:**
+```bash
+# These commands might push code
+# Use /create-pr to create PR and wait for CI
+/create-pr
+```
+
+**After fixing CI issues:**
+```bash
+# Fix issues locally
+make format
+git add .
+git commit -m "Fix linting"
+git push
+
+# Command will detect existing PR and re-check CI
+/create-pr  # "PR #123 exists, checking CI status..."
+```
+
+## CI Duration Expectations
+
+**By repository (based on actual data):**
+- **policyengine-app:** 3-6 minutes (fast)
+- **policyengine-uk:** 9-12 minutes (medium)
+- **policyengine-api:** ~30 minutes (slow)
+- **policyengine-us:** 60-75 minutes (very slow - population simulations)
+
+**The command has no timeout** - it will wait as long as needed, even for policyengine-us's 75-minute CI runs.
+
+## Why This Works
+
+**Problem:** Claude's internal timeout or context leads it to give up
+
+**Solution:**
+1. ✅ Explicit polling loop (Claude sees the code will wait)
+2. ✅ Clear status updates (Claude knows it's still working)
+3. ✅ Timeout is explicit (Claude knows how long to wait)
+4. ✅ Fallback handling (What to do if timeout occurs)
+
+**Key insight:** By having the command contain the waiting logic in bash, Claude executes it and actually waits instead of just saying "I'll wait."
+
+## Notes
+
+**This command is essential for:**
+- Automated PR workflows
+- CI-dependent merges
+- Team workflows requiring CI validation
+- Reducing manual PR management
+
+**Use this instead of:**
+- Manual `gh pr create` followed by manual CI checking
+- Asking Claude to "wait for CI" (which it can't do reliably)
+- Creating ready PRs before CI passes
--- a/commands/encode-policy.md
+++ b/commands/encode-policy.md
@@ -0,0 +1,303 @@
+---
+description: Orchestrates multi-agent workflow to implement new government benefit programs
+---
+
+# Implementing $ARGUMENTS in PolicyEngine
+
+Coordinate the multi-agent workflow to implement $ARGUMENTS as a complete, production-ready government benefit program.
+
+## Program Type Detection
+
+This workflow adapts based on the type of program being implemented:
+
+**TANF/Benefit Programs** (e.g., state TANF, SNAP, WIC):
+- Phase 4: test-creator creates both unit and integration tests
+- Phase 7: Uses specialized @complete:country-models:tanf-program-reviewer agent in parallel validation
+- Optional phases: May be skipped for simplified implementations
+
+**Other Government Programs** (e.g., tax credits, deductions):
+- Phase 4: test-creator creates both unit and integration tests
+- Phase 7: Uses general @complete:country-models:implementation-validator agent in parallel validation
+- Optional phases: Include based on production requirements
+
+## Phase 0: Implementation Approach (TANF Programs Only)
+
+**For TANF programs, detect implementation approach from $ARGUMENTS:**
+
+**Auto-detect from user's request:**
+- If $ARGUMENTS contains "simple" or "simplified" → Use **Simplified** approach
+- If $ARGUMENTS contains "full" or "complete" → Use **Full** approach
+- If unclear → Default to **Simplified** approach
+
+**Simplified Implementation:**
+- Use federal baseline for gross income, demographic eligibility, immigration eligibility
+- Faster implementation
+- Suitable for most states that follow federal definitions
+- Only creates state-specific variables for: income limits, disregards, benefit amounts, final calculation
+
+**Full Implementation:**
+- Create state-specific income definitions, eligibility criteria
+- More detailed implementation
+- Required when state has unique income definitions or eligibility rules
+- Creates all state-specific variables
+
+**Record the detected approach and pass it to Phase 4 (rules-engineer).**
+
+## Phase 1: Issue and PR Setup
+
+Invoke @complete:country-models:issue-manager agent to:
+- Search for existing issue or create new one for $ARGUMENTS
+- Create draft PR immediately in PolicyEngine/policyengine-us repository (NOT personal fork)
+- Return issue number and PR URL for tracking
+
+## Phase 2: Variable Naming Convention
+Invoke @complete:country-models:naming-coordinator agent to:
+- Analyze existing naming patterns in the codebase
+- Establish variable naming convention for $ARGUMENTS
+- Analyze existing folder structure patterns in the codebase
+- Post naming decisions and folder structure to GitHub issue for all agents to reference
+
+**Quality Gate**: Naming convention and folder structure must be documented before proceeding to ensure consistency across parallel development.
+
+## Phase 3: Document Collection
+
+**Phase 3A: Initial Document Gathering**
+
+Invoke @complete:country-models:document-collector agent to gather official $ARGUMENTS documentation, save as `working_references.md` in the repository, and post to GitHub issue
+
+**After agent completes:**
+1. Check the agent's report for "📄 PDFs Requiring Extraction" section
+2. **Decision Point:**
+   - **If PDFs are CRITICAL** (State Plans with benefit formulas, calculation methodology):
+     - Ask the user: "Please send me these PDF URLs so I can extract their content:"
+     - List each PDF URL on a separate line
+     - Wait for user to send the URLs (they will auto-extract)
+     - Proceed to Phase 3B
+   - **If PDFs are SUPPLEMENTARY** (additional reference, not essential):
+     - Note them for future reference
+     - Proceed directly to Phase 4 with current documentation
+
+3. **If no PDFs listed:**
+   - Skip to Phase 4 (documentation complete)
+
+**Phase 3B: PDF Extraction & Complete Documentation** (Only if CRITICAL PDFs found)
+
+1. After receiving extracted PDF content from user:
+2. Relaunch @complete:country-models:document-collector agent with:
+   - Original task description
+   - Extracted PDF content included in prompt
+   - Instruction: "You are in Phase 2 - integrate this PDF content with your HTML research"
+3. Agent creates complete documentation
+
+**Quality Gate**: Documentation must include:
+- Official program guidelines or state plan
+- Income limits and benefit schedules
+- Eligibility criteria and priority groups
+- Seasonal/temporal rules if applicable
+- ✅ All critical PDFs extracted and integrated (if applicable)
+
+## Phase 4: Development (Parallel on Same Branch)
+
+Run both agents IN PARALLEL - they work on different folders so no conflicts:
+
+**@complete:country-models:test-creator** → works in `tests/` folder:
+- Create comprehensive INTEGRATION tests from documentation
+- Create UNIT tests for each variable that will have a formula
+- Both test types created in ONE invocation
+- Use only existing PolicyEngine variables
+- Test realistic calculations based on documentation
+
+**@complete:country-models:rules-engineer** → works in `variables/` + `parameters/` folders:
+- **Implementation Approach:** [Pass the decision from Phase 0: "simplified" or "full"]
+  - **If Simplified TANF:** Do NOT create state-specific gross income variables - use federal baseline (`tanf_gross_earned_income`, `tanf_gross_unearned_income`)
+  - **If Full TANF:** Create complete state-specific income definitions as needed
+- **Step 1:** Create all parameters first using embedded parameter-architect patterns
+  - Complete parameter structure with all thresholds, amounts, rates
+  - Include proper references from documentation
+- **Step 2:** Implement variables using the parameters
+  - Zero hard-coded values
+  - Complete implementations only
+  - Follow simplified/full approach from Phase 0
+
+**Quality Requirements**:
+- rules-engineer: ZERO hard-coded values, parameters created before variables
+- test-creator: All tests (unit + integration) created together, based purely on documentation
+
+## Phase 5: Pre-Push Validation
+Invoke @complete:country-models:pr-pusher agent to:
+- Ensure changelog entry exists
+- Run formatters (black, isort)
+- Fix any linting issues
+- Run local tests for quick validation
+- Push branch and report initial CI status
+
+**Quality Gate**: Branch must be properly formatted with changelog before continuing.
+
+## Optional Enhancement Phases
+
+**These phases are OPTIONAL and should be considered based on implementation type:**
+
+### For Production-Ready Implementations:
+The following enhancements may be applied to ensure production quality:
+
+1. **Cross-Program Validation**
+   - @complete:country-models:cross-program-validator: Check interactions with other benefits
+   - Prevents benefit cliffs and unintended interactions
+
+2. **Documentation Enhancement**
+   - @complete:country-models:documentation-enricher: Add examples and regulatory citations
+   - Improves maintainability and compliance verification
+
+3. **Performance Optimization**
+   - @complete:country-models:performance-optimizer: Vectorize and optimize calculations
+   - Ensures scalability for large-scale simulations
+
+**Decision Criteria:**
+- **Simplified/Experimental TANF**: Skip these optional phases
+- **Production TANF**: Include based on specific requirements
+- **Full Production Deployment**: Include all enhancements
+
+## Phase 6: Validation
+
+**Run validators to check implementation quality:**
+
+**@complete:country-models:implementation-validator:**
+- Check for hard-coded values in variables
+- Verify placeholder or incomplete implementations
+- Check federal/state parameter organization
+- Assess test quality and coverage
+- Identify performance and vectorization issues
+
+**For TANF/Benefit Programs, also run @complete:country-models:tanf-program-reviewer:**
+- Learn from PA TANF and OH OWF reference implementations first
+- Validate code formulas against regulations
+- Verify test coverage with manual calculations
+- Check parameter structure and references
+- Focus on: eligibility rules, income disregards, benefit formulas
+
+**Quality Gate:** Review validator reports before proceeding
+
+## Phase 7: Local Testing & Fixes
+**CRITICAL: ALWAYS invoke @complete:country-models:ci-fixer agent - do NOT manually fix issues**
+
+Invoke @complete:country-models:ci-fixer agent to:
+- Run all tests locally: `policyengine-core test policyengine_us/tests/policy/baseline/gov/states/[STATE]/[PROGRAM] -c policyengine_us -v`
+- Identify ALL failing tests
+- For each failing test:
+  - Read the test file to understand expected values
+  - Read the actual test output to see what was calculated
+  - Determine root cause: incorrect test expectations OR bug in implementation
+  - Fix the issue:
+    - If test expectations are wrong: update the test file with correct values
+    - If implementation is wrong: fix the variable/parameter code
+  - Re-run tests to verify fix
+- Iterate until ALL tests pass locally
+- **Reference Verification**
+  - Verify all parameters have reference metadata
+  - Verify all variables have reference fields
+  - Keep `sources/` folder files for future reference
+  - If references missing: Create todos for adding them
+- Run `make format` before committing fixes
+- Push final fixes to PR branch
+
+**Success Metrics**:
+- All tests pass locally (green output)
+- All references embedded in code metadata
+- `sources/` folder kept for future reference
+- Code properly formatted
+- Implementation complete and working
+- Clean commit history
+
+## Phase 8: Final Summary
+
+After all tests pass and references are embedded:
+- Update PR description with final implementation status
+- Add summary of what was implemented
+- Report completion to user
+- **Keep PR as draft** - user will mark ready when they choose
+- **WORKFLOW COMPLETE**
+
+## Anti-Patterns This Workflow Prevents
+
+1. **Hard-coded values**: Rules-engineer enforces parameterization
+2. **Incomplete implementations**: Validator catches before PR
+3. **Federal/state mixing**: Proper parameter organization enforced
+4. **Non-existent variables in tests**: Test creator uses only real variables
+5. **Missing edge cases**: Edge-case-generator covers all boundaries
+6. **Benefit cliffs**: Cross-program-validator identifies interactions
+7. **Poor documentation**: Documentation-enricher adds examples
+8. **Performance issues**: Performance-optimizer ensures vectorization
+9. **Review delays**: Most issues caught and fixed automatically
+
+## Execution Instructions
+
+**YOUR ROLE**: You are an orchestrator ONLY. You must:
+1. Invoke agents using the Task tool
+2. Wait for their completion
+3. Check quality gates
+4. **PAUSE and wait for user confirmation before proceeding to next phase**
+
+**YOU MUST NOT**:
+- Write any code yourself
+- Fix any issues manually
+- Run tests directly
+- Edit files
+
+**Execution Flow (ONE PHASE AT A TIME)**:
+
+Execute each phase sequentially and **STOP after each phase** to wait for user instructions:
+
+0. **Phase 0**: Implementation Approach (TANF Programs Only)
+   - Auto-detect from $ARGUMENTS ("simple"/"simplified" vs. "full"/"complete")
+   - Default to Simplified if unclear
+   - Inform user of detected approach
+   - **STOP - Wait for user to say "continue" or provide adjustments**
+
+1. **Phase 1**: Issue and PR Setup
+   - Complete the phase
+   - Report results
+   - **STOP - Wait for user to say "continue" or provide adjustments**
+
+2. **Phase 2**: Variable Naming Convention
+   - Complete the phase
+   - Report results
+   - **STOP - Wait for user to say "continue" or provide adjustments**
+
+3. **Phase 3**: Document Collection
+   - Complete the phase
+   - Report results
+   - **STOP - Wait for user to say "continue" or provide adjustments**
+
+4. **Phase 4**: Development (Test + Implementation)
+   - Pass simplified/full decision to rules-engineer
+   - Run test-creator and rules-engineer in parallel (different folders)
+   - Report results
+   - **STOP - Wait for user to say "continue" or provide adjustments**
+
+5. **Phase 5**: Pre-Push Validation
+   - Complete the phase
+   - Report results
+   - **STOP - Wait for user to say "continue" or provide adjustments**
+
+6. **Phase 6**: Validation
+   - Run validators
+   - Report results
+   - **STOP - Wait for user to say "continue" or provide adjustments**
+
+7. **Phase 7**: Local Testing & Fixes (Including Reference Verification)
+   - Complete the phase
+   - Report results
+   - **STOP - Wait for user to say "continue" or provide adjustments**
+
+8. **Phase 8**: Final Summary
+   - Update PR description
+   - Report final results (keep PR as draft)
+   - **WORKFLOW COMPLETE**
+
+**CRITICAL RULES**:
+- Do NOT proceed to the next phase until user explicitly says to continue
+- After each phase, summarize what was accomplished
+- If user provides adjustments, incorporate them before continuing
+- All 8 phases are REQUIRED - pausing doesn't mean skipping
+
+If any agent fails, report the failure but DO NOT attempt to fix it yourself. Wait for user instructions.
--- a/commands/fix-pr.md
+++ b/commands/fix-pr.md
@@ -0,0 +1,333 @@
+---
+description: Apply fixes to an existing PR, addressing all review comments and validation issues
+---
+
+# Fixing PR: $ARGUMENTS
+
+## Determining Which PR to Review
+
+First, determine which PR to review based on the arguments:
+
+```bash
+# If no arguments provided, use current branch's PR
+if [ -z "$ARGUMENTS" ]; then
+    CURRENT_BRANCH=$(git branch --show-current)
+    PR_NUMBER=$(gh pr list --head "$CURRENT_BRANCH" --json number --jq '.[0].number')
+    if [ -z "$PR_NUMBER" ]; then
+        echo "No PR found for current branch $CURRENT_BRANCH"
+        exit 1
+    fi
+# If argument is a number, use it directly
+elif [[ "$ARGUMENTS" =~ ^[0-9]+$ ]]; then
+    PR_NUMBER=$ARGUMENTS
+# Otherwise, search for PR by description/title
+else
+    PR_NUMBER=$(gh pr list --search "$ARGUMENTS" --json number,title --jq '.[0].number')
+    if [ -z "$PR_NUMBER" ]; then
+        echo "No PR found matching: $ARGUMENTS"
+        exit 1
+    fi
+fi
+
+echo "Reviewing PR #$PR_NUMBER"
+```
+
+Orchestrate agents to review, validate, and fix issues in PR #$PR_NUMBER, addressing all GitHub review comments.
+
+## ⚠️ CRITICAL: Agent Usage is MANDATORY
+
+**You are a coordinator, NOT an implementer. You MUST:**
+1. **NEVER make direct code changes** - always use agents
+2. **INVOKE agents with specific, detailed instructions**
+3. **WAIT for each agent to complete** before proceeding
+4. **VERIFY agent outputs** before moving to next phase
+
+**If you find yourself using Edit, Write, or MultiEdit directly, STOP and invoke the appropriate agent instead.**
+
+## Phase 1: PR Analysis
+After determining PR_NUMBER above, gather context about the PR and review comments:
+
+```bash
+gh pr view $PR_NUMBER --comments
+gh pr checks $PR_NUMBER  # Note: CI runs on draft PRs too!
+gh pr diff $PR_NUMBER
+```
+
+Document findings:
+- Current CI status (even draft PRs have CI)
+- Review comments to address
+- Files changed
+- Type of implementation (new program, bug fix, enhancement)
+
+## Phase 2: Enhanced Domain Validation
+Run comprehensive validation using specialized agents:
+
+### Step 1: Domain-Specific Validation
+Invoke **implementation-validator** to check:
+- Federal/state jurisdiction separation
+- Variable naming conventions and duplicates
+- Hard-coded value patterns
+- Performance optimization opportunities
+- Documentation placement
+- PolicyEngine-specific patterns
+
+### Step 2: Reference Validation
+Invoke **reference-validator** to verify:
+- All parameters have references
+- References actually corroborate values
+- Federal sources for federal params
+- State sources for state params
+- Specific citations, not generic links
+
+### Step 3: Implementation Validation
+Invoke **implementation-validator** to check:
+- No hard-coded values in formulas
+- Complete implementations (no TODOs)
+- Proper entity usage
+- Correct formula patterns
+
+Create a comprehensive checklist:
+- [ ] Domain validation issues
+- [ ] Reference validation issues
+- [ ] Implementation issues
+- [ ] Review comments to address
+- [ ] CI failures to fix
+
+## Phase 3: Sequential Fix Application
+
+**CRITICAL: You MUST use agents for ALL fixes. DO NOT make direct edits yourself.**
+
+Based on issues found, invoke agents IN ORDER to avoid conflicts.
+
+**Why Sequential**: Unlike initial implementation where we need isolation, PR fixes must be applied sequentially because:
+- Each fix builds on the previous one
+- Avoids merge conflicts
+- Tests need to see the fixed implementation
+- Documentation needs to reflect the final code
+
+**MANDATORY AGENT USAGE - Apply fixes in this exact order:**
+
+### Step 1: Fix Domain & Parameter Issues
+```markdown
+YOU MUST INVOKE THESE AGENTS - DO NOT FIX DIRECTLY:
+
+1. First, invoke implementation-validator:
+   "Scan all files in this PR and create a comprehensive list of all domain violations and implementation issues"
+
+2. Then invoke parameter-architect (REQUIRED for ANY hard-coded values):
+   "Design parameter structure for these hard-coded values found: [list all values]
+    Create the YAML parameter files with proper federal/state separation"
+
+3. Then invoke rules-engineer (REQUIRED for code changes):
+   "Refactor all variables to use the new parameters created by parameter-architect.
+    Fix all hard-coded values in: [list files]"
+
+4. Then invoke reference-validator:
+   "Add proper references to all new parameters created"
+
+5. ONLY AFTER all agents complete: Commit changes
+```
+
+### Step 2: Add Missing Tests (MANDATORY)
+```markdown
+REQUIRED - Must generate tests even if none were failing:
+
+1. Invoke edge-case-generator:
+   "Generate boundary tests for all parameters created in Step 1.
+    Test edge cases for: [list all new parameters]"
+
+2. Invoke test-creator:
+   "Create integration tests for the refactored Idaho LIHEAP implementation.
+    Include tests for all new parameter files created."
+
+3. VERIFY tests pass before committing
+4. Commit test additions
+```
+
+### Step 3: Enhance Documentation
+1. **documentation-enricher**: Add examples and references to updated code
+2. Commit documentation improvements
+
+### Step 4: Optimize Performance (if needed)
+1. **performance-optimizer**: Vectorize and optimize calculations
+2. Run tests to ensure no regressions
+3. Commit optimizations
+
+### Step 5: Validate Integrations
+1. **cross-program-validator**: Check benefit interactions
+2. Fix any cliff effects or integration issues found
+3. Commit integration fixes
+
+## Phase 4: Apply Fixes
+For each issue identified:
+
+1. **Read current implementation**
+   ```bash
+   gh pr checkout $PR_NUMBER
+   ```
+
+2. **Apply agent-generated fixes**
+   - Use Edit/MultiEdit for targeted fixes
+   - Preserve existing functionality
+   - Add only what's needed
+
+3. **Verify fixes locally**
+   ```bash
+   make test
+   make format
+   ```
+
+## Phase 5: Address Review Comments
+For each GitHub comment:
+
+1. **Parse comment intent**
+   - Is it requesting a change?
+   - Is it asking for clarification?
+   - Is it pointing out an issue?
+
+2. **Generate response**
+   - If change requested: Apply fix and confirm
+   - If clarification: Add documentation/comment
+   - If issue: Fix and explain approach
+
+3. **Post response on GitHub**
+   ```bash
+   gh pr comment $PR_NUMBER --body "Addressed: [explanation of fix]"
+   ```
+
+## Phase 6: CI Validation
+Invoke ci-fixer to ensure all checks pass:
+
+1. **Push fixes**
+   ```bash
+   git add -A
+   git commit -m "Address review comments
+
+   - Fixed hard-coded values identified in review
+   - Added missing tests for edge cases
+   - Enhanced documentation with examples
+   - Optimized performance issues
+   
+   Addresses comments from @reviewer"
+   git push
+   ```
+
+2. **Monitor CI**
+   ```bash
+   gh pr checks $PR_NUMBER --watch
+   ```
+
+3. **Fix any CI failures**
+   - Format issues: `make format`
+   - Test failures: Fix with targeted agents
+   - Lint issues: Apply corrections
+
+## Phase 7: Final Review & Summary
+Invoke implementation-validator for final validation:
+- All comments addressed?
+- All tests passing?
+- No regressions introduced?
+
+Post summary comment:
+```bash
+gh pr comment $PR_NUMBER --body "## Summary of Changes
+
+### Issues Addressed
+✅ Fixed hard-coded values in [files]
+✅ Added parameterization for [values]
+✅ Enhanced test coverage (+X tests)
+✅ Improved documentation
+✅ All CI checks passing
+
+### Review Comments Addressed
+- @reviewer1: [Issue] → [Fix applied]
+- @reviewer2: [Question] → [Clarification added]
+
+### Ready for Re-Review
+All identified issues have been addressed. The implementation now:
+- Uses parameters for all configurable values
+- Has comprehensive test coverage  
+- Includes documentation with examples
+- Passes all CI checks"
+```
+
+## Command Options
+
+### Usage Examples
+- `/review-pr` - Review PR for current branch
+- `/review-pr 6444` - Review PR #6444
+- `/review-pr "Idaho LIHEAP"` - Search for and review PR by title/description
+
+### Quick Fix Mode
+`/review-pr [PR] --quick`
+- Only fix CI failures
+- Skip comprehensive review
+- Focus on getting checks green
+
+### Deep Review Mode  
+`/review-pr [PR] --deep`
+- Run all validators
+- Generate comprehensive tests
+- Full documentation enhancement
+- Cross-program validation
+
+### Comment Only Mode
+`/review-pr [PR] --comments-only`
+- Only address GitHub review comments
+- Skip additional validation
+- Faster turnaround
+
+## Success Metrics
+
+The PR is ready when:
+- ✅ All CI checks passing
+- ✅ All review comments addressed
+- ✅ No hard-coded values
+- ✅ Comprehensive test coverage
+- ✅ Documentation complete
+- ✅ No performance issues
+
+## Common Review Patterns
+
+### "Hard-coded value" Comment
+1. Identify the value
+2. Create parameter with parameter-architect
+3. Update implementation with rules-engineer
+4. Add test with test-creator
+
+### "Missing test" Comment  
+1. Identify the scenario
+2. Use edge-case-generator for boundaries
+3. Use test-creator for integration tests
+4. Verify with implementation-validator
+
+### "Needs documentation" Comment
+1. Use documentation-enricher
+2. Add calculation examples
+3. Add regulatory references
+4. Explain edge cases
+
+### "Performance issue" Comment
+1. Use performance-optimizer
+2. Vectorize operations
+3. Remove redundant calculations
+4. Test with large datasets
+
+## Error Handling
+
+If agents produce conflicting fixes:
+1. Prioritize fixes that address review comments
+2. Ensure no regressions
+3. Maintain backward compatibility
+4. Document any tradeoffs
+
+## Pre-Flight Checklist
+
+Before starting, confirm:
+- [ ] I will NOT make direct edits (no Edit/Write/MultiEdit by coordinator)
+- [ ] I will invoke agents for ALL changes
+- [ ] I will wait for each agent to complete
+- [ ] I will generate tests even if current tests pass
+- [ ] I will commit after each agent phase
+
+Start with determining which PR to review, then proceed to Phase 1: Analyze the PR and review comments.
--- a/commands/review-pr.md
+++ b/commands/review-pr.md
@@ -0,0 +1,420 @@
+---
+description: Review an existing PR and post findings to GitHub (does not make code changes)
+---
+
+# Reviewing PR: $ARGUMENTS
+
+**READ-ONLY MODE**: This command analyzes the PR and posts a comprehensive review to GitHub WITHOUT making any code changes.
+
+Use `/fix-pr` to apply fixes automatically.
+
+## Important: Avoiding Duplicate Reviews
+
+**Before posting**, check if you already have an active review on this PR. If you do AND there are no replies from others:
+- Delete your old comments
+- Post a single updated review
+- DO NOT create multiple review comments
+
+Only post additional reviews if others have engaged with your previous review.
+
+## Determining Which PR to Review
+
+First, determine which PR to review based on the arguments:
+
+```bash
+# If no arguments provided, use current branch's PR
+if [ -z "$ARGUMENTS" ]; then
+    CURRENT_BRANCH=$(git branch --show-current)
+    PR_NUMBER=$(gh pr list --head "$CURRENT_BRANCH" --json number --jq '.[0].number')
+    if [ -z "$PR_NUMBER" ]; then
+        echo "No PR found for current branch $CURRENT_BRANCH"
+        exit 1
+    fi
+# If argument is a number, use it directly
+elif [[ "$ARGUMENTS" =~ ^[0-9]+$ ]]; then
+    PR_NUMBER=$ARGUMENTS
+# Otherwise, search for PR by description/title
+else
+    PR_NUMBER=$(gh pr list --search "$ARGUMENTS" --json number,title --jq '.[0].number')
+    if [ -z "$PR_NUMBER" ]; then
+        echo "No PR found matching: $ARGUMENTS"
+        exit 1
+    fi
+fi
+
+echo "Reviewing PR #$PR_NUMBER"
+```
+
+## Phase 1: PR Analysis
+
+Gather context about the PR:
+
+```bash
+gh pr view $PR_NUMBER --comments
+gh pr checks $PR_NUMBER
+gh pr diff $PR_NUMBER
+```
+
+Document findings:
+- Current CI status
+- Existing review comments
+- Files changed
+- Type of implementation (new program, bug fix, enhancement)
+
+## Phase 2: Comprehensive Validation (Read-Only)
+
+Run validators to COLLECT issues (do not fix):
+
+### Step 1: Domain Validation
+Invoke **implementation-validator** to identify:
+- Federal/state jurisdiction issues
+- Variable naming violations
+- Hard-coded values
+- Performance optimization opportunities
+- Documentation gaps
+- PolicyEngine pattern violations
+
+**IMPORTANT**: Instruct agent to ONLY report issues, not fix them.
+
+### Step 2: Reference Validation
+Invoke **reference-validator** to check:
+- Missing parameter references
+- References that don't corroborate values
+- Incorrect source types (federal vs state)
+- Generic/vague citations
+
+### Step 3: Implementation Validation
+Invoke **implementation-validator** to identify:
+- Hard-coded values in formulas
+- Incomplete implementations (TODOs, stubs)
+- Entity usage errors
+- Formula pattern violations
+
+### Step 4: Test Coverage Analysis
+Invoke **edge-case-generator** to identify:
+- Missing boundary tests
+- Untested edge cases
+- Parameter combinations not tested
+
+### Step 5: Documentation Review
+Invoke **documentation-enricher** to find:
+- Missing docstrings
+- Lack of calculation examples
+- Missing regulatory references
+- Unclear variable descriptions
+
+## Phase 3: Collect and Organize Findings
+
+Aggregate all issues into structured format:
+
+```json
+{
+  "overall_summary": "Brief summary of PR quality and main concerns",
+  "severity": "APPROVE|COMMENT|REQUEST_CHANGES",
+  "line_comments": [
+    {
+      "path": "policyengine_us/variables/gov/states/ma/dta/ccfa/income.py",
+      "line": 42,
+      "body": "💡 **Hard-coded value detected**\n\nThis value should be parameterized:\n\n```suggestion\nmax_income = parameters(period).gov.states.ma.dta.ccfa.max_income\n```\n\n📚 See: [Parameter Guidelines](https://github.com/PolicyEngine/policyengine-us/blob/master/CLAUDE.md#parameter-structure-best-practices)"
+    },
+    {
+      "path": "policyengine_us/tests/variables/gov/states/ma/dta/ccfa/test_eligibility.yaml",
+      "line": 15,
+      "body": "🧪 **Missing edge case test**\n\nConsider adding a test for income exactly at the threshold:\n\n```yaml\n- name: At threshold\n  period: 2024-01\n  input:\n    household_income: 75_000\n  output:\n    ma_ccfa_eligible: true\n```"
+    }
+  ],
+  "general_comments": [
+    "Consider adding integration tests for multi-child households",
+    "Documentation could benefit from real-world calculation examples"
+  ]
+}
+```
+
+## Phase 4: Check for Existing Reviews
+
+**IMPORTANT**: Before posting a new review, check if you already have an active review:
+
+```bash
+# Check for existing reviews from you (Claude)
+EXISTING_REVIEWS=$(gh api "/repos/{owner}/{repo}/pulls/$PR_NUMBER/reviews" \
+  --jq '[.[] | select(.user.login == "MaxGhenis") | {id: .id, state: .state, submitted_at: .submitted_at}]')
+
+# Check if there are any comments/replies from others since your last review
+LATEST_REVIEW_TIME=$(echo "$EXISTING_REVIEWS" | jq -r '.[0].submitted_at // empty')
+
+if [ -n "$LATEST_REVIEW_TIME" ]; then
+  # Check for comments after your review
+  COMMENTS_AFTER=$(gh api "/repos/{owner}/{repo}/issues/$PR_NUMBER/comments" \
+    --jq "[.[] | select(.created_at > \"$LATEST_REVIEW_TIME\" and .user.login != \"MaxGhenis\")]")
+
+  if [ -z "$COMMENTS_AFTER" ] || [ "$COMMENTS_AFTER" == "[]" ]; then
+    echo "Existing review found with no replies from others"
+    echo "STRATEGY: Delete old comments and post single updated review"
+
+    # Delete old comment-only reviews/comments
+    gh api "/repos/{owner}/{repo}/issues/$PR_NUMBER/comments" \
+      --jq '.[] | select(.user.login == "MaxGhenis") | .id' | \
+      while read comment_id; do
+        gh api --method DELETE "/repos/{owner}/{repo}/issues/comments/$comment_id"
+      done
+  else
+    echo "Others have replied to your review - will add new review"
+  fi
+fi
+```
+
+## Phase 5: Post GitHub Review
+
+Post the review using GitHub CLI:
+
+```bash
+# Create review comments JSON
+cat > /tmp/review_comments.json <<'EOF'
+[FORMATTED_COMMENTS_ARRAY]
+EOF
+
+# Post the review with inline comments
+gh api \
+  --method POST \
+  -H "Accept: application/vnd.github+json" \
+  -H "X-GitHub-Api-Version: 2022-11-28" \
+  "/repos/{owner}/{repo}/pulls/$PR_NUMBER/reviews" \
+  -f body="$OVERALL_SUMMARY" \
+  -f event="$SEVERITY" \
+  -F comments=@/tmp/review_comments.json
+```
+
+### Severity Guidelines
+
+- **APPROVE**: Minor suggestions only, no critical issues
+- **COMMENT**: Has issues but not blocking (educational feedback)
+- **REQUEST_CHANGES**: Has critical issues that must be fixed:
+  - Hard-coded values
+  - Missing tests for core functionality
+  - Incorrect implementations
+  - Missing required documentation
+
+## Phase 6: Post Summary Comment
+
+Add a comprehensive summary as a regular comment:
+
+```bash
+gh pr comment $PR_NUMBER --body "## 📋 Review Summary
+
+### ✅ Strengths
+- [List positive aspects]
+- Well-structured variable organization
+- Clear parameter names
+
+### 🔍 Issues Found
+
+#### Critical (Must Fix)
+- [ ] 3 hard-coded values in income calculations
+- [ ] Missing edge case tests for boundary conditions
+- [ ] Parameter references not corroborated by sources
+
+#### Suggestions (Optional)
+- [ ] Consider vectorization in \`calculate_benefit()\`
+- [ ] Add calculation walkthrough to documentation
+- [ ] Extract shared logic into helper variable
+
+### 📊 Validation Results
+- **Domain Validation**: X issues
+- **Reference Validation**: X issues
+- **Implementation Validation**: X issues
+- **Test Coverage**: X gaps identified
+- **Documentation**: X improvements suggested
+
+### 🚀 Next Steps
+
+To apply these fixes automatically, run:
+\`\`\`bash
+/fix-pr $PR_NUMBER
+\`\`\`
+
+Or address manually and re-request review when ready.
+
+---
+💡 **Tip**: Use \`/fix-pr\` to automatically apply all suggested fixes."
+```
+
+## Phase 6: CI Status Check
+
+If CI is failing, include CI failure summary:
+
+```bash
+gh pr checks $PR_NUMBER --json name,status,conclusion \
+  --jq '.[] | select(.conclusion == "failure") | "❌ " + .name'
+```
+
+Add to review:
+```
+### ⚠️ CI Failures
+- ❌ lint: Black formatting issues
+- ❌ test: 3 test failures in test_income.py
+
+Run `/fix-pr` to automatically fix these issues.
+```
+
+## Command Options
+
+### Usage Examples
+- `/review-pr` - Review PR for current branch
+- `/review-pr 6390` - Review PR #6390
+- `/review-pr "Massachusetts CCFA"` - Search for and review PR by title
+
+### Validation Modes
+
+#### Standard Review (Default)
+- All validators run
+- Comprehensive analysis
+- Balanced severity
+
+#### Quick Review
+`/review-pr [PR] --quick`
+- Skip deep validation
+- Focus on CI failures and critical issues
+- Faster turnaround
+
+#### Strict Review
+`/review-pr [PR] --strict`
+- Maximum scrutiny
+- Flag even minor style issues
+- Request changes for any violations
+
+## Output Format
+
+The review will include:
+
+1. **Overall Summary**: High-level assessment
+2. **Inline Comments**: Specific issues at exact lines
+3. **Code Suggestions**: GitHub suggestion blocks where applicable
+4. **Summary Comment**: Checklist of all issues
+5. **Next Steps**: How to address findings
+
+## Review Categories
+
+### 🔴 Critical Issues (Always flag)
+- Hard-coded values in formulas
+- Missing parameter references
+- Incorrect formula implementations
+- Missing tests for core functionality
+- CI failures
+
+### 🟡 Suggestions (Optional improvements)
+- Performance optimizations
+- Documentation enhancements
+- Code style improvements
+- Additional test coverage
+- Refactoring opportunities
+
+### 🟢 Positive Feedback
+- Well-implemented patterns
+- Good test coverage
+- Clear documentation
+- Proper parameterization
+
+## Success Criteria
+
+A good review should:
+- ✅ Identify all hard-coded values
+- ✅ Flag missing/incorrect references
+- ✅ Suggest specific improvements with examples
+- ✅ Provide actionable next steps
+- ✅ Be respectful and constructive
+- ✅ Include code suggestion blocks for easy application
+
+## Common Review Patterns
+
+### Hard-coded Value
+```markdown
+💡 **Hard-coded value detected**
+
+This value should be parameterized. Create a parameter:
+
+\`\`\`yaml
+# policyengine_us/parameters/gov/states/ma/program/threshold.yaml
+values:
+  2024-01-01: 75000
+metadata:
+  unit: currency-USD
+  period: year
+  reference: [Link to source]
+\`\`\`
+
+Then use in formula:
+\`\`\`suggestion
+threshold = parameters(period).gov.states.ma.program.threshold
+\`\`\`
+```
+
+### Missing Test
+```markdown
+🧪 **Missing edge case test**
+
+Add test for boundary condition:
+
+\`\`\`yaml
+- name: At exact threshold
+  period: 2024-01
+  input:
+    income: 75_000
+  output:
+    eligible: false  # Just above threshold
+\`\`\`
+```
+
+### Missing Documentation
+```markdown
+📚 **Documentation enhancement**
+
+Add docstring with example:
+
+\`\`\`suggestion
+def formula(person, period, parameters):
+    """
+    Calculate benefit amount.
+
+    Example:
+        Income $50,000 → Benefit $2,400
+        Income $75,000 → Benefit $1,200
+
+    Reference: MA DTA Regulation 106 CMR 702.310
+    """
+\`\`\`
+```
+
+### Performance Issue
+```markdown
+⚡ **Vectorization opportunity**
+
+Replace loop with vectorized operation:
+
+\`\`\`suggestion
+return where(
+    household_size == 1,
+    base_amount,
+    base_amount * household_size * 0.7
+)
+\`\`\`
+```
+
+## Error Handling
+
+If validation agents fail:
+1. Run basic validation manually
+2. Document what couldn't be checked
+3. Note limitations in review
+4. Still post whatever findings were collected
+
+## Pre-Flight Checklist
+
+Before starting, confirm:
+- [ ] I will NOT make any code changes
+- [ ] I will ONLY analyze and post reviews
+- [ ] I will collect findings from all validators
+- [ ] I will format findings as GitHub review comments
+- [ ] I will provide actionable suggestions
+- [ ] I will be constructive and respectful
+
+Start by determining which PR to review, then proceed to Phase 1: Analyze the PR.