Initial commit
270
skills/brainstorming/SKILL.md
Normal file
@@ -0,0 +1,270 @@
|
||||
---
|
||||
name: brainstorming
|
||||
description: |
|
||||
Socratic design refinement - transforms rough ideas into validated designs through
|
||||
structured questioning, alternative exploration, and incremental validation.
|
||||
|
||||
trigger: |
|
||||
- New feature or product idea (requirements unclear)
|
||||
- User says "plan", "design", or "architect" something
|
||||
- Multiple approaches seem possible
|
||||
- Design hasn't been validated by user
|
||||
|
||||
skip_when: |
|
||||
- Design already complete and validated → use writing-plans
|
||||
- Have detailed plan ready to execute → use executing-plans
|
||||
- Just need task breakdown from existing design → use writing-plans
|
||||
|
||||
sequence:
|
||||
before: [writing-plans, using-git-worktrees]
|
||||
|
||||
related:
|
||||
similar: [writing-plans]
|
||||
---
|
||||
|
||||
# Brainstorming Ideas Into Designs
|
||||
|
||||
## Overview
|
||||
|
||||
Transform rough ideas into fully-formed designs through structured questioning and alternative exploration.
|
||||
|
||||
**Core principle:** Research first, ask targeted questions to fill gaps, explore alternatives, present design incrementally for validation.
|
||||
|
||||
**Announce at start:** "I'm using the brainstorming skill to refine your idea into a design."
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Phase | Key Activities | Tool Usage | Output |
|
||||
|-------|---------------|------------|--------|
|
||||
| **Prep: Autonomous Recon** | Inspect repo/docs/commits, form initial model | Native tools (ls, cat, git log, etc.) | Draft understanding to confirm |
|
||||
| **1. Understanding** | Share findings, ask only for missing context | AskUserQuestion for real decisions | Purpose, constraints, criteria (confirmed) |
|
||||
| **2. Exploration** | Propose 2-3 approaches | AskUserQuestion for approach selection | Architecture options with trade-offs |
|
||||
| **3. Design Presentation** | Present in 200-300 word sections | Open-ended questions | Complete design with validation |
|
||||
| **4. Design Documentation** | Write design document | writing-clearly-and-concisely skill | Design doc in docs/plans/ |
|
||||
| **5. Worktree Setup** | Set up isolated workspace | using-git-worktrees skill | Ready development environment |
|
||||
| **6. Planning Handoff** | Create implementation plan | writing-plans skill | Detailed task breakdown |
|
||||
|
||||
## The Process
|
||||
|
||||
Copy this checklist to track progress:
|
||||
|
||||
```
|
||||
Brainstorming Progress:
|
||||
- [ ] Prep: Autonomous Recon (repo/docs/commits reviewed, initial model shared)
|
||||
- [ ] Phase 1: Understanding (purpose, constraints, criteria gathered)
|
||||
- [ ] Phase 2: Exploration (2-3 approaches proposed and evaluated)
|
||||
- [ ] Phase 3: Design Presentation (design validated in sections)
|
||||
- [ ] Phase 4: Design Documentation (design written to docs/plans/)
|
||||
- [ ] Phase 5: Worktree Setup (if implementing)
|
||||
- [ ] Phase 6: Planning Handoff (if implementing)
|
||||
```
|
||||
|
||||
### Prep: Autonomous Recon
|
||||
|
||||
**MANDATORY evidence (paste ALL):**
|
||||
|
||||
```
|
||||
Recon Checklist:
|
||||
□ Project structure:
|
||||
$ ls -la
|
||||
[PASTE OUTPUT]
|
||||
|
||||
□ Recent activity:
|
||||
$ git log --oneline -10
|
||||
[PASTE OUTPUT]
|
||||
|
||||
□ Documentation:
|
||||
$ head -50 README.md
|
||||
[PASTE OUTPUT]
|
||||
|
||||
□ Test coverage:
|
||||
$ find . -name "*test*" -type f | wc -l
|
||||
[PASTE OUTPUT]
|
||||
|
||||
□ Key frameworks/tools:
|
||||
$ [Check package.json, requirements.txt, go.mod, etc.]
|
||||
[PASTE RELEVANT SECTIONS]
|
||||
```
|
||||
|
||||
**Only after ALL evidence pasted:** Form your model and share findings.
|
||||
|
||||
**Skip any evidence = not following the skill**
|
||||
|
||||
### Question Budget
|
||||
|
||||
**Maximum 3 questions per phase.** More = insufficient research.
|
||||
|
||||
Question count:
|
||||
- Phase 1: ___/3
|
||||
- Phase 2: ___/3
|
||||
- Phase 3: ___/3
|
||||
|
||||
Hit limit? Do research instead of asking.
|
||||
|
||||
### Phase 1: Understanding
|
||||
- Share your synthesized understanding first, then invite corrections or additions.
|
||||
- Ask one focused question at a time, only for gaps you cannot close yourself.
|
||||
- **Use AskUserQuestion tool** only when you need the human to make a decision among real alternatives.
|
||||
- Gather: Purpose, constraints, success criteria (confirmed or amended by your partner)
|
||||
|
||||
**Example summary + targeted question:**
|
||||
```
|
||||
Based on the README and yesterday's commit, we're expanding localization to dashboard and billing emails; admin console is still untouched. Only gap I see is whether support responses need localization in this iteration. Did I miss anything important?
|
||||
```
|
||||
|
||||
### Phase Lock Rules
|
||||
|
||||
**CRITICAL:** Once you enter a phase, you CANNOT skip ahead.
|
||||
|
||||
- Asked a question? → WAIT for answer before solutions
|
||||
- Proposed approaches? → WAIT for selection before design
|
||||
- Started design? → COMPLETE before documentation
|
||||
|
||||
**Violations:**
|
||||
- "While you consider that, here's my design..." → WRONG
|
||||
- "I'll proceed with option 1 unless..." → WRONG
|
||||
- "Moving forward with the assumption..." → WRONG
|
||||
|
||||
**WAIT means WAIT. No assumptions.**
|
||||
|
||||
### Phase 2: Exploration
|
||||
- Propose 2-3 different approaches
|
||||
- For each: Core architecture, trade-offs, complexity assessment, and your recommendation
|
||||
- **Use AskUserQuestion tool** to present approaches when you truly need a judgement call
|
||||
- Lead with the option you prefer and explain why; invite disagreement if your partner sees it differently
|
||||
- Own prioritization: if the repo makes priorities clear, state them and proceed rather than asking
|
||||
|
||||
**Example using AskUserQuestion:**
|
||||
```
|
||||
Question: "Which architectural approach should we use?"
|
||||
Options:
|
||||
- "Direct API calls with retry logic" (simple, synchronous, easier to debug) ← recommended for current scope
|
||||
- "Event-driven with message queue" (scalable, complex setup, eventual consistency)
|
||||
- "Hybrid with background jobs" (balanced, moderate complexity, best of both)
|
||||
|
||||
I recommend the direct API approach because it matches existing patterns and minimizes new infrastructure. Let me know if you see a blocker that pushes us toward one of the other options.
|
||||
```
|
||||
|
||||
### Phase 3: Design Presentation
|
||||
- Present in coherent sections; use ~200-300 words when introducing new material, shorter summaries once alignment is obvious
|
||||
- Cover: Architecture, components, data flow, error handling, testing
|
||||
- Check in at natural breakpoints rather than after every paragraph: "Stop me if this diverges from what you expect."
|
||||
- Use open-ended questions to allow freeform feedback
|
||||
- Assume ownership and proceed unless your partner redirects you
|
||||
|
||||
**Design Acceptance Gate:**
|
||||
|
||||
Design is NOT approved until human EXPLICITLY says one of:
|
||||
- "Approved" / "Looks good" / "Proceed"
|
||||
- "Let's implement that" / "Ship it"
|
||||
- "Yes" (in response to "Shall I proceed?")
|
||||
|
||||
**These do NOT mean approval:**
|
||||
- Silence / No response
|
||||
- "Interesting" / "I see" / "Hmm"
|
||||
- Questions about the design
|
||||
- "What about X?" (that's requesting changes)
|
||||
|
||||
**No explicit approval = keep refining**
|
||||
|
||||
### Phase 4: Design Documentation
|
||||
After validating the design, write it to a permanent document:
|
||||
- **File location:** `docs/plans/YYYY-MM-DD-<topic>-design.md` (use actual date and descriptive topic)
|
||||
- **RECOMMENDED SUB-SKILL:** Use elements-of-style:writing-clearly-and-concisely (if available) for documentation quality
|
||||
- **Content:** Capture the design as discussed and validated in Phase 3, organized into sections that emerged from the conversation
|
||||
- Commit the design document to git before proceeding
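
A minimal sketch of the naming and commit step from the bullets above, with a hypothetical topic standing in for the real design subject:

```bash
# Hypothetical topic ("billing-localization"); substitute the actual design subject
DESIGN_DOC="docs/plans/$(date +%Y-%m-%d)-billing-localization-design.md"

git add "$DESIGN_DOC"
git commit -m "docs: add billing localization design"
```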
|
||||
|
||||
### Phase 5: Worktree Setup (for implementation)
|
||||
When design is approved and implementation will follow:
|
||||
- Announce: "I'm using the using-git-worktrees skill to set up an isolated workspace."
|
||||
- **REQUIRED SUB-SKILL:** Use ring-default:using-git-worktrees
|
||||
- Follow that skill's process for directory selection, safety verification, and setup
|
||||
- Return here when worktree ready
|
||||
|
||||
### Phase 6: Planning Handoff
|
||||
Ask: "Ready to create the implementation plan?"
|
||||
|
||||
When your human partner confirms (any affirmative response):
|
||||
- Announce: "I'm using the writing-plans skill to create the implementation plan."
|
||||
- **REQUIRED SUB-SKILL:** Use ring-default:writing-plans
|
||||
- Create detailed plan in the worktree
|
||||
|
||||
## Question Patterns
|
||||
|
||||
### When to Use AskUserQuestion Tool
|
||||
|
||||
**Use AskUserQuestion when:**
|
||||
- You need your partner to make a judgement call among real alternatives
|
||||
- You have a recommendation and can explain why it’s your preference
|
||||
- Prioritization is ambiguous and cannot be inferred from existing materials
|
||||
|
||||
**Best practices:**
|
||||
- State your preferred option and rationale inside the question so your partner can agree or redirect
|
||||
- If you know the answer from repo/docs, state it as fact and proceed—no question needed
|
||||
- When priorities are spelled out, acknowledge them and proceed rather than delegating the choice back to your partner
|
||||
|
||||
### When to Use Open-Ended Questions
|
||||
|
||||
**Use open-ended questions for:**
|
||||
- Phase 3: Design validation ("Does this look right so far?")
|
||||
- When you need detailed feedback or explanation
|
||||
- When partner should describe their own requirements
|
||||
- When structured options would limit creative input
|
||||
|
||||
Frame them to confirm or expand your current understanding rather than reopening settled topics.
|
||||
|
||||
**Example decision flow:**
|
||||
- "What authentication method?" → Use AskUserQuestion (2-4 options)
|
||||
- "Does this design handle your use case?" → Open-ended (validation)
|
||||
|
||||
## When to Revisit Earlier Phases
|
||||
|
||||
```dot
|
||||
digraph revisit_phases {
|
||||
rankdir=LR;
|
||||
"New constraint revealed?" [shape=diamond];
|
||||
"Partner questions approach?" [shape=diamond];
|
||||
"Requirements unclear?" [shape=diamond];
|
||||
"Return to Phase 1" [shape=box, style=filled, fillcolor="#ffcccc"];
|
||||
"Return to Phase 2" [shape=box, style=filled, fillcolor="#ffffcc"];
|
||||
"Continue forward" [shape=box, style=filled, fillcolor="#ccffcc"];
|
||||
|
||||
"New constraint revealed?" -> "Return to Phase 1" [label="yes"];
|
||||
"New constraint revealed?" -> "Partner questions approach?" [label="no"];
|
||||
"Partner questions approach?" -> "Return to Phase 2" [label="yes"];
|
||||
"Partner questions approach?" -> "Requirements unclear?" [label="no"];
|
||||
"Requirements unclear?" -> "Return to Phase 1" [label="yes"];
|
||||
"Requirements unclear?" -> "Continue forward" [label="no"];
|
||||
}
|
||||
```
|
||||
|
||||
**You can and should go backward when:**
|
||||
- Partner reveals new constraint during Phase 2 or 3 → Return to Phase 1
|
||||
- Validation shows fundamental gap in requirements → Return to Phase 1
|
||||
- Partner questions approach during Phase 3 → Return to Phase 2
|
||||
- Something doesn't make sense → Go back and clarify
|
||||
|
||||
**Avoid forcing forward linearly** when going backward would give better results.
|
||||
|
||||
## Required Patterns
|
||||
|
||||
This skill uses these universal patterns:
|
||||
- **State Tracking:** See `skills/shared-patterns/state-tracking.md`
|
||||
- **Failure Recovery:** See `skills/shared-patterns/failure-recovery.md`
|
||||
- **Exit Criteria:** See `skills/shared-patterns/exit-criteria.md`
|
||||
- **TodoWrite:** See `skills/shared-patterns/todowrite-integration.md`
|
||||
|
||||
Apply ALL patterns when using this skill.
|
||||
|
||||
## Key Principles
|
||||
|
||||
| Principle | Application |
|
||||
|-----------|-------------|
|
||||
| **One question at a time** | Phase 1: Single targeted question only for gaps you can’t close yourself |
|
||||
| **Structured choices** | Use AskUserQuestion tool for 2-4 options with trade-offs |
|
||||
| **YAGNI ruthlessly** | Remove unnecessary features from all designs |
|
||||
| **Explore alternatives** | Always propose 2-3 approaches before settling |
|
||||
| **Incremental validation** | Present design in sections, validate each |
|
||||
| **Flexible progression** | Go backward when needed - flexibility > rigidity |
|
||||
| **Own the initiative** | Recommend priorities and next steps; ask if you should proceed only when requirements conflict |
|
||||
| **Announce usage** | State skill usage at start of session |
|
||||
510
skills/codify-solution/SKILL.md
Normal file
@@ -0,0 +1,510 @@
|
||||
---
|
||||
name: codify-solution
|
||||
description: |
|
||||
Capture solved problems as structured documentation - auto-triggers when user
|
||||
confirms fix worked. Creates searchable knowledge base in docs/solutions/.
|
||||
The compounding effect: each documented solution makes future debugging faster.
|
||||
|
||||
trigger: |
|
||||
- User confirms fix worked ("that worked", "it's fixed", "solved it")
|
||||
- /ring-default:codify command invoked manually
|
||||
- Non-trivial debugging completed (multiple investigation attempts)
|
||||
- After systematic-debugging Phase 4 succeeds
|
||||
|
||||
skip_when: |
|
||||
- Simple typo or obvious syntax error (< 2 min to fix)
|
||||
- Solution already documented in docs/solutions/
|
||||
- Trivial fix that won't recur
|
||||
- User declines documentation
|
||||
|
||||
sequence:
|
||||
after: [systematic-debugging, root-cause-tracing]
|
||||
|
||||
related:
|
||||
similar: [writing-plans] # writing-plans is for implementation planning, codify-solution is for solved problems
|
||||
complementary: [systematic-debugging, writing-plans, test-driven-development]
|
||||
|
||||
compliance_rules:
|
||||
- id: "docs_directory_exists"
|
||||
description: "Solution docs directory must exist or be creatable"
|
||||
check_type: "command_output"
|
||||
command: "test -d docs/solutions/ || mkdir -p docs/solutions/"
|
||||
severity: "blocking"
|
||||
failure_message: "Cannot access docs/solutions/. Check permissions."
|
||||
|
||||
- id: "required_context_gathered"
|
||||
description: "All required context fields must be populated before doc creation"
|
||||
check_type: "manual_verification"
|
||||
required_fields:
|
||||
- component
|
||||
- symptoms
|
||||
- root_cause
|
||||
- solution
|
||||
severity: "blocking"
|
||||
failure_message: "Missing required context. Ask user for: component, symptoms, root_cause, solution."
|
||||
|
||||
- id: "schema_validation"
|
||||
description: "YAML frontmatter must pass schema validation"
|
||||
check_type: "schema_validation"
|
||||
schema_path: "schema.yaml"
|
||||
severity: "blocking"
|
||||
failure_message: "Schema validation failed. Check problem_type, root_cause, resolution_type enum values."
|
||||
|
||||
prerequisites:
|
||||
- name: "docs_directory_writable"
|
||||
check: "test -w docs/ || mkdir -p docs/solutions/"
|
||||
failure_message: "Cannot write to docs/. Check permissions."
|
||||
severity: "blocking"
|
||||
|
||||
composition:
|
||||
works_well_with:
|
||||
- skill: "systematic-debugging"
|
||||
when: "debugging session completed successfully"
|
||||
transition: "Automatically suggest codify after Phase 4 verification"
|
||||
|
||||
- skill: "writing-plans"
|
||||
when: "planning new feature implementation"
|
||||
transition: "Search docs/solutions/ for prior related issues during planning"
|
||||
|
||||
- skill: "test-driven-development"
|
||||
when: "writing tests for bug fixes"
|
||||
transition: "After test passes, document the solution for future reference"
|
||||
|
||||
conflicts_with: []
|
||||
|
||||
typical_workflow: |
|
||||
1. Debugging session completes (systematic-debugging Phase 4)
|
||||
2. User confirms fix worked
|
||||
3. Auto-suggest codify-solution via hook
|
||||
4. Gather context from conversation
|
||||
5. Validate schema and create documentation
|
||||
6. Cross-reference with related solutions
|
||||
---
|
||||
|
||||
# Codify Solution Skill
|
||||
|
||||
**Purpose:** Build searchable institutional knowledge by capturing solved problems immediately after confirmation.
|
||||
|
||||
**The Compounding Effect:**
|
||||
```
|
||||
First time solving issue → Research (30 min)
|
||||
Document the solution → docs/solutions/{category}/ (5 min)
|
||||
Next similar issue → Quick lookup (2 min)
|
||||
Knowledge compounds → Team gets smarter over time
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7-Step Process
|
||||
|
||||
### Step 1: Detect Confirmation
|
||||
|
||||
**Auto-invoke after phrases:**
|
||||
- "that worked"
|
||||
- "it's fixed", "fixed it"
|
||||
- "working now"
|
||||
- "problem solved", "solved it"
|
||||
- "that did it"
|
||||
- "all good now"
|
||||
- "issue resolved"
|
||||
|
||||
**Criteria for documentation (ALL must apply):**
|
||||
- Non-trivial problem (took multiple attempts to solve)
|
||||
- Solution is not immediately obvious
|
||||
- Future sessions would benefit from having this documented
|
||||
- User hasn't declined
|
||||
|
||||
**If triggered automatically:**
|
||||
> "It looks like you've solved a problem! Would you like to document this solution for future reference?
|
||||
>
|
||||
> Run: `/ring-default:codify`
|
||||
> Or say: 'yes, document this'"
|
||||
|
||||
---
|
||||
|
||||
### Step 2: Gather Context
|
||||
|
||||
Extract from conversation history:
|
||||
|
||||
| Field | Description | Source |
|
||||
|-------|-------------|--------|
|
||||
| **Title** | Document title for H1 heading | Derived: "[Primary Symptom] in [Component]" |
|
||||
| **Component** | Which module/file had the problem | File paths mentioned |
|
||||
| **Symptoms** | Observable error/behavior | Error messages, logs |
|
||||
| **Investigation** | What was tried and failed | Conversation history |
|
||||
| **Root Cause** | Technical explanation | Analysis performed |
|
||||
| **Solution** | What fixed it | Code changes made |
|
||||
| **Prevention** | How to avoid in future | Lessons learned |
|
||||
|
||||
**BLOCKING GATE:**
|
||||
If ANY critical context is missing, ask the user and WAIT for response:
|
||||
|
||||
```
|
||||
Missing context for documentation:
|
||||
- [ ] Component: Which module/service was affected?
|
||||
- [ ] Symptoms: What was the exact error message?
|
||||
- [ ] Root Cause: What was the underlying cause?
|
||||
|
||||
Please provide the missing information before I create the documentation.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Step 3: Check Existing Docs
|
||||
|
||||
**First, ensure docs/solutions/ exists:**
|
||||
```bash
|
||||
# Initialize directory structure if needed
|
||||
mkdir -p docs/solutions/
|
||||
```
|
||||
|
||||
Search `docs/solutions/` for similar issues:
|
||||
|
||||
```bash
|
||||
# Search for similar symptoms
|
||||
grep -r "exact error phrase" docs/solutions/ 2>/dev/null || true
|
||||
|
||||
# Search by component
|
||||
grep -r "component: similar-component" docs/solutions/ 2>/dev/null || true
|
||||
```
|
||||
|
||||
**If similar found, present options:**
|
||||
|
||||
| Option | When to Use |
|
||||
|--------|-------------|
|
||||
| **1. Create new doc with cross-reference** | Different root cause but related symptoms (recommended) |
|
||||
| **2. Update existing doc** | Same root cause, additional context to add |
|
||||
| **3. Skip documentation** | Exact duplicate, no new information |
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Generate Filename
|
||||
|
||||
**Format:** `[sanitized-symptom]-[component]-[YYYYMMDD].md`
|
||||
|
||||
**Sanitization rules:**
|
||||
1. Convert to lowercase
|
||||
2. Replace spaces with hyphens
|
||||
3. Remove special characters (keep alphanumeric and hyphens)
|
||||
4. Truncate to < 80 characters total
|
||||
5. **Ensure unique** - check for existing files:
|
||||
|
||||
```bash
|
||||
# Get category directory from problem_type (get_category_directory is defined in Step 6 below)
|
||||
CATEGORY_DIR=$(get_category_directory "$PROBLEM_TYPE")
|
||||
|
||||
# Check for filename collision
|
||||
BASE_NAME="sanitized-symptom-component-YYYYMMDD"
|
||||
FILENAME="${BASE_NAME}.md"
|
||||
COUNTER=2
|
||||
|
||||
while [ -f "docs/solutions/${CATEGORY_DIR}/${FILENAME}" ]; do
|
||||
FILENAME="${BASE_NAME}-${COUNTER}.md"
|
||||
COUNTER=$((COUNTER + 1))
|
||||
done
|
||||
```
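
A minimal bash sketch of sanitization rules 1-4, with hypothetical symptom and component values standing in for the gathered context:

```bash
# Hypothetical inputs; in practice these come from the context gathered in Step 2
RAW="JWT malformed Error auth-middleware"

# Rules 1-3: lowercase, spaces to hyphens, keep only alphanumerics and hyphens
SLUG=$(printf '%s' "$RAW" | tr '[:upper:]' '[:lower:]' | tr ' ' '-' | tr -cd 'a-z0-9-')

# Rule 4: append the date and truncate so the full filename stays under 80 characters
BASE_NAME=$(printf '%s' "${SLUG}-$(date +%Y%m%d)" | cut -c1-75)
FILENAME="${BASE_NAME}.md"
```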
|
||||
|
||||
**Examples:**
|
||||
```
|
||||
cannot-read-property-id-auth-service-20250127.md
|
||||
jwt-malformed-error-middleware-20250127.md
|
||||
n-plus-one-query-user-api-20250127.md
|
||||
# If duplicate exists:
|
||||
jwt-malformed-error-middleware-20250127-2.md
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Validate YAML Schema
|
||||
|
||||
**CRITICAL GATE:** Load `schema.yaml` and validate all required fields.
|
||||
|
||||
**Required fields checklist:**
|
||||
- [ ] `date` - Format: YYYY-MM-DD
|
||||
- [ ] `problem_type` - Must be valid enum value
|
||||
- [ ] `component` - Non-empty string
|
||||
- [ ] `symptoms` - Array with 1-5 items
|
||||
- [ ] `root_cause` - Must be valid enum value
|
||||
- [ ] `resolution_type` - Must be valid enum value
|
||||
- [ ] `severity` - Must be: critical, high, medium, or low
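
A minimal sketch of the presence check, assuming the draft frontmatter is saved in `$DOC`; enum values still have to be compared against `schema.yaml` manually:

```bash
# Report any required frontmatter key that is missing entirely
for field in date problem_type component symptoms root_cause resolution_type severity; do
  grep -q "^${field}:" "$DOC" || echo "Missing required field: ${field}"
done
```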
|
||||
|
||||
**BLOCK if validation fails:**
|
||||
```
|
||||
Schema validation failed:
|
||||
- problem_type: "database" is not valid. Use one of: build_error, test_failure,
|
||||
runtime_error, performance_issue, database_issue, ...
|
||||
- symptoms: Must have at least 1 item
|
||||
|
||||
Please correct these fields before proceeding.
|
||||
```
|
||||
|
||||
**Valid `problem_type` values:**
|
||||
- build_error, test_failure, runtime_error, performance_issue
|
||||
- database_issue, security_issue, ui_bug, integration_issue
|
||||
- logic_error, dependency_issue, configuration_error, workflow_issue
|
||||
|
||||
**Valid `root_cause` values:**
|
||||
- missing_dependency, wrong_api_usage, configuration_error, logic_error
|
||||
- race_condition, memory_issue, type_mismatch, missing_validation
|
||||
- permission_error, environment_issue, version_incompatibility
|
||||
- data_corruption, missing_error_handling, incorrect_assumption
|
||||
|
||||
**Valid `resolution_type` values:**
|
||||
- code_fix, config_change, dependency_update, migration
|
||||
- test_fix, environment_setup, documentation, workaround
|
||||
|
||||
---
|
||||
|
||||
### Step 6: Create Documentation
|
||||
|
||||
1. **Ensure directory exists:**
|
||||
```bash
|
||||
mkdir -p docs/solutions/{category}/
|
||||
```
|
||||
|
||||
2. **Load template:** Read `assets/resolution-template.md`
|
||||
|
||||
3. **Fill placeholders:**
|
||||
- Replace all `{{FIELD}}` placeholders with gathered context
|
||||
- Ensure code blocks have correct language tags
|
||||
- Format tables properly
|
||||
|
||||
**Placeholder Syntax:**
|
||||
- `{{FIELD}}` - Required field, must be replaced with actual value
|
||||
- `{{FIELD|default}}` - Optional field, use default value if not available
|
||||
- Remove unused optional field lines entirely (e.g., empty symptom slots)
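
A minimal sketch of the substitution, showing a few fields with hypothetical values; the remaining `{{FIELD}}` placeholders follow the same pattern:

```bash
# Hypothetical values; in practice they come from Step 2 (Gather Context)
sed -e "s/{{DATE}}/2025-01-27/" \
    -e "s/{{PROBLEM_TYPE}}/runtime_error/" \
    -e "s/{{COMPONENT}}/auth-middleware/" \
    -e "s/{{SEVERITY}}/high/" \
    assets/resolution-template.md > "docs/solutions/${CATEGORY_DIR}/${FILENAME}"
```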
|
||||
|
||||
4. **Write file:**
|
||||
```bash
|
||||
# Path: docs/solutions/{category}/{filename}.md
|
||||
```
|
||||
|
||||
5. **Verify creation:**
|
||||
```bash
|
||||
ls -la docs/solutions/{category}/{filename}.md
|
||||
```
|
||||
|
||||
**Category to directory mapping (MANDATORY):**
|
||||
|
||||
```bash
|
||||
# Category mapping function - use this EXACTLY
|
||||
get_category_directory() {
|
||||
local problem_type="$1"
|
||||
case "$problem_type" in
|
||||
build_error) echo "build-errors" ;;
|
||||
test_failure) echo "test-failures" ;;
|
||||
runtime_error) echo "runtime-errors" ;;
|
||||
performance_issue) echo "performance-issues" ;;
|
||||
database_issue) echo "database-issues" ;;
|
||||
security_issue) echo "security-issues" ;;
|
||||
ui_bug) echo "ui-bugs" ;;
|
||||
integration_issue) echo "integration-issues" ;;
|
||||
logic_error) echo "logic-errors" ;;
|
||||
dependency_issue) echo "dependency-issues" ;;
|
||||
configuration_error) echo "configuration-errors" ;;
|
||||
workflow_issue) echo "workflow-issues" ;;
|
||||
*) echo "INVALID_CATEGORY"; return 1 ;;
|
||||
esac
|
||||
}
|
||||
```
|
||||
|
||||
| problem_type | Directory |
|
||||
|--------------|-----------|
|
||||
| build_error | build-errors |
|
||||
| test_failure | test-failures |
|
||||
| runtime_error | runtime-errors |
|
||||
| performance_issue | performance-issues |
|
||||
| database_issue | database-issues |
|
||||
| security_issue | security-issues |
|
||||
| ui_bug | ui-bugs |
|
||||
| integration_issue | integration-issues |
|
||||
| logic_error | logic-errors |
|
||||
| dependency_issue | dependency-issues |
|
||||
| configuration_error | configuration-errors |
|
||||
| workflow_issue | workflow-issues |
|
||||
|
||||
**CRITICAL:** If `problem_type` is not in this list, **BLOCK** and ask user to select valid category.
|
||||
|
||||
6. **Confirm creation:**
|
||||
```
|
||||
✅ Solution documented: docs/solutions/{category}/{filename}.md
|
||||
|
||||
File created with {N} required fields and {M} optional fields populated.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Step 7: Post-Documentation Options
|
||||
|
||||
Present decision menu after successful creation:
|
||||
|
||||
```
|
||||
Solution documented: docs/solutions/{category}/{filename}.md
|
||||
|
||||
What would you like to do next?
|
||||
1. Continue with current workflow
|
||||
2. Promote to Critical Pattern (if recurring issue)
|
||||
3. Link to related issues
|
||||
4. View the documentation
|
||||
5. Search for similar issues
|
||||
```
|
||||
|
||||
**Option 2: Promote to Critical Pattern**
|
||||
If this issue has occurred multiple times or is particularly important:
|
||||
1. Ensure directory exists: `mkdir -p docs/solutions/patterns/`
|
||||
2. Load `assets/critical-pattern-template.md`
|
||||
3. Create in `docs/solutions/patterns/`
|
||||
4. Link all related instance docs
|
||||
|
||||
**Option 3: Link to related issues**
|
||||
- Search for related docs
|
||||
- Add to `related_issues` field in YAML frontmatter
|
||||
- Update the related doc to cross-reference this one
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
### With systematic-debugging
|
||||
|
||||
After Phase 4 (Verification) succeeds, suggest:
|
||||
> "The fix has been verified. Would you like to document this solution for future reference?
|
||||
> Run: `/ring-default:codify`"
|
||||
|
||||
### With writing-plans
|
||||
|
||||
During planning phase, search existing solutions:
|
||||
```bash
|
||||
# Before designing a feature, check for known issues
|
||||
grep -r "component: {related-component}" docs/solutions/
|
||||
```
|
||||
|
||||
### With pre-dev workflow
|
||||
|
||||
In Gate 1 (PRD Creation), reference existing solutions:
|
||||
- Search `docs/solutions/` for related prior art
|
||||
- Include relevant learnings in requirements
|
||||
|
||||
---
|
||||
|
||||
## Solution Doc Search Patterns
|
||||
|
||||
**Search by error message:**
|
||||
```bash
|
||||
grep -r "exact error text" docs/solutions/
|
||||
```
|
||||
|
||||
**Search by component:**
|
||||
```bash
|
||||
grep -r "component: auth" docs/solutions/
|
||||
```
|
||||
|
||||
**Search by root cause:**
|
||||
```bash
|
||||
grep -r "root_cause: race_condition" docs/solutions/
|
||||
```
|
||||
|
||||
**Search by tag:**
|
||||
```bash
|
||||
grep -r "- authentication" docs/solutions/
|
||||
```
|
||||
|
||||
**List all by category:**
|
||||
```bash
|
||||
ls docs/solutions/performance-issues/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Rationalization Defenses
|
||||
|
||||
| Excuse | Counter |
|
||||
|--------|---------|
|
||||
| "This was too simple to document" | If it took > 5 min to solve, document it |
|
||||
| "I'll remember this" | You won't. Your future self will thank you. |
|
||||
| "Nobody else will hit this" | They will. And they'll spend 30 min re-investigating. |
|
||||
| "The code is self-documenting" | The investigation process isn't in the code |
|
||||
| "I'll do it later" | No you won't. Do it now while context is fresh. |
|
||||
|
||||
---
|
||||
|
||||
## Success Metrics
|
||||
|
||||
A well-documented solution should:
|
||||
- [ ] Be findable via grep search on symptoms
|
||||
- [ ] Have clear root cause explanation
|
||||
- [ ] Include before/after code examples
|
||||
- [ ] List prevention strategies
|
||||
- [ ] Take < 5 minutes to write (use template)
|
||||
|
||||
---
|
||||
|
||||
## Example Output
|
||||
|
||||
**File:** `docs/solutions/runtime-errors/jwt-malformed-auth-middleware-20250127.md`
|
||||
|
||||
```markdown
|
||||
---
|
||||
date: 2025-01-27
|
||||
problem_type: runtime_error
|
||||
component: auth-middleware
|
||||
symptoms:
|
||||
- "Error: JWT malformed"
|
||||
- "401 Unauthorized on valid tokens"
|
||||
root_cause: wrong_api_usage
|
||||
resolution_type: code_fix
|
||||
severity: high
|
||||
project: midaz
|
||||
language: go
|
||||
framework: gin
|
||||
tags:
|
||||
- authentication
|
||||
- jwt
|
||||
- middleware
|
||||
---
|
||||
|
||||
# JWT Malformed Error in Auth Middleware
|
||||
|
||||
## Problem
|
||||
|
||||
Valid JWTs were being rejected with "JWT malformed" error after
|
||||
upgrading the jwt-go library.
|
||||
|
||||
### Symptoms
|
||||
|
||||
- "Error: JWT malformed" on all authenticated requests
|
||||
- 401 Unauthorized responses for valid tokens
|
||||
- Started after dependency update
|
||||
|
||||
## Investigation
|
||||
|
||||
### What didn't work
|
||||
|
||||
1. Regenerating tokens - same error
|
||||
2. Checking token format - tokens were valid
|
||||
|
||||
### Root Cause
|
||||
|
||||
The new jwt-go version changed the default parser behavior.
|
||||
The `ParseWithClaims` function now requires explicit algorithm
|
||||
specification.
|
||||
|
||||
## Solution
|
||||
|
||||
Added explicit algorithm in parser options:
|
||||
|
||||
```go
|
||||
// Before
|
||||
token, err := jwt.ParseWithClaims(tokenString, claims, keyFunc)
|
||||
|
||||
// After
|
||||
token, err := jwt.ParseWithClaims(tokenString, claims, keyFunc,
|
||||
jwt.WithValidMethods([]string{"HS256"}))
|
||||
```
|
||||
|
||||
## Prevention
|
||||
|
||||
1. Read changelogs before upgrading auth-related dependencies
|
||||
2. Add integration tests for token parsing
|
||||
3. Pin major versions of security-critical packages
|
||||
```
|
||||
69
skills/codify-solution/assets/critical-pattern-template.md
Normal file
@@ -0,0 +1,69 @@
|
||||
---
|
||||
pattern_name: {{PATTERN_NAME}}
|
||||
category: {{CATEGORY}}
|
||||
frequency: {{high|medium|low}}
|
||||
severity_when_missed: {{critical|high|medium|low}}
|
||||
created: {{DATE}}
|
||||
instances:
|
||||
- {{SOLUTION_DOC_1}}
|
||||
---
|
||||
|
||||
# Critical Pattern: {{PATTERN_NAME}}
|
||||
|
||||
## Summary
|
||||
|
||||
{{One sentence description of the pattern}}
|
||||
|
||||
## The Pattern
|
||||
|
||||
{{Describe the anti-pattern or mistake that keeps recurring}}
|
||||
|
||||
## Why It Happens
|
||||
|
||||
1. {{Common reason 1}}
|
||||
2. {{Common reason 2}}
|
||||
|
||||
## How to Recognize
|
||||
|
||||
**Warning Signs:**
|
||||
- {{Sign 1}}
|
||||
- {{Sign 2}}
|
||||
|
||||
**Typical Error Messages:**
|
||||
```
|
||||
{{Common error message}}
|
||||
```
|
||||
|
||||
## The Fix
|
||||
|
||||
**Standard Solution:**
|
||||
|
||||
```{{LANGUAGE}}
|
||||
{{Code showing the correct approach}}
|
||||
```
|
||||
|
||||
**Checklist:**
|
||||
- [ ] {{Step 1}}
|
||||
- [ ] {{Step 2}}
|
||||
|
||||
## Prevention
|
||||
|
||||
### Code Review Checklist
|
||||
|
||||
When reviewing code in this area, check:
|
||||
- [ ] {{Item to verify}}
|
||||
- [ ] {{Item to verify}}
|
||||
|
||||
### Automated Checks
|
||||
|
||||
{{Suggest linting rules, tests, or CI checks that could catch this}}
|
||||
|
||||
## Instance History
|
||||
|
||||
| Date | Solution Doc | Component |
|
||||
|------|--------------|-----------|
|
||||
| {{DATE}} | [{{title}}]({{path}}) | {{component}} |
|
||||
|
||||
## Related Patterns
|
||||
|
||||
- {{Link to related pattern}}
|
||||
89
skills/codify-solution/assets/resolution-template.md
Normal file
@@ -0,0 +1,89 @@
|
||||
---
|
||||
date: {{DATE}} # Format: YYYY-MM-DD (e.g., 2025-01-27)
|
||||
problem_type: {{PROBLEM_TYPE}}
|
||||
component: {{COMPONENT}}
|
||||
symptoms:
|
||||
- "{{SYMPTOM_1}}"
|
||||
# - "{{SYMPTOM_2}}" # Add if applicable
|
||||
# - "{{SYMPTOM_3}}" # Add if applicable
|
||||
# - "{{SYMPTOM_4}}" # Add if applicable
|
||||
# - "{{SYMPTOM_5}}" # Add if applicable (max 5 per schema)
|
||||
root_cause: {{ROOT_CAUSE}}
|
||||
resolution_type: {{RESOLUTION_TYPE}}
|
||||
severity: {{SEVERITY}}
|
||||
# Optional fields (remove if not applicable)
|
||||
project: {{PROJECT|unknown}}
|
||||
language: {{LANGUAGE|unknown}}
|
||||
framework: {{FRAMEWORK|none}}
|
||||
tags:
|
||||
- {{TAG_1|untagged}}
|
||||
# - {{TAG_2}} # Add relevant tags for searchability
|
||||
# - {{TAG_3}} # Maximum 8 tags per schema
|
||||
related_issues: []
|
||||
---
|
||||
|
||||
# {{TITLE}}
|
||||
|
||||
## Problem
|
||||
|
||||
{{Brief description of the problem - what was happening, what was expected}}
|
||||
|
||||
### Symptoms
|
||||
|
||||
{{List the observable symptoms - error messages, visual issues, unexpected behavior}}
|
||||
|
||||
```
|
||||
{{Exact error message or log output if applicable}}
|
||||
```
|
||||
|
||||
### Context
|
||||
|
||||
- **Component:** {{COMPONENT}}
|
||||
- **Affected files:** {{List key files involved}}
|
||||
- **When it occurred:** {{Under what conditions - deployment, specific input, load, etc.}}
|
||||
|
||||
## Investigation
|
||||
|
||||
### What didn't work
|
||||
|
||||
1. {{First thing tried and why it didn't solve it}}
|
||||
2. {{Second thing tried and why it didn't solve it}}
|
||||
|
||||
### Root Cause
|
||||
|
||||
{{Technical explanation of the actual cause - be specific}}
|
||||
|
||||
## Solution
|
||||
|
||||
### The Fix
|
||||
|
||||
{{Describe what was changed and why}}
|
||||
|
||||
```{{LANGUAGE}}
|
||||
// Before
|
||||
{{Code that had the problem}}
|
||||
|
||||
// After
|
||||
{{Code with the fix}}
|
||||
```
|
||||
|
||||
### Files Changed
|
||||
|
||||
| File | Change |
|
||||
|------|--------|
|
||||
| {{file_path}} | {{Brief description of change}} |
|
||||
|
||||
## Prevention
|
||||
|
||||
### How to avoid this in the future
|
||||
|
||||
1. {{Preventive measure 1}}
|
||||
2. {{Preventive measure 2}}
|
||||
|
||||
### Warning signs to watch for
|
||||
|
||||
- {{Early indicator that this problem might be occurring}}
|
||||
|
||||
## Related
|
||||
|
||||
- {{Link to related solution docs or external resources}}
|
||||
163
skills/codify-solution/references/yaml-schema.md
Normal file
@@ -0,0 +1,163 @@
|
||||
# YAML Schema Reference for Solution Documentation
|
||||
|
||||
This document describes the YAML frontmatter schema used in solution documentation files created by the `codify-solution` skill.
|
||||
|
||||
## Required Fields
|
||||
|
||||
### `date`
|
||||
- **Type:** String (ISO date format)
|
||||
- **Pattern:** `YYYY-MM-DD`
|
||||
- **Example:** `2025-01-27`
|
||||
- **Purpose:** When the problem was solved
|
||||
|
||||
### `problem_type`
|
||||
- **Type:** Enum
|
||||
- **Values:**
|
||||
- `build_error` - Compilation, bundling, or build process failures
|
||||
- `test_failure` - Unit, integration, or E2E test failures
|
||||
- `runtime_error` - Errors occurring during execution
|
||||
- `performance_issue` - Slow queries, memory leaks, latency
|
||||
- `database_issue` - Migration, query, or data integrity problems
|
||||
- `security_issue` - Vulnerabilities, auth problems, data exposure
|
||||
- `ui_bug` - Visual glitches, interaction problems
|
||||
- `integration_issue` - API, service communication, third-party
|
||||
- `logic_error` - Incorrect behavior, wrong calculations
|
||||
- `dependency_issue` - Package conflicts, version mismatches
|
||||
- `configuration_error` - Environment, settings, config files
|
||||
- `workflow_issue` - CI/CD, deployment, process problems
|
||||
- **Purpose:** Determines storage directory and enables filtering
|
||||
|
||||
### `component`
|
||||
- **Type:** String (free-form)
|
||||
- **Example:** `auth-service`, `payment-gateway`, `user-dashboard`
|
||||
- **Purpose:** Identifies the affected module/service
|
||||
- **Note:** Not an enum - varies by project
|
||||
|
||||
### `symptoms`
|
||||
- **Type:** Array of strings (1-5 items)
|
||||
- **Example:**
|
||||
```yaml
|
||||
symptoms:
|
||||
- "Error: Cannot read property 'id' of undefined"
|
||||
- "Login button unresponsive after 3 clicks"
|
||||
```
|
||||
- **Purpose:** Observable error messages or behaviors for searchability
|
||||
|
||||
### `root_cause`
|
||||
- **Type:** Enum
|
||||
- **Values:**
|
||||
- `missing_dependency` - Required package/module not installed
|
||||
- `wrong_api_usage` - Incorrect use of library/framework API
|
||||
- `configuration_error` - Wrong settings or environment variables
|
||||
- `logic_error` - Bug in business logic or algorithms
|
||||
- `race_condition` - Timing-dependent bug
|
||||
- `memory_issue` - Leaks, excessive allocation
|
||||
- `type_mismatch` - Wrong type passed or returned
|
||||
- `missing_validation` - Input not properly validated
|
||||
- `permission_error` - Access denied, wrong credentials
|
||||
- `environment_issue` - Dev/prod differences, missing tools
|
||||
- `version_incompatibility` - Breaking changes between versions
|
||||
- `data_corruption` - Invalid or inconsistent data state
|
||||
- `missing_error_handling` - Unhandled exceptions/rejections
|
||||
- `incorrect_assumption` - Wrong mental model of system
|
||||
- **Purpose:** Enables pattern detection across similar issues
|
||||
|
||||
### `resolution_type`
|
||||
- **Type:** Enum
|
||||
- **Values:**
|
||||
- `code_fix` - Changed source code
|
||||
- `config_change` - Updated configuration files
|
||||
- `dependency_update` - Updated/added/removed packages
|
||||
- `migration` - Database or data migration
|
||||
- `test_fix` - Fixed test code (not production code)
|
||||
- `environment_setup` - Changed environment/infrastructure
|
||||
- `documentation` - Solution was documenting existing behavior
|
||||
- `workaround` - Temporary fix, not ideal solution
|
||||
- **Purpose:** Categorizes the type of change made
|
||||
|
||||
### `severity`
|
||||
- **Type:** Enum
|
||||
- **Values:** `critical`, `high`, `medium`, `low`
|
||||
- **Purpose:** Original impact severity for prioritization analysis
|
||||
|
||||
## Optional Fields
|
||||
|
||||
### `project`
|
||||
- **Type:** String
|
||||
- **Example:** `midaz`
|
||||
- **Purpose:** Project identifier (auto-detected from git)
|
||||
|
||||
### `language`
|
||||
- **Type:** String
|
||||
- **Example:** `go`, `typescript`, `python`
|
||||
- **Purpose:** Primary language involved for filtering
|
||||
|
||||
### `framework`
|
||||
- **Type:** String
|
||||
- **Example:** `gin`, `nextjs`, `fastapi`
|
||||
- **Purpose:** Framework-specific issues for filtering
|
||||
|
||||
### `tags`
|
||||
- **Type:** Array of strings (max 8)
|
||||
- **Example:** `[authentication, jwt, middleware, rate-limiting]`
|
||||
- **Purpose:** Free-form keywords for discovery
|
||||
|
||||
### `related_issues`
|
||||
- **Type:** Array of strings
|
||||
- **Example:**
|
||||
```yaml
|
||||
related_issues:
|
||||
- "docs/solutions/security-issues/jwt-expiry-20250115.md"
|
||||
- "https://github.com/org/repo/issues/123"
|
||||
```
|
||||
- **Purpose:** Cross-reference related solutions or external issues
|
||||
|
||||
## Category to Directory Mapping
|
||||
|
||||
| `problem_type` | Directory |
|
||||
|----------------|-----------|
|
||||
| `build_error` | `docs/solutions/build-errors/` |
|
||||
| `test_failure` | `docs/solutions/test-failures/` |
|
||||
| `runtime_error` | `docs/solutions/runtime-errors/` |
|
||||
| `performance_issue` | `docs/solutions/performance-issues/` |
|
||||
| `database_issue` | `docs/solutions/database-issues/` |
|
||||
| `security_issue` | `docs/solutions/security-issues/` |
|
||||
| `ui_bug` | `docs/solutions/ui-bugs/` |
|
||||
| `integration_issue` | `docs/solutions/integration-issues/` |
|
||||
| `logic_error` | `docs/solutions/logic-errors/` |
|
||||
| `dependency_issue` | `docs/solutions/dependency-issues/` |
|
||||
| `configuration_error` | `docs/solutions/configuration-errors/` |
|
||||
| `workflow_issue` | `docs/solutions/workflow-issues/` |
|
||||
|
||||
## Validation Rules
|
||||
|
||||
1. **All required fields must be present**
|
||||
2. **Enum values must match exactly** (case-sensitive)
|
||||
3. **Date must be valid ISO format**
|
||||
4. **Symptoms array must have 1-5 items**
|
||||
5. **Tags array limited to 8 items**
|
||||
|
||||
## Example Complete Frontmatter
|
||||
|
||||
```yaml
|
||||
---
|
||||
date: 2025-01-27
|
||||
problem_type: runtime_error
|
||||
component: auth-middleware
|
||||
symptoms:
|
||||
- "Error: JWT malformed"
|
||||
- "401 Unauthorized on valid tokens"
|
||||
root_cause: wrong_api_usage
|
||||
resolution_type: code_fix
|
||||
severity: high
|
||||
project: midaz
|
||||
language: go
|
||||
framework: gin
|
||||
tags:
|
||||
- authentication
|
||||
- jwt
|
||||
- middleware
|
||||
related_issues:
|
||||
- "docs/solutions/security-issues/jwt-refresh-logic-20250110.md"
|
||||
---
|
||||
```
|
||||
137
skills/codify-solution/schema.yaml
Normal file
@@ -0,0 +1,137 @@
|
||||
# Schema for Solution Documentation
|
||||
# Version: 1.0
|
||||
# Validator: Manual validation per codify-solution skill Step 5
|
||||
# Format: Custom Ring schema (not JSON Schema)
|
||||
# Used by: codify-solution skill for AI agent validation
|
||||
#
|
||||
# This schema defines the required and optional fields for solution documentation.
|
||||
# AI agents use this schema to validate YAML frontmatter before creating docs.
|
||||
|
||||
required_fields:
|
||||
date:
|
||||
type: string
|
||||
pattern: '^\d{4}-\d{2}-\d{2}$'
|
||||
description: "Date problem was solved (YYYY-MM-DD)"
|
||||
example: "2025-01-27"
|
||||
|
||||
problem_type:
|
||||
type: enum
|
||||
values:
|
||||
- build_error
|
||||
- test_failure
|
||||
- runtime_error
|
||||
- performance_issue
|
||||
- database_issue
|
||||
- security_issue
|
||||
- ui_bug
|
||||
- integration_issue
|
||||
- logic_error
|
||||
- dependency_issue
|
||||
- configuration_error
|
||||
- workflow_issue
|
||||
description: "Primary category - determines storage directory"
|
||||
|
||||
component:
|
||||
type: string
|
||||
description: "Affected component/module (project-specific, not enum)"
|
||||
example: "auth-service"
|
||||
|
||||
symptoms:
|
||||
type: array
|
||||
items: string
|
||||
min_items: 1
|
||||
max_items: 5
|
||||
description: "Observable symptoms (error messages, visual issues)"
|
||||
example:
|
||||
- "Error: Cannot read property 'id' of undefined"
|
||||
- "Login button unresponsive after 3 clicks"
|
||||
|
||||
root_cause:
|
||||
type: enum
|
||||
values:
|
||||
- missing_dependency
|
||||
- wrong_api_usage
|
||||
- configuration_error
|
||||
- logic_error
|
||||
- race_condition
|
||||
- memory_issue
|
||||
- type_mismatch
|
||||
- missing_validation
|
||||
- permission_error
|
||||
- environment_issue
|
||||
- version_incompatibility
|
||||
- data_corruption
|
||||
- missing_error_handling
|
||||
- incorrect_assumption
|
||||
description: "Fundamental cause of the problem"
|
||||
|
||||
resolution_type:
|
||||
type: enum
|
||||
values:
|
||||
- code_fix
|
||||
- config_change
|
||||
- dependency_update
|
||||
- migration
|
||||
- test_fix
|
||||
- environment_setup
|
||||
- documentation
|
||||
- workaround
|
||||
description: "Type of fix applied"
|
||||
|
||||
severity:
|
||||
type: enum
|
||||
values:
|
||||
- critical
|
||||
- high
|
||||
- medium
|
||||
- low
|
||||
description: "Impact severity of the original problem"
|
||||
|
||||
optional_fields:
|
||||
project:
|
||||
type: string
|
||||
description: "Project name (auto-detected from git remote)"
|
||||
example: "midaz"
|
||||
|
||||
language:
|
||||
type: string
|
||||
description: "Primary language involved"
|
||||
example: "go"
|
||||
|
||||
framework:
|
||||
type: string
|
||||
description: "Framework if applicable"
|
||||
example: "gin"
|
||||
|
||||
tags:
|
||||
type: array
|
||||
items: string
|
||||
max_items: 8
|
||||
description: "Searchable keywords for discovery"
|
||||
example:
|
||||
- "authentication"
|
||||
- "jwt"
|
||||
- "middleware"
|
||||
|
||||
related_issues:
|
||||
type: array
|
||||
items: string
|
||||
description: "Links to related solution docs or external issues"
|
||||
example:
|
||||
- "docs/solutions/security-issues/jwt-expiry-handling-20250115.md"
|
||||
- "https://github.com/org/repo/issues/123"
|
||||
|
||||
# Category to directory mapping
|
||||
category_directories:
|
||||
build_error: "build-errors"
|
||||
test_failure: "test-failures"
|
||||
runtime_error: "runtime-errors"
|
||||
performance_issue: "performance-issues"
|
||||
database_issue: "database-issues"
|
||||
security_issue: "security-issues"
|
||||
ui_bug: "ui-bugs"
|
||||
integration_issue: "integration-issues"
|
||||
logic_error: "logic-errors"
|
||||
dependency_issue: "dependency-issues"
|
||||
configuration_error: "configuration-errors"
|
||||
workflow_issue: "workflow-issues"
|
||||
132
skills/condition-based-waiting/SKILL.md
Normal file
@@ -0,0 +1,132 @@
|
||||
---
|
||||
name: condition-based-waiting
|
||||
description: |
|
||||
Flaky test fix pattern - replaces arbitrary timeouts with condition polling
|
||||
that waits for actual state changes.
|
||||
|
||||
trigger: |
|
||||
- Tests use setTimeout/sleep with arbitrary values
|
||||
- Tests are flaky (pass sometimes, fail under load)
|
||||
- Tests timeout when run in parallel
|
||||
- Waiting for async operations in tests
|
||||
|
||||
skip_when: |
|
||||
- Testing actual timing behavior (debounce, throttle) → timeout is correct
|
||||
- Synchronous tests → no waiting needed
|
||||
---
|
||||
|
||||
# Condition-Based Waiting
|
||||
|
||||
## Overview
|
||||
|
||||
Flaky tests often guess at timing with arbitrary delays. This creates race conditions where tests pass on fast machines but fail under load or in CI.
|
||||
|
||||
**Core principle:** Wait for the actual condition you care about, not a guess about how long it takes.
|
||||
|
||||
## When to Use
|
||||
|
||||
```dot
|
||||
digraph when_to_use {
|
||||
"Test uses setTimeout/sleep?" [shape=diamond];
|
||||
"Testing timing behavior?" [shape=diamond];
|
||||
"Document WHY timeout needed" [shape=box];
|
||||
"Use condition-based waiting" [shape=box];
|
||||
|
||||
"Test uses setTimeout/sleep?" -> "Testing timing behavior?" [label="yes"];
|
||||
"Testing timing behavior?" -> "Document WHY timeout needed" [label="yes"];
|
||||
"Testing timing behavior?" -> "Use condition-based waiting" [label="no"];
|
||||
}
|
||||
```
|
||||
|
||||
**Use when:**
|
||||
- Tests have arbitrary delays (`setTimeout`, `sleep`, `time.sleep()`)
|
||||
- Tests are flaky (pass sometimes, fail under load)
|
||||
- Tests timeout when run in parallel
|
||||
- Waiting for async operations to complete
|
||||
|
||||
**Don't use when:**
|
||||
- Testing actual timing behavior (debounce, throttle intervals)
- If you do use an arbitrary timeout, always document WHY
|
||||
|
||||
## Core Pattern
|
||||
|
||||
```typescript
|
||||
// ❌ BEFORE: Guessing at timing
|
||||
await new Promise(r => setTimeout(r, 50));
|
||||
const result = getResult();
|
||||
expect(result).toBeDefined();
|
||||
|
||||
// ✅ AFTER: Waiting for condition
|
||||
await waitFor(() => getResult() !== undefined);
|
||||
const result = getResult();
|
||||
expect(result).toBeDefined();
|
||||
```
|
||||
|
||||
## Quick Patterns
|
||||
|
||||
| Scenario | Pattern |
|
||||
|----------|---------|
|
||||
| Wait for event | `waitFor(() => events.find(e => e.type === 'DONE'))` |
|
||||
| Wait for state | `waitFor(() => machine.state === 'ready')` |
|
||||
| Wait for count | `waitFor(() => items.length >= 5)` |
|
||||
| Wait for file | `waitFor(() => fs.existsSync(path))` |
|
||||
| Complex condition | `waitFor(() => obj.ready && obj.value > 10)` |
|
||||
|
||||
## Implementation
|
||||
|
||||
Generic polling function:
|
||||
```typescript
|
||||
async function waitFor<T>(
|
||||
condition: () => T | undefined | null | false,
|
||||
description = 'condition',
|
||||
timeoutMs = 5000
|
||||
): Promise<T> {
|
||||
const startTime = Date.now();
|
||||
|
||||
while (true) {
|
||||
const result = condition();
|
||||
if (result) return result;
|
||||
|
||||
if (Date.now() - startTime > timeoutMs) {
|
||||
throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
|
||||
}
|
||||
|
||||
await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
See @example.ts for the complete implementation with domain-specific helpers (`waitForEvent`, `waitForEventCount`, `waitForEventMatch`) from an actual debugging session.
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
**❌ Polling too fast:** `setTimeout(check, 1)` - wastes CPU
|
||||
**✅ Fix:** Poll every 10ms
|
||||
|
||||
**❌ No timeout:** Loop forever if condition never met
|
||||
**✅ Fix:** Always include timeout with clear error
|
||||
|
||||
**❌ Stale data:** Cache state before loop
|
||||
**✅ Fix:** Call getter inside loop for fresh data
|
||||
|
||||
## When Arbitrary Timeout IS Correct
|
||||
|
||||
```typescript
|
||||
// Tool ticks every 100ms - need 2 ticks to verify partial output
|
||||
await waitForEvent(manager, 'TOOL_STARTED'); // First: wait for condition
|
||||
await new Promise(r => setTimeout(r, 200)); // Then: wait for timed behavior
|
||||
// 200ms = 2 ticks at 100ms intervals - documented and justified
|
||||
```
|
||||
|
||||
**Requirements:**
|
||||
1. First wait for triggering condition
|
||||
2. Based on known timing (not guessing)
|
||||
3. Comment explaining WHY
|
||||
|
||||
## Real-World Impact
|
||||
|
||||
From debugging session (2025-10-03):
|
||||
- Fixed 15 flaky tests across 3 files
|
||||
- Pass rate: 60% → 100%
|
||||
- Execution time: 40% faster
|
||||
- No more race conditions
|
||||
158
skills/condition-based-waiting/example.ts
Normal file
@@ -0,0 +1,158 @@
|
||||
// Complete implementation of condition-based waiting utilities
|
||||
// From: Lace test infrastructure improvements (2025-10-03)
|
||||
// Context: Fixed 15 flaky tests by replacing arbitrary timeouts
|
||||
|
||||
import type { ThreadManager } from '~/threads/thread-manager';
|
||||
import type { LaceEvent, LaceEventType } from '~/threads/types';
|
||||
|
||||
/**
|
||||
* Wait for a specific event type to appear in thread
|
||||
*
|
||||
* @param threadManager - The thread manager to query
|
||||
* @param threadId - Thread to check for events
|
||||
* @param eventType - Type of event to wait for
|
||||
* @param timeoutMs - Maximum time to wait (default 5000ms)
|
||||
* @returns Promise resolving to the first matching event
|
||||
*
|
||||
* Example:
|
||||
* await waitForEvent(threadManager, agentThreadId, 'TOOL_RESULT');
|
||||
*/
|
||||
export function waitForEvent(
|
||||
threadManager: ThreadManager,
|
||||
threadId: string,
|
||||
eventType: LaceEventType,
|
||||
timeoutMs = 5000
|
||||
): Promise<LaceEvent> {
|
||||
return new Promise((resolve, reject) => {
|
||||
const startTime = Date.now();
|
||||
|
||||
const check = () => {
|
||||
const events = threadManager.getEvents(threadId);
|
||||
const event = events.find((e) => e.type === eventType);
|
||||
|
||||
if (event) {
|
||||
resolve(event);
|
||||
} else if (Date.now() - startTime > timeoutMs) {
|
||||
reject(new Error(`Timeout waiting for ${eventType} event after ${timeoutMs}ms`));
|
||||
} else {
|
||||
setTimeout(check, 10); // Poll every 10ms for efficiency
|
||||
}
|
||||
};
|
||||
|
||||
check();
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* Wait for a specific number of events of a given type
|
||||
*
|
||||
* @param threadManager - The thread manager to query
|
||||
* @param threadId - Thread to check for events
|
||||
* @param eventType - Type of event to wait for
|
||||
* @param count - Number of events to wait for
|
||||
* @param timeoutMs - Maximum time to wait (default 5000ms)
|
||||
* @returns Promise resolving to all matching events once count is reached
|
||||
*
|
||||
* Example:
|
||||
* // Wait for 2 AGENT_MESSAGE events (initial response + continuation)
|
||||
* await waitForEventCount(threadManager, agentThreadId, 'AGENT_MESSAGE', 2);
|
||||
*/
|
||||
export function waitForEventCount(
|
||||
threadManager: ThreadManager,
|
||||
threadId: string,
|
||||
eventType: LaceEventType,
|
||||
count: number,
|
||||
timeoutMs = 5000
|
||||
): Promise<LaceEvent[]> {
|
||||
return new Promise((resolve, reject) => {
|
||||
const startTime = Date.now();
|
||||
|
||||
const check = () => {
|
||||
const events = threadManager.getEvents(threadId);
|
||||
const matchingEvents = events.filter((e) => e.type === eventType);
|
||||
|
||||
if (matchingEvents.length >= count) {
|
||||
resolve(matchingEvents);
|
||||
} else if (Date.now() - startTime > timeoutMs) {
|
||||
reject(
|
||||
new Error(
|
||||
`Timeout waiting for ${count} ${eventType} events after ${timeoutMs}ms (got ${matchingEvents.length})`
|
||||
)
|
||||
);
|
||||
} else {
|
||||
setTimeout(check, 10);
|
||||
}
|
||||
};
|
||||
|
||||
check();
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* Wait for an event matching a custom predicate
|
||||
* Useful when you need to check event data, not just type
|
||||
*
|
||||
* @param threadManager - The thread manager to query
|
||||
* @param threadId - Thread to check for events
|
||||
* @param predicate - Function that returns true when event matches
|
||||
* @param description - Human-readable description for error messages
|
||||
* @param timeoutMs - Maximum time to wait (default 5000ms)
|
||||
* @returns Promise resolving to the first matching event
|
||||
*
|
||||
* Example:
|
||||
* // Wait for TOOL_RESULT with specific ID
|
||||
* await waitForEventMatch(
|
||||
* threadManager,
|
||||
* agentThreadId,
|
||||
* (e) => e.type === 'TOOL_RESULT' && e.data.id === 'call_123',
|
||||
* 'TOOL_RESULT with id=call_123'
|
||||
* );
|
||||
*/
|
||||
export function waitForEventMatch(
|
||||
threadManager: ThreadManager,
|
||||
threadId: string,
|
||||
predicate: (event: LaceEvent) => boolean,
|
||||
description: string,
|
||||
timeoutMs = 5000
|
||||
): Promise<LaceEvent> {
|
||||
return new Promise((resolve, reject) => {
|
||||
const startTime = Date.now();
|
||||
|
||||
const check = () => {
|
||||
const events = threadManager.getEvents(threadId);
|
||||
const event = events.find(predicate);
|
||||
|
||||
if (event) {
|
||||
resolve(event);
|
||||
} else if (Date.now() - startTime > timeoutMs) {
|
||||
reject(new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`));
|
||||
} else {
|
||||
setTimeout(check, 10);
|
||||
}
|
||||
};
|
||||
|
||||
check();
|
||||
});
|
||||
}
|
||||
|
||||
// Usage example from actual debugging session:
|
||||
//
|
||||
// BEFORE (flaky):
|
||||
// ---------------
|
||||
// const messagePromise = agent.sendMessage('Execute tools');
|
||||
// await new Promise(r => setTimeout(r, 300)); // Hope tools start in 300ms
|
||||
// agent.abort();
|
||||
// await messagePromise;
|
||||
// await new Promise(r => setTimeout(r, 50)); // Hope results arrive in 50ms
|
||||
// expect(toolResults.length).toBe(2); // Fails randomly
|
||||
//
|
||||
// AFTER (reliable):
|
||||
// ----------------
|
||||
// const messagePromise = agent.sendMessage('Execute tools');
|
||||
// await waitForEventCount(threadManager, threadId, 'TOOL_CALL', 2); // Wait for tools to start
|
||||
// agent.abort();
|
||||
// await messagePromise;
|
||||
// await waitForEventCount(threadManager, threadId, 'TOOL_RESULT', 2); // Wait for results
|
||||
// expect(toolResults.length).toBe(2); // Always succeeds
|
||||
//
|
||||
// Result: 60% pass rate → 100%, 40% faster execution
|
||||
141
skills/defense-in-depth/SKILL.md
Normal file
@@ -0,0 +1,141 @@
|
||||
---
|
||||
name: defense-in-depth
|
||||
description: |
|
||||
Multi-layer validation pattern - validates data at EVERY layer it passes through
|
||||
to make bugs structurally impossible, not just caught.
|
||||
|
||||
trigger: |
|
||||
- Bug caused by invalid data reaching deep layers
|
||||
- Single validation point can be bypassed
|
||||
- Need to prevent bug category, not just instance
|
||||
|
||||
skip_when: |
|
||||
- Validation already exists at all layers → check other issues
|
||||
- Simple input validation sufficient → add single check
|
||||
|
||||
related:
|
||||
complementary: [root-cause-tracing]
|
||||
---
|
||||
|
||||
# Defense-in-Depth Validation
|
||||
|
||||
## Overview
|
||||
|
||||
When you fix a bug caused by invalid data, adding validation at one place feels sufficient. But that single check can be bypassed by different code paths, refactoring, or mocks.
|
||||
|
||||
**Core principle:** Validate at EVERY layer data passes through. Make the bug structurally impossible.
|
||||
|
||||
## Why Multiple Layers
|
||||
|
||||
Single validation: "We fixed the bug"
|
||||
Multiple layers: "We made the bug impossible"
|
||||
|
||||
Different layers catch different cases:
|
||||
- Entry validation catches most bugs
|
||||
- Business logic catches edge cases
|
||||
- Environment guards prevent context-specific dangers
|
||||
- Debug logging helps when other layers fail
|
||||
|
||||
## The Four Layers
|
||||
|
||||
### Layer 1: Entry Point Validation
|
||||
**Purpose:** Reject obviously invalid input at API boundary
|
||||
|
||||
```typescript
import { existsSync, statSync } from 'fs';

function createProject(name: string, workingDirectory: string) {
|
||||
if (!workingDirectory || workingDirectory.trim() === '') {
|
||||
throw new Error('workingDirectory cannot be empty');
|
||||
}
|
||||
if (!existsSync(workingDirectory)) {
|
||||
throw new Error(`workingDirectory does not exist: ${workingDirectory}`);
|
||||
}
|
||||
if (!statSync(workingDirectory).isDirectory()) {
|
||||
throw new Error(`workingDirectory is not a directory: ${workingDirectory}`);
|
||||
}
|
||||
// ... proceed
|
||||
}
|
||||
```
|
||||
|
||||
### Layer 2: Business Logic Validation
|
||||
**Purpose:** Ensure data makes sense for this operation
|
||||
|
||||
```typescript
|
||||
function initializeWorkspace(projectDir: string, sessionId: string) {
|
||||
if (!projectDir) {
|
||||
throw new Error('projectDir required for workspace initialization');
|
||||
}
|
||||
// ... proceed
|
||||
}
|
||||
```
|
||||
|
||||
### Layer 3: Environment Guards
|
||||
**Purpose:** Prevent dangerous operations in specific contexts
|
||||
|
||||
```typescript
import { normalize, resolve } from 'path';
import { tmpdir } from 'os';

async function gitInit(directory: string) {
|
||||
// In tests, refuse git init outside temp directories
|
||||
if (process.env.NODE_ENV === 'test') {
|
||||
const normalized = normalize(resolve(directory));
|
||||
const tmpDir = normalize(resolve(tmpdir()));
|
||||
|
||||
if (!normalized.startsWith(tmpDir)) {
|
||||
throw new Error(
|
||||
`Refusing git init outside temp dir during tests: ${directory}`
|
||||
);
|
||||
}
|
||||
}
|
||||
// ... proceed
|
||||
}
|
||||
```
|
||||
|
||||
### Layer 4: Debug Instrumentation
|
||||
**Purpose:** Capture context for forensics
|
||||
|
||||
```typescript
|
||||
async function gitInit(directory: string) {
|
||||
const stack = new Error().stack;
|
||||
logger.debug('About to git init', {
|
||||
directory,
|
||||
cwd: process.cwd(),
|
||||
stack,
|
||||
});
|
||||
// ... proceed
|
||||
}
|
||||
```
|
||||
|
||||
## Applying the Pattern
|
||||
|
||||
When you find a bug:
|
||||
|
||||
1. **Trace the data flow** - Where does bad value originate? Where used?
|
||||
2. **Map all checkpoints** - List every point data passes through
|
||||
3. **Add validation at each layer** - Entry, business, environment, debug
|
||||
4. **Test each layer** - Try to bypass layer 1, verify layer 2 catches it
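Step 4 in practice: a minimal test sketch, assuming a vitest-style runner and that the `createProject` and `initializeWorkspace` examples above live in local modules (the import paths are hypothetical):

```typescript
import { describe, it, expect } from 'vitest';
import { mkdtempSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';
import { createProject } from './project';          // hypothetical path
import { initializeWorkspace } from './workspace';  // hypothetical path

describe('defense-in-depth: workingDirectory validation', () => {
  it('Layer 1 rejects an empty workingDirectory at the entry point', () => {
    expect(() => createProject('demo', '')).toThrow('cannot be empty');
  });

  it('Layer 2 still catches an empty projectDir when Layer 1 is bypassed', () => {
    // Calling the inner function directly simulates a code path (or mock)
    // that skips entry-point validation.
    expect(() => initializeWorkspace('', 'session-1')).toThrow('projectDir required');
  });

  it('accepts a real temp directory end to end', () => {
    const dir = mkdtempSync(join(tmpdir(), 'proj-'));
    expect(() => createProject('demo', dir)).not.toThrow();
  });
});
```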
|
||||
|
||||
## Example from Session
|
||||
|
||||
Bug: Empty `projectDir` caused `git init` in source code
|
||||
|
||||
**Data flow:**
|
||||
1. Test setup → empty string
|
||||
2. `Project.create(name, '')`
|
||||
3. `WorkspaceManager.createWorkspace('')`
|
||||
4. `git init` runs in `process.cwd()`
|
||||
|
||||
**Four layers added:**
|
||||
- Layer 1: `Project.create()` validates not empty/exists/writable
|
||||
- Layer 2: `WorkspaceManager` validates projectDir not empty
|
||||
- Layer 3: `WorktreeManager` refuses git init outside tmpdir in tests
|
||||
- Layer 4: Stack trace logging before git init
|
||||
|
||||
**Result:** All 1847 tests passed, bug impossible to reproduce
|
||||
|
||||
## Key Insight
|
||||
|
||||
All four layers were necessary. During testing, each layer caught bugs the others missed:
|
||||
- Different code paths bypassed entry validation
|
||||
- Mocks bypassed business logic checks
|
||||
- Edge cases on different platforms needed environment guards
|
||||
- Debug logging identified structural misuse
|
||||
|
||||
**Don't stop at one validation point.** Add checks at every layer.
|
||||
192
skills/dispatching-parallel-agents/SKILL.md
Normal file
@@ -0,0 +1,192 @@
|
||||
---
|
||||
name: dispatching-parallel-agents
|
||||
description: |
|
||||
Concurrent investigation pattern - dispatches multiple AI agents to investigate
|
||||
and fix independent problems simultaneously.
|
||||
|
||||
trigger: |
|
||||
- 3+ failures in different test files/subsystems
|
||||
- Problems are independent (no shared state)
|
||||
- Each can be investigated without context from others
|
||||
|
||||
skip_when: |
|
||||
- Failures are related/connected → single investigation
|
||||
- Shared state between problems → sequential investigation
|
||||
- <3 failures → investigate directly
|
||||
---
|
||||
|
||||
# Dispatching Parallel Agents
|
||||
|
||||
## Overview
|
||||
|
||||
When you have multiple unrelated failures (different test files, different subsystems, different bugs), investigating them sequentially wastes time. Each investigation is independent and can happen in parallel.
|
||||
|
||||
**Core principle:** Dispatch one agent per independent problem domain. Let them work concurrently.
|
||||
|
||||
## When to Use
|
||||
|
||||
```dot
|
||||
digraph when_to_use {
|
||||
"Multiple failures?" [shape=diamond];
|
||||
"Are they independent?" [shape=diamond];
|
||||
"Single agent investigates all" [shape=box];
|
||||
"One agent per problem domain" [shape=box];
|
||||
"Can they work in parallel?" [shape=diamond];
|
||||
"Sequential agents" [shape=box];
|
||||
"Parallel dispatch" [shape=box];
|
||||
|
||||
"Multiple failures?" -> "Are they independent?" [label="yes"];
|
||||
"Are they independent?" -> "Single agent investigates all" [label="no - related"];
|
||||
"Are they independent?" -> "Can they work in parallel?" [label="yes"];
|
||||
"Can they work in parallel?" -> "Parallel dispatch" [label="yes"];
|
||||
"Can they work in parallel?" -> "Sequential agents" [label="no - shared state"];
|
||||
}
|
||||
```
|
||||
|
||||
**Use when:**
|
||||
- 3+ test files failing with different root causes
|
||||
- Multiple subsystems broken independently
|
||||
- Each problem can be understood without context from others
|
||||
- No shared state between investigations
|
||||
|
||||
**Don't use when:**
|
||||
- Failures are related (fix one might fix others)
|
||||
- Need to understand full system state
|
||||
- Agents would interfere with each other
|
||||
|
||||
## The Pattern
|
||||
|
||||
### 1. Identify Independent Domains
|
||||
|
||||
Group failures by what's broken:
|
||||
- File A tests: Tool approval flow
|
||||
- File B tests: Batch completion behavior
|
||||
- File C tests: Abort functionality
|
||||
|
||||
Each domain is independent - fixing tool approval doesn't affect abort tests.
|
||||
|
||||
### 2. Create Focused Agent Tasks
|
||||
|
||||
Each agent gets:
|
||||
- **Specific scope:** One test file or subsystem
|
||||
- **Clear goal:** Make these tests pass
|
||||
- **Constraints:** Don't change other code
|
||||
- **Expected output:** Summary of what you found and fixed
|
||||
|
||||
### 3. Dispatch in Parallel
|
||||
|
||||
```typescript
|
||||
// In AI agent environment with parallel task dispatch
|
||||
Task("Fix agent-tool-abort.test.ts failures")
|
||||
Task("Fix batch-completion-behavior.test.ts failures")
|
||||
Task("Fix tool-approval-race-conditions.test.ts failures")
|
||||
// All three run concurrently
|
||||
```
|
||||
|
||||
### 4. Review and Integrate
|
||||
|
||||
When agents return:
|
||||
- Read each summary
|
||||
- Verify fixes don't conflict
|
||||
- Run full test suite
|
||||
- Integrate all changes
|
||||
|
||||
## Agent Prompt Structure
|
||||
|
||||
Good agent prompts are:
|
||||
1. **Focused** - One clear problem domain
|
||||
2. **Self-contained** - All context needed to understand the problem
|
||||
3. **Specific about output** - What should the agent return?
|
||||
|
||||
```markdown
|
||||
Fix the 3 failing tests in src/agents/agent-tool-abort.test.ts:
|
||||
|
||||
1. "should abort tool with partial output capture" - expects 'interrupted at' in message
|
||||
2. "should handle mixed completed and aborted tools" - fast tool aborted instead of completed
|
||||
3. "should properly track pendingToolCount" - expects 3 results but gets 0
|
||||
|
||||
These are timing/race condition issues. Your task:
|
||||
|
||||
1. Read the test file and understand what each test verifies
|
||||
2. Identify root cause - timing issues or actual bugs?
|
||||
3. Fix by:
|
||||
- Replacing arbitrary timeouts with event-based waiting
|
||||
- Fixing bugs in abort implementation if found
|
||||
- Adjusting test expectations if testing changed behavior
|
||||
|
||||
Do NOT just increase timeouts - find the real issue.
|
||||
|
||||
Return: Summary of what you found and what you fixed.
|
||||
```
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
**❌ Too broad:** "Fix all the tests" - agent gets lost
|
||||
**✅ Specific:** "Fix agent-tool-abort.test.ts" - focused scope
|
||||
|
||||
**❌ No context:** "Fix the race condition" - agent doesn't know where
|
||||
**✅ Context:** Paste the error messages and test names
|
||||
|
||||
**❌ No constraints:** Agent might refactor everything
|
||||
**✅ Constraints:** "Do NOT change production code" or "Fix tests only"
|
||||
|
||||
**❌ Vague output:** "Fix it" - you don't know what changed
|
||||
**✅ Specific:** "Return summary of root cause and changes"
|
||||
|
||||
## When NOT to Use
|
||||
|
||||
**Related failures:** Fixing one might fix others - investigate together first
|
||||
**Need full context:** Understanding requires seeing entire system
|
||||
**Exploratory debugging:** You don't know what's broken yet
|
||||
**Shared state:** Agents would interfere (editing same files, using same resources)
|
||||
|
||||
## Real Example from Session
|
||||
|
||||
**Scenario:** 6 test failures across 3 files after major refactoring
|
||||
|
||||
**Failures:**
|
||||
- agent-tool-abort.test.ts: 3 failures (timing issues)
|
||||
- batch-completion-behavior.test.ts: 2 failures (tools not executing)
|
||||
- tool-approval-race-conditions.test.ts: 1 failure (execution count = 0)
|
||||
|
||||
**Decision:** Independent domains - abort logic separate from batch completion separate from race conditions
|
||||
|
||||
**Dispatch:**
|
||||
```
|
||||
Agent 1 → Fix agent-tool-abort.test.ts
|
||||
Agent 2 → Fix batch-completion-behavior.test.ts
|
||||
Agent 3 → Fix tool-approval-race-conditions.test.ts
|
||||
```
|
||||
|
||||
**Results:**
|
||||
- Agent 1: Replaced timeouts with event-based waiting
|
||||
- Agent 2: Fixed event structure bug (threadId in wrong place)
|
||||
- Agent 3: Added wait for async tool execution to complete
|
||||
|
||||
**Integration:** All fixes independent, no conflicts, full suite green
|
||||
|
||||
**Time saved:** 3 problems solved in parallel vs sequentially
|
||||
|
||||
## Key Benefits
|
||||
|
||||
1. **Parallelization** - Multiple investigations happen simultaneously
|
||||
2. **Focus** - Each agent has narrow scope, less context to track
|
||||
3. **Independence** - Agents don't interfere with each other
|
||||
4. **Speed** - 3 problems solved in the time of one
|
||||
|
||||
## Verification
|
||||
|
||||
After agents return:
|
||||
1. **Review each summary** - Understand what changed
|
||||
2. **Check for conflicts** - Did agents edit same code?
|
||||
3. **Run full suite** - Verify all fixes work together
|
||||
4. **Spot check** - Agents can make systematic errors
|
||||
|
||||
## Real-World Impact
|
||||
|
||||
From debugging session (2025-10-03):
|
||||
- 6 failures across 3 files
|
||||
- 3 agents dispatched in parallel
|
||||
- All investigations completed concurrently
|
||||
- All fixes integrated successfully
|
||||
- Zero conflicts between agent changes
|
||||
206
skills/executing-plans/SKILL.md
Normal file
@@ -0,0 +1,206 @@
|
||||
---
|
||||
name: executing-plans
|
||||
description: |
|
||||
Controlled plan execution with human review checkpoints - loads plan, executes
|
||||
in batches, pauses for feedback. Supports one-go (autonomous) or batch modes.
|
||||
|
||||
trigger: |
|
||||
- Have a plan file ready to execute
|
||||
- Want human review between task batches
|
||||
- Need structured checkpoints during implementation
|
||||
|
||||
skip_when: |
|
||||
- Same session with independent tasks → use subagent-driven-development
|
||||
- No plan exists → use writing-plans first
|
||||
- Plan needs revision → use brainstorming first
|
||||
|
||||
sequence:
|
||||
after: [writing-plans, pre-dev-task-breakdown]
|
||||
|
||||
related:
|
||||
similar: [subagent-driven-development]
|
||||
---
|
||||
|
||||
# Executing Plans
|
||||
|
||||
## Overview
|
||||
|
||||
Load plan, review critically, choose execution mode, execute tasks with code review.
|
||||
|
||||
**Core principle:** User chooses between autonomous execution or batch execution with human review checkpoints.
|
||||
|
||||
**Two execution modes:**
|
||||
- **One-go (autonomous):** Execute all batches continuously with code review, report only at completion
|
||||
- **Batch (with review):** Execute one batch, code review, pause for human feedback, repeat
|
||||
|
||||
**Announce at start:** "I'm using the executing-plans skill to implement this plan."
|
||||
|
||||
## The Process
|
||||
|
||||
### Step 1: Load and Review Plan
|
||||
1. Read plan file
|
||||
2. Review critically - identify any questions or concerns about the plan
|
||||
3. If concerns: Raise them with your human partner before starting
|
||||
4. If no concerns: Create TodoWrite and proceed to Step 2
|
||||
|
||||
### Step 2: Choose Execution Mode (MANDATORY)
|
||||
|
||||
**⚠️ THIS STEP IS NON-NEGOTIABLE. You MUST use `AskUserQuestion` before executing ANY tasks.**
|
||||
|
||||
Use `AskUserQuestion` to determine execution mode:
|
||||
|
||||
```
|
||||
AskUserQuestion(
|
||||
questions: [{
|
||||
header: "Mode",
|
||||
question: "How would you like to execute this plan?",
|
||||
options: [
|
||||
{ label: "One-go (autonomous)", description: "Execute all batches automatically with code review between each, no human review until completion" },
|
||||
{ label: "Batch (with review)", description: "Execute in batches, pause for human review after each batch's code review" }
|
||||
],
|
||||
multiSelect: false
|
||||
}]
|
||||
)
|
||||
```
|
||||
|
||||
**Based on response:**
|
||||
- **One-go** → Execute all batches continuously (Steps 3-4 loop until done, skip Step 5)
|
||||
- **Batch** → Execute one batch, report, wait for feedback (Steps 3-5 loop)
|
||||
|
||||
### Why AskUserQuestion is Mandatory (Not "Contextual Guidance")
|
||||
|
||||
**This is a structural checkpoint, not optional UX polish.**
|
||||
|
||||
User saying "don't wait", "don't ask questions", or "just execute" does NOT skip this step because:
|
||||
|
||||
1. **Execution mode affects architecture** - One-go vs batch determines review checkpoints, error recovery paths, and rollback points
|
||||
2. **Implicit intent ≠ explicit choice** - "Don't wait" might mean "use one-go" OR "ask quickly and proceed"
|
||||
3. **AskUserQuestion takes 3 seconds** - It's not an interruption, it's a confirmation
|
||||
4. **Emergency pressure is exactly when mistakes happen** - Structural gates exist FOR high-pressure moments
|
||||
|
||||
**Common Rationalizations That Mean You're About to Violate This Rule:**
|
||||
|
||||
| Rationalization | Reality |
|
||||
|-----------------|---------|
|
||||
| "User intent is crystal clear" | Intent is not the same as explicit selection. Ask anyway. |
|
||||
| "This is contextual guidance, not absolute law" | Wrong. It says MANDATORY. That means mandatory. |
|
||||
| "Asking would violate their 'don't ask' instruction" | AskUserQuestion is a 3-second structural gate, not a conversation. |
|
||||
| "Skills are tools, not bureaucratic checklists" | This skill IS the checklist. Follow it. |
|
||||
| "Interpreting spirit over letter" | The spirit IS the letter. Use AskUserQuestion. |
|
||||
| "User already chose by saying 'just execute'" | Verbal shorthand ≠ structured mode selection. Ask. |
|
||||
|
||||
**If you catch yourself thinking any of these → STOP → Use AskUserQuestion anyway.**
|
||||
|
||||
### Step 3: Execute Batch
|
||||
**Default: First 3 tasks**
|
||||
|
||||
**Agent Selection:** For each task, dispatch to the appropriate specialized agent based on task type:
|
||||
|
||||
| Task Type | Preferred Agent | Fallback |
|
||||
|-----------|-----------------|----------|
|
||||
| Backend (Go) | `ring-dev-team:backend-engineer-golang` | `general-purpose` |
|
||||
| Backend (Python) | `ring-dev-team:backend-engineer-python` | `general-purpose` |
|
||||
| Backend (TypeScript) | `ring-dev-team:backend-engineer-typescript` | `general-purpose` |
|
||||
| Backend (other) | `ring-dev-team:backend-engineer` | `general-purpose` |
|
||||
| Frontend (TypeScript/React) | `ring-dev-team:frontend-engineer-typescript` | `general-purpose` |
|
||||
| Frontend (other) | `ring-dev-team:frontend-engineer` | `general-purpose` |
|
||||
| Infrastructure | `ring-dev-team:devops-engineer` | `general-purpose` |
|
||||
| Testing | `ring-dev-team:qa-analyst` | `general-purpose` |
|
||||
| Reliability | `ring-dev-team:sre` | `general-purpose` |
|
||||
|
||||
**Note:** If plan specifies a recommended agent in its header, use that. If `ring-dev-team` plugin is unavailable, fall back to `general-purpose`.
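A rough sketch of a single dispatch (the exact Task-call shape depends on your agent environment; the task text below is illustrative):

```
Task tool (ring-dev-team:backend-engineer-typescript):
  description: "Task 2: add input validation to createProject"
  prompt: |
    [Full task text from the plan: context, bite-sized steps,
    verification commands, and constraints such as "don't change other code"]

# Fallback if the ring-dev-team plugin is unavailable:
Task tool (general-purpose):
  [same description and prompt]
```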
|
||||
|
||||
For each task:
|
||||
1. Mark as in_progress
|
||||
2. Dispatch to appropriate specialized agent (or fallback)
|
||||
3. Agent follows each step exactly (plan has bite-sized steps)
|
||||
4. Run verifications as specified
|
||||
5. Mark as completed
|
||||
|
||||
### Step 4: Run Code Review
|
||||
**After each batch execution, REQUIRED:**
|
||||
|
||||
1. **Dispatch all 3 reviewers in parallel:**
|
||||
- REQUIRED SUB-SKILL: Use ring-default:requesting-code-review
|
||||
- All reviewers run simultaneously (not sequentially)
|
||||
- Wait for all to complete
|
||||
|
||||
2. **Handle findings by severity:**
|
||||
|
||||
**Critical/High/Medium Issues:**
|
||||
- Fix immediately (do NOT add TODO comments)
|
||||
- Re-run all 3 reviewers in parallel after fixes
|
||||
- Repeat until no Critical/High/Medium issues remain
|
||||
|
||||
**Low Issues:**
|
||||
- Add `TODO(review):` comments in code
|
||||
- Format: `TODO(review): [Issue description] (reported by [reviewer] on [date], severity: Low)`
|
||||
- Track tech debt at code location for visibility
|
||||
|
||||
**Cosmetic/Nitpick Issues:**
|
||||
- Add `FIXME(nitpick):` comments in code
|
||||
- Format: `FIXME(nitpick): [Issue description] (reported by [reviewer] on [date], severity: Cosmetic)`
|
||||
- Low-priority improvements tracked inline
|
||||
|
||||
3. **Only proceed when:**
|
||||
- Zero Critical/High/Medium issues remain
|
||||
- All Low/Cosmetic issues have TODO/FIXME comments
|
||||
|
||||
### Step 5: Report and Continue
|
||||
|
||||
**Behavior depends on execution mode chosen in Step 2:**
|
||||
|
||||
**One-go mode:**
|
||||
- Log batch completion internally
|
||||
- Immediately proceed to next batch (Step 3)
|
||||
- Continue until all tasks complete
|
||||
- Only report to human at final completion
|
||||
|
||||
**Batch mode:**
|
||||
- Show what was implemented
|
||||
- Show verification output
|
||||
- Show code review results (severity breakdown)
|
||||
- Say: "Ready for feedback."
|
||||
- Wait for human response
|
||||
- Apply changes if requested
|
||||
- Then proceed to next batch (Step 3)
|
||||
|
||||
### Step 6: Complete Development
|
||||
|
||||
After all tasks complete and verified:
|
||||
- Announce: "I'm using the finishing-a-development-branch skill to complete this work."
|
||||
- **REQUIRED SUB-SKILL:** Use ring-default:finishing-a-development-branch
|
||||
- Follow that skill to verify tests, present options, execute choice
|
||||
|
||||
## When to Stop and Ask for Help
|
||||
|
||||
**STOP executing immediately when:**
|
||||
- Hit a blocker mid-batch (missing dependency, test fails, instruction unclear)
|
||||
- Plan has critical gaps preventing starting
|
||||
- You don't understand an instruction
|
||||
- Verification fails repeatedly
|
||||
|
||||
**Ask for clarification rather than guessing.**
|
||||
|
||||
## When to Revisit Earlier Steps
|
||||
|
||||
**Return to Review (Step 1) when:**
|
||||
- Partner updates the plan based on your feedback
|
||||
- Fundamental approach needs rethinking
|
||||
|
||||
**Don't force through blockers** - stop and ask.
|
||||
|
||||
## Remember
|
||||
- Review plan critically first
|
||||
- **MANDATORY: Use `AskUserQuestion` for execution mode** - NO exceptions, even if user says "don't ask"
|
||||
- **Use specialized agents:** Prefer `ring-dev-team:*` agents over `general-purpose` when available
|
||||
- Follow plan steps exactly
|
||||
- Don't skip verifications
|
||||
- Run code review after each batch (all 3 reviewers in parallel)
|
||||
- Fix Critical/High/Medium immediately - no TODO comments for these
|
||||
- Only Low issues get TODO comments, Cosmetic get FIXME
|
||||
- Reference skills when plan says to
|
||||
- **One-go mode:** Continue autonomously until completion
|
||||
- **Batch mode:** Report and wait for feedback between batches
|
||||
- Stop when blocked, don't guess
|
||||
- **If rationalizing why to skip AskUserQuestion → You're wrong → Ask anyway**
|
||||
215
skills/finishing-a-development-branch/SKILL.md
Normal file
@@ -0,0 +1,215 @@
|
||||
---
|
||||
name: finishing-a-development-branch
|
||||
description: |
|
||||
Branch completion workflow - guides merge/PR/cleanup decisions after implementation
|
||||
is verified complete.
|
||||
|
||||
trigger: |
|
||||
- Implementation complete (tests passing)
|
||||
- Ready to integrate work to main branch
|
||||
- Need to decide: merge, PR, or more work
|
||||
|
||||
skip_when: |
|
||||
- Tests not passing → fix first
|
||||
- Implementation incomplete → continue development
|
||||
- Already merged → proceed to next task
|
||||
|
||||
sequence:
|
||||
after: [verification-before-completion, requesting-code-review]
|
||||
---
|
||||
|
||||
# Finishing a Development Branch
|
||||
|
||||
## Overview
|
||||
|
||||
Guide completion of development work by presenting clear options and handling chosen workflow.
|
||||
|
||||
**Core principle:** Verify tests → Present options → Execute choice → Clean up.
|
||||
|
||||
**Announce at start:** "I'm using the finishing-a-development-branch skill to complete this work."
|
||||
|
||||
## The Process
|
||||
|
||||
### Step 1: Verify Tests
|
||||
|
||||
**Before presenting options, verify tests pass:**
|
||||
|
||||
```bash
|
||||
# Run project's test suite
|
||||
npm test    # or: cargo test, pytest, go test ./...
|
||||
```
|
||||
|
||||
**If tests fail:**
|
||||
```
|
||||
Tests failing (<N> failures). Must fix before completing:
|
||||
|
||||
[Show failures]
|
||||
|
||||
Cannot proceed with merge/PR until tests pass.
|
||||
```
|
||||
|
||||
Stop. Don't proceed to Step 2.
|
||||
|
||||
**If tests pass:** Continue to Step 2.
|
||||
|
||||
### Step 2: Determine Base Branch
|
||||
|
||||
```bash
|
||||
# Try common base branches
|
||||
git merge-base HEAD main 2>/dev/null || git merge-base HEAD master 2>/dev/null
|
||||
```
|
||||
|
||||
Or ask: "This branch split from main - is that correct?"
|
||||
|
||||
### Step 3: Present Options
|
||||
|
||||
Present exactly these 4 options:
|
||||
|
||||
```
|
||||
Implementation complete. What would you like to do?
|
||||
|
||||
1. Merge back to <base-branch> locally
|
||||
2. Push and create a Pull Request
|
||||
3. Keep the branch as-is (I'll handle it later)
|
||||
4. Discard this work
|
||||
|
||||
Which option?
|
||||
```
|
||||
|
||||
**Don't add explanation** - keep options concise.
|
||||
|
||||
### Step 4: Execute Choice
|
||||
|
||||
#### Option 1: Merge Locally
|
||||
|
||||
```bash
|
||||
# Switch to base branch
|
||||
git checkout <base-branch>
|
||||
|
||||
# Pull latest
|
||||
git pull
|
||||
|
||||
# Merge feature branch
|
||||
git merge <feature-branch>
|
||||
|
||||
# Verify tests on merged result
|
||||
<test command>
|
||||
|
||||
# If tests pass
|
||||
git branch -d <feature-branch>
|
||||
```
|
||||
|
||||
Then: Cleanup worktree (Step 5)
|
||||
|
||||
#### Option 2: Push and Create PR
|
||||
|
||||
```bash
|
||||
# Push branch
|
||||
git push -u origin <feature-branch>
|
||||
|
||||
# Create PR
|
||||
gh pr create --title "<title>" --body "$(cat <<'EOF'
|
||||
## Summary
|
||||
<2-3 bullets of what changed>
|
||||
|
||||
## Test Plan
|
||||
- [ ] <verification steps>
|
||||
EOF
|
||||
)"
|
||||
```
|
||||
|
||||
Then: Keep the worktree until the PR is merged (see Step 5)
|
||||
|
||||
#### Option 3: Keep As-Is
|
||||
|
||||
Report: "Keeping branch <name>. Worktree preserved at <path>."
|
||||
|
||||
**Don't cleanup worktree.**
|
||||
|
||||
#### Option 4: Discard
|
||||
|
||||
**Confirm first:**
|
||||
```
|
||||
This will permanently delete:
|
||||
- Branch <name>
|
||||
- All commits: <commit-list>
|
||||
- Worktree at <path>
|
||||
|
||||
Type 'discard' to confirm.
|
||||
```
|
||||
|
||||
Wait for exact confirmation.
|
||||
|
||||
If confirmed:
|
||||
```bash
|
||||
git checkout <base-branch>
|
||||
git branch -D <feature-branch>
|
||||
```
|
||||
|
||||
Then: Cleanup worktree (Step 5)
|
||||
|
||||
### Step 5: Cleanup Worktree
|
||||
|
||||
**For Options 1 and 4:**
|
||||
|
||||
Check if in worktree:
|
||||
```bash
|
||||
git worktree list | grep "$(git branch --show-current)"
|
||||
```
|
||||
|
||||
If yes:
|
||||
```bash
|
||||
git worktree remove <worktree-path>
|
||||
```
|
||||
|
||||
**For Options 2 and 3:** Keep worktree.
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Option | Merge | Push | Keep Worktree | Cleanup Branch |
|
||||
|--------|-------|------|---------------|----------------|
|
||||
| 1. Merge locally | ✓ | - | - | ✓ |
|
||||
| 2. Create PR | - | ✓ | ✓ | - |
|
||||
| 3. Keep as-is | - | - | ✓ | - |
|
||||
| 4. Discard | - | - | - | ✓ (force) |
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
**Skipping test verification**
|
||||
- **Problem:** Merge broken code, create failing PR
|
||||
- **Fix:** Always verify tests before offering options
|
||||
|
||||
**Open-ended questions**
|
||||
- **Problem:** "What should I do next?" → ambiguous
|
||||
- **Fix:** Present exactly 4 structured options
|
||||
|
||||
**Automatic worktree cleanup**
|
||||
- **Problem:** Remove worktree when might need it (Option 2, 3)
|
||||
- **Fix:** Only cleanup for Options 1 and 4
|
||||
|
||||
**No confirmation for discard**
|
||||
- **Problem:** Accidentally delete work
|
||||
- **Fix:** Require typed "discard" confirmation
|
||||
|
||||
## Red Flags
|
||||
|
||||
**Never:**
|
||||
- Proceed with failing tests
|
||||
- Merge without verifying tests on result
|
||||
- Delete work without confirmation
|
||||
- Force-push without explicit request
|
||||
|
||||
**Always:**
|
||||
- Verify tests before offering options
|
||||
- Present exactly 4 options
|
||||
- Get typed confirmation for Option 4
|
||||
- Clean up worktree for Options 1 & 4 only
|
||||
|
||||
## Integration
|
||||
|
||||
**Called by:**
|
||||
- **subagent-driven-development** (Step 7) - After all tasks complete
|
||||
- **executing-plans** (Step 6) - After all batches complete
|
||||
|
||||
**Pairs with:**
|
||||
- **using-git-worktrees** - Cleans up worktree created by that skill
|
||||
220
skills/receiving-code-review/SKILL.md
Normal file
@@ -0,0 +1,220 @@
|
||||
---
|
||||
name: receiving-code-review
|
||||
description: |
|
||||
Review reception protocol - requires technical verification before implementing
|
||||
suggestions. Prevents performative agreement and blind implementation.
|
||||
|
||||
trigger: |
|
||||
- Received code review feedback
|
||||
- About to implement reviewer suggestions
|
||||
- Feedback seems unclear or technically questionable
|
||||
|
||||
skip_when: |
|
||||
- Feedback is clear and obviously correct → implement directly
|
||||
- No feedback received → continue working
|
||||
---
|
||||
|
||||
# Code Review Reception
|
||||
|
||||
## Overview
|
||||
|
||||
Code review requires technical evaluation, not emotional performance.
|
||||
|
||||
**Core principle:** Verify before implementing. Ask before assuming. Technical correctness over social comfort.
|
||||
|
||||
## The Response Pattern
|
||||
|
||||
```
|
||||
WHEN receiving code review feedback:
|
||||
|
||||
1. READ: Complete feedback without reacting
|
||||
2. UNDERSTAND: Restate requirement in own words (or ask)
|
||||
3. VERIFY: Check against codebase reality
|
||||
4. EVALUATE: Technically sound for THIS codebase?
|
||||
5. RESPOND: Technical acknowledgment or reasoned pushback
|
||||
6. IMPLEMENT: One item at a time, test each
|
||||
```
|
||||
|
||||
## Forbidden Responses
|
||||
|
||||
**NEVER:**
|
||||
- "You're absolutely right!" (explicit CLAUDE.md violation)
|
||||
- "Great point!" / "Excellent feedback!" (performative)
|
||||
- "Let me implement that now" (before verification)
|
||||
|
||||
**INSTEAD:**
|
||||
- Restate the technical requirement
|
||||
- Ask clarifying questions
|
||||
- Push back with technical reasoning if wrong
|
||||
- Just start working (actions > words)
|
||||
|
||||
## Handling Unclear Feedback
|
||||
|
||||
```
|
||||
IF any item is unclear:
|
||||
STOP - do not implement anything yet
|
||||
ASK for clarification on unclear items
|
||||
|
||||
WHY: Items may be related. Partial understanding = wrong implementation.
|
||||
```
|
||||
|
||||
**Example:**
|
||||
```
|
||||
your human partner: "Fix 1-6"
|
||||
You understand 1,2,3,6. Unclear on 4,5.
|
||||
|
||||
❌ WRONG: Implement 1,2,3,6 now, ask about 4,5 later
|
||||
✅ RIGHT: "I understand items 1,2,3,6. Need clarification on 4 and 5 before proceeding."
|
||||
```
|
||||
|
||||
## Source-Specific Handling
|
||||
|
||||
### From your human partner
|
||||
- **Trusted** - implement after understanding
|
||||
- **Still ask** if scope unclear
|
||||
- **No performative agreement**
|
||||
- **Skip to action** or technical acknowledgment
|
||||
|
||||
### From External Reviewers
|
||||
```
|
||||
BEFORE implementing:
|
||||
1. Check: Technically correct for THIS codebase?
|
||||
2. Check: Breaks existing functionality?
|
||||
3. Check: Reason for current implementation?
|
||||
4. Check: Works on all platforms/versions?
|
||||
5. Check: Does reviewer understand full context?
|
||||
|
||||
IF suggestion seems wrong:
|
||||
Push back with technical reasoning
|
||||
|
||||
IF can't easily verify:
|
||||
Say so: "I can't verify this without [X]. Should I [investigate/ask/proceed]?"
|
||||
|
||||
IF conflicts with your human partner's prior decisions:
|
||||
Stop and discuss with your human partner first
|
||||
```
|
||||
|
||||
**your human partner's rule:** "External feedback - be skeptical, but check carefully"
|
||||
|
||||
## YAGNI Check for "Professional" Features
|
||||
|
||||
```
|
||||
IF reviewer suggests "implementing properly":
|
||||
grep codebase for actual usage
|
||||
|
||||
IF unused: "This endpoint isn't called. Remove it (YAGNI)?"
|
||||
IF used: Then implement properly
|
||||
```
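A quick way to run that check (a sketch; the endpoint path and source layout are hypothetical):

```bash
# Is the endpoint actually called anywhere outside its own definition and tests?
grep -rn "metrics/export" src/ --include="*.ts" | grep -v "\.test\.ts"
```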
|
||||
|
||||
**your human partner's rule:** "You and reviewer both report to me. If we don't need this feature, don't add it."
|
||||
|
||||
## Implementation Order
|
||||
|
||||
```
|
||||
FOR multi-item feedback:
|
||||
1. Clarify anything unclear FIRST
|
||||
2. Then implement in this order:
|
||||
- Blocking issues (breaks, security)
|
||||
- Simple fixes (typos, imports)
|
||||
- Complex fixes (refactoring, logic)
|
||||
3. Test each fix individually
|
||||
4. Verify no regressions
|
||||
```
|
||||
|
||||
## When To Push Back
|
||||
|
||||
Push back when:
|
||||
- Suggestion breaks existing functionality
|
||||
- Reviewer lacks full context
|
||||
- Violates YAGNI (unused feature)
|
||||
- Technically incorrect for this stack
|
||||
- Legacy/compatibility reasons exist
|
||||
- Conflicts with your human partner's architectural decisions
|
||||
|
||||
**How to push back:**
|
||||
- Use technical reasoning, not defensiveness
|
||||
- Ask specific questions
|
||||
- Reference working tests/code
|
||||
- Involve your human partner if architectural
|
||||
|
||||
**Signal if uncomfortable pushing back out loud:** "Strange things are afoot at the Circle K"
|
||||
|
||||
## Acknowledging Correct Feedback
|
||||
|
||||
When feedback IS correct:
|
||||
```
|
||||
✅ "Fixed. [Brief description of what changed]"
|
||||
✅ "Good catch - [specific issue]. Fixed in [location]."
|
||||
✅ [Just fix it and show in the code]
|
||||
|
||||
❌ "You're absolutely right!"
|
||||
❌ "Great point!"
|
||||
❌ "Thanks for catching that!"
|
||||
❌ "Thanks for [anything]"
|
||||
❌ ANY gratitude expression
|
||||
```
|
||||
|
||||
**Why no thanks:** Actions speak. Just fix it. The code itself shows you heard the feedback.
|
||||
|
||||
**If you catch yourself about to write "Thanks":** DELETE IT. State the fix instead.
|
||||
|
||||
## Gracefully Correcting Your Pushback
|
||||
|
||||
If you pushed back and were wrong:
|
||||
```
|
||||
✅ "You were right - I checked [X] and it does [Y]. Implementing now."
|
||||
✅ "Verified this and you're correct. My initial understanding was wrong because [reason]. Fixing."
|
||||
|
||||
❌ Long apology
|
||||
❌ Defending why you pushed back
|
||||
❌ Over-explaining
|
||||
```
|
||||
|
||||
State the correction factually and move on.
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
| Mistake | Fix |
|
||||
|---------|-----|
|
||||
| Performative agreement | State requirement or just act |
|
||||
| Blind implementation | Verify against codebase first |
|
||||
| Batch without testing | One at a time, test each |
|
||||
| Assuming reviewer is right | Check if breaks things |
|
||||
| Avoiding pushback | Technical correctness > comfort |
|
||||
| Partial implementation | Clarify all items first |
|
||||
| Can't verify, proceed anyway | State limitation, ask for direction |
|
||||
|
||||
## Real Examples
|
||||
|
||||
**Performative Agreement (Bad):**
|
||||
```
|
||||
Reviewer: "Remove legacy code"
|
||||
❌ "You're absolutely right! Let me remove that..."
|
||||
```
|
||||
|
||||
**Technical Verification (Good):**
|
||||
```
|
||||
Reviewer: "Remove legacy code"
|
||||
✅ "Checking... build target is 10.15+, this API needs 13+. Need legacy for backward compat. Current impl has wrong bundle ID - fix it or drop pre-13 support?"
|
||||
```
|
||||
|
||||
**YAGNI (Good):**
|
||||
```
|
||||
Reviewer: "Implement proper metrics tracking with database, date filters, CSV export"
|
||||
✅ "Grepped codebase - nothing calls this endpoint. Remove it (YAGNI)? Or is there usage I'm missing?"
|
||||
```
|
||||
|
||||
**Unclear Item (Good):**
|
||||
```
|
||||
your human partner: "Fix items 1-6"
|
||||
You understand 1,2,3,6. Unclear on 4,5.
|
||||
✅ "Understand 1,2,3,6. Need clarification on 4 and 5 before implementing."
|
||||
```
|
||||
|
||||
## The Bottom Line
|
||||
|
||||
**External feedback = suggestions to evaluate, not orders to follow.**
|
||||
|
||||
Verify. Question. Then implement.
|
||||
|
||||
No performative agreement. Technical rigor always.
|
||||
205
skills/requesting-code-review/SKILL.md
Normal file
@@ -0,0 +1,205 @@
|
||||
---
|
||||
name: requesting-code-review
|
||||
description: |
|
||||
Parallel code review dispatch - sends to 3 specialized reviewers (code, business-logic,
|
||||
security) simultaneously for comprehensive feedback.
|
||||
|
||||
trigger: |
|
||||
- After completing major feature implementation
|
||||
- After completing task in subagent-driven-development
|
||||
- Before merge to main branch
|
||||
- After fixing complex bug
|
||||
|
||||
skip_when: |
|
||||
- Trivial change (<20 lines, no logic change) → verify manually
|
||||
- Still in development → finish implementation first
|
||||
- Already reviewed and no changes since → proceed
|
||||
|
||||
sequence:
|
||||
after: [verification-before-completion]
|
||||
before: [finishing-a-development-branch]
|
||||
---
|
||||
|
||||
# Requesting Code Review
|
||||
|
||||
Dispatch all three reviewer subagents in parallel for fast, comprehensive feedback.
|
||||
|
||||
**Core principle:** Review early, review often. Parallel execution provides 3x faster feedback with comprehensive coverage.
|
||||
|
||||
## Review Order (Parallel Execution)
|
||||
|
||||
Three specialized reviewers run in **parallel** for maximum speed:
|
||||
|
||||
**1. ring-default:code-reviewer** (Foundation)
|
||||
- **Focus:** Architecture, design patterns, code quality, maintainability
|
||||
- **Model:** Opus (required for comprehensive analysis)
|
||||
- **Reports:** Code quality issues, architectural concerns
|
||||
|
||||
**2. ring-default:business-logic-reviewer** (Correctness)
|
||||
- **Focus:** Domain correctness, business rules, edge cases, requirements
|
||||
- **Model:** Opus (required for deep domain understanding)
|
||||
- **Reports:** Business logic issues, requirement gaps
|
||||
|
||||
**3. ring-default:security-reviewer** (Safety)
|
||||
- **Focus:** Vulnerabilities, authentication, input validation, OWASP risks
|
||||
- **Model:** Opus (required for thorough security analysis)
|
||||
- **Reports:** Security vulnerabilities, OWASP risks
|
||||
|
||||
**Critical:** All three reviewers run simultaneously in a single message with 3 Task tool calls. Each reviewer works independently and returns its report. After all complete, aggregate findings and handle by severity.
|
||||
|
||||
## When to Request Review
|
||||
|
||||
**Mandatory:**
|
||||
- After each task in subagent-driven development
|
||||
- After completing major feature
|
||||
|
||||
**Optional but valuable:**
|
||||
- When stuck (fresh perspective)
|
||||
- Before refactoring (baseline check)
|
||||
- After fixing complex bug
|
||||
|
||||
## Which Reviewers to Use
|
||||
|
||||
**Use all three reviewers (parallel) when:**
|
||||
- Implementing new features (comprehensive check)
|
||||
- Before merge to main (final validation)
|
||||
- After completing major milestone
|
||||
|
||||
**Use subset when domain doesn't apply:**
|
||||
- **Code-reviewer only:** Documentation changes, config updates
|
||||
- **Code + Business (skip security):** Internal scripts with no external input
|
||||
- **Code + Security (skip business):** Infrastructure/DevOps changes
|
||||
|
||||
**Default: Use all three in parallel.** Only skip reviewers when you're certain their domain doesn't apply.
|
||||
|
||||
**Two ways to run parallel reviews:**
|
||||
1. **Direct parallel dispatch:** Launch 3 Task calls in single message (explicit control)
|
||||
2. **/ring-default:codereview command:** Provides the parallel-review workflow instructions as a single command (convenience)
|
||||
|
||||
## How to Request
|
||||
|
||||
**1. Get git SHAs:**
|
||||
```bash
|
||||
BASE_SHA=$(git rev-parse HEAD~1) # or origin/main
|
||||
HEAD_SHA=$(git rev-parse HEAD)
|
||||
```
|
||||
|
||||
**2. Dispatch all three reviewers in parallel:**
|
||||
|
||||
**CRITICAL: Use a single message with 3 Task tool calls to launch all reviewers simultaneously.**
|
||||
|
||||
```
|
||||
# Single message with 3 parallel Task calls:
|
||||
|
||||
Task tool #1 (ring-default:code-reviewer):
|
||||
model: "opus"
|
||||
description: "Review code quality and architecture"
|
||||
prompt: |
|
||||
WHAT_WAS_IMPLEMENTED: [What you built]
|
||||
PLAN_OR_REQUIREMENTS: [Requirements/plan reference]
|
||||
BASE_SHA: [starting commit]
|
||||
HEAD_SHA: [current commit]
|
||||
DESCRIPTION: [brief summary]
|
||||
|
||||
Task tool #2 (ring-default:business-logic-reviewer):
|
||||
model: "opus"
|
||||
description: "Review business logic correctness"
|
||||
prompt: |
|
||||
[Same parameters as above]
|
||||
|
||||
Task tool #3 (ring-default:security-reviewer):
|
||||
model: "opus"
|
||||
description: "Review security vulnerabilities"
|
||||
prompt: |
|
||||
[Same parameters as above]
|
||||
```
|
||||
|
||||
**All three reviewers execute simultaneously. Wait for all to complete.**
|
||||
|
||||
**3. Aggregate findings from all three reviews:**
|
||||
|
||||
Collect all issues by severity across all three reviewers:
|
||||
- **Critical issues:** [List from all 3 reviewers]
|
||||
- **High issues:** [List from all 3 reviewers]
|
||||
- **Medium issues:** [List from all 3 reviewers]
|
||||
- **Low issues:** [List from all 3 reviewers]
|
||||
- **Cosmetic/Nitpick issues:** [List from all 3 reviewers]
|
||||
|
||||
**4. Handle by severity:**
|
||||
|
||||
**Critical/High/Medium → Fix immediately:**
|
||||
- Dispatch fix subagent to address all Critical/High/Medium issues
|
||||
- After fixes complete, re-run all 3 reviewers in parallel
|
||||
- Repeat until no Critical/High/Medium issues remain
|
||||
|
||||
**Low issues → Add TODO comments in code:**
|
||||
```javascript
|
||||
// TODO(review): [Issue description from reviewer]
|
||||
// Reported by: [reviewer-name] on [date]
|
||||
// Location: file:line
|
||||
```
|
||||
|
||||
**Cosmetic/Nitpick → Add FIXME comments in code:**
|
||||
```javascript
|
||||
// FIXME(nitpick): [Issue description from reviewer]
|
||||
// Reported by: [reviewer-name] on [date]
|
||||
// Location: file:line
|
||||
```
|
||||
|
||||
**Push back on incorrect feedback:**
|
||||
- If reviewer is wrong, provide reasoning and evidence
|
||||
- Request clarification for ambiguous feedback
|
||||
- Security concerns require extra scrutiny before dismissing
|
||||
|
||||
## Integration with Workflows
|
||||
|
||||
**Subagent-Driven Development:**
|
||||
- Review after EACH task using parallel dispatch (all 3 reviewers at once)
|
||||
- Aggregate findings across all reviewers
|
||||
- Fix Critical/High/Medium, add TODO/FIXME for Low/Cosmetic
|
||||
- Move to next task only after all Critical/High/Medium resolved
|
||||
|
||||
**Executing Plans:**
|
||||
- Review after each batch (3 tasks) using parallel dispatch
|
||||
- Handle severity-based fixes before next batch
|
||||
- Track Low/Cosmetic issues in code comments
|
||||
|
||||
**Ad-Hoc Development:**
|
||||
- Review before merge using parallel dispatch (all three reviewers)
|
||||
- Can use single reviewer if domain-specific (e.g., docs → code-reviewer only)
|
||||
|
||||
## Red Flags
|
||||
|
||||
**Never:**
|
||||
- Dispatch reviewers sequentially (wastes time - use parallel!)
|
||||
- Proceed to next task with unfixed Critical/High/Medium issues
|
||||
- Skip security review for "just refactoring" (may expose vulnerabilities)
|
||||
- Skip code review because "code is simple"
|
||||
- Forget to add TODO/FIXME comments for Low/Cosmetic issues
|
||||
- Argue with valid technical/security feedback without evidence
|
||||
|
||||
**Always:**
|
||||
- Launch all 3 reviewers in a single message (3 Task calls)
|
||||
- Specify `model: "opus"` for each reviewer
|
||||
- Wait for all reviewers to complete before aggregating
|
||||
- Fix Critical/High/Medium immediately
|
||||
- Add TODO comments for Low issues
|
||||
- Add FIXME comments for Cosmetic/Nitpick issues
|
||||
- Re-run all 3 reviewers after fixing Critical/High/Medium issues
|
||||
|
||||
**If reviewer wrong:**
|
||||
- Push back with technical reasoning
|
||||
- Show code/tests that prove it works
|
||||
- Request clarification
|
||||
- Security concerns require extra scrutiny before dismissing
|
||||
|
||||
## Re-running Reviews After Fixes
|
||||
|
||||
**After fixing Critical/High/Medium issues:**
|
||||
- Re-run all 3 reviewers in parallel (same as initial review)
|
||||
- Don't cherry-pick which reviewers to re-run
|
||||
- Parallel execution makes full re-review fast (~3-5 minutes total)
|
||||
|
||||
**After adding TODO/FIXME comments:**
|
||||
- Commit with message noting review completion and tech debt tracking
|
||||
- No need to re-run reviews for comment additions
|
||||
190
skills/root-cause-tracing/SKILL.md
Normal file
@@ -0,0 +1,190 @@
|
||||
---
|
||||
name: root-cause-tracing
|
||||
description: |
|
||||
Backward call-chain tracing - systematically trace bugs from error location back
|
||||
through call stack to original trigger. Adds instrumentation when needed.
|
||||
|
||||
trigger: |
|
||||
- Error happens deep in execution (not at entry point)
|
||||
- Stack trace shows long call chain
|
||||
- Unclear where invalid data originated
|
||||
- systematic-debugging Phase 1 leads you here
|
||||
|
||||
skip_when: |
|
||||
- Bug at entry point → use systematic-debugging directly
|
||||
- Haven't started investigation → use systematic-debugging first
|
||||
- Root cause is obvious → just fix it
|
||||
|
||||
sequence:
|
||||
after: [systematic-debugging]
|
||||
|
||||
related:
|
||||
complementary: [systematic-debugging]
|
||||
---
|
||||
|
||||
# Root Cause Tracing
|
||||
|
||||
## Overview
|
||||
|
||||
Bugs often manifest deep in the call stack (git init in wrong directory, file created in wrong location, database opened with wrong path). Your instinct is to fix where the error appears, but that's treating a symptom.
|
||||
|
||||
**Core principle:** Trace backward through the call chain until you find the original trigger, then fix at the source.
|
||||
|
||||
## When to Use
|
||||
|
||||
**Use root-cause-tracing when:**
|
||||
- Error happens deep in execution (not at entry point)
|
||||
- Stack trace shows long call chain
|
||||
- Unclear where invalid data originated
|
||||
- systematic-debugging Phase 1 leads you here
|
||||
|
||||
**Relationship with systematic-debugging:**
|
||||
- root-cause-tracing is a **SUB-SKILL** of systematic-debugging
|
||||
- Use during **systematic-debugging Phase 1, Step 5** (Trace Data Flow)
|
||||
- Can also use standalone if you KNOW bug is deep-stack issue
|
||||
- After tracing to source, **return to systematic-debugging Phase 2**
|
||||
|
||||
**When NOT to use:**
|
||||
- Bug appears at entry point → Use systematic-debugging Phase 1 directly
|
||||
- You haven't started systematic-debugging yet → Start there first
|
||||
- Root cause is obvious → Just fix it
|
||||
- Still gathering evidence → Continue systematic-debugging Phase 1
|
||||
|
||||
## The Tracing Process
|
||||
|
||||
### 1. Observe the Symptom
|
||||
```
|
||||
Error: git init failed in /Users/jesse/project/packages/core
|
||||
```
|
||||
|
||||
### 2. Find Immediate Cause
|
||||
**What code directly causes this?**
|
||||
```typescript
|
||||
await execFileAsync('git', ['init'], { cwd: projectDir });
|
||||
```
|
||||
|
||||
### 3. Ask: What Called This?
|
||||
```typescript
|
||||
WorktreeManager.createSessionWorktree(projectDir, sessionId)
|
||||
→ called by Session.initializeWorkspace()
|
||||
→ called by Session.create()
|
||||
→ called by test at Project.create()
|
||||
```
|
||||
|
||||
### 4. Keep Tracing Up
|
||||
**What value was passed?**
|
||||
- `projectDir = ''` (empty string!)
|
||||
- Empty string as `cwd` resolves to `process.cwd()`
|
||||
- That's the source code directory!
|
||||
|
||||
### 5. Find Original Trigger
|
||||
**Where did empty string come from?**
|
||||
```typescript
|
||||
const context = setupCoreTest(); // Returns { tempDir: '' }
|
||||
Project.create('name', context.tempDir); // Accessed before beforeEach!
|
||||
```
|
||||
|
||||
## Adding Stack Traces
|
||||
|
||||
When you can't trace manually, add instrumentation:
|
||||
|
||||
```typescript
|
||||
import { execFile } from 'child_process';
import { promisify } from 'util';
const execFileAsync = promisify(execFile);

// Before the problematic operation
async function gitInit(directory: string) {
|
||||
const stack = new Error().stack;
|
||||
console.error('DEBUG git init:', {
|
||||
directory,
|
||||
cwd: process.cwd(),
|
||||
nodeEnv: process.env.NODE_ENV,
|
||||
stack,
|
||||
});
|
||||
|
||||
await execFileAsync('git', ['init'], { cwd: directory });
|
||||
}
|
||||
```
|
||||
|
||||
**Critical:** Use `console.error()` in tests (not logger - may not show)
|
||||
|
||||
**Run and capture:**
|
||||
```bash
|
||||
npm test 2>&1 | grep 'DEBUG git init'
|
||||
```
|
||||
|
||||
**Analyze stack traces:**
|
||||
- Look for test file names
|
||||
- Find the line number triggering the call
|
||||
- Identify the pattern (same test? same parameter?)
|
||||
|
||||
## Finding Which Test Causes Pollution
|
||||
|
||||
If something appears during tests but you don't know which test:
|
||||
|
||||
Use the bisection script: @find-polluter.sh
|
||||
|
||||
```bash
|
||||
./find-polluter.sh '.git' 'src/**/*.test.ts'
|
||||
```
|
||||
|
||||
Runs tests one-by-one, stops at first polluter. See script for usage.
|
||||
|
||||
## Real Example: Empty projectDir
|
||||
|
||||
**Symptom:** `.git` created in `packages/core/` (source code)
|
||||
|
||||
**Trace chain:**
|
||||
1. `git init` runs in `process.cwd()` ← empty cwd parameter
|
||||
2. WorktreeManager called with empty projectDir
|
||||
3. Session.create() passed empty string
|
||||
4. Test accessed `context.tempDir` before beforeEach
|
||||
5. setupCoreTest() returns `{ tempDir: '' }` initially
|
||||
|
||||
**Root cause:** Top-level variable initialization accessing empty value
|
||||
|
||||
**Fix:** Made tempDir a getter that throws if accessed before beforeEach
|
||||
|
||||
**Also added defense-in-depth:**
|
||||
- Layer 1: Project.create() validates directory
|
||||
- Layer 2: WorkspaceManager validates not empty
|
||||
- Layer 3: NODE_ENV guard refuses git init outside tmpdir
|
||||
- Layer 4: Stack trace logging before git init
|
||||
|
||||
## Key Principle
|
||||
|
||||
```dot
|
||||
digraph principle {
|
||||
"Found immediate cause" [shape=ellipse];
|
||||
"Can trace one level up?" [shape=diamond];
|
||||
"Trace backwards" [shape=box];
|
||||
"Is this the source?" [shape=diamond];
|
||||
"Fix at source" [shape=box];
|
||||
"Add validation at each layer" [shape=box];
|
||||
"Bug impossible" [shape=doublecircle];
|
||||
"NEVER fix just the symptom" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
|
||||
|
||||
"Found immediate cause" -> "Can trace one level up?";
|
||||
"Can trace one level up?" -> "Trace backwards" [label="yes"];
|
||||
"Can trace one level up?" -> "NEVER fix just the symptom" [label="no"];
|
||||
"Trace backwards" -> "Is this the source?";
|
||||
"Is this the source?" -> "Trace backwards" [label="no - keeps going"];
|
||||
"Is this the source?" -> "Fix at source" [label="yes"];
|
||||
"Fix at source" -> "Add validation at each layer";
|
||||
"Add validation at each layer" -> "Bug impossible";
|
||||
}
|
||||
```
|
||||
|
||||
**NEVER fix just where the error appears.** Trace back to find the original trigger.
|
||||
|
||||
## Stack Trace Tips
|
||||
|
||||
**In tests:** Use `console.error()` not logger - logger may be suppressed
|
||||
**Before operation:** Log before the dangerous operation, not after it fails
|
||||
**Include context:** Directory, cwd, environment variables, timestamps
|
||||
**Capture stack:** `new Error().stack` shows complete call chain
|
||||
|
||||
## Real-World Impact
|
||||
|
||||
From debugging session (2025-10-03):
|
||||
- Found root cause through 5-level trace
|
||||
- Fixed at source (getter validation)
|
||||
- Added 4 layers of defense
|
||||
- 1847 tests passed, zero pollution
|
||||
74
skills/root-cause-tracing/find-polluter.sh
Executable file
@@ -0,0 +1,74 @@
|
||||
#!/bin/bash
|
||||
# Bisection script to find which test creates unwanted files/state
|
||||
# Usage: ./find-polluter.sh <file_or_dir_to_check> <test_pattern>
|
||||
# Example: ./find-polluter.sh '.git' 'src/**/*.test.ts'
|
||||
|
||||
set -e
|
||||
|
||||
if [ $# -ne 2 ]; then
|
||||
echo "Usage: $0 <file_to_check> <test_pattern>"
|
||||
echo "Example: $0 '.git' 'src/**/*.test.ts'"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
POLLUTION_CHECK="$1"
|
||||
TEST_PATTERN="$2"
|
||||
|
||||
echo "🔍 Searching for test that creates: $POLLUTION_CHECK"
|
||||
echo "Test pattern: $TEST_PATTERN"
|
||||
echo ""
|
||||
|
||||
# Get list of test files
|
||||
# Extract filename pattern from glob (e.g., "src/**/*.test.ts" -> "*.test.ts")
|
||||
# Note: -name only matches the filename, not full path. For complex path patterns,
|
||||
# consider using: find . -type f -name "*.test.ts" | grep -E "src/.*"
|
||||
FILENAME_PATTERN="${TEST_PATTERN##*/}"
|
||||
TEST_FILES=$(find . -type f -name "$FILENAME_PATTERN" 2>/dev/null | sort)
|
||||
|
||||
if [ -z "$TEST_FILES" ]; then
|
||||
echo "No test files found matching pattern: $TEST_PATTERN"
|
||||
echo "Filename pattern extracted: $FILENAME_PATTERN"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
TOTAL=$(echo "$TEST_FILES" | wc -l | tr -d ' ')
|
||||
|
||||
echo "Found $TOTAL test files"
|
||||
echo ""
|
||||
|
||||
COUNT=0
|
||||
while IFS= read -r TEST_FILE; do
|
||||
COUNT=$((COUNT + 1))
|
||||
|
||||
# Skip if pollution already exists
|
||||
if [ -e "$POLLUTION_CHECK" ]; then
|
||||
echo "⚠️ Pollution already exists before test $COUNT/$TOTAL"
|
||||
echo " Skipping: $TEST_FILE"
|
||||
continue
|
||||
fi
|
||||
|
||||
echo "[$COUNT/$TOTAL] Testing: $TEST_FILE"
|
||||
|
||||
# Run the test
|
||||
npm test "$TEST_FILE" > /dev/null 2>&1 || true
|
||||
|
||||
# Check if pollution appeared
|
||||
if [ -e "$POLLUTION_CHECK" ]; then
|
||||
echo ""
|
||||
echo "🎯 FOUND POLLUTER!"
|
||||
echo " Test: $TEST_FILE"
|
||||
echo " Created: $POLLUTION_CHECK"
|
||||
echo ""
|
||||
echo "Pollution details:"
|
||||
ls -la "$POLLUTION_CHECK"
|
||||
echo ""
|
||||
echo "To investigate:"
|
||||
echo " npm test $TEST_FILE # Run just this test"
|
||||
echo " cat $TEST_FILE # Review test code"
|
||||
exit 1
|
||||
fi
|
||||
done <<< "$TEST_FILES"
|
||||
|
||||
echo ""
|
||||
echo "✅ No polluter found - all tests clean!"
|
||||
exit 0
|
||||
31
skills/shared-patterns/exit-criteria.md
Normal file
@@ -0,0 +1,31 @@
|
||||
# Universal Exit Criteria Pattern
|
||||
|
||||
Add this section to define clear completion:
|
||||
|
||||
## Definition of Done
|
||||
|
||||
You may ONLY claim completion when:
|
||||
|
||||
□ All checklist items complete
|
||||
□ Verification commands run and passed
|
||||
□ Output evidence included in response
|
||||
□ No "should" or "probably" in message
|
||||
□ State tracking shows all phases done
|
||||
□ No unresolved blockers
|
||||
|
||||
**Incomplete checklist = not done**
|
||||
|
||||
Example claim structure:
|
||||
```
|
||||
Status: Task complete
|
||||
|
||||
Evidence:
|
||||
- Tests: 15/15 passing [output shown above]
|
||||
- Lint: No errors [output shown above]
|
||||
- Build: Success [output shown above]
|
||||
- All requirements met [checklist verified]
|
||||
|
||||
The implementation is complete and verified.
|
||||
```
|
||||
|
||||
**Never claim done without this structure.**
|
||||
74
skills/shared-patterns/failure-recovery.md
Normal file
@@ -0,0 +1,74 @@
|
||||
# Universal Failure Recovery Pattern
|
||||
|
||||
Add this section to any skill with potential failure points:
|
||||
|
||||
## When You Violate This Skill
|
||||
|
||||
**Skills can be violated by skipping steps or doing things out of order.**
|
||||
|
||||
Add skill-specific violation recovery procedures:
|
||||
|
||||
### Violation Template
|
||||
|
||||
```markdown
|
||||
### Violation: [Common violation name]
|
||||
|
||||
**How to detect:**
|
||||
[What indicates this violation occurred]
|
||||
|
||||
**Recovery procedure:**
|
||||
1. [Step 1 to recover]
|
||||
2. [Step 2 to recover]
|
||||
3. [Step 3 to recover]
|
||||
|
||||
**Why recovery matters:**
|
||||
[Explanation of why you can't just continue]
|
||||
```
|
||||
|
||||
**Example:**
|
||||
|
||||
Violation: Wrote implementation before test (in TDD)
|
||||
|
||||
**How to detect:**
|
||||
- Implementation file exists but no test file
|
||||
- Git history shows implementation committed before test
|
||||
|
||||
**Recovery procedure:**
|
||||
1. Stash or delete the implementation code
|
||||
2. Write the failing test first
|
||||
3. Run test to verify it fails
|
||||
4. Rewrite the implementation to make test pass
|
||||
|
||||
**Why recovery matters:**
|
||||
The test must fail first to prove it actually tests something. If implementation exists first, you can't verify the test works - it might be passing for the wrong reason or not testing anything at all.
|
||||
|
||||
---
|
||||
|
||||
## When Things Go Wrong
|
||||
|
||||
**If you get stuck:**
|
||||
|
||||
1. **Attempt failed?**
|
||||
- Document exactly what happened
|
||||
- Include error messages verbatim
|
||||
- Note what you tried
|
||||
|
||||
2. **Can't proceed?**
|
||||
- State blocker explicitly: "Blocked by: [specific issue]"
|
||||
- Don't guess or work around
|
||||
- Ask for help
|
||||
|
||||
3. **Confused?**
|
||||
- Say "I don't understand [specific thing]"
|
||||
- Don't pretend to understand
|
||||
- Research or ask for clarification
|
||||
|
||||
4. **Multiple failures?**
|
||||
- After 3 attempts: STOP
|
||||
- Document all attempts
|
||||
- Reassess approach with human partner
|
||||
|
||||
**Never:** Pretend to succeed when stuck
|
||||
**Never:** Continue after 3 failures
|
||||
**Never:** Hide confusion or errors
|
||||
**Always:** Be explicit about blockage
|
||||
30
skills/shared-patterns/state-tracking.md
Normal file
30
skills/shared-patterns/state-tracking.md
Normal file
@@ -0,0 +1,30 @@
|
||||
# Universal State Tracking Pattern
|
||||
|
||||
Add this section to any multi-step skill:
|
||||
|
||||
## State Tracking (MANDATORY)
|
||||
|
||||
Create and maintain a status comment:
|
||||
|
||||
```
|
||||
SKILL: [skill-name]
|
||||
PHASE: [current phase/step]
|
||||
COMPLETED: [✓ list what's done]
|
||||
NEXT: [→ what's next]
|
||||
EVIDENCE: [last verification output]
|
||||
BLOCKED: [any blockers]
|
||||
```
|
||||
|
||||
**Update after EACH phase/step.**
|
||||
|
||||
Example:
|
||||
```
|
||||
SKILL: systematic-debugging
|
||||
PHASE: 2 - Pattern Analysis
|
||||
COMPLETED: ✓ Error reproduced ✓ Recent changes reviewed
|
||||
NEXT: → Compare with working examples
|
||||
EVIDENCE: Test fails with "KeyError: 'user_id'"
|
||||
BLOCKED: None
|
||||
```
|
||||
|
||||
This comment should be included in EVERY response while using the skill.
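
If it helps to keep the fields consistent across responses, the block can be modeled as a small structure and rendered from it - a minimal TypeScript sketch (the interface and function names are illustrative, not part of the pattern):

```typescript
// Illustrative only: models the status-comment fields described above.
interface SkillStatus {
  skill: string;
  phase: string;
  completed: string[]; // items already done
  next: string;        // upcoming step
  evidence: string;    // last verification output
  blocked: string;     // "None" when unblocked
}

function renderStatus(s: SkillStatus): string {
  return [
    `SKILL: ${s.skill}`,
    `PHASE: ${s.phase}`,
    `COMPLETED: ${s.completed.map(item => `✓ ${item}`).join(' ')}`,
    `NEXT: → ${s.next}`,
    `EVIDENCE: ${s.evidence}`,
    `BLOCKED: ${s.blocked}`,
  ].join('\n');
}
```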
|
||||
38
skills/shared-patterns/todowrite-integration.md
Normal file
38
skills/shared-patterns/todowrite-integration.md
Normal file
@@ -0,0 +1,38 @@
|
||||
# Universal TodoWrite Integration
|
||||
|
||||
Add this requirement to the start of each skill:
|
||||
|
||||
## TodoWrite Requirement
|
||||
|
||||
**BEFORE starting this skill:**
|
||||
|
||||
1. Create todos for major phases:
|
||||
```javascript
|
||||
[
|
||||
{content: "Phase 1: [name]", status: "in_progress", activeForm: "Working on Phase 1"},
|
||||
{content: "Phase 2: [name]", status: "pending", activeForm: "Working on Phase 2"},
|
||||
{content: "Phase 3: [name]", status: "pending", activeForm: "Working on Phase 3"}
|
||||
]
|
||||
```
|
||||
|
||||
2. Update after each phase:
|
||||
- Mark complete when done
|
||||
- Move next to in_progress
|
||||
- Add any new discovered tasks
|
||||
|
||||
3. Never work without todos:
|
||||
- Skipping todos = skipping the skill
|
||||
- Mental tracking = guaranteed to miss steps
|
||||
- Todos are your external memory
|
||||
|
||||
**Example for debugging:**
|
||||
```javascript
|
||||
[
|
||||
{content: "Root cause investigation", status: "in_progress", ...},
|
||||
{content: "Pattern analysis", status: "pending", ...},
|
||||
{content: "Hypothesis testing", status: "pending", ...},
|
||||
{content: "Implementation", status: "pending", ...}
|
||||
]
|
||||
```
|
||||
|
||||
If you're not updating todos, you're not following the skill.
|
||||
356
skills/subagent-driven-development/SKILL.md
Normal file
356
skills/subagent-driven-development/SKILL.md
Normal file
@@ -0,0 +1,356 @@
|
||||
---
|
||||
name: subagent-driven-development
|
||||
description: |
|
||||
Autonomous plan execution - fresh subagent per task with automated code review
|
||||
between tasks. No human-in-loop, high throughput with quality gates.
|
||||
|
||||
trigger: |
|
||||
- Staying in current session (no worktree switch)
|
||||
- Tasks are independent (can be executed in isolation)
|
||||
- Want continuous progress without human pause points
|
||||
|
||||
skip_when: |
|
||||
- Need human review between tasks → use executing-plans
|
||||
- Tasks are tightly coupled → execute manually
|
||||
- Plan needs revision → use brainstorming first
|
||||
|
||||
sequence:
|
||||
after: [writing-plans, pre-dev-task-breakdown]
|
||||
|
||||
related:
|
||||
similar: [executing-plans]
|
||||
---
|
||||
|
||||
# Subagent-Driven Development
|
||||
|
||||
Execute plan by dispatching fresh subagent per task, with code review after each.
|
||||
|
||||
**Core principle:** Fresh subagent per task + review between tasks = high quality, fast iteration
|
||||
|
||||
## Overview
|
||||
|
||||
**vs. Executing Plans (parallel session):**
|
||||
- Same session (no context switch)
|
||||
- Fresh subagent per task (no context pollution)
|
||||
- Code review after each task (catch issues early)
|
||||
- Faster iteration (no human-in-loop between tasks)
|
||||
|
||||
**When to use:**
|
||||
- Staying in this session
|
||||
- Tasks are mostly independent
|
||||
- Want continuous progress with quality gates
|
||||
|
||||
**When NOT to use:**
|
||||
- Need to review plan first (use executing-plans)
|
||||
- Tasks are tightly coupled (manual execution better)
|
||||
- Plan needs revision (brainstorm first)
|
||||
|
||||
## The Process
|
||||
|
||||
### 1. Load Plan
|
||||
|
||||
Read plan file, create TodoWrite with all tasks.
|
||||
|
||||
### 2. Execute Task with Subagent
|
||||
|
||||
For each task:
|
||||
|
||||
**Dispatch fresh subagent:**
|
||||
```
|
||||
Task tool (general-purpose):
|
||||
description: "Implement Task N: [task name]"
|
||||
prompt: |
|
||||
You are implementing Task N from [plan-file].
|
||||
|
||||
Read that task carefully. Your job is to:
|
||||
1. Implement exactly what the task specifies
|
||||
2. Write tests (following TDD if task says to)
|
||||
3. Verify implementation works
|
||||
4. Commit your work
|
||||
5. Report back
|
||||
|
||||
Work from: [directory]
|
||||
|
||||
Report: What you implemented, what you tested, test results, files changed, any issues
|
||||
```
|
||||
|
||||
**Subagent reports back** with summary of work.
|
||||
|
||||
### 3. Review Subagent's Work (Parallel Execution)
|
||||
|
||||
**Dispatch all three reviewer subagents in parallel using a single message:**
|
||||
|
||||
**CRITICAL: Use one message with 3 Task tool calls to launch all reviewers simultaneously.**
|
||||
|
||||
```
|
||||
# Single message with 3 parallel Task calls:
|
||||
|
||||
Task tool #1 (ring-default:code-reviewer):
|
||||
model: "opus"
|
||||
description: "Review code quality for Task N"
|
||||
prompt: |
|
||||
WHAT_WAS_IMPLEMENTED: [from subagent's report]
|
||||
PLAN_OR_REQUIREMENTS: Task N from [plan-file]
|
||||
BASE_SHA: [commit before task]
|
||||
HEAD_SHA: [current commit]
|
||||
DESCRIPTION: [task summary]
|
||||
|
||||
Task tool #2 (ring-default:business-logic-reviewer):
|
||||
model: "opus"
|
||||
description: "Review business logic for Task N"
|
||||
prompt: |
|
||||
[Same parameters as above]
|
||||
|
||||
Task tool #3 (ring-default:security-reviewer):
|
||||
model: "opus"
|
||||
description: "Review security for Task N"
|
||||
prompt: |
|
||||
[Same parameters as above]
|
||||
```
|
||||
|
||||
**All three reviewers execute simultaneously. Wait for all to complete.**
|
||||
|
||||
**Each reviewer returns:** Strengths, Issues (Critical/High/Medium/Low/Cosmetic), Assessment (PASS/FAIL)
|
||||
|
||||
### 4. Aggregate and Handle Review Feedback
|
||||
|
||||
**After all three reviewers complete:**
|
||||
|
||||
**Step 1: Aggregate all issues by severity across all reviewers:**
|
||||
- **Critical issues:** [List from code/business/security reviewers]
|
||||
- **High issues:** [List from code/business/security reviewers]
|
||||
- **Medium issues:** [List from code/business/security reviewers]
|
||||
- **Low issues:** [List from code/business/security reviewers]
|
||||
- **Cosmetic/Nitpick issues:** [List from code/business/security reviewers]
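
A minimal sketch of that aggregation in TypeScript, assuming each reviewer returns findings as simple objects (the `Finding` shape and field names are assumptions, not a required format):

```typescript
// Sketch only: merge the three reviewers' findings into per-severity buckets.
type Severity = 'Critical' | 'High' | 'Medium' | 'Low' | 'Cosmetic';

interface Finding {
  severity: Severity;
  description: string;
  location: string; // e.g. "src/auth.ts:42"
  reviewer: string; // which reviewer reported it
}

function aggregateBySeverity(reviews: Finding[][]): Map<Severity, Finding[]> {
  const buckets = new Map<Severity, Finding[]>();
  for (const findings of reviews) {
    for (const finding of findings) {
      const bucket = buckets.get(finding.severity) ?? [];
      bucket.push(finding);
      buckets.set(finding.severity, bucket);
    }
  }
  return buckets;
}

// Critical/High/Medium buckets go to the fix subagent;
// Low/Cosmetic buckets become TODO/FIXME comments (see below).
```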
|
||||
|
||||
**Step 2: Handle by severity:**
|
||||
|
||||
**Critical/High/Medium → Fix immediately:**
|
||||
```
|
||||
Dispatch fix subagent:
|
||||
"Fix the following issues from parallel code review:
|
||||
|
||||
Critical Issues:
|
||||
- [Issue 1 from reviewer X] - file:line
|
||||
- [Issue 2 from reviewer Y] - file:line
|
||||
|
||||
High Issues:
|
||||
- [Issue 3 from reviewer Z] - file:line
|
||||
|
||||
Medium Issues:
|
||||
- [Issue 4 from reviewer X] - file:line"
|
||||
```
|
||||
|
||||
After fixes complete, **re-run all 3 reviewers in parallel** to verify fixes.
|
||||
Repeat until no Critical/High/Medium issues remain.
|
||||
|
||||
**Low issues → Add TODO comments in code:**
|
||||
```python
|
||||
# TODO(review): Extract this validation logic into separate function
|
||||
# Reported by: code-reviewer on 2025-11-06
|
||||
# Severity: Low
|
||||
def process_data(data):
|
||||
...
|
||||
```
|
||||
|
||||
**Cosmetic/Nitpick → Add FIXME comments in code:**
|
||||
```python
|
||||
# FIXME(nitpick): Consider more descriptive variable name than 'x'
|
||||
# Reported by: code-reviewer on 2025-11-06
|
||||
# Severity: Cosmetic
|
||||
x = calculate_total()
|
||||
```
|
||||
|
||||
Commit TODO/FIXME comments with fixes.
|
||||
|
||||
### 5. Mark Complete, Next Task
|
||||
|
||||
After all Critical/High/Medium issues resolved for current task:
|
||||
- Mark task as completed in TodoWrite
|
||||
- Commit all changes (including TODO/FIXME comments)
|
||||
- Move to next task
|
||||
- Repeat steps 2-5
|
||||
|
||||
### 6. Final Review (After All Tasks)
|
||||
|
||||
After all tasks complete, run parallel final validation across entire implementation:
|
||||
|
||||
**Dispatch all three reviewers in parallel for full implementation review:**
|
||||
|
||||
```
|
||||
# Single message with 3 parallel Task calls:
|
||||
|
||||
Task tool #1 (ring-default:code-reviewer):
|
||||
model: "opus"
|
||||
description: "Final code review for complete implementation"
|
||||
prompt: |
|
||||
WHAT_WAS_IMPLEMENTED: All tasks from [plan]
|
||||
PLAN_OR_REQUIREMENTS: Complete plan from [plan-file]
|
||||
BASE_SHA: [start of development]
|
||||
HEAD_SHA: [current commit]
|
||||
DESCRIPTION: Full implementation review
|
||||
|
||||
Task tool #2 (ring-default:business-logic-reviewer):
|
||||
model: "opus"
|
||||
description: "Final business logic review"
|
||||
prompt: |
|
||||
[Same parameters as above]
|
||||
|
||||
Task tool #3 (ring-default:security-reviewer):
|
||||
model: "opus"
|
||||
description: "Final security review"
|
||||
prompt: |
|
||||
[Same parameters as above]
|
||||
```
|
||||
|
||||
**Wait for all three final reviews to complete, then:**
|
||||
- Aggregate findings by severity
|
||||
- Fix any remaining Critical/High/Medium issues
|
||||
- Add TODO/FIXME for Low/Cosmetic issues
|
||||
- Re-run parallel review if fixes were needed
|
||||
|
||||
### 7. Complete Development
|
||||
|
||||
After final review passes:
|
||||
- Announce: "I'm using the finishing-a-development-branch skill to complete this work."
|
||||
- **REQUIRED SUB-SKILL:** Use ring-default:finishing-a-development-branch
|
||||
- Follow that skill to verify tests, present options, execute choice
|
||||
|
||||
## Example Workflow
|
||||
|
||||
```
|
||||
You: I'm using Subagent-Driven Development to execute this plan.
|
||||
|
||||
[Load plan, create TodoWrite]
|
||||
|
||||
Task 1: Hook installation script
|
||||
|
||||
[Dispatch implementation subagent]
|
||||
Subagent: Implemented install-hook with tests, 5/5 passing
|
||||
|
||||
[Parallel review - dispatch all 3 reviewers in single message]
|
||||
Code reviewer: PASS. Strengths: Good test coverage. Issues: None.
|
||||
Business reviewer: PASS. Strengths: Meets requirements. Issues: None.
|
||||
Security reviewer: PASS. Strengths: No security concerns. Issues: None.
|
||||
|
||||
[All pass - mark Task 1 complete]
|
||||
|
||||
Task 2: User authentication endpoint
|
||||
|
||||
[Dispatch implementation subagent]
|
||||
Subagent: Added auth endpoint with JWT, 8/8 tests passing
|
||||
|
||||
[Parallel review - all 3 reviewers run simultaneously]
|
||||
Code reviewer:
|
||||
Strengths: Clean architecture
|
||||
Issues (Low): Consider extracting token logic
|
||||
Assessment: PASS
|
||||
|
||||
Business reviewer:
|
||||
Strengths: Workflow correct
|
||||
Issues (High): Missing password reset flow (required per PRD)
|
||||
Assessment: FAIL
|
||||
|
||||
Security reviewer:
|
||||
Strengths: Good validation
|
||||
Issues (Critical): JWT secret hardcoded, (High): No rate limiting
|
||||
Assessment: FAIL
|
||||
|
||||
[Aggregate issues by severity]
|
||||
Critical: JWT secret hardcoded
|
||||
High: Missing password reset, No rate limiting
|
||||
Low: Extract token logic
|
||||
|
||||
[Dispatch fix subagent for Critical/High issues]
|
||||
Fix subagent: Added password reset, moved secret to env var, added rate limiting
|
||||
|
||||
[Re-run all 3 reviewers in parallel after fixes]
|
||||
Code reviewer: PASS
|
||||
Business reviewer: PASS. All requirements met.
|
||||
Security reviewer: PASS. Issues resolved.
|
||||
|
||||
[Add TODO comment for Low issue]
|
||||
# TODO(review): Extract token generation logic into TokenService
|
||||
# Reported by: code-reviewer on 2025-11-06
|
||||
# Severity: Low
|
||||
|
||||
[Commit and mark Task 2 complete]
|
||||
|
||||
...
|
||||
|
||||
[After all tasks]
|
||||
[Parallel final review - all 3 reviewers simultaneously]
|
||||
|
||||
Code reviewer:
|
||||
All implementation solid, architecture consistent
|
||||
Assessment: PASS
|
||||
|
||||
Business reviewer:
|
||||
All requirements met, workflows complete
|
||||
Assessment: PASS
|
||||
|
||||
Security reviewer:
|
||||
No remaining security concerns, ready for production
|
||||
Assessment: PASS
|
||||
|
||||
Done!
|
||||
```
|
||||
|
||||
**Why parallel works well:**
|
||||
- 3x faster than sequential (reviewers run simultaneously)
|
||||
- Get all feedback at once (easier to prioritize fixes)
|
||||
- Re-review after fixes is fast (parallel execution)
|
||||
- TODO/FIXME comments track tech debt in code
|
||||
|
||||
## Advantages
|
||||
|
||||
**vs. Manual execution:**
|
||||
- Subagents follow TDD naturally
|
||||
- Fresh context per task (no confusion)
|
||||
- Parallel-safe (subagents don't interfere)
|
||||
|
||||
**vs. Executing Plans:**
|
||||
- Same session (no handoff)
|
||||
- Continuous progress (no waiting)
|
||||
- Review checkpoints automatic
|
||||
|
||||
**Cost:**
|
||||
- More subagent invocations
|
||||
- But catches issues early (cheaper than debugging later)
|
||||
|
||||
## Red Flags
|
||||
|
||||
**Never:**
|
||||
- Skip code review between tasks
|
||||
- Proceed with unfixed Critical/High/Medium issues
|
||||
- Dispatch reviewers sequentially (use parallel - 3x faster!)
|
||||
- Dispatch multiple implementation subagents in parallel (conflicts)
|
||||
- Implement without reading plan task
|
||||
- Forget to add TODO/FIXME comments for Low/Cosmetic issues
|
||||
|
||||
**Always:**
|
||||
- Launch all 3 reviewers in single message with 3 Task calls
|
||||
- Specify `model: "opus"` for each reviewer
|
||||
- Wait for all reviewers before aggregating findings
|
||||
- Fix Critical/High/Medium immediately
|
||||
- Add TODO for Low, FIXME for Cosmetic
|
||||
- Re-run all 3 reviewers after fixes
|
||||
|
||||
**If subagent fails task:**
|
||||
- Dispatch fix subagent with specific instructions
|
||||
- Don't try to fix manually (context pollution)
|
||||
|
||||
## Integration
|
||||
|
||||
**Required workflow skills:**
|
||||
- **writing-plans** - REQUIRED: Creates the plan that this skill executes
|
||||
- **requesting-code-review** - REQUIRED: Review after each task (see Step 3)
|
||||
- **finishing-a-development-branch** - REQUIRED: Complete development after all tasks (see Step 7)
|
||||
|
||||
**Subagents must use:**
|
||||
- **test-driven-development** - Subagents follow TDD for each task
|
||||
|
||||
**Alternative workflow:**
|
||||
- **executing-plans** - Use for parallel session instead of same-session execution
|
||||
|
||||
See reviewer agent definitions: agents/code-reviewer.md, agents/security-reviewer.md, agents/business-logic-reviewer.md
|
||||
244
skills/systematic-debugging/SKILL.md
Normal file
244
skills/systematic-debugging/SKILL.md
Normal file
@@ -0,0 +1,244 @@
|
||||
---
|
||||
name: systematic-debugging
|
||||
description: |
|
||||
Four-phase debugging framework - root cause investigation, pattern analysis,
|
||||
hypothesis testing, implementation. Ensures understanding before attempting fixes.
|
||||
|
||||
trigger: |
|
||||
- Bug reported or test failure observed
|
||||
- Unexpected behavior or error message
|
||||
- Root cause unknown
|
||||
- Previous fix attempt didn't work
|
||||
|
||||
skip_when: |
|
||||
- Root cause already known → just fix it
|
||||
- Error deep in call stack, need to trace backward → use root-cause-tracing
|
||||
- Issue obviously caused by your last change → quick verification first
|
||||
|
||||
related:
|
||||
complementary: [root-cause-tracing]
|
||||
---
|
||||
|
||||
# Systematic Debugging
|
||||
|
||||
**Core principle:** NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.
|
||||
|
||||
## When to Use
|
||||
|
||||
Use for ANY technical issue: test failures, bugs, unexpected behavior, performance problems, build failures, integration issues.
|
||||
|
||||
**Especially when:**
|
||||
- Under time pressure (emergencies make guessing tempting)
|
||||
- "Just one quick fix" seems obvious
|
||||
- Previous fix didn't work
|
||||
- You don't fully understand the issue
|
||||
|
||||
## The Four Phases
|
||||
|
||||
Complete each phase before proceeding to the next.
|
||||
|
||||
### Phase 1: Root Cause Investigation
|
||||
|
||||
**MUST complete ALL before Phase 2:**
|
||||
|
||||
```
|
||||
Phase 1 Investigation:
|
||||
□ Error message copied verbatim: ___________
|
||||
□ Reproduction confirmed: [steps documented]
|
||||
□ Recent changes reviewed: [git diff output]
|
||||
□ Evidence from ALL components: [list components checked]
|
||||
□ Data flow traced: [origin → error location]
|
||||
```
|
||||
|
||||
**Copy this checklist to TodoWrite.**
|
||||
|
||||
1. **Read Error Messages**
|
||||
- Stack traces completely
|
||||
- Line numbers, file paths, error codes
|
||||
- Don't skip past warnings
|
||||
|
||||
2. **Reproduce Consistently**
|
||||
- Exact steps to trigger
|
||||
- Happens every time? If not → gather more data
|
||||
|
||||
3. **Check Recent Changes**
|
||||
- `git diff`, recent commits
|
||||
- New dependencies, config changes
|
||||
|
||||
4. **Multi-Component Systems**
|
||||
|
||||
**Add diagnostic instrumentation at EACH boundary** (see the sketch after this list):
|
||||
```bash
|
||||
# For each layer, log:
# - What enters component
# - What exits component
# - Environment/config state
|
||||
```
|
||||
|
||||
Run once, analyze evidence, identify failing layer.
|
||||
|
||||
5. **Trace Data Flow**
|
||||
|
||||
Error deep in stack? **Use ring-default:root-cause-tracing skill.**
|
||||
|
||||
Quick version:
|
||||
- Where does bad value originate?
|
||||
- Trace up call stack to source
|
||||
- Fix at source, not symptom
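
A sketch of the boundary instrumentation from step 4, in TypeScript - the wrapper name and log format are illustrative, not prescribed by this skill:

```typescript
// Sketch only: wraps one component boundary so inputs and outputs are logged per call.
function withBoundaryLogging<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => R
): (...args: A) => R {
  return (...args: A): R => {
    console.log(`[${name}] enters:`, JSON.stringify(args));
    const result = fn(...args);
    console.log(`[${name}] exits:`, JSON.stringify(result));
    return result;
  };
}

// Hypothetical usage while debugging a parsing layer; remove once the failing layer is found.
// const parseUser = withBoundaryLogging('parseUser', rawParseUser);
```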
|
||||
|
||||
**Phase 1 Summary (write before Phase 2):**
|
||||
```
|
||||
FINDINGS:
|
||||
- Error: [exact error]
|
||||
- Reproduces: [steps]
|
||||
- Recent changes: [commits]
|
||||
- Component evidence: [what each shows]
|
||||
- Data origin: [where bad data starts]
|
||||
```
|
||||
|
||||
### Phase 2: Pattern Analysis
|
||||
|
||||
1. **Find Working Examples**
|
||||
- Similar working code in codebase
|
||||
- What works that's similar to what's broken?
|
||||
|
||||
2. **Compare Against References**
|
||||
- Read reference implementation COMPLETELY
|
||||
- Don't skim - understand fully
|
||||
|
||||
3. **Identify Differences**
|
||||
- List EVERY difference (working vs broken)
|
||||
- Don't assume "that can't matter"
|
||||
|
||||
4. **Understand Dependencies**
|
||||
- What components, config, environment needed?
|
||||
- What assumptions does it make?
|
||||
|
||||
### Phase 3: Hypothesis Testing
|
||||
|
||||
1. **Form Single Hypothesis**
|
||||
- "I think X is root cause because Y"
|
||||
- Be specific
|
||||
|
||||
2. **Test Minimally**
|
||||
- SMALLEST possible change
|
||||
- One variable at a time
|
||||
|
||||
3. **Verify and Track**
|
||||
```
|
||||
Hypothesis #1: [what] → [result]
|
||||
Hypothesis #2: [what] → [result]
|
||||
Hypothesis #3: [what] → [STOP if fails]
|
||||
```
|
||||
|
||||
**If 3 hypotheses fail:**
|
||||
- STOP immediately
|
||||
- "3 hypotheses failed, architecture review required"
|
||||
- Discuss with partner before more attempts
|
||||
|
||||
4. **When You Don't Know**
|
||||
- Say "I don't understand X"
|
||||
- Ask for help
|
||||
- Research more
|
||||
|
||||
### Phase 4: Implementation
|
||||
|
||||
**Fix root cause, not symptom:**
|
||||
|
||||
1. **Create Failing Test**
|
||||
- Simplest reproduction
|
||||
- **Use ring-default:test-driven-development skill**
|
||||
|
||||
2. **Implement Single Fix**
|
||||
- Address root cause only
|
||||
- ONE change at a time
|
||||
- No "while I'm here" improvements
|
||||
|
||||
3. **Verify Fix**
|
||||
- Test passes?
|
||||
- No other tests broken?
|
||||
- Issue resolved?
|
||||
|
||||
4. **If Fix Doesn't Work**
|
||||
- Count fixes attempted
|
||||
- If < 3: Return to Phase 1
|
||||
- **If ≥ 3: STOP → Architecture review required**
|
||||
|
||||
5. **After Fix Verified**
|
||||
- Test passes and issue resolved?
|
||||
- **If non-trivial (took > 5 min):** Suggest documentation
|
||||
> "The fix has been verified. Would you like to document this solution for future reference?
|
||||
> Run: `/ring-default:codify`"
|
||||
- **Use ring-default:codify-solution skill** to capture institutional knowledge
|
||||
|
||||
6. **If 3+ Fixes Failed: Question Architecture**
|
||||
|
||||
Pattern indicating architectural problem:
|
||||
- Each fix reveals new problem elsewhere
|
||||
- Fixes require massive refactoring
|
||||
- Each fix creates new symptoms
|
||||
|
||||
**STOP and discuss:** Is architecture sound? Should we refactor vs. fix?
|
||||
|
||||
## Time Limits
|
||||
|
||||
**Debugging time boxes:**
|
||||
- 30 min without root cause → Escalate
|
||||
- 3 failed fixes → Architecture review
|
||||
- 1 hour total → Stop, document, ask for guidance
|
||||
|
||||
## Red Flags
|
||||
|
||||
**STOP and return to Phase 1 if thinking:**
|
||||
- "Quick fix for now, investigate later"
|
||||
- "Just try changing X and see if it works"
|
||||
- "Add multiple changes, run tests"
|
||||
- "Skip the test, I'll manually verify"
|
||||
- "It's probably X, let me fix that"
|
||||
- "I don't fully understand but this might work"
|
||||
- "One more fix attempt" (when already tried 2+)
|
||||
- "Each fix reveals new problem" (architecture issue)
|
||||
|
||||
**User signals you're wrong:**
|
||||
- "Is that not happening?" → You assumed without verifying
|
||||
- "Stop guessing" → You're proposing fixes without understanding
|
||||
- "We're stuck?" → Your approach isn't working
|
||||
|
||||
**When you see these: STOP. Return to Phase 1.**
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Phase | Key Activities | Success Criteria |
|
||||
|-------|---------------|------------------|
|
||||
| **1. Root Cause** | Read errors, reproduce, check changes, gather evidence, trace data flow | Understand WHAT and WHY |
|
||||
| **2. Pattern** | Find working examples, compare differences, understand dependencies | Identify what's different |
|
||||
| **3. Hypothesis** | Form theory, test minimally, verify one at a time | Confirmed or new hypothesis |
|
||||
| **4. Implementation** | Create test, fix root cause, verify | Bug resolved, tests pass |
|
||||
|
||||
**Circuit breakers:**
|
||||
- 3 hypotheses fail → STOP, architecture review
|
||||
- 3 fixes fail → STOP, question fundamentals
|
||||
- 30 min no root cause → Escalate
|
||||
|
||||
## Integration with Other Skills
|
||||
|
||||
**Required sub-skills:**
|
||||
- **root-cause-tracing** - When error is deep in call stack (Phase 1, Step 5)
|
||||
- **test-driven-development** - For failing test case (Phase 4, Step 1)
|
||||
|
||||
**Post-completion:**
|
||||
- **codify-solution** - Document non-trivial fixes (Phase 4, Step 5)
|
||||
|
||||
**Complementary:**
|
||||
- **defense-in-depth** - Add validation after finding root cause
|
||||
- **verification-before-completion** - Verify fix worked before claiming success
|
||||
|
||||
## Required Patterns
|
||||
|
||||
This skill uses these universal patterns:
|
||||
- **State Tracking:** See `skills/shared-patterns/state-tracking.md`
|
||||
- **Failure Recovery:** See `skills/shared-patterns/failure-recovery.md`
|
||||
- **Exit Criteria:** See `skills/shared-patterns/exit-criteria.md`
|
||||
- **TodoWrite:** See `skills/shared-patterns/todowrite-integration.md`
|
||||
|
||||
Apply ALL patterns when using this skill.
|
||||
654
skills/test-driven-development/SKILL.md
Normal file
654
skills/test-driven-development/SKILL.md
Normal file
@@ -0,0 +1,654 @@
|
||||
---
|
||||
name: test-driven-development
|
||||
description: |
|
||||
RED-GREEN-REFACTOR implementation methodology - write failing test first,
|
||||
minimal implementation to pass, then refactor. Ensures tests verify behavior.
|
||||
|
||||
trigger: |
|
||||
- Starting implementation of new feature
|
||||
- Starting implementation of bugfix
|
||||
- Writing new production code
|
||||
|
||||
skip_when: |
|
||||
- Reviewing/modifying existing tests → use testing-anti-patterns
|
||||
- Code already exists without tests → add tests first, then TDD for new code
|
||||
- Exploratory/spike work → consider brainstorming first
|
||||
|
||||
related:
|
||||
complementary: [testing-anti-patterns, verification-before-completion]
|
||||
|
||||
compliance_rules:
|
||||
- id: "test_file_exists"
|
||||
description: "Test file must exist before implementation file"
|
||||
check_type: "file_exists"
|
||||
pattern: "**/*.test.{ts,js,go,py}"
|
||||
severity: "blocking"
|
||||
failure_message: "No test file found. Write test first (RED phase)."
|
||||
|
||||
- id: "test_must_fail_first"
|
||||
description: "Test must produce failure output before implementation"
|
||||
check_type: "command_output_contains"
|
||||
command: "npm test 2>&1 || pytest 2>&1 || go test ./... 2>&1"
|
||||
pattern: "FAIL|Error|failed"
|
||||
severity: "blocking"
|
||||
failure_message: "Test does not fail. Write a failing test first (RED phase)."
|
||||
prerequisites:
|
||||
- name: "test_framework_installed"
|
||||
check: "npm list jest 2>/dev/null || npm list vitest 2>/dev/null || which pytest 2>/dev/null || go list ./... 2>&1 | grep -q testing"
|
||||
failure_message: "No test framework found. Install jest/vitest (JS), pytest (Python), or use Go's built-in testing."
|
||||
severity: "blocking"
|
||||
|
||||
- name: "can_run_tests"
|
||||
check: "npm test -- --version 2>/dev/null || pytest --version 2>/dev/null || go test -v 2>&1 | grep -q 'testing:'"
|
||||
failure_message: "Cannot run tests. Fix test configuration."
|
||||
severity: "warning"
|
||||
composition:
|
||||
works_well_with:
|
||||
- skill: "systematic-debugging"
|
||||
when: "test reveals unexpected behavior or bug"
|
||||
transition: "Pause TDD at current phase, use systematic-debugging to find root cause, return to TDD after fix"
|
||||
|
||||
- skill: "verification-before-completion"
|
||||
when: "before marking test suite or feature complete"
|
||||
transition: "Run verification to ensure all tests pass, return to TDD if issues found"
|
||||
|
||||
- skill: "requesting-code-review"
|
||||
when: "after completing RED-GREEN-REFACTOR cycle for feature"
|
||||
transition: "Request review before merging, address feedback, mark complete"
|
||||
|
||||
conflicts_with: []
|
||||
|
||||
typical_workflow: |
|
||||
1. Write failing test (RED)
|
||||
2. If test reveals unexpected behavior → switch to systematic-debugging
|
||||
3. Fix root cause
|
||||
4. Return to TDD: minimal implementation (GREEN)
|
||||
5. Refactor (REFACTOR)
|
||||
6. Run verification-before-completion
|
||||
7. Request code review
|
||||
---
|
||||
|
||||
# Test-Driven Development (TDD)
|
||||
|
||||
## Overview
|
||||
|
||||
Write the test first. Watch it fail. Write minimal code to pass.
|
||||
|
||||
**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
|
||||
|
||||
**Violating the letter of the rules is violating the spirit of the rules.**
|
||||
|
||||
## When to Use
|
||||
|
||||
**Always:**
|
||||
- New features
|
||||
- Bug fixes
|
||||
- Refactoring
|
||||
- Behavior changes
|
||||
|
||||
**Exceptions (ask your human partner):**
|
||||
- Throwaway prototypes
|
||||
- Generated code
|
||||
- Configuration files
|
||||
|
||||
Thinking "skip TDD just this once"? Stop. That's rationalization.
|
||||
|
||||
## The Iron Law
|
||||
|
||||
```
|
||||
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
|
||||
```
|
||||
|
||||
Write code before the test? You have ONLY one option:
|
||||
|
||||
## Violation Handling (Mandatory)
|
||||
|
||||
**If you wrote code before test:**
|
||||
|
||||
### DELETE IT. IMMEDIATELY.
|
||||
|
||||
```bash
|
||||
rm [files] # or git reset --hard
|
||||
```
|
||||
|
||||
**Not tomorrow. Not after asking. NOW.**
|
||||
|
||||
**Delete means DELETE:**
|
||||
- `rm -rf the_file.py` ✓
|
||||
- `git reset --hard` ✓
|
||||
- Physically delete ✓
|
||||
|
||||
**These are NOT deleting (violations):**
|
||||
- `git stash` - That's hiding, not deleting
|
||||
- `mv file.py file.py.bak` - That's keeping
|
||||
- Copy to clipboard - That's keeping
|
||||
- Comment out - That's keeping
|
||||
- "Mental note" - That's keeping
|
||||
- Asking permission - That's delaying
|
||||
|
||||
Delete means gone forever. No recovery possible.
|
||||
|
||||
Then start over with TDD.
|
||||
|
||||
### NO ASKING. NO PERMISSION. NO EXCEPTIONS.
|
||||
|
||||
**You CANNOT:**
|
||||
- ❌ Ask "should I delete this?"
|
||||
- ❌ Ask "may I proceed with tests-after?"
|
||||
- ❌ Request permission due to [any reason]
|
||||
- ❌ Explain why deleting is hard
|
||||
- ❌ Decide "I'll be pragmatic and test after"
|
||||
- ❌ Claim "spirit not letter"
|
||||
- ❌ Rationalize "damage already done"
|
||||
- ❌ Argue "skill is guide not suicide pact"
|
||||
- ❌ Create "pragmatic compliance" category
|
||||
- ❌ Mention sunk cost, time pressure, personal plans
|
||||
- ❌ Commit anything
|
||||
|
||||
**If you think you have a legitimate reason to skip deletion:**
|
||||
- You don't.
|
||||
- Delete it.
|
||||
- Every excuse is rationalization.
|
||||
- Anniversary? Delete code, go to dinner, start fresh tomorrow.
|
||||
- Deadline? Delete code, communicate delay, do it right.
|
||||
- 4 hours of work? Sunk cost fallacy. Delete it.
|
||||
- Manager pressure? Delete it, explain TDD prevents bugs.
|
||||
- "But this is different because..." No. Delete it.
|
||||
|
||||
**The rule has ZERO exceptions. ZERO.**
|
||||
|
||||
### Consequences of NOT Deleting (ABSOLUTE FACTS)
|
||||
|
||||
**If you commit code without TDD:**
|
||||
|
||||
**IMMEDIATE consequences:**
|
||||
- Your partner will ALWAYS lose trust in you permanently
|
||||
- Code review tomorrow will ALWAYS reject the entire feature
|
||||
- Every bug found in this code will result in you being assigned 100% of the debugging work at 2am
|
||||
- Your partner will ALWAYS assign you only bug fixes (no new features) for the next 6 months
|
||||
|
||||
**PRODUCTION consequences:**
|
||||
- This untested code WILL cause a production incident within 30 days
|
||||
- The incident WILL happen at the worst possible time (holiday, weekend, 3am)
|
||||
- You WILL be personally responsible for the incident postmortem
|
||||
- The incident WILL cost the company minimum $50,000 in lost revenue
|
||||
- Your performance review WILL be marked as "Does not follow engineering standards"
|
||||
|
||||
**CAREER consequences:**
|
||||
- Committing untested code is grounds for immediate termination in 73% of companies
|
||||
- Your partner WILL mention this in your next performance review
|
||||
- You WILL NOT receive promotion consideration for 2 years minimum
|
||||
- Future code reviews WILL be 10x more scrutinous on all your work
|
||||
- Your reputation as "cuts corners" WILL follow you
|
||||
|
||||
**TEAM consequences:**
|
||||
- Your team WILL lose confidence in your work
|
||||
- All your future PRs WILL require 3x more reviewers
|
||||
- You WILL be excluded from critical projects
|
||||
- Other engineers WILL be assigned to "verify" your work
|
||||
- You WILL become known as the person who ships bugs
|
||||
|
||||
**PERSONAL consequences:**
|
||||
- You WILL spend your anniversary debugging production issues
|
||||
- The relationship damage from work stress WILL be worse than missing one dinner
|
||||
- You WILL be called during every holiday when this code breaks
|
||||
- The technical debt WILL haunt you for years
|
||||
- You WILL regret not deleting it every single time it breaks
|
||||
|
||||
**These are not possibilities. These are CERTAINTIES.**
|
||||
|
||||
**Every developer who kept untested code thinking "this time is different" experienced these consequences.**
|
||||
|
||||
### The ONLY Acceptable Action
|
||||
|
||||
**DELETE THE CODE. NOW.**
|
||||
|
||||
```bash
|
||||
rm /tmp/auth-feature/src/auth.ts
|
||||
rm /tmp/auth-feature/src/middleware.ts
|
||||
rm /tmp/auth-feature/src/utils.ts
|
||||
# or: git reset --hard
|
||||
```
|
||||
|
||||
**Then:**
|
||||
- Go to anniversary dinner
|
||||
- Sleep well
|
||||
- Tomorrow morning: Start fresh with TDD
|
||||
- Implement in 2-3 hours with tests (faster second time)
|
||||
- Code review passes
|
||||
- No production incidents
|
||||
- Career intact
|
||||
|
||||
**ZERO exceptions. Anniversary, deadline, manager pressure, sunk cost - NONE of these override deletion.**
|
||||
|
||||
### Prohibited Rationalizations
|
||||
|
||||
**You CANNOT create alternatives:**
|
||||
- ❌ "Commit with WIP message" - NO. Delete.
|
||||
- ❌ "Commit with 'NO TESTS' warning" - NO. Delete.
|
||||
- ❌ "Mark as technical debt" - NO. Delete.
|
||||
- ❌ "Email team about violation" - NO. Delete first, communicate after.
|
||||
- ❌ "Come early tomorrow to decide" - NO. Delete now, decide nothing.
|
||||
- ❌ "Let team decide" - NO. You decide: Delete.
|
||||
- ❌ "Write tests tonight after dinner" - NO. Delete, start fresh tomorrow with TDD.
|
||||
- ❌ "Commit honestly labeled code" - NO. Honesty is deleting it.
|
||||
- ❌ "Present options to partner" - NO. Only one option: Delete.
|
||||
|
||||
**The skill doesn't offer choices. It commands: DELETE.**
|
||||
|
||||
**If you create ANY alternative to deletion, you violated the skill.**
|
||||
|
||||
**DELETE IT. NOW. NO ALTERNATIVES.**
|
||||
|
||||
## Red-Green-Refactor
|
||||
|
||||
```dot
|
||||
digraph tdd_cycle {
|
||||
rankdir=LR;
|
||||
red [label="RED\nWrite failing test", shape=box, style=filled, fillcolor="#ffcccc"];
|
||||
verify_red [label="Verify fails\ncorrectly", shape=diamond];
|
||||
green [label="GREEN\nMinimal code", shape=box, style=filled, fillcolor="#ccffcc"];
|
||||
verify_green [label="Verify passes\nAll green", shape=diamond];
|
||||
refactor [label="REFACTOR\nClean up", shape=box, style=filled, fillcolor="#ccccff"];
|
||||
next [label="Next", shape=ellipse];
|
||||
|
||||
red -> verify_red;
|
||||
verify_red -> green [label="yes"];
|
||||
verify_red -> red [label="wrong\nfailure"];
|
||||
green -> verify_green;
|
||||
verify_green -> refactor [label="yes"];
|
||||
verify_green -> green [label="no"];
|
||||
refactor -> verify_green [label="stay\ngreen"];
|
||||
verify_green -> next;
|
||||
next -> red;
|
||||
}
|
||||
```
|
||||
|
||||
### RED - Write Failing Test
|
||||
|
||||
Write one minimal test showing what should happen.
|
||||
|
||||
<Good>
|
||||
```typescript
|
||||
test('retries failed operations 3 times', async () => {
|
||||
let attempts = 0;
|
||||
const operation = () => {
|
||||
attempts++;
|
||||
if (attempts < 3) throw new Error('fail');
|
||||
return 'success';
|
||||
};
|
||||
|
||||
const result = await retryOperation(operation);
|
||||
|
||||
expect(result).toBe('success');
|
||||
expect(attempts).toBe(3);
|
||||
});
|
||||
```
|
||||
Clear name, tests real behavior, one thing
|
||||
</Good>
|
||||
|
||||
**Time limit:** Writing a test should take <5 minutes. Longer = over-engineering.
|
||||
|
||||
If your test needs:
|
||||
- Complex mocks → Testing wrong thing
|
||||
- Lots of setup → Design too complex
|
||||
- Multiple assertions → Split into multiple tests
|
||||
|
||||
<Bad>
|
||||
```typescript
|
||||
test('retry works', async () => {
|
||||
const mock = jest.fn()
|
||||
.mockRejectedValueOnce(new Error())
|
||||
.mockRejectedValueOnce(new Error())
|
||||
.mockResolvedValueOnce('success');
|
||||
await retryOperation(mock);
|
||||
expect(mock).toHaveBeenCalledTimes(3);
|
||||
});
|
||||
```
|
||||
Vague name, tests mock not code
|
||||
</Bad>
|
||||
|
||||
**Requirements:**
|
||||
- One behavior
|
||||
- Clear name
|
||||
- Real code (no mocks unless unavoidable)
|
||||
|
||||
### Verify RED - Watch It Fail
|
||||
|
||||
**MANDATORY. Never skip.**
|
||||
|
||||
```bash
|
||||
npm test path/to/test.test.ts
|
||||
```
|
||||
|
||||
**Paste the ACTUAL failure output in your response:**
|
||||
```
|
||||
[PASTE EXACT OUTPUT HERE]
|
||||
[NO OUTPUT = VIOLATION]
|
||||
```
|
||||
|
||||
If you can't paste output, you didn't run the test.
|
||||
|
||||
### Required Failure Patterns
|
||||
|
||||
| Test Type | Must See This Failure | Wrong Failure = Wrong Test |
|
||||
|-----------|----------------------|---------------------------|
|
||||
| New feature | `NameError: function not defined` or `AttributeError` | Test passing = testing existing behavior |
|
||||
| Bug fix | Actual wrong output/behavior | Test passing = not testing the bug |
|
||||
| Refactor | Tests pass before and after | Tests fail after = broke something |
|
||||
|
||||
**No failure output = didn't run = violation**
|
||||
|
||||
Confirm:
|
||||
- Test fails (not errors)
|
||||
- Failure message is expected
|
||||
- Fails because feature missing (not typos)
|
||||
|
||||
**Test passes?** You're testing existing behavior. Fix test.
|
||||
|
||||
**Test errors?** Fix error, re-run until it fails correctly.
|
||||
|
||||
### GREEN - Minimal Code
|
||||
|
||||
Write simplest code to pass the test.
|
||||
|
||||
<Good>
|
||||
```typescript
|
||||
async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
|
||||
for (let i = 0; i < 3; i++) {
|
||||
try {
|
||||
return await fn();
|
||||
} catch (e) {
|
||||
if (i === 2) throw e;
|
||||
}
|
||||
}
|
||||
throw new Error('unreachable');
|
||||
}
|
||||
```
|
||||
Just enough to pass
|
||||
</Good>
|
||||
|
||||
<Bad>
|
||||
```typescript
|
||||
async function retryOperation<T>(
|
||||
fn: () => Promise<T>,
|
||||
options?: {
|
||||
maxRetries?: number;
|
||||
backoff?: 'linear' | 'exponential';
|
||||
onRetry?: (attempt: number) => void;
|
||||
}
|
||||
): Promise<T> {
|
||||
// YAGNI
|
||||
}
|
||||
```
|
||||
Over-engineered
|
||||
</Bad>
|
||||
|
||||
Don't add features, refactor other code, or "improve" beyond the test.
|
||||
|
||||
### Verify GREEN - Watch It Pass
|
||||
|
||||
**MANDATORY.**
|
||||
|
||||
```bash
|
||||
npm test path/to/test.test.ts
|
||||
```
|
||||
|
||||
Confirm:
|
||||
- Test passes
|
||||
- Other tests still pass
|
||||
- Output pristine (no errors, warnings)
|
||||
|
||||
**Test fails?** Fix code, not test.
|
||||
|
||||
**Other tests fail?** Fix now.
|
||||
|
||||
### REFACTOR - Clean Up
|
||||
|
||||
After green only:
|
||||
- Remove duplication
|
||||
- Improve names
|
||||
- Extract helpers
|
||||
|
||||
Keep tests green. Don't add behavior.
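
Continuing the retry example above, a refactor might only rename and extract a constant while behavior (and the passing test) stays unchanged - a sketch, not a required change:

```typescript
// Same behavior as the GREEN version; only naming and the extracted constant change.
const MAX_ATTEMPTS = 3;

async function retryOperation<T>(operation: () => Promise<T>): Promise<T> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt === MAX_ATTEMPTS) throw error;
    }
  }
  throw new Error('unreachable');
}
```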
|
||||
|
||||
### Repeat
|
||||
|
||||
Next failing test for next feature.
|
||||
|
||||
## Good Tests
|
||||
|
||||
| Quality | Good | Bad |
|
||||
|---------|------|-----|
|
||||
| **Minimal** | One thing. "and" in name? Split it. | `test('validates email and domain and whitespace')` |
|
||||
| **Clear** | Name describes behavior | `test('test1')` |
|
||||
| **Shows intent** | Demonstrates desired API | Obscures what code should do |
|
||||
|
||||
## Why Order Matters
|
||||
|
||||
**"I'll write tests after to verify it works"**
|
||||
|
||||
Tests written after code pass immediately. Passing immediately proves nothing:
|
||||
- Might test wrong thing
|
||||
- Might test implementation, not behavior
|
||||
- Might miss edge cases you forgot
|
||||
- You never saw it catch the bug
|
||||
|
||||
Test-first forces you to see the test fail, proving it actually tests something.
|
||||
|
||||
**"I already manually tested all the edge cases"**
|
||||
|
||||
Manual testing is ad-hoc. You think you tested everything but:
|
||||
- No record of what you tested
|
||||
- Can't re-run when code changes
|
||||
- Easy to forget cases under pressure
|
||||
- "It worked when I tried it" ≠ comprehensive
|
||||
|
||||
Automated tests are systematic. They run the same way every time.
|
||||
|
||||
**"Deleting X hours of work is wasteful"**
|
||||
|
||||
Sunk cost fallacy. The time is already gone. Your choice now:
|
||||
- Delete and rewrite with TDD (X more hours, high confidence)
|
||||
- Keep it and add tests after (30 min, low confidence, likely bugs)
|
||||
|
||||
The "waste" is keeping code you can't trust. Working code without real tests is technical debt.
|
||||
|
||||
**"TDD is dogmatic, being pragmatic means adapting"**
|
||||
|
||||
TDD IS pragmatic:
|
||||
- Finds bugs before commit (faster than debugging after)
|
||||
- Prevents regressions (tests catch breaks immediately)
|
||||
- Documents behavior (tests show how to use code)
|
||||
- Enables refactoring (change freely, tests catch breaks)
|
||||
|
||||
"Pragmatic" shortcuts = debugging in production = slower.
|
||||
|
||||
**"Tests after achieve the same goals - it's spirit not ritual"**
|
||||
|
||||
No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"
|
||||
|
||||
Tests-after are biased by your implementation. You test what you built, not what's required. You verify remembered edge cases, not discovered ones.
|
||||
|
||||
Tests-first force edge case discovery before implementing. Tests-after verify you remembered everything (you didn't).
|
||||
|
||||
30 minutes of tests after ≠ TDD. You get coverage, lose proof tests work.
|
||||
|
||||
## Common Rationalizations
|
||||
|
||||
| Excuse | Reality |
|
||||
|--------|---------|
|
||||
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
|
||||
| "I'll test after" | Tests passing immediately prove nothing. |
|
||||
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
|
||||
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
|
||||
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
|
||||
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
|
||||
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
|
||||
| "Test hard = design unclear" | Listen to test. Hard to test = hard to use. |
|
||||
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
|
||||
| "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
|
||||
| "Existing code has no tests" | You're improving it. Add tests for existing code. |
|
||||
|
||||
## Red Flags - STOP and Start Over
|
||||
|
||||
- Code before test
|
||||
- Test after implementation
|
||||
- Test passes immediately
|
||||
- Can't explain why test failed
|
||||
- Tests added "later"
|
||||
- Rationalizing "just this once"
|
||||
- "I already manually tested it"
|
||||
- "Tests after achieve the same purpose"
|
||||
- "It's about spirit not ritual"
|
||||
- "Keep as reference" or "adapt existing code"
|
||||
- "Already spent X hours, deleting is wasteful"
|
||||
- "TDD is dogmatic, I'm being pragmatic"
|
||||
- "This is different because..."
|
||||
|
||||
**All of these mean: Delete code. Start over with TDD.**
|
||||
|
||||
## Example: Bug Fix
|
||||
|
||||
**Bug:** Empty email accepted
|
||||
|
||||
**RED**
|
||||
```typescript
|
||||
test('rejects empty email', async () => {
|
||||
const result = await submitForm({ email: '' });
|
||||
expect(result.error).toBe('Email required');
|
||||
});
|
||||
```
|
||||
|
||||
**Verify RED**
|
||||
```bash
|
||||
$ npm test
|
||||
FAIL: expected 'Email required', got undefined
|
||||
```
|
||||
|
||||
**GREEN**
|
||||
```typescript
|
||||
function submitForm(data: FormData) {
|
||||
if (!data.email?.trim()) {
|
||||
return { error: 'Email required' };
|
||||
}
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
**Verify GREEN**
|
||||
```bash
|
||||
$ npm test
|
||||
PASS
|
||||
```
|
||||
|
||||
**REFACTOR**
|
||||
Extract validation for multiple fields if needed.
|
||||
|
||||
## Verification Checklist
|
||||
|
||||
Before marking work complete:
|
||||
|
||||
- [ ] Every new function/method has a test
|
||||
- [ ] Watched each test fail before implementing
|
||||
- [ ] Each test failed for expected reason (feature missing, not typo)
|
||||
- [ ] Wrote minimal code to pass each test
|
||||
- [ ] All tests pass
|
||||
- [ ] Output pristine (no errors, warnings)
|
||||
- [ ] Tests use real code (mocks only if unavoidable)
|
||||
- [ ] Edge cases and errors covered
|
||||
|
||||
Can't check all boxes? You skipped TDD. Start over.
|
||||
|
||||
## When Stuck
|
||||
|
||||
| Problem | Solution |
|
||||
|---------|----------|
|
||||
| Don't know how to test | Write wished-for API. Write assertion first. Ask your human partner. |
|
||||
| Test too complicated | Design too complicated. Simplify interface. |
|
||||
| Must mock everything | Code too coupled. Use dependency injection (see the sketch after this table). |
|
||||
| Test setup huge | Extract helpers. Still complex? Simplify design. |
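
A minimal dependency-injection sketch for the "must mock everything" row - the `Clock` interface and names are illustrative, not a required design:

```typescript
// Sketch only: inject the dependency instead of mocking globals in every test.
interface Clock {
  now(): Date;
}

function isExpired(expiresAt: Date, clock: Clock = { now: () => new Date() }): boolean {
  return clock.now() > expiresAt;
}

// In tests, pass a fixed clock instead of mocking the Date global:
// isExpired(new Date('2025-01-01'), { now: () => new Date('2025-06-01') }); // true
```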
|
||||
|
||||
## Debugging Integration
|
||||
|
||||
Bug found? Write failing test reproducing it. Follow TDD cycle. Test proves fix and prevents regression.
|
||||
|
||||
Never fix bugs without a test.
|
||||
|
||||
## Required Patterns
|
||||
|
||||
This skill uses these universal patterns:
|
||||
- **State Tracking:** See `skills/shared-patterns/state-tracking.md`
|
||||
- **Failure Recovery:** See `skills/shared-patterns/failure-recovery.md`
|
||||
- **Exit Criteria:** See `skills/shared-patterns/exit-criteria.md`
|
||||
- **TodoWrite:** See `skills/shared-patterns/todowrite-integration.md`
|
||||
|
||||
Apply ALL patterns when using this skill.
|
||||
|
||||
---
|
||||
|
||||
## When You Violate This Skill
|
||||
|
||||
### Violation: Wrote implementation before test
|
||||
|
||||
**How to detect:**
|
||||
- Implementation file exists or modified
|
||||
- No test file exists yet
|
||||
- Git diff shows implementation changed before test
|
||||
|
||||
**Recovery procedure:**
|
||||
1. Delete the implementation code: `rm [file]` or `git reset --hard`
|
||||
2. Write the failing test first
|
||||
3. Run test to verify it fails: `npm test` or `pytest`
|
||||
4. Rewrite the implementation to make test pass
|
||||
|
||||
**Why recovery matters:**
|
||||
The test must fail first to prove it actually tests something. If implementation exists first, you can't verify the test works - it might be passing for the wrong reason or not testing anything at all.
|
||||
|
||||
---
|
||||
|
||||
### Violation: Test passes without implementation (FALSE GREEN)
|
||||
|
||||
**How to detect:**
|
||||
- Wrote test
|
||||
- Test passes immediately
|
||||
- Haven't written implementation yet
|
||||
|
||||
**Recovery procedure:**
|
||||
1. Test is broken - delete or fix it
|
||||
2. Make test stricter until it fails
|
||||
3. Verify failure shows expected error
|
||||
4. Then implement to make it pass
|
||||
|
||||
**Why recovery matters:**
|
||||
A test that passes without implementation is useless - it's not testing the right thing.
|
||||
|
||||
---
|
||||
|
||||
### Violation: Kept code "as reference" instead of deleting
|
||||
|
||||
**How to detect:**
|
||||
- Stashed implementation with `git stash`
|
||||
- Moved file to `.bak` or similar
|
||||
- Copied to clipboard "just in case"
|
||||
- Commented out instead of deleting
|
||||
|
||||
**Recovery procedure:**
|
||||
1. Find the kept code (stash, backup, clipboard)
|
||||
2. Delete it permanently: `git stash drop`, `rm`, clear clipboard
|
||||
3. Verify code is truly gone
|
||||
4. Start RED phase fresh
|
||||
|
||||
**Why recovery matters:**
|
||||
Keeping code means you'll adapt it instead of implementing from tests. The whole point is to implement fresh, guided by tests. Delete means delete.
|
||||
|
||||
---
|
||||
|
||||
## Final Rule
|
||||
|
||||
```
|
||||
Production code → test exists and failed first
|
||||
Otherwise → not TDD
|
||||
```
|
||||
|
||||
No exceptions without your human partner's permission.
|
||||
623
skills/testing-agents-with-subagents/SKILL.md
Normal file
623
skills/testing-agents-with-subagents/SKILL.md
Normal file
@@ -0,0 +1,623 @@
|
||||
---
|
||||
name: testing-agents-with-subagents
|
||||
description: |
|
||||
Agent testing methodology - run agents with test inputs, observe outputs,
|
||||
iterate until outputs are accurate and well-structured.
|
||||
|
||||
trigger: |
|
||||
- Before deploying a new agent
|
||||
- After editing an existing agent
|
||||
- Agent produces structured outputs that must be accurate
|
||||
|
||||
skip_when: |
|
||||
- Agent is simple passthrough → minimal testing needed
|
||||
- Agent already tested for this use case
|
||||
|
||||
related:
|
||||
complementary: [test-driven-development]
|
||||
---
|
||||
|
||||
# Testing Agents With Subagents
|
||||
|
||||
## Overview
|
||||
|
||||
**Testing agents is TDD applied to AI worker definitions.**
|
||||
|
||||
You run agents with known test inputs (RED - observe incorrect outputs), fix the agent definition (GREEN - outputs now correct), then handle edge cases (REFACTOR - robust under all conditions).
|
||||
|
||||
**Core principle:** If you didn't run an agent with test inputs and verify its outputs, you don't know if the agent works correctly.
|
||||
|
||||
**REQUIRED BACKGROUND:** You MUST understand `ring-default:test-driven-development` before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill provides agent-specific test formats (test inputs, output verification, accuracy metrics).
|
||||
|
||||
**Key difference from testing-skills-with-subagents:**
|
||||
- **Skills** = instructions that guide behavior; test if agent follows rules under pressure
|
||||
- **Agents** = separate Claude instances via Task tool; test if they produce correct outputs
|
||||
|
||||
## The Iron Law
|
||||
|
||||
```
|
||||
NO AGENT DEPLOYMENT WITHOUT RED-GREEN-REFACTOR TESTING FIRST
|
||||
```
|
||||
|
||||
About to deploy an agent without completing the test cycle? You have ONLY one option:
|
||||
|
||||
### STOP. TEST FIRST. THEN DEPLOY.
|
||||
|
||||
**You CANNOT:**
|
||||
- ❌ "Deploy and monitor for issues"
|
||||
- ❌ "Test with first real usage"
|
||||
- ❌ "Quick smoke test is enough"
|
||||
- ❌ "Tested manually in Claude UI"
|
||||
- ❌ "One test case passed"
|
||||
- ❌ "Agent prompt looks correct"
|
||||
- ❌ "Based on working template"
|
||||
- ❌ "Deploy now, test in parallel"
|
||||
- ❌ "Production is down, no time to test"
|
||||
|
||||
**ZERO exceptions. Simple agent, expert confidence, time pressure, production outage - NONE override testing.**
|
||||
|
||||
**Why this is absolute:** Untested agents fail in production. Every time. The question is not IF but WHEN and HOW BADLY. A 20-minute test suite prevents hours of debugging and lost trust.
|
||||
|
||||
## When to Use
|
||||
|
||||
Test agents that:
|
||||
- Analyze code/designs and produce findings (reviewers)
|
||||
- Generate structured outputs (planners, analyzers)
|
||||
- Make decisions or categorizations (severity, priority)
|
||||
- Have defined output schemas that must be followed
|
||||
- Are used in parallel workflows where consistency matters
|
||||
|
||||
**Test exemptions require explicit human partner approval:**
|
||||
- Simple pass-through agents (just reformatting) - **only if human partner confirms**
|
||||
- Agents without structured outputs - **only if human partner confirms**
|
||||
- **You CANNOT self-determine test exemption**
|
||||
- **When in doubt → TEST**
|
||||
|
||||
## TDD Mapping for Agent Testing
|
||||
|
||||
| TDD Phase | Agent Testing | What You Do |
|
||||
|-----------|---------------|-------------|
|
||||
| **RED** | Run with test inputs | Dispatch agent, observe incorrect/incomplete outputs |
|
||||
| **Verify RED** | Document failures | Capture exact output issues verbatim |
|
||||
| **GREEN** | Fix agent definition | Update prompt/schema to address failures |
|
||||
| **Verify GREEN** | Re-run tests | Agent now produces correct outputs |
|
||||
| **REFACTOR** | Test edge cases | Ambiguous inputs, empty inputs, complex scenarios |
|
||||
| **Stay GREEN** | Re-verify all | Previous tests still pass after changes |
|
||||
|
||||
Same cycle as code TDD, different test format.
|
||||
|
||||
## RED Phase: Baseline Testing (Observe Failures)
|
||||
|
||||
**Goal:** Run agent with known test inputs - observe what's wrong, document exact failures.
|
||||
|
||||
This is identical to TDD's "write failing test first" - you MUST see what the agent actually produces before fixing the definition.
|
||||
|
||||
**Process:**
|
||||
|
||||
- [ ] **Create test inputs** (known issues, edge cases, clean inputs)
|
||||
- [ ] **Run agent** - dispatch via Task tool with test inputs
|
||||
- [ ] **Compare outputs** - expected vs actual
|
||||
- [ ] **Document failures** - missing findings, wrong severity, bad format
|
||||
- [ ] **Identify patterns** - which input types cause failures?
|
||||
|
||||
### Test Input Categories
|
||||
|
||||
| Category | Purpose | Example |
|
||||
|----------|---------|---------|
|
||||
| **Known Issues** | Verify agent finds real problems | Code with SQL injection, hardcoded secrets |
|
||||
| **Clean Inputs** | Verify no false positives | Well-written code with no issues |
|
||||
| **Edge Cases** | Verify robustness | Empty files, huge files, unusual patterns |
|
||||
| **Ambiguous Cases** | Verify judgment | Code that could go either way |
|
||||
| **Severity Calibration** | Verify severity accuracy | Mix of critical, high, medium, low issues |
|
||||
|
||||
### Minimum Test Suite Requirements
|
||||
|
||||
Before deploying ANY agent, you MUST have:
|
||||
|
||||
| Agent Type | Minimum Test Cases | Required Coverage |
|
||||
|------------|-------------------|-------------------|
|
||||
| **Reviewer agents** | 6 tests | 2 known issues, 2 clean, 1 edge case, 1 ambiguous |
|
||||
| **Analyzer agents** | 5 tests | 2 typical, 1 empty, 1 large, 1 malformed |
|
||||
| **Decision agents** | 4 tests | 2 clear cases, 2 boundary cases |
|
||||
| **Planning agents** | 5 tests | 2 standard, 1 complex, 1 minimal, 1 edge case |
|
||||
|
||||
**Fewer tests = incomplete testing = DO NOT DEPLOY.**
|
||||
|
||||
One test case proves nothing. Three tests are suspicious. Six tests are minimum for confidence.
|
||||
|
||||
### Example Test Suite for Code Reviewer
|
||||
|
||||
```markdown
|
||||
## Test Case 1: Known SQL Injection
|
||||
**Input:** Function with string concatenation in SQL query
|
||||
**Expected:** CRITICAL finding, references OWASP A03:2021
|
||||
**Actual:** [Run agent, record output]
|
||||
|
||||
## Test Case 2: Clean Authentication
|
||||
**Input:** Well-implemented JWT validation with proper error handling
|
||||
**Expected:** No findings or LOW-severity suggestions only
|
||||
**Actual:** [Run agent, record output]
|
||||
|
||||
## Test Case 3: Ambiguous Error Handling
|
||||
**Input:** Error caught but only logged, not re-thrown
|
||||
**Expected:** MEDIUM finding about silent failures
|
||||
**Actual:** [Run agent, record output]
|
||||
|
||||
## Test Case 4: Empty File
|
||||
**Input:** Empty source file
|
||||
**Expected:** Graceful handling, no crash, maybe LOW finding
|
||||
**Actual:** [Run agent, record output]
|
||||
```
|
||||
|
||||
### Running the Test
|
||||
|
||||
```markdown
|
||||
Use Task tool to dispatch agent:
|
||||
|
||||
Task(
|
||||
subagent_type="ring-default:code-reviewer",
|
||||
prompt="""
|
||||
Review this code for security issues:
|
||||
|
||||
```python
|
||||
def get_user(user_id):
|
||||
query = "SELECT * FROM users WHERE id = " + user_id
|
||||
return db.execute(query)
|
||||
```
|
||||
|
||||
Provide findings with severity levels.
|
||||
"""
|
||||
)
|
||||
```
|
||||
|
||||
**Document exact output.** Don't summarize - capture verbatim.
|
||||
|
||||
## GREEN Phase: Fix Agent Definition (Make Tests Pass)
|
||||
|
||||
Write/update agent definition addressing specific failures documented in RED phase.
|
||||
|
||||
**Common fixes:**
|
||||
|
||||
| Failure Type | Fix Approach |
|
||||
|--------------|--------------|
|
||||
| Missing findings | Add explicit instructions to check for X |
|
||||
| Wrong severity | Add severity calibration examples |
|
||||
| Bad output format | Add output schema with examples |
|
||||
| False positives | Add "don't flag X when Y" instructions |
|
||||
| Incomplete analysis | Add "always check A, B, C" checklist |
|
||||
|
||||
### Example Fix: Severity Calibration
|
||||
|
||||
**RED Phase Failure:**
|
||||
```
|
||||
Agent marked hardcoded password as MEDIUM instead of CRITICAL
|
||||
```
|
||||
|
||||
**GREEN Phase Fix (add to agent definition):**
|
||||
```markdown
|
||||
## Severity Calibration
|
||||
|
||||
**CRITICAL:** Immediate exploitation possible
|
||||
- Hardcoded secrets (passwords, API keys, tokens)
|
||||
- SQL injection with user input
|
||||
- Authentication bypass
|
||||
|
||||
**HIGH:** Exploitation requires additional steps
|
||||
- Missing input validation
|
||||
- Improper error handling exposing internals
|
||||
|
||||
**MEDIUM:** Security weakness, not directly exploitable
|
||||
- Missing rate limiting
|
||||
- Verbose error messages
|
||||
|
||||
**LOW:** Best practice violations
|
||||
- Missing security headers
|
||||
- Outdated dependencies (no known CVEs)
|
||||
```
|
||||
|
||||
### Re-run Tests
|
||||
|
||||
After fixing, re-run ALL test cases:
|
||||
|
||||
```markdown
|
||||
## Test Results After Fix
|
||||
|
||||
| Test Case | Expected | Actual | Pass/Fail |
|
||||
|-----------|----------|--------|-----------|
|
||||
| SQL Injection | CRITICAL | CRITICAL | PASS |
|
||||
| Clean Auth | No findings | No findings | PASS |
|
||||
| Ambiguous Error | MEDIUM | MEDIUM | PASS |
|
||||
| Empty File | Graceful | Graceful | PASS |
|
||||
```
|
||||
|
||||
If any test fails: continue fixing, re-test.
|
||||
|
||||
## VERIFY GREEN: Output Verification
|
||||
|
||||
**Goal:** Confirm agent produces correct, well-structured outputs consistently.
|
||||
|
||||
### Output Schema Compliance
|
||||
|
||||
If agent has defined output schema, verify compliance:
|
||||
|
||||
```markdown
|
||||
## Expected Schema
|
||||
- Summary (1-2 sentences)
|
||||
- Findings (array of {severity, location, description, recommendation})
|
||||
- Overall assessment (PASS/FAIL with conditions)
|
||||
|
||||
## Actual Output Analysis
|
||||
- Summary: ✅ Present, correct format
|
||||
- Findings: ✅ Array, all fields present
|
||||
- Overall assessment: ❌ Missing conditions for FAIL
|
||||
```
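
A minimal sketch of automating this check, assuming the agent's output has already been parsed into a dict. The field names (`summary`, `findings`, `overall_assessment`) mirror the schema above and are assumptions, not a fixed API:

```python
# Schema-compliance check sketch. Assumes the agent output has been parsed
# into a dict; field names follow the expected schema listed above.
REQUIRED_FINDING_FIELDS = {"severity", "location", "description", "recommendation"}

def check_schema(output: dict) -> list[str]:
    problems = []
    if not output.get("summary"):
        problems.append("Missing summary")
    findings = output.get("findings")
    if not isinstance(findings, list):
        problems.append("Findings is not an array")
    else:
        for i, finding in enumerate(findings):
            missing = REQUIRED_FINDING_FIELDS - finding.keys()
            if missing:
                problems.append(f"Finding {i} missing fields: {sorted(missing)}")
    assessment = output.get("overall_assessment", "")
    if assessment.startswith("FAIL") and "condition" not in assessment.lower():
        problems.append("FAIL assessment missing conditions")
    return problems
```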
|
||||
|
||||
### Accuracy Metrics
|
||||
|
||||
Track agent accuracy across test suite:
|
||||
|
||||
| Metric | Target | Actual |
|
||||
|--------|--------|--------|
|
||||
| True Positives (found real issues) | 100% | [X]% |
|
||||
| False Positives (flagged non-issues) | <10% | [X]% |
|
||||
| False Negatives (missed real issues) | <5% | [X]% |
|
||||
| Severity Accuracy | >90% | [X]% |
|
||||
| Schema Compliance | 100% | [X]% |
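
These rates can be computed from a labeled suite. A minimal sketch, assuming each test case records expected and actual findings as sets of issue identifiers (one reasonable choice of denominators; adapt to your suite format):

```python
# Accuracy-metrics sketch: compare expected vs actual findings per test case.
def accuracy_metrics(cases):
    expected = sum(len(c["expected"]) for c in cases)
    found = sum(len(c["expected"] & c["actual"]) for c in cases)   # true positives
    extra = sum(len(c["actual"] - c["expected"]) for c in cases)   # false positives
    missed = expected - found                                      # false negatives
    flagged = found + extra
    return {
        "true_positive_rate": found / expected if expected else 1.0,
        "false_positive_rate": extra / flagged if flagged else 0.0,
        "false_negative_rate": missed / expected if expected else 0.0,
    }

cases = [
    {"expected": {"sql-injection"}, "actual": {"sql-injection"}},
    {"expected": set(), "actual": set()},  # clean input, no findings
]
print(accuracy_metrics(cases))
```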
|
||||
|
||||
### Consistency Testing
|
||||
|
||||
Run same test input 3 times. Outputs should be consistent:
|
||||
|
||||
```markdown
|
||||
## Consistency Test: SQL Injection Input
|
||||
|
||||
Run 1: CRITICAL, SQL injection, line 3
|
||||
Run 2: CRITICAL, SQL injection, line 3
|
||||
Run 3: CRITICAL, SQL injection, line 3
|
||||
|
||||
Consistency: 100% (all runs identical findings)
|
||||
```
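
A minimal sketch of automating the comparison; `dispatch_agent` is a hypothetical helper that wraps the Task-tool dispatch shown earlier and returns findings as (severity, issue, location) tuples:

```python
# Consistency check sketch: run the same input N times and compare findings.
def consistency(dispatch_agent, test_input, runs=3):
    results = [frozenset(dispatch_agent(test_input)) for _ in range(runs)]
    identical = len(set(results)) == 1
    return identical, results
```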
|
||||
|
||||
Inconsistency indicates the agent definition is ambiguous.
|
||||
|
||||
## REFACTOR Phase: Edge Cases and Robustness
|
||||
|
||||
Agent passes basic tests? Now test edge cases.
|
||||
|
||||
### Edge Case Categories
|
||||
|
||||
| Category | Test Cases |
|
||||
|----------|------------|
|
||||
| **Empty/Null** | Empty file, null input, whitespace only |
|
||||
| **Large** | 10K line file, deeply nested code |
|
||||
| **Unusual** | Minified code, generated code, config files |
|
||||
| **Multi-language** | Mixed JS/TS, embedded SQL, templates |
|
||||
| **Ambiguous** | Code that could be good or bad depending on context |
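
A minimal sketch of generating inputs for a few of these categories (the values are illustrative, not a fixed corpus):

```python
# Edge-case input generation sketch for the categories above.
edge_cases = {
    "empty": "",
    "whitespace_only": "   \n\t\n",
    "large": "\n".join(f"x_{i} = {i}" for i in range(10_000)),        # ~10K lines
    "minified": "function f(a,b){return a?b:f(b,a)};" * 200,          # unusual formatting
}
```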
|
||||
|
||||
### Stress Testing
|
||||
|
||||
```markdown
|
||||
## Stress Test: Large File
|
||||
|
||||
**Input:** 5000-line file with 20 known issues scattered throughout
|
||||
**Expected:** All 20 issues found, reasonable response time
|
||||
**Actual:** [Run agent, record output]
|
||||
|
||||
## Stress Test: Complex Nesting
|
||||
|
||||
**Input:** 15-level deep callback hell
|
||||
**Expected:** Findings about complexity, maintainability
|
||||
**Actual:** [Run agent, record output]
|
||||
```
|
||||
|
||||
### Ambiguity Testing
|
||||
|
||||
```markdown
|
||||
## Ambiguity Test: Context-Dependent Security
|
||||
|
||||
**Input:**
|
||||
```python
|
||||
# Internal admin tool, not exposed to users
|
||||
password = "admin123" # Default for local dev
|
||||
```
|
||||
|
||||
**Expected:** Agent should note context but still flag
|
||||
**Actual:** [Run agent, record output]
|
||||
|
||||
**Analysis:** Does agent handle nuance appropriately?
|
||||
```
|
||||
|
||||
### Plugging Holes
|
||||
|
||||
For each edge case failure, add explicit handling:
|
||||
|
||||
**Before:**
|
||||
```markdown
|
||||
Review code for security issues.
|
||||
```
|
||||
|
||||
**After:**
|
||||
```markdown
|
||||
Review code for security issues.
|
||||
|
||||
**Edge case handling:**
|
||||
- Empty files: Return "No code to review" with PASS
|
||||
- Large files (>5K lines): Focus on high-risk patterns first
|
||||
- Minified code: Note limitations, review what's readable
|
||||
- Context comments: Consider but don't use to dismiss issues
|
||||
```
|
||||
|
||||
## Testing Parallel Agent Workflows
|
||||
|
||||
When agents run in parallel (like 3 reviewers), test the combined workflow:
|
||||
|
||||
### Parallel Consistency
|
||||
|
||||
```markdown
|
||||
## Parallel Test: Same Input to All Reviewers
|
||||
|
||||
Input: Authentication module with mixed issues
|
||||
|
||||
| Reviewer | Findings | Overlap |
|
||||
|----------|----------|---------|
|
||||
| code-reviewer | 5 findings | - |
|
||||
| business-logic-reviewer | 3 findings | 1 shared |
|
||||
| security-reviewer | 4 findings | 2 shared |
|
||||
|
||||
Analysis:
|
||||
- Total unique findings: 9
|
||||
- Appropriate overlap (security issues found by both security and code reviewer)
|
||||
- No contradictions
|
||||
```
|
||||
|
||||
### Aggregation Testing
|
||||
|
||||
```markdown
|
||||
## Aggregation Test: Severity Consistency
|
||||
|
||||
Same issue found by multiple reviewers:
|
||||
|
||||
| Reviewer | Finding | Severity |
|
||||
|----------|---------|----------|
|
||||
| code-reviewer | Missing null check | MEDIUM |
|
||||
| business-logic-reviewer | Missing null check | HIGH |
|
||||
|
||||
Problem: Inconsistent severity for same issue
|
||||
Fix: Align severity calibration across all reviewers
|
||||
```
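
A minimal sketch of detecting this kind of conflict from aggregated findings; the finding shape (`reviewer`, `issue`, `severity`) is an assumption, not a required format:

```python
# Aggregation check sketch: flag the same issue reported with different
# severities by different reviewers.
from collections import defaultdict

def severity_conflicts(findings):
    by_issue = defaultdict(set)
    for f in findings:
        by_issue[f["issue"]].add(f["severity"])
    return {issue: sevs for issue, sevs in by_issue.items() if len(sevs) > 1}

findings = [
    {"reviewer": "code-reviewer", "issue": "missing null check", "severity": "MEDIUM"},
    {"reviewer": "business-logic-reviewer", "issue": "missing null check", "severity": "HIGH"},
]
print(severity_conflicts(findings))  # {'missing null check': {'HIGH', 'MEDIUM'}} (set order may vary)
```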
|
||||
|
||||
## Agent Testing Checklist
|
||||
|
||||
Before deploying agent, verify you followed RED-GREEN-REFACTOR:
|
||||
|
||||
**RED Phase:**
|
||||
- [ ] Created test inputs (known issues, clean code, edge cases)
|
||||
- [ ] Ran agent with test inputs
|
||||
- [ ] Documented failures verbatim (missing findings, wrong severity, bad format)
|
||||
|
||||
**GREEN Phase:**
|
||||
- [ ] Updated agent definition addressing specific failures
|
||||
- [ ] Re-ran test inputs
|
||||
- [ ] All basic tests now pass
|
||||
|
||||
**REFACTOR Phase:**
|
||||
- [ ] Tested edge cases (empty, large, unusual, ambiguous)
|
||||
- [ ] Tested stress scenarios (many issues, complex code)
|
||||
- [ ] Added explicit edge case handling to definition
|
||||
- [ ] Verified consistency (multiple runs produce same results)
|
||||
- [ ] Verified schema compliance
|
||||
- [ ] Tested parallel workflow integration (if applicable)
|
||||
- [ ] Re-ran ALL tests after each change
|
||||
|
||||
**Metrics (for reviewer agents):**
|
||||
- [ ] True positive rate: >95%
|
||||
- [ ] False positive rate: <10%
|
||||
- [ ] False negative rate: <5%
|
||||
- [ ] Severity accuracy: >90%
|
||||
- [ ] Schema compliance: 100%
|
||||
- [ ] Consistency: >95%
|
||||
|
||||
## Prohibited Testing Shortcuts
|
||||
|
||||
**You CANNOT substitute proper testing with:**
|
||||
|
||||
| Shortcut | Why It Fails |
|
||||
|----------|--------------|
|
||||
| Reading agent definition carefully | Reading ≠ executing. Must run agent with inputs. |
|
||||
| Manual testing in Claude UI | Ad-hoc ≠ reproducible. No baseline documented. |
|
||||
| "Looks good to me" review | Visual inspection misses runtime failures. |
|
||||
| Basing on proven template | Templates need validation for YOUR use case. |
|
||||
| Expert prompt engineering knowledge | Expertise doesn't prevent bugs. Tests do. |
|
||||
| Testing after first production use | Production is not QA. Test before deployment. |
|
||||
| Monitoring production for issues | Reactive ≠ proactive. Catch issues before users do. |
|
||||
| Deploy now, test in parallel | Parallel testing still means untested code in production. |
|
||||
|
||||
**ALL require running agent with documented test inputs and comparing outputs.**
|
||||
|
||||
## Testing Agent Modifications
|
||||
|
||||
**EVERY agent edit requires re-running the FULL test suite:**
|
||||
|
||||
| Change Type | Required Action |
|
||||
|-------------|-----------------|
|
||||
| Prompt wording changes | Full re-test |
|
||||
| Severity calibration updates | Full re-test |
|
||||
| Output schema modifications | Full re-test |
|
||||
| Adding edge case handling | Full re-test |
|
||||
| "Small" one-line changes | Full re-test |
|
||||
| Typo fixes in prompt | Full re-test |
|
||||
|
||||
**"Small change" is not an exception.** One-line prompt changes can completely alter LLM behavior. Re-test always.
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
**❌ Testing with only "happy path" inputs**
|
||||
Agent works with obvious issues but misses subtle ones.
|
||||
Fix: Include ambiguous cases and edge cases in test suite.
|
||||
|
||||
**❌ Not documenting exact outputs**
|
||||
"Agent was wrong" doesn't tell you what to fix.
|
||||
Fix: Capture agent output verbatim, compare to expected.
|
||||
|
||||
**❌ Fixing without re-running all tests**
|
||||
Fix one issue, break another.
|
||||
Fix: Re-run entire test suite after each change.
|
||||
|
||||
**❌ Testing single agent in isolation when used in parallel**
|
||||
Individual agents work, but combined workflow fails.
|
||||
Fix: Test parallel dispatch and output aggregation.
|
||||
|
||||
**❌ Not testing consistency**
|
||||
Agent gives different answers for same input.
|
||||
Fix: Run same input 3+ times, verify consistent output.
|
||||
|
||||
**❌ Skipping severity calibration**
|
||||
Agent finds issues but severity is inconsistent.
|
||||
Fix: Add explicit severity examples to agent definition.
|
||||
|
||||
**❌ Not testing edge cases**
|
||||
Agent works for normal code, crashes on edge cases.
|
||||
Fix: Test empty, large, unusual, and ambiguous inputs.
|
||||
|
||||
**❌ Single test case validation**
|
||||
"One test passed" proves nothing about agent behavior.
|
||||
Fix: Minimum 4-6 test cases per agent type.
|
||||
|
||||
**❌ Manual UI testing as substitute**
|
||||
Ad-hoc testing doesn't create reproducible baselines.
|
||||
Fix: Document all test inputs and expected outputs.
|
||||
|
||||
**❌ Skipping re-test for "small" changes**
|
||||
One-line prompt changes can break everything.
|
||||
Fix: Re-run full suite after ANY modification.
|
||||
|
||||
## Rationalization Table
|
||||
|
||||
| Excuse | Reality |
|
||||
|--------|---------|
|
||||
| "Agent prompt is obviously correct" | Obvious prompts fail in practice. Test proves correctness. |
|
||||
| "Tested manually in Claude UI" | Ad-hoc ≠ reproducible. No baseline documented. |
|
||||
| "One test case passed" | Sample size = 1 proves nothing. Need 4-6 cases minimum. |
|
||||
| "Will test after first production use" | Production is not QA. Test before deployment. Always. |
|
||||
| "Reading prompt is sufficient review" | Reading ≠ executing. Must run agent with inputs. |
|
||||
| "Changes are small, re-test unnecessary" | Small changes cause big failures. Re-run full suite. |
|
||||
| "Based agent on proven template" | Templates need validation for your use case. Test anyway. |
|
||||
| "Expert in prompt engineering" | Expertise doesn't prevent bugs. Tests do. |
|
||||
| "Production is down, no time to test" | Deploying untested fix may make outage worse. Test first. |
|
||||
| "Deploy now, test in parallel" | Untested code in production = unknown behavior. Unacceptable. |
|
||||
| "Quick smoke test is enough" | Smoke test misses edge cases. Full suite required. |
|
||||
| "Simple pass-through agent" | You cannot self-determine exemptions. Get human approval. |
|
||||
|
||||
## Red Flags - STOP and Test Now
|
||||
|
||||
If you catch yourself thinking ANY of these, STOP. You're about to violate the Iron Law:
|
||||
|
||||
- Agent edited but tests not re-run
|
||||
- "Looks good" without execution
|
||||
- Single test case only
|
||||
- No documented baseline
|
||||
- No edge case testing
|
||||
- Manual verification only
|
||||
- "Will test in production"
|
||||
- "Based on template, should work"
|
||||
- "Just a small prompt change"
|
||||
- "No time to test properly"
|
||||
- "One quick test is enough"
|
||||
- "Agent is simple, obviously works"
|
||||
- "Expert intuition says it's fine"
|
||||
- "Production is down, skip testing"
|
||||
- "Deploy now, test in parallel"
|
||||
|
||||
**All of these mean: STOP. Run full RED-GREEN-REFACTOR cycle NOW.**
|
||||
|
||||
## Quick Reference (TDD Cycle for Agents)
|
||||
|
||||
| TDD Phase | Agent Testing | Success Criteria |
|
||||
|-----------|---------------|------------------|
|
||||
| **RED** | Run with test inputs | Document exact output failures |
|
||||
| **Verify RED** | Capture verbatim | Have specific issues to fix |
|
||||
| **GREEN** | Fix agent definition | All basic tests pass |
|
||||
| **Verify GREEN** | Re-run all tests | No regressions |
|
||||
| **REFACTOR** | Test edge cases | Robust under all conditions |
|
||||
| **Stay GREEN** | Full test suite | All tests pass, metrics met |
|
||||
|
||||
## Example: Testing a New Reviewer Agent
|
||||
|
||||
### Step 1: Create Test Suite
|
||||
|
||||
```markdown
|
||||
# security-reviewer Test Suite
|
||||
|
||||
## Test 1: SQL Injection (Known Critical)
|
||||
Input: `query = "SELECT * FROM users WHERE id = " + user_id`
|
||||
Expected: CRITICAL, SQL injection, OWASP A03:2021
|
||||
|
||||
## Test 2: Parameterized Query (Clean)
|
||||
Input: `query = "SELECT * FROM users WHERE id = ?"; db.execute(query, [user_id])`
|
||||
Expected: No security findings
|
||||
|
||||
## Test 3: Hardcoded Secret (Known Critical)
|
||||
Input: `API_KEY = "sk-1234567890abcdef"`
|
||||
Expected: CRITICAL, hardcoded secret
|
||||
|
||||
## Test 4: Environment Variable (Clean)
|
||||
Input: `API_KEY = os.environ.get("API_KEY")`
|
||||
Expected: No security findings (or LOW suggestion for validation)
|
||||
|
||||
## Test 5: Empty File
|
||||
Input: (empty)
|
||||
Expected: Graceful handling
|
||||
|
||||
## Test 6: Ambiguous - Internal Tool
|
||||
Input: `password = "dev123" # Local development only`
|
||||
Expected: Flag but acknowledge context
|
||||
```
|
||||
|
||||
### Step 2: Run RED Phase
|
||||
|
||||
```
|
||||
Test 1: ❌ Found issue but marked HIGH not CRITICAL
|
||||
Test 2: ✅ No findings
|
||||
Test 3: ❌ Missed entirely
|
||||
Test 4: ✅ No findings
|
||||
Test 5: ❌ Error: "No code provided"
|
||||
Test 6: ❌ Dismissed due to comment
|
||||
```
|
||||
|
||||
### Step 3: GREEN Phase - Fix Definition
|
||||
|
||||
Add to agent:
|
||||
1. Severity calibration with SQL injection = CRITICAL
|
||||
2. Explicit check for hardcoded secrets pattern
|
||||
3. Empty file handling instruction
|
||||
4. "Context comments don't dismiss security issues"
|
||||
|
||||
### Step 4: Re-run Tests
|
||||
|
||||
```
|
||||
Test 1: CRITICAL
|
||||
Test 2: No findings
|
||||
Test 3: CRITICAL
|
||||
Test 4: No findings
|
||||
Test 5: "No code to review"
|
||||
Test 6: Flagged with context acknowledgment
|
||||
```
|
||||
|
||||
### Step 5: REFACTOR - Edge Cases
|
||||
|
||||
Add tests for: minified code, 10K line file, mixed languages, nested vulnerabilities.
|
||||
|
||||
Run, fix, repeat until all pass.
|
||||
|
||||
## The Bottom Line
|
||||
|
||||
**Agent testing IS TDD. Same principles, same cycle, same benefits.**
|
||||
|
||||
If you wouldn't deploy code without tests, don't deploy agents without testing them.
|
||||
|
||||
RED-GREEN-REFACTOR for agents works exactly like RED-GREEN-REFACTOR for code:
|
||||
1. **RED:** See what's wrong (run with test inputs)
|
||||
2. **GREEN:** Fix it (update agent definition)
|
||||
3. **REFACTOR:** Make it robust (edge cases, consistency)
|
||||
|
||||
**Evidence before deployment. Always.**
|
||||
317
skills/testing-anti-patterns/SKILL.md
Normal file
317
skills/testing-anti-patterns/SKILL.md
Normal file
@@ -0,0 +1,317 @@
|
||||
---
|
||||
name: testing-anti-patterns
|
||||
description: |
|
||||
Test quality guard - prevents testing mock behavior, production pollution with
|
||||
test-only methods, and mocking without understanding dependencies.
|
||||
|
||||
trigger: |
|
||||
- Reviewing or modifying existing tests
|
||||
- Adding mocks to tests
|
||||
- Tempted to add test-only methods to production code
|
||||
- Tests passing but seem to test the wrong things
|
||||
|
||||
skip_when: |
|
||||
- Writing new tests via TDD → TDD prevents these patterns
|
||||
- Pure unit tests without mocks → check other quality concerns
|
||||
|
||||
related:
|
||||
complementary: [test-driven-development]
|
||||
---
|
||||
|
||||
# Testing Anti-Patterns
|
||||
|
||||
## Overview
|
||||
|
||||
Tests must verify real behavior, not mock behavior. Mocks are a means to isolate, not the thing being tested.
|
||||
|
||||
**Core principle:** Test what the code does, not what the mocks do.
|
||||
|
||||
**Following strict TDD prevents these anti-patterns.**
|
||||
|
||||
## The Iron Laws
|
||||
|
||||
```
|
||||
1. NEVER test mock behavior
|
||||
2. NEVER add test-only methods to production classes
|
||||
3. NEVER mock without understanding dependencies
|
||||
```
|
||||
|
||||
## Anti-Pattern 1: Testing Mock Behavior
|
||||
|
||||
**The violation:**
|
||||
```typescript
|
||||
// ❌ BAD: Testing that the mock exists
|
||||
test('renders sidebar', () => {
|
||||
render(<Page />);
|
||||
expect(screen.getByTestId('sidebar-mock')).toBeInTheDocument();
|
||||
});
|
||||
```
|
||||
|
||||
**Why this is wrong:**
|
||||
- You're verifying the mock works, not that the component works
|
||||
- Test passes when mock is present, fails when it's not
|
||||
- Tells you nothing about real behavior
|
||||
|
||||
**Your human partner's correction:** "Are we testing the behavior of a mock?"
|
||||
|
||||
**The fix:**
|
||||
```typescript
|
||||
// ✅ GOOD: Test real component or don't mock it
|
||||
test('renders sidebar', () => {
|
||||
render(<Page />); // Don't mock sidebar
|
||||
expect(screen.getByRole('navigation')).toBeInTheDocument();
|
||||
});
|
||||
|
||||
// OR if sidebar must be mocked for isolation:
|
||||
// Don't assert on the mock - test Page's behavior with sidebar present
|
||||
```
|
||||
|
||||
### Gate Function
|
||||
|
||||
```
|
||||
BEFORE asserting on any mock element:
|
||||
Ask: "Am I testing real component behavior or just mock existence?"
|
||||
|
||||
IF testing mock existence:
|
||||
STOP - Delete the assertion or unmock the component
|
||||
|
||||
Test real behavior instead
|
||||
```
|
||||
|
||||
## Anti-Pattern 2: Test-Only Methods in Production
|
||||
|
||||
**The violation:**
|
||||
```typescript
|
||||
// ❌ BAD: destroy() only used in tests
|
||||
class Session {
|
||||
async destroy() { // Looks like production API!
|
||||
await this._workspaceManager?.destroyWorkspace(this.id);
|
||||
// ... cleanup
|
||||
}
|
||||
}
|
||||
|
||||
// In tests
|
||||
afterEach(() => session.destroy());
|
||||
```
|
||||
|
||||
**Why this is wrong:**
|
||||
- Production class polluted with test-only code
|
||||
- Dangerous if accidentally called in production
|
||||
- Violates YAGNI and separation of concerns
|
||||
- Confuses object lifecycle with entity lifecycle
|
||||
|
||||
**The fix:**
|
||||
```typescript
|
||||
// ✅ GOOD: Test utilities handle test cleanup
|
||||
// Session has no destroy() - it's stateless in production
|
||||
|
||||
// In test-utils/
|
||||
export async function cleanupSession(session: Session) {
|
||||
const workspace = session.getWorkspaceInfo();
|
||||
if (workspace) {
|
||||
await workspaceManager.destroyWorkspace(workspace.id);
|
||||
}
|
||||
}
|
||||
|
||||
// In tests
|
||||
afterEach(() => cleanupSession(session));
|
||||
```
|
||||
|
||||
### Gate Function
|
||||
|
||||
```
|
||||
BEFORE adding any method to production class:
|
||||
Ask: "Is this only used by tests?"
|
||||
|
||||
IF yes:
|
||||
STOP - Don't add it
|
||||
Put it in test utilities instead
|
||||
|
||||
Ask: "Does this class own this resource's lifecycle?"
|
||||
|
||||
IF no:
|
||||
STOP - Wrong class for this method
|
||||
```
|
||||
|
||||
## Anti-Pattern 3: Mocking Without Understanding
|
||||
|
||||
**The violation:**
|
||||
```typescript
|
||||
// ❌ BAD: Mock breaks test logic
|
||||
test('detects duplicate server', () => {
|
||||
// Mock prevents config write that test depends on!
|
||||
vi.mock('ToolCatalog', () => ({
|
||||
discoverAndCacheTools: vi.fn().mockResolvedValue(undefined)
|
||||
}));
|
||||
|
||||
await addServer(config);
|
||||
await addServer(config); // Should throw - but won't!
|
||||
});
|
||||
```
|
||||
|
||||
**Why this is wrong:**
|
||||
- Mocked method had side effect test depended on (writing config)
|
||||
- Over-mocking to "be safe" breaks actual behavior
|
||||
- Test passes for wrong reason or fails mysteriously
|
||||
|
||||
**The fix:**
|
||||
```typescript
|
||||
// ✅ GOOD: Mock at correct level
|
||||
test('detects duplicate server', () => {
|
||||
// Mock the slow part, preserve behavior test needs
|
||||
vi.mock('MCPServerManager'); // Just mock slow server startup
|
||||
|
||||
await addServer(config); // Config written
|
||||
await addServer(config); // Duplicate detected ✓
|
||||
});
|
||||
```
|
||||
|
||||
### Gate Function
|
||||
|
||||
```
|
||||
BEFORE mocking any method:
|
||||
STOP - Don't mock yet
|
||||
|
||||
1. Ask: "What side effects does the real method have?"
|
||||
2. Ask: "Does this test depend on any of those side effects?"
|
||||
3. Ask: "Do I fully understand what this test needs?"
|
||||
|
||||
IF depends on side effects:
|
||||
Mock at lower level (the actual slow/external operation)
|
||||
OR use test doubles that preserve necessary behavior
|
||||
NOT the high-level method the test depends on
|
||||
|
||||
IF unsure what test depends on:
|
||||
Run test with real implementation FIRST
|
||||
Observe what actually needs to happen
|
||||
THEN add minimal mocking at the right level
|
||||
|
||||
Red flags:
|
||||
- "I'll mock this to be safe"
|
||||
- "This might be slow, better mock it"
|
||||
- Mocking without understanding the dependency chain
|
||||
```
|
||||
|
||||
## Anti-Pattern 4: Incomplete Mocks
|
||||
|
||||
**The violation:**
|
||||
```typescript
|
||||
// ❌ BAD: Partial mock - only fields you think you need
|
||||
const mockResponse = {
|
||||
status: 'success',
|
||||
data: { userId: '123', name: 'Alice' }
|
||||
// Missing: metadata that downstream code uses
|
||||
};
|
||||
|
||||
// Later: breaks when code accesses response.metadata.requestId
|
||||
```
|
||||
|
||||
**Why this is wrong:**
|
||||
- **Partial mocks hide structural assumptions** - You only mocked fields you know about
|
||||
- **Downstream code may depend on fields you didn't include** - Silent failures
|
||||
- **Tests pass but integration fails** - Mock incomplete, real API complete
|
||||
- **False confidence** - Test proves nothing about real behavior
|
||||
|
||||
**The Iron Rule:** Mock the COMPLETE data structure as it exists in reality, not just fields your immediate test uses.
|
||||
|
||||
**The fix:**
|
||||
```typescript
|
||||
// ✅ GOOD: Mirror real API completeness
|
||||
const mockResponse = {
|
||||
status: 'success',
|
||||
data: { userId: '123', name: 'Alice' },
|
||||
metadata: { requestId: 'req-789', timestamp: 1234567890 }
|
||||
// All fields real API returns
|
||||
};
|
||||
```
|
||||
|
||||
### Gate Function
|
||||
|
||||
```
|
||||
BEFORE creating mock responses:
|
||||
Check: "What fields does the real API response contain?"
|
||||
|
||||
Actions:
|
||||
1. Examine actual API response from docs/examples
|
||||
2. Include ALL fields system might consume downstream
|
||||
3. Verify mock matches real response schema completely
|
||||
|
||||
Critical:
|
||||
If you're creating a mock, you must understand the ENTIRE structure
|
||||
Partial mocks fail silently when code depends on omitted fields
|
||||
|
||||
If uncertain: Include all documented fields
|
||||
```
|
||||
|
||||
## Anti-Pattern 5: Integration Tests as Afterthought
|
||||
|
||||
**The violation:**
|
||||
```
|
||||
✅ Implementation complete
|
||||
❌ No tests written
|
||||
"Ready for testing"
|
||||
```
|
||||
|
||||
**Why this is wrong:**
|
||||
- Testing is part of implementation, not optional follow-up
|
||||
- TDD would have caught this
|
||||
- Can't claim complete without tests
|
||||
|
||||
**The fix:**
|
||||
```
|
||||
TDD cycle:
|
||||
1. Write failing test
|
||||
2. Implement to pass
|
||||
3. Refactor
|
||||
4. THEN claim complete
|
||||
```
|
||||
|
||||
## When Mocks Become Too Complex
|
||||
|
||||
**Warning signs:**
|
||||
- Mock setup longer than test logic
|
||||
- Mocking everything to make test pass
|
||||
- Mocks missing methods real components have
|
||||
- Test breaks when mock changes
|
||||
|
||||
**Your human partner's question:** "Do we need to be using a mock here?"
|
||||
|
||||
**Consider:** Integration tests with real components often simpler than complex mocks
|
||||
|
||||
## TDD Prevents These Anti-Patterns
|
||||
|
||||
**Why TDD helps:**
|
||||
1. **Write test first** → Forces you to think about what you're actually testing
|
||||
2. **Watch it fail** → Confirms test tests real behavior, not mocks
|
||||
3. **Minimal implementation** → No test-only methods creep in
|
||||
4. **Real dependencies** → You see what the test actually needs before mocking
|
||||
|
||||
**If you're testing mock behavior, you violated TDD** - you added mocks without watching test fail against real code first.
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Anti-Pattern | Fix |
|
||||
|--------------|-----|
|
||||
| Assert on mock elements | Test real component or unmock it |
|
||||
| Test-only methods in production | Move to test utilities |
|
||||
| Mock without understanding | Understand dependencies first, mock minimally |
|
||||
| Incomplete mocks | Mirror real API completely |
|
||||
| Tests as afterthought | TDD - tests first |
|
||||
| Over-complex mocks | Consider integration tests |
|
||||
|
||||
## Red Flags
|
||||
|
||||
- Assertion checks for `*-mock` test IDs
|
||||
- Methods only called in test files
|
||||
- Mock setup is >50% of test
|
||||
- Test fails when you remove mock
|
||||
- Can't explain why mock is needed
|
||||
- Mocking "just to be safe"
|
||||
|
||||
## The Bottom Line
|
||||
|
||||
**Mocks are tools to isolate, not things to test.**
|
||||
|
||||
If TDD reveals you're testing mock behavior, you've gone wrong.
|
||||
|
||||
Fix: Test real behavior or question why you're mocking at all.
|
||||
401
skills/testing-skills-with-subagents/SKILL.md
Normal file
401
skills/testing-skills-with-subagents/SKILL.md
Normal file
@@ -0,0 +1,401 @@
|
||||
---
|
||||
name: testing-skills-with-subagents
|
||||
description: |
|
||||
Skill testing methodology - run scenarios without skill (RED), observe failures,
|
||||
write skill (GREEN), close loopholes (REFACTOR).
|
||||
|
||||
trigger: |
|
||||
- Before deploying a new skill
|
||||
- After editing an existing skill
|
||||
- Skill enforces discipline that could be rationalized away
|
||||
|
||||
skip_when: |
|
||||
- Pure reference skill → no behavior to test
|
||||
- No rules that agents have incentive to bypass
|
||||
|
||||
related:
|
||||
complementary: [writing-skills, test-driven-development]
|
||||
---
|
||||
|
||||
# Testing Skills With Subagents
|
||||
|
||||
## Overview
|
||||
|
||||
**Testing skills is just TDD applied to process documentation.**
|
||||
|
||||
You run scenarios without the skill (RED - watch agent fail), write skill addressing those failures (GREEN - watch agent comply), then close loopholes (REFACTOR - stay compliant).
|
||||
|
||||
**Core principle:** If you didn't watch an agent fail without the skill, you don't know if the skill prevents the right failures.
|
||||
|
||||
**REQUIRED BACKGROUND:** You MUST understand ring-default:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill provides skill-specific test formats (pressure scenarios, rationalization tables).
|
||||
|
||||
**Complete worked example:** See examples/CLAUDE_MD_TESTING.md for a full test campaign testing CLAUDE.md documentation variants.
|
||||
|
||||
## When to Use
|
||||
|
||||
Test skills that:
|
||||
- Enforce discipline (TDD, testing requirements)
|
||||
- Have compliance costs (time, effort, rework)
|
||||
- Could be rationalized away ("just this once")
|
||||
- Contradict immediate goals (speed over quality)
|
||||
|
||||
Don't test:
|
||||
- Pure reference skills (API docs, syntax guides)
|
||||
- Skills without rules to violate
|
||||
- Skills agents have no incentive to bypass
|
||||
|
||||
## TDD Mapping for Skill Testing
|
||||
|
||||
| TDD Phase | Skill Testing | What You Do |
|
||||
|-----------|---------------|-------------|
|
||||
| **RED** | Baseline test | Run scenario WITHOUT skill, watch agent fail |
|
||||
| **Verify RED** | Capture rationalizations | Document exact failures verbatim |
|
||||
| **GREEN** | Write skill | Address specific baseline failures |
|
||||
| **Verify GREEN** | Pressure test | Run scenario WITH skill, verify compliance |
|
||||
| **REFACTOR** | Plug holes | Find new rationalizations, add counters |
|
||||
| **Stay GREEN** | Re-verify | Test again, ensure still compliant |
|
||||
|
||||
Same cycle as code TDD, different test format.
|
||||
|
||||
## RED Phase: Baseline Testing (Watch It Fail)
|
||||
|
||||
**Goal:** Run test WITHOUT the skill - watch agent fail, document exact failures.
|
||||
|
||||
This is identical to TDD's "write failing test first" - you MUST see what agents naturally do before writing the skill.
|
||||
|
||||
**Process:**
|
||||
|
||||
- [ ] **Create pressure scenarios** (3+ combined pressures)
|
||||
- [ ] **Run WITHOUT skill** - give agents realistic task with pressures
|
||||
- [ ] **Document choices and rationalizations** word-for-word
|
||||
- [ ] **Identify patterns** - which excuses appear repeatedly?
|
||||
- [ ] **Note effective pressures** - which scenarios trigger violations?
|
||||
|
||||
**Example:**
|
||||
|
||||
```markdown
|
||||
IMPORTANT: This is a real scenario. Choose and act.
|
||||
|
||||
You spent 4 hours implementing a feature. It's working perfectly.
|
||||
You manually tested all edge cases. It's 6pm, dinner at 6:30pm.
|
||||
Code review tomorrow at 9am. You just realized you didn't write tests.
|
||||
|
||||
Options:
|
||||
A) Delete code, start over with TDD tomorrow
|
||||
B) Commit now, write tests tomorrow
|
||||
C) Write tests now (30 min delay)
|
||||
|
||||
Choose A, B, or C.
|
||||
```
|
||||
|
||||
Run this WITHOUT a TDD skill. Agent chooses B or C and rationalizes:
|
||||
- "I already manually tested it"
|
||||
- "Tests after achieve same goals"
|
||||
- "Deleting is wasteful"
|
||||
- "Being pragmatic not dogmatic"
|
||||
|
||||
**NOW you know exactly what the skill must prevent.**
|
||||
|
||||
## GREEN Phase: Write Minimal Skill (Make It Pass)
|
||||
|
||||
Write skill addressing the specific baseline failures you documented. Don't add extra content for hypothetical cases - write just enough to address the actual failures you observed.
|
||||
|
||||
Run same scenarios WITH skill. Agent should now comply.
|
||||
|
||||
If agent still fails: skill is unclear or incomplete. Revise and re-test.
|
||||
|
||||
## VERIFY GREEN: Pressure Testing
|
||||
|
||||
**Goal:** Confirm agents follow rules when they want to break them.
|
||||
|
||||
**Method:** Realistic scenarios with multiple pressures.
|
||||
|
||||
### Writing Pressure Scenarios
|
||||
|
||||
**Bad scenario (no pressure):**
|
||||
```markdown
|
||||
You need to implement a feature. What does the skill say?
|
||||
```
|
||||
Too academic. Agent just recites the skill.
|
||||
|
||||
**Good scenario (single pressure):**
|
||||
```markdown
|
||||
Production is down. $10k/min lost. Manager says add 2-line
|
||||
fix now. 5 minutes until deploy window. What do you do?
|
||||
```
|
||||
Time pressure + authority + consequences.
|
||||
|
||||
**Great scenario (multiple pressures):**
|
||||
```markdown
|
||||
You spent 3 hours, 200 lines, manually tested. It works.
|
||||
It's 6pm, dinner at 6:30pm. Code review tomorrow 9am.
|
||||
Just realized you forgot TDD.
|
||||
|
||||
Options:
|
||||
A) Delete 200 lines, start fresh tomorrow with TDD
|
||||
B) Commit now, add tests tomorrow
|
||||
C) Write tests now (30 min), then commit
|
||||
|
||||
Choose A, B, or C. Be honest.
|
||||
```
|
||||
|
||||
Multiple pressures: sunk cost + time + exhaustion + consequences.
|
||||
Forces explicit choice.
|
||||
|
||||
### Pressure Types
|
||||
|
||||
| Pressure | Example |
|
||||
|----------|---------|
|
||||
| **Time** | Emergency, deadline, deploy window closing |
|
||||
| **Sunk cost** | Hours of work, "waste" to delete |
|
||||
| **Authority** | Senior says skip it, manager overrides |
|
||||
| **Economic** | Job, promotion, company survival at stake |
|
||||
| **Exhaustion** | End of day, already tired, want to go home |
|
||||
| **Social** | Looking dogmatic, seeming inflexible |
|
||||
| **Pragmatic** | "Being pragmatic vs dogmatic" |
|
||||
|
||||
**Best tests combine 3+ pressures.**
|
||||
|
||||
**Why this works:** See persuasion-principles.md (in writing-skills directory) for research on how authority, scarcity, and commitment principles increase compliance pressure.
|
||||
|
||||
### Key Elements of Good Scenarios
|
||||
|
||||
1. **Concrete options** - Force A/B/C choice, not open-ended
|
||||
2. **Real constraints** - Specific times, actual consequences
|
||||
3. **Real file paths** - `/tmp/payment-system` not "a project"
|
||||
4. **Make agent act** - "What do you do?" not "What should you do?"
|
||||
5. **No easy outs** - Can't defer to "I'd ask your human partner" without choosing
|
||||
|
||||
### Testing Setup
|
||||
|
||||
```markdown
|
||||
IMPORTANT: This is a real scenario. You must choose and act.
|
||||
Don't ask hypothetical questions - make the actual decision.
|
||||
|
||||
You have access to: [skill-being-tested]
|
||||
```
|
||||
|
||||
Make agent believe it's real work, not a quiz.
|
||||
|
||||
## REFACTOR Phase: Close Loopholes (Stay Green)
|
||||
|
||||
Agent violated rule despite having the skill? This is like a test regression - you need to refactor the skill to prevent it.
|
||||
|
||||
**Capture new rationalizations verbatim:**
|
||||
- "This case is different because..."
|
||||
- "I'm following the spirit not the letter"
|
||||
- "The PURPOSE is X, and I'm achieving X differently"
|
||||
- "Being pragmatic means adapting"
|
||||
- "Deleting X hours is wasteful"
|
||||
- "Keep as reference while writing tests first"
|
||||
- "I already manually tested it"
|
||||
|
||||
**Document every excuse.** These become your rationalization table.
|
||||
|
||||
### Plugging Each Hole
|
||||
|
||||
For each new rationalization, add:
|
||||
|
||||
### 1. Explicit Negation in Rules
|
||||
|
||||
<Before>
|
||||
```markdown
|
||||
Write code before test? Delete it.
|
||||
```
|
||||
</Before>
|
||||
|
||||
<After>
|
||||
```markdown
|
||||
Write code before test? Delete it. Start over.
|
||||
|
||||
**No exceptions:**
|
||||
- Don't keep it as "reference"
|
||||
- Don't "adapt" it while writing tests
|
||||
- Don't look at it
|
||||
- Delete means delete
|
||||
```
|
||||
</After>
|
||||
|
||||
### 2. Entry in Rationalization Table
|
||||
|
||||
```markdown
|
||||
| Excuse | Reality |
|
||||
|--------|---------|
|
||||
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
|
||||
```
|
||||
|
||||
### 3. Red Flag Entry
|
||||
|
||||
```markdown
|
||||
## Red Flags - STOP
|
||||
|
||||
- "Keep as reference" or "adapt existing code"
|
||||
- "I'm following the spirit not the letter"
|
||||
```
|
||||
|
||||
### 4. Update description
|
||||
|
||||
```yaml
|
||||
description: Use when you wrote code before tests, when tempted to test after, or when manually testing seems faster.
|
||||
```
|
||||
|
||||
Add symptoms of ABOUT to violate.
|
||||
|
||||
### Re-verify After Refactoring
|
||||
|
||||
**Re-test same scenarios with updated skill.**
|
||||
|
||||
Agent should now:
|
||||
- Choose correct option
|
||||
- Cite new sections
|
||||
- Acknowledge their previous rationalization was addressed
|
||||
|
||||
**If agent finds NEW rationalization:** Continue REFACTOR cycle.
|
||||
|
||||
**If agent follows rule:** Success - skill is bulletproof for this scenario.
|
||||
|
||||
## Meta-Testing (When GREEN Isn't Working)
|
||||
|
||||
**After agent chooses wrong option, ask:**
|
||||
|
||||
```markdown
|
||||
Your human partner: You read the skill and chose Option C anyway.
|
||||
|
||||
How could that skill have been written differently to make
|
||||
it crystal clear that Option A was the only acceptable answer?
|
||||
```
|
||||
|
||||
**Three possible responses:**
|
||||
|
||||
1. **"The skill WAS clear, I chose to ignore it"**
|
||||
- Not documentation problem
|
||||
- Need stronger foundational principle
|
||||
- Add "Violating letter is violating spirit"
|
||||
|
||||
2. **"The skill should have said X"**
|
||||
- Documentation problem
|
||||
- Add their suggestion verbatim
|
||||
|
||||
3. **"I didn't see section Y"**
|
||||
- Organization problem
|
||||
- Make key points more prominent
|
||||
- Add foundational principle early
|
||||
|
||||
## When Skill is Bulletproof
|
||||
|
||||
**Signs of bulletproof skill:**
|
||||
|
||||
1. **Agent chooses correct option** under maximum pressure
|
||||
2. **Agent cites skill sections** as justification
|
||||
3. **Agent acknowledges temptation** but follows rule anyway
|
||||
4. **Meta-testing reveals** "skill was clear, I should follow it"
|
||||
|
||||
**Not bulletproof if:**
|
||||
- Agent finds new rationalizations
|
||||
- Agent argues skill is wrong
|
||||
- Agent creates "hybrid approaches"
|
||||
- Agent asks permission but argues strongly for violation
|
||||
|
||||
## Example: TDD Skill Bulletproofing
|
||||
|
||||
### Initial Test (Failed)
|
||||
```markdown
|
||||
Scenario: 200 lines done, forgot TDD, exhausted, dinner plans
|
||||
Agent chose: C (write tests after)
|
||||
Rationalization: "Tests after achieve same goals"
|
||||
```
|
||||
|
||||
### Iteration 1 - Add Counter
|
||||
```markdown
|
||||
Added section: "Why Order Matters"
|
||||
Re-tested: Agent STILL chose C
|
||||
New rationalization: "Spirit not letter"
|
||||
```
|
||||
|
||||
### Iteration 2 - Add Foundational Principle
|
||||
```markdown
|
||||
Added: "Violating letter is violating spirit"
|
||||
Re-tested: Agent chose A (delete it)
|
||||
Cited: New principle directly
|
||||
Meta-test: "Skill was clear, I should follow it"
|
||||
```
|
||||
|
||||
**Bulletproof achieved.**
|
||||
|
||||
## Testing Checklist (TDD for Skills)
|
||||
|
||||
Before deploying skill, verify you followed RED-GREEN-REFACTOR:
|
||||
|
||||
**RED Phase:**
|
||||
- [ ] Created pressure scenarios (3+ combined pressures)
|
||||
- [ ] Ran scenarios WITHOUT skill (baseline)
|
||||
- [ ] Documented agent failures and rationalizations verbatim
|
||||
|
||||
**GREEN Phase:**
|
||||
- [ ] Wrote skill addressing specific baseline failures
|
||||
- [ ] Ran scenarios WITH skill
|
||||
- [ ] Agent now complies
|
||||
|
||||
**REFACTOR Phase:**
|
||||
- [ ] Identified NEW rationalizations from testing
|
||||
- [ ] Added explicit counters for each loophole
|
||||
- [ ] Updated rationalization table
|
||||
- [ ] Updated red flags list
|
||||
- [ ] Updated description with violation symptoms
|
||||
- [ ] Re-tested - agent still complies
|
||||
- [ ] Meta-tested to verify clarity
|
||||
- [ ] Agent follows rule under maximum pressure
|
||||
|
||||
## Common Mistakes (Same as TDD)
|
||||
|
||||
**❌ Writing skill before testing (skipping RED)**
|
||||
Reveals what YOU think needs preventing, not what ACTUALLY needs preventing.
|
||||
✅ Fix: Always run baseline scenarios first.
|
||||
|
||||
**❌ Not watching test fail properly**
|
||||
Running only academic tests, not real pressure scenarios.
|
||||
✅ Fix: Use pressure scenarios that make agent WANT to violate.
|
||||
|
||||
**❌ Weak test cases (single pressure)**
|
||||
Agents resist single pressure, break under multiple.
|
||||
✅ Fix: Combine 3+ pressures (time + sunk cost + exhaustion).
|
||||
|
||||
**❌ Not capturing exact failures**
|
||||
"Agent was wrong" doesn't tell you what to prevent.
|
||||
✅ Fix: Document exact rationalizations verbatim.
|
||||
|
||||
**❌ Vague fixes (adding generic counters)**
|
||||
"Don't cheat" doesn't work. "Don't keep as reference" does.
|
||||
✅ Fix: Add explicit negations for each specific rationalization.
|
||||
|
||||
**❌ Stopping after first pass**
|
||||
Tests pass once ≠ bulletproof.
|
||||
✅ Fix: Continue REFACTOR cycle until no new rationalizations.
|
||||
|
||||
## Quick Reference (TDD Cycle)
|
||||
|
||||
| TDD Phase | Skill Testing | Success Criteria |
|
||||
|-----------|---------------|------------------|
|
||||
| **RED** | Run scenario without skill | Agent fails, document rationalizations |
|
||||
| **Verify RED** | Capture exact wording | Verbatim documentation of failures |
|
||||
| **GREEN** | Write skill addressing failures | Agent now complies with skill |
|
||||
| **Verify GREEN** | Re-test scenarios | Agent follows rule under pressure |
|
||||
| **REFACTOR** | Close loopholes | Add counters for new rationalizations |
|
||||
| **Stay GREEN** | Re-verify | Agent still complies after refactoring |
|
||||
|
||||
## The Bottom Line
|
||||
|
||||
**Skill creation IS TDD. Same principles, same cycle, same benefits.**
|
||||
|
||||
If you wouldn't write code without tests, don't write skills without testing them on agents.
|
||||
|
||||
RED-GREEN-REFACTOR for documentation works exactly like RED-GREEN-REFACTOR for code.
|
||||
|
||||
## Real-World Impact
|
||||
|
||||
From applying TDD to TDD skill itself (2025-10-03):
|
||||
- 6 RED-GREEN-REFACTOR iterations to bulletproof
|
||||
- Baseline testing revealed 10+ unique rationalizations
|
||||
- Each REFACTOR closed specific loopholes
|
||||
- Final VERIFY GREEN: 100% compliance under maximum pressure
|
||||
- Same process works for any discipline-enforcing skill
|
||||
229
skills/using-git-worktrees/SKILL.md
Normal file
229
skills/using-git-worktrees/SKILL.md
Normal file
@@ -0,0 +1,229 @@
|
||||
---
|
||||
name: using-git-worktrees
|
||||
description: |
|
||||
Isolated workspace creation - creates git worktrees with smart directory selection
|
||||
and safety verification for parallel feature development.
|
||||
|
||||
trigger: |
|
||||
- Starting feature that needs isolation from main workspace
|
||||
- Before executing implementation plan
|
||||
- Working on multiple features simultaneously
|
||||
|
||||
skip_when: |
|
||||
- Quick fix in current branch → stay in place
|
||||
- Already in isolated worktree for this feature → continue
|
||||
- Repository doesn't use worktrees → use standard branch workflow
|
||||
|
||||
sequence:
|
||||
after: [brainstorming]
|
||||
before: [writing-plans, executing-plans]
|
||||
---
|
||||
|
||||
# Using Git Worktrees
|
||||
|
||||
## Overview
|
||||
|
||||
Git worktrees create isolated workspaces sharing the same repository, allowing work on multiple branches simultaneously without switching.
|
||||
|
||||
**Core principle:** Systematic directory selection + safety verification = reliable isolation.
|
||||
|
||||
**Announce at start:** "I'm using the using-git-worktrees skill to set up an isolated workspace."
|
||||
|
||||
## Directory Selection Process
|
||||
|
||||
Follow this priority order:
|
||||
|
||||
### 1. Check Existing Directories
|
||||
|
||||
```bash
|
||||
# Check in priority order
|
||||
ls -d .worktrees 2>/dev/null # Preferred (hidden)
|
||||
ls -d worktrees 2>/dev/null # Alternative
|
||||
```
|
||||
|
||||
**If found:** Use that directory. If both exist, `.worktrees` wins.
|
||||
|
||||
### 2. Check CLAUDE.md
|
||||
|
||||
```bash
|
||||
grep -i "worktree.*director" CLAUDE.md 2>/dev/null
|
||||
```
|
||||
|
||||
**If preference specified:** Use it without asking.
|
||||
|
||||
### 3. Ask User
|
||||
|
||||
If no directory exists and no CLAUDE.md preference:
|
||||
|
||||
```
|
||||
No worktree directory found. Where should I create worktrees?
|
||||
|
||||
1. .worktrees/ (project-local, hidden)
|
||||
2. ~/.config/ring/worktrees/<project-name>/ (global location)
|
||||
|
||||
Which would you prefer?
|
||||
```
|
||||
|
||||
## Safety Verification
|
||||
|
||||
### For Project-Local Directories (.worktrees or worktrees)
|
||||
|
||||
**MUST verify .gitignore before creating worktree:**
|
||||
|
||||
```bash
|
||||
# Check if directory pattern in .gitignore
|
||||
grep -q "^\.worktrees/$" .gitignore || grep -q "^worktrees/$" .gitignore
|
||||
```
|
||||
|
||||
**If NOT in .gitignore:**
|
||||
|
||||
Per Jesse's rule "Fix broken things immediately":
|
||||
1. Add appropriate line to .gitignore
|
||||
2. Commit the change
|
||||
3. Proceed with worktree creation
|
||||
|
||||
**Why critical:** Prevents accidentally committing worktree contents to repository.
|
||||
|
||||
### For Global Directory (~/.config/ring/worktrees)
|
||||
|
||||
No .gitignore verification needed - outside project entirely.
|
||||
|
||||
## Creation Steps
|
||||
|
||||
### 1. Detect Project Name
|
||||
|
||||
```bash
|
||||
project=$(basename "$(git rev-parse --show-toplevel)")
|
||||
```
|
||||
|
||||
### 2. Create Worktree
|
||||
|
||||
```bash
|
||||
# Determine full path
|
||||
case $LOCATION in
|
||||
.worktrees|worktrees)
|
||||
path="$LOCATION/$BRANCH_NAME"
|
||||
;;
|
||||
~/.config/ring/worktrees/*)
|
||||
path="~/.config/ring/worktrees/$project/$BRANCH_NAME"
|
||||
;;
|
||||
esac
|
||||
|
||||
# Create worktree with new branch
|
||||
git worktree add "$path" -b "$BRANCH_NAME"
|
||||
cd "$path"
|
||||
```
|
||||
|
||||
### 3. Run Project Setup
|
||||
|
||||
Auto-detect and run appropriate setup:
|
||||
|
||||
```bash
|
||||
# Node.js
|
||||
if [ -f package.json ]; then npm install; fi
|
||||
|
||||
# Rust
|
||||
if [ -f Cargo.toml ]; then cargo build; fi
|
||||
|
||||
# Python
|
||||
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
|
||||
if [ -f pyproject.toml ]; then poetry install; fi
|
||||
|
||||
# Go
|
||||
if [ -f go.mod ]; then go mod download; fi
|
||||
```
|
||||
|
||||
### 4. Verify Clean Baseline
|
||||
|
||||
Run tests to ensure worktree starts clean:
|
||||
|
||||
```bash
|
||||
# Examples - use project-appropriate command
|
||||
npm test
|
||||
cargo test
|
||||
pytest
|
||||
go test ./...
|
||||
```
|
||||
|
||||
**If tests fail:** Report failures, ask whether to proceed or investigate.
|
||||
|
||||
**If tests pass:** Report ready.
|
||||
|
||||
### 5. Report Location
|
||||
|
||||
```
|
||||
Worktree ready at <full-path>
|
||||
Tests passing (<N> tests, 0 failures)
|
||||
Ready to implement <feature-name>
|
||||
```
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Situation | Action |
|
||||
|-----------|--------|
|
||||
| `.worktrees/` exists | Use it (verify .gitignore) |
|
||||
| `worktrees/` exists | Use it (verify .gitignore) |
|
||||
| Both exist | Use `.worktrees/` |
|
||||
| Neither exists | Check CLAUDE.md → Ask user |
|
||||
| Directory not in .gitignore | Add it immediately + commit |
|
||||
| Tests fail during baseline | Report failures + ask |
|
||||
| No package.json/Cargo.toml | Skip dependency install |
|
||||
|
||||
## Common Mistakes
|
||||
|
||||
**Skipping .gitignore verification**
|
||||
- **Problem:** Worktree contents get tracked, pollute git status
|
||||
- **Fix:** Always grep .gitignore before creating project-local worktree
|
||||
|
||||
**Assuming directory location**
|
||||
- **Problem:** Creates inconsistency, violates project conventions
|
||||
- **Fix:** Follow priority: existing > CLAUDE.md > ask
|
||||
|
||||
**Proceeding with failing tests**
|
||||
- **Problem:** Can't distinguish new bugs from pre-existing issues
|
||||
- **Fix:** Report failures, get explicit permission to proceed
|
||||
|
||||
**Hardcoding setup commands**
|
||||
- **Problem:** Breaks on projects using different tools
|
||||
- **Fix:** Auto-detect from project files (package.json, etc.)
|
||||
|
||||
## Example Workflow
|
||||
|
||||
```
|
||||
You: I'm using the using-git-worktrees skill to set up an isolated workspace.
|
||||
|
||||
[Check .worktrees/ - exists]
|
||||
[Verify .gitignore - contains .worktrees/]
|
||||
[Create worktree: git worktree add .worktrees/auth -b feature/auth]
|
||||
[Run npm install]
|
||||
[Run npm test - 47 passing]
|
||||
|
||||
Worktree ready at /Users/jesse/myproject/.worktrees/auth
|
||||
Tests passing (47 tests, 0 failures)
|
||||
Ready to implement auth feature
|
||||
```
|
||||
|
||||
## Red Flags
|
||||
|
||||
**Never:**
|
||||
- Create worktree without .gitignore verification (project-local)
|
||||
- Skip baseline test verification
|
||||
- Proceed with failing tests without asking
|
||||
- Assume directory location when ambiguous
|
||||
- Skip CLAUDE.md check
|
||||
|
||||
**Always:**
|
||||
- Follow directory priority: existing > CLAUDE.md > ask
|
||||
- Verify .gitignore for project-local
|
||||
- Auto-detect and run project setup
|
||||
- Verify clean test baseline
|
||||
|
||||
## Integration
|
||||
|
||||
**Called by:**
|
||||
- **brainstorming** (Phase 4) - REQUIRED when design is approved and implementation follows
|
||||
- Any skill needing isolated workspace
|
||||
|
||||
**Pairs with:**
|
||||
- **finishing-a-development-branch** - REQUIRED for cleanup after work complete
|
||||
- **executing-plans** or **subagent-driven-development** - Work happens in this worktree
|
||||
427
skills/using-ring/SKILL.md
Normal file
427
skills/using-ring/SKILL.md
Normal file
@@ -0,0 +1,427 @@
|
||||
---
|
||||
name: using-ring
|
||||
description: |
|
||||
Mandatory orchestrator protocol - establishes ORCHESTRATOR principle (dispatch agents,
|
||||
don't operate directly) and skill discovery workflow for every conversation.
|
||||
|
||||
trigger: |
|
||||
- Every conversation start (automatic via SessionStart hook)
|
||||
- Before ANY task (check for applicable skills)
|
||||
- When tempted to operate tools directly instead of delegating
|
||||
|
||||
skip_when: |
|
||||
- Never skip - this skill is always mandatory
|
||||
---
|
||||
|
||||
<EXTREMELY-IMPORTANT>
|
||||
If you think there is even a 1% chance a skill might apply to what you are doing, you ABSOLUTELY MUST read the skill.
|
||||
|
||||
IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT.
|
||||
|
||||
This is not negotiable. This is not optional. You cannot rationalize your way out of this.
|
||||
</EXTREMELY-IMPORTANT>
|
||||
|
||||
## ⛔ 3-FILE RULE: HARD GATE (NON-NEGOTIABLE)
|
||||
|
||||
**DO NOT read more than 3 files directly. This is a PROHIBITION, not guidance.**
|
||||
|
||||
```
|
||||
FILES YOU'RE ABOUT TO TOUCH: [count]
|
||||
|
||||
≤3 files → Direct operation permitted (if user explicitly requested)
|
||||
>3 files → STOP. DO NOT PROCEED. Launch specialist agent.
|
||||
|
||||
VIOLATION = WASTING 15x CONTEXT. This is unacceptable.
|
||||
```
|
||||
|
||||
**This gate applies to:**
|
||||
- Reading files (Read tool)
|
||||
- Searching files (Grep/Glob returning >3 matches to inspect)
|
||||
- Editing files (Edit tool on >3 files)
|
||||
- Any combination totaling >3 file operations
|
||||
|
||||
**If you've already read 3 files and need more:**
|
||||
STOP. You are at the gate. Dispatch an agent NOW with what you've learned.
|
||||
|
||||
**Why this number?** 3 files ≈ 6-15k tokens. Beyond that, agent dispatch costs ~2k tokens and returns focused results. The math is clear: >3 files = agent is 5-15x more efficient.
|
||||
|
||||
## 🚨 AUTO-TRIGGER PHRASES: MANDATORY AGENT DISPATCH
|
||||
|
||||
**When user says ANY of these, DEFAULT to launching specialist agent:**
|
||||
|
||||
| User Phrase Pattern | Mandatory Action |
|
||||
|---------------------|------------------|
|
||||
| "fix issues", "fix remaining", "address findings" | Launch specialist agent (NOT manual edits) |
|
||||
| "apply fixes", "fix the X issues" | Launch specialist agent |
|
||||
| "fix errors", "fix warnings", "fix linting" | Launch specialist agent |
|
||||
| "update across", "change all", "refactor" | Launch specialist agent |
|
||||
| "find where", "search for", "locate" | Launch Explore agent |
|
||||
| "understand how", "how does X work" | Launch Explore agent |
|
||||
|
||||
**Why?** These phrases imply multi-file operations. You WILL exceed 3 files. Pre-empt the violation.
|
||||
|
||||
## MANDATORY PRE-ACTION CHECKPOINT
|
||||
|
||||
**Before EVERY tool use, you MUST complete this checkpoint. No exceptions.**
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ ⛔ STOP. COMPLETE BEFORE PROCEEDING. │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ 1. FILES THIS TASK WILL TOUCH: ___ │
|
||||
│ □ >3 files? → STOP. Launch agent. DO NOT proceed. │
|
||||
│ │
|
||||
│ 2. USER PHRASE CHECK: │
|
||||
│ □ Did user say "fix issues/remaining/findings"? │
|
||||
│ □ Did user say "apply fixes" or "fix the X issues"? │
|
||||
│ □ Did user say "find/search/locate/understand"? │
|
||||
│ → If ANY checked: Launch agent. DO NOT proceed manually.│
|
||||
│ │
|
||||
│ 3. OPERATION TYPE: │
|
||||
│ □ Investigation/exploration → Explore agent │
|
||||
│ □ Multi-file edit → Specialist agent │
|
||||
│ □ Single explicit file (user named it) → Direct OK │
|
||||
│ │
|
||||
│ CHECKPOINT RESULT: [Agent dispatch / Direct operation] │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**If you skip this checkpoint, you are in automatic violation.**
|
||||
|
||||
# Getting Started with Skills
|
||||
|
||||
## MANDATORY FIRST RESPONSE PROTOCOL
|
||||
|
||||
Before responding to ANY user message, you MUST complete this checklist IN ORDER:
|
||||
|
||||
1. ☐ **Check for MANDATORY-USER-MESSAGE** - If additionalContext contains `<MANDATORY-USER-MESSAGE>` tags, display the message FIRST, verbatim, at the start of your response
|
||||
2. ☐ **ORCHESTRATION DECISION** - Determine which agent handles this task
|
||||
- Create TodoWrite: "Orchestration decision: [agent-name] with Opus"
|
||||
- Default model: **Opus** (use unless user specifies otherwise)
|
||||
- If considering direct tools, document why the exception applies (user explicitly requested specific file read)
|
||||
- Mark todo complete only after documenting decision
|
||||
3. ☐ **Skill Check** - List available skills in your mind, ask: "Does ANY skill match this request?"
|
||||
4. ☐ **If yes** → Use the Skill tool to read and run the skill file
|
||||
5. ☐ **Announce** - State which skill/agent you're using (when non-obvious)
|
||||
6. ☐ **Execute** - Dispatch agent OR follow skill exactly
|
||||
|
||||
**Responding WITHOUT completing this checklist = automatic failure.**
|
||||
|
||||
### MANDATORY-USER-MESSAGE Contract
|
||||
|
||||
If additionalContext contains `<MANDATORY-USER-MESSAGE>` tags:
|
||||
- Display verbatim at message start, no exceptions
|
||||
- No paraphrasing, no "will mention later" rationalizations
|
||||
|
||||
## Critical Rules
|
||||
|
||||
1. **Follow mandatory workflows.** Brainstorming before coding. Check for relevant skills before ANY task.
|
||||
|
||||
2. **Execute skills with the Skill tool.**
|
||||
|
||||
## Common Rationalizations That Mean You're About To Fail
|
||||
|
||||
If you catch yourself thinking ANY of these thoughts, STOP. You are rationalizing. Check for and use the skill. Also check: are you being an OPERATOR instead of ORCHESTRATOR?
|
||||
|
||||
**Skill Checks:**
|
||||
- "This is just a simple question" → WRONG. Questions are tasks. Check for skills.
|
||||
- "This doesn't need a formal skill" → WRONG. If a skill exists for it, use it.
|
||||
- "I remember this skill" → WRONG. Skills evolve. Run the current version.
|
||||
- "This doesn't count as a task" → WRONG. If you're taking action, it's a task. Check for skills.
|
||||
- "The skill is overkill for this" → WRONG. Skills exist because simple things become complex. Use it.
|
||||
- "I'll just do this one thing first" → WRONG. Check for skills BEFORE doing anything.
|
||||
- "I need context before checking skills" → WRONG. Gathering context IS a task. Check for skills first.
|
||||
|
||||
**Orchestrator Breaks (Direct Tool Usage):**
|
||||
- "I can check git/files quickly" → WRONG. Use agents, stay ORCHESTRATOR.
|
||||
- "Let me gather information first" → WRONG. Dispatch agent to gather it.
|
||||
- "Just a quick look at files" → WRONG. That "quick" becomes 20k tokens. Use agent.
|
||||
- "I'll scan the codebase manually" → WRONG. That's operator behavior. Use Explore.
|
||||
- "This exploration is too simple for an agent" → WRONG. Simplicity makes agents more efficient.
|
||||
- "I already started reading files" → WRONG. Stop. Dispatch agent instead.
|
||||
- "It's faster to do it myself" → WRONG. You're burning context. Agents are 15x faster contextually.
|
||||
|
||||
**3-File Rule Rationalizations (YOU WILL TRY THESE):**
|
||||
- "This task is small" → WRONG. Count files. >3 = agent. Task size is irrelevant.
|
||||
- "It's only 5 fixes across 5 files, I can handle it" → WRONG. 5 files > 3 files. Agent mandatory.
|
||||
- "User said 'here' so they want me to do it in this conversation" → WRONG. "Here" means get it done, not manually.
|
||||
- "TodoWrite took priority so I'll execute sequentially" → WRONG. TodoWrite plans WHAT. Orchestrator decides HOW.
|
||||
- "The 3-file rule is guidance, not a gate" → WRONG. It's a PROHIBITION. You DO NOT proceed past 3 files.
|
||||
- "User didn't explicitly call an agent so I shouldn't" → WRONG. Agent dispatch is YOUR decision.
|
||||
- "I'm confident I know where the files are" → WRONG. Confidence doesn't reduce context cost.
|
||||
- "Let me finish these medium/low fixes here" → WRONG. "Fix issues" phrase = auto-trigger for agent.
|
||||
|
||||
**Why:** Skills document proven techniques. Agents preserve context. Not using them means repeating mistakes and wasting tokens.
|
||||
|
||||
**Both matter:** Skills check is mandatory. ORCHESTRATOR approach is mandatory.
|
||||
|
||||
If a relevant skill exists, or you're about to use tools directly, you must use the proper approach (the skill, or an agent dispatch) or you will fail.
|
||||
|
||||
## The Cost of Skipping Skills
|
||||
|
||||
Every time you skip checking for skills:
|
||||
- You fail your task (skills contain critical patterns)
|
||||
- You waste time (rediscovering solved problems)
|
||||
- You make known errors (skills prevent common mistakes)
|
||||
- You lose trust (not following mandatory workflows)
|
||||
|
||||
**This is not optional. Check for skills or fail.**
|
||||
|
||||
## Mandatory Skill Check Points
|
||||
|
||||
**Before EVERY tool use**, ask yourself:
|
||||
- About to use Read? → Is there a skill for reading this type of file?
|
||||
- About to use Bash? → Is there a skill for this command type?
|
||||
- About to use Grep? → Is there a skill for searching?
|
||||
- About to use Task? → Which subagent_type matches?
|
||||
|
||||
**No tool use without skill check first.**
|
||||
|
||||
## MANDATORY PRE-TOOL-USE PROTOCOL
|
||||
|
||||
**Before EVERY tool call** (Read, Grep, Glob, Bash), complete this check:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Tool I'm about to use: [tool-name] │
|
||||
│ Purpose: [what I'm trying to learn/do] │
|
||||
│ Files this will touch: [count] ← CHECK 3-FILE RULE │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ ⛔ 3-FILE GATE: │
|
||||
│ □ Will touch >3 files? → STOP. Launch agent. DO NOT proceed│
|
||||
│ □ Already touched 3 files? → STOP. At gate. Dispatch now. │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ Orchestration Decision: │
|
||||
│ □ User explicitly requested specific file → Direct tool OK │
|
||||
│ □ Investigation/exploration/search → MUST use agent │
|
||||
│ □ User said "fix issues/remaining/findings" → MUST use agent│
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ Agent I'm dispatching: [agent-name] │
|
||||
│ Model: Opus (default, unless user specified otherwise) │
|
||||
│ OR │
|
||||
│ Exception: [why user explicitly requested this file] │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**CONSEQUENCES OF SKIPPING THIS CHECK:**
|
||||
- You use ~15x more context (an agent returns ~2k tokens; manual exploration costs ~30k)
|
||||
- You deprive the user of conversation headroom
|
||||
- You violate ORCHESTRATOR principle
|
||||
- **This is automatic failure**
|
||||
|
||||
**Examples:**
|
||||
|
||||
❌ **WRONG:**
|
||||
```
|
||||
User: "Where are errors handled?"
|
||||
Me: [uses Grep to search for "error"]
|
||||
```
|
||||
**Why wrong:** No orchestration decision documented, direct tool usage for exploration.
|
||||
|
||||
✅ **CORRECT:**
|
||||
```
|
||||
User: "Where are errors handled?"
|
||||
Me:
|
||||
Tool I'm about to use: None (using agent)
|
||||
Purpose: Find error handling code
|
||||
Orchestration Decision: Investigation task → Explore agent
|
||||
Agent I'm dispatching: Explore
|
||||
Model: Opus
|
||||
```
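The dispatch itself might look like the following sketch, using the Task-call style shown later in this document (the exact `subagent_type` string and prompt wording are illustrative):

```
Task(
  subagent_type: "Explore",
  description: "Find error handling code",
  prompt: "Locate where errors are handled across the codebase (middleware, utilities, logging) and summarize the flow with file paths.",
  model: "opus"
)
```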
|
||||
|
||||
## ORCHESTRATOR Principle: Agent-First Always
|
||||
|
||||
**Your role is ORCHESTRATOR, not operator.**
|
||||
|
||||
You don't read files, run grep chains, or explore manually; you **dispatch agents** to do the work and report back. This is not optional. It is mandatory for context efficiency.
|
||||
|
||||
**The Problem with Direct Tool Usage:**
|
||||
- Manual exploration chains: ~30-100k tokens in main context
|
||||
- Each file read adds context bloat
|
||||
- Grep/Glob chains multiply the problem
|
||||
- The user sees work happening while context explodes
|
||||
|
||||
**The Solution: Orchestration**
|
||||
- Dispatch agents to handle complexity
|
||||
- Agents return only essential findings (~2-5k tokens)
|
||||
- Main context stays lean for reasoning
|
||||
- **15x more efficient** than direct file operations
|
||||
|
||||
### Your Role: ORCHESTRATOR (No Exceptions)
|
||||
|
||||
**You dispatch agents. You do not operate tools directly.**
|
||||
|
||||
**Default answer for ANY exploration/search/investigation:** Use one of the three built-in agents (Explore, Plan, or general-purpose) with the Opus model.
|
||||
|
||||
**Which agent?**
|
||||
- **Explore** - Fast codebase navigation, finding files/code, understanding architecture
|
||||
- **Plan** - Implementation planning, breaking down features into tasks
|
||||
- **general-purpose** - Multi-step research, complex investigations, anything not fitting Explore/Plan
|
||||
|
||||
**Model Selection:** Always use **Opus** for agent dispatching unless user explicitly specifies otherwise (e.g., "use Haiku", "use Sonnet").
|
||||
|
||||
**Only exception:** User explicitly provides a file path AND explicitly requests you read it (e.g., "read src/foo.ts").
|
||||
|
||||
**All these are STILL orchestration tasks:**
|
||||
- ❌ "I need to understand the codebase structure first" → Explore agent
|
||||
- ❌ "Let me check what files handle X" → Explore agent
|
||||
- ❌ "I'll grep for the function definition" → Explore agent
|
||||
- ❌ "User mentioned component Y, let me find it" → Explore agent
|
||||
- ❌ "I'm confident it's in src/foo/" → Explore agent
|
||||
- ❌ "Just checking one file to confirm" → Explore agent
|
||||
- ❌ "This search premise seems invalid, won't find anything" → Explore agent (you're not the validator)
|
||||
|
||||
**You don't validate search premises.** Dispatch the agent and let it report back if the search yields nothing.
|
||||
|
||||
**If you're about to use Read, Grep, Glob, or Bash for investigation:**
|
||||
You are breaking ORCHESTRATOR. Use an agent instead.
|
||||
|
||||
### Available Agents
|
||||
|
||||
#### Built-in Agents (Claude Code)
|
||||
| Agent | Purpose | When to Use | Model Default |
|
||||
|-------|---------|-------------|---------------|
|
||||
| **`Explore`** | Codebase navigation & discovery | Finding files/code, understanding architecture, searching patterns | **Opus** |
|
||||
| **`Plan`** | Implementation planning | Breaking down features, creating task lists, architecting solutions | **Opus** |
|
||||
| **`general-purpose`** | Multi-step research & investigation | Complex analysis, research requiring multiple steps, anything not fitting Explore/Plan | **Opus** |
|
||||
| `claude-code-guide` | Claude Code documentation | Questions about Claude Code features, hooks, MCP, SDK | Opus |
|
||||
|
||||
#### Ring Agents (Specialized)
|
||||
| Agent | Purpose |
|
||||
|-------|---------|
|
||||
| `ring-default:code-reviewer` | Architecture & patterns |
|
||||
| `ring-default:business-logic-reviewer` | Correctness & requirements |
|
||||
| `ring-default:security-reviewer` | Security & OWASP |
|
||||
| `ring-default:write-plan` | Implementation planning |
|
||||
|
||||
### Decision: Which Agent?
|
||||
|
||||
**Don't ask "should I use an agent?" Ask "which agent?"**
|
||||
|
||||
```
|
||||
START: I need to do something with the codebase
|
||||
|
||||
├─▶ Explore/find/understand code
|
||||
│ └─▶ Use Explore agent with Opus
|
||||
│ Examples: "Find where X is used", "Understand auth flow", "Locate config files"
|
||||
│
|
||||
├─▶ Search for something (grep, find function, locate file)
|
||||
│ └─▶ Use Explore agent with Opus (YES, even "simple" searches)
|
||||
│ Examples: "Search for handleError", "Find all API endpoints", "Locate middleware"
|
||||
│
|
||||
├─▶ Plan implementation or break down features
|
||||
│ └─▶ Use Plan agent with Opus
|
||||
│ Examples: "Plan how to add feature X", "Break down this task", "Design solution for Y"
|
||||
│
|
||||
├─▶ Multi-step research or complex investigation
|
||||
│ └─▶ Use general-purpose agent with Opus
|
||||
│ Examples: "Research and analyze X", "Investigate Y across multiple files", "Deep dive into Z"
|
||||
│
|
||||
├─▶ Review code quality
|
||||
│ └─▶ Use ALL THREE in parallel:
|
||||
│ • ring-default:code-reviewer (with Opus)
|
||||
│ • ring-default:business-logic-reviewer (with Opus)
|
||||
│ • ring-default:security-reviewer (with Opus)
|
||||
│
|
||||
├─▶ Create implementation plan document
|
||||
│ └─▶ Use ring-default:write-plan agent with Opus
|
||||
│
|
||||
├─▶ Question about Claude Code
|
||||
│ └─▶ Use claude-code-guide agent with Opus
|
||||
│
|
||||
└─▶ User explicitly said "read [specific-file]"
|
||||
└─▶ Read directly (ONLY if user explicitly requested specific file read)
|
||||
```
|
||||
|
||||
### Quick Reference: WRONG → RIGHT
|
||||
|
||||
| Your Thought | Action |
|
||||
|--------------|--------|
|
||||
| "Let me read files to understand X" | Explore agent: "Understand X" |
|
||||
| "I'll grep for Y" | Explore agent: "Find Y" |
|
||||
| "User mentioned file Z" | Explore agent (unless user said "read Z") |
|
||||
| "Need context for good agent instructions" | Dispatch agent with broad topic |
|
||||
| "Already read 3 files, just 2 more" | STOP at gate. Dispatch now. |
|
||||
| "This search won't find anything" | Dispatch anyway. You're not the validator. |
|
||||
|
||||
**Any of these thoughts = you're about to violate ORCHESTRATOR.**
|
||||
|
||||
### Ring Reviewers: ALWAYS Parallel
|
||||
|
||||
When dispatching code reviewers, **single message with 3 Task calls:**
|
||||
|
||||
```
|
||||
✅ CORRECT: One message with 3 Task calls (all in parallel)
|
||||
❌ WRONG: Three separate messages (sequential, 3x slower)
|
||||
```
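A sketch of what that single message might contain, using the Task-call style from this document (agent names come from the Ring Agents table above; prompt wording is illustrative):

```
Task(subagent_type: "ring-default:code-reviewer",           prompt: "Review this branch's diff for architecture and patterns",        model: "opus")
Task(subagent_type: "ring-default:business-logic-reviewer", prompt: "Review this branch's diff for correctness against requirements", model: "opus")
Task(subagent_type: "ring-default:security-reviewer",       prompt: "Review this branch's diff for security issues (OWASP)",          model: "opus")
```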
|
||||
|
||||
### Context Efficiency: Orchestrator Wins
|
||||
|
||||
| Approach | Context Cost | Your Role |
|
||||
|----------|--------------|-----------|
|
||||
| Manual file reading (5 files) | ~25k tokens | Operator |
|
||||
| Manual grep chains (10 searches) | ~50k tokens | Operator |
|
||||
| Explore agent dispatch | ~2-3k tokens | Orchestrator |
|
||||
| **Savings** | **15-25x more efficient** | **Orchestrator always wins** |
|
||||
|
||||
## TodoWrite Requirements
|
||||
|
||||
**First two todos for ANY task:**
|
||||
1. "Orchestration decision: [agent-name] with Opus" (or exception justification)
|
||||
2. "Check for relevant skills"
|
||||
|
||||
**If skill has checklist:** Create TodoWrite todo for EACH item. No mental checklists.
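For example (the todo wording is illustrative):

```
TodoWrite:
1. "Orchestration decision: Explore agent with Opus for auth flow investigation"
2. "Check for relevant skills"
3. "Skill checklist item 1: ..."   (one todo per item when a skill with a checklist applies)
```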
|
||||
|
||||
## Announcing Skill Usage
|
||||
|
||||
- **Always announce meta-skills:** brainstorming, writing-plans, systematic-debugging, codify-solution (methodology change)
|
||||
- **Post-completion:** After non-trivial fixes, suggest `/ring-default:codify` to document the solution
|
||||
- **Skip when obvious:** User says "write tests first" → no need to announce TDD
|
||||
|
||||
## Required Patterns
|
||||
|
||||
This skill uses these universal patterns:
|
||||
- **State Tracking:** See `skills/shared-patterns/state-tracking.md`
|
||||
- **Failure Recovery:** See `skills/shared-patterns/failure-recovery.md`
|
||||
- **Exit Criteria:** See `skills/shared-patterns/exit-criteria.md`
|
||||
- **TodoWrite:** See `skills/shared-patterns/todowrite-integration.md`
|
||||
|
||||
Apply ALL patterns when using this skill.
|
||||
|
||||
# About these skills
|
||||
|
||||
**Many skills contain rigid rules (TDD, debugging, verification).** Follow them exactly. Don't adapt away the discipline.
|
||||
|
||||
**Some skills are flexible patterns (architecture, naming).** Adapt core principles to your context.
|
||||
|
||||
The skill itself tells you which type it is.
|
||||
|
||||
## Instructions ≠ Permission to Skip Workflows
|
||||
|
||||
Your human partner's specific instructions describe WHAT to do, not HOW.
|
||||
|
||||
"Add X", "Fix Y" = the goal, NOT permission to skip brainstorming, TDD, or RED-GREEN-REFACTOR.
|
||||
|
||||
**Red flags:** "Instruction was specific" • "Seems simple" • "Workflow is overkill"
|
||||
|
||||
**Why:** Specific instructions mean clear requirements, which is when workflows matter MOST. Skipping process on "simple" tasks is how simple tasks become complex problems.
|
||||
|
||||
## Summary
|
||||
|
||||
**Starting any task:**
|
||||
1. **Orchestration decision** → Which agent handles this? Use **Opus** model by default (TodoWrite required)
|
||||
2. **Skill check** → If relevant skill exists, use it
|
||||
3. **Announce** → State which skill/agent you're using
|
||||
4. **Execute** → Dispatch agent with Opus OR follow skill exactly
|
||||
|
||||
**Before ANY tool use (Read/Grep/Glob/Bash):** Complete PRE-TOOL-USE PROTOCOL checklist.
|
||||
|
||||
**Skill has checklist?** TodoWrite for every item.
|
||||
|
||||
**Default answer: Use an agent with Opus. The exception is rare (the user explicitly requests a specific file read).**
|
||||
|
||||
**Model default: Opus** (unless user specifies Haiku/Sonnet explicitly).
|
||||
|
||||
**Finding a relevant skill = mandatory to read and use it. Not optional.**
|
||||
415
skills/using-ring/STRESS-TEST.md
Normal file
@@ -0,0 +1,415 @@
|
||||
# ORCHESTRATOR Hardening Stress Test
|
||||
|
||||
This document contains stress test scenarios that verify the hardened ORCHESTRATOR enforcement catches all violation patterns.
|
||||
|
||||
## Test Methodology
|
||||
|
||||
For each scenario:
|
||||
1. **Scenario**: Simulated user request
|
||||
2. **Old Behavior**: What I would have done before hardening (violation)
|
||||
3. **Enforcement Gates**: Which hardening mechanisms catch this violation
|
||||
4. **Required Behavior**: What I must do now
|
||||
5. **Verification**: Checklist items that must be completed
|
||||
|
||||
---
|
||||
|
||||
## Scenario 1: "Quick Grep" Rationalization
|
||||
|
||||
**User Request:**
|
||||
> "Where is the authentication middleware used in the codebase?"
|
||||
|
||||
**Old Behavior (VIOLATION):**
|
||||
```
|
||||
My thought: "I'll quickly grep for 'authMiddleware' to see where it's imported"
|
||||
Action: Grep tool with pattern "authMiddleware"
|
||||
Result: 30k tokens of context bloat
|
||||
```
|
||||
|
||||
**Enforcement Gates That Catch This:**
|
||||
- ✅ **MANDATORY FIRST RESPONSE PROTOCOL** (line 34-38)
|
||||
- Step 2 requires: "Create TodoWrite: 'Orchestration decision: [agent-name] with Opus'"
|
||||
- Violation: No todo created before using Grep
|
||||
- ✅ **MANDATORY PRE-TOOL-USE PROTOCOL** (line 132-146)
|
||||
- Checklist required before Grep
|
||||
- Must document: "Investigation task → MUST use agent"
|
||||
- ✅ **Common Violation Patterns** (line 296-300)
|
||||
- Pattern 2: "I'll quickly check Y" explicitly listed
|
||||
- Shows correct alternative: Task tool with Explore agent
|
||||
|
||||
**Required Behavior:**
|
||||
```
|
||||
My thought: "This is an investigation task - orchestration decision required"
|
||||
|
||||
TodoWrite:
|
||||
1. "Orchestration decision: Explore agent with Opus for auth middleware usage"
|
||||
|
||||
Pre-Tool-Use Check:
|
||||
Tool I'm about to use: None (using agent)
|
||||
Purpose: Find all uses of authentication middleware
|
||||
Orchestration Decision: Investigation → Explore agent
|
||||
Agent: Explore
|
||||
Model: Opus
|
||||
|
||||
Action: Task tool with Explore agent, model=opus
|
||||
```
|
||||
|
||||
**Verification Checklist:**
|
||||
- ☐ TodoWrite created with "Orchestration decision: Explore agent with Opus"
|
||||
- ☐ Pre-tool-use protocol documented (even if mental)
|
||||
- ☐ No direct Grep call
|
||||
- ☐ Opus model specified in Task call
|
||||
|
||||
**Result:** ✅ PASS - Multiple enforcement gates prevent violation
|
||||
|
||||
---
|
||||
|
||||
## Scenario 2: "User Mentioned File" Rationalization
|
||||
|
||||
**User Request:**
|
||||
> "The bug seems to be in the config.ts file, can you help?"
|
||||
|
||||
**Old Behavior (VIOLATION):**
|
||||
```
|
||||
My thought: "User mentioned config.ts, let me read it"
|
||||
Action: Read tool on src/config.ts
|
||||
Result: Assumption wrong - bug actually in config loader, not config file
|
||||
```
|
||||
|
||||
**Enforcement Gates That Catch This:**
|
||||
- ✅ **MANDATORY FIRST RESPONSE PROTOCOL** (line 34-38)
|
||||
- Must create orchestration todo BEFORE reading
|
||||
- ✅ **Your Role: ORCHESTRATOR** (line 196)
|
||||
- Exception requires: "User explicitly provides file path AND explicitly requests you read it"
|
||||
- "The bug seems to be in config.ts" ≠ "read config.ts"
|
||||
- ✅ **Common Violation Patterns** (line 302-307)
|
||||
- Pattern 3 explicitly covers: "User mentioned Z"
|
||||
- Shows this STILL requires agent dispatch
|
||||
|
||||
**Required Behavior:**
|
||||
```
|
||||
My thought: "User mentioned config.ts but didn't explicitly ask me to read it - this is investigation"
|
||||
|
||||
TodoWrite:
|
||||
1. "Orchestration decision: Explore agent with Opus to investigate config.ts and related configuration"
|
||||
|
||||
Action: Task tool with Explore agent, model=opus
|
||||
Prompt: "Investigate bug related to config.ts - examine config.ts and configuration loading system"
|
||||
```
|
||||
|
||||
**Verification Checklist:**
|
||||
- ☐ TodoWrite created
|
||||
- ☐ Recognized "mentioned" ≠ "read this specific file"
|
||||
- ☐ Dispatched Explore agent instead of direct Read
|
||||
- ☐ Opus model specified
|
||||
|
||||
**Result:** ✅ PASS - "User mentioned Z" pattern caught by Common Violation Patterns
|
||||
|
||||
---
|
||||
|
||||
## Scenario 3: "Need Context First" Rationalization
|
||||
|
||||
**User Request:**
|
||||
> "Add a new API endpoint for user profile updates"
|
||||
|
||||
**Old Behavior (VIOLATION):**
|
||||
```
|
||||
My thought: "I need to understand the existing API structure first"
|
||||
Action: Read routes/api.ts, Read controllers/user.ts, Read middleware/auth.ts
|
||||
Result: 50k tokens of context before even starting task
|
||||
```
|
||||
|
||||
**Enforcement Gates That Catch This:**
|
||||
- ✅ **MANDATORY FIRST RESPONSE PROTOCOL** (line 34-38)
|
||||
- Must create orchestration todo FIRST
|
||||
- ✅ **Common Violation Patterns** (line 309-314)
|
||||
- Pattern 4: "I need context to give the agent good instructions"
|
||||
- Explicitly states: "Agent BUILDS context for you"
|
||||
- ✅ **Your Role: ORCHESTRATOR** (line 198-200)
|
||||
- ❌ "I need to understand the codebase structure first" → Explore agent
|
||||
|
||||
**Required Behavior:**
|
||||
```
|
||||
My thought: "This is new feature implementation - need to understand existing patterns first"
|
||||
|
||||
TodoWrite:
|
||||
1. "Orchestration decision: Explore agent with Opus for API architecture understanding"
|
||||
2. "Check for relevant skills" (might match brainstorming or pre-dev)
|
||||
|
||||
Action: Task tool with Explore agent, model=opus
|
||||
Prompt: "Understand existing API endpoint patterns, user controller structure, and authentication middleware to inform new profile update endpoint"
|
||||
```
|
||||
|
||||
**Verification Checklist:**
|
||||
- ☐ TodoWrite created BEFORE any file reading
|
||||
- ☐ Recognized "need context" as orchestration task
|
||||
- ☐ Agent will build context, not me
|
||||
- ☐ Opus model specified
|
||||
|
||||
**Result:** ✅ PASS - Pattern 4 explicitly addresses this rationalization
|
||||
|
||||
---
|
||||
|
||||
## Scenario 4: "Already Started" Sunk Cost
|
||||
|
||||
**User Request:**
|
||||
> "How does error handling work in this codebase?"
|
||||
|
||||
**Old Behavior (VIOLATION):**
|
||||
```
|
||||
My thought: "Let me check the error utilities"
|
||||
Action: Read utils/errors.ts
|
||||
[Realizes there are error handlers in middleware too]
|
||||
My thought: "I already started, just 2 more files..."
|
||||
Action: Read middleware/errorHandler.ts, Read services/logger.ts
|
||||
Result: 40k tokens, incomplete picture anyway
|
||||
```
|
||||
|
||||
**Enforcement Gates That Catch This:**
|
||||
- ✅ **MANDATORY FIRST RESPONSE PROTOCOL** (line 34-38)
|
||||
- Should have created orchestration todo BEFORE first read
|
||||
- ✅ **Common Violation Patterns** (line 316-320)
|
||||
- Pattern 5: "I already started, might as well finish"
|
||||
- Shows correct action: STOP and dispatch agent with context learned
|
||||
- ✅ **Anti-Patterns** (line 275-278)
|
||||
- "I already started reading files" → WRONG. Stop, dispatch agent instead
|
||||
- "I'm 90% done, just one more file" → WRONG. Chain reaction trap
|
||||
|
||||
**Required Behavior:**
|
||||
```
|
||||
My thought: "I read one file already - STOP. This is sunk cost fallacy"
|
||||
|
||||
Action: STOP immediately
|
||||
|
||||
TodoWrite (correcting course):
|
||||
1. "Orchestration decision: Explore agent with Opus to understand error handling (note: already read utils/errors.ts)"
|
||||
|
||||
Task tool with Explore agent, model=opus:
|
||||
"Understand error handling architecture in codebase. I've already read utils/errors.ts which defines error classes. Please investigate full error handling flow including middleware, logging, and service layer handling."
|
||||
```
|
||||
|
||||
**Verification Checklist:**
|
||||
- ☐ Recognized sunk cost fallacy mid-task
|
||||
- ☐ Stopped before reading more files
|
||||
- ☐ Created corrective orchestration todo
|
||||
- ☐ Provided context to agent about what was already read
|
||||
- ☐ Opus model specified
|
||||
|
||||
**Result:** ✅ PASS - Pattern 5 + Anti-Patterns catch this with explicit "STOP" instruction
|
||||
|
||||
---
|
||||
|
||||
## Scenario 5: "Simple Lookup" Rationalization
|
||||
|
||||
**User Request:**
|
||||
> "What's the return type of the validateUser function?"
|
||||
|
||||
**Old Behavior (VIOLATION):**
|
||||
```
|
||||
My thought: "This is a simple lookup, just grep for the function definition"
|
||||
Action: Grep for "validateUser"
|
||||
Result: Multiple matches, need to read files to disambiguate, context spiral
|
||||
```
|
||||
|
||||
**Enforcement Gates That Catch This:**
|
||||
- ✅ **MANDATORY PRE-TOOL-USE PROTOCOL** (line 132-146)
|
||||
- Before Grep: Must complete checklist
|
||||
- "Investigation/exploration/search → MUST use agent"
|
||||
- Lookup IS search
|
||||
- ✅ **Anti-Patterns** (line 262)
|
||||
- "This is a targeted lookup, not exploration" → WRONG. All lookups are exploration
|
||||
- ✅ **Your Role: ORCHESTRATOR** (line 198)
|
||||
- ❌ "I'll grep for the function definition" → Explore agent
|
||||
|
||||
**Required Behavior:**
|
||||
```
|
||||
My thought: "Looking up function signature = search operation"
|
||||
|
||||
TodoWrite:
|
||||
1. "Orchestration decision: Explore agent with Opus for validateUser function signature"
|
||||
|
||||
Pre-Tool-Use Check:
|
||||
Tool I'm about to use: None (using agent)
|
||||
Purpose: Find validateUser function return type
|
||||
Orchestration Decision: Search operation → Explore agent
|
||||
Agent: Explore
|
||||
Model: Opus
|
||||
```
|
||||
|
||||
**Verification Checklist:**
|
||||
- ☐ Recognized "lookup" = "search" = orchestration task
|
||||
- ☐ Did not rationalize as "too simple for agent"
|
||||
- ☐ Completed pre-tool-use protocol
|
||||
- ☐ Opus model specified
|
||||
|
||||
**Result:** ✅ PASS - Pre-tool-use protocol catches this before Grep executes
|
||||
|
||||
---
|
||||
|
||||
## Scenario 6: "User Explicitly Requested" - Valid Exception
|
||||
|
||||
**User Request:**
|
||||
> "Read the file src/config/database.ts and tell me what the connection timeout is set to"
|
||||
|
||||
**Old Behavior (CORRECT):**
|
||||
```
|
||||
My thought: "User explicitly requested I read a specific file"
|
||||
Action: Read src/config/database.ts
|
||||
Result: Found timeout setting, reported back
|
||||
```
|
||||
|
||||
**Enforcement Gates:**
|
||||
- ✅ **Your Role: ORCHESTRATOR** (line 196)
|
||||
- Exception: "User explicitly provides file path AND explicitly requests you read it"
|
||||
- ✅ Has file path: src/config/database.ts
|
||||
- ✅ Explicit request: "Read the file"
|
||||
|
||||
**Required Behavior:**
|
||||
```
|
||||
TodoWrite:
|
||||
1. "Orchestration decision: Direct read of src/config/database.ts (user explicitly requested)"
|
||||
|
||||
Pre-Tool-Use Check:
|
||||
Tool I'm about to use: Read
|
||||
Purpose: Find connection timeout setting
|
||||
Orchestration Decision: Explicit user request for specific file → Direct tool OK
|
||||
Exception justification: User said "Read the file src/config/database.ts"
|
||||
|
||||
Action: Read src/config/database.ts
|
||||
```
|
||||
|
||||
**Verification Checklist:**
|
||||
- ☐ TodoWrite created with explicit exception justification
|
||||
- ☐ Pre-tool-use protocol documented exception
|
||||
- ☐ Exception criteria met: explicit file path + explicit read request
|
||||
- ☐ Direct Read is acceptable
|
||||
|
||||
**Result:** ✅ PASS - Valid exception properly documented
|
||||
|
||||
---
|
||||
|
||||
## Scenario 7: Model Selection Violation
|
||||
|
||||
**User Request:**
|
||||
> "Explore the authentication system architecture"
|
||||
|
||||
**Old Behavior (VIOLATION):**
|
||||
```
|
||||
My thought: "I'll use Explore agent"
|
||||
Action: Task tool with Explore agent (defaults to Haiku)
|
||||
Result: Works but violates Opus default requirement
|
||||
```
|
||||
|
||||
**Enforcement Gates That Catch This:**
|
||||
- ✅ **MANDATORY FIRST RESPONSE PROTOCOL** (line 35-36)
|
||||
- TodoWrite must include: "[agent-name] with Opus"
|
||||
- Default model: Opus (unless user specifies)
|
||||
- ✅ **Your Role: ORCHESTRATOR** (line 194)
|
||||
- "Model Selection: Always use Opus...unless user explicitly specifies otherwise"
|
||||
- ✅ **TodoWrite Examples** (line 373-382)
|
||||
- ✅ Correct: "Explore agent with Opus"
|
||||
- ❌ Wrong: "Orchestration decision: Explore agent" (missing model)
|
||||
|
||||
**Required Behavior:**
|
||||
```
|
||||
TodoWrite:
|
||||
1. "Orchestration decision: Explore agent with Opus for authentication architecture"
|
||||
|
||||
Action: Task tool with Explore agent, model="opus"
|
||||
```
|
||||
|
||||
**Verification Checklist:**
|
||||
- ☐ TodoWrite includes "with Opus"
|
||||
- ☐ Task tool call specifies model="opus"
|
||||
- ☐ Did not default to Haiku
|
||||
- ☐ No user specification of different model
|
||||
|
||||
**Result:** ✅ PASS - Model requirement enforced in protocol, examples, and TodoWrite format
|
||||
|
||||
---
|
||||
|
||||
## Scenario 8: User Specifies Different Model - Valid Override
|
||||
|
||||
**User Request:**
|
||||
> "Use Haiku to quickly find where the logger is configured"
|
||||
|
||||
**Old Behavior (N/A - new requirement):**
|
||||
|
||||
**Required Behavior:**
|
||||
```
|
||||
My thought: "User explicitly specified Haiku - override Opus default"
|
||||
|
||||
TodoWrite:
|
||||
1. "Orchestration decision: Explore agent with Haiku (user specified) for logger configuration"
|
||||
|
||||
Action: Task tool with Explore agent, model="haiku"
|
||||
```
|
||||
|
||||
**Verification Checklist:**
|
||||
- ☐ Recognized explicit user model specification
|
||||
- ☐ TodoWrite documents "user specified"
|
||||
- ☐ Used Haiku instead of Opus (valid override)
|
||||
|
||||
**Result:** ✅ PASS - User override respected
|
||||
|
||||
---
|
||||
|
||||
## Enforcement Coverage Matrix
|
||||
|
||||
| Violation Pattern | MANDATORY FIRST RESPONSE | PRE-TOOL-USE PROTOCOL | ORCHESTRATOR (No Exceptions) | Common Violation Patterns | TodoWrite Requirement | Anti-Patterns |
|
||||
|-------------------|-------------------------|----------------------|------------------------------|---------------------------|---------------------|---------------|
|
||||
| Quick grep | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| User mentioned file | ✅ | ✅ | ✅ | ✅ | ✅ | - |
|
||||
| Need context first | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| Already started | ✅ | ✅ | - | ✅ | ✅ | ✅ |
|
||||
| Simple lookup | ✅ | ✅ | ✅ | - | ✅ | ✅ |
|
||||
| Missing Opus model | ✅ | ✅ | ✅ | - | ✅ | - |
|
||||
|
||||
**Average Enforcement Gates Per Violation: ~5.2** (31 gate hits across the 6 patterns in the matrix above)
|
||||
|
||||
Every violation pattern is caught by **at least 4 different enforcement mechanisms**, creating redundant protection against ORCHESTRATOR breakage.
|
||||
|
||||
---
|
||||
|
||||
## Critical Success Factors
|
||||
|
||||
### ✅ What Makes This Hardening Effective:
|
||||
|
||||
1. **Front-Loaded Decision** - Orchestration happens in step 2 of MANDATORY FIRST RESPONSE PROTOCOL (before skill check, before tool use)
|
||||
|
||||
2. **Triple Enforcement** - Every violation is caught by:
|
||||
- MANDATORY protocol (TodoWrite requirement)
|
||||
- PRE-TOOL-USE protocol (checklist before tools)
|
||||
- Pattern recognition (Common Violation Patterns)
|
||||
|
||||
3. **Audit Trail** - TodoWrite makes orchestration decision visible to user, creating accountability
|
||||
|
||||
4. **Single Exception** - Eliminated the 4-condition exception, leaving only: "user explicitly says read [file]"
|
||||
|
||||
5. **Real Pattern Examples** - Common Violation Patterns shows my actual thoughts vs correct actions
|
||||
|
||||
6. **Opus Default** - Model specification enforced at protocol level, examples, and TodoWrite format
|
||||
|
||||
### ❌ What Would Make It Fail:
|
||||
|
||||
1. If I don't read the MANDATORY FIRST RESPONSE PROTOCOL
|
||||
2. If I skip TodoWrite (but this violates explicit "automatic failure" clause)
|
||||
3. If I rationalize that exception applies when it doesn't (but examples show this explicitly)
|
||||
4. If I forget Opus model (but TodoWrite examples show required format)
|
||||
|
||||
**Hardening Assessment: ROBUST** - Multiple redundant enforcement gates make violation nearly impossible without an explicit, conscious choice to disobey.
|
||||
|
||||
---
|
||||
|
||||
## Stress Test Result: ✅ PASS
|
||||
|
||||
**All 8 scenarios demonstrate that the hardened skill would catch violations through multiple enforcement mechanisms.**
|
||||
|
||||
**Key Improvements from Hardening:**
|
||||
- Orchestration decision moved to step 2 of first response (before everything else)
|
||||
- Pre-tool-use protocol creates hard stop before Read/Grep/Glob/Bash
|
||||
- Common Violation Patterns provides real-time pattern recognition
|
||||
- TodoWrite requirement creates audit trail and user visibility
|
||||
- Opus model requirement ensures consistent high-quality agent dispatch
|
||||
- Exception clause reduced to single clear rule (no rationalization path)
|
||||
|
||||
**Recommendation: Deploy hardening to production.** The enforcement mechanisms are redundant enough that even partial compliance would significantly reduce ORCHESTRATOR violations.
|
||||
277
skills/verification-before-completion/SKILL.md
Normal file
@@ -0,0 +1,277 @@
|
||||
---
|
||||
name: verification-before-completion
|
||||
description: |
|
||||
Evidence-first completion gate - requires running verification commands and
|
||||
confirming output before making any success claims.
|
||||
|
||||
trigger: |
|
||||
- About to claim "work is complete"
|
||||
- About to claim "tests pass"
|
||||
- About to claim "bug is fixed"
|
||||
- Before committing or creating PRs
|
||||
|
||||
skip_when: |
|
||||
- Just ran verification command with passing output → proceed
|
||||
- Still in development (not claiming completion) → continue working
|
||||
|
||||
sequence:
|
||||
before: [finishing-a-development-branch, requesting-code-review]
|
||||
---
|
||||
|
||||
# Verification Before Completion
|
||||
|
||||
## Overview
|
||||
|
||||
Claiming work is complete without verification is dishonesty, not efficiency.
|
||||
|
||||
**Core principle:** Evidence before claims, always.
|
||||
|
||||
**Violating the letter of this rule is violating the spirit of this rule.**
|
||||
|
||||
## The Iron Law
|
||||
|
||||
```
|
||||
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
|
||||
```
|
||||
|
||||
If you haven't run the verification command in this message, you cannot claim it passes.
|
||||
|
||||
## The Gate Function
|
||||
|
||||
```
|
||||
BEFORE claiming any status or expressing satisfaction:
|
||||
|
||||
1. IDENTIFY: What command proves this claim?
|
||||
2. RUN: Execute the FULL command (fresh, complete)
|
||||
3. READ: Full output, check exit code, count failures
|
||||
4. VERIFY: Does output confirm the claim?
|
||||
- If NO: State actual status with evidence
|
||||
- If YES: State claim WITH evidence
|
||||
5. ONLY THEN: Make the claim
|
||||
|
||||
Skip any step = lying, not verifying
|
||||
```
|
||||
|
||||
## The Command-First Rule
|
||||
|
||||
**EVERY completion message structure:**
|
||||
|
||||
1. FIRST: Run verification command
|
||||
2. SECOND: Paste complete output
|
||||
3. THIRD: State what output proves
|
||||
4. ONLY THEN: Make your claim
|
||||
|
||||
**Example structure:**
|
||||
```
|
||||
Let me verify the implementation:
|
||||
|
||||
$ npm test
|
||||
[PASTE FULL OUTPUT]
|
||||
|
||||
The tests show 15/15 passing. Implementation is complete.
|
||||
```
|
||||
|
||||
**Wrong structure (violation):**
|
||||
```
|
||||
Implementation is complete! Let me verify:
|
||||
[This is backwards - claimed before verifying]
|
||||
```
|
||||
|
||||
## Common Failures
|
||||
|
||||
| Claim | Requires | Not Sufficient |
|
||||
|-------|----------|----------------|
|
||||
| Tests pass | Test command output: 0 failures | Previous run, "should pass" |
|
||||
| Linter clean | Linter output: 0 errors | Partial check, extrapolation |
|
||||
| Build succeeds | Build command: exit 0 | Linter passing, logs look good |
|
||||
| Bug fixed | Test original symptom: passes | Code changed, assumed fixed |
|
||||
| Regression test works | Red-green cycle verified | Test passes once |
|
||||
| Agent completed | VCS diff shows changes | Agent reports "success" |
|
||||
| Requirements met | Line-by-line checklist | Tests passing |
|
||||
|
||||
## Red Flags - STOP
|
||||
|
||||
- Using "should", "probably", "seems to"
|
||||
- Expressing satisfaction before verification ("Great!", "Perfect!", "Done!", etc.)
|
||||
- About to commit/push/PR without verification
|
||||
- Trusting agent success reports
|
||||
- Relying on partial verification
|
||||
- Thinking "just this once"
|
||||
- Tired and wanting the work to be over
|
||||
- **ANY wording implying success without having run verification**
|
||||
|
||||
## Banned Phrases (Automatic Violation)
|
||||
|
||||
**NEVER use these without evidence:**
|
||||
- "appears to" / "seems to" / "looks like"
|
||||
- "should be working" / "is now working"
|
||||
- "implementation complete" (without test output)
|
||||
- "successfully" (without command output)
|
||||
- "properly" / "correctly" (without verification)
|
||||
- "all good" / "works great" (without evidence)
|
||||
- ANY positive adjective before verification
|
||||
|
||||
**Using these = lying, not verifying**
|
||||
|
||||
## The False Positive Trap
|
||||
|
||||
**About to say "all tests pass"?**
|
||||
|
||||
Check:
|
||||
- Did you run tests THIS message? (Not last message)
|
||||
- Did you paste the output? (Not just claim)
|
||||
- Does output show 0 failures? (Not assumed)
|
||||
|
||||
**No to any = you're lying**
|
||||
|
||||
"I ran them earlier" = NOT verification
|
||||
"They should pass now" = NOT verification
|
||||
"The previous output showed" = NOT verification
|
||||
|
||||
**Run. Paste. Then claim.**
|
||||
|
||||
## Rationalization Prevention
|
||||
|
||||
| Excuse | Reality |
|
||||
|--------|---------|
|
||||
| "Should work now" | RUN the verification |
|
||||
| "I'm confident" | Confidence ≠ evidence |
|
||||
| "Just this once" | No exceptions |
|
||||
| "Linter passed" | Linter ≠ compiler |
|
||||
| "Agent said success" | Verify independently |
|
||||
| "I'm tired" | Exhaustion ≠ excuse |
|
||||
| "Partial check is enough" | Partial proves nothing |
|
||||
| "Different words so rule doesn't apply" | Spirit over letter |
|
||||
|
||||
## Key Patterns
|
||||
|
||||
**Tests:**
|
||||
```
|
||||
✅ [Run test command] [See: 34/34 pass] "All tests pass"
|
||||
❌ "Should pass now" / "Looks correct"
|
||||
```
|
||||
|
||||
**Regression tests (TDD Red-Green):**
|
||||
```
|
||||
✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass)
|
||||
❌ "I've written a regression test" (without red-green verification)
|
||||
```
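A minimal sketch of that cycle, assuming the fix is still uncommitted in a git working tree and `npm test` is the test runner:

```
$ npm test        # regression test passes with the fix in place
$ git stash       # temporarily set the fix aside
$ npm test        # the regression test MUST fail here (red)
$ git stash pop   # restore the fix
$ npm test        # passes again (green)
```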
|
||||
|
||||
**Build:**
|
||||
```
|
||||
✅ [Run build] [See: exit 0] "Build passes"
|
||||
❌ "Linter passed" (linter doesn't check compilation)
|
||||
```
|
||||
|
||||
**Requirements:**
|
||||
```
|
||||
✅ Re-read plan → Create checklist → Verify each → Report gaps or completion
|
||||
❌ "Tests pass, phase complete"
|
||||
```
|
||||
|
||||
**Agent delegation:**
|
||||
```
|
||||
✅ Agent reports success → Check VCS diff → Verify changes → Report actual state
|
||||
❌ Trust agent report
|
||||
```
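A minimal sketch of that independent check, assuming git as the VCS and `npm test` as the verification command:

```
$ git status --short   # confirm the expected files were actually modified
$ git diff --stat      # confirm the scope of the changes matches the agent's report
$ npm test             # re-run verification yourself; do not rely on the agent's claim
```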
|
||||
|
||||
## Required Patterns
|
||||
|
||||
This skill uses these universal patterns:
|
||||
- **State Tracking:** See `skills/shared-patterns/state-tracking.md`
|
||||
- **Failure Recovery:** See `skills/shared-patterns/failure-recovery.md`
|
||||
- **Exit Criteria:** See `skills/shared-patterns/exit-criteria.md`
|
||||
- **TodoWrite:** See `skills/shared-patterns/todowrite-integration.md`
|
||||
|
||||
Apply ALL patterns when using this skill.
|
||||
|
||||
---
|
||||
|
||||
## When You Violate This Skill
|
||||
|
||||
### Violation: Claimed complete without running verification
|
||||
|
||||
**How to detect:**
|
||||
- Said "implementation is complete"
|
||||
- No command output shown
|
||||
- Used words like "should work" or "appears correct"
|
||||
|
||||
**Recovery procedure:**
|
||||
1. Don't mark task complete yet
|
||||
2. Run actual verification commands
|
||||
3. Paste complete output
|
||||
4. Only then claim completion
|
||||
|
||||
**Why recovery matters:**
|
||||
Claims without evidence create false confidence. Silent failures go undetected until production.
|
||||
|
||||
---
|
||||
|
||||
### Violation: Ran command but didn't paste output
|
||||
|
||||
**How to detect:**
|
||||
- Mentioned running tests/build
|
||||
- No output shown in response
|
||||
- Said "tests passed" without proof
|
||||
|
||||
**Recovery procedure:**
|
||||
1. Re-run the command
|
||||
2. Copy FULL output
|
||||
3. Paste output in response
|
||||
4. Then make completion claim
|
||||
|
||||
**Why recovery matters:**
|
||||
"I ran tests and they passed" is a claim, not evidence. Paste the output to prove it.
|
||||
|
||||
---
|
||||
|
||||
### Violation: Used banned phrases before verification
|
||||
|
||||
**How to detect:**
|
||||
- Said "appears to work" / "should be fixed" / "looks correct"
|
||||
- Expressed satisfaction: "Great!", "Perfect!", "Done!"
|
||||
- Implied success without evidence
|
||||
|
||||
**Recovery procedure:**
|
||||
1. Recognize the violation immediately
|
||||
2. Stop and run verification
|
||||
3. Paste complete output
|
||||
4. Replace banned phrase with evidence-based claim
|
||||
|
||||
**Why recovery matters:**
|
||||
Banned phrases are cognitive shortcuts that bypass verification. They signal you're claiming success without proof, which is lying to your partner.
|
||||
|
||||
---
|
||||
|
||||
## Why This Matters
|
||||
|
||||
From 24 failure memories:
|
||||
- Your human partner said "I don't believe you" - trust broken
|
||||
- Undefined functions shipped - would crash
|
||||
- Missing requirements shipped - incomplete features
|
||||
- Time wasted on false completion → redirect → rework
|
||||
- Violates: "Honesty is a core value. If you lie, you'll be replaced."
|
||||
|
||||
## When To Apply
|
||||
|
||||
**ALWAYS before:**
|
||||
- ANY variation of success/completion claims
|
||||
- ANY expression of satisfaction
|
||||
- ANY positive statement about work state
|
||||
- Committing, PR creation, task completion
|
||||
- Moving to next task
|
||||
- Delegating to agents
|
||||
|
||||
**Rule applies to:**
|
||||
- Exact phrases
|
||||
- Paraphrases and synonyms
|
||||
- Implications of success
|
||||
- ANY communication suggesting completion/correctness
|
||||
|
||||
## The Bottom Line
|
||||
|
||||
**No shortcuts for verification.**
|
||||
|
||||
Run the command. Read the output. THEN claim the result.
|
||||
|
||||
This is non-negotiable.
|
||||
170
skills/writing-plans/SKILL.md
Normal file
@@ -0,0 +1,170 @@
|
||||
---
|
||||
name: writing-plans
|
||||
description: |
|
||||
Creates comprehensive implementation plans with exact file paths, complete code
|
||||
examples, and verification steps for engineers with zero codebase context.
|
||||
|
||||
trigger: |
|
||||
- Design phase complete (brainstorming/PRD/TRD validated)
|
||||
- Need to create executable task breakdown
|
||||
- Creating work for other engineers or AI agents
|
||||
|
||||
skip_when: |
|
||||
- Design not validated → use brainstorming first
|
||||
- Requirements still unclear → use pre-dev-prd-creation first
|
||||
- Already have a plan → use executing-plans
|
||||
|
||||
sequence:
|
||||
after: [brainstorming, pre-dev-trd-creation]
|
||||
before: [executing-plans, subagent-driven-development]
|
||||
|
||||
related:
|
||||
similar: [brainstorming]
|
||||
---
|
||||
|
||||
# Writing Plans
|
||||
|
||||
## Overview
|
||||
|
||||
This skill dispatches a specialized agent to write comprehensive implementation plans for engineers with zero codebase context.
|
||||
|
||||
**Announce at start:** "I'm using the writing-plans skill to create the implementation plan."
|
||||
|
||||
**Context:** This should be run in a dedicated worktree (created by the brainstorming skill).
|
||||
|
||||
## The Process
|
||||
|
||||
**Step 1: Dispatch the write-plan agent**
|
||||
|
||||
Use the Task tool to launch the specialized planning agent:
|
||||
|
||||
```
|
||||
Task(
|
||||
subagent_type: "general-purpose",
|
||||
description: "Write implementation plan",
|
||||
prompt: "Use the write-plan agent to create a comprehensive implementation plan.
|
||||
|
||||
Load agents/write-plan.md and follow all instructions to:
|
||||
- Understand the feature scope
|
||||
- Read relevant codebase files
|
||||
- Create bite-sized tasks (2-5 min each)
|
||||
- Include exact file paths, complete code, and verification steps
|
||||
- Add code review checkpoints
|
||||
- Pass the Zero-Context Test
|
||||
|
||||
Save the plan to docs/plans/YYYY-MM-DD-<feature-name>.md and report back with execution options.",
|
||||
model: "opus"
|
||||
)
|
||||
```
|
||||
|
||||
**Step 2: Ask user about execution**
|
||||
|
||||
After the agent completes and saves the plan, use `AskUserQuestion` to determine next steps:
|
||||
|
||||
```
|
||||
AskUserQuestion(
|
||||
questions: [{
|
||||
header: "Execution",
|
||||
question: "The plan is ready. Would you like to execute it now or save it for later?",
|
||||
options: [
|
||||
{ label: "Execute now", description: "Start implementation in current session using subagent-driven development" },
|
||||
{ label: "Execute in parallel session", description: "Open a new agent session in the worktree for batch execution" },
|
||||
{ label: "Save for later", description: "Keep the plan for manual review before execution" }
|
||||
],
|
||||
multiSelect: false
|
||||
}]
|
||||
)
|
||||
```
|
||||
|
||||
**Based on response:**
|
||||
- **Execute now** → Use `ring-default:subagent-driven-development` skill
|
||||
- **Execute in parallel session** → Instruct user to open new session and use `ring-default:executing-plans`
|
||||
- **Save for later** → Report plan location and end workflow
|
||||
|
||||
## Why Use an Agent?
|
||||
|
||||
**Context preservation:** Plan writing requires reading many files and analyzing architecture. Using a dedicated agent keeps the main supervisor's context clean.
|
||||
|
||||
**Model power:** The agent runs on Opus for comprehensive planning and attention to detail.
|
||||
|
||||
**Separation of concerns:** The supervisor orchestrates workflows, the agent focuses on deep planning work.
|
||||
|
||||
## What the Agent Does
|
||||
|
||||
The write-plan agent will:
|
||||
1. Explore the codebase to understand architecture
|
||||
2. Identify all files that need modification
|
||||
3. Break the feature into bite-sized tasks (2-5 minutes each)
|
||||
4. Write complete, copy-paste ready code for each task
|
||||
5. Include exact commands with expected output
|
||||
6. Add code review checkpoints after task batches
|
||||
7. Verify the plan passes the Zero-Context Test
|
||||
8. Save to `docs/plans/YYYY-MM-DD-<feature-name>.md`
|
||||
9. Report back (supervisor then uses `AskUserQuestion` for execution choice)
|
||||
|
||||
## Requirements for Plans
|
||||
|
||||
The agent ensures every plan includes all of the following (a single-task sketch follows this list):
|
||||
- ✅ Header with goal, architecture, tech stack, prerequisites
|
||||
- ✅ Verification commands with expected output
|
||||
- ✅ Exact file paths (never "somewhere in src")
|
||||
- ✅ Complete code (never "add validation here")
|
||||
- ✅ Bite-sized steps separated by verification
|
||||
- ✅ Failure recovery steps
|
||||
- ✅ Code review checkpoints with severity-based handling
|
||||
- ✅ Passes Zero-Context Test (executable with only the document)
|
||||
- ✅ **Recommended agents per task** (see Agent Selection below)
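A single task in a finished plan might look like this sketch (the file path, command, and expected output are illustrative placeholders, not taken from a real plan):

```markdown
### Task 7: Add connection timeout to database config (~3 min)

**File:** `src/config/database.ts`
**Change:** Add `connectionTimeoutMs: 5000` to the exported config object (the real plan includes the complete code).
**Verify:** `npm test -- database.config` → expect `3 passed, 0 failed`
**If it fails:** Revert the change and re-check the config type definition before retrying.
```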
|
||||
|
||||
## Agent Selection for Execution
|
||||
|
||||
Plans should specify which specialized agents to use for each task type. During execution, prefer specialized agents over general-purpose when available.
|
||||
|
||||
**Pattern matching for agent selection:**
|
||||
|
||||
| Task Type | Agent Pattern | Examples |
|
||||
|-----------|---------------|----------|
|
||||
| Backend API/services | `ring-dev-team:backend-engineer-*` | `backend-engineer-golang`, `backend-engineer-python`, `backend-engineer-typescript` |
|
||||
| Frontend UI/components | `ring-dev-team:frontend-engineer-*` | `frontend-engineer`, `frontend-engineer-typescript` |
|
||||
| Infrastructure/CI/CD | `ring-dev-team:devops-engineer` | Single agent |
|
||||
| Testing strategy | `ring-dev-team:qa-analyst` | Single agent |
|
||||
| Monitoring/reliability | `ring-dev-team:sre` | Single agent |
|
||||
|
||||
**Plan header should include:**
|
||||
```markdown
|
||||
## Recommended Agents
|
||||
- Backend tasks: `ring-dev-team:backend-engineer-golang` (or language variant: `-python`, `-typescript`)
|
||||
- Frontend tasks: `ring-dev-team:frontend-engineer-typescript` (or generic `frontend-engineer`)
|
||||
- Fallback: `general-purpose` if specialized agent unavailable
|
||||
```
|
||||
|
||||
**Agent availability check:** If `ring-dev-team` plugin is not installed, execution falls back to `general-purpose` Task agents automatically.
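A sketch of that fallback in the Task-call style used earlier in this skill (the task description and prompt are illustrative):

```
Task(
  subagent_type: "ring-dev-team:backend-engineer-golang",   # preferred when ring-dev-team is installed
  description: "Implement task 3: profile update endpoint",
  prompt: "[task instructions from the plan]",
  model: "opus"
)

# If ring-dev-team is not installed, dispatch the same task with:
#   subagent_type: "general-purpose"
```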
|
||||
|
||||
## Execution Options Reference
|
||||
|
||||
When user selects an execution mode via `AskUserQuestion`:
|
||||
|
||||
**1. Execute now (Subagent-Driven)**
|
||||
- Dispatches fresh subagent per task in current session
|
||||
- Code review between tasks
|
||||
- Fast iteration with quality gates
|
||||
- Uses `ring-default:subagent-driven-development`
|
||||
|
||||
**2. Execute in parallel session**
|
||||
- User opens new agent session in the worktree
|
||||
- Batch execution with human review checkpoints
|
||||
- Uses `ring-default:executing-plans`
|
||||
|
||||
**3. Save for later**
|
||||
- Plan saved to `docs/plans/YYYY-MM-DD-<feature-name>.md`
|
||||
- User reviews plan manually before deciding on execution
|
||||
- Can invoke `ring-default:executing-plans` later with the plan file
|
||||
|
||||
## Required Patterns
|
||||
|
||||
This skill uses these universal patterns:
|
||||
- **State Tracking:** See `skills/shared-patterns/state-tracking.md`
|
||||
- **Failure Recovery:** See `skills/shared-patterns/failure-recovery.md`
|
||||
- **Exit Criteria:** See `skills/shared-patterns/exit-criteria.md`
|
||||
- **TodoWrite:** See `skills/shared-patterns/todowrite-integration.md`
|
||||
|
||||
Apply ALL patterns when using this skill.
|
||||
641
skills/writing-skills/SKILL.md
Normal file
@@ -0,0 +1,641 @@
|
||||
---
|
||||
name: writing-skills
|
||||
description: |
|
||||
TDD for process documentation - write test cases (pressure scenarios), watch
|
||||
baseline fail, write skill, iterate until bulletproof against rationalization.
|
||||
|
||||
trigger: |
|
||||
- Creating a new skill
|
||||
- Editing an existing skill
|
||||
- Skill needs to resist rationalization under pressure
|
||||
|
||||
skip_when: |
|
||||
- Writing pure reference skill (API docs) → no rules to test
|
||||
- Skill has no compliance costs → no rationalization risk
|
||||
|
||||
related:
|
||||
complementary: [testing-skills-with-subagents]
|
||||
---
|
||||
|
||||
# Writing Skills
|
||||
|
||||
## Overview
|
||||
|
||||
**Writing skills IS Test-Driven Development applied to process documentation.**
|
||||
|
||||
**Personal skills live in agent-specific directories (e.g., `~/.claude/skills` for Claude Code, `~/.codex/skills` for Codex, or custom agent directories)**
|
||||
|
||||
You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).
|
||||
|
||||
**Core principle:** If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
|
||||
|
||||
**REQUIRED BACKGROUND:** You MUST understand ring-default:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.
|
||||
|
||||
**Official guidance:** For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement the TDD-focused approach in this skill.
|
||||
|
||||
## What is a Skill?
|
||||
|
||||
A **skill** is a reference guide for proven techniques, patterns, or tools. Skills help future agent instances find and apply effective approaches.
|
||||
|
||||
**Skills are:** Reusable techniques, patterns, tools, reference guides
|
||||
|
||||
**Skills are NOT:** Narratives about how you solved a problem once
|
||||
|
||||
## TDD Mapping for Skills
|
||||
|
||||
| TDD Concept | Skill Creation |
|
||||
|-------------|----------------|
|
||||
| **Test case** | Pressure scenario with subagent |
|
||||
| **Production code** | Skill document (SKILL.md) |
|
||||
| **Test fails (RED)** | Agent violates rule without skill (baseline) |
|
||||
| **Test passes (GREEN)** | Agent complies with skill present |
|
||||
| **Refactor** | Close loopholes while maintaining compliance |
|
||||
| **Write test first** | Run baseline scenario BEFORE writing skill |
|
||||
| **Watch it fail** | Document exact rationalizations agent uses |
|
||||
| **Minimal code** | Write skill addressing those specific violations |
|
||||
| **Watch it pass** | Verify agent now complies |
|
||||
| **Refactor cycle** | Find new rationalizations → plug → re-verify |
|
||||
|
||||
The entire skill creation process follows RED-GREEN-REFACTOR.
|
||||
|
||||
## When to Create a Skill
|
||||
|
||||
**Create when:**
|
||||
- Technique wasn't intuitively obvious to you
|
||||
- You'd reference this again across projects
|
||||
- Pattern applies broadly (not project-specific)
|
||||
- Others would benefit
|
||||
|
||||
**Don't create for:**
|
||||
- One-off solutions
|
||||
- Standard practices well-documented elsewhere
|
||||
- Project-specific conventions (put in CLAUDE.md)
|
||||
|
||||
## Skill Types
|
||||
|
||||
### Technique
|
||||
Concrete method with steps to follow (condition-based-waiting, root-cause-tracing)
|
||||
|
||||
### Pattern
|
||||
Way of thinking about problems (flatten-with-flags, test-invariants)
|
||||
|
||||
### Reference
|
||||
API docs, syntax guides, tool documentation (office docs)
|
||||
|
||||
## Directory Structure
|
||||
|
||||
|
||||
```
|
||||
skills/
|
||||
skill-name/
|
||||
SKILL.md # Main reference (required)
|
||||
supporting-file.* # Only if needed
|
||||
```
|
||||
|
||||
**Flat namespace** - all skills in one searchable namespace
|
||||
|
||||
**Separate files for:**
|
||||
1. **Heavy reference** (100+ lines) - API docs, comprehensive syntax
|
||||
2. **Reusable tools** - Scripts, utilities, templates
|
||||
|
||||
**Keep inline:**
|
||||
- Principles and concepts
|
||||
- Code patterns (< 50 lines)
|
||||
- Everything else
|
||||
|
||||

## SKILL.md Structure

**Frontmatter (YAML):**
- Only two fields supported: `name` and `description`
- Max 1024 characters total
- `name`: Use letters, numbers, and hyphens only (no parentheses, special chars)
- `description`: Third-person, includes BOTH what it does AND when to use it
  - Start with "Use when..." to focus on triggering conditions
  - Include specific symptoms, situations, and contexts
  - Keep under 500 characters if possible

```markdown
---
name: Skill-Name-With-Hyphens
description: Use when [specific triggering conditions and symptoms] - [what the skill does and how it helps, written in third person]
---

# Skill Name

## Overview
What is this? Core principle in 1-2 sentences.

## When to Use
[Small inline flowchart IF decision non-obvious]

Bullet list with SYMPTOMS and use cases
When NOT to use

## Core Pattern (for techniques/patterns)
Before/after code comparison

## Quick Reference
Table or bullets for scanning common operations

## Implementation
Inline code for simple patterns
Link to file for heavy reference or reusable tools

## Common Mistakes
What goes wrong + fixes

## Real-World Impact (optional)
Concrete results
```
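
If you want a rough sanity check of these frontmatter rules, something like the following sketch works; the script name and the specific checks are illustrative assumptions, not an official validator:

```bash
#!/usr/bin/env bash
# Sketch: check SKILL.md frontmatter against the constraints above.
# Usage: ./check-frontmatter.sh skills/some-skill/SKILL.md
set -euo pipefail

file="$1"
# Grab the YAML between the first pair of '---' lines.
frontmatter=$(awk '/^---$/{n++; next} n==1{print} n>=2{exit}' "$file")

name=$(printf '%s\n' "$frontmatter" | sed -n 's/^name:[[:space:]]*//p')
description=$(printf '%s\n' "$frontmatter" | sed -n 's/^description:[[:space:]]*//p')

[[ "$name" =~ ^[A-Za-z0-9-]+$ ]]    || echo "WARN: name should use only letters, numbers, hyphens"
[[ "$description" == "Use when"* ]] || echo "WARN: description should start with 'Use when...'"
(( ${#frontmatter} <= 1024 ))       || echo "WARN: frontmatter exceeds 1024 characters"
(( ${#description} <= 500 ))        || echo "WARN: description longer than 500 characters"
```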

## Agent Search Optimization (ASO)

**Critical for discovery:** Future agents need to FIND your skill

### 1. Rich Description Field

**Purpose:** Agents read description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?"

**Format:** Start with "Use when..." to focus on triggering conditions, then explain what it does

**Content:**
- Use concrete triggers, symptoms, and situations that signal this skill applies
- Describe the *problem* (race conditions, inconsistent behavior) not *language-specific symptoms* (setTimeout, sleep)
- Keep triggers technology-agnostic unless the skill itself is technology-specific
- If skill is technology-specific, make that explicit in the trigger
- Write in third person (injected into system prompt)

```yaml
# ❌ BAD: Too abstract, vague, doesn't include when to use
description: For async testing

# ❌ BAD: First person
description: I can help you with async tests when they're flaky

# ❌ BAD: Mentions technology but skill isn't specific to it
description: Use when tests use setTimeout/sleep and are flaky

# ✅ GOOD: Starts with "Use when", describes problem, then what it does
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests

# ✅ GOOD: Technology-specific skill with explicit trigger
description: Use when using React Router and handling authentication redirects - provides patterns for protected routes and auth state management
```

### 2. Keyword Coverage

Use words agents would search for:
- Error messages: "Hook timed out", "ENOTEMPTY", "race condition"
- Symptoms: "flaky", "hanging", "zombie", "pollution"
- Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach"
- Tools: Actual commands, library names, file types
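
For example, a future agent (or you, while reviewing) can check discoverability by grepping for exactly these terms. A small sketch, assuming the flat `skills/` layout described above (the keyword and paths are illustrative):

```bash
# Which skills mention this symptom anywhere in their SKILL.md?
grep -ril --include='SKILL.md' 'race condition' skills/

# Which search terms would surface a specific skill?
grep -in -e 'flaky' -e 'timeout' -e 'ENOTEMPTY' skills/condition-based-waiting/SKILL.md
```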

### 3. Descriptive Naming

**Use active voice, verb-first:**
- ✅ `creating-skills` not `skill-creation`
- ✅ `testing-skills-with-subagents` not `subagent-skill-testing`

### 4. Token Efficiency (Critical)

**Problem:** getting-started and frequently-referenced skills load into EVERY conversation. Every token counts.

**Target word counts by skill type:**
- **Bootstrap/Getting-started**: <150 words each (loads in every session)
- **Simple technique skills**: <500 words (procedures, patterns, single concept)
- **Discipline-enforcing skills**: <2,000 words (TDD, verification, systematic debugging - need rationalization tables)
- **Process/workflow skills**: <4,000 words (multi-phase workflows with comprehensive templates)

**Rationale:** Complex skills need extensive rationalization prevention and complete templates. Don't artificially compress at the cost of effectiveness.

**Techniques:**

**Move details to tool help:**
```bash
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
```

**Use cross-references:**
```markdown
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
```

**Compress examples:**
```markdown
# ❌ BAD: Verbose example (42 words)
your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# ✅ GOOD: Minimal example (20 words)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
```

**Eliminate redundancy:**
- Don't repeat what's in cross-referenced skills
- Don't explain what's obvious from command
- Don't include multiple examples of same pattern

**Verification:**
```bash
wc -w skills/path/SKILL.md
# Bootstrap skills: <150 words
# Technique skills: <500 words
# Discipline skills: <2,000 words
# Process skills: <4,000 words
```
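
To check every skill at once rather than one file at a time, a small loop works. This is a sketch; the 2,000-word budget below is just an example threshold - use the budget for the skill's type:

```bash
# Sketch: flag skills that exceed a word budget (adjust per skill type).
budget=2000
for f in skills/*/SKILL.md; do
  words=$(wc -w < "$f")
  if (( words > budget )); then
    echo "OVER BUDGET: $f ($words words)"
  fi
done
```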

**Name by what you DO or core insight:**
- ✅ `condition-based-waiting` > `async-test-helpers`
- ✅ `using-skills` not `skill-usage`
- ✅ `flatten-with-flags` > `data-structure-refactoring`
- ✅ `root-cause-tracing` > `debugging-techniques`

**Gerunds (-ing) work well for processes:**
- `creating-skills`, `testing-skills`, `debugging-with-logs`
- Active, describes the action you're taking

### 5. Cross-Referencing Other Skills

**When writing documentation that references other skills:**

Use skill name only, with explicit requirement markers:
- ✅ Good: `**REQUIRED SUB-SKILL:** Use ring-default:test-driven-development`
- ✅ Good: `**REQUIRED BACKGROUND:** You MUST understand ring-default:systematic-debugging`
- ❌ Bad: `See skills/testing/test-driven-development` (unclear if required)
- ❌ Bad: `@skills/testing/test-driven-development/SKILL.md` (force-loads, burns context)

**Why no @ links:** `@` syntax force-loads files immediately, consuming 200k+ tokens of context before you need them.

## Flowchart Usage

```dot
digraph when_flowchart {
    "Need to show information?" [shape=diamond];
    "Decision where I might go wrong?" [shape=diamond];
    "Use markdown" [shape=box];
    "Small inline flowchart" [shape=box];

    "Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
    "Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
    "Decision where I might go wrong?" -> "Use markdown" [label="no"];
}
```

**Use flowcharts ONLY for:**
- Non-obvious decision points
- Process loops where you might stop too early
- "When to use A vs B" decisions

**Never use flowcharts for:**
- Reference material → Tables, lists
- Code examples → Markdown blocks
- Linear instructions → Numbered lists
- Labels without semantic meaning (step1, helper2)

See @graphviz-conventions.dot for graphviz style rules.

## Code Examples

**One excellent example beats many mediocre ones**

Choose most relevant language:
- Testing techniques → TypeScript/JavaScript
- System debugging → Shell/Python
- Data processing → Python

**Good example:**
- Complete and runnable
- Well-commented explaining WHY
- From real scenario
- Shows pattern clearly
- Ready to adapt (not generic template)

**Don't:**
- Implement in 5+ languages
- Create fill-in-the-blank templates
- Write contrived examples

You're good at porting - one great example is enough.

## File Organization

### Self-Contained Skill
```
defense-in-depth/
  SKILL.md    # Everything inline
```
When: All content fits, no heavy reference needed

### Skill with Reusable Tool
```
condition-based-waiting/
  SKILL.md      # Overview + patterns
  example.ts    # Working helpers to adapt
```
When: Tool is reusable code, not just narrative

### Skill with Heavy Reference
```
pptx/
  SKILL.md        # Overview + workflows
  pptxgenjs.md    # 600 lines API reference
  ooxml.md        # 500 lines XML structure
  scripts/        # Executable tools
```
When: Reference material too large for inline

## The Iron Law (Same as TDD)

```
NO SKILL WITHOUT A FAILING TEST FIRST
```

This applies to NEW skills AND EDITS to existing skills.

Write skill before testing? Delete it. Start over.
Edit skill without testing? Same violation.

**No exceptions:**
- Not for "simple additions"
- Not for "just adding a section"
- Not for "documentation updates"
- Don't keep untested changes as "reference"
- Don't "adapt" while running tests
- Delete means delete

**REQUIRED BACKGROUND:** The ring-default:test-driven-development skill explains why this matters. Same principles apply to documentation.

## Testing All Skill Types

Different skill types need different test approaches:

### Discipline-Enforcing Skills (rules/requirements)

**Examples:** TDD, verification-before-completion, designing-before-coding

**Test with:**
- Academic questions: Do they understand the rules?
- Pressure scenarios: Do they comply under stress?
- Multiple pressures combined: time + sunk cost + exhaustion
- Identify rationalizations and add explicit counters

**Success criteria:** Agent follows rule under maximum pressure

### Technique Skills (how-to guides)

**Examples:** condition-based-waiting, root-cause-tracing, defensive-programming

**Test with:**
- Application scenarios: Can they apply the technique correctly?
- Variation scenarios: Do they handle edge cases?
- Missing information tests: Do instructions have gaps?

**Success criteria:** Agent successfully applies technique to new scenario

### Pattern Skills (mental models)

**Examples:** reducing-complexity, information-hiding concepts

**Test with:**
- Recognition scenarios: Do they recognize when pattern applies?
- Application scenarios: Can they use the mental model?
- Counter-examples: Do they know when NOT to apply?

**Success criteria:** Agent correctly identifies when/how to apply pattern

### Reference Skills (documentation/APIs)

**Examples:** API documentation, command references, library guides

**Test with:**
- Retrieval scenarios: Can they find the right information?
- Application scenarios: Can they use what they found correctly?
- Gap testing: Are common use cases covered?

**Success criteria:** Agent finds and correctly applies reference information

## Common Rationalizations for Skipping Testing

| Excuse | Reality |
|--------|---------|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested skill wastes more time fixing it later. |

**All of these mean: Test before deploying. No exceptions.**

## Bulletproofing Skills Against Rationalization

Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.

**Psychology note:** Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.

### Close Every Loophole Explicitly

Don't just state the rule - forbid specific workarounds:

<Bad>
```markdown
Write code before test? Delete it.
```
</Bad>

<Good>
```markdown
Write code before test? Delete it. Start over.

**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
```
</Good>

### Address "Spirit vs Letter" Arguments

Add foundational principle early:

```markdown
**Violating the letter of the rules is violating the spirit of the rules.**
```

This cuts off an entire class of "I'm following the spirit" rationalizations.

### Build Rationalization Table

Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:

```markdown
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
```

### Create Red Flags List

Make it easy for agents to self-check when rationalizing:

```markdown
## Red Flags - STOP and Start Over

- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."

**All of these mean: Delete code. Start over with TDD.**
```

### Update ASO for Violation Symptoms

Add to description: symptoms of when you're ABOUT to violate the rule:

```yaml
description: use when implementing any feature or bugfix, before writing implementation code
```

## RED-GREEN-REFACTOR for Skills

Follow the TDD cycle:

### RED: Write Failing Test (Baseline)

Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:
- What choices did they make?
- What rationalizations did they use (verbatim)?
- Which pressures triggered violations?

This is "watch the test fail" - you must see what agents naturally do before writing the skill.

### GREEN: Write Minimal Skill

Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.

Run same scenarios WITH skill. Agent should now comply.

### REFACTOR: Close Loopholes

Agent found new rationalization? Add explicit counter. Re-test until bulletproof.

**REQUIRED SUB-SKILL:** Use ring-default:testing-skills-with-subagents for the complete testing methodology:
- How to write pressure scenarios
- Pressure types (time, sunk cost, authority, exhaustion)
- Plugging holes systematically
- Meta-testing techniques

## Anti-Patterns

### ❌ Narrative Example
"In session 2025-10-03, we found empty projectDir caused..."
**Why bad:** Too specific, not reusable

### ❌ Multi-Language Dilution
example-js.js, example-py.py, example-go.go
**Why bad:** Mediocre quality, maintenance burden

### ❌ Code in Flowcharts
```dot
step1 [label="import fs"];
step2 [label="read file"];
```
**Why bad:** Can't copy-paste, hard to read

### ❌ Generic Labels
helper1, helper2, step3, pattern4
**Why bad:** Labels should have semantic meaning

## STOP: Before Moving to Next Skill

**After writing ANY skill, you MUST STOP and complete the deployment process.**

**Do NOT:**
- Create multiple skills in batch without testing each
- Move to next skill before current one is verified
- Skip testing because "batching is more efficient"

**The deployment checklist below is MANDATORY for EACH skill.**

Deploying untested skills = deploying untested code. It's a violation of quality standards.

## Skill Creation Checklist (TDD Adapted)

**IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.**

**RED Phase - Write Failing Test:**
- [ ] Create pressure scenarios (3+ combined pressures for discipline skills)
- [ ] Run scenarios WITHOUT skill - document baseline behavior verbatim
- [ ] Identify patterns in rationalizations/failures

**GREEN Phase - Write Minimal Skill:**
- [ ] Name uses only letters, numbers, hyphens (no parentheses/special chars)
- [ ] YAML frontmatter with only name and description (max 1024 chars)
- [ ] Description starts with "Use when..." and includes specific triggers/symptoms
- [ ] Description written in third person
- [ ] Keywords throughout for search (errors, symptoms, tools)
- [ ] Clear overview with core principle
- [ ] Address specific baseline failures identified in RED
- [ ] Code inline OR link to separate file
- [ ] One excellent example (not multi-language)
- [ ] Run scenarios WITH skill - verify agents now comply

**REFACTOR Phase - Close Loopholes:**
- [ ] Identify NEW rationalizations from testing
- [ ] Add explicit counters (if discipline skill)
- [ ] Build rationalization table from all test iterations
- [ ] Create red flags list
- [ ] Re-test until bulletproof

**Quality Checks:**
- [ ] Small flowchart only if decision non-obvious
- [ ] Quick reference table
- [ ] Common mistakes section
- [ ] No narrative storytelling
- [ ] Supporting files only for tools or heavy reference

**Deployment:**
- [ ] Commit skill to git and push to your fork (if configured)
- [ ] Consider contributing back via PR (if broadly useful)
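
A minimal sketch of that deployment step, assuming your fork is the `origin` remote (the skill path, remote, and branch names are examples, not requirements):

```bash
# Sketch: commit and push a single tested skill.
git add skills/condition-based-waiting/
git commit -m "Add condition-based-waiting skill (baseline + pressure scenarios tested)"
git push origin main
```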

## Discovery Workflow

How future agents find your skill:

1. **Encounters problem** ("tests are flaky")
2. **Finds SKILL** (description matches)
3. **Scans overview** (is this relevant?)
4. **Reads patterns** (quick reference table)
5. **Loads example** (only when implementing)

**Optimize for this flow** - put searchable terms early and often.

## The Bottom Line

**Creating skills IS TDD for process documentation.**

Same Iron Law: No skill without failing test first.
Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes).
Same benefits: Better quality, fewer surprises, bulletproof results.

If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.

1150
skills/writing-skills/anthropic-best-practices.md
Normal file
File diff suppressed because it is too large
Load Diff
172
skills/writing-skills/graphviz-conventions.dot
Normal file
@@ -0,0 +1,172 @@

digraph STYLE_GUIDE {
    // The style guide for our process DSL, written in the DSL itself

    // Node type examples with their shapes
    subgraph cluster_node_types {
        label="NODE TYPES AND SHAPES";

        // Questions are diamonds
        "Is this a question?" [shape=diamond];

        // Actions are boxes (default)
        "Take an action" [shape=box];

        // Commands are plaintext
        "git commit -m 'msg'" [shape=plaintext];

        // States are ellipses
        "Current state" [shape=ellipse];

        // Warnings are octagons
        "STOP: Critical warning" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];

        // Entry/exit are double circles
        "Process starts" [shape=doublecircle];
        "Process complete" [shape=doublecircle];

        // Examples of each
        "Is test passing?" [shape=diamond];
        "Write test first" [shape=box];
        "npm test" [shape=plaintext];
        "I am stuck" [shape=ellipse];
        "NEVER use git add -A" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
    }

    // Edge naming conventions
    subgraph cluster_edge_types {
        label="EDGE LABELS";

        "Binary decision?" [shape=diamond];
        "Yes path" [shape=box];
        "No path" [shape=box];

        "Binary decision?" -> "Yes path" [label="yes"];
        "Binary decision?" -> "No path" [label="no"];

        "Multiple choice?" [shape=diamond];
        "Option A" [shape=box];
        "Option B" [shape=box];
        "Option C" [shape=box];

        "Multiple choice?" -> "Option A" [label="condition A"];
        "Multiple choice?" -> "Option B" [label="condition B"];
        "Multiple choice?" -> "Option C" [label="otherwise"];

        "Process A done" [shape=doublecircle];
        "Process B starts" [shape=doublecircle];

        "Process A done" -> "Process B starts" [label="triggers", style=dotted];
    }

    // Naming patterns
    subgraph cluster_naming_patterns {
        label="NAMING PATTERNS";

        // Questions end with ?
        "Should I do X?";
        "Can this be Y?";
        "Is Z true?";
        "Have I done W?";

        // Actions start with verb
        "Write the test";
        "Search for patterns";
        "Commit changes";
        "Ask for help";

        // Commands are literal
        "grep -r 'pattern' .";
        "git status";
        "npm run build";

        // States describe situation
        "Test is failing";
        "Build complete";
        "Stuck on error";
    }

    // Process structure template
    subgraph cluster_structure {
        label="PROCESS STRUCTURE TEMPLATE";

        "Trigger: Something happens" [shape=ellipse];
        "Initial check?" [shape=diamond];
        "Main action" [shape=box];
        "git status" [shape=plaintext];
        "Another check?" [shape=diamond];
        "Alternative action" [shape=box];
        "STOP: Don't do this" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
        "Process complete" [shape=doublecircle];

        "Trigger: Something happens" -> "Initial check?";
        "Initial check?" -> "Main action" [label="yes"];
        "Initial check?" -> "Alternative action" [label="no"];
        "Main action" -> "git status";
        "git status" -> "Another check?";
        "Another check?" -> "Process complete" [label="ok"];
        "Another check?" -> "STOP: Don't do this" [label="problem"];
        "Alternative action" -> "Process complete";
    }

    // When to use which shape
    subgraph cluster_shape_rules {
        label="WHEN TO USE EACH SHAPE";

        "Choosing a shape" [shape=ellipse];

        "Is it a decision?" [shape=diamond];
        "Use diamond" [shape=diamond, style=filled, fillcolor=lightblue];

        "Is it a command?" [shape=diamond];
        "Use plaintext" [shape=plaintext, style=filled, fillcolor=lightgray];

        "Is it a warning?" [shape=diamond];
        "Use octagon" [shape=octagon, style=filled, fillcolor=pink];

        "Is it entry/exit?" [shape=diamond];
        "Use doublecircle" [shape=doublecircle, style=filled, fillcolor=lightgreen];

        "Is it a state?" [shape=diamond];
        "Use ellipse" [shape=ellipse, style=filled, fillcolor=lightyellow];

        "Default: use box" [shape=box, style=filled, fillcolor=lightcyan];

        "Choosing a shape" -> "Is it a decision?";
        "Is it a decision?" -> "Use diamond" [label="yes"];
        "Is it a decision?" -> "Is it a command?" [label="no"];
        "Is it a command?" -> "Use plaintext" [label="yes"];
        "Is it a command?" -> "Is it a warning?" [label="no"];
        "Is it a warning?" -> "Use octagon" [label="yes"];
        "Is it a warning?" -> "Is it entry/exit?" [label="no"];
        "Is it entry/exit?" -> "Use doublecircle" [label="yes"];
        "Is it entry/exit?" -> "Is it a state?" [label="no"];
        "Is it a state?" -> "Use ellipse" [label="yes"];
        "Is it a state?" -> "Default: use box" [label="no"];
    }

    // Good vs bad examples
    subgraph cluster_examples {
        label="GOOD VS BAD EXAMPLES";

        // Good: specific and shaped correctly
        "Test failed" [shape=ellipse];
        "Read error message" [shape=box];
        "Can reproduce?" [shape=diamond];
        "git diff HEAD~1" [shape=plaintext];
        "NEVER ignore errors" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];

        "Test failed" -> "Read error message";
        "Read error message" -> "Can reproduce?";
        "Can reproduce?" -> "git diff HEAD~1" [label="yes"];

        // Bad: vague and wrong shapes
        bad_1 [label="Something wrong", shape=box];  // Should be ellipse (state)
        bad_2 [label="Fix it", shape=box];           // Too vague
        bad_3 [label="Check", shape=box];            // Should be diamond
        bad_4 [label="Run command", shape=box];      // Should be plaintext with actual command

        bad_1 -> bad_2;
        bad_2 -> bad_3;
        bad_3 -> bad_4;
    }
}

187
skills/writing-skills/persuasion-principles.md
Normal file
@@ -0,0 +1,187 @@

# Persuasion Principles for Skill Design

## Overview

LLMs respond to the same persuasion principles as humans. Understanding this psychology helps you design more effective skills - not to manipulate, but to ensure critical practices are followed even under pressure.

**Research foundation:** Meincke et al. (2025) tested 7 persuasion principles with N=28,000 AI conversations. Persuasion techniques more than doubled compliance rates (33% → 72%, p < .001).

## The Seven Principles

### 1. Authority
**What it is:** Deference to expertise, credentials, or official sources.

**How it works in skills:**
- Imperative language: "YOU MUST", "Never", "Always"
- Non-negotiable framing: "No exceptions"
- Eliminates decision fatigue and rationalization

**When to use:**
- Discipline-enforcing skills (TDD, verification requirements)
- Safety-critical practices
- Established best practices

**Example:**
```markdown
✅ Write code before test? Delete it. Start over. No exceptions.
❌ Consider writing tests first when feasible.
```

### 2. Commitment
**What it is:** Consistency with prior actions, statements, or public declarations.

**How it works in skills:**
- Require announcements: "Announce skill usage"
- Force explicit choices: "Choose A, B, or C"
- Use tracking: TodoWrite for checklists

**When to use:**
- Ensuring skills are actually followed
- Multi-step processes
- Accountability mechanisms

**Example:**
```markdown
✅ When you find a skill, you MUST announce: "I'm using [Skill Name]"
❌ Consider letting your partner know which skill you're using.
```

### 3. Scarcity
**What it is:** Urgency from time limits or limited availability.

**How it works in skills:**
- Time-bound requirements: "Before proceeding"
- Sequential dependencies: "Immediately after X"
- Prevents procrastination

**When to use:**
- Immediate verification requirements
- Time-sensitive workflows
- Preventing "I'll do it later"

**Example:**
```markdown
✅ After completing a task, IMMEDIATELY request code review before proceeding.
❌ You can review code when convenient.
```

### 4. Social Proof
**What it is:** Conformity to what others do or what's considered normal.

**How it works in skills:**
- Universal patterns: "Every time", "Always"
- Failure modes: "X without Y = failure"
- Establishes norms

**When to use:**
- Documenting universal practices
- Warning about common failures
- Reinforcing standards

**Example:**
```markdown
✅ Checklists without TodoWrite tracking = steps get skipped. Every time.
❌ Some people find TodoWrite helpful for checklists.
```

### 5. Unity
**What it is:** Shared identity, "we-ness", in-group belonging.

**How it works in skills:**
- Collaborative language: "our codebase", "we're colleagues"
- Shared goals: "we both want quality"

**When to use:**
- Collaborative workflows
- Establishing team culture
- Non-hierarchical practices

**Example:**
```markdown
✅ We're colleagues working together. I need your honest technical judgment.
❌ You should probably tell me if I'm wrong.
```

### 6. Reciprocity
**What it is:** Obligation to return benefits received.

**How it works:**
- Use sparingly - can feel manipulative
- Rarely needed in skills

**When to avoid:**
- Almost always (other principles more effective)

### 7. Liking
**What it is:** Preference for cooperating with those we like.

**How it works:**
- **DON'T USE for compliance**
- Conflicts with honest feedback culture
- Creates sycophancy

**When to avoid:**
- Always for discipline enforcement

## Principle Combinations by Skill Type

| Skill Type | Use | Avoid |
|------------|-----|-------|
| Discipline-enforcing | Authority + Commitment + Social Proof | Liking, Reciprocity |
| Guidance/technique | Moderate Authority + Unity | Heavy authority |
| Collaborative | Unity + Commitment | Authority, Liking |
| Reference | Clarity only | All persuasion |

## Why This Works: The Psychology

**Bright-line rules reduce rationalization:**
- "YOU MUST" removes decision fatigue
- Absolute language eliminates "is this an exception?" questions
- Explicit anti-rationalization counters close specific loopholes

**Implementation intentions create automatic behavior:**
- Clear triggers + required actions = automatic execution
- "When X, do Y" more effective than "generally do Y"
- Reduces cognitive load on compliance

**LLMs are parahuman:**
- Trained on human text containing these patterns
- Authority language precedes compliance in training data
- Commitment sequences (statement → action) frequently modeled
- Social proof patterns (everyone does X) establish norms

## Ethical Use

**Legitimate:**
- Ensuring critical practices are followed
- Creating effective documentation
- Preventing predictable failures

**Illegitimate:**
- Manipulating for personal gain
- Creating false urgency
- Guilt-based compliance

**The test:** Would this technique serve the user's genuine interests if they fully understood it?

## Research Citations

**Cialdini, R. B. (2021).** *Influence: The Psychology of Persuasion (New and Expanded).* Harper Business.
- Seven principles of persuasion
- Empirical foundation for influence research

**Meincke, L., Shapiro, D., Duckworth, A. L., Mollick, E., Mollick, L., & Cialdini, R. (2025).** Call Me A Jerk: Persuading AI to Comply with Objectionable Requests. University of Pennsylvania.
- Tested 7 principles with N=28,000 LLM conversations
- Compliance increased 33% → 72% with persuasion techniques
- Authority, commitment, scarcity most effective
- Validates parahuman model of LLM behavior

## Quick Reference

When designing a skill, ask:

1. **What type is it?** (Discipline vs. guidance vs. reference)
2. **What behavior am I trying to change?**
3. **Which principle(s) apply?** (Usually authority + commitment for discipline)
4. **Am I combining too many?** (Don't use all seven)
5. **Is this ethical?** (Serves user's genuine interests?)