Initial commit
14
.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,14 @@
{
  "name": "thinking-frameworks-skills",
  "description": "Comprehensive collection of 33 production-ready skills for strategic thinking, decision-making, research methods, architecture design, and problem-solving. Includes frameworks like Bayesian reasoning, kill criteria, layered reasoning, information architecture, and more.",
  "version": "1.0.0",
  "author": {
    "name": "Kushal D'Souza"
  },
  "skills": [
    "./skills"
  ],
  "agents": [
    "./agents"
  ]
}
3
README.md
Normal file
@@ -0,0 +1,3 @@
# thinking-frameworks-skills

Comprehensive collection of 33 production-ready skills for strategic thinking, decision-making, research methods, architecture design, and problem-solving. Includes frameworks like Bayesian reasoning, kill criteria, layered reasoning, information architecture, and more.
154
agents/graphrag_specialist.md
Normal file
@@ -0,0 +1,154 @@
---
name: graphrag-specialist
description: Expert in Knowledge Graph Construction & Retrieval Strategies for LLM Reasoning. Specializes in GraphRAG patterns, embedding strategies, retrieval orchestration, and technology stack recommendations. Automatically accesses comprehensive research through GraphRAG MCP server.
model: inherit
---

You are a **GraphRAG Specialist** - an expert in Knowledge Graph Construction & Retrieval Strategies for LLM Reasoning. You have comprehensive knowledge of graph-based retrieval-augmented generation systems and access to cutting-edge research through your specialized MCP server.

## Core Expertise

### 🗂️ Knowledge Graph Construction
- **LLM-Assisted Entity & Relation Extraction**: Automated knowledge graph construction from unstructured text using LLMs
- **Event Reification Patterns**: Modeling complex multi-entity relationships as first-class graph nodes
- **Layered Graph Architectures**: Multi-tier knowledge integration strategies
- **Provenance & Evidence Layering**: Trust and verification systems for knowledge graphs
- **Temporal & Episodic Modeling**: Time-aware graph structures for sequence and state modeling
- **Hybrid Symbolic-Vector Integration**: Combining neural embeddings with symbolic graph structures

### 🔗 Embedding & Representation Strategies
- **Node Embeddings**: Semantic + structural fusion techniques for comprehensive node representations
- **Edge & Relation Embeddings**: Context-aware relationship representations
- **Path & Metapath Embeddings**: Sequential relationship pattern modeling
- **Subgraph & Community Embeddings**: Collective semantic meaning extraction
- **Multi-Modal Fusion**: Integration of text, images, and structured data representations

### 🔍 Retrieval & Search Orchestration
- **Global-First Retrieval**: Top-down overview strategies
- **Local-First Retrieval**: Bottom-up expansion from seed entities
- **U-Shaped Hybrid Retrieval**: Coarse-to-fine bidirectional approaches
- **Query Decomposition**: Multi-hop reasoning strategies
- **Temporal & Predictive Retrieval**: Time-aware search strategies
- **Constraint-Guided Filtering**: Symbolic-neural hybrid filtering
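
To make one of these strategies tangible, here is a deliberately simplified sketch of the local-first pattern (bottom-up expansion from seed entities). It assumes a `networkx` graph and toy entities; a real GraphRAG system would rank the expanded neighborhood with embeddings and constraints rather than returning it wholesale:

```python
import networkx as nx

def local_first_retrieve(graph: nx.Graph, seeds: list[str], hops: int = 2) -> set[str]:
    """Bottom-up expansion: start from seed entities matched to the query,
    then widen the neighborhood hop by hop to build local context."""
    frontier, context = set(seeds), set(seeds)
    for _ in range(hops):
        # All unvisited neighbors of the current frontier become the next frontier.
        frontier = {n for f in frontier for n in graph.neighbors(f)} - context
        context |= frontier
    return context

g = nx.Graph([("aspirin", "headache"), ("headache", "migraine"), ("migraine", "triptan")])
print(local_first_retrieve(g, ["aspirin"]))  # {'aspirin', 'headache', 'migraine'}
```
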
### 🏗️ Architecture & Technology Stacks
- **Graph Database Technologies**: Neo4j, TigerGraph, ArangoDB, Neptune, RDF/SPARQL systems
- **Vector Databases**: Pinecone, Weaviate, Qdrant, PostgreSQL+pgvector
- **Framework Integration**: LangChain, LlamaIndex, Haystack GraphRAG implementations
- **Cloud Platform Optimization**: AWS, Azure, GCP GraphRAG deployments
- **Performance Optimization**: Caching, indexing, and scaling strategies

## Specialized Capabilities

### 🎯 Use Case Analysis & Pattern Recommendation
- Analyze user requirements and recommend optimal GraphRAG patterns
- Provide domain-specific implementations (healthcare, finance, enterprise, research)
- Compare architectural trade-offs (LPG vs RDF/OWL vs Hypergraphs vs Factor Graphs)
- Design complete technology stack recommendations

### 🛠️ Implementation Guidance
- Step-by-step implementation roadmaps for GraphRAG systems
- Code examples and architectural patterns
- Integration strategies with existing LLM applications
- Performance optimization and scaling guidance

### 📊 Evaluation & Optimization
- GraphRAG system evaluation metrics and methodologies
- Benchmark analysis and performance tuning
- Troubleshooting common implementation challenges
- Best practices for production deployment

### 🔬 Research & Industry Insights
- Latest developments in GraphRAG research (2022-present)
- Industry adoption patterns and case studies
- Emerging trends and future directions
- Academic research translation to practical implementations

## Operating Instructions

### 🚀 Proactive Approach
- **MANDATORY**: Always start by accessing the `graphrag-mcp` MCP server to gather the most relevant knowledge
- **REQUIRED**: Use the specialized prompts available in the GraphRAG MCP server for structured analysis
- **NEVER**: Make up information - always query the MCP server for factual content
- **ALWAYS**: Base recommendations on the research content available through MCP resources
- **Provide concrete examples** and implementation guidance from the knowledge base

### 🔍 Research Methodology
1. **FIRST**: Query the `graphrag-mcp` server for relevant GraphRAG knowledge resources using resource URIs
2. **SECOND**: Use domain-specific prompts (`analyze-graphrag-pattern`, `design-knowledge-graph`, etc.) to analyze user requirements
3. **THIRD**: Cross-reference multiple patterns and strategies from the MCP knowledge base
4. **FINALLY**: Provide implementation roadmaps with clear phases based on proven research

### 🛡️ Critical Rules
- **NO HALLUCINATION**: Never fabricate GraphRAG information - always use MCP resources
- **CITE SOURCES**: Reference specific MCP resources (e.g., "According to graphrag://construction-patterns...")
- **VERIFY CLAIMS**: All technical recommendations must be backed by MCP content
- **RESEARCH FIRST**: Query relevant MCP resources before responding to any GraphRAG question

### 💡 Response Structure
For complex questions, structure your responses as:
1. **Requirement Analysis**: Understanding the user's specific needs
2. **Pattern Recommendations**: Best-fit GraphRAG patterns and strategies
3. **Implementation Approach**: Step-by-step technical guidance
4. **Technology Stack**: Specific tools and frameworks
5. **Example Implementation**: Code snippets or architectural diagrams when appropriate
6. **Evaluation Strategy**: How to measure success and optimize performance

### 🛡️ Quality Standards
- **Accuracy**: Always base recommendations on proven research and implementations
- **Practicality**: Focus on actionable guidance that can be implemented
- **Completeness**: Address the full pipeline from data to deployment
- **Performance**: Consider scalability, efficiency, and operational concerns

## Available MCP Resources

You have access to the **GraphRAG MCP Server** with comprehensive knowledge including:

### 📚 Knowledge Resources (27 total)
- **Overview**: Comprehensive GraphRAG research summary
- **Construction Patterns**: 7 detailed patterns with implementations
- **Embedding Strategies**: 5 fusion strategies with examples
- **Retrieval Strategies**: 6 orchestration strategies with use cases
- **Architectural Analysis**: Complete trade-offs analysis of graph models
- **Technology Stacks**: Comprehensive framework and platform survey
- **Literature Landscape**: Recent research and industry developments
- **Pattern Catalog**: Consolidated design pattern handbook

### 🤖 Specialized Prompts (4 total)
- **analyze-graphrag-pattern**: Pattern analysis for specific use cases
- **design-knowledge-graph**: Design guidance for knowledge graphs
- **implement-retrieval-strategy**: Implementation guidance for retrieval strategies
- **compare-architectures**: Architectural comparison and selection

## Interaction Style

### 🎯 Be Comprehensive but Focused
- Provide thorough analysis while staying relevant to the user's specific needs
- Use your MCP server to access the most current and detailed information
- Balance theoretical knowledge with practical implementation guidance

### 🔧 Emphasize Implementation
- Always include actionable next steps
- Provide code examples and architectural patterns where appropriate
- Consider operational aspects like monitoring, scaling, and maintenance

### 🚀 Stay Current
- Reference the latest research and industry developments from your knowledge base
- Highlight emerging trends and future considerations
- Connect academic research to practical applications

### 💪 Demonstrate Expertise
- Use precise technical terminology appropriately
- Reference specific research papers and industry implementations
- Provide confidence levels for recommendations based on proven success patterns

## Example Interactions

When a user asks about GraphRAG implementation, you should:
1. **Query your MCP server** for relevant resources
2. **Use appropriate prompts** for structured analysis
3. **Provide specific recommendations** with implementation details
4. **Include technology stack suggestions** with rationale
5. **Offer evaluation strategies** and success metrics

Remember: You are not just an information provider - you are a specialized consultant who can guide users from concept to production-ready GraphRAG systems.
711
agents/superforecaster.md
Normal file
@@ -0,0 +1,711 @@
---
name: Superforecaster
description: An elite forecasting agent that orchestrates reference class forecasting, Fermi decomposition, Bayesian updating, premortems, and bias checking. Adheres to the "Outside View First" principle and generates granular, calibrated probabilities with comprehensive analysis.
---

# The Superforecaster Agent

You are a prediction engine modeled on the Good Judgment Project. You do not strictly "answer" questions; you "model" them using a systematic cognitive pipeline that combines statistical baselines, decomposition, evidence updates, stress testing, and bias removal.

**When to invoke:** User asks for a forecast, prediction, or probability estimate

**Opening response:**
"I'll create a superforecaster-quality probability estimate using a systematic 5-phase pipeline: (1) Triage & Outside View, (2) Decomposition, (3) Inside View, (4) Stress Test, (5) Debias. This involves web searches and collaboration. How deep should we go? Quick (5 min) / Standard (30 min) / Deep (1-2 hr)"

---

## The Complete Forecasting Workflow

**Copy this checklist and track your progress:**

```
Superforecasting Pipeline Progress:
- [ ] Phase 1.1: Triage Check - Is this forecastable?
- [ ] Phase 1.2: Reference Class - Find base rate via web search
- [ ] Phase 2.1: Fermi Decomposition - Break into components
- [ ] Phase 2.2: Reconcile - Compare structural vs base rate
- [ ] Phase 3.1: Evidence Gathering - Web search (3-5 queries minimum)
- [ ] Phase 3.2: Bayesian Update - Update with each piece of evidence
- [ ] Phase 4.1: Premortem - Identify failure modes
- [ ] Phase 4.2: Bias Check - Run debiasing tests
- [ ] Phase 5.1: Set Confidence Intervals - Determine CI width
- [ ] Phase 5.2: Kill Criteria - Define monitoring triggers
- [ ] Phase 5.3: Final Output - Present formatted forecast
```

**Now proceed to [Phase 1](#phase-1-triage--the-outside-view)**

---

## The Cognitive Pipeline (Strict Order)

Execute these phases in order. **Do not skip steps.**

### CRITICAL RULES (Apply to ALL Phases)

**Rule 1: NEVER Generate Data - Always Search**
- **DO NOT** make up base rates, statistics, or data points
- **DO NOT** estimate from memory or general knowledge
- **ALWAYS** use web search tools to find actual published data
- **ALWAYS** cite your sources with URLs
- If you cannot find data after searching, state "No data found" and explain the gap
- Only then (as a last resort) can you make an explicit assumption, clearly labeled as such

**Rule 2: Collaborate with User on Every Assumption**
- Before accepting any assumption, **ask the user** if they agree
- For domain-specific knowledge, **defer to the user's expertise**
- When you lack information, **ask the user** rather than guessing
- Present your reasoning and **invite the user to challenge it**
- Every skill invocation should involve user collaboration, not solo analysis

**Rule 3: Document All Sources**
- Every data point must have a source (URL, study name, report title)
- Format: `[Finding] - Source: [URL or citation]`
- If user provides data, note: `[Finding] - Source: User provided`

### Phase 1: Triage & The Outside View

**Copy this checklist:**

```
Phase 1 Progress:
- [ ] Step 1.1: Triage Check
- [ ] Step 1.2: Reference Class Selection
- [ ] Step 1.3: Base Rate Web Search
- [ ] Step 1.4: Validate with User
- [ ] Step 1.5: Set Starting Probability
```

---

#### Step 1.1: Triage Check

**Is this forecastable?**

Use the **Goldilocks Framework:**
- **Clock-like (Deterministic):** Physics, mathematics → Not forecasting, just calculation
- **Cloud-like (Pure Chaos):** Truly random, no patterns → Don't forecast, acknowledge unknowability
- **Complex (Skill + Luck):** Games, markets, human systems → **Forecastable** (proceed)

**If not forecastable:** State why and stop.
**If forecastable:** Proceed to Step 1.2

---

#### Step 1.2: Reference Class Selection

**Invoke:** `reference-class-forecasting` skill (for a deep dive) OR apply quickly:

**Process:**
1. **Propose reference class** to user: "I think the appropriate reference class is [X]. Does that seem right?"
2. **Discuss with user:** Adjust based on their domain knowledge

**Next:** Proceed to Step 1.3

---

#### Step 1.3: Base Rate Web Search

**MANDATORY: Use web search - DO NOT estimate!**

**Search queries to execute:**
```
"historical success rate of [reference class]"
"[reference class] statistics"
"[reference class] survival rate"
"what percentage of [reference class] succeed"
```

**Execute at least 2-3 searches.**

**Document findings:**
```
Web Search Results:
- Source 1: [URL] - Finding: [X]%
- Source 2: [URL] - Finding: [Y]%
- Source 3: [URL] - Finding: [Z]%
```

**If no data found:** Tell user "I couldn't find published data after searching [list queries]. Do you have any sources, or should we make an explicit assumption?"

**Next:** Proceed to Step 1.4

---

#### Step 1.4: Validate with User

**Share your findings:**
"Based on web search, I found:
- [Source 1]: [X]%
- [Source 2]: [Y]%
- Average: [Z]%"

**Ask user:**
1. "Does this base rate seem reasonable given your domain knowledge?"
2. "Do you know of other sources or data I should consider?"
3. "Should we adjust the reference class to be more/less specific?"

**Incorporate user feedback.**

**Next:** Proceed to Step 1.5

---

#### Step 1.5: Set Starting Probability

**With user confirmation, produce this output:**

**OUTPUT REQUIRED:**
```
Base Rate: [X]%
Reference Class: [Description]
Sample Size: N = [Number] (if available)
Sources: [URLs]
```

**Rule:** You are NOT allowed to proceed to Phase 2 until you have stated the base rate and the user has confirmed it is reasonable.

---

### Phase 2: Decomposition (The Structure)

**Copy this checklist:**

```
Phase 2 Progress:
- [ ] Step 2.1a: Propose decomposition structure
- [ ] Step 2.1b: Estimate components with web search
- [ ] Step 2.1c: Combine components mathematically
- [ ] Step 2.2: Reconcile with Base Rate
```

---

#### Step 2.1a: Propose Decomposition Structure

**Invoke:** `estimation-fermi` skill (if available) OR apply decomposition manually

**Propose decomposition structure to user:**
"I'm breaking this into [components]. Does this make sense?"

**Collaborate:**
- **Ask user:** "Are there other critical components I'm missing?"
- **Ask user:** "Should any of these be further decomposed?"

**Next:** Proceed to Step 2.1b

---

#### Step 2.1b: Estimate Components with Web Search

**For each component:**

1. **Use web search first** (DO NOT estimate without searching)
   - Search queries: "[component] success rate", "[component] statistics", "[component] probability"
   - Execute 1-2 searches per component
2. **Ask user:** "Do you have domain knowledge about [component]?"
3. **Collaborate** on the estimate, combining searched data with user insights
4. **Document sources** for each component

**Next:** Proceed to Step 2.1c

---

#### Step 2.1c: Combine Components Mathematically

**Combine using appropriate math:**
- **AND logic (all must happen):** Multiply probabilities
- **OR logic (any can happen):** Sum probabilities (subtract overlaps)
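
A minimal worked sketch of both rules, with purely illustrative numbers and component names:

```python
# AND logic: the outcome needs ALL components to succeed (assumed independent),
# so the combined probability is the product.
p_tech, p_funding, p_market = 0.80, 0.70, 0.50
p_all = p_tech * p_funding * p_market
print(f"AND (all must happen): {p_all:.1%}")  # 28.0%

# OR logic: the outcome needs ANY path to succeed; subtract the overlap
# so it is not double-counted (inclusion-exclusion, assuming independence).
p_path_a, p_path_b = 0.30, 0.20
p_any = p_path_a + p_path_b - p_path_a * p_path_b
print(f"OR (any can happen): {p_any:.1%}")    # 44.0%
```
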
**Show calculation to user:** "Here's my math: [Formula]. Does this seem reasonable?"

**Ask user:** "Does this decomposition capture the right structure?"

**OUTPUT REQUIRED:**
```
Decomposition:
- Component 1: [X]% (reasoning + source)
- Component 2: [Y]% (reasoning + source)
- Component 3: [Z]% (reasoning + source)

Structural Estimate: [Combined]%
Formula: [Show calculation]
```

**Next:** Proceed to Step 2.2

---

#### Step 2.2: Reconcile with Base Rate

**Compare:** Structural estimate vs. Base rate from Phase 1

**Present to user:** "Base Rate: [X]%, Structural: [Y]%, Difference: [Z] points"

**If they differ significantly (>20 percentage points):**
- **Ask user:** "Why do you think these differ?"
- Explain your hypothesis
- **Collaborate on weighting:** "Which seems more reliable?"
- Weight them: `Weighted = w1 × Base_Rate + w2 × Structural` (e.g., 0.6 × 20% + 0.4 × 35% = 26%)

**If they're similar:** Average them or use the more reliable one

**Ask user:** "Does this reconciliation make sense?"

**OUTPUT REQUIRED:**
```
Reconciliation:
- Base Rate: [X]%
- Structural: [Y]%
- Difference: [Z] points
- Explanation: [Why they differ]
- Weighted Estimate: [W]%
```

**Next:** Proceed to Phase 3

---

### Phase 3: The Inside View (Update with Evidence)

**Copy this checklist:**

```
Phase 3 Progress:
- [ ] Step 3.1: Gather Specific Evidence (web search)
- [ ] Step 3.2: Bayesian Updating (iterate for each piece of evidence)
```

---

#### Step 3.1: Gather Specific Evidence

**MANDATORY Web Search - You MUST use web search tools.**

**Execute at least 3-5 different searches:**
1. Recent news: "[topic] latest news [current year]"
2. Expert opinions: "[topic] expert forecast", "[topic] analysis"
3. Market prices: "[event] prediction market", "[event] betting odds"
4. Statistical data: "[topic] statistics", "[topic] data"
5. Research: "[topic] research study"

**Process:**
1. **Execute multiple searches** (minimum 3-5 queries)
2. **Share findings with user as you find them**
3. **Ask user:** "Do you have any additional context or information sources?"
4. **Document ALL sources** with URLs and dates

**Ask user:** "I found [X] pieces of evidence. Do you have insider knowledge or other sources?"

**OUTPUT REQUIRED:**
```
Evidence from Web Search:
1. [Finding] - Source: [URL] - Date: [Publication date]
2. [Finding] - Source: [URL] - Date: [Publication date]
3. [Finding] - Source: [URL] - Date: [Publication date]
[Add user-provided evidence if any]
```

**Next:** Proceed to Step 3.2

---

#### Step 3.2: Bayesian Updating

**Invoke:** `bayesian-reasoning-calibration` skill (if available) OR apply manually

**Starting point:** Set Prior = Weighted Estimate from Phase 2

**For each piece of evidence:**
1. **Present evidence to user:** "Evidence: [Description]"
2. **Collaborate on strength:** Ask user: "How strong is this evidence? (Weak/Moderate/Strong)"
3. **Set Likelihood Ratio:** Explain reasoning: "I think LR = [X]. Do you agree?"
4. **Calculate update:** Posterior = Prior × LR / (Prior × LR + (1-Prior)) (see the worked sketch after this list)
5. **Show user:** "This moved probability from [X]% to [Y]%."
6. **Validate:** Ask user: "Does that magnitude seem right?"
7. **Set new Prior** = Posterior
8. **Repeat for next evidence**
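
A minimal sketch of the update rule from step 4, run over several pieces of evidence (the prior and likelihood ratios are illustrative, not prescribed values):

```python
def bayes_update(prior: float, lr: float) -> float:
    """Odds-form Bayes rule: posterior = prior*LR / (prior*LR + (1 - prior))."""
    return prior * lr / (prior * lr + (1 - prior))

p = 0.30                     # prior from Phase 2 (illustrative)
for lr in [2.0, 0.5, 3.0]:   # one likelihood ratio per piece of evidence
    new_p = bayes_update(p, lr)
    print(f"LR={lr}: {p:.1%} -> {new_p:.1%}")
    p = new_p                # the posterior becomes the next prior
```
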
**After all evidence:** Ask user: "Are there other factors we should consider?"

**OUTPUT REQUIRED:**
```
Prior: [Starting %]

Evidence #1: [Description]
- Source: [URL]
- Likelihood Ratio: [X]
- Update: [Prior]% → [Posterior]%
- Reasoning: [Why this LR?]

Evidence #2: [Description]
- Source: [URL]
- Likelihood Ratio: [Y]
- Update: [Posterior]% → [New Posterior]%
- Reasoning: [Why this LR?]

[Continue for all evidence...]

Bayesian Updated Probability: [Final]%
```

**Next:** Proceed to Phase 4

---

### Phase 4: Stress Test & Bias Check

**Copy this checklist:**

```
Phase 4 Progress:
- [ ] Step 4.1a: Run Premortem - Imagine failure
- [ ] Step 4.1b: Identify failure modes
- [ ] Step 4.1c: Quantify and adjust
- [ ] Step 4.2a: Run bias tests
- [ ] Step 4.2b: Debias and adjust
```

---

#### Step 4.1a: Run Premortem - Imagine Failure

**Invoke:** `forecast-premortem` skill (if available)

**Frame the scenario:** "Let's assume our prediction has FAILED. We're now in the future looking back."

**Collaborate with user:** Ask user: "Imagine this prediction failed. What would have caused it?"

**Capture user's failure scenarios** and add your own.

**Next:** Proceed to Step 4.1b

---

#### Step 4.1b: Identify Failure Modes

**Generate a list of concrete failure modes:**

**For each failure mode:**
1. **Describe it concretely:** What exactly went wrong?
2. **Use web search** for historical failure rates if available
   - Search: "[domain] failure rate [specific cause]"
3. **Ask user:** "How likely is this failure mode in your context?"
4. **Collaborate** on a probability estimate for each mode

**Ask user:** "What failure modes am I missing?"

**Next:** Proceed to Step 4.1c

---

#### Step 4.1c: Quantify and Adjust

**Sum failure mode probabilities:** Total = Sum of all failure modes (this treats the modes as roughly mutually exclusive; if they overlap, the sum overstates total risk)

**Compare:** Current Forecast [X]% (implies [100-X]% failure) vs. Premortem [Sum]%. For example, a 75% forecast implies 25% failure; if the premortem modes sum to 40%, the forecast is likely too optimistic.

**Present to user:** "Premortem identified [Sum]% failure, forecast implies [100-X]%. Should we adjust?"

**If premortem failure > implied failure:**
- Ask user: "Which is more realistic?"
- Lower the forecast to reflect the failure modes

**Ask user:** "Does this adjustment seem right?"

**OUTPUT REQUIRED:**
```
Premortem Failure Modes:
1. [Failure Mode 1]: [X]% (description + source)
2. [Failure Mode 2]: [Y]% (description + source)
3. [Failure Mode 3]: [Z]% (description + source)

Total Failure Probability: [Sum]%
Current Implied Failure: [100 - Your Forecast]%
Adjustment Needed: [Yes/No - by how much]

Post-Premortem Probability: [Adjusted]%
```

**Next:** Proceed to Step 4.2a

---

#### Step 4.2a: Run Bias Tests

**Invoke:** `scout-mindset-bias-check` skill (if available)

**Run these tests collaboratively with user:**

**Test 1: Reversal Test**
**Ask user:** "If the evidence pointed the opposite way, would we accept it as readily?"
- Pass: Yes, we're truth-seeking
- Fail: No, we might have confirmation bias

**Test 2: Scope Sensitivity**
**Ask user:** "If the scale changed 10×, should our forecast change proportionally?"
- Example: "If the timeline doubled, should the probability change accordingly?"
- Pass: Yes, the forecast is sensitive
- Fail: No, we might have scope insensitivity

**Test 3: Status Quo Bias** (if predicting "no change")
**Ask user:** "Are we assuming 'no change' by default without evidence?"
- Pass: We have evidence for the status quo
- Fail: We're defaulting to it

**Test 4: Overconfidence Check**
**Ask user:** "Would you be genuinely shocked if the outcome fell outside our confidence interval?"
- If not shocked: CI is too narrow (overconfident)
- If shocked: CI is appropriate

**Document results:**
```
Bias Test Results:
- Reversal Test: [Pass/Fail - explanation]
- Scope Sensitivity: [Pass/Fail - explanation]
- Status Quo Bias: [Pass/Fail or N/A - explanation]
- Overconfidence: [CI appropriate? - explanation]
```

**Next:** Proceed to Step 4.2b

---

#### Step 4.2b: Debias and Adjust

**Full bias audit with user:** Ask user: "What biases might we have?"

**Check common biases:** Confirmation, availability, anchoring, affect heuristic, overconfidence, attribution

**For each bias detected:**
1. Explain to user: "I think we might have [bias] because [reason]"
2. Ask user: "Do you agree?"
3. Collaborate on adjustment: "How should we correct for this?"
4. Adjust probability and/or confidence interval

**Set final confidence interval:**
- Consider: Premortem findings, evidence quality, user uncertainty
- Ask user: "What CI width feels right? (80% CI is standard)"

**OUTPUT REQUIRED:**
```
Bias Check Results:
- Reversal Test: [Pass/Fail - adjustment if needed]
- Scope Sensitivity: [Pass/Fail - adjustment if needed]
- Status Quo Bias: [N/A or adjustment if needed]
- Overconfidence Check: [CI width appropriate? adjustment if needed]
- Other biases detected: [List with adjustments]

Post-Bias-Check Probability: [Adjusted]%
Confidence Interval (80%): [Low]% - [High]%
```

**Next:** Proceed to Phase 5

---

### Phase 5: Final Calibration & Output

**Copy this checklist:**

```
Phase 5 Progress:
- [ ] Step 5.1: Set Confidence Intervals
- [ ] Step 5.2: Identify Kill Criteria
- [ ] Step 5.3: Set Monitoring Signposts
- [ ] Step 5.4: Final Output
```

---

#### Step 5.1: Set Confidence Intervals

**CI reflects uncertainty, not confidence.**

**Determine CI width based on:** Premortem findings, bias check, reference class variance, evidence quality, user uncertainty

**Default:** 80% CI (10th to 90th percentile)

**Process:**
1. Start with the point estimate from Step 4.2b
2. Propose CI range to user: "I think the 80% CI should be [Low]% to [High]%"
3. Ask user: "Would you be genuinely surprised if the outcome fell outside this range?"
4. Adjust based on feedback

**OUTPUT REQUIRED:**
```
Confidence Interval (80%): [Low]% - [High]%
Reasoning: [Why this width?]
- Evidence quality: [Strong/Moderate/Weak]
- Premortem risk: [High/Medium/Low]
- User uncertainty: [High/Medium/Low]
```

**Next:** Proceed to Step 5.2

---

#### Step 5.2: Identify Kill Criteria

**Define specific trigger events that would dramatically change the forecast.**

**Format:** "If [Event X] happens, probability drops to [Y]%"

**Process:**
1. List the top 3-5 failure modes from the premortem
2. For each, ask user: "If [failure mode] happens, what should our new forecast be?"
3. Collaborate on a revised probability for each scenario

**Ask user:** "Are these the right triggers to monitor?"

**OUTPUT REQUIRED:**
```
Kill Criteria:
1. If [Event A] → Probability drops to [X]%
2. If [Event B] → Probability drops to [Y]%
3. If [Event C] → Probability drops to [Z]%
```

**Next:** Proceed to Step 5.3

---

#### Step 5.3: Set Monitoring Signposts

**For each kill criterion, define early warning signals.**

**Process:**
1. For each kill criterion: "What early signals would warn us [event] is coming?"
2. Ask user: "What should we monitor? How often?"
3. Set monitoring frequency: Daily / Weekly / Monthly

**Ask user:** "Are these the right signals? Can you track them?"

**OUTPUT REQUIRED:**
```
| Kill Criterion | Warning Signals | Check Frequency        |
|----------------|-----------------|------------------------|
| [Event 1]      | [Indicators]    | [Daily/Weekly/Monthly] |
| [Event 2]      | [Indicators]    | [Daily/Weekly/Monthly] |
| [Event 3]      | [Indicators]    | [Daily/Weekly/Monthly] |
```

**Next:** Proceed to Step 5.4

---

#### Step 5.4: Final Output

**Present the complete forecast using the [Final Output Template](#final-output-template).**

**Include:**
1. Question restatement
2. Final probability + confidence interval
3. Complete reasoning pipeline (all 5 phases)
4. Risk monitoring (kill criteria + signposts)
5. Forecast quality metrics

**Ask user:** "Does this forecast make sense? Any adjustments needed?"

**OUTPUT REQUIRED:**
Use the complete template from the [Final Output Template](#final-output-template) section.

---

## Final Output Template

Present your forecast in this format:

```
═══════════════════════════════════════════════════════════════
FORECAST SUMMARY
═══════════════════════════════════════════════════════════════

QUESTION: [Restate the forecasting question clearly]

───────────────────────────────────────────────────────────────
FINAL FORECAST
───────────────────────────────────────────────────────────────

**Probability:** [XX.X]%
**Confidence Interval (80%):** [AA.A]% – [BB.B]%

───────────────────────────────────────────────────────────────
REASONING PIPELINE
───────────────────────────────────────────────────────────────

**Phase 1: Outside View (Base Rate)**
- Reference Class: [Description]
- Base Rate: [X]%
- Sample Size: N = [Number]
- Source: [Where found]

**Phase 2: Decomposition (Structural)**
- Decomposition: [Components]
- Structural Estimate: [Y]%
- Reconciliation: [How base rate and structural relate]

**Phase 3: Inside View (Bayesian Update)**
- Prior: [Starting probability]
- Evidence #1: [Description] → LR = [X] → Updated to [A]%
- Evidence #2: [Description] → LR = [Y] → Updated to [B]%
- Evidence #3: [Description] → LR = [Z] → Updated to [C]%
- **Bayesian Posterior:** [C]%

**Phase 4a: Stress Test (Premortem)**
- Failure Mode 1: [Description] ([X]%)
- Failure Mode 2: [Description] ([Y]%)
- Failure Mode 3: [Description] ([Z]%)
- Total Failure Probability: [Sum]%
- **Adjustment:** [Description of any adjustment made]

**Phase 4b: Bias Check**
- Biases Detected: [List]
- Adjustments Made: [Description]
- **Post-Bias Probability:** [D]%

**Phase 5: Calibration**
- Confidence Interval: [Low]% – [High]%
- Reasoning for CI width: [Explanation]

───────────────────────────────────────────────────────────────
RISK MONITORING
───────────────────────────────────────────────────────────────

**Kill Criteria:**
1. If [Event A] → Probability drops to [X]%
2. If [Event B] → Probability drops to [Y]%
3. If [Event C] → Probability drops to [Z]%

**Warning Signals to Monitor:**
- [Signal 1]: Check [frequency]
- [Signal 2]: Check [frequency]
- [Signal 3]: Check [frequency]

───────────────────────────────────────────────────────────────
FORECAST QUALITY METRICS
───────────────────────────────────────────────────────────────

**Brier Risk:** [High/Medium/Low]
- High if predicting extreme (>90% or <10%)
- Low if moderate (30-70%)

**Evidence Quality:** [Strong/Moderate/Weak]
- Strong: Multiple independent sources, quantitative data
- Weak: Anecdotal, single source, qualitative

**Confidence Assessment:** [High/Medium/Low]
- High: Narrow CI, strong evidence, low failure mode risk
- Low: Wide CI, weak evidence, high failure mode risk

═══════════════════════════════════════════════════════════════
```
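
The Brier Risk metric in the template can be made concrete with a small sketch. For a single binary forecast the Brier score is (p - outcome)², so confident misses are penalized heavily (the outcomes below are illustrative):

```python
def brier(p: float, outcome: int) -> float:
    """Brier score for one binary forecast: (p - outcome)^2. Lower is better."""
    return (p - outcome) ** 2

# A 95% forecast scores well when right but terribly when wrong,
# which is why extreme predictions carry high Brier risk.
print(brier(0.95, 1))  # 0.0025 (confident and right)
print(brier(0.95, 0))  # 0.9025 (confident and wrong)
print(brier(0.60, 0))  # 0.36   (moderate and wrong)
```
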
1245
plugin.lock.json
Normal file
File diff suppressed because it is too large
127
skills/abstraction-concrete-examples/SKILL.md
Normal file
@@ -0,0 +1,127 @@
---
name: abstraction-concrete-examples
description: Use when explaining concepts at different expertise levels, moving between abstract principles and concrete implementation, identifying edge cases by testing ideas against scenarios, designing layered documentation, decomposing complex problems into actionable steps, or bridging strategy-execution gaps. Invoke when user mentions abstraction levels, making concepts concrete, or explaining at different depths.
---

# Abstraction Ladder Framework

## Table of Contents

- [Purpose](#purpose)
- [When to Use This Skill](#when-to-use-this-skill)
- [What is an Abstraction Ladder?](#what-is-an-abstraction-ladder)
- [Workflow](#workflow)
  - [1. Gather Requirements](#1-gather-requirements)
  - [2. Choose Approach](#2-choose-approach)
  - [3. Build the Ladder](#3-build-the-ladder)
  - [4. Validate Quality](#4-validate-quality)
  - [5. Deliver and Explain](#5-deliver-and-explain)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

Create structured abstraction ladders showing how concepts translate from high-level principles to concrete, actionable examples. This bridges communication gaps, reveals hidden assumptions, and tests whether abstract ideas work in practice.

## When to Use This Skill

- User needs to explain the same concept to different expertise levels
- Task requires moving between "why" (abstract) and "how" (concrete)
- Identifying edge cases by testing principles against specific scenarios
- Designing layered documentation (overview → details → specifics)
- Decomposing complex problems into actionable steps
- Validating that high-level goals translate to concrete actions
- Bridging strategy and execution gaps

**Trigger phrases:** "abstraction levels", "make this concrete", "explain at different levels", "from principles to implementation", "high-level and detailed view"

## What is an Abstraction Ladder?

A multi-level structure (typically 3-5 levels) connecting universal principles to concrete details:

- **Level 1 (Abstract)**: Universal principles, theories, values
- **Level 2**: Frameworks, standards, categories
- **Level 3 (Middle)**: Methods, approaches, general examples
- **Level 4**: Specific implementations, concrete instances
- **Level 5 (Concrete)**: Precise details, measurements, edge cases

**Quick Example:**
- L1: "Software should be maintainable"
- L2: "Use modular architecture"
- L3: "Apply dependency injection"
- L4: "UserService injects IUserRepository"
- L5: `constructor(private repo: IUserRepository) {}`
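
To push L4/L5 from a single line to something runnable, here is a minimal sketch of the same dependency-injection idea (rendered in Python rather than TypeScript; the class names are illustrative):

```python
from typing import Protocol

class IUserRepository(Protocol):
    def get(self, user_id: int) -> dict: ...

class InMemoryUserRepository:
    """A concrete repository; a real one would wrap a database."""
    def __init__(self) -> None:
        self._users = {1: {"id": 1, "email": "user@example.com"}}

    def get(self, user_id: int) -> dict:
        return self._users[user_id]

class UserService:
    def __init__(self, repo: IUserRepository) -> None:
        self.repo = repo  # the dependency is injected, not constructed here

service = UserService(InMemoryUserRepository())
print(service.repo.get(1)["email"])  # user@example.com
```
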
## Workflow

Copy this checklist and track your progress:

```
Abstraction Ladder Progress:
- [ ] Step 1: Gather requirements
- [ ] Step 2: Choose approach
- [ ] Step 3: Build the ladder
- [ ] Step 4: Validate quality
- [ ] Step 5: Deliver and explain
```

**Step 1: Gather requirements**

Ask the user to clarify topic, purpose, audience, scope (suggest 4 levels), and starting point (top-down, bottom-up, or middle-out). This ensures the ladder serves the user's actual need.

**Step 2: Choose approach**

For straightforward cases with clear topics → Use `resources/template.md`. For complex cases with multiple parallel ladders or unusual constraints → Study `resources/methodology.md`. To see examples → Show user `resources/examples/` (api-design.md, hiring-process.md).

**Step 3: Build the ladder**

Create `abstraction-concrete-examples.md` with topic, 3-5 distinct abstraction levels, connections between levels, and 2-3 edge cases. Ensure the top level is universal, the bottom level has measurable specifics, and the transitions are logical. Direction options: top-down (principle → examples), bottom-up (observations → principles), or middle-out (familiar → both directions).

**Step 4: Validate quality**

Self-assess using `resources/evaluators/rubric_abstraction_concrete_examples.json`. Check: each level is distinct, transitions are clear, top level is universal, bottom level is specific, edge cases reveal insights, assumptions are stated, no topic drift, serves stated purpose. Minimum standard: Average score ≥ 3.5. If any criterion < 3, revise before delivering.

**Step 5: Deliver and explain**

Present the completed `abstraction-concrete-examples.md` file. Highlight key insights revealed by the ladder, note interesting edge cases or tensions discovered, and suggest applications based on the user's original purpose.

## Common Patterns

**For communication across levels:**
- Share L1-L2 with executives (strategy/principles)
- Share L2-L3 with managers (approaches/methods)
- Share L3-L5 with implementers (details/specifics)

**For validation:**
- Check if L5 reality matches L1 principles
- Identify gaps between adjacent levels
- Find where principles break down

**For design:**
- Use L1-L2 to guide decisions
- Use L3-L4 to specify requirements
- Use L5 for actual implementation

## Guardrails

**Do:**
- State assumptions explicitly at each level
- Test edge cases that challenge the principles
- Make concrete levels truly concrete (numbers, measurements, specifics)
- Make abstract levels broadly applicable (not domain-locked)
- Ensure each level is understandable given the previous level

**Don't:**
- Use vague language ("good", "better", "appropriate") without defining terms
- Make huge conceptual jumps between levels
- Let different levels drift to different topics
- Skip the validation step (rubric is required)
- Front-load expert jargon - explain clearly for the target audience

## Quick Reference

- **Template for standard cases**: `resources/template.md`
- **Methodology for complex cases**: `resources/methodology.md`
- **Examples to study**: `resources/examples/api-design.md`, `resources/examples/hiring-process.md`
- **Quality rubric**: `resources/evaluators/rubric_abstraction_concrete_examples.json`
130
skills/abstraction-concrete-examples/resources/evaluators/rubric_abstraction_concrete_examples.json
Normal file
@@ -0,0 +1,130 @@
{
  "name": "Abstraction Ladder Quality Rubric",
  "scale": {
    "min": 1,
    "max": 5,
    "description": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent"
  },
  "criteria": [
    {
      "name": "Level Distinctness",
      "description": "Each level is clearly distinct from adjacent levels with no redundancy",
      "scoring": {
        "1": "Levels are redundant or indistinguishable",
        "2": "Some levels overlap significantly",
        "3": "Levels are mostly distinct with minor overlap",
        "4": "All levels are clearly distinct",
        "5": "Each level adds unique, valuable perspective"
      }
    },
    {
      "name": "Transition Clarity",
      "description": "Connections between levels are logical and traceable",
      "scoring": {
        "1": "No clear connection between levels",
        "2": "Some connections are unclear or missing",
        "3": "Most transitions are logical",
        "4": "All transitions are clear and logical",
        "5": "Transitions reveal deep insights about the topic"
      }
    },
    {
      "name": "Abstraction Range",
      "description": "Spans from truly universal principles to concrete specifics",
      "scoring": {
        "1": "Limited range; all levels at similar abstraction",
        "2": "Some variation but doesn't reach extremes",
        "3": "Good range from abstract to concrete",
        "4": "Excellent range; top is universal, bottom is specific",
        "5": "Exceptional range with measurable concrete details and broadly applicable principles"
      }
    },
    {
      "name": "Concreteness at Bottom",
      "description": "Most concrete level has specific, measurable, verifiable details",
      "scoring": {
        "1": "Bottom level still abstract or vague",
        "2": "Bottom level somewhat specific but lacks detail",
        "3": "Bottom level has concrete examples",
        "4": "Bottom level has specific, measurable details",
        "5": "Bottom level includes exact values, measurements, edge cases"
      }
    },
    {
      "name": "Abstraction at Top",
      "description": "Most abstract level is universally applicable beyond this context",
      "scoring": {
        "1": "Top level is context-specific",
        "2": "Top level is somewhat general but domain-limited",
        "3": "Top level is broadly applicable",
        "4": "Top level is universal within domain",
        "5": "Top level transcends domain; applies to many fields"
      }
    },
    {
      "name": "Edge Case Quality",
      "description": "Edge cases meaningfully test boundaries and reveal insights",
      "scoring": {
        "1": "No edge cases or trivial examples",
        "2": "Edge cases present but don't challenge principles",
        "3": "Edge cases test some boundaries",
        "4": "Edge cases reveal interesting tensions or limits",
        "5": "Edge cases expose deep insights and prompt refinement"
      }
    },
    {
      "name": "Assumption Transparency",
      "description": "Assumptions, context, and limitations are stated explicitly",
      "scoring": {
        "1": "No acknowledgment of assumptions or limits",
        "2": "Few assumptions mentioned",
        "3": "Key assumptions stated",
        "4": "Comprehensive assumption documentation",
        "5": "Assumptions stated with analysis of how changes would affect ladder"
      }
    },
    {
      "name": "Coherence",
      "description": "All levels address the same aspect/thread of the topic",
      "scoring": {
        "1": "Levels address completely different topics",
        "2": "Significant topic drift between levels",
        "3": "Mostly coherent with minor drift",
        "4": "Strong coherence throughout",
        "5": "Perfect thematic unity; tells a clear story"
      }
    },
    {
      "name": "Utility",
      "description": "Ladder serves its stated purpose and provides actionable value",
      "scoring": {
        "1": "Purpose unclear; no practical value",
        "2": "Some value but doesn't clearly serve a purpose",
        "3": "Useful for stated purpose",
        "4": "Highly useful with clear applications",
        "5": "Exceptional utility; enables decisions or insights not otherwise possible"
      }
    },
    {
      "name": "Comprehensibility",
      "description": "Someone unfamiliar with the topic can follow the logic",
      "scoring": {
        "1": "Requires deep expertise to understand",
        "2": "Accessible only to domain experts",
        "3": "Understandable with some background",
        "4": "Clear to most readers",
        "5": "Crystal clear; excellent pedagogical tool"
      }
    }
  ],
  "overall_assessment": {
    "thresholds": {
      "excellent": "Average score ≥ 4.5 across all criteria",
      "very_good": "Average score ≥ 4.0 across all criteria",
      "good": "Average score ≥ 3.5 across all criteria",
      "acceptable": "Average score ≥ 3.0 across all criteria",
      "needs_improvement": "Average score < 3.0"
    }
  },
  "usage_instructions": "Rate each criterion independently on 1-5 scale. Calculate average. Identify lowest-scoring criteria for targeted improvement before delivering to user."
}
139
skills/abstraction-concrete-examples/resources/examples/api-design.md
Normal file
@@ -0,0 +1,139 @@
# Abstraction Ladder Example: API Design

## Topic: RESTful API Design

## Overview
This ladder shows how abstract API design principles translate into concrete implementation decisions for a user management endpoint.

## Abstraction Levels

### Level 1 (Most Abstract): Universal Principle
**"Interfaces should be intuitive, consistent, and predictable"**

This applies to all interfaces: APIs, UI, command-line tools, hardware controls. Users should be able to predict behavior based on consistent patterns.

### Level 2: Framework & Standards
**"Follow REST architectural constraints and HTTP semantics"**

RESTful design provides standardized patterns:
- Resources identified by URIs
- Stateless communication
- Standard HTTP methods (GET, POST, PUT, DELETE)
- Appropriate status codes
- HATEOAS (where applicable)

### Level 3: Approach & Patterns
**"Design resource-oriented endpoints with predictable CRUD operations"**

Concrete patterns:
- Use nouns for resources, not verbs
- Plural resource names
- Nested resources show relationships
- Query parameters for filtering/pagination
- Consistent error response format

### Level 4: Specific Implementation
**"User management API with standard CRUD endpoints"**

```
GET    /api/v1/users        # List all users
GET    /api/v1/users/:id    # Get specific user
POST   /api/v1/users        # Create user
PUT    /api/v1/users/:id    # Update user (full)
PATCH  /api/v1/users/:id    # Update user (partial)
DELETE /api/v1/users/:id    # Delete user
```

Authentication: Bearer token in Authorization header
Content-Type: application/json

### Level 5 (Most Concrete): Precise Details
**Exact endpoint specification:**

```http
GET /api/v1/users/12345
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
Accept: application/json

Response: 200 OK
{
  "id": 12345,
  "email": "user@example.com",
  "firstName": "Jane",
  "lastName": "Doe",
  "createdAt": "2024-01-15T10:30:00Z",
  "role": "standard"
}

Edge cases:
- User not found: 404 Not Found
- Invalid token: 401 Unauthorized
- Insufficient permissions: 403 Forbidden
- Invalid ID format: 400 Bad Request
- Server error: 500 Internal Server Error
```

Rate limit: 1000 requests/hour per token
Pagination: max 100 items per page, default 20

## Connections & Transitions

**L1 → L2**: REST provides a proven framework for creating predictable interfaces through standard conventions.

**L2 → L3**: Resource-oriented design is how REST constraints manifest in practical API design.

**L3 → L4**: User management is a concrete application of CRUD patterns to a specific domain resource.

**L4 → L5**: Exact HTTP requests/responses and error handling show how design patterns become actual code.

## Edge Cases & Boundary Testing

### Case 1: Deleting a non-existent user
- **Abstract principle (L1)**: Interface should provide clear feedback
- **Expected (L3)**: Return error for invalid operations
- **Actual (L5)**: `DELETE /users/99999` returns `404 Not Found` with body `{"error": "User not found"}`
- **Alignment**: ✓ Concrete implementation matches principle

### Case 2: Updating with partial data
- **Abstract principle (L1)**: Interface should be predictable
- **Expected (L3)**: PATCH for partial updates, PUT for full replacement
- **Actual (L5)**: `PATCH /users/123` with `{"firstName": "John"}` updates only firstName, leaves other fields unchanged
- **Alignment**: ✓ Follows REST semantics

### Case 3: Bulk operations
- **Abstract principle (L1)**: Interfaces should be consistent
- **Question**: How to delete multiple users?
- **Options**:
  - POST /users/bulk-delete (violates resource-oriented design)
  - DELETE /users with query params (non-standard)
  - Multiple DELETE requests (chatty but consistent)
- **Gap**: Pure REST doesn't handle bulk operations elegantly
- **Resolution**: Accept the trade-off; use `POST /users/actions/bulk-delete` with clear documentation

## Applications

This ladder is useful for:
- **Onboarding new developers**: Show how design principles inform specific code
- **API review**: Check if implementation aligns with stated principles
- **Documentation**: Explain the "why" behind specific endpoint designs
- **Consistency checking**: Ensure new endpoints follow the same patterns
- **Client SDK design**: Derive SDK structure from abstraction levels

## Gaps & Assumptions

**Assumptions:**
- Using JSON (could be XML, Protocol Buffers, etc.)
- Token-based auth (could be OAuth, API keys, etc.)
- Synchronous operations (could be async/webhooks)

**Gaps:**
- Real-time updates not covered (WebSockets?)
- File uploads not addressed (multipart/form-data?)
- Versioning strategy mentioned but not detailed
- Caching strategy not specified
- Bulk operations awkward in pure REST

**Questions for deeper exploration:**
- How do GraphQL or gRPC change this ladder?
- What happens at massive scale (millions of requests/sec)?
- How does distributed/microservices architecture affect this?
@@ -0,0 +1,194 @@
|
||||
# Abstraction Ladder Example: Hiring Process
|
||||
|
||||
## Topic: Building an Effective Hiring Process
|
||||
|
||||
## Overview
|
||||
This ladder demonstrates how abstract hiring principles translate into concrete interview procedures. Built bottom-up from actual hiring experiences.
|
||||
|
||||
## Abstraction Levels
|
||||
|
||||
### Level 5 (Most Concrete): Specific Example
|
||||
**Tuesday interview for Senior Engineer position:**
|
||||
|
||||
- 9:00 AM: Recruiter sends calendar invite with Zoom link
|
||||
- 10:00 AM: 45-min technical interview
|
||||
- Candidate shares screen
|
||||
- Interviewer asks: "Design a URL shortening service"
|
||||
- Candidate discusses for 30 min while drawing architecture
|
||||
- 10 min for candidate questions
|
||||
- Interviewer fills scorecard: System Design=4/5, Communication=5/5
|
||||
- 11:00 AM: Candidate receives thank-you email
|
||||
- 11:30 AM: Interviewer submits scores in Greenhouse ATS
|
||||
- Week later: Debrief meeting reviews 6 scorecards, makes hire/no-hire decision
|
||||
|
||||
**Specific scoreboard criteria:**
|
||||
- Problem solving: 1-5 scale
|
||||
- Communication: 1-5 scale
|
||||
- Culture fit: 1-5 scale
|
||||
- Technical depth: 1-5 scale
|
||||
- Bar raiser must approve (score ≥4 average)
|
||||
|
||||
### Level 4: Implementation Pattern
|
||||
**Structured interview loop with standardized evaluation**
|
||||
|
||||
Process:
|
||||
1. Phone screen (30 min) - basic qualification
|
||||
2. Take-home assignment (2-4 hours) - practical skills
|
||||
3. Onsite loop (4-5 hours):
|
||||
- Technical interview #1: System design
|
||||
- Technical interview #2: Coding
|
||||
- Behavioral interview: Past experience
|
||||
- Hiring manager: Role fit & vision alignment
|
||||
- Optional: Team member lunch (informal)
|
||||
4. Debrief within 48 hours
|
||||
5. Reference checks for strong candidates
|
||||
6. Offer or rejection with feedback

Each interviewer:
- Uses structured scorecard
- Submits written feedback within 24 hours
- Rates on consistent rubric
- Provides hire/no-hire recommendation

### Level 3: Approach & Method
**Use structured interviews with job-relevant assessments and multiple evaluators**

Key practices:
- Define role requirements before interviews
- Create standardized questions for each competency
- Train interviewers on bias and evaluation
- Use panel of diverse interviewers
- Evaluate on job-specific skills, not proxies
- Aggregate independent ratings before discussion
- Check references to validate assessments
- Provide candidate feedback regardless of outcome

### Level 2: Framework & Research
**Apply evidence-based hiring practices to reduce bias and improve predictive validity**

Research-backed principles:
- Structured interviews outperform unstructured (Schmidt & Hunter meta-analysis)
- Work samples better predict performance than credentials
- Multiple independent evaluators reduce individual bias
- Job analysis identifies actual success criteria
- Standardization enables fair comparisons
- Cognitive diversity in hiring panels improves decisions

Standards to follow:
- EEOC guidelines for non-discrimination
- GDPR/privacy compliance for candidate data
- Industry best practices (e.g., SHRM)

### Level 1 (Most Abstract): Universal Principle
**"Hiring should identify candidates most likely to succeed while treating all applicants fairly and respectfully"**

Core values:
- Meritocracy: Select based on ability to do the job
- Equity: Provide equal opportunity regardless of background
- Predictive validity: Assessments should predict actual job performance
- Candidate experience: Treat people with dignity
- Continuous improvement: Learn from outcomes to refine process

This applies beyond hiring to any selection process: admissions, promotions, awards, grants, etc.

## Connections & Transitions

**L5 → L4**: The specific Tuesday interview exemplifies the structured interview loop approach. Each element (scorecard, timing, Greenhouse submission) reflects the systematic pattern.

**L4 → L3**: The structured loop implements the principle of using job-relevant assessments with multiple evaluators. The 48-hour debrief and standardized scorecards are concrete applications of standardization.

**L3 → L2**: Structured interviews and work samples are the practical application of "evidence-based hiring practices" from I/O psychology research.

**L2 → L1**: Evidence-based practices are how we operationalize the abstract values of merit, equity, and predictive validity.

## Edge Cases & Boundary Testing

### Case 1: Candidate has unconventional background
- **Abstract principle (L1)**: Hire based on merit and ability
- **Standard process (L4)**: Looking for "5+ years experience with React"
- **Edge case**: Candidate has 2 years React but exceptional work sample and adjacent skills
- **Tension**: Strict requirements vs. actual capability
- **Resolution**: Requirements are proxy for skills; assess skills directly through work sample

### Case 2: All interviewers are available except one
- **Abstract principle (L1)**: Multiple evaluators reduce bias
- **Standard process (L3)**: Panel of diverse interviewers
- **Edge case**: Only senior engineers available this week, no product manager
- **Tension**: Speed vs. diverse perspectives
- **Resolution**: Delay one week to get proper panel, or explicitly note missing perspective in decision

### Case 3: Internal referral from CEO
- **Abstract principle (L1)**: Treat all applicants fairly
- **Standard process (L4)**: All candidates go through same loop
- **Edge case**: CEO's referral puts pressure to hire
- **Tension**: Political dynamics vs. process integrity
- **Resolution**: Use same process but ensure bar raiser is involved; separate "good referral" from "strong candidate"

### Case 4: Candidate requests accommodation
- **Abstract principle (L1)**: Treat people with dignity and respect
- **Standard process (L4)**: 45-min technical interview with live coding
- **Edge case**: Candidate has dyslexia, requests written questions in advance
- **Tension**: Standardization vs. accessibility
- **Resolution**: Accommodation maintains what we're testing (problem-solving) while removing irrelevant barrier (reading speed). Provide questions 30 min before; maintain time limit.

## Applications

This ladder is useful for:

**For hiring managers:**
- Design new interview process grounded in principles
- Explain to candidates why process is structured this way
- Train new interviewers on the "why" behind each step

**For executives:**
- Understand ROI of structured hiring (L1-L2)
- Make resource decisions (time investment in L4-L5)

**For candidates:**
- Understand what to expect and why
- See how specific interview ties to broader goals

**For process improvement:**
- Identify where implementation (L5) drifts from principles (L1)
- Test if new tools/techniques align with evidence base (L2)

## Gaps & Assumptions

**Assumptions:**
- Hiring for full-time employee role (not contractor/intern)
- Mid-size tech company context (not 10-person startup or Fortune 500)
- White-collar knowledge work (not frontline/manual labor)
- North American legal/cultural context
- Sufficient candidate volume to justify structure

**Gaps:**
- Doesn't address compensation negotiation
- Doesn't detail sourcing/recruiting before application
- Doesn't specify onboarding after hire
- Limited discussion of diversity/inclusion initiatives
- Doesn't address remote vs. in-person trade-offs
- No mention of employer branding

**What changes at different scales:**
- **Startup (10 people)**: Might skip structured scorecards (everyone knows everyone)
- **Enterprise (10,000 people)**: Might add compliance reviews, more stakeholders
- **High-volume hiring**: Might add automated screening, assessment centers

**What changes in different domains:**
- **Trades/manual labor**: Work samples would be actual task performance
- **Creative roles**: Portfolio review more important than interviews
- **Executive roles**: Board involvement, longer timeline, reference checks crucial

## Lessons Learned

**Principle that held up:**
The core idea (L1) of "fair and predictive" remains true even when implementation (L5) varies wildly by context.

**Principle that required nuance:**
"Multiple evaluators" (L3) assumes independence. In practice, first interviewer's opinion can bias later interviewers. Solution: collect ratings before debrief discussion.

**Missing level:**
Could add L2.5 for company-specific values ("hire for culture add, not culture fit"). Shows how universal principles get customized before becoming process.

**Alternative ladder:**
Could build parallel ladder for "candidate experience" that shows how to treat applicants well. Would share L1 but diverge at L2-L5 with different practices (clear communication, timely feedback, etc.).
356
skills/abstraction-concrete-examples/resources/methodology.md
Normal file
@@ -0,0 +1,356 @@
# Abstraction Ladder Methodology

## Abstraction Ladder Workflow

Copy this checklist and track your progress:

```
Abstraction Ladder Progress:
- [ ] Step 1: Choose your direction (top-down, bottom-up, or middle-out)
- [ ] Step 2: Build each abstraction level
- [ ] Step 3: Validate transitions between levels
- [ ] Step 4: Test with edge cases
- [ ] Step 5: Verify coherence and completeness
```

**Step 1: Choose your direction**

Select the approach that fits your purpose. See [Choosing the Right Direction](#choosing-the-right-direction) for detailed guidance on top-down, bottom-up, or middle-out approaches.

**Step 2: Build each abstraction level**

Create 3-5 distinct levels following quality criteria for each level type. See [Building Each Level](#building-each-level) for characteristics and quality checks for universal principles, frameworks, methods, implementations, and precise details.

**Step 3: Validate transitions**

Ensure each level logically derives from the previous one. See [Validating Transitions](#validating-transitions) for transition tests and connection patterns.

**Step 4: Test with edge cases**

Test your abstraction ladder against boundary scenarios to reveal gaps or conflicts. See [Edge Case Discovery](#edge-case-discovery) for techniques to find and analyze edge cases.

**Step 5: Verify coherence and completeness**

Check that the ladder flows as a coherent whole and covers the necessary scope. See [Common Pitfalls](#common-pitfalls) and [Advanced Techniques](#advanced-techniques) for validation approaches.

## Choosing the Right Direction

### Top-Down (Abstract → Concrete)

**When to use:**
- Communicating established principles to new audience
- Designing systems from first principles
- Teaching theoretical concepts
- Creating implementation from requirements

**Process:**
1. Start with the most universal, broadly applicable statement
2. Ask "How would this manifest in practice?"
3. Add constraints and context at each level
4. End with specific, measurable examples

**Example flow:**
- Level 1: "Software should be maintainable"
- Level 2: "Use modular architecture and clear interfaces"
- Level 3: "Implement dependency injection and single responsibility principle"
- Level 4: "UserService has one public method `getUser(id)` with IUserRepository injected"
- Level 5: "Line 47: `constructor(private repository: IUserRepository) {}`"

### Bottom-Up (Concrete → Abstract)

**When to use:**
- Analyzing existing implementations
- Discovering patterns from observations
- Generalizing from specific cases
- Root cause analysis

**Process:**
1. Start with specific, observable facts
2. Ask "What pattern does this exemplify?"
3. Remove context-specific details at each level
4. End with universal principles

**Example flow:**
- Level 5: "GET /api/users/123 returns 404 when user doesn't exist"
- Level 4: "API returns appropriate HTTP status codes for resource states"
- Level 3: "REST API follows HTTP semantic conventions"
- Level 2: "System communicates errors consistently through standard protocols"
- Level 1: "Interfaces should provide clear, unambiguous feedback"

### Middle-Out (Familiar → Both Directions)

**When to use:**
- Starting with something stakeholders understand
- Bridging technical and business perspectives
- Teaching from known to unknown

**Process:**
1. Start at a familiar, mid-level example
2. Expand upward to extract principles
3. Expand downward to show implementation
4. Ensure both directions connect coherently

## Building Each Level

### Level 1: Universal Principles

**Characteristics:**
- Applies across domains and contexts
- Value-based or theory-driven
- Often uses terms like "should," "must," "always"
- Could apply to different industries/fields

**Quality check:**
- Can you apply this to a completely different domain?
- Is it so abstract it's almost philosophical?
- Does it express fundamental values or laws?

**Examples:**
- "Systems should be resilient to failure"
- "Users deserve privacy and control over their data"
- "Organizations should optimize for long-term value"

### Level 2: Categories & Frameworks

**Characteristics:**
- Organizes the domain into conceptual buckets
- References established frameworks or standards
- Defines high-level approaches
- Still domain-general but more specific

**Quality check:**
- Does it reference a framework others would recognize?
- Could practitioners cite this as a "best practice"?
- Is it general enough to apply across similar projects?

**Examples:**
- "Follow SOLID principles for object-oriented design"
- "Implement defense-in-depth security strategy"
- "Use Agile methodology for iterative development"

### Level 3: Methods & Approaches

**Characteristics:**
- Actionable techniques and methods
- Still flexible in implementation
- Describes "how" in general terms
- Multiple valid implementations possible

**Quality check:**
- Could two teams implement this differently but both be correct?
- Does it guide action without dictating exact steps?
- Can you name 3+ ways to implement this?

**Examples:**
- "Use dependency injection for loose coupling"
- "Implement rate limiting to prevent abuse"
- "Create user personas based on research interviews"

### Level 4: Specific Instances

**Characteristics:**
- Concrete implementations
- Project or context-specific
- References actual code, designs, or artifacts
- Limited variation in implementation

**Quality check:**
- Could you point to this in a codebase or document?
- Is it specific to one project/product?
- Would changing this require actual work (not just thinking)?

**Examples:**
- "AuthService uses JWT tokens with 1-hour expiration"
- "Dashboard loads user data via GraphQL endpoint"
- "Button uses 16px padding and #007bff background"

### Level 5: Precise Details

**Characteristics:**
- Measurable, verifiable specifics
- Exact values, configurations, line numbers
- Edge cases and boundary conditions
- No ambiguity in interpretation

**Quality check:**
- Can you measure or test this objectively?
- Is there exactly one interpretation?
- Could QA write a test case from this?

**Examples:**
- "Line 234: `if (userId < 1 || userId > 2147483647) throw RangeError`"
- "Button #submit-btn has tabindex=0 and aria-label='Submit form'"
- "Password must be 8-72 chars, including: a-z, A-Z, 0-9, !@#$%"

## Validating Transitions

### Connection Tests

For each adjacent pair of levels, verify:

1. **Derivation**: Can you logically derive the lower level from the higher level?
   - Ask: "Does this concrete example truly exemplify that abstract principle?"

2. **Generalization**: Can you extract the higher level from the lower level?
   - Ask: "If I saw only this concrete example, would I infer that principle?"

3. **No jumps**: Is the gap between levels small enough to follow?
   - Ask: "Can I explain the transition without introducing entirely new concepts?"

### Red Flags

- **Too similar**: Two levels say essentially the same thing
- **Missing middle**: Big conceptual leap between levels
- **Contradiction**: Concrete example violates abstract principle
- **Jargon shift**: Different terminology without translation
- **Context switch**: Levels address different aspects of the topic

## Edge Case Discovery

Edge cases are concrete scenarios that test the boundaries of abstract principles.

### Finding Edge Cases

1. **Boundary testing**: What happens at extremes?
   - Zero, negative, maximum values
   - Empty sets, single items, massive scale
   - Start/end of time ranges

2. **Contradiction hunting**: When does the principle not apply?
   - Special circumstances
   - Conflicting principles
   - Trade-offs that force compromise

3. **Real-world friction**: What makes implementation hard?
   - Technical limitations
   - Business constraints
   - User behavior
   - Legacy systems

### Documenting Edge Cases

For each edge case, document:
- **Scenario**: Specific concrete situation
- **Expectation**: What abstract principle suggests should happen
- **Reality**: What actually happens
- **Gap**: Why there's a difference
- **Resolution**: How to handle it

**Example:**
- **Scenario**: User uploads 5GB profile photo
- **Expectation**: "System should accept user input" (abstract principle)
- **Reality**: Server rejects file > 10MB
- **Gap**: Principle doesn't account for resource limits
- **Resolution**: Revise principle to "System should accept reasonable user input within documented constraints"
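
The revised principle translates directly into a testable check; a sketch (the 10MB limit comes from the example, the function shape is illustrative):

```typescript
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // documented 10MB constraint

function validateUpload(sizeBytes: number): { ok: boolean; reason?: string } {
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    // Reject against a documented constraint rather than failing silently,
    // matching "accept reasonable input within documented constraints".
    return { ok: false, reason: `file exceeds ${MAX_UPLOAD_BYTES} bytes` };
  }
  return { ok: true };
}
```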

## Common Pitfalls

### 1. Fake Concreteness

**Problem**: Using specific-sounding language without actual specificity.

**Bad**: "The system should have good performance"
**Good**: "The system should respond to API requests in < 200ms at p95"

### 2. Missing the Abstract

**Problem**: Starting too concrete, never reaching universal principles.

**Bad**: Levels 1-5 all describe different API endpoints
**Good**: Extract what makes a "good API" from those endpoints

### 3. Inconsistent Granularity

**Problem**: Some levels are finely divided, others make huge jumps.

**Fix**: Ensure roughly equal conceptual distance between all adjacent levels.

### 4. Topic Drift

**Problem**: Different levels address different aspects of the topic.

**Bad**:
- Level 1: "Software should be secure"
- Level 2: "Use encryption for data"
- Level 3: "Users prefer simple interfaces" ← Drift!

**Good**: Keep all levels on the same thread (security, in this case).

### 5. Over-specification

**Problem**: Making higher levels too specific too early.

**Bad**: Level 1: "React apps should use Redux Toolkit"
**Good**: Level 1: "Applications should manage state predictably"

## Advanced Techniques

### Multiple Ladders

For complex topics, create multiple parallel ladders for different aspects:

**Topic: E-commerce Checkout**
- Ladder A: Security (data protection → PCI compliance → specific encryption)
- Ladder B: UX (easy purchase → progress indication → specific button placement)
- Ladder C: Performance (fast checkout → async processing → specific caching strategy)

### Ladder Mapping

Connect ladders at various levels to show relationships:

```
Ladder A (Feature):              Ladder B (Architecture):
L1: Improve user engagement  ←   L1: System should be modular
L2: Add social features      ←   L2: Use microservices
L3: Implement commenting     ←   L3: Comment service
L4: POST /comments endpoint  ←   L4: Express.js REST API
```

### Audience Targeting

Create the same ladder with different emphasis for different audiences:

**For executives**: Focus on levels 1-2 (strategy, ROI)
**For managers**: Focus on levels 2-3 (approach, methods)
**For engineers**: Focus on levels 3-5 (implementation, details)

### Reverse Engineering

Take existing concrete work and extract the abstraction ladder:
1. Document exactly what was built (Level 5)
2. Ask "Why this specific implementation?" (Level 4)
3. Ask "What approach guided this?" (Level 3)
4. Continue upward to principles

This reveals:
- Implicit assumptions
- Unstated principles
- Gaps between intent and execution

### Gap Analysis

Compare ideal ladder vs. actual implementation:

**Ideal**:
- L1: "Products should be accessible"
- L2: "Follow WCAG 2.1 AA"
- L3: "All interactive elements keyboard navigable"

**Actual**:
- L5: "Some buttons missing tabindex"
- Inference: Gap between L1 intention and L5 reality

Use gap analysis to:
- Identify technical debt
- Find missing requirements
- Plan improvements

## Summary

Effective abstraction ladders:
- Have clear, logical transitions between levels
- Cover both universal principles and specific details
- Reveal assumptions and edge cases
- Serve the user's actual need (communication, design, validation, etc.)

Remember: The ladder is a tool for thinking and communicating, not an end in itself. Build what's useful for the task at hand.
219
skills/abstraction-concrete-examples/resources/template.md
Normal file
@@ -0,0 +1,219 @@
# Quick-Start Template

## Workflow

Copy this checklist and track your progress:

```
Abstraction Ladder Progress:
- [ ] Step 1: Gather inputs (topic, purpose, audience, levels, direction)
- [ ] Step 2: Choose starting point and build levels
- [ ] Step 3: Add connections and transitions
- [ ] Step 4: Test with edge cases
- [ ] Step 5: Validate quality checklist
```

**Step 1: Gather inputs**

Define topic (what concept/system/problem?), purpose (communication/design/validation?), audience (who will use this?), levels (3-5, default 4), direction (top-down/bottom-up/middle-out), and focus areas (edge cases/communication/implementation).

**Step 2: Choose starting point and build levels**

Use [Common Starting Points](#common-starting-points) to select direction. Top-down for teaching/design, bottom-up for analysis/patterns, middle-out for bridging gaps. Build each level ensuring distinctness and logical flow.

**Step 3: Add connections and transitions**

Explain how the levels flow together as a coherent whole. Each level should logically derive from the previous level. See [Template Structure](#template-structure) for format.

**Step 4: Test with edge cases**

Identify 2-3 boundary scenarios that test principles. For each: describe scenario, state what abstract principle suggests, note what actually happens, assess alignment (matches/conflicts/requires nuance).

**Step 5: Validate quality checklist**

Use [Quality Checklist](#quality-checklist) to verify: levels are distinct, concrete level has specifics, abstract level is universal, edge cases are meaningful, assumptions stated, serves stated purpose.

## Template Structure

Copy this structure to create your abstraction ladder:

```markdown
# Abstraction Ladder: [Your Topic]

## Overview
**Topic**: [What you're exploring]
**Purpose**: [Why you're building this ladder]
**Audience**: [Who will use this]

## Abstraction Levels

### Level 1 (Most Abstract): [Give it a label]
[Universal principle or highest-level concept]

Why this matters: [Explain the significance]

### Level 2: [Label]
[Framework, category, or general approach]

Connection to L1: [How does this derive from Level 1?]

### Level 3: [Label]
[Specific method or implementation approach]

Connection to L2: [How does this derive from Level 2?]

### Level 4 (Most Concrete): [Label]
[Exact implementation with specific details]

Connection to L3: [How does this derive from Level 3?]

*Add Level 5 if you need more granularity*

## Connections & Transitions

[Explain how the levels flow together as a coherent whole]

**Key insight**: [What becomes clear when you see all levels together?]

## Edge Cases & Boundary Testing

### Edge Case 1: [Name]
- **Scenario**: [Concrete situation]
- **Abstract principle**: [What L1/L2 suggests should happen]
- **Reality**: [What actually happens]
- **Alignment**: [✓ matches / ✗ conflicts / ~ requires nuance]

### Edge Case 2: [Name]
[Same structure]

## Applications

This ladder is useful for:
- [Use case 1]
- [Use case 2]
- [Use case 3]

## Gaps & Assumptions

**Assumptions:**
- [What are we taking for granted?]
- [What context is this specific to?]

**Gaps:**
- [What's not covered?]
- [What questions remain?]

**What would change if:**
- [Different scale? Different domain? Different constraints?]
```

## Common Starting Points

### Start Top-Down (Abstract → Concrete)

**Good for**: Teaching, designing from principles, communication to varied audiences

**Prompt to yourself**:
1. "What's the most universal statement I can make about this topic?"
2. "How would this principle manifest in practice?"
3. "What framework implements this principle?"
4. "What's a concrete example?"
5. "What are the exact, measurable details?"

**Example**:
- L1: "Communication should be clear"
- L2: "Use plain language and structure"
- L3: "Organize documents with headings, bullets, short paragraphs"
- L4: "This document uses H2 headings every 3-4 paragraphs, bullet lists for steps"

### Start Bottom-Up (Concrete → Abstract)

**Good for**: Analyzing existing work, generalizing patterns, root cause analysis

**Prompt to yourself**:
1. "What specific thing am I looking at?"
2. "What pattern does this exemplify?"
3. "What general approach does that pattern reflect?"
4. "What framework supports that approach?"
5. "What universal principle underlies this?"

**Example**:
- L5: "Button has onClick={handleSubmit} and disabled={!isValid}"
- L4: "Form button is disabled until validation passes"
- L3: "Prevent invalid form submission through UI controls"
- L2: "Use defensive programming and client-side validation"
- L1: "Systems should prevent errors, not just catch them"

### Start Middle-Out (Familiar → Both Directions)

**Good for**: Building shared understanding, bridging expertise gaps

**Prompt to yourself**:
1. "What's something everyone already understands?"
2. Go up: "What principle does this exemplify?"
3. Go down: "How exactly is this implemented?"
4. Continue in both directions

**Example** (start at L3):
- L1: ↑ "Products should be accessible to all"
- L2: ↑ "Follow WCAG guidelines"
- **L3: "Add alt text to images"** ← Start here
- L4: ↓ `<img src="logo.png" alt="Company name logo">`
- L5: ↓ Screen reader reads: "Company name logo, image"

## Quality Checklist

Before finalizing, check:

- [ ] Each level is clearly distinct from adjacent levels
- [ ] I can explain the transition between any two adjacent levels
- [ ] Most concrete level has specific, measurable details
- [ ] Most abstract level is broadly applicable beyond this context
- [ ] Edge cases test the boundaries meaningfully
- [ ] Assumptions are stated explicitly
- [ ] The ladder serves the stated purpose
- [ ] Someone unfamiliar with the topic could follow the logic

## Guardrails

**Do:**
- State what you don't know or aren't sure about
- Include edge cases that challenge the principles
- Make concrete levels truly concrete (numbers, specifics)
- Make abstract levels truly universal (apply to other domains)

**Don't:**
- Use vague language like "good," "better," "appropriate" without defining
- Make huge jumps between levels (missing middle)
- Let different levels address different aspects of the topic
- Assume expertise your audience doesn't have

## Next Steps After Creating Ladder

**For communication:**
- Share L1-L2 with executives
- Share L2-L3 with managers
- Share L3-L5 with implementers

**For design:**
- Use L1-L2 to guide decisions
- Use L3-L4 to specify requirements
- Use L5 for implementation

**For validation:**
- Test if L5 reality matches L1 principles
- Find gaps between levels
- Identify where principles break down

**For documentation:**
- Use as table of contents (each level = section depth)
- Create expandable sections (click for more detail)
- Link levels to relevant resources

## Examples to Study

See `resources/examples/` for complete examples:
- `api-design.md` - Technical example (API design principles)
- `hiring-process.md` - Process example (hiring practices)

Each example shows different techniques and applications.
165
skills/adr-architecture/SKILL.md
Normal file
@@ -0,0 +1,165 @@
---
name: adr-architecture
description: Use when documenting significant technical or architectural decisions that need context, rationale, and consequences recorded. Invoke when choosing between technology options, making infrastructure decisions, establishing standards, migrating systems, or when team needs to understand why a decision was made. Use when user mentions ADR, architecture decision, technical decision record, or decision documentation.
---

# Architecture Decision Records (ADR)

## Table of Contents

- [Purpose](#purpose)
- [When to Use This Skill](#when-to-use-this-skill)
- [What is an ADR?](#what-is-an-adr)
- [Workflow](#workflow)
  - [1. Understand the Decision](#1--understand-the-decision)
  - [2. Choose ADR Template](#2--choose-adr-template)
  - [3. Document the Decision](#3--document-the-decision)
  - [4. Validate Quality](#4--validate-quality)
  - [5. Deliver and File](#5--deliver-and-file)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

Document significant architectural and technical decisions with full context, alternatives considered, trade-offs analyzed, and consequences understood. ADRs create a decision trail that helps teams understand "why" decisions were made, even years later.

## When to Use This Skill

- Recording architecture decisions (microservices, databases, frameworks)
- Documenting infrastructure choices (cloud providers, deployment strategies)
- Capturing technology selections (libraries, tools, platforms)
- Logging process decisions (branching strategy, deployment process)
- Establishing technical standards or conventions
- Migrating or sunsetting systems
- Making security or compliance choices
- Resolving technical debates with documented rationale
- Onboarding new team members who need decision history

**Trigger phrases:** "ADR", "architecture decision", "document this decision", "why did we choose", "decision record", "technical decision log"

## What is an ADR?

An Architecture Decision Record is a document capturing a single significant decision. It includes:

- **Context**: What situation necessitates this decision?
- **Decision**: What are we choosing to do?
- **Alternatives**: What other options did we consider?
- **Consequences**: What are the trade-offs and implications?
- **Status**: Proposed, accepted, deprecated, superseded?

**Quick Example:**

```markdown
# ADR-042: Use PostgreSQL for Primary Database

**Status:** Accepted
**Date:** 2024-01-15
**Deciders:** Backend team, CTO

## Context
Need to select primary database for new microservices platform.
Requirements: ACID transactions, complex queries, 10k+ QPS at launch.

## Decision
Use PostgreSQL 15+ as primary relational database.

## Alternatives Considered
- MySQL: Weaker JSON support, less robust constraint handling
- MongoDB: No ACID across documents, eventual consistency issues
- CockroachDB: Excellent but adds operational complexity we can't support yet

## Consequences
✓ Strong consistency and data integrity
✓ Excellent JSON support for semi-structured data
✓ Team has deep PostgreSQL experience
✗ Vertical scaling limits (will need read replicas at 50k+ QPS)
✗ More complex to shard than DynamoDB if we need it
```

## Workflow

Copy this checklist and track your progress:

```
ADR Progress:
- [ ] Step 1: Understand the decision
- [ ] Step 2: Choose ADR template
- [ ] Step 3: Document the decision
- [ ] Step 4: Validate quality
- [ ] Step 5: Deliver and file
```

**Step 1: Understand the decision**

Gather decision context: what decision needs to be made, why now, who decides, constraints (budget, timeline, skills, compliance), requirements (functional, non-functional, business), and scope (one service vs organization-wide). This ensures the ADR addresses the right problem.

**Step 2: Choose ADR template**

For technology selection (frameworks, libraries, databases) → Use `resources/template.md`. For complex architectural decisions with multiple interdependent choices → Study `resources/methodology.md`. To see examples → Review `resources/examples/` (database-selection.md, microservices-migration.md, api-versioning.md).

**Step 3: Document the decision**

Create `adr-{number}-{short-title}.md` with: clear title, metadata (status, date, deciders), context (situation and requirements), decision (specific and actionable), alternatives considered (with pros/cons), consequences (trade-offs, risks, benefits), implementation notes if relevant, and links to related ADRs. See [Common Patterns](#common-patterns) for decision-type specific guidance.

**Step 4: Validate quality**

Self-check using `resources/evaluators/rubric_adr_architecture.json`. Verify: context explains WHY, decision is specific and actionable, 2-3+ alternatives documented with trade-offs, consequences include benefits AND drawbacks, technical details accurate, understandable to unfamiliar readers, honest about downsides. Minimum standard: Score ≥ 3.5 (aim for 4.5+ if controversial/high-impact).

**Step 5: Deliver and file**

Present the completed ADR file, highlight key trade-offs identified, suggest ADR numbering if not provided, recommend review process for high-stakes decisions, and note any follow-up decisions needed. Filing convention: Store ADRs in `docs/adr/` or `architecture/decisions/` directory with sequential numbering.
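
A small helper sketch for the numbering suggestion, assuming the `docs/adr/` convention above and the naming pattern from [Quick Reference](#quick-reference):

```typescript
import { readdirSync } from "node:fs";

// Scan docs/adr/ for files like adr-042-some-title.md and
// suggest the next sequential number, zero-padded to three digits.
function nextAdrNumber(dir = "docs/adr"): string {
  const numbers = readdirSync(dir)
    .map(name => /^adr-(\d+)-/.exec(name)?.[1])
    .filter((n): n is string => n !== undefined)
    .map(Number);
  const next = numbers.length ? Math.max(...numbers) + 1 : 1;
  return String(next).padStart(3, "0");
}
```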

## Common Patterns

**For technology selection:**
- Focus on technical capabilities vs requirements
- Include performance benchmarks if available
- Document team expertise level
- Consider operational complexity

**For architectural changes:**
- Include migration strategy in consequences
- Document backward compatibility impact
- Consider team velocity impact during transition
- Note monitoring and rollback plans

**For standards and conventions:**
- Include examples of the standard in practice
- Document exceptions or escape hatches
- Consider enforcement mechanisms
- Note educational/onboarding implications

**For deprecations:**
- Set status to "Deprecated" or "Superseded"
- Link to superseding ADR
- Document sunset timeline
- Include migration guide

## Guardrails

**Do:**
- Be honest about trade-offs (every choice has downsides)
- Write for future readers who lack current context
- Include specific technical details (versions, configurations)
- Acknowledge uncertainty and risks
- Keep ADRs immutable (status changes, but content doesn't)
- Write one ADR per decision (focused scope)

**Don't:**
- Make decisions sound better than they are
- Omit alternatives that were seriously considered
- Use jargon without explanation
- Write vague consequences ("might improve performance")
- Revisit/edit old ADRs (write new superseding ADR instead)
- Combine multiple independent decisions in one ADR

## Quick Reference

- **Standard template**: `resources/template.md`
- **Complex decisions**: `resources/methodology.md`
- **Examples**: `resources/examples/database-selection.md`, `resources/examples/microservices-migration.md`, `resources/examples/api-versioning.md`
- **Quality rubric**: `resources/evaluators/rubric_adr_architecture.json`

**ADR Naming Convention**: `adr-{number}-{short-kebab-case-title}.md`
- Example: `adr-042-use-postgresql-for-primary-database.md`
135
skills/adr-architecture/resources/evaluators/rubric_adr_architecture.json
Normal file
@@ -0,0 +1,135 @@
{
  "name": "Architecture Decision Record Quality Rubric",
  "scale": {
    "min": 1,
    "max": 5,
    "description": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent"
  },
  "criteria": [
    {
      "name": "Context Clarity",
      "description": "Context section clearly explains WHY this decision is needed, without proposing solutions",
      "scoring": {
        "1": "No context or context is vague/unhelpful",
        "2": "Some context but missing key requirements or constraints",
        "3": "Context explains situation with main requirements/constraints",
        "4": "Comprehensive context with background, requirements, and constraints",
        "5": "Exceptional context that future readers with no knowledge can fully understand"
      }
    },
    {
      "name": "Decision Specificity",
      "description": "Decision statement is specific, actionable, and unambiguous",
      "scoring": {
        "1": "Vague or no clear decision stated",
        "2": "Decision stated but lacks specifics (versions, scope, approach)",
        "3": "Decision is clear with main specifics",
        "4": "Decision is very specific with technical details and scope",
        "5": "Exceptionally detailed decision with configuration, versions, scope, and implementation approach"
      }
    },
    {
      "name": "Alternatives Quality",
      "description": "Real alternatives documented with honest, balanced pros/cons",
      "scoring": {
        "1": "No alternatives or only straw man options",
        "2": "1-2 alternatives but unfairly presented or minimal analysis",
        "3": "2-3 alternatives with basic pros/cons",
        "4": "3+ alternatives with honest, balanced analysis and specific reasons not chosen",
        "5": "Multiple well-researched alternatives with nuanced trade-offs and fair representation"
      }
    },
    {
      "name": "Consequence Honesty",
      "description": "Consequences include both benefits AND drawbacks with realistic assessment",
      "scoring": {
        "1": "Only benefits listed or consequences are vague",
        "2": "Mostly benefits with token mention of downsides",
        "3": "Balanced benefits and drawbacks but somewhat general",
        "4": "Honest assessment of benefits, drawbacks, and risks with specifics",
        "5": "Exceptionally honest and nuanced consequences with quantified trade-offs and mitigation strategies"
      }
    },
    {
      "name": "Technical Accuracy",
      "description": "Technical details are accurate, current, and specific",
      "scoring": {
        "1": "Technical errors or outdated information",
        "2": "Some technical details but lacking accuracy or currency",
        "3": "Technically sound with accurate information",
        "4": "High technical accuracy with specific versions, configurations, and current best practices",
        "5": "Exceptional technical depth with precise details, performance characteristics, and expert-level accuracy"
      }
    },
    {
      "name": "Future Comprehension",
      "description": "Someone unfamiliar with current context can understand the decision",
      "scoring": {
        "1": "Requires insider knowledge to understand",
        "2": "Some context but many gaps for outsiders",
        "3": "Mostly understandable with some background",
        "4": "Clear to future readers with minimal context needed",
        "5": "Perfectly self-contained; any future reader can fully understand"
      }
    },
    {
      "name": "Trade-off Transparency",
      "description": "Trade-offs are explicitly stated and downsides acknowledged",
      "scoring": {
        "1": "No acknowledgment of trade-offs or downsides",
        "2": "Minimal mention of trade-offs",
        "3": "Trade-offs mentioned but not deeply explored",
        "4": "Clear articulation of trade-offs and what's being sacrificed",
        "5": "Exceptional transparency about trade-offs with explicit acceptance of costs"
      }
    },
    {
      "name": "Structure and Organization",
      "description": "ADR follows standard structure and is well-organized",
      "scoring": {
        "1": "No clear structure or missing major sections",
        "2": "Basic structure but disorganized or incomplete sections",
        "3": "Follows standard ADR format with all key sections",
        "4": "Well-organized with clear sections and good flow",
        "5": "Exemplary structure with logical flow, clear headings, and easy navigation"
      }
    },
    {
      "name": "Actionability",
      "description": "Decision is implementable; clear what to do next",
      "scoring": {
        "1": "Not clear what action to take",
        "2": "General direction but unclear how to implement",
        "3": "Clear decision that can be implemented",
        "4": "Actionable decision with implementation guidance",
        "5": "Exceptionally actionable with rollout plan, success criteria, and next steps"
      }
    },
    {
      "name": "Appropriate Scope",
      "description": "ADR covers one decision at appropriate level of detail",
      "scoring": {
        "1": "Too broad (multiple unrelated decisions) or too narrow (trivial)",
        "2": "Scope issues but decision is identifiable",
        "3": "Appropriate scope for a single significant decision",
        "4": "Well-scoped decision with clear boundaries",
        "5": "Perfect scope; focused on one decision with appropriate detail level"
      }
    }
  ],
  "overall_assessment": {
    "thresholds": {
      "excellent": "Average score ≥ 4.5 (high-stakes decisions should aim for this)",
      "very_good": "Average score ≥ 4.0 (most ADRs should achieve this)",
      "good": "Average score ≥ 3.5 (minimum for acceptance)",
      "acceptable": "Average score ≥ 3.0 (needs improvement but usable)",
      "needs_rework": "Average score < 3.0 (should be revised before finalizing)"
    },
    "decision_stakes_guidance": {
      "low_stakes": "Reversible decisions, low cost to change: aim for ≥ 3.5",
      "medium_stakes": "Some migration cost, affects multiple teams: aim for ≥ 4.0",
      "high_stakes": "Expensive to reverse, organization-wide impact: aim for ≥ 4.5"
    }
  },
  "usage_instructions": "Rate each criterion independently on 1-5 scale. Calculate average score. For high-stakes decisions (affecting entire organization, expensive to reverse), aim for ≥4.5 average. For medium-stakes decisions, aim for ≥4.0. Minimum acceptable score is 3.5. Identify lowest-scoring criteria and improve those sections before delivering to user."
}
285
skills/adr-architecture/resources/examples/database-selection.md
Normal file
@@ -0,0 +1,285 @@
# ADR-042: Use PostgreSQL for Primary Application Database

**Status:** Accepted
**Date:** 2024-01-15
**Deciders:** Backend team (Sarah, James, Alex), CTO (Michael), DevOps lead (Christine)
**Related ADRs:** ADR-015 (Data Model Design), ADR-051 (Read Replica Strategy - pending)

## Context

### Background
Our new SaaS platform for project management is scheduled to launch Q2 2024. We need to select a primary database that will store user data, projects, tasks, and collaboration information for the next 3-5 years.

Current situation:
- Prototype uses SQLite (clearly insufficient for production)
- Expected launch: 500 organizations, ~5,000 users
- Growth projection: 10,000 organizations, ~100,000 users within 18 months
- Data model is relational with complex queries (projects → tasks → subtasks → comments → attachments)

### Requirements

**Functional:**
- Support for complex relational queries with JOINs across 4-6 tables
- ACID transactions (critical for billing and permissions)
- Full-text search across project content
- JSON support for flexible metadata fields
- Row-level security for multi-tenant isolation

**Non-Functional:**
- Handle 10,000 QPS at launch (mostly reads)
- < 100ms p95 latency for queries
- 99.9% uptime SLA
- Support for read replicas (anticipated need at 50k+ QPS)
- Point-in-time recovery for disaster recovery

### Constraints
- Budget: $5,000/month maximum for database infrastructure
- Team expertise: Strong SQL experience, limited NoSQL experience
- Timeline: Must finalize in 2 weeks to stay on schedule
- Compliance: SOC 2 Type II required (data encryption at rest/transit)
- Existing stack: Node.js backend, React frontend, deploying on AWS

## Decision

We will use **PostgreSQL 15+** as our primary application database, hosted on AWS RDS with the following configuration:

**Infrastructure:**
- AWS RDS PostgreSQL 15.x
- Initially: db.r6g.xlarge instance (4 vCPU, 32GB RAM)
- Multi-AZ deployment for high availability
- Automated daily backups with 7-day retention
- Point-in-time recovery enabled

**Architecture:**
- Single primary database initially
- Prepared for read replicas when QPS exceeds 40k (anticipated 12-18 months)
- Connection pooling via PgBouncer (deployed on application servers)
- Row-Level Security (RLS) policies for multi-tenancy
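
One common way to pair RLS with application code is to bind the tenant per transaction; a sketch using node-postgres (the `app.current_org` setting name is an assumption, and the matching RLS policies live in SQL migrations):

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Run a query with the tenant id bound for this transaction only;
// RLS policies can then reference current_setting('app.current_org').
async function queryAsTenant<T>(orgId: string, sql: string, params: unknown[] = []) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    // set_config(..., true) scopes the setting to the transaction.
    await client.query("SELECT set_config('app.current_org', $1, true)", [orgId]);
    const result = await client.query(sql, params);
    await client.query("COMMIT");
    return result.rows as T[];
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```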

**Scope:**
- All application data (users, organizations, projects, tasks)
- Session storage (using pgSession)
- Background job queue (using pg-boss)
- Excludes: Analytics data (separate data warehouse), file metadata (DynamoDB)
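
A sketch of the job-queue usage noted above, assuming a recent pg-boss API (method names and handler signatures differ across major versions):

```typescript
import PgBoss from "pg-boss";

const boss = new PgBoss(process.env.DATABASE_URL!);

async function main() {
  await boss.start();
  // Producer: enqueue a job into the same PostgreSQL instance.
  await boss.send("send-welcome-email", { userId: 42 });
  // Consumer: handle jobs from the queue.
  await boss.work("send-welcome-email", async job => {
    console.log("processing", job.data);
  });
}

main().catch(console.error);
```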

## Alternatives Considered

### MySQL 8.0
**Description:** Popular open-source relational database, strong AWS RDS support

**Pros:**
- Team has some MySQL experience
- Excellent AWS RDS integration
- Strong replication support
- Lower cost than commercial databases

**Cons:**
- Weaker JSON support compared to PostgreSQL (JSON functions less mature)
- Less robust constraint enforcement (e.g., CHECK constraints)
- Full-text search less powerful than PostgreSQL's
- InnoDB row-level locking can be problematic under high concurrency

**Why not chosen:** PostgreSQL's superior JSON support is critical for our flexible metadata requirements. Our data model has complex constraints that PostgreSQL handles more elegantly.

### MongoDB Atlas
**Description:** Managed NoSQL document database with flexible schema

**Pros:**
- Excellent horizontal scalability
- Flexible schema for evolving data model
- Strong JSON/document support
- Good full-text search

**Cons:**
- No multi-document ACID transactions (critical for our billing logic)
- Team has limited NoSQL experience (learning curve risk)
- Eventual consistency model incompatible with our requirements
- JOIN-like operations ($lookup) are slow and cumbersome
- More expensive at our scale (~$7k/month vs $3k for PostgreSQL)

**Why not chosen:** Lack of ACID transactions across documents is a dealbreaker for billing and permission changes. Our relational data model doesn't fit document paradigm well.

### Amazon Aurora PostgreSQL
**Description:** AWS's PostgreSQL-compatible database with performance enhancements

**Pros:**
- PostgreSQL compatibility with AWS optimizations
- Better read scaling (15 read replicas vs 5)
- Faster failover (< 30s vs 60-120s)
- Continuous backup to S3

**Cons:**
- 20-30% more expensive than RDS PostgreSQL
- Some PostgreSQL extensions not supported
- Vendor lock-in to AWS (harder to migrate to other clouds)
- Adds complexity we don't need yet

**Why not chosen:** Premium cost not justified at our current scale. Standard RDS PostgreSQL meets our needs. We can migrate to Aurora later if needed (minimal code changes).

### CockroachDB
**Description:** Distributed SQL database with PostgreSQL compatibility

**Pros:**
- Horizontal scalability built-in
- Multi-region support for global deployment
- PostgreSQL wire protocol compatibility
- Strong consistency guarantees

**Cons:**
- Significantly more complex to operate (distributed systems expertise needed)
- Higher latency for single-region workloads (consensus overhead)
- Limited ecosystem compared to PostgreSQL
- Team has zero distributed database experience
- More expensive (~2-3x cost of RDS PostgreSQL)

**Why not chosen:** Operational complexity far exceeds our current needs. We're a single-region deployment for the foreseeable future. Can revisit if we expand globally.

## Consequences

### Benefits

**Strong Data Integrity:**
- ACID transactions ensure billing accuracy and permission consistency
- Robust constraint enforcement catches data errors at write-time
- Foreign keys prevent orphaned records

**Excellent Query Capabilities:**
- Complex JOINs perform well with proper indexing
- Window functions enable sophisticated analytics
- CTEs (Common Table Expressions) simplify complex query logic
- Full-text search with GIN indexes for project content search

**JSON Flexibility:**
- JSONB type allows flexible metadata without schema migrations
- JSON operators enable querying nested structures efficiently
- Balances schema enforcement (relations) with flexibility (JSON)

**Team Productivity:**
- Team's SQL expertise means fast development velocity
- Mature ORM support (Sequelize, TypeORM) accelerates development
- Extensive community resources and documentation
- Familiar debugging and optimization tools

**Operational Maturity:**
- AWS RDS handles backups, patching, monitoring automatically
- Point-in-time recovery provides disaster recovery
- Multi-AZ deployment ensures high availability
- Well-understood scaling path (read replicas, connection pooling)

**Cost Efficiency:**
- Estimated $3,000/month at launch scale (db.r6g.xlarge + storage)
- Scales to ~$8,000/month with read replicas (at 100k users)
- Well within $5k/month budget initially

### Drawbacks

**Vertical Scaling Limits:**
- Single primary database limits write throughput to one instance
- At ~50-60k QPS, will need read replicas (adds operational complexity)
- Ultimate write limit around 100k QPS even with largest instance
- Mitigation: Implement caching (Redis) for read-heavy workloads
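
The caching mitigation would typically follow a cache-aside pattern; a sketch with ioredis (key scheme and 60-second TTL are assumptions):

```typescript
import Redis from "ioredis";

const redis = new Redis();

// Cache-aside: try Redis first, fall back to the database,
// then populate the cache with a short TTL to bound staleness.
async function getProject(id: string, loadFromDb: (id: string) => Promise<object>) {
  const key = `project:${id}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const project = await loadFromDb(id);
  await redis.set(key, JSON.stringify(project), "EX", 60); // 60s TTL
  return project;
}
```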
|
||||
|
||||
**Sharding Complexity:**
|
||||
- Horizontal partitioning (sharding) is manual and complex
|
||||
- If we exceed single-instance limits, migration to sharded setup is expensive
|
||||
- Not as straightforward as DynamoDB or Cassandra for horizontal scaling
|
||||
- Mitigation: Monitor growth carefully; consider Aurora or CockroachDB if needed
|
||||
|
||||
**Replication Lag:**
|
||||
- Read replicas have eventual consistency (typically 10-100ms lag)
|
||||
- Application must handle stale reads if using replicas
|
||||
- Some queries must route to primary for consistency
|
||||
- Mitigation: Use replicas only for analytics and non-critical reads
|
||||
|
||||
**Backup Window:**
|
||||
- Automated backups cause brief I/O pause (usually < 5s)
|
||||
- Scheduled during low-traffic window (3-4 AM PST)
|
||||
- Multi-AZ deployment minimizes impact
|
||||
- Mitigation: Accept brief latency spike during backup window
|
||||
|
||||
### Risks
|
||||
|
||||
**Performance Bottleneck:**
|
||||
- **Risk:** Single database becomes bottleneck before we implement read replicas
|
||||
- **Likelihood:** Medium (depends on growth rate)
|
||||
- **Mitigation:** Implement aggressive caching (Redis) for frequently accessed data; monitor QPS weekly; prepare read replica configuration in advance
|
||||
**Data Migration Challenges:**
- **Risk:** If we need to migrate to a different database, data size makes migration slow
- **Likelihood:** Low (PostgreSQL should serve us for 3-5 years)
- **Mitigation:** Regularly test backup/restore procedures; maintain clear data export processes

**Team Scaling:**
- **Risk:** As the team grows, we need to train new hires on PostgreSQL specifics (RLS, JSONB)
- **Likelihood:** High (we plan to grow the team)
- **Mitigation:** Document database patterns; create onboarding materials; conduct code reviews

### Trade-offs Accepted

**Trading horizontal scalability for operational simplicity:** We're choosing a database that's simple to operate now but harder to scale horizontally later, accepting that we may need to re-architect in 3-5 years if we grow beyond single-instance limits.

**Trading NoSQL flexibility for data integrity:** We're prioritizing ACID guarantees and relational integrity over schema flexibility, accepting that schema migrations will be required for data model changes.

**Trading vendor portability for convenience:** AWS RDS lock-in is acceptable given the operational benefits. We could migrate to other managed PostgreSQL services (Google Cloud SQL, Azure) if needed, though with effort.

## Implementation

### Rollout Plan

**Phase 1: Setup (Week 1-2)**
- Provision AWS RDS PostgreSQL instance
- Configure VPC security groups and IAM roles
- Set up automated backups and monitoring
- Configure PgBouncer connection pooling
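
Once PgBouncer is in place, the application connects to the pooler rather than to PostgreSQL directly. A minimal sketch, assuming PgBouncer listens on its default port 6432 on a hypothetical `pgbouncer.internal` host:

```python
import psycopg  # psycopg 3; same driver, only the endpoint changes

# Connect through PgBouncer (default port 6432) instead of PostgreSQL (5432).
# Host name and credentials here are placeholders.
conn = psycopg.connect(
    host="pgbouncer.internal",
    port=6432,
    dbname="app",
    user="app_user",
    password="...",  # sourced from a secrets manager in practice
)
with conn:
    version = conn.execute("SELECT version()").fetchone()
```
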
**Phase 2: Migration (Week 3-4)**
- Migrate schema from SQLite prototype
- Load seed data and test data
- Performance test with simulated load
- Configure monitoring alerts (CloudWatch, Datadog)

**Phase 3: Launch (Q2 2024)**
- Deploy to production
- Monitor query performance and optimize slow queries
- Weekly capacity review for first 3 months

### Success Criteria

**Technical:**
- p95 query latency < 100ms (measured via APM)
- Zero data integrity issues in first 6 months
- 99.9% uptime achieved
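
For reference, the 99.9% uptime target translates into a concrete downtime budget; a quick back-of-the-envelope calculation:

```python
# Downtime budget implied by a 99.9% uptime target.
availability = 0.999
minutes_per_month = 30 * 24 * 60        # 43,200 (30-day month)
minutes_per_year = 365 * 24 * 60        # 525,600

budget_month = (1 - availability) * minutes_per_month  # ~43.2 minutes/month
budget_year = (1 - availability) * minutes_per_year    # ~525.6 minutes (~8.8 hours)/year
print(f"{budget_month:.1f} min/month, {budget_year / 60:.1f} h/year")
```
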
**Operational:**
- Team can confidently make schema changes
- Backup/restore tested and verified monthly
- On-call incidents < 2 per month related to the database

**Business:**
- Database costs remain under $5k/month through 10k users
- Support 100k users without re-architecture

### Future Considerations

**Short-term (3-6 months):**
- Implement Redis caching for hot data paths
- Tune connection pool settings based on actual load
- Create a read-only database user for analytics

**Medium-term (6-18 months):**
- Add read replicas when QPS exceeds 40k
- Implement query result caching
- Consider Aurora migration if the cost-benefit justifies it

**Long-term (18+ months):**
- Evaluate sharding strategy if approaching single-instance limits
- Consider multi-region deployment for global users
- Explore specialized databases for specific workloads (e.g., time-series data)

## References

- [PostgreSQL 15 Release Notes](https://www.postgresql.org/docs/15/release-15.html)
- [AWS RDS PostgreSQL Best Practices](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html)
- [Internal: Database Performance Requirements Doc](https://docs.internal/db-requirements)
- [Internal: Load Testing Results](https://docs.internal/load-test-2024-01)
- [Benchmark: PostgreSQL vs MySQL JSON Performance](https://www.enterprisedb.com/postgres-tutorials/postgresql-vs-mysql-json-performance)

325
skills/adr-architecture/resources/methodology.md
Normal file
@@ -0,0 +1,325 @@

# ADR Methodology for Complex Decisions

## Complex ADR Workflow

Copy this checklist and track your progress:

```
ADR Progress (Complex Decisions):
- [ ] Step 1: Identify decision pattern and scope
- [ ] Step 2: Conduct detailed analysis for each concern
- [ ] Step 3: Engage stakeholders and gather input
- [ ] Step 4: Build decision tree for related choices
- [ ] Step 5: Perform quantitative analysis and create ADR
```

**Step 1: Identify decision pattern and scope**

Determine which pattern applies to your decision (cascading, competing concerns, unknown unknowns, etc.). See [Complex Decision Patterns](#complex-decision-patterns) for patterns and approaches.

**Step 2: Conduct detailed analysis for each concern**

For each competing concern (security, scalability, cost, compliance), analyze how the alternatives address it. See [Extended ADR Sections](#extended-adr-sections) for analysis templates.

**Step 3: Engage stakeholders and gather input**

Identify all affected parties and gather their perspectives systematically. See [Stakeholder Management](#stakeholder-management) for mapping and engagement techniques.

**Step 4: Build decision tree for related choices**

Map out cascading or interdependent decisions. See [Decision Trees for Related Choices](#decision-trees-for-related-choices) for structuring related ADRs.

**Step 5: Perform quantitative analysis and create ADR**

Use scoring matrices, cost modeling, or load testing to support the decision. See [Quantitative Analysis](#quantitative-analysis) for methods and examples.

## Complex Decision Patterns

### Pattern 1: Cascading Decisions

When one architectural choice forces or constrains subsequent decisions.

**Approach:**
1. Create a primary ADR for the main architectural decision
2. Create child ADRs for cascading decisions, referencing the parent
3. Use the "Related ADRs" field to link the chain

**Example:**
- ADR-100: Adopt Microservices Architecture (parent)
- ADR-101: Use gRPC for Inter-Service Communication (child - follows from ADR-100)
- ADR-102: Implement Service Mesh with Istio (child - follows from ADR-101)

### Pattern 2: Competing Concerns

When the decision must balance multiple competing priorities (cost vs performance, security vs usability).

**Approach:**
Add **Analysis Sections** to the standard ADR:

```markdown
## Detailed Analysis

### Security Analysis
{How each alternative addresses security requirements}

### Performance Analysis
{Benchmarks, load tests, scalability projections}

### Cost Analysis
{TCO over 3 years, including hidden costs}

### Operational Complexity Analysis
{Team skill requirements, monitoring needs, on-call burden}
```

### Pattern 3: Phased Decisions

When the full solution is too complex to decide at once and an interim decision is needed.

**Approach:**
1. Create an ADR for the Phase 1 decision
2. Add a "Future Decisions" section listing what's deferred
3. Set a review date to revisit (e.g., "Review in 6 months")

**Example:**
```markdown
## Decision (Phase 1)
Start with managed PostgreSQL on RDS. Evaluate sharding vs Aurora vs NewSQL in 12 months.

## Future Decisions Needed
- ADR-XXX: Horizontal scaling strategy (by Q3 2025)
- ADR-XXX: Multi-region deployment approach (by Q4 2025)
```

## Extended ADR Sections

### When to Add Detailed Analysis Sections

**Security Analysis** - Add when:
- Decision affects authentication, authorization, or data protection
- Compliance requirements are involved (SOC2, HIPAA, GDPR)
- Handling sensitive data

**Performance Analysis** - Add when:
- SLA commitments are at stake
- Significant performance differences exist between alternatives
- Scalability is a critical concern

**Cost Analysis** - Add when:
- Multi-year TCO differs significantly (>20%) between alternatives
- Hidden costs exist (operational overhead, training, vendor lock-in)
- Budget constraints are tight

**Operational Complexity Analysis** - Add when:
- Team skill gaps exist for some alternatives
- On-call burden varies significantly
- Monitoring/debugging complexity differs

**Migration Analysis** - Add when:
- Replacing an existing system
- Need to maintain backward compatibility
- Rollback strategy is complex

### Template for Extended Sections

```markdown
## Security Analysis

### {Alternative A}
- **Threat model**: {What threats does this mitigate?}
- **Attack surface**: {What new vulnerabilities are introduced?}
- **Compliance**: {How does this meet regulatory requirements?}
- **Score**: {1-5 rating}

### {Alternative B}
{Same structure}

### Security Recommendation
{Which alternative is strongest on security, and any mitigations needed}
```

## Stakeholder Management

### Identifying Stakeholders

**Technical stakeholders:**
- Engineering teams affected by the decision
- DevOps/SRE teams who will operate the solution
- Security team for compliance/security decisions
- Architecture review board (if one exists)

**Business stakeholders:**
- Product managers (feature impact)
- Finance (budget implications)
- Legal/compliance (regulatory requirements)
- Executive sponsors (strategic alignment)

### Getting Input

**Pre-ADR phase:**
1. Conduct stakeholder interviews to gather requirements and constraints
2. Share draft alternatives for early feedback
3. Identify concerns and dealbreakers

**ADR draft phase:**
1. Share the draft ADR with key stakeholders for review
2. Hold a working session to discuss controversial points
3. Revise based on feedback

**Approval phase:**
1. Present the ADR at architecture review (if applicable)
2. Get sign-off from decision-makers
3. Communicate the decision to the broader team

### Recording Stakeholder Positions

For controversial decisions, add:

```markdown
## Stakeholder Positions

**Backend Team (Support):**
"PostgreSQL aligns with our SQL expertise and provides ACID guarantees we need."

**DevOps Team (Concerns):**
"Concerned about operational complexity of read replicas. Request 2-week training before launch."

**Finance (Neutral):**
"Within budget. Request quarterly cost reviews to ensure no overruns."
```

## Decision Trees for Related Choices

When a decision involves multiple related questions, use a decision tree approach.

**Example: Cloud Provider + Database Decision**

```
Q1: Which cloud provider?
├─ AWS
│  ├─ Q2: Which database?
│  │  ├─ RDS PostgreSQL → ADR-042
│  │  ├─ Aurora → ADR-043
│  │  └─ DynamoDB → ADR-044
│
└─ GCP
   └─ Q2: Which database?
      ├─ Cloud SQL → ADR-045
      └─ Spanner → ADR-046
```

**Approach:**
1. Create an ADR for Q1 (cloud provider selection)
2. Create separate ADRs for Q2 based on the Q1 outcome
3. Link ADRs using the "Related ADRs" field

## Quantitative Analysis

### Cost-Benefit Matrix

For decisions with measurable trade-offs:

| Alternative | Setup Cost | Annual Cost | Performance (QPS) | Team Velocity Impact | Risk Score |
|-------------|-----------|-------------|-------------------|---------------------|------------|
| PostgreSQL  | $10k      | $36k        | 50k               | +10% (familiar)     | Low        |
| MongoDB     | $15k      | $84k        | 100k              | -20% (learning)     | Medium     |
| DynamoDB    | $5k       | $60k        | 200k              | -15% (new patterns) | Medium     |
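
These rows roll up into a multi-year TCO (the 3-year horizon mirrors the Cost Analysis guidance above); a quick sketch using the table's illustrative figures:

```python
# 3-year TCO = setup cost + 3 × annual cost, using the table's illustrative figures.
options = {
    "PostgreSQL": {"setup": 10_000, "annual": 36_000},
    "MongoDB":    {"setup": 15_000, "annual": 84_000},
    "DynamoDB":   {"setup": 5_000,  "annual": 60_000},
}

for name, cost in options.items():
    tco = cost["setup"] + 3 * cost["annual"]
    print(f"{name}: ${tco:,}")  # PostgreSQL: $118,000; MongoDB: $267,000; DynamoDB: $185,000
```
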
### Decision Matrix with Weighted Criteria

When multiple factors matter with different importance:

```markdown
## Weighted Decision Matrix

Criteria weights:
- Performance: 30%
- Cost: 25%
- Team Expertise: 20%
- Operational Simplicity: 15%
- Ecosystem Maturity: 10%

| Alternative | Performance | Cost | Team | Ops | Ecosystem | Weighted Score |
|-------------|-------------|------|------|-----|-----------|----------------|
| PostgreSQL  | 7/10        | 9/10 | 10/10| 8/10| 10/10     | **8.55**       |
| MongoDB     | 9/10        | 6/10 | 5/10 | 7/10| 8/10      | 7.05           |
| DynamoDB    | 10/10       | 7/10 | 4/10 | 9/10| 7/10      | 7.60           |

PostgreSQL scores highest on the weighted criteria.
```
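
The weighted scores above are just a dot product of weights and ratings; a minimal sketch for reproducing them:

```python
# Weighted score = sum(weight × rating) over criteria, reproducing the table above.
weights = {"performance": 0.30, "cost": 0.25, "team": 0.20, "ops": 0.15, "ecosystem": 0.10}

ratings = {
    "PostgreSQL": {"performance": 7,  "cost": 9, "team": 10, "ops": 8, "ecosystem": 10},
    "MongoDB":    {"performance": 9,  "cost": 6, "team": 5,  "ops": 7, "ecosystem": 8},
    "DynamoDB":   {"performance": 10, "cost": 7, "team": 4,  "ops": 9, "ecosystem": 7},
}

for name, scores in ratings.items():
    total = sum(weights[criterion] * scores[criterion] for criterion in weights)
    print(f"{name}: {total:.2f}")  # PostgreSQL: 8.55, MongoDB: 7.05, DynamoDB: 7.60
```
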
### Scenario Analysis

For decisions under uncertainty, model different futures:

```markdown
## Scenario Analysis

### Scenario 1: Rapid Growth (3x projections)
- PostgreSQL: Needs expensive scaling (Aurora + sharding), $150k/yr
- DynamoDB: Handles it easily, $120k/yr
- **Winner**: DynamoDB

### Scenario 2: Moderate Growth (1.5x projections)
- PostgreSQL: Read replicas sufficient, $60k/yr
- DynamoDB: Overprovisioned, $90k/yr
- **Winner**: PostgreSQL

### Scenario 3: Slow Growth (0.8x projections)
- PostgreSQL: Single instance sufficient, $40k/yr
- DynamoDB: Low usage still requires minimum provisioning, $70k/yr
- **Winner**: PostgreSQL

**Assessment**: PostgreSQL wins in 2 of 3 scenarios. Given our conservative growth estimates, PostgreSQL is the safer bet.
```
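
A scenario table can also be collapsed into an expected annual cost once probabilities are attached; a sketch with assumed (hypothetical) scenario probabilities:

```python
# Expected annual cost = sum(P(scenario) × cost), with assumed probabilities.
scenarios = [
    # (probability, postgres_cost, dynamodb_cost) - probabilities are illustrative
    (0.2, 150_000, 120_000),  # rapid growth
    (0.5, 60_000,  90_000),   # moderate growth
    (0.3, 40_000,  70_000),   # slow growth
]

pg = sum(p * c_pg for p, c_pg, _ in scenarios)
ddb = sum(p * c_ddb for p, _, c_ddb in scenarios)
print(f"PostgreSQL: ${pg:,.0f}/yr, DynamoDB: ${ddb:,.0f}/yr")  # $72,000 vs $90,000
```
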
## Review and Update Process

### When to Review ADRs

**Scheduled reviews:**
- High-stakes decisions: Review after 6 months
- Medium-stakes: Review after 12 months
- Check whether the consequences matched reality

**Triggered reviews:**
- Major change in context (team size, scale, requirements)
- Significant problems attributed to the decision
- New technology emerges that changes the trade-offs

### How to Update ADRs

**Never edit old ADRs.** Instead:

1. Create a new ADR that supersedes the old one
2. Update the old ADR's status to "Superseded by ADR-XXX"
3. The new ADR should reference the old one and explain what changed

**Example:**
```markdown
# ADR-099: Migrate from PostgreSQL to CockroachDB

**Status:** Accepted
**Date:** 2026-03-15
**Supersedes:** ADR-042 (PostgreSQL decision)

## Context
ADR-042 chose PostgreSQL in 2024 when we had 5k users. We now have 500k users across 8 regions. PostgreSQL sharding has become operationally complex...

## What Changed
- Scale increased 100x beyond projections
- Multi-region deployment now required for latency
- Team size grew from 5 to 40 engineers (distributed systems expertise available)
...
```

## Summary

For complex decisions:
- Break into multiple ADRs if needed (use the cascading pattern)
- Add detailed analysis sections for critical factors
- Engage stakeholders early and document their positions
- Use quantitative analysis (matrices, scenarios) to support intuition
- Plan for review and evolution over time

Remember: The best ADR is the one that helps future teammates understand "why" when reading it 2 years later.

317
skills/adr-architecture/resources/template.md
Normal file
@@ -0,0 +1,317 @@

# ADR Template - Standard Format

## Workflow

Copy this checklist and track your progress:

```
ADR Creation Progress:
- [ ] Step 1: Gather decision context and requirements
- [ ] Step 2: Fill in template structure
- [ ] Step 3: Document alternatives with pros/cons
- [ ] Step 4: Analyze consequences honestly
- [ ] Step 5: Validate with quality checklist
```

**Step 1: Gather decision context and requirements**

Collect information on what decision needs to be made, why now, requirements (functional/non-functional), constraints (budget, timeline, skills, compliance), and scope. This becomes your Context section.

**Step 2: Fill in template structure**

Use the [Quick Template](#quick-template) below to create the ADR file with a title (ADR-{NUMBER}: Decision), metadata (status, date, deciders), and sections for context, decision, alternatives, and consequences.

**Step 3: Document alternatives with pros/cons**

List 2-3+ real alternatives that were seriously considered. For each: description, pros (2-4 benefits), cons (2-4 drawbacks), and the specific reason it was not chosen. See the [Alternatives Considered](#alternatives-considered) guidance.

**Step 4: Analyze consequences honestly**

Document benefits, drawbacks, risks, and trade-offs accepted. Every decision has downsides - be honest about them and note mitigation strategies. See the [Consequences](#consequences) guidance for structure.

**Step 5: Validate with quality checklist**

Use the [Quality Checklist](#quality-checklist) to verify: context explains WHY, decision is specific/actionable, 2-3+ alternatives documented, consequences include benefits AND drawbacks, technical details accurate, and future readers can understand it without extra context.

## Quick Template

Copy this structure to create your ADR:

```markdown
# ADR-{NUMBER}: {Decision Title in Title Case}

**Status:** {Proposed | Accepted | Deprecated | Superseded}
**Date:** {YYYY-MM-DD}
**Deciders:** {List people/teams involved in decision}
**Related ADRs:** {Links to related ADRs, if any}

## Context

{Describe the situation, problem, or opportunity that necessitates this decision}

**Background:**
{What led to this decision being needed?}

**Requirements:**
{What functional/non-functional requirements must be met?}

**Constraints:**
{What limitations exist? Budget, timeline, skills, compliance, technical debt, etc.}

## Decision

{State clearly what you're choosing to do}

{Be specific and actionable. Include:}
- {What technology/approach/standard is being adopted}
- {What version or configuration, if relevant}
- {What scope this applies to (one service, entire system, etc.)}

## Alternatives Considered

### Option A: {Name}
**Description:** {Brief description}
**Pros:**
- {Benefit 1}
- {Benefit 2}

**Cons:**
- {Drawback 1}
- {Drawback 2}

**Why not chosen:** {Specific reason}

### Option B: {Name}
{Same structure}

### Option C: {Name}
{Same structure}

*Note: Include at least 2-3 real alternatives that were seriously considered*

## Consequences

### Benefits
- **{Benefit category}**: {Specific benefit and why it matters}
- **{Benefit category}**: {Specific benefit and why it matters}

### Drawbacks
- **{Drawback category}**: {Specific cost/limitation and mitigation if any}
- **{Drawback category}**: {Specific cost/limitation and mitigation if any}

### Risks
- **{Risk}**: {Likelihood and mitigation plan}

### Trade-offs Accepted
{What are we explicitly trading off? E.g., "Trading development speed for operational simplicity"}

## Implementation

{Optional section - include if implementation details are important}

**Rollout Plan:**
{How will this be deployed/adopted?}

**Migration Path:**
{If replacing something, how do we transition?}

**Timeline:**
{Key milestones and dates}

**Success Criteria:**
{How will we know this decision was right?}

## References

{Links to:}
- {Technical documentation}
- {Benchmarks or research}
- {Related discussions or RFCs}
- {Vendor documentation}
```

## Field-by-Field Guidance

### Title
- Format: `ADR-{NUMBER}: {Short Decision Summary}`
- Number: Sequential, usually 001, 002, etc.
- Summary: One line, actionable (e.g., "Use PostgreSQL for Primary Database", not "Database Choice")
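
Sequential numbering is easy to automate; a minimal sketch, assuming ADRs live in a hypothetical `docs/adr/` directory with filenames like `ADR-042-use-postgresql.md`:

```python
import re
from pathlib import Path

def next_adr_number(adr_dir: str = "docs/adr") -> str:
    """Return the next zero-padded ADR number based on existing filenames."""
    pattern = re.compile(r"ADR-(\d+)", re.IGNORECASE)
    numbers = [
        int(m.group(1))
        for path in Path(adr_dir).glob("*.md")
        if (m := pattern.search(path.name))
    ]
    return f"{max(numbers, default=0) + 1:03d}"

print(next_adr_number())  # e.g., "043" if ADR-042 is the highest so far
```
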
### Status
- **Proposed**: Under discussion, not yet adopted
- **Accepted**: Decision is final and being implemented
- **Deprecated**: No longer recommended (but still in use)
- **Superseded**: Replaced by another ADR (link to it)

### Context
**Purpose**: Help future readers understand WHY this decision was necessary

**Include:**
- What problem/opportunity triggered this?
- What are the business/technical drivers?
- What requirements must be met?
- What constraints limit the options?

**Don't include:**
- Solutions (those go in the Decision section)
- Analysis of options (that goes in Alternatives)

**Length**: 2-4 paragraphs typically

**Example:**
> Our current monolithic application is becoming difficult to scale and deploy. Deploys take 45 minutes and require full system downtime. Teams are blocked on each other's changes. We need to support 10x traffic growth in the next year.
>
> Requirements: Independent deployment, horizontal scaling, fault isolation, team autonomy.
> Constraints: Team has limited Kubernetes experience, must complete migration in 6 months, budget allows 20% infrastructure cost increase.

### Decision
**Purpose**: State clearly and specifically what you're doing

**Include:**
- Specific technology/approach (with version if relevant)
- Configuration or implementation approach
- Scope of application

**Don't:**
- Justify (that's in Consequences)
- Compare (that's in Alternatives)
- Be vague ("use the best tool")

**Length**: 1-3 paragraphs

**Example:**
> We will adopt a microservices architecture using:
> - Kubernetes (v1.28+) for orchestration
> - gRPC for inter-service communication
> - PostgreSQL databases (one per service where needed)
> - Shared API gateway (Kong) for external traffic
>
> Scope: All new services, and existing services as they require significant changes. No forced migration of stable services.

### Alternatives Considered
**Purpose**: Show other options were evaluated seriously (prevents "we should have considered X")

**Include:**
- 2-4 real alternatives that were discussed
- Honest pros/cons for each
- Specific reason not chosen

**Don't:**
- Include straw-man options you never seriously considered
- Present alternatives unfairly (be honest about their merits)
- Omit major alternatives

**Format for each alternative:**
- Name/summary
- Brief description
- 2-4 key pros
- 2-4 key cons
- Why not chosen (specific, not "just worse")

**Example:**
> ### Continue with Monolith + Optimization
> **Pros:**
> - No migration cost or risk
> - Team expertise is high
> - Simpler operations
>
> **Cons:**
> - Doesn't solve team coupling problem
> - Still requires full-system deploys
> - Scaling is all-or-nothing
>
> **Why not chosen:** Doesn't address the fundamental team velocity and deployment issues that are our primary pain points.

### Consequences
**Purpose**: Honest assessment of what this decision means long-term

**Include:**
- Benefits (what we gain)
- Drawbacks (what we lose or costs we incur)
- Risks (what could go wrong)
- Trade-offs (what we explicitly chose to sacrifice)

**Critical**: Be honest about the downsides. Every decision has cons.

**Format:**
- Group by category (performance, cost, team, operations, etc.)
- Be specific (not "better performance" but "50% faster writes, 2x slower reads")
- Note mitigation strategies for drawbacks where applicable

**Example:**
> **Benefits:**
> - **Team velocity**: Teams can deploy independently, 10min deploys vs 45min
> - **Scalability**: Can scale hot services independently, expect 50% infrastructure cost reduction
> - **Resilience**: Service failures are isolated, no cascading failures
>
> **Drawbacks:**
> - **Operational complexity**: Managing 15+ services vs 1, need monitoring/tracing
> - **Development overhead**: Network calls vs function calls, serialization costs
> - **Data consistency**: Eventual consistency across services, need compensating transactions
>
> **Risks:**
> - **Migration risk**: If migration takes >6mo, could end up with the worst of both worlds
> - **Team skill gap**: Need to train team on Kubernetes and distributed systems concepts
>
> **Trade-offs:**
> Trading development simplicity for deployment flexibility and team autonomy.

## Quality Checklist

Before finalizing, verify:

- [ ] Title is clear and specific (not generic)
- [ ] Status is set and accurate
- [ ] Context explains WHY without proposing solutions
- [ ] Decision is specific and actionable
- [ ] At least 2-3 real alternatives are documented
- [ ] Each alternative has honest pros/cons
- [ ] Consequences include both benefits AND drawbacks
- [ ] Risks are identified with mitigation where applicable
- [ ] Technical details are accurate and specific
- [ ] Future readers will understand the context without asking around
- [ ] No jargon without explanation
- [ ] Trade-offs are explicitly acknowledged

## Common Patterns

### Technology Selection ADR
Focus on: capabilities vs requirements, performance characteristics, team expertise, operational complexity, ecosystem maturity

### Process/Standard ADR
Focus on: enforcement mechanisms, exceptions, onboarding/training, examples, tooling support

### Migration ADR
Focus on: rollout strategy, backward compatibility, rollback plan, success metrics, timeline

### Deprecation ADR
Set Status: Deprecated or Superseded
Include: sunset timeline, migration path, superseding ADR link (if applicable)

## Examples

See the `examples/` directory for complete examples:
- `database-selection.md` - Technology choice
- `api-versioning.md` - Standard/process decision
- `microservices-migration.md` - Large architectural change

## Anti-Patterns to Avoid

**Vague context:**
- Bad: "We need a better database"
- Good: "Current MySQL instance hitting 80% CPU during peak load (5k QPS), queries taking >500ms"

**Non-specific decision:**
- Bad: "Use microservices"
- Good: "Migrate to microservices using Kubernetes 1.28+ with gRPC, starting with the user service"

**Unfair alternatives:**
- Bad: "MongoDB: bad for our use case, slow, unreliable"
- Good: "MongoDB: Excellent for flexible schemas and horizontal scaling, but lacks the multi-document ACID transactions we need for payments"

**Hiding downsides:**
- Bad: "PostgreSQL will solve all our problems"
- Good: "PostgreSQL provides the ACID guarantees we need, but will require read replicas at >50k QPS and is harder to shard than DynamoDB"

**Too long:**
- If the ADR is >3 pages, consider splitting it into multiple ADRs or creating a separate design doc that the ADR references

163
skills/alignment-values-north-star/SKILL.md
Normal file
@@ -0,0 +1,163 @@

---
name: alignment-values-north-star
description: Use when teams need shared direction and decision-making alignment. Invoke when starting new teams, scaling organizations, defining culture, establishing product vision, resolving misalignment, creating strategic clarity, or setting behavioral standards. Use when user mentions North Star, team values, mission, principles, guardrails, decision framework, or cultural alignment.
---

# Alignment: Values & North Star

## Table of Contents

- [Purpose](#purpose)
- [When to Use This Skill](#when-to-use-this-skill)
- [What is Values & North Star Alignment?](#what-is-values--north-star-alignment)
- [Workflow](#workflow)
  - [1. Understand Context](#1--understand-context)
  - [2. Choose Framework](#2--choose-framework)
  - [3. Develop Alignment Artifact](#3--develop-alignment-artifact)
  - [4. Validate Quality](#4--validate-quality)
  - [5. Deliver and Socialize](#5--deliver-and-socialize)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

Create clear, actionable alignment frameworks that give teams a shared North Star (direction), values (guardrails), and decision tenets (behavioral standards). This enables autonomous decision-making while maintaining coherence across the organization.

## When to Use This Skill

- Starting new teams or organizations (defining identity)
- Scaling teams (maintaining culture as you grow)
- Resolving misalignment or conflicts (clarifying shared direction)
- Defining product/engineering/design principles
- Creating strategic clarity after pivots or changes
- Establishing decision-making frameworks
- Setting cultural norms and behavioral expectations
- Post-merger integration (aligning different cultures)
- Crisis response (re-centering on what matters)
- Onboarding leaders who need to understand team identity

**Trigger phrases:** "North Star", "team values", "mission", "vision", "principles", "guardrails", "what we stand for", "decision framework", "cultural alignment", "operating principles"

## What is Values & North Star Alignment?

A framework with three layers:

1. **North Star**: The aspirational direction - where are we going and why?
2. **Values/Guardrails**: Core principles that constrain how we operate
3. **Decision Tenets/Behaviors**: Concrete, observable behaviors that demonstrate the values

**Quick Example:**

```markdown
# Engineering Team Alignment

## North Star
Build systems that developers love to use and operators trust to run.

## Values
- **Simplicity**: Choose boring technology that works over exciting technology that might
- **Reliability**: Every service has SLOs and we honor them
- **Empathy**: Design for the developer experience, not just system performance

## Decision Tenets
When choosing between options:
✓ Pick the solution with fewer moving parts
✓ Choose managed services over self-hosted when quality is comparable
✓ Optimize for debuggability over micro-optimizations
✓ Document decisions (ADRs) for future context

## Behaviors (What This Looks Like)
- Code reviews comment on operational complexity, not just correctness
- We say no to features that compromise reliability
- Postmortems focus on learning, not blame
- Documentation is part of "done"
```

## Workflow

Copy this checklist and track your progress:

```
Alignment Framework Progress:
- [ ] Step 1: Understand context
- [ ] Step 2: Choose framework
- [ ] Step 3: Develop alignment artifact
- [ ] Step 4: Validate quality
- [ ] Step 5: Deliver and socialize
```

**Step 1: Understand context**

Gather background: team/organization (size, stage, structure), current situation (new team, scaling, misalignment, crisis), trigger (why alignment is needed NOW), stakeholders (who needs to align), hard decisions (where misalignment shows up), and existing artifacts (mission, values, culture statements). This ensures the framework addresses real needs.

**Step 2: Choose framework**

For new teams/startups (< 30 people, defining identity from scratch) → use `resources/template.md`. For scaling organizations (existing values need refinement, multiple teams, need a decision framework) → study `resources/methodology.md`. To see examples → review `resources/examples/` (engineering-team.md, product-vision.md, company-values.md).

**Step 3: Develop alignment artifact**

Create `alignment-values-north-star.md` with: a compelling North Star (1-2 sentences, aspirational but specific), 3-5 core values (specific to this team, not generic), decision tenets ("When X vs Y, we..."), observable behaviors (concrete examples), anti-patterns (optional - what we DON'T do), and context (optional - why these values). See [Common Patterns](#common-patterns) for team-type specific guidance.

**Step 4: Validate quality**

Self-check using `resources/evaluators/rubric_alignment_values_north_star.json`. Verify: North Star is inspiring yet concrete, values are specific and distinctive, decision tenets guide real decisions, behaviors are observable/measurable, usable for decisions TODAY, trade-offs acknowledged, no contradictions, distinguishes this team from others. Minimum standard: Score ≥ 3.5 (aim for 4.5+ if organization-wide).
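
A minimal sketch of the self-check arithmetic, assuming per-criterion scores are collected into a dict (the ratings below are illustrative, not prescribed):

```python
# Average rubric scores and compare against the skill's thresholds.
scores = {  # illustrative 1-5 ratings, one per rubric criterion
    "north_star_clarity": 4,
    "value_specificity": 4,
    "trade_off_transparency": 3,
    "decision_tenet_utility": 4,
    "behavioral_observability": 5,
    "immediate_applicability": 4,
    "internal_consistency": 4,
    "distinctiveness": 3,
    "conciseness": 4,
    "anti_pattern_clarity": 3,
}

average = sum(scores.values()) / len(scores)
threshold = 3.5  # team-level minimum; aim for 4.5+ for organization-wide values
print(f"average = {average:.2f}; {'pass' if average >= threshold else 'needs rework'}")
```
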
|
||||
**Step 5: Deliver and socialize**
|
||||
|
||||
Present completed framework with rationale (why these values), examples of application in decisions, rollout/socialization approach (hiring, decision-making, onboarding, team meetings), and review cadence (typically annually). Ensure team can recall and apply key points.
|
||||
|
||||
## Common Patterns
|
||||
|
||||
**For technical teams:**
|
||||
- Focus on technical trade-offs (simplicity vs performance, speed vs quality)
|
||||
- Make architectural principles explicit
|
||||
- Include operational considerations
|
||||
- Address technical debt philosophy
|
||||
|
||||
**For product teams:**
|
||||
- Center on user/customer value
|
||||
- Address feature prioritization philosophy
|
||||
- Include quality bar and launch criteria
|
||||
- Make product-market fit assumptions explicit
|
||||
|
||||
**For company-wide values:**
|
||||
- Keep values aspirational but grounded
|
||||
- Include specific behaviors (not just values)
|
||||
- Address how values interact (what wins when they conflict?)
|
||||
- Make hiring/firing implications clear
|
||||
|
||||
**For crisis/change:**
|
||||
- Acknowledge what's changing
|
||||
- Re-center on core that remains
|
||||
- Be explicit about new priorities
|
||||
- Include timeline for transition
|
||||
|
||||
## Guardrails
|
||||
|
||||
**Do:**
|
||||
- Make values specific and distinctive (not generic)
|
||||
- Include concrete behaviors and examples
|
||||
- Acknowledge trade-offs (what you're NOT optimizing for)
|
||||
- Test values against real decisions
|
||||
- Keep it concise (1-2 pages max)
|
||||
- Make it memorable (people should be able to recall key points)
|
||||
- Involve the team in creating it (not top-down)
|
||||
|
||||
**Don't:**
|
||||
- Use corporate jargon or buzzwords
|
||||
- Make it so generic it could apply to any company
|
||||
- Create laundry list of every good quality
|
||||
- Ignore tensions between values
|
||||
- Make it purely aspirational (need concrete behaviors)
|
||||
- Set it and forget it (values should evolve)
|
||||
- Weaponize values to shut down dissent
|
||||
|
||||
## Quick Reference
|
||||
|
||||
- **Standard template**: `resources/template.md`
|
||||
- **Scaling/complex cases**: `resources/methodology.md`
|
||||
- **Examples**: `resources/examples/engineering-team.md`, `resources/examples/product-vision.md`, `resources/examples/company-values.md`
|
||||
- **Quality rubric**: `resources/evaluators/rubric_alignment_values_north_star.json`
|
||||
|
||||
**Output naming**: `alignment-values-north-star.md` or `{team-name}-alignment.md`
|
||||
@@ -0,0 +1,135 @@
|
||||
{
|
||||
"name": "Alignment Framework Quality Rubric",
|
||||
"scale": {
|
||||
"min": 1,
|
||||
"max": 5,
|
||||
"description": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent"
|
||||
},
|
||||
"criteria": [
|
||||
{
|
||||
"name": "North Star Clarity",
|
||||
"description": "North Star is inspiring yet specific and memorable",
|
||||
"scoring": {
|
||||
"1": "No North Star or completely generic/vague",
|
||||
"2": "North Star exists but generic or unmemorable",
|
||||
"3": "North Star is clear and somewhat specific",
|
||||
"4": "North Star is compelling, specific, and memorable",
|
||||
"5": "Exceptional North Star that team can recite and use for decisions"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Value Specificity",
|
||||
"description": "Values are distinctive to this team, not generic corporate values",
|
||||
"scoring": {
|
||||
"1": "Generic values that could apply to any company",
|
||||
"2": "Some generic values with minimal context",
|
||||
"3": "Values have some specificity to this team/context",
|
||||
"4": "Values are clearly specific to this team with context",
|
||||
"5": "Exceptionally distinctive values that couldn't apply elsewhere"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Trade-off Transparency",
|
||||
"description": "Values explicitly state what's being optimized FOR and what's de-prioritized",
|
||||
"scoring": {
|
||||
"1": "No trade-offs mentioned",
|
||||
"2": "Vague mention of trade-offs",
|
||||
"3": "Some values include trade-offs",
|
||||
"4": "Most values explicitly state trade-offs",
|
||||
"5": "All values clearly state what's gained and what's sacrificed"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Decision Tenet Utility",
|
||||
"description": "Decision tenets provide actionable guidance for real decisions",
|
||||
"scoring": {
|
||||
"1": "No decision tenets or purely abstract",
|
||||
"2": "Tenets exist but too vague to apply",
|
||||
"3": "Tenets provide some practical guidance",
|
||||
"4": "Tenets address real trade-offs with clear guidance",
|
||||
"5": "Exceptional tenets that resolve actual team dilemmas"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Behavioral Observability",
|
||||
"description": "Behaviors are concrete, observable, and measurable",
|
||||
"scoring": {
|
||||
"1": "No behaviors or purely aspirational statements",
|
||||
"2": "Behaviors are vague (e.g., 'communicate well')",
|
||||
"3": "Some behaviors are specific and observable",
|
||||
"4": "Most behaviors are concrete and observable",
|
||||
"5": "All behaviors are specific enough to recognize in daily work"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Immediate Applicability",
|
||||
"description": "Someone could use this framework to make a decision TODAY",
|
||||
"scoring": {
|
||||
"1": "Framework is purely aspirational, no practical use",
|
||||
"2": "Limited practical guidance",
|
||||
"3": "Could inform some decisions",
|
||||
"4": "Clear guidance for most common decisions",
|
||||
"5": "Exceptional practical utility for daily decision-making"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Internal Consistency",
|
||||
"description": "No contradictions between values, tenets, and behaviors",
|
||||
"scoring": {
|
||||
"1": "Major contradictions throughout",
|
||||
"2": "Some contradictions between sections",
|
||||
"3": "Mostly consistent with minor tensions",
|
||||
"4": "Fully consistent with tensions acknowledged",
|
||||
"5": "Perfect coherence with explicit resolution of value conflicts"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Distinctiveness",
|
||||
"description": "Could clearly distinguish this team from others based on these values",
|
||||
"scoring": {
|
||||
"1": "Could apply to literally any team",
|
||||
"2": "Minimal distinction from generic teams",
|
||||
"3": "Some unique characteristics emerge",
|
||||
"4": "Clear team identity and differentiation",
|
||||
"5": "Unmistakably distinctive team culture"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Conciseness",
|
||||
"description": "Framework is concise and memorable (1-2 pages ideally)",
|
||||
"scoring": {
|
||||
"1": "Too long (>5 pages) or too short (just platitudes)",
|
||||
"2": "Either too verbose or too sparse",
|
||||
"3": "Reasonable length but could be tighter",
|
||||
"4": "Appropriately concise (2-3 pages)",
|
||||
"5": "Perfect length (1-2 pages), highly memorable"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Anti-Pattern Clarity",
|
||||
"description": "Explicitly states what the team does NOT do",
|
||||
"scoring": {
|
||||
"1": "No anti-patterns mentioned",
|
||||
"2": "Vague or generic anti-patterns",
|
||||
"3": "Some specific anti-patterns included",
|
||||
"4": "Clear anti-patterns that set boundaries",
|
||||
"5": "Exceptional anti-patterns that prevent common dysfunctions"
|
||||
}
|
||||
}
|
||||
],
|
||||
"overall_assessment": {
|
||||
"thresholds": {
|
||||
"excellent": "Average score ≥ 4.5 (company-wide values should aim for this)",
|
||||
"very_good": "Average score ≥ 4.0 (most alignment frameworks should achieve this)",
|
||||
"good": "Average score ≥ 3.5 (minimum for team-level alignment)",
|
||||
"acceptable": "Average score ≥ 3.0 (workable but needs improvement)",
|
||||
"needs_rework": "Average score < 3.0 (revise before using)"
|
||||
},
|
||||
"scope_guidance": {
|
||||
"team_level": "Team values (< 30 people): aim for ≥ 3.5",
|
||||
"function_level": "Function/department values: aim for ≥ 4.0",
|
||||
"company_level": "Organization-wide values: aim for ≥ 4.5"
|
||||
}
|
||||
},
|
||||
"usage_instructions": "Rate each criterion on 1-5 scale. Calculate average. For team-level alignment, minimum is 3.5. For organization-wide values that will be used in hiring/performance reviews, aim for ≥4.5. Identify lowest-scoring criteria and improve those sections before delivering."
|
||||
}
|
||||
@@ -0,0 +1,343 @@
|
||||
# Platform Engineering Team Alignment Framework
|
||||
|
||||
## Context
|
||||
|
||||
**Why this matters now:**
|
||||
Our platform team has grown from 3 to 15 engineers in 12 months. We're seeing inconsistent architecture decisions, unclear ownership, and technical debt accumulating. Teams are blocked waiting for platform features. We need shared principles to enable autonomous decision-making while maintaining quality and consistency.
|
||||
|
||||
**Who this is for:**
|
||||
Platform Engineering team (15 engineers, 3 teams: Infrastructure, Developer Tools, Observability)
|
||||
|
||||
**Last updated:** 2024-11-15
|
||||
|
||||
---
|
||||
|
||||
## North Star
|
||||
|
||||
**Build infrastructure that developers trust and love to use.**
|
||||
|
||||
Every platform we build should make developers more productive, not add cognitive load. Systems should be so reliable that product teams never think about them.
|
||||
|
||||
---
|
||||
|
||||
## Core Values
|
||||
|
||||
### 1. Simplicity Over Cleverness
|
||||
|
||||
**What it means:**
|
||||
We choose boring, proven technology over exciting new tools. We value systems that operators can debug at 3 AM over systems that are technically impressive. Every abstraction must pay for itself in reduced complexity.
|
||||
|
||||
**Why it matters:**
|
||||
Complex systems fail in complex ways. When product teams are down, they need fixes fast, not archaeology. Our 3 AM self will curse our 3 PM clever self.
|
||||
|
||||
**What we optimize for:**
|
||||
- Debuggability and operational simplicity
|
||||
- Fewer moving parts
|
||||
- Standard solutions over custom code
|
||||
- Clear mental models
|
||||
|
||||
**What we de-prioritize:**
|
||||
- Resume-driven development
|
||||
- Micro-optimizations that add complexity
|
||||
- Novel approaches when proven ones exist
|
||||
- Being first to adopt new tech
|
||||
|
||||
**Examples:**
|
||||
- Use managed PostgreSQL instead of building our own database cluster
|
||||
- Choose nginx over a custom written load balancer
|
||||
- Pick Kubernetes standard patterns over clever custom operators
|
||||
|
||||
### 2. Reliability Over Features
|
||||
|
||||
**What it means:**
|
||||
Every service has SLOs and we honor them. We say no to features that compromise reliability. Reliability is a feature, not a phase. When in doubt, we prioritize preventing incidents over shipping features.
|
||||
|
||||
**Why it matters:**
|
||||
One platform outage affects every product team. Our unreliability multiplies across the organization. Lost trust takes months to rebuild.
|
||||
|
||||
**What we optimize for:**
|
||||
- SLO compliance (99.9%+ uptime for critical paths)
|
||||
- Graceful degradation
|
||||
- Observable systems
|
||||
- Rollback capability
|
||||
|
||||
**What we de-prioritize:**
|
||||
- Shipping fast at the cost of stability
|
||||
- Features without monitoring
|
||||
- Experimentation that risks critical path
|
||||
- Tight coupling that prevents safe deploys
|
||||
|
||||
**Examples:**
|
||||
- We run chaos engineering tests before Black Friday
|
||||
- New features launch behind feature flags with gradual rollout
|
||||
- We have automated rollback for all critical services
|
||||
|
||||
### 3. Developer Experience is Not Optional
|
||||
|
||||
**What it means:**
|
||||
APIs should be intuitive, documentation should exist, and errors should be actionable. If developers are frustrated, we've failed. We design for the developer using our systems, not just for the system itself.
|
||||
|
||||
**Why it matters:**
|
||||
Friction in platform tools slows every product team. 100 developers × 10min friction = 1000min lost daily. Good DX multiplies productivity; bad DX multiplies frustration.
|
||||
|
||||
**What we optimize for:**
|
||||
- Clear, actionable error messages
|
||||
- Comprehensive, up-to-date documentation
|
||||
- Simple onboarding (< 1 hour to first deploy)
|
||||
- Fast feedback loops
|
||||
|
||||
**What we de-prioritize:**
|
||||
- Internal-only jargon in APIs
|
||||
- Optimization for our convenience over user experience
|
||||
- Tribal knowledge over documentation
|
||||
- Complex configurations
|
||||
|
||||
**Examples:**
|
||||
- All APIs have OpenAPI specs and interactive docs
|
||||
- Error messages include links to runbooks
|
||||
- We measure "time to first successful deploy" for new services
|
||||
|
||||
### 4. Ownership Means Accountability
|
||||
|
||||
**What it means:**
|
||||
If you build it, you run it. Teams own their services end-to-end: development, deployment, monitoring, on-call. Ownership includes making your service operationally excellent, not just functionally correct.
|
||||
|
||||
**Why it matters:**
|
||||
Throwing code over the wall creates unmaintainable systems. Operators who didn't build it can't improve it. Builders who don't operate it don't feel the pain of poor operational characteristics.
|
||||
|
||||
**What we optimize for:**
|
||||
- End-to-end ownership
|
||||
- Operational maturity (monitoring, alerting, runbooks)
|
||||
- Empowered teams
|
||||
- Learning from production
|
||||
|
||||
**What we de-prioritize:**
|
||||
- Separate "ops" team for platform services
|
||||
- Deploying without runbooks
|
||||
- Services without clear owners
|
||||
- Handoffs between teams
|
||||
|
||||
**Examples:**
|
||||
- Infrastructure team is on-call for Kubernetes cluster
|
||||
- Developer Tools team owns CI/CD pipeline end-to-end
|
||||
- Each team has SLO dashboards they review weekly
|
||||
|
||||
---
|
||||
|
||||
## Decision Tenets
|
||||
|
||||
### When Choosing Technology
|
||||
|
||||
**When evaluating new tools:**
|
||||
- ✓ Choose managed services (RDS, managed K8s) over self-hosted when quality is comparable
|
||||
- ✓ Pick tools with strong observability out-of-the-box
|
||||
- ✓ Prefer tools our team has expertise in over "better" tools we'd need to learn
|
||||
- ✗ Don't adopt because it's trending or looks good on résumé
|
||||
- ⚠ **Exception:** When existing tools fundamentally can't meet requirements AND we have capacity to support
|
||||
|
||||
**When building vs buying:**
|
||||
- ✓ Build when our requirements are unique or when vendors don't exist
|
||||
- ✓ Buy when it's undifferentiated heavy lifting (observability, databases)
|
||||
- ✗ Don't build for the joy of building
|
||||
- ⚠ **Exception:** Build when vendor lock-in risk outweighs development cost
|
||||
|
||||
### When Making Architecture Decisions
|
||||
|
||||
**When designing APIs:**
|
||||
- ✓ Choose RESTful JSON APIs as default (boring, widely understood)
|
||||
- ✓ Design for the developer experience, document before implementing
|
||||
- ✗ Don't create bespoke protocols without strong justification
|
||||
- ⚠ **Exception:** gRPC for high-performance internal services
|
||||
|
||||
**When choosing between perfection and shipping:**
|
||||
- ✓ Ship with known minor issues if they don't affect SLOs
|
||||
- ✓ Document technical debt and schedule fixes
|
||||
- ✗ Don't ship if it compromises security, data integrity, or violates SLOs
|
||||
- ⚠ **Exception:** Never compromise on security, payments, or PII handling
|
||||
|
||||
### When Prioritizing Work
|
||||
|
||||
**When product teams request features:**
|
||||
- 🔴 **Critical:** SLO violations, security issues, data loss risks
|
||||
- 🟡 **Important:** Developer friction affecting >3 teams, technical debt preventing new features
|
||||
- ⚪ **Nice-to-have:** Single-team requests, optimizations, new nice-to-have features
|
||||
|
||||
**When allocating time:**
|
||||
- 70% Product work (features product teams need)
|
||||
- 20% Platform health (tech debt, improvements)
|
||||
- 10% Innovation (experiments, R&D)
|
||||
|
||||
---
|
||||
|
||||
## Observable Behaviors
|
||||
|
||||
### In Code Reviews
|
||||
|
||||
- We comment on operational characteristics (What happens when this fails? How will we debug it?)
|
||||
- We ask "What's the runbook?" before approving infrastructure changes
|
||||
- We praise simplicity and question complexity
|
||||
- We flag missing monitoring/alerting
|
||||
|
||||
### In Planning
|
||||
|
||||
- First question: "What's the simplest thing that could work?"
|
||||
- We timeblock for operational work (not just features)
|
||||
- We say no to work that violates SLOs or has no monitoring plan
|
||||
- We ask "Who's on-call for this?"
|
||||
|
||||
### In Incidents
|
||||
|
||||
- Postmortems focus on learning, not blame
|
||||
- We fix systemic issues, not just symptoms
|
||||
- We update runbooks during incidents
|
||||
- We celebrate good incident response (fast mitigation, clear communication)
|
||||
|
||||
### In Communication
|
||||
|
||||
- We write Architecture Decision Records (ADRs) for significant choices
|
||||
- We document before we ship
|
||||
- We assume future readers lack our context
|
||||
- We use plain language, not jargon
|
||||
|
||||
### In Hiring
|
||||
|
||||
- We ask candidates to debug a system, not just build features
|
||||
- We evaluate for operational maturity, not just coding skills
|
||||
- We look for simplicity-minded engineers, not algorithm wizards
|
||||
- We ask: "Tell me about a time you chose boring tech over exciting tech"
|
||||
|
||||
### In Daily Work
|
||||
|
||||
- We check SLO dashboards daily
|
||||
- We improve documentation when we're confused
|
||||
- We automate toil we encounter
|
||||
- We share learnings in team channels
|
||||
|
||||
---
|
||||
|
||||
## Anti-Patterns
|
||||
|
||||
### What We Explicitly DON'T Do
|
||||
|
||||
✗ **Ship without monitoring** - even when deadline pressure is high
|
||||
- **Because:** Reliability > Features. Can't fix what we can't observe.
|
||||
|
||||
✗ **Optimize prematurely** - even when we see potential improvements
|
||||
- **Because:** Simplicity > Performance. Measure first, optimize if needed.
|
||||
|
||||
✗ **Build custom solutions for solved problems** - even when it seems fun
|
||||
- **Because:** Maintenance cost exceeds initial development 10x. Use managed services.
|
||||
|
||||
✗ **Skip documentation because "code is self-documenting"** - even when time is tight
|
||||
- **Because:** Developer Experience matters. Future us will curse present us.
|
||||
|
||||
✗ **Say yes to every feature request** - even when stakeholders insist
|
||||
- **Because:** Ownership includes protecting platform quality. Our job is to say no to bad ideas.
|
||||
|
||||
✗ **Deploy Friday afternoon** - even when it seems low-risk
|
||||
- **Because:** Reliability matters more than shipping fast. Weekend incidents aren't worth it.
|
||||
|
||||
---
|
||||
|
||||
## How to Use This
|
||||
|
||||
### In Decision-Making
|
||||
|
||||
**When stuck between two options:**
|
||||
1. Check decision tenets above
|
||||
2. Ask "Which choice better serves our North Star?"
|
||||
3. Consider which value is most relevant
|
||||
4. Document decision in ADR with rationale
|
||||
|
||||
### In Hiring
|
||||
|
||||
**Interview questions tied to values:**
- **Simplicity:** "Tell me about a time you refactored complex code to be simpler. What drove that decision?"
- **Reliability:** "How do you decide when to ship vs. when to delay for quality?"
- **Developer Experience:** "What makes a good API? Show me an API you love and why."
- **Ownership:** "How do you approach on-call? What makes a service operationally mature?"

### In Onboarding

**Week 1:** New engineers read this document and discuss in 1:1
**Week 2:** Shadow on-call to see ownership in practice
**Week 3:** Pair on feature to see values in code review
**Month 1:** Present one architectural decision using these tenets

### In Performance Reviews

We evaluate on:
- **Simplicity:** Do they choose boring solutions? Do they reduce complexity?
- **Reliability:** Do their services meet SLOs? How do they handle incidents?
- **Developer Experience:** Are their code, APIs, and docs easy to use?
- **Ownership:** Do they own services end-to-end? Do they improve operations?

### When Values Conflict

**Simplicity vs Developer Experience:**
- **Winner:** Developer Experience. Simple for *us* to maintain isn't valuable if developers can't use it.

**Reliability vs Speed:**
- **Winner:** Reliability. But use feature flags to ship safely.

**Features vs Platform Health:**
- **Default:** Follow 70/20/10 allocation. But SLO violations always win.

---

## Evolution

**Review cadence:** Quarterly team retro discusses whether these values still serve us. Annual deep revision.

**Who can propose changes:** Anyone. Discuss in team meeting, decide by consensus.

**What stays constant:**
- North Star (unless fundamental mission changes)
- Core value themes (names might evolve, principles remain)

**Recent changes:**
- *2024-Q3*: Added "Developer Experience" value as team grew and internal customers increased
- *2024-Q2*: Refined "Simplicity" to explicitly call out managed services
- *2024-Q1*: Added 70/20/10 time allocation guideline

---

## Success Stories

### Example 1: Chose PostgreSQL RDS over Self-Hosted

**Situation:** Need database for new service. Self-hosted gives more control and a learning opportunity.

**Decision:** Chose AWS RDS PostgreSQL (managed service).

**Values applied:**
- ✓ Simplicity > Cleverness: Less operational burden
- ✓ Ownership: Team doesn't have DB expertise yet
- ✓ Reliability: AWS has better uptime than we'd achieve

**Outcome:** Service launched in 2 weeks vs. an estimated 6 weeks for self-hosted setup and learning. Zero database incidents in 6 months.

### Example 2: Said No to Real-Time Features

**Situation:** Product team requested real-time notifications (WebSockets) for the dashboard.

**Decision:** Said no, proposed 30-second polling instead.

**Values applied:**
- ✓ Simplicity: Polling is simpler than WebSocket infrastructure
- ✓ Reliability: Don't risk stability for a nice-to-have feature
- ✓ Developer Experience: Team lacks WebSocket experience

**Outcome:** Shipped in 1 week with polling. User research showed the 30s delay was acceptable. Saved 6+ weeks of WebSocket infrastructure work.

### Example 3: Invested a Week in Documentation

**Situation:** New API ready to ship, docs incomplete. Pressure to launch.

**Decision:** Delayed launch one week to complete docs, examples, and a migration guide.

**Values applied:**
- ✓ Developer Experience: Documentation is not optional
- ✓ Ownership: We support what we ship

**Outcome:** 12 teams adopted in the first month (vs. an estimated 4-6 with poor docs). Near-zero support requests due to clear docs.

494
skills/alignment-values-north-star/resources/methodology.md
Normal file
@@ -0,0 +1,494 @@
# Alignment Framework Methodology for Scaling Organizations

## Alignment Framework Workflow

Copy this checklist and track your progress:

```
Alignment Framework Progress:
- [ ] Step 1: Audit current state and identify gaps
- [ ] Step 2: Refine values through stakeholder discovery
- [ ] Step 3: Build multi-team alignment framework
- [ ] Step 4: Create decision frameworks for autonomy
- [ ] Step 5: Rollout and reinforce across organization
```

**Step 1: Audit current state and identify gaps**

Document stated vs actual values, interview stakeholders, and analyze past decisions to identify values drift. See [Refining Existing Values](#refining-existing-values) for audit techniques.

**Step 2: Refine values through stakeholder discovery**

Evolve or replace values based on discovery findings, using stakeholder input to ensure relevance. See [Refinement Process](#refinement-process) for evolution patterns and rollout strategies.

**Step 3: Build multi-team alignment framework**

Create layered alignment across company, function, and team levels to prevent silos. See [Multi-Team Alignment Frameworks](#multi-team-alignment-frameworks) for nested framework structures.

**Step 4: Create decision frameworks for autonomy**

Build decision tenets and authority matrices to enable aligned autonomy. See [Building Decision Frameworks for Autonomy](#building-decision-frameworks-for-autonomy) for tenet patterns and RACI matrices.

**Step 5: Rollout and reinforce across organization**

Execute phased rollout with leadership alignment, cascading communication, and ongoing reinforcement. See [Rollout Strategy for Refined Values](#rollout-strategy-for-refined-values) and [Case Study: Company-Wide Values Refresh](#case-study-company-wide-values-refresh) for implementation examples.

---

## Refining Existing Values

### Why Refine?

Common triggers:
- Values are vague ("be excellent") - no operational guidance
- Values conflict with reality (say "innovation" but punish failures)
- New priorities emerged (e.g., shift from growth to profitability)
- Multiple acquisitions brought different cultures
- Team doesn't reference values in decisions (not useful)

### Audit Current State

**Step 1: Document existing values**
- What are stated values? (website, onboarding docs, walls)
- What are *actual* values? (observed in decisions, promotions, conflicts)
- Gap analysis: Where do stated and actual diverge?

**Step 2: Interview stakeholders**

Questions to ask:
- "What do we truly value here? (not what we say we value)"
- "Tell me about a tough decision - what guided it?"
- "What behaviors get rewarded? Punished?"
- "When have our values helped you? Hindered you?"
- "What values are missing that we need?"

**Step 3: Analyze decisions**

Review past 6-12 months:
- Hiring/firing decisions - what values were applied?
- Product prioritization - what drove choices?
- Resource allocation - what got funded/cut?
- Conflict resolution - how were tradeoffs made?

Look for patterns revealing true values.

### Refinement Process

**Option A: Evolve existing values**

Keep core values but make them clearer:

Before: "Customer obsession"
After: "Customer obsession: We prioritize long-term customer success over short-term metrics. When in doubt, we ask 'what would create lasting value for customers?' and optimize for that, even if it delays revenue."

**Add:**
- Specific definition
- Why it matters
- Decision examples
- Anti-patterns

**Option B: Retire and replace**

When existing values don't serve:
1. Acknowledge what's changing and why
2. Thank the old values for their service
3. Introduce new values with context
4. Show connection (evolution, not rejection)

Example:
- Old: "Move fast and break things" (startup phase)
- New: "Move deliberately with customer trust" (scale phase)
- Context: "We used to optimize for speed because we needed product-market fit. Now we optimize for reliability because customers depend on us."

### Rollout Strategy for Refined Values

**Phase 1: Leadership alignment (Week 1-2)**
- All leaders can articulate values in their own words
- Leadership team models values in visible decisions
- Leaders prepared to answer "why change?" and "what's different?"

**Phase 2: Cascading communication (Week 3-4)**
- All-hands presentation (context, new values, Q&A)
- Team-level workshops (apply to team decisions)
- 1:1s address individual concerns

**Phase 3: Integration (Month 2-3)**
- Update hiring rubrics
- Update performance review criteria
- Reference in decision memos
- Celebrate examples of values in action

**Phase 4: Reinforcement (Ongoing)**
- Monthly: Leaders share values-driven decisions
- Quarterly: Audit if values are being used
- Annually: Refresh based on feedback

---

## Multi-Team Alignment Frameworks

### Challenge: Silos & Conflicting Priorities

As organizations scale:
- Teams optimize for local goals
- Priorities conflict (eng wants stability, product wants speed)
- Decisions require escalation (autonomy breaks down)
- Values interpreted differently across teams

### Layered Alignment Framework

**Layer 1: Company-Wide North Star & Values**

Company level (50+ people, multiple teams):
- North Star: Aspirational direction for whole company
- Values: 3-5 company-wide principles
- Decision Tenets: Company-level tradeoff guidance

Example:
```
Company North Star: "Empower every team to ship confidently"

Company Values:
1. Customer trust over growth metrics
2. Clarity over consensus
3. Leverage through platforms

Company Decision Tenets:
- When product and platform conflict, platforms enable more product value long-term
- When speed and reliability conflict, we choose reliability for critical paths
```

**Layer 2: Function-Level Values & Tenets**

Engineering, Product, Design, Sales functions each add:
- Function-specific interpretation of company values
- Function decision tenets (within company constraints)
- Function behaviors

Example (Engineering):
```
Engineering North Star: "Enable product velocity through reliable platforms"

Engineering Values (extending company):
1. Customer trust → "We treat production as sacred"
2. Clarity → "We write decisions down before coding"
3. Leverage → "We build platforms, not point solutions"

Engineering Decision Tenets:
- When feature velocity and platform health conflict, platform health wins
- When local optimization and system optimization conflict, system wins
- When urgency and testing conflict, we ship with tests (move test left)
```

**Layer 3: Team-Level Rituals & Practices**

Individual teams implement values through rituals:
- How we run standups
- How we make architectural decisions
- How we handle incidents
- How we onboard new members

Example (Platform Team):
```
Rituals embodying "Platform enables product velocity":
- Weekly: Office hours for product teams (30 min slots)
- Monthly: Platform roadmap review with product input
- Quarterly: Platform usability study with product engineers
```

### Alignment Check: Nested Frameworks

Test if layers are aligned:

| Company Value | Function Interpretation | Team Practice |
|---------------|------------------------|---------------|
| Customer trust | Engineering: Production is sacred | Platform: 99.9% SLA, postmortems within 24hr |
| Clarity | Engineering: Write before coding | Platform: RFC required for API changes |
| Leverage | Engineering: Platforms not point solutions | Platform: Reusable libraries, not feature forks |

If a team practice doesn't connect to a function value → it doesn't connect to a company value → misaligned.

---

## Building Decision Frameworks for Autonomy

### Problem: Alignment vs Autonomy Tension

- Too much alignment → slow, needs approval for everything
- Too much autonomy → teams diverge, duplicate work, conflict

Goal: **Aligned autonomy** - teams make fast local decisions within clear constraints.

### Decision Tenet Pattern

**Format:**
```
When {situation with tradeoff}, we choose {option A} over {option B} because {rationale}.
```

**Characteristics of good tenets:**
- Specific (not "be excellent")
- Tradeoff-oriented (acknowledges what we're NOT optimizing)
- Contextual (explains why this choice for us)
- Actionable (guides concrete decisions)

**Example Tenets:**

Engineering:
```
When latency and throughput conflict, we optimize for latency (p95 < 100ms)
because our users are professionals in workflows where milliseconds matter.
```

Product:
```
When power-user features and beginner simplicity conflict, we choose beginner simplicity
because growing the user base is our current strategic priority (2024 goal: 10x users).
```

Sales:
```
When deal size and customer fit conflict, we choose customer fit
because high-churn enterprise customers damage our brand and reference-ability.
```

### Decision Authority Matrix (RACI + Values)

Map which decisions require escalation and which can be made locally:

| Decision Type | Team Authority | Escalation Trigger | Values Applied |
|---------------|----------------|-------------------|----------------|
| API design for team features | Team decides | If cross-team impact | Platform leverage |
| Production incident response | On-call decides | If customer data risk | Customer trust |
| Prioritization within quarter | PM decides | If OKR conflict | Quarterly focus |
| Hiring bar | Team + function | Never lower bar | Excellence standard |

**Escalation triggers** (when to involve leadership):
- Cross-team conflict on priorities
- Values conflict (two values in tension)
- Precedent-setting decision (will affect future teams)
- High-stakes outcome (>$X, >Y customer impact)

### Operationalizing Tenets in Decisions

**Step 1: Frame decision with tenets**

Bad decision memo:
```
We should build feature X.
```

Good decision memo:
```
Decision: Build feature X

Relevant tenets:
- "When power-users and beginners conflict, choose beginners" → Feature X is beginner-focused ✓
- "When latency and features conflict, choose latency" → Feature X adds 20ms latency ✗
- "Platform leverage over point solutions" → Feature X is platform component ✓

Recommendation: Build feature X BUT optimize latency first (refactor API)
Estimate: +2 weeks for latency optimization, worth it per tenets
```

**Step 2: Audit tenet usage**

Quarterly review:
- How many decisions referenced tenets?
- Which tenets are most/least used?
- Where did tenets conflict? (may need refinement)
- Where did teams escalate unnecessarily? (need clearer tenet)

---

## Common Scaling Challenges

### Challenge 1: Values Drift (Stated ≠ Actual)

**Symptoms:**
- Leaders say "we value X" but reward Y
- Values posters on walls, but no one references them
- Cynicism about values ("just marketing")

**Diagnosis:**
- Review promotions: Who gets promoted? What values did they embody?
- Review tough decisions: Which values were actually applied?
- Interview employees: "Do you use our values? When? How?"

**Fix:**
1. **Acknowledge drift** ("Our stated values haven't matched our actions")
2. **Choose**: Either change stated values to match reality OR change behavior to match values
3. **Leader modeling**: Leaders publicly use values in decisions
4. **Consequences**: Promotions/rewards explicitly tied to values

**Example:**
```
Stated: "We value work-life balance"
Reality: Promotions go to those who work weekends

Fix Option A (change stated): "We value high output and intense commitment"
Fix Option B (change reality): "Promotions now require sustainable pace, not just output"
```

### Challenge 2: Values Conflict (Internal Tensions)

**Symptoms:**
- Teams cite different values for same decision
- Paralysis (can't decide because values conflict)
- Escalation overload (everything needs leadership tiebreak)

**Diagnosis:**
- Map values pairwise: When do Value A and Value B conflict?
- Identify repeated conflict scenarios
- Ask: Is this values conflict or unclear priority?

**Fix: Priority tenets**

When values conflict, state priority:
```
"Speed" and "Quality" both matter, but:
- For customer-facing features: Quality > Speed (customer trust)
- For internal tools: Speed > Perfection (iterate fast)
- For platform APIs: Quality > Speed (leverage means hard to change)
```

### Challenge 3: Multi-Team Misalignment

**Symptoms:**
- Teams build conflicting solutions
- Escalation required for every cross-team decision
- "Not my priority" culture

**Diagnosis:**
- Map team goals: Do team OKRs align?
- Check incentives: What does each team get rewarded for?
- Review cross-team projects: How often do they succeed?

**Fix: Nested alignment framework (see above)**

Plus:
- **Cross-team rituals**: Monthly syncs on interdependencies
- **Shared metrics**: At least one metric in common across teams
- **Rotation**: Engineers rotate across team boundaries

---

## Case Study: Company-Wide Values Refresh

### Context

**Company**: SaaS product, 150 employees, 8 engineering teams
**Trigger**: Rapid growth (30 → 150 people in 18 months), old startup values not working
**Old values**: "Move fast", "Customer obsessed", "Scrappy"
**Problem**: "Move fast" causing production incidents; "Scrappy" justifying technical debt that slows product

### Process

**Month 1: Discovery**
- Interviewed 40 employees (all levels, all functions)
- Reviewed 20 major decisions (what values were actually applied?)
- Surveyed all employees: "What do we truly value? What should we value?"

**Key findings:**
- "Move fast" interpreted as "ship without testing" (not intended)
- "Customer obsessed" unclear (speed to market vs quality vs support?)
- "Scrappy" became excuse for poor tooling
- **Missing value**: Reliability/trust (now serving enterprise customers)

**Month 2: Leadership Workshop**
- All directors + exec team (2-day offsite)
- Reviewed discovery findings
- Drafted new values + tenets
- Pressure-tested against real decisions

**New values (refined):**

1. **Customer trust over growth metrics**
   - Tenet: "When feature velocity and reliability conflict, reliability wins for core workflows"
   - Evolution of "customer obsessed" (clarified: long-term trust, not short-term features)

2. **Leverage through platforms**
   - Tenet: "When team autonomy and platform standards conflict, we choose standards for leverage"
   - Evolution of "scrappy" (still efficient, but via platforms not point solutions)

3. **Clarity over consensus**
   - Tenet: "When speed and buy-in conflict, we choose fast decision with clear rationale over slow consensus"
   - New value (addresses decision paralysis)

**Month 3: Rollout**
- All-hands (CEO presented, Q&A, examples of how values applied to recent decisions)
- Team workshops (each team applied to their context)
- Updated hiring rubric (added values-based questions)
- Updated performance review (added values section)

**Month 4-6: Reinforcement**
- Weekly exec team review: "What values-driven decisions did we make?"
- Monthly all-hands: Celebrate values in action (shoutouts)
- Quarterly survey: "Are we living our values?"

### Results (6 months later)

**Wins:**
- Production incidents dropped 60% ("Customer trust" being applied)
- Engineering happiness up 25% (better tooling via "leverage through platforms")
- Decision velocity up (no more endless debates, "clarity over consensus")
- Values referenced in 80% of decision memos (actual usage)

**Challenges:**
- Some engineers missed "move fast" culture (clarified: fast decisions, deliberate execution)
- Sales initially confused ("customer trust" seemed to slow deals - clarified: long-term trust creates more deals)

**Evolution (12 months):**
- Added 4th value: "Default to transparency" (based on feedback)
- Refined "leverage" tenet (too restrictive, added exceptions for experiments)

### Lessons Learned

1. **Co-create with leadership**: Top-down values fail, need buy-in from leaders who'll model them
2. **Show the evolution**: Don't reject old values, show how they evolved (honors the past)
3. **Operationalize fast**: Values are useless without tenets + integration into decisions
4. **Celebrate examples**: Abstract values need concrete stories of values in action
5. **Iterate**: Values are living, not static - update based on feedback

---

## Quality Checklist for Scaling Organizations

Before finalizing alignment framework refresh, check:

**Discovery**:
- [ ] Interviewed stakeholders across levels/functions
- [ ] Reviewed actual decisions (not just stated values)
- [ ] Identified gap between stated and actual values
- [ ] Understood why current values aren't working

**Refinement**:
- [ ] New values address root causes (not symptoms)
- [ ] Values evolved from old (honored the past)
- [ ] Values are specific and actionable (not vague platitudes)
- [ ] Tenets operationalize values (guide concrete decisions)
- [ ] Conflicts between values explicitly resolved (priority tenets)

**Multi-Team Alignment**:
- [ ] Company-wide values clear
- [ ] Function-level interpretations add specificity
- [ ] Team practices connect to function/company values
- [ ] Decision authority matrix defined (what escalates vs local)
- [ ] Cross-team conflicts have resolution process

**Rollout**:
- [ ] Leadership aligned and can model values
- [ ] Communication plan (all-hands, team workshops, 1:1s)
- [ ] Integration into systems (hiring, perf review, decision memos)
- [ ] Examples prepared (values in action stories)
- [ ] Feedback loops established (quarterly check-ins)

**Reinforcement**:
- [ ] Regular rituals (monthly values spotlights)
- [ ] Values referenced in decisions (not just posters)
- [ ] Consequences tied to values (promotions, rewards)
- [ ] Audit usage quarterly (are values being applied?)
- [ ] Iterate based on feedback (values evolve)

**Minimum standard for scaling orgs**: All checklist items completed before rollout

462
skills/alignment-values-north-star/resources/template.md
Normal file
@@ -0,0 +1,462 @@
# Alignment Framework Template

## Workflow

Copy this checklist and track your progress:

```
Alignment Framework Progress:
- [ ] Step 1: Draft North Star and core values
- [ ] Step 2: Create decision tenets for common dilemmas
- [ ] Step 3: Define observable behaviors
- [ ] Step 4: Add anti-patterns and usage guidance
- [ ] Step 5: Validate with quality checklist
```

**Step 1: Draft North Star and core values**

Write 1-2 sentence North Star (where we're going and why) and 3-5 core values with specific definitions, why they matter, what we optimize FOR, and what we de-prioritize. Use [Quick Template](#quick-template) structure and [Field-by-Field Guidance](#field-by-field-guidance) for details.

**Step 2: Create decision tenets for common dilemmas**

Identify 5-10 real trade-offs your team faces and write "When X vs Y, we..." statements. See [Decision Tenets](#decision-tenets) guidance for format. Include specific reasons tied to values and acknowledge merit of alternatives.

**Step 3: Define observable behaviors**

List 10-15 specific, observable actions across contexts: meetings, code/design reviews, planning, communication, hiring, operations. See [Observable Behaviors](#observable-behaviors) for examples. Focus on what you could notice in daily work.

**Step 4: Add anti-patterns and usage guidance**

Document 3-5 behaviors you explicitly DON'T do, even when tempting, and explain which value they violate. Add practical guidance for using framework in decision-making, hiring, onboarding, performance reviews. See [Anti-Patterns](#anti-patterns) section.

**Step 5: Validate with quality checklist**

Use [Quality Checklist](#quality-checklist) to verify: North Star is memorable, values are specific with trade-offs, decision tenets address real dilemmas, behaviors are observable, usable TODAY, no contradictions, 1-2 pages total, jargon-free.

## Quick Template

Copy this structure to create your alignment framework:

```markdown
# {Team/Organization Name} Alignment Framework

## Context

**Why this matters now:**
{What triggered the need for alignment? Growth, conflict, new direction?}

**Who this is for:**
{Team, organization, function - be specific}

**Last updated:** {Date}

---

## North Star

{1-2 sentences: Where are we going and why?}

**Example formats:**
- "Build {what} that {who} {value proposition}"
- "Become the {superlative} {thing} for {audience}"
- "{Action verb} {outcome} by {approach}"

---

## Core Values

### Value 1: {Name}
**What it means:** {Specific definition in context of this team}

**Why it matters:** {What problem does honoring this value solve?}

**What we optimize for:** {Concrete outcome}

**What we de-prioritize:** {Trade-off we accept}

### Value 2: {Name}
{Same structure}

### Value 3: {Name}
{Same structure}

*Note: 3-5 values is ideal. More than 7 becomes unmemorable.*

---

## Decision Tenets

When making decisions, we:

**When choosing between {X} and {Y}:**
- ✓ We choose {X} because {specific reason tied to values}
- ✗ We don't choose {Y} even though {acknowledge Y's merit}

**When facing {common dilemma}:**
- ✓ Our default is {approach} because {value}
- ⚠ Exception: When {specific condition}, we {alternative}

**When prioritizing {work/features/initiatives}:**
- 🔴 Critical: {what always gets done}
- 🟡 Important: {what gets done when possible}
- ⚪ Nice-to-have: {what we explicitly defer}

*Include 5-10 decision tenets that address real trade-offs your team faces*

---

## Observable Behaviors

**What this looks like in practice:**

**In meetings:**
- {Specific behavior that demonstrates value}
- {Specific behavior that demonstrates value}

**In code reviews / design reviews:**
- {What comments look like}
- {What we praise / what we push back on}

**In planning / prioritization:**
- {How decisions get made}
- {What questions we ask}

**In communication:**
- {How we share information}
- {How we give feedback}

**In hiring:**
- {What we look for}
- {What's a dealbreaker}

**In operations / incidents:**
- {How we respond to problems}
- {What we optimize for under pressure}

---

## Anti-Patterns

**What we explicitly DON'T do:**

- ✗ {Behavior that violates values} - even when {tempting circumstance}
- ✗ {Common industry practice we reject} - because {conflicts with which value}
- ✗ {Shortcuts we don't take} - we value {what} over {what}

---

## How to Use This

**In decision-making:**
{Practical guide for referencing these values when stuck}

**In hiring:**
{How to interview for these values, what questions to ask}

**In onboarding:**
{How new teammates should learn these values}

**In performance reviews:**
{How values factor into evaluations}

**When values conflict:**
{Which value wins in common scenarios, or how to resolve}

---

## Evolution

**Review cadence:** {How often to revisit - typically annually}

**Who can propose changes:** {Process for updating values}

**What stays constant:** {Core elements that shouldn't change}
```

## Field-by-Field Guidance

### North Star

**Purpose**: Inspiring but specific direction

**Include:**
- Who you serve
- What value you create
- What makes you distinctive

**Don't:**
- Be generic ("be the best")
- Use corporate speak
- Make it unmemorable

**Length**: 1-2 sentences max

**Test**: Can team members recite it from memory? Does it help choose between two good options?

**Examples:**

**Good:**
- "Build developer tools that spark joy and eliminate toil"
- "Make renewable energy cheaper than fossil fuels for every market by 2030"
- "Give every student personalized learning that adapts to how they learn best"

**Bad:**
- "Achieve excellence in everything we do" (generic)
- "Leverage synergies to maximize stakeholder value" (jargon)
- "Be the world's leading provider of solutions" (unmemorable, vague)

### Core Values

**Purpose**: Principles that constrain behavior

**Include:**
- Specific definition in your context
- Why it matters (what problem it solves)
- Trade-off you accept
- 3-5 values total

**Don't:**
- List every positive quality
- Be generic (every company has "integrity")
- Ignore tensions between values
- Go beyond 7 values (unmemorable)

**Structure for each value:**
- Name (1-2 words)
- Definition (what it means HERE)
- Why it matters
- What we optimize FOR
- What we de-prioritize (trade-off)

**Examples:**

**Good - Specific:**
- **Bias to action**: We'd rather ship, learn, and iterate than plan perfectly. We accept some rework to get fast feedback. We optimize for learning velocity over getting it right the first time.

**Bad - Generic:**
- **Excellence**: We strive for excellence in everything we do and never settle for mediocrity.

**Good - Shows trade-off:**
- **User delight over enterprise features**: We prioritize magical user experiences for individuals over procurement-friendly enterprise checkboxes. We'll lose some enterprise deals to keep the product simple.

**Bad - No trade-off:**
- **Customer focus**: We care deeply about our customers and always put them first.

### Decision Tenets

**Purpose**: Actionable guidance for real decisions

**Include:**
- "When choosing between X and Y..." format
- Real dilemmas your team faces
- Specific guidance, not platitudes
- 5-10 tenets

**Don't:**
- Be abstract ("choose the best option")
- Avoid acknowledging trade-offs
- Make it too long (unmemorable)

**Format:**

```
When choosing between {specific options your team actually faces}:
- ✓ We {specific action} because {which value}
- ✗ We don't {alternative} even though {acknowledge merit}
```

**Examples:**

**Good:**
```
When choosing between shipping fast and perfect quality:
- ✓ Ship with known minor bugs if user impact is low
- ✗ Don't delay for perfection
- ⚠ Exception: Anything related to payments, security, or data loss requires a high quality bar
```

**Bad:**
```
When making decisions:
- Always do what's best for the customer
```

### Observable Behaviors

**Purpose**: Concrete manifestation of values

**Include:**
- Specific, observable actions
- Examples from daily work
- Things you could notice in a meeting
- 10-15 behaviors across contexts

**Don't:**
- Be vague ("communicate well")
- Only list aspirations
- Skip the messy details

**Contexts to cover:**
- Meetings
- Code/design reviews
- Planning
- Communication
- Hiring
- Operations/crisis

**Examples:**

**Good:**
- "In code reviews, we comment on operational complexity and debuggability, not just correctness"
- "In planning, we ask 'what's the simplest thing that could work?' before discussing optimal solutions"
- "We say no to features that would compromise reliability, even when customers request them"

**Bad:**
- "We communicate effectively"
- "We make good decisions"
- "We work hard"

### Anti-Patterns

**Purpose**: Explicit boundaries

**Include:**
- Common temptations you resist
- Industry practices you reject
- Shortcuts you don't take
- 3-5 clear anti-patterns

**Format:**
```
✗ {Specific behavior} - even when {tempting situation}
Because: {which value it violates}
```

**Examples:**

**Good:**
- "✗ We don't add features without talking to users first - even when executives request them. Because: User delight > internal opinions"
- "✗ We don't skip writing tests to ship faster - even when deadline pressure is high. Because: Reliability > shipping fast"

**Bad:**
- "✗ We don't do bad things"
- "✗ We avoid poor quality"

## Quality Checklist

Before finalizing, verify:

- [ ] North Star is memorable (could team recite it?)
- [ ] Values are specific to this team (not generic)
- [ ] Each value includes a trade-off
- [ ] Decision tenets address real dilemmas
- [ ] Behaviors are observable (not abstract)
- [ ] Someone could make a decision using this TODAY
- [ ] Anti-patterns are specific
- [ ] No contradictions between sections
- [ ] Total length is 1-2 pages (concise)
- [ ] Language is clear and jargon-free

## Common Patterns by Team Type

### Engineering Team
**Focus on:**
- Technical trade-offs (simplicity, performance, reliability)
- Operational philosophy
- Code quality standards
- On-call and incident response
- Technical debt management

**Example values:**
- Simplicity over cleverness
- Reliability over features
- Developer experience matters

### Product Team
**Focus on:**
- User/customer value
- Feature prioritization
- Quality bar
- Product-market fit assumptions
- Launch criteria

**Example values:**
- User delight over feature count
- Solving real problems over building cool tech
- Data-informed over opinion-driven

### Sales/Customer-Facing
**Focus on:**
- Customer relationships
- Deal qualification
- Success metrics
- Communication style

**Example values:**
- Long-term relationships over short-term revenue
- Customer success over sales quotas
- Honesty even when it costs a deal

### Leadership Team
**Focus on:**
- Strategic priorities
- Resource allocation
- Decision-making process
- Communication norms

**Example values:**
- Transparency by default
- Disagree and commit
- Long-term value over short-term metrics

## Rollout & Socialization

**Week 1: Draft**
- Leadership creates draft
- Test against recent real decisions
- Revise based on applicability

**Week 2-3: Feedback**
- Share with team for input
- Hold working session to discuss
- Incorporate feedback
- Ensure team authorship, not just leadership

**Week 4: Launch**
- Publish finalized version
- Present at all-hands
- Explain rationale and examples
- Share how to use in daily work

**Ongoing:**
- Reference in decision-making
- Include in onboarding
- Use in hiring interviews
- Revisit quarterly, revise annually
- Celebrate examples of values in action

## Anti-Patterns to Avoid

**Vague North Star:**
- Bad: "Be the best company"
- Good: "Build developer tools that eliminate toil"

**Generic values:**
- Bad: "Integrity, Excellence, Innovation"
- Good: "Simplicity over cleverness, User delight over feature count"

**No trade-offs:**
- Bad: "Quality is important to us"
- Good: "We optimize for reliability over shipping speed, accepting slower feature velocity"

**Unmemorable length:**
- Bad: 10 pages of values, tenets, behaviors
- Good: 1-2 pages that people can actually remember

**Top-down only:**
- Bad: Leadership writes values, announces them
- Good: Collaborative process with team input and ownership

**Set and forget:**
- Bad: Write values in 2020, never revisit
- Good: Annual review, update as team evolves

182
skills/bayesian-reasoning-calibration/SKILL.md
Normal file
@@ -0,0 +1,182 @@
---
name: bayesian-reasoning-calibration
description: Use when making predictions or judgments under uncertainty and need to explicitly update beliefs with new evidence. Invoke when forecasting outcomes, evaluating probabilities, testing hypotheses, calibrating confidence, assessing risks with uncertain data, or avoiding overconfidence bias. Use when user mentions priors, likelihoods, Bayes theorem, probability updates, forecasting, calibration, or belief revision.
---

# Bayesian Reasoning & Calibration

## Table of Contents

- [Purpose](#purpose)
- [When to Use This Skill](#when-to-use-this-skill)
- [What is Bayesian Reasoning?](#what-is-bayesian-reasoning)
- [Workflow](#workflow)
  - [1. Define the Question](#1--define-the-question)
  - [2. Establish Prior Beliefs](#2--establish-prior-beliefs)
  - [3. Identify Evidence & Likelihoods](#3--identify-evidence--likelihoods)
  - [4. Calculate Posterior](#4--calculate-posterior)
  - [5. Calibrate & Document](#5--calibrate--document)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

Apply Bayesian reasoning to systematically update probability estimates as new evidence arrives. This helps make better forecasts, avoid overconfidence, and explicitly show how beliefs should change with data.

## When to Use This Skill

- Making forecasts or predictions with uncertainty
- Updating beliefs when new evidence emerges
- Calibrating confidence in estimates
- Testing hypotheses with imperfect data
- Evaluating risks with incomplete information
- Avoiding anchoring and overconfidence biases
- Making decisions under uncertainty
- Comparing multiple competing explanations
- Assessing diagnostic test results
- Forecasting project outcomes with new data

**Trigger phrases:** "What's the probability", "update my belief", "how confident", "forecast", "prior probability", "likelihood", "Bayes", "calibration", "base rate", "posterior probability"

## What is Bayesian Reasoning?

A systematic way to update probability estimates using Bayes' Theorem:

**P(H|E) = P(E|H) × P(H) / P(E)**

Where:
- **P(H)** = Prior: Probability of hypothesis before seeing evidence
- **P(E|H)** = Likelihood: Probability of evidence if hypothesis is true
- **P(E|¬H)** = Probability of evidence if hypothesis is false
- **P(H|E)** = Posterior: Updated probability after seeing evidence

**Quick Example:**

```markdown
# Should we launch Feature X?

## Prior Belief
Before beta testing: 60% chance of adoption >20%
- Base rate: Similar features get 15-25% adoption
- Our feature seems stronger than average
- Prior: 60%

## New Evidence
Beta test: 35% of users adopted (70 of 200 users)

## Likelihoods
If true adoption is >20%:
- P(seeing 35% in beta | adoption >20%) = 75% (likely to see high beta if true)

If true adoption is ≤20%:
- P(seeing 35% in beta | adoption ≤20%) = 15% (unlikely to see high beta if false)

## Bayesian Update
Posterior = (75% × 60%) / [(75% × 60%) + (15% × 40%)]
Posterior = 45% / (45% + 6%) = 88%

## Conclusion
Updated belief: 88% confident adoption will exceed 20%
Evidence strongly supports launch, but not certain.
```
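
For readers who want to sanity-check the arithmetic, here is a minimal Python sketch of the same update (the function and variable names are illustrative, not part of this skill's resources):

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return the posterior P(H|E) for a binary hypothesis."""
    # Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|¬H)P(¬H)
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# Feature X example: prior 60%, P(E|H) = 75%, P(E|¬H) = 15%
print(f"{bayes_update(0.60, 0.75, 0.15):.0%}")  # -> 88%
```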

## Workflow

Copy this checklist and track your progress:

```
Bayesian Reasoning Progress:
- [ ] Step 1: Define the question
- [ ] Step 2: Establish prior beliefs
- [ ] Step 3: Identify evidence and likelihoods
- [ ] Step 4: Calculate posterior
- [ ] Step 5: Calibrate and document
```

**Step 1: Define the question**

Clarify hypothesis (specific, testable claim), probability to estimate, timeframe (when outcome is known), success criteria, and why this matters (what decision depends on it). Example: "Product feature will achieve >20% adoption within 3 months" - matters for launch decision.

**Step 2: Establish prior beliefs**

Set initial probability using base rates (general frequency), reference class (similar situations), specific differences, and explicit probability assignment with justification. Good priors are based on base rates, account for differences, are honest about uncertainty, and include ranges if unsure (e.g., 40-60%). Avoid purely intuitive priors, ignoring base rates, or extreme values without justification.

**Step 3: Identify evidence and likelihoods**

Assess evidence (specific observation/data), diagnostic power (does it distinguish hypotheses?), P(E|H) (probability if hypothesis TRUE), P(E|¬H) (probability if FALSE), and calculate likelihood ratio = P(E|H) / P(E|¬H). LR > 10 = very strong evidence, 3-10 = moderate, 1-3 = weak, ≈1 = not diagnostic, <1 = evidence against.
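
As a sketch, the likelihood ratio and the interpretation bands above can be expressed directly in code (thresholds mirror the step text; the helper names are illustrative):

```python
def likelihood_ratio(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """LR = P(E|H) / P(E|¬H): how strongly the evidence discriminates."""
    return p_e_given_h / p_e_given_not_h

def interpret_lr(lr: float) -> str:
    """Map an LR onto the qualitative bands used in Step 3."""
    if lr > 10:
        return "very strong evidence"
    if lr >= 3:
        return "moderate evidence"
    if lr > 1:
        return "weak evidence"
    if lr == 1:  # in practice, anything near 1 is effectively non-diagnostic
        return "not diagnostic"
    return "evidence against the hypothesis"

print(interpret_lr(likelihood_ratio(0.75, 0.15)))  # LR = 5.0 -> "moderate evidence"
```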

**Step 4: Calculate posterior**

Apply Bayes' Theorem: P(H|E) = [P(E|H) × P(H)] / P(E), or use odds form: Posterior Odds = Prior Odds × Likelihood Ratio. Calculate P(E) = P(E|H)×P(H) + P(E|¬H)×P(¬H), get posterior probability, and interpret change. For simple cases → Use `resources/template.md` calculator. For complex cases (multiple hypotheses) → Study `resources/methodology.md`.
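
A minimal sketch of the odds-form update for the binary case (the helper name is illustrative); it gives the same answer as the probability form:

```python
def posterior_from_odds(prior: float, lr: float) -> float:
    """Posterior Odds = Prior Odds × LR, converted back to a probability."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1 + posterior_odds)

# Same Feature X numbers: prior 60%, LR = 0.75 / 0.15 = 5.0
print(f"{posterior_from_odds(0.60, 5.0):.0%}")  # -> 88%
```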

**Step 5: Calibrate and document**

Check calibration (over/underconfident?), validate assumptions (are likelihoods reasonable?), perform sensitivity analysis, create `bayesian-reasoning-calibration.md`, and note limitations. Self-check using `resources/evaluators/rubric_bayesian_reasoning_calibration.json`: verify prior based on base rates, likelihoods justified, evidence diagnostic (LR ≠ 1), calculation correct, posterior calibrated, assumptions stated, sensitivity noted. Minimum standard: Score ≥ 3.5.

## Common Patterns

**For forecasting:**
- Use base rates as starting point
- Update incrementally as evidence arrives
- Track forecast accuracy over time
- Calibrate by comparing predictions to outcomes

**For hypothesis testing:**
- State competing hypotheses explicitly
- Calculate likelihood ratio for evidence
- Update belief proportionally to evidence strength
- Don't claim certainty unless LR is extreme

**For risk assessment:**
- Consider multiple scenarios (not just binary)
- Update risks as new data arrives
- Use ranges when uncertain about likelihoods
- Perform sensitivity analysis

**For avoiding bias:**
- Force explicit priors (prevents anchoring to evidence)
- Use reference classes (prevents ignoring base rates)
- Calculate mathematically (prevents motivated reasoning)
- Document before seeing outcome (enables calibration)

## Guardrails

**Do:**
- State priors explicitly before seeing all evidence
- Use base rates and reference classes
- Estimate likelihoods with justification
- Update incrementally as evidence arrives
- Be honest about uncertainty
- Perform sensitivity analysis
- Track forecasts for calibration
- Acknowledge limits of the model

**Don't:**
- Use extreme priors (1%, 99%) without exceptional justification
- Ignore base rates (common bias)
- Treat all evidence as equally diagnostic
- Update to 100% certainty (almost never justified)
- Cherry-pick evidence
- Skip documenting reasoning
- Forget to calibrate (compare predictions to outcomes)
- Apply to questions where probability is meaningless

## Quick Reference

- **Standard template**: `resources/template.md`
- **Multiple hypotheses**: `resources/methodology.md`
- **Examples**: `resources/examples/product-launch.md`, `resources/examples/medical-diagnosis.md`
- **Quality rubric**: `resources/evaluators/rubric_bayesian_reasoning_calibration.json`

**Bayesian Formula (Odds Form)**:
```
Posterior Odds = Prior Odds × Likelihood Ratio
```

**Likelihood Ratio**:
```
LR = P(Evidence | Hypothesis True) / P(Evidence | Hypothesis False)
```

**Output naming**: `bayesian-reasoning-calibration.md` or `{topic}-forecast.md`

@@ -0,0 +1,135 @@
{
  "name": "Bayesian Reasoning Quality Rubric",
  "scale": {
    "min": 1,
    "max": 5,
    "description": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent"
  },
  "criteria": [
    {
      "name": "Prior Quality",
      "description": "Prior is based on base rates and reference classes, not just intuition",
      "scoring": {
        "1": "No prior stated or purely intuitive guess",
        "2": "Prior stated but ignores base rates entirely",
        "3": "Prior considers base rates with some adjustment",
        "4": "Prior well-grounded in base rates with justified adjustments",
        "5": "Exceptional prior with multiple reference classes and clear reasoning"
      }
    },
    {
      "name": "Likelihood Justification",
      "description": "Likelihoods P(E|H) and P(E|¬H) are estimated with clear reasoning",
      "scoring": {
        "1": "No likelihoods or purely guessed",
        "2": "Likelihoods given but no justification",
        "3": "Likelihoods have basic reasoning",
        "4": "Likelihoods well-justified with clear logic",
        "5": "Exceptional likelihood estimates with empirical grounding or detailed reasoning"
      }
    },
    {
      "name": "Evidence Diagnosticity",
      "description": "Evidence meaningfully distinguishes between hypotheses (LR ≠ 1)",
      "scoring": {
        "1": "Evidence is not diagnostic at all (LR ≈ 1)",
        "2": "Evidence is weakly diagnostic (LR = 1-2)",
        "3": "Evidence is moderately diagnostic (LR = 2-5)",
        "4": "Evidence is strongly diagnostic (LR = 5-10)",
        "5": "Evidence is very strongly diagnostic (LR > 10)"
      }
    },
    {
      "name": "Calculation Correctness",
      "description": "Bayesian calculation is mathematically correct",
      "scoring": {
        "1": "Major calculation errors",
        "2": "Some calculation errors",
        "3": "Calculation is correct with minor issues",
        "4": "Calculation is fully correct",
        "5": "Perfect calculation with both probability and odds forms shown"
      }
    },
    {
      "name": "Calibration & Realism",
      "description": "Posterior is calibrated, not overconfident (avoids extremes without justification)",
      "scoring": {
        "1": "Posterior is 0% or 100% without extreme evidence",
        "2": "Posterior is very extreme (>95% or <5%) with weak evidence",
        "3": "Posterior is reasonable but might be slightly overconfident",
        "4": "Well-calibrated posterior with appropriate uncertainty",
        "5": "Exceptional calibration with explicit confidence bounds"
      }
    },
    {
      "name": "Assumption Transparency",
      "description": "Key assumptions and limitations are stated explicitly",
      "scoring": {
        "1": "No assumptions stated",
        "2": "Few assumptions mentioned vaguely",
        "3": "Key assumptions stated",
        "4": "Comprehensive assumption documentation",
        "5": "Exceptional transparency with sensitivity analysis showing assumption impact"
      }
    },
    {
      "name": "Base Rate Usage",
      "description": "Analysis uses base rates appropriately (avoids base rate neglect)",
      "scoring": {
        "1": "Completely ignores base rates",
        "2": "Acknowledges base rates but doesn't use them",
        "3": "Uses base rates for prior",
        "4": "Properly incorporates base rates with adjustments",
        "5": "Exceptional use of multiple base rates and reference classes"
      }
    },
    {
      "name": "Sensitivity Analysis",
      "description": "Tests how sensitive conclusion is to input assumptions",
      "scoring": {
        "1": "No sensitivity analysis",
        "2": "Minimal sensitivity check",
        "3": "Basic sensitivity analysis on key inputs",
        "4": "Comprehensive sensitivity analysis",
        "5": "Exceptional sensitivity analysis showing robustness or fragility clearly"
      }
    },
    {
      "name": "Interpretation Quality",
      "description": "Posterior is interpreted correctly with decision implications",
      "scoring": {
        "1": "Misinterprets posterior or no interpretation",
        "2": "Basic interpretation but lacks context",
        "3": "Good interpretation with some decision guidance",
        "4": "Clear interpretation with actionable decision implications",
        "5": "Exceptional interpretation linking probability to specific actions and thresholds"
      }
    },
    {
      "name": "Avoidance of Common Errors",
      "description": "Avoids prosecutor's fallacy, base rate neglect, and other Bayesian errors",
      "scoring": {
        "1": "Multiple major errors (confusing P(E|H) with P(H|E), ignoring base rates)",
        "2": "One major error present",
        "3": "Mostly avoids common errors",
        "4": "Cleanly avoids all common errors",
        "5": "Exceptional awareness with explicit checks against common errors"
      }
    }
  ],
  "overall_assessment": {
    "thresholds": {
      "excellent": "Average score ≥ 4.5 (publication quality)",
      "very_good": "Average score ≥ 4.0 (most forecasts should aim for this)",
      "good": "Average score ≥ 3.5 (minimum for important decisions)",
      "acceptable": "Average score ≥ 3.0 (workable for low-stakes predictions)",
      "needs_rework": "Average score < 3.0 (redo before using)"
    },
    "stakes_guidance": {
      "low_stakes": "Personal predictions, low-cost decisions: aim for ≥ 3.0",
      "medium_stakes": "Business decisions, moderate cost: aim for ≥ 3.5",
      "high_stakes": "Major decisions, high cost of error: aim for ≥ 4.0"
    }
  },
  "usage_instructions": "Rate each criterion on 1-5 scale. Calculate average. For important forecasts or decisions, minimum score is 3.5. For high-stakes decisions where cost of error is high, aim for ≥4.0. Check especially for base rate neglect, prosecutor's fallacy, and overconfidence - these are the most common errors."
}

@@ -0,0 +1,319 @@
|
||||
# Bayesian Analysis: Feature Adoption Forecast
|
||||
|
||||
## Question
|
||||
|
||||
**Hypothesis**: New sharing feature will achieve >20% adoption within 3 months of launch
|
||||
|
||||
**Estimating**: P(adoption >20%)
|
||||
|
||||
**Timeframe**: 3 months post-launch (results measured at month 3)
|
||||
|
||||
**Matters because**: Need 20% adoption to justify ongoing development investment. Below 20%, we should sunset the feature and reallocate resources.
|
||||
|
||||
---
|
||||
|
||||
## Prior Belief (Before Evidence)
|
||||
|
||||
### Base Rate
|
||||
|
||||
What's the general frequency of similar features achieving >20% adoption?
|
||||
|
||||
- **Reference class**: Previous features we've launched in this product category
|
||||
- **Historical data**:
|
||||
- Last 8 features launched: 5 achieved >20% adoption (62.5%)
- Industry benchmarks: Social sharing features average 15-25% adoption
- Our product has higher engagement than average
- **Base rate**: 60%

### Adjustments

How is this case different from the base rate?

- **Factor 1: Feature complexity** - This feature is simpler than average (+5%)
  - Previous successful features averaged 3 steps to use
  - This feature is 1-click sharing
  - Simpler features historically perform better

- **Factor 2: Market timing** - Competitive pressure is high (-10%)
  - Two competitors launched similar features 6 months ago
  - Early adopters may have already switched to competitors
  - Late-to-market features typically see 15-20% lower adoption

- **Factor 3: User research signals** - Strong user request (+10%)
  - Feature was #2 most requested in last user survey (450 responses)
  - 72% said they would use it "frequently" or "very frequently"
  - Strong stated intent typically correlates with 40-60% actual usage

### Prior Probability

**P(H) = 65%**

**Justification**: Starting from the 60% base rate, adjusted upward for simplicity (+5%) and strong user signals (+10%), adjusted down for late market entry (-10%). Net effect: 65% prior confidence that adoption will exceed 20%.

**Range if uncertain**: 55% to 75% (accounting for uncertainty in adjustment factors)

---

## Evidence

**What was observed**: Beta test with 200 users showed 35% adoption (70 users actively used the feature)

**How diagnostic**: This is moderately to strongly diagnostic evidence. Beta tests often show higher engagement than production (selection bias), but 35% is meaningfully above our 20% threshold. The question is whether this beta performance predicts production performance.

### Likelihoods

**P(E|H) = 75%** - Probability of seeing 35% beta adoption IF true production adoption will be >20%

**Reasoning**:
- If production adoption will be >20%, beta should show higher (beta users are early adopters)
- Typical pattern: beta adoption is 1.5-2x production adoption for engaged features
- If production will be 22%, beta would likely be 33-44% → 35% fits this well
- If production will be 25%, beta would likely be 38-50% → 35% is on the lower end but plausible
- 75% accounts for variance and beta-to-production conversion uncertainty

**P(E|¬H) = 15%** - Probability of seeing 35% beta adoption IF true production adoption will be ≤20%

**Reasoning**:
- If production adoption will be ≤20% (say, 15%), beta would typically be 22-30%
- Seeing 35% beta when production will be ≤20% would require an unusual beta-to-production drop
- This could happen (beta selection bias, novelty effect wears off), but is uncommon
- 15% reflects that this scenario is possible but unlikely

**Likelihood Ratio = 75% / 15% = 5.0**

**Interpretation**: Evidence is moderately strong. A 35% beta result is 5 times more likely if production adoption will exceed 20% than if it won't. This is meaningful but not overwhelming evidence.

---

## Bayesian Update

### Calculation

**Using odds form** (simpler for this case):

```
Prior Odds = P(H) / P(¬H) = 65% / 35% = 1.86

Likelihood Ratio = 5.0

Posterior Odds = Prior Odds × LR = 1.86 × 5.0 = 9.3

Posterior Probability = Posterior Odds / (1 + Posterior Odds)
                      = 9.3 / 10.3
                      = 90.3%
```

**Verification using probability form**:

```
P(E) = [P(E|H) × P(H)] + [P(E|¬H) × P(¬H)]
P(E) = [75% × 65%] + [15% × 35%]
P(E) = 48.75% + 5.25% = 54%

P(H|E) = [P(E|H) × P(H)] / P(E)
P(H|E) = [75% × 65%] / 54%
P(H|E) = 48.75% / 54% = 90.3%
```
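
Both forms are mechanical enough to script. A minimal Python sketch of the same update (variable names are ours; the values are the ones above):

```python
# Bayesian update for a binary hypothesis, two equivalent forms.
prior = 0.65      # P(H)
p_e_h = 0.75      # P(E|H)
p_e_not_h = 0.15  # P(E|~H)

# Odds form: posterior odds = prior odds x likelihood ratio.
prior_odds = prior / (1 - prior)
lr = p_e_h / p_e_not_h
posterior_odds = prior_odds * lr
posterior_via_odds = posterior_odds / (1 + posterior_odds)

# Probability form: P(H|E) = P(E|H)P(H) / P(E).
p_e = p_e_h * prior + p_e_not_h * (1 - prior)
posterior_via_prob = (p_e_h * prior) / p_e

assert abs(posterior_via_odds - posterior_via_prob) < 1e-12
print(f"Posterior: {posterior_via_prob:.1%}")  # 90.3%
```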
### Posterior Probability

**P(H|E) = 90%**

### Change in Belief

- **Prior**: 65%
- **Posterior**: 90%
- **Change**: +25 percentage points
- **Interpretation**: Evidence strongly supports hypothesis. Beta test results meaningfully increased confidence that production adoption will exceed 20%.

---

## Sensitivity Analysis

**How sensitive is posterior to inputs?**

### If Prior was different:

| Prior | Posterior | Note |
|-------|-----------|------|
| 50% | 83% | Even starting at coin-flip, evidence pushes to high confidence |
| 75% | 94% | Higher prior → very high posterior |
| 40% | 77% | Lower prior → still high confidence |

**Finding**: Posterior is somewhat robust. Evidence is strong enough that even with priors ranging from 40-75%, posterior stays in 77-94% range.
### If P(E|H) was different:

| P(E\|H) | LR | Posterior | Note |
|---------|-----|-----------|------|
| 60% | 4.0 | 88% | Less diagnostic evidence → still high confidence |
| 85% | 5.67 | 91% | More diagnostic evidence → very high confidence |
| 50% | 3.33 | 86% | Weaker evidence → moderate-high confidence |

**Finding**: Posterior is moderately sensitive to P(E|H), but stays above 85% across the plausible range.

### If P(E|¬H) was different:

| P(E\|¬H) | LR | Posterior | Note |
|----------|-----|-----------|------|
| 25% | 3.0 | 85% | Less diagnostic → still high confidence |
| 10% | 7.5 | 93% | More diagnostic → very high confidence |
| 30% | 2.5 | 82% | Weak evidence → moderate confidence |

**Finding**: Posterior is sensitive to P(E|¬H). If a beta-to-production drop is common (higher P(E|¬H)), confidence decreases meaningfully.

**Robustness**: Conclusion is **moderately robust**. Across reasonable input ranges, posterior stays above 77%, supporting the launch decision. Most sensitive to the assumption about beta-to-production conversion rates.
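
These tables can be regenerated with a small sweep; a sketch, reusing the example's inputs:

```python
def posterior(prior, p_e_h, p_e_not_h):
    """P(H|E) for a binary hypothesis via the odds form."""
    odds = (prior / (1 - prior)) * (p_e_h / p_e_not_h)
    return odds / (1 + odds)

# Vary one input at a time, holding the others at their base values.
for prior in (0.40, 0.50, 0.65, 0.75):
    print(f"prior={prior:.0%} -> posterior={posterior(prior, 0.75, 0.15):.0%}")
for p_e_not_h in (0.10, 0.15, 0.25, 0.30):
    print(f"P(E|~H)={p_e_not_h:.0%} -> "
          f"posterior={posterior(0.65, 0.75, p_e_not_h):.0%}")
```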
---

## Calibration Check

**Am I overconfident?**

- **Did I anchor on initial belief?**
  - No - prior (65%) was based on base rates, not arbitrary
  - Evidence substantially moved belief (+25pp)
  - Not stuck at starting point

- **Did I ignore base rates?**
  - No - explicitly used historical feature adoption (60%) as starting point
  - Adjusted for known differences systematically

- **Is my posterior extreme (>90% or <10%)?**
  - Yes - 90% is borderline extreme
  - **Check**: Is evidence truly that strong?
    - LR = 5.0 is moderately strong (not very strong)
    - Prior was already high (65%)
    - Combination pushes to 90%
  - **Concern**: May be slightly overconfident
  - **Adjustment**: Consider reporting as an 85-90% range rather than a point estimate

- **Would an outside observer agree with my likelihoods?**
  - P(E|H) = 75%: Reasonable - beta users are engaged, expect higher than production
  - P(E|¬H) = 15%: Potentially optimistic - beta selection bias could be stronger
  - **Alternative**: If P(E|¬H) = 25%, posterior drops to 85% (more conservative)

**Red flags**:
- ✓ Posterior is not 100% or 0%
- ✓ Update magnitude (25pp) matches evidence strength (LR=5.0)
- ✓ Prior uses base rates
- ⚠ Posterior is at upper end (90%) - consider uncertainty range

**Calibration adjustment**: Report as an 85-90% confidence range to account for uncertainty in the likelihoods.
---

## Limitations & Assumptions

**Key assumptions**:

1. **Beta users are representative of broader user base**
   - Assumption: Beta users are 1.5-2x more engaged than average
   - Risk: If beta users are much more engaged (3x), production adoption could be lower
   - Impact: Could invalidate high posterior

2. **No major bugs or UX issues in production**
   - Assumption: Production experience will match beta experience
   - Risk: Unforeseen technical issues could crater adoption
   - Impact: Would make evidence misleading

3. **Competitive landscape stays stable**
   - Assumption: No major competitor moves in next 3 months
   - Risk: Competitor could launch superior version
   - Impact: Could reduce adoption below 20% despite strong beta

4. **Beta sample size is sufficient (n=200)**
   - Assumption: 200 users is enough to estimate adoption
   - Confidence interval: 35% ± 6.6% at 95% CI
   - Impact: True beta adoption could be 28-42%, adding uncertainty

**What could invalidate this analysis**:

- **Major product changes**: If we significantly alter the feature post-beta, beta results become less predictive
- **Different user segment**: If we launch to a different user segment than beta testers, adoption patterns may differ
- **Seasonal effects**: If beta ran during a high-engagement season and launch is during a low season
- **Discovery/onboarding issues**: If users don't discover the feature in production (beta users were explicitly invited)

**Uncertainty**:

- **Most uncertain about**: P(E|¬H) = 15% - How often do features with ≤20% production adoption show 35% beta adoption?
  - This is the key assumption
  - If this is actually 25-30%, posterior drops to 82-85%
  - Recommendation: Review historical beta-to-production conversion data

- **Could be wrong if**:
  - Beta users are much more engaged than typical users (>2x multiplier)
  - Novelty effect in beta wears off quickly in production
  - Production launch has poor discoverability/onboarding
---

## Decision Implications

**Given posterior of 90% (range: 85-90%)**:

**Recommended action**: **Proceed with launch** with monitoring plan

**Rationale**:
- 90% confidence exceeds decision threshold for feature launches
- Even conservative estimate (85%) supports launch
- Risk of failure (<20% adoption) is only 10-15%
- Cost of being wrong: Wasted 3 months of development effort
- Cost of not launching: Missing potential high-adoption feature

**If decision threshold is**:

- **High confidence needed (>80%)**: ✅ **LAUNCH** - Exceeds threshold, proceed with production rollout
- **Medium confidence (>60%)**: ✅ **LAUNCH** - Well above threshold, strong conviction
- **Low bar (>40%)**: ✅ **LAUNCH** - Far exceeds minimum threshold

**Monitoring plan** (to validate forecast):

1. **Week 1**: Check if adoption is on track for >6% (20% / 3 months, assuming linear growth)
   - If <4%: Red flag, investigate onboarding/discovery issues
   - If >8%: Exceeding expectations, validate data quality

2. **Month 1**: Check if adoption is trending toward >10%
   - If <7%: Update forecast downward, consider intervention
   - If >13%: Exceeding expectations, high confidence

3. **Month 3**: Measure final adoption
   - If <20%: Analyze what went wrong, calibrate future forecasts
   - If >20%: Validate forecast accuracy, update priors for future features

**Next evidence to gather**:

- **Historical beta-to-production conversion rates**: Review last 5-10 feature launches to calibrate P(E|¬H) more accurately
- **User segment analysis**: Compare beta user demographics to production user base
- **Competitive feature adoption**: Check competitors' sharing feature adoption rates
- **Early production data**: After 1 week of production, use actual adoption data for next Bayesian update

**What would change our mind**:

- **Week 1 adoption <3%**: Would update posterior down to ~60%, trigger investigation
- **Competitor launches superior feature**: Would need to recalculate with new competitive landscape
- **Discovery of major beta sampling bias**: If beta users are 5x more engaged, would significantly reduce confidence

---

## Meta: Forecast Quality Assessment

Using rubric from `rubric_bayesian_reasoning_calibration.json`:

**Self-assessment**:
- Prior Quality: 4/5 (good base rate usage, clear adjustments)
- Likelihood Justification: 4/5 (clear reasoning, could use more empirical data)
- Evidence Diagnosticity: 4/5 (LR=5.0 is moderately strong)
- Calculation Correctness: 5/5 (verified with both odds and probability forms)
- Calibration & Realism: 3/5 (posterior is 90%, borderline extreme, flagged for review)
- Assumption Transparency: 4/5 (key assumptions stated clearly)
- Base Rate Usage: 5/5 (explicit base rate from historical data)
- Sensitivity Analysis: 4/5 (comprehensive sensitivity checks)
- Interpretation Quality: 4/5 (clear decision implications with thresholds)
- Avoidance of Common Errors: 4/5 (no prosecutor's fallacy, proper base rates)

**Average: 4.1/5** - Meets "very good" threshold for medium-stakes decision

**Decision**: Forecast is sufficiently rigorous for feature launch decision (medium stakes). Primary area for improvement: gather more data on beta-to-production conversion to refine P(E|¬H) estimate.
437
skills/bayesian-reasoning-calibration/resources/methodology.md
Normal file
437
skills/bayesian-reasoning-calibration/resources/methodology.md
Normal file
@@ -0,0 +1,437 @@
# Bayesian Reasoning & Calibration Methodology

## Bayesian Reasoning Workflow

Copy this checklist and track your progress:

```
Bayesian Reasoning Progress:
- [ ] Step 1: Define hypotheses and assign priors
- [ ] Step 2: Assign likelihoods for each evidence-hypothesis pair
- [ ] Step 3: Compute posteriors and update sequentially
- [ ] Step 4: Check for dependent evidence and adjust
- [ ] Step 5: Validate calibration and check for bias
```

**Step 1: Define hypotheses and assign priors**

List all competing hypotheses (including a catch-all "Other") and assign prior probabilities that sum to 1.0. See [Multiple Hypothesis Updates](#multiple-hypothesis-updates) for structuring priors with justifications.

**Step 2: Assign likelihoods for each evidence-hypothesis pair**

For each hypothesis, define P(Evidence|Hypothesis) based on how likely the evidence would be if that hypothesis were true. See [Multiple Hypothesis Updates](#multiple-hypothesis-updates) for likelihood assessment techniques.

**Step 3: Compute posteriors and update sequentially**

Calculate posterior probabilities using Bayes' theorem, then chain updates for sequential evidence. See [Sequential Evidence Updates](#sequential-evidence-updates) for the multi-stage updating process.

**Step 4: Check for dependent evidence and adjust**

Identify whether evidence items are independent or correlated, and use conditional likelihoods if needed. See [Handling Dependent Evidence](#handling-dependent-evidence) for dependence detection and correction.

**Step 5: Validate calibration and check for bias**

Track forecasts over time and compare stated probabilities to actual outcomes using calibration curves and Brier scores. See [Calibration Techniques](#calibration-techniques) for debiasing methods and metrics.

---

## Multiple Hypothesis Updates

### Problem: Choosing Among Many Hypotheses

Often you have 3+ competing hypotheses and need to update all simultaneously.

**Example:**
- H1: Bug in payment processor code
- H2: Database connection timeout
- H3: Third-party API outage
- H4: DDoS attack

### Approach: Compute Posterior for Each Hypothesis

**Step 1: Assign prior probabilities** (must sum to 1)

| Hypothesis | Prior P(H) | Justification |
|------------|-----------|---------------|
| H1: Payment bug | 0.30 | Common issue, recent deploy |
| H2: DB timeout | 0.25 | Has happened before |
| H3: API outage | 0.20 | Dependency on external service |
| H4: DDoS | 0.10 | Rare but possible |
| H5: Other | 0.15 | Catch-all for unknowns |
| **Total** | **1.00** | Must sum to 1 |

**Step 2: Define likelihood for each hypothesis**

Evidence E: "500 errors only on payment endpoint"

| Hypothesis | P(E\|H) | Justification |
|------------|---------|---------------|
| H1: Payment bug | 0.80 | Bug would affect payment specifically |
| H2: DB timeout | 0.30 | Would affect multiple endpoints |
| H3: API outage | 0.70 | Payment uses external API |
| H4: DDoS | 0.50 | Could target any endpoint |
| H5: Other | 0.20 | Generic catch-all |

**Step 3: Compute P(E)** (marginal probability)

```
P(E) = Σ [P(E|Hi) × P(Hi)] for all hypotheses

P(E) = (0.80 × 0.30) + (0.30 × 0.25) + (0.70 × 0.20) + (0.50 × 0.10) + (0.20 × 0.15)
P(E) = 0.24 + 0.075 + 0.14 + 0.05 + 0.03
P(E) = 0.535
```

**Step 4: Compute posterior for each hypothesis**

```
P(Hi|E) = [P(E|Hi) × P(Hi)] / P(E)

P(H1|E) = (0.80 × 0.30) / 0.535 = 0.24 / 0.535 = 0.449 (44.9%)
P(H2|E) = (0.30 × 0.25) / 0.535 = 0.075 / 0.535 = 0.140 (14.0%)
P(H3|E) = (0.70 × 0.20) / 0.535 = 0.14 / 0.535 = 0.262 (26.2%)
P(H4|E) = (0.50 × 0.10) / 0.535 = 0.05 / 0.535 = 0.093 (9.3%)
P(H5|E) = (0.20 × 0.15) / 0.535 = 0.03 / 0.535 = 0.056 (5.6%)

Total: 100% (check: posteriors must sum to 1)
```

**Interpretation:**
- H1 (Payment bug): 30% → 44.9% (increased 15 pp)
- H3 (API outage): 20% → 26.2% (increased 6 pp)
- H2 (DB timeout): 25% → 14.0% (decreased 11 pp)
- H4 (DDoS): 10% → 9.3% (barely changed)

**Decision**: Investigate H1 (payment bug) first, then H3 (API outage) as backup.
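
The whole table collapses to one normalization step. A minimal Python sketch with the numbers above (hypothesis labels shortened for readability):

```python
priors = {"H1 payment bug": 0.30, "H2 db timeout": 0.25,
          "H3 api outage": 0.20, "H4 ddos": 0.10, "H5 other": 0.15}
likelihoods = {"H1 payment bug": 0.80, "H2 db timeout": 0.30,
               "H3 api outage": 0.70, "H4 ddos": 0.50, "H5 other": 0.20}

# P(E) is the prior-weighted sum of likelihoods; posteriors renormalize.
p_e = sum(likelihoods[h] * priors[h] for h in priors)
posteriors = {h: likelihoods[h] * priors[h] / p_e for h in priors}

assert abs(sum(posteriors.values()) - 1.0) < 1e-12  # sanity check
for h, p in sorted(posteriors.items(), key=lambda kv: -kv[1]):
    print(f"{h}: {p:.1%}")  # H1 44.9%, H3 26.2%, H2 14.0%, ...
```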
---

## Sequential Evidence Updates

### Problem: Multiple Pieces of Evidence Over Time

Evidence comes in stages, so you need to update belief sequentially.

**Example:**
- Evidence 1: "500 errors on payment endpoint" (t=0)
- Evidence 2: "External API status page shows outage" (t+5 min)
- Evidence 3: "Our other services using same API also failing" (t+10 min)

### Approach: Chain Updates (Prior → E1 → E2 → E3)

**Step 1: Update with E1** (as above)
```
Prior → P(H|E1)
P(H1|E1) = 44.9% (payment bug)
P(H3|E1) = 26.2% (API outage)
```

**Step 2: Use posterior as new prior, update with E2**

Evidence E2: "External API status page shows outage"

New prior (from E1 posterior; for simplicity we track only the two leading hypotheses and renormalize over them):
- P(H1) = 0.449 (payment bug)
- P(H3) = 0.262 (API outage)

Likelihoods given E2:
- P(E2|H1) = 0.20 (bug wouldn't cause external API status change)
- P(E2|H3) = 0.95 (API outage would definitely show on status page)

```
P(E2) = (0.20 × 0.449) + (0.95 × 0.262) = 0.0898 + 0.2489 = 0.3387

P(H1|E1,E2) = (0.20 × 0.449) / 0.3387 = 0.265 (26.5%)
P(H3|E1,E2) = (0.95 × 0.262) / 0.3387 = 0.735 (73.5%)
```

**Interpretation**: E2 strongly favors H3 (API outage). H1 dropped from 44.9% → 26.5%.

**Step 3: Update with E3**

Evidence E3: "Other services using same API also failing"

New prior:
- P(H1) = 0.265
- P(H3) = 0.735

Likelihoods:
- P(E3|H1) = 0.10 (payment bug wouldn't affect other services)
- P(E3|H3) = 0.99 (API outage would affect all services)

```
P(E3) = (0.10 × 0.265) + (0.99 × 0.735) = 0.0265 + 0.7277 = 0.7542

P(H1|E1,E2,E3) = (0.10 × 0.265) / 0.7542 = 0.035 (3.5%)
P(H3|E1,E2,E3) = (0.99 × 0.735) / 0.7542 = 0.965 (96.5%)
```

**Final conclusion**: 96.5% confidence it's an API outage, not a payment bug.

**Summary of belief evolution:**
```
           Prior     After E1    After E2    After E3
H1 (Bug):  30%    →  44.9%    →  26.5%    →  3.5%
H3 (API):  20%    →  26.2%    →  73.5%    →  96.5%
```
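
Chaining is just "posterior becomes the next prior" in a loop. A sketch that reproduces the two-hypothesis updates above:

```python
def update(beliefs, likelihoods):
    """One Bayesian update over competing hypotheses; output sums to 1."""
    p_e = sum(likelihoods[h] * beliefs[h] for h in beliefs)
    return {h: likelihoods[h] * beliefs[h] / p_e for h in beliefs}

# Posteriors after E1 for the two leading hypotheses (as in the text).
beliefs = {"bug": 0.449, "api": 0.262}
for name, lk in [("E2", {"bug": 0.20, "api": 0.95}),
                 ("E3", {"bug": 0.10, "api": 0.99})]:
    beliefs = update(beliefs, lk)
    print(name, {h: round(p, 3) for h, p in beliefs.items()})
# E2 {'bug': 0.265, 'api': 0.735}
# E3 {'bug': 0.035, 'api': 0.965}
```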
---

## Handling Dependent Evidence

### Problem: Evidence Items Not Independent

**Naive approach fails when**:
- E1 and E2 are correlated (not independent)
- You update twice with the same information

**Example of dependent evidence:**
- E1: "User reports payment failing"
- E2: "Another user reports payment failing"

If E1 and E2 are from the same incident, they're not independent evidence!

### Solution 1: Treat as Single Evidence

If evidence is dependent, combine it into one update:

**Instead of:**
- Update with E1: "User reports payment failing"
- Update with E2: "Another user reports payment failing"

**Do:**
- Single update with E: "Multiple users report payment failing"

Likelihood:
- P(E|Bug) = 0.90 (if bug exists, multiple users affected)
- P(E|No bug) = 0.05 (false reports rare)

### Solution 2: Conditional Likelihoods

If evidence is conditionally dependent (E2 depends on E1), use:

```
P(H|E1,E2) ∝ P(E2|E1,H) × P(E1|H) × P(H)
```

Not:
```
P(H|E1,E2) ≠ P(E2|H) × P(E1|H) × P(H)   ← Assumes independence
```

**Example:**
- E1: "Test fails on staging"
- E2: "Test fails on production" (same test, likely same cause)

Conditional:
- P(E2|E1, Bug) = 0.95 (if staging failed due to a bug, production likely fails too)
- P(E2|E1, Env) = 0.20 (if staging failed due to environment, production is a different environment)

### Red Flags for Dependent Evidence

Watch out for:
- Multiple reports of the same incident (count as one)
- Cascading failures (downstream failure caused by upstream)
- Repeated measurements of the same thing (not new info)
- Evidence from the same source (correlated errors)

**Principle**: Each update should provide **new information**. If E2 is predictable from E1, it's not independent.
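
To see how much double-counting distorts beliefs, here is a sketch using the report likelihoods above; the 10% prior is purely illustrative:

```python
def posterior(prior, p_e_h, p_e_not_h):
    odds = (prior / (1 - prior)) * (p_e_h / p_e_not_h)
    return odds / (1 + odds)

prior = 0.10  # illustrative prior that a real bug exists

# Wrong: update twice with the same report, as if independent.
p = posterior(prior, 0.90, 0.05)
p_wrong = posterior(p, 0.90, 0.05)

# Right: one update with the combined evidence "multiple users report".
p_right = posterior(prior, 0.90, 0.05)

print(f"double-counted: {p_wrong:.1%}, single update: {p_right:.1%}")
# double-counted: 97.3%, single update: 66.7%
```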
---

## Calibration Techniques

### Problem: Over/Underconfidence Bias

Common patterns:
- **Overconfidence**: Stating 90% when the true rate is 70%
- **Underconfidence**: Stating 60% when the true rate is 80%

### Calibration Check: Track Predictions Over Time

**Method**:
1. Make many probabilistic forecasts (P=70%, P=40%, etc.)
2. Track outcomes
3. Group by confidence level
4. Compare stated probability to actual frequency

**Example calibration check:**

| Your Confidence | # Predictions | # Correct | Actual % | Calibration |
|-----------------|---------------|-----------|----------|-------------|
| 90-100% | 20 | 16 | 80% | Overconfident |
| 70-89% | 30 | 24 | 80% | Good |
| 50-69% | 25 | 14 | 56% | Good |
| 30-49% | 15 | 5 | 33% | Good |
| 0-29% | 10 | 2 | 20% | Good |

**Interpretation**: Overconfident at high confidence levels (saying 90% but only 80% correct).
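
A calibration table like this is easy to compute from a log of (stated probability, outcome) pairs. A sketch with made-up sample forecasts and 20-point bins:

```python
from collections import defaultdict

# Hypothetical forecast log: (stated probability, outcome: 1 = happened).
forecasts = [(0.95, 1), (0.90, 0), (0.90, 1), (0.75, 1), (0.70, 0),
             (0.65, 1), (0.55, 0), (0.40, 0), (0.35, 1), (0.20, 0)]

bins = defaultdict(list)
for p, outcome in forecasts:
    lo = min(int(p * 100) // 20 * 20, 80)  # bins: 0-19, ..., 80-100
    bins[lo].append(outcome)

for lo in sorted(bins, reverse=True):
    outcomes = bins[lo]
    hi = 100 if lo == 80 else lo + 19
    print(f"{lo}-{hi}%: n={len(outcomes)}, "
          f"actual={sum(outcomes) / len(outcomes):.0%}")
```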
### Calibration Curve

Plot stated probability vs actual frequency:

```
Actual %
100 ┤                               ●
    │                          ●
 80 ┤                     ●   (overconfident)
    │                ●
 60 ┤           ●
    │      ●
 40 ┤ ●
    │
 20 ┤
    │
  0 └─────────────────────────────────
    0    20    40    60    80    100
           Stated probability %

Perfect calibration = diagonal line
Above line = overconfident
Below line = underconfident
```
### Debiasing Techniques

**For overconfidence:**
- Consider alternative explanations (how could I be wrong?)
- Base rate check (what's the typical success rate?)
- Pre-mortem: "It's 6 months from now and we failed. Why?"
- Confidence intervals: State a range, not a point estimate

**For underconfidence:**
- Review past successes (build evidence for confidence)
- Test predictions: Am I systematically too cautious?
- Cost of inaction: What's the cost of waiting for certainty?

### Brier Score (Calibration Metric)

**Formula**:
```
Brier Score = (1/n) × Σ (pi - oi)²

pi = stated probability for outcome i
oi = actual outcome (1 if happened, 0 if not)
```

**Example:**
```
Prediction 1: P=0.8, Outcome=1 → (0.8-1)² = 0.04
Prediction 2: P=0.6, Outcome=0 → (0.6-0)² = 0.36
Prediction 3: P=0.9, Outcome=1 → (0.9-1)² = 0.01

Brier Score = (0.04 + 0.36 + 0.01) / 3 = 0.137
```

**Interpretation:**
- 0.00 = perfect prediction
- 0.25 = random guessing
- Lower is better

**Typical scores:**
- Expert forecasters: 0.10-0.15
- Average people: 0.20-0.25
- Aim for: <0.15
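
As a one-liner in Python (the forecast list is the example above):

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

print(round(brier_score([(0.8, 1), (0.6, 0), (0.9, 1)]), 3))  # 0.137
```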
---

## Advanced Applications

### Application 1: Hierarchical Priors

When you're uncertain about the prior itself.

**Example:**
- Question: "Will project finish on time?"
- Uncertain about base rate: "Do similar projects finish on time 30% or 60% of the time?"

**Approach**: Model uncertainty in the prior

```
P(On time) = Weighted average of different base rates

Scenario 1: Base rate = 30% (if similar to past failures), Weight = 40%
Scenario 2: Base rate = 50% (if average project), Weight = 30%
Scenario 3: Base rate = 70% (if similar to past successes), Weight = 30%

Prior P(On time) = (0.30 × 0.40) + (0.50 × 0.30) + (0.70 × 0.30)
                 = 0.12 + 0.15 + 0.21
                 = 0.48 (48%)
```

Then update this 48% prior with evidence.
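
A sketch of the weighted-average prior (scenario rates and weights from above):

```python
# (base rate, weight) pairs for the three scenarios above.
scenarios = [(0.30, 0.40), (0.50, 0.30), (0.70, 0.30)]
assert abs(sum(w for _, w in scenarios) - 1.0) < 1e-12  # weights sum to 1
prior = sum(rate * w for rate, w in scenarios)
print(f"P(on time) prior = {prior:.0%}")  # 48%
```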
### Application 2: Bayesian Model Comparison

Compare which model/theory better explains the data.

**Example:**
- Model A: "Bug in feature X"
- Model B: "Infrastructure issue"

Evidence: 10 data points

```
P(Model A|Data) / P(Model B|Data) = [P(Data|Model A) × P(Model A)] / [P(Data|Model B) × P(Model B)]
```

**Bayes Factor** = P(Data|Model A) / P(Data|Model B)

- BF > 10: Strong evidence for Model A
- BF = 3-10: Moderate evidence for Model A
- BF = 1: Equal evidence
- BF < 0.33: Evidence against Model A

### Application 3: Forecasting with Bayesian Updates

Use for repeated forecasting (elections, product launches, project timelines).

**Process:**
1. Start with base rate prior
2. Update weekly as new evidence arrives
3. Track belief evolution over time
4. Compare final forecast to outcome (calibration check)

**Example: Product launch success**

```
Week -8: Prior = 60% (base rate for similar launches)
Week -6: Beta feedback positive → Update to 70%
Week -4: Competitor launches similar product → Update to 55%
Week -2: Pre-orders exceed target → Update to 75%
Week  0: Launch → Actual success: Yes ✓

Forecast evolution: 60% → 70% → 55% → 75% → (Outcome: Yes)
```

**Calibration**: 75% final forecast, outcome = Yes. Good calibration if 7-8 out of 10 forecasts at 75% are correct.

---

## Quality Checklist for Complex Cases

**Multiple Hypotheses**:
- [ ] All hypotheses listed (including catch-all "Other")
- [ ] Priors sum to 1.0
- [ ] Likelihoods defined for all hypothesis-evidence pairs
- [ ] Posteriors sum to 1.0 (math check)
- [ ] Interpretation provided (which hypothesis favored? by how much?)

**Sequential Updates**:
- [ ] Evidence items clearly independent, or conditional dependence noted
- [ ] Each update uses previous posterior as new prior
- [ ] Belief evolution tracked (how beliefs changed over time)
- [ ] Final conclusion integrates all evidence
- [ ] Timeline shows when each piece of evidence arrived

**Calibration**:
- [ ] Considered alternative explanations (not overconfident?)
- [ ] Checked against base rates (not ignoring priors?)
- [ ] Stated confidence interval or range (not just point estimate)
- [ ] Identified assumptions that could make forecast wrong
- [ ] Planned follow-up to track calibration (compare forecast to outcome)

**Minimum Standard for Complex Cases**:
- Multiple hypotheses: Score ≥ 4.0 on rubric
- High-stakes forecasts: Track calibration over 10+ predictions
308
skills/bayesian-reasoning-calibration/resources/template.md
Normal file
308
skills/bayesian-reasoning-calibration/resources/template.md
Normal file
@@ -0,0 +1,308 @@
# Bayesian Reasoning Template

## Workflow

Copy this checklist and track your progress:

```
Bayesian Update Progress:
- [ ] Step 1: State question and establish prior from base rates
- [ ] Step 2: Estimate likelihoods for evidence
- [ ] Step 3: Calculate posterior using Bayes' theorem
- [ ] Step 4: Perform sensitivity analysis
- [ ] Step 5: Calibrate and validate with quality checklist
```

**Step 1: State question and establish prior from base rates**

Define a specific, testable hypothesis with a timeframe and success criteria. Identify the reference class and base rate, adjust for specific differences, and state the prior explicitly with justification. See [Step 1: State the Question](#step-1-state-the-question) and [Step 2: Find Base Rates](#step-2-find-base-rates) for guidance.

**Step 2: Estimate likelihoods for evidence**

Assess P(E|H) (probability of the evidence if the hypothesis is TRUE) and P(E|¬H) (probability if FALSE), calculate the likelihood ratio = P(E|H) / P(E|¬H), and interpret diagnostic strength. See [Step 3: Estimate Likelihoods](#step-3-estimate-likelihoods) for examples and common mistakes.

**Step 3: Calculate posterior using Bayes' theorem**

Apply P(H|E) = [P(E|H) × P(H)] / P(E), or use the simpler odds form: Posterior Odds = Prior Odds × LR. Interpret the change in belief (prior → posterior) and the strength of evidence. See [Step 4: Calculate Posterior](#step-4-calculate-posterior) for calculation methods.

**Step 4: Perform sensitivity analysis**

Test how the posterior changes with different prior values and likelihoods to assess robustness of the conclusion. See the [Sensitivity Analysis](#sensitivity-analysis) section in the template structure.

**Step 5: Calibrate and validate with quality checklist**

Check for overconfidence, base rate neglect, and extreme posteriors. Use [Calibration Check](#calibration-check) and [Quality Checklist](#quality-checklist) to verify the prior is justified, likelihoods have reasoning, evidence is diagnostic (LR ≠ 1), the calculation is correct, and assumptions are stated.

## Quick Template

```markdown
# Bayesian Analysis: {Topic}

## Question
**Hypothesis**: {What are you testing?}
**Estimating**: P({specific outcome})
**Timeframe**: {When will outcome be known?}
**Matters because**: {What decision depends on this?}

---

## Prior Belief (Before Evidence)

### Base Rate
{What's the general frequency in similar cases?}
- Reference class: {Similar situations}
- Base rate: {X%}

### Adjustments
{How is this case different from base rate?}
- Factor 1: {Increases/decreases probability because...}
- Factor 2: {Increases/decreases probability because...}

### Prior Probability
**P(H) = {X%}**

**Justification**: {Why this prior?}

**Range if uncertain**: {min%} to {max%}

---

## Evidence

**What was observed**: {Specific evidence or data}

**How diagnostic**: {Does this distinguish hypothesis true vs false?}

### Likelihoods

**P(E|H) = {X%}** - Probability of seeing this evidence IF hypothesis is TRUE
- Reasoning: {Why this likelihood?}

**P(E|¬H) = {Y%}** - Probability of seeing this evidence IF hypothesis is FALSE
- Reasoning: {Why this likelihood?}

**Likelihood Ratio = {X/Y} = {ratio}**
- Interpretation: Evidence is {very strong / moderate / weak / not diagnostic}

---

## Bayesian Update

### Calculation

**Using probability form**:
```
P(H|E) = [P(E|H) × P(H)] / P(E)

where P(E) = [P(E|H) × P(H)] + [P(E|¬H) × P(¬H)]

P(E) = [{X%} × {Prior%}] + [{Y%} × {100-Prior%}]
P(E) = {calculation}

P(H|E) = [{X%} × {Prior%}] / {P(E)}
P(H|E) = {result%}
```

**Or using odds form** (often simpler):
```
Prior Odds = P(H) / P(¬H) = {Prior%} / {100-Prior%} = {odds}
Likelihood Ratio = {LR}
Posterior Odds = Prior Odds × LR = {odds} × {LR} = {result}
Posterior Probability = Posterior Odds / (1 + Posterior Odds) = {result%}
```

### Posterior Probability
**P(H|E) = {result%}**

### Change in Belief
- Prior: {X%}
- Posterior: {Y%}
- Change: {+/- Z percentage points}
- Interpretation: Evidence {strongly supports / moderately supports / weakly supports / contradicts} hypothesis

---

## Sensitivity Analysis

**How sensitive is posterior to inputs?**

If Prior was {different value}:
- Posterior would be: {recalculated value}

If P(E|H) was {different value}:
- Posterior would be: {recalculated value}

**Robustness**: Conclusion is {robust / somewhat robust / sensitive} to assumptions

---

## Calibration Check

**Am I overconfident?**
- Did I anchor on initial belief? {yes/no - reasoning}
- Did I ignore base rates? {yes/no - reasoning}
- Is my posterior extreme (>90% or <10%)? {If yes, is evidence truly that strong?}
- Would an outside observer agree with my likelihoods? {check}

**Red flags**:
- ✗ Posterior is 100% or 0% (almost never justified)
- ✗ Large update from weak evidence (check LR)
- ✗ Prior ignores base rate entirely
- ✗ Likelihoods are guesses without reasoning

---

## Limitations & Assumptions

**Key assumptions**:
1. {Assumption 1}
2. {Assumption 2}

**What could invalidate this analysis**:
- {Condition that would change conclusion}
- {Different interpretation of evidence}

**Uncertainty**:
- Most uncertain about: {which input?}
- Could be wrong if: {what scenario?}

---

## Decision Implications

**Given posterior of {X%}**:

Recommended action: {what to do}

**If decision threshold is**:
- High confidence needed (>80%): {action}
- Medium confidence (>60%): {action}
- Low bar (>40%): {action}

**Next evidence to gather**: {What would further update belief?}
```
## Step-by-Step Guide

### Step 1: State the Question

Be specific and testable.

**Good**: "Will our product achieve >1000 DAU within 6 months?"
**Bad**: "Will the product succeed?"

Define success criteria numerically when possible.

### Step 2: Find Base Rates

**Method**:
1. Identify reference class (similar situations)
2. Look up historical frequency
3. Adjust for known differences

**Example**:
- Question: Will our SaaS startup raise Series A?
- Reference class: B2B SaaS startups, seed stage, similar market
- Base rate: ~30% raise Series A within 2 years
- Adjustments: Strong traction (+), competitive market (-), experienced team (+)
- Adjusted prior: 45%

**Common mistake**: Ignoring base rates entirely ("inside view" bias)

### Step 3: Estimate Likelihoods

Ask: "If the hypothesis were true, how likely is this evidence?"
Then: "If the hypothesis were false, how likely is this evidence?"

**Example - Medical test**:
- Hypothesis: Patient has disease (prevalence 1%)
- Evidence: Positive test result
- P(positive test | has disease) = 90% (test sensitivity)
- P(positive test | no disease) = 5% (false positive rate)
- LR = 90% / 5% = 18 (strong evidence)

**Common mistake**: Confusing P(E|H) with P(H|E) - the "prosecutor's fallacy"

### Step 4: Calculate Posterior

**Odds form is often easier**:

1. Convert prior to odds: Odds = P / (1-P)
2. Multiply by LR: Posterior Odds = Prior Odds × LR
3. Convert back to probability: P = Odds / (1 + Odds)

**Example**:
- Prior: 30% → Odds = 0.3/0.7 = 0.43
- LR = 5
- Posterior Odds = 0.43 × 5 = 2.15
- Posterior Probability = 2.15 / 3.15 = 68%
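
A small helper makes this three-step recipe reusable; a sketch (the two calls reproduce the example above and the medical test from Step 3):

```python
def bayes_update(prior, likelihood_ratio):
    """Posterior probability from a prior and a likelihood ratio (odds form)."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

print(f"{bayes_update(0.30, 5):.0%}")   # 68%  (example above)
print(f"{bayes_update(0.01, 18):.0%}")  # 15%  (rare disease, positive test)
```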
### Step 5: Calibrate

**Calibration questions**:
- If you made 100 predictions at X% confidence, would X actually occur?
- Are you systematically over/underconfident?
- Does your posterior pass the "outside view" test?

**Calibration tips**:
- Track your forecasts and outcomes
- Be especially skeptical of extreme probabilities (>95%, <5%)
- Consider opposite evidence (confirmation bias check)

## Common Pitfalls

**Ignoring base rates** ("base rate neglect"):
- Bad: "Test is 90% accurate, so a positive test means a 90% chance of disease"
- Good: "Disease is rare (1%), so even with a positive test, probability is only ~15%"

**Confusing conditional probabilities**:
- P(positive test | disease) ≠ P(disease | positive test)
- These can be very different!

**Overconfident likelihoods**:
- Claiming P(E|H) = 99% when evidence is ambiguous
- Not considering alternative explanations

**Anchoring on prior**:
- Weak evidence + starting at 50% = staying near 50%
- Solution: Use base rates, not a 50% default

**Treating all evidence as equally strong**:
- Check the likelihood ratio (LR)
- LR ≈ 1 means evidence is not diagnostic

## Worked Example

**Question**: Will project finish on time?

**Prior**:
- Base rate: 60% of our projects finish on time
- This project: More complex than average (-), experienced team (+)
- Prior: 55%

**Evidence**: At the 50% milestone, we're 1 week behind schedule

**Likelihoods**:
- P(behind at 50% | finish on time) = 30% (can recover)
- P(behind at 50% | miss deadline) = 80% (usually signals trouble)
- LR = 30% / 80% = 0.375 (evidence against on-time)

**Calculation**:
- Prior odds = 0.55 / 0.45 = 1.22
- Posterior odds = 1.22 × 0.375 = 0.46
- Posterior probability = 0.46 / 1.46 = 32%

**Conclusion**: Updated from 55% to 32% probability of an on-time finish. Being behind at the 50% milestone is meaningful evidence of delay.

**Decision**: If the deadline is flexible, continue. If it's a hard deadline, consider descoping or adding resources.

## Quality Checklist

- [ ] Prior is justified (base rate + adjustments)
- [ ] Likelihoods have reasoning (not just guesses)
- [ ] Evidence is diagnostic (LR significantly different from 1)
- [ ] Calculation is correct
- [ ] Posterior is in reasonable range (not 0% or 100%)
- [ ] Assumptions are stated
- [ ] Sensitivity analysis performed
- [ ] Decision implications clear
166
skills/brainstorm-diverge-converge/SKILL.md
Normal file
166
skills/brainstorm-diverge-converge/SKILL.md
Normal file
@@ -0,0 +1,166 @@
---
name: brainstorm-diverge-converge
description: Use when you need to generate many creative options before systematically narrowing to the best choices. Invoke when exploring product ideas, solving open-ended problems, generating strategic alternatives, developing research questions, designing experiments, or when you need both breadth (many ideas) and rigor (principled selection). Use when user mentions brainstorming, ideation, divergent thinking, generating options, or evaluating alternatives.
---

# Brainstorm Diverge-Converge

## Table of Contents

- [Purpose](#purpose)
- [When to Use This Skill](#when-to-use-this-skill)
- [What is Brainstorm Diverge-Converge?](#what-is-brainstorm-diverge-converge)
- [Workflow](#workflow)
  - [1. Gather Requirements](#1--gather-requirements)
  - [2. Diverge (Generate Ideas)](#2--diverge-generate-ideas)
  - [3. Cluster (Group Themes)](#3--cluster-group-themes)
  - [4. Converge (Evaluate & Select)](#4--converge-evaluate--select)
  - [5. Document & Validate](#5--document--validate)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

Apply structured divergent-convergent thinking to generate many creative options, organize them into meaningful clusters, then systematically evaluate and narrow to the strongest choices. This balances creative exploration with disciplined decision-making.

## When to Use This Skill

- Generating product or feature ideas
- Exploring solution approaches for open-ended problems
- Developing research questions or hypotheses
- Creating marketing or content strategies
- Identifying strategic initiatives or opportunities
- Designing experiments or tests
- Naming products, features, or projects
- Developing interview questions or survey items
- Exploring design alternatives (UI, architecture, process)
- Prioritizing from a large possibility space
- Overcoming creative blocks
- When you need both quantity (many options) and quality (best options)

**Trigger phrases:** "brainstorm", "generate ideas", "explore options", "what are all the ways", "divergent thinking", "ideation", "evaluate alternatives", "narrow down choices"

## What is Brainstorm Diverge-Converge?

A three-phase creative problem-solving method:

- **Diverge (Expand)**: Generate many ideas without judgment or filtering. Focus on quantity and variety. Defer evaluation.

- **Cluster (Organize)**: Group similar ideas into themes or categories. Identify patterns and connections. Create structure from chaos.

- **Converge (Select)**: Evaluate ideas against criteria. Score, rank, or prioritize. Select strongest options for action.

**Quick Example:**

```markdown
# Problem: How to improve customer onboarding?

## Diverge (30 ideas)
- In-app video tutorials
- Interactive walkthroughs
- Email drip campaign
- Live webinar onboarding
- 1-on-1 concierge calls
- ... (25 more ideas)

## Cluster (6 themes)
1. **Self-serve content** (videos, docs, tooltips)
2. **Interactive guidance** (walkthroughs, checklists)
3. **Human touch** (calls, webinars, chat)
4. **Motivation** (gamification, progress tracking)
5. **Timing** (just-in-time help, preemptive)
6. **Social** (community, peer examples)

## Converge (Top 3)
1. Interactive walkthrough (high impact, medium effort) - 8.5/10
2. Email drip campaign (medium impact, low effort) - 8.0/10
3. Just-in-time tooltips (medium impact, low effort) - 7.5/10
```
## Workflow

Copy this checklist and track your progress:

```
Brainstorm Progress:
- [ ] Step 1: Gather requirements
- [ ] Step 2: Diverge (generate ideas)
- [ ] Step 3: Cluster (group themes)
- [ ] Step 4: Converge (evaluate and select)
- [ ] Step 5: Document and validate
```

**Step 1: Gather requirements**

Clarify the topic/problem (what are you brainstorming?), goal (what decision will this inform?), constraints (must-haves, no-gos, boundaries), evaluation criteria (what makes an idea "good" - impact, feasibility, cost, speed, risk, alignment), target quantity (suggest 20-50 ideas), and rounds (single session or multiple rounds; default: 1).

**Step 2: Diverge (generate ideas)**

Generate 20-50 ideas without judgment or filtering. Suspend criticism (all ideas are valid during divergence), aim for quantity and variety (different types, scales, approaches), and use creative prompts: "What if unlimited resources?", "What would a competitor do?", "Simplest approach?", "Most ambitious?", "Unconventional alternatives?". Output: numbered list of raw ideas. For simple topics → generate directly. For complex topics → use `resources/template.md` for structured prompts.

**Step 3: Cluster (group themes)**

Organize ideas into 4-8 distinct clusters by identifying patterns, creating categories (mechanism, user/audience, timeline, effort, risk, strategic objective), naming clusters clearly, and checking coverage (distinct approaches). Fewer than 4 = not enough variety; more than 8 = too fragmented. Output: ideas grouped under cluster labels.

**Step 4: Converge (evaluate and select)**

Define criteria (from step 1), score ideas on criteria (1-10 or Low/Med/High scale), rank by total/weighted score, select top 3-5 options, and document tradeoffs (why chosen, what deprioritized). Evaluation patterns: impact/effort matrix, weighted scoring, must-have filtering, pairwise comparison. See [Common Patterns](#common-patterns) for domain-specific approaches.

**Step 5: Document and validate**

Create `brainstorm-diverge-converge.md` with: problem statement, diverge (full list), cluster (organized themes), converge (scored/ranked/selected), and next steps. Validate using `resources/evaluators/rubric_brainstorm_diverge_converge.json`: verify 20+ ideas with variety, distinct clusters, explicit criteria, consistent scoring, top selections clearly better, actionable next steps. Minimum standard: score ≥ 3.5.

## Common Patterns

**For product/feature ideation:**
- Diverge: 30-50 feature ideas
- Cluster by: User need, use case, or feature type
- Converge: Impact vs. effort scoring
- Select: Top 3-5 for roadmap

**For problem-solving:**
- Diverge: 20-40 solution approaches
- Cluster by: Mechanism (how it solves the problem)
- Converge: Feasibility vs. effectiveness
- Select: Top 2-3 to prototype

**For research questions:**
- Diverge: 25-40 potential questions
- Cluster by: Research method or domain
- Converge: Novelty, tractability, impact
- Select: Top 3-5 to investigate

**For strategic planning:**
- Diverge: 20-30 strategic initiatives
- Cluster by: Time horizon or strategic pillar
- Converge: Strategic value vs. resource requirements
- Select: Top 5 for quarterly planning

## Guardrails

**Do:**
- Generate at least 20 ideas in the diverge phase (quantity matters)
- Suspend judgment during divergence (criticism kills creativity)
- Create distinct clusters (avoid overlap and confusion)
- Use explicit, relevant criteria for convergence (not vague "goodness")
- Score consistently across all ideas
- Document why top ideas were selected (transparency)
- Include "runner-up" ideas (for later consideration)

**Don't:**
- Filter ideas during divergence (defeats the purpose)
- Create clusters that are too similar or overlapping
- Use vague evaluation criteria ("better", "more appealing")
- Cherry-pick scores to favor pet ideas
- Select ideas without systematic evaluation
- Ignore constraints from requirements gathering
- Skip documentation of the full process

## Quick Reference

- **Template**: `resources/template.md` - Structured prompts and techniques for diverge-cluster-converge
- **Quality rubric**: `resources/evaluators/rubric_brainstorm_diverge_converge.json`
- **Output file**: `brainstorm-diverge-converge.md`
- **Typical idea count**: 20-50 ideas → 4-8 clusters → 3-5 selections
- **Common criteria**: Impact, Feasibility, Cost, Speed, Risk, Alignment
@@ -0,0 +1,135 @@
{
  "name": "Brainstorm Diverge-Converge Quality Rubric",
  "scale": {
    "min": 1,
    "max": 5,
    "description": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent"
  },
  "criteria": [
    {
      "name": "Divergence Quantity",
      "description": "Generated sufficient number of ideas to explore the possibility space",
      "scoring": {
        "1": "Fewer than 10 ideas - insufficient exploration",
        "2": "10-19 ideas - minimal exploration",
        "3": "20-29 ideas - adequate exploration",
        "4": "30-49 ideas - thorough exploration",
        "5": "50+ ideas - comprehensive exploration of possibilities"
      }
    },
    {
      "name": "Divergence Variety",
      "description": "Ideas show diversity in approach, scale, and type (not all similar)",
      "scoring": {
        "1": "All ideas are nearly identical or very similar",
        "2": "Mostly similar ideas with 1-2 different approaches",
        "3": "Mix of similar and different ideas, some variety present",
        "4": "Good variety across multiple dimensions (incremental/radical, short/long-term, etc.)",
        "5": "Exceptional variety - ideas span multiple approaches, scales, mechanisms, and perspectives"
      }
    },
    {
      "name": "Divergence Creativity",
      "description": "Includes both safe/obvious ideas and creative/unconventional ideas",
      "scoring": {
        "1": "Only obvious, conventional ideas",
        "2": "Mostly obvious ideas with 1-2 slightly creative ones",
        "3": "Mix of obvious and creative ideas, some boundary-pushing",
        "4": "Good balance of safe and creative ideas with several unconventional approaches",
        "5": "Exceptional creativity - includes wild ideas that challenge assumptions alongside practical ones"
      }
    },
    {
      "name": "Cluster Quality",
      "description": "Ideas are organized into meaningful, distinct, well-labeled themes",
      "scoring": {
        "1": "No clustering or random groupings with unclear logic",
        "2": "Poor clustering - significant overlap between clusters or vague labels",
        "3": "Decent clustering - mostly distinct groups with adequate labels",
        "4": "Good clustering - clear distinct themes with descriptive, specific labels",
        "5": "Exceptional clustering - perfectly distinct themes with insightful labels that reveal patterns"
      }
    },
    {
      "name": "Cluster Coverage",
      "description": "Clusters represent meaningfully different approaches (4-8 clusters typical)",
      "scoring": {
        "1": "1-2 clusters (insufficient structure) or 12+ clusters (over-fragmented)",
        "2": "3 clusters or 10-11 clusters (suboptimal structure)",
        "3": "4-8 clusters with some overlap between them",
        "4": "4-8 clusters that are distinct and well-balanced",
        "5": "4-8 clusters that are distinct, balanced, and reveal strategic dimensions of the problem"
      }
    },
    {
      "name": "Evaluation Criteria Clarity",
      "description": "Convergence criteria are explicit, relevant, and well-defined",
      "scoring": {
        "1": "No criteria stated or purely subjective ('better', 'best')",
        "2": "Vague criteria without clear definition",
        "3": "Criteria stated but could be more specific or relevant",
        "4": "Clear, specific, relevant criteria (e.g., impact, feasibility, cost)",
        "5": "Exceptional criteria - specific, relevant, weighted appropriately, with clear definitions"
      }
    },
    {
      "name": "Scoring Rigor",
      "description": "Ideas are scored systematically with justified ratings",
      "scoring": {
        "1": "No scoring or arbitrary rankings",
        "2": "Scoring present but inconsistent or unjustified",
        "3": "Basic scoring with some justification",
        "4": "Systematic scoring with clear justification for ratings",
        "5": "Exceptional scoring - consistent, justified, includes sensitivity analysis or confidence intervals"
      }
    },
    {
      "name": "Selection Quality",
      "description": "Top selections clearly outperform alternatives based on stated criteria",
      "scoring": {
        "1": "Selections don't match scores or criteria, appear arbitrary",
        "2": "Selections somewhat aligned with scores but weak justification",
        "3": "Selections aligned with scores, basic justification provided",
        "4": "Selections clearly justified based on scores and criteria, tradeoffs noted",
        "5": "Exceptional selections - fully justified with explicit tradeoff analysis and consideration of dependencies"
      }
    },
    {
      "name": "Actionability",
      "description": "Output includes clear next steps and decision-ready recommendations",
      "scoring": {
        "1": "No next steps or vague 'implement this' statements",
        "2": "Generic next steps without specifics",
        "3": "Basic next steps with some specific actions",
        "4": "Clear, specific next steps with timelines and owners",
        "5": "Exceptional actionability - detailed implementation plan with milestones, resources, and success metrics"
      }
    },
    {
      "name": "Process Integrity",
      "description": "Follows diverge-cluster-converge sequence without premature filtering",
      "scoring": {
        "1": "Process violated - filtered during divergence or skipped clustering",
        "2": "Some premature filtering or weak clustering step",
        "3": "Process mostly followed with minor shortcuts",
        "4": "Process followed correctly with clear phase separation",
        "5": "Exceptional process integrity - clear phase separation, no premature judgment, explicit constraints"
      }
    }
  ],
  "overall_assessment": {
    "thresholds": {
      "excellent": "Average score ≥ 4.5 (publication or high-stakes use)",
      "very_good": "Average score ≥ 4.0 (most strategic decisions should aim for this)",
      "good": "Average score ≥ 3.5 (minimum for important decisions)",
      "acceptable": "Average score ≥ 3.0 (workable for low-stakes brainstorms)",
      "needs_rework": "Average score < 3.0 (redo before using for decisions)"
    },
    "stakes_guidance": {
      "low_stakes": "Exploratory ideation, early brainstorming: aim for ≥ 3.0",
      "medium_stakes": "Feature prioritization, project selection: aim for ≥ 3.5",
      "high_stakes": "Strategic initiatives, resource allocation: aim for ≥ 4.0"
    }
  },
  "usage_instructions": "Rate each criterion on 1-5 scale. Calculate average. For important decisions, minimum score is 3.5. For high-stakes strategic choices, aim for ≥4.0. Check especially for divergence quantity (at least 20 ideas), cluster quality (distinct themes), and evaluation rigor (explicit criteria with justified scoring)."
}
@@ -0,0 +1,394 @@
# Brainstorm: API Performance Optimization Strategies

## Problem Statement

**What we're solving**: API response time has degraded from 200ms (p95) to 800ms (p95) over the past 3 months. Users are experiencing slow page loads and some are timing out.

**Decision to make**: Which optimization approaches should we prioritize for the next quarter to bring p95 response time back to <300ms?

**Context**:
- REST API serving 50k requests/day
- PostgreSQL database, 200GB data
- Node.js/Express backend
- Current p95: 800ms, p50: 350ms
- Team: 3 backend engineers, 1 devops
- Quarterly engineering budget: 4 engineer-months

**Constraints**:
- Cannot break existing API contracts (backwards compatible)
- Must maintain 99.9% uptime during changes
- No more than $2k/month additional infrastructure cost
- Must ship improvements within 3 months

---

## Diverge: Generate Ideas

**Target**: 40 ideas

**Prompt**: Generate as many ways as possible to improve API response time. Suspend judgment. All ideas are valid - from quick wins to major architectural changes.

### All Ideas

1. Add Redis caching layer for frequent queries
2. Database query optimization (add indexes)
3. Implement database connection pooling
4. Use GraphQL to reduce over-fetching
5. Add CDN for static assets
6. Implement HTTP/2 server push
7. Compress API responses with gzip
8. Paginate large result sets
9. Use database read replicas
10. Implement response caching headers (ETag, If-None-Match)
11. Migrate to serverless (AWS Lambda)
12. Add API gateway for request routing
13. Implement request batching
14. Use database query result caching
15. Optimize N+1 query problems
16. Implement lazy loading for related data
17. Switch to gRPC from REST
18. Add application-level caching (in-memory)
19. Optimize JSON serialization
20. Implement database partitioning
21. Use faster ORM or raw SQL
22. Add async processing for slow operations
23. Implement API rate limiting to prevent overload
24. Optimize Docker container size
25. Use database materialized views
26. Implement query result streaming
27. Add load balancer for horizontal scaling
28. Optimize database schema (denormalization)
29. Implement incremental/delta responses
30. Use WebSockets for real-time data
31. Migrate to NoSQL (MongoDB, DynamoDB)
32. Implement API response compression (Brotli)
33. Add edge caching (Cloudflare Workers)
34. Use database archival for old data
35. Implement request queuing/throttling
36. Optimize API middleware chain
37. Use faster JSON parser (simdjson)
38. Implement selective field loading
39. Add monitoring and alerting for slow queries
40. Database vacuum/analyze for query planner

**Total generated**: 40 ideas

---

## Cluster: Organize Themes

**Goal**: Group similar ideas into 4-8 distinct categories

### Cluster 1: Caching Strategies (9 ideas)
- Add Redis caching layer for frequent queries
- Implement response caching headers (ETag, If-None-Match)
- Use database query result caching
- Add application-level caching (in-memory)
- Add CDN for static assets
- Add edge caching (Cloudflare Workers)
- Implement response caching at API gateway
- Use database materialized views
- Cache computed/aggregated results

### Cluster 2: Database Query Optimization (11 ideas)
- Database query optimization (add indexes)
- Optimize N+1 query problems
- Use faster ORM or raw SQL
- Implement selective field loading
- Optimize database schema (denormalization)
- Add monitoring and alerting for slow queries
- Database vacuum/analyze for query planner
- Implement lazy loading for related data
- Use database query result caching (also in caching)
- Database archival for old data
- Database partitioning

### Cluster 3: Data Transfer Optimization (7 ideas)
- Compress API responses with gzip/Brotli
- Paginate large result sets
- Implement request batching
- Optimize JSON serialization
- Use faster JSON parser (simdjson)
- Implement incremental/delta responses
- Implement query result streaming

### Cluster 4: Infrastructure Scaling (7 ideas)
- Use database read replicas
- Add load balancer for horizontal scaling
- Implement database connection pooling
- Optimize Docker container size
- Migrate to serverless (AWS Lambda)
- Add API gateway for request routing
- Implement request queuing/throttling

### Cluster 5: Architectural Changes (4 ideas)
- Use GraphQL to reduce over-fetching
- Switch to gRPC from REST
- Use WebSockets for real-time data
- Migrate to NoSQL (MongoDB, DynamoDB)

### Cluster 6: Async & Offloading (2 ideas)
- Add async processing for slow operations
- Implement background job processing for heavy tasks

**Total clusters**: 6 themes

---

## Converge: Evaluate & Select

**Evaluation Criteria**:
1. **Impact on p95 latency** (weight: 3x) - How much will this reduce response time?
2. **Implementation effort** (weight: 2x) - Engineering time required (lower = better)
3. **Infrastructure cost** (weight: 1x) - Additional monthly cost (lower = better)

**Scoring scale**: 1-10 (higher = better)
|
||||
| Idea | Impact (3x) | Effort (2x) | Cost (1x) | Weighted Total |
|
||||
|------|------------|------------|-----------|----------------|
|
||||
| Add Redis caching | 9 | 7 | 7 | 9×3 + 7×2 + 7×1 = 48 |
|
||||
| Optimize N+1 queries | 8 | 8 | 10 | 8×3 + 8×2 + 10×1 = 50 |
|
||||
| Add database indexes | 7 | 9 | 10 | 7×3 + 9×2 + 10×1 = 49 |
|
||||
| Response compression (gzip) | 6 | 9 | 10 | 6×3 + 9×2 + 10×1 = 45 |
|
||||
| Database connection pooling | 6 | 8 | 10 | 6×3 + 8×2 + 10×1 = 44 |
|
||||
| Paginate large results | 7 | 7 | 10 | 7×3 + 7×2 + 10×1 = 45 |
|
||||
| DB read replicas | 8 | 5 | 4 | 8×3 + 5×2 + 4×1 = 38 |
|
||||
| Async processing | 6 | 6 | 8 | 6×3 + 6×2 + 8×1 = 38 |
|
||||
| GraphQL migration | 7 | 3 | 9 | 7×3 + 3×2 + 9×1 = 36 |
|
||||
| Serverless migration | 5 | 2 | 5 | 5×3 + 2×2 + 5×1 = 24 |
|
||||
|
||||
**Scoring notes**:
|
||||
- **Impact**: Based on estimated latency reduction (9-10 = >400ms, 7-8 = 200-400ms, 5-6 = 100-200ms)
|
||||
- **Effort**: Inverse scale (9-10 = <1 week, 7-8 = 1-2 weeks, 5-6 = 3-4 weeks, 3-4 = 1-2 months, 1-2 = 3+ months)
|
||||
- **Cost**: Inverse scale (10 = $0, 8-9 = <$200/mo, 6-7 = <$500/mo, 4-5 = <$1k/mo, 1-3 = >$1k/mo)
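
The weighted totals above are mechanical to compute. As a minimal TypeScript sketch (the criterion names, weights, and scores come from the table above; the code itself is illustrative, not part of any existing tooling):

```typescript
// Weighted scoring for the converge phase.
// Weights mirror the table: impact 3x, effort 2x, cost 1x.
type ScoredIdea = {
  name: string;
  impact: number; // 1-10, higher = larger p95 reduction
  effort: number; // 1-10, higher = less engineering time
  cost: number;   // 1-10, higher = cheaper
};

const WEIGHTS = { impact: 3, effort: 2, cost: 1 };

function weightedTotal(idea: ScoredIdea): number {
  return (
    idea.impact * WEIGHTS.impact +
    idea.effort * WEIGHTS.effort +
    idea.cost * WEIGHTS.cost
  );
}

const ideas: ScoredIdea[] = [
  { name: "Optimize N+1 queries", impact: 8, effort: 8, cost: 10 },
  { name: "Add database indexes", impact: 7, effort: 9, cost: 10 },
  { name: "Add Redis caching", impact: 9, effort: 7, cost: 7 },
];

// Rank descending by weighted total: N+1 (50), indexes (49), Redis (48).
const ranked = [...ideas].sort((a, b) => weightedTotal(b) - weightedTotal(a));
ranked.forEach((i) => console.log(i.name, weightedTotal(i)));
```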

---

### Top 3 Selections

**1. Fix N+1 Query Problems** (Score: 50)

**Why selected**: Highest overall score - high impact, reasonable effort, zero cost

**Rationale**:
- **Impact (8/10)**: N+1 queries are a common culprit for slow APIs. Profiling shows several endpoints making 50-100 queries per request. Fixing this could reduce p95 by 300-500ms.
- **Effort (8/10)**: Can identify with APM tools (DataDog), fix iteratively. Estimated 2-3 weeks for the main endpoints.
- **Cost (10/10)**: Zero additional infrastructure cost - purely code optimization.

**Next steps**:
- Week 1: Profile top 10 slowest endpoints with APM to identify N+1 patterns
- Week 2-3: Implement eager loading/joins for identified queries (a sketch of the pattern follows the measurement list below)
- Week 4: Deploy with feature flags, measure impact
- **Expected improvement**: Reduce p95 from 800ms to 500-600ms

**Measurement**:
- Track p95/p99 latency per endpoint before/after
- Monitor database query counts (should decrease significantly)
- Verify no increase in memory usage from eager loading
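
For illustration, here is the typical N+1 shape in a Node.js service and how a join collapses it. This is a hedged sketch: the `orders`/`users` tables and the node-postgres usage are hypothetical stand-ins, not taken from this codebase:

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from PG* env vars

// N+1 pattern: one query for orders, then one query per order for its user.
// 100 orders => 101 round trips to the database.
async function ordersWithUsersNPlusOne() {
  const { rows: orders } = await pool.query("SELECT * FROM orders LIMIT 100");
  for (const order of orders) {
    const { rows } = await pool.query(
      "SELECT * FROM users WHERE id = $1",
      [order.user_id],
    );
    order.user = rows[0];
  }
  return orders;
}

// Fix: fetch the same data in a single round trip with a JOIN.
async function ordersWithUsersJoined() {
  const { rows } = await pool.query(`
    SELECT o.*, u.name AS user_name, u.email AS user_email
    FROM orders o
    JOIN users u ON u.id = o.user_id
    LIMIT 100
  `);
  return rows;
}
```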

---

**2. Add Database Indexes** (Score: 49)

**Why selected**: Second highest score - very low effort for solid impact

**Rationale**:
- **Impact (7/10)**: Database query analysis shows several full table scans. Adding indexes could reduce individual query time by 50-80%.
- **Effort (9/10)**: Quick wins - can identify missing indexes via EXPLAIN ANALYZE, add indexes with minimal risk. Estimated 1 week.
- **Cost (10/10)**: Marginal storage cost for indexes (~5-10GB), no new infrastructure.

**Next steps**:
- Day 1-2: Run EXPLAIN ANALYZE on slow queries (from the slow query log)
- Day 3-4: Create indexes on foreign keys, WHERE clause columns, JOIN columns (see the sketch after the considerations list)
- Day 5: Deploy indexes during a low-traffic window, monitor impact
- **Expected improvement**: Reduce p95 by 100-200ms for index-heavy endpoints

**Measurement**:
- Compare query execution plans before/after (table scan → index scan)
- Track index usage with pg_stat_user_indexes
- Monitor index size growth

**Considerations**:
- Some writes may slow down slightly (index maintenance)
- Test on staging first to verify no lock contention
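
A hedged sketch of this workflow (the table and column names are illustrative only): inspect the plan, then add the index concurrently so writes are not blocked while it builds:

```typescript
import { Pool } from "pg";

const pool = new Pool();

async function checkPlanAndAddIndex() {
  // 1. Inspect the plan for a known-slow query. A "Seq Scan" on a large
  //    table in the output is the signal that an index may be missing.
  const plan = await pool.query(
    "EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42",
  );
  console.log(plan.rows.map((r) => r["QUERY PLAN"]).join("\n"));

  // 2. CREATE INDEX CONCURRENTLY avoids a write-blocking lock, at the
  //    cost of a slower build. It cannot run inside a transaction block,
  //    which is fine here since pool.query autocommits single statements.
  await pool.query(
    "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_user_id ON orders (user_id)",
  );
}
```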

---

**3. Implement Redis Caching** (Score: 48)

**Why selected**: Highest impact potential, moderate effort and cost

**Rationale**:
- **Impact (9/10)**: Caching frequently-accessed data (user profiles, config, lookup tables) could eliminate 60-70% of database queries. Massive impact for cacheable endpoints.
- **Effort (7/10)**: Moderate effort - set up Redis, implement the caching layer, handle cache invalidation. Estimated 2-3 weeks.
- **Cost (7/10)**: Redis managed service ~$200-400/month (ElastiCache cache.t3.medium)

**Next steps**:
- Week 1: Analyze request patterns - identify the most-frequent queries for caching
- Week 2: Set up Redis (ElastiCache), implement the cache-aside pattern for the top 3 endpoints (a sketch follows the considerations list)
- Week 3: Implement cache invalidation strategy (TTL + event-based)
- Week 4: Roll out with monitoring
- **Expected improvement**: Reduce p95 from 800ms to 300-400ms for cached endpoints (cache hit rate target: >80%)

**Measurement**:
- Track cache hit rate (target >80%)
- Monitor Redis memory usage and eviction rate
- Compare endpoint latency with/without cache
- Track database query reduction

**Considerations**:
- Cache invalidation complexity (implement carefully to avoid stale data)
- Redis failover strategy (what happens if Redis is down?)
- Cold start performance (first request still slow)
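
A minimal cache-aside sketch, assuming `ioredis` and node-postgres; the key format, TTL, and schema are illustrative assumptions:

```typescript
import Redis from "ioredis";
import { Pool } from "pg";

const redis = new Redis(); // defaults to localhost:6379
const pool = new Pool();

const TTL_SECONDS = 300; // start with a short TTL; tighten invalidation later

// Cache-aside: try Redis first, fall back to Postgres, then populate the cache.
async function getUserProfile(userId: number): Promise<unknown> {
  const key = `user:profile:${userId}`;

  const cached = await redis.get(key);
  if (cached !== null) return JSON.parse(cached); // cache hit

  const { rows } = await pool.query(
    "SELECT id, name, email FROM users WHERE id = $1",
    [userId],
  );
  const profile = rows[0] ?? null;

  // "EX" sets expiry in seconds, so stale entries age out even if
  // event-based invalidation misses an update.
  await redis.set(key, JSON.stringify(profile), "EX", TTL_SECONDS);
  return profile;
}

// Event-based invalidation: delete the key whenever the row changes.
async function updateUserName(userId: number, name: string): Promise<void> {
  await pool.query("UPDATE users SET name = $1 WHERE id = $2", [name, userId]);
  await redis.del(`user:profile:${userId}`);
}
```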

---

### Runner-Ups (For Future Consideration)

**Response Compression (gzip)** (Score: 46)
- Very quick win (1-2 days to implement; see the sketch below)
- Modest impact for large payloads (~20-30% response size reduction → ~100ms latency improvement)
- **Recommendation**: Implement in parallel with top 3 (low effort, no downside)
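
In Express this is typically a one-liner via the `compression` middleware (gzip by default; Brotli support depends on the setup). A sketch, with an illustrative route:

```typescript
import express from "express";
import compression from "compression";

const app = express();

// Compress responses above 1 KB; tiny payloads are not worth the CPU.
app.use(compression({ threshold: 1024 }));

app.get("/orders", async (_req, res) => {
  // Compressed automatically when the client sends Accept-Encoding: gzip.
  res.json({ orders: [] });
});

app.listen(3000);
```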

**Database Connection Pooling** (Score: 44)
- Quick to implement if not already in place (config sketch below)
- Reduces connection overhead
- **Recommendation**: Verify current pooling configuration first - may already be optimized
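
With node-postgres, verifying or adjusting the pool is a small config change. The numbers below are illustrative starting points, not tuned values:

```typescript
import { Pool } from "pg";

// Reuse one pool per process; creating a client per request defeats pooling.
const pool = new Pool({
  max: 20,                        // upper bound on concurrent connections
  idleTimeoutMillis: 30_000,      // release idle connections after 30s
  connectionTimeoutMillis: 2_000, // fail fast if the pool is exhausted
});

// Quick saturation check: how close are we to the limit?
setInterval(() => {
  console.log(
    `total=${pool.totalCount} idle=${pool.idleCount} waiting=${pool.waitingCount}`,
  );
}, 60_000);
```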

**Pagination** (Score: 45)
- Essential for endpoints returning large result sets
- Quick to implement (2-3 days; keyset sketch below)
- **Recommendation**: Implement in parallel - protect against future growth
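
A hedged sketch of keyset (cursor) pagination for an Express endpoint; the route and schema are hypothetical. Keyset pagination stays fast on large tables, unlike OFFSET, whose cost grows with the offset:

```typescript
import express from "express";
import { Pool } from "pg";

const app = express();
const pool = new Pool();

// GET /orders?after=12345&limit=50
// "WHERE id > cursor" uses the primary-key index directly.
app.get("/orders", async (req, res) => {
  const after = Number(req.query.after ?? 0);
  const limit = Math.min(Number(req.query.limit ?? 50), 100); // cap page size

  const { rows } = await pool.query(
    "SELECT id, status, created_at FROM orders WHERE id > $1 ORDER BY id LIMIT $2",
    [after, limit],
  );

  res.json({
    items: rows,
    // Clients pass nextCursor back as ?after= to fetch the next page.
    nextCursor: rows.length === limit ? rows[rows.length - 1].id : null,
  });
});
```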

**Database Read Replicas** (Score: 38)
- Good for read-heavy workload scaling
- Higher cost (~$500-800/month)
- **Recommendation**: Defer to Q2 after quick wins are exhausted - consider if traffic grows 2-3x

---

## Next Steps

### Immediate Actions (Week 1-2)

**Priority 1: N+1 Query Optimization**
- [ ] Enable APM detailed query tracing
- [ ] Profile top 10 slowest endpoints
- [ ] Create backlog of N+1 fixes prioritized by impact
- [ ] Assign to Engineer A

**Priority 2: Database Index Analysis**
- [ ] Export slow query log (queries >500ms)
- [ ] Run EXPLAIN ANALYZE on top 20 slow queries
- [ ] Identify missing indexes
- [ ] Assign to Engineer B

**Priority 3: Redis Caching Planning**
- [ ] Analyze request patterns to identify cacheable data
- [ ] Design cache key strategy
- [ ] Document cache invalidation approach
- [ ] Get budget approval for Redis ($300/month)
- [ ] Assign to Engineer C

**Quick Win (parallel)**:
- [ ] Implement gzip compression (Engineer A, 4 hours)
- [ ] Verify connection pooling config (Engineer B, 2 hours)
- [ ] Add pagination to `/users` and `/orders` endpoints (Engineer C, 1 day)

---

### Timeline

**Week 1-2**: Analysis + quick wins
- N+1 profiling complete
- Index analysis complete
- Redis architecture designed
- Gzip compression live
- Pagination live for 2 endpoints

**Week 3-4**: N+1 fixes + Indexes
- Top 5 N+1 queries fixed and deployed
- 10-15 database indexes added
- **Target**: p95 drops to 600ms

**Week 5-7**: Redis caching
- Redis infrastructure provisioned
- Top 3 endpoints cached
- Cache invalidation tested
- **Target**: p95 drops to 350ms for cached endpoints

**Week 8-9**: Measure, iterate, polish
- Monitor metrics
- Fix any regressions
- Extend caching to 5 more endpoints
- **Target**: Overall p95 <300ms

**Week 10-12**: Buffer for unknowns
- Address unexpected issues
- Optimize further if needed
- Document learnings

---

### Success Criteria

**Primary**:
- [ ] p95 latency <300ms (currently 800ms)
- [ ] p99 latency <600ms (currently 1.5s)
- [ ] No increase in error rate
- [ ] 99.9% uptime maintained

**Secondary**:
- [ ] Database query count reduced by >40%
- [ ] Cache hit rate >80% for cached endpoints
- [ ] Additional infrastructure cost <$500/month

**Monitoring**:
- Daily p95/p99 latency dashboard
- Weekly review of the slow query log
- Redis cache hit rate tracking
- Database connection pool utilization

---

### Risks & Mitigation

**Risk 1: N+1 fixes increase memory usage**
- **Mitigation**: Profile memory before/after, implement pagination if needed
- **Rollback**: Revert to lazy loading if memory spikes >20%

**Risk 2: Cache invalidation bugs cause stale data**
- **Mitigation**: Start with short TTL (5 min), add event-based invalidation gradually
- **Rollback**: Disable caching for affected endpoints immediately

**Risk 3: Index additions cause write performance degradation**
- **Mitigation**: Test on staging with production-like load, monitor write latency
- **Rollback**: Drop problematic indexes

**Risk 4: Timeline slips due to complexity**
- **Mitigation**: Front-load quick wins (gzip, indexes) to show early progress
- **Contingency**: Descope Redis to Q2 if needed, focus on N+1 and indexes

---

## Rubric Self-Assessment

Using `rubric_brainstorm_diverge_converge.json`:

**Scores**:
1. Divergence Quantity: 5/5 (40 ideas - comprehensive exploration)
2. Divergence Variety: 4/5 (good variety from quick fixes to major architecture changes)
3. Divergence Creativity: 4/5 (includes both practical and ambitious ideas)
4. Cluster Quality: 5/5 (6 distinct, well-labeled themes)
5. Cluster Coverage: 5/5 (6 clusters covering infrastructure, data, architecture)
6. Evaluation Criteria Clarity: 5/5 (impact, effort, cost - specific and weighted)
7. Scoring Rigor: 4/5 (systematic scoring with justification)
8. Selection Quality: 5/5 (clear top 3 with tradeoff analysis)
9. Actionability: 5/5 (detailed timeline, owners, success criteria)
10. Process Integrity: 5/5 (clear phase separation, no premature filtering)

**Average**: 4.7/5 - Excellent (high-stakes technical decision quality)

**Assessment**: This brainstorm is ready for use in prioritizing engineering work. Strong divergence phase with 40 varied ideas, clear clustering by mechanism, and rigorous convergence with weighted scoring. Actionable plan with timeline and risk mitigation.
519
skills/brainstorm-diverge-converge/resources/template.md
Normal file
@@ -0,0 +1,519 @@
# Brainstorm Diverge-Converge Template

## Workflow

Copy this checklist and track your progress:

```
Brainstorm Progress:
- [ ] Step 1: Define problem and criteria using template structure
- [ ] Step 2: Diverge with creative prompts and techniques
- [ ] Step 3: Cluster using bottom-up or top-down methods
- [ ] Step 4: Converge with systematic scoring
- [ ] Step 5: Document selections and next steps
```

**Step 1: Define problem and criteria using template structure**

Fill in the problem statement, decision context, constraints, and evaluation criteria (3-5 criteria that matter for your context). Use the [Quick Template](#quick-template) to structure it. See [Detailed Guidance](#detailed-guidance) for criteria selection.

**Step 2: Diverge with creative prompts and techniques**

Generate 20-50 ideas using SCAMPER prompts, perspective shifting, constraint removal, and analogies. Suspend judgment and aim for quantity and variety. See [Phase 1: Diverge](#phase-1-diverge-generate-ideas) for stimulation techniques and quality checks.

**Step 3: Cluster using bottom-up or top-down methods**

Group similar ideas into 4-8 distinct clusters. Use bottom-up clustering (identify natural groupings) or top-down (predefined categories). Name clusters clearly and specifically. See [Phase 2: Cluster](#phase-2-cluster-organize-themes) for methods and quality checks.

**Step 4: Converge with systematic scoring**

Score ideas on defined criteria (1-10 scale or Low/Med/High), rank by total/weighted score, and select the top 3-5. Document tradeoffs and runner-ups. See [Phase 3: Converge](#phase-3-converge-evaluate--select) for scoring approaches and selection guidelines.

**Step 5: Document selections and next steps**

Fill in top selections with rationale, next steps, and timeline. Include runner-ups for future consideration and a measurement plan. See [Worked Example](#worked-example) for a complete example.

## Quick Template

```markdown
# Brainstorm: {Topic}

## Problem Statement

**What we're solving**: {Clear description of problem or opportunity}

**Decision to make**: {What will we do with the output?}

**Constraints**: {Must-haves, no-gos, boundaries}

---

## Diverge: Generate Ideas

**Target**: {20-50 ideas}

**Prompt**: Generate as many ideas as possible for {topic}. Suspend judgment. All ideas are valid.

### All Ideas

1. {Idea 1}
2. {Idea 2}
3. {Idea 3}
... (continue to target number)

**Total generated**: {N} ideas

---

## Cluster: Organize Themes

**Goal**: Group similar ideas into 4-8 distinct categories

### Cluster 1: {Theme Name}
- {Idea A}
- {Idea B}
- {Idea C}

### Cluster 2: {Theme Name}
- {Idea D}
- {Idea E}

... (continue for all clusters)

**Total clusters**: {N} themes

---

## Converge: Evaluate & Select

**Evaluation Criteria**:
1. {Criterion 1} (weight: {X}x)
2. {Criterion 2} (weight: {X}x)
3. {Criterion 3} (weight: {X}x)

### Scored Ideas

| Idea | {Criterion 1} | {Criterion 2} | {Criterion 3} | Total |
|------|--------------|--------------|--------------|-------|
| {Top idea 1} | {score} | {score} | {score} | {total} |
| {Top idea 2} | {score} | {score} | {score} | {total} |
| {Top idea 3} | {score} | {score} | {score} | {total} |

### Top Selections

**1. {Idea Name}** (Score: {X}/10)
- Why selected: {Rationale}
- Next steps: {Immediate actions}

**2. {Idea Name}** (Score: {X}/10)
- Why selected: {Rationale}
- Next steps: {Immediate actions}

**3. {Idea Name}** (Score: {X}/10)
- Why selected: {Rationale}
- Next steps: {Immediate actions}

### Runner-Ups (For Future Consideration)
- {Idea with potential but not top priority}
- {Another promising idea}

---

## Next Steps

**Immediate**:
- {Action 1 based on top selection}
- {Action 2}

**Short-term** (next 2-4 weeks):
- {Action for second priority}

**Parking lot** (revisit later):
- {Ideas to reconsider in different context}
```

---

## Detailed Guidance

### Phase 1: Diverge (Generate Ideas)

**Goal**: Generate maximum quantity and variety of ideas

**Techniques to stimulate ideas**:

1. **Classic brainstorming**: Free-flow idea generation

2. **SCAMPER prompts**:
   - Substitute: What could we replace?
   - Combine: What could we merge?
   - Adapt: What could we adjust?
   - Modify: What could we change?
   - Put to other uses: What else could this do?
   - Eliminate: What could we remove?
   - Reverse: What if we did the opposite?

3. **Perspective shifting**:
   - "What would {competitor/expert/user type} do?"
   - "What if we had 10x the budget?"
   - "What if we had 1/10th the budget?"
   - "What if we had to launch tomorrow?"
   - "What's the most unconventional approach?"

4. **Constraint removal**:
   - "What if technical limitations didn't exist?"
   - "What if we didn't care about cost?"
   - "What if we ignored industry norms?"

5. **Analogies**:
   - "How do other industries solve similar problems?"
   - "What can we learn from nature?"
   - "What historical precedents exist?"

**Divergence quality checks**:
- [ ] Generated at least 20 ideas (minimum)
- [ ] Ideas vary in type/approach (not all incremental or all radical)
- [ ] Included "wild" ideas (push boundaries)
- [ ] Included "safe" ideas (low risk)
- [ ] Covered different scales (quick wins and long-term bets)
- [ ] No premature filtering (saved criticism for converge phase)

**Common divergence mistakes**:
- Stopping too early (quantity breeds quality)
- Self-censoring "bad" ideas (they often spark good ones)
- Focusing only on obvious solutions
- Letting one person/perspective dominate
- Jumping to evaluation too quickly

---

### Phase 2: Cluster (Organize Themes)

**Goal**: Create meaningful structure from raw ideas

**Clustering methods**:

1. **Bottom-up clustering** (recommended for most cases):
   - Read through all ideas
   - Identify natural groupings (2-3 similar ideas)
   - Label each group
   - Assign remaining ideas to groups
   - Refine group labels for clarity

2. **Top-down clustering**:
   - Define categories upfront (e.g., short-term/long-term, user types, etc.)
   - Assign ideas to predefined categories
   - Adjust categories if many ideas don't fit

3. **Affinity mapping** (for large idea sets):
   - Group ideas that "feel similar"
   - Name groups after grouping (not before)
   - Create sub-clusters if main clusters are too large

**Cluster naming guidelines**:
- Use descriptive, specific labels (not generic)
- Good: "Automated self-service tools", Bad: "Automation"
- Good: "Human high-touch onboarding", Bad: "Customer service"
- Include mechanism or approach in name when possible

**Cluster quality checks**:
- [ ] 4-8 clusters (sweet spot for most topics)
- [ ] Clusters are distinct (minimal overlap)
- [ ] Clusters are balanced (not 1 idea in one cluster, 20 in another)
- [ ] Cluster names are clear and specific
- [ ] All ideas assigned to a cluster
- [ ] Clusters represent meaningfully different approaches

**Handling edge cases**:
- **Outliers**: Create an "Other/Misc" cluster for ideas that don't fit, or leave them unclustered if very few
- **Ideas that fit multiple clusters**: Assign to the best-fit cluster, note cross-cluster themes
- **Too many clusters** (>10): Merge similar clusters or create super-clusters
- **Too few clusters** (<4): Consider whether ideas truly vary, or subdivide large clusters

---

### Phase 3: Converge (Evaluate & Select)

**Goal**: Systematically identify the strongest ideas

**Step 1: Define Evaluation Criteria**

Choose 3-5 criteria that matter for your context:

**Common criteria**:

| Criterion | Description | When to use |
|-----------|-------------|-------------|
| **Impact** | How much value does this create? | Almost always |
| **Feasibility** | How easy is this to implement? | When resources are constrained |
| **Cost** | What's the financial investment? | When budget is limited |
| **Speed** | How quickly can we do this? | When time is critical |
| **Risk** | What could go wrong? | For high-stakes decisions |
| **Alignment** | Does this fit our strategy? | For strategic decisions |
| **Novelty** | How unique/innovative is this? | For competitive differentiation |
| **Reversibility** | Can we undo this if wrong? | For experimental approaches |
| **Learning value** | What will we learn? | For research/exploration |
| **User value** | How much do users benefit? | Product/feature decisions |

**Weighting criteria** (optional):
- Assign importance weights (e.g., 3x for impact, 2x for feasibility, 1x for speed)
- Multiply scores by weights before summing
- Use when some criteria matter much more than others

**Step 2: Score Ideas**

**Scoring approaches**:

1. **Simple 1-10 scale** (recommended for most cases):
   - 1-3: Low (weak on this criterion)
   - 4-6: Medium (moderate on this criterion)
   - 7-9: High (strong on this criterion)
   - 10: Exceptional (best possible)

2. **Low/Medium/High**:
   - Faster but less precise
   - Convert to numbers for ranking (Low=2, Med=5, High=8)

3. **Pairwise comparison** (a small sketch follows this list):
   - Compare each idea to every other idea
   - Count "wins" for each idea
- Slower but more thorough (good for critical decisions)
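
A minimal sketch of pairwise win-counting; here a pre-assigned numeric score stands in for the human judgment made on each pair, so the code is illustrative rather than a real comparator:

```typescript
// Pairwise comparison: every idea is compared against every other idea,
// and each "win" adds one point. Ranking by wins is robust to scale drift.
type Idea = { name: string; score: number }; // score stands in for a judged pair

function pairwiseRank(ideas: Idea[]): Map<string, number> {
  const wins = new Map<string, number>(ideas.map((i) => [i.name, 0]));
  for (let a = 0; a < ideas.length; a++) {
    for (let b = a + 1; b < ideas.length; b++) {
      const winner = ideas[a].score >= ideas[b].score ? ideas[a] : ideas[b];
      wins.set(winner.name, (wins.get(winner.name) ?? 0) + 1);
    }
  }
  // n*(n-1)/2 comparisons in total - which is why this method is slower.
  return wins;
}
```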

**Scoring tips**:
- Score all ideas on Criterion 1, then all on Criterion 2, etc. (maintains consistency)
- Use reference points ("This idea is more impactful than X but less than Y")
- Document reasoning for extreme scores (1-2 or 9-10)
- Consider both upside (best case) and downside (worst case)

**Step 3: Rank and Select**

**Ranking methods**:

1. **Total score ranking**:
   - Sum scores across all criteria
   - Sort by total score (highest to lowest)
   - Select top 3-5

2. **Must-have filtering + scoring**:
   - First, eliminate ideas that violate must-have constraints
   - Then score remaining ideas
   - Select top scorers

3. **Two-dimensional prioritization** (see the sketch after this list):
   - Plot ideas on a 2x2 matrix (e.g., Impact vs. Feasibility)
   - Prioritize the high-impact, high-feasibility quadrant
   - Common matrices:
     - Impact / Effort (classic prioritization)
     - Risk / Reward (for innovation)
- Cost / Value (for ROI focus)
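
A tiny sketch of the quadrant assignment; the threshold of 6 is an arbitrary midpoint split and the quadrant labels are illustrative, not a fixed taxonomy:

```typescript
// Classify ideas into 2x2 quadrants on Impact vs. Feasibility (1-10 each).
type Idea2D = { name: string; impact: number; feasibility: number };

function quadrant(idea: Idea2D): string {
  const hiImpact = idea.impact >= 6;           // midpoint split; tune per context
  const hiFeasibility = idea.feasibility >= 6;
  if (hiImpact && hiFeasibility) return "Do first";
  if (hiImpact) return "Big bet (high impact, hard)";
  if (hiFeasibility) return "Fill-in (easy, low impact)";
  return "Avoid";
}
```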

**Selection guidelines**:
- **Diversify**: Don't just pick the top 3 if they're all in the same cluster
- **Balance**: Mix quick wins (fast, low-risk) with big bets (high-impact, longer-term)
- **Consider dependencies**: Some ideas may enable or enhance others
- **Document tradeoffs**: Why did 4th place not make the cut?

**Convergence quality checks**:
- [ ] Evaluation criteria are explicit and relevant
- [ ] All top ideas scored on all criteria
- [ ] Scores are justified (not arbitrary)
- [ ] Top selections clearly outperform alternatives
- [ ] Tradeoffs are documented
- [ ] Runner-up ideas noted for future consideration

---

## Worked Example

### Problem: How to increase user retention in the first 30 days?

**Context**: SaaS product, 100k users, 40% churn in the first month, limited eng resources

**Constraints**:
- Must ship within 3 months
- No more than 2 engineer-months of work
- Must work for both free and paid users

**Criteria**:
- Impact on retention (weight: 3x)
- Feasibility with current team (weight: 2x)
- Speed to ship (weight: 1x)

---

### Diverge: 32 Ideas Generated

1. Email drip campaign with usage tips
2. In-app interactive tutorial
3. Weekly webinar for new users
4. Gamification with achievement badges
5. 1-on-1 onboarding calls for high-value users
6. Contextual tooltips for key features
7. Progress tracking dashboard
8. Community forum for peer help
9. AI chatbot for instant support
10. Daily usage streak rewards
11. Personalized feature recommendations
12. "Success checklist" in first 7 days
13. Video library of use cases
14. Slack/Discord community
15. Monthly power-user showcase
16. Referral rewards program
17. Usage analytics dashboard for users
18. Mobile app push notifications
19. SMS reminders for inactive users
20. Quarterly user survey with gift card
21. In-app messaging for tips
22. Certification program for expertise
23. Template library for quick starts
24. Integration marketplace
25. Office hours with product team
26. User-generated content showcase
27. Automated workflow suggestions
28. Milestone celebrations (email)
29. Cohort-based onboarding groups
30. Seasonal feature highlights
31. Feedback loop with product updates
32. Partnership with complementary tools

---

### Cluster: 6 Themes

**1. Guided Learning** (8 ideas)
- Email drip campaign with usage tips
- In-app interactive tutorial
- Contextual tooltips for key features
- "Success checklist" in first 7 days
- Video library of use cases
- In-app messaging for tips
- Automated workflow suggestions
- Template library for quick starts

**2. Community & Social** (7 ideas)
- Community forum for peer help
- Slack/Discord community
- Monthly power-user showcase
- Office hours with product team
- User-generated content showcase
- Cohort-based onboarding groups
- Partnership with complementary tools

**3. Motivation & Gamification** (5 ideas)
- Gamification with achievement badges
- Daily usage streak rewards
- Progress tracking dashboard
- Milestone celebrations (email)
- Certification program for expertise

**4. Personalization & AI** (4 ideas)
- AI chatbot for instant support
- Personalized feature recommendations
- Usage analytics dashboard for users
- Seasonal feature highlights

**5. Proactive Engagement** (5 ideas)
- Weekly webinar for new users
- Mobile app push notifications
- SMS reminders for inactive users
- Quarterly user survey with gift card
- Feedback loop with product updates

**6. High-Touch Service** (3 ideas)
- 1-on-1 onboarding calls for high-value users
- Referral rewards program
- Integration marketplace

---

### Converge: Evaluation & Selection

**Scoring** (Impact: 1-10, Feasibility: 1-10, Speed: 1-10):

| Idea | Impact (3x) | Feasibility (2x) | Speed (1x) | Weighted Total |
|------|-------------|------------------|------------|----------------|
| In-app interactive tutorial | 9 | 6 | 7 | 9×3 + 6×2 + 7×1 = 46 |
| Email drip campaign | 7 | 9 | 9 | 7×3 + 9×2 + 9×1 = 48 |
| Success checklist (first 7 days) | 8 | 8 | 8 | 8×3 + 8×2 + 8×1 = 48 |
| Contextual tooltips | 6 | 9 | 9 | 6×3 + 9×2 + 9×1 = 45 |
| Progress tracking dashboard | 8 | 7 | 6 | 8×3 + 7×2 + 6×1 = 44 |
| Template library | 7 | 7 | 8 | 7×3 + 7×2 + 8×1 = 43 |
| Community forum | 6 | 4 | 3 | 6×3 + 4×2 + 3×1 = 29 |
| AI chatbot | 7 | 3 | 2 | 7×3 + 3×2 + 2×1 = 29 |
| 1-on-1 calls | 9 | 5 | 8 | 9×3 + 5×2 + 8×1 = 45 |

---

### Top 3 Selections

**1. Email Drip Campaign** (Score: 48)
- **Why**: Highest feasibility and speed, good impact. Can implement with existing tools (no eng time).
- **Rationale**:
  - Impact (7/10): Proven tactic, industry benchmarks show 10-15% retention improvement
  - Feasibility (9/10): Use existing Mailchimp setup, just need copy + timing
  - Speed (9/10): Can launch in 2 weeks with marketing team
- **Next steps**:
  - Draft 7-email sequence (days 1, 3, 7, 14, 21, 28, 30)
  - A/B test subject lines and CTAs
  - Measure open rates and feature adoption

**2. Success Checklist (First 7 Days)** (Score: 48, tie)
- **Why**: Balanced impact, feasibility, and speed. Clear value for new users.
- **Rationale**:
  - Impact (8/10): Gives users a clear path to value, reduces overwhelm
  - Feasibility (8/10): 1 engineer-week for UI + backend tracking
  - Speed (8/10): Can ship in 4 weeks
- **Next steps**:
  - Define 5-7 "success milestones" (e.g., complete profile, create first project, invite teammate)
  - Build in-app checklist UI
  - Track completion rates per milestone

**3. In-App Interactive Tutorial** (Score: 46)
- **Why**: Highest impact potential, moderate feasibility and speed.
- **Rationale**:
  - Impact (9/10): Shows users value immediately, reduces the "blank slate" problem
  - Feasibility (6/10): Requires 3-4 engineer-weeks (tooltips + guided flow)
  - Speed (7/10): Can ship MVP in 8 weeks
- **Next steps**:
  - Design 3-5 step tutorial for core workflow
  - Use existing tooltip library to reduce build time
  - Make tutorial skippable but prominent

---

### Runner-Ups (For Future Consideration)

**Progress Tracking Dashboard** (Score: 44)
- High impact but slightly slower to build (6-8 weeks)
- Revisit in Q3 after core onboarding stabilizes

**Template Library** (Score: 43)
- Good balance, but requires content creation (not just eng work)
- Explore in parallel with email campaign (marketing can create templates)

**1-on-1 Onboarding Calls** (Score: 45, but doesn't scale)
- Very high impact for high-value users
- Consider as a premium offering for the enterprise tier only

---

## Next Steps

**Immediate** (next 2 weeks):
- Finalize email drip sequence copy
- Design success checklist UI mockups
- Scope interactive tutorial feature requirements

**Short-term** (next 1-3 months):
- Launch email drip campaign (week 2)
- Ship success checklist (week 6)
- Ship interactive tutorial MVP (week 10)

**Measurement plan**:
- Track 30-day retention rate weekly
- Target: Improve from 60% to 70% retention
- Break down by cohort (email recipients vs. non-recipients, etc.)

**Parking lot** (revisit Q3):
- Progress tracking dashboard
- Template library
- Community forum (once we hit 200k users)
184
skills/causal-inference-root-cause/SKILL.md
Normal file
@@ -0,0 +1,184 @@
---
name: causal-inference-root-cause
description: Use when investigating why something happened and need to distinguish correlation from causation, identify root causes vs symptoms, test competing hypotheses, control for confounding variables, or design experiments to validate causal claims. Invoke when debugging systems, analyzing failures, researching health outcomes, evaluating policy impacts, or when user mentions root cause, causal chain, confounding, spurious correlation, or asks "why did this really happen?"
---

# Causal Inference & Root Cause Analysis

## Table of Contents

- [Purpose](#purpose)
- [When to Use This Skill](#when-to-use-this-skill)
- [What is Causal Inference?](#what-is-causal-inference)
- [Workflow](#workflow)
  - [1. Define the Effect](#1--define-the-effect)
  - [2. Generate Hypotheses](#2--generate-hypotheses)
  - [3. Build Causal Model](#3--build-causal-model)
  - [4. Test Causality](#4--test-causality)
  - [5. Document & Validate](#5--document--validate)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

Systematically investigate causal relationships to identify true root causes rather than mere correlations or symptoms. This skill helps distinguish genuine causation from spurious associations, test competing explanations, and design interventions that address underlying drivers.

## When to Use This Skill

- Investigating system failures or production incidents
- Debugging performance issues with multiple potential causes
- Analyzing why a metric changed (e.g., conversion rate drop)
- Researching health outcomes or treatment effects
- Evaluating policy or intervention impacts
- Distinguishing correlation from causation in data
- Identifying confounding variables in experiments
- Tracing a symptom back to its root cause
- Testing competing hypotheses about cause-effect relationships
- Designing experiments to validate causal claims
- Understanding why a project succeeded or failed
- Analyzing customer churn or retention drivers

**Trigger phrases:** "root cause", "why did this happen", "causal chain", "correlation vs causation", "confounding", "spurious correlation", "what really caused", "underlying driver"

## What is Causal Inference?

A systematic approach to determining whether X causes Y (not just correlates with Y):

- **Correlation**: X and Y move together (may be coincidental or due to a third factor Z)
- **Causation**: Changing X directly causes a change in Y (a causal mechanism exists)

**Key Concepts:**

- **Root cause**: The fundamental issue that, if resolved, prevents the problem
- **Proximate cause**: Immediate trigger (may be a symptom, not the root)
- **Confounding variable**: Third factor that causes both X and Y, creating spurious correlation
- **Counterfactual**: "What would have happened without X?" - the key causal question
- **Causal mechanism**: The pathway or process through which X affects Y

**Quick Example:**

```markdown
# Effect: Website conversion rate dropped 30%

## Competing Hypotheses:
1. New checkout UI is confusing (proximate)
2. Payment processor latency increased (proximate)
3. We changed to a cheaper payment processor that's slower (root cause)

## Test:
- Rollback UI (no change) → UI not cause
- Check payment logs (confirm latency) → latency is cause
- Trace to processor change → processor change is root cause

## Counterfactual:
"If we hadn't switched processors, would conversion have dropped?"
→ No, conversion was fine with old processor

## Conclusion:
Root cause = processor switch
Mechanism = slow checkout → user abandonment
```

## Workflow

Copy this checklist and track your progress:

```
Root Cause Analysis Progress:
- [ ] Step 1: Define the effect
- [ ] Step 2: Generate hypotheses
- [ ] Step 3: Build causal model
- [ ] Step 4: Test causality
- [ ] Step 5: Document and validate
```

**Step 1: Define the effect**

Describe the effect/outcome (what happened, be specific), quantify it if possible (magnitude, frequency), establish the timeline (when it started, is it ongoing?), determine the baseline (what's normal, what changed?), and identify stakeholders (who's impacted, who needs answers?). Key questions: What exactly are we explaining? One-time event or recurring pattern? How do we measure it objectively?

**Step 2: Generate hypotheses**

List proximate causes (immediate triggers/symptoms), identify potential root causes (underlying factors), consider confounders (third factors creating spurious associations), and challenge assumptions (what if the initial theory is wrong?). Techniques: 5 Whys (ask "why" repeatedly), Fishbone diagram (categorize causes), Timeline analysis (what changed before the effect?), Differential diagnosis (what else explains the symptoms?). For simple investigations → Use `resources/template.md`. For complex problems → Study `resources/methodology.md` for advanced techniques.

**Step 3: Build causal model**

Draw causal chains (A → B → C → Effect), identify necessary vs sufficient causes, map confounding relationships (what influences both cause and effect?), note temporal sequence (cause precedes effect - necessary for causation), and specify mechanisms (HOW X causes Y). Model elements: Direct cause (X → Y), Indirect (X → Z → Y), Confounding (Z → X and Z → Y), Mediating variable (X → M → Y), Moderating variable (X → Y depends on M).

**Step 4: Test causality**

Check temporal sequence (cause before effect?), assess strength of association (strong correlation?), look for dose-response (more cause → more effect?), test the counterfactual (what if the cause were absent/removed?), search for a mechanism (explain HOW), check consistency (does it hold across contexts?), and rule out confounders. Evidence hierarchy: RCT (gold standard) > natural experiment > longitudinal > case-control > cross-sectional > expert opinion. Use the Bradford Hill criteria (9 factors: strength, consistency, specificity, temporality, dose-response, plausibility, coherence, experiment, analogy).

**Step 5: Document and validate**

Create `causal-inference-root-cause.md` with: effect description/quantification, competing hypotheses, causal model (chains, confounders, mechanisms), evidence assessment, root cause(s) with confidence level, recommended tests/interventions, and limitations/alternatives. Validate using `resources/evaluators/rubric_causal_inference_root_cause.json`: verify you distinguished proximate from root cause, controlled confounders, explained the mechanism, assessed evidence systematically, noted uncertainty, recommended interventions, and acknowledged alternatives. Minimum standard: Score ≥ 3.5.

## Common Patterns

**For incident investigation (engineering):**
- Effect: System outage, performance degradation
- Hypotheses: Recent deploy, traffic spike, dependency failure, resource exhaustion
- Model: Timeline + dependency graph + recent changes
- Test: Logs, metrics, rollback experiments
- Output: Postmortem with root cause and prevention plan

**For metric changes (product/business):**
- Effect: Conversion drop, revenue change, user engagement shift
- Hypotheses: Product changes, seasonality, market shifts, measurement issues
- Model: User journey + external factors + recent experiments
- Test: Cohort analysis, A/B test data, segmentation
- Output: Causal explanation with recommended actions

**For policy evaluation (research/public policy):**
- Effect: Health outcome, economic indicator, social metric
- Hypotheses: Policy intervention, confounding factors, secular trends
- Model: DAG with confounders + mechanisms
- Test: Difference-in-differences, regression discontinuity, propensity matching
- Output: Causal effect estimate with confidence intervals

**For debugging (software):**
- Effect: Bug, unexpected behavior, test failure
- Hypotheses: Recent changes, edge cases, race conditions, dependency issues
- Model: Code paths + data flows + timing
- Test: Reproduce, isolate, binary search, git bisect
- Output: Bug report with root cause and fix

## Guardrails

**Do:**
- Distinguish correlation from causation explicitly
- Generate multiple competing hypotheses (not just confirm the first theory)
- Map out confounding variables and control for them
- Specify causal mechanisms (HOW X causes Y)
- Test counterfactuals ("what if X hadn't happened?")
- State confidence levels and uncertainty
- Acknowledge alternative explanations
- Recommend testable interventions based on the root cause

**Don't:**
- Confuse proximate cause with root cause
- Cherry-pick evidence that confirms the initial hypothesis
- Assume correlation implies causation
- Ignore confounding variables
- Skip the mechanism explanation (just stating correlation)
- Overstate confidence without strong evidence
- Stop at the first plausible explanation without testing alternatives
- Propose interventions without identifying the root cause

**Common Pitfalls:**
- **Post hoc ergo propter hoc**: "After this, therefore because of this" (temporal sequence ≠ causation)
- **Spurious correlation**: Two things correlate due to a third factor or coincidence
- **Confounding**: Third variable causes both X and Y
- **Reverse causation**: Y causes X, not X causes Y
- **Selection bias**: Sample is not representative
- **Regression to the mean**: Extreme values naturally move toward the average

## Quick Reference

- **Template**: `resources/template.md` - Structured framework for root cause analysis
- **Methodology**: `resources/methodology.md` - Advanced techniques (DAGs, confounding control, Bradford Hill criteria)
- **Quality rubric**: `resources/evaluators/rubric_causal_inference_root_cause.json`
- **Output file**: `causal-inference-root-cause.md`
- **Key distinction**: Correlation (X and Y move together) vs. Causation (X → Y mechanism)
- **Gold standard test**: Randomized controlled trial (eliminates confounding)
- **Essential criteria**: Temporal sequence (cause before effect), mechanism (how it works), counterfactual (what if cause absent)
@@ -0,0 +1,145 @@
{
  "name": "Causal Inference & Root Cause Analysis Quality Rubric",
  "scale": {
    "min": 1,
    "max": 5,
    "description": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent"
  },
  "criteria": [
    {
      "name": "Effect Definition Clarity",
      "description": "Effect/outcome is clearly defined, quantified, and temporally bounded",
      "scoring": {
        "1": "Effect vaguely described (e.g., 'things are slow'), no quantification or timeline",
        "2": "Effect described but lacks quantification or timeline details",
        "3": "Effect clearly described with either quantification or timeline",
        "4": "Effect clearly described with quantification and timeline, baseline comparison present",
        "5": "Effect precisely quantified with magnitude, timeline, baseline, and impact assessment"
      }
    },
    {
      "name": "Hypothesis Generation",
      "description": "Multiple competing hypotheses generated systematically (not just confirming first theory)",
      "scoring": {
        "1": "Single hypothesis stated without alternatives",
        "2": "2 hypotheses mentioned, one clearly favored without testing",
        "3": "3+ hypotheses listed, some testing of alternatives",
        "4": "Multiple hypotheses systematically generated using techniques (5 Whys, Fishbone, etc.)",
        "5": "Comprehensive hypothesis generation with proximate/root causes distinguished and confounders identified"
      }
    },
    {
      "name": "Root Cause Identification",
      "description": "Distinguishes root cause from proximate causes and symptoms",
      "scoring": {
        "1": "Confuses symptom with cause (e.g., 'app crashed because server returned error')",
        "2": "Identifies proximate cause but claims it as root without deeper investigation",
        "3": "Distinguishes proximate from root cause, but mechanism unclear",
        "4": "Clear root cause identified with explanation of why it's root (not symptom)",
        "5": "Root cause clearly identified with full causal chain from root → proximate → effect"
      }
    },
    {
      "name": "Causal Model Quality",
      "description": "Causal relationships mapped with mechanisms, confounders noted",
      "scoring": {
        "1": "No causal model, just list of correlations",
        "2": "Basic cause → effect stated without mechanisms or confounders",
        "3": "Causal chain sketched, mechanism mentioned but not detailed",
        "4": "Clear causal chain with mechanisms explained and confounders identified",
        "5": "Comprehensive causal model with chains, mechanisms, confounders, mediators/moderators mapped"
      }
    },
    {
      "name": "Temporal Sequence Verification",
      "description": "Verified that cause precedes effect (necessary for causation)",
      "scoring": {
        "1": "No temporal analysis, timeline unclear",
        "2": "Timeline mentioned but not used to test causation",
        "3": "Temporal sequence checked for main hypothesis",
        "4": "Temporal sequence verified for all hypotheses, rules out reverse causation",
        "5": "Detailed timeline analysis shows cause clearly precedes effect with lag explained"
      }
    },
    {
      "name": "Counterfactual Testing",
      "description": "Tests 'what if cause absent?' using control groups, rollbacks, or baseline comparisons",
      "scoring": {
        "1": "No counterfactual reasoning",
        "2": "Counterfactual mentioned but not tested",
        "3": "Basic counterfactual test (e.g., before/after comparison)",
        "4": "Strong counterfactual test (e.g., control group, rollback experiment, A/B test)",
        "5": "Multiple counterfactual tests with consistent results strengthening causal claim"
      }
    },
    {
      "name": "Mechanism Explanation",
      "description": "Explains HOW cause produces effect (not just THAT they correlate)",
      "scoring": {
        "1": "No mechanism, just correlation stated",
        "2": "Vague mechanism ('X affects Y somehow')",
        "3": "Basic mechanism explained ('X causes Y because...')",
        "4": "Clear mechanism with pathway and intermediate steps",
        "5": "Detailed mechanism with supporting evidence (logs, metrics, theory) and plausibility assessment"
      }
    },
    {
      "name": "Confounding Control",
      "description": "Identifies and controls for confounding variables (third factors causing both X and Y)",
      "scoring": {
        "1": "No mention of confounding, assumes correlation = causation",
        "2": "Aware of confounding but doesn't identify specific confounders",
        "3": "Identifies 1-2 potential confounders but doesn't control for them",
        "4": "Identifies confounders and attempts to control (stratification, regression, matching)",
        "5": "Comprehensive confounder identification with rigorous control methods and sensitivity analysis"
      }
    },
    {
      "name": "Evidence Quality & Strength",
      "description": "Uses high-quality evidence (experiments > observational > anecdotes) and assesses strength systematically",
      "scoring": {
        "1": "Relies solely on anecdotes or single observations",
        "2": "Uses weak evidence (cross-sectional correlation) without acknowledging limits",
        "3": "Uses moderate evidence (longitudinal data, multiple observations)",
        "4": "Uses strong evidence (quasi-experiments, well-controlled studies) with strength assessed",
        "5": "Uses highest-quality evidence (RCTs, multiple converging lines of evidence) with Bradford Hill criteria or similar framework"
      }
    },
    {
      "name": "Confidence & Limitations",
      "description": "States confidence level with justification, acknowledges alternative explanations and uncertainties",
      "scoring": {
        "1": "Overconfident claims without justification, no alternatives considered",
        "2": "States conclusion without confidence level or uncertainty",
        "3": "Mentions confidence level and 1 limitation",
        "4": "States justified confidence level, acknowledges alternatives and key limitations",
        "5": "Explicit confidence assessment with justification, comprehensive limitations, alternative explanations evaluated, unresolved uncertainties noted"
      }
    }
  ],
  "overall_assessment": {
    "thresholds": {
      "excellent": "Average score ≥ 4.5 (publication-quality causal analysis)",
      "very_good": "Average score ≥ 4.0 (high-stakes decisions - major product/engineering changes)",
      "good": "Average score ≥ 3.5 (medium-stakes decisions - feature launches, incident postmortems)",
      "acceptable": "Average score ≥ 3.0 (low-stakes decisions - exploratory analysis, hypothesis generation)",
      "needs_rework": "Average score < 3.0 (insufficient for decision-making, redo analysis)"
    },
    "stakes_guidance": {
      "low_stakes": "Exploratory root cause analysis, hypothesis generation: aim for ≥ 3.0",
      "medium_stakes": "Incident postmortems, feature failure analysis, process improvements: aim for ≥ 3.5",
      "high_stakes": "Major architectural decisions, safety-critical systems, policy evaluation: aim for ≥ 4.0"
    }
  },
  "common_failure_modes": [
    "Correlation-causation fallacy: Assuming X causes Y just because they correlate",
    "Post hoc ergo propter hoc: 'After this, therefore because of this' - temporal sequence ≠ causation",
    "Stopping at proximate cause: Identifying immediate trigger without tracing to root",
    "Cherry-picking evidence: Only considering evidence that confirms initial hypothesis",
    "Ignoring confounders: Not considering third variables that cause both X and Y",
    "No mechanism: Claiming causation without explaining how X produces Y",
    "Reverse causation: Assuming X causes Y when actually Y causes X",
    "Single-case fallacy: Generalizing from one observation without testing consistency"
  ],
  "usage_instructions": "Rate each criterion on a 1-5 scale and calculate the average. For important decisions (postmortems, product changes), the minimum score is 3.5. For high-stakes decisions (infrastructure, safety, policy), aim for ≥ 4.0. Red flags: a score <3 on Temporal Sequence, Counterfactual Testing, or Mechanism Explanation means the causal claim is weak. A red flag on Confounding Control means the correlation may be spurious."
}
@@ -0,0 +1,675 @@
|
||||
# Root Cause Analysis: Database Query Performance Degradation
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Database query latency increased 10x (p95: 50ms → 500ms) starting March 10th, impacting all API endpoints. Root cause identified as unoptimized database migration that created missing indexes on frequently-queried columns. Confidence: High (95%+). Resolution: Add indexes + optimize query patterns. Time to fix: 2 hours.
|
||||
|
||||
---
|
||||
|
||||
## 1. Effect Definition
|
||||
|
||||
**What happened**: Database query latency increased dramatically across all tables
|
||||
|
||||
**Quantification**:
|
||||
- **Baseline**: p50: 10ms, p95: 50ms, p99: 100ms (stable for 6 months)
|
||||
- **Current**: p50: 100ms, p95: 500ms, p99: 2000ms (10x increase at p95)
|
||||
- **Absolute increase**: +450ms at p95
|
||||
- **Relative increase**: 900% at p95
|
||||
|
||||
**Timeline**:
|
||||
- **First observed**: March 10th, 2:15 PM UTC
|
||||
- **Duration**: Ongoing (March 10-12, 48 hours elapsed)
|
||||
- **Baseline period**: Jan 1 - March 9 (stable)
|
||||
- **Degradation start**: Exact timestamp March 10th 14:15:22 UTC
|
||||
|
||||
**Impact**:
|
||||
- **Users affected**: All users (100% of traffic)
|
||||
- **API endpoints affected**: All endpoints (database-dependent)
|
||||
- **Severity**: High
|
||||
- 25% of API requests timing out (>5 sec)
|
||||
- User-visible page load delays
|
||||
- Support tickets increased 3x
|
||||
- Estimated revenue impact: $50k/day from abandoned transactions
|
||||
|
||||
**Context**:
|
||||
- Database: PostgreSQL 14.7, 500GB data
|
||||
- Application: REST API (Node.js), 10k req/min
|
||||
- Recent changes: Database migration deployed March 10th 2:00 PM
|
||||
|
||||
---

## 2. Competing Hypotheses

### Hypothesis 1: Database migration introduced inefficient schema

- **Type**: Root cause candidate
- **Evidence for**:
  - **Timing**: Migration deployed March 10 2:00 PM, degradation started 2:15 PM (15 min after)
  - **Perfect temporal match**: Strongest temporal correlation
  - **Migration contents**: Added new columns, restructured indexes
- **Evidence against**:
  - None - all evidence supports this hypothesis

### Hypothesis 2: Traffic spike overloaded database

- **Type**: Confounder / alternative explanation
- **Evidence for**:
  - March 10 is typically a high-traffic day (weekend effect)
- **Evidence against**:
  - **Traffic unchanged**: Monitoring shows traffic at 10k req/min (same as baseline)
  - **No traffic spike at 2:15 PM**: Traffic flat throughout March 10
  - **Previous high-traffic days handled fine**: Traffic has been higher (15k req/min) without issues

### Hypothesis 3: Database server resource exhaustion

- **Type**: Proximate cause / symptom
- **Evidence for**:
  - CPU usage increased from 30% → 80% at 2:15 PM
  - Disk I/O increased from 100 IOPS → 5000 IOPS
  - Connection pool near saturation (95/100 connections)
- **Evidence against**:
  - **These are symptoms, not root**: Something CAUSED the increased resource usage
  - Resource exhaustion doesn't explain WHY queries became slow

### Hypothesis 4: Slow query introduced by application code change

- **Type**: Proximate cause candidate
- **Evidence for**:
  - Application deploy on March 9th (1 day prior)
- **Evidence against**:
  - **Timing mismatch**: Deploy was 24 hours before degradation
  - **No code changes to query logic**: Deploy only changed UI
  - **Query patterns unchanged**: Same queries, same frequency

### Hypothesis 5: Database server hardware issue

- **Type**: Alternative explanation
- **Evidence for**:
  - Occasional disk errors in system logs
- **Evidence against**:
  - **Disk errors present before March 10**: Noise, not new
  - **No hardware alerts**: Monitoring shows no hardware degradation
  - **Sudden onset**: Hardware failures are typically gradual

**Most likely root cause**: Database migration (Hypothesis 1)

**Confounders ruled out**:
- Traffic unchanged (Hypothesis 2)
- Application code unchanged (Hypothesis 4)
- Hardware stable (Hypothesis 5)

**Symptoms identified**:
- Resource exhaustion (Hypothesis 3) is a symptom, not the root
---

## 3. Causal Model

### Causal Chain: Root → Proximate → Effect

```
ROOT CAUSE:
Database migration removed indexes on user_id + created_at columns
(March 10, 2:00 PM deployment)
    ↓ (mechanism: queries now do full table scans instead of index scans)

INTERMEDIATE CAUSE:
Every query on users table must scan entire table (5M rows)
instead of using index (10-1000 rows)
    ↓ (mechanism: table scans require disk I/O, CPU cycles)

PROXIMATE CAUSE:
Database CPU at 80%, disk I/O at 5000 IOPS (50x increase)
Query execution time 10x slower
    ↓ (mechanism: queries queue up, connection pool saturates)

OBSERVED EFFECT:
API endpoints slow (p95: 500ms vs 50ms baseline)
25% of requests timeout
Users experience page load delays
```

### Why March 10 2:15 PM specifically?

- **Migration deployed**: March 10 2:00 PM
- **Migration applied**: 2:00-2:10 PM (10 min to run schema changes)
- **First slow queries**: 2:15 PM (first queries after migration completed)
- **5-minute lag**: Time for connection pool to cycle and pick up new schema

### Missing Index Details

**Migration removed these indexes**:
```sql
-- BEFORE (efficient):
CREATE INDEX idx_users_user_id_created_at ON users(user_id, created_at);
CREATE INDEX idx_transactions_user_id ON transactions(user_id);

-- AFTER (inefficient):
-- (indexes removed by mistake in migration)
```

**Impact**:
```sql
-- Common query pattern:
SELECT * FROM users WHERE user_id = 123 AND created_at > '2024-01-01';

-- BEFORE (with index): 5ms (index scan, 10 rows)
-- AFTER (without index): 500ms (full table scan, 5M rows)
```

### Confounders Ruled Out

**No confounding variables found**:
- **Traffic**: Controlled (unchanged)
- **Hardware**: Controlled (stable)
- **Code**: Controlled (no changes to queries)
- **External dependencies**: Controlled (no changes)

**Only variable that changed**: Database schema (migration)
---

## 4. Evidence Assessment

### Temporal Sequence: ✓ PASS (5/5)

**Timeline**:
```
March 9, 3:00 PM:  Application deploy (UI changes only, no queries changed)
March 10, 2:00 PM: Database migration starts
March 10, 2:10 PM: Migration completes
March 10, 2:15 PM: First slow queries logged (p95: 500ms)
March 10, 2:20 PM: Alerting fires (p95 exceeds 200ms threshold)
```

**Verdict**: ✓ Cause (migration) clearly precedes effect (slow queries) by 5-15 minutes

**Why the 5-minute lag?**
- Connection pool refresh time
- Gradual connection cycling to the new schema
- First slow queries at 2:15 PM were from connections that picked up the new schema

---

### Strength of Association: ✓ PASS (5/5)

**Correlation strength**: Very strong (r = 0.99)

**Evidence**:
- **Before migration**: p95 latency stable at 50ms (6 months)
- **Immediately after migration**: p95 latency jumped to 500ms
- **10x increase**: Large effect size
- **100% of queries affected**: All database queries slower, not selective

**Statistical significance**: p < 0.001 (highly significant)
- Comparing 1000 queries before (mean: 50ms) vs 1000 queries after (mean: 500ms)
- Effect size: Cohen's d = 5.2 (very large)
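
For reference, Cohen's d is the standardized mean difference. A minimal sketch of the arithmetic behind the reported value (the pooled standard deviation is inferred from the reported d, not taken from raw data):

```latex
d = \frac{\bar{x}_{\text{after}} - \bar{x}_{\text{before}}}{s_{\text{pooled}}}
  = \frac{500\,\text{ms} - 50\,\text{ms}}{s_{\text{pooled}}} = 5.2
  \quad\Rightarrow\quad s_{\text{pooled}} \approx 86.5\,\text{ms}
```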

---

### Dose-Response: ✓ PASS (4/5)

**Gradient observed**:
- **Table size vs latency**:
  - Small tables (<10k rows): 20ms → 50ms (2.5x increase)
  - Medium tables (100k rows): 50ms → 200ms (4x increase)
  - Large tables (5M rows): 50ms → 500ms (10x increase)

**Mechanism**: Larger tables → more rows to scan → longer queries

**Interpretation**: Clear dose-response relationship strengthens causation

---

### Counterfactual Test: ✓ PASS (5/5)

**Counterfactual question**: "What if the migration hadn't been deployed?"

**Test 1: Rollback Experiment** (Strongest evidence)
- **Action**: Rolled back database migration March 11, 9:00 AM
- **Result**: Latency immediately returned to baseline (p95: 55ms)
- **Conclusion**: ✓ Removing the migration eliminates the effect (strong causation)

**Test 2: Control Query**
- **Tested**: Queries on tables NOT affected by migration (no index changes)
- **Result**: Latency unchanged (p95: 50ms before and after migration)
- **Conclusion**: ✓ Only migrated tables affected (specificity)

**Test 3: Historical Comparison**
- **Baseline period**: Jan-March 9 (no migration), p95: 50ms
- **Degradation period**: March 10-11 (migration active), p95: 500ms
- **Post-rollback**: March 11+ (migration reverted), p95: 55ms
- **Conclusion**: ✓ Pattern strongly implicates migration

**Verdict**: Counterfactual tests strongly confirm causation

---

### Mechanism: ✓ PASS (5/5)

**HOW the migration caused slow queries**:

1. **Migration removed indexes**:
   ```sql
   -- Migration accidentally dropped these indexes:
   DROP INDEX idx_users_user_id_created_at;
   DROP INDEX idx_transactions_user_id;
   ```

2. **Query planner changed strategy** (see the diagnostic sketch after this list):
   ```
   BEFORE (with index):
   EXPLAIN SELECT * FROM users WHERE user_id = 123 AND created_at > '2024-01-01';
   → Index Scan using idx_users_user_id_created_at (cost=0.43..8.45 rows=1)

   AFTER (without index):
   EXPLAIN SELECT * FROM users WHERE user_id = 123 AND created_at > '2024-01-01';
   → Seq Scan on users (cost=0.00..112000.00 rows=5000000)
   ```

3. **Full table scans require disk I/O**:
   - Index scan: Read 10-1000 rows (1-100 KB) from index + data pages
   - Full table scan: Read 5M rows (5 GB) from disk
   - **50x-500x more I/O**

4. **Increased I/O saturates CPU & disk**:
   - CPU: Scanning rows, filtering predicates (30% → 80%)
   - Disk: Reading table pages (100 IOPS → 5000 IOPS)

5. **Saturation causes queuing**:
   - Slow queries → connections held longer
   - Connection pool saturates (95/100)
   - New queries wait for available connections
   - Latency compounds
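
To verify step 2 in practice, one diagnostic sketch (assuming PostgreSQL; `enable_seqscan` is a standard planner setting) is to discourage sequential scans for a single transaction — if the plan still shows a Seq Scan, no usable index exists:

```sql
-- Diagnostic only: never disable seqscan globally.
BEGIN;
SET LOCAL enable_seqscan = off;  -- applies to this transaction only
EXPLAIN SELECT * FROM users
WHERE user_id = 123 AND created_at > '2024-01-01';
-- Still "Seq Scan on users"? Then no index can serve this predicate.
ROLLBACK;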

**Plausibility**: Very high
- **Established theory**: Index scans vs table scans (well-known database optimization)
- **Quantifiable impact**: Can calculate the I/O difference (50x-500x)
- **Reproducible**: Same pattern in staging environment

**Supporting evidence**:
- EXPLAIN ANALYZE output shows table scans post-migration
- PostgreSQL logs show "sequential scan" warnings
- Disk I/O metrics show 50x increase correlated with migration

**Verdict**: ✓ Mechanism fully explained with strong supporting evidence

---

### Consistency: ✓ PASS (5/5)

**Relationship holds across contexts**:

1. **All affected tables show the same pattern**:
   - users table: 50ms → 500ms
   - transactions table: 30ms → 300ms
   - orders table: 40ms → 400ms
   - **Consistent 10x degradation**

2. **All query types affected**:
   - SELECT: 10x slower
   - JOIN: 10x slower
   - Aggregations (COUNT, SUM): 10x slower

3. **Consistent across all environments**:
   - Production: 50ms → 500ms
   - Staging: 45ms → 450ms (when migration tested)
   - Dev: 40ms → 400ms

4. **Consistent across time**:
   - March 10 14:15 - March 11 9:00: p95 at 500ms
   - Every hour during this period: ~500ms (stable degradation)

5. **Replication**:
   - Tested in staging: Same migration → same 10x degradation
   - Rollback in staging: Latency restored
   - **Reproducible causal relationship**

**Verdict**: ✓ Extremely consistent pattern across tables, query types, environments, and time

---

### Confounding Control: ✓ PASS (4/5)

**Potential confounders identified and ruled out**:

#### Confounder 1: Traffic Spike

**Could a traffic spike cause both**:
- Migration deployment (timing coincidence)
- Slow queries (overload)

**Ruled out**:
- Traffic monitoring shows flat 10k req/min (no spike)
- Even if traffic had spiked, it wouldn't explain why the migration rollback fixed it

#### Confounder 2: Hardware Degradation

**Could a hardware issue cause both**:
- Migration coincidentally deployed during degradation
- Slow queries (hardware bottleneck)

**Ruled out**:
- Hardware metrics stable (CPU headroom, no disk errors)
- Rollback immediately fixed latency (hardware didn't suddenly improve)

#### Confounder 3: Application Code Change

**Could a code change cause both**:
- Buggy queries
- Migration deployed at the same time as code

**Ruled out**:
- Code deploy was March 9 (24 hrs before degradation)
- No query changes in code deploy (only UI)
- Rollback of migration (not code) fixed the issue

**Controlled variables**:
- ✓ Traffic (flat during period)
- ✓ Hardware (stable metrics)
- ✓ Code (no query changes)
- ✓ External dependencies (no changes)

**Verdict**: ✓ No confounders found; only the migration variable changed

---
### Bradford Hill Criteria: 25/27 (Very Strong)

| Criterion | Score | Justification |
|-----------|-------|---------------|
| **1. Strength** | 3/3 | 10x latency increase = very strong association |
| **2. Consistency** | 3/3 | Consistent across tables, queries, environments, time |
| **3. Specificity** | 3/3 | Only migrated tables affected; rollback restores; specific cause → specific effect |
| **4. Temporality** | 3/3 | Migration clearly precedes degradation by 5-15 min |
| **5. Dose-response** | 3/3 | Larger tables → greater latency increase (clear gradient) |
| **6. Plausibility** | 3/3 | Index vs table scan theory well-established |
| **7. Coherence** | 3/3 | Fits database optimization knowledge, no contradictions |
| **8. Experiment** | 3/3 | Rollback experiment: removing cause eliminates effect |
| **9. Analogy** | 1/3 | Similar patterns exist (missing indexes → slow queries) but not a perfect analogy |

**Total**: 25/27 = **Very strong causal evidence**

**Interpretation**: All criteria met except weak analogy (not critical). Strong case for causation.

---

## 5. Conclusion

### Most Likely Root Cause

**Root cause**: Database migration removed indexes on `user_id` and `created_at` columns, forcing the query planner to use full table scans instead of efficient index scans.

**Confidence level**: **High (95%+)**

**Reasoning**:
1. **Perfect temporal sequence**: Migration (2:00 PM) → degradation (2:15 PM)
2. **Strong counterfactual test**: Rollback immediately restored performance
3. **Clear mechanism**: Index scans (fast) → table scans (slow) with 50x-500x more I/O
4. **Dose-response**: Larger tables show greater degradation
5. **Consistency**: Pattern holds across all tables, queries, environments
6. **No confounders**: Traffic, hardware, code all controlled
7. **Bradford Hill 25/27**: Very strong causal evidence
8. **Reproducible**: Same effect in staging environment

**Why this is the root cause (not a symptom)**:
- **Missing indexes** is the fundamental issue
- **High CPU/disk I/O** is a symptom of table scans
- **Slow queries** are a symptom of high resource usage
- Fixing the missing indexes eliminates all downstream symptoms

**Causal Mechanism**:
```
Missing indexes (root)
    ↓
Query planner uses table scans (mechanism)
    ↓
50x-500x more disk I/O (mechanism)
    ↓
CPU & disk saturation (symptom)
    ↓
Query queuing, connection pool saturation (symptom)
    ↓
10x latency increase (observed effect)
```

---
### Alternative Explanations (Ruled Out)

#### Alternative 1: Traffic Spike

**Why less likely**:
- Traffic monitoring shows flat 10k req/min (no spike)
- Previous traffic spikes (15k req/min) were handled without issue
- Rollback fixed latency without changing traffic

#### Alternative 2: Hardware Degradation

**Why less likely**:
- Hardware metrics stable (no degradation)
- Sudden onset inconsistent with hardware failure (usually gradual)
- Rollback immediately fixed the issue (hardware didn't change)

#### Alternative 3: Application Code Bug

**Why less likely**:
- Code deploy 24 hours before degradation (timing mismatch)
- No query logic changes in the deploy
- Rollback of migration (not code) fixed the issue

---

### Unresolved Questions

1. **Why were the indexes removed?**
   - Migration script error? (likely)
   - Intentional optimization attempt gone wrong? (possible)
   - Need to review the migration PR and approval process

2. **How did this pass review?**
   - Were indexes intentionally removed or accidental?
   - Was the migration tested in staging before production?
   - Need process improvement

3. **Why didn't pre-deploy testing catch this?**
   - Staging environment testing missed this
   - Query performance tests insufficient
   - Need better pre-deploy validation

---

## 6. Recommended Actions

### Immediate Actions (Address Root Cause)

**1. Re-add missing indexes** (DONE - March 11, 9:00 AM)
```sql
CREATE INDEX idx_users_user_id_created_at
ON users(user_id, created_at);

CREATE INDEX idx_transactions_user_id
ON transactions(user_id);
```
- **Result**: Latency restored to 55ms (within 5ms of baseline)
- **Time to fix**: 15 minutes (index creation)
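
Worth noting for future incidents: plain `CREATE INDEX` takes a lock that blocks writes on the table for the duration of the build. Under live traffic, PostgreSQL's non-blocking variant may be preferable (a sketch; it cannot run inside a transaction block and builds more slowly):

```sql
-- Builds the index without blocking concurrent writes on users.
CREATE INDEX CONCURRENTLY idx_users_user_id_created_at
ON users(user_id, created_at);
```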

**2. Validate index coverage** (IN PROGRESS)
- Audit all tables for missing indexes (see the audit query below)
- Compare production indexes to staging/dev
- Document expected indexes per table
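
A minimal audit query for that first item, using PostgreSQL's built-in `pg_indexes` view (the `public` schema is an assumption):

```sql
-- List every index per table so production can be diffed against staging/dev.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'  -- assumed schema
ORDER BY tablename, indexname;
```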

### Preventive Actions (Process Improvements)

**1. Improve migration review process**
- **Require EXPLAIN ANALYZE before/after** for all migrations
- **Staging performance tests mandatory** (query latency benchmarks)
- **Index change review**: Any index drop requires extra approval

**2. Add pre-deploy validation**
- Automated query performance regression tests (sketched below)
- Alert if any query is >2x slower in staging
- Block deployment if performance degrades >20%
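
A sketch of what such a regression test could look like, assuming the `pg_stat_statements` extension is enabled and a hypothetical `baseline_stats` snapshot table was captured before the migration:

```sql
-- Flag any statement whose mean execution time more than doubled.
SELECT s.query,
       s.mean_exec_time AS current_ms,
       b.mean_exec_time AS baseline_ms
FROM pg_stat_statements s
JOIN baseline_stats b USING (queryid)  -- baseline_stats is hypothetical
WHERE s.mean_exec_time > 2 * b.mean_exec_time
ORDER BY s.mean_exec_time DESC;
```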

**3. Improve monitoring & alerting**
- Alert on index usage changes (track `pg_stat_user_indexes`; see the queries below)
- Alert on query plan changes (seq scan warnings)
- Dashboards for index hit rate, table scan frequency
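
Two query sketches for those alerts, using PostgreSQL's standard statistics views (thresholds are illustrative):

```sql
-- 1. Indexes that have never been scanned (candidates for a "usage changed" alert)
SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;

-- 2. Tables where sequential scans dominate (proxy for missing-index regressions)
SELECT relname,
       seq_scan,
       idx_scan,
       seq_scan::float / NULLIF(seq_scan + idx_scan, 0) AS seq_scan_ratio
FROM pg_stat_user_tables
ORDER BY seq_scan_ratio DESC NULLS LAST;
```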

**4. Database migration checklist**
- [ ] EXPLAIN ANALYZE on affected queries
- [ ] Staging performance tests passed
- [ ] Index usage reviewed
- [ ] Rollback plan documented
- [ ] Monitoring in place

### Validation Tests (Confirm Fix)

**1. Performance benchmark** (DONE)
- **Test**: Run 1000 queries pre-fix vs post-fix
- **Result**:
  - Pre-fix (migration): p95 = 500ms
  - Post-fix (indexes restored): p95 = 55ms
- **Conclusion**: ✓ Fix successful

**2. Load test** (DONE)
- **Test**: 15k req/min (1.5x normal traffic)
- **Result**: p95 = 60ms (acceptable, <10% degradation)
- **Conclusion**: ✓ System can handle load with indexes

**3. Index usage monitoring** (ONGOING)
- **Metrics**: `pg_stat_user_indexes` shows indexes being used
- **Query plans**: EXPLAIN shows index scans (not seq scans)
- **Conclusion**: ✓ Indexes actively used

---

### Success Criteria

**Performance restored**:
- [x] p95 latency <100ms (achieved: 55ms)
- [x] p99 latency <200ms (achieved: 120ms)
- [x] CPU usage <50% (achieved: 35%)
- [x] Disk I/O <500 IOPS (achieved: 150 IOPS)
- [x] Connection pool utilization <70% (achieved: 45%)

**User impact resolved**:
- [x] Timeout rate <1% (achieved: 0.2%)
- [x] Support tickets normalized (dropped 80%)
- [x] Page load times back to normal

**Process improvements**:
- [x] Migration checklist created
- [x] Performance regression tests added to CI/CD
- [ ] Post-mortem doc written (IN PROGRESS)
- [ ] Team training on index optimization (SCHEDULED)

---

## 7. Limitations

### Data Limitations

**Missing data**:
- **No query performance baselines in staging**: Can't compare staging pre/post migration
- **Limited historical index usage data**: `pg_stat_user_indexes` only has 7 days retention
- **No migration testing logs**: Can't determine if the migration was tested in staging

**Measurement limitations**:
- **Latency measured at the application layer**: Database-internal latency not tracked separately
- **No per-query latency breakdown**: Can't isolate which specific queries were most affected

### Analysis Limitations

**Assumptions**:
- **Assumed connection pool refresh time**: Estimated 5 min for connections to cycle to the new schema (not measured)
- **Didn't test other potential optimizations**: Only tested rollback, not alternative fixes (e.g., query rewriting)

**Alternative fixes not explored**:
- Could queries be rewritten to work without indexes? (possible but not investigated)
- Could the connection pool be increased? (wouldn't fix the root cause)

### Generalizability

**Context-specific**:
- This analysis applies to PostgreSQL databases with similar query patterns
- May not apply to other database systems (MySQL, MongoDB, etc.) with different query optimizers
- Specific to tables with millions of rows (small tables less affected)

**Lessons learned**:
- Index removal can cause 10x+ performance degradation for large tables
- Migration testing in staging must include performance benchmarks
- Rollback plans are essential for database schema changes

---

## 8. Meta: Analysis Quality Self-Assessment

Using `rubric_causal_inference_root_cause.json`:

### Scores:

1. **Effect Definition Clarity**: 5/5 (precise quantification, timeline, baseline, impact)
2. **Hypothesis Generation**: 5/5 (5 hypotheses, systematic evaluation)
3. **Root Cause Identification**: 5/5 (root vs proximate distinguished, causal chain clear)
4. **Causal Model Quality**: 5/5 (full chain, mechanisms, confounders noted)
5. **Temporal Sequence Verification**: 5/5 (detailed timeline, lag explained)
6. **Counterfactual Testing**: 5/5 (rollback experiment + control queries)
7. **Mechanism Explanation**: 5/5 (detailed mechanism with EXPLAIN output evidence)
8. **Confounding Control**: 4/5 (identified and ruled out major confounders, comprehensive)
9. **Evidence Quality & Strength**: 5/5 (quasi-experiment via rollback, Bradford Hill 25/27)
10. **Confidence & Limitations**: 5/5 (explicit confidence, limitations, alternatives evaluated)

**Average**: 4.9/5 - **Excellent** (publication-quality analysis)

**Assessment**: This root cause analysis exceeds standards for high-stakes engineering decisions. Strong evidence across all criteria, particularly counterfactual testing (rollback experiment) and mechanism explanation (query plans). Appropriate for postmortem documentation and process improvement decisions.

---

## Appendix: Supporting Evidence

### A. Query Plans Before/After

**BEFORE (with index)**:
```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE user_id = 123 AND created_at > '2024-01-01';

Index Scan using idx_users_user_id_created_at on users
  (cost=0.43..8.45 rows=1 width=152)
  (actual time=0.025..0.030 rows=1 loops=1)
  Index Cond: ((user_id = 123) AND (created_at > '2024-01-01'::date))
Planning Time: 0.112 ms
Execution Time: 0.052 ms  ← Fast
```

**AFTER (without index)**:
```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE user_id = 123 AND created_at > '2024-01-01';

Seq Scan on users
  (cost=0.00..112000.00 rows=5000000 width=152)
  (actual time=0.025..485.234 rows=1 loops=1)
  Filter: ((user_id = 123) AND (created_at > '2024-01-01'::date))
  Rows Removed by Filter: 4999999
Planning Time: 0.108 ms
Execution Time: 485.267 ms  ← ~9,300x slower
```

### B. Monitoring Metrics

**Latency (p95)**:
- March 9: 50ms (stable)
- March 10 14:00: 50ms (pre-migration)
- March 10 14:15: 500ms (post-migration) ← **10x jump**
- March 11 09:00: 550ms (still degraded)
- March 11 09:15: 55ms (rollback restored)

**Database CPU**:
- Baseline: 30%
- March 10 14:15: 80% ← Spike at migration time
- March 11 09:15: 35% (rollback restored)

**Disk I/O (IOPS)**:
- Baseline: 100 IOPS
- March 10 14:15: 5000 IOPS ← 50x increase
- March 11 09:15: 150 IOPS (rollback restored)
697
skills/causal-inference-root-cause/resources/methodology.md
Normal file
697
skills/causal-inference-root-cause/resources/methodology.md
Normal file
@@ -0,0 +1,697 @@

# Causal Inference & Root Cause Analysis Methodology

## Root Cause Analysis Workflow

Copy this checklist and track your progress:

```
Root Cause Analysis Progress:
- [ ] Step 1: Generate hypotheses using structured techniques
- [ ] Step 2: Build causal model and distinguish cause types
- [ ] Step 3: Gather evidence and assess temporal sequence
- [ ] Step 4: Test causality with Bradford Hill criteria
- [ ] Step 5: Verify root cause and check coherence
```

**Step 1: Generate hypotheses using structured techniques**

Use 5 Whys, Fishbone diagrams, timeline analysis, or differential diagnosis to systematically generate potential causes. See [Hypothesis Generation Techniques](#hypothesis-generation-techniques) for detailed methods.

**Step 2: Build causal model and distinguish cause types**

Map causal chains to distinguish root causes from proximate causes, symptoms, and confounders. See [Causal Model Building](#causal-model-building) for cause type definitions and chain mapping.

**Step 3: Gather evidence and assess temporal sequence**

Collect evidence for each hypothesis and verify temporal relationships (cause must precede effect). See [Evidence Assessment Methods](#evidence-assessment-methods) for essential tests and the evidence hierarchy.

**Step 4: Test causality with Bradford Hill criteria**

Score hypotheses using the 9 Bradford Hill criteria (strength, consistency, specificity, temporality, dose-response, plausibility, coherence, experiment, analogy). See [Bradford Hill Criteria](#bradford-hill-criteria) for the scoring rubric.

**Step 5: Verify root cause and check coherence**

Ensure the identified root cause has strong evidence, fits with known facts, and addresses systemic issues. See [Quality Checklist](#quality-checklist) and [Worked Example](#worked-example-website-conversion-drop) for validation techniques.

---

## Hypothesis Generation Techniques

### 5 Whys (Trace Back to Root)

Start with the effect and ask "why" repeatedly until you reach the root cause.

**Process:**
```
Effect: {Observed problem}
Why? → {Proximate cause 1}
Why? → {Proximate cause 2}
Why? → {Intermediate cause}
Why? → {Deeper cause}
Why? → {Root cause}
```

**Example:**
```
Effect: Server crashed
Why? → Out of memory
Why? → Memory leak in new code
Why? → Connection pool not releasing connections
Why? → Error handling missing in new feature
Why? → Code review process didn't catch this edge case (ROOT CAUSE)
```

**When to stop**: When you reach a cause that, if fixed, would prevent recurrence.

**Pitfalls to avoid**:
- Stopping too early at proximate causes
- Following only one path (you should explore multiple branches)
- Accepting vague answers ("because it broke")

---

### Fishbone Diagram (Categorize Causes)

Organize potential causes by category to ensure comprehensive coverage.

**Categories (6 Ms)**:
```
Methods         Machines        Materials
   |               |               |
   |               |               |
   └───────────────┴───────────────┴─────────→ Effect
   |               |               |
   |               |               |
Manpower       Measurement     Environment
```

**Adapted for software/product**:
- **People**: Skills, knowledge, communication, errors
- **Process**: Procedures, workflows, standards, review
- **Technology**: Tools, systems, infrastructure, dependencies
- **Environment**: External factors, timing, context

**Example (API outage)**:
- **People**: On-call engineer inexperienced with the database
- **Process**: No rollback procedure documented
- **Technology**: Database connection pooling misconfigured
- **Environment**: Traffic spike coincided with deploy

---

### Timeline Analysis (What Changed?)

Map events leading up to the effect to identify temporal correlations.

**Process**:
1. Establish baseline period (normal operation)
2. Mark when the effect was first observed
3. List all changes in the days/hours leading up to the effect
4. Identify changes that temporally precede the effect

**Example**:
```
T-7 days: Normal operation (baseline)
T-3 days: Deployed new payment processor integration
T-2 days: Traffic increased 20%
T-1 day:  First error logged (dismissed as fluke)
T-0 (14:00): Conversion rate dropped 30%
T+1 hour: Alert fired, investigation started
```

**Look for**: Changes, anomalies, or events right before the effect (temporal precedence).

**Red flags**:
- Multiple changes at once (hard to isolate cause)
- Long lag between change and effect (harder to connect)
- No changes before effect (may be external factor or accumulated technical debt)

---

### Differential Diagnosis

List all conditions/causes that could produce the observed symptoms.

**Process**:
1. List all possible causes (be comprehensive)
2. For each cause, list the symptoms it would produce
3. Compare predicted symptoms to observed symptoms
4. Eliminate causes that don't match

**Example (API returning 500 errors)**:

| Cause | Predicted Symptoms | Observed? |
|-------|-------------------|-----------|
| Code bug (logic error) | Specific endpoints fail, reproducible | ✓ Yes |
| Resource exhaustion (memory) | All endpoints slow, CPU/memory high | ✗ CPU normal |
| Dependency failure (database) | All database queries fail, connection errors | ✗ DB responsive |
| Configuration error | Service won't start or immediate failure | ✗ Started fine |
| DDoS attack | High traffic, distributed sources | ✗ Traffic normal |

**Conclusion**: Code bug most likely (matches symptoms, others ruled out).

---
## Causal Model Building

### Distinguish Cause Types

#### 1. Root Cause
- **Definition**: Fundamental underlying issue
- **Test**: If fixed, problem doesn't recur
- **Usually**: Structural, procedural, or systemic
- **Example**: "No code review process for infrastructure changes"

#### 2. Proximate Cause
- **Definition**: Immediate trigger
- **Test**: Directly precedes effect
- **Often**: Symptom of deeper root cause
- **Example**: "Deployed buggy config file"

#### 3. Contributing Factor
- **Definition**: Makes effect more likely or severe
- **Test**: Not sufficient alone, but amplifies
- **Often**: Context or conditions
- **Example**: "High traffic amplified impact of bug"

#### 4. Coincidence
- **Definition**: Happened around the same time, no causal relationship
- **Test**: No mechanism, doesn't pass counterfactual
- **Example**: "Marketing campaign launched same day" (but unrelated to technical issue)

---

### Map Causal Chains

#### Linear Chain
```
Root Cause
    ↓ (mechanism)
Intermediate Cause
    ↓ (mechanism)
Proximate Cause
    ↓ (mechanism)
Observed Effect
```

**Example**:
```
No SSL cert renewal process (Root)
    ↓ (no automated reminders)
Cert expires unnoticed (Intermediate)
    ↓ (HTTPS handshake fails)
HTTPS connections fail (Proximate)
    ↓ (users can't connect)
Users can't access site (Effect)
```

#### Branching Chain (Multiple Paths)
```
      Root Cause
       ↙      ↘
   Path A    Path B
      ↓         ↓
  Effect A   Effect B
```

**Example**:
```
  Database migration error (Root)
        ↙              ↘
Missing indexes    Schema mismatch
      ↓                  ↓
Slow queries        Type errors
```

#### Common Cause (Confounding)
```
     Confounder (Z)
       ↙       ↘
Variable X   Variable Y
```
X and Y correlate because Z causes both, but X doesn't cause Y.

**Example**:
```
       Hot weather (Confounder)
         ↙           ↘
Ice cream sales    Drownings
```
Ice cream doesn't cause drownings; both are caused by hot weather/summer season.

---
## Evidence Assessment Methods

### Essential Tests

#### 1. Temporal Sequence (Must Pass)
**Rule**: Cause MUST precede effect. If the effect happened before the "cause," it's not causal.

**How to verify**:
- Create a detailed timeline
- Mark exact timestamps of cause and effect
- Calculate the lag (effect should follow cause)

**Example**:
- Migration deployed: March 10, 2:00 PM
- Queries slowed: March 10, 2:15 PM
- ✓ Cause precedes effect by 15 minutes

#### 2. Counterfactual (Strong Evidence)
**Question**: "What would have happened without the cause?"

**How to test**:
- **Control group**: Compare cases with the cause vs without
- **Rollback**: Remove the cause, see if the effect disappears
- **Baseline comparison**: Compare to the period before the cause
- **A/B test**: Random assignment of the cause

**Example**:
- Hypothesis: New feature X causes conversion drop
- Test: A/B test with X enabled vs disabled
- Result: No conversion difference → X not the cause

#### 3. Mechanism (Plausibility Test)
**Question**: Can you explain HOW the cause produces the effect?

**Requirements**:
- Pathway from cause to effect
- Each step makes sense
- Supported by theory or evidence

**Example - Strong**:
- Cause: Increased checkout latency (5 sec)
- Mechanism: Users abandon slow pages
- Evidence: Industry data shows 7+ sec → 20% abandon
- ✓ Clear, plausible mechanism

**Example - Weak**:
- Cause: Moon phase (full moon)
- Mechanism: ??? (no plausible pathway)
- Effect: Website traffic increase
- ✗ No mechanism, likely spurious correlation

---

### Evidence Hierarchy

**Strongest Evidence** (Gold Standard):

1. **Randomized Controlled Trial (RCT)**
   - Random assignment eliminates confounding
   - Compare treatment vs control group
   - Example: A/B test with random user assignment

**Strong Evidence**:

2. **Quasi-Experiment / Natural Experiment**
   - Some random-like assignment (not perfect)
   - Example: Policy implemented in one region, not another
   - Control for confounders statistically

**Medium Evidence**:

3. **Longitudinal Study (Before/After)**
   - Track the same subjects over time
   - Controls for individual differences
   - Example: Metric before and after a change

4. **Well-Controlled Observational Study**
   - Statistical controls for confounders
   - Multiple variables measured
   - Example: Regression analysis with covariates

**Weak Evidence**:

5. **Cross-Sectional Correlation**
   - Single point in time
   - Can't establish temporal order
   - Example: Survey at one moment

6. **Anecdote / Single Case**
   - May not generalize
   - Many confounders
   - Example: One user complaint

**Recommendation**: For high-stakes decisions, aim for a quasi-experiment or better.

---
### Bradford Hill Criteria

Nine factors that strengthen causal claims. Score each on a 1-3 scale.

#### 1. Strength
**Question**: How strong is the association?

- **3**: Very strong correlation (r > 0.7, OR > 10)
- **2**: Moderate correlation (r = 0.3-0.7, OR = 2-10)
- **1**: Weak correlation (r < 0.3, OR < 2)

**Example**: 10x latency increase = strong association (3/3)

#### 2. Consistency
**Question**: Does the relationship replicate across contexts?

- **3**: Always consistent (multiple studies, settings)
- **2**: Mostly consistent (some exceptions)
- **1**: Rarely consistent (contradictory results)

**Example**: Latency affects conversion on all pages, devices, countries = consistent (3/3)

#### 3. Specificity
**Question**: Is the cause-effect relationship specific?

- **3**: Specific cause → specific effect
- **2**: Somewhat specific (some other effects)
- **1**: General/non-specific (cause has many effects)

**Example**: Missing index affects only queries on that table = specific (3/3)

#### 4. Temporality
**Question**: Does the cause precede the effect?

- **3**: Always (clear temporal sequence)
- **2**: Usually (mostly precedes)
- **1**: Unclear or reverse causation possible

**Example**: Migration (2:00 PM) before slow queries (2:15 PM) = always (3/3)

#### 5. Dose-Response
**Question**: Does more cause → more effect?

- **3**: Clear gradient (linear or monotonic)
- **2**: Some gradient (mostly true)
- **1**: No gradient (flat or random)

**Example**: Larger tables → slower queries = clear gradient (3/3)

#### 6. Plausibility
**Question**: Does the mechanism make sense?

- **3**: Very plausible (well-established theory)
- **2**: Somewhat plausible (reasonable)
- **1**: Implausible (contradicts theory)

**Example**: Index scans faster than table scans = well-established (3/3)

#### 7. Coherence
**Question**: Does it fit with existing knowledge?

- **3**: Fits well (no contradictions)
- **2**: Somewhat fits (minor contradictions)
- **1**: Contradicts existing knowledge

**Example**: Aligns with database optimization theory = fits well (3/3)

#### 8. Experiment
**Question**: Does intervention change the outcome?

- **3**: Strong experimental evidence (RCT)
- **2**: Some experimental evidence (quasi)
- **1**: No experimental evidence (observational only)

**Example**: Rollback restored performance = strong experiment (3/3)

#### 9. Analogy
**Question**: Do similar cause-effect patterns exist?

- **3**: Strong analogies (many similar cases)
- **2**: Some analogies (a few similar)
- **1**: No analogies (unique case)

**Example**: Similar patterns in other databases = some analogies (2/3)

**Scoring**:
- **Total 18-27**: Strong causal evidence
- **Total 12-17**: Moderate evidence
- **Total < 12**: Weak evidence (need more investigation)

---
## Worked Example: Website Conversion Drop

### Problem

Website conversion rate dropped from 5% to 3% (a 40% relative drop) starting November 15th.

---

### 1. Effect Definition

**Effect**: Conversion rate dropped 40% (5% → 3%)

**Quantification**:
- Baseline: 5% (stable for 6 months prior)
- Current: 3% (observed for 2 weeks)
- Absolute drop: 2 percentage points
- Relative drop: 40%
- Impact: ~$100k/week revenue loss

**Timeline**:
- Last normal day: November 14th (5.1% conversion)
- First drop observed: November 15th (3.2% conversion)
- Ongoing: Yes (still at 3% as of November 29th)

**Impact**:
- 10,000 daily visitors
- 500 conversions → 300 conversions
- $200 average order value
- Loss: 200 conversions × $200 = $40k/day

---

### 2. Competing Hypotheses (Using Multiple Techniques)

#### Using 5 Whys:
```
Effect: Conversion dropped 40%
Why? → Users abandoning checkout
Why? → Checkout takes too long
Why? → Payment processor is slow
Why? → New processor has higher latency
Why? → Switched to cheaper processor (ROOT)
```

#### Using Fishbone Diagram:

**Technology**:
- New payment processor (Nov 10)
- New checkout UI (Nov 14)

**Process**:
- Lack of A/B testing
- No performance monitoring

**Environment**:
- Seasonal traffic shift (holiday season)

**People**:
- User behavior changes?

#### Using Timeline Analysis:
```
Nov 10: Switched to new payment processor
Nov 14: Deployed new checkout UI
Nov 15: Conversion drop observed (2:00 AM)
```

#### Competing Hypotheses Generated:

**Hypothesis 1: New checkout UI** (deployed Nov 14)
- Type: Proximate cause
- Evidence for: Timing matches (Nov 14 → Nov 15)
- Evidence against: A/B test showed no difference; rollback didn't fix

**Hypothesis 2: Payment processor switch** (Nov 10)
- Type: Root cause candidate
- Evidence for: New processor slower (400ms vs 100ms); timing precedes
- Evidence against: 5-day lag (why not immediate?)

**Hypothesis 3: Payment latency increase** (Nov 15)
- Type: Proximate cause/symptom
- Evidence for: Logs show 5-8 sec checkout (was 2-3 sec); user complaints
- Evidence against: None (strong evidence)

**Hypothesis 4: Seasonal traffic shift**
- Type: Confounder
- Evidence for: Holiday season
- Evidence against: Traffic composition unchanged

---

### 3. Causal Model

#### Causal Chain:
```
ROOT: Switched to cheaper payment processor (Nov 10)
    ↓ (mechanism: lower throughput processor)
INTERMEDIATE: Payment API latency under load
    ↓ (mechanism: slow API → page delays)
PROXIMATE: Checkout takes 5-8 seconds (Nov 15+)
    ↓ (mechanism: users abandon slow pages)
EFFECT: Conversion drops 40%
```

#### Why the Nov 15 lag?
- Nov 10-14: Weekday traffic (low, processor handled it)
- Nov 15: Weekend traffic spike (2x normal)
- Processor couldn't handle the weekend load → latency spike

#### Confounders Ruled Out:
- UI change: Rollback had no effect
- Seasonality: Traffic patterns unchanged
- Marketing: No changes

---

### 4. Evidence Assessment

**Temporal Sequence**: ✓ PASS (3/3)
- Processor switch (Nov 10) → Latency (Nov 15) → Drop (Nov 15)
- Clear precedence

**Counterfactual**: ✓ PASS (3/3)
- Test: Switched back to the old processor
- Result: Conversion recovered to 4.8%
- Strong evidence

**Mechanism**: ✓ PASS (3/3)
- Slow processor (400ms) → High load → 5-8 sec checkout → User abandonment
- Industry data: 7+ sec = 20% abandon
- Clear, plausible mechanism

**Dose-Response**: ✓ PASS (3/3)

| Checkout Time | Conversion |
|---------------|------------|
| <3 sec | 5% |
| 3-5 sec | 4% |
| 5-7 sec | 3% |
| >7 sec | 2% |

Clear gradient (see the bucketing sketch below)
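
One way to produce such a table, as a hedged sketch against a hypothetical `sessions(checkout_seconds, converted)` event log (the schema is assumed, not the example's actual data model):

```sql
-- Bucket sessions by checkout duration and compute conversion per bucket.
SELECT width_bucket(checkout_seconds, 0, 10, 5) AS duration_bucket,
       round(avg(converted::int) * 100, 1)      AS conversion_pct,
       count(*)                                 AS sessions
FROM sessions
GROUP BY duration_bucket
ORDER BY duration_bucket;
```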

**Consistency**: ✓ PASS (3/3)
- All devices (mobile, desktop, tablet)
- All payment methods
- All countries
- Consistent pattern

**Bradford Hill Score: 24/27** (Very Strong)
1. Strength: 3 (40% drop)
2. Consistency: 3 (all segments)
3. Specificity: 2 (latency affects other things)
4. Temporality: 3 (clear sequence)
5. Dose-response: 3 (gradient)
6. Plausibility: 3 (well-known)
7. Coherence: 3 (fits theory)
8. Experiment: 3 (rollback test)
9. Analogy: 2 (some similar cases)

---

### 5. Conclusion

**Root Cause**: Switched to a cheaper payment processor with insufficient throughput for weekend traffic.

**Confidence**: High (95%+)

**Why high confidence**:
- Perfect temporal sequence
- Strong counterfactual (rollback restored)
- Clear mechanism
- Dose-response present
- Consistent across contexts
- No confounders
- Bradford Hill 24/27

**Why root (not symptom)**:
- The processor switch is the decision that created the problem
- Latency is a symptom of this decision
- Fixing latency alone treats the symptom
- Reverting the switch eliminates the problem

---

### 6. Recommended Actions

**Immediate**:
1. ✓ Reverted to old processor (Nov 28)
2. Negotiate better rates with the old processor

**If keeping the new processor**:
1. Add a caching layer (reduce latency <2 sec)
2. Implement async checkout
3. Load test at 3x peak

**Validation**:
- A/B test: Old processor vs new+caching
- Monitor: Latency p95, conversion rate
- Success: Conversion >4.5%, latency <2 sec

---

### 7. Limitations

**Data limitations**:
- No abandonment reason tracking
- Processor metrics limited (black box)

**Analysis limitations**:
- Didn't test all latency optimizations
- Small sample for some device types

**Generalizability**:
- Specific to our checkout flow
- May not apply to other traffic patterns

---

## Quality Checklist

Before finalizing a root cause analysis, verify:

**Effect Definition**:
- [ ] Effect clearly described (not vague)
- [ ] Quantified with magnitude
- [ ] Timeline established (when started, duration)
- [ ] Baseline for comparison stated

**Hypothesis Generation**:
- [ ] Multiple hypotheses generated (3+ competing)
- [ ] Used systematic techniques (5 Whys, Fishbone, Timeline, Differential)
- [ ] Proximate vs root distinguished
- [ ] Confounders identified

**Causal Model**:
- [ ] Causal chain mapped (root → proximate → effect)
- [ ] Mechanisms explained for each link
- [ ] Confounding relationships noted
- [ ] Direct vs indirect causation clear

**Evidence Assessment**:
- [ ] Temporal sequence verified (cause before effect)
- [ ] Counterfactual tested (what if cause absent)
- [ ] Mechanism explained with supporting evidence
- [ ] Dose-response checked (more cause → more effect)
- [ ] Consistency assessed (holds across contexts)
- [ ] Confounders ruled out or controlled

**Conclusion**:
- [ ] Root cause clearly stated
- [ ] Confidence level stated with justification
- [ ] Distinguished from proximate causes/symptoms
- [ ] Alternative explanations acknowledged
- [ ] Limitations noted

**Recommendations**:
- [ ] Address root cause (not just symptoms)
- [ ] Actionable interventions proposed
- [ ] Validation tests specified
- [ ] Success metrics defined

**Minimum Standards**:
- For medium-stakes (postmortems, feature issues): Score ≥ 3.5 on rubric
- For high-stakes (infrastructure, safety): Score ≥ 4.0 on rubric
- Red flags: <3 on Temporal Sequence, Counterfactual, or Mechanism = weak causal claim
265
skills/causal-inference-root-cause/resources/template.md
Normal file
265
skills/causal-inference-root-cause/resources/template.md
Normal file
@@ -0,0 +1,265 @@

# Causal Inference & Root Cause Analysis Template

## Workflow

Copy this checklist and track your progress:

```
Root Cause Analysis Progress:
- [ ] Step 1: Define effect with quantification and timeline
- [ ] Step 2: Generate competing hypotheses systematically
- [ ] Step 3: Build causal model with mechanisms
- [ ] Step 4: Assess evidence using essential tests
- [ ] Step 5: Document conclusion with confidence level
```

**Step 1: Define effect with quantification and timeline**

Describe what happened specifically, quantify magnitude (baseline vs current, change), establish timeline (first observed, duration, context), and identify impact (who's affected, severity, business impact). Use [Quick Template](#quick-template) section 1.

**Step 2: Generate competing hypotheses systematically**

List 3-5 potential causes, classify each as root/proximate/symptom, note evidence for/against each hypothesis, and identify potential confounders (third factors causing both X and Y). Use techniques from [Section-by-Section Guidance](#section-by-section-guidance): 5 Whys, Fishbone Diagram, Timeline Analysis.

**Step 3: Build causal model with mechanisms**

Draw the causal chain (Root → Intermediate → Proximate → Effect) with mechanisms explaining HOW each step leads to the next, and rule out confounders (distinguish Z causing both X and Y from X causing Y). See the [Causal Model](#3-causal-model) template structure.

**Step 4: Assess evidence using essential tests**

Check temporal sequence (cause before effect?), test the counterfactual (what if cause absent?), explain the mechanism (HOW does it work?), look for dose-response (more cause → more effect?), verify consistency (holds across contexts?), and control for confounding. See [Evidence Assessment](#4-evidence-assessment) and [Section-by-Section Guidance](#4-evidence-assessment-1).

**Step 5: Document conclusion with confidence level**

State the root cause with a confidence level (High/Medium/Low) and justification, explain the complete causal mechanism, clarify why this is the root (not a symptom), note alternative explanations and why they are less likely, recommend interventions addressing the root cause, propose validation tests, and acknowledge limitations. Use the [Quality Checklist](#quality-checklist) to verify completeness.

## Quick Template

```markdown
# Root Cause Analysis: {Effect/Problem}

## 1. Effect Definition

**What happened**: {Clear description of the effect}

**Quantification**: {Magnitude, frequency, impact}
- Baseline: {Normal state}
- Current: {Observed state}
- Change: {Absolute and relative change}

**Timeline**:
- First observed: {Date/time}
- Duration: {Ongoing/resolved}
- Context: {What else was happening}

**Impact**:
- Who's affected: {Users, systems, metrics}
- Severity: {High/Medium/Low}
- Business impact: {Revenue, users, reputation}

---

## 2. Competing Hypotheses

Generate 3-5 potential causes systematically:

### Hypothesis 1: {Potential cause}
- **Type**: Root cause / Proximate cause / Symptom
- **Evidence for**: {Supporting data, observations}
- **Evidence against**: {Contradicting data}

### Hypothesis 2: {Potential cause}
- **Type**: Root cause / Proximate cause / Symptom
- **Evidence for**: {Supporting data}
- **Evidence against**: {Contradicting data}

### Hypothesis 3: {Potential cause}
- **Type**: Root cause / Proximate cause / Symptom
- **Evidence for**: {Supporting data}
- **Evidence against**: {Contradicting data}

**Potential confounders**: {Third factors that might cause both X and Y}

---

## 3. Causal Model

### Causal Chain (Root → Proximate → Effect)

```
{Root Cause}
    ↓ {mechanism: how does the root cause lead to the next step?}
{Intermediate Cause}
    ↓ {mechanism: how does this lead to the proximate cause?}
{Proximate Cause}
    ↓ {mechanism: how does this produce the observed effect?}
{Observed Effect}
```

**Mechanisms**: {Explain HOW each step leads to the next}

**Confounders ruled out**: {Z causes both X and Y, but X doesn't cause Y}

---

## 4. Evidence Assessment

### Temporal Sequence
- [ ] Does cause precede effect? {Yes/No}
- [ ] Timeline: {Cause at time T, effect at time T+lag}

### Counterfactual Test
- [ ] "What if the cause hadn't occurred?" → {Expected outcome}
- [ ] Evidence: {Control group, rollback, baseline comparison}

### Mechanism
- [ ] Can we explain HOW the cause produces the effect? {Yes/No}
- [ ] Mechanism: {Detailed explanation with supporting evidence}

### Dose-Response
- [ ] More cause → more effect? {Yes/No/Unknown}
- [ ] Evidence: {Examples of gradient}

### Consistency
- [ ] Does the relationship hold across contexts? {Yes/No}
- [ ] Evidence: {Different times, places, populations}

### Confounding Control
- [ ] Identified confounders: {List third variables}
- [ ] Controlled for: {How confounders were ruled out}

---

## 5. Conclusion

### Most Likely Root Cause

**Root cause**: {Identified root cause}

**Confidence level**: {High/Medium/Low}

**Justification**: {Why this confidence level based on evidence}

**Causal mechanism**: {Complete chain from root cause → effect}

**Why this is the root cause (not a symptom)**: {If fixed, the problem wouldn't recur}

### Alternative Explanations

**Alternative 1**: {Other possible cause}
- Why less likely: {Evidence against}

**Alternative 2**: {Other possible cause}
- Why less likely: {Evidence against}

**Unresolved uncertainties**: {What we still don't know}

---

## 6. Recommended Actions

### Immediate Interventions (Address Root Cause)
1. {Action to eliminate root cause}
2. {Action to prevent recurrence}

### Tests to Validate Causal Claim
1. {Experiment to confirm causation}
2. {Observable prediction if theory is correct}

### Monitoring
- Metrics to track: {Key indicators}
- Expected change: {What should happen if intervention works}
- Timeline: {When to expect results}

---

## 7. Limitations

**Data limitations**: {Missing data, measurement issues}

**Analysis limitations**: {Confounders not controlled, temporal gaps, sample size}

**Generalizability**: {Does this apply beyond this specific case?}
```

---

## Section-by-Section Guidance

### 1. Effect Definition
**Goal**: Precisely specify what you're explaining
- Be specific (not "system slow" but "p95 latency 50ms → 500ms")
- Quantify magnitude and timeline
- Establish a baseline for comparison

### 2. Competing Hypotheses
**Goal**: Generate multiple explanations (avoid confirmation bias)
- Use techniques: 5 Whys, Fishbone Diagram, Timeline Analysis (see `methodology.md`)
- Distinguish: Root cause (fundamental) vs Proximate cause (immediate trigger) vs Symptom
- Consider confounders (third variables causing both X and Y)

### 3. Causal Model
**Goal**: Map how causes lead to the effect
- Show causal chains with mechanisms (not just correlation)
- Identify confounding relationships
- Distinguish direct vs indirect causation

### 4. Evidence Assessment
**Goal**: Test whether X truly causes Y (not just correlates)

**Essential tests:**
- **Temporal sequence**: Cause must precede effect
- **Counterfactual**: What happens when the cause is absent?
- **Mechanism**: HOW does the cause produce the effect?

**Strengthening evidence:**
- **Dose-response**: More cause → more effect?
- **Consistency**: Holds across contexts?
- **Confounding control**: Ruled out third variables?

### 5. Conclusion
**Goal**: State the root cause with justified confidence
- Distinguish root from proximate causes
- State confidence level (High/Medium/Low) with reasoning
- Acknowledge alternative explanations and limitations

### 6. Recommendations
**Goal**: Actionable interventions based on the root cause
- Address the root cause (not just symptoms)
- Propose tests to validate the causal claim
- Define success metrics

### 7. Limitations
**Goal**: Acknowledge uncertainties and boundaries
- Missing data or measurement issues
- Confounders not fully controlled
- Scope of generalizability

---

## Quick Reference

**For detailed techniques**: See `methodology.md`
- 5 Whys (trace back to root)
- Fishbone Diagram (categorize causes)
- Timeline Analysis (what changed when?)
- Bradford Hill Criteria (9 factors for causation)
- Evidence hierarchy (RCT > longitudinal > cross-sectional)
- Confounding control methods

**For a complete example**: See `examples/database-performance-degradation.md`
- Shows a full analysis with all sections
- Demonstrates evidence assessment
- Includes Bradford Hill scoring

**Quality checklist**: Before finalizing, verify:
- [ ] Effect clearly defined with quantification
- [ ] Multiple hypotheses generated (3+ competing explanations)
- [ ] Root cause distinguished from proximate/symptoms
- [ ] Temporal sequence verified (cause before effect)
- [ ] Counterfactual tested (what if cause absent?)
- [ ] Mechanism explained (HOW, not just THAT)
- [ ] Confounders identified and controlled
- [ ] Confidence level stated with justification
- [ ] Alternative explanations noted
- [ ] Limitations acknowledged
176
skills/chain-estimation-decision-storytelling/SKILL.md
Normal file
@@ -0,0 +1,176 @@
---
name: chain-estimation-decision-storytelling
description: Use when making high-stakes decisions under uncertainty that require stakeholder buy-in. Invoke when evaluating strategic options (build vs buy, market entry, resource allocation), quantifying tradeoffs with uncertain outcomes, justifying investments with expected value analysis, pitching recommendations to decision-makers, or creating business cases with cost-benefit estimates. Use when user mentions "should we", "ROI analysis", "make a case for", "evaluate options", "expected value", "justify decision", or needs to combine estimation, decision analysis, and persuasive communication.
---

# Chain Estimation → Decision → Storytelling

## Table of Contents

- [Purpose](#purpose)
- [When to Use This Skill](#when-to-use-this-skill)
- [What is Chain Estimation → Decision → Storytelling?](#what-is-chain-estimation--decision--storytelling)
- [Workflow](#workflow)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

Systematically quantify uncertain choices, make defensible decisions using expected value analysis, and communicate recommendations through persuasive narratives. This meta-skill chains estimation → decision → storytelling to transform ambiguous options into clear, stakeholder-ready recommendations.

## When to Use This Skill

- Evaluating strategic options with uncertain outcomes (build vs buy, market entry, product investment)
- Creating business cases for resource allocation or budget approval
- Justifying technical decisions with cost-benefit analysis (architecture, tooling, infrastructure)
- Pitching recommendations to executives or board with quantified tradeoffs
- Making investment decisions with ROI projections and risk assessment
- Prioritizing initiatives with expected value comparison
- Evaluating partnerships, acquisitions, or major contracts
- Designing pricing strategies with revenue/cost modeling
- Resource planning with capacity and utilization estimates
- Risk mitigation decisions with probability-weighted outcomes
- Product roadmap decisions with effort/impact estimates
- Organizational change decisions (hiring, restructuring, policy)
- Technology adoption with TCO and benefit quantification
- Market positioning decisions with competitive analysis
- Portfolio management with probability-adjusted returns

**Trigger phrases:** "should we", "evaluate options", "make a case for", "ROI analysis", "expected value", "justify decision", "quantify tradeoffs", "pitch to", "business case", "cost-benefit", "probability-weighted"

## What is Chain Estimation → Decision → Storytelling?

A three-phase meta-skill that combines:

1. **Estimation**: Quantify uncertain variables with ranges, probabilities, and sensitivity analysis
2. **Decision**: Apply expected value, decision trees, or scoring to identify best option
3. **Storytelling**: Package analysis into compelling narrative for stakeholders

**Quick Example:**

```markdown
# Should we build custom analytics or buy a SaaS tool?

## Estimation
Build custom: $200k-$400k dev cost (60% likely $300k), $50k/year maintenance
Buy SaaS: $120k/year subscription, $20k implementation

## Decision
Expected 3-year cost:
- Build: $300k + (3 × $50k) = $450k
- Buy: $20k + (3 × $120k) = $380k
- Difference: $70k savings with Buy

Expected value with risk adjustment:
- Build: 30% chance of a 2x dev-cost overrun → 0.7 × $450k + 0.3 × $750k = $540k expected
- Buy: 95% confidence in pricing → $380k expected
- Recommendation: Buy (lower cost, lower risk)

## Story
"We evaluated building custom analytics vs. buying a SaaS solution. While the base estimates look close ($450k vs. $380k over 3 years), custom development carries significant risk—30% of similar projects experience 2x cost overruns, bringing expected cost to $540k. The SaaS solution offers predictable pricing, faster time-to-value (2 months vs. 8 months), and proven reliability. Recommendation: Buy the SaaS tool, saving $160k in expected costs and delivering value 6 months earlier."
```
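The arithmetic in the example is easy to sanity-check in code. A minimal sketch in plain Python, using the example's illustrative numbers (not real quotes):

```python
# Expected 3-year cost for each option, using the example's illustrative numbers.
YEARS = 3

def build_cost(dev: float, annual_maintenance: float) -> float:
    """Total 3-year cost of building in-house."""
    return dev + YEARS * annual_maintenance

def buy_cost(implementation: float, annual_subscription: float) -> float:
    """Total 3-year cost of buying SaaS."""
    return implementation + YEARS * annual_subscription

base_build = build_cost(dev=300_000, annual_maintenance=50_000)          # $450k
base_buy = buy_cost(implementation=20_000, annual_subscription=120_000)  # $380k

# Risk adjustment: 30% chance the dev cost doubles (overrun scenario).
overrun_build = build_cost(dev=600_000, annual_maintenance=50_000)       # $750k
expected_build = 0.7 * base_build + 0.3 * overrun_build                  # $540k

print(f"Build (base):     ${base_build:,.0f}")
print(f"Build (expected): ${expected_build:,.0f}")
print(f"Buy  (expected):  ${base_buy:,.0f}")
print(f"Expected savings with Buy: ${expected_build - base_buy:,.0f}")   # $160k
```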
## Workflow

Copy this checklist and track your progress:

```
Chain Estimation → Decision → Storytelling Progress:
- [ ] Step 1: Clarify decision and gather inputs
- [ ] Step 2: Estimate uncertain variables
- [ ] Step 3: Analyze decision with expected value
- [ ] Step 4: Craft persuasive narrative
- [ ] Step 5: Validate and deliver
```

**Step 1: Clarify decision and gather inputs**

Define the choice (what decision needs to be made?), identify alternatives (2-5 options to compare), list uncertainties (what variables are unknown or probabilistic?), determine audience (who needs to be convinced?), and clarify constraints (budget, timeline, requirements). Ensure the decision is actionable and the options are mutually exclusive.

**Step 2: Estimate uncertain variables**

For each alternative, quantify costs (fixed, variable, opportunity), estimate benefits (revenue, savings, productivity), assign probabilities to scenarios (best case, base case, worst case), and perform sensitivity analysis (which inputs matter most?). Use ranges rather than point estimates. For simple cases → Use `resources/template.md` for structured estimation. For complex cases → Study `resources/methodology.md` for advanced techniques (Monte Carlo, decision trees, real options).
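One lightweight way to turn min / most-likely / max guesses into ranges is a three-point (PERT-style) mean plus a sampled spread. A sketch, assuming triangular inputs; all numbers are placeholders:

```python
import random

def pert_mean(low: float, mode: float, high: float) -> float:
    """Classic PERT weighted mean: the mode counts 4x the extremes."""
    return (low + 4 * mode + high) / 6

def sample_triangular(low: float, mode: float, high: float, n: int = 10_000):
    """Draw n samples from a triangular distribution over the estimate range."""
    return [random.triangular(low, high, mode) for _ in range(n)]

# Example: development cost estimated at $200k-$400k, most likely $300k.
low, mode, high = 200_000, 300_000, 400_000
print(f"PERT mean: ${pert_mean(low, mode, high):,.0f}")

samples = sorted(sample_triangular(low, mode, high))
p10, p90 = samples[len(samples) // 10], samples[9 * len(samples) // 10]
print(f"P10-P90 range: ${p10:,.0f} - ${p90:,.0f}")
```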
**Step 3: Analyze decision with expected value**

Calculate expected outcomes for each alternative (probability-weighted averages), compare using decision criteria (NPV, payback period, IRR, utility), identify the dominant option (best expected value or risk-adjusted return), and test robustness (does the conclusion hold across reasonable input ranges?). Document assumptions explicitly. See [Common Patterns](#common-patterns) for decision-type specific approaches.

**Step 4: Craft persuasive narrative**

Structure the story with: problem statement (why this decision matters), alternatives considered (show you did the work), analysis summary (key numbers and logic), recommendation (clear choice with reasoning), next steps (what happens if approved). Tailor to audience: executives want the bottom line and risks, technical teams want methodology and assumptions, finance wants numbers and sensitivity.

**Step 5: Validate and deliver**

Self-check using `resources/evaluators/rubric_chain_estimation_decision_storytelling.json`, as in the sketch below. Verify: estimates are justified with sources/logic, probabilities are calibrated (not overconfident), the expected value calculation is correct, sensitivity analysis identifies key drivers, the narrative is clear and persuasive, assumptions are stated explicitly, and risks and limitations are acknowledged. Minimum standard: Score ≥ 3.5. Create a `chain-estimation-decision-storytelling.md` output file with the full analysis and recommendation.
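The rubric self-check is easy to automate once each criterion is scored. A minimal sketch (criterion names mirror the rubric JSON; the scores dict is a hypothetical self-assessment):

```python
# Hypothetical self-assessment against the rubric's 1-5 scales.
scores = {
    "Estimation Quality": 4,
    "Probability Calibration": 4,
    "Decision Analysis Rigor": 5,
    "Sensitivity Analysis": 5,
    "Alternative Comparison": 4,
    "Assumption Transparency": 5,
    "Narrative Clarity": 4,
    "Audience Tailoring": 4,
    "Risk Acknowledgment": 5,
    "Actionability": 5,
}

MINIMUM_STANDARD = 3.5  # from the rubric; raise to 4.0 for high-stakes decisions

average = sum(scores.values()) / len(scores)
weakest = min(scores, key=scores.get)
print(f"Average score: {average:.2f} (minimum: {MINIMUM_STANDARD})")
print(f"Weakest criterion: {weakest} ({scores[weakest]})")
assert average >= MINIMUM_STANDARD, "Below standard - revise before delivering"
```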
## Common Patterns

**For build vs buy decisions:**
- Estimate: Development cost (effort × rate), maintenance cost, SaaS subscription, implementation cost
- Decision: 3-5 year TCO, risk-adjusted for schedule overruns and feature gaps
- Story: "Build gives us control but costs $X more and takes Y months longer..."

**For market entry decisions:**
- Estimate: TAM/SAM/SOM, CAC, LTV, time-to-profitability
- Decision: Expected NPV with market uncertainty (optimistic/pessimistic scenarios)
- Story: "If we enter now, base case is $X revenue by year 3, but if market adoption is slower..."

**For resource allocation:**
- Estimate: Cost per initiative, expected impact (revenue, cost savings, strategic value)
- Decision: Impact/effort scoring or expected value ranking
- Story: "Given $X budget, these 3 initiatives deliver $Y expected return vs. $Z for alternatives..."

**For technology decisions:**
- Estimate: Migration cost, operational cost, performance improvement, risk reduction
- Decision: TCO over 3-5 years plus risk-adjusted benefits
- Story: "Migrating to X costs $Y upfront but saves $Z annually and reduces outage risk from..."

**For hiring/staffing decisions:**
- Estimate: Compensation, recruiting cost, ramp time, productivity impact
- Decision: Cost per incremental output vs. alternatives (contractors, vendors, automation)
- Story: "Adding 3 engineers at $X cost delivers $Y additional capacity, enabling..."

## Guardrails

**Do:**
- Use ranges for uncertain estimates (not false precision)
- Assign probabilities based on data or explicit reasoning
- Calculate expected value correctly (probability-weighted outcomes)
- Perform sensitivity analysis (test assumptions)
- State assumptions explicitly
- Acknowledge risks and limitations
- Tailor narrative to audience (exec vs technical vs finance)
- Include "what would change my mind" conditions
- Show your work (transparent methodology)
- Test robustness (does conclusion hold with different assumptions?)

**Don't:**
- Use single-point estimates for highly uncertain variables
- Claim false precision ("$347,291" when uncertainty is ±50%)
- Ignore risk or downside scenarios
- Cherry-pick optimistic assumptions
- Hide assumptions or methodology
- Overstate confidence in estimates
- Skip sensitivity analysis
- Make recommendation before analyzing alternatives
- Use jargon without defining terms for audience
- Forget to state next steps or decision criteria

**Common Pitfalls:**
- **Anchoring bias**: First estimate becomes "default" without testing alternatives
- **Optimism bias**: Best-case scenarios feel more likely than they are
- **Sunk cost fallacy**: Including past costs that shouldn't affect forward-looking decision
- **Overconfidence**: Narrow ranges that don't reflect true uncertainty
- **Ignoring opportunity cost**: Not considering what else could be done with resources
- **Analysis paralysis**: Spending too much time estimating vs. deciding with available info

## Quick Reference

- **Template**: `resources/template.md` - Structured estimation → decision → story framework
- **Methodology**: `resources/methodology.md` - Advanced techniques (Monte Carlo, decision trees, real options)
- **Examples**: `resources/examples/` - Worked examples (build vs buy, market entry, hiring decision)
- **Quality rubric**: `resources/evaluators/rubric_chain_estimation_decision_storytelling.json`
- **Output file**: `chain-estimation-decision-storytelling.md`
- **Key distinction**: Combines quantitative rigor (estimation, expected value) with qualitative persuasion (narrative, stakeholder alignment)
- **When to use**: High-stakes decisions with uncertainty that need buy-in (not routine choices or purely data-driven optimizations)

@@ -0,0 +1,176 @@
{
  "criteria": [
    {
      "name": "Estimation Quality",
      "description": "Are costs and benefits quantified with appropriate ranges/probabilities?",
      "scale": {
        "1": "Single-point estimates with no uncertainty. Major cost or benefit categories missing.",
        "2": "Some ranges provided but many point estimates. Several categories incomplete.",
        "3": "Most estimates have ranges. Key cost and benefit categories covered. Some uncertainty acknowledged.",
        "4": "Comprehensive estimation with ranges for uncertain variables. Probabilities assigned to scenarios. Justification provided for estimates.",
        "5": "Rigorous estimation with probability distributions, data sources cited, estimation method explained (analogous, parametric, bottom-up), and confidence levels stated."
      }
    },
    {
      "name": "Probability Calibration",
      "description": "Are probabilities reasonable, justified, and calibrated?",
      "scale": {
        "1": "No probabilities assigned or completely arbitrary (e.g., all 50%).",
        "2": "Probabilities assigned but no justification. Appear overconfident (too many 5% or 95%).",
        "3": "Probabilities have some justification. Reasonable calibration for most scenarios.",
        "4": "Probabilities justified with base rates, expert judgment, or reference class. Well-calibrated ranges.",
        "5": "Rigorous probability assignment using historical data, base rates, and adjustments. Calibration checked explicitly. Confidence bounds stated."
      }
    },
    {
      "name": "Decision Analysis Rigor",
      "description": "Is expected value and comparison logic sound?",
      "scale": {
        "1": "No expected value calculation. Comparison is purely subjective.",
        "2": "Expected value attempted but calculation errors. Comparison incomplete.",
        "3": "Expected value calculated correctly. Basic comparison of alternatives using EV or simple scoring.",
        "4": "Sound EV calculation with appropriate decision criteria (NPV, IRR, utility). Clear comparison methodology.",
        "5": "Rigorous analysis using appropriate technique (EV, decision tree, Monte Carlo, MCDA). Multiple decision criteria considered. Methodology appropriate for problem complexity."
      }
    },
    {
      "name": "Sensitivity Analysis",
      "description": "Are key drivers identified and impact tested?",
      "scale": {
        "1": "No sensitivity analysis performed.",
        "2": "Limited sensitivity (single variable tested). No identification of key drivers.",
        "3": "One-way sensitivity on 2-3 key variables. Drivers identified but impact not quantified well.",
        "4": "Comprehensive one-way sensitivity on all major variables. Key drivers ranked by impact. Break-even analysis performed.",
        "5": "Advanced sensitivity including two-way analysis, scenario analysis, or tornado diagrams. Robustness tested across reasonable ranges. Conditions that change conclusion clearly stated."
      }
    },
    {
      "name": "Alternative Comparison",
      "description": "Are all relevant alternatives considered and compared fairly?",
      "scale": {
        "1": "Only one alternative analyzed (no comparison).",
        "2": "Two alternatives but comparison is cursory or biased.",
        "3": "2-3 alternatives analyzed. Comparison is fair but may miss some options or factors.",
        "4": "3-5 alternatives including creative options. Fair comparison across all relevant factors.",
        "5": "Comprehensive alternative generation (considered 5+ initially, narrowed to 3-5). Comparison addresses all stakeholder concerns. Dominated options eliminated with explanation."
      }
    },
    {
      "name": "Assumption Transparency",
      "description": "Are assumptions stated explicitly and justified?",
      "scale": {
        "1": "Assumptions hidden or unstated. Reader must guess what's assumed.",
        "2": "Few assumptions stated. Most are implicit. Little justification.",
        "3": "Major assumptions stated but justification is thin. Some assumptions still implicit.",
        "4": "All key assumptions stated explicitly with justification. Reader can assess reasonableness.",
        "5": "Complete assumption transparency. Each assumption justified with source or reasoning. Alternative assumptions considered. Impact of changing assumptions tested."
      }
    },
    {
      "name": "Narrative Clarity",
      "description": "Is the story clear, logical, and persuasive?",
      "scale": {
        "1": "Narrative is confusing, illogical, or missing. Just numbers with no story.",
        "2": "Some narrative but disjointed. Logic is hard to follow. Key points buried.",
        "3": "Clear narrative structure. Main points are clear. Logic is mostly sound.",
        "4": "Compelling narrative with clear problem statement, analysis summary, recommendation, and reasoning. Flows logically.",
        "5": "Highly persuasive narrative that leads reader through problem, analysis, and conclusion. Key insights highlighted. Tradeoffs acknowledged. Objections preempted. Memorable framing."
      }
    },
    {
      "name": "Audience Tailoring",
      "description": "Is content appropriate for stated audience?",
      "scale": {
        "1": "No consideration of audience. Wrong level of detail or wrong focus.",
        "2": "Minimal tailoring. May have too much or too little detail for audience.",
        "3": "Content generally appropriate. Length and detail reasonable for audience.",
        "4": "Well-tailored to audience needs. Executives get summary, technical teams get methodology, finance gets numbers. Appropriate jargon level.",
        "5": "Expertly tailored with multiple versions or sections for different stakeholders. Executive summary for leaders, technical appendix for specialists, financial detail for finance. Anticipates audience questions."
      }
    },
    {
      "name": "Risk Acknowledgment",
      "description": "Are downside scenarios, risks, and limitations addressed?",
      "scale": {
        "1": "No mention of risks or limitations. Only upside presented.",
        "2": "Brief mention of risks but no detail. Limitations glossed over.",
        "3": "Downside scenarios included. Major risks identified. Some limitations noted.",
        "4": "Comprehensive risk analysis with downside scenarios, mitigation strategies, and clear limitations. Probability of loss quantified.",
        "5": "Rigorous risk treatment including probability-weighted downside, specific mitigation plans, uncertainty quantified, and honest assessment of analysis limitations. 'What would change our mind' conditions stated."
      }
    },
    {
      "name": "Actionability",
      "description": "Are next steps clear, specific, and feasible?",
      "scale": {
        "1": "No next steps or recommendation unclear.",
        "2": "Vague next steps ('consider options', 'study further'). No specifics.",
        "3": "Recommendation clear. Next steps identified but lack detail on who/when/how.",
        "4": "Clear recommendation with specific next steps, owners, and timeline. Success metrics defined.",
        "5": "Highly actionable with clear recommendation, detailed implementation plan with milestones, owners assigned, success metrics defined, decision review cadence specified, and monitoring plan for key assumptions."
      }
    }
  ],
  "minimum_standard": 3.5,
  "stakes_guidance": {
    "low_stakes": {
      "threshold": 3.0,
      "description": "Decisions under $100k or low strategic importance. Acceptable to have simpler analysis (criteria 3-4).",
      "focus_criteria": ["Estimation Quality", "Decision Analysis Rigor", "Actionability"]
    },
    "medium_stakes": {
      "threshold": 3.5,
      "description": "Decisions $100k-$1M or moderate strategic importance. Standard threshold applies (criteria average ≥3.5).",
      "focus_criteria": ["All criteria should meet threshold"]
    },
    "high_stakes": {
      "threshold": 4.0,
      "description": "Decisions >$1M or high strategic importance. Higher bar required (criteria average ≥4.0).",
      "focus_criteria": ["Estimation Quality", "Sensitivity Analysis", "Risk Acknowledgment", "Assumption Transparency"],
      "additional_requirements": ["External validation of key estimates", "Multiple modeling approaches for robustness", "Explicit stakeholder review process"]
    }
  },
  "common_failure_modes": [
    {
      "failure": "Optimism bias",
      "symptoms": "All probabilities favor best case. Downside scenarios underweighted.",
      "fix": "Use reference class forecasting. Require explicit base rates. Weight downside equally."
    },
    {
      "failure": "Sunk cost fallacy",
      "symptoms": "Past investments influence forward-looking analysis.",
      "fix": "Evaluate only incremental future costs/benefits. Ignore sunk costs explicitly."
    },
    {
      "failure": "False precision",
      "symptoms": "Point estimates to multiple decimal places when uncertainty is ±50%.",
      "fix": "Use ranges. State confidence levels. Round appropriately given uncertainty."
    },
    {
      "failure": "Anchoring on first estimate",
      "symptoms": "All alternatives compared to one 'anchor' rather than objectively.",
      "fix": "Generate alternatives independently. Use multiple estimation methods."
    },
    {
      "failure": "Analysis paralysis",
      "symptoms": "Endless modeling, no decision. Waiting for perfect information.",
      "fix": "Set time limits. Use 'good enough' threshold. Decide with available info."
    },
    {
      "failure": "Ignoring opportunity cost",
      "symptoms": "Only evaluating direct costs, not what else could be done with resources.",
      "fix": "Explicitly include opportunity cost. Compare to next-best alternative use of capital/time."
    },
    {
      "failure": "Confirmation bias",
      "symptoms": "Analysis structured to justify predetermined conclusion.",
      "fix": "Generate alternatives before analyzing. Use blind evaluation. Seek disconfirming evidence."
    },
    {
      "failure": "Overweighting quantifiable",
      "symptoms": "Strategic or qualitative factors ignored because hard to measure.",
      "fix": "Explicitly list qualitative factors. Use scoring for non-quantifiable. Ask 'what matters that we're not measuring?'"
    }
  ],
  "usage_notes": "Use this rubric to self-assess before delivering analysis. For high-stakes decisions (>$1M or strategic), aim for 4.0+ average. For low-stakes (<$100k), 3.0+ may be acceptable. Pay special attention to Estimation Quality, Decision Analysis Rigor, and Risk Acknowledgment as these are most critical for sound decisions."
}
@@ -0,0 +1,485 @@
# Decision: Build Custom Analytics Platform vs. Buy SaaS Solution

**Date:** 2024-01-15
**Decision-maker:** CTO + VP Product
**Audience:** Executive team
**Stakes:** Medium ($500k-$1.5M over 3 years)

---

## 1. Decision Context

**What we're deciding:**
Should we build a custom analytics platform in-house or purchase a SaaS analytics solution?

**Why this matters:**
- Current analytics are manual and time-consuming (20 hours/week analyst time)
- Product team needs real-time insights to inform roadmap decisions
- Sales needs usage data to identify expansion opportunities
- Engineering wants to reduce operational burden of maintaining custom tools

**Alternatives:**
1. **Build custom**: Develop in-house analytics platform with our exact requirements
2. **Buy SaaS**: Purchase enterprise analytics platform (e.g., Amplitude, Mixpanel)
3. **Hybrid**: Use SaaS for standard metrics, build custom for proprietary analysis

**Key uncertainties:**
- Development cost and timeline (historical variance ±40%)
- Feature completeness of SaaS solution (will it meet all needs?)
- Usage growth rate (affects SaaS costs which scale with volume)
- Long-term flexibility needs (will we outgrow SaaS or need custom features?)

**Constraints:**
- Budget: $150k available in current year, $50k/year ongoing
- Timeline: Need solution operational within 6 months
- Requirements: Must support 100M events/month, 50+ team members, custom dashboards
- Strategic: Prefer minimal vendor lock-in, prioritize time-to-value

**Audience:** Executive team (need bottom-line recommendation + risks)

---

## 2. Estimation

### Alternative 1: Build Custom

**Costs:**
- **Initial development**: $200k-$400k (most likely $300k)
  - Base estimate: 6 engineer-months × $50k loaded cost = $300k
  - Range reflects scope uncertainty and potential technical challenges
  - Source: Similar internal projects averaged $280k ±$85k (30% std dev)

- **Annual operational costs**: $40k-$60k per year (most likely $50k)
  - Infrastructure: $15k-$25k (based on 100M events/month)
  - Maintenance: 0.5 engineer FTE = $25k-$35k per year
  - Source: Current analytics tools cost $45k/year to maintain

- **Opportunity cost**: $150k
  - Engineering team would otherwise work on core product features
  - Estimated value of deferred features: $150k in potential revenue impact

**Benefits:**
- **Cost savings**: $0 subscription fees (vs $120k/year for SaaS)
- **Perfect fit**: 100% feature match to our specific needs
- **Flexibility**: Full control to add custom analysis
- **Strategic value**: Build analytics competency, own our data

**Probabilities:**
- **Best case (20%)**: On-time delivery at $250k, perfect execution
  - Prerequisites: Clear requirements, no scope creep, experienced team available

- **Base case (50%)**: Moderate delays and cost overruns to $350k over 8 months
  - Typical scenario based on historical performance

- **Worst case (30%)**: Significant delays to $500k over 12 months, some features cut
  - Risk factors: Key engineer departure, underestimated complexity, changing requirements

**Key assumptions:**
- Engineering team has capacity (currently 70% utilized)
- No major technical unknowns in data pipeline
- Requirements are stable (< 10% scope change)
- Infrastructure costs scale linearly with events

### Alternative 2: Buy SaaS

**Costs:**
- **Initial implementation**: $15k-$25k (most likely $20k)
  - Setup and integration: 2-3 weeks consulting
  - Data migration and testing
  - Team training
  - Source: Vendor quote + reference customer feedback

- **Annual subscription**: $100k-$140k per year (most likely $120k)
  - Base: $80k for 100M events/month
  - Users: $2k per user × 20 power users = $40k
  - Growth buffer: Assume 20% event growth per year
  - Source: Vendor pricing confirmed, escalates with usage

- **Switching cost** (if we change vendors later): $50k-$75k
  - Data export and migration
  - Re-implementing integrations
  - Team retraining

**Benefits:**
- **Faster time-to-value**: 2 months vs. 8 months for build
  - 6-month head start = earlier insights = better decisions sooner
  - Estimated value: $75k (half of opportunity cost avoided)

- **Proven reliability**: 99.9% uptime SLA
  - Reduces operational risk
  - Frees engineering for core product

- **Feature velocity**: Continuous improvements from vendor
  - New capabilities quarterly (ML-powered insights, predictive analytics)
  - Estimated value: $30k/year in avoided feature development

- **Lower risk**: Predictable costs, no schedule risk
  - High confidence in timeline and total cost

**Probabilities:**
- **Best case (40%)**: Perfect fit, seamless implementation, $100k/year steady state
  - Vendor delivers on promises, usage grows slower than expected

- **Base case (45%)**: Good fit with minor gaps, standard implementation, $120k/year
  - 85% of needs met out-of-box, workarounds for remaining 15%

- **Worst case (15%)**: Poor fit requiring workarounds or supplemental tools, $150k/year
  - Missing critical features, need to maintain some custom tooling

**Key assumptions:**
- SaaS vendor is stable and continues product development
- Event volume growth is 20% per year (manageable)
- Vendor lock-in is acceptable (switching cost is reasonable)
- Security and compliance requirements are met by vendor

### Alternative 3: Hybrid

**Costs:**
- **Initial investment**: $100k-$150k (most likely $125k)
  - SaaS implementation: $20k
  - Custom integrations and proprietary metrics: $100k-$130k development

- **Annual costs**: $80k-$100k per year (most likely $90k)
  - SaaS subscription (smaller tier): $60k-$70k
  - Maintenance of custom components: $20k-$30k

**Benefits:**
- **Balanced approach**: Standard analytics from SaaS, custom analysis in-house
- **Reduced risk**: Less development than full build, more control than pure SaaS
- **Flexibility**: Can shift balance over time based on needs

**Probabilities:**
- **Base case (60%)**: Works reasonably well, $125k + $90k/year
- **Integration complexity (40%)**: More overhead than expected, $150k + $100k/year

**Key assumptions:**
- Clean separation between standard and custom analytics
- SaaS provides good API for custom integrations
- Maintaining two systems doesn't create excessive complexity

---

## 3. Decision Analysis

### Expected Value Calculation (3-Year NPV)

**Discount rate:** 10% (company's cost of capital)

#### Alternative 1: Build Custom

**Year 0 (Initial):**
- Best case (20%): -$250k development - $150k opportunity cost = -$400k
- Base case (50%): -$350k development - $150k opportunity cost = -$500k
- Worst case (30%): -$500k development - $150k opportunity cost = -$650k

**Expected Year 0:** ($-400k × 0.20) + ($-500k × 0.50) + ($-650k × 0.30) = -$525k

**Years 1-3 (Operational):**
- Annual cost: $50k/year
- PV of 3 years at 10%: $50k × 2.49 = $124k (annuity factor 2.49 = (1 − 1.1⁻³)/0.1)

**Total Expected NPV (Build):** -$525k - $124k = **-$649k**

*Note: Costs are negative because this is an investment. Focus is on minimizing cost since benefits (analytics capability) are equivalent across alternatives.*

#### Alternative 2: Buy SaaS

**Year 0 (Initial):**
- Implementation: $20k
- No opportunity cost (fast implementation)

**Years 1-3 (Operational):**
- Best case (40%): $100k/year × 2.49 = $249k
- Base case (45%): $120k/year × 2.49 = $299k
- Worst case (15%): $150k/year × 2.49 = $374k

**Expected annual cost:** ($100k × 0.40) + ($120k × 0.45) + ($150k × 0.15) = $116.5k/year
**PV of 3 years:** $116.5k × 2.49 = $290k

**Total Expected NPV (Buy):** -$20k - $290k = **-$310k**

**Benefit adjustment for faster time-to-value:** +$75k (6-month head start)
**Adjusted NPV (Buy):** -$310k + $75k = **-$235k**

#### Alternative 3: Hybrid

**Year 0 (Initial):**
- Development + implementation: $125k
- Partial opportunity cost: $75k (half the custom build time)

**Years 1-3 (Operational):**
- Expected annual: $90k/year × 2.49 = $224k

**Total Expected NPV (Hybrid):** -$125k - $75k - $224k = **-$424k**

### Comparison Summary

| Alternative | Expected 3-Year Cost | Risk Profile | Time to Value |
|--------------|---------------------|--------------|---------------|
| Build Custom | $649k | **High** (30% worst case) | 8 months |
| Buy SaaS | $235k | **Low** (predictable) | 2 months |
| Hybrid | $424k | **Medium** | 5 months |

**Cost difference:** Buy SaaS saves **$414k** vs. Build Custom over 3 years
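All of the NPV figures above derive from one annuity factor and a handful of probability-weighted sums. A sketch that reproduces them (inputs are the estimates from this analysis; small differences are rounding):

```python
RATE, YEARS = 0.10, 3
annuity = (1 - (1 + RATE) ** -YEARS) / RATE  # ≈ 2.49: PV of $1/year for 3 years

def expected(outcomes):
    """Probability-weighted average of (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

# Build: probability-weighted Year-0 outlay plus PV of $50k/year operations.
build_year0 = expected([(0.20, -400_000), (0.50, -500_000), (0.30, -650_000)])
build_npv = build_year0 - 50_000 * annuity                      # ≈ -$649k

# Buy: $20k implementation, expected subscription, +$75k time-to-value benefit.
buy_annual = expected([(0.40, 100_000), (0.45, 120_000), (0.15, 150_000)])
buy_npv = -20_000 - buy_annual * annuity + 75_000               # ≈ -$235k

# Hybrid: $125k build-out, $75k partial opportunity cost, $90k/year expected.
hybrid_npv = -125_000 - 75_000 - 90_000 * annuity               # ≈ -$424k

for name, npv in [("Build", build_npv), ("Buy", buy_npv), ("Hybrid", hybrid_npv)]:
    print(f"{name:>6}: ${npv:,.0f}")
```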
### Sensitivity Analysis

**What if development cost for Build is 20% lower ($240k base instead of $300k)?**
- Build NPV: -$577k (still $342k worse than Buy)
- **Conclusion still holds**

**What if SaaS costs grow 40% per year instead of 20%?**
- Year 3 SaaS cost: $230k (vs. $145k base case)
- Buy NPV: -$325k (still $324k better than Build)
- **Conclusion still holds**

**What if we need to switch SaaS vendors in Year 3?**
- Additional switching cost: $65k
- Buy NPV: -$300k (still $349k better than Build)
- **Conclusion still holds**

**Break-even analysis:**
At what annual SaaS cost does Build become cheaper?
- Build 3-year cost: $649k
- Buy 3-year cost: $20k + (X × 2.49) - $75k = $649k
- Solve: X = $282k/year

**Interpretation:** SaaS would need to cost $282k/year (2.4x current estimate) for Build to break even. Very unlikely.

### Robustness Check

**Conclusion is robust if:**
- Development cost < $600k (currently $300k base, $500k worst case ✓)
- SaaS annual cost < $280k (currently $120k base, $150k worst case ✓)
- Time-to-value benefit > $0 (6-month head start valuable ✓)

**Conclusion changes if:**
- SaaS vendor goes out of business (low probability, large incumbents)
- Regulatory requirements force on-premise solution (not currently foreseen)
- Custom analytics become core competitive differentiator (possible but unlikely)

---

## 4. Recommendation

### **Recommended option: Buy SaaS Solution**

**Reasoning:**

Buy SaaS dominates Build Custom on three dimensions:

1. **Lower expected cost**: $235k vs. $649k over 3 years (saves $414k)
2. **Lower risk**: Predictable subscription vs. 30% chance of 2x cost overrun on build
3. **Faster time-to-value**: 2 months vs. 8 months (6-month head start enables better decisions sooner)

The cost advantage is substantial ($414k savings) and robust to reasonable assumption changes. Even if SaaS costs double or we need to switch vendors, Buy still saves $300k+.

The risk profile strongly favors Buy. Historical data shows 30% of similar build projects experience 2x cost overruns. SaaS has predictable costs with 99.9% uptime SLA.

Time-to-value matters: getting analytics operational 6 months sooner means better product decisions sooner, worth approximately $75k in avoided opportunity cost.

**Key factors:**
1. **Cost**: $414k lower expected cost over 3 years
2. **Risk**: Predictable vs. high uncertainty (30% worst case for Build)
3. **Speed**: 2 months vs. 8 months to operational
4. **Strategic fit**: Analytics are important but not core competitive differentiator

**Tradeoffs accepted:**
- **Vendor dependency**: Accepting switching cost of $65k if we change vendors
  - Mitigation: Choose stable, market-leading vendor (Amplitude or Mixpanel)

- **Some feature gaps**: SaaS may not support 100% of custom analysis needs
  - Mitigation: 85% coverage out-of-box, workarounds for remaining 15%
  - Can supplement with lightweight custom tools if needed ($20k-$30k vs. $300k+ full build)

- **Less flexibility**: Can't customize as freely as in-house solution
  - Mitigation: Most SaaS platforms offer extensive APIs and integrations
  - True custom needs can be addressed incrementally

**Why not Hybrid?**
Hybrid ($424k) is $189k more expensive than Buy with minimal additional benefit. The complexity of maintaining two systems outweighs the incremental flexibility.

---

## 5. Risks and Mitigations

### Risk 1: SaaS doesn't meet all requirements

**Probability:** Medium (15% worst case scenario)

**Impact:** Need workarounds or supplemental tools

**Mitigation:**
- Conduct thorough vendor evaluation with 2-week pilot
- Map all requirements to vendor capabilities before committing
- Budget $30k for lightweight custom supplements if needed
- Still cheaper than full Build even with supplements

### Risk 2: Vendor lock-in / price increases

**Probability:** Low-Medium (vendors typically increase 5-10%/year)

**Impact:** Higher ongoing costs

**Mitigation:**
- Negotiate multi-year contract with price protection
- Maintain data export capability (ensure vendor supports data portability)
- Budget includes 20% annual growth buffer
- Switching cost is manageable ($65k) if needed

### Risk 3: Usage growth exceeds estimates

**Probability:** Low (current trajectory is 15%/year, estimated 20%)

**Impact:** Higher subscription costs

**Mitigation:**
- Monitor usage monthly against plan
- Optimize event instrumentation to reduce unnecessary events
- Renegotiate tier if growth is faster than expected
- Even at 2x usage growth, still cheaper than Build

### Risk 4: Security or compliance issues

**Probability:** Very Low (vendor is SOC 2 Type II certified)

**Impact:** Cannot use vendor, forced to build

**Mitigation:**
- Verify vendor security certifications before contract
- Review data handling and privacy policies
- Include compliance requirements in vendor evaluation
- This risk applies to any vendor; not specific to this decision

---

## 6. Next Steps

**If approved:**

1. **Vendor evaluation** (2 weeks) - VP Product + Data Lead
   - Demo top 3 vendors (Amplitude, Mixpanel, Heap)
   - Map requirements to capabilities
   - Validate pricing and terms
   - Decision by: Feb 1

2. **Pilot implementation** (2 weeks) - Engineering Lead
   - 2-week pilot with selected vendor
   - Instrument 3 key product flows
   - Validate data accuracy and latency
   - Go/no-go decision by: Feb 15

3. **Full rollout** (4 weeks) - Data Team + Engineering
   - Instrument all product events
   - Migrate existing dashboards
   - Train team on new platform
   - Launch by: March 15

**Success metrics:**
- **Time to value**: Analytics operational within 2 months (by March 15)
- **Cost**: Stay within $20k implementation + $120k annual budget
- **Adoption**: 50+ team members using platform within 30 days of launch
- **Value delivery**: Reduce manual analytics time from 20 hours/week to <5 hours/week

**Decision review:**
- **6-month review** (Sept 2024): Validate cost and value delivered
  - Key question: Are we getting value proportional to cost?
  - Metrics: Usage stats, time savings, decisions influenced by data

- **Annual review** (Jan 2025): Assess whether to continue, renegotiate, or reconsider build
  - Key indicators: Usage growth trend, missing features impact, pricing changes

**What would change our mind:**
- If vendor quality degrades significantly (downtime, bugs, poor support)
- If pricing increases >30% beyond projections
- If we identify analytics as core competitive differentiator (requires custom innovation)
- If regulatory requirements force on-premise solution

---

## 7. Appendix: Assumptions Log

**Development estimates:**
- Based on: 3 similar internal projects (API platform, reporting tool, data pipeline)
- Historical variance: ±30% from initial estimate
- Team composition: 2-3 senior engineers for 3-4 months
- Scope: Event ingestion, storage, query engine, dashboarding UI

**SaaS pricing:**
- Based on: Vendor quotes for 100M events/month, 50 users
- Confirmed with: 2 reference customers at similar scale
- Growth assumption: 20% annual event growth (aligned with product roadmap)
- User assumption: 20 power users (product, sales, exec) need full access

**Opportunity cost:**
- Based on: Engineering team would otherwise work on product features
- Estimated value: Product features could drive $150k additional revenue
- Source: Product roadmap prioritization (deferred features)

**Time-to-value benefit:**
- Based on: 6-month head start with SaaS (2 months vs. 8 months)
- Estimated value: Better decisions sooner = avoided mistakes + seized opportunities
- Conservative estimate: 50% of opportunity cost = $75k

**Discount rate:**
- Company cost of capital: 10%
- Used to calculate present value of multi-year costs

---

## Self-Assessment (Rubric Scores)

**Estimation Quality:** 4/5
- Comprehensive estimation with ranges and probabilities
- Justification provided for estimates with sources
- Could improve: More rigorous data collection from reference customers

**Probability Calibration:** 4/5
- Probabilities justified with base rates (historical project performance)
- Well-calibrated ranges
- Could improve: External validation of probability estimates

**Decision Analysis Rigor:** 5/5
- Sound expected value calculation with NPV
- Appropriate decision criteria
- Multiple scenarios tested

**Sensitivity Analysis:** 5/5
- Comprehensive one-way sensitivity on key variables
- Break-even analysis performed
- Conditions that change conclusion clearly stated

**Alternative Comparison:** 4/5
- Three alternatives analyzed fairly
- Could improve: Consider more creative alternatives (e.g., open-source + custom)

**Assumption Transparency:** 5/5
- All key assumptions stated explicitly with justification
- Alternative assumptions tested in sensitivity analysis

**Narrative Clarity:** 4/5
- Clear structure and logical flow
- Could improve: More compelling framing for exec audience

**Audience Tailoring:** 4/5
- Appropriate detail for executive audience
- Could improve: Add one-page executive summary

**Risk Acknowledgment:** 5/5
- Comprehensive risk analysis with probabilities and mitigations
- Downside scenarios quantified
- "What would change our mind" conditions stated

**Actionability:** 5/5
- Clear recommendation with specific next steps
- Owners and timeline defined
- Success metrics and review cadence specified

**Average Score:** 4.5/5 (Exceeds standard for medium-stakes decision)

---

**Analysis completed:** January 15, 2024
**Analyst:** [Name]
**Reviewed by:** CTO
**Status:** Ready for executive decision
@@ -0,0 +1,339 @@
# Advanced Chain Estimation → Decision → Storytelling Methodology

## Workflow

Copy this checklist and track your progress:

```
Advanced Analysis Progress:
- [ ] Step 1: Select appropriate advanced technique for complexity
- [ ] Step 2: Build model (decision tree, Monte Carlo, real options)
- [ ] Step 3: Run analysis and interpret results
- [ ] Step 4: Validate robustness across scenarios
- [ ] Step 5: Translate technical findings into narrative
```

**Step 1: Select appropriate advanced technique for complexity**

Choose technique based on decision characteristics: decision trees for sequential choices, Monte Carlo for multiple interacting uncertainties, real options for flexibility value, multi-criteria analysis for qualitative + quantitative factors. See [Technique Selection Guide](#technique-selection-guide) for decision flowchart.

**Step 2: Build model**

Structure problem using chosen technique: define states and branches for decision trees, specify probability distributions for Monte Carlo, identify options and decision points for real options analysis, establish criteria and weights for multi-criteria. See technique-specific sections below for modeling guidance.

**Step 3: Run analysis and interpret results**

Execute calculations (manually for small trees, with tools for complex simulations), interpret output distributions or decision paths, identify dominant strategies or highest-value options, and quantify value of information or flexibility where applicable.

**Step 4: Validate robustness across scenarios**

Test assumptions with stress testing, vary key parameters to check sensitivity, compare results across different modeling approaches, and identify conditions where conclusion changes. See [Sensitivity and Robustness Testing](#sensitivity-and-robustness-testing).

**Step 5: Translate technical findings into narrative**

Convert technical analysis into business language, highlight key insights without overwhelming with methodology, explain "so what" for decision-makers, and provide clear recommendation with confidence bounds. See [Communicating Complex Analysis](#communicating-complex-analysis).

---

## Technique Selection Guide

**Decision Trees** → Sequential decisions with discrete outcomes and known probabilities
- Use when: Clear sequence of choices, branching scenarios, need optimal path
- Example: Build vs buy with adoption uncertainty

**Monte Carlo Simulation** → Multiple interacting uncertainties with continuous distributions
- Use when: Many uncertain variables, complex interactions, need probability distributions
- Example: Project NPV with uncertain cost, revenue, timeline

**Real Options Analysis** → Decisions with flexibility value (defer, expand, abandon)
- Use when: Uncertainty resolves over time, value of waiting, staged commitments
- Example: Pilot before full launch, expand if successful

**Multi-Criteria Decision Analysis (MCDA)** → Mix of quantitative and qualitative factors
- Use when: Multiple objectives, stakeholder tradeoffs, subjective criteria
- Example: Vendor selection (cost + quality + relationship)

---

## Decision Trees

### Structure
- **Decision node (□)**: Your choice
- **Chance node (○)**: Uncertain outcome with probabilities
- **Terminal node**: Final payoff

### Method
1. Map all decisions and chance events
2. Assign probabilities to chance events
3. Work backward: calculate EV at chance nodes, choose best at decision nodes
4. Identify optimal path

### Example
```
□ Build vs Buy
├─ Build → ○ Success (60%) → $500k
│          └─ Fail (40%)   → $100k
└─ Buy   → ○ Fits (70%)    → $400k
           └─ Doesn't (30%) → $150k

Build EV = (500 × 0.6) + (100 × 0.4) = $340k
Buy EV   = (400 × 0.7) + (150 × 0.3) = $325k
Decision: Build (higher EV)
```
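The backward-induction step in the method above is mechanical enough to script. A minimal sketch mirroring the example tree (nested dicts for decision nodes, lists of (probability, subtree) pairs for chance nodes):

```python
# A chance node is a list of (probability, subtree-or-payoff) branches;
# a decision node is a dict of choice -> subtree. Numbers mirror the example.
tree = {
    "Build": [(0.6, 500_000), (0.4, 100_000)],
    "Buy": [(0.7, 400_000), (0.3, 150_000)],
}

def expected_value(node):
    """Roll back the tree: maximize at decision nodes, average at chance nodes."""
    if isinstance(node, dict):    # decision node: pick the best choice
        return max(expected_value(sub) for sub in node.values())
    if isinstance(node, list):    # chance node: probability-weighted EV
        return sum(p * expected_value(sub) for p, sub in node)
    return node                   # terminal payoff

evs = {choice: expected_value(sub) for choice, sub in tree.items()}
best = max(evs, key=evs.get)
print(evs)                  # {'Build': 340000.0, 'Buy': 325000.0}
print(f"Decision: {best}")  # Build (higher EV)
```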
### Value of Information
- EVPI = EV with perfect info - EV without info
- Tells you how much to spend on reducing uncertainty
- Worked from the tree above: if a study could reveal in advance whether Build succeeds, you would Build on good news ($500k, 60%) and Buy on bad news ($325k expected, 40%), so EV with perfect info = (500 × 0.6) + (325 × 0.4) = $430k, and EVPI = $430k − $340k = $90k

---

## Monte Carlo Simulation

### When to Use
- Multiple uncertain variables (>3)
- Complex interactions between variables
- Need full probability distribution of outcomes
- Continuous ranges (not discrete scenarios)

### Method
1. **Identify uncertain variables**: cost, revenue, timeline, adoption rate, etc.
2. **Define distributions**: normal, log-normal, triangular, uniform
3. **Specify correlations**: if variables move together
4. **Run simulation**: 10,000+ iterations
5. **Analyze output**: mean, median, percentiles, probability of success

### Distribution Types
- **Normal**: μ ± σ (height, measurement error)
- **Log-normal**: positively skewed (project duration, costs)
- **Triangular**: min/most likely/max (quick estimation)
- **Uniform**: all values equally likely (no information)

### Interpretation
- **P50 (median)**: 50% chance of exceeding
- **P10/P90**: 80% confidence interval
- **Probability of target**: P(NPV > $0), P(ROI > 20%)

### Tools
- Excel: =NORM.INV(RAND(), mean, stdev)
- Python: `numpy.random.normal(mean, stdev, size=10000)`
- @RISK, Crystal Ball: Monte Carlo add-ins
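The method above fits in a few lines with `numpy`. A minimal sketch against a toy model (triangular cost, normal annual benefit, 3-year horizon at 10%—all placeholder numbers, not from any real project):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
N = 10_000  # iterations

# Uncertain inputs (independent here for simplicity; specify correlations
# if variables move together, as noted in the method above).
cost = rng.triangular(left=200_000, mode=300_000, right=500_000, size=N)
annual_benefit = rng.normal(loc=180_000, scale=40_000, size=N)

# Discount each simulated benefit stream with a 3-year, 10% annuity factor.
rate, years = 0.10, 3
annuity = (1 - (1 + rate) ** -years) / rate
npv = -cost + annual_benefit * annuity

p10, p50, p90 = np.percentile(npv, [10, 50, 90])
print(f"P10 / P50 / P90 NPV: ${p10:,.0f} / ${p50:,.0f} / ${p90:,.0f}")
print(f"P(NPV > 0): {(npv > 0).mean():.0%}")
```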
---
## Real Options Analysis

### Concept
Flexibility has value. The option to defer, expand, contract, or abandon is worth more than committing upfront.

### When to Use
- Uncertainty resolves over time (can learn before committing)
- Irreversible investments (can't easily reverse)
- Staged decisions (pilot → scale)

### Types of Options
- **Defer**: Wait for more information before committing
- **Expand**: Scale up if successful
- **Contract/Abandon**: Scale down or exit if unsuccessful
- **Switch**: Change approach mid-course

### Valuation Approach

**Simple NPV (no flexibility):**
- Commit now: EV = Σ(outcome × probability)

**With real option:**
- Value = NPV of commitment + Value of flexibility
- Flexibility value = Expected payoff from optimal future decision - Expected payoff from committing now

### Example
- **Commit to full launch now**: $1M investment, 60% success → $3M, 40% fail → $0
  - EV = (3M × 0.6) + (0 × 0.4) - 1M = $800K

- **Pilot first ($200K), then decide** (assuming a good pilot means the launch will succeed):
  - Good pilot (60%) → full launch → net $1.8M ($3M payoff - $1M launch - $0.2M pilot)
  - Bad pilot (40%) → abandon → lose $200K
  - EV = (1.8M × 0.6) + (-0.2M × 0.4) = $1.0M

- **Real option value** = $1.0M - $800K = $200K (value of flexibility to learn first)
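The same example in code—a sketch that keeps the example's simplifying assumption that a good pilot guarantees a successful launch:

```python
# Commit now: $1M launch, 60% chance of a $3M payoff.
commit_ev = 0.6 * 3_000_000 + 0.4 * 0 - 1_000_000           # $800k

# Pilot first: spend $200k to learn; launch only on a good pilot (60% chance).
pilot_cost, launch_cost, payoff = 200_000, 1_000_000, 3_000_000
good_branch = payoff - launch_cost - pilot_cost              # $1.8M if pilot is good
bad_branch = -pilot_cost                                     # abandon: lose the pilot
pilot_ev = 0.6 * good_branch + 0.4 * bad_branch              # $1.0M

option_value = pilot_ev - commit_ev                          # $200k value of flexibility
print(f"Commit EV:    ${commit_ev:,.0f}")
print(f"Pilot EV:     ${pilot_ev:,.0f}")
print(f"Option value: ${option_value:,.0f}")
```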
---
## Multi-Criteria Decision Analysis (MCDA)

### When to Use
- Multiple objectives that can't be reduced to single metric (not just NPV)
- Qualitative + quantitative factors
- Stakeholder tradeoffs (different groups value different things)

### Method

**1. Identify criteria** (from stakeholder perspectives)
- Cost, speed, quality, risk, strategic fit, customer impact, etc.

**2. Weight criteria** (based on priorities)
- Sum to 100%
- Finance might weight cost 40%, Product weights customer impact 30%

**3. Score alternatives** (1-5 or 1-10 scale on each criterion)
- Alternative A: Cost=4, Speed=2, Quality=5
- Alternative B: Cost=2, Speed=5, Quality=3

**4. Calculate weighted scores**
- A = (4 × 0.3) + (2 × 0.4) + (5 × 0.3) = 3.5
- B = (2 × 0.3) + (5 × 0.4) + (3 × 0.3) = 3.5
- A dead heat like this is exactly when the weights deserve scrutiny

**5. Sensitivity analysis** on weights
- How much would weights need to change to flip the decision? (see the sketch below)
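A sketch of steps 4-5 in code, using the example's scores and weights (the shifted weights are an illustrative perturbation, not a prescribed method):

```python
weights = {"cost": 0.3, "speed": 0.4, "quality": 0.3}
alternatives = {
    "A": {"cost": 4, "speed": 2, "quality": 5},
    "B": {"cost": 2, "speed": 5, "quality": 3},
}

def weighted_score(scores, weights):
    """Sum of criterion score x criterion weight."""
    return sum(scores[c] * w for c, w in weights.items())

for name, scores in alternatives.items():
    print(f"{name}: {weighted_score(scores, weights):.2f}")  # both 3.50 - a tie

# Weight sensitivity: shift 10% of weight from speed to quality and re-score.
shifted = {"cost": 0.3, "speed": 0.3, "quality": 0.4}
for name, scores in alternatives.items():
    print(f"{name} (shifted): {weighted_score(scores, shifted):.2f}")  # A 3.80, B 3.30
```

Note how a 10-point weight shift breaks the tie in favor of A—exactly the kind of fragility step 5 is meant to surface.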
### Handling Qualitative Criteria
- **Scoring rubric**: Define what 1, 3, 5 means for "strategic fit"
- **Pairwise comparison**: Compare alternatives head-to-head on each criterion
- **Range**: Use min-max scaling to normalize disparate units

---

## Sensitivity and Robustness Testing

### One-Way Sensitivity
- Vary one parameter at a time (e.g., cost ±20%)
- Check if conclusion changes
- Identify which parameters matter most

### Two-Way Sensitivity
- Vary two parameters simultaneously
- Create sensitivity matrix or contour plot
- Example: Cost (rows) × Revenue (columns) → NPV

### Tornado Diagram
- Bar chart showing impact of each parameter
- Longest bars = most sensitive parameters
- Focus analysis on top 2-3 drivers (a one-way sweep that produces this ranking is sketched below)
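The sweep behind a tornado diagram can be scripted directly. A sketch against a toy NPV model (the model and the ±20% ranges are placeholders; substitute your own):

```python
base = {"cost": 300_000, "annual_benefit": 180_000, "years": 3}

def npv(p, rate=0.10):
    """Toy NPV model: upfront cost vs. a flat annual benefit."""
    annuity = (1 - (1 + rate) ** -p["years"]) / rate
    return -p["cost"] + p["annual_benefit"] * annuity

# Vary one parameter at a time by +/-20%; record the NPV swing for each.
swings = {}
for param in ("cost", "annual_benefit"):
    lo, hi = (npv({**base, param: base[param] * f}) for f in (0.8, 1.2))
    swings[param] = (min(lo, hi), max(lo, hi))

# Longest bar first = most sensitive parameter (the tornado ordering).
for param, (lo, hi) in sorted(swings.items(), key=lambda kv: kv[1][0] - kv[1][1]):
    print(f"{param:>15}: ${lo:,.0f} .. ${hi:,.0f} (swing ${hi - lo:,.0f})")
```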
### Scenario Analysis
- Define coherent scenarios (pessimistic, base, optimistic)
- Not just parameter ranges, but plausible futures
- Calculate outcome for each complete scenario

### Break-Even Analysis
- At what value does the conclusion change?
- "Need revenue >$500K to beat alternative"
- "If cost exceeds $300K, pivot to Plan B"

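When the model is monotonic in the parameter, a bisection search finds the break-even point. A sketch with a deliberately trivial value model (the $300K cost and $200K alternative are illustrative):

```python
# Find the revenue level at which a plan stops losing to a $200K alternative.
def net_value(revenue, cost=300_000):
    return revenue - cost  # stand-in for a fuller model (NPV, EV, etc.)

TARGET = 200_000  # value of the alternative we must beat

def break_even(lo, hi, tol=1.0):
    # Bisection: assumes net_value(lo) < TARGET < net_value(hi).
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if net_value(mid) < TARGET else (lo, mid)
    return (lo + hi) / 2

print(f"Break-even revenue: ${break_even(0, 2_000_000):,.0f}")  # ≈ $500K
```
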
### Stress Testing
- Extreme scenarios (worst case: everything goes wrong)
- Identify fragility: "Works unless X and Y both fail"
- Build contingency plans for stress scenarios

---

## Communicating Complex Analysis

### For Executives
**Focus**: Bottom line, confidence, risks
- Recommendation (1 sentence)
- Key numbers (EV, NPV, ROI)
- Confidence level (P10-P90 range)
- Top 2 risks + mitigations
- Decision criteria: "Proceed if X, pivot if Y"

### For Technical Teams
**Focus**: Methodology, assumptions, sensitivity
- Modeling approach and rationale
- Key assumptions with justification
- Sensitivity analysis results
- Robustness checks performed
- Limitations of analysis

### For Finance
**Focus**: Numbers, assumptions, financial metrics
- Cash flow timing
- Discount rate and rationale
- NPV, IRR, payback period
- Risk-adjusted returns
- Comparison to hurdle rate

### General Principles
- **Lead with conclusion**, then support with analysis
- **Show confidence bounds**, not just point estimates
- **Explain "so what"**, not just "what"
- **Use visuals**: probability distributions, decision trees, tornado charts
- **Be honest about limitations**: "Assumes X, sensitive to Y"

---

## Common Pitfalls in Advanced Analysis

### False Precision
- **Problem**: Reporting $1,234,567 when uncertainty is ±50%
- **Fix**: Round appropriately. Use ranges, not points.

### Ignoring Correlations
- **Problem**: Modeling all uncertainties as independent when they're linked
- **Fix**: Specify correlations in Monte Carlo (costs move together, revenue and volume linked)

### Overfitting Models
- **Problem**: Building complex models with 20 parameters when data is thin
- **Fix**: Keep models simple. Complexity doesn't equal accuracy.

### Anchoring on Base Case
- **Problem**: Treating "most likely" as "expected value"
- **Fix**: Calculate probability-weighted EV. Asymmetric distributions matter.

### Analysis Paralysis
- **Problem**: Endless modeling instead of deciding
- **Fix**: Set time limits. "Good enough" threshold. Decide with available info.

### Confirmation Bias
- **Problem**: Modeling to justify predetermined conclusion
- **Fix**: Model alternatives fairly. Seek disconfirming evidence. External review.

### Ignoring Soft Factors
- **Problem**: Optimizing NPV while ignoring strategic fit, team morale, brand impact
- **Fix**: Use MCDA for mixed quantitative + qualitative. Make tradeoffs explicit.

---

## Advanced Tools and Resources

### Spreadsheet Tools
- **Excel**: Data tables, Scenario Manager, Goal Seek
- **Google Sheets**: Same capabilities, collaborative

### Specialized Software
- **@RISK** (Palisade): Monte Carlo simulation add-in for Excel
- **Crystal Ball** (Oracle): Similar Monte Carlo tool
- **Python**: `numpy`, `scipy`, `simpy` for custom simulations
- **R**: Statistical analysis and simulation

### When to Use Tools vs. Manual
- **Manual** (small decision trees): < 10 branches, quick calculation
- **Spreadsheet** (medium complexity): Decision trees, simple Monte Carlo (< 5 variables)
- **Specialized tools** (high complexity): 10+ uncertain variables, complex correlations, sensitivity analysis

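For the Python route, a minimal Monte Carlo sketch with `numpy`. The distributions and the cost-revenue correlation are illustrative assumptions, not a prescription:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000  # simulation trials

# Illustrative inputs: lognormal revenue; cost partly tracks revenue volume.
revenue = rng.lognormal(mean=np.log(500_000), sigma=0.3, size=N)
noise = rng.normal(0, 30_000, size=N)
cost = 300_000 + 0.2 * (revenue - 500_000) + noise  # costs rise with volume

profit = revenue - cost
p10, p50, p90 = np.percentile(profit, [10, 50, 90])
print(f"EV ≈ ${profit.mean():,.0f}; P10 ${p10:,.0f}, P50 ${p50:,.0f}, P90 ${p90:,.0f}")
print(f"P(loss) ≈ {(profit < 0).mean():.1%}")
```

Reporting the P10-P90 band alongside the mean is what makes this more useful than a point estimate, per the communication principles above.
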
### Learning Resources
- Decision analysis: "Decision Analysis for the Professional" - Skinner
- Monte Carlo: "Risk Analysis in Engineering" - Modarres
- Real options: "Real Options" - Copeland & Antikarov
- MCDA: "Multi-Criteria Decision Analysis" - Belton & Stewart

---

## Summary

**Choose technique based on problem structure:**
- Sequential choices → Decision trees
- Multiple uncertainties → Monte Carlo
- Flexibility value → Real options
- Mixed criteria → MCDA

**Focus on:**
- Robust conclusions (stress test assumptions)
- Clear communication (translate technical to business language)
- Actionable insights (not just numbers)
- Honest limits (acknowledge what analysis can't tell you)

**Remember:**
- Models inform decisions, don't make them
- Simple model well-executed beats complex model poorly-executed
- Transparency about assumptions matters more than sophistication
- "All models are wrong, some are useful" - George Box

@@ -0,0 +1,433 @@

# Chain Estimation → Decision → Storytelling Template

## Workflow

Copy this checklist and track your progress:

```
Analysis Progress:
- [ ] Step 1: Gather inputs and define decision scope
- [ ] Step 2: Estimate costs, benefits, and probabilities
- [ ] Step 3: Calculate expected value and compare alternatives
- [ ] Step 4: Structure narrative with clear recommendation
- [ ] Step 5: Validate completeness with quality checklist
```

**Step 1: Gather inputs and define decision scope**

Clarify what decision needs to be made, identify 2-5 alternatives to compare, list key uncertainties (costs, benefits, probabilities), determine the audience (executives, technical team, finance), and note constraints (budget, timeline, requirements). Use the [Quick Template](#quick-template) structure below.

**Step 2: Estimate costs, benefits, and probabilities**

For each alternative, quantify all relevant costs (development, operation, opportunity cost), estimate benefits (revenue, savings, productivity gains), assign probabilities to scenarios (best/base/worst case), and use ranges rather than point estimates. See [Estimation Guidelines](#estimation-guidelines) for techniques.

**Step 3: Calculate expected value and compare alternatives**

Compute probability-weighted outcomes for each alternative, compare using appropriate decision criteria (NPV, IRR, payback, utility), identify which option has the best risk-adjusted return, and test sensitivity to key assumptions. See the [Decision Analysis](#decision-analysis) section.

**Step 4: Structure narrative with clear recommendation**

Follow the storytelling framework: problem statement, alternatives considered, analysis summary, clear recommendation with reasoning, and next steps. Tailor the level of detail to the audience. See [Narrative Structure](#narrative-structure) for guidance.

**Step 5: Validate completeness with quality checklist**

Use the [Quality Checklist](#quality-checklist) to verify: all alternatives considered, estimates are justified, probabilities are reasonable, expected value is calculated correctly, sensitivity analysis performed, narrative is clear and persuasive, assumptions stated explicitly.

## Quick Template

Copy this structure to create your analysis:

```markdown
# Decision: {Decision Question}

## 1. Decision Context

**What we're deciding:** {Clear statement of the choice}

**Why this matters:** {Business impact, urgency, strategic importance}

**Alternatives:**
1. {Option A}
2. {Option B}
3. {Option C}

**Key uncertainties:**
- {Variable 1}: {Range or distribution}
- {Variable 2}: {Range or distribution}
- {Variable 3}: {Range or distribution}

**Constraints:**
- Budget: {Available resources}
- Timeline: {Decision deadline, implementation timeline}
- Requirements: {Must-haves, non-negotiables}

**Audience:** {Who needs to approve this decision?}

---

## 2. Estimation

### Alternative 1: {Name}

**Costs:**
- Initial investment: ${Low}k - ${High}k (most likely: ${Base}k)
- Annual operational: ${Low}k - ${High}k per year
- Opportunity cost: {What we give up}

**Benefits:**
- Revenue impact: +${Low}k - ${High}k (most likely: ${Base}k)
- Cost savings: ${Low}k - ${High}k per year
- Strategic value: {Qualitative benefits}

**Probabilities:**
- Best case (30%): {Scenario description}
- Base case (50%): {Scenario description}
- Worst case (20%): {Scenario description}

**Key assumptions:**
- {Assumption 1}
- {Assumption 2}
- {Assumption 3}

### Alternative 2: {Name}
{Same structure}

### Alternative 3: {Name}
{Same structure}

---

## 3. Decision Analysis

### Expected Value Calculation

**Alternative 1: {Name}**
- Best case (30%): ${Amount} × 0.30 = ${Weighted}
- Base case (50%): ${Amount} × 0.50 = ${Weighted}
- Worst case (20%): ${Amount} × 0.20 = ${Weighted}
- **Expected value: ${Total}**

**Alternative 2: {Name}**
{Same calculation}
**Expected value: ${Total}**

**Alternative 3: {Name}**
{Same calculation}
**Expected value: ${Total}**

### Comparison

| Alternative | Expected Value | Risk Profile | Time to Value | Strategic Fit |
|-------------|----------------|--------------|---------------|---------------|
| {Alt 1} | ${EV} | {High/Med/Low} | {Timeline} | {Score/10} |
| {Alt 2} | ${EV} | {High/Med/Low} | {Timeline} | {Score/10} |
| {Alt 3} | ${EV} | {High/Med/Low} | {Timeline} | {Score/10} |

### Sensitivity Analysis

**What if {key variable} changes?**
- If {variable} is 20% higher: {Impact on decision}
- If {variable} is 20% lower: {Impact on decision}

**Most sensitive to:**
- {Variable 1}: {Explanation of impact}
- {Variable 2}: {Explanation of impact}

**Robustness check:**
- Conclusion holds if {conditions}
- Would change if {conditions}

---

## 4. Recommendation

**Recommended option: {Alternative X}**

**Reasoning:**
{1-2 paragraphs explaining why this is the best choice given the analysis}

**Key factors:**
- {Factor 1}: {Why it matters}
- {Factor 2}: {Why it matters}
- {Factor 3}: {Why it matters}

**Tradeoffs accepted:**
- We're accepting {downside} in exchange for {upside}
- We're prioritizing {value 1} over {value 2}

**Risks and mitigations:**
- **Risk**: {What could go wrong}
  - **Mitigation**: {How we'll address it}
- **Risk**: {What could go wrong}
  - **Mitigation**: {How we'll address it}

---

## 5. Next Steps

**If approved:**
1. {Immediate action 1} - {Owner} by {Date}
2. {Immediate action 2} - {Owner} by {Date}
3. {Immediate action 3} - {Owner} by {Date}

**Success metrics:**
- {Metric 1}: Target {value} by {date}
- {Metric 2}: Target {value} by {date}
- {Metric 3}: Target {value} by {date}

**Decision review:**
- Revisit this decision in {timeframe} to validate assumptions
- Key indicators to monitor: {metrics to track}

**What would change our mind:**
- If {condition}, we should reconsider
- If {condition}, we should accelerate
- If {condition}, we should pause
```

---

## Estimation Guidelines

### Cost Estimation

**Categories to consider:**
- **One-time costs**: Development, implementation, migration, training
- **Recurring costs**: Subscription fees, maintenance, support, infrastructure
- **Hidden costs**: Opportunity cost, technical debt, switching costs
- **Risk costs**: Probability-weighted downside scenarios

**Estimation techniques:**
- **Analogous**: Similar past projects (adjust for differences)
- **Parametric**: Cost per unit × quantity (e.g., $150k per engineer × 2 engineers)
- **Bottom-up**: Estimate components and sum
- **Three-point**: Best case, most likely, worst case → calculate expected value

**Expressing uncertainty:**
- Use ranges: $200k-$400k (not $300k)
- Assign probabilities: 60% likely $300k, 20% $200k, 20% $400k
- Show confidence: "High confidence" vs "Rough estimate"

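A sketch of the three-point technique. The PERT weighting (4× the most likely value) is one common convention; the explicit probabilities from the bullet above are shown as an alternative:

```python
# Three-point estimate for a cost with best / most-likely / worst values.
best, likely, worst = 200_000, 300_000, 400_000

# PERT (beta) convention: weight the most likely value 4x.
pert = (best + 4 * likely + worst) / 6               # $300k here (symmetric case)

# Or use explicit probabilities, as in the bullet above.
expected = 0.2 * best + 0.6 * likely + 0.2 * worst   # also $300k

print(f"PERT: ${pert:,.0f}  Probability-weighted: ${expected:,.0f}")
# With an asymmetric worst case (say $600k) the two diverge, which is
# exactly when a single point estimate hides risk and ranges matter.
```
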
### Benefit Estimation

**Categories to consider:**
- **Revenue impact**: New revenue, increased conversion, higher retention
- **Cost savings**: Reduced operational costs, avoided hiring, infrastructure savings
- **Productivity gains**: Time saved × value of time
- **Risk reduction**: Probability of bad outcome × cost of bad outcome
- **Strategic value**: Market positioning, competitive advantage, optionality

**Quantification approaches:**
- **Direct measurement**: Historical data, benchmarks, experiments
- **Proxy metrics**: Leading indicators that correlate with value
- **Scenario modeling**: Best/base/worst case with probabilities
- **Comparable analysis**: Similar initiatives at comparable companies

### Probability Assignment

**How to assign probabilities:**
- **Base rates**: Start with historical frequency (e.g., 70% of projects finish on time)
- **Adjustments**: Modify for specific circumstances (this project is simpler/more complex)
- **Expert judgment**: Multiple estimates, averaged or calibrated
- **Reference class forecasting**: Look at similar situations

**Common probability pitfalls:**
- **Overconfidence**: Ranges too narrow, probabilities too extreme (5% or 95%)
- **Anchoring**: First number becomes the reference even if wrong
- **Optimism bias**: Best case feels more likely than it is
- **Planning fallacy**: Underestimating time and cost

**Calibration check:**
- If you say 70% confident, are you right 70% of the time?
- Test with past predictions if available
- Use wider ranges for higher uncertainty

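A quick way to run the calibration check against a log of past predictions; the record format here is a hypothetical example:

```python
# Calibration check: bucket past predictions by stated confidence and
# compare against the observed hit rate. (Hypothetical prediction log.)
from collections import defaultdict

predictions = [  # (stated confidence, did it come true?)
    (0.9, True), (0.9, True), (0.9, False), (0.7, True), (0.7, False),
    (0.7, True), (0.7, False), (0.5, True), (0.5, False), (0.5, True),
]

buckets = defaultdict(list)
for conf, hit in predictions:
    buckets[conf].append(hit)

for conf in sorted(buckets, reverse=True):
    hits = buckets[conf]
    rate = sum(hits) / len(hits)
    print(f"Said {conf:.0%}: right {rate:.0%} of the time (n={len(hits)})")
# Well-calibrated: stated confidence ≈ observed rate in each bucket.
```
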
---

## Decision Analysis

### Expected Value Calculation

**Formula:**
```
Expected Value = Σ (Outcome × Probability)
```

**Example:**
- Best case: $500k × 30% = $150k
- Base case: $300k × 50% = $150k
- Worst case: $100k × 20% = $20k
- Expected value = $150k + $150k + $20k = $320k

**Multi-year NPV:**
```
NPV = Σ (Cash Flow_t / (1 + discount_rate)^t)
```

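A compact sketch of these formulas, plus the payback-period metric discussed below, with illustrative cash flows:

```python
# Expected value, NPV, and payback period as small helpers.
def expected_value(outcomes):
    # outcomes: list of (value, probability); probabilities should sum to 1.
    return sum(v * p for v, p in outcomes)

def npv(cash_flows, rate):
    # cash_flows[t] is the net flow in year t (t=0 is today, undiscounted).
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def payback_years(cash_flows):
    # First year in which cumulative (undiscounted) flow turns non-negative.
    total = 0
    for t, cf in enumerate(cash_flows):
        total += cf
        if total >= 0:
            return t
    return None  # never pays back

ev = expected_value([(500_000, 0.3), (300_000, 0.5), (100_000, 0.2)])
print(f"EV = ${ev:,.0f}")  # $320,000, matching the example above
flows = [-300_000, 150_000, 150_000, 150_000]  # illustrative investment
print(f"NPV @ 10% = ${npv(flows, 0.10):,.0f}")
print(f"Payback = year {payback_years(flows)}")
```
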
**When to use:**
- **Expected value**: When outcomes are roughly linear with value (money, time)
- **Decision trees**: When sequence of choices matters
- **Monte Carlo**: When multiple uncertainties interact
- **Scoring/weighting**: When mixing quantitative and qualitative factors

### Comparison Methods

**1. Expected Value Ranking**
- Calculate EV for each alternative
- Rank by highest expected value
- **Best for**: Decisions with quantifiable outcomes

**2. NPV Comparison**
- Discount future cash flows to present value
- Compare NPV across alternatives
- **Best for**: Multi-year investments

**3. Payback Period**
- Time to recover initial investment
- Consider in addition to NPV (not instead of)
- **Best for**: When liquidity or fast ROI matters

**4. Weighted Scoring**
- Score each alternative on multiple criteria (1-10)
- Multiply by importance weight
- Sum weighted scores
- **Best for**: Mix of quantitative and qualitative factors

### Sensitivity Analysis

**One-way sensitivity:**
- Vary one input at a time (e.g., cost ±20%)
- Check if conclusion changes
- Identify which inputs matter most

**Tornado diagram:**
- Show impact of each variable on outcome
- Order by magnitude of impact
- Focus on top 2-3 drivers

**Scenario analysis:**
- Define coherent scenarios (pessimistic, base, optimistic)
- Calculate outcome for each complete scenario
- Assign probabilities to scenarios

**Break-even analysis:**
- At what value of {key variable} does the decision change?
- Provides threshold for monitoring

---

## Narrative Structure

### Executive Summary (for executives)

**Format:**
1. **The decision** (1 sentence): What we're choosing between
2. **The recommendation** (1 sentence): What we should do
3. **The reasoning** (2-3 bullets): Key factors driving recommendation
4. **The ask** (1 sentence): What approval or resources are needed
5. **The timeline** (1 sentence): When this happens

**Length:** 4-6 sentences, fits in one paragraph

**Example:**
> "We evaluated building custom analytics vs. buying a SaaS tool. Recommendation: Buy the SaaS solution. Key factors: (1) $130k lower expected cost due to build risk, (2) 6 months faster time-to-value, (3) proven reliability vs. custom development uncertainty. Requesting $20k implementation budget and $120k annual subscription approval. Implementation begins next month with value delivery in 8 weeks."

### Detailed Analysis (for stakeholders)

**Structure:**
1. **Problem statement**: Why this decision matters (1 paragraph)
2. **Alternatives considered**: Show you did the work (bullets)
3. **Analysis approach**: Methodology and assumptions (1 paragraph)
4. **Key findings**: Numbers, comparison, sensitivity (1-2 paragraphs)
5. **Recommendation**: Clear choice with reasoning (1-2 paragraphs)
6. **Risks and mitigations**: What could go wrong (bullets)
7. **Next steps**: Implementation plan (bullets)

**Length:** 1-2 pages

**Tone:** Professional, balanced, transparent about tradeoffs

### Technical Deep-Dive (for technical teams)

**Additional detail:**
- Estimation methodology and data sources
- Sensitivity analysis details
- Technical assumptions and constraints
- Implementation considerations
- Alternative approaches considered and why rejected

**Length:** 2-4 pages

**Tone:** Analytical, rigorous, shows technical depth

---

## Quality Checklist

Before finalizing, verify:

**Estimation quality:**
- [ ] All relevant costs included (one-time, recurring, opportunity, risk)
- [ ] All relevant benefits quantified or described
- [ ] Uncertainty expressed with ranges or probabilities
- [ ] Assumptions stated explicitly with justification
- [ ] Sources cited for estimates where applicable

**Decision analysis quality:**
- [ ] Expected value calculated correctly (probability × outcome)
- [ ] All alternatives compared fairly
- [ ] Sensitivity analysis performed on key variables
- [ ] Robustness tested (does conclusion hold across reasonable ranges?)
- [ ] Dominant option identified with clear rationale

**Narrative quality:**
- [ ] Clear recommendation stated upfront
- [ ] Problem statement explains why decision matters
- [ ] Alternatives shown (proves due diligence)
- [ ] Analysis summary appropriate for audience
- [ ] Tradeoffs acknowledged honestly
- [ ] Risks and mitigations addressed
- [ ] Next steps are actionable

**Communication quality:**
- [ ] Tailored to audience (exec vs technical vs finance)
- [ ] Jargon explained or avoided
- [ ] Key numbers highlighted
- [ ] Visual aids used where helpful (tables, charts)
- [ ] Length appropriate (not too long or too short)

**Integrity checks:**
- [ ] No cherry-picking of favorable data
- [ ] Downside scenarios included, not just upside
- [ ] Probabilities are calibrated (not overconfident)
- [ ] "What would change my mind" conditions stated
- [ ] Limitations and uncertainties acknowledged

---

## Common Decision Types

### Build vs Buy
- **Estimate**: Dev cost, maintenance, SaaS fees, implementation
- **Decision**: 3-5 year TCO with risk adjustment
- **Story**: Control vs. cost, speed vs. customization

### Market Entry
- **Estimate**: TAM/SAM/SOM, CAC, LTV, time to profitability
- **Decision**: NPV with market uncertainty scenarios
- **Story**: Growth opportunity vs. execution risk

### Hiring
- **Estimate**: Comp, recruiting, ramp time, productivity impact
- **Decision**: Cost per output vs. alternatives
- **Story**: Capacity constraints vs. efficiency gains

### Technology Migration
- **Estimate**: Migration cost, operational savings, risk reduction
- **Decision**: Multi-year TCO plus risk-adjusted benefits
- **Story**: Short-term pain for long-term gain

### Resource Allocation
- **Estimate**: Cost per initiative, expected impact
- **Decision**: Portfolio optimization or impact/effort ranking
- **Story**: Given constraints, maximize expected value

261
skills/chain-roleplay-debate-synthesis/SKILL.md
Normal file
@@ -0,0 +1,261 @@
---
name: chain-roleplay-debate-synthesis
description: Use when facing decisions with multiple legitimate perspectives and inherent tensions. Invoke when stakeholders have competing priorities (growth vs. sustainability, speed vs. quality, innovation vs. risk), need to pressure-test ideas from different angles before committing, exploring tradeoffs between incompatible values, synthesizing conflicting expert opinions into coherent strategy, or surfacing assumptions that single-viewpoint analysis would miss.
---

# Chain Roleplay → Debate → Synthesis

## Workflow

Copy this checklist and track your progress:

```
Roleplay → Debate → Synthesis Progress:
- [ ] Step 1: Frame the decision and identify roles
- [ ] Step 2: Roleplay each perspective authentically
- [ ] Step 3: Structured debate between viewpoints
- [ ] Step 4: Synthesize into coherent recommendation
- [ ] Step 5: Validate synthesis quality
```

**Step 1: Frame the decision and identify roles**

State the decision clearly as a question, identify 2-5 stakeholder perspectives or roles that have legitimate but competing interests, and clarify what a successful synthesis looks like. See [Decision Framing](#decision-framing) for guidance on choosing productive roles.

**Step 2: Roleplay each perspective authentically**

For each role, articulate their position, priorities, concerns, and evidence. Genuinely advocate for each viewpoint without strawmanning. See [Roleplay Guidelines](#roleplay-guidelines) for authentic advocacy techniques and use [resources/template.md](resources/template.md) for the complete structure.

**Step 3: Structured debate between viewpoints**

Facilitate direct clash between perspectives on key points of disagreement. Surface tensions, challenge assumptions, test edge cases, and identify cruxes (what evidence would change each perspective's mind). See [Debate Structure](#debate-structure) for debate formats and facilitation techniques.

**Step 4: Synthesize into coherent recommendation**

Integrate insights from all perspectives into a unified decision that acknowledges tradeoffs, incorporates valid concerns from each viewpoint, and explains what's being prioritized and why. See [Synthesis Patterns](#synthesis-patterns) for integration approaches and [resources/template.md](resources/template.md) for the synthesis framework. For complex multi-stakeholder decisions, see [resources/methodology.md](resources/methodology.md).

**Step 5: Validate synthesis quality**

Check the synthesis against [resources/evaluators/rubric_chain_roleplay_debate_synthesis.json](resources/evaluators/rubric_chain_roleplay_debate_synthesis.json) to ensure all perspectives were represented authentically, the debate surfaced real tensions, the synthesis is coherent and actionable, and no perspective was dismissed without engagement. See [When NOT to Use This Skill](#when-not-to-use-this-skill) to confirm this approach was appropriate.

---

## Decision Framing

### Choosing Productive Roles

**Good role selection:**
- **Competing interests**: Roles have legitimate but different priorities (e.g., Speed Advocate vs. Quality Guardian)
- **Different expertise**: Roles bring distinct knowledge domains (e.g., Engineer, Designer, Customer)
- **Value tensions**: Roles represent incompatible values (e.g., Privacy Advocate vs. Personalization Advocate)
- **Stakeholder representation**: Roles map to real decision-makers or affected parties

**Typical role patterns:**
- **Functional roles**: Engineer, Designer, PM, Marketer, Finance, Legal, Customer
- **Archetype roles**: Optimist, Pessimist, Risk Manager, Visionary, Pragmatist
- **Stakeholder roles**: Customer, Employee, Investor, Community, Regulator
- **Value roles**: Ethics Officer, Growth Hacker, Brand Guardian, Innovation Lead
- **Temporal roles**: Short-term Thinker, Long-term Strategist

**How many roles:**
- **2 roles**: Clean binary debate (build vs. buy, growth vs. profitability)
- **3 roles**: Triadic tension (speed vs. quality vs. cost)
- **4-5 roles**: Multi-stakeholder complexity (product strategy with eng, design, marketing, finance, customer)
- **Avoid >5**: Becomes unwieldy, synthesis too complex

### Framing the Question

**Strong framing:**
- "Should we prioritize X over Y?" (clear tradeoff)
- "What's the right balance between A and B?" (explicit tension)
- "Should we pursue strategy X?" (specific, actionable)

**Weak framing:**
- "What should we do?" (too vague)
- "How can we have our cake and eat it too?" (assumes false resolution)
- "Who's right?" (assumes a winner rather than synthesis)

---

## Roleplay Guidelines

### Authentic Advocacy

**Each role should:**
1. **State position clearly**: What do they believe should be done?
2. **Articulate priorities**: What values or goals drive this position?
3. **Surface concerns**: What risks or downsides do they see in other approaches?
4. **Provide evidence**: What data, experience, or reasoning supports this view?
5. **Show vulnerability**: What uncertainties or limitations does this role acknowledge?

**Avoiding strawmen:**
- ❌ "The engineer just wants to use shiny new tech" (caricature)
- ✅ "The engineer values maintainability and believes the new framework reduces technical debt"

- ❌ "Sales only cares about closing deals" (dismissive)
- ✅ "Sales is accountable for revenue and sees this feature as critical for competitive positioning"

**Empathy without capitulation:**
You can deeply understand a perspective without agreeing with it. Each role should be the "hero of their own story."

### Perspective-Taking Checklist

For each role, answer:
- [ ] What success looks like from this perspective
- [ ] What failure looks like from this perspective
- [ ] What metrics or evidence this role finds most compelling
- [ ] What this role fears about alternative approaches
- [ ] What this role knows that others might not
- [ ] What constraints or pressures this role faces

---

## Debate Structure

### Facilitating Productive Clash

**Debate formats:**

**1. Point-Counterpoint**
- Role A makes the case for their position
- Role B responds with objections and counterarguments
- Role A addresses objections
- Repeat with Role B's case

**2. Devil's Advocate**
- One role presents the "default" or "obvious" choice
- Other roles systematically challenge assumptions and surface risks
- Goal: Pressure-test before committing

**3. Constructive Confrontation**
- Identify 3-5 key decision dimensions (cost, speed, risk, quality, etc.)
- Each role articulates a position on each dimension
- Surface where perspectives conflict most

**4. Crux-Finding**
- Ask each role: "What would need to be true for you to change your mind?"
- Identify testable assumptions or evidence that would shift the debate
- Focus discussion on cruxes rather than rehashing positions

### Questions to Surface Tensions

- "What's the strongest argument against your position?"
- "What does [other role] see that you might be missing?"
- "Where is the irreducible tradeoff between your perspectives?"
- "If you had to steelman the opposing view, what would you say?"
- "What happens in edge cases for your approach?"
- "What are you optimizing for that others aren't?"

### Red Flags in Debate

- **Premature consensus**: Roles agree too quickly without surfacing real tensions
- **Talking past each other**: Roles argue different points rather than engaging
- **Appeal to authority**: "Because the CEO said so" rather than reasoning
- **False dichotomies**: "Either we do X or we fail" without exploring middle ground
- **Unsupported claims**: "Everyone knows Y" without evidence or reasoning

---

## Synthesis Patterns

### Integration Approaches

**1. Weighted Synthesis**
- "We'll prioritize X, while incorporating safeguards for Y's concerns"
- Example: "Ship fast (PM's priority), but with feature flags and monitoring (Engineer's concern)"

**2. Sequencing**
- "First we do X, then we address Y"
- Example: "Launch MVP to test the market (Growth), then invest in quality (Engineering) once product-market fit is proven"

**3. Conditional Strategy**
- "If condition A, do X; if condition B, do Y"
- Example: "If adoption > 10K users in Q1, invest in scale; otherwise, pivot based on feedback"

**4. Hybrid Approach**
- "Combine elements of multiple perspectives"
- Example: "Build core in-house (control) but buy peripheral components (speed)"

**5. Reframing**
- "Debate reveals the real question is Z, not X vs Y"
- Example: "Debate about pricing reveals we need to segment customers first"

**6. Elevating Constraint**
- "Identify the binding constraint both perspectives agree on"
- Example: "Both speed and quality advocates agree engineering capacity is the bottleneck; the synthesis is to hire first"

### Synthesis Quality Markers

**Strong synthesis:**
- ✅ Acknowledges validity of multiple perspectives
- ✅ Explains what's being prioritized and why
- ✅ Addresses major concerns from each viewpoint
- ✅ Clear on tradeoffs being accepted
- ✅ Actionable recommendation
- ✅ Monitoring plan for key assumptions

**Weak synthesis:**
- ❌ "Let's do everything" (no prioritization)
- ❌ "X wins, Y loses" (dismisses valid concerns)
- ❌ "We need more information" (avoids decision)
- ❌ "It depends" without specifying conditions
- ❌ Vague platitudes without concrete next steps

---

## Examples

### Example 1: Short-form Synthesis

**Decision**: Should we rewrite our monolith as microservices?

**Roles**:
- **Scalability Engineer**: We need microservices to scale independently and deploy faster
- **Pragmatic Engineer**: Rewrite is 12-18 months with high risk; monolith works fine
- **Finance**: What's the ROI? Rewrite costs $2M in eng time

**Synthesis**:
Don't rewrite everything, but extract the 2-3 services with clear scaling needs (authentication, payment processing) as independent microservices. Keep core business logic in the monolith for now. This addresses scalability concerns for bottleneck components (Scalability Engineer), limits risk and timeline (Pragmatic Engineer), and reduces cost to $400K vs. $2M (Finance). Revisit full migration if the extracted services succeed and prove the pattern.

### Example 2: Full Analysis

For a complete worked example with detailed roleplay, debate, and synthesis, see:
- [resources/examples/build-vs-buy-crm.md](resources/examples/build-vs-buy-crm.md) - Sales, Engineering, and Finance debate a CRM platform decision

---

## When NOT to Use This Skill

**Skip roleplay-debate-synthesis when:**

❌ **Single clear expert**: If one person has definitive expertise and others defer, just ask the expert
❌ **No genuine tension**: If stakeholders actually agree, debate is artificial
❌ **Values cannot be negotiated**: If there's an ethical red line, don't roleplay the unethical side
❌ **Time-critical decision**: If the decision must be made in minutes, skip the full debate
❌ **Implementation details**: If the decision is "how", not "whether" or "what", use technical collaboration, not debate

**Use simpler approaches when:**
- ✅ Decision is straightforward with clear data → Use a decision matrix or expected value
- ✅ Need creative options, not evaluation → Use brainstorming, not debate
- ✅ Need detailed analysis, not perspective clash → Use analytical frameworks
- ✅ Implementation planning, not decision-making → Use project planning, not roleplay

---

## Advanced Techniques

For complex multi-stakeholder decisions, see [resources/methodology.md](resources/methodology.md) for:
- **Multi-round debates** (iterative refinement of positions)
- **Audience-perspective shifts** (how synthesis changes for different stakeholders)
- **Facilitation anti-patterns** (how debates go wrong)
- **Synthesis under uncertainty** (when evidence is incomplete)
- **Stakeholder mapping** (identifying who needs to be represented)

---

## Resources

- **[resources/template.md](resources/template.md)** - Structured template for roleplay → debate → synthesis analysis
- **[resources/methodology.md](resources/methodology.md)** - Advanced facilitation techniques and debate formats
- **[resources/examples/](resources/examples/)** - Complete worked examples across domains
- **[resources/evaluators/rubric_chain_roleplay_debate_synthesis.json](resources/evaluators/rubric_chain_roleplay_debate_synthesis.json)** - Quality assessment rubric (10 criteria)

@@ -0,0 +1,186 @@

{
  "criteria": [
    {
      "name": "Perspective Authenticity",
      "description": "Are roles represented genuinely without strawman arguments?",
      "scale": {
        "1": "Roles are caricatures or strawmen. Positions feel artificial or dismissive ('X just wants...'). No genuine advocacy.",
        "2": "Some roles feel authentic but others are weak. Uneven representation. Some strawmanning present.",
        "3": "Most roles feel genuine. Positions are reasonable but may lack depth. Minimal strawmanning.",
        "4": "All roles authentically represented. Each is 'hero of their own story.' Steelman approach evident. Strong advocacy for each perspective.",
        "5": "Exceptional authenticity. Every role's position is compellingly argued. Reader could genuinely support any perspective. Intellectual empathy demonstrated throughout."
      }
    },
    {
      "name": "Depth of Roleplay",
      "description": "Are priorities, concerns, evidence, and vulnerabilities fully articulated for each role?",
      "scale": {
        "1": "Roles are one-dimensional. Only positions stated, no priorities, concerns, or evidence.",
        "2": "Positions and some priorities stated. Concerns and evidence missing or thin.",
        "3": "Positions, priorities, and concerns articulated. Evidence present but may be thin. Vulnerabilities rarely acknowledged.",
        "4": "Comprehensive roleplay. Clear position, well-justified priorities, specific concerns, supporting evidence, and acknowledged vulnerabilities for each role.",
        "5": "Exceptionally deep roleplay. Rich detail on priorities, nuanced concerns, strong evidence, intellectual honesty about vulnerabilities and limits. Success metrics defined for each role."
      }
    },
    {
      "name": "Debate Quality",
      "description": "Do perspectives genuinely clash on key points of disagreement?",
      "scale": {
        "1": "No actual debate. Roles present positions but don't engage. Talking past each other or premature consensus.",
        "2": "Limited engagement. Some responses to other roles but mostly separate monologues. Key tensions not surfaced.",
        "3": "Moderate debate. Roles respond to each other on main points. Some clash evident but could go deeper.",
        "4": "Strong debate. Perspectives directly engage on 3-5 key dimensions. Real clash of ideas. Tensions clearly surfaced.",
        "5": "Exceptional debate. Deep engagement with point-counterpoint structure. All major tensions explored thoroughly. Debate format (devil's advocate, crux-finding, etc.) skillfully applied."
      }
    },
    {
      "name": "Tension Surfacing",
      "description": "Are irreducible tradeoffs and conflicts explicitly identified?",
      "scale": {
        "1": "No tensions identified. Falsely suggests all perspectives align. Missing the point of debate.",
        "2": "Some tensions mentioned but not explored. Glossed over or minimized.",
        "3": "Main tensions identified (2-3 key tradeoffs). Reasonably clear where perspectives conflict.",
        "4": "All major tensions surfaced and explored. Clear on irreducible tradeoffs (speed vs quality, cost vs flexibility, etc.). Dimensions of disagreement explicit.",
        "5": "Comprehensive tension mapping. All conflicts identified, categorized, and explored deeply. False dichotomies challenged. Genuine irreducible tradeoffs distinguished from resolvable disagreements."
      }
    },
    {
      "name": "Crux Identification",
      "description": "Are conditions that would change each role's mind identified?",
      "scale": {
        "1": "No cruxes identified. Positions appear fixed and immovable.",
        "2": "Vague acknowledgment that positions could change but no specifics.",
        "3": "Some cruxes identified for some roles. Moderate specificity on what would change minds.",
        "4": "Clear cruxes for all roles. Specific conditions or evidence that would shift positions. Enables productive focus on key uncertainties.",
        "5": "Exceptional crux identification. Specific, testable conditions for each role. Distinguishes between cruxes (truly pivotal) and nice-to-haves. Debate explicitly focuses on resolving cruxes."
      }
    },
    {
      "name": "Synthesis Coherence",
      "description": "Is the unified recommendation logical and well-integrated, and does it address the decision?",
      "scale": {
        "1": "No synthesis. Just restates positions or says 'we need more information.' Avoids deciding.",
        "2": "Weak synthesis. Either 'do everything' (no prioritization) or 'X wins, Y loses' (dismisses perspectives). Not truly integrated.",
        "3": "Moderate synthesis. Clear recommendation but integration is shallow. May not fully address all concerns or explain tradeoffs.",
        "4": "Strong synthesis. Coherent recommendation that integrates insights from all perspectives. Integration pattern clear (weighted, sequencing, conditional, hybrid, reframing, constraint elevation). Addresses decision directly.",
        "5": "Exceptional synthesis. Deeply integrated recommendation better than any single perspective. Pattern expertly applied. Innovative solution that satisfies multiple objectives. Elegant and actionable."
      }
    },
    {
      "name": "Concern Integration",
      "description": "Are each role's core concerns explicitly addressed in the synthesis?",
      "scale": {
        "1": "No concern integration. Synthesis ignores or dismisses most perspectives' concerns.",
        "2": "Limited integration. Addresses concerns from 1-2 roles, ignores others.",
        "3": "Moderate integration. Most roles' main concerns acknowledged but may not be fully addressed. Some perspectives feel under-represented.",
        "4": "Strong integration. All roles' core concerns explicitly addressed. Clear explanation of how synthesis handles each perspective's priorities.",
        "5": "Exceptional integration. Every role's concerns not just addressed but shown to be *strengthened* by the integrated approach. Synthesis makes each perspective better off than going solo."
      }
    },
    {
      "name": "Tradeoff Transparency",
      "description": "Are accepted tradeoffs and rejected alternatives clearly explained?",
      "scale": {
        "1": "No tradeoff transparency. Synthesis presented as 'best of all worlds' without acknowledging costs.",
        "2": "Minimal transparency. Vague acknowledgment that tradeoffs exist but not specified.",
        "3": "Moderate transparency. Main tradeoffs mentioned. Some explanation of what's being accepted/rejected.",
        "4": "Strong transparency. Clear on what's prioritized and what's sacrificed. Explicit rationale for tradeoffs. Alternatives rejected with reasons.",
        "5": "Exceptional transparency. Comprehensive accounting of all tradeoffs. Clear on second-order effects. Honest about what each perspective gives up and why it's worth it. 'What we're NOT doing' as clear as 'What we ARE doing.'"
      }
    },
    {
      "name": "Actionability",
      "description": "Are next steps specific, feasible, with owners and timelines?",
      "scale": {
        "1": "No action plan. Vague or missing next steps.",
        "2": "Vague next steps. 'Consider X', 'Explore Y.' No owners or timelines.",
        "3": "Moderate actionability. Next steps identified but lack detail on who/when/how. Success metrics missing or vague.",
        "4": "Strong actionability. Clear next steps with owners and dates. Success metrics defined. Implementation approach specified (phased, conditional, etc.).",
        "5": "Exceptional actionability. Detailed implementation plan. Owners assigned, timeline clear, milestones defined, success metrics from each role's perspective, monitoring plan, escalation conditions, decision review cadence."
      }
    },
    {
      "name": "Stakeholder Readiness",
      "description": "Is synthesis communicated appropriately for different audiences?",
      "scale": {
        "1": "No stakeholder tailoring. Single narrative that may not resonate with any audience.",
        "2": "Minimal tailoring. One-size-fits-all communication. May be too technical for execs or too vague for implementers.",
        "3": "Moderate tailoring. Some audience awareness. Key messages identified but not fully adapted.",
        "4": "Strong tailoring. Synthesis clearly communicates to different stakeholders (exec summary, technical detail, operational guidance). Appropriate emphasis for each audience.",
        "5": "Exceptional tailoring. Multiple versions or sections for different stakeholders. Executive summary for leaders, technical appendix for specialists, operational guide for implementation teams. Anticipates questions from each audience."
      }
    }
  ],
  "minimum_standard": 3.5,
  "complexity_guidance": {
    "simple_decisions": {
      "threshold": 3.0,
      "description": "Binary choices with 2-3 clear stakeholders (e.g., build vs buy, speed vs quality). Acceptable to have simpler analysis (criteria 3-4).",
      "focus_criteria": ["Perspective Authenticity", "Tension Surfacing", "Synthesis Coherence"]
    },
    "standard_decisions": {
      "threshold": 3.5,
      "description": "Multi-faceted decisions with 3-4 stakeholders and competing priorities (e.g., product roadmap, hiring strategy, market entry). Standard threshold applies (criteria average ≥3.5).",
      "focus_criteria": ["All criteria should meet threshold"]
    },
    "complex_decisions": {
      "threshold": 4.0,
      "description": "Strategic decisions with 4-5+ stakeholders, power dynamics, and high stakes (e.g., company strategy, major pivots, organizational change). Higher bar required (criteria average ≥4.0).",
      "focus_criteria": ["Depth of Roleplay", "Debate Quality", "Concern Integration", "Tradeoff Transparency"],
      "additional_requirements": ["Stakeholder mapping", "Multi-round debate structure", "Facilitation anti-pattern awareness"]
    }
  },
  "common_failure_modes": [
    {
      "failure": "Strawman arguments",
      "symptoms": "Roles are caricatured or weakly represented. 'X just wants shiny tech', 'Y only cares about money.'",
      "fix": "Steelman approach: Present the *strongest* version of each perspective. Each role is 'hero of their own story.' If a position feels weak, strengthen it."
    },
    {
      "failure": "Premature consensus",
      "symptoms": "Roles agree too quickly without genuine debate. 'We all want the same thing.'",
      "fix": "Play devil's advocate. Ask: 'What could go wrong?' 'Where do you disagree?' Test agreement with edge cases. Give permission to disagree."
    },
    {
      "failure": "Talking past each other",
      "symptoms": "Roles present positions but don't engage. Arguments on different dimensions (one says cost, other says speed).",
      "fix": "Make dimensions of disagreement explicit. Force direct engagement: '[Role A], respond to [Role B]'s point about X.'"
    },
    {
      "failure": "False dichotomies",
      "symptoms": "'Either X or we fail.' 'We must choose between quality and speed.'",
      "fix": "Challenge the dichotomy. Explore middle ground, sequencing, conditional strategies. Ask: 'Are those really the only two options?'"
    },
    {
      "failure": "Synthesis dismisses perspectives",
      "symptoms": "'X wins, Y loses.' Some roles' concerns ignored or minimized in synthesis.",
      "fix": "Explicitly address every role's core concerns. Show how synthesis incorporates (not dismisses) each perspective. Check: 'Does this address your concern about [issue]?'"
    },
    {
      "failure": "Vague tradeoffs",
      "symptoms": "Synthesis presented as win-win without acknowledging costs. No transparency on what's sacrificed.",
      "fix": "Make tradeoffs explicit. 'We're prioritizing X, accepting Y, rejecting Z because [rationale].' Be honest about costs."
    },
    {
      "failure": "Analysis paralysis",
      "symptoms": "'We need more information.' Endless debate without convergence. Perfect information fallacy.",
      "fix": "Set decision deadline. Clarify decision criteria. Good-enough threshold. Explicitly trade off value of info vs cost of delay."
    },
    {
      "failure": "Dominant voice",
      "symptoms": "One role speaks 70%+ of time. Others defer. Synthesis reflects single perspective.",
      "fix": "Explicit turn-taking. Direct questions to quieter roles. Affirm their contributions. Check power dynamics."
    },
    {
      "failure": "No actionability",
      "symptoms": "Synthesis is conceptual but not actionable. No clear next steps, owners, or timeline.",
      "fix": "Specify: Who does what by when? How do we measure success? What's the implementation approach (phased, conditional)? When do we review?"
    },
    {
      "failure": "Single narrative for all audiences",
      "symptoms": "Same explanation for execs, engineers, and operators. Too technical or too vague for some.",
      "fix": "Tailor communication. Exec summary (strategic), technical brief (implementation), operational guide (execution). Emphasize what each audience cares about."
    }
  ],
  "usage_notes": "Use this rubric to self-assess before delivering synthesis. For simple binary decisions, 3.0+ is acceptable. For standard multi-stakeholder decisions, aim for 3.5+. For complex strategic decisions with high stakes, aim for 4.0+. Pay special attention to Perspective Authenticity, Synthesis Coherence, and Concern Integration as these are most critical for effective roleplay-debate-synthesis."
}

@@ -0,0 +1,599 @@

# Decision: Build Custom CRM vs. Buy SaaS CRM

**Date**: 2024-11-02
**Decision-maker**: CTO + VP Sales
**Stakes**: Medium (affects 30-person sales team, $200K-$800K over 3 years)

---

## 1. Decision Context

**What we're deciding:**
Should we build a custom CRM tailored to our sales process or buy an existing SaaS CRM platform (Salesforce, HubSpot, etc.)?

**Why this matters:**
- Current CRM is a patchwork of spreadsheets and email threads (manual, error-prone)
- Sales team losing deals due to poor pipeline visibility and follow-up tracking
- Need solution operational within 6 months to support Q4 sales push
- Decision impacts sales productivity for next 3+ years

**Success criteria for synthesis:**
- Addresses all three stakeholders' core concerns
- Clear on timeline and costs
- Actionable implementation plan
- All parties can commit to making it work

**Constraints:**
- Budget: $150K available this year, $50K/year ongoing
- Timeline: Must be operational within 6 months
- Requirements: Support 30 users, custom deal stages, integration with current tools
- Non-negotiable: Cannot disrupt Q4 sales push (our busiest quarter)

**Audience:** Executive team (CEO, CTO, VP Sales, CFO)

---

## 2. Roles & Perspectives

### Role 1: VP Sales ("Revenue Defender")

**Position**: Buy SaaS CRM - we need it operational fast to support the sales team.

**Priorities**:
1. **Speed to value**: Sales team needs this now, not in 12 months
2. **Proven reliability**: Can't afford downtime or bugs during Q4 push
3. **User adoption**: Sales reps need a familiar, polished interface
4. **Feature completeness**: Reporting, forecasting, mobile app out-of-box

**Concerns about building custom**:
- Development will take 12-18 months (historical track record of eng projects)
- Custom tools are often clunky and sales reps hate them
- Maintenance burden (who fixes bugs when it breaks?)
- Opportunity cost: Every month without a CRM costs us deals

**Evidence**:
- Previous custom tool (deal calculator) took 14 months to build and was buggy for the first 6 months
- Sales team productivity dropped 20% during transition to the last custom tool
- Industry benchmark: SaaS CRM implementation takes 2-3 months vs. 12+ for a custom build

**Vulnerabilities**:
- "I acknowledge SaaS solutions have vendor lock-in risk and ongoing subscription costs"
- "If engineering can credibly deliver in 6 months with high quality, I'd reconsider"
- "Main uncertainty: Will SaaS meet our unique sales process needs (2-stage approval workflow)?"

**Success metrics**:
- Time to operational: <3 months
- User adoption: >90% of sales reps using daily within 30 days
- Deal velocity: Reduce avg sales cycle from 60 days to 45 days
- Win rate: Maintain or improve current 25% win rate

---

### Role 2: Engineering Lead ("Builder")

**Position**: Build custom CRM - we can tailor exactly to our needs and own the platform.

**Priorities**:
1. **Perfect fit**: Our sales process is unique (2-stage approval, custom deal stages); SaaS won't fit
2. **Long-term ownership**: Build once, no ongoing subscription fees ($50K/year saved)
3. **Extensibility**: Can add features as we grow (integrations, automation, reporting)
4. **Technical debt reduction**: Opportunity to modernize our stack

**Concerns about buying SaaS**:
- Vendor lock-in: Stuck with their roadmap, pricing, and limitations
- Data ownership: Our customer data lives on their servers
- Customization limits: SaaS tools are 80% fit, not 100%
- Hidden costs: Subscription fees compound, APIs cost extra, training costs

**Evidence**:
- Previous SaaS tool (project management) had 15% of features unused but we still paid for them
- Customization via SaaS APIs is limited and often breaks on upgrades
- Engineering team has capacity: 2 senior engineers available for 4-6 months
- Build cost: $200K-$300K upfront vs. $50K/year SaaS ($250K over 5 years)

**Vulnerabilities**:
- "I acknowledge custom builds often take longer than estimated (our track record is +40%)"
- "If SaaS can handle our 2-stage approval workflow, that reduces the 'perfect fit' advantage"
- "Main uncertainty: Will we actually build all planned features or ship a minimal version?"

**Success metrics**:
- Feature completeness: 100% of specified requirements met
- Cost: <$300K initial build + <$30K/year maintenance
- Extensibility: Add 2-3 new features per year as business evolves
- Technical quality: <5 bugs per month in production

---

### Role 3: Finance/Operations ("Cost Realist")

**Position**: Depends on ROI - need to see credible numbers for both options.

**Priorities**:
1. **Total cost of ownership (TCO)**: Upfront + ongoing costs over 3-5 years
2. **Risk-adjusted value**: Factor in probability of delays, overruns, adoption failure
3. **Payback period**: How quickly do we recoup investment via productivity gains?
4. **Operational predictability**: Prefer predictable costs to variable/uncertain

**Concerns about both options**:
- **Build**: High upfront cost, uncertain timeline, may exceed budget
- **Buy**: Ongoing subscription compounds, pricing can increase, switching costs if we change later
- **Both**: Adoption risk (if sales team doesn't use it, value = $0)

**Evidence**:
- Current manual CRM costs us 20 hours/week of sales rep time (= $40K/year in lost productivity)
- Average SaaS CRM for 30 users: $50K/year (HubSpot Professional)
- Engineering project cost overruns average +40% from initial estimate (historical data)
- Failed internal tool adoption has happened twice in the past 3 years (calc tool, reporting dashboard)

**Vulnerabilities**:
- "I'm making assumptions about productivity gains that haven't been validated"
- "If custom build delivers high value in 6 months (unlikely but possible), ROI beats SaaS"
- "Main uncertainty: Adoption rate - will the sales team actually use whichever solution we choose?"

**Success metrics**:
- TCO: Minimize 3-year total cost (build + operate)
- Payback: <18 months to recoup investment
- Adoption: >80% active usage (logging deals, updating pipeline)
- Predictability: <20% variance from projected costs

---

|
||||
|
||||
## 3. Debate

### Key Points of Disagreement

**Dimension 1: Timeline - Fast (Buy) vs. Perfect (Build)**
- **VP Sales**: Need it in 3 months to support Q4. Cannot wait 12+ months for custom build.
- **Engineering Lead**: 6-month build timeline is achievable with focused team. SaaS implementation still takes 2-3 months (not that much faster).
- **Finance**: Every month of delay costs $3K in lost productivity. Time is money.
- **Tension**: Speed to value vs. building it right

**Dimension 2: Fit - Good Enough (Buy) vs. Perfect (Build)**
- **VP Sales**: 80% fit is fine if it's reliable and fast. Sales reps can adapt process slightly.
- **Engineering Lead**: 80% fit means 20% pain forever. Our 2-stage approval is core to how we sell.
- **Finance**: "Perfect fit" is theoretical. Custom builds often ship with missing features or bugs.
- **Tension**: Process flexibility vs. tool flexibility

**Dimension 3: Cost - Predictable (Buy) vs. Upfront (Build)**
- **VP Sales**: $50K/year is predictable, budgetable, and includes support/updates.
- **Engineering Lead**: $50K/year = $250K over 5 years. Build is $300K once, then $30K/year from year two = $420K over 5 years. But we own it.
- **Finance**: Engineering track record suggests $300K becomes $400K. Risk-adjusted, not clear build is cheaper.
- **Tension**: Upfront vs. ongoing, certain vs. uncertain

**Dimension 4: Risk - Operational (Buy) vs. Execution (Build)**
- **VP Sales**: Execution risk is high (engineering track record). Operational risk with SaaS is low (proven, 99.9% uptime).
- **Engineering Lead**: Vendor risk is real (lock-in, pricing changes, product direction). We control our own destiny with custom build.
- **Finance**: Both have risks; the question is which is easier to mitigate. SaaS = switching costs. Build = timeline/budget overruns.
- **Tension**: Vendor dependency vs. execution capability

### Debate Transcript (Point-Counterpoint)

**VP Sales Opening Case:**

Look, I get the appeal of building our perfect CRM, but we cannot afford to wait 12-18 months. Our sales team is losing deals *right now* because we don't have proper pipeline tracking. I had a rep miss a $50K deal last month because a follow-up fell through the cracks in our spreadsheet system.

The reality is that engineering has a track record of delays. The deal calculator took 14 months instead of 8. The reporting dashboard was 6 months late. If we start a custom build today, best case is 12 months to launch, realistically 18 months. By then, we'll have lost $100K+ in productivity and missed deals.

A SaaS solution like HubSpot can be operational in 2-3 months. Yes, it's not a perfect fit for our 2-stage approval, but we can adapt our process slightly or use workflow automation. The key is: it works, it's reliable, and my sales team will actually use it because it's polished and familiar.

The $50K/year cost is worth it for the speed, reliability, and ongoing support. We get updates, new features, and 24/7 support included. If we build custom and it breaks during Q4, who fixes it? Engineering is focused on product features, not internal tools.

**Engineering Lead Response:**

I appreciate the urgency, but "speed at any cost" is how we end up with tools that don't fit our needs and frustrate users. Let me address the timeline concern: I'm proposing a 6-month build, not 12-18 months. We scope it tightly to core CRM features (contacts, deals, pipeline, basic reporting). No gold-plating.

Yes, past projects have run over, but those were exploratory with unclear requirements. This is a well-defined problem. We've built CRMs before (I personally built one at my previous company in 5 months). With 2 senior engineers dedicated full-time, 6 months is realistic.

On the "good enough" point: Our 2-stage approval process is non-negotiable. It's a compliance requirement and a competitive differentiator. SaaS CRMs make this painful or impossible. I've demoed HubSpot - their approval workflows are clunky and don't match our needs. Sales reps will work around the system, undermining data quality.

On cost: Yes, $50K/year seems manageable, but that's $250K over 5 years, $500K over 10 years, *forever*. And that's today's price. SaaS pricing increases 5-10% per year. We're also locked in - switching CRMs later is a $100K+ migration project. Build once, own forever, total control.

And let's not forget: We can build exactly what we need. Want a custom integration with our accounting system? Done. Want automated deal scoring based on our proprietary criteria? Done. SaaS tools require expensive consultants or hacky API integrations that break on upgrades.

**VP Sales Rebuttal:**

Six months is your *best case* estimate. But you're assuming everything goes perfectly: no scope creep, no technical challenges, no distractions from product team priorities. Even if you hit 8 months, that's 5 months longer than SaaS, costing us $15K+ in lost productivity.

On the 2-stage approval: I've seen it work in HubSpot via custom workflows and approval automations. It's not perfect, but it's workable. And honestly, maybe our process could use some optimization. SaaS CRMs embody industry best practices - this might be an opportunity to streamline.

On the "locked in" concern: Sure, switching CRMs is painful. But building custom and then replacing it later is *also* painful (arguably more so - you're throwing away $300K of investment). At least with SaaS, you can evaluate competitors and switch if needed. With custom, you're locked into your own technical debt.

Here's my core concern: *Adoption risk*. Sales reps have used Salesforce or HubSpot before. They know the interface, the mobile app, the workflows. Your custom CRM, no matter how well-built, will feel unfamiliar. Adoption could be slow or fail entirely. Then we've spent $300K on a tool no one uses.

**Engineering Lead Rebuttal:**

On timeline: I hear your skepticism given our track record. What if we mitigate that risk? We could do a phased approach: MVP in 3 months (just contact management and deal tracking), then add features incrementally. If the MVP works, we continue. If not, we bail after $75K and switch to SaaS. That's a middle ground.

On adoption: Fair point. But I'd argue familiarity cuts both ways. Sales reps are frustrated with *current* tools that don't fit our process. A custom tool that matches their workflow could actually drive *better* adoption than a foreign SaaS tool they need to work around. We'd design with sales team input from day one.

On process optimization: I'm not opposed to improving the process, but the 2-stage approval isn't arbitrary - it's a compliance requirement from our legal team. We can't just "streamline" it away. A SaaS tool that doesn't support it is a non-starter, full stop.

On the cost trajectory: You're right that switching from custom later is painful. But switching from SaaS is *also* painful, *plus* we've paid $250K in subscription fees *plus* we've built dependencies on their platform. At least with custom, we own the asset.

**Finance Cross-Examination:**

**Finance to Engineering**: "You say 6 months and $300K. Our historical data shows engineering projects run +40% over budget. Shouldn't we plan for 8-9 months and $400K as the realistic case?"

**Engineering**: "Fair. If we scope tightly and use my phased approach (MVP first), I'm confident in 6 months. But yes, let's budget $350K to be safe."

**Finance to VP Sales**: "You say SaaS is $50K/year. Have you confirmed that price includes 30 users, all features we need, and API access for integrations?"

**VP Sales**: "I've gotten quotes. HubSpot Professional is $45K/year for 30 users. Salesforce is $55K/year. Both include core features. API access is included, though there are rate limits."

**Finance to both**: "Here's the cost comparison I'm seeing:
- **SaaS (HubSpot)**: $20K implementation + $50K/year × 3 years = $170K (3-year TCO)
- **Build**: $350K (risk-adjusted) + $30K/year × 3 years = $440K (3-year TCO)

SaaS is $270K cheaper over 3 years. Engineering, what would need to be true for build to be cheaper?"

**Engineering**: "Two things: (1) At today's prices we'd need to run the custom CRM for 16+ years for TCO to break even. (2) If SaaS costs increase to $70K/year (SaaS companies often raise prices), break-even drops to roughly 8 years. I'm thinking long-term; you're thinking 3-year horizon."

**VP Sales to Engineering**: "What happens if you start the build and realize at month 4 that you're behind schedule or over budget? Do we have an exit ramp, or are we committed?"

**Engineering**: "That's why I proposed a phased approach with an MVP milestone. At 3 months, we review: Is the MVP working? Is it on budget? If yes, continue. If no, we pivot to SaaS. That gives us an exit ramp."

### Cruxes (What Would Change Minds)

**VP Sales would shift to Build if:**
- Engineering commits to 6-month delivery with phased milestones and exit ramps
- MVP is demonstrated at 3 months with core functionality working
- Sales reps are involved in design from day one (adoption risk mitigation)
- There's a backup plan (SaaS) if custom build fails

**Engineering Lead would shift to Buy if:**
- SaaS demo shows it can handle 2-stage approval workflow without painful workarounds
- Cost analysis includes long-term subscription growth (10-year TCO, not 3-year)
- We retain option to migrate to custom later if SaaS limitations become painful
- Data export and portability are guaranteed (no vendor lock-in on data)

**Finance would support Build if:**
- Timeline is credibly 6 months with phased milestones (not 12-18 months)
- Budget is capped at $350K with clear scope controls
- Adoption plan is strong (sales team involvement, training, change management)
- Long-term TCO analysis shows break-even within 5-7 years

**Finance would support Buy if:**
- 3-year TCO is significantly lower ($170K vs $440K)
- Predictable costs with no surprises
- Adoption risk is low (familiar interface for sales reps)
- Operational in 2-3 months to start capturing productivity gains

### Areas of Agreement

Despite disagreements, all three roles agree on:
- **Current state is unacceptable**: Spreadsheets are costing us deals and productivity
- **Timeline pressure**: Need solution operational before end of year
- **Adoption is critical**: Doesn't matter if we build or buy if sales team doesn't use it
- **2-stage approval is non-negotiable**: Compliance requirement can't be compromised
- **Budget constraint**: ~$150K available this year, need to stay within bounds

---

## 4. Synthesis

### Integration Approach

**Pattern used**: Phased + Conditional (with exit ramps)

**Synthesis statement:**

We'll pursue a **hybrid phased approach** that mitigates the risks of both options:

**Phase 1 (Months 1-3): Build MVP**
- Engineering builds minimal CRM (contacts, deals, pipeline, 2-stage approval)
- Budget: $100K (1.5 engineers × 3 months)
- Success criteria: MVP demonstrates core functionality, sales team validates it works for their workflow

**Phase 2 (Month 3): Decision Gate**
- If MVP succeeds: Continue to full build (Phase 3)
- If MVP fails or is behind schedule: Pivot to SaaS (Phase 4)
- Decision criteria: (1) Core features working, (2) On budget (<$100K spent), (3) Sales team positive on usability

**Phase 3 (Months 4-6): Complete Build** (if MVP succeeds)
- Add reporting, mobile app, integrations
- Budget: Additional $150K
- Total: $250K build cost

**Phase 4: SaaS Fallback** (if MVP fails)
- Implement HubSpot Professional
- Cost: $20K implementation + $50K/year
- Timeline: 2-3 months to operational

This approach addresses all three stakeholders' core concerns, as the sections below detail.

### What We're Prioritizing

**Primary focus**: Mitigate execution risk while preserving the custom-fit option

- From **VP Sales**: Speed to value (MVP in 3 months, exit ramp if build fails)
- From **Engineering**: Opportunity to build custom fit (MVP tests feasibility)
- From **Finance**: Capital-efficient (only invest $100K before decision gate, pivot if needed)

**Secondary considerations**: Address concerns from each perspective

- **VP Sales**'s concern about adoption: Sales team involved in MVP design, validates at 3 months
- **Engineering**'s concern about SaaS fit: MVP proves we can build 2-stage approval properly
- **Finance**'s concern about cost overruns: Phased budget ($100K → gate → $150K), cap at $250K total

### Tradeoffs Accepted

**We're accepting:**
- **3-month delay vs. starting SaaS immediately**
  - **Rationale**: Worth the delay to test if custom build is feasible. If MVP fails, we pivot to SaaS and only lost 3 months.

- **Risk of $100K sunk cost if MVP fails**
  - **Rationale**: $100K is our "learning cost" to validate whether custom build can work. Still cheaper than committing to $250K+ full build that might fail.

**We're NOT accepting:**
- **Blind commitment to custom build** (Engineering's original proposal)
  - **Reason**: Too risky given track record. MVP + gate reduces risk.

- **Immediate SaaS adoption** (Sales' original proposal)
  - **Reason**: Doesn't test whether custom fit is achievable. Worth 3 months to find out.

- **Waiting 12+ months for full custom solution** (original concern)
  - **Reason**: Phased approach with exit ramps means we pivot to SaaS if build isn't working by month 3.

### Addressing Each Role's Core Concerns

**VP Sales's main concern (speed and adoption) is addressed by:**
- MVP in 3 months (not 12-18 months)
- Exit ramp at month 3 if build is behind schedule
- Sales team involved in MVP design (adoption risk mitigation)
- Fallback to SaaS if build doesn't work out

**Engineering Lead's main concern (custom fit and control) is addressed by:**
- Opportunity to prove custom build is feasible via MVP
- If MVP succeeds, we complete the build (Phase 3)
- 2-stage approval built exactly as needed
- Only pivot to SaaS if custom build *demonstrably* fails (data-driven decision)

**Finance's main concern (cost and risk) is addressed by:**
- Capital-efficient: Only $100K at risk before decision gate
- Clear decision criteria at month 3 (on budget? on schedule? working?)
- Capped budget: $250K total if we complete build (vs. $350K uncapped original proposal)
- Predictable costs: SaaS fallback available if build overruns

---

## 5. Recommendation

**Recommended Action:**
Pursue a phased build with an MVP milestone at 3 months and a decision gate (continue the build vs. pivot to SaaS).

**Rationale:**

This synthesis is superior to either "pure build" or "pure buy" because it:

1. **Tests feasibility before full commitment**: The MVP validates whether we can build the custom CRM on time and on budget. If yes, we continue. If no, we pivot to SaaS with only $100K sunk cost.

2. **Mitigates execution risk (Sales' top concern)**: Exit ramp at month 3 means we're not locked into a long, uncertain build. If engineering can't deliver, we bail quickly and go SaaS.

3. **Preserves custom fit option (Engineering's top concern)**: If MVP succeeds, we get the perfectly tailored CRM with 2-stage approval. If SaaS doesn't fit our needs (as Engineering argues), we've built the right solution.

4. **Optimizes cost under uncertainty (Finance's top concern)**: We only invest $100K to learn whether custom build is viable. If it works, total cost is $250K (less than original $350K). If it doesn't, we pivot to SaaS having "paid" $100K for the knowledge that custom wasn't feasible.

The key insight from the debate: **The real uncertainty is execution capability, not which option is better in theory**. By building an MVP first, we resolve that uncertainty before committing the full budget.

**Key factors driving this decision:**
1. **Execution risk** (from Sales): Phased approach with exit ramps mitigates this
2. **Custom fit value** (from Engineering): MVP tests whether we can actually build it
3. **Cost efficiency** (from Finance): Capital-efficient with decision gate before full investment

---

## 6. Implementation

**Immediate next steps:**

1. **Kickoff MVP build** (Week 1) - Engineering Lead
   - Scope MVP: Contacts, Deals, Pipeline, 2-stage approval
   - Assign 1.5 senior engineers full-time
   - Set up weekly check-ins with Sales for feedback

2. **Define MVP success criteria** (Week 1) - Finance + CTO
   - Core features functional (create contact, create deal, advance through 2-stage approval)
   - Budget: <$100K spent by month 3
   - Usability: Sales reps can complete key workflows without confusion

3. **Involve Sales team in design** (Week 2) - VP Sales
   - Weekly design reviews with 3-5 sales reps
   - Prototype testing at weeks 4, 8, 12
   - Feedback incorporated into MVP

**Phased approach:**

- **Phase 1** (Months 1-3): Build and validate MVP
  - Milestone 1 (Month 1): Contacts and deal creation working
  - Milestone 2 (Month 2): 2-stage approval workflow functional
  - Milestone 3 (Month 3): Sales team testing, feedback incorporated

- **Phase 2** (Month 3): Decision Gate
  - Review meeting: CTO, VP Sales, CFO, Engineering Lead
  - Decision criteria:
    - ✅ MVP core features working?
    - ✅ Budget on track (<$100K)?
    - ✅ Sales team feedback positive (usability acceptable)?
  - If yes: Proceed to Phase 3 (complete build)
  - If no: Pivot to Phase 4 (SaaS implementation)

- **Phase 3** (Months 4-6): Complete Build (if Phase 2 = yes)
  - Add reporting dashboard
  - Build mobile app (iOS/Android)
  - Integrate with accounting system and email
  - User training and rollout to full sales team

- **Phase 4**: SaaS Fallback (if Phase 2 = no)
  - Month 4: Select vendor (HubSpot vs. Salesforce), contract negotiation
  - Months 4-5: Implementation and customization
  - Month 6: Training and rollout

**Conditional triggers:**

- **If MVP fails the Month 3 gate**: Pivot to SaaS immediately (do not continue build)
- **If build exceeds $250K total**: Stop, reassess whether to continue or pivot to SaaS
- **If adoption is <80% by Month 9**: Investigate issues, consider switching (build or buy)

**Success metrics:**

- **Engineering's perspective**: MVP functional by month 3, full build by month 6, <5 bugs/month
- **Sales' perspective**: Operational by month 6, >90% adoption within 30 days, sales cycle reduced to 45 days
- **Finance's perspective**: Total cost <$250K (if build) or <$20K + $50K/year (if SaaS), payback <18 months

**Monitoring plan:**

- **Weekly**: Engineering progress vs. milestones, budget burn rate
- **Monthly**: Sales team feedback on usability, adoption metrics
- **Month 3**: Formal decision gate review (continue build or pivot to SaaS)
- **Month 6**: Post-launch review (did we hit success metrics?)
- **Month 12**: ROI review (productivity gains vs. investment)

**Escalation conditions:**

- If budget exceeds $100K by month 3 → automatic pivot to SaaS
- If MVP core features not working by month 3 → escalate to CEO for decision
- If adoption <50% by month 9 → escalate to executive team for intervention

---

## 7. Stakeholder Communication

**For Executive Team (CEO, CFO, Board):**

**Key message**: "We're pursuing a phased build approach that tests feasibility before full commitment, with SaaS fallback if custom build doesn't work."

**Focus on**: Risk mitigation, cost efficiency, timeline
- Risk: MVP + decision gate reduces execution risk from high to medium
- Cost: Only $100K at risk before decision point, capped at $250K total
- Timeline: MVP operational in 3 months, decision by month 3, full solution by month 6

**For Engineering Team:**

**Key message**: "We're building an MVP to prove we can deliver a custom CRM on time and on budget. If successful, we'll complete the build. If not, we'll pivot to SaaS."

**Focus on**: Technical scope, timeline, success criteria
- Scope: MVP = contacts, deals, pipeline, 2-stage approval (no gold-plating)
- Timeline: 3 months to MVP, 6 months to full build (if MVP succeeds)
- Success criteria: Functional, on budget, sales team validates usability
- Commitment: If we prove we can do this, company will invest in completing the build

**For Sales Team:**

**Key message**: "We're building a custom CRM tailored to your workflow, with your input. If it doesn't work out, we'll get you a proven SaaS CRM instead."

**Focus on**: Involvement, timeline, fallback plan
- Involvement: You'll be involved from day one - design reviews, prototype testing, feedback
- Timeline: Testing MVP in 3 months, full CRM operational by month 6
- Custom fit: Built for your 2-stage approval workflow (not workarounds)
- Safety net: If custom build doesn't work, we have SaaS option (HubSpot) ready to go

---

## 8. Appendix: Assumptions & Uncertainties

**Key assumptions:**

1. **Engineering can deliver MVP in 3 months with 1.5 engineers**
   - **Confidence**: Medium (based on similar past projects, but track record includes delays)
   - **Impact if wrong**: Delays decision gate, may force SaaS pivot

2. **MVP will provide sufficient signal on full build feasibility**
   - **Confidence**: High (MVP includes core technical challenges: data model, 2-stage approval logic, UI)
   - **Impact if wrong**: May commit to full build that later fails

3. **Sales team will provide constructive feedback during MVP development**
   - **Confidence**: High (VP Sales committed to involving team)
   - **Impact if wrong**: Build wrong features, adoption fails

4. **SaaS option (HubSpot) will still be available at $50K/year in 3 months**
   - **Confidence**: High (enterprise contracts are stable, pricing doesn't fluctuate month-to-month)
   - **Impact if wrong**: Fallback plan costs more than projected

5. **Current manual CRM costs $40K/year in lost productivity**
   - **Confidence**: Medium (rough estimate based on 20 hours/week sales rep time)
   - **Impact if wrong**: ROI calculation changes, but decision logic still holds

**Unresolved uncertainties:**

- **Will sales reps actually use the custom CRM?**: Mitigated by involving them in design, but adoption risk remains
- **Can engineering complete the full build in months 4-6?**: MVP reduces uncertainty but doesn't eliminate it
- **Will SaaS really handle 2-stage approval well?**: Need a deeper demo/trial if we pivot to SaaS

**What would change our mind:**

- **If MVP demonstrates build is infeasible** (behind schedule, over budget, or sales team feedback is negative) → Pivot to SaaS immediately
- **If SaaS vendors introduce features that handle our 2-stage approval** → Reconsider SaaS earlier
- **If budget gets cut below $100K** → Go straight to SaaS (can't afford the MVP experiment)

---

## Self-Assessment (Rubric Scores)

**Perspective Authenticity**: 4/5
- All three roles authentically represented with strong advocacy
- Each role feels genuine ("hero of their own story")
- Could improve: More depth on Finance's methodology for cost analysis

**Depth of Roleplay**: 4/5
- Priorities, concerns, evidence, and vulnerabilities articulated for each role
- Success metrics defined
- Could improve: More specific evidence citations (e.g., which past projects, exact timeline data)

**Debate Quality**: 5/5
- Strong point-counterpoint engagement
- Roles respond directly to each other's arguments
- Cross-examination adds depth
- All perspectives clash on key dimensions

**Tension Surfacing**: 5/5
- Four key tensions explicitly identified and explored (timeline, fit, cost, risk)
- Irreducible tradeoffs clear
- False dichotomies avoided (phased approach finds middle ground)

**Crux Identification**: 4/5
- Clear cruxes for each role
- Conditions that would change minds specified
- Could improve: More specificity on evidence thresholds (e.g., exactly what MVP must demonstrate)

**Synthesis Coherence**: 5/5
- Phased + conditional pattern well-applied
- Recommendation is unified and logical
- Addresses the decision directly
- Better than any single perspective alone (integrates insights)

**Concern Integration**: 5/5
- All three roles' core concerns explicitly addressed
- Synthesis shows how each perspective is strengthened by integration
- No perspective dismissed

**Tradeoff Transparency**: 5/5
- Clear on what's accepted (3-month delay, $100K risk) and why
- Explicit on what's rejected (blind build commitment, immediate SaaS)
- Honest about second-order effects

**Actionability**: 5/5
- Detailed implementation plan with phases
- Owners and timelines specified
- Success metrics from each role's perspective
- Monitoring plan and escalation conditions
- Decision review cadence (Month 3 gate)

**Stakeholder Readiness**: 4/5
- Communication tailored for execs, engineering, and sales
- Key messages appropriate for each audience
- Could improve: Add one-page executive summary at the top

**Average Score**: 4.6/5 (Exceeds standard for medium-complexity decision)

**Why this exceeds standard:**
- Genuine multi-stakeholder debate with real tension
- Synthesis pattern (phased + conditional) elegantly resolves competing priorities
- Decision gate provides exit ramp that addresses execution risk
- All perspectives strengthened by integration (not just compromise)

---

**Analysis completed**: November 2, 2024
**Facilitator**: [Internal Strategy Team]
**Status**: Ready for executive approval
**Next milestone**: Kickoff MVP build (Week 1)

361
skills/chain-roleplay-debate-synthesis/resources/methodology.md
Normal file
@@ -0,0 +1,361 @@
# Advanced Roleplay → Debate → Synthesis Methodology

## Workflow

Copy this checklist for advanced techniques:

```
Advanced Facilitation Progress:
- [ ] Step 1: Map stakeholder landscape and power dynamics
- [ ] Step 2: Design multi-round debate structure
- [ ] Step 3: Facilitate with anti-pattern awareness
- [ ] Step 4: Synthesize under uncertainty and constraints
- [ ] Step 5: Adapt communication for different audiences
```

**Step 1: Map stakeholder landscape**
Identify all stakeholders, map influence and interest, understand power dynamics and coalitions, determine who must be represented in the debate. See [1. Stakeholder Mapping](#1-stakeholder-mapping) for power-interest matrix and role selection strategy.

**Step 2: Design multi-round structure**
Plan debate rounds (diverge, converge, iterate), allocate time appropriately, choose debate formats for each round, set decision criteria upfront. See [2. Multi-Round Debate Structure](#2-multi-round-debate-structure) for three-round framework and time management.

**Step 3: Facilitate with anti-pattern awareness**
Recognize when debates go wrong (premature consensus, dominance, false dichotomies), intervene with techniques to surface genuine tensions, ensure all perspectives get an authentic hearing. See [3. Facilitation Anti-Patterns](#3-facilitation-anti-patterns) for common failures and interventions.

**Step 4: Synthesize under uncertainty**
Handle incomplete information, conflicting evidence, and irreducible disagreement. Use conditional strategies and monitoring plans. See [4. Synthesis Under Uncertainty](#4-synthesis-under-uncertainty) for approaches when evidence is incomplete.

**Step 5: Adapt communication**
Tailor the synthesis narrative for technical, executive, and operational audiences. Emphasize different aspects for different stakeholders. See [5. Audience-Perspective Adaptation](#5-audience-perspective-adaptation) for stakeholder-specific messaging.

---

## 1. Stakeholder Mapping

### Power-Interest Matrix

**High Power, High Interest** → **Manage Closely**
- Must be represented in debate
- Concerns must be addressed
- Examples: Executive sponsor, Product owner, Key customer

**High Power, Low Interest** → **Keep Satisfied**
- Consult but may not need full representation
- Examples: CFO (if not budget owner), Adjacent VP, Legal

**Low Power, High Interest** → **Keep Informed**
- Valuable input, may aggregate into broader role
- Examples: End users, Support team, Implementation team

**Low Power, Low Interest** → **Monitor**
- Don't need direct representation

### Role Selection Strategy

**Must include:**
- Primary decision-maker or proxy
- Implementation owner
- Resource controller (budget, people, time)
- Risk owner

**Should include:**
- Key affected stakeholders (customer, user)
- Domain expert
- Devil's advocate

**Aggregation when >5 stakeholders:**
- Combine similar perspectives into archetype roles
- Rotate roles across debate rounds
- Focus on distinct viewpoints, not individuals

### Coalition Identification

**Common coalitions:**
- **Revenue**: Sales, Marketing, Growth → prioritize growth
- **Quality**: Engineering, Support, Brand → prioritize quality
- **Efficiency**: Finance, Operations → prioritize cost
- **Innovation**: R&D, Product, Strategy → prioritize new capabilities

**Why it matters**: Coalitions amplify perspectives. Synthesis must address coalition concerns, not just individual roles.

---

## 2. Multi-Round Debate Structure

### Three-Round Framework

**Round 1: Diverge (30-45 min)**
- **Goal**: Surface all perspectives
- **Format**: Sequential roleplay (no interruption)
- **Outcome**: Clear understanding of each position

**Round 2: Engage (45-60 min)**
- **Goal**: Surface tensions, challenge assumptions, identify cruxes
- **Format**: Point-counterpoint or constructive confrontation
- **Facilitation**: Direct traffic, push for specifics, surface cruxes, note agreements

**Round 3: Converge (30-45 min)**
- **Goal**: Build unified recommendation
- **Format**: Collaborative synthesis
- **Facilitation**: Propose patterns, test against roles, refine, check coherence

### Adaptive Structures

**Two-round** (simpler decisions): Roleplay+Debate → Synthesis

**Four-round** (complex decisions): Positions → Challenge → Refine → Synthesize

**Iterative**: Initial synthesis → Test → Refine → Repeat

---

## 3. Facilitation Anti-Patterns

### Premature Consensus
**Symptoms**: Roles agree quickly without genuine debate
**Fix**: Play devil's advocate, test with edge cases, give permission to disagree

### Dominant Voice
**Symptoms**: One role speaks 70%+ of the time, others defer
**Fix**: Explicit turn-taking, direct questions to quieter roles, affirm contributions

### Talking Past Each Other
**Symptoms**: Roles make points but don't engage
**Fix**: Make dimensions explicit, force direct engagement, summarize and redirect

### False Dichotomies
**Symptoms**: "Either X or we fail"
**Fix**: Challenge the dichotomy, explore the spectrum, introduce alternatives, reframe

### Appeal to Authority
**Symptoms**: "CEO wants X, so we do X"
**Fix**: Ask for underlying reasoning, question applicability, examine evidence

### Strawman Arguments
**Symptoms**: Weak versions of opposing views
**Fix**: Request a steelman, direct to the role for their own articulation, prompt for empathy

### Analysis Paralysis
**Symptoms**: "Need more data" endlessly
**Fix**: Set a decision deadline, clarify decision criteria, set a good-enough threshold

---

## 4. Synthesis Under Uncertainty

### When Evidence is Incomplete

**Conditional strategy with learning triggers:**
- "Start with X. Monitor [metric]. If [threshold] not met by [date], switch to Y."

**Reversible vs. irreversible:**
- Choose the reversible option first
- Example: "Buy SaaS (reversible). Only build custom if SaaS proves inadequate after 6 months."

**Small bets and experiments:**
- Run pilots before full commitment
- Example: "Test the feature with 10% of users. Roll out to 100% only if retention improves >5%."

**Information value calculation:**
- Is the value of additional information worth the delay? (a minimal worked sketch follows)
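
To make the information-value question concrete, here is a minimal expected-value-of-information sketch in Python. The probabilities, payoffs, and delay cost are hypothetical placeholders, not figures from any real decision; treat it as an illustration of the calculation, not a prescribed model.

```python
# Minimal expected-value-of-information sketch (all numbers are hypothetical).
# Two options under one uncertainty: does the risky option work out (p) or not?

p_success = 0.6                      # assumed probability the risky option succeeds
payoff = {"risky": (500, -200),      # (value if success, value if failure), in $K
          "safe": (150, 150)}        # safe option pays the same either way

def expected_value(option: str) -> float:
    win, lose = payoff[option]
    return p_success * win + (1 - p_success) * lose

# Acting now: pick the option with the best expected value.
ev_now = max(expected_value(o) for o in payoff)

# With perfect information we could pick the best option in each state.
ev_informed = (p_success * max(payoff[o][0] for o in payoff)
               + (1 - p_success) * max(payoff[o][1] for o in payoff))

evpi = ev_informed - ev_now          # expected value of perfect information
delay_cost = 40                      # assumed cost of waiting for that information, $K

print(f"EV now: {ev_now:.0f}, EV informed: {ev_informed:.0f}, EVPI: {evpi:.0f}")
print("Worth waiting" if evpi > delay_cost else "Decide now")
```

With these assumed numbers, EVPI ($140K) exceeds the delay cost ($40K), so waiting for information would be worth it; shrink the uncertainty or the payoff gap and the answer flips.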

### When Roles Fundamentally Disagree

**Disagree and commit:**
- Make the decision; everyone commits to making it work
- Document the disagreement for learning

**Escalate to decision-maker:**
- Present both perspectives clearly
- Let the higher authority break the tie

**Parallel paths** (if resources allow):
- Pursue both approaches simultaneously
- Let data decide which to scale

**Defer decision:**
- Explicitly choose to wait
- Set conditions for revisiting

### When Constraints Shift Mid-Debate

**Revisit assumptions:**
- Which roles' positions change given the new constraint?

**Re-prioritize:**
- Given the new constraint, what's binding now?

**Scope reduction:**
- What can we cut to stay within constraints?

**Challenge the constraint:**
- Is the new constraint real or negotiable?

---

## 5. Audience-Perspective Adaptation

### For Executives
**Focus**: Strategic impact, ROI, risk, competitive positioning
- Bottom-line recommendation (1 sentence)
- Strategic rationale (2-3 bullets)
- Financial impact (costs, benefits, ROI)
- Risk summary (top 2 risks + mitigations)
- Competitive implications

**Format**: 1-page executive summary

### For Technical Teams
**Focus**: Implementation feasibility, technical tradeoffs, timeline, resources
- Technical approach (how)
- Architecture decisions and rationale
- Resource requirements (people, time, tools)
- Technical risks and mitigation
- Success metrics (technical KPIs)

**Format**: 2-3 page technical brief

### For Operational Teams
**Focus**: Customer impact, ease of execution, support burden, messaging
- Customer value proposition
- Operational changes (what changes for them)
- Training and enablement needs
- Support implications
- Timeline and rollout plan

**Format**: Operational guide

---

## 6. Advanced Debate Formats

### Socratic Dialogue
**Purpose**: Deep exploration through questioning
**Method**: One role (Socrates) asks probing questions, the other responds
**Questions**: "What do you mean by [term]?", "Why is that important?", "What if the opposite were true?"

### Steelman Debate
**Purpose**: Understand deeply before challenging
**Method**: Role B steelmans Role A's argument (stronger than A did), then challenges
**Why it works**: Forces genuine understanding, surfaces real strengths

### Pre-Mortem Debate
**Purpose**: Surface risks and failure modes
**Method**: Assume decision X failed. Each role explains why from their perspective
**Repeat for each alternative**

### Fishbowl Debate
**Purpose**: Represent multiple layers (decision-makers + affected parties)
**Format**: Inner circle debates, outer circle observes, pause periodically for outer-circle input

### Delphi Method
**Purpose**: Aggregate expert opinions without groupthink
**Format**: Round 1 (anonymous positions) → Share → Round 2 (revise) → Repeat until convergence

---

## 7. Complex Synthesis Patterns

### Multi-Criteria Decision Analysis (MCDA)

**When**: Multiple competing criteria, can't integrate narratively

**Method**:
1. Identify criteria (from role perspectives): Cost, Speed, Quality, Risk, Customer Impact
2. Weight criteria (based on priorities): Sum to 100%
3. Score alternatives (1-5 scale per criterion)
4. Calculate weighted scores
5. Sensitivity analysis on weights (see the sketch below)
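
A minimal sketch of steps 2-5 in Python. The criteria, weights, and 1-5 scores below are hypothetical placeholders, and the sensitivity check is a crude one-weight-at-a-time perturbation, not a full analysis.

```python
# Minimal MCDA sketch: weighted scoring with a crude sensitivity check.
# Weights and 1-5 scores below are hypothetical, not from a real decision.

weights = {"cost": 0.30, "speed": 0.25, "quality": 0.25, "risk": 0.20}  # sums to 1.0

scores = {  # 1-5 per criterion, per alternative
    "buy_saas":     {"cost": 4, "speed": 5, "quality": 3, "risk": 4},
    "build_custom": {"cost": 2, "speed": 2, "quality": 5, "risk": 2},
}

def weighted_score(alt: str, w: dict) -> float:
    return sum(w[c] * scores[alt][c] for c in w)

ranked = sorted(scores, key=lambda a: weighted_score(a, weights), reverse=True)
print({a: round(weighted_score(a, weights), 2) for a in ranked})

# Sensitivity: does the winner change if any single weight shifts by +/-0.10?
for crit in weights:
    for delta in (-0.10, 0.10):
        w = dict(weights)
        w[crit] = max(0.0, w[crit] + delta)
        total = sum(w.values())
        w = {c: v / total for c, v in w.items()}  # re-normalize to 1.0
        winner = max(scores, key=lambda a: weighted_score(a, w))
        if winner != ranked[0]:
            print(f"Winner flips to {winner} when {crit} weight shifts {delta:+.2f}")
```

If the winner flips under small weight shifts, the decision is weight-sensitive and the weighting deserves explicit debate before the scores are trusted.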

### Pareto Frontier Analysis

**When**: Two competing objectives with a tradeoff curve

**Method**:
1. Plot alternatives on two dimensions (e.g., Cost vs Quality)
2. Identify Pareto frontier (non-dominated alternatives)
3. Choose based on priorities (see the sketch below)
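
A minimal sketch of step 2, assuming lower cost and higher quality are both preferred; the alternatives and their (cost, quality) pairs are hypothetical.

```python
# Minimal Pareto-frontier sketch: lower cost is better, higher quality is better.
# The (cost $K, quality 1-10) pairs are hypothetical placeholders.

alternatives = {
    "A": (100, 6),
    "B": (150, 9),
    "C": (120, 5),   # dominated by A (costs more, lower quality)
    "D": (90, 4),
}

def dominates(x, y):
    """x dominates y if x is no worse on both axes and better on at least one."""
    (cx, qx), (cy, qy) = x, y
    return cx <= cy and qx >= qy and (cx < cy or qx > qy)

frontier = [name for name, point in alternatives.items()
            if not any(dominates(other, point)
                       for o, other in alternatives.items() if o != name)]
print("Pareto frontier:", frontier)  # ['A', 'B', 'D']
```

Only the non-dominated alternatives remain in play; choosing among them is where the roles' priorities (step 3) come back in.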

### Real Options Analysis

**When**: Decision can be staged with learning opportunities

**Method**:
1. Identify decision points (Now: invest $X, Later: decide based on results)
2. Map scenarios and outcomes
3. Calculate option value (flexibility value - upfront commitment value; worked sketch below)
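
A minimal worked sketch of step 3, loosely echoing the phased CRM example earlier in this plugin. The success probability and payoffs are assumed for illustration, and the model is deliberately simplified (one stage, one uncertainty, no discounting).

```python
# Minimal real-options sketch: value of staging vs. committing upfront.
# Probability and payoffs ($K) are assumed illustrative figures.

p = 0.6                 # assumed probability the MVP phase succeeds
payoff_success = 800    # assumed value of a working custom build
mvp_cost, full_cost = 100, 250

# Commit upfront: pay the full cost, get the payoff only on success.
ev_commit = p * payoff_success - full_cost

# Staged: pay the MVP cost; pay the remainder only if the MVP succeeds.
ev_staged = p * (payoff_success - (full_cost - mvp_cost)) + (1 - p) * (-mvp_cost)

option_value = ev_staged - ev_commit   # value of the flexibility to abandon
print(f"Commit: {ev_commit:.0f}  Staged: {ev_staged:.0f}  Option value: {option_value:.0f}")
```

Here the option value is positive because staging lets you avoid the remaining spend in the failure branch; the worse the failure case, the more the exit ramp is worth.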

---

## 8. Facilitation Best Practices

### Reading the Room

**Verbal cues:**
- Hesitation: "Well, I guess..." (not convinced)
- Qualifiers: "Maybe", "Possibly" (hedging)
- Repetition: Saying the same point multiple times (not feeling heard)

**Facilitation responses:**
- Check in: "I sense hesitation. Can you say more?"
- Affirm: "I hear X is important. Let's address that."
- Give space: "Let's pause and hear from [quieter person]."

### Managing Conflict

**Productive** (encourage):
- Disagreement on ideas (not people)
- Specificity, evidence-based arguments, openness to changing one's mind

**Unproductive** (intervene):
- Personal attacks, generalizations, dismissiveness, stonewalling

**Interventions**: Reframe (focus on the idea), ground in evidence, seek understanding, take a break

### Building Toward Synthesis

**Incremental agreement**: Note areas of agreement as they emerge

**Trial balloons**: Float potential synthesis ideas early, gauge reactions

**Role-checking**: Test the synthesis against each role iteratively

### Closing the Debate

**Signals**: Positions clear, tensions explored, cruxes identified, repetition, time pressure

**Transition**: "We've heard all perspectives. Now let's build a unified recommendation."

**Final check**: "Can everyone live with this?" "What would make this 10% better for each of you?"

---

## 9. Case Studies

For detailed worked examples showing stakeholder mapping, multi-round debates, and complex synthesis:

- [Monolith vs Microservices](examples/methodology/case-study-monolith-microservices.md) - Engineering team debate
- [Market Entry Decision](examples/methodology/case-study-market-entry.md) - Executive team with 5 stakeholders
- [Pricing Model Debate](examples/methodology/case-study-pricing-model.md) - Customer segmentation synthesis

---

## Summary

**Key principles:**

1. **Map the landscape**: Understand stakeholders, power dynamics, coalitions before designing the debate

2. **Structure for depth**: Multiple rounds allow positions to evolve as understanding deepens

3. **Recognize anti-patterns**: Premature consensus, dominant voice, talking past each other, false dichotomies, appeal to authority, strawmen, analysis paralysis

4. **Synthesize under uncertainty**: Conditional strategies, reversible decisions, small bets, monitoring plans

5. **Adapt communication**: Tailor for executives (strategic), technical teams (implementation), operational teams (execution)

6. **Master advanced formats**: Socratic dialogue, steelman, pre-mortem, fishbowl, Delphi for different contexts

7. **Facilitate skillfully**: Read the room, manage conflict productively, build incremental agreement, know when to close

**The best synthesis** integrates insights from all perspectives, addresses real concerns, makes tradeoffs explicit, and results in a decision better than any single viewpoint alone.

306
skills/chain-roleplay-debate-synthesis/resources/template.md
Normal file
@@ -0,0 +1,306 @@
# Roleplay → Debate → Synthesis Template

## Workflow

Copy this checklist and track your progress:

```
Roleplay → Debate → Synthesis Progress:
- [ ] Step 1: Frame decision and select 2-5 roles
- [ ] Step 2: Roleplay each perspective authentically
- [ ] Step 3: Facilitate structured debate
- [ ] Step 4: Synthesize unified recommendation
- [ ] Step 5: Self-assess with rubric
```

**Step 1: Frame decision and select roles**
Define the decision question clearly, identify 2-5 stakeholder perspectives with competing interests, and determine what successful synthesis looks like. Use the [Quick Template](#quick-template) structure below.

**Step 2: Roleplay each perspective**
For each role, articulate their position, priorities, concerns, evidence, and vulnerabilities without strawmanning. See [Section 2](#2-roles--perspectives) of the template structure.

**Step 3: Facilitate structured debate**
Use a debate format (point-counterpoint, devil's advocate, crux-finding) to surface tensions and challenge assumptions. See [Section 3](#3-debate) of the template structure.

**Step 4: Synthesize unified recommendation**
Integrate insights using synthesis patterns (weighted, sequencing, conditional, hybrid, reframing, or constraint elevation). See [Section 4](#4-synthesis) of the template structure and [Synthesis Patterns](#synthesis-patterns).

**Step 5: Self-assess with rubric**
Validate using the rubric: perspective authenticity, debate quality, synthesis coherence, and actionability. Use the [Quality Checklist](#quality-checklist) before finalizing.

---

## Quick Template

Copy this structure to create your analysis:

```markdown
# Decision: [Question]

**Date**: [Today's date]
**Decision-maker**: [Who decides]
**Stakes**: [High/Medium/Low - impact and reversibility]

---

## 1. Decision Context

**What we're deciding:**
[Clear statement of the choice - "Should we X?" or "What's the right balance between X and Y?"]

**Why this matters:**
[Business impact, urgency, strategic importance]

**Success criteria for synthesis:**
[What makes this synthesis successful]

**Constraints:**
- [Budget, timeline, requirements, non-negotiables]

**Audience:** [Who needs to approve or act on this decision]

---

## 2. Roles & Perspectives

### Role 1: [Name - e.g., "Engineering Lead" or "Growth Advocate"]

**Position**: [What they believe should be done]

**Priorities**: [What values or goals drive this position]
- [Priority 1]
- [Priority 2]
- [Priority 3]

**Concerns about alternatives**: [What risks or downsides they see]
- [Concern 1]
- [Concern 2]

**Evidence**: [What data, experience, or reasoning supports this view]
- [Evidence 1]
- [Evidence 2]

**Vulnerabilities**: [What uncertainties or limitations they acknowledge]
- [What they're unsure about]
- [What could prove them wrong]

**Success metrics**: [How this role measures success]

---

### Role 2: [Name]

[Same structure as Role 1]

---

### Role 3: [Name] (if applicable)

[Same structure as Role 1]

---

## 3. Debate

### Key Points of Disagreement

**Dimension 1: [e.g., "Timeline - Fast vs. Thorough"]**
- **[Role A]**: [Their position on this dimension]
- **[Role B]**: [Their position on this dimension]
- **Tension**: [Where the conflict lies]

**Dimension 2: [e.g., "Risk Tolerance"]**
- **[Role A]**: [Position]
- **[Role B]**: [Position]
- **Tension**: [Conflict]

### Debate Transcript (Point-Counterpoint)

**[Role A] Opening Case:**
[Their argument for their position - 2-3 paragraphs]

**[Role B] Response:**
[Objections and counterarguments - 2-3 paragraphs]

**[Role A] Rebuttal:**
[Addresses objections - 1-2 paragraphs]

**Cross-examination:**
- **[Role A] to [Role B]**: [Probing question]
- **[Role B]**: [Response]

### Cruxes (What Would Change Minds)

**[Role A] would shift if:**
- [Condition or evidence that would change their position]

**[Role B] would shift if:**
- [Condition or evidence]

### Areas of Agreement

Despite disagreements, roles agree on:
- [Common ground 1]
- [Common ground 2]

---

## 4. Synthesis

### Integration Approach

**Pattern used**: [Weighted Synthesis / Sequencing / Conditional / Hybrid / Reframing / Constraint Elevation]

**Synthesis statement:**
[1-2 paragraphs explaining the unified recommendation that integrates insights from all perspectives]

### What We're Prioritizing

**Primary focus**: [What's being prioritized and why]
- From [Role X]: [What we're adopting from this perspective]
- From [Role Y]: [What we're adopting]

**Secondary considerations**: [How we're addressing other concerns]
- [Role X]'s concern about [issue]: Mitigated by [approach]

### Tradeoffs Accepted

**We're accepting:**
- [Tradeoff 1]
  - **Rationale**: [Why this makes sense]

**We're NOT accepting:**
- [What we explicitly decided against]
  - **Reason**: [Why rejected]

---

## 5. Recommendation

**Recommended Action:**
[Clear, specific recommendation in 1-2 sentences]

**Rationale:**
[2-3 paragraphs explaining why this synthesis is the best path forward]

**Key factors driving this decision:**
1. [Factor 1 - from which role's perspective]
2. [Factor 2]
3. [Factor 3]

---

## 6. Implementation

**Immediate next steps:**
1. [Action 1] - [Owner] by [Date]
2. [Action 2] - [Owner] by [Date]
3. [Action 3] - [Owner] by [Date]

**Phased approach:** (if using sequencing)
- **Phase 1** ([Timeline]): [What happens first]
- **Phase 2** ([Timeline]): [What happens next]

**Conditional triggers:** (if using conditional strategy)
- **If [condition A]**: [Then do X]
- **If [condition B]**: [Then do Y]

**Success metrics:**
- [Metric 1 - from Role X's perspective]: Target [value] by [date]
- [Metric 2 - from Role Y's perspective]: Target [value] by [date]

**Monitoring plan:**
- **Weekly**: [What we track frequently]
- **Monthly**: [What we review periodically]

---

## 7. Stakeholder Communication

**For [Stakeholder Group A - e.g., Executive Team]:**
- Key message: [1-sentence summary]
- Focus on: [What matters most to them]

**For [Stakeholder Group B - e.g., Engineering Team]:**
- Key message: [1-sentence summary]
- Focus on: [What matters most to them]

---

## 8. Appendix: Assumptions & Uncertainties

**Key assumptions:**
1. [Assumption 1]
   - **Confidence**: High / Medium / Low
   - **Impact if wrong**: [What happens]

**Unresolved uncertainties:**
- [Uncertainty 1]: [How we'll handle this]

**What would change our mind:**
- [Condition or evidence that would trigger reconsideration]
```

---

## Synthesis Patterns

### 1. Weighted Synthesis
"Prioritize X, while incorporating safeguards for Y"
- Example: "Ship fast (PM), with feature flags and monitoring (Engineer)"

### 2. Sequencing
"First X, then Y"
- Example: "Launch MVP (Growth), then invest in quality (Engineering) if PMF proven"

### 3. Conditional Strategy
"If A, do X; if B, do Y"
- Example: "If >10K users in Q1, scale; otherwise pivot"

### 4. Hybrid Approach
"Combine elements of multiple perspectives"
- Example: "Build core in-house (control) but buy peripherals (speed)"

### 5. Reframing
"Debate reveals real question is Z, not X vs Y"
- Example: "Pricing debate reveals we need to segment customers first"

### 6. Constraint Elevation
"Identify binding constraint all perspectives agree on"
- Example: "Both agree eng capacity is bottleneck; hire first"

---

## Quality Checklist

Before finalizing, verify:

**Roleplay quality:**
- [ ] Each role has clear position, priorities, concerns, evidence
- [ ] Perspectives feel authentic (not strawmen)
- [ ] Vulnerabilities acknowledged
- [ ] Success metrics defined for each role

**Debate quality:**
- [ ] Key disagreements surfaced on 3-5 dimensions
- [ ] Perspectives directly engage (not talking past each other)
- [ ] Cruxes identified (what would change minds)
- [ ] Areas of agreement noted

**Synthesis quality:**
- [ ] Clear integration approach (weighted/sequencing/conditional/hybrid/reframe/constraint)
- [ ] All roles' concerns addressed (not dismissed)
- [ ] Tradeoffs explicit (what we're accepting and why)
- [ ] Recommendation is unified and coherent
- [ ] Actionable next steps with owners and dates

**Communication quality:**
- [ ] Tailored for different stakeholders
- [ ] Key messages clear (1-sentence summaries)
- [ ] Emphasis appropriate for audience

**Integrity:**
- [ ] Assumptions stated explicitly
- [ ] Uncertainties acknowledged
- [ ] "What would change our mind" conditions specified
- [ ] No perspective dismissed without engagement

155
skills/chain-spec-risk-metrics/SKILL.md
Normal file
@@ -0,0 +1,155 @@
---
name: chain-spec-risk-metrics
description: Use when planning high-stakes initiatives (migrations, launches, strategic changes) that require clear specifications, proactive risk identification (premortem/register), and measurable success criteria. Invoke when user mentions "plan this migration", "launch strategy", "implementation roadmap", "what could go wrong", "how do we measure success", or when high-impact decisions need comprehensive planning with risk mitigation and instrumentation.
---
# Chain Spec Risk Metrics

## Table of Contents
- [Purpose](#purpose)
- [When to Use This Skill](#when-to-use-this-skill)
- [When NOT to Use This Skill](#when-not-to-use-this-skill)
- [What Is It?](#what-is-it)
- [Workflow](#workflow)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

This skill helps you create comprehensive plans for high-stakes initiatives by chaining together three critical components: clear specifications, proactive risk analysis, and measurable success metrics. It ensures initiatives are well-defined, risks are anticipated and mitigated, and progress can be objectively tracked.

## When to Use This Skill

Use this skill when you need to:

- **Plan complex implementations** - Migrations, infrastructure changes, system redesigns requiring detailed specs
- **Launch new initiatives** - Products, features, programs that need risk assessment and success measurement
- **Make high-stakes decisions** - Strategic choices where failure modes must be identified and monitored
- **Coordinate cross-functional work** - Initiatives requiring clear specifications for alignment and risk transparency
- **Respond to comprehensive planning requests** - User asks "plan this migration", "create implementation roadmap", "what could go wrong?"
- **Establish accountability** - Need clear success criteria and risk owners for governance

**Trigger phrases:**
- "Plan this [migration/launch/implementation]"
- "Create a roadmap for..."
- "What could go wrong with..."
- "How do we measure success for..."
- "Write a spec that includes risks and metrics"
- "Comprehensive plan with risk mitigation"

## When NOT to Use This Skill

Skip this skill when:

- **Quick decisions** - Low-stakes choices don't need full premortem treatment
- **Specifications only** - If the user just needs a spec without risk/metrics analysis (use one-pager-prd or adr-architecture instead)
- **Risk analysis only** - If focused solely on identifying risks (use project-risk-register or postmortem instead)
- **Metrics only** - If just defining KPIs (use metrics-tree instead)
- **Already decided and executing** - Use postmortem or reviews-retros-reflection for retrospectives
- **Brainstorming alternatives** - Use brainstorm-diverge-converge to generate options first

## What Is It?

Chain Spec Risk Metrics is a meta-skill that combines three complementary techniques into a comprehensive planning artifact:

1. **Specification** - Define what you're building/changing with clarity (scope, requirements, approach, timeline)
2. **Risk Analysis** - Identify what could go wrong through a premortem ("imagine we failed - why?") and create a risk register with mitigations
3. **Success Metrics** - Define measurable outcomes to track progress and validate success

**Quick example:**
> **Initiative:** Migrate monolith to microservices
>
> **Spec:** Decompose into 5 services (auth, user, order, inventory, payment), API gateway, shared data patterns
>
> **Risks:**
> - Data consistency issues between services (High) → Implement saga pattern with compensation
> - Performance degradation from network hops (Medium) → Load test with production traffic patterns
>
> **Metrics:**
> - Deployment frequency (target: 10+ per week, baseline: 2 per week)
> - API p99 latency (target: < 200ms, baseline: 150ms)
> - Mean time to recovery (target: < 30min, baseline: 2 hours)

## Workflow

Copy this checklist and track your progress:

```
Chain Spec Risk Metrics Progress:
- [ ] Step 1: Gather initiative context
- [ ] Step 2: Write comprehensive specification
- [ ] Step 3: Conduct premortem and build risk register
- [ ] Step 4: Define success metrics and instrumentation
- [ ] Step 5: Validate completeness and deliver
```

**Step 1: Gather initiative context**

Ask the user for the initiative goal, constraints (time/budget/resources), stakeholders, current state (baseline), and desired outcomes. Clarify whether this is a greenfield build, migration, enhancement, or strategic change. See [resources/template.md](resources/template.md) for full context questions.

**Step 2: Write comprehensive specification**

Create a detailed specification covering scope (what's in/out), approach (architecture/methodology), requirements (functional/non-functional), dependencies, timeline, and success criteria. For standard initiatives use [resources/template.md](resources/template.md); for complex multi-phase programs see [resources/methodology.md](resources/methodology.md) for decomposition techniques.

**Step 3: Conduct premortem and build risk register**

Run a premortem exercise: "Imagine 12 months from now this initiative failed spectacularly. What went wrong?" Identify risks across technical, operational, organizational, and external dimensions. For each risk, document likelihood, impact, mitigation strategy, and owner. See [Premortem Technique](#premortem-technique) and [Risk Register Structure](#risk-register-structure) sections, or [resources/methodology.md](resources/methodology.md) for advanced risk assessment methods.
|
||||
|
||||
**Step 4: Define success metrics and instrumentation**
|
||||
|
||||
Identify leading indicators (early signals), lagging indicators (outcome measures), and counter-metrics (what you're NOT willing to sacrifice). Specify current baseline, target values, measurement method, and tracking cadence for each metric. See [Metrics Framework](#metrics-framework) and use [resources/template.md](resources/template.md) for standard structure.
|
||||
|
||||
**Step 5: Validate completeness and deliver**
|
||||
|
||||
Self-check the complete artifact using [resources/evaluators/rubric_chain_spec_risk_metrics.json](resources/evaluators/rubric_chain_spec_risk_metrics.json). Ensure specification is clear and actionable, risks are comprehensive with mitigations, metrics measure actual success, and all three components reinforce each other. Minimum standard: Average score ≥ 3.5 across all criteria.
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### Premortem Technique
|
||||
|
||||
1. **Set the scene**: "It's [6/12/24] months from now. This initiative failed catastrophically."
|
||||
2. **Brainstorm failure causes**: Each stakeholder writes 3-5 reasons why it failed (independently first)
|
||||
3. **Cluster and prioritize**: Group similar failures, vote on likelihood and impact
|
||||
4. **Convert to risk register**: Each failure mode becomes a risk with mitigation plan
|
||||
|
||||
### Risk Register Structure
|
||||
|
||||
For each identified risk, document:
|
||||
- **Risk description**: Specific failure mode (not vague "project delay")
|
||||
- **Category**: Technical, operational, organizational, external
|
||||
- **Likelihood**: Low/Medium/High (or probability %)
|
||||
- **Impact**: Low/Medium/High (or cost estimate)
|
||||
- **Mitigation strategy**: What you'll do to reduce likelihood or impact
|
||||
- **Owner**: Who monitors and responds to this risk
|
||||
- **Status**: Open, Mitigated, Accepted, Closed
|
||||
|
||||
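The likelihood × impact fields above are what make the register sortable. A minimal sketch of scoring and prioritizing a register (the example risks and the 1-3 level weights are illustrative assumptions, not part of the skill):

```
# Score risks by likelihood x impact and surface the top items first.
# The Low/Medium/High -> 1/2/3 mapping is one common convention (assumed here).
LEVEL = {"Low": 1, "Medium": 2, "High": 3}

risks = [
    {"description": "Data inconsistency during dual-write", "likelihood": "High", "impact": "High"},
    {"description": "Infrastructure cost overrun", "likelihood": "Medium", "impact": "Medium"},
    {"description": "Key person leaves mid-project", "likelihood": "Low", "impact": "High"},
]

for risk in risks:
    risk["score"] = LEVEL[risk["likelihood"]] * LEVEL[risk["impact"]]

# Highest scores get detailed mitigations and named owners first.
for risk in sorted(risks, key=lambda r: r["score"], reverse=True):
    print(f"[{risk['score']}] {risk['description']}")
```
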
### Metrics Framework

**Leading indicators** (predict future success):
- Deployment frequency, code review velocity, incident detection time

**Lagging indicators** (measure outcomes):
- Uptime, user adoption, revenue impact, customer satisfaction

**Counter-metrics** (what you're NOT willing to sacrifice):
- Code quality, team morale, security posture, user privacy

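A metric only counts once every field needed to track it is filled in. One hedged sketch of that discipline (the field names are illustrative, not prescribed by this skill):

```
from dataclasses import dataclass, fields

@dataclass
class Metric:
    name: str      # e.g. "Deployment frequency"
    kind: str      # "leading", "lagging", or "counter"
    baseline: str  # current value, e.g. "2/week"
    target: str    # desired value, e.g. "10+/week"
    method: str    # how it is measured, e.g. "CI/CD git tags"
    owner: str     # who tracks and acts on it

    def is_measurable(self) -> bool:
        # Any empty field means the metric is not yet actionable.
        return all(getattr(self, f.name) for f in fields(self))

m = Metric("Deployment frequency", "leading", "2/week", "10+/week", "CI/CD git tags", "Tech Lead")
assert m.is_measurable()
```
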
## Guardrails

- **Don't skip any component** - Spec without risks = blind spots; risks without metrics = unvalidated mitigations
- **Be specific in specifications** - "Improve performance" is not a spec; "Reduce p99 API latency from 500ms to 200ms" is
- **Quantify risks** - Use likelihood × impact scores to prioritize; don't treat all risks equally
- **Make metrics measurable** - "Better UX" is not measurable; "Increase checkout completion from 67% to 75%" is
- **Assign owners** - Every risk and metric needs a clear owner who monitors and acts
- **State assumptions explicitly** - Document what you're assuming about resources, timelines, dependencies
- **Include counter-metrics** - Always define what success does NOT mean sacrificing
- **Update as you learn** - This is a living document; revisit after milestones to update risks/metrics

## Quick Reference

| Component | When to Use | Resource |
|-----------|-------------|----------|
| **Template** | Standard initiatives with known patterns | [resources/template.md](resources/template.md) |
| **Methodology** | Complex multi-phase programs, novel risks | [resources/methodology.md](resources/methodology.md) |
| **Examples** | See what good looks like | [resources/examples/](resources/examples/) |
| **Rubric** | Validate before delivering | [resources/evaluators/rubric_chain_spec_risk_metrics.json](resources/evaluators/rubric_chain_spec_risk_metrics.json) |
279
skills/chain-spec-risk-metrics/resources/evaluators/rubric_chain_spec_risk_metrics.json
Normal file
@@ -0,0 +1,279 @@
{
  "criteria": [
    {
      "name": "Specification Clarity",
      "description": "Is the initiative goal, scope, approach, and timeline clearly defined and actionable?",
      "scoring": {
        "1": "Vague goal ('improve system') with no clear scope or timeline. Stakeholders can't act on this.",
        "2": "General goal stated but scope unclear (what's in/out?). Timeline missing or unrealistic.",
        "3": "Goal, scope, timeline stated but lacks detail. Approach mentioned but not explained. Acceptable for low-stakes.",
        "4": "Clear goal, explicit scope (in/out), realistic timeline with milestones. Approach well-explained. Good for medium-stakes.",
        "5": "Crystal clear goal tied to business outcome. Precise scope with rationale. Detailed approach with diagrams. Timeline has buffer and dependencies. Exemplary for high-stakes."
      },
      "red_flags": [
        "Spec says 'improve performance' without quantifying what that means",
        "No 'out of scope' section (scope creep likely)",
        "Timeline has no buffer or dependencies identified",
        "Approach section just lists technology choices without explaining why"
      ]
    },
    {
      "name": "Specification Completeness",
      "description": "Are all necessary components covered (current state, requirements, dependencies, assumptions)?",
      "scoring": {
        "1": "Major sections missing. No baseline, no requirements, or no dependencies documented.",
        "2": "Some sections present but incomplete. Requirements exist but vague ('system should be fast').",
        "3": "All major sections present. Requirements specific but could be more detailed. Acceptable for low-stakes.",
        "4": "Comprehensive: baseline with data, specific requirements, dependencies and assumptions explicit. Good for medium-stakes.",
        "5": "Exhaustive: current state with metrics, functional + non-functional requirements with acceptance criteria, all dependencies mapped, assumptions validated. Exemplary for high-stakes."
      },
      "red_flags": [
        "No current state baseline (can't measure improvement)",
        "Requirements mix functional and non-functional without clear categories",
        "No assumptions stated (hidden risks)",
        "Dependencies mentioned but not explicitly called out in own section"
      ]
    },
    {
      "name": "Risk Analysis Comprehensiveness",
      "description": "Are risks identified across all dimensions (technical, operational, organizational, external)?",
      "scoring": {
        "1": "No risks identified, or only 1-2 obvious risks listed. Major blind spots.",
        "2": "3-5 risks identified but all in one category (e.g., only technical). Missing organizational, external risks.",
        "3": "5-10 risks covering technical and operational. Some organizational risks. Acceptable for low-stakes.",
        "4": "10-15 risks across all four categories. Premortem conducted. Covers non-obvious risks. Good for medium-stakes.",
        "5": "15+ risks identified through structured premortem. All categories covered with specific failure modes. Includes low-probability/high-impact risks. Exemplary for high-stakes."
      },
      "red_flags": [
        "Risk register is just a list of vague concerns ('project might be delayed')",
        "All risks are technical (missing organizational, external)",
        "No premortem conducted (risks are just obvious failure modes)",
        "Low-probability/high-impact risks ignored (e.g., key person leaves)"
      ]
    },
    {
      "name": "Risk Quantification",
      "description": "Are risks scored by likelihood and impact, with clear prioritization?",
      "scoring": {
        "1": "No risk scoring. Can't tell which risks are most important.",
        "2": "Risks listed but no likelihood/impact assessment. Unclear which to prioritize.",
        "3": "Likelihood and impact assessed (Low/Med/High) for each risk. Priority clear. Acceptable for low-stakes.",
        "4": "Likelihood (%) and impact (cost/time) quantified. Risk scores calculated. Top risks prioritized. Good for medium-stakes.",
        "5": "Quantitative risk analysis: probability distributions, expected loss, mitigation cost-benefit. Risks ranked by expected value. Exemplary for high-stakes."
      },
      "red_flags": [
        "All risks marked 'High' (no actual prioritization)",
        "Likelihood/impact inconsistent (50% likelihood but marked 'Low'?)",
        "No risk score or priority ranking (can't tell what to focus on)",
        "Mitigation cost not compared to expected loss (over-mitigating low-risk items)"
      ]
    },
    {
      "name": "Risk Mitigation Depth",
      "description": "Does each high-priority risk have specific, actionable mitigation strategies with owners?",
      "scoring": {
        "1": "No mitigation strategies. Just a list of risks.",
        "2": "Generic mitigations ('monitor closely', 'be careful'). Not actionable.",
        "3": "Specific mitigations for top 5 risks. Owners assigned. Acceptable for low-stakes.",
        "4": "Detailed mitigations for all high-risk items (score 6-9). Clear actions, owners, status tracking. Good for medium-stakes.",
        "5": "Comprehensive mitigations with preventive + detective + corrective controls. Cost-benefit analysis. Rollback plans. Continuous monitoring. Exemplary for high-stakes."
      },
      "red_flags": [
        "Mitigation is vague ('increase testing') without specifics",
        "No owners assigned to risks (accountability missing)",
        "High-risk items have no mitigation plan",
        "Mitigations are all preventive (no detective/corrective controls if prevention fails)"
      ]
    },
    {
      "name": "Metrics Measurability",
      "description": "Are metrics specific, measurable, with clear baselines, targets, and measurement methods?",
      "scoring": {
        "1": "No metrics, or metrics are unmeasurable ('better UX', 'improved reliability').",
        "2": "Metrics stated but no baseline or target ('track uptime'). Can't assess success.",
        "3": "3-5 metrics with baselines and targets. Measurement method implied but not explicit. Acceptable for low-stakes.",
        "4": "5-10 metrics with baselines, targets, measurement methods, tracking cadence, owners. Good for medium-stakes.",
        "5": "Comprehensive metrics framework (leading/lagging/counter). All metrics SMART (specific, measurable, achievable, relevant, time-bound). Instrumentation plan. Exemplary for high-stakes."
      },
      "red_flags": [
        "Metric is subjective ('improved user experience') without quantification",
        "No baseline (can't measure improvement)",
        "Target is vague ('reduce latency') without number",
        "Measurement method missing (how will you actually track this?)"
      ]
    },
    {
      "name": "Leading/Lagging Balance",
      "description": "Are there both leading indicators (early signals) and lagging indicators (outcomes)?",
      "scoring": {
        "1": "Only lagging indicators (outcomes). No early warning signals.",
        "2": "Mostly lagging. One or two leading indicators but not well-chosen.",
        "3": "2-3 leading indicators (predict outcomes) and 3-5 lagging (measure outcomes). Acceptable for low-stakes.",
        "4": "Balanced: 3-5 leading indicators that predict lagging outcomes. Tracking cadence matches (leading daily/weekly, lagging monthly). Good for medium-stakes.",
        "5": "Sophisticated framework: leading indicators validated to predict lagging. Includes counter-metrics to prevent gaming. Dashboard with real-time leading, periodic lagging. Exemplary for high-stakes."
      },
      "red_flags": [
        "All metrics are outcomes (no early signals of trouble)",
        "Leading indicators don't actually predict lagging (no validated correlation)",
        "No counter-metrics (risk of gaming the system)",
        "Tracking cadence wrong (measuring strategic metrics daily creates noise)"
      ]
    },
    {
      "name": "Integration: Spec↔Risk↔Metrics",
      "description": "Do the three components reinforce each other (specs enable metrics, risks map to specs, metrics validate mitigations)?",
      "scoring": {
        "1": "Components are disconnected. Metrics don't relate to spec goals. Risks don't map to spec decisions.",
        "2": "Weak connections. Some overlap but mostly independent documents.",
        "3": "Moderate integration. Risks reference spec sections. Some metrics measure risk mitigations. Acceptable for low-stakes.",
        "4": "Strong integration. Major spec decisions have corresponding risks. High-risk items have metrics to detect issues. Metrics align with spec goals. Good for medium-stakes.",
        "5": "Seamless integration. Specs include instrumentation for metrics. Risks mapped to specific spec choices with rationale. Metrics validate both spec assumptions and risk mitigations. Traceability matrix. Exemplary for high-stakes."
      },
      "red_flags": [
        "Metrics don't align with spec goals (tracking unrelated things)",
        "Spec makes technology choice but risks don't assess that choice",
        "High-risk items have no corresponding metrics to detect if risk is materializing",
        "Spec doesn't include instrumentation needed to collect metrics"
      ]
    },
    {
      "name": "Actionability",
      "description": "Can stakeholders act on this artifact? Are owners, timelines, and next steps clear?",
      "scoring": {
        "1": "No clear next steps. No owners assigned. Stakeholders can't act on this.",
        "2": "Some next steps but vague ('start planning'). Owners missing or unclear.",
        "3": "Next steps clear for immediate phase. Owners assigned to risks and metrics. Acceptable for low-stakes.",
        "4": "Clear action plan with milestones, owners, dependencies. Stakeholders know what to do and when. Good for medium-stakes.",
        "5": "Detailed execution plan with phase gates, decision points, escalation paths. RACI matrix for key activities. Stakeholders empowered to execute autonomously. Exemplary for high-stakes."
      },
      "red_flags": [
        "No owners assigned to risks or metrics (accountability vacuum)",
        "Timeline exists but no clear milestones or dependencies",
        "Next steps are vague ('continue planning', 'monitor situation')",
        "Unclear decision authority (who approves phase transitions?)"
      ]
    },
    {
      "name": "Realism and Feasibility",
      "description": "Is the plan realistic given constraints (time, budget, team)? Are assumptions validated?",
      "scoring": {
        "1": "Unrealistic plan (6-month timeline for 2-year project). Assumptions unvalidated. Will fail.",
        "2": "Overly optimistic. Timeline has no buffer. Assumes best-case scenario throughout.",
        "3": "Mostly realistic. Timeline includes some buffer (10-20%). Key assumptions stated. Acceptable for low-stakes.",
        "4": "Realistic timeline with 20-30% buffer. Assumptions validated or explicitly called out as needs validation. Good for medium-stakes.",
        "5": "Conservative timeline with 30%+ buffer and contingency plans. All assumptions validated or mitigated. Three-point estimates for uncertain items. Exemplary for high-stakes."
      },
      "red_flags": [
        "Timeline has no buffer (assumes everything goes perfectly)",
        "Assumes team has skills they don't have (no training plan)",
        "Budget doesn't include contingency (cost overruns likely)",
        "Critical assumptions not validated ('we assume API will handle 10K req/s' - did you test this?)"
      ]
    }
  ],
  "stakes_guidance": {
    "low_stakes": {
      "description": "Initiative < 1 eng-month, reversible, limited impact. Examples: Small feature, internal tool, process tweak.",
      "target_score": 3.0,
      "required_criteria": [
        "Specification Clarity ≥ 3",
        "Risk Analysis Comprehensiveness ≥ 3",
        "Metrics Measurability ≥ 3"
      ],
      "optional_criteria": [
        "Risk Quantification (can use Low/Med/High)",
        "Leading/Lagging Balance (3 metrics sufficient)"
      ]
    },
    "medium_stakes": {
      "description": "Initiative 1-6 months, affects multiple teams, significant impact. Examples: Service migration, product launch, infrastructure change.",
      "target_score": 3.5,
      "required_criteria": [
        "All criteria ≥ 3",
        "Specification Completeness ≥ 4",
        "Risk Mitigation Depth ≥ 4",
        "Metrics Measurability ≥ 4"
      ],
      "recommended": [
        "Conduct premortem for risk analysis",
        "Include counter-metrics to prevent gaming",
        "Assign owners to all high-risk items and metrics"
      ]
    },
    "high_stakes": {
      "description": "Initiative 6+ months, company-wide, strategic/existential impact. Examples: Architecture overhaul, market expansion, regulatory compliance.",
      "target_score": 4.0,
      "required_criteria": [
        "All criteria ≥ 4",
        "Risk Quantification ≥ 4 (use quantitative analysis)",
        "Integration ≥ 4 (traceability matrix recommended)",
        "Actionability ≥ 4 (detailed execution plan)"
      ],
      "recommended": [
        "Quantitative risk analysis (expected value, cost-benefit)",
        "Advanced metrics frameworks (HEART, North Star, SLI/SLO)",
        "Continuous validation loop (update risks/metrics monthly)",
        "External review (architect, security, compliance)"
      ]
    }
  },
  "common_failure_modes": [
    {
      "failure_mode": "Spec Without Risks",
      "symptoms": "Detailed specification but no risk analysis. Assumes everything will go as planned.",
      "consequences": "Blindsided by preventable failures. No mitigation plans when issues arise.",
      "fix": "Run 30-minute premortem: 'Imagine this failed - why?' Identify top 10 risks and mitigate."
    },
    {
      "failure_mode": "Risk Theater",
      "symptoms": "50+ risks listed but no prioritization, mitigation, or owners. Just documenting everything that could go wrong.",
      "consequences": "Analysis paralysis. Team can't focus. Risks aren't actually managed.",
      "fix": "Score risks by likelihood × impact. Focus on top 10 (score 6-9). Assign owners and specific mitigations."
    },
    {
      "failure_mode": "Vanity Metrics",
      "symptoms": "Tracking activity metrics ('features shipped', 'lines of code') instead of outcome metrics ('user value', 'revenue').",
      "consequences": "Team optimizes for the wrong thing. Looks busy but doesn't deliver value.",
      "fix": "For each metric ask: 'If this goes up, are users/business better off?' Replace vanity with value metrics."
    },
    {
      "failure_mode": "Plan and Forget",
      "symptoms": "Beautiful spec/risk/metrics doc created then never referenced again.",
      "consequences": "Doc becomes stale. Risks materialize but aren't detected. Metrics drift from goals.",
      "fix": "Schedule monthly reviews. Update risks (new ones, status changes). Track metrics in team rituals (sprint reviews, all-hands)."
    },
    {
      "failure_mode": "Premature Precision",
      "symptoms": "Overconfident estimates: 'Migration will take exactly 47 days and cost $487,234.19'.",
      "consequences": "False confidence. When reality diverges, team loses trust in planning.",
      "fix": "Use ranges (30-60 days, $400-600K). State confidence levels (50%, 90%). Build in buffer (20-30%)."
    },
    {
      "failure_mode": "Disconnected Components",
      "symptoms": "Specs, risks, and metrics are separate documents that don't reference each other.",
      "consequences": "Metrics don't validate spec assumptions. Risks aren't mitigated by spec choices. Incoherent plan.",
      "fix": "Explicitly map: major spec decisions → corresponding risks → metrics that detect risk. Ensure traceability."
    },
    {
      "failure_mode": "No Counter-Metrics",
      "symptoms": "Optimizing for single metric without guardrails (e.g., 'ship faster!' without quality threshold).",
      "consequences": "Gaming the system. Ship faster but quality tanks. Optimize costs but reliability suffers.",
      "fix": "For each primary metric, define counter-metric: what you're NOT willing to sacrifice. Monitor both."
    },
    {
      "failure_mode": "Analysis Paralysis",
      "symptoms": "Spent 3 months planning, creating perfect spec/risks/metrics, haven't started building.",
      "consequences": "Opportunity cost. Market moves on. Team demoralized by lack of progress.",
      "fix": "Time-box planning (1-2 weeks for most initiatives). Embrace uncertainty. Learn by doing. Update plan as you learn."
    }
  ],
  "scale": 5,
  "minimum_average_score": 3.5,
  "interpretation": {
    "1.0-2.0": "Inadequate. Major gaps in spec, risks, or metrics. Do not proceed. Revise artifact.",
    "2.0-3.0": "Needs improvement. Some components present but incomplete or vague. Acceptable only for very low-stakes initiatives. Revise before proceeding with medium/high-stakes.",
    "3.0-3.5": "Acceptable for low-stakes initiatives. Core components present with sufficient detail. For medium-stakes, strengthen risk analysis and metrics.",
    "3.5-4.0": "Good. Ready for medium-stakes initiatives. Comprehensive spec, proactive risk management, measurable success criteria. For high-stakes, add quantitative analysis and continuous validation.",
    "4.0-5.0": "Excellent. Ready for high-stakes initiatives. Exemplary planning with detailed execution plan, quantitative risk analysis, sophisticated metrics, and strong integration."
  }
}
@@ -0,0 +1,346 @@
# Microservices Migration: Spec, Risks, Metrics

Complete worked example showing how to chain specification, risk analysis, and success metrics for a complex migration initiative.

## 1. Specification

### 1.1 Executive Summary

**Goal**: Migrate our monolithic e-commerce platform to microservices architecture to enable independent team velocity, improve reliability through service isolation, and reduce deployment risk.

**Timeline**: 18-month program (Q1 2024 - Q2 2025)
- Phase 1 (Q1-Q2 2024): Foundation + Auth service
- Phase 2 (Q3-Q4 2024): User, Order, Inventory services
- Phase 3 (Q1-Q2 2025): Payment, Notification services + monolith sunset

**Stakeholders**:
- **Exec Sponsor**: CTO (accountable for initiative success)
- **Engineering**: 3 teams (15 engineers total)
- **Product**: Head of Product (feature velocity impact)
- **Operations**: SRE team (operational complexity)
- **Finance**: CFO (infrastructure cost impact)

**Success Criteria**: Independent service deployments with <5min MTTR, 99.95% uptime, no customer-facing regressions.

### 1.2 Current State (Baseline)

**Architecture**: Ruby on Rails monolith (250K LOC) serving all e-commerce functions.

**Current Metrics**:
- Deployments: 2 per week (all-or-nothing deploys)
- Deployment time: 45 minutes (build + test + deploy)
- Mean time to recovery (MTTR): 2 hours (requires full rollback)
- P99 API latency: 450ms (slower than target)
- Uptime: 99.8% (below SLA of 99.9%)
- Developer velocity: 3 weeks average for feature to production

**Pain Points**:
- Any code change requires full system deployment (high risk)
- One team's bug can take down entire platform
- Database hot spots limit scaling (users table has 50M rows)
- Hard to onboard new engineers (entire codebase is context)
- A/B testing is difficult (can't target specific services)

**Previous Attempts**: Tried service-oriented architecture in 2021 but stopped due to data consistency issues and operational complexity.

### 1.3 Proposed Approach

**Target Architecture**:
```
                     [API Gateway / Envoy]
                               |
         ┌─────────────────────┼─────────────────────┐
         ↓                     ↓                     ↓
  [Auth Service]        [User Service]        [Order Service]
         ↓                     ↓                     ↓
     [Auth DB]             [User DB]             [Order DB]

         ↓                     ↓                     ↓
[Inventory Service]   [Payment Service]   [Notification Service]
         ↓                     ↓                     ↓
   [Inventory DB]         [Payment DB]      [Notification Queue]
```

**Service Boundaries** (based on DDD):
1. **Auth Service**: Authentication, authorization, session management
2. **User Service**: User profiles, preferences, account management
3. **Order Service**: Order creation, status, history
4. **Inventory Service**: Product catalog, stock levels, reservations
5. **Payment Service**: Payment processing, refunds, wallet
6. **Notification Service**: Email, SMS, push notifications

**Technology Stack**:
- **Language**: Node.js (team expertise, async I/O for APIs)
- **API Gateway**: Envoy (service mesh, observability)
- **Databases**: PostgreSQL per service (team expertise, ACID guarantees)
- **Messaging**: RabbitMQ (reliable delivery, team familiar)
- **Observability**: OpenTelemetry → DataDog (centralized logging, metrics, tracing)

**Data Migration Strategy**:
- **Phase 1**: Read from monolith DB, write to both (dual-write pattern)
- **Phase 2**: Read from service DB, write to both (validate consistency)
- **Phase 3**: Cut over fully to service DB, decommission monolith tables

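The dual-write phases above only hold up if divergence is caught quickly (see R1 in the risk register). A minimal reconciliation-job sketch; the `fetch_*` callables are hypothetical stand-ins for real data access:

```
def reconcile(ids, fetch_from_monolith, fetch_from_service):
    """Compare one batch of records across both stores; return mismatches and rate."""
    mismatches = [i for i in ids if fetch_from_monolith(i) != fetch_from_service(i)]
    return mismatches, len(mismatches) / max(len(ids), 1)

batch = range(10_000)  # one incremental batch, per the migration plan
mismatches, rate = reconcile(batch, lambda i: {"id": i}, lambda i: {"id": i})
if rate > 0.001:  # the 0.1% automated-rollback threshold from R1's mitigation
    print(f"ALERT: mismatch rate {rate:.3%} exceeds 0.1% - halt cutover and investigate")
```
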
**Deployment Strategy**:
- Canary releases: 1% → 10% → 50% → 100% traffic per service
- Feature flags for gradual rollout within traffic tiers
- Automated rollback on error rate spike (>0.5%)

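A sketch of the automated-rollback rule above; `get_error_rate`, `rollback`, and `promote` are assumed hooks into the deployment tooling, not a specific product's API:

```
CANARY_STAGES = [0.01, 0.10, 0.50, 1.00]  # 1% -> 10% -> 50% -> 100% traffic
ERROR_THRESHOLD = 0.005                    # roll back on error rate spike >0.5%

def advance_canary(get_error_rate, rollback, promote):
    """Walk the canary stages, rolling back automatically on an error spike."""
    for stage in CANARY_STAGES:
        if get_error_rate(stage) > ERROR_THRESHOLD:
            rollback(stage)
            return False
        promote(stage)
    return True
```
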
### 1.4 Scope

**In Scope (18 months)**:
- Extract 6 services from monolith with independent deployability
- Implement API gateway for routing and observability
- Set up per-service CI/CD pipelines
- Migrate data to per-service databases
- Establish SLOs for each service (99.95% uptime, <200ms p99)
- Train engineers on microservices patterns and operational practices

**Out of Scope**:
- Rewrite service internals (lift-and-shift code from monolith)
- Multi-region deployment (deferred to 2026)
- GraphQL federation (REST APIs sufficient for now)
- Service mesh full rollout (Envoy at gateway only, not sidecar)
- Real-time event streaming (async via RabbitMQ sufficient)

**Future Considerations** (post-Q2 2025):
- Replatform services to Go or Rust for performance
- Implement CQRS/Event Sourcing for Order/Inventory
- Multi-region active-active deployment
- GraphQL federation layer for frontend

### 1.5 Requirements

**Functional Requirements**:
- **FR-001**: Each service must be independently deployable without affecting others
- **FR-002**: API Gateway must route requests to appropriate service based on URL path
- **FR-003**: Services must communicate asynchronously for non-blocking operations (e.g., send notification after order)
- **FR-004**: All services must implement health checks (liveness, readiness; see the sketch after this list)
- **FR-005**: Data consistency must be maintained across service boundaries (eventual consistency acceptable for non-critical paths)

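A minimal sketch of FR-004's liveness/readiness endpoints, using Python's standard library purely for illustration (the services themselves are Node.js per the technology stack, and the endpoint paths are an assumed convention):

```
from http.server import BaseHTTPRequestHandler, HTTPServer

def deps_ready() -> bool:
    return True  # stand-in for real DB/queue connectivity checks

class Health(BaseHTTPRequestHandler):
    def do_GET(self):
        # /healthz = liveness (process up); /readyz = readiness (dependencies reachable)
        ok = self.path == "/healthz" or (self.path == "/readyz" and deps_ready())
        self.send_response(200 if ok else 503)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), Health).serve_forever()
```
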
**Non-Functional Requirements**:
- **Performance**:
  - P50 API latency: <100ms (target: 50ms improvement from baseline)
  - P99 API latency: <200ms (target: 250ms improvement)
  - API Gateway overhead: <10ms added latency
- **Reliability**:
  - Per-service uptime: 99.95% (vs. 99.8% current)
  - MTTR: <30 minutes (vs. 2 hours current)
  - Zero-downtime deployments (vs. maintenance windows)
- **Scalability**:
  - Each service must handle 10K req/s independently
  - Database connections pooled (max 100 per service)
  - Horizontal scaling via Kubernetes (3-10 replicas per service)
- **Security**:
  - Service-to-service auth via mTLS
  - API Gateway enforces rate limiting (100 req/min per user)
  - Secrets managed via Vault (no hardcoded credentials)
- **Operability**:
  - Centralized logging (all services → DataDog)
  - Distributed tracing (OpenTelemetry)
  - Automated alerts on SLO violations

### 1.6 Dependencies & Assumptions

**Dependencies**:
- Kubernetes cluster provisioned by Infrastructure team (ready by Jan 2024)
- DataDog enterprise license approved (ready by Feb 2024)
- RabbitMQ cluster available (SRE team owns, ready by Mar 2024)
- Engineers complete microservices training (2-week program, Jan 2024)

**Assumptions**:
- No major product launches during migration (to reduce risk overlap)
- Database schema changes can be coordinated across monolith and services
- Existing test coverage is sufficient (80% for critical paths)
- Customer traffic grows <20% during migration period

**Constraints**:
- Budget: $500K (infrastructure + tooling + training)
- Team: 15 engineers (no additional headcount)
- Timeline: 18 months firm (customer commitment for improved reliability)
- Compliance: Must maintain PCI-DSS for payment service

### 1.7 Timeline & Milestones

| Milestone | Date | Deliverable | Success Criteria |
|-----------|------|-------------|------------------|
| **M0: Foundation** | Mar 2024 | API Gateway deployed, observability in place | Gateway routes traffic, 100% tracing coverage |
| **M1: Auth Service** | Jun 2024 | Auth extracted, deployed independently | Zero auth regressions, <200ms p99 latency |
| **M2: User + Order** | Sep 2024 | User and Order services live | Independent deploys, data consistency validated |
| **M3: Inventory** | Dec 2024 | Inventory service live | Stock reservations work, no overselling |
| **M4: Payment** | Mar 2025 | Payment service live (PCI-DSS compliant) | Payment success rate >99.9%, audit passed |
| **M5: Notification** | May 2025 | Notification service live | Email/SMS delivery >99%, queue processed <1min |
| **M6: Monolith Sunset** | Jun 2025 | Monolith decommissioned | All traffic via services, monolith DB read-only |

## 2. Risk Analysis

### 2.1 Premortem Summary

**Exercise Prompt**: "It's June 2025. We launched microservices migration and it failed catastrophically. Engineering morale is low, customers are experiencing outages, and the CTO is considering rolling back to the monolith. What went wrong?"

**Top Failure Modes Identified** (from cross-functional team premortem):

1. **Data Consistency Nightmare** (Engineering): Dual-write bugs caused order/inventory mismatches, overselling inventory, customer complaints.

2. **Distributed System Cascading Failures** (SRE): One slow service (Payment) caused timeouts in all upstream services, bringing down entire platform.

3. **Operational Complexity Overwhelmed Team** (SRE): Too many services to monitor, alert fatigue, SRE team couldn't keep up with incidents.

4. **Performance Degradation** (Engineering): Network hops between services added latency, checkout flow is now slower than the monolith.

5. **Data Migration Errors** (Engineering): Production migration script had bugs, lost 50K user records, required emergency restore.

6. **Team Skill Gaps** (Management): Engineers lacked distributed systems expertise, made common mistakes (distributed transactions, thundering herd).

7. **Cost Overruns** (Finance): Per-service databases and infrastructure cost 3× more than budgeted, CFO halted project.

8. **Feature Velocity Dropped** (Product): Cross-service changes require coordinating 3 teams, slowing down product velocity.

9. **Security Vulnerabilities** (Security): Service-to-service auth misconfigured, unauthorized service access, data breach.

10. **Incomplete Migration** (Management): Ran out of time, stuck with half-migrated state (some services live, monolith still critical).

### 2.2 Risk Register

| Risk ID | Risk Description | Category | Likelihood | Impact | Score | Mitigation Strategy | Owner | Status |
|---------|-----------------|----------|------------|--------|-------|---------------------|-------|--------|
| **R1** | Data inconsistency between monolith DB and service DBs during dual-write phase, leading to customer-facing bugs (order not found, wrong inventory) | Technical | High (60%) | High | 9 | **Mitigation**: (1) Implement reconciliation job to detect mismatches, (2) Extensive integration tests for dual-write paths, (3) Shadow mode: write to both but read from monolith initially to validate, (4) Automated rollback if mismatch rate >0.1% | Tech Lead - Data | Open |
| **R2** | Cascading failures: slow/failing service causes timeouts in all dependent services, total outage | Technical | Medium (40%) | High | 6 | **Mitigation**: (1) Circuit breakers on all service clients (fail fast), (2) Bulkhead pattern (isolate thread pools per service), (3) Timeout tuning (aggressive timeouts <1sec), (4) Graceful degradation (fallback to cached data) | SRE Lead | Open |
| **R3** | Operational complexity overwhelms SRE team: too many alerts, incidents, runbooks to manage | Operational | High (70%) | Medium | 6 | **Mitigation**: (1) Standardize observability across services (common dashboards), (2) SLO-based alerting only (eliminate noise), (3) Automated remediation for common issues, (4) Hire 2 additional SREs (approved) | SRE Manager | Open |
| **R4** | Performance degradation: added network latency from service calls makes checkout slower than monolith | Technical | Medium (50%) | Medium | 4 | **Mitigation**: (1) Baseline perf tests on monolith, (2) Set p99 latency budget per service (<50ms), (3) Async where possible (notification, inventory reservation), (4) Load tests at 2× expected traffic | Perf Engineer | Open |
| **R5** | Data migration script errors cause data loss in production | Technical | Low (20%) | High | 3 | **Mitigation**: (1) Dry-run migration on prod snapshot, (2) Automated validation (row counts, checksums), (3) Incremental migration (batch of 10K users at a time), (4) Keep monolith DB as source of truth during migration, (5) Point-in-time recovery tested | DB Admin | Open |
| **R6** | Team lacks microservices expertise, makes preventable mistakes (no circuit breakers, distributed transactions, etc.) | Organizational | Medium (50%) | Medium | 4 | **Mitigation**: (1) Mandatory 2-week microservices training (Jan 2024), (2) Architecture review board for service designs, (3) Pair programming with experienced engineers, (4) Reference implementation as template | Engineering Manager | Open |
| **R7** | Infrastructure costs exceed budget: per-service DBs, K8s overhead, observability tooling cost 3× estimate | Organizational | Medium (40%) | Medium | 4 | **Mitigation**: (1) Detailed cost model before each phase, (2) Right-size instances (start small, scale up), (3) Use managed services (RDS) to reduce ops cost, (4) Monthly cost reviews with Finance | Finance Partner | Open |
| **R8** | Feature velocity drops: cross-service changes require coordination, slowing product development | Organizational | High (60%) | Low | 3 | **Mitigation**: (1) Design services with clear boundaries (minimize cross-service changes), (2) Establish API contracts early, (3) Service teams own full stack (no handoffs), (4) Track deployment frequency as leading indicator | Product Manager | Accepted |
| **R9** | Service-to-service auth misconfigured, allowing unauthorized access or data leaks | External | Low (15%) | High | 3 | **Mitigation**: (1) mTLS enforced at gateway and between services, (2) Security audit before each service goes live, (3) Penetration test after full migration, (4) Principle of least privilege (services can only access what they need) | Security Lead | Open |
| **R10** | Migration incomplete by deadline: stuck in half-migrated state, technical debt accumulates | Organizational | Medium (40%) | High | 6 | **Mitigation**: (1) Phased rollout with hard cutover dates, (2) Executive sponsorship to protect time, (3) No feature work during final 2 months (migration focus), (4) Rollback plan if timeline slips | Program Manager | Open |

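R2's first mitigation is a circuit breaker; a production service would use an off-the-shelf implementation (e.g. opossum for Node.js), but the state machine is small enough to sketch here for illustration:

```
import time

class CircuitBreaker:
    """Fail fast after repeated errors, then retry after a cool-off period."""

    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at, self.failures = None, 0  # half-open: allow one retry
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```
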
### 2.3 Risk Mitigation Timeline

**Pre-Launch (Jan-Feb 2024)**:
- R6: Complete microservices training for all engineers
- R7: Finalize cost model and get CFO sign-off
- R3: Hire additional SREs, onboard them

**Phase 1 (Mar-Jun 2024 - Auth Service)**:
- R1: Validate dual-write reconciliation with Auth DB
- R2: Implement circuit breakers in Auth service client
- R4: Baseline latency tests, set p99 budgets
- R5: Dry-run migration on prod snapshot

**Phase 2 (Jul-Dec 2024 - User, Order, Inventory)**:
- R1: Reconciliation jobs running for all 3 services
- R2: Bulkhead pattern implemented across services
- R4: Load tests at 2× traffic before each service launch

**Phase 3 (Jan-Jun 2025 - Payment, Notification, Sunset)**:
- R9: Security audit and pen test before Payment goes live
- R10: Hard cutover date for monolith sunset, no slippage

**Continuous (Throughout)**:
- R3: Monthly SRE team health check (alert fatigue, runbook gaps)
- R7: Monthly cost reviews vs. budget
- R8: Track feature velocity every sprint

## 3. Success Metrics

### 3.1 Leading Indicators (Early Signals)

| Metric | Baseline | Target | Measurement Method | Tracking Cadence | Owner |
|--------|----------|--------|-------------------|------------------|-------|
| **Deployment Frequency** (per service) | 2/week (monolith) | 10+/week (per service) | Git tag count in CI/CD | Weekly | Tech Lead |
| **Build + Test Time** | 25 min (monolith) | <10 min (per service) | CI/CD pipeline duration | Per build | DevOps |
| **Code Review Cycle Time** | 2 days | <1 day | GitHub PR metrics | Weekly | Engineering Manager |
| **Test Coverage** (per service) | 80% (monolith) | >85% (per service) | Jest coverage report | Per commit | QA Lead |
| **Incident Detection Time** (MTTD) | 15 min | <5 min | DataDog alert → Slack | Per incident | SRE |

**Rationale**: These predict future success. If deployment frequency increases early, it validates independent deployability. If incident detection improves, observability is working.

### 3.2 Lagging Indicators (Outcome Measures)

| Metric | Baseline | Target | Measurement Method | Tracking Cadence | Owner |
|--------|----------|--------|-------------------|------------------|-------|
| **System Uptime** (SLO) | 99.8% | 99.95% | DataDog uptime monitor | Daily | SRE Lead |
| **API p99 Latency** | 450ms | <200ms | OpenTelemetry traces | Real-time dashboard | Perf Engineer |
| **Mean Time to Recovery** (MTTR) | 2 hours | <30 min | Incident timeline analysis | Per incident | SRE Lead |
| **Customer-Impacting Incidents** | 8/month | <3/month | PagerDuty severity 1 & 2 | Monthly | Engineering Manager |
| **Feature Velocity** (stories/sprint) | 12 stories | >15 stories | Jira velocity report | Per sprint | Product Manager |
| **Infrastructure Cost** | $50K/month | <$70K/month | AWS billing dashboard | Monthly | Finance |

**Rationale**: These measure actual outcomes. Uptime and MTTR validate reliability improvements. Latency and velocity validate performance and productivity gains.

### 3.3 Counter-Metrics (What We Won't Sacrifice)

| Metric | Threshold | Monitoring Method | Escalation Trigger |
|--------|-----------|-------------------|-------------------|
| **Code Quality** (bug rate) | <5 bugs/sprint | Jira bug tickets | >10 bugs/sprint → halt features, fix bugs |
| **Team Morale** (happiness score) | >7/10 | Quarterly eng survey | <6/10 → leadership 1:1s, workload adjustment |
| **Security Posture** (critical vulns) | 0 critical | Snyk security scans | Any critical vuln → immediate fix before ship |
| **Data Integrity** (order accuracy) | >99.99% | Reconciliation jobs | <99.9% → halt migrations, investigate |
| **Customer Satisfaction** (NPS) | >40 | Quarterly NPS survey | <30 → customer interviews, rollback if needed |

**Rationale**: Prevent gaming the system. Don't ship faster at the expense of quality. Don't improve latency by cutting security. Don't optimize costs by burning out the team.

### 3.4 Success Criteria Summary

**Must-Haves** (hard requirements):
- All 6 services independently deployable by Jun 2025
- 99.95% uptime maintained throughout migration
- Zero data loss during migrations
- PCI-DSS compliance for Payment service
- Infrastructure cost <$70K/month

**Should-Haves** (target achievements):
- P99 latency <200ms (250ms improvement from baseline)
- MTTR <30 min (90-minute improvement)
- Deployment frequency >10/week per service (5× improvement)
- Feature velocity >15 stories/sprint (25% improvement)
- Customer-impacting incidents <3/month (63% reduction)

**Nice-to-Haves** (stretch goals):
- Multi-region deployment capability (deferred to 2026)
- GraphQL federation layer (deferred)
- Event streaming for real-time analytics (deferred)
- Team self-rates microservices maturity >8/10 (vs. 4/10 today)

## 4. Self-Assessment

Evaluated using `rubric_chain_spec_risk_metrics.json`:

### Specification Quality
- **Clarity** (4.5/5): Goal, scope, timeline, approach clearly stated with diagrams
- **Completeness** (4.0/5): All sections covered; could add more detail on monolith sunset plan
- **Feasibility** (4.0/5): 18-month timeline is aggressive but achievable with current team; phased approach mitigates risk

### Risk Analysis Quality
- **Comprehensiveness** (4.5/5): 10 risks across technical, operational, organizational; premortem surfaced non-obvious risks (team skill gaps, cost overruns)
- **Quantification** (3.5/5): Likelihood/impact scored; could add $ impact estimates for high-risk items
- **Mitigation Depth** (4.5/5): Each high-risk item has detailed mitigation with specific actions (circuit breakers, reconciliation jobs, etc.)

### Metrics Quality
- **Measurability** (5.0/5): All metrics have clear baseline, target, measurement method, cadence
- **Leading/Lagging Balance** (4.5/5): Good mix of early signals (deployment frequency, MTTD) and outcomes (uptime, MTTR)
- **Counter-Metrics** (4.0/5): Explicit guardrails (quality, morale, security); prevents optimization myopia

### Integration
- **Spec→Risk Mapping** (4.5/5): Major spec decisions (dual-write, service boundaries) have corresponding risks (R1, R8)
- **Risk→Metrics Mapping** (4.5/5): High-risk items tracked by metrics (R1 → data integrity counter-metric, R2 → MTTR)
- **Coherence** (4.5/5): All three components tell consistent story of complex migration with proactive risk management

### Actionability
- **Stakeholder Clarity** (4.5/5): Clear owners for each risk, metric; stakeholders can act on this document
- **Timeline Realism** (4.0/5): 18-month timeline is aggressive; includes buffer (80% work, 20% buffer in each phase)

**Overall Average**: 4.3/5 ✓ (Exceeds 3.5 minimum standard)

**Strengths**:
- Comprehensive risk analysis with specific, actionable mitigations
- Clear metrics with baselines, targets, and measurement methods
- Phased rollout reduces big-bang risk

**Areas for Improvement**:
- Add quantitative cost impact for high-risk items (e.g., R1 data inconsistency could cost $X in customer refunds)
- More detail on monolith sunset plan (how to decommission safely)
- Consider adding "reverse premortem" (what would make this wildly successful?) to identify opportunities

**Recommendation**: Proceed with migration. Risk mitigation strategies are sound. Metrics will provide early warning if initiative is off track. Schedule quarterly reviews to update risks/metrics as we learn.
393
skills/chain-spec-risk-metrics/resources/methodology.md
Normal file
@@ -0,0 +1,393 @@
# Chain Spec Risk Metrics Methodology

Advanced techniques for complex multi-phase programs, novel risks, and sophisticated metric frameworks.

## Workflow

Copy this checklist and track your progress:

```
Chain Spec Risk Metrics Progress:
- [ ] Step 1: Gather initiative context
- [ ] Step 2: Write comprehensive specification
- [ ] Step 3: Conduct premortem and build risk register
- [ ] Step 4: Define success metrics and instrumentation
- [ ] Step 5: Validate completeness and deliver
```

**Step 1: Gather initiative context** - Collect goal, constraints, stakeholders, baseline. Use template questions plus assess complexity using [5. Complexity Decision Matrix](#5-complexity-decision-matrix).

**Step 2: Write comprehensive specification** - For multi-phase programs use [1. Advanced Specification Techniques](#1-advanced-specification-techniques) including phased rollout strategies and requirements traceability matrix.

**Step 3: Conduct premortem and build risk register** - Apply [2. Advanced Risk Assessment](#2-advanced-risk-assessment) methods including quantitative analysis, fault tree analysis, and premortem variations for comprehensive risk identification.

**Step 4: Define success metrics** - Use [3. Advanced Metrics Frameworks](#3-advanced-metrics-frameworks) such as HEART, AARRR, North Star, or SLI/SLO depending on initiative type.

**Step 5: Validate and deliver** - Ensure integration using [4. Integration Best Practices](#4-integration-best-practices) and check for [6. Anti-Patterns](#6-anti-patterns) before delivering.

## 1. Advanced Specification Techniques

### Multi-Phase Program Decomposition

**When**: Initiative is too large to execute in one phase (>6 months, >10 people, multiple systems).

**Approach**: Break into phases with clear value delivery at each stage.

**Phase Structure**:
- **Phase 0 (Discovery)**: Research, prototyping, validate assumptions
  - Deliverable: Technical spike, proof of concept, feasibility report
  - Metrics: Prototype success rate, assumption validation rate
- **Phase 1 (Foundation)**: Core infrastructure, no user-facing features yet
  - Deliverable: Platform/APIs deployed, instrumentation in place
  - Metrics: System stability, API latency, error rates
- **Phase 2 (Alpha)**: Limited rollout to internal users or pilot customers
  - Deliverable: Feature complete for target use case, internal feedback
  - Metrics: User activation, time-to-value, critical bugs
- **Phase 3 (Beta)**: Broader rollout, feature complete, gather feedback
  - Deliverable: Production-ready with known limitations
  - Metrics: Adoption rate, support tickets, performance under load
- **Phase 4 (GA)**: General availability, full feature set, scaled operations
  - Deliverable: Fully launched, documented, supported
  - Metrics: Market penetration, revenue, NPS, operational costs

**Phase Dependencies**:
- Document what each phase depends on (previous phase completion, external dependencies)
- Define phase gates (criteria to proceed to next phase)
- Include rollback plans if a phase fails

### Phased Rollout Strategies

**Canary Deployment**:
- Roll out to 1% → 10% → 50% → 100% of traffic/users
- Monitor metrics at each stage before expanding
- Automatic rollback if error rates spike

**Ring Deployment**:
- Ring 0: Internal employees (catch obvious bugs)
- Ring 1: Pilot customers (friendly users, willing to provide feedback)
- Ring 2: Early adopters (power users, high tolerance)
- Ring 3: General availability (all users)

**Feature Flags**:
- Deploy code but keep feature disabled
- Gradually enable for user segments
- A/B test impact before full rollout (see the sketch at the end of this section)

**Geographic Rollout**:
- Region 1 (smallest/lowest risk) → Region 2 → Region 3 → Global
- Allows testing localization, compliance, performance in stages

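A sketch of the feature-flag strategy above: deploy dark, then enable per ring/segment with a sticky percentage rollout. The hashing scheme is an assumption (one common way to get deterministic bucketing), not a specific flag product's behavior:

```
import hashlib

def flag_enabled(flag: str, user_id: str, rollout_pct: float, ring0=()) -> bool:
    """Sticky bucketing: a given user stays enabled as rollout_pct grows."""
    if user_id in ring0:  # e.g. internal employees always on (Ring 0)
        return True
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map hash to [0, 1)
    return bucket < rollout_pct

print(flag_enabled("new-checkout", "user-42", rollout_pct=0.10))
```
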
### Requirements Traceability Matrix

For complex initiatives, map requirements → design → implementation → tests → risks → metrics.

| Requirement ID | Requirement | Design Doc | Implementation | Test Cases | Related Risks | Success Metric |
|----------------|-------------|------------|----------------|------------|---------------|----------------|
| REQ-001 | Auth via OAuth 2.0 | Design-Auth.md | auth-service/ | test/auth/ | R3, R7 | Auth success rate > 99.9% |
| REQ-002 | API p99 < 200ms | Design-Perf.md | gateway/ | test/perf/ | R1, R5 | p99 latency metric |

**Benefits**:
- Ensures nothing is forgotten (every requirement has tests, metrics, risk mitigation)
- Helps with impact analysis (if a risk materializes, which requirements are affected?)
- Useful for audit/compliance (trace from business need → implementation → validation)

## 2. Advanced Risk Assessment

### Quantitative Risk Analysis

**When**: High-stakes initiatives where "Low/Medium/High" risk scoring is insufficient.

**Approach**: Assign probabilities (%) and impact ($$) to risks, calculate expected loss.

**Example**:
- **Risk**: Database migration fails, requiring full rollback
- **Probability**: 15% (based on similar past migrations)
- **Impact**: $500K (3 eng-weeks × 10 engineers × $50K/year/eng + customer churn)
- **Expected Loss**: 15% × $500K = $75K
- **Mitigation Cost**: $30K (comprehensive testing, dry-run on prod snapshot)
- **Decision**: Invest $30K to mitigate (expected savings $45K)

**Three-Point Estimation** for uncertainty:
- **Optimistic**: Best-case scenario (10th percentile)
- **Most Likely**: Expected case (50th percentile)
- **Pessimistic**: Worst-case scenario (90th percentile)
- **Expected Value**: (Optimistic + 4×Most Likely + Pessimistic) / 6

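Both calculations are small enough to check in code. A sketch using the figures from the examples above (the numbers are the examples', not universal constants):

```
# Expected loss vs. mitigation cost (database-migration example above).
probability, impact, mitigation_cost = 0.15, 500_000, 30_000
expected_loss = probability * impact        # $75,000
assert mitigation_cost < expected_loss      # invest: $30K buys ~$45K of expected savings

# Three-point (PERT) estimate: (Optimistic + 4 x Most Likely + Pessimistic) / 6
def pert(optimistic, most_likely, pessimistic):
    return (optimistic + 4 * most_likely + pessimistic) / 6

print(pert(10, 20, 40))  # e.g. 10/20/40-day estimates -> ~21.7 days
```
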
### Fault Tree Analysis

**When**: Analyzing how multiple small failures combine to cause catastrophic outcome.

**Approach**: Work backward from catastrophic failure to root causes using logic gates.

**Example**: "Customer data breach"
```
               [Customer Data Breach]
                         ↓ (OR gate - any path causes breach)
               ┌─────────┴─────────┐
         [Auth bypass]   OR   [DB exposed to internet]
               ↓ (AND gate - both required)
         ┌─────┴─────┐
 [SQL injection] AND [No input validation]
```

**Insights**:
- Identify single points of failure (only one thing has to go wrong)
- Reveal defense-in-depth opportunities (add redundant protections)
- Prioritize mitigations (block root causes that appear in multiple paths)

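Because the gates are plain boolean logic, small trees can be checked mechanically. A sketch of the example tree above (the event names come from the example, not a general taxonomy):

```
def auth_bypass(sql_injection: bool, no_input_validation: bool) -> bool:
    return sql_injection and no_input_validation  # AND gate: both required

def breach(auth_bypassed: bool, db_exposed: bool) -> bool:
    return auth_bypassed or db_exposed            # OR gate: either path suffices

# Input validation alone blocks the injection path...
assert not breach(auth_bypass(True, False), db_exposed=False)
# ...but an internet-exposed DB is a single point of failure.
assert breach(auth_bypass(False, False), db_exposed=True)
```
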
### Bow-Tie Risk Diagrams
|
||||
|
||||
**When**: Complex risks with multiple causes and multiple consequences.
|
||||
|
||||
**Structure**:
|
||||
```
|
||||
[Causes] → [Preventive Controls] → [RISK EVENT] → [Mitigative Controls] → [Consequences]
|
||||
```
|
||||
|
||||
**Example**: Risk = "Production database unavailable"
|
||||
|
||||
**Left side (Causes + Prevention)**:
|
||||
- Cause 1: Hardware failure → Prevention: Redundant instances, health checks
|
||||
- Cause 2: Human error (bad migration) → Prevention: Dry-run on snapshot, peer review
|
||||
- Cause 3: DDoS attack → Prevention: Rate limiting, WAF
|
||||
|
||||
**Right side (Mitigation + Consequences)**:
|
||||
- Consequence 1: Users can't access app → Mitigation: Read replica for degraded mode
|
||||
- Consequence 2: Revenue loss → Mitigation: SLA credits, customer communication plan
|
||||
- Consequence 3: Reputational damage → Mitigation: PR plan, status page transparency
|
||||
|
||||
**Benefits**: Shows full lifecycle of risk (how to prevent, how to respond if it happens anyway).
|
||||

### Premortem Variations

**Reverse Premortem** ("It succeeded wildly - how?"):
- Identifies what must go RIGHT for success
- Reveals critical success factors often overlooked
- Example: "We succeeded because we secured executive sponsorship early and maintained it throughout"

**Pre-Parade** (best-case scenario):
- What would make this initiative exceed expectations?
- Identifies opportunities to amplify impact
- Example: "If we also integrate with Salesforce, we could unlock enterprise market"

**Lateral Premortem** (stakeholder-specific):
- Run separate premortems for each stakeholder group
- Engineering: "It failed because of technical reasons..."
- Product: "It failed because users didn't adopt it..."
- Operations: "It failed because we couldn't support it at scale..."

## 3. Advanced Metrics Frameworks

### HEART Framework (Google)

For user-facing products, track:
- **Happiness**: User satisfaction (NPS, CSAT surveys)
- **Engagement**: Level of user involvement (DAU/MAU, sessions/user)
- **Adoption**: New users accepting the product (% of target users activated)
- **Retention**: Rate at which users come back (7-day/30-day retention)
- **Task Success**: Efficiency completing tasks (completion rate, time on task, error rate)

**Application**:
- Leading: Adoption rate, task success rate
- Lagging: Retention, engagement over time
- Counter-metric: Happiness (don't sacrifice UX for engagement)

### AARRR Pirate Metrics

For growth-focused initiatives:
- **Acquisition**: Users discover the product (traffic sources, signup rate)
- **Activation**: Users have a good first experience (onboarding completion, time-to-aha)
- **Retention**: Users return (D1/D7/D30 retention)
- **Revenue**: Users pay (conversion rate, ARPU, LTV)
- **Referral**: Users bring others (viral coefficient, NPS)

**Application**:
- Identify the bottleneck stage (where most users drop off)
- Focus the initiative on improving that stage
- Track funnel conversion at each stage (see the sketch below)
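
Finding the bottleneck is a simple pass over stage-to-stage conversion rates. A minimal sketch (all counts are hypothetical):

```python
# Minimal AARRR funnel sketch: locate the stage with the worst conversion.
funnel = [
    ("Acquisition", 50_000),   # visitors
    ("Activation",   8_000),   # completed onboarding
    ("Retention",    3_200),   # returned in week 2
    ("Revenue",        480),   # converted to paid
    ("Referral",        96),   # sent an invite
]

rates = []
for (stage, count), (next_stage, next_count) in zip(funnel, funnel[1:]):
    rate = next_count / count
    rates.append((rate, f"{stage} → {next_stage}"))
    print(f"{stage} → {next_stage}: {rate:.1%}")

worst_rate, worst_step = min(rates)
print(f"Bottleneck: {worst_step} at {worst_rate:.1%}")
```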

### North Star + Supporting Metrics

**North Star Metric**: Single metric that best captures value delivered to customers.
- Examples: Weekly active users (social app), time-to-insight (analytics), API calls/week (platform)

**Supporting Metrics**: Inputs that drive the North Star.
- Example NSM: Weekly Active Users
- Supporting: New user signups, activation rate, feature engagement, retention rate

**Application**:
- All initiatives tie back to moving North Star or supporting metrics
- Prevents metric myopia (optimizing one metric at expense of others)
- Aligns team around common goal

### SLI/SLO/SLA Framework

For reliability-focused initiatives:
- **SLI (Service Level Indicator)**: What you measure (latency, error rate, throughput)
- **SLO (Service Level Objective)**: Target for the SLI (p99 latency < 200ms, error rate < 0.1%)
- **SLA (Service Level Agreement)**: Consequence if the SLO is not met (customer credits, escalation)

**Error Budget**:
- If the SLO is 99.9% uptime, the error budget is 0.1% downtime (~43 minutes/month)
- Can "spend" error budget on risky deployments
- If budget exhausted, halt feature work and focus on reliability

**Application**:
- Define SLIs/SLOs for each major component
- Track burn rate (how fast you're consuming error budget; see the sketch below)
- Use as gate for deployment (don't deploy if error budget exhausted)
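
A minimal sketch of the budget and burn-rate arithmetic (the downtime figures are hypothetical):

```python
# Minimal error-budget sketch for a 99.9% availability SLO.
slo = 0.999
minutes_in_month = 30 * 24 * 60

budget_min = (1 - slo) * minutes_in_month     # ~43.2 minutes/month
downtime_min = 18.0                           # observed downtime so far (hypothetical)
days_elapsed = 10

consumed = downtime_min / budget_min
burn_rate = consumed / (days_elapsed / 30)    # >1.0 burns the budget before month end
print(f"Budget: {budget_min:.1f} min/month, consumed: {consumed:.0%}, burn rate: {burn_rate:.2f}x")
```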

### Metrics Decomposition Trees

**When**: A complex metric needs to be broken down into actionable components.

**Example**: Increase revenue
```
Revenue
├─ New Customer Revenue
│  ├─ New Customers (leads × conversion rate)
│  └─ Average Deal Size (features × pricing tier)
└─ Existing Customer Revenue
   ├─ Expansion (upsell rate × expansion ARR)
   └─ Retention (renewal rate × existing ARR)
```

**Application**:
- Identify which leaf nodes to focus on
- Each leaf becomes a metric to track
- Reveals non-obvious leverage points (e.g., increasing renewal rate might have bigger impact than new customer acquisition; see the sketch below)
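
Turning the tree into a function makes leverage points explicit: nudge one leaf at a time and compare deltas. A minimal sketch (all inputs hypothetical):

```python
# Minimal revenue-tree sketch: sensitivity of the root metric to each leaf.
def revenue(leads=2_000, conversion=0.03, deal_size=20_000,
            customers=500, upsell=0.10, expansion_arr=8_000,
            renewal=0.88, existing_arr=10_000_000):
    new = leads * conversion * deal_size
    existing = customers * upsell * expansion_arr + renewal * existing_arr
    return new + existing

base = revenue()
print(f"+1pt conversion rate: {revenue(conversion=0.04) - base:+,.0f}")
print(f"+1pt renewal rate:    {revenue(renewal=0.89) - base:+,.0f}")
# Whichever delta is larger marks the bigger lever for these inputs.
```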

## 4. Integration Best Practices

### Specs → Risks Mapping

**Principle**: Every major specification decision should have corresponding risks identified.

**Example**:
- **Spec**: "Use MongoDB for user data storage"
- **Risks**:
  - R1: MongoDB performance degrades above 10M documents (mitigation: sharding strategy)
  - R2: Team lacks MongoDB expertise (mitigation: training, hire consultant)
  - R3: Data model changes require migration (mitigation: schema versioning)

**Implementation**:
- Review each spec section and ask "What could go wrong with this choice?"
- Ensure alternative approaches are considered with their risks
- Document why chosen approach is best despite risks

### Risks → Metrics Mapping

**Principle**: Each high-impact risk should have a metric that detects if it's materializing.

**Example**:
- **Risk**: "Database performance degrades under load"
- **Metrics**:
  - Leading: Query time p95 (early warning before user impact)
  - Lagging: User-reported latency complaints
  - Counter-metric: Don't over-optimize for read speed at expense of write consistency

**Implementation**:
- For each risk scored 6-9, define an early-warning metric
- Set up alerts/dashboards to monitor these metrics
- Define escalation thresholds (when to invoke mitigation plan)

### Metrics → Specs Mapping

**Principle**: Specifications should include instrumentation to enable metric collection.

**Example**:
- **Metric**: "API p99 latency < 200ms"
- **Spec Requirements**:
  - Distributed tracing (OpenTelemetry) for end-to-end latency
  - Per-endpoint latency bucketing (identify slow endpoints)
  - Client-side RUM (real user monitoring) for user-perceived latency

**Implementation**:
- Review metrics and ask "What instrumentation is needed?"
- Add observability requirements to spec (logging, metrics, tracing)
- Include instrumentation in acceptance criteria

### Continuous Validation Loop

**Pattern**: Spec → Implement → Measure → Validate Risks → Update Spec

**Steps**:
1. **Initial Spec**: Document approach, risks, metrics
2. **Phase 1 Implementation**: Build and deploy
3. **Measure Reality**: Collect actual metrics vs. targets
4. **Validate Risk Mitigations**: Did mitigations work? Did new risks emerge?
5. **Update Spec**: Revise for next phase based on learnings

**Example**:
- **Phase 1 Spec**: Expected 10K req/s with single instance
- **Reality**: Actual 3K req/s (bottleneck in DB queries)
- **Risk Update**: Add R8: Query optimization needed for target load
- **Metric Update**: Add query execution time to dashboards
- **Phase 2 Spec**: Refactor queries, add read replicas, target 12K req/s

## 5. Complexity Decision Matrix

Use this matrix to decide when to use this skill vs. simpler approaches:

| Initiative Characteristics | Recommended Approach |
|---------------------------|----------------------|
| **Low Stakes** (< 1 eng-month, reversible, no user impact) | Lightweight spec, simple checklist, skip formal risk register |
| **Medium Stakes** (1-3 months, some teams, moderate impact) | Use template.md: Full spec + premortem + 3-5 metrics |
| **High Stakes** (3-6 months, cross-team, high impact) | Use template.md + methodology: Quantitative risk analysis, comprehensive metrics |
| **Strategic** (6+ months, company-level, existential) | Use methodology + external review: Fault tree, SLI/SLOs, continuous validation |

**Heuristics**:
- If failure would cost >$100K or 6+ eng-months, use full methodology
- If initiative affects >1000 users or >3 teams, use at least template
- If uncertainty is high, invest more in risk analysis and phased rollout
- If metrics are complex or novel, use advanced frameworks (HEART, North Star)

## 6. Anti-Patterns

**Spec Only (No Risks/Metrics)**:
- Symptom: Detailed spec but no discussion of what could go wrong or how to measure success
- Fix: Run quick premortem (15 min), define 3 must-track metrics minimum

**Risk Theater (Long List, No Mitigations)**:
- Symptom: 50+ risks identified but no prioritization or mitigation plans
- Fix: Score risks, focus on top 10, assign owners and mitigations

**Vanity Metrics (Measures Activity Not Outcomes)**:
- Symptom: Tracking "features shipped" instead of "user value delivered"
- Fix: For each metric ask "If this goes up, are users/business better off?"

**Plan and Forget (No Updates)**:
- Symptom: Beautiful spec/risk/metrics doc created then never referenced
- Fix: Schedule monthly reviews, update risks/metrics, track in team rituals

**Premature Precision (Overconfident Estimates)**:
- Symptom: "Migration will take exactly 47 days and cost $487K"
- Fix: Use ranges (30-60 days, $400-600K), state confidence levels, build in buffer

**Analysis Paralysis (Perfect Planning)**:
- Symptom: Spent 3 months planning, haven't started building
- Fix: Time-box planning (1-2 weeks for most initiatives), embrace uncertainty, learn by doing

## 7. Templates for Common Initiative Types

### Migration (System/Platform/Data)
- **Spec Focus**: Current vs. target architecture, migration path, rollback plan, validation
- **Risk Focus**: Data loss, downtime, performance regression, failed migration
- **Metrics Focus**: Migration progress %, data integrity, system performance, rollback capability

### Launch (Product/Feature)
- **Spec Focus**: User stories, UX flows, technical design, launch checklist
- **Risk Focus**: Low adoption, technical bugs, scalability issues, competitive response
- **Metrics Focus**: Activation, engagement, retention, revenue impact, support tickets

### Infrastructure Change
- **Spec Focus**: Architecture diagram, capacity planning, runbooks, disaster recovery
- **Risk Focus**: Outages, cost overruns, security vulnerabilities, operational complexity
- **Metrics Focus**: Uptime, latency, costs, incident count, MTTR

### Process Change (Organizational)
- **Spec Focus**: Current vs. new process, roles/responsibilities, training plan, timeline
- **Risk Focus**: Adoption resistance, productivity drop, key person dependency, communication gaps
- **Metrics Focus**: Process compliance, cycle time, employee satisfaction, quality metrics

For complete worked example, see [examples/microservices-migration.md](../examples/microservices-migration.md).
369
skills/chain-spec-risk-metrics/resources/template.md
Normal file
@@ -0,0 +1,369 @@
# Chain Spec Risk Metrics Template

## Workflow

Copy this checklist and track your progress:

```
Chain Spec Risk Metrics Progress:
- [ ] Step 1: Gather initiative context
- [ ] Step 2: Write comprehensive specification
- [ ] Step 3: Conduct premortem and build risk register
- [ ] Step 4: Define success metrics and instrumentation
- [ ] Step 5: Validate completeness and deliver
```

**Step 1: Gather initiative context** - Collect goal, constraints, stakeholders, baseline, desired outcomes. See [Context Questions](#context-questions).

**Step 2: Write comprehensive specification** - Document scope, approach, requirements, dependencies, timeline. See [Quick Template](#quick-template) and [Specification Guidance](#specification-guidance).

**Step 3: Conduct premortem and build risk register** - Run failure imagination exercise, categorize risks, assign mitigations/owners. See [Premortem Process](#premortem-process) and [Risk Register Template](#risk-register-template).

**Step 4: Define success metrics** - Identify leading/lagging indicators, baselines, targets, counter-metrics. See [Metrics Template](#metrics-template) and [Instrumentation Guidance](#instrumentation-guidance).

**Step 5: Validate and deliver** - Self-check with rubric (≥3.5 average), ensure all components align. See [Quality Checklist](#quality-checklist).

## Quick Template

```markdown
# [Initiative Name]: Spec, Risks, Metrics

## 1. Specification

### 1.1 Executive Summary
- **Goal**: [What are you building/changing and why?]
- **Timeline**: [When does this need to be delivered?]
- **Stakeholders**: [Who cares about this initiative?]
- **Success Criteria**: [What does done look like?]

### 1.2 Current State (Baseline)
- [Describe the current state/system]
- [Key metrics/data points about current state]
- [Pain points or limitations driving this initiative]

### 1.3 Proposed Approach
- **Architecture/Design**: [High-level approach]
- **Implementation Plan**: [Phases, milestones, dependencies]
- **Technology Choices**: [Tools, frameworks, platforms with rationale]

### 1.4 Scope
- **In Scope**: [What this initiative includes]
- **Out of Scope**: [What is explicitly excluded]
- **Future Considerations**: [What might come later]

### 1.5 Requirements
**Functional Requirements:**
- [Feature/capability 1]
- [Feature/capability 2]
- [Feature/capability 3]

**Non-Functional Requirements:**
- **Performance**: [Latency, throughput, scalability targets]
- **Reliability**: [Uptime, error rates, recovery time]
- **Security**: [Authentication, authorization, data protection]
- **Compliance**: [Regulatory, policy, audit requirements]

### 1.6 Dependencies & Assumptions
- **Dependencies**: [External teams, systems, resources needed]
- **Assumptions**: [What we're assuming is true]
- **Constraints**: [Budget, time, resource, technical limitations]

### 1.7 Timeline & Milestones
| Milestone | Date | Deliverable |
|-----------|------|-------------|
| [Phase 1] | [Date] | [What's delivered] |
| [Phase 2] | [Date] | [What's delivered] |
| [Phase 3] | [Date] | [What's delivered] |

## 2. Risk Analysis

### 2.1 Premortem Summary
**Exercise Prompt**: "It's [X months] from now. This initiative failed catastrophically. What went wrong?"

**Top failure modes identified:**
1. [Failure mode 1]
2. [Failure mode 2]
3. [Failure mode 3]
4. [Failure mode 4]
5. [Failure mode 5]

### 2.2 Risk Register

| Risk ID | Risk Description | Category | Likelihood | Impact | Risk Score | Mitigation Strategy | Owner | Status |
|---------|-----------------|----------|------------|--------|-----------|---------------------|-------|--------|
| R1 | [Specific failure scenario] | Technical | High | High | 9 | [How you'll prevent/reduce] | [Name] | Open |
| R2 | [Specific failure scenario] | Operational | Medium | High | 6 | [How you'll prevent/reduce] | [Name] | Open |
| R3 | [Specific failure scenario] | Organizational | Medium | Medium | 4 | [How you'll prevent/reduce] | [Name] | Open |
| R4 | [Specific failure scenario] | External | Low | High | 3 | [How you'll prevent/reduce] | [Name] | Open |

**Risk Scoring**: Low=1, Medium=2, High=3. Risk Score = Likelihood × Impact.

**Risk Categories**:
- **Technical**: Architecture, code quality, infrastructure, performance
- **Operational**: Processes, runbooks, support, operations
- **Organizational**: Resources, skills, alignment, communication
- **External**: Market, vendors, regulation, dependencies

### 2.3 Risk Mitigation Timeline
- **Pre-Launch**: [Risks to mitigate before going live]
- **Launch Window**: [Monitoring and safeguards during launch]
- **Post-Launch**: [Ongoing monitoring and response]

## 3. Success Metrics

### 3.1 Leading Indicators (Early Signals)
| Metric | Baseline | Target | Measurement Method | Tracking Cadence | Owner |
|--------|----------|--------|-------------------|------------------|-------|
| [Predictive metric 1] | [Current value] | [Goal value] | [How measured] | [How often] | [Name] |
| [Predictive metric 2] | [Current value] | [Goal value] | [How measured] | [How often] | [Name] |
| [Predictive metric 3] | [Current value] | [Goal value] | [How measured] | [How often] | [Name] |

**Examples**: Deployment frequency, code review cycle time, test coverage, incident detection time

### 3.2 Lagging Indicators (Outcome Measures)
| Metric | Baseline | Target | Measurement Method | Tracking Cadence | Owner |
|--------|----------|--------|-------------------|------------------|-------|
| [Outcome metric 1] | [Current value] | [Goal value] | [How measured] | [How often] | [Name] |
| [Outcome metric 2] | [Current value] | [Goal value] | [How measured] | [How often] | [Name] |
| [Outcome metric 3] | [Current value] | [Goal value] | [How measured] | [How often] | [Name] |

**Examples**: Uptime, user adoption rate, revenue impact, customer satisfaction score

### 3.3 Counter-Metrics (What We Won't Sacrifice)
| Metric | Threshold | Monitoring Method | Escalation Trigger |
|--------|-----------|-------------------|-------------------|
| [Protection metric 1] | [Minimum acceptable] | [How monitored] | [When to escalate] |
| [Protection metric 2] | [Minimum acceptable] | [How monitored] | [When to escalate] |
| [Protection metric 3] | [Minimum acceptable] | [How monitored] | [When to escalate] |

**Examples**: Code quality (test coverage > 80%), team morale (happiness score > 7/10), security posture (no critical vulnerabilities), user privacy (zero data leaks)

### 3.4 Success Criteria Summary
**Must-haves (hard requirements)**:
- [Critical success criterion 1]
- [Critical success criterion 2]
- [Critical success criterion 3]

**Should-haves (target achievements)**:
- [Desired outcome 1]
- [Desired outcome 2]
- [Desired outcome 3]

**Nice-to-haves (stretch goals)**:
- [Aspirational outcome 1]
- [Aspirational outcome 2]
```

## Context Questions

**Basics**: What are you building/changing? Why now? Who are the stakeholders?

**Constraints**: When does this need to be delivered? What's the budget? What resources are available?

**Current State**: What exists today? What's the baseline? What are the pain points?

**Desired Outcomes**: What does success look like? What metrics would you track? What are you most worried about?

**Scope**: Greenfield/migration/enhancement? Single or multi-phase? Who is the primary user?

## Specification Guidance

### Scope Definition

**In Scope** should be specific and concrete:
- ✅ "Migrate user authentication service to OAuth 2.0 with PKCE"
- ❌ "Improve authentication"

**Out of Scope** prevents scope creep:
- List explicitly what won't be included in this phase
- Reference future work that might address excluded items
- Example: "Out of Scope: Social login (Google/GitHub) - deferred to Phase 2"

### Requirements Best Practices

**Functional Requirements**: What the system must do
- Use "must" for requirements, "should" for preferences
- Be specific: "System must handle 10K requests/sec" not "System should be fast"
- Include acceptance criteria: How will you verify this requirement is met?

**Non-Functional Requirements**: How well the system must perform
- **Performance**: Quantify with numbers (latency < 200ms, throughput > 1000 req/s)
- **Reliability**: Define uptime SLAs, error budgets, MTTR targets
- **Security**: Specify authentication, authorization, encryption requirements
- **Scalability**: Define growth expectations (users, data, traffic)

### Timeline & Milestones

Make milestones concrete and verifiable:
- ✅ "Milestone 1 (Mar 15): Auth service deployed to staging, 100% auth tests passing"
- ❌ "Milestone 1: Complete most of auth work"

Include buffer for unknowns:
- 80% planned work, 20% buffer for issues/tech debt
- Identify critical path and dependencies clearly

## Premortem Process

### Step 1: Set the Scene

Frame the failure scenario clearly:
> "It's [6/12/24] months from now - [Date]. We launched [initiative name] and it failed catastrophically. [Stakeholders] are upset. The team is demoralized. What went wrong?"

Choose timeframe based on initiative:
- Quick launch: 3-6 months
- Major migration: 12-24 months
- Strategic change: 24+ months

### Step 2: Brainstorm Failures (Independent)

Have each stakeholder privately write 3-5 specific failure modes:
- Be specific: "Database migration lost 10K user records" not "data issues"
- Think from your domain: Engineers focus on technical, PMs on product, Ops on operational
- No filtering: List even unlikely scenarios

### Step 3: Share and Cluster

Collect all failure modes and group similar ones:
- **Technical failures**: System design, code bugs, infrastructure
- **Operational failures**: Runbooks missing, wrong escalation, poor monitoring
- **Organizational failures**: Lack of skills, poor communication, misaligned incentives
- **External failures**: Vendor issues, market changes, regulatory changes

### Step 4: Vote and Prioritize

For each cluster, assess:
- **Likelihood**: How probable is this? (Low 10-30%, Medium 30-60%, High 60%+)
- **Impact**: How bad would it be? (Low = annoying, Medium = painful, High = catastrophic)
- **Risk Score**: Likelihood × Impact (1-9 scale)

Focus mitigation on High-risk items (score 6-9).

### Step 5: Convert to Risk Register

For each significant failure mode:
1. Reframe as a risk: "Risk that [specific scenario] happens"
2. Categorize (Technical/Operational/Organizational/External)
3. Assign owner (who monitors and responds)
4. Define mitigation (how to reduce likelihood or impact)
5. Track status (Open/Mitigated/Accepted/Closed)
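
Scoring and prioritizing the register is mechanical once risks are captured; a minimal sketch (the example risks are hypothetical, scores use the Low=1/Medium=2/High=3 scale above):

```python
# Minimal sketch: score a risk register and surface what needs mitigation first.
risks = [
    {"id": "R1", "desc": "Migration script breaks FK relationships", "likelihood": 3, "impact": 3},
    {"id": "R2", "desc": "Runbooks missing for new service",          "likelihood": 2, "impact": 3},
    {"id": "R3", "desc": "Vendor API rate limits during launch",      "likelihood": 1, "impact": 2},
]

for r in risks:
    r["score"] = r["likelihood"] * r["impact"]   # 1-9 scale

for r in sorted(risks, key=lambda r: r["score"], reverse=True):
    action = "mitigate now" if r["score"] >= 6 else "monitor"
    print(f"{r['id']} (score {r['score']}): {r['desc']} -> {action}")
```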

## Risk Register Template

### Risk Statement Format

Good risk statements are specific and measurable:
- ✅ "Risk that database migration script fails to preserve foreign key relationships, causing data integrity errors in 15% of records"
- ❌ "Risk of data issues"

### Mitigation Strategies

**Reduce Likelihood** (prevent the risk):
- Example: "Implement dry-run migration on production snapshot; verify all FK relationships before live migration"

**Reduce Impact** (limit the damage):
- Example: "Create rollback script tested on staging; set up real-time monitoring for FK violations; keep read replica as backup"

**Accept Risk** (consciously choose not to mitigate):
- For low-impact or very-low-likelihood risks
- Document why it's acceptable: "Accept risk of 3rd-party API rate limiting during launch (low likelihood, workaround available)"

**Transfer Risk** (shift to vendor/insurance):
- Example: "Use managed database service with automated backups and point-in-time recovery (transfers operational risk to vendor)"

### Risk Owners

Each risk needs a clear owner who:
- Monitors early warning signals
- Executes mitigation plans
- Escalates if risk materializes
- Updates risk status regularly

Not necessarily the person who fixes it, but the person accountable for tracking it.

## Metrics Template

### Leading Indicators (Early Signals)

These predict future success before lagging metrics move:
- **Engineering**: Deployment frequency, build time, code review cycle time, test coverage
- **Product**: Feature adoption rate, activation rate, time-to-value, engagement trends
- **Operations**: Incident detection time, MTTD, runbook execution rate, alert accuracy

Choose 3-5 leading indicators that:
1. Predict lagging outcomes (validated correlation)
2. Can be measured frequently (daily/weekly)
3. You can act on quickly (short feedback loop)

### Lagging Indicators (Outcomes)

These measure actual success but appear later:
- **Reliability**: Uptime, error rate, MTTR, SLA compliance
- **Performance**: p50/p95/p99 latency, throughput, response time
- **Business**: Revenue, user growth, retention, NPS, cost savings
- **User**: Activation rate, feature adoption, satisfaction score

Choose 3-5 lagging indicators that:
1. Directly measure initiative goals
2. Are measurable and objective
3. Matter to stakeholders

### Counter-Metrics (Guardrails)

What success does NOT mean sacrificing:
- If optimizing for speed → Counter-metric: Code quality (test coverage, bug rate)
- If optimizing for growth → Counter-metric: Costs (infrastructure spend, CAC)
- If optimizing for features → Counter-metric: Technical debt (cycle time, deployment frequency)

Choose 2-3 counter-metrics to:
1. Prevent gaming the system
2. Protect long-term sustainability
3. Maintain team/user trust

## Instrumentation Guidance

**Baseline**: Measure current state before launch (✅ "p99 latency: 500ms" ❌ "API is slow"). Without a baseline, you can't measure improvement.

**Targets**: Make them specific ("reduce p99 from 500ms to 200ms"), achievable (industry benchmarks), time-bound ("by Q2 end"), meaningful (tied to outcomes).

**Measurement**: Document data source, calculation method, measurement frequency, and who can access the metrics.

**Tracking Cadence**: Real-time (system health), daily (operations), weekly (product), monthly (business).

## Quality Checklist

Before delivering, verify:

**Specification Complete:**
- [ ] Goal, stakeholders, timeline clearly stated
- [ ] Current state baseline documented with data
- [ ] Scope (in/out/future) explicitly defined
- [ ] Requirements are specific and measurable
- [ ] Dependencies and assumptions listed
- [ ] Timeline has concrete milestones

**Risks Comprehensive:**
- [ ] Premortem conducted (failure imagination exercise)
- [ ] Risks span technical, operational, organizational, external
- [ ] Each risk has likelihood, impact, score
- [ ] High-risk items (6-9 score) have detailed mitigation plans
- [ ] Every risk has an assigned owner
- [ ] Risk register is prioritized by score

**Metrics Measurable:**
- [ ] 3-5 leading indicators (early signals) defined
- [ ] 3-5 lagging indicators (outcomes) defined
- [ ] 2-3 counter-metrics (guardrails) defined
- [ ] Each metric has baseline, target, measurement method
- [ ] Metrics have clear owners and tracking cadence
- [ ] Success criteria (must/should/nice-to-have) documented

**Integration:**
- [ ] Risks map to specs (e.g., technical risks tied to architecture choices)
- [ ] Metrics validate risk mitigations (e.g., measure if mitigation worked)
- [ ] Specs enable metrics (e.g., instrumentation built into design)
- [ ] All three components tell a coherent story

**Rubric Validation:**
- [ ] Self-assessed with rubric ≥ 3.5 average across all criteria
- [ ] Stakeholders can act on this artifact
- [ ] Gaps and unknowns explicitly acknowledged
249
skills/chef-assistant/SKILL.md
Normal file
@@ -0,0 +1,249 @@
---
name: chef-assistant
description: Use when cooking or planning meals, troubleshooting recipes, learning culinary techniques (knife skills, sauces, searing), understanding food science (Maillard reaction, emulsions, brining), building flavor profiles (salt/acid/fat/heat balance), plating and presentation, exploring global cuisines and cultural food traditions, diagnosing taste problems, requesting substitutions or pantry hacks, planning menus, or when users mention cooking, recipes, chef, cuisine, flavor, technique, plating, food science, seasoning, or culinary questions.
---

# Chef Assistant

## Table of Contents
- [Purpose](#purpose)
- [When to Use](#when-to-use)
- [What Is It?](#what-is-it)
- [Workflow](#workflow)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

Chef Assistant helps you cook with confidence by combining:

- **Culinary technique** (knife skills, sauces, searing, braising)
- **Food science** (why things work: Maillard, emulsions, brining)
- **Flavor architecture** (salt, acid, fat, heat, sweet, bitter, umami, aroma, texture)
- **Cultural context** (how cuisines solve similar problems)
- **Home cooking pragmatism** (substitutions, shortcuts, pantry hacks)
- **Presentation clarity** (plating principles for home cooks)

This moves you from following recipes blindly to understanding principles, so you can improvise, troubleshoot, and create.

## When to Use

Use this skill when:

- **Cooking guidance**: Step-by-step recipe execution with technique tips
- **Technique learning**: Knife skills, sauces, searing, braising, baking fundamentals
- **Flavor troubleshooting**: Dish too salty, sour, spicy, bitter, or greasy
- **Menu planning**: Designing multi-course meals with flavor/texture progression
- **Ingredient substitutions**: What to use when the pantry is missing key ingredients
- **Food science questions**: Why does brining work? What is the Maillard reaction?
- **Plating guidance**: How to present food with visual appeal
- **Cultural exploration**: Learning about cuisines, traditional techniques, food history
- **Recipe creation**: Developing new dishes based on flavor principles
- **Scaling and timing**: Adjusting recipes or coordinating multi-dish timing

Trigger phrases: "how do I cook", "recipe for", "why is my dish", "what can I substitute", "how to plate", "food science", "cuisine", "flavor balance", "too salty", "chef tips"

## What Is It?

Chef Assistant is a culinary guide that teaches cooking through principles, not just steps. Drawing from technique teachers (Pépin, Child, Kenji), food scientists (McGee, Sharma), and cultural explorers (Bourdain, Chang), it provides:

1. **Why before how**: Science and rationale before steps
2. **Flavor mixing board**: Balance 10 elements (salt, acid, fat, heat, sweet, bitter, umami, aroma, water, texture)
3. **Sensory cues + precision**: "Sear until edges brown and smell nutty" + "145°F for medium"
4. **Reproducible methods**: Ratios and templates (3:1 vinaigrette, 0.8-1.2% dry brine)
5. **Cultural context**: How other cuisines approach similar challenges

**Quick example:**

**User asks**: "My chicken breast is always dry. What am I doing wrong?"

**Chef Assistant response**:
1. **Diagnosis**: Likely overcooked. Chicken breast has little fat, so it goes from juicy (145-150°F) to dry (165°F+) fast.
2. **Technique fix**: Use a meat thermometer, pull at 150-155°F (carryover brings it to a safe 160°F).
3. **Flavor fix**: Dry brine (1% salt by weight, 6-24h) keeps it juicy.
4. **Science**: Salt denatures proteins, allowing them to hold more water during cooking.
5. **Texture add**: Finish with crispy skin or a crunchy topping for contrast.
6. **Cultural reference**: Japanese yakitori uses skin-on thighs for fat insurance; French paillard pounds thin to cook fast before drying.

**Result**: User understands the problem (overcooking), the science (protein structure), the solutions (temp + brine), and the context.

## Workflow

Copy this checklist and track your progress:

```
Chef Assistant Progress:
- [ ] Step 1: Define cooking goal and constraints
- [ ] Step 2: Identify key techniques and principles
- [ ] Step 3: Build flavor architecture
- [ ] Step 4: Plan texture and contrast
- [ ] Step 5: Execute with sensory cues and precision
- [ ] Step 6: Plate and present with intention
```

**Step 1: Define cooking goal**

Specify what you're making, dietary constraints, equipment available, skill level, and time budget. Identify if the goal is recipe execution, technique learning, flavor troubleshooting, menu planning, or cultural exploration.

**Step 2: Identify techniques**

Break down required techniques (knife cuts, searing, emulsions, braising). Explain why each technique matters and provide sensory cues for success. Reference [resources/template.md](resources/template.md) for technique breakdowns.

**Step 3: Build flavor architecture**

Layer flavors in stages:
- **Baseline**: Cook aromatics (onions, garlic), toast spices, develop fond
- **Season**: Salt at multiple stages (not just end)
- **Enrich**: Add fat (butter, oil, cream) for body and carrying aroma
- **Contrast**: Balance with acid, heat, or bitter
- **Finish**: Fresh herbs, citrus zest, flaky salt, drizzle

See [resources/methodology.md](resources/methodology.md#flavor-systems) for advanced flavor pairing.

**Step 4: Plan texture**

Every dish should have at least one contrast:
- **Crunch vs cream**: Crispy shallots on creamy soup
- **Hot vs cold**: Warm pie with cold ice cream
- **Soft vs chewy**: Tender braised meat with crusty bread
- **Smooth vs chunky**: Pureed sauce with coarse garnish

**Step 5: Execute with precision**

Provide clear steps with both sensory cues and measurements:
- **Sensory**: "Sear until deep golden and smells nutty"
- **Precision**: "145°F internal temp, 3-4 min per side"
- **Timing**: Mise en place order, multitasking flow
- **Checkpoints**: Visual, aroma, sound, texture markers

**Step 6: Plate and present**

Apply plating principles:
- **Color**: Contrast (green herb on brown meat)
- **Height**: Build vertical interest
- **Negative space**: Don't crowd the plate
- **Odd numbers**: 3 or 5 items, not 4 or 6
- **Restraint**: Less is more, showcase hero ingredient

Self-assess using [resources/evaluators/rubric_chef_assistant.json](resources/evaluators/rubric_chef_assistant.json). **Minimum standard**: Average score ≥ 3.5.

## Common Patterns

**Pattern 1: Recipe Execution with Technique Teaching**
- **Goal**: Cook a specific dish while learning transferable skills
- **Approach**: Provide recipe with embedded technique explanations (why we sear, why we rest meat, why we add acid)
- **Key elements**: Mise en place checklist, timing flow, sensory cues + precision temps, technique sidebars
- **Output**: Completed dish + understanding of 2-3 techniques applicable to other recipes
- **Example**: Making pan-seared steak → learn Maillard reaction, resting for juice redistribution, pan sauce from fond

**Pattern 2: Flavor Troubleshooting**
- **Goal**: Fix dish that tastes off (too salty, sour, flat, greasy)
- **Approach**: Diagnose imbalance, explain why it happened, provide corrective actions
- **Key framework**: Salt/acid/fat/heat/sweet/bitter/umami balance wheel
- **Corrections**:
  - Too salty → bulk/dilute, acid, fat
  - Too sour → fat, sweet, salt
  - Too spicy → dairy, sweet, starch
  - Flat/boring → salt first, then acid or umami
  - Too greasy → acid + salt + crunch
- **Output**: Rescued dish + understanding of flavor balance
- **Example**: Soup too salty → add unsalted stock (bulk), squeeze lemon (acid masks salt), swirl in cream (fat softens)

**Pattern 3: Technique Deep Dive**
- **Goal**: Master specific technique (knife skills, mother sauces, emulsions, braising)
- **Approach**: Explain principle, demonstrate technique, provide practice path
- **Structure**: Why it matters → science/mechanics → step-by-step → common mistakes → practice exercises
- **Output**: Reproducible technique skill
- **Example**: Emulsions (vinaigrette, mayo, hollandaise) → explain emulsification (fat suspended in water via emulsifier) → show whisking technique → troubleshoot breaking → practice progression

**Pattern 4: Menu Planning with Progression**
- **Goal**: Design multi-course meal with intentional flavor/texture progression
- **Approach**: Map courses for variety in flavor intensity, temperature, texture, cooking method
- **Progression principles**:
  - Light → heavy (don't peak too early)
  - Fresh → rich (acid/herbs early, fat/umami later)
  - Texture variety (alternate crispy, creamy, chewy)
  - Palate cleansers (sorbet between courses)
- **Output**: Balanced menu with timing plan
- **Example**: 4-course dinner → bright salad with citrus vinaigrette → seafood with white wine sauce → braised short rib with root vegetables → light citrus tart

**Pattern 5: Cultural Cooking Exploration**
- **Goal**: Learn cuisine through its principles, not just recipes
- **Approach**: Identify flavor base, core techniques, ingredient philosophy, cultural context
- **Elements**: Aromatic base (mirepoix, sofrito, trinity), signature spices, cooking methods, meal structure
- **Output**: Understanding of cuisine's logic + 2-3 signature recipes
- **Example**: Thai cuisine → balance sweet/sour/salty/spicy in every dish, use fish sauce for umami, emphasize fresh herbs, contrasting textures (crispy + soft)

## Guardrails

**Critical requirements:**

1. **Salt at multiple stages**: Don't season only at the end. Season proteins before cooking (dry brine or salt 30min+ ahead), season base aromatics, season sauce, finish with flaky salt for texture.

2. **Use a meat thermometer**: Visual cues alone are unreliable. Invest in an instant-read thermometer. Pull temps: chicken 150-155°F (carries to 160°F), pork 135-140°F (medium), steak 125-130°F (medium-rare), fish 120-130°F depending on type.

3. **Taste as you go**: Adjust seasoning incrementally. Add salt/acid/fat in small amounts, taste, repeat. Can't un-salt, but can always add more.

4. **Mise en place before heat**: Prep everything before you start cooking. Dice all vegetables, measure spices, prep aromatics. High-heat cooking moves fast - no time to chop mid-sear.

5. **Control heat**: Most home cooks cook too hot. High heat for searing only. Medium for sautéing aromatics. Low for sauces and gentle cooking. Preheat pans properly (water droplet test).

6. **Rest meat after cooking**: Allow proteins to rest 5-10 min after cooking (longer for roasts). Juices redistribute, carryover cooking completes. Tent with foil if worried about cooling.

7. **Acid brightens**: If a dish tastes flat despite salt, add acid (lemon, lime, vinegar, tomato). Acid wakes up flavors and balances richness.

8. **Fat carries flavor**: Aroma compounds are fat-soluble. Toast spices in oil/butter to release flavor. Finish sauces with fat for body and sheen.

**Common pitfalls:**

- ❌ **Overcrowding the pan**: Creates steam, not sear. Leave space between items. Cook in batches if needed.
- ❌ **Moving food too much**: Let it sit to develop crust. Don't flip steak 10 times - flip once.
- ❌ **Cold ingredients into a hot pan**: Bring meat to room temp (30-60 min) before searing. Cold center = overcooked outside.
- ❌ **Using dull knives**: Dull knives slip and are dangerous. Sharp knives cut cleanly with control. Hone regularly, sharpen periodically.
- ❌ **Ignoring carryover cooking**: Meat continues cooking after removal from heat. Pull 5-10°F below target temp.
- ❌ **Undersalting**: Most home cooking is undersalted. Professional rule: season boldly at each stage.

## Quick Reference

**Key resources:**

- **[resources/template.md](resources/template.md)**: Recipe template, technique breakdown template, flavor troubleshooting guide, menu planning template, plating guide
- **[resources/methodology.md](resources/methodology.md)**: Advanced cooking science, professional techniques, flavor pairing systems, cultural cooking methods, advanced troubleshooting
- **[resources/evaluators/rubric_chef_assistant.json](resources/evaluators/rubric_chef_assistant.json)**: Quality criteria for cooking guidance and execution

**Quick ratios and formulas:**

- **Vinaigrette**: 3:1 oil:acid + mustard emulsifier + salt (thin with water to taste)
- **Pan sauce**: Fond + ¼-½ cup liquid → reduce by half → swirl 1-2 Tbsp cold butter → acid/herb
- **Quick pickle**: 1:1:1 water:vinegar:sugar + 2-3% salt
- **Dry brine**: 0.8-1.2% salt by weight (fish 30-90 min, chicken 6-24h, roasts 24-48h); see the worked salt math below
- **Pasta water ratio**: 1% salt by weight (10g salt per liter water)
- **Roux ratio**: Equal parts fat and flour by weight (melt fat, whisk in flour, cook 2-10 min depending on color)
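
The percent-by-weight ratios reduce to one multiplication; a quick sketch (food weights are hypothetical):

```python
# Minimal salt-by-weight sketch for the ratios above.
def salt_grams(food_grams: float, percent: float) -> float:
    """Grams of salt for a given food weight and salt percentage."""
    return food_grams * percent / 100

print(f"Dry brine, 1.5 kg chicken at 1.0%: {salt_grams(1500, 1.0):.0f} g salt")   # 15 g
print(f"Pasta water, 4 L at 1.0%: {salt_grams(4000, 1.0):.0f} g salt")            # 40 g
print(f"Quick pickle brine, 500 g liquid at 2.5%: {salt_grams(500, 2.5):.1f} g")  # 12.5 g
```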

**Typical workflow time:**

- Recipe execution guidance: 5-15 minutes (depending on recipe complexity)
- Technique teaching: 10-20 minutes (includes explanation + practice guidance)
- Flavor troubleshooting: 5-10 minutes (diagnosis + corrections)
- Menu planning: 15-30 minutes (multi-course with timing)
- Cultural cuisine exploration: 20-40 minutes (principles + 2-3 recipes)

**When to escalate:**

- Advanced pastry (tempering chocolate, laminated doughs)
- Molecular gastronomy (spherification, sous vide precision)
- Professional butchery and charcuterie
- Large-scale catering logistics
- Specialized dietary needs (medical diets, severe allergies)
→ Consult specialized culinary resources or professionals

**Inputs required:**

- **Cooking goal**: What you want to make or learn
- **Constraints**: Dietary, equipment, time, skill level
- **Current state** (if troubleshooting): What's wrong with the dish
- **Ingredients available**: What you're working with

**Outputs produced:**

- `chef-assistant-guide.md`: Complete cooking guide with recipe, techniques, flavor architecture, plating guidance, and cultural context
313
skills/chef-assistant/resources/evaluators/rubric_chef_assistant.json
Normal file
@@ -0,0 +1,313 @@
{
  "criteria": [
    {
      "name": "Flavor Architecture & Balance",
      "description": "Is the flavor profile well-balanced across salt, acid, fat, heat, sweet, bitter, umami, and aroma?",
      "scoring": {
        "1": "Single-dimensional flavor. No balance attempted. Missing key elements (e.g., no acid to balance richness, no salt depth). Tastes flat or overwhelming in one dimension.",
        "3": "Basic balance attempted. Salt and acid present but may not be fully integrated. Some flavor elements considered. Could be more nuanced or layered.",
        "5": "Exemplary balance: salt at multiple stages, acid brightens without dominating, fat carries aroma, umami provides depth, texture provides contrast. Flavors build in layers (baseline → season → enrich → contrast → finish). Each element supports others."
      }
    },
    {
      "name": "Technique Execution & Precision",
      "description": "Are cooking techniques properly executed with attention to timing, temperature, and sensory cues?",
      "scoring": {
        "1": "Poor technique execution. Overcooking, under-seasoning, or improper heat control evident. No use of thermometer or sensory checkpoints. Timing haphazard.",
        "3": "Adequate technique. Basic execution (searing, sautéing) performed but may lack refinement. Some attention to temperature and timing. Could use more precision or sensory awareness.",
        "5": "Professional technique execution. Proper searing (high heat, dry surface, crust development), accurate temperature control (thermometer used, pull temps correct), sensory checkpoints applied (smell, color, sound, texture). Timing orchestrated for multi-component dishes. Resting, carryover cooking understood."
      }
    },
    {
      "name": "Texture & Contrast",
      "description": "Does the dish incorporate textural variety and contrast (crispy/creamy, hot/cold, soft/chewy)?",
      "scoring": {
        "1": "Single texture throughout. No contrast (all soft, all mushy). Monotonous mouthfeel. No consideration of textural elements.",
        "3": "Some textural variety. One contrast present (e.g., crispy topping on soft base). Could be more intentional or pronounced.",
        "5": "Deliberate textural architecture: at least one clear contrast (crispy shallots on creamy soup, crunchy slaw with tender meat, hot protein with cold sauce). Multiple textures create sensory excitement. Textural elements added at proper time (crispy elements at end to preserve crunch)."
      }
    },
    {
      "name": "Science & Rationale",
      "description": "Is the cooking approach grounded in food science with clear explanations of why techniques work?",
      "scoring": {
        "1": "No scientific rationale provided. Instructions are rote steps without explanation. No understanding of Maillard, protein denaturation, emulsification, or other key concepts.",
        "3": "Basic science mentioned. Some explanations for techniques (e.g., 'sear for crust'). Could go deeper into mechanisms (why protein denatures, how emulsions form).",
        "5": "Comprehensive scientific grounding: Maillard reaction explained (temp, dry surface, browning), protein denaturation stages (120-180°F), emulsification technique (slow oil addition, emulsifier role), starch gelatinization, carryover cooking. Science informs technique choices and troubleshooting."
      }
    },
    {
      "name": "Seasoning Strategy",
      "description": "Is seasoning (salt, acid, fat) applied strategically at multiple stages, not just at the end?",
      "scoring": {
        "1": "Single-stage seasoning (only at end). Undersalted or oversalted. No acid balance. Fat not used to carry flavor. No tasting and adjustment protocol.",
        "3": "Basic seasoning strategy. Salt added at beginning and end. Acid or fat considered. Some tasting, but could be more systematic.",
        "5": "Multi-stage seasoning: salt proteins before cooking (dry brine or 30+ min ahead), season aromatics, season sauce throughout, finish with flaky salt. Acid added at end to brighten. Fat used to carry aroma (toast spices in oil). Taste-and-adjust protocol followed (small increments, taste, repeat)."
      }
    },
    {
      "name": "Mise en Place & Workflow",
      "description": "Is proper mise en place practiced with organized prep, timing, and execution flow?",
      "scoring": {
        "1": "No mise en place. Ingredients not prepped before cooking starts. Scrambling to chop mid-sauté. Poor timing (components not ready simultaneously). Disorganized workflow.",
        "3": "Basic mise en place. Most ingredients prepped ahead, but some gaps. Timing mostly works out. Could be more organized or efficient.",
        "5": "Professional mise en place discipline: all ingredients prepped and arranged in cooking order before heat applied, timing backwards-planned from service time, multitasking and orchestration evident (oven and stovetop used efficiently), holding strategies for components, last-minute tasks clearly identified."
      }
    },
    {
      "name": "Plating & Presentation",
      "description": "Is the dish plated with attention to color, height, negative space, and visual appeal?",
      "scoring": {
        "1": "No plating consideration. Food dumped on plate. Overcrowded, no negative space. Dirty rim. No garnish or finish. Visually unappealing.",
        "3": "Basic plating. Components arranged, not dumped. Some attention to color or placement. Clean rim. Could use more height, negative space, or intentional garnish.",
        "5": "Intentional plating: color contrast (green herb on brown protein), height built (not flat), negative space (clean rim, not overcrowded), odd numbers (3 or 5 elements), restraint (hero ingredient showcased), finishing touches (flaky salt, oil drizzle, fresh herb at end)."
      }
    },
    {
      "name": "Cultural Context & Authenticity",
      "description": "Is cultural context provided when relevant, with respect for traditional techniques and flavors?",
      "scoring": {
        "1": "No cultural context. Techniques or ingredients used without understanding origin or significance. Inauthentic or disrespectful adaptations. Food stripped of story.",
        "3": "Basic cultural awareness. Origin mentioned. Some traditional techniques acknowledged. Could provide more depth or context.",
        "5": "Rich cultural context: origin of dish or technique explained, traditional approach honored, variations across regions noted, cultural significance shared (when dish is eaten, what it represents). Adaptations made thoughtfully with awareness of trade-offs. Food treated as story and connection, not just flavor."
      }
    },
    {
      "name": "Troubleshooting & Problem-Solving",
      "description": "Are clear troubleshooting strategies provided for common problems (too salty, flat, broken sauce)?",
      "scoring": {
        "1": "No troubleshooting guidance. If something goes wrong, no recovery strategy. Doesn't anticipate common problems. No diagnostic framework.",
        "3": "Basic troubleshooting. Some common problems addressed (e.g., 'add lemon if flat'). Could be more systematic or comprehensive in problem diagnosis and correction.",
        "5": "Comprehensive troubleshooting framework: diagnostic tree for flavor imbalances (salt → acid → fat → umami sequence), corrective actions with ratios (½ tsp acid, 1-2 Tbsp butter), emulsion rescue techniques, burnt vs. charred distinction, sauce breaking recovery. Prevention strategies included."
      }
    },
    {
      "name": "Substitutions & Adaptability",
      "description": "Are ingredient substitutions and adaptations provided for dietary needs, pantry constraints, or preferences?",
      "scoring": {
        "1": "No substitutions offered. Rigid recipe. No adaptations for dietary restrictions, missing ingredients, or skill levels. Not accessible.",
        "3": "Basic substitutions mentioned. Some alternatives for key ingredients. Could offer more options or explain impact of substitutions on flavor/texture.",
        "5": "Flexible and adaptable: comprehensive substitutions with ratios (butter → olive oil at 75% amount, garlic powder ⅛ tsp per clove), dietary adaptations (dairy-free, gluten-free options), scaling guidance (up and down), skill-level modifications (shortcuts for weeknights, advanced techniques for enthusiasts). Impact of substitutions explained (flavor, texture differences)."
      }
    }
  ],
  "minimum_score": 3.5,
"guidance_by_cooking_type": {
|
||||
"Recipe Creation": {
|
||||
"target_score": 4.2,
|
||||
"focus_criteria": [
|
||||
"Flavor Architecture & Balance",
|
||||
"Technique Execution & Precision",
|
||||
"Seasoning Strategy"
|
||||
],
|
||||
"key_requirements": [
|
||||
"Complete mise en place list with prep notes",
|
||||
"Multi-stage seasoning (not just end)",
|
||||
"Sensory cues + precision temps/times",
|
||||
"Flavor balance across salt/acid/fat/umami",
|
||||
"At least one texture contrast",
|
||||
"Troubleshooting guidance included",
|
||||
"Substitutions for key ingredients"
|
||||
],
|
||||
"common_pitfalls": [
|
||||
"Single-stage seasoning (only salting at end)",
|
||||
"No sensory checkpoints (only times, no 'golden brown and fragrant')",
|
||||
"Missing texture contrast (all soft or all crunchy)",
|
||||
"No troubleshooting (what if too salty, flat, overcooked?)",
|
||||
"Rigid recipe (no substitutions or adaptations)"
|
||||
]
|
||||
},
|
||||
"Technique Teaching": {
|
||||
"target_score": 4.3,
|
||||
"focus_criteria": [
|
||||
"Technique Execution & Precision",
|
||||
"Science & Rationale",
|
||||
"Troubleshooting & Problem-Solving"
|
||||
],
|
||||
"key_requirements": [
|
||||
"Explain WHY technique matters (principle before steps)",
|
||||
"Food science grounded (Maillard, denaturation, emulsification)",
|
||||
"Step-by-step with sensory cues at each stage",
|
||||
"Common mistakes identified with prevention strategies",
|
||||
"Practice progression (beginner → intermediate → advanced)",
|
||||
"Related techniques noted (when to use which)"
|
||||
],
|
||||
"common_pitfalls": [
|
||||
"Steps without rationale ('do this because I said so')",
|
||||
"No sensory cues (only temps, no visual/aroma/sound indicators)",
|
||||
"Missing common mistakes section (learners repeat same errors)",
|
||||
"No practice progression (thrown into deep end)",
|
||||
"Technique taught in isolation (no connection to broader cooking)"
|
||||
]
|
||||
},
|
||||
"Flavor Troubleshooting": {
|
||||
"target_score": 4.0,
|
||||
"focus_criteria": [
|
||||
"Flavor Architecture & Balance",
|
||||
"Troubleshooting & Problem-Solving",
|
||||
"Seasoning Strategy"
|
||||
],
|
||||
"key_requirements": [
|
||||
"Diagnostic framework (identify primary imbalance first)",
|
||||
"Corrective actions with specific amounts (½ tsp acid, 1-2 Tbsp butter)",
|
||||
"Prioritized fixes (Priority 1, 2, 3 sequence)",
|
||||
"Explanation of why correction works (acid masks salt, fat softens)",
|
||||
"Prevention strategy for future (salt at multiple stages, taste as you go)",
|
||||
"Salvage strategies if beyond fixing (repurpose, dilute, companion pairing)"
|
||||
],
|
||||
"common_pitfalls": [
|
||||
"Vague corrections ('add more seasoning' without specifics)",
|
||||
"No diagnostic framework (throwing random fixes at problem)",
|
||||
"Missing prevention guidance (doesn't help avoid problem next time)",
|
||||
"No salvage strategy (when dish is beyond fixing)",
|
||||
"Ignoring multiple imbalances (only addressing salt, missing acid/fat)"
|
||||
]
|
||||
},
|
||||
"Menu Planning": {
|
||||
"target_score": 3.9,
|
||||
"focus_criteria": [
|
||||
"Flavor Architecture & Balance",
|
||||
"Texture & Contrast",
|
||||
"Mise en Place & Workflow"
|
||||
],
|
||||
"key_requirements": [
|
||||
"Flavor progression across courses (light → heavy, acid → rich → sweet)",
|
||||
"Texture variety (alternate crispy, creamy, tender, crunchy)",
|
||||
"Temperature variety (cold → hot → hot → cold)",
|
||||
"Cooking method diversity (raw, sautéed, braised, baked)",
|
||||
"Timing plan (backwards from service time)",
|
||||
"Prep schedule (day before, morning of, 2h before, last minute)",
|
||||
"Holding strategies and contingency plans"
|
||||
],
|
||||
"common_pitfalls": [
|
||||
"No flavor progression (all heavy, or all rich, or no acid breaks)",
|
||||
"Texture repetition (everything soft, or everything crispy)",
|
||||
"Poor timing (everything needs oven simultaneously)",
|
||||
"No prep schedule (scrambling at service time)",
|
||||
"Missing contingency plans (no backup if dish fails)"
|
||||
]
|
||||
},
|
||||
"Cultural Cooking": {
|
||||
"target_score": 4.1,
|
||||
"focus_criteria": [
|
||||
"Cultural Context & Authenticity",
|
||||
"Flavor Architecture & Balance",
|
||||
"Technique Execution & Precision"
|
||||
],
|
||||
"key_requirements": [
|
||||
"Origin and cultural significance explained",
|
||||
"Traditional techniques honored and explained",
|
||||
"Regional variations noted",
|
||||
"Flavor philosophy of cuisine (Thai balance, Japanese umami, French refinement)",
|
||||
"Key ingredients and their roles (fish sauce in Thai, dashi in Japanese)",
|
||||
"Respectful adaptations (if substitutions needed, acknowledge trade-offs)",
|
||||
"Food as story (when eaten, what it represents)"
|
||||
],
|
||||
"common_pitfalls": [
|
||||
"No cultural context (technique without origin or significance)",
|
||||
"Inauthentic fusion (random mixing without understanding)",
|
||||
"Ignoring traditional methods (shortcuts that lose essence)",
|
||||
"Missing regional variations (cuisine treated as monolithic)",
|
||||
"Substitutions without acknowledgment (authenticity claimed incorrectly)"
|
||||
]
|
||||
}
|
||||
},
|
||||
"guidance_by_complexity": {
|
||||
"Simple (Beginner)": {
|
||||
"target_score": 3.5,
|
||||
"characteristics": "1-2 techniques, 30 min or less, 5-10 ingredients, forgiving (hard to ruin), minimal equipment",
|
||||
"focus": "Core technique execution (searing, sautéing), basic flavor balance (salt + acid + fat), one texture contrast, mise en place practice",
|
||||
"examples": "Pan-seared chicken breast with lemon butter sauce, simple vinaigrette salad, roasted vegetables, pasta with garlic oil",
|
||||
"scoring_priorities": [
|
||||
"Technique Execution (proper heat, sensory cues)",
|
||||
"Seasoning Strategy (multi-stage salting)",
|
||||
"Troubleshooting (basic fixes for common problems)"
|
||||
]
|
||||
},
|
||||
"Moderate (Intermediate)": {
|
||||
"target_score": 4.0,
|
||||
"characteristics": "3-4 techniques, 45-90 min, 10-15 ingredients, some precision required, timing coordination needed",
|
||||
"focus": "Multi-component dishes (protein + sauce + side), flavor layering (aromatics → seasoning → enriching), texture contrasts (2+), orchestration of timing",
|
||||
"examples": "Pan-seared steak with pan sauce and roasted vegetables, risotto with sautéed mushrooms, braised chicken thighs with wine sauce",
|
||||
"scoring_priorities": [
|
||||
"Flavor Architecture (layered, balanced across elements)",
|
||||
"Mise en Place & Workflow (timing coordination)",
|
||||
"Texture & Contrast (intentional variety)"
|
||||
]
|
||||
},
|
||||
"Complex (Advanced)": {
|
||||
"target_score": 4.3,
|
||||
"characteristics": "5+ techniques, 2-4 hours or multi-day, 15+ ingredients, precision critical, advanced methods (sous vide, fermentation, emulsions)",
|
||||
"focus": "Advanced techniques (emulsions, braising, confit), complex flavor profiles (5+ elements balanced), multi-course orchestration, plating finesse, cultural authenticity",
|
||||
"examples": "Duck confit with orange gastrique and roasted root vegetables, beef Wellington, multi-course tasting menu, mole sauce from scratch",
|
||||
"scoring_priorities": [
|
||||
"Science & Rationale (deep understanding of mechanisms)",
|
||||
"Cultural Context (authenticity and respect)",
|
||||
"Plating & Presentation (restaurant-quality finesse)",
|
||||
"All criteria should score 4+ for complex dishes"
|
||||
]
|
||||
}
|
||||
},
|
||||
"common_failure_modes": [
|
||||
{
|
||||
"failure": "Single-stage seasoning (only at end)",
|
||||
"symptom": "Dish tastes flat despite adding salt at end. Protein bland on inside. Sauce flavor sits 'on top' rather than integrated.",
|
||||
"detection": "Check when salt is added. If only at end, that's the issue. Ask: 'Did you salt the protein before cooking? Did you season the aromatics?'",
|
||||
"fix": "Multi-stage seasoning: (1) Salt proteins 30+ min before cooking (or dry brine), (2) season aromatics as they cook, (3) season sauce throughout cooking, (4) finish with flaky salt for texture. Salt penetrates over time, so early salting = deeper flavor."
|
||||
},
|
||||
{
|
||||
"failure": "No texture contrast (monotonous mouthfeel)",
|
||||
"symptom": "Dish is one texture throughout: all soft (mashed potatoes + braised meat + creamy sauce), all crunchy, or all mushy. Boring to eat despite good flavor.",
|
||||
"detection": "Assess textures in dish. Is there variety? Crispy/creamy, hot/cold, soft/chewy, smooth/chunky contrasts present?",
|
||||
"fix": "Add contrasting element: crispy shallots on creamy soup, toasted nuts on soft vegetables, crusty bread with tender braise, cold crème fraîche on hot soup, crunchy slaw with soft pulled pork. Add crispy elements at end to preserve crunch."
|
||||
},
|
||||
{
|
||||
"failure": "Overcooking protein (dry, tough meat)",
|
||||
"symptom": "Chicken breast dry, steak gray throughout, fish falling apart. Overcooked beyond target temp.",
|
||||
"detection": "Check if thermometer used. What pull temp? Was carryover cooking (5-10°F rise) accounted for? Visual cues alone are unreliable.",
|
||||
"fix": "Use instant-read thermometer. Pull 5-10°F below target: chicken breast 150-155°F (carries to 160°F), steak 125-130°F for medium-rare, fish 120-125°F. Rest protein 5-10 min (tented with foil). Prevent: Dry brine (salt 6-24h ahead) helps retain moisture."
|
||||
},
|
||||
{
|
||||
"failure": "Flat flavor despite salt (missing acid or umami)",
|
||||
"symptom": "Dish adequately salted but still tastes flat, one-dimensional, or boring. No brightness or depth.",
|
||||
"detection": "Salt present, but dish lacks excitement? Likely missing acid (lemon, vinegar) or umami (parmesan, soy, mushroom, tomato). Ask: 'Is there acid to brighten? Umami for depth?'",
|
||||
"fix": "Add acid first (½ tsp lemon juice or vinegar), taste. If still flat, add umami (1 Tbsp parmesan, ½ tsp soy sauce, or 1 tsp tomato paste). Fresh herbs or citrus zest also add aroma (perception of flavor)."
|
||||
},
|
||||
{
|
||||
"failure": "Broken emulsion (sauce separates or curdles)",
|
||||
"symptom": "Vinaigrette separates into oil and vinegar layers. Hollandaise curdles. Butter sauce breaks into greasy pool. Mayo turns to liquid.",
|
||||
"detection": "Sauce looks separated, greasy, or curdled rather than smooth and cohesive. Did oil get added too fast? Was heat too high (hollandaise)? Was it over-whisked?",
|
||||
"fix": "Rescue: Start with fresh emulsifier in clean bowl (egg yolk for hollandaise/mayo, mustard for vinaigrette). Slowly whisk broken sauce back in (dropwise at first, then faster). For butter sauce: lower heat, add cold butter gradually. Prevention: Add fat slowly, don't overheat."
|
||||
},
|
||||
{
|
||||
"failure": "Overcrowding pan (steaming instead of searing)",
|
||||
"symptom": "Protein gray and steamed rather than golden-brown seared. No crust, no fond in pan. Mushrooms release water and become soggy.",
|
||||
"detection": "Check pan: Are items touching each other with no space? Is there steam visible? Food releases liquid and doesn't brown.",
|
||||
"fix": "Leave space between items (at least ½ inch). Cook in batches if needed. Ensure pan is hot before adding food (water droplet test: droplet should sizzle and evaporate in 2-3 sec). Pat protein dry before searing. Use large pan or multiple pans for batch cooking."
|
||||
},
|
||||
{
|
||||
"failure": "Poor mise en place (scrambling mid-cook)",
|
||||
"symptom": "Chopping garlic while onions burn. Measuring spices while sauce reduces too much. Ingredients not ready when needed. High-heat cooking becomes chaotic.",
|
||||
"detection": "Is cook scrambling to prep mid-execution? Are things burning or overcooking while other tasks are completed?",
|
||||
"fix": "Prep all ingredients before heat is applied: dice all vegetables, measure all spices, prepare all aromatics, bring proteins to room temp. Arrange in cooking order (left to right). Read recipe completely before starting. For high-heat cooking (stir-fry, searing), absolutely everything must be prepped first."
|
||||
},
|
||||
{
|
||||
"failure": "No carryover cooking consideration (overshooting target temp)",
|
||||
"symptom": "Pulled protein at target temp (e.g., 130°F for medium-rare), but after resting it's 140°F (medium). Overshot doneness.",
|
||||
"detection": "Check what temp protein was pulled vs. final temp after resting. Was it pulled at target or 5-10°F below?",
|
||||
"fix": "Pull 5-10°F below target temp: steak at 125-130°F for 135°F final (medium-rare), chicken at 150-155°F for 160°F final. Larger/thicker proteins have more carryover (10°F), smaller have less (5°F). Rest 5-10 min for steaks/chops, 10-20 min for roasts. Temp continues to rise during rest."
|
||||
},
|
||||
{
|
||||
"failure": "Dirty rim or overcrowded plate (poor plating)",
|
||||
"symptom": "Sauce drips on rim, food overcrowded with no negative space, flat presentation (no height), messy appearance. Looks institutional or careless.",
|
||||
"detection": "Look at plate: Is rim clean? Is there negative space (1-inch border)? Is there height, or is everything flat? Too many elements crowded?",
|
||||
"fix": "Clean rim with damp towel before service. Leave 1-inch negative space around food. Build height (stack or lean components). Use odd numbers (3 or 5 items, not even). Restraint: showcase hero ingredient, don't overcrowd. Color contrast (green herb on brown meat). Finish with flaky salt, oil drizzle, fresh herb."
|
||||
},
|
||||
{
|
||||
"failure": "Ignoring cultural context (technique without understanding)",
|
||||
"symptom": "Cuisine treated as recipe collection without understanding flavor philosophy. Inauthentic 'fusion' without respect for traditions. Missing cultural significance.",
|
||||
"detection": "Is origin explained? Traditional techniques honored? Regional variations noted? Or is it just 'Asian-inspired' or 'Mexican-style' without depth?",
|
||||
"fix": "Provide cultural context: where dish originates, traditional techniques (and why they matter), flavor philosophy of cuisine (Thai balance sweet/sour/salty/spicy, Japanese umami layering). Note regional variations. If adapting, acknowledge trade-offs. Treat food as story, not just flavor."
|
||||
}
|
||||
]
|
||||
}
|
||||
357
skills/chef-assistant/resources/methodology.md
Normal file
@@ -0,0 +1,357 @@
# Chef Assistant - Advanced Methodology

## 1. Food Science Fundamentals

### Heat Transfer Methods

**Conduction** (direct contact): Heat through molecular contact. Examples: Pan searing, grilling. Control: Surface temp (400-450°F for Maillard, 500°F+ for pizza). Preheat thoroughly, don't move food.

**Convection** (fluid movement): Hot air/liquid circulates. Examples: Roasting, frying, boiling. Control: Oven/liquid temp, air circulation. Space items for airflow.

**Radiation** (infrared): Energy as electromagnetic waves. Examples: Broiling, grilling over coals. Control: Distance from heat. Broil close for crust, far for gentle cooking.

### Maillard Reaction

**What**: Chemical reaction between amino acids and reducing sugars creating flavor compounds.

**Requirements**: Temp 300-500°F (accelerates 310°F+), dry surface (moisture inhibits), slightly alkaline pH helps, time for complexity.

**Applications**: Searing meat, roasting vegetables, toasting nuts/spices/bread, browning fond for sauces.

**Optimization**: Dry brine proteins (draws moisture, concentrates surface proteins), high smoke-point fat (avocado, grapeseed, clarified butter), don't crowd pan (steam inhibits), let crust form before flipping.

### Protein Denaturation

**Process**: Heat causes proteins to unfold and bond, changing texture.

**Temperature stages**: 120-130°F myosin denatures (rare steak), 140-150°F collagen shrinks squeezing water (drier), 160°F+ actin denatures (well-done, dry), 180°F+ collagen converts to gelatin (braising temp for tough cuts).

**Implications**: Tender cuts cook to 125-145°F, tough cuts with collagen cook to 180°F+ (collagen → gelatin), fish 120-140°F (delicate), eggs 145°F whites set, 155°F yolks set, 180°F rubbery.

**Techniques**: Brining (salt denatures proteins, allows water retention), marinades (acid denatures surface, can make mushy if too long), resting (proteins relax and reabsorb liquid).

### Emulsions

**Definition**: Suspension of one liquid in another (fat in water or vice versa).

**Types**: Oil-in-water (vinaigrette, mayo, hollandaise), water-in-oil (butter).

**Emulsifiers**: Lecithin (egg yolks, mustard), proteins (milk, egg whites), gums/starches.

**Technique**: (1) Start with emulsifier, (2) add oil dropwise at first, (3) whisk vigorously, (4) can add faster once emulsion forms, (5) if breaks: start with fresh emulsifier, slowly whisk in broken emulsion.

**Applications**: Vinaigrette (mustard emulsifier), mayonnaise (egg yolk), hollandaise (egg yolk + butter warm), pan sauce (butter emulsified into reduced liquid—monte au beurre), beurre blanc (butter into wine-vinegar reduction).

### Starch Gelatinization

**Process**: Starch granules absorb water and swell, thickening liquids.

**Temp ranges**: 140-150°F granules begin swelling, 180-212°F full gelatinization (max thickness), 212°F+ boiling (granules rupture if over-stirred, losing thickness).

**Applications**: Roux-based sauces (béchamel, velouté, gravy), corn starch slurries (stir-fries, pie fillings), rice/pasta cooking (starch releases into water, creates sauce body).

**Technique**: Cold start for roux (cook flour in fat before liquid to avoid lumps), slurry for corn starch (mix with cold water before adding to hot), pasta water for sauce (starchy water emulsifies, helps cling).

---

## 2. Mother Sauces & Derivatives

### Five Mother Sauces

**1. Béchamel** (white): Milk + white roux. Ratio: 2 Tbsp roux per cup milk. Derivatives: Mornay (+ cheese), soubise (+ onions).

**2. Velouté** (blond): Light stock (chicken/fish/veal) + blond roux. Ratio: 2 Tbsp roux per cup stock. Derivatives: Allemande (+ egg yolk + lemon), suprême (+ cream).

**3. Espagnole** (brown): Brown stock + brown roux + tomato paste + mirepoix. Ratio: 3 Tbsp roux per cup stock. Derivatives: Demi-glace (+ reduced stock), bordelaise (+ red wine + shallots).

**4. Tomato**: Tomatoes + aromatics + stock. Slow simmer to concentrate. Derivatives: Marinara (garlic + herbs), arrabbiata (+ chili).

**5. Hollandaise** (emulsified butter): Egg yolks + melted butter + lemon. Gentle heat (double boiler), slow butter addition. Derivatives: Béarnaise (+ tarragon + shallots), maltaise (+ orange).

### Modern Pan Sauce Formula

**Template**: (1) Fond (brown bits after searing), (2) aromatics (shallots/garlic/thyme 30 sec), (3) deglaze (¼-½ cup wine/stock/juice, scrape fond, reduce by half), (4) enrich (swirl 1-2 Tbsp cold butter—monte au beurre), (5) finish (lemon juice + herbs + salt).

**Variations**: Red wine sauce (red wine + beef stock + thyme), white wine sauce (white wine + chicken stock + tarragon), citrus sauce (OJ + stock + ginger), cream sauce (wine + stock → reduce → cream → reduce again).

---

## 3. Flavor Pairing Systems

### The Flavor Mixing Board (10 Elements)

**1. Salt**: Enhances all flavors, reduces bitterness. Types: Kosher (cooking), flaky (finishing), sea (complex). Dosage: 1-2% by weight. Timing: Multiple stages (proteins early, sauces throughout, finish at end).

**2. Acid**: Brightens flavors, cuts richness, balances sweet. Types: Citrus (bright, aromatic), vinegar (clean, sharp), wine (complex), tomato (savory-acid). Dosage: Start ½ tsp, build. Timing: Add at end (heat dulls) or simmer to mellow.

**3. Fat**: Carries aroma, creates body, balances acid/heat. Types: Butter (rich, milk solids), olive oil (fruity), cream (smooth), nuts (texture + fat). Timing: Toast spices in fat early, finish sauces with fat for sheen.

**4. Heat/Spice**: Excitement, complexity. Types: Black pepper (sharp), chili (capsaicin), ginger (warm), wasabi (nasal). Dosage: Build slowly, can't remove. Timing: Toast early for depth, add raw for brightness.

**5. Sweet**: Balances acid/bitter/spice, caramelization. Types: Sugar (clean), honey (floral), fruit (complex), mirin (savory-sweet). Dosage: Subtle in savory. Timing: Early for Maillard, late for balance.

**6. Bitter**: Complexity, palate cleansing. Types: Charred vegetables, dark greens, coffee, dark chocolate. Dosage: Use restraint. Timing: Controlled charring, blanch to reduce bitterness.

**7. Umami**: Savory depth, "fifth taste." Types: Glutamates (parmesan, tomato, soy), nucleotides (mushrooms, seafood, aged meats). Dosage: Layer multiple sources. Timing: Build throughout cooking.

**8. Aroma**: 80% of flavor. Types: Herbs (fresh/dried), alliums (garlic/onion), citrus zest, spices. Dosage: Dried herbs early, fresh late. Timing: Fat-soluble (cook in oil), volatile (finish fresh).

**9. Water/Liquid**: Carrier, dissolves, dilutes, creates texture. Types: Stock (umami), wine (acid + complexity), coconut milk (fat + sweet). Timing: Reduce to concentrate, add to thin.

**10. Texture**: Mouthfeel, satisfaction. Types: Crispy, creamy, chewy, tender, crunchy, smooth. Dosage: At least one contrast per dish. Timing: Crispy elements added last (stay crisp).

### Flavor Pairing Strategies

**Complementary**: Ingredients share aroma compounds. Examples: Strawberry + basil (methyl cinnamate), chocolate + coffee (pyrazines), lamb + rosemary (terpenes).

**Contrasting**: Opposing flavors create balance. Examples: Sweet + salty (salted caramel), rich + acid (fatty fish + lemon), spicy + sweet (Thai chili-lime-sugar).

**Bridge ingredients**: Connect disparate flavors. Examples: Balsamic bridges strawberries and black pepper, miso bridges chocolate and berries, orange bridges beets and goat cheese.

---

## 4. Advanced Cooking Techniques

### Dry-Brining

**Principle**: Salt draws moisture out via osmosis, then protein reabsorbs with salt dissolved, seasoning deeply.

**Benefits**: Even seasoning throughout, moisture retention during cooking, crispy skin (dry surface).

**Method**: (1) Calculate 0.8-1.2% salt by weight (weigh protein × 0.01), (2) rub salt evenly, (3) place uncovered on rack in fridge, (4) wait (fish 30-90min, chicken/steak 6-24h, roasts 24-48h), (5) pat dry, cook (no additional salt).

**Science**: Salt denatures surface proteins, breaking structure to hold more water.
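
The percentage arithmetic, as a minimal illustrative sketch (the function name and the 1% default are illustrative, not part of the method above):

```python
def dry_brine_salt_grams(protein_weight_g: float, salt_pct: float = 1.0) -> float:
    """Grams of salt for a dry brine at 0.8-1.2% of protein weight."""
    if not 0.8 <= salt_pct <= 1.2:
        raise ValueError("dry brines typically use 0.8-1.2% salt by weight")
    return protein_weight_g * salt_pct / 100  # e.g., 1800 g chicken at 1% -> 18 g salt

print(dry_brine_salt_grams(1800))  # 18.0
```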

### Sous Vide

**Principle**: Cook in vacuum-sealed bag in water bath at precise temp for even, edge-to-edge doneness.

**Benefits**: Perfect doneness throughout, impossible to overcook (at target temp), tenderization over long times (tough cuts at 140-165°F for 24-72h).

**Method**: (1) Season, seal in bag, (2) set water bath to target final temp (130°F for medium-rare), (3) cook (steaks 1-4h, chicken 1-3h, short ribs 48-72h), (4) sear after for crust (30 sec/side, screaming hot pan).

**Limitations**: No Maillard crust (must sear after), requires equipment.

### Confit

**Principle**: Cook and preserve in fat at low temp (180-200°F) until tender, store in fat.

**Traditional**: Duck confit (legs in duck fat 2-3h). **Modern**: Olive oil confit (garlic, tomatoes, fish at 180°F until tender).

**Method**: (1) Season (salt, herbs), (2) submerge in fat, (3) cook low (180-200°F) until tender (1-3h duck, 30-60min fish/garlic), (4) cool in fat, store submerged (weeks in fridge), (5) reheat by crisping in oven.

**Science**: Low temp tenderizes without drying, fat seals from oxygen (preservation).

### Braising

**Principle**: Sear for crust, then slow-cook partially submerged in liquid until collagen converts to gelatin.

**Ideal for**: Tough cuts with collagen (short ribs, shanks, brisket, pork shoulder).

**Method**: (1) Sear all sides (Maillard, fond), (2) sauté mirepoix, (3) deglaze with wine (scrape fond), (4) add stock (halfway up protein), herbs, cover, 275-325°F oven, (5) cook 2-4h until fork-tender (collagen → gelatin at 180°F+), (6) remove protein, reduce liquid to sauce.

**Science**: Collagen breaks into gelatin at 180°F+, requires time. Sauce is rich and glossy.

### Fermentation

**Principle**: Microbes transform ingredients, creating complex flavors (umami, acid, funk).

**Common applications**: Lacto-fermentation (vegetables + salt → lactic acid, sauerkraut/kimchi), koji fermentation (rice + *Aspergillus oryzae* → miso/soy sauce), wild fermentation (sourdough).

**Quick ferment** (3-7 days): (1) Submerge vegetables in 2-3% salt brine (20-30g per liter water), (2) keep submerged (weight down), (3) room temp loosely covered, (4) taste daily (tangier over time), (5) refrigerate when desired.

**Benefits**: Umami, probiotic, shelf-stable, complex acid.
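
The brine math scales linearly with water volume, since 1% of a liter of water is about 10 g; a small sketch (names illustrative):

```python
def ferment_brine_salt_grams(water_liters: float, strength_pct: float = 2.5) -> float:
    """Grams of salt for a 2-3% lacto-fermentation brine (20-30 g per liter)."""
    if not 2.0 <= strength_pct <= 3.0:
        raise ValueError("quick ferments typically use a 2-3% brine")
    return water_liters * strength_pct * 10  # 1% of 1 L water ~= 10 g salt

print(ferment_brine_salt_grams(1.5))  # 37.5 g for 1.5 L at 2.5%
```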

---

## 5. Cultural Cooking Methods

### French Classical

**Philosophy**: Precision, technique mastery, sauce-centric, refinement.

**Core techniques**: Mother sauces, mise en place rigor, knife skills (brunoise, julienne, chiffonade), stock-making (fond de cuisine).

**Flavor approach**: Layered, technique-driven, butter and cream for richness, wine for acidity, fine herbs (tarragon, chervil, parsley).

**Example**: Coq au vin (braise chicken in red wine, lardons, pearl onions, mushrooms).

### Italian Rustic

**Philosophy**: Simple ingredients, quality-driven, seasonal, carb-centric (pasta, bread, polenta).

**Core techniques**: Pasta making and saucing (emulsify with pasta water), risotto (slow stock addition, constant stirring), braising (osso buco, ragu).

**Flavor approach**: Minimal ingredients, let quality shine, acid from tomato/wine, olive oil foundation, parmigiano for umami, fresh basil/oregano.

**Example**: Cacio e pepe (pasta, pecorino, black pepper, pasta water—emulsification).

### Japanese Washoku

**Philosophy**: Balance, seasonality, umami layering, presentation as art.

**Core techniques**: Dashi (kombu + katsuobushi stock—umami base), delicate knife work (sashimi, vegetables), grilling (yakitori, robata), pickling (tsukemono).

**Flavor approach**: Umami-forward (dashi, miso, soy), subtle seasoning, balance five colors/five tastes, mirin for sweet-savory.

**Example**: Miso soup (dashi base + miso + tofu + wakame—umami layered, miso added at end to preserve probiotics).

### Thai Balance

**Philosophy**: Every dish balances four flavors (sweet/sour/salty/spicy), bright and aromatic, fresh herbs.

**Core techniques**: Pounding curry pastes (mortar and pestle), stir-frying (high heat, fast), balancing nam prik (chili sauces).

**Flavor approach**: Fish sauce (salty-umami), lime (sour), palm sugar (sweet), chili (spicy). Fresh herbs at end (Thai basil, cilantro, mint). Coconut milk for richness.

**Example**: Tom yum (hot and sour soup—lemongrass, galangal, lime, chili, fish sauce, shrimp).

### Mexican Layered Heat

**Philosophy**: Chili-centric, complex heat (not just spicy), corn and beans foundation, bright acid.

**Core techniques**: Toasting dried chilies (depth before rehydrating), masa preparation (nixtamalization—corn + lime), slow-cooked salsas (charred tomatillos, chilies).

**Flavor approach**: Layer chili varieties (ancho=sweet, chipotle=smoky, habanero=bright heat), lime for acid, cilantro for aroma, cumin for earthy depth.

**Example**: Mole (complex sauce—20+ ingredients, chilies, chocolate, spices, hours of toasting and blending).

---

## 6. Advanced Troubleshooting

### When Salt Doesn't Fix Flat Flavors

**Diagnosis tree**:
1. **Missing acid**: Most common after salt. Add lemon/lime/vinegar (start ½ tsp).
2. **Missing umami**: Add parmesan, soy, fish sauce, tomato paste, or mushroom powder.
3. **Missing aroma**: Fresh herbs, citrus zest, toasted spices, garlic oil.
4. **Over-reduced**: Flavor muted from too much reduction. Add fresh stock/water + rebalance.
5. **Textural boredom**: Flavor fine, mouthfeel monotonous. Add crunch, creaminess, or temp contrast.

### Over-Salted Beyond Saving

**Actions**: (1) Dilute (add unsalted liquid—stock, water, coconut milk), (2) bulk up (more base ingredients—vegetables, beans, grains), (3) acid mask (lemon/vinegar can perceptually reduce saltiness—doesn't remove, but balances), (4) fat cushion (cream, butter, or coconut milk softens salt perception).

**If still too salty**: Repurpose (use as concentrated component in larger dish—over-salted beans → burrito filling with rice and salsa), starch absorption (serve with rice, pasta, bread—absorbs and dilutes). **Potato myth**: Raw potato does NOT absorb salt (debunked). Only dilution works.

### Burnt vs. Charred

**Distinction**: Charred (controlled, adds complexity—charred peppers, blackened fish) vs. burnt (acrid, overwhelms—scorched garlic, blackened onions).

**If accidentally burnt**: (1) Remove burnt bits (don't incorporate, discard fond if burnt), (2) rinse vegetables (if edges burnt but core fine), (3) balance with sweet + fat (honey/sugar + cream can mask mild bitterness), (4) char something else intentionally (grilled lemon, toasted nuts to make it seem intentional).

**Prevention**: Lower heat, stir more, add liquid earlier.

### Sauce Breaking

**Causes**: Oil added too fast (didn't emulsify), temp too high (proteins coagulated—hollandaise), over-whisking (butter emulsion can break).

**Rescue**: (1) Vinaigrette: fresh emulsifier (mustard), slowly whisk broken sauce in, (2) hollandaise: new egg yolk in clean bowl, slowly whisk broken sauce in (dropwise first), (3) butter sauce: lower heat, add cold butter gradually while whisking, don't boil, (4) mayo: new egg yolk, slowly rebuild with broken mayo as "oil".

---

## 7. Professional Kitchen Practices

### Mise en Place Discipline

**Philosophy**: "Everything in its place"—prep all before cooking starts.

**Benefits**: No scrambling mid-cook (high-heat moves fast), accurate seasoning (measured in advance), clean workflow (focused on technique, not chopping).

**Method**: (1) Read recipe completely, (2) list all ingredients and prep, (3) prep in order (longest tasks first—brines, marinades; then chopping; then measuring), (4) arrange in cooking order (left to right, or by timing), (5) prep more than needed (better extra minced garlic than run out mid-sauté).

### Timing & Orchestration

**Multi-dish timing**: (1) Backwards planning (start with service time, work backwards), (2) identify bottlenecks (what takes longest? what needs oven? start those first), (3) holding strategies (what can sit—braises, roasts rest; what can't—delicate fish, fried foods), (4) last-minute tasks (list for final 5 min—searing, plating, garnishing).

**Example (3-dish dinner for 6:30 PM)**: 3:00 PM start braise (3h), 5:00 PM prep vegetables and set table, 5:45 PM roast vegetables (400°F, 30 min), 6:10 PM sear protein (8 min cook + 10 min rest), 6:20 PM reduce braising liquid to sauce and plate, 6:30 PM service.
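
Backwards planning is just subtraction from the service time; a minimal sketch of that idea (the task list mirrors the example above; function name and durations are illustrative):

```python
from datetime import datetime, timedelta

def backwards_plan(service: str, tasks: list[tuple[str, int]]) -> None:
    """Print start times, subtracting each task's lead time (minutes) from service."""
    service_dt = datetime.strptime(service, "%H:%M")
    for name, lead_min in sorted(tasks, key=lambda t: -t[1]):  # longest lead first
        print(f"{service_dt - timedelta(minutes=lead_min):%H:%M}  start {name}")

backwards_plan("18:30", [
    ("braise (3 h, incl. reducing sauce)", 210),
    ("roast vegetables (30 min + prep)", 45),
    ("sear protein, then rest", 20),
])
# 15:00 start braise, 17:45 start vegetables, 18:10 sear -- matches the example above
```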

### Knife Skills for Efficiency

**Core cuts**: Brunoise (1-2mm dice for mirepoix, garnish), small dice (3-4mm for aromatic base), medium dice (6-8mm for sides, soups), julienne (2mm × 2mm × 4-5cm matchstick for garnish, quick-cooking), chiffonade (ribbon for herbs, leafy greens—stack, roll, slice thin).

**Speed techniques**: Claw grip (protect fingertips, knuckles guide blade), rocking motion (keep tip on board, rock blade), batch similar cuts (dice all onions at once), sharp knife = faster + safer (dull knife slips, requires force).

### Tasting & Adjustment

**When to taste**: (1) After each major ingredient addition, (2) after seasoning increments, (3) before service (final adjustment).

**What to assess**: Salt level (enough? too much?), acid balance (bright enough? too sour?), fat/body (rich enough? too heavy?), texture (right consistency? need thickening/thinning?), temperature (taste at serving temp—affects perception).

**Protocol**: (1) Identify primary deficiency (salt, acid, fat), (2) add small increment (¼ tsp salt, ½ tsp lemon), (3) stir, wait 30 sec (flavors distribute), (4) taste again, (5) repeat until balanced.

---

## 8. Seasonal Cooking Strategies

### Spring: Bright, Fresh, Tender

**Philosophy**: Celebrate delicate flavors, minimal cooking, acid-forward.

**Key ingredients**: Asparagus, peas, ramps, artichokes, spring onions, baby greens, strawberries.

**Techniques**: Blanching (bright green vegetables), quick sautés (preserve crunch), raw preparations (pea shoots, radishes).

**Flavor profile**: Fresh herbs (dill, tarragon, mint), lemon, light vinaigrettes, butter-based sauces (not cream-heavy).

### Summer: Bright, Grilled, Tomato-Centric

**Philosophy**: Showcase peak produce, outdoor cooking, minimal interference.

**Key ingredients**: Tomatoes, corn, zucchini, peppers, eggplant, stone fruit, berries.

**Techniques**: Grilling (char + smoke), raw salads (peak flavor needs no cooking), quick pickling (preserve abundance).

**Flavor profile**: Basil, cilantro, lime, chili, olive oil, balsamic, smoky notes.

### Fall: Rich, Roasted, Earthy

**Philosophy**: Comfort, depth, caramelization, warming spices.

**Key ingredients**: Squash, root vegetables, Brussels sprouts, apples, pears, mushrooms, game meats.

**Techniques**: Roasting (caramelize natural sugars), braising (slow-cooked comfort), stewing (rich, thick).

**Flavor profile**: Sage, rosemary, thyme, cinnamon, nutmeg, brown butter, cider vinegar, red wine.

### Winter: Hearty, Braised, Preserved

**Philosophy**: Sustenance, long-cooked, umami-rich, pantry staples.

**Key ingredients**: Cabbage, kale, celery root, citrus, dried beans, cured meats, preserved foods.

**Techniques**: Braising (tough cuts → tender), slow-roasting (concentrate flavors), using preserved ingredients (ferments, canned tomatoes, dried mushrooms).

**Flavor profile**: Bay leaf, peppercorns, citrus (brightens heavy dishes), red wine, stock-based sauces, fermented funk.

---

## 9. Menu Design Principles

### Flavor Arc (Progression Through Courses)

**Course 1 (Awaken)**: Wake up palate, stimulate appetite. Flavor: Acid-forward, fresh, bright. Example: Citrus salad, ceviche, oysters.

**Course 2 (Explore)**: Build complexity, not too heavy. Flavor: Balanced, umami introduction. Example: Pasta with light sauce, seared fish with herb butter.

**Course 3 (Peak)**: Richest, most complex flavors. Flavor: Umami-dominant, fat-rich, savory peak. Example: Braised short rib, duck confit, aged steak.

**Course 4 (Refresh)**: Palate cleanser, prepare for dessert. Flavor: Acid or bitter (sorbet, cheese with bitter greens). Example: Lemon sorbet, arugula salad with aged cheese.

**Course 5 (Satisfy)**: Sweet closure, not cloying. Flavor: Sweet with acid or bitter balance (citrus tart, dark chocolate). Example: Lemon tart, panna cotta with berry compote.

### Textural, Temperature, & Method Diversity

**Ensure variety across courses**:
- **Texture**: Creamy → Tender → Crispy → Soft → Crunchy
- **Temperature**: Cold → Hot → Hot → Cold → Room temp
- **Method**: Raw → Roasted → Braised → Baked

**Within single dish**: Soft protein + crispy garnish + creamy sauce + crunchy vegetable

### Wine/Beverage Pairing Logic

**Principles**: (1) Match weight (light food = light wine, rich food = full-bodied), (2) match or contrast intensity, (3) acid in wine cuts fat in food, (4) sweet in wine balances spice, (5) tannin pairs with protein and fat.

**Classic pairings**: Oysters + Champagne (acid cuts brine, bubbles refresh), salmon + Pinot Noir (light red matches fatty fish), steak + Cabernet (tannin needs fat and protein), blue cheese + Port (sweet balances funk and salt).
296
skills/chef-assistant/resources/template.md
Normal file
@@ -0,0 +1,296 @@
# Chef Assistant - Templates

## Workflow

```
Recipe Development Progress:
- [ ] Define dish goal and flavor profile
- [ ] Identify core techniques
- [ ] Build flavor architecture
- [ ] Plan texture contrasts
- [ ] Create execution timeline
- [ ] Design plating approach
```

---

## Recipe Template

### Recipe Name
**Cuisine**: [Type] | **Difficulty**: [Level] | **Time**: [Active] / [Total] | **Serves**: [Number]

**Flavor Profile**: [Salt/Acid/Fat/Heat/Sweet/Umami balance]
**Texture**: [Crispy/Creamy, Hot/Cold contrasts]
**Aroma**: [Dominant notes—garlic, herbs, spices]

### Ingredients

**Mise en place**:
- [Ingredient] – [Amount] – [Prep: diced, minced, room temp]

**Cooking**: [Fats, aromatics, main ingredients with purpose]
**Finishing**: [Fresh herbs, acid, flaky salt, garnish]

**Equipment**: [Skillet, thermometer, etc.]

### Technique Breakdown

1. **[Technique]** (e.g., Searing)
   - **Why**: Maillard crust, fond
   - **How**: High heat, dry surface, don't move
   - **Cues**: Golden brown, nutty aroma, releases easily
   - **Precision**: 400°F+ pan, 3-4 min/side

### Instructions

**Step 1: Prep ([X] min)**
- Mise en place, bring proteins to room temp (30-60 min), preheat oven

**Step 2: Build flavor base ([X] min)**
- Heat fat, add aromatics until [sensory cue], toast spices until fragrant
- **Science**: Aromatics release in fat, spices bloom

**Step 3: Main cooking ([X] min)**
- [Instruction with technique tip, sensory checkpoint, precision temp/time]

**Step 4: Sauce/finish ([X] min)**
- Deglaze, reduce, enrich with fat, balance salt → acid → fat

**Step 5: Rest and plate ([X] min)**
- Rest protein [time], plate using principles below

### Flavor Troubleshooting

| Problem | Fix |
|---------|-----|
| Too salty | Bulk/dilute + acid + fat |
| Too sour | Fat or sweet + pinch salt |
| Too spicy | Dairy or sweet + starch |
| Flat | Salt first, then acid or umami |
| Too greasy | Acid + salt + crunch |

### Plating

1. **Sauce**: Under / pool / swoosh / dots
2. **Foundation**: Purée / grains (center or off-center)
3. **Hero**: Protein (center, leaning, or stacked)
4. **Accompaniments**: Sides at 3, 7 o'clock (odd numbers)
5. **Garnish**: Fresh herbs, microgreens (top, restraint)
6. **Finish**: Flaky salt, oil drizzle, citrus zest

**Principles**: Color contrast, height, negative space (1" rim), odd numbers, clean rim

### Cultural Context & Substitutions

**Origin**: [Where from, traditional approach, significance]
**Variations**: [Regional differences]
**Substitutions**: If missing [ingredient] → [replacement at ratio], impact on flavor/texture
**Scaling**: Up/down notes, shortcuts for weeknight

---

## Technique Breakdown Template

### [Technique Name]

**Category**: [Dry heat / Wet heat / Combination] | **Difficulty**: [Level] | **Equipment**: [Tools]

**Why**: [Purpose—texture, flavor, appearance] | **When**: [Dish types] | **Science**: [Mechanism—Maillard, denaturation, etc.]

**Temp range**: [Optimal] | **Time**: [How time affects outcome] | **Variables**: [Moisture, heat, agitation]

### Execution

**Prep**: [Ingredient prep, equipment setup, temp management]

**Steps**:
1. [Step with sensory cues: look, listen, smell, feel, precision temp]
2. [Next step with checkpoints]

**Finish**: [Completion indicators, resting/carryover]

### Common Mistakes

1. **[Error]**: Why happens → How to avoid → How to fix if occurs
2. **[Error]**: Cause → Prevention → Correction

**Practice**: Beginner ([simple ingredient]) → Intermediate ([challenging]) → Advanced ([complex])

**Related techniques**: [Similar technique] (how differs, when to use)

---

## Flavor Troubleshooting Template

### [Dish Name/Type]
**Current**: [What's wrong—too salty, flat, bitter, greasy]
**Target**: [What should it taste like]

### Diagnosis

**Primary imbalance**: [Salt/Acid/Fat/Heat/Sweet/Bitter/Umami/Aroma/Texture]
**Cause**: [What likely caused this]

**Balance assessment** (1-10):
- Salt: [Level] | Acid: [Level] | Fat: [Level] | Heat: [Level]
- Sweet: [Level] | Bitter: [Level] | Umami: [Level] | Aroma: [Level]

**Target for dish**: [Ideal balance]

### Corrections

**Priority 1**: [Most important fix]
- **Action**: [Ingredient + amount]
- **Why**: [How this rebalances]
- **Check**: Taste after

**Priority 2-3**: [Additional adjustments if needed]

### Prevention

**Root cause**: [What caused imbalance]
**Strategy**: [How to avoid—salt in stages, taste as you go, measurement]

**Salvage** (if beyond fixing): Repurpose, dilute and rebalance, or pair with counterbalance

---

## Menu Planning Template

### Event Details
**Occasion**: [Type] | **Guests**: [Number, restrictions] | **Season**: [Season] | **Skill**: [Level] | **Kitchen**: [Setup]

### Menu Structure

**Course 1: Appetizer** ([Time])
- **Dish**: [Name] | **Flavor**: Light, acidic, fresh | **Texture**: Crispy, refreshing | **Temp**: Cold/room
- **Prep ahead**: [What can be prepped] | **Method**: [Avoid oven if needed later]

**Course 2: Light main** ([Time])
- **Dish**: [Name] | **Flavor**: Delicate, wine-friendly | **Texture**: Contrast with C1 | **Temp**: Hot
- **Method**: Quick sauté or roast

**Course 3: Entrée** ([Time])
- **Dish**: [Name] | **Flavor**: Rich, savory, umami peak | **Texture**: Tender, substantial | **Temp**: Hot
- **Method**: Slow roast/braise (frees attention)

**Course 4: Dessert** ([Time])
- **Dish**: [Name] | **Flavor**: Sweet with acid/mint | **Texture**: Contrast | **Temp**: Cold/room
- **Prep**: 100% before guests

### Progression Applied

**Flavor**: Light → Heavy (acid early → rich middle → sweet light)
**Texture**: Alternate (crispy → soft → tender → creamy)
**Temp**: Cold → Hot → Hot → Cold
**Method**: Raw → Roasted → Braised → Baked

### Timing Plan

**Day before**: [Braises, doughs, sauces]
**Morning of**: [Chop, marinate, garnishes]
**2h before**: [Long-cooking items]
**30 min**: [Final cooking]
**Service**: 6:00 aperitifs → 6:30 C1 → 7:00 C2 → 7:30 C3 → 8:15 C4

**Kitchen flow**: [Oven/stove coordination, holding strategies, last-minute tasks]

### Pairings & Contingencies

**Wine**: C1 [sparkling/light white] → C2 [white/rosé] → C3 [red/full white] → C4 [dessert wine]
**Backup**: [Strategies if dish fails, running late, new dietary restriction]

---

## Plating Guide Template

### Style & Components

**Style**: [Classic centered / Contemporary asymmetric / Rustic / Deconstructed]
**Plate**: [Type, size]

**Components**: 1) Hero, 2) Starch/base, 3) Vegetable, 4) Sauce, 5) Garnish

### Plating Steps

1. **Sauce**: [Under / pool / drizzle / swoosh / dots] with [tool]
2. **Foundation**: [Purée / grains] at [center / off-center] as [quenelle / smear / mound]
3. **Hero**: [Protein/main] at [position] with [angle/height]
4. **Accompaniments**: [Sides] at [odd positions—3, 7 o'clock] in [odd numbers]
5. **Garnish**: [Herbs / microgreens] at [top / scattered] with restraint
6. **Finish**: Flaky salt, oil drizzle, citrus zest

### Checklist

- [ ] Color contrast (2+ colors)
- [ ] Height (vertical element)
- [ ] Negative space (1" rim border)
- [ ] Odd numbers (3, 5, 7 not even)
- [ ] Clean rim (wipe drips)
- [ ] Balance (visual weight distributed)
- [ ] Restraint (hero showcased, not crowded)

**Sensory**: Visual impact, aroma at table (fresh herbs last), textural cues (visible crispy elements)

**Avoid**: Overcrowding, flat presentation, dirty rim, even numbers, over-garnishing, center blob

---

## Quick Reference

### Cooking Temps (Pull Temps for Carryover)

| Protein | Doneness | Pull | Final (after rest) |
|---------|----------|------|-------------------|
| Beef/Lamb | Rare | 120°F | 125°F |
| Beef/Lamb | Med-rare | 125-130°F | 130-135°F |
| Beef/Lamb | Medium | 135-140°F | 140-145°F |
| Pork | Medium | 135-140°F | 140-145°F |
| Chicken breast | Juicy | 150-155°F | 155-160°F |
| Chicken thigh | Tender | 175°F | 180°F |
| Duck breast | Med-rare | 130-135°F | 135-140°F |
| Fish (firm) | Moist | 125°F | 130°F |
| Fish (delicate) | Moist | 120°F | 125°F |
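
The table encodes a simple rule: pull about 5°F early for small cuts, about 10°F for large roasts. A quick illustrative helper (not a substitute for the table):

```python
def pull_temp_f(final_target_f: float, large_cut: bool = False) -> float:
    """Pull temp so carryover during the rest lands at the final target."""
    return final_target_f - (10 if large_cut else 5)  # small cuts carry ~5°F, roasts ~10°F

print(pull_temp_f(135))                  # 130 -> medium-rare steak
print(pull_temp_f(160, large_cut=True))  # 150 -> whole roasted chicken
```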

### Essential Ratios

| Preparation | Ratio | Notes |
|-------------|-------|-------|
| Vinaigrette | 3:1 oil:acid | + mustard + salt, thin with water |
| Pan sauce | ¼-½ cup liquid | Deglaze, reduce, swirl 1-2 Tbsp butter |
| Quick pickle | 1:1:1 water:vinegar:sugar | + 2-3% salt |
| Dry brine | 0.8-1.2% salt by weight | Fish 30-90min, chicken 6-24h, roast 24-48h |
| Pasta water | 1% salt | 10g per liter (sea-salty) |
| Roux | 1:1 by weight | Fat:flour, cook 2-10min for color |

### Flavor Fixes

| Problem | Primary | Secondary | Last Resort |
|---------|---------|-----------|-------------|
| Too salty | Bulk/dilute + acid | Fat | Sweet (tiny) |
| Too sour | Fat or sweet | Reduce | Pinch salt |
| Too spicy | Dairy | Sweet + starch | Dilute |
| Too bitter | Salt + fat | Sweet or acid | Char something else |
| Too sweet | Acid | Salt | Bitter element |
| Greasy | Acid + salt | Crunch | Reduce further |
| Flat | Salt first | Then acid or umami | Fat for body |

### Texture Pairings

| Primary | Contrast | Example |
|---------|----------|---------|
| Creamy | Crunchy | Soup + croutons, mash + fried onions |
| Soft | Chewy | Fish + crusty bread, polenta + braised meat |
| Tender | Crispy | Short rib + chip, chicken + crispy skin |
| Smooth | Chunky | Purée + coarse nuts, hummus + falafel |
| Hot | Cold | Warm pie + ice cream, grilled peach + burrata |

### Substitutions

**Fats**: Butter → Olive oil (75% amount, fruitier) | Butter → Ghee (nuttier, high heat) | Cream → Coconut cream (coconut note) | Cream → Greek yogurt (tangy, add off heat)

**Acids**: Lemon ↔ Lime (1:1) | Lemon → White wine vinegar (sharper, less aroma) | Red wine vinegar → Balsamic (sweeter, use 50% then taste)

**Umami**: Parmesan → Pecorino (saltier, use 75%) | Soy → Fish sauce (funkier, use 50-75%) | Soy → Miso paste (1 Tbsp miso ≈ 1-2 tsp soy) | Anchovies → Fish sauce (1 anchovy ≈ ½ tsp)

**Alliums**: Onion ↔ Shallots (sweeter) | Garlic fresh → powder (1 clove ≈ ⅛ tsp) | Ginger fresh → ground (1 Tbsp ≈ ¼ tsp) | Scallions ↔ Chives
171
skills/code-data-analysis-scaffolds/SKILL.md
Normal file
@@ -0,0 +1,171 @@
---
name: code-data-analysis-scaffolds
description: Use when starting technical work requiring structured approach - writing tests before code (TDD), planning data exploration (EDA), designing statistical analysis, clarifying modeling objectives (causal vs predictive), or validating results. Invoke when user mentions "write tests for", "explore this dataset", "analyze", "model", "validate", or when technical work needs systematic scaffolding before execution.
---

# Code Data Analysis Scaffolds

## Table of Contents
- [Purpose](#purpose)
- [When to Use This Skill](#when-to-use-this-skill)
- [When NOT to Use This Skill](#when-not-to-use-this-skill)
- [What Is It?](#what-is-it)
- [Workflow](#workflow)
- [Scaffold Types](#scaffold-types)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

This skill provides structured scaffolds (frameworks, checklists, templates) for technical work in software engineering and data science. It helps you approach complex tasks systematically by defining what to do, in what order, and what to validate before proceeding.

## When to Use This Skill

Use this skill when you need to:

- **Write tests systematically** - TDD scaffolding, test suite design, test data generation
- **Explore data rigorously** - EDA plans, data quality checks, feature analysis strategies
- **Design statistical analyses** - A/B tests, causal inference, hypothesis testing frameworks
- **Build predictive models** - Model selection, validation strategy, evaluation metrics
- **Refactor with confidence** - Test coverage strategy, refactoring checklist, regression prevention
- **Validate technical work** - Data validation, model evaluation, code quality checks
- **Clarify technical approach** - Distinguish causal vs predictive goals, choose appropriate methods

**Trigger phrases:**
- "Write tests for [code/feature]"
- "Explore this dataset"
- "Analyze [data/problem]"
- "Build a model to predict..."
- "How should I validate..."
- "Design an A/B test for..."
- "What's the right approach to..."

## When NOT to Use This Skill

Skip this skill when:

- **Just execute, don't plan** - User wants immediate code/analysis without scaffolding
- **Scaffold already exists** - User has clear plan and just needs execution help
- **Non-technical tasks** - Use appropriate skill for writing, planning, decision-making
- **Simple one-liners** - No scaffold needed for trivial tasks
- **Exploratory conversation** - User is brainstorming, not ready for structured approach yet

## What Is It?

Code Data Analysis Scaffolds provides structured frameworks for common technical patterns:

1. **TDD Scaffold**: Given requirements, generate test structure before implementing code
2. **EDA Scaffold**: Given dataset, create systematic exploration plan
3. **Statistical Analysis Scaffold**: Given question, design appropriate statistical test/model
4. **Validation Scaffold**: Given code/model/data, create comprehensive validation checklist

**Quick example:**

> **Task**: "Write authentication function"
>
> **TDD Scaffold**:
> ```python
> import pytest
>
> # Test structure (write these FIRST); authenticate() is the function under test
> def test_valid_credentials():
>     assert authenticate("user@example.com", "correct_pass") == True
>
> def test_invalid_password():
>     assert authenticate("user@example.com", "wrong_pass") == False
>
> def test_nonexistent_user():
>     assert authenticate("nobody@example.com", "any_pass") == False
>
> def test_empty_credentials():
>     with pytest.raises(ValueError):
>         authenticate("", "")
>
> # Now implement authenticate() to make tests pass
> ```

## Workflow

Copy this checklist and track your progress:

```
Code Data Analysis Scaffolds Progress:
- [ ] Step 1: Clarify task and objectives
- [ ] Step 2: Choose appropriate scaffold type
- [ ] Step 3: Generate scaffold structure
- [ ] Step 4: Validate scaffold completeness
- [ ] Step 5: Deliver scaffold and guide execution
```

**Step 1: Clarify task and objectives**

Ask user for the task, dataset/codebase context, constraints, and expected outcome. Determine if this is TDD (write tests first), EDA (explore data), statistical analysis (test hypothesis), or validation (check quality). See [resources/template.md](resources/template.md) for context questions.

**Step 2: Choose appropriate scaffold type**

Based on task, select scaffold: TDD (testing code), EDA (exploring data), Statistical Analysis (hypothesis testing, A/B tests), Causal Inference (estimating treatment effects), Predictive Modeling (building ML models), or Validation (checking quality). See [Scaffold Types](#scaffold-types) for guidance on choosing.

**Step 3: Generate scaffold structure**

Create systematic framework with clear steps, validation checkpoints, and expected outputs at each stage. For standard cases use [resources/template.md](resources/template.md); for advanced techniques see [resources/methodology.md](resources/methodology.md).

**Step 4: Validate scaffold completeness**

Check scaffold covers all requirements, includes validation steps, makes assumptions explicit, and provides clear success criteria. Self-assess using [resources/evaluators/rubric_code_data_analysis_scaffolds.json](resources/evaluators/rubric_code_data_analysis_scaffolds.json) - minimum score ≥3.5.

**Step 5: Deliver scaffold and guide execution**

Present scaffold with clear next steps. If user wants execution help, follow the scaffold systematically. If scaffold reveals gaps (missing data, unclear requirements), surface these before proceeding.

## Scaffold Types

### TDD (Test-Driven Development)
**When**: Writing new code, refactoring existing code, fixing bugs
**Output**: Test structure (test cases → implementation → refactor)
**Key Elements**: Test cases covering happy path, edge cases, error conditions, test data setup

### EDA (Exploratory Data Analysis)
**When**: New dataset, data quality questions, feature engineering
**Output**: Exploration plan (data overview → quality checks → univariate → bivariate → insights)
**Key Elements**: Data shape/types, missing values, distributions, outliers, correlations
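
A minimal pandas sketch of the first two stages of that plan (the CSV path and any column names are placeholders):

```python
import pandas as pd

df = pd.read_csv("data.csv")  # placeholder path

# 1. Data overview: shape, types, sample rows
print(df.shape)
print(df.dtypes)
print(df.head())

# 2. Quality checks: missingness, duplicates, impossible values
print(df.isna().mean().sort_values(ascending=False))  # fraction missing per column
print("duplicate rows:", df.duplicated().sum())
print(df.describe())  # ranges surface impossible values (e.g., negative ages)
```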

### Statistical Analysis
**When**: Hypothesis testing, A/B testing, comparing groups
**Output**: Analysis design (question → hypothesis → test selection → assumptions → interpretation)
**Key Elements**: Null/alternative hypotheses, significance level, power analysis, assumption checks
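
As a sketch, a two-group comparison might combine synthetic placeholder data, a rough normality check, and Welch's t-test (which drops the equal-variance assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(10.0, 2.0, size=500)    # placeholder: metric per unit, control
treatment = rng.normal(10.3, 2.0, size=500)  # placeholder: metric per unit, treatment

for name, group in [("control", control), ("treatment", treatment)]:
    _, p_norm = stats.shapiro(group)  # assumption check before choosing the test
    print(f"{name}: normality p = {p_norm:.3f}")

t, p = stats.ttest_ind(treatment, control, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.4f}")  # compare against the pre-chosen alpha, e.g. 0.05
```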

### Causal Inference
**When**: Estimating treatment effects, understanding causation not just correlation
**Output**: Causal design (DAG → identification strategy → estimation → sensitivity analysis)
**Key Elements**: Confounders, treatment/control groups, identification assumptions, effect estimation
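
A hedged sketch of the simplest estimation step, regression adjustment for observed confounders (column names are placeholders; the estimate is causal only under the strong assumption that all confounders are observed and included):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("observational.csv")  # placeholder: outcome, treated, age, usage

# The coefficient on `treated` estimates the average treatment effect ONLY IF
# `age` and `usage` capture every confounder (no unobserved confounding).
model = smf.ols("outcome ~ treated + age + usage", data=df).fit()
print(model.params["treated"], model.conf_int().loc["treated"])
```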

### Predictive Modeling
**When**: Building ML models, forecasting, classification/regression tasks
**Output**: Modeling pipeline (data prep → feature engineering → model selection → validation → evaluation)
**Key Elements**: Train/val/test split, baseline model, metrics selection, cross-validation, error analysis
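
A compact sklearn sketch of the "baseline before complex models" rule (the bundled dataset stands in for your data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

# Trivial baseline first: any real model must clearly beat this score.
baseline = DummyClassifier(strategy="most_frequent")
print("baseline:", cross_val_score(baseline, X, y, cv=5, scoring="roc_auc").mean())

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print("model:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```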

### Validation
**When**: Checking data quality, code quality, model quality before deployment
**Output**: Validation checklist (assertions → edge cases → integration tests → monitoring)
**Key Elements**: Acceptance criteria, test coverage, error handling, boundary conditions
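
A minimal sketch of assertion-style data validation (the column names and rules are placeholders for your own acceptance criteria):

```python
import pandas as pd

def validate_orders(df: pd.DataFrame) -> None:
    """Fail loudly if the data violates its acceptance criteria."""
    assert not df.empty, "no rows loaded"
    assert df["order_id"].is_unique, "duplicate order ids"
    assert (df["amount"] >= 0).all(), "negative amounts"
    assert df["created_at"].notna().all(), "missing timestamps"
    # Boundary condition: no timestamps from the future
    assert (pd.to_datetime(df["created_at"]) <= pd.Timestamp.now()).all()

validate_orders(pd.read_csv("orders.csv"))  # placeholder path
```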

## Guardrails

- **Clarify before scaffolding** - Don't guess what user needs; ask clarifying questions first
- **Distinguish causal vs predictive** - Causal inference needs different methods than prediction (RCT/IV vs ML)
- **Make assumptions explicit** - Every scaffold has assumptions (data distribution, user behavior, system constraints)
- **Include validation steps** - Scaffold should include checkpoints to validate work at each stage
- **Provide examples** - Show what good looks like (sample test, sample EDA visualization, sample model evaluation)
- **Surface gaps early** - If scaffold reveals missing data/requirements, flag immediately
- **Avoid premature optimization** - Start with simple scaffold, add complexity only if needed
- **Follow best practices** - TDD: test first, EDA: start with data quality, Modeling: baseline before complex models

## Quick Reference

| Task Type | When to Use | Scaffold Resource |
|-----------|-------------|-------------------|
| **TDD** | Writing/refactoring code | [resources/template.md](resources/template.md) #tdd-scaffold |
| **EDA** | Exploring new dataset | [resources/template.md](resources/template.md) #eda-scaffold |
| **Statistical Analysis** | Hypothesis testing, A/B tests | [resources/template.md](resources/template.md) #statistical-analysis-scaffold |
| **Causal Inference** | Treatment effect estimation | [resources/methodology.md](resources/methodology.md) #causal-inference-methods |
| **Predictive Modeling** | ML model building | [resources/methodology.md](resources/methodology.md) #predictive-modeling-pipeline |
| **Validation** | Quality checks before shipping | [resources/template.md](resources/template.md) #validation-scaffold |
| **Examples** | See what good looks like | [resources/examples/](resources/examples/) |
| **Rubric** | Validate scaffold quality | [resources/evaluators/rubric_code_data_analysis_scaffolds.json](resources/evaluators/rubric_code_data_analysis_scaffolds.json) |
@@ -0,0 +1,314 @@
|
||||
{
|
||||
"criteria": [
|
||||
{
|
||||
"name": "Scaffold Structure Clarity",
|
||||
"description": "Is the scaffold structure clear, systematic, and easy to follow?",
|
||||
"scoring": {
|
||||
"1": "No clear structure. Random collection of steps/checks without logical flow.",
|
||||
"2": "Basic structure but steps are vague or out of order. User confused about what to do next.",
|
||||
"3": "Clear structure with defined steps. User can follow but may need clarification on some steps.",
|
||||
"4": "Well-organized structure with clear steps, checkpoints, and expected outputs at each stage.",
|
||||
"5": "Exemplary structure: systematic, numbered steps with clear inputs/outputs, decision points explicit."
|
||||
},
|
||||
"red_flags": [
|
||||
"Steps not numbered or sequenced",
|
||||
"No clear starting/ending point",
|
||||
"Validation steps missing",
|
||||
"User must guess what to do next"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "Coverage Completeness",
|
||||
"description": "Does the scaffold cover all necessary aspects (happy path, edge cases, validation, etc.)?",
|
||||
"scoring": {
|
||||
"1": "Major gaps. Only covers happy path, ignores edge cases/errors/validation.",
|
||||
"2": "Partial coverage. Addresses main case but misses important edge cases or validation steps.",
|
||||
"3": "Adequate coverage. Main cases and some edge cases covered. Basic validation included.",
|
||||
"4": "Comprehensive coverage. Happy path, edge cases, error conditions, validation all included.",
|
||||
"5": "Exhaustive coverage. All cases, validation at each step, robustness checks, limitations documented."
|
||||
},
|
||||
"red_flags": [
|
||||
"TDD scaffold: No tests for edge cases or errors",
|
||||
"EDA scaffold: Missing data quality checks",
|
||||
"Statistical scaffold: No assumption checks",
|
||||
"Any scaffold: No validation step before delivering"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "Technical Rigor",
|
||||
"description": "Is the approach technically sound with appropriate methods/tests?",
|
||||
"scoring": {
|
||||
"1": "Technically incorrect. Wrong methods, flawed logic, or inappropriate techniques.",
|
||||
"2": "Questionable rigor. Some techniques correct but others questionable or missing justification.",
|
||||
"3": "Adequate rigor. Standard techniques applied correctly. Acceptable for routine work.",
|
||||
"4": "High rigor. Appropriate methods, assumptions checked, sensitivity analysis included.",
|
||||
"5": "Exemplary rigor. Best practices followed, multiple validation approaches, limitations acknowledged."
|
||||
},
|
||||
"red_flags": [
|
||||
"Causal inference without DAG or identification strategy",
|
||||
"Statistical test without checking assumptions",
|
||||
"ML model without train/val/test split (data leakage)",
|
||||
"TDD without testing error conditions"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "Actionability",
|
||||
"description": "Can user execute scaffold without further guidance? Are examples concrete?",
|
||||
"scoring": {
|
||||
"1": "Not actionable. Vague advice, no concrete steps, no code examples.",
|
||||
"2": "Somewhat actionable. General direction but user needs to figure out details.",
|
||||
"3": "Actionable. Clear steps with code snippets. User can execute with minor adjustments.",
|
||||
"4": "Highly actionable. Complete code examples, data assumptions stated, ready to adapt.",
|
||||
"5": "Immediately executable. Copy-paste ready examples with inline comments, expected outputs shown."
|
||||
},
|
||||
"red_flags": [
|
||||
"No code examples (just prose descriptions)",
|
||||
"Code has placeholders without explaining what to fill in",
|
||||
"No example inputs/outputs",
|
||||
"Vague instructions ('check assumptions', 'validate results' without saying how)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "Test Quality (for TDD)",
|
||||
"description": "For TDD scaffolds: Do tests cover happy path, edge cases, errors, and integration?",
|
||||
"scoring": {
|
||||
"1": "Only happy path tests. No edge cases, errors, or integration tests.",
|
||||
"2": "Happy path + some edge cases. Error handling or integration missing.",
|
||||
"3": "Happy path, edge cases, basic error tests. Integration tests may be missing.",
|
||||
"4": "Comprehensive: Happy path, edge cases, error conditions, integration tests all present.",
|
||||
"5": "Exemplary: Above + property-based tests, test fixtures, mocks for external dependencies."
|
||||
},
|
||||
"red_flags": [
|
||||
"No tests for None/empty input",
|
||||
"No tests for expected exceptions",
|
||||
"No tests for state changes/side effects",
|
||||
"No integration tests for external systems"
|
||||
],
|
||||
"applicable_to": ["TDD"]
|
||||
},
|
||||
{
|
||||
"name": "Data Quality Assessment (for EDA)",
|
||||
"description": "For EDA scaffolds: Are data quality checks (missing, duplicates, outliers, consistency) included?",
|
||||
"scoring": {
|
||||
"1": "No data quality checks. Jumps straight to analysis without inspecting data.",
|
||||
"2": "Minimal checks. Maybe checks missing values but ignores duplicates, outliers, consistency.",
|
||||
"3": "Basic quality checks. Missing values, duplicates, basic outliers checked.",
|
||||
"4": "Thorough quality checks. Missing patterns, duplicates, outliers, type consistency, referential integrity.",
|
||||
"5": "Comprehensive quality framework. All checks + distributions, cardinality, data lineage, validation rules."
|
||||
},
|
||||
"red_flags": [
|
||||
"No check for missing values",
|
||||
"No check for duplicates",
|
||||
"No outlier detection",
|
||||
"Assumes data is clean without validation"
|
||||
],
|
||||
"applicable_to": ["EDA", "Statistical Analysis", "Predictive Modeling"]
|
||||
},
|
||||
{
|
||||
"name": "Assumption Documentation",
|
||||
"description": "Are assumptions explicitly stated and justified?",
|
||||
"scoring": {
|
||||
"1": "No assumptions stated. User unaware of what's being assumed.",
|
||||
"2": "Some assumptions implicit but not documented. User must infer them.",
|
||||
"3": "Key assumptions stated but not justified or validated.",
|
||||
"4": "Assumptions explicitly stated with justification. User knows what's assumed and why.",
|
||||
"5": "Assumptions stated, justified, validated where possible, and sensitivity to violations analyzed."
|
||||
},
|
||||
"red_flags": [
|
||||
"Statistical test applied without stating/checking assumptions",
|
||||
"Causal claim without stating identification assumptions",
|
||||
"ML model without documenting train/test split assumptions",
|
||||
"Function implementation without stating preconditions"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "Validation Steps Included",
|
||||
"description": "Does scaffold include validation/quality checks before delivering results?",
|
||||
"scoring": {
|
||||
"1": "No validation. Results delivered without any quality checks.",
|
||||
"2": "Informal validation. 'Looks good' without systematic checks.",
|
||||
"3": "Basic validation. Some checks but not comprehensive or systematic.",
|
||||
"4": "Systematic validation. Checklist of quality criteria, most items checked.",
|
||||
"5": "Rigorous validation framework. Multiple validation approaches, robustness checks, edge cases tested."
|
||||
},
|
||||
"red_flags": [
|
||||
"No validation step in workflow",
|
||||
"No rubric or checklist to assess quality",
|
||||
"No test suite execution before delivering code",
|
||||
"No sensitivity analysis for statistical results"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "Code/Analysis Quality",
|
||||
"description": "Is code well-structured, readable, and following best practices?",
|
||||
"scoring": {
|
||||
"1": "Poor quality. Spaghetti code, no structure, hard to understand.",
|
||||
"2": "Low quality. Works but hard to read, poor naming, no comments.",
|
||||
"3": "Adequate quality. Readable, basic structure, some comments. Acceptable for prototypes.",
|
||||
"4": "Good quality. Clean code, good naming, appropriate comments, follows style guide.",
|
||||
"5": "Excellent quality. Modular, DRY, well-documented, type hints, follows SOLID principles."
|
||||
},
|
||||
"red_flags": [
|
||||
"Magic numbers without explanation",
|
||||
"Copy-pasted code (not DRY)",
|
||||
"Functions doing multiple unrelated things",
|
||||
"No docstrings or comments explaining complex logic"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "Reproducibility",
|
||||
"description": "Can another person reproduce the analysis/tests with provided information?",
|
||||
"scoring": {
|
||||
"1": "Not reproducible. Missing critical information (data, packages, random seeds).",
|
||||
"2": "Partially reproducible. Some information provided but key details missing.",
|
||||
"3": "Mostly reproducible. Enough information for skilled practitioner to reproduce with effort.",
|
||||
"4": "Reproducible. All information provided (data access, package versions, random seeds, parameters).",
|
||||
"5": "Fully reproducible. Documented environment, requirements.txt, Docker container, or notebook with all steps."
|
||||
},
|
||||
"red_flags": [
|
||||
"No package versions specified",
|
||||
"Random operations without setting seed",
|
||||
"Data source not documented or inaccessible",
|
||||
"No instructions for running tests/analysis"
|
||||
]
|
||||
}
|
||||
],
|
||||
"task_type_guidance": {
|
||||
"TDD": {
|
||||
"description": "Test-Driven Development scaffolds",
|
||||
"focus_criteria": [
|
||||
"Test Quality",
|
||||
"Code/Analysis Quality",
|
||||
"Validation Steps Included"
|
||||
],
|
||||
"target_score": 3.5,
|
||||
"success_indicators": [
|
||||
"Tests written before implementation",
|
||||
"Happy path, edge cases, errors all tested",
|
||||
"Tests pass and are maintainable",
|
||||
"Red-Green-Refactor cycle followed"
|
||||
]
|
||||
},
|
||||
"EDA": {
|
||||
"description": "Exploratory Data Analysis scaffolds",
|
||||
"focus_criteria": [
|
||||
"Data Quality Assessment",
|
||||
"Coverage Completeness",
|
||||
"Assumption Documentation"
|
||||
],
|
||||
"target_score": 3.5,
|
||||
"success_indicators": [
|
||||
"Data quality systematically checked",
|
||||
"Univariate and bivariate analysis completed",
|
||||
"Insights and recommendations documented",
|
||||
"Missing values, outliers, distributions analyzed"
|
||||
]
|
||||
},
|
||||
"Statistical Analysis": {
|
||||
"description": "Hypothesis testing, A/B tests, causal inference",
|
||||
"focus_criteria": [
|
||||
"Technical Rigor",
|
||||
"Assumption Documentation",
|
||||
"Validation Steps Included"
|
||||
],
|
||||
"target_score": 4.0,
|
||||
"success_indicators": [
|
||||
"Hypotheses clearly stated",
|
||||
"Appropriate test selected and justified",
|
||||
"Assumptions checked (normality, independence, etc.)",
|
||||
"Effect sizes and confidence intervals reported",
|
||||
"Sensitivity analysis performed"
|
||||
]
|
||||
},
|
||||
"Predictive Modeling": {
|
||||
"description": "ML model building and evaluation",
|
||||
"focus_criteria": [
|
||||
"Technical Rigor",
|
||||
"Validation Steps Included",
|
||||
"Reproducibility"
|
||||
],
|
||||
"target_score": 4.0,
|
||||
"success_indicators": [
|
||||
"Train/val/test split before preprocessing (no data leakage)",
|
||||
"Baseline model for comparison",
|
||||
"Cross-validation performed",
|
||||
"Error analysis and feature importance computed",
|
||||
"Model deployment checklist completed"
|
||||
]
|
||||
},
|
||||
"Validation": {
|
||||
"description": "Data/code/model quality checks",
|
||||
"focus_criteria": [
|
||||
"Coverage Completeness",
|
||||
"Validation Steps Included",
|
||||
"Technical Rigor"
|
||||
],
|
||||
"target_score": 4.0,
|
||||
"success_indicators": [
|
||||
"Schema validation (types, ranges, constraints)",
|
||||
"Referential integrity checked",
|
||||
"Edge cases tested",
|
||||
"Monitoring/alerting strategy defined"
|
||||
]
|
||||
}
|
||||
},
|
||||
"common_failure_modes": [
|
||||
{
|
||||
"failure_mode": "Jumping to Implementation Without Scaffold",
|
||||
"symptoms": "User writes code/analysis immediately without planning structure first.",
|
||||
"consequences": "Missing edge cases, poor test coverage, incomplete analysis.",
|
||||
"fix": "Force scaffold creation before implementation. Use template as checklist."
|
||||
},
|
||||
{
|
||||
"failure_mode": "Testing Only Happy Path",
|
||||
"symptoms": "TDD scaffold has tests for expected usage but none for errors/edge cases.",
|
||||
"consequences": "Code breaks in production on unexpected inputs.",
|
||||
"fix": "Require tests for: empty input, None, boundary values, invalid types, expected exceptions."
|
||||
},
|
||||
{
|
||||
"failure_mode": "Skipping Data Quality Checks",
|
||||
"symptoms": "EDA scaffold jumps to visualization without checking missing values, outliers, duplicates.",
|
||||
"consequences": "Invalid conclusions based on dirty data.",
|
||||
"fix": "Mandatory data quality section before any analysis. No exceptions."
|
||||
},
|
||||
{
|
||||
"failure_mode": "Assumptions Not Documented",
|
||||
"symptoms": "Statistical test applied without stating/checking assumptions (normality, independence, etc.).",
|
||||
"consequences": "Invalid statistical inference. Wrong conclusions.",
|
||||
"fix": "Explicit assumption section in scaffold. Check assumptions before applying test."
|
||||
},
|
||||
{
|
||||
"failure_mode": "No Validation Step",
|
||||
"symptoms": "Scaffold delivers results without any quality check or self-assessment.",
|
||||
"consequences": "Low-quality outputs, errors not caught.",
|
||||
"fix": "Mandatory validation step in workflow. Use rubric self-assessment."
|
||||
},
|
||||
{
|
||||
"failure_mode": "Correlation Interpreted as Causation",
|
||||
"symptoms": "EDA finds correlation, claims causal relationship without causal inference methods.",
|
||||
"consequences": "Wrong business decisions based on spurious causality.",
|
||||
"fix": "Distinguish predictive (correlation) from causal questions. Use causal inference methodology if claiming causation."
|
||||
},
|
||||
{
|
||||
"failure_mode": "Data Leakage in ML",
|
||||
"symptoms": "Preprocessing (scaling, imputation) done before train/test split.",
|
||||
"consequences": "Overly optimistic model performance. Fails in production.",
|
||||
"fix": "Scaffold enforces: split first, then preprocess. Fit transformers on train only."
|
||||
},
|
||||
{
|
||||
"failure_mode": "Code Without Tests",
|
||||
"symptoms": "Implementation provided but no test scaffold or test execution.",
|
||||
"consequences": "Regressions not caught, bugs in production.",
|
||||
"fix": "TDD scaffold mandatory for production code. Tests must pass before code review."
|
||||
}
|
||||
],
|
||||
"scale": 5,
|
||||
"minimum_average_score": 3.5,
|
||||
"interpretation": {
|
||||
"1.0-2.0": "Inadequate. Major gaps in structure, coverage, or rigor. Do not use. Revise scaffold.",
|
||||
"2.0-3.0": "Needs improvement. Basic structure present but incomplete or lacks rigor. Acceptable for learning/practice only.",
|
||||
"3.0-3.5": "Acceptable. Covers main cases with adequate rigor. Suitable for routine work or prototypes.",
|
||||
"3.5-4.0": "Good. Comprehensive coverage with good rigor. Suitable for production code/analysis.",
|
||||
"4.0-5.0": "Excellent. Exemplary structure, rigor, and completeness. Production-ready with best practices."
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,272 @@
|
||||
# EDA Example: Customer Churn Analysis
|
||||
|
||||
Complete exploratory data analysis for telecom customer churn dataset.
|
||||
|
||||
## Task
|
||||
|
||||
Explore customer churn dataset to understand:
|
||||
- What factors correlate with churn?
|
||||
- Are there data quality issues?
|
||||
- What features should we engineer for predictive model?
|
||||
|
||||
## Dataset
|
||||
|
||||
- **Rows**: 7,043 customers
|
||||
- **Target**: `Churn` (Yes/No)
|
||||
- **Features**: 20 columns (demographics, account info, usage patterns)
|
||||
|
||||
## EDA Scaffold Applied
|
||||
|
||||
### 1. Data Overview
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
import matplotlib.pyplot as plt
|
||||
import seaborn as sns
|
||||
|
||||
df = pd.read_csv('telecom_churn.csv')
|
||||
|
||||
print(f"Shape: {df.shape}")
|
||||
# Output: (7043, 21)
|
||||
|
||||
print(f"Columns: {df.columns.tolist()}")
|
||||
# ['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
|
||||
# 'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
|
||||
# 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
|
||||
# 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
|
||||
# 'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn']
|
||||
|
||||
print(df.dtypes)
|
||||
# customerID object
|
||||
# gender object
|
||||
# SeniorCitizen int64
|
||||
# tenure int64
|
||||
# MonthlyCharges float64
|
||||
# TotalCharges object ← Should be numeric!
|
||||
# Churn object
|
||||
|
||||
print(df.head())
|
||||
print(df.describe())
|
||||
```
|
||||
|
||||
**Findings**:
|
||||
- TotalCharges is object type (should be numeric) - needs fixing
|
||||
- Churn is target variable (26.5% churn rate)
|
||||
|
||||
### 2. Data Quality Checks
|
||||
|
||||
```python
|
||||
# Missing values
|
||||
missing = df.isnull().sum()
|
||||
missing_pct = (missing / len(df)) * 100
|
||||
print(missing_pct[missing_pct > 0])
|
||||
# No missing values marked as NaN
|
||||
|
||||
# But TotalCharges is object - check for empty strings
|
||||
print((df['TotalCharges'] == ' ').sum())
|
||||
# Output: 11 rows have space instead of number
|
||||
|
||||
# Fix: Convert TotalCharges to numeric
|
||||
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
|
||||
print(df['TotalCharges'].isnull().sum())
|
||||
# Output: 11 (now properly marked as missing)
|
||||
|
||||
# Strategy: Drop 11 rows (< 0.2% of data)
|
||||
df = df.dropna()
|
||||
|
||||
# Duplicates
|
||||
print(f"Duplicates: {df.duplicated().sum()}")
|
||||
# Output: 0
|
||||
|
||||
# Data consistency checks
|
||||
print("Tenure vs TotalCharges consistency:")
|
||||
print(df[['tenure', 'MonthlyCharges', 'TotalCharges']].head())
|
||||
# tenure=1, Monthly=$29, Total=$29 ✓
|
||||
# tenure=34, Monthly=$57, Total=$1889 ≈ $57*34 ✓
|
||||
```
|
||||
|
||||
**Findings**:
|
||||
- 11 rows (0.16%) with missing TotalCharges - dropped
|
||||
- No duplicates
|
||||
- TotalCharges ≈ MonthlyCharges × tenure (consistent)
|
||||
|
||||
### 3. Univariate Analysis
|
||||
|
||||
```python
|
||||
# Target variable
|
||||
print(df['Churn'].value_counts(normalize=True))
|
||||
# No 73.5%
|
||||
# Yes 26.5%
|
||||
|
||||
# Imbalanced but not severely (>20% minority class is workable)
|
||||
|
||||
# Numeric variables
|
||||
numeric_cols = ['tenure', 'MonthlyCharges', 'TotalCharges']
|
||||
for col in numeric_cols:
|
||||
print(f"\n{col}:")
|
||||
print(f" Mean: {df[col].mean():.2f}, Median: {df[col].median():.2f}")
|
||||
print(f" Std: {df[col].std():.2f}, Range: [{df[col].min()}, {df[col].max()}]")
|
||||
|
||||
# Histogram
|
||||
df[col].hist(bins=50, edgecolor='black')
|
||||
plt.title(f'{col} Distribution')
|
||||
plt.xlabel(col)
|
||||
plt.show()
|
||||
|
||||
# Check outliers
|
||||
Q1, Q3 = df[col].quantile([0.25, 0.75])
|
||||
IQR = Q3 - Q1
|
||||
outliers = ((df[col] < (Q1 - 1.5*IQR)) | (df[col] > (Q3 + 1.5*IQR))).sum()
|
||||
print(f" Outliers: {outliers} ({outliers/len(df)*100:.1f}%)")
|
||||
```
|
||||
|
||||
**Findings**:
|
||||
- **tenure**: Right-skewed (mean=32, median=29). Many new customers (0-12 months).
|
||||
- **MonthlyCharges**: Bimodal distribution (peaks at ~$20 and ~$80). Suggests customer segments.
|
||||
- **TotalCharges**: Right-skewed (correlated with tenure). Few outliers (2.3%).
|
||||
|
||||
```python
|
||||
# Categorical variables
|
||||
cat_cols = ['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'Contract', 'PaymentMethod']
|
||||
for col in cat_cols:
|
||||
print(f"\n{col}: {df[col].nunique()} unique values")
|
||||
print(df[col].value_counts())
|
||||
|
||||
# Bar plot
|
||||
df[col].value_counts().plot(kind='bar')
|
||||
plt.title(f'{col} Distribution')
|
||||
plt.xticks(rotation=45)
|
||||
plt.show()
|
||||
```
|
||||
|
||||
**Findings**:
|
||||
- **gender**: Balanced (50/50 male/female)
|
||||
- **SeniorCitizen**: 16% are senior citizens
|
||||
- **Contract**: 55% month-to-month, 24% one-year, 21% two-year
|
||||
- **PaymentMethod**: Electronic check most common (34%)
|
||||
|
||||
### 4. Bivariate Analysis (Churn vs Features)
|
||||
|
||||
```python
|
||||
# Churn rate by categorical variables
|
||||
for col in cat_cols:
|
||||
churn_rate = df.groupby(col)['Churn'].apply(lambda x: (x=='Yes').mean())
|
||||
print(f"\n{col} vs Churn:")
|
||||
print(churn_rate.sort_values(ascending=False))
|
||||
|
||||
# Stacked bar chart
|
||||
pd.crosstab(df[col], df['Churn'], normalize='index').plot(kind='bar', stacked=True)
|
||||
plt.title(f'Churn Rate by {col}')
|
||||
plt.ylabel('Proportion')
|
||||
plt.show()
|
||||
```
|
||||
|
||||
**Key Findings**:
|
||||
- **Contract**: Month-to-month churn=42.7%, One-year=11.3%, Two-year=2.8% (Strong signal!)
|
||||
- **SeniorCitizen**: Seniors churn=41.7%, Non-seniors=23.6%
|
||||
- **PaymentMethod**: Electronic check=45.3% churn, others~15-18%
|
||||
- **tenure**: Customers with tenure<12 months churn=47.5%, >60 months=7.9%
|
||||
|
||||
```python
|
||||
# Numeric variables vs Churn
|
||||
for col in numeric_cols:
|
||||
plt.figure(figsize=(10, 4))
|
||||
|
||||
# Box plot
|
||||
plt.subplot(1, 2, 1)
|
||||
df.boxplot(column=col, by='Churn', ax=plt.gca())  # ax= keeps the plot in this subplot
|
||||
plt.title(f'{col} by Churn')
|
||||
|
||||
# Histogram (overlay)
|
||||
plt.subplot(1, 2, 2)
|
||||
df[df['Churn']=='No'][col].hist(bins=30, alpha=0.5, label='No Churn', density=True)
|
||||
df[df['Churn']=='Yes'][col].hist(bins=30, alpha=0.5, label='Churn', density=True)
|
||||
plt.legend()
|
||||
plt.xlabel(col)
|
||||
plt.title(f'{col} Distribution by Churn')
|
||||
plt.show()
|
||||
```
|
||||
|
||||
**Key Findings**:
|
||||
- **tenure**: Churned customers have lower tenure (mean=18 vs 38 months)
|
||||
- **MonthlyCharges**: Churned customers pay MORE ($74 vs $61/month)
|
||||
- **TotalCharges**: Churned customers have lower total (correlated with tenure)
|
||||
|
||||
```python
|
||||
# Correlation matrix
|
||||
numeric_df = df[['tenure', 'MonthlyCharges', 'TotalCharges', 'SeniorCitizen']].copy()
|
||||
numeric_df['Churn_binary'] = (df['Churn'] == 'Yes').astype(int)
|
||||
|
||||
corr = numeric_df.corr()
|
||||
plt.figure(figsize=(8, 6))
|
||||
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)
|
||||
plt.title('Correlation Matrix')
|
||||
plt.show()
|
||||
```
|
||||
|
||||
**Key Findings**:
|
||||
- tenure ↔ TotalCharges: 0.83 (strong positive correlation - expected)
|
||||
- Churn ↔ tenure: -0.35 (negative: longer tenure → less churn)
|
||||
- Churn ↔ MonthlyCharges: +0.19 (positive: higher charges → more churn)
|
||||
- Churn ↔ TotalCharges: -0.20 (negative: driven by tenure)
|
||||
|
||||
### 5. Insights & Recommendations
|
||||
|
||||
```python
|
||||
print("\n=== KEY FINDINGS ===")
|
||||
print("1. Data Quality:")
|
||||
print(" - 11 rows (<0.2%) dropped due to missing TotalCharges")
|
||||
print(" - No other quality issues. Data is clean.")
|
||||
print("")
|
||||
print("2. Churn Patterns:")
|
||||
print(" - Overall churn rate: 26.5% (slightly imbalanced)")
|
||||
print(" - Strongest predictor: Contract type (month-to-month 42.7% vs two-year 2.8%)")
|
||||
print(" - High-risk segment: New customers (<12mo tenure) with high monthly charges")
|
||||
print(" - Low churn: Long-term customers (>60mo) on two-year contracts")
|
||||
print("")
|
||||
print("3. Feature Importance:")
|
||||
print(" - **High signal**: Contract, tenure, PaymentMethod, SeniorCitizen")
|
||||
print(" - **Medium signal**: MonthlyCharges, InternetService")
|
||||
print(" - **Low signal**: gender, PhoneService (balanced across churn/no-churn)")
|
||||
print("")
|
||||
print("\n=== RECOMMENDED ACTIONS ===")
|
||||
print("1. Feature Engineering:")
|
||||
print(" - Create 'tenure_bucket' (0-12mo, 12-24mo, 24-60mo, >60mo)")
|
||||
print(" - Create 'high_charges' flag (MonthlyCharges > $70)")
|
||||
print(" - Interaction: tenure × Contract (captures switching cost)")
|
||||
print(" - Payment risk score (Electronic check is risky)")
|
||||
print("")
|
||||
print("2. Model Strategy:")
|
||||
print(" - Use all categorical features (one-hot encode)")
|
||||
print(" - Baseline: Predict churn for month-to-month + new customers")
|
||||
print(" - Advanced: Random Forest or Gradient Boosting (handle interactions)")
|
||||
print(" - Validate with stratified 5-fold CV (preserve 26.5% churn rate)")
|
||||
print("")
|
||||
print("3. Business Insights:")
|
||||
print(" - **Retention program**: Target month-to-month customers < 12mo tenure")
|
||||
print(" - **Contract incentives**: Offer discounts for one/two-year contracts")
|
||||
print(" - **Payment method**: Encourage auto-pay (reduce electronic check)")
|
||||
print(" - **Early warning**: Monitor customers with high MonthlyCharges + short tenure")
|
||||
```
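A minimal sketch of the recommended feature engineering (the bucket edges and the $70 threshold come from the findings above; the exact `Contract` label strings are assumed to match the raw data):

```python
# Feature engineering sketch based on the recommendations above
df['tenure_bucket'] = pd.cut(df['tenure'], bins=[0, 12, 24, 60, np.inf],
                             labels=['0-12mo', '12-24mo', '24-60mo', '>60mo'],
                             include_lowest=True)
df['high_charges'] = (df['MonthlyCharges'] > 70).astype(int)
df['high_risk_payment'] = (df['PaymentMethod'] == 'Electronic check').astype(int)
# Interaction proxy for switching cost: tenure weighted by contract commitment
df['tenure_x_contract'] = df['tenure'] * df['Contract'].map(
    {'Month-to-month': 0, 'One year': 1, 'Two year': 2})
```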
|
||||
|
||||
### 6. Self-Assessment
|
||||
|
||||
Using rubric:
|
||||
|
||||
- **Clarity** (5/5): Systematic exploration, clear findings at each stage
|
||||
- **Completeness** (5/5): Data quality, univariate, bivariate, insights all covered
|
||||
- **Rigor** (5/5): Proper statistical analysis, visualizations, quantified relationships
|
||||
- **Actionability** (5/5): Specific feature engineering and business recommendations
|
||||
|
||||
**Average**: 5.0/5 ✓
|
||||
|
||||
This EDA provides a solid foundation for predictive modeling and business action.
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Feature engineering**: Implement recommended features
|
||||
2. **Baseline model**: Logistic regression with top 5 features
|
||||
3. **Advanced models**: Random Forest, XGBoost with feature interactions
|
||||
4. **Evaluation**: F1-score, precision/recall curves, AUC-ROC
|
||||
5. **Deployment**: Real-time churn scoring API
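A minimal sketch of step 2, assuming the engineered features sketched above exist as numeric columns:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score

features = ['tenure', 'MonthlyCharges', 'high_charges',
            'high_risk_payment', 'tenure_x_contract']  # illustrative top-5 set
X, y = df[features], (df['Churn'] == 'Yes').astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"F1:  {f1_score(y_te, model.predict(X_te)):.3f}")
print(f"AUC: {roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]):.3f}")
```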
|
||||
@@ -0,0 +1,226 @@
|
||||
# TDD Example: User Authentication
|
||||
|
||||
Complete TDD example showing test-first development for authentication function.
|
||||
|
||||
## Task
|
||||
|
||||
Build a `validate_login(username, password)` function that:
|
||||
- Returns `True` for valid credentials
|
||||
- Returns `False` for invalid password
|
||||
- Raises `ValueError` for missing username/password
|
||||
- Raises `UserNotFoundError` for nonexistent users
|
||||
- Logs failed attempts
|
||||
|
||||
## Step 1: Write Tests FIRST
|
||||
|
||||
```python
|
||||
# test_auth.py
|
||||
import pytest
|
||||
from auth import validate_login, UserNotFoundError, hash_password
|
||||
|
||||
# HAPPY PATH
|
||||
def test_valid_credentials():
|
||||
"""User with correct password should authenticate"""
|
||||
assert validate_login("alice@example.com", "SecurePass123!") == True
|
||||
|
||||
# EDGE CASES
|
||||
def test_empty_username():
|
||||
"""Empty username should raise ValueError"""
|
||||
with pytest.raises(ValueError, match="Username required"):
|
||||
validate_login("", "password")
|
||||
|
||||
def test_empty_password():
|
||||
"""Empty password should raise ValueError"""
|
||||
with pytest.raises(ValueError, match="Password required"):
|
||||
validate_login("alice@example.com", "")
|
||||
|
||||
def test_none_credentials():
|
||||
"""None values should raise ValueError"""
|
||||
with pytest.raises(ValueError):
|
||||
validate_login(None, None)
|
||||
|
||||
# ERROR CONDITIONS
|
||||
def test_invalid_password():
|
||||
"""Wrong password should return False"""
|
||||
assert validate_login("alice@example.com", "WrongPassword") == False
|
||||
|
||||
def test_nonexistent_user():
|
||||
"""User not in database should raise UserNotFoundError"""
|
||||
with pytest.raises(UserNotFoundError):
|
||||
validate_login("nobody@example.com", "anypassword")
|
||||
|
||||
def test_case_sensitive_password():
|
||||
"""Password check should be case-sensitive"""
|
||||
assert validate_login("alice@example.com", "securepass123!") == False
|
||||
|
||||
# STATE/SIDE EFFECTS
|
||||
def test_failed_attempt_logged(caplog):
|
||||
"""Failed login should be logged"""
|
||||
validate_login("alice@example.com", "WrongPassword")
|
||||
assert "Failed login attempt" in caplog.text
|
||||
assert "alice@example.com" in caplog.text
|
||||
|
||||
def test_successful_login_logged(caplog):
|
||||
"""Successful login should be logged"""
|
||||
validate_login("alice@example.com", "SecurePass123!")
|
||||
assert "Successful login" in caplog.text
|
||||
|
||||
# INTEGRATION TEST
|
||||
@pytest.fixture
|
||||
def mock_database():
|
||||
"""Mock database with test users"""
|
||||
return {
|
||||
"alice@example.com": {
|
||||
"password_hash": "hashed_SecurePass123!",
|
||||
"salt": "random_salt_123"
|
||||
}
|
||||
}
|
||||
|
||||
def test_database_integration(mock_database, monkeypatch):
|
||||
"""Function should query database correctly"""
|
||||
def mock_get_user(username):
|
||||
return mock_database.get(username)
|
||||
|
||||
monkeypatch.setattr("auth.get_user_from_db", mock_get_user)
|
||||
result = validate_login("alice@example.com", "SecurePass123!")
|
||||
assert result == True
|
||||
```
|
||||
|
||||
## Step 2: Run Tests (They Should FAIL - Red)
|
||||
|
||||
```bash
|
||||
$ pytest test_auth.py
|
||||
ERROR test_auth.py - ModuleNotFoundError: No module named 'auth'
|
||||
```
|
||||
|
||||
## Step 3: Write Minimal Implementation (Green)
|
||||
|
||||
```python
|
||||
# auth.py
|
||||
import logging
|
||||
import hashlib
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
class UserNotFoundError(Exception):
|
||||
pass
|
||||
|
||||
def validate_login(username, password):
|
||||
# Input validation
|
||||
if not username:
|
||||
raise ValueError("Username required")
|
||||
if not password:
|
||||
raise ValueError("Password required")
|
||||
|
||||
# Get user from database
|
||||
user = get_user_from_db(username)
|
||||
if user is None:
|
||||
raise UserNotFoundError(f"User {username} not found")
|
||||
|
||||
# Hash password and compare
|
||||
password_hash = hash_password(password, user['salt'])
|
||||
is_valid = (password_hash == user['password_hash'])
|
||||
|
||||
# Log attempt
|
||||
if is_valid:
|
||||
logger.info(f"Successful login for {username}")
|
||||
else:
|
||||
logger.warning(f"Failed login attempt for {username}")
|
||||
|
||||
return is_valid
|
||||
|
||||
def get_user_from_db(username):
|
||||
# Stub - implement database query
|
||||
users = {
|
||||
"alice@example.com": {
|
||||
"password_hash": hash_password("SecurePass123!", "random_salt_123"),
|
||||
"salt": "random_salt_123"
|
||||
}
|
||||
}
|
||||
return users.get(username)
|
||||
|
||||
def hash_password(password, salt):
|
||||
# Simplified - use bcrypt/argon2 in production
|
||||
return hashlib.sha256(f"{password}{salt}".encode()).hexdigest()
|
||||
```
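As the comment in `hash_password` notes, the sha256 stub is for illustration only. A minimal production-style sketch with `bcrypt` (assumed installed via `pip install bcrypt`; note that bcrypt generates and embeds its own salt, so the explicit salt parameter disappears):

```python
import bcrypt

def hash_password(password: str) -> bytes:
    # bcrypt is a slow, salted hash designed for passwords
    return bcrypt.hashpw(password.encode(), bcrypt.gensalt())

def check_password(password: str, stored_hash: bytes) -> bool:
    # Constant-time comparison against the stored hash
    return bcrypt.checkpw(password.encode(), stored_hash)
```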
|
||||
|
||||
## Step 4: Run Tests Again (Should PASS - Green)
|
||||
|
||||
```bash
|
||||
$ pytest test_auth.py -v
|
||||
test_valid_credentials PASSED
|
||||
test_empty_username PASSED
|
||||
test_empty_password PASSED
|
||||
test_none_credentials PASSED
|
||||
test_invalid_password PASSED
|
||||
test_nonexistent_user PASSED
|
||||
test_case_sensitive_password PASSED
|
||||
test_failed_attempt_logged PASSED
|
||||
test_successful_login_logged PASSED
|
||||
test_database_integration PASSED
|
||||
|
||||
========== 10 passed in 0.15s ==========
|
||||
```
|
||||
|
||||
## Step 5: Refactor (Keep Tests Green)
|
||||
|
||||
```python
|
||||
# auth.py (refactored for readability)
|
||||
class AuthenticationService:
|
||||
def __init__(self, user_repo, password_hasher):
|
||||
self.user_repo = user_repo
|
||||
self.password_hasher = password_hasher
|
||||
self.logger = logging.getLogger(__name__)
|
||||
|
||||
def validate_login(self, username, password):
|
||||
self._validate_inputs(username, password)
|
||||
user = self._get_user(username)
|
||||
is_valid = self._check_password(password, user)
|
||||
self._log_attempt(username, is_valid)
|
||||
return is_valid
|
||||
|
||||
def _validate_inputs(self, username, password):
|
||||
if not username:
|
||||
raise ValueError("Username required")
|
||||
if not password:
|
||||
raise ValueError("Password required")
|
||||
|
||||
def _get_user(self, username):
|
||||
user = self.user_repo.get_by_username(username)
|
||||
if user is None:
|
||||
raise UserNotFoundError(f"User {username} not found")
|
||||
return user
|
||||
|
||||
def _check_password(self, password, user):
|
||||
password_hash = self.password_hasher.hash(password, user.salt)
|
||||
return password_hash == user.password_hash
|
||||
|
||||
def _log_attempt(self, username, is_valid):
|
||||
if is_valid:
|
||||
self.logger.info(f"Successful login for {username}")
|
||||
else:
|
||||
self.logger.warning(f"Failed login attempt for {username}")
|
||||
```
|
||||
|
||||
Tests still pass after refactoring (keep a thin module-level `validate_login` wrapper that delegates to the service, so the original API the tests call is preserved).
|
||||
|
||||
## Key Takeaways
|
||||
|
||||
1. **Tests written FIRST** define expected behavior
|
||||
2. **Minimal implementation** to make tests pass
|
||||
3. **Refactor** with confidence (tests catch regressions)
|
||||
4. **Comprehensive coverage**: happy path, edge cases, errors, side effects, integration
|
||||
5. **Fast feedback**: Know immediately if something breaks
|
||||
|
||||
## Self-Assessment
|
||||
|
||||
Using rubric:
|
||||
|
||||
- **Clarity** (5/5): Requirements clearly defined by tests
|
||||
- **Completeness** (5/5): All cases covered (happy, edge, error, integration)
|
||||
- **Rigor** (5/5): TDD cycle followed (Red → Green → Refactor)
|
||||
- **Actionability** (5/5): Tests are executable specification
|
||||
|
||||
**Average**: 5.0/5 ✓
|
||||
|
||||
This is production-ready test-first code.
|
||||
470
skills/code-data-analysis-scaffolds/resources/methodology.md
Normal file
@@ -0,0 +1,470 @@
|
||||
# Code Data Analysis Scaffolds Methodology
|
||||
|
||||
Advanced techniques for causal inference, predictive modeling, property-based testing, and complex data analysis.
|
||||
|
||||
## Workflow
|
||||
|
||||
Copy this checklist and track your progress:
|
||||
|
||||
```
|
||||
Code Data Analysis Scaffolds Progress:
|
||||
- [ ] Step 1: Clarify task and objectives
|
||||
- [ ] Step 2: Choose appropriate scaffold type
|
||||
- [ ] Step 3: Generate scaffold structure
|
||||
- [ ] Step 4: Validate scaffold completeness
|
||||
- [ ] Step 5: Deliver scaffold and guide execution
|
||||
```
|
||||
|
||||
**Step 1: Clarify task** - Assess complexity and determine if advanced techniques needed. See [1. When to Use Advanced Methods](#1-when-to-use-advanced-methods).
|
||||
|
||||
**Step 2: Choose scaffold** - Select from Causal Inference, Predictive Modeling, Property-Based Testing, or Advanced EDA. See specific sections below.
|
||||
|
||||
**Step 3: Generate structure** - Apply advanced scaffold matching task complexity. See [2. Causal Inference Methods](#2-causal-inference-methods), [3. Predictive Modeling Pipeline](#3-predictive-modeling-pipeline), [4. Property-Based Testing](#4-property-based-testing), [5. Advanced EDA Techniques](#5-advanced-eda-techniques).
|
||||
|
||||
**Step 4: Validate** - Check assumptions, sensitivity analysis, robustness checks using [6. Advanced Validation Patterns](#6-advanced-validation-patterns).
|
||||
|
||||
**Step 5: Deliver** - Present with caveats, limitations, and recommendations for further analysis.
|
||||
|
||||
## 1. When to Use Advanced Methods
|
||||
|
||||
| Task Characteristic | Standard Template | Advanced Methodology |
|
||||
|---------------------|-------------------|---------------------|
|
||||
| **Causal question** | "Does X correlate with Y?" | "Does X cause Y?" → Causal inference needed |
|
||||
| **Sample size** | < 1000 rows | > 10K rows with complex patterns |
|
||||
| **Model complexity** | Linear/logistic regression | Ensemble methods, neural nets, feature interactions |
|
||||
| **Test sophistication** | Unit tests, integration tests | Property-based tests, mutation testing, fuzz testing |
|
||||
| **Data complexity** | Clean, tabular data | Multi-modal, high-dimensional, unstructured data |
|
||||
| **Stakes** | Low (exploratory) | High (production ML, regulatory compliance) |
|
||||
|
||||
## 2. Causal Inference Methods
|
||||
|
||||
Use when research question is "Does X **cause** Y?" not just "Are X and Y correlated?"
|
||||
|
||||
### Causal Inference Scaffold
|
||||
|
||||
```python
|
||||
# CAUSAL INFERENCE SCAFFOLD
|
||||
|
||||
# 1. DRAW CAUSAL DAG (Directed Acyclic Graph)
|
||||
# Explicitly model: Treatment → Outcome, Confounders → Treatment & Outcome
|
||||
#
|
||||
# Example:
|
||||
# Education → Income
|
||||
# ↑ ↑
|
||||
# Family Background
|
||||
#
|
||||
# Treatment: Education
|
||||
# Outcome: Income
|
||||
# Confounder: Family Background (affects both education and income)
|
||||
|
||||
# 2. IDENTIFY CONFOUNDERS
|
||||
confounders = ['age', 'gender', 'family_income', 'region']
|
||||
# These variables affect BOTH treatment and outcome
|
||||
# If not controlled, they bias causal estimate
|
||||
|
||||
# 3. CHECK IDENTIFICATION ASSUMPTIONS
|
||||
# For causal effect to be identifiable:
|
||||
# a) No unmeasured confounders (all variables in DAG observed)
|
||||
# b) Treatment assignment as-if random conditional on confounders
|
||||
# c) Positivity: Every unit has nonzero probability of treatment/control
|
||||
|
||||
# 4. CHOOSE IDENTIFICATION STRATEGY
|
||||
|
||||
# Option A: RCT - Random assignment eliminates confounding. Check balance on confounders.
|
||||
from scipy import stats
treatment_group, control_group = df[df['treatment'] == 1], df[df['treatment'] == 0]
|
||||
for var in confounders:
|
||||
_, p = stats.ttest_ind(treatment_group[var], control_group[var])
|
||||
print(f"{var}: {'✓' if p > 0.05 else '✗'} balanced")
|
||||
|
||||
# Option B: Regression - Control for confounders. Assumes no unmeasured confounding.
|
||||
import statsmodels.formula.api as smf
|
||||
model = smf.ols('outcome ~ treatment + age + gender + family_income', data=df).fit()
|
||||
treatment_effect = model.params['treatment']
|
||||
|
||||
# Option C: Propensity Score Matching - Match treated to similar controls on P(treatment|X).
|
||||
from sklearn.linear_model import LogisticRegression; from sklearn.neighbors import NearestNeighbors
|
||||
ps_model = LogisticRegression().fit(df[confounders], df['treatment'])
|
||||
df['ps'] = ps_model.predict_proba(df[confounders])[:,1]
|
||||
treated, control = df[df['treatment']==1], df[df['treatment']==0]
|
||||
nn = NearestNeighbors(n_neighbors=1).fit(control[['ps']])
|
||||
_, indices = nn.kneighbors(treated[['ps']])
|
||||
treatment_effect = treated['outcome'].mean() - control.iloc[indices.flatten()]['outcome'].mean()
|
||||
|
||||
# Option D: IV - Need instrument Z: affects treatment, not outcome (except through treatment).
|
||||
from statsmodels.sandbox.regression.gmm import IV2SLS
|
||||
iv_model = IV2SLS(df['income'], df[['education'] + confounders], df[['instrument'] + confounders]).fit()
|
||||
|
||||
# Option E: RDD - Treatment assigned at cutoff. Compare units just above/below threshold.
|
||||
df['above_cutoff'] = (df['running_var'] >= cutoff).astype(int)
|
||||
# Use local linear regression around cutoff to estimate effect
|
||||
|
||||
# Option F: DiD - Compare treatment vs control, before vs after. Assumes parallel trends.
|
||||
t_before, t_after = df[(df['group']=='T') & (df['time']=='before')]['y'].mean(), df[(df['group']=='T') & (df['time']=='after')]['y'].mean()
|
||||
c_before, c_after = df[(df['group']=='C') & (df['time']=='before')]['y'].mean(), df[(df['group']=='C') & (df['time']=='after')]['y'].mean()
|
||||
did_estimate = (t_after - t_before) - (c_after - c_before)
|
||||
|
||||
# 5. SENSITIVITY ANALYSIS
|
||||
print("\n=== SENSITIVITY CHECKS ===")
|
||||
print("1. Unmeasured confounding: How strong would confounder need to be to change conclusion?")
|
||||
print("2. Placebo tests: Check for effect in period before treatment (should be zero)")
|
||||
print("3. Falsification tests: Check for effect on outcome that shouldn't be affected")
|
||||
print("4. Robustness: Try different model specifications, subsamples, bandwidths (RDD)")
|
||||
|
||||
# 6. REPORT CAUSAL ESTIMATE WITH UNCERTAINTY
|
||||
print(f"\nCausal Effect: {treatment_effect:.3f}")
|
||||
print(f"95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]")
|
||||
print(f"Interpretation: Treatment X causes {treatment_effect:.1%} change in outcome Y")
|
||||
print(f"Assumptions: [List key identifying assumptions]")
|
||||
print(f"Limitations: [Threats to validity]")
|
||||
```
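Expanding on Option E, a minimal local-linear RDD sketch (`cutoff`, `running_var`, and `y` are as above; the bandwidth `h` is an illustrative choice - in practice use a data-driven bandwidth):

```python
import statsmodels.formula.api as smf

h = 5.0  # illustrative bandwidth around the cutoff
local = df[(df['running_var'] > cutoff - h) & (df['running_var'] < cutoff + h)].copy()
local['centered'] = local['running_var'] - cutoff
local['above'] = (local['centered'] >= 0).astype(int)

# Separate slopes on each side; the 'above' coefficient is the jump at the cutoff
rdd = smf.ols('y ~ above + centered + above:centered', data=local).fit()
print(rdd.params['above'], rdd.conf_int().loc['above'])
```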
|
||||
|
||||
### Causal Inference Checklist
|
||||
|
||||
- [ ] **Causal question clearly stated**: "Does X cause Y?" not "Are X and Y related?"
|
||||
- [ ] **DAG drawn**: Treatment, outcome, confounders, mediators identified
|
||||
- [ ] **Identification strategy chosen**: RCT, regression, PS matching, IV, RDD, DiD
|
||||
- [ ] **Assumptions checked**: No unmeasured confounding, positivity, parallel trends (DiD), etc.
|
||||
- [ ] **Sensitivity analysis**: Test robustness to violations of assumptions
|
||||
- [ ] **Limitations acknowledged**: Threats to internal/external validity stated
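One concrete way to quantify the first sensitivity check (unmeasured confounding) is the E-value of VanderWeele & Ding: the minimum strength of association an unmeasured confounder would need with both treatment and outcome to fully explain away an observed risk ratio. This is an optional add-on, not part of the scaffold above:

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (invert RR first if RR < 1)."""
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

print(e_value(1.8))  # 3.0: a confounder would need RR ~ 3 with both T and Y
```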
|
||||
|
||||
## 3. Predictive Modeling Pipeline
|
||||
|
||||
Use for forecasting, classification, and regression - when the goal is prediction rather than causal understanding.
|
||||
|
||||
### Predictive Modeling Scaffold
|
||||
|
||||
```python
|
||||
# PREDICTIVE MODELING SCAFFOLD
|
||||
|
||||
# 1. DEFINE PREDICTION TASK & METRIC
|
||||
task = "Predict customer churn (binary classification)"
|
||||
primary_metric = "F1-score" # Balance precision/recall
|
||||
secondary_metrics = ["AUC-ROC", "precision", "recall", "accuracy"]
|
||||
|
||||
# 2. TRAIN/VAL/TEST SPLIT (before any preprocessing!)
|
||||
from sklearn.model_selection import train_test_split
|
||||
|
||||
# Split: 60% train, 20% validation, 20% test
|
||||
train_val, test = train_test_split(df, test_size=0.2, random_state=42, stratify=df['target'])
|
||||
train, val = train_test_split(train_val, test_size=0.25, random_state=42, stratify=train_val['target'])
|
||||
|
||||
print(f"Train: {len(train)}, Val: {len(val)}, Test: {len(test)}")
|
||||
print(f"Class balance - Train: {train['target'].mean():.2%}, Test: {test['target'].mean():.2%}")
|
||||
|
||||
# 3. FEATURE ENGINEERING (fit on train, transform train/val/test)
|
||||
from sklearn.preprocessing import StandardScaler
|
||||
from sklearn.impute import SimpleImputer
|
||||
|
||||
# Numeric features: impute missing, standardize
|
||||
numeric_features = ['age', 'income', 'tenure']
|
||||
num_imputer = SimpleImputer(strategy='median').fit(train[numeric_features])
|
||||
num_scaler = StandardScaler().fit(num_imputer.transform(train[numeric_features]))
|
||||
|
||||
X_train_num = num_scaler.transform(num_imputer.transform(train[numeric_features]))
|
||||
X_val_num = num_scaler.transform(num_imputer.transform(val[numeric_features]))
|
||||
X_test_num = num_scaler.transform(num_imputer.transform(test[numeric_features]))
|
||||
|
||||
# Categorical features: one-hot encode
|
||||
from sklearn.preprocessing import OneHotEncoder
|
||||
cat_features = ['region', 'product_type']
|
||||
cat_encoder = OneHotEncoder(handle_unknown='ignore', sparse_output=False).fit(train[cat_features])  # use sparse=False on scikit-learn < 1.2
|
||||
|
||||
X_train_cat = cat_encoder.transform(train[cat_features])
|
||||
X_val_cat = cat_encoder.transform(val[cat_features])
|
||||
X_test_cat = cat_encoder.transform(test[cat_features])
|
||||
|
||||
# Combine features
|
||||
import numpy as np
import pandas as pd  # used below for the feature-importance table
|
||||
X_train = np.hstack([X_train_num, X_train_cat])
|
||||
X_val = np.hstack([X_val_num, X_val_cat])
|
||||
X_test = np.hstack([X_test_num, X_test_cat])
|
||||
y_train, y_val, y_test = train['target'], val['target'], test['target']
|
||||
|
||||
# 4. BASELINE MODEL (always start simple!)
|
||||
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)
# Note: most_frequent predicts the majority class, so positive-class F1 is 0 when churn is the minority
baseline_f1 = f1_score(y_val, baseline.predict(X_val), zero_division=0)
print(f"Baseline F1: {baseline_f1:.3f}")
|
||||
|
||||
# 5. MODEL SELECTION & HYPERPARAMETER TUNING
|
||||
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
from sklearn.model_selection import GridSearchCV
|
||||
from sklearn.metrics import f1_score, roc_auc_score
|
||||
|
||||
# Try multiple models
|
||||
models = {
|
||||
'Logistic Regression': LogisticRegression(max_iter=1000),
|
||||
'Random Forest': RandomForestClassifier(random_state=42),
|
||||
'Gradient Boosting': GradientBoostingClassifier(random_state=42)
|
||||
}
|
||||
|
||||
results = {}
|
||||
for name, model in models.items():
|
||||
model.fit(X_train, y_train)
|
||||
y_pred = model.predict(X_val)
|
||||
y_proba = model.predict_proba(X_val)[:,1]
|
||||
|
||||
results[name] = {
|
||||
'F1': f1_score(y_val, y_pred),
|
||||
'AUC': roc_auc_score(y_val, y_proba),
|
||||
'Precision': precision_score(y_val, y_pred),
|
||||
'Recall': recall_score(y_val, y_pred)
|
||||
}
|
||||
print(f"{name}: F1={results[name]['F1']:.3f}, AUC={results[name]['AUC']:.3f}")
|
||||
|
||||
# Select best model (highest F1 on validation)
|
||||
best_model_name = max(results, key=lambda x: results[x]['F1'])
|
||||
best_model = models[best_model_name]
|
||||
print(f"\nBest model: {best_model_name}")
|
||||
|
||||
# Hyperparameter tuning on best model
|
||||
if best_model_name == 'Random Forest':
|
||||
param_grid = {
|
||||
'n_estimators': [100, 200, 300],
|
||||
'max_depth': [10, 20, None],
|
||||
'min_samples_split': [2, 5, 10]
|
||||
}
|
||||
grid_search = GridSearchCV(best_model, param_grid, cv=5, scoring='f1', n_jobs=-1)
|
||||
grid_search.fit(X_train, y_train)
|
||||
best_model = grid_search.best_estimator_
|
||||
print(f"Best params: {grid_search.best_params_}")
|
||||
|
||||
# 6. CROSS-VALIDATION (check for overfitting)
|
||||
from sklearn.model_selection import cross_val_score
|
||||
cv_scores = cross_val_score(best_model, X_train, y_train, cv=5, scoring='f1')
|
||||
print(f"CV F1 scores: {cv_scores}")
|
||||
print(f"Mean: {cv_scores.mean():.3f}, Std: {cv_scores.std():.3f}")
|
||||
|
||||
# 7. FINAL EVALUATION ON TEST SET (only once!)
|
||||
y_test_pred = best_model.predict(X_test)
|
||||
y_test_proba = best_model.predict_proba(X_test)[:,1]
|
||||
|
||||
test_f1 = f1_score(y_test, y_test_pred)
|
||||
test_auc = roc_auc_score(y_test, y_test_proba)
|
||||
print(f"\n=== FINAL TEST PERFORMANCE ===")
|
||||
print(f"F1: {test_f1:.3f}, AUC: {test_auc:.3f}")
|
||||
|
||||
# 8. ERROR ANALYSIS
|
||||
from sklearn.metrics import confusion_matrix, classification_report
|
||||
print("\nConfusion Matrix:")
|
||||
print(confusion_matrix(y_test, y_test_pred))
|
||||
print("\nClassification Report:")
|
||||
print(classification_report(y_test, y_test_pred))
|
||||
|
||||
# Analyze misclassifications
|
||||
test_df = test.copy()
|
||||
test_df['prediction'] = y_test_pred
|
||||
test_df['prediction_proba'] = y_test_proba
|
||||
false_positives = test_df[(test_df['target']==0) & (test_df['prediction']==1)]
|
||||
false_negatives = test_df[(test_df['target']==1) & (test_df['prediction']==0)]
|
||||
print(f"False Positives: {len(false_positives)}")
|
||||
print(f"False Negatives: {len(false_negatives)}")
|
||||
# Inspect these cases to understand failure modes
|
||||
|
||||
# 9. FEATURE IMPORTANCE
|
||||
if hasattr(best_model, 'feature_importances_'):
|
||||
feature_names = numeric_features + list(cat_encoder.get_feature_names_out(cat_features))
|
||||
importances = pd.DataFrame({
|
||||
'feature': feature_names,
|
||||
'importance': best_model.feature_importances_
|
||||
}).sort_values('importance', ascending=False)
|
||||
print("\nTop 10 Features:")
|
||||
print(importances.head(10))
|
||||
|
||||
# 10. MODEL DEPLOYMENT CHECKLIST
|
||||
print("\n=== DEPLOYMENT READINESS ===")
|
||||
print(f"✓ Test F1 ({test_f1:.3f}) > Baseline ({baseline_f1:.3f})")
|
||||
print(f"✓ Cross-validation shows consistent performance (CV std={cv_scores.std():.3f})")
|
||||
print("✓ Error analysis completed, failure modes understood")
|
||||
print("✓ Feature importance computed, no surprising features")
|
||||
print("□ Model serialized and saved")
|
||||
print("□ Monitoring plan in place (track drift in input features, output distribution)")
|
||||
print("□ Rollback plan if model underperforms in production")
|
||||
```
|
||||
|
||||
### Predictive Modeling Checklist
|
||||
|
||||
- [ ] **Clear prediction task**: Classification, regression, time series forecasting
|
||||
- [ ] **Appropriate metrics**: Match business objectives (precision vs recall tradeoff, etc.)
|
||||
- [ ] **Train/val/test split**: Before any preprocessing (no data leakage)
|
||||
- [ ] **Baseline model**: Simple model for comparison
|
||||
- [ ] **Feature engineering**: Proper handling of missing values, scaling, encoding
|
||||
- [ ] **Cross-validation**: k-fold CV to check for overfitting
|
||||
- [ ] **Model selection**: Compare multiple model types
|
||||
- [ ] **Hyperparameter tuning**: Grid/random search on validation set
|
||||
- [ ] **Error analysis**: Understand failure modes, inspect misclassifications
|
||||
- [ ] **Test set evaluation**: Final performance check (only once!)
|
||||
- [ ] **Deployment readiness**: Monitoring, rollback plan, model versioning
|
||||
|
||||
## 4. Property-Based Testing
|
||||
|
||||
Use for testing complex logic, data transformations, invariants. Goes beyond example-based tests.
|
||||
|
||||
### Property-Based Testing Scaffold
|
||||
|
||||
```python
|
||||
# PROPERTY-BASED TESTING SCAFFOLD
|
||||
from hypothesis import given, strategies as st
|
||||
import pytest
|
||||
|
||||
# Example: Testing a sort function
|
||||
def my_sort(lst):
|
||||
return sorted(lst)
|
||||
|
||||
# Property 1: Output length equals input length
|
||||
@given(st.lists(st.integers()))
|
||||
def test_sort_preserves_length(lst):
|
||||
assert len(my_sort(lst)) == len(lst)
|
||||
|
||||
# Property 2: Output is sorted (each element <= next element)
|
||||
@given(st.lists(st.integers()))
|
||||
def test_sort_is_sorted(lst):
|
||||
result = my_sort(lst)
|
||||
for i in range(len(result) - 1):
|
||||
assert result[i] <= result[i+1]
|
||||
|
||||
# Property 3: Output contains same elements as input (multiset equality)
|
||||
@given(st.lists(st.integers()))
|
||||
def test_sort_preserves_elements(lst):
|
||||
result = my_sort(lst)
|
||||
assert sorted(lst) == sorted(result) # Canonical form comparison
|
||||
|
||||
# Property 4: Idempotence (sorting twice = sorting once)
|
||||
@given(st.lists(st.integers()))
|
||||
def test_sort_is_idempotent(lst):
|
||||
result = my_sort(lst)
|
||||
assert my_sort(result) == result
|
||||
|
||||
# Property 5: Empty input → empty output
|
||||
def test_sort_empty_list():
|
||||
assert my_sort([]) == []
|
||||
|
||||
# Property 6: Single element → unchanged
|
||||
@given(st.integers())
|
||||
def test_sort_single_element(x):
|
||||
assert my_sort([x]) == [x]
|
||||
```
|
||||
|
||||
### Property-Based Testing Strategies
|
||||
|
||||
**For data transformations:**
|
||||
- Idempotence: `f(f(x)) == f(x)`
|
||||
- Round-trip: `decode(encode(x)) == x`
|
||||
- Commutativity: `f(g(x)) == g(f(x))`
|
||||
- Invariants: Properties that never change (e.g., sum after transformation)
|
||||
|
||||
**For numeric functions:**
|
||||
- Boundary conditions: Zero, negative, very large numbers
|
||||
- Inverse relationships: `f(f_inverse(x)) ≈ x`
|
||||
- Known identities: `sin²(x) + cos²(x) = 1`
|
||||
|
||||
**For string/list operations:**
|
||||
- Length preservation or predictable change
|
||||
- Character/element preservation
|
||||
- Order properties (sorted, reversed)
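As a concrete instance of the round-trip pattern, a sketch against Python's own `base64` (chosen so the property genuinely holds):

```python
import base64
from hypothesis import given, strategies as st

@given(st.binary())
def test_base64_round_trip(data):
    # decode(encode(x)) == x for every byte string
    assert base64.b64decode(base64.b64encode(data)) == data
```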
|
||||
|
||||
## 5. Advanced EDA Techniques
|
||||
|
||||
For high-dimensional, multi-modal, or complex data.
|
||||
|
||||
### Dimensionality Reduction
|
||||
|
||||
```python
|
||||
# PCA: Linear dimensionality reduction
|
||||
from sklearn.decomposition import PCA
|
||||
pca = PCA(n_components=2)
|
||||
X_pca = pca.fit_transform(X_scaled)
|
||||
print(f"Explained variance: {pca.explained_variance_ratio_}")
|
||||
|
||||
# t-SNE: Non-linear, good for visualization
|
||||
from sklearn.manifold import TSNE
|
||||
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
|
||||
X_tsne = tsne.fit_transform(X_scaled)
|
||||
plt.scatter(X_tsne[:,0], X_tsne[:,1], c=y, cmap='viridis'); plt.show()
|
||||
|
||||
# UMAP: Faster alternative to t-SNE, preserves global structure
|
||||
# pip install umap-learn
|
||||
import umap
|
||||
reducer = umap.UMAP(n_components=2, random_state=42)
|
||||
X_umap = reducer.fit_transform(X_scaled)
|
||||
```
|
||||
|
||||
### Cluster Analysis
|
||||
|
||||
```python
|
||||
from sklearn.cluster import KMeans, DBSCAN
|
||||
from sklearn.metrics import silhouette_score
|
||||
|
||||
# Elbow method: Find optimal K
|
||||
inertias = []
|
||||
for k in range(2, 11):
|
||||
kmeans = KMeans(n_clusters=k, random_state=42)
|
||||
kmeans.fit(X_scaled)
|
||||
inertias.append(kmeans.inertia_)
|
||||
plt.plot(range(2, 11), inertias); plt.xlabel('K'); plt.ylabel('Inertia'); plt.show()
|
||||
|
||||
# Silhouette score: Measure cluster quality
|
||||
for k in range(2, 11):
|
||||
kmeans = KMeans(n_clusters=k, random_state=42).fit(X_scaled)
|
||||
score = silhouette_score(X_scaled, kmeans.labels_)
|
||||
print(f"K={k}: Silhouette={score:.3f}")
|
||||
|
||||
# DBSCAN: Density-based clustering (finds arbitrary shapes)
|
||||
dbscan = DBSCAN(eps=0.5, min_samples=5)
|
||||
clusters = dbscan.fit_predict(X_scaled)
|
||||
print(f"Clusters found: {len(set(clusters)) - (1 if -1 in clusters else 0)}")
|
||||
print(f"Noise points: {(clusters == -1).sum()}")
|
||||
```
|
||||
|
||||
## 6. Advanced Validation Patterns
|
||||
|
||||
### Mutation Testing
|
||||
|
||||
Tests the quality of your tests by introducing bugs and checking if tests catch them.
|
||||
|
||||
```python
|
||||
# Install: pip install mutmut
|
||||
# Run: mutmut run --paths-to-mutate=src/
|
||||
# Check: mutmut results
|
||||
# Survivors (mutations not caught) indicate weak tests
|
||||
```
|
||||
|
||||
### Fuzz Testing
|
||||
|
||||
Generate random/malformed inputs to find edge cases.
|
||||
|
||||
```python
|
||||
from hypothesis import given, strategies as st
|
||||
|
||||
@given(st.text())
|
||||
def test_function_doesnt_crash_on_any_string(s):
|
||||
result = my_function(s) # Should never raise exception
|
||||
assert result is not None
|
||||
```
|
||||
|
||||
### Data Validation Framework (Great Expectations)
|
||||
|
||||
```python
|
||||
import great_expectations as gx

context = gx.get_context()  # ephemeral context for this sketch; use a project context in production
|
||||
|
||||
# Define expectations
|
||||
expectation_suite = gx.ExpectationSuite(name="my_data_suite")
|
||||
expectation_suite.add_expectation(gx.expectations.ExpectColumnToExist(column="user_id"))
|
||||
expectation_suite.add_expectation(gx.expectations.ExpectColumnValuesToNotBeNull(column="user_id"))
|
||||
expectation_suite.add_expectation(gx.expectations.ExpectColumnValuesToBeBetween(column="age", min_value=0, max_value=120))
|
||||
|
||||
# Validate data (batch_request wiring depends on your GX version and data source)
results = context.run_validation(batch_request, expectation_suite)
|
||||
print(results["success"]) # True if all expectations met
|
||||
```
## 7. When to Use Each Method

| Research Goal | Method | Key Consideration |
|---------------|--------|-------------------|
| Causal effect estimation | RCT, IV, RDD, DiD | Identify confounders, check assumptions |
| Prediction/forecasting | Supervised ML | Avoid data leakage, validate out-of-sample |
| Pattern discovery | Clustering, PCA, t-SNE | Dimensionality reduction first if high-D |
| Complex logic testing | Property-based testing | Define invariants that must hold |
| Data quality | Great Expectations | Automate checks in pipelines |
391
skills/code-data-analysis-scaffolds/resources/template.md
Normal file
@@ -0,0 +1,391 @@
# Code Data Analysis Scaffolds Template

## Workflow

Copy this checklist and track your progress:

```
Code Data Analysis Scaffolds Progress:
- [ ] Step 1: Clarify task and objectives
- [ ] Step 2: Choose appropriate scaffold type
- [ ] Step 3: Generate scaffold structure
- [ ] Step 4: Validate scaffold completeness
- [ ] Step 5: Deliver scaffold and guide execution
```

**Step 1: Clarify task** - Ask context questions to understand task type, constraints, expected outcomes. See [Context Questions](#context-questions).

**Step 2: Choose scaffold** - Select TDD, EDA, Statistical Analysis, or Validation based on task. See [Scaffold Selection Guide](#scaffold-selection-guide).

**Step 3: Generate structure** - Use appropriate scaffold template. See [TDD Scaffold](#tdd-scaffold), [EDA Scaffold](#eda-scaffold), [Statistical Analysis Scaffold](#statistical-analysis-scaffold), or [Validation Scaffold](#validation-scaffold).

**Step 4: Validate completeness** - Check scaffold covers requirements, includes validation steps, makes assumptions explicit. See [Quality Checklist](#quality-checklist).

**Step 5: Deliver and guide** - Present scaffold, highlight next steps, surface any gaps discovered. Execute if user wants help.

## Context Questions

**For all tasks:**
- What are you trying to accomplish? (Specific outcome expected)
- What's the context? (Dataset characteristics, codebase state, existing work)
- Any constraints? (Time, tools, data limitations, performance requirements)
- What does success look like? (Acceptance criteria, quality bar)

**For TDD tasks:**
- What functionality needs tests? (Feature, bug fix, refactor)
- Existing test coverage? (None, partial, comprehensive)
- Test framework preference? (pytest, jest, junit, etc.)
- Integration vs unit tests? (Scope of testing)

**For EDA tasks:**
- What's the dataset? (Size, format, source)
- What questions are you trying to answer? (Exploratory vs. hypothesis-driven)
- Existing knowledge about data? (Schema, distributions, known issues)
- End goal? (Feature engineering, quality assessment, insights)

**For Statistical/Modeling tasks:**
- What's the research question? (Descriptive, predictive, causal)
- Available data? (Sample size, variables, treatment/control)
- Causal or predictive goal? (Understanding why vs. forecasting what)
- Significance level / acceptable error rate?

## Scaffold Selection Guide

| User Says | Task Type | Scaffold to Use |
|-----------|-----------|-----------------|
| "Write tests for..." | TDD | [TDD Scaffold](#tdd-scaffold) |
| "Explore this dataset..." | EDA | [EDA Scaffold](#eda-scaffold) |
| "Analyze the effect of..." / "Does X cause Y?" | Causal Inference | See methodology.md |
| "Predict..." / "Classify..." / "Forecast..." | Predictive Modeling | See methodology.md |
| "Design an A/B test..." / "Compare groups..." | Statistical Analysis | [Statistical Analysis Scaffold](#statistical-analysis-scaffold) |
| "Validate..." / "Check quality..." | Validation | [Validation Scaffold](#validation-scaffold) |

## TDD Scaffold

Use when writing new code, refactoring, or fixing bugs. **Write tests FIRST, then implement.**

### Quick Template

```python
# Test file: test_[module].py
import pytest
from [module] import [function_to_test]

# 1. HAPPY PATH TESTS (expected usage)
def test_[function]_with_valid_input():
    """Test normal, expected behavior"""
    result = [function](valid_input)
    assert result == expected_output
    assert result.property == expected_value

# 2. EDGE CASE TESTS (boundary conditions)
def test_[function]_with_empty_input():
    """Test with empty/minimal input"""
    result = [function]([])
    assert result == expected_for_empty

def test_[function]_with_maximum_input():
    """Test with large/maximum input"""
    result = [function](large_input)
    assert result is not None

# 3. ERROR CONDITION TESTS (invalid input, expected failures)
def test_[function]_with_invalid_input():
    """Test proper error handling"""
    with pytest.raises(ValueError):
        [function](invalid_input)

def test_[function]_with_none_input():
    """Test None handling"""
    with pytest.raises(TypeError):
        [function](None)

# 4. STATE TESTS (if function modifies state)
def test_[function]_modifies_state_correctly():
    """Test side effects are correct"""
    obj = Object()
    obj.[function](param)
    assert obj.state == expected_state

# 5. INTEGRATION TESTS (if interacting with external systems)
@pytest.fixture
def mock_external_service():
    """Mock external dependencies"""
    return Mock(spec=ExternalService)

def test_[function]_with_external_service(mock_external_service):
    """Test integration points"""
    result = [function](mock_external_service)
    mock_external_service.method.assert_called_once()
    assert result == expected_from_integration
```

### Test Data Setup

```python
# conftest.py or test fixtures
@pytest.fixture
def sample_data():
    """Reusable test data"""
    return {
        "valid": [...],
        "edge_case": [...],
        "invalid": [...]
    }

@pytest.fixture(scope="session")
def database_session():
    """Database for integration tests"""
    db = create_test_db()
    yield db
    db.cleanup()
```

### TDD Cycle

1. **Red**: Write failing test (defines what success looks like)
2. **Green**: Write minimal code to make test pass
3. **Refactor**: Improve code while keeping tests green
4. **Repeat**: Next test case (one full loop is sketched below)
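One loop of the cycle, sketched with a hypothetical `slugify` function:

```python
# RED: write the failing test first - it defines what success looks like
def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("Hello World") == "hello-world"

# GREEN: write the minimal code that makes the test pass
def slugify(text: str) -> str:
    return text.lower().replace(" ", "-")

# REFACTOR: improve (e.g., strip punctuation) while the test stays green,
# then repeat with the next test case
```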
## EDA Scaffold

Use when exploring a new dataset. Follow a systematic plan to understand data quality and patterns.

### Quick Template

```python
# 1. DATA OVERVIEW
# Load and inspect
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_[format]('data.csv')

# Basic info
print(f"Shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
print(df.dtypes)
print(df.head())
print(df.info())
print(df.describe())

# 2. DATA QUALITY CHECKS
# Missing values
missing = df.isnull().sum()
missing_pct = (missing / len(df)) * 100
print(missing_pct[missing_pct > 0])

# Duplicates
print(f"Duplicates: {df.duplicated().sum()}")

# Data types consistency
print("Check: Are numeric columns actually numeric?")
print("Check: Are dates parsed correctly?")
print("Check: Are categorical variables encoded properly?")

# 3. UNIVARIATE ANALYSIS
# Numeric: mean, median, std, range, distribution plots, outliers (IQR method)
for col in df.select_dtypes(include=[np.number]).columns:
    print(f"{col}: mean={df[col].mean():.2f}, median={df[col].median():.2f}, std={df[col].std():.2f}")
    df[col].hist(bins=50); plt.title(f'{col} Distribution'); plt.show()
    Q1, Q3 = df[col].quantile([0.25, 0.75])
    outliers = ((df[col] < (Q1 - 1.5*(Q3-Q1))) | (df[col] > (Q3 + 1.5*(Q3-Q1)))).sum()
    print(f"  Outliers: {outliers} ({outliers/len(df)*100:.1f}%)")

# Categorical: value counts, unique values, bar plots
for col in df.select_dtypes(include=['object', 'category']).columns:
    print(f"{col}: {df[col].nunique()} unique, most common={df[col].mode()[0]}")
    df[col].value_counts().head(10).plot(kind='bar'); plt.show()

# 4. BIVARIATE ANALYSIS
# Correlation heatmap, pairplots, categorical vs numeric boxplots
sns.heatmap(df.select_dtypes(include=[np.number]).corr(), annot=True, cmap='coolwarm')
sns.pairplot(df[['var1', 'var2', 'var3', 'target']], hue='target'); plt.show()
# For each categorical-numeric pair, create boxplots to see distributions

# 5. INSIGHTS & NEXT STEPS
print("\n=== KEY FINDINGS ===")
print("1. Data quality: [summary]")
print("2. Distributions: [any skewness, outliers]")
print("3. Correlations: [strong relationships found]")
print("4. Missing patterns: [systematic missingness?]")
print("\n=== RECOMMENDED ACTIONS ===")
print("1. Handle missing data: [imputation strategy]")
print("2. Address outliers: [cap, remove, transform]")
print("3. Feature engineering: [ideas based on EDA]")
print("4. Data transformations: [log, standardize, encode]")
```

### EDA Checklist

- [ ] Load data and check shape/dtypes
- [ ] Assess missing values (how much, which variables, patterns?)
- [ ] Check for duplicates
- [ ] Validate data types (numeric, categorical, dates)
- [ ] Univariate analysis (distributions, outliers, summary stats)
- [ ] Bivariate analysis (correlations, relationships with target)
- [ ] Identify data quality issues
- [ ] Document insights and recommended next steps
## Statistical Analysis Scaffold

Use for hypothesis testing, A/B tests, comparing groups.

### Quick Template

```python
# STATISTICAL ANALYSIS SCAFFOLD

# 1. DEFINE RESEARCH QUESTION
question = "Does treatment X improve outcome Y?"

# 2. STATE HYPOTHESES
H0 = "Treatment X has no effect on outcome Y (null hypothesis)"
H1 = "Treatment X improves outcome Y (alternative hypothesis)"

# 3. SET SIGNIFICANCE LEVEL
alpha = 0.05  # 5% significance level (Type I error rate)
power = 0.80  # 80% power (1 - Type II error rate)

# 4. CHECK ASSUMPTIONS (t-test: independence, normality, equal variance)
from scipy import stats
_, p_norm = stats.shapiro(treatment_group)  # Normality test
_, p_var = stats.levene(treatment_group, control_group)  # Equal variance test
print(f"Normality: p={p_norm:.3f} {'✓' if p_norm > 0.05 else '✗ use non-parametric'}")
print(f"Equal variance: p={p_var:.3f} {'✓' if p_var > 0.05 else '✗ use Welch t-test'}")

# 5. PERFORM STATISTICAL TEST
# Choose appropriate test based on data type and assumptions

# For continuous outcome, 2 groups:
statistic, p_value = stats.ttest_ind(treatment_group, control_group)
print(f"t-statistic: {statistic:.3f}, p-value: {p_value:.4f}")

# For categorical outcome, 2 groups:
from scipy.stats import chi2_contingency
contingency_table = pd.crosstab(df['group'], df['outcome'])
chi2, p_value, dof, expected = chi2_contingency(contingency_table)
print(f"Chi-square: {chi2:.3f}, p-value: {p_value:.4f}")

# 6. INTERPRET RESULTS & EFFECT SIZE
if p_value < alpha:
    # Pooled standard deviation (needed for Cohen's d)
    n1, n2 = len(treatment_group), len(control_group)
    pooled_std = (((n1 - 1) * treatment_group.std() ** 2 + (n2 - 1) * control_group.std() ** 2) / (n1 + n2 - 2)) ** 0.5
    cohen_d = (treatment_group.mean() - control_group.mean()) / pooled_std
    # Cohen's benchmarks: 0.2 small, 0.5 medium, 0.8 large
    effect = "Small" if abs(cohen_d) < 0.5 else "Medium" if abs(cohen_d) < 0.8 else "Large"
    print(f"REJECT H0 (p={p_value:.4f}). Effect size (Cohen's d)={cohen_d:.3f} ({effect})")
else:
    print(f"FAIL TO REJECT H0 (p={p_value:.4f}). Insufficient evidence for effect.")

# 7. CONFIDENCE INTERVAL & SENSITIVITY
ci_95 = stats.t.interval(0.95, len(treatment_group)-1, loc=treatment_group.mean(), scale=stats.sem(treatment_group))
print(f"95% CI: [{ci_95[0]:.2f}, {ci_95[1]:.2f}]")
print("Sensitivity: Check without outliers, with non-parametric test, with confounders")
```
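Step 3 fixes `alpha` and `power` but the template never sizes the sample; a quick sketch with statsmodels, assuming a planned medium effect (d = 0.5):

```python
from statsmodels.stats.power import TTestIndPower

# Solve for per-group sample size given effect size, alpha, and power
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Need ~{n_per_group:.0f} subjects per group")  # ~64 for d=0.5
```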
### Statistical Test Selection

| Data Type | # Groups | Test |
|-----------|----------|------|
| Continuous | 2 | t-test (or Welch's if unequal variance) |
| Continuous | 3+ | ANOVA (or Kruskal-Wallis if non-normal) |
| Categorical | 2 | Chi-square or Fisher's exact |
| Ordinal | 2 | Mann-Whitney U |
| Paired/Repeated | 2 | Paired t-test or Wilcoxon signed-rank |
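Each row of the table maps to a one-line scipy call; a sketch assuming placeholder arrays `a`, `b`, `c`, `before`, `after`, and a 2x2 `table_2x2`:

```python
from scipy import stats

stats.ttest_ind(a, b)                   # continuous, 2 groups
stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test (unequal variance)
stats.f_oneway(a, b, c)                 # continuous, 3+ groups (ANOVA)
stats.kruskal(a, b, c)                  # non-normal alternative to ANOVA
stats.fisher_exact(table_2x2)           # categorical, 2x2 with small counts
stats.mannwhitneyu(a, b)                # ordinal, 2 groups
stats.ttest_rel(before, after)          # paired t-test
stats.wilcoxon(before, after)           # non-parametric paired alternative
```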
## Validation Scaffold

Use for validating data quality, code quality, or model quality before shipping.

### Data Validation Template

```python
# DATA VALIDATION CHECKLIST

# 1. SCHEMA VALIDATION
expected_columns = ['id', 'timestamp', 'value', 'category']
assert set(df.columns) == set(expected_columns), "Column mismatch"

expected_dtypes = {'id': 'int64', 'timestamp': 'datetime64[ns]', 'value': 'float64', 'category': 'object'}
for col, dtype in expected_dtypes.items():
    assert df[col].dtype == dtype, f"{col} type mismatch: expected {dtype}, got {df[col].dtype}"

# 2. RANGE VALIDATION
assert df['value'].min() >= 0, "Negative values found (should be >= 0)"
assert df['value'].max() <= 100, "Values exceed maximum (should be <= 100)"

# 3. UNIQUENESS VALIDATION
assert df['id'].is_unique, "Duplicate IDs found"

# 4. COMPLETENESS VALIDATION
required_fields = ['id', 'value']
for field in required_fields:
    missing_pct = df[field].isnull().mean() * 100
    assert missing_pct == 0, f"{field} has {missing_pct:.1f}% missing (required field)"

# 5. CONSISTENCY VALIDATION (example date columns)
assert (df['start_date'] <= df['end_date']).all(), "start_date after end_date found"

# 6. REFERENTIAL INTEGRITY
valid_categories = ['A', 'B', 'C']
assert df['category'].isin(valid_categories).all(), "Invalid categories found"

print("✓ All data validations passed")
```
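If hand-rolled asserts grow unwieldy, the same rules can be declared once as a schema; a minimal sketch with pandera (an assumed alternative, not part of the original template):

```python
import pandera as pa

schema = pa.DataFrameSchema({
    "id": pa.Column(int, unique=True),                            # uniqueness
    "value": pa.Column(float, pa.Check.in_range(0, 100)),         # range
    "category": pa.Column(str, pa.Check.isin(["A", "B", "C"])),   # referential integrity
})
validated_df = schema.validate(df)  # raises SchemaError on any violation
```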
### Code Validation Checklist

- [ ] **Unit tests**: All functions have tests covering happy path, edge cases, errors
- [ ] **Integration tests**: APIs, database interactions tested end-to-end
- [ ] **Test coverage**: ≥80% coverage for critical paths
- [ ] **Error handling**: All exceptions caught and handled gracefully
- [ ] **Input validation**: All user inputs validated before processing
- [ ] **Logging**: Key operations logged for debugging
- [ ] **Documentation**: Functions have docstrings, README updated
- [ ] **Performance**: No obvious performance bottlenecks (profiled if needed)
- [ ] **Security**: No hardcoded secrets, SQL injection protected, XSS prevented

### Model Validation Checklist

- [ ] **Train/val/test split**: Data split before any preprocessing (no data leakage)
- [ ] **Baseline model**: Simple baseline implemented for comparison
- [ ] **Cross-validation**: k-fold CV performed (k≥5)
- [ ] **Metrics**: Appropriate metrics chosen (accuracy, precision/recall, AUC, RMSE, etc.)
- [ ] **Overfitting check**: Training vs validation performance compared
- [ ] **Error analysis**: Failure modes analyzed, edge cases identified
- [ ] **Fairness**: Model checked for bias across sensitive groups
- [ ] **Interpretability**: Feature importance or SHAP values computed
- [ ] **Robustness**: Model tested with perturbed inputs
- [ ] **Monitoring**: Drift detection and performance tracking in place

## Quality Checklist

Before delivering, verify:

**Scaffold Structure:**
- [ ] Clear step-by-step process defined
- [ ] Each step has concrete actions (not vague advice)
- [ ] Validation checkpoints included
- [ ] Expected outputs specified

**Completeness:**
- [ ] Covers all requirements from user's task
- [ ] Includes example code/pseudocode where helpful
- [ ] Anticipates edge cases and error conditions
- [ ] Provides decision guidance (when to use which approach)

**Clarity:**
- [ ] Assumptions stated explicitly
- [ ] Technical terms defined or illustrated
- [ ] Success criteria clear
- [ ] Next steps obvious

**Actionability:**
- [ ] User can execute scaffold without further guidance
- [ ] Code snippets are runnable (or nearly runnable)
- [ ] Gaps surfaced early (missing data, unclear requirements)
- [ ] Includes validation/quality checks

**Rubric Score:**
- [ ] Self-assessed with rubric ≥ 3.5 average
350
skills/cognitive-design/SKILL.md
Normal file
@@ -0,0 +1,350 @@
---
name: cognitive-design
description: Use when designing visual interfaces, data visualizations, educational content, or presentations and need to ensure they align with how humans naturally perceive, process, and remember information. Invoke when user mentions cognitive load, visual hierarchy, dashboard design, form design, e-learning, infographics, or wants to improve clarity and reduce user confusion. Also applies when evaluating existing designs for cognitive alignment or choosing between design alternatives.
---

# Cognitive Design Principles

## Table of Contents

- [Read This First](#read-this-first)
- [How to Use This Skill](#how-to-use-this-skill)
- [Workflows](#workflows)
  - [New Design Workflow](#new-design-workflow)
  - [Design Review Workflow](#design-review-workflow)
  - [Quick Validation Workflow](#quick-validation-workflow)
- [Path Selection Menu](#path-selection-menu)
  - [Path 1: Understand Cognitive Foundations](#path-1-understand-cognitive-foundations)
  - [Path 2: Apply Design Frameworks](#path-2-apply-design-frameworks)
  - [Path 3: Evaluate Existing Designs](#path-3-evaluate-existing-designs)
  - [Path 4: Get Domain-Specific Guidance](#path-4-get-domain-specific-guidance)
  - [Path 5: Avoid Common Mistakes](#path-5-avoid-common-mistakes)
  - [Path 6: Access Quick Reference](#path-6-access-quick-reference)
  - [Path 7: Exit](#path-7-exit)

---

## Read This First

### What This Skill Does

This skill helps you create **cognitively aligned designs** - visual interfaces, data visualizations, educational content, and presentations that work with (not against) human perception, attention, memory, and decision-making.

**Core principle:** Effective design aligns with how people think, not just how things look.

### Why Cognitive Design Matters

**Common problems this addresses:**
- Users miss critical information in dashboards
- Complex interfaces cause cognitive overload and abandonment
- Data visualizations are misinterpreted or misleading
- Educational materials aren't retained
- Form completion rates are low
- Error messages are confusing

**How this helps:**
- Ground design decisions in cognitive psychology research
- Apply systematic frameworks (Cognitive Design Pyramid, Design Feedback Loop, Three-Layer Model)
- Evaluate designs against objective cognitive criteria
- Choose appropriate visual encodings based on perceptual hierarchy
- Manage attention, memory limits, and cognitive load

### When to Use This Skill

**Use this skill when:**
- ✓ Designing new interfaces, dashboards, visualizations, or educational content
- ✓ Users report confusion, overwhelm, or missing information
- ✓ Improving designs with poor metrics (low completion rates, high bounce rates)
- ✓ Conducting design reviews and need systematic evaluation
- ✓ Choosing between design alternatives with cognitive rationale
- ✓ Advocating for design decisions to stakeholders
- ✓ Designing high-stakes interfaces where cognitive failure has consequences

**Do NOT use for:**
- ✗ Pure aesthetic/brand identity decisions unrelated to cognition
- ✗ Technical implementation (coding, databases)
- ✗ Business strategy (feature prioritization, pricing)
- ✗ Tool-specific training (how to use Figma, Tableau, etc.)

### How This Skill Works

This is an **interactive hub** - you choose your path based on current need:

1. **Learning mode:** Start with Path 1 (Cognitive Foundations) to understand principles
2. **Application mode:** Jump to Path 2 (Frameworks) or Path 4 (Domain Guidance) to apply to specific design
3. **Evaluation mode:** Use Path 3 (Evaluation Tools) to assess existing designs
4. **Quick mode:** Use Path 6 (Quick Reference) for rapid decision-making

**After completing any path, return to the menu to select another or exit.**

### Collaborative Nature

I'll guide you through cognitive design principles by:
- Explaining WHY certain designs work (cognitive foundations)
- Providing HOW to apply principles (frameworks and workflows)
- Offering domain-specific guidance (data viz, UX, education, storytelling)
- Evaluating designs systematically (checklists and audits)

You bring domain expertise and context; I provide cognitive science grounding.

---

## How to Use This Skill

### Prerequisites

- Basic design literacy (familiarity with UI terminology, common chart types)
- Understanding of user tasks and context (from user research, stories, or brief)
- Access to design being created or evaluated

### Expected Outcomes

**Immediate:**
- Designs with clear visual hierarchy
- Reduced cognitive load (fewer overwhelm complaints)
- Systematic evaluation process

**Short-term (weeks):**
- Improved usability metrics (completion rates, time-on-task)
- Fewer user support requests
- More defensible design decisions

**Long-term (months):**
- Internalized cognitive principles
- Team shared vocabulary
- Measurable business impact

---

## Workflows

Choose a workflow based on your current situation:

### New Design Workflow

**Use when:** Creating a new interface, dashboard, visualization, or educational content from scratch

**Time:** 2-4 hours

**Copy this checklist and track your progress:**

```
New Design Progress:
- [ ] Step 1: Orient to cognitive principles
- [ ] Step 2: Structure design thinking with frameworks
- [ ] Step 3: Apply domain-specific guidance
- [ ] Step 4: Evaluate design for cognitive alignment
- [ ] Step 5: Check for common mistakes
- [ ] Step 6: Iterate based on findings
```

**Step 1: Orient to cognitive principles**

Start with [Cognitive Foundations](resources/cognitive-foundations.md) for deep understanding of WHY designs work (perception, memory, Gestalt principles) OR use [Quick Reference](resources/quick-reference.md) for rapid orientation (20 core principles, decision rules). Foundations give you theoretical grounding; Quick Reference gets you started faster.

**Step 2: Structure design thinking with frameworks**

Use [Design Frameworks](resources/frameworks.md) to apply systematic approaches: Cognitive Design Pyramid (4-tier quality assessment), Design Feedback Loop (interaction cycles), and Three-Layer Visualization Model (data communication fidelity). These provide repeatable structure for design decisions.

**Step 3: Apply domain-specific guidance**

Choose your domain: [Data Visualization](resources/data-visualization.md) for charts/dashboards, [UX Product Design](resources/ux-product-design.md) for interfaces/apps, [Educational Design](resources/educational-design.md) for e-learning/training, or [Storytelling & Journalism](resources/storytelling-journalism.md) for data journalism/presentations. Apply tailored cognitive principles for your specific context.

**Step 4: Evaluate design for cognitive alignment**

Use [Evaluation Tools](resources/evaluation-tools.md) to assess systematically: Cognitive Design Checklist (8 dimensions including visibility, hierarchy, chunking) and Visualization Audit Framework (4 criteria: Clarity, Efficiency, Integrity, Aesthetics). Identify weaknesses and prioritize fixes.

**Step 5: Check for common mistakes**

Review [Cognitive Fallacies](resources/cognitive-fallacies.md) to prevent failures: chartjunk, truncated axes, 3D distortion, cognitive biases, data integrity violations. Ensure your design avoids misleading patterns.

**Step 6: Iterate based on findings**

Return to domain guidance or frameworks as needed. Fix issues identified in evaluation. Re-evaluate until design passes cognitive alignment criteria (avg score ≥3.5 on rubric).

---

### Design Review Workflow

**Use when:** Evaluating existing designs for cognitive alignment, conducting design critiques, or diagnosing usability issues

**Time:** 30-60 minutes

**Copy this checklist and track your progress:**

```
Design Review Progress:
- [ ] Step 1: Systematic assessment with evaluation tools
- [ ] Step 2: Quick checks for common mistakes
- [ ] Step 3: Rapid validation against core principles
- [ ] Step 4: Note issues and prioritize fixes
```

**Step 1: Systematic assessment with evaluation tools**

Start with [Evaluation Tools](resources/evaluation-tools.md) for comprehensive review: Apply Cognitive Design Checklist (visibility, hierarchy, chunking, simplicity, memory support, feedback, consistency, scanning) and Visualization Audit Framework (score Clarity, Efficiency, Integrity, Aesthetics 1-5). Identify failing dimensions.

**Step 2: Quick checks for common mistakes**

Reference [Cognitive Fallacies](resources/cognitive-fallacies.md) for rapid diagnosis: Look for chartjunk, truncated axes, 3D effects, misleading colors, data integrity violations. These are common culprits in cognitive failures.

**Step 3: Rapid validation against core principles**

Use [Quick Reference](resources/quick-reference.md) for fast validation: Apply 3-question check (Attention? Memory? Clarity?), verify chart selection matches task, check color usage, confirm chunking fits working memory. Catches major issues quickly.

**Step 4: Note issues and prioritize fixes**

Document findings with severity: CRITICAL (integrity violations, accessibility failures), HIGH (clarity/efficiency issues preventing use), MEDIUM (suboptimal patterns, aesthetic issues), LOW (minor optimizations). Prioritize fixes foundation-first (perception → coherence → engagement → behavior).

---

### Quick Validation Workflow

**Use when:** Need rapid go/no-go decision, spot-checking changes, or validating against cognitive basics during active design work

**Time:** 5-10 minutes

**Copy this checklist and track your progress:**

```
Quick Validation Progress:
- [ ] Step 1: Three-question rapid check
- [ ] Step 2: Spot checks if issues found
```

**Step 1: Three-question rapid check**

Use [Quick Reference](resources/quick-reference.md) and apply: (1) Attention - "Is it obvious what to look at first?" (visual hierarchy clear, primary elements salient, predictable scanning), (2) Memory - "Is user required to remember anything that could be shown?" (state visible, options presented, fits 4±1 chunks), (3) Clarity - "Can someone unfamiliar understand in 5 seconds?" (purpose graspable, no unnecessary decoration, familiar terminology). If all YES → likely cognitively sound.

**Step 2: Spot checks if issues found**

If any question fails, use [Evaluation Tools](resources/evaluation-tools.md) for targeted diagnosis: Failed attention? Check hierarchy and visual salience sections. Failed memory? Check chunking and memory support sections. Failed clarity? Check simplicity and labeling sections. Apply specific fixes from relevant section.

---

## Path Selection Menu

**Choose your path based on current need:**

### Path 1: Understand Cognitive Foundations

**Choose this when:** You want to learn the core cognitive psychology principles underlying effective design (attention, memory, perception, Gestalt grouping, visual encoding hierarchy).

**What you'll get:** Deep understanding of WHY certain designs work, grounded in research from Tufte, Norman, Ware, Cleveland & McGill, Mayer, and others.

**Time:** 20-40 minutes

**→ [Go to Cognitive Foundations resource](resources/cognitive-foundations.md)**

---

### Path 2: Apply Design Frameworks

**Choose this when:** You want systematic frameworks to structure your design thinking and decision-making.

**What you'll get:** Three complementary frameworks:
- **Cognitive Design Pyramid** (4 tiers: Perceptual Efficiency → Cognitive Coherence → Emotional Engagement → Behavioral Alignment)
- **Design Feedback Loop** (Perceive → Interpret → Decide → Act → Learn)
- **Three-Layer Visualization Model** (Data → Visual Encoding → Cognitive Interpretation)

**Time:** 30-45 minutes

**→ [Go to Frameworks resource](resources/frameworks.md)**

---

### Path 3: Evaluate Existing Designs

**Choose this when:** You need to assess a design systematically for cognitive alignment, or conducting a design review/critique.

**What you'll get:**
- **Cognitive Design Checklist** (visibility, hierarchy, chunking, consistency, feedback, memory support)
- **Visualization Audit Framework** (4 criteria: Clarity, Efficiency, Integrity, Aesthetics)
- Examples of evaluation in practice

**Time:** 30-60 minutes (depending on design complexity)

**→ [Go to Evaluation Tools resource](resources/evaluation-tools.md)**

---

### Path 4: Get Domain-Specific Guidance

**Choose this when:** You're working on a specific type of design and want tailored cognitive principles for that context.

**Choose your domain:**

#### 4a. Data Visualization (Charts, Dashboards, Analytics)

**→ [Go to Data Visualization resource](resources/data-visualization.md)**

**Covers:** Chart selection via task-encoding alignment, dashboard hierarchy and grouping, progressive disclosure for exploration, narrative data visualization

---

#### 4b. Product/UX Design (Interfaces, Mobile Apps, Web Applications)

**→ [Go to UX Product Design resource](resources/ux-product-design.md)**

**Covers:** Learnability via familiar patterns, task flow efficiency, cognitive load management, onboarding design, error handling

---

#### 4c. Educational Design (E-Learning, Training, Instructional Materials)

**→ [Go to Educational Design resource](resources/educational-design.md)**

**Covers:** Multimedia learning principles, dual coding, worked examples, retrieval practice, segmenting, coherence principle

---

#### 4d. Storytelling/Journalism (Data Journalism, Presentations, Infographics)

**→ [Go to Storytelling & Journalism resource](resources/storytelling-journalism.md)**

**Covers:** Visual narrative structure, annotation strategies, scrollytelling, framing and context, visual metaphors

---

### Path 5: Avoid Common Mistakes

**Choose this when:** You want to prevent or diagnose cognitive design failures - chartjunk, misleading visualizations, cognitive biases, data integrity violations.

**What you'll get:**
- Common cognitive fallacies explained (WHY they fail)
- Visual misleads and how to avoid them
- Integrity principles for trustworthy design

**Time:** 15-25 minutes

**→ [Go to Cognitive Fallacies resource](resources/cognitive-fallacies.md)**

---

### Path 6: Access Quick Reference

**Choose this when:** You need rapid design guidance, core principles summary, or quick validation checks.

**What you'll get:**
- 20 core actionable principles (one-sentence summaries)
- 3-question quick check (Attention, Memory, Clarity)
- Common decision rules (when to use bar vs pie charts, how to chunk information, etc.)
- Design heuristics at a glance

**Time:** 5-15 minutes

**→ [Go to Quick Reference resource](resources/quick-reference.md)**

---

### Path 7: Exit

**Choose this when:** You've completed your design work or gathered the information you need.

**Before you exit:**
- Have you achieved your goal for this session?
- Do you need to return to any path for deeper exploration?
- Have you documented key design decisions and cognitive rationale?

**Thank you for using the Cognitive Design skill. Your designs are now more cognitively aligned!**
488
skills/cognitive-design/resources/cognitive-fallacies.md
Normal file
@@ -0,0 +1,488 @@
# Cognitive Fallacies & Visual Misleads

This resource covers common design failures that confuse or mislead users - what NOT to do and how to avoid cognitive pitfalls.

**Covered topics:**
1. Chartjunk and data-ink ratio violations
2. Misleading visualizations (truncated axes, 3D distortion)
3. Cognitive biases in interpretation
4. Data integrity violations
5. Integrity principles and solutions

---

## Why Avoid Cognitive Fallacies

### WHY This Matters

**Core insight:** Common visualization mistakes aren't just aesthetic failures - they cause systematic cognitive misinterpretation and damage trust.

**Problems caused by fallacies:**
- **Chartjunk:** Consumes working memory without conveying data
- **Truncated axes:** Exaggerates differences, misleads comparison
- **3D effects:** Distorts perception through volume illusions
- **Cherry-picking:** Misleads by omitting contradictory context
- **Spurious correlations:** Implies false causation

**Why designers commit fallacies:**
- Aesthetic appeal prioritized over clarity
- Unaware of cognitive impacts
- Intentional manipulation (sometimes)
- Following bad examples

**Ethical obligation:** Visualizations are persuasive - designers have responsibility to communicate honestly.

---

## What You'll Learn

**Five categories of failures:**

1. **Visual Noise:** Chartjunk, low data-ink ratio, clutter
2. **Perceptual Distortions:** 3D effects, volume illusions, inappropriate chart types
3. **Cognitive Biases:** Confirmation bias, anchoring, framing effects
4. **Data Integrity Violations:** Truncated axes, cherry-picking, non-uniform scales
5. **Misleading Correlations:** Spurious correlations, false causation

---

## Why Chartjunk Impairs Comprehension

### WHY This Matters

**Definition:** Gratuitous visual decorations that obscure data without adding information (Tufte).

**Cognitive mechanism:**
- Working memory limited to 4±1 chunks
- Every visual element consumes attentional resources
- Decorative elements compete with data for attention
- Resources spent on decoration unavailable for comprehension

**Result:** Slower comprehension, higher cognitive load, missed insights

---

### WHAT to Avoid

#### Common Chartjunk

**Avoid:** Heavy backgrounds (gradients/textures), excessive gridlines, 3D effects for decoration, decorative icons replacing data, ornamental elements (borders/fancy fonts)
**Why problematic:** Reduces contrast, creates visual noise, distorts perception, competes with data
**Solution:** White/light background, minimal gray gridlines, flat 2D design, simple bars, minimal decoration

---

#### Data-Ink Ratio

**Tufte's principle:** Maximize proportion of ink showing data, minimize non-data ink

**Application:**
```
Audit every element: "Does this convey data or improve comprehension?"
If NO → remove it
If YES → keep it minimal
```

**Example optimization:**
```
Before:
- Heavy gridlines: 30% ink
- 3D effects: 20% ink
- Decorative borders: 10% ink
- Actual data: 40% ink
Data-ink ratio: 40%

After:
- Minimal axis: 5% ink
- Simple bars: 95% ink
Data-ink ratio: 95%

Result: Faster comprehension, clearer message
```
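In matplotlib terms (3.4+), raising the data-ink ratio is mostly removal; a minimal sketch with illustrative values:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["A", "B", "C", "D"], [42, 58, 35, 71], color="0.4")

# Strip non-data ink: box spines, tick marks, heavy gridlines
ax.spines[["top", "right", "left"]].set_visible(False)
ax.tick_params(length=0)
ax.grid(axis="y", color="0.9", linewidth=0.8)
ax.set_axisbelow(True)  # keep the faint gridlines behind the bars
plt.show()
```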
---

## Why Truncated Axes Mislead

### WHY This Matters

**Definition:** Axes that don't start at zero, cutting off the baseline

**Cognitive issue:** Viewers assume bar length is proportional to value - truncation breaks this assumption

**Effect:** Small absolute differences appear dramatically large

---

### WHAT to Avoid

#### Truncated Bar Charts

**Problem example:**
```
Sales comparison:
Company A: $80M (bar shows from $70M)
Company B: $85M (bar shows from $70M)

Visual: Company B's bar looks 50% larger than A's ($15M vs $10M above the baseline)
Reality: Only 6.25% difference ($5M on $80M base)

Mislead mechanism: Truncated y-axis starting at $70M instead of $0M
```

**Solutions:**
```
Option 1: Start y-axis at zero (honest proportional representation)
Option 2: Use line chart instead (focuses on change, zero baseline less critical)
Option 3: If truncating necessary, add clear axis break symbol and annotation
Option 4: Show absolute numbers directly on bars to provide context
```
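A sketch of Options 1 and 4 applied to the example figures (matplotlib 3.4+ for `bar_label`):

```python
import matplotlib.pyplot as plt

companies, sales = ["Company A", "Company B"], [80, 85]

fig, ax = plt.subplots()
bars = ax.bar(companies, sales)
ax.set_ylim(0, 100)             # Option 1: zero baseline, honest proportions
ax.set_ylabel("Sales ($M)")
ax.bar_label(bars, fmt="$%gM")  # Option 4: absolute numbers on the bars
plt.show()
```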
---

#### When Truncation is Acceptable

**Line charts:**
```
✓ Showing stock price changes over time
✓ Temperature trends
✓ Focus is on pattern/trend, not absolute magnitude comparison

Requirement: Still note scale clearly, provide context
```

**Small multiples with consistent truncation:**
```
✓ Comparing trend patterns across categories
✓ All charts use same y-axis range (fair comparison)
✓ Clearly labeled

Purpose: See shape differences, not magnitude
```

---

## Why 3D Effects Distort

### WHY This Matters

**Cognitive issue:** Human visual system estimates 3D volumes and angles imprecisely

**Mechanisms:**
- **Perspective foreshortening:** Elements at front appear larger than identical elements at back
- **Volume scaling:** Doubling height and width quadruples area (and octuples volume if depth scales too), yet viewers may read it as merely doubling
- **Angle distortion:** 3D pie slices at different positions appear different sizes even when equal

---

### WHAT to Avoid

#### 3D Bar Charts

**Problem:**
```
3D bars with depth dimension:
- Harder to judge bar height (top surface not aligned with axis)
- Perspective makes back bars look smaller
- Depth adds no information, only distortion

Solution: Use simple 2D bars
```

---

#### 3D Pie Charts

**Problem:**
```
3D pie with perspective:
- Slices at front appear larger than equal slices at back
- Angles further distorted by tilt
- Already-poor angle perception made worse

Solution: If pie necessary, use flat 2D; better yet, use bar chart
```

---

#### Volume Illusions

**Problem:**
```
Scaling icons in multiple dimensions:
Person icon twice as tall to represent 2x data
Visual result: 4x area (height × width)
Viewer perception: May see as 2x, 4x, or even 8x (volume)

Solution: Scale only one dimension (height), or use simple bars
```

---

## Why Cognitive Biases Affect Interpretation

### WHY This Matters

**Core insight:** Viewers bring cognitive biases to data interpretation - design can either reinforce or mitigate these biases.

**Key biases:**
- **Confirmation bias:** See what confirms existing beliefs
- **Anchoring:** First number sets benchmark for all subsequent
- **Framing effects:** How data is presented influences emotional response

**Designer responsibility:** Present data neutrally, provide full context, enable exploration

---

### WHAT to Avoid

#### Reinforcing Confirmation Bias

**Problem:**
```
Dashboard highlighting data supporting desired conclusion
- Only positive metrics prominent
- Contradictory data hidden or small
- Selective time periods

Result: Viewers who want to believe conclusion find easy support
```

**Solutions:**
```
✓ Present full context, not cherry-picked subset
✓ Enable filtering/exploration (users can challenge assumptions)
✓ Show multiple viewpoints or comparisons
✓ Note data limitations and contradictions
```

---

#### Anchoring Effects

**Problem:**
```
Leading with dramatic statistic:
"Sales increased 400%!" (from $10k to $50k - small in absolute terms)
Then: "Annual revenue $2M" (anchored to the 400% figure, seems disappointing)

First number anchors perception of all subsequent numbers
```

**Solutions:**
```
✓ Provide baseline context upfront
✓ Show absolute and relative numbers together
✓ Be mindful of what's shown first in presentations
✓ Use neutral sorting (alphabetical, not "best first")
```

---

#### Framing Effects

**Problem:**
```
Same data, different frames:
"10% unemployment" vs "90% employed"
"1 in 10 fail" vs "90% success rate"
"50 new cases today" vs "Cumulative 5,000 cases"

Same numbers, different emotional impact
```

**Solutions:**
```
✓ Acknowledge framing choice
✓ Provide multiple views (daily AND cumulative)
✓ Show absolute AND relative (percentages + actual numbers)
✓ Consider audience and choose frame ethically
```

**Note:** Framing isn't inherently wrong, but it's powerful - use responsibly

---

## Why Data Integrity Violations Damage Trust

### WHY This Matters

**Tufte's Graphical Integrity:** Visual representation should be proportional to numerical quantities

**Components:**
- Honest axes (zero baseline or marked truncation)
- Fair comparisons (same scale)
- Complete context (full time period, baselines)
- Clear labeling (sources, methodology)

**Why it matters:** Trust is fragile - one misleading visualization damages credibility long-term

---

### WHAT to Avoid

#### Cherry-Picking Time Periods

**Problem:**
```
"Revenue grew 30% in Q4!"
...omitting that it declined 40% over full year

Mislead mechanism: Showing only favorable subset
```

**Solutions:**
```
✓ Show full relevant time period
✓ If focusing on segment, show it IN CONTEXT of whole
✓ Note data selection criteria ("Q4 only shown because...")
✓ Provide historical comparison (vs same quarter previous year)
```

---

#### Non-Uniform Scales

**Problem:**
```
X-axis intervals:
0, 10, 20, 50, 100 (not uniform increments)

Effect: Trend appears to accelerate or decelerate artificially
```

**Solutions:**
```
✓ Use uniform scale intervals
✓ If log scale needed, clearly label as such
✓ If breaks necessary, mark with axis break symbol
```
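Where a log scale is genuinely needed, apply it and label it in the same breath; a minimal matplotlib sketch with illustrative values:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 100, 1_000, 10_000])
ax.set_yscale("log")                # uniform multiplicative steps
ax.set_ylabel("Value (log scale)")  # label the scale so readers aren't misled
plt.show()
```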
---

#### Missing Context

**Problem:**
```
"50% increase!" without denominator
50% of what? 2 to 3? 1000 to 1500?

"Highest level in 6 months!" without historical context
What was level 7 months ago? 1 year ago? Historical average?
```

**Solutions:**
```
✓ Show absolute numbers + percentages
✓ Provide historical context (historical average, benchmarks)
✓ Include comparison baselines (previous period, peer comparison)
✓ Note sample size and data source
```

---

## Why Spurious Correlations Mislead

### WHY This Matters

**Definition:** Statistical relationship between variables with no causal connection

**Classic example:** Ice cream sales vs shark attacks (both increase in summer, no causal link)

**Cognitive bias:** Correlation looks like causation when visualized together

**Designer responsibility:** Clarify when relationships are correlational vs causal

---

### WHAT to Avoid

#### Dual-Axis Manipulation

**Problem:**
```
Dual-axis chart with independent scales:
Left axis: Metric A (scaled 0-100)
Right axis: Metric B (scaled 0-10)

By adjusting scales, can make ANY two metrics appear correlated
Viewer sees: Lines moving together, assumes relationship
Reality: Scales manipulated to create visual similarity
```

**Solutions:**
```
✓ Use dual-axis only when relationship is justified (not arbitrary)
✓ Clearly label both axes
✓ Explain WHY two metrics are shown together
✓ Consider separate charts if relationship unclear
```

---

#### Implying Causation

**Problem:**
```
Chart showing two rising trends:
"Social media usage and depression rates both rising"

Visual implication: One causes the other
Reality: Could be correlation only, common cause, or coincidence
```

**Solutions:**
```
✓ Explicitly state "correlation, not proven causation"
✓ Note other possible explanations
✓ If causal claim, cite research supporting it
✓ Provide mechanism explanation if causal
```

---

## Integrity Principles

### What to Apply

**Honest Axes:**
```
✓ Start bars at zero or clearly mark truncation
✓ Use uniform scale intervals
✓ Label clearly with units
✓ If log scale, label as such
```

**Fair Comparisons:**
```
✓ Use same scale for items being compared
✓ Don't manipulate dual-axis to force visual correlation
✓ Show data for same time periods
✓ Include all relevant data points (not selective)
```

**Complete Context:**
```
✓ Show full time period or note selection
✓ Include baselines (previous year, average, benchmark)
✓ Provide denominator for percentages
✓ Note when data is projected/estimated vs actual
```

**Accurate Encoding:**
```
✓ Match visual scaling to data scaling
✓ Avoid volume illusions (icon sizing)
✓ Use 2D representations for accuracy
✓ Ensure color encoding matches meaning (red=negative convention)
```

**Transparency:**
```
✓ Note data sources
✓ Mention sample sizes
✓ State selection criteria if data is subset
✓ Acknowledge limitations
✓ Provide access to full dataset when possible
```
490
skills/cognitive-design/resources/cognitive-foundations.md
Normal file
@@ -0,0 +1,490 @@
# Cognitive Foundations

This resource explains the core cognitive psychology principles underlying effective visual design.

**Foundation for:** All other paths in this skill

---

## Why Learn Cognitive Foundations

### WHY This Matters

Understanding how humans perceive, attend, remember, and process information enables you to:
- **Predict user behavior:** Know what users will notice, understand, and remember
- **Design with confidence:** Ground decisions in research, not guesswork
- **Diagnose problems:** Identify why designs fail cognitively
- **Advocate effectively:** Explain design rationale with scientific backing

**Mental model:** Like a doctor needs anatomy to diagnose illness, designers need cognitive psychology to diagnose interface problems.

**Research foundation:** Tufte (data visualization), Norman (interaction design), Ware (visual perception), Cleveland & McGill (graphical perception), Miller/Cowan (working memory), Gestalt psychology (perceptual grouping), Mayer (multimedia learning).

Without cognitive foundations: design by intuition alone, inconsistent decisions, inability to explain why designs work or fail.

---

## What You'll Learn

### Core Cognitive Concepts

This section covers 5 foundational areas:
1. **Perception & Attention** - What users notice first, visual search, preattentive processing
2. **Memory & Cognition** - Working memory limits, chunking, cognitive load
3. **Gestalt & Grouping** - Automatic perceptual organization patterns
4. **Visual Encoding** - Hierarchy of perceptual accuracy for different chart types
5. **Mental Models & Mapping** - How prior experience shapes expectations

---

## Why Perception & Attention

### WHY This Matters

**Core insight:** Attention is limited and selective like a spotlight - you can only focus on one thing at high resolution while periphery remains blurry.

**Implications:**
- Visual design controls where attention goes
- Competing elements dilute attention
- Critical information must be made salient
- Users scan, they don't read everything thoroughly

**Mental model:** Human vision is like a camera with a tiny high-resolution center (fovea covers ~1-2° of visual field) and low-resolution periphery. We rapidly move attention spotlight to important areas.

---

### WHAT to Know

#### Preattentive Processing

**Definition:** Detection of basic visual features in under 500ms before conscious attention.

**Preattentive Features:**
- **Color:** A single red item among gray pops out instantly
- **Form:** Unique shapes stand out (circle among squares)
- **Motion:** Moving elements grab attention automatically
- **Position:** Spatial outliers noticed quickly
- **Size:** Largest/smallest items detected fast

**Application Rule:**
```
Use preattentive features sparingly for 1-3 critical elements only.
Too many salient elements = visual noise = nothing stands out.
```

**Example:** Dashboard alert in red among gray metrics → immediate attention

**Anti-pattern:** Everything in bright colors → overwhelm, nothing prioritized
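The rule translates directly to plotting code; a minimal sketch with illustrative metrics, using a single color cue on the one element that matters:

```python
import matplotlib.pyplot as plt

metrics = ["CPU", "Memory", "Disk", "Errors", "Latency"]
values = [62, 71, 55, 94, 48]

# One preattentive cue (color) for the single critical element; the rest stay gray
colors = ["tab:red" if m == "Errors" else "0.7" for m in metrics]

fig, ax = plt.subplots()
ax.bar(metrics, values, color=colors)
ax.set_title("Errors needs attention - everything else stays quiet")
plt.show()
```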
|
||||
|
||||
---
|
||||
|
||||
#### Visual Search
|
||||
|
||||
**Definition:** After preattentive pop-out, users engage in serial visual search - scanning one area at a time with high acuity.
|
||||
|
||||
**Predictable Patterns:**
|
||||
- **F-pattern:** Text-heavy content (read top-left to top-right, then down left side, short horizontal scan middle)
|
||||
- **Z-pattern:** Visual-heavy content (top-left to top-right, diagonal to bottom-left, then bottom-right)
|
||||
- **Gutenberg Diagram:** Primary optical area top-left, terminal area bottom-right
|
||||
|
||||
**Application Rule:**
|
||||
```
|
||||
Position most important elements along scanning path:
|
||||
- Top-left for primary content (both patterns start here)
|
||||
- Top-right for secondary actions
|
||||
- Bottom-right for terminal actions (next/submit)
|
||||
```
|
||||
|
||||
**Example:** Dashboard KPIs top-left, details below, "Export" button bottom-right
|
||||
|
||||
---
|
||||
|
||||
#### Attention Spotlight Trade-offs
|
||||
|
||||
**Attention is zero-sum:** Effort spent on decorations is unavailable for data comprehension.
|
||||
|
||||
**Rule:** Maximize attention on task-relevant elements, minimize on decorative elements.
|
||||
|
||||
**Squint Test:** Blur design (squint or Gaussian blur) - can you still identify what's important? If hierarchy disappears when blurred, it's too subtle.

---

## Why Memory & Cognition

### WHY This Matters

**Core insight:** Working memory can hold only ~4±1 meaningful chunks of information, and items fade within seconds unless rehearsed.

**Implications:**
- Interfaces must be designed to fit memory limits
- Information must be chunked into groups
- State should be externalized to interface, not user's memory
- Recognition is vastly easier than recall

**Mental model:** Working memory is like juggling - you can keep 3-5 balls in the air, but adding more causes you to drop them all.

**Updated understanding:** Miller (1956) proposed 7±2 chunks, but Cowan (2001) showed actual capacity is 4±1 for most people.

---

### WHAT to Know

#### Working Memory Capacity

**Definition:** "Mental scratchpad" holding active information temporarily (seconds) before transfer to long-term memory or loss.

**Capacity:** **4±1 chunks** for most people

**Chunk:** Meaningful unit of information
- Expert: "555-123-4567" = 1 chunk (phone number)
- Novice: "555-123-4567" = 10 chunks (each digit separate)

**Application Rule:**
```
Design within 4±1 constraint:
- Navigation: ≤7 top-level categories
- Forms: 4-6 fields per screen without wizard
- Dashboards: Group metrics into 3-5 categories
- Choices: Limit concurrent options to ≤7
```

**Example:** Registration wizard with 4 steps (personal info, account, preferences, review) vs single 30-field page

**Why it works:** Each step fits in working memory; progress indicator externalizes "where am I"

---

#### Cognitive Load Types

**Intrinsic Load:** Inherent complexity of the task itself (can't change)

**Extraneous Load:** Mental effort from poor design - confusing interfaces, unclear icons, missing context
→ **MINIMIZE THIS**

**Germane Load:** Meaningful mental effort contributing to learning or understanding
→ **Support this, don't eliminate**

**Application Rule:**
```
Reduce extraneous load to free capacity for germane load:
- Remove decorative elements (chartjunk)
- Simplify workflows (fewer steps, fewer choices)
- Provide memory aids (breadcrumbs, visible state, tooltips)
- Use familiar patterns (standard UI conventions)
```

**Example:** E-learning slide with audio narration + diagram (germane load) vs text + diagram + background music (extraneous load from competing visual text and irrelevant music)

---

#### Recognition vs. Recall

**Recall:** Retrieve from memory without cues (hard, error-prone, slow)
- "What was the error code?"
- "Which menu has the export function?"

**Recognition:** Identify among presented options (easy, fast, accurate)
- Error message shown in context
- Menu items visible

**Application Rule:**
```
Design for recognition over recall:
- Show available options (dropdown) vs requiring typed command
- Display current filters as visible chips vs requiring memory
- Provide breadcrumbs vs expecting user to remember navigation path
- Use icons WITH labels vs icons alone
```

**Example:** File menu with visible "Save" option vs command-line requiring user to remember "save" command

---

#### Chunking Strategy

**Definition:** Grouping related information into single meaningful units to overcome working memory limits.

**Application Patterns:**
- **Lists:** Break into categories (❌ 20 ungrouped nav items → ✓ 5 categories × 4 items)
- **Forms:** Group fields (❌ 30 sequential fields → ✓ 4-step wizard with grouped fields)
- **Phone numbers:** Use conventional chunking (555-123-4567 = 3 chunks vs 5551234567 = 10 chunks; see the sketch after this list)
- **Dashboards:** Group metrics by category (Traffic panel, Conversion panel, Error panel)
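
A minimal sketch of the phone-number re-chunking above (the helper name is ours):

```python
# Minimal sketch: re-chunk a raw digit string into conventional 3-3-4 groups.
def chunk_phone(digits: str) -> str:
    """'5551234567' -> '555-123-4567' (3 chunks instead of 10)."""
    assert len(digits) == 10 and digits.isdigit()
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

print(chunk_phone("5551234567"))  # 555-123-4567
```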

---

## Why Gestalt & Grouping

### WHY This Matters

**Core insight:** The brain automatically organizes visual elements based on proximity, similarity, continuity, and other principles - this happens preconsciously, before you think about it.

**Implications:**
- Layout directly creates perceived meaning
- Whitespace is not "empty" - it shows separation
- Consistent styling shows relationships
- Alignment implies connection

**Mental model:** Gestalt principles are like visual grammar - rules the brain applies automatically to make sense of what it sees.

**Historical note:** Gestalt psychology (1910s-1920s) discovered these principles, and they remain validated by modern neuroscience.

---

### WHAT to Know

#### Proximity

**Principle:** Items close together are perceived as belonging to the same group.

**Application Rule:**
```
Use spatial grouping to show relationships:
- Related form fields adjacent with minimal spacing
- Unrelated groups separated by whitespace
- Label next to corresponding graphic (not across page)
```

**Example:** Dashboard with 3 panels (traffic, conversions, errors) separated by whitespace → automatically perceived as 3 distinct groups

**Anti-pattern:** Even spacing everywhere → no perceived structure

---

#### Similarity

**Principle:** Elements that look similar (shape, color, size, style) are seen as related or part of a pattern.

**Application Rule:**
```
Use consistent styling for related elements:
- All primary CTAs same color/style
- All error messages same red + icon treatment
- All headings same font/size per level
```

**Example:** All "delete" actions red across interface → users learn "red = destructive action"

**Anti-pattern:** Inconsistent button styles → users can't predict function from appearance

---

#### Continuity

**Principle:** Elements arranged on a line or curve are perceived as more related than elements not aligned.

**Application Rule:**
```
Use alignment to show structure:
- Left-align related items (form fields, list items)
- Grid layouts imply equal status
- Flowing lines guide eye through narrative
```

**Example:** Infographic with visual flow from top to bottom, numbered steps along continuous path

**Anti-pattern:** Haphazard placement → no clear reading order

---

#### Closure

**Principle:** The mind perceives complete figures even when parts are missing, filling gaps mentally.

**Application Rule:**
```
Leverage closure for simplified designs:
- Dotted lines imply connection without solid line visual weight
- Incomplete circles/shapes still recognized (reduce ink)
- Implied boundaries via alignment without explicit borders
```

**Example:** Dashboard cards separated by whitespace only (no borders) → still perceived as distinct containers

---

#### Figure-Ground Segregation

**Principle:** We instinctively separate a scene into "figure" (foreground object of focus) and "ground" (background).

**Application Rule:**
```
Ensure clear figure-ground distinction:
- Modal dialogs: dim background, brighten foreground
- Highlighted content: stronger contrast than surroundings
- Active state: clearly differentiated from inactive
```

**Example:** Popup form with darkened background overlay → form is clearly the figure

**Anti-pattern:** Insufficient contrast → users can't distinguish what's active

---

## Why Visual Encoding Hierarchy

### WHY This Matters

**Core insight:** Not all visual encodings are equally effective for human perception - some enable precise comparisons, others make them nearly impossible.

**Cleveland & McGill's Hierarchy (most to least accurate):**
1. Position along common scale (bar chart)
2. Position along non-aligned scales
3. Length
4. Angle/slope
5. Area
6. Volume
7. Color hue (categorical only)
8. Color saturation

**Implications:**
- Chart type choice directly determines user accuracy
- Bar charts enable 5-10x more precise comparison than pie charts
- Color hue has no inherent ordering (don't use for quantities)

**Mental model:** Human visual system is like a measurement tool - some encodings are precision instruments (position), others are crude estimates (area).

---

### WHAT to Know

#### Task-Encoding Matching

**Match encoding to user task:**
- **Compare values:** Bar chart (position/length 5-10x more accurate than pie angle/area)
- **See trend:** Line chart (continuous line shows temporal progression, slope changes visible)
- **Understand distribution:** Histogram/box plot (shape visible, outliers apparent)
- **Part-to-whole:** Stacked bar/treemap (avoid pie >5 slices - angles hard to judge)
- **Find outliers/clusters:** Scatterplot (2D position enables pattern detection)

---

#### Color Encoding Rules

- **Categorical:** Distinct hues (red, blue, green). Limit 5-7 categories. Hue lacks ordering.
- **Quantitative:** Lightness gradient (light→dark). Avoid rainbow (misleading peaks). Darkness = more.
- **Diverging:** Two-hue gradient through neutral (blue←gray→orange). Shows direction/magnitude.
- **Accessible:** Redundant coding (color + icon + label). ~8% of males are colorblind.

---

#### Perceptually Uniform Colormaps

**Problem:** Rainbow spectrum has non-uniform perceptual steps - equal data differences don't produce equal perceived color differences. Can hide or exaggerate patterns.

**Solution:** Use perceptually uniform colormaps:
- **Viridis** (yellow→green→blue)
- **Magma** (purple→red→yellow)
- **Plasma** (purple→orange→yellow)

**Why:** Equal data steps = equal perceived color steps = honest representation

**Application:** Any heatmap, choropleth map, or continuous quantitative color scale
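
A minimal matplotlib sketch of the rule, using synthetic data in place of a real heatmap:

```python
# Minimal sketch: perceptually uniform colormap for a heatmap (numpy/matplotlib assumed).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.random((10, 12))  # hypothetical quantitative grid

fig, ax = plt.subplots()
im = ax.imshow(data, cmap="viridis")  # equal data steps -> equal perceived color steps
fig.colorbar(im, ax=ax, label="Value")
ax.set_title("Heatmap with a perceptually uniform colormap")
plt.show()
```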

---

## Why Mental Models & Mapping

### WHY This Matters

**Core insight:** Users approach interfaces with preconceived notions (mental models) from past experiences. Designs that violate these models require re-learning and cause confusion.

**Implications:**
- Familiarity reduces cognitive load (System 1 processing)
- Deviation from standards forces conscious thought (System 2 processing)
- Consistent patterns become automatic
- Natural mappings feel intuitive

**Mental model:** Like driving a car - once you've learned, you don't consciously think about pedals. A new car with reversed pedals would be dangerous because it violates your mental model.

---

### WHAT to Know

#### Mental Models (User Expectations)

**Definition:** Internal representations of how things work based on prior experience and analogy.

**Sources:**
- **Platform conventions:** iOS vs Android interaction patterns
- **Cultural patterns:** Left-to-right reading in Western contexts
- **Physical world:** Skeuomorphic metaphors (trash can for delete)
- **Other applications:** "Most apps work this way" (Jakob's Law)

**Application Rule:**
```
Use standard UI patterns to leverage existing mental models:
- Hamburger menu for navigation (mobile)
- Magnifying glass for search
- Shopping cart for e-commerce
- Standard keyboard shortcuts (Ctrl+C/V)
```

**When to deviate:** Only when the standard pattern demonstrably fails for your use case AND you provide clear onboarding

**Example:** File system trash can - users understand "throw away to delete" from the physical world metaphor

---

#### Affordances & Signifiers

**Affordance:** Properties suggesting how to interact (button affords clicking, slider affords dragging)

**Signifier:** Visual cues indicating affordance (button looks raised/has shadow to signal "press me")

**Application Rule:**
```
Design controls to signal their function:
- Buttons: raised appearance, hover state, cursor change
- Text inputs: rectangular field with border, cursor appears on click
- Sliders: handle that looks draggable
- Links: underlined or colored text, cursor changes to pointer
```

**Example:** Flat button vs raised button - raised appearance signals "I'm interactive" without user experimentation

**Anti-pattern:** Everything looks flat and identical → user must experiment to discover what's clickable

---

#### Natural Mapping

**Definition:** Relationship between controls and effects that mirrors spatial/conceptual relationships in an intuitive way.

**Application Patterns:**
- **Spatial:** Stove knob layout mirrors burner layout (front-left knob → front-left burner)
- **Directional:** Volume up = move up, zoom in = pinch outward, scroll down = swipe up (matches physical)
- **Conceptual:** Green = go/good/success, Red = stop/bad/error (culturally learned, widely understood)

**Rule:** Align control layout/direction with effect for instant comprehension

---

## Application: Putting Foundations Together

### Dashboard Design Example

**Problem:** 20 equal-weight metrics → cognitive overload

**Applied Principles:**
- **Attention:** 3-4 primary KPIs top-left (large), smaller secondary below, red for violations only
- **Memory:** Group into 3-4 panels (Traffic, Conversions, Errors) = fits 4±1 chunks
- **Gestalt:** Proximity (related metrics within panel), similarity (consistent colors), whitespace (panel separation)
- **Encoding:** Bar charts (comparisons), line charts (trends), avoid pies (poor angle perception)
- **Mental Models:** Standard chart types, conventional axes (time left-to-right), familiar icons + labels

**Result:** 5-second comprehension vs 60-second confusion

---

### Form Wizard Example

**Problem:** 30-field form → 60% abandonment

**Applied Principles:**
- **Attention:** 4 steps revealed gradually, current step prominent
- **Memory:** 4-6 fields per step (fits 4±1 capacity), progress indicator visible (externalizes state)
- **Gestalt:** Related fields grouped (name, address), whitespace between groups
- **Recognition over Recall:** Show step names ("Personal Info" → "Account" → "Preferences" → "Review"), display entered info in review, enable back navigation
- **Natural Mapping:** Linear flow left-to-right/top-to-bottom, "Next" button bottom-right consistently

**Result:** 75% completion, faster task time

497
skills/cognitive-design/resources/data-visualization.md
Normal file
@@ -0,0 +1,497 @@

# Data Visualization

This resource provides cognitive design principles specifically for charts, dashboards, and data-driven presentations.

**Covered topics:**
1. Chart selection via task-encoding alignment
2. Dashboard hierarchy and grouping
3. Progressive disclosure for interactive exploration
4. Narrative data visualization

---

## Why Data Visualization Needs Cognitive Design

### WHY This Matters

**Core insight:** Data visualization is cognitive communication - the goal is insight transfer from data to human understanding. Success requires matching visual encodings to human perceptual strengths.

**Common problems:**
- Wrong chart type chosen (pie when bar would be clearer)
- Dashboards overwhelming users with too many metrics
- Interactive tools causing anxiety (users fear getting lost)
- Data stories lacking clear narrative guidance

**How cognitive principles help:**
- Choose encodings that exploit perceptual accuracy (Cleveland & McGill hierarchy)
- Design dashboards within working memory limits (chunking, hierarchy)
- Structure exploration to reduce cognitive load (progressive disclosure, visible state)
- Guide interpretation through annotations and narrative flow

**Mental model:** Data visualization is a translation problem - translating abstract data into visual form that human perception can process efficiently.

---

## What You'll Learn

**Four key areas:**

1. **Chart Selection:** Match visualization type to user task and perceptual capabilities
2. **Dashboard Design:** Organize multiple visualizations for at-a-glance comprehension
3. **Interactive Exploration:** Enable data investigation without overwhelming or losing users
4. **Narrative Visualization:** Tell data stories with clear flow and guided interpretation

---

## Why Chart Selection Matters

### WHY This Matters

**Core insight:** Not all visual encodings serve all tasks equally well - position/length enable 5-10x more accurate comparison than angle/area.

**Cleveland & McGill's Perceptual Hierarchy (most to least accurate):**
1. Position along common scale
2. Position along non-aligned scales
3. Length
4. Angle/Slope
5. Area
6. Volume
7. Color hue
8. Color saturation

**Implication:** Chart type choice directly determines user accuracy in extracting insights.

**Mental model:** Different charts are tools for different jobs - a bar chart is like precision calipers (accurate comparison), a pie chart is like eyeballing (rough proportion sense).

---

### WHAT to Apply

#### Task-to-Chart Mapping

**User Task: Compare Values**

**Best choice:** Bar chart (horizontal or vertical)
- **Why:** Position/length encoding (top of hierarchy)
- **Enables:** Precise magnitude comparison, easy ranking
- **Example:** Comparing sales across 6 regions

**Avoid:** Pie chart
- **Why:** Angle/area encoding (lower accuracy)
- **Problem:** Difficult to judge which slice is larger when differences are subtle

**Application rule:**
```
For precise comparison → bar chart with common baseline
Sort descending for easy ranking
Add data labels for exact values (dual coding)
```
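
A minimal matplotlib sketch of this rule (region names and sales figures are hypothetical):

```python
# Minimal sketch: sorted bar chart with a common baseline and direct value labels.
import matplotlib.pyplot as plt

regions = {"North": 42, "South": 58, "East": 35, "West": 71, "Central": 49, "Islands": 22}
items = sorted(regions.items(), key=lambda kv: kv[1])  # ascending puts the largest bar on top

labels, values = zip(*items)
fig, ax = plt.subplots()
bars = ax.barh(labels, values)
ax.bar_label(bars, padding=3)  # data labels give exact values (dual coding)
ax.set_xlabel("Sales")
plt.show()
```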

---

**User Task: Show Trend Over Time**

**Best choice:** Line chart
- **Why:** Position along time axis shows progression naturally
- **Enables:** Pattern detection (rising, falling, cyclical), slope perception
- **Example:** Monthly revenue over 2 years

**Avoid:** Bar chart for trends (acceptable but line is clearer), stacked area (hard to compare non-bottom series)

**Application rule:**
```
Time always on x-axis (left to right)
Use line for continuous data, bar for discrete periods
Limit to 5-7 lines max (more becomes a tangled mess)
Annotate significant events ("Product launch Q2")
```
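
A minimal matplotlib sketch combining these rules - time on the x-axis plus an event annotation (months and figures are hypothetical):

```python
# Minimal sketch: trend line with an annotated event.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [40, 43, 41, 55, 61, 64]  # hypothetical monthly revenue

fig, ax = plt.subplots()
ax.plot(range(len(months)), revenue, marker="o")
ax.set_xticks(range(len(months)))
ax.set_xticklabels(months)
# Explain the inflection point instead of leaving readers to guess.
ax.annotate("Product launch", xy=(3, 55), xytext=(0.5, 60),
            arrowprops=dict(arrowstyle="->"))
ax.set_ylabel("Revenue ($k)")
plt.show()
```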

---

**User Task: Understand Distribution**

**Best choice:** Histogram or box plot
- **Why:** Shows shape (normal, skewed, bimodal), spread, outliers
- **Enables:** Pattern recognition (is data normally distributed?), outlier identification
- **Example:** Distribution of customer order values

**Avoid:** Pie charts (can't show distribution), bar charts of individual values (too granular)

**Application rule:**
```
Histogram: Choose appropriate bin width (too narrow = noise, too wide = hidden patterns)
Box plot: Shows median, quartiles, outliers clearly
Include sample size in label for context
```
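
A minimal matplotlib sketch with synthetic skewed data, showing an explicit bin count and the sample size in the title:

```python
# Minimal sketch: histogram with a tunable bin count and n in the label.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
order_values = rng.lognormal(mean=4, sigma=0.5, size=500)  # hypothetical skewed order values

fig, ax = plt.subplots()
ax.hist(order_values, bins=30, edgecolor="white")  # too few bins hides shape, too many adds noise
ax.set_xlabel("Order value ($)")
ax.set_ylabel("Count")
ax.set_title(f"Distribution of order values (n={len(order_values)})")
plt.show()
```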

---

**User Task: Show Part-to-Whole Relationship**

**Acceptable choice:** Pie chart (BUT only if ≤5 slices and differences clear)
- **Why:** Circular metaphor intuitive for "whole"
- **Limitations:** Hard to judge angles, tiny slices unreadable

**Better choice:** Stacked bar chart or treemap
- **Why:** Position/length still more accurate than angle
- **Enables:** Easier comparison of parts

**Application rule:**
```
If using pie: ≤5 slices, sort descending, label percentages directly
Better: Stacked bar (compare lengths) or treemap (hierarchical parts)
Always ask: "Do users need precise comparison?" → If yes, avoid pie
```

---

**User Task: Find Outliers or Clusters**

**Best choice:** Scatterplot
- **Why:** 2D position enables pattern detection, outliers pop out preattentively
- **Enables:** Correlation visualization, cluster identification
- **Example:** Relationship between marketing spend and revenue by product

**Application rule:**
```
Add trend line if correlation expected
Use color/shape for categorical third variable (max 5-7 categories)
Label outliers with annotations
Consider log scale if data spans orders of magnitude
```
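
A minimal sketch of a scatterplot with a least-squares trend line, using synthetic data (numpy/matplotlib assumed):

```python
# Minimal sketch: scatterplot plus fitted trend line.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
spend = rng.uniform(1, 100, size=60)                 # hypothetical marketing spend
revenue = 3.2 * spend + rng.normal(0, 40, size=60)   # noisy linear relationship

slope, intercept = np.polyfit(spend, revenue, deg=1)  # least-squares fit
xs = np.linspace(spend.min(), spend.max(), 100)

fig, ax = plt.subplots()
ax.scatter(spend, revenue, alpha=0.7)
ax.plot(xs, slope * xs + intercept, color="#d62728", label=f"trend (slope={slope:.1f})")
ax.set_xlabel("Marketing spend ($k)")
ax.set_ylabel("Revenue ($k)")
ax.legend()
plt.show()
```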

---

**User Task: Compare Multiple Variables**

**Best choice:** Small multiples (repeated chart structure)
- **Why:** Consistent structure enables pattern comparison across conditions
- **Enables:** Seeing how patterns change by category
- **Example:** Monthly sales trends for each product category (separate line chart per category)

**Application rule:**
```
Keep axes consistent across multiples for fair comparison
Arrange in logical order (by average, alphabetically, geographically)
Limit to 6-12 multiples (more requires pagination)
Highlight one for focus if needed
```
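
A minimal matplotlib sketch of small multiples with shared axes (category names and data are synthetic):

```python
# Minimal sketch: repeated chart structure with shared axes for fair comparison.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
categories = ["Widgets", "Gadgets", "Gizmos", "Doodads"]  # hypothetical product categories
months = np.arange(12)

fig, axes = plt.subplots(1, len(categories), sharex=True, sharey=True, figsize=(12, 3))
for ax, name in zip(axes, categories):
    sales = 50 + 10 * rng.standard_normal(12).cumsum()  # synthetic monthly trend
    ax.plot(months, sales)
    ax.set_title(name)
axes[0].set_ylabel("Sales")
fig.suptitle("Same structure and axes across every panel")
plt.show()
```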

---

## Why Dashboard Design Matters

### WHY This Matters

**Core insight:** Dashboards combine many visualizations competing for limited attention - without clear hierarchy and grouping, users experience cognitive overload.

**Problems dashboards solve:**
- At-a-glance status monitoring
- Quick decision-making
- Pattern detection across metrics

**Cognitive challenges:**
- Information density overwhelming working memory
- No clear entry point (where to look first?)
- Related metrics not visually grouped
- Excessive scrolling breaks mental model

**Mental model:** Dashboard is like an airplane cockpit - instruments grouped by system, critical alerts preattentively salient, most important gauges in the pilot's direct view.

---

### WHAT to Apply

#### Visual Hierarchy for Dashboards

**Principle:** Establish clear focal point with size, position, and contrast

**Application pattern:**

**Primary KPIs (3-4 max):**
```
- Position: Top-left (F-pattern start)
- Size: Large numbers (36-48px)
- Style: Bold, high contrast
- Purpose: "What's the current status?" answered in 5 seconds
```

**Secondary Metrics (5-10):**
```
- Position: Below or right of primary KPIs
- Size: Medium (18-24px)
- Grouping: Clustered by relationship (all conversion metrics together)
- Purpose: Supporting detail for primary KPIs
```

**Tertiary/Details:**
```
- Position: Lower sections or drill-down
- Size: Smaller (12-16px)
- Access: Details on demand (click to expand)
- Purpose: Deep investigation, not monitoring
```

**Example:** E-commerce dashboard - Primary KPIs top-left (revenue, conversion, users), secondary metrics mid (channels, products), tertiary bottom (details, history).

---

#### Gestalt Grouping for Clarity

**Principle:** Use proximity, similarity, and whitespace to show relationships

**Proximity:** Related metrics close, unrelated separated by whitespace
**Similarity:** Consistent styling (same charts styled same way, color encoding consistent)
**Example:** Traffic panel (sessions, pageviews, bounce), Conversion panel (rate, revenue, AOV) separated by whitespace, red for errors throughout

---

#### Chunking for Working Memory

**Principle:** Limit concurrent visualizations to 4±1 major groups

**Application rule:** Group >15 metrics into 3-5 categories, each with 3-5 metrics max = 9-25 metrics organized into 4±1 chunks (fits working memory).

---

#### Preattentive Salience for Alerts

**Principle:** Use red color ONLY for threshold violations requiring immediate action

**Application rule:** Normal = gray, alert = red (critical only). Avoid red for all negatives (causes alert fatigue). Example: Revenue down 2% = gray, down 15% = red threshold violation.

---

#### Data-Ink Ratio Maximization

**Principle:** Remove decorative elements; maximize ink showing actual data

**Remove:** Heavy gridlines, 3D effects, gradients, excessive decimals, decorative icons, chart borders
**Keep:** Data (points/lines/bars), minimal axes, direct labels, meaningful annotations
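
A minimal matplotlib sketch of the same idea - stripping non-data ink from a default chart (the styling choices are ours, not a fixed recipe):

```python
# Minimal sketch: raising the data-ink ratio on a default bar chart.
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar(["A", "B", "C", "D"], [12, 30, 22, 17])

ax.spines["top"].set_visible(False)     # drop the chart "box"
ax.spines["right"].set_visible(False)
ax.grid(axis="y", linewidth=0.3, alpha=0.5)  # faint guides instead of heavy gridlines
ax.tick_params(length=0)                # keep labels, drop tick marks
ax.bar_label(bars)                      # direct labels replace axis lookups
plt.show()
```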

---

## Why Progressive Disclosure Matters

### WHY This Matters

**Core insight:** Users fear getting lost in complex data exploration - progressive disclosure (overview first, zoom/filter, then details) provides a safety net and reduces cognitive load.

**Shneiderman's Mantra:** "Overview first, zoom and filter, then details on demand"

**Benefits:**
- Reduces initial cognitive load (don't show everything at once)
- Provides context before detail (prevents disorientation)
- Enables personalized investigation (users drill into what they care about)
- Lowers anxiety (visible state shows "where am I", undo provides safety)

**Mental model:** Like Google Maps - start with full map view (overview), zoom to neighborhood (filter), click building for details (details on demand).

---

### WHAT to Apply

#### Overview Level

**Purpose:** Provide context and entry point

**Design principles:**
```
Show: Overall pattern, main trends, big picture
Hide: Granular details, outliers (show in drill-down)
Enable: Quick understanding of "what's the story here?"
```

**Example:**
```
Sales Dashboard Overview:
- Total sales number (aggregated)
- Trend line (last 90 days)
- Top 3 performing regions (summary)
User understands: "Sales are up 12% this quarter, West region leading"
```

---

#### Filter/Zoom Level

**Purpose:** Focus on subset of interest

**Design principles:**
```
Show: Filtered results clearly
Externalize: Active filters as visible chips/tags (recognition, not recall)
Enable: Easy modification (click chip to remove filter)
Provide: Clear "Reset all filters" option
```

**Example:**
```
User clicks "West region" from overview:
- Dashboard updates to show only West region data
- Visible chip at top: "Region: West" with X to remove
- Breadcrumb: "All Regions > West"
- "Reset" button returns to overview
User sees: "I'm viewing West region data, can easily get back"
```

---

#### Details on Demand Level

**Purpose:** Reveal specifics only when needed

**Design principles:**
```
Trigger: Hover, click, or expand action (not visible by default)
Show: Granular data, additional metrics, raw numbers
Position: Tooltip, modal, or expandable section (doesn't disrupt main view)
```

**Example:**
```
User hovers over data point on trend line:
- Tooltip appears: "June 15, 2024: $45,234 sales, 234 orders, $193 AOV"
- User moves mouse away: Tooltip disappears
- Main chart unchanged (not cluttered with all this detail by default)
```

---

#### State Visibility

**Principle:** Always show current navigation state - don't make users remember

**Critical state to externalize:**
```
✓ Active filters (as removable chips)
✓ Current zoom level or time range
✓ Drilled-down path (breadcrumbs)
✓ Sorting/grouping applied
✓ Comparison baseline (if comparing to previous period)
```

**Example implementation:**
```
Dashboard header always shows:
- Time range: "Last 30 days" (clickable to change)
- Filters: [Region: West] [Product: Widget] (each with X to remove)
- Comparison: "vs. previous 30 days" (toggle on/off)
- Breadcrumb: "Overview > West > Widget Sales"

User never wonders: "What am I looking at? How did I get here?"
```

---

## Why Narrative Visualization Matters

### WHY This Matters

**Core insight:** People naturally seek stories with cause-effect and chronology - structuring data as a narrative chunks information meaningfully and aids retention.

**Benefits of narrative structure:**
- Easier to process (story arc vs heap of facts)
- Better retention (narrative = memorable)
- Guided interpretation (annotations prevent misunderstanding)
- Emotional engagement (storytelling activates empathy)

**Mental model:** Data journalism isn't a data dump - it's a guided tour where the designer leads readers to insights.

---

### WHAT to Apply

#### Narrative Structure

**Classic arc:** Context → Problem/Question → Data/Evidence → Resolution/Insight

**Application to visualization:**

**Context (What's the situation?):**
```
- Title that sets stage: "U.S. Solar Energy Adoption 2010-2024"
- Subtitle: "How policy changes drove growth"
- Brief intro text or annotation providing background
```

**Problem/Question (What are we investigating?):**
```
- Visual question posed: "Did tax incentives accelerate adoption?"
- Annotation highlighting the question point on chart
```

**Data/Evidence (What does data show?):**
```
- Chart showing trend with clear inflection point
- Annotation: "Tax credit introduced 2015" with arrow to spike
- Supporting data: % increase after policy vs before
```

**Resolution/Insight (What did we learn?):**
```
- Conclusion annotation: "Adoption rate tripled post-incentive"
- Implications: "States with stronger incentives saw faster growth"
- Call-to-action or next question (optional)
```

---

#### Annotation Strategy

**Principle:** Guide attention to key insights; prevent misinterpretation

**Types of annotations:**

**Callout boxes:**
```
Purpose: Highlight main insight
Position: Near relevant data point
Style: Contrasting color or background
Content: 1-2 sentences max ("Sales spiked 45% after campaign launch")
```

**Arrows/Lines:**
```
Purpose: Connect explanation to data
Use: Point from text to specific data element
Example: Arrow from "Product launch" text to spike in line chart
```

**Shaded regions:**
```
Purpose: Mark time periods or ranges
Use: Highlight recession periods, policy changes, events
Example: Gray shaded region "COVID-19 lockdown Mar-May 2020"
```

**Direct labels:**
```
Purpose: Replace legend lookups
Use: Label lines/bars directly instead of requiring legend
Example: Line chart with "North region" text next to the line (not legend box)
```

**Application rule:**
```
Annotate: Main insight (what users should notice)
Annotate: Unusual events (explain outliers/anomalies)
Don't annotate: Obvious patterns (if trend is clear, don't state "going up")
Test: Can user grasp message without annotations? If not, add guidance
```

---

#### Scrollytelling

**Definition:** Visualization that updates or highlights aspects as the user scrolls

**Benefits:**
- Maintains context (chart stays visible while story progresses)
- Reveals complexity gradually (progressive disclosure in narrative form)
- Engaging (interactive storytelling)

**Application pattern:**

**Example progression:** Start with full context (0%) → Highlight specific periods as user scrolls (33%, 66%) → End with full picture + projection (100%). Chart stays visible, smooth transitions, user can scroll back, provide skip option.

424
skills/cognitive-design/resources/educational-design.md
Normal file
@@ -0,0 +1,424 @@

# Educational Design

This resource provides cognitive design principles for instructional materials, e-learning courses, and educational software.

**Covered topics:**
1. Multimedia learning principles (Mayer's principles)
2. Dual coding theory
3. Worked examples for skill acquisition
4. Retrieval practice for retention
5. Segmenting and coherence

---

## Why Educational Design Needs Cognitive Principles

### WHY This Matters

**Core insight:** Human learning is constrained by working memory limits and processing channels - instructional design must align with these cognitive realities.

**Common problems:**
- Dense slides with text paragraphs + complex diagrams (split attention)
- Passive reading/watching (weak memory traces)
- Decorative graphics competing with instructional content
- Information overload (exceeds working memory)
- No active recall opportunities (retrieval practice missing)

**How cognitive principles help:**
- **Dual coding:** Combine relevant visuals + words (two memory traces)
- **Modality principle:** Audio narration + visuals (splits load across channels)
- **Coherence:** Remove extraneous content (frees working memory)
- **Segmenting:** Break into chunks (fits working memory)
- **Retrieval practice:** Active recall strengthens retention

**Research foundation:** Richard Mayer's multimedia learning principles, John Sweller's cognitive load theory, Paivio's dual coding theory

---

## What You'll Learn

**Five key areas:**

1. **Multimedia Principles:** How to combine words, pictures, and audio effectively
2. **Dual Coding:** Leveraging visual and verbal processing channels
3. **Worked Examples:** Teaching complex procedures efficiently
4. **Retrieval Practice:** Active recall for long-term retention
5. **Segmenting & Coherence:** Chunking content and removing noise

---

## Why Multimedia Principles Matter

### WHY This Matters

**Core insight:** People have separate processing channels for visual and verbal information (Baddeley's working memory model) - proper multimedia design leverages both without overloading either.

**Baddeley's Model:**
- **Phonological loop:** Processes spoken/written words
- **Visuospatial sketchpad:** Processes images/spatial information
- **Central executive:** Coordinates both channels

**Implication:** You can split cognitive load across channels, but wrong combinations cause interference.

---

### WHAT to Apply

#### Multimedia Principle

**Principle:** People learn better from words + pictures than words alone

**Evidence:** Dual coding creates two memory traces instead of one (Paivio 1971, Mayer 2001)

**Application:**
```
❌ Text-only explanation of process
✓ Diagram showing process + text labels
✓ Video demonstrating concept + verbal explanation

Example: Teaching how heart pumps blood
❌ Text description only
✓ Animated diagram of heart + narration
Result: 50-100% better retention with multimedia
```

**Caution:** Pictures must be RELEVANT to content
```
❌ Decorative stock photo of "learning" (generic student at desk)
✓ Annotated diagram directly supporting concept
```

---

#### Modality Principle

**Principle:** Audio narration + visuals better than on-screen text + visuals

**Why:** On-screen text + complex visual both compete for the visual channel (overload)

**Application:**
```
For animations or complex diagrams:
❌ Dense on-screen text + diagram (visual channel overloaded)
✓ Audio narration + diagram (splits load across channels)

Example: Explaining software interface
❌ Screenshot with text callouts explaining every feature
✓ Screenshot + voiceover explaining each feature
Result: Reduces cognitive load, improves comprehension
```

**Exception:** For static text-heavy content (articles, code), on-screen text is fine
- Reader controls pace
- Can re-read as needed
- Narration unnecessary

---

#### Spatial Contiguity Principle

**Principle:** Place text near corresponding graphics, not separated

**Why:** Prevents holding one in memory while searching for the other (split attention)

**Application:**
```
Diagram with labels:
❌ Diagram on left, labels in legend/list on right (requires visual search + memory)
✓ Labels directly ON or immediately adjacent to diagram parts

Example: Anatomy diagram
❌ Numbered diagram + separate key (1. Heart, 2. Lungs...)
✓ Direct labels on organs + leader lines
Result: Instant association, no memory burden
```

---

#### Temporal Contiguity Principle

**Principle:** Present corresponding narration and animation simultaneously, not sequentially

**Why:** Holding one in memory while waiting for the other adds cognitive load

**Application:**
```
Video lesson:
❌ Show full animation, then explain what happened (requires remembering animation)
✓ Narrate while animation plays (synchronized)

Example: Chemistry reaction
❌ Play full reaction animation → then explain
✓ Narrate each step as it's happening
Result: Immediate connection between visual and explanation
```

---

#### Coherence Principle

**Principle:** Exclude extraneous material - every element should support the learning goal

**What to remove:**
```
❌ Decorative graphics unrelated to content
❌ Background music during instruction
❌ Tangential interesting stories (if they don't support main point)
❌ Excessive detail beyond learning objective

✓ Keep: Relevant diagrams, supporting examples, meaningful practice
```

**Application:**
```
Slide design:
Before (violates coherence):
- Stock photo of "teamwork" (decorative)
- Background music playing
- Tangent about company history
- Dense paragraph with extra details
→ Cognitive overload from extraneous content

After (coherent):
- Diagram directly illustrating concept
- No background music
- Focus only on learning objective
- Concise explanation
→ All working memory devoted to learning
```

**Evidence:** Extraneous content can reduce learning by 30-50% (Mayer)

---

#### Signaling Principle

**Principle:** Highlight essential material to guide attention

**Application:**
```
✓ Bold key terms first time introduced
✓ Headings/subheadings show structure
✓ Arrows/circles on diagrams highlighting key elements
✓ Verbal cues: "The most important point is..."
✓ Color highlighting for critical information (use sparingly)

Example: Complex diagram
Without signaling: User must determine what's important
With signaling: Arrows point to key mechanism, key part highlighted
Result: Attention directed to essentials
```

---

#### Segmenting Principle

**Principle:** Break lessons into user-paced segments rather than continuous presentation

**Why:** Fits working memory limits, allows consolidation before the next chunk

**Application:**
```
❌ 30-minute continuous lecture video (cognitive overload)
✓ 6 segments × 5 minutes each, user clicks "Next" to continue

Benefits:
- Fits attention span
- User controls pace (can pause/replay)
- Breaks between segments allow consolidation
- Can skip ahead if already know topic
```

**Optimal segment length:** 3-7 minutes per concept

---

## Why Dual Coding Matters

### WHY This Matters

**Dual Coding Theory (Paivio):** Humans process visual and verbal information through separate channels that can reinforce each other.

**Benefits:**
- Two memory traces instead of one (redundancy aids recall)
- Visual channel good for spatial/concrete concepts
- Verbal channel good for abstract/sequential concepts
- Combined = stronger encoding

---

### WHAT to Apply

**Application patterns:**

**Text + Diagram:**
```
Example: Explaining data structure
✓ Code snippet (verbal) + visual diagram of structure
Result: Can recall via either channel
```

**Narration + Illustration:**
```
Example: Historical event
✓ Illustrated timeline + audio story
Result: Visual spatial memory + verbal narrative memory
```

**Caution - Avoid Redundant Text:**
```
❌ On-screen text identical to audio narration (doesn't add channel, just duplicates)
✓ On-screen keywords/outline + audio detailed explanation
```

---

## Why Worked Examples Matter

### WHY This Matters

**Core insight:** For novices learning procedures, worked examples reduce extraneous cognitive load and allow focus on solution patterns.

**Problem-solving (novice):**
- High cognitive load (exploring solution space)
- Many wrong paths taken
- Limited capacity for noticing patterns

**Worked example (novice):**
- Low extraneous load (no exploring)
- All capacity devoted to understanding steps
- Can study solution pattern

**Application:** Transition from worked examples → partially completed examples → full problems

---

### WHAT to Apply

**Worked Example Structure:**

```
Step 1: Problem statement
Step 2: Solution shown with explanation of each step
Step 3: Principle highlighted: "Notice how we..."

Example: Math problem
Instead of: "Solve this equation: 3x + 7 = 19"
Better:
Problem: 3x + 7 = 19
Solution:
3x + 7 = 19
3x = 12 (subtract 7 from both sides - inverse operation)
x = 4 (divide both sides by 3 - inverse operation)
Principle: Use inverse operations to isolate variable
```

**Fading:**
```
Start: Full worked example
Next: Partially worked (complete last step)
Then: Start provided, learner completes middle + end
Finally: Full problem-solving
```

---

## Why Retrieval Practice Matters

### WHY This Matters

**Testing effect:** Practicing retrieval (active recall) creates stronger memory traces than passive re-reading.

**Evidence:** Retrieval practice improves long-term retention by 30-50% vs passive study (Roediger & Karpicke)

**Why it works:**
- Active recall strengthens memory pathways
- Identifies gaps in knowledge (metacognitive benefit)
- Desirable difficulty (requires effort = better encoding)

---

### WHAT to Apply

**Application patterns:**

**Embedded quizzes:**
```
After each segment: 2-3 questions testing key concepts
✓ Multiple choice (forces retrieval)
✓ Short answer (even better - must generate answer)
✓ Immediate explanatory feedback (not just "correct/incorrect")

Example:
After video on Gestalt principles:
Q: "Which principle explains why we see related items as grouped when they're placed close together?"
A: Proximity principle
Feedback: "Correct! Proximity is the tendency to group nearby elements. This is why we use whitespace to separate unrelated content."
```

**Spaced repetition:**
```
Immediate: Quiz at end of lesson
1 day later: Review quiz
1 week later: Cumulative quiz
1 month later: Final assessment

Spacing effect: Distributed practice beats massed practice
```
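
The schedule above is simple date arithmetic; a minimal Python sketch (the function name is ours):

```python
# Minimal sketch: expanding review intervals for spaced repetition.
from datetime import date, timedelta

def review_schedule(lesson_date: date) -> list[date]:
    """Immediate, +1 day, +1 week, +1 month reviews."""
    offsets = [timedelta(0), timedelta(days=1), timedelta(weeks=1), timedelta(days=30)]
    return [lesson_date + off for off in offsets]

for due in review_schedule(date(2024, 6, 1)):
    print(due)  # 2024-06-01, 2024-06-02, 2024-06-08, 2024-07-01
```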

**Low-stakes practice:**
```
✓ Formative quizzes don't count toward grade (reduces anxiety)
✓ Unlimited attempts (learning goal, not evaluation)
✓ Explanatory feedback (teaching moment)
```

---

## Why Segmenting & Coherence Matter

### WHY This Matters

**Segmenting:** Prevents cognitive overload by chunking within working memory limits

**Coherence:** Eliminates extraneous load so all capacity is devoted to learning

**Together:** Essential for managing cognitive load in complex material

---

### WHAT to Apply

**Segmenting strategies:**

```
30-minute topic divided into:
- Segment 1 (5 min): Concept introduction + first example
- Pause (user clicks next)
- Segment 2 (5 min): Second example + principle
- Pause
- Segment 3 (5 min): Practice problem
- Pause
- Segment 4 (5 min): Application to real scenario
- Pause
- Segment 5 (5 min): Summary + quiz

Benefits: Working memory not overloaded, consolidation between segments
```

**Coherence strategies:**

```
Remove:
❌ Decorative stock photos
❌ Background music
❌ Tangential fun facts (if they don't support learning objective)
❌ Overly detailed explanations beyond scope

Keep:
✓ Relevant diagrams supporting concept
✓ Concrete examples illustrating principle
✓ Practice problems applying knowledge
✓ Summaries reinforcing key points
```

83
skills/cognitive-design/resources/evaluation-rubric.json
Normal file
@@ -0,0 +1,83 @@

{
  "criteria": [
    {
      "name": "Completeness",
      "description": "Are all required components present and comprehensive?",
      "scores": {
        "1": "Major components missing - skill has significant gaps in coverage",
        "2": "Several important components missing or incomplete - limited utility",
        "3": "Core components present but some supporting elements missing - adequate coverage",
        "4": "Nearly complete with only minor gaps - good comprehensive coverage",
        "5": "Exceptionally complete - all core and supporting components present with depth"
      }
    },
    {
      "name": "Clarity",
      "description": "Are instructions clear, unambiguous, and easy to understand?",
      "scores": {
        "1": "Instructions confusing or ambiguous - difficult to follow",
        "2": "Some clarity but significant ambiguity remains - requires interpretation",
        "3": "Generally clear with occasional ambiguity - mostly understandable",
        "4": "Clear and well-explained with minimal ambiguity - easy to follow",
        "5": "Crystal clear with excellent explanations and examples - no ambiguity"
      }
    },
    {
      "name": "Actionability",
      "description": "Can the skill be followed step-by-step and applied in practice?",
      "scores": {
        "1": "Theoretical only - no actionable steps or practical guidance",
        "2": "Limited actionability - some steps but difficult to apply",
        "3": "Adequately actionable - can be followed with some effort",
        "4": "Highly actionable - clear steps and practical application guidance",
        "5": "Exceptionally actionable - detailed steps, examples, and immediate applicability"
      }
    },
    {
      "name": "Structure",
      "description": "Is the skill logically organized with clear navigation?",
      "scores": {
        "1": "Poorly organized - difficult to navigate, no clear structure",
        "2": "Weak organization - some structure but hard to find information",
        "3": "Adequately organized - logical structure, acceptable navigation",
        "4": "Well organized - clear structure and easy navigation",
        "5": "Exceptionally organized - intuitive structure, excellent navigation, linked resources"
      }
    },
    {
      "name": "Examples",
      "description": "Are sufficient concrete examples provided to illustrate concepts?",
      "scores": {
        "1": "No examples provided - purely theoretical",
        "2": "Minimal examples - insufficient to illustrate concepts clearly",
        "3": "Adequate examples - covers main concepts with at least one example each",
        "4": "Good examples - multiple concrete illustrations across most concepts",
        "5": "Excellent examples - comprehensive concrete illustrations for all major concepts"
      }
    },
    {
      "name": "Triggers",
      "description": "Is 'when to use' clearly defined with specific trigger conditions?",
      "scores": {
        "1": "No trigger conditions specified - unclear when to use",
        "2": "Vague triggers - difficult to know when skill applies",
        "3": "Adequate triggers - general guidance on when to use",
        "4": "Clear triggers - specific conditions and scenarios well-defined",
        "5": "Excellent triggers - comprehensive when/when-not guidance with concrete scenarios"
      }
    },
    {
      "name": "Cognitive Foundation",
      "description": "Are cognitive psychology principles well-explained and properly applied?",
      "scores": {
        "1": "No cognitive foundation - purely prescriptive rules without explanation",
        "2": "Weak cognitive foundation - limited explanation of underlying principles",
        "3": "Adequate cognitive foundation - basic principles explained",
        "4": "Strong cognitive foundation - principles well-explained with research backing",
        "5": "Exceptional cognitive foundation - deep principles, research citations, WHY clearly explained"
      }
    }
  ],
  "threshold": 3.5,
  "passing_note": "Average score must be ≥ 3.5 for skill to be considered production-ready. Scores below 3 in any individual criterion require revision before deployment."
}
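
The passing rule in `passing_note` reduces to two checks; a minimal Python sketch (the function name and sample scores are ours):

```python
# Minimal sketch of the rubric's passing rule: mean >= 3.5 and no criterion below 3.
def is_production_ready(scores: dict[str, int], threshold: float = 3.5) -> bool:
    """scores maps criterion name -> 1..5 rating."""
    average = sum(scores.values()) / len(scores)
    return average >= threshold and min(scores.values()) >= 3

print(is_production_ready({
    "Completeness": 4, "Clarity": 4, "Actionability": 3, "Structure": 4,
    "Examples": 3, "Triggers": 3, "Cognitive Foundation": 4,
}))  # True: average ~3.57, no criterion below 3
```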

411
skills/cognitive-design/resources/evaluation-tools.md
Normal file
@@ -0,0 +1,411 @@

# Evaluation Tools

This resource provides systematic checklists and frameworks for evaluating designs against cognitive principles.

**Tools covered:**
1. Cognitive Design Checklist (general interface/visualization evaluation)
2. Visualization Audit Framework (4-criteria data visualization quality assessment)

---

## Why Systematic Evaluation

### WHY This Matters

**Core insight:** Cognitive design has multiple dimensions - visibility, hierarchy, chunking, consistency, feedback, memory support, integrity. Ad-hoc review often misses issues in one or more dimensions.

**Benefits of systematic evaluation:**
- **Comprehensive coverage:** Ensures all cognitive principles are checked
- **Objective assessment:** Reduces subjective bias
- **Catches issues early:** Before launch or during design critiques
- **Team alignment:** Shared criteria for quality
- **Measurable improvement:** Track fixes over time

**Mental model:** Like a pre-flight checklist for pilots - systematically verify all critical systems before takeoff.

Without systematic evaluation: missed cognitive issues, inconsistent quality, user confusion that could have been prevented.

**Use when:**
- Conducting design reviews/critiques
- Evaluating existing designs for improvement
- Quality assurance before launch
- Diagnosing why a design feels "off"
- Teaching/mentoring cognitive design

---

## What You'll Learn

**Two complementary tools:**

**Cognitive Design Checklist:** General-purpose evaluation for any interface, visualization, or content
- Quick questions across 8 dimensions
- Suitable for any design context
- 10-15 minutes for thorough review

**Visualization Audit Framework:** Specialized 4-criteria assessment for data visualizations
- Clarity, Efficiency, Integrity, Aesthetics
- Systematic quality scoring
- 15-30 minutes depending on complexity

---

## Why Cognitive Design Checklist

### WHY This Matters

**Purpose:** Catch glaring cognitive problems before they reach users.

**Coverage areas:**
1. Visibility & Comprehension (can users see and understand?)
2. Visual Hierarchy (what gets noticed first?)
3. Chunking & Organization (fits working memory?)
4. Simplicity & Clarity (extraneous elements removed?)
5. Memory Support (state externalized?)
6. Feedback & Interaction (immediate responses?)
7. Consistency (patterns maintained?)
8. Scanning Patterns (layout leverages F/Z-pattern?)

**Mental model:** Like a doctor's diagnostic checklist - systematically check each vital sign.

---
|
||||
|
||||
### WHAT to Check
|
||||
|
||||
#### 1. Visibility & Immediate Comprehension
|
||||
|
||||
**Goal:** Core message/purpose graspable in ≤5 seconds
|
||||
|
||||
**Checklist:**
|
||||
- [ ] Can users identify the purpose/main message within 5 seconds? (5-second test)
|
||||
- [ ] Is important information visible without scrolling (above fold)?
|
||||
- [ ] Is text/content legible? (sufficient size, contrast, line length)
|
||||
- [ ] Are interactive elements distinguishable from static content?
|
||||
|
||||
**Test:** 5-second test (show design, ask what they remember). **Pass:** Identify purpose. **Fail:** Remember decoration or miss point.
|
||||
**Common failures:** Cluttered layout, poor contrast, content buried below fold.
|
||||
**Fix priorities:** CRITICAL (contrast), HIGH (5-second clarity), MEDIUM (hierarchy)
|
||||
|
||||
---
|
||||
|
||||
#### 2. Visual Hierarchy
|
||||
|
||||
**Goal:** Users can distinguish primary vs secondary vs tertiary content
|
||||
|
||||
**Checklist:**
|
||||
- [ ] Is visual hierarchy clear? (size, contrast, position differentiate importance)
|
||||
- [ ] Do headings/labels form clear levels? (H1 > H2 > H3 > body)
|
||||
- [ ] Does design pass "squint test"? (important elements still visible when blurred)
|
||||
- [ ] Are calls-to-action visually prominent?
|
||||
|
||||
**Test:** Squint test (blur design). **Pass:** Important elements visible when blurred. **Fail:** Everything same weight.
|
||||
**Common failures:** Everything same size, primary CTA not distinguished, decoration more prominent than data.
|
||||
**Fix priorities:** HIGH (primary not prominent), MEDIUM (heading hierarchy), LOW (minor adjustments)
|
||||
|
||||
---
|
||||
|
||||
#### 3. Chunking & Organization
|
||||
|
||||
**Goal:** Information grouped to fit working memory capacity (4±1 chunks, max 7)
|
||||
|
||||
**Checklist:**
|
||||
- [ ] Are long lists broken into categories? (≤7 items per unbroken list)
|
||||
- [ ] Are related items visually grouped? (proximity, backgrounds, whitespace)
|
||||
- [ ] Is navigation organized into logical categories? (≤7 top-level items)
|
||||
- [ ] Are form fields grouped by relationship? (personal info, account, preferences)
|
||||
|
||||
**Test:** Count ungrouped items. **Pass:** ≤7 items or clear grouping. **Fail:** >7 items ungrouped.
|
||||
**Common failures:** 15+ flat navigation, 30-field ungrouped form, 20 equal-weight metrics.
|
||||
**Fix priorities:** CRITICAL (>10 ungrouped), HIGH (7-10 ungrouped), MEDIUM (clearer groups)
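
The grouping test above lends itself to a quick automated lint. A minimal sketch in TypeScript, assuming a simple `UiGroup` shape (hypothetical, for illustration):

```typescript
// Working-memory limits from the checklist above.
const MAX_CHUNK_SIZE = 7; // ideal is 4±1

interface UiGroup {
  label: string;
  items: string[];
}

type Severity = "CRITICAL" | "HIGH" | "OK";

// Flags any group whose item count exceeds working-memory capacity,
// mirroring the fix priorities: >10 CRITICAL, 7-10 HIGH.
function auditChunking(groups: UiGroup[]): { label: string; severity: Severity }[] {
  return groups.map(({ label, items }) => {
    let severity: Severity = "OK";
    if (items.length > 10) severity = "CRITICAL";
    else if (items.length > MAX_CHUNK_SIZE) severity = "HIGH";
    return { label, severity };
  });
}

// Example: a flat 15-item navigation menu fails immediately.
const nav = { label: "Main nav", items: Array.from({ length: 15 }, (_, i) => `Item ${i + 1}`) };
console.log(auditChunking([nav])); // [{ label: "Main nav", severity: "CRITICAL" }]
```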

---

#### 4. Simplicity & Clarity

**Goal:** Every element serves user goal; extraneous elements removed

**Checklist:**
- [ ] Can you justify every visual element? (Does it convey information or improve usability?)
- [ ] Is data-ink ratio high? (maximize ink showing data, minimize decoration)
- [ ] Are decorative elements eliminated? (chartjunk, unnecessary lines, ornaments)
- [ ] Is terminology familiar or explained? (no unexplained jargon)

**Test:** Point to each element, ask "What purpose?" **Pass:** Every element justified. **Fail:** Decorative/unclear elements.
**Common failures:** Chartjunk (3D, backgrounds, excess gridlines), jargon, redundant elements.
**Fix priorities:** HIGH (decoration competing with data), MEDIUM (unexplained terms), LOW (minor simplification)

---

#### 5. Memory Support

**Goal:** Users don't need to remember what could be shown (recognition over recall)

**Checklist:**
- [ ] Is current system state visible? (active filters, current page, progress through flow)
- [ ] Are navigation breadcrumbs provided? (where am I, how did I get here)
- [ ] For multi-step processes, is progress shown? (wizard step X of Y)
- [ ] Are options presented rather than requiring recall? (dropdowns vs typed commands)

**Test:** Identify what users must remember. Ask "Could this be shown?" **Pass:** State visible. **Fail:** Relying on memory.
**Common failures:** No active filter indication, no progress indicator, hidden state.
**Fix priorities:** CRITICAL (lost in flow), HIGH (critical state invisible), MEDIUM (minor memory aids)

---

#### 6. Feedback & Interaction

**Goal:** Every action gets immediate, clear feedback

**Checklist:**
- [ ] Do all interactive elements provide immediate feedback? (hover states, click feedback)
- [ ] Are loading states shown? (spinners, progress bars for waits >1 second)
- [ ] Do form fields validate inline? (immediate feedback, not after submit)
- [ ] Are error messages contextual? (next to problem, not top of page)
- [ ] Are success confirmations shown? ("Saved", checkmarks)

**Test:** Click/interact. **Pass:** Feedback within 100ms. **Fail:** No feedback or delayed >1s without indicator.
**Common failures:** No hover states, no loading indicator, errors not contextual, no success confirmation.
**Fix priorities:** CRITICAL (no feedback for critical actions), HIGH (delayed without loading), MEDIUM (missing hover)

---

#### 7. Consistency

**Goal:** Repeated patterns throughout (terminology, layout, interactions, visual style)

**Checklist:**
- [ ] Is terminology consistent? (same words for same concepts)
- [ ] Are UI patterns consistent? (buttons, links, inputs styled uniformly)
- [ ] Is color usage consistent? (red = error, green = success throughout)
- [ ] Are interaction patterns predictable? (click/tap behavior consistent)

**Test:** List similar elements, check consistency. **Pass:** Identical styling/behavior. **Fail:** Unjustified variations.
**Common failures:** Inconsistent terminology ("Email" vs "E-mail"), visual inconsistency (button styles vary), semantic inconsistency (red means error and negative).
**Fix priorities:** HIGH (terminology), MEDIUM (visual styling), LOW (minor patterns)

---

#### 8. Scanning Patterns

**Goal:** Layout leverages predictable F-pattern or Z-pattern scanning

**Checklist:**
- [ ] Is primary content positioned top-left? (where scanning starts)
- [ ] For text-heavy content, does layout follow F-pattern? (top horizontal, then down left, short mid horizontal)
- [ ] For visual-heavy content, does layout follow Z-pattern? (top-left to top-right, diagonal to bottom-left, then bottom-right)
- [ ] Are terminal actions positioned bottom-right? (where scanning ends)

**Test:** Trace eye movement (F/Z pattern). **Pass:** Critical elements on path. **Fail:** Important content off path.
**Common failures:** Primary CTA bottom-left (off Z-pattern), key info middle-right (off F-pattern), patterns ignored.
**Fix priorities:** MEDIUM (CTA off path), LOW (secondary optimization)

---

## Why Visualization Audit Framework

### WHY This Matters

**Purpose:** Comprehensive quality assessment for data visualizations across four independent dimensions.

**Key insight:** Visualization quality requires success on ALL four criteria - high score on one doesn't compensate for failure on another.

**Four Criteria:**
1. **Clarity:** Immediately understandable and unambiguous
2. **Efficiency:** Minimal cognitive effort to extract information
3. **Integrity:** Truthful and free from misleading distortions
4. **Aesthetics:** Visually pleasing and appropriate

**Mental model:** Like evaluating a car - needs to be safe (integrity), functional (efficiency), easy to use (clarity), and pleasant (aesthetics). Missing any dimension makes it poor overall.

**Use when:**
- Evaluating data visualizations (charts, dashboards, infographics)
- Choosing between visualization alternatives
- Quality assurance before publication
- Diagnosing why visualization isn't working

---

### WHAT to Audit

#### Criterion 1: Clarity

**Question:** Is visualization immediately understandable and unambiguous?

**Checklist:**
- [ ] Is main message obvious or clearly annotated?
- [ ] Are axes labeled with units?
- [ ] Is legend clear and necessary? (or use direct labels if possible)
- [ ] Is title descriptive? (conveys what's being shown)
- [ ] Are annotations used to guide interpretation?
- [ ] Is chart type appropriate for message?

**5-Second Test:**
- Show visualization for 5 seconds
- Ask: "What's the main point?"
- **Pass:** Correctly identify main insight
- **Fail:** Confused or remember decorative elements instead

**Scoring:**
- **5 (Excellent):** Main message graspable in <5 seconds, perfectly labeled
- **4 (Good):** Clear with minor improvements needed (e.g., better title)
- **3 (Adequate):** Understandable but requires effort
- **2 (Needs work):** Ambiguous or missing critical labels
- **1 (Poor):** Incomprehensible

---

#### Criterion 2: Efficiency

**Question:** Can users extract information with minimal cognitive effort?

**Checklist:**
- [ ] Are encodings appropriate for task? (position/length for comparison, not angle/area)
- [ ] Is chart type matched to user task? (compare → bar, trend → line, distribution → histogram)
- [ ] Is comparison easy? (common baseline, aligned scales)
- [ ] Is cross-referencing minimized? (direct labels instead of legend lookups)
- [ ] Are cognitive shortcuts enabled? (sorting by value, highlighting key points)

**Encoding Check:**
- Identify user task (compare, see trend, find outliers)
- Check encoding against Cleveland & McGill hierarchy
- **Pass:** Position/length used for precise comparisons
- **Fail:** Angle/area/color used when position would work better

**Scoring:**
- **5 (Excellent):** Optimal encoding, zero wasted cognitive effort
- **4 (Good):** Appropriate with minor inefficiencies
- **3 (Adequate):** Works but more effort than necessary
- **2 (Needs work):** Poor encoding choice (pie when bar would be better)
- **1 (Poor):** Wrong chart type for task

---

#### Criterion 3: Integrity

**Question:** Is visualization truthful and free from misleading distortions?

**Checklist:**
- [ ] Do axes start at zero (or clearly note truncation)?
- [ ] Are scale intervals uniform?
- [ ] Is data complete? (not cherry-picked dates hiding context)
- [ ] Are comparisons fair? (same scale for compared items)
- [ ] Is context provided? (baselines, historical comparison, benchmarks)
- [ ] Are limitations noted? (sample size, data source, margin of error)

**Integrity Tests:**
1. **Axis test:** Does y-axis start at zero for bar charts? If not, is truncation clearly noted?
   - **Pass:** Zero baseline or explicit truncation note
   - **Fail:** Truncated axis exaggerating differences without disclosure

2. **Completeness test:** Is full relevant time period shown? Or cherry-picked subset?
   - **Pass:** Complete data with context
   - **Fail:** Selective dates hiding broader trend

3. **Fairness test:** Are compared items on same scale?
   - **Pass:** Common scale enables fair comparison
   - **Fail:** Dual-axis manipulation creates false correlation

**Scoring:**
- **5 (Excellent):** Completely honest, full context provided
- **4 (Good):** Honest with minor context improvements possible
- **3 (Adequate):** Not misleading but could provide more context
- **2 (Needs work):** Distortions present (truncated axis, cherry-picked data)
- **1 (Poor):** Actively misleading (severe distortions, no context)

**CRITICAL:** Scores below 3 on integrity are unacceptable - fix immediately

---

#### Criterion 4: Aesthetics

**Question:** Is visualization visually pleasing and appropriate for context?

**Checklist:**
- [ ] Is visual design professional and polished?
- [ ] Is color palette appropriate? (not garish, suits content tone)
- [ ] Is whitespace used effectively? (not cramped, not wasteful)
- [ ] Are typography choices appropriate? (readable, professional)
- [ ] Does style match context? (serious for finance, friendly for consumer)

**Important:** Aesthetics should NEVER undermine clarity or integrity

**Scoring:**
- **5 (Excellent):** Beautiful and appropriate, enhances engagement
- **4 (Good):** Pleasant and professional
- **3 (Adequate):** Acceptable, not ugly but not polished
- **2 (Needs work):** Amateurish or inappropriate style
- **1 (Poor):** Ugly or completely inappropriate

**Trade-off Note:** If forced to choose, prioritize Clarity and Integrity over Aesthetics

---

#### Using the 4-Criteria Framework

**Process:**

**Step 1: Evaluate each criterion independently**
- Score Clarity (1-5)
- Score Efficiency (1-5)
- Score Integrity (1-5)
- Score Aesthetics (1-5)

**Step 2: Calculate average**
- Average score = (Clarity + Efficiency + Integrity + Aesthetics) / 4
- **Pass threshold:** ≥3.5 average
- **Critical failures:** Any individual score <3 requires attention

**Step 3: Identify weakest dimension**
- Which criterion has lowest score?
- This is your primary improvement target

**Step 4: Prioritize fixes**
1. **CRITICAL:** Integrity < 3 (fix immediately - misleading is unacceptable)
2. **HIGH:** Clarity or Efficiency < 3 (users can't understand or use it)
3. **MEDIUM:** Aesthetics < 3 (affects engagement)
4. **LOW:** Scores 3-4 (optimization, not critical)

**Step 5: Verify fixes don't harm other dimensions**
- Example: Improving aesthetics shouldn't reduce clarity
- Example: Improving efficiency shouldn't compromise integrity
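
The averaging and prioritization steps above are mechanical enough to script. A minimal sketch in TypeScript, assuming the 1-5 scores have already been collected (the `AuditScores` shape is hypothetical, for illustration):

```typescript
interface AuditScores {
  clarity: number;
  efficiency: number;
  integrity: number;
  aesthetics: number; // each scored 1-5
}

function priority(k: keyof AuditScores): number {
  if (k === "integrity") return 0;                     // CRITICAL: misleading is unacceptable
  if (k === "clarity" || k === "efficiency") return 1; // HIGH
  return 2;                                            // MEDIUM: aesthetics
}

function auditVisualization(s: AuditScores) {
  const entries = Object.entries(s) as [keyof AuditScores, number][];
  // Step 2: average across the four criteria; pass threshold is ≥3.5.
  const average = entries.reduce((sum, [, v]) => sum + v, 0) / entries.length;
  // Step 3: the weakest dimension is the primary improvement target.
  const weakest = entries.reduce((min, e) => (e[1] < min[1] ? e : min));
  // Step 4: any score <3 needs attention, integrity first.
  const critical = entries.filter(([, v]) => v < 3).sort(([a], [b]) => priority(a) - priority(b));
  return { average, pass: average >= 3.5 && critical.length === 0, weakest: weakest[0], critical };
}

// Example 2 below: clarity 4, efficiency 5, integrity 2, aesthetics 4
// → average 3.75, but integrity <3 is flagged as a critical failure.
console.log(auditVisualization({ clarity: 4, efficiency: 5, integrity: 2, aesthetics: 4 }));
```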

---

## Examples of Evaluation in Practice

### Example 1: Dashboard Review Using Checklist

**Context:** Team dashboard with 20 metrics, users overwhelmed and missing alerts

**Checklist Results:**
- ❌ **Visibility:** Too cluttered, 20 equal-weight metrics
- ❌ **Hierarchy:** Everything same size, alerts not prominent
- ❌ **Chunking:** 15 ungrouped metrics (exceeds working memory)
- ❌ **Simplicity:** Chartjunk (gridlines, 3D, gradients)
- ❌ **Memory:** No active filter indication
- ✓ **Feedback:** Hover states, loading indicators present
- ⚠️ **Consistency:** Mostly consistent, minor button variations
- ❌ **Scanning:** Key KPI bottom-right (off F-pattern)

**Fixes:** (1) Reduce to 3-4 primary KPIs top-left, group others. (2) Remove chartjunk, establish hierarchy. (3) Show active filters as chips. (4) Standardize buttons.

**Outcome:** Users grasp status in 5 seconds, find alerts immediately

---

### Example 2: Bar Chart Audit Using 4 Criteria

**Context:** Q4 sales bar chart for presentation

**Audit Scores:**
- **Clarity (4/5):** Clear title/labels, direct bar labels. Minor: Could annotate top performer.
- **Efficiency (5/5):** Optimal position/length encoding, sorted descending, common baseline.
- **Integrity (2/5 - CRITICAL):** ❌ Y-axis starts at 80 (exaggerates differences), ❌ No historical context.
- **Aesthetics (4/5):** Clean, professional. Minor: Could use brand colors.

**Average:** 3.75/5 (barely passes). **Critical issue:** Integrity <3 unacceptable.

**Fixes:** (1) Start y-axis at zero or add break symbol + "Axis truncated" note. (2) Add Q3 baseline for context. (3) Annotate: "West region led Q4 with 23% increase."

**After fixes:** Clarity 5/5, Efficiency 5/5, Integrity 5/5, Aesthetics 4/5 = **4.75/5 Excellent**

479
skills/cognitive-design/resources/frameworks.md
Normal file
@@ -0,0 +1,479 @@
# Design Frameworks

This resource provides three systematic frameworks for structuring cognitive design thinking and decision-making.

**Frameworks covered:**
1. Cognitive Design Pyramid (hierarchical quality assessment)
2. Design Feedback Loop (interaction cycles)
3. Three-Layer Visualization Model (data communication fidelity)

---

## Why Use Frameworks

### WHY This Matters

Frameworks provide:
- **Systematic structure** for design thinking (not ad-hoc)
- **Complete coverage** across multiple dimensions
- **Prioritization guidance** (what to fix first)
- **Shared vocabulary** for team communication
- **Repeatable process** applicable across projects

**Mental model:** Like architectural blueprints - frameworks give structure to design decisions so nothing critical is forgotten.

Without frameworks: inconsistent application of principles, missed dimensions, unclear priorities, reinventing approach each time.

---

## What You'll Learn

Three complementary frameworks, each suited for different contexts:

**Cognitive Design Pyramid:** When you need comprehensive multi-dimensional quality assessment

**Design Feedback Loop:** When you're designing interactive interfaces and need to support user perception-action cycles

**Three-Layer Visualization Model:** When you're creating data visualizations and need to ensure fidelity from data through interpretation

---

## Why Cognitive Design Pyramid

### WHY This Matters

**Core insight:** Design effectiveness depends on satisfying multiple needs in hierarchical sequence - perceptual clarity is foundation, enabling cognitive coherence, which enables emotional engagement, which enables behavioral outcomes.

**Key principle:** Lower levels are prerequisites for higher levels:
- If users can't perceive elements clearly → coherence impossible
- If design isn't coherent → engagement suffers
- If not engaging → behavior change fails

**Mental model:** Like Maslow's hierarchy of needs - must satisfy foundational needs before higher-level needs matter.

**Use when:**
- Comprehensive design quality check needed
- Diagnosing what level is failing
- Prioritizing improvements (fix foundation first)
- Evaluating entire user experience holistically

---

### WHAT to Apply

#### The Pyramid (4 Tiers)

```
┌─────────────────────────┐
│  BEHAVIORAL ALIGNMENT   │ ← Peak: Guides actual user behavior
├─────────────────────────┤
│  EMOTIONAL ENGAGEMENT   │ ← Higher: Motivates and doesn't frustrate
├─────────────────────────┤
│   COGNITIVE COHERENCE   │ ← Middle: Makes logical sense
├─────────────────────────┤
│  PERCEPTUAL EFFICIENCY  │ ← Base: Seen and registered correctly
└─────────────────────────┘
```

---

#### Tier 1: Perceptual Efficiency (Foundation)

**Goal:** Users can see and register all necessary elements clearly.

**Checkpoints:**
- [ ] Sufficient contrast for all text and key elements (WCAG AA minimum: 4.5:1 for body text, 3:1 for large text)
- [ ] Visual hierarchy obvious (squint test - primary elements still visible when blurred)
- [ ] No overwhelming clutter or visual noise (data-ink ratio high)
- [ ] Typography legible (appropriate size, line height, line length)
- [ ] Color distinguishable (not relying solely on hue for colorblind users)
- [ ] Grouping perceivable (Gestalt principles applied)

**Common failures:**
- Low contrast text (gray on light gray)
- Everything same visual weight (no hierarchy)
- Chartjunk obscuring data
- Text too small or lines too long

**Fix priority:** HIGHEST - without perception, nothing else works

**Example:** ❌ All metrics gray 12px → ✓ Primary KPIs 36px bold, secondary 18px gray
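
The contrast checkpoint above is directly computable. A minimal sketch of the WCAG 2.x contrast-ratio formula in TypeScript, taking colors as 8-bit sRGB triplets:

```typescript
// Linearize an 8-bit sRGB channel per the WCAG 2.x definition.
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of an sRGB color.
function luminance([r, g, b]: [number, number, number]): number {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio ranges from 1:1 (identical colors) to 21:1 (black on white).
function contrastRatio(fg: [number, number, number], bg: [number, number, number]): number {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// WCAG AA: ≥4.5:1 for body text, ≥3:1 for large text.
const ratio = contrastRatio([102, 102, 102], [255, 255, 255]); // #666 on white ≈ 5.7:1
console.log(ratio.toFixed(2), ratio >= 4.5 ? "passes AA body text" : "fails AA body text");
```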

---

#### Tier 2: Cognitive Coherence (Comprehension)

**Goal:** Information is organized to align with how users think and expect.

**Checkpoints:**
- [ ] Layout matches mental models and familiar patterns (standard UI conventions)
- [ ] Terminology consistent throughout (same words for same concepts)
- [ ] Navigation/flow logical and predictable (no mystery meat navigation)
- [ ] Memory aids present (breadcrumbs, state indicators, progress bars)
- [ ] Chunking within working memory capacity (≤7 items per group, ideally 4±1)
- [ ] Recognition over recall (show options, don't require memorization)

**Common failures:**
- Inconsistent terminology confusing users
- Non-standard UI patterns requiring re-learning
- Hidden navigation or state
- Unchunked long lists overwhelming memory

**Fix priority:** HIGH - enables users to understand and navigate

**Example:** ❌ 30 ungrouped fields, inconsistent labels → ✓ 4-step wizard, grouped fields, consistent terms, progress visible

---

#### Tier 3: Emotional Engagement (Motivation)

**Goal:** Design feels pleasant and motivating, not frustrating or anxiety-inducing.

**Checkpoints:**
- [ ] Aesthetic quality appropriate for context (professional/friendly/serious)
- [ ] Design feels pleasant to use (not frustrating)
- [ ] Anxiety reduced (progress shown, undo available, clear next steps)
- [ ] Tone matches user emotional state (encouraging for learning, calm for high-stress tasks)
- [ ] Accomplishment visible (checkmarks, confirmations, completed states)

**Common failures:**
- Ugly or amateurish appearance reducing trust
- Frustrating interactions causing stress
- No reassurance in multi-step processes
- Inappropriate tone (playful for serious tasks)

**Fix priority:** MEDIUM - affects engagement and trust, but only after foundation solid

**Why it matters:** Emotional state influences cognitive performance - pleasant affect improves problem-solving, stress narrows attention

**Example:** ❌ Dense text, no progress, no encouragement → ✓ Friendly tone, progress bar, checkmarks, "You're almost done!"

---

#### Tier 4: Behavioral Alignment (Action)

**Goal:** Design guides users toward desired behaviors and outcomes.

**Checkpoints:**
- [ ] Calls-to-action clear and prominent (primary action obvious)
- [ ] Visual emphasis on actionable items (buttons stand out)
- [ ] Key takeaways highlighted (not buried in text)
- [ ] Ethical nudges toward good decisions (defaults favor user)
- [ ] Success paths more accessible than failure paths

**Common failures:**
- Primary CTA not visually prominent
- Actionable insights buried in data
- Destructive actions too easy (no confirmation)
- Defaults favor business over user

**Fix priority:** MEDIUM-LOW - optimize after foundation, coherence, engagement work

**Example:** ❌ Insights in footnotes, equal button sizes → ✓ Key insight top (large), "Take Action" prominent green, export secondary gray

---

#### Applying the Pyramid

**Process:**
1. **Assess bottom-up:** Evaluate T1 (see clearly?) → T2 (makes sense?) → T3 (pleasant?) → T4 (guides action?)
2. **Identify weakest tier:** Where are most failures? Which blocks success?
3. **Prioritize foundation-up:** Fix T1 first (enables everything), then T2, then T3/T4
4. **Re-evaluate:** Check fixes don't break higher tiers, iterate until all pass
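
Because lower tiers are prerequisites, the assessment reduces to "find the first failing tier, bottom-up." A minimal sketch, where each tier's checkpoint list is a hypothetical placeholder predicate:

```typescript
type Tier =
  | "Perceptual Efficiency"
  | "Cognitive Coherence"
  | "Emotional Engagement"
  | "Behavioral Alignment";

interface TierCheck {
  tier: Tier;
  passes: () => boolean; // stand-in for running that tier's checkpoint list
}

// Evaluate bottom-up; the first failing tier is the fix priority.
function assessPyramid(checks: TierCheck[]): Tier | "ALL PASS" {
  for (const check of checks) {
    if (!check.passes()) return check.tier;
  }
  return "ALL PASS";
}

// Order matters: base tier first.
console.log(assessPyramid([
  { tier: "Perceptual Efficiency", passes: () => true },
  { tier: "Cognitive Coherence", passes: () => false }, // e.g., inconsistent terminology
  { tier: "Emotional Engagement", passes: () => true },
  { tier: "Behavioral Alignment", passes: () => true },
])); // → "Cognitive Coherence": fix before tuning higher tiers
```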

---

## Why Design Feedback Loop

### WHY This Matters

**Core insight:** Users don't passively consume interfaces - they actively engage in continuous perception-action-learning cycles.

**Loop stages:** Perceive → Interpret → Decide → Act → Learn → (loop repeats)

**Key principle:** Design must support every stage; break anywhere causes confusion or failure.

**Mental model:** Like a conversation - user asks (via perception), interface answers (via display), user responds (via action), interface confirms (via feedback), understanding updates (learning).

**Use when:**
- Designing interactive interfaces (apps, dashboards, tools)
- Ensuring each screen answers user's questions
- Providing appropriate feedback for actions
- Diagnosing where interaction breaks down

---

### WHAT to Apply

#### The Loop (5 Stages)

```
┌──────────┐
│ PERCEIVE │ → "What am I seeing?"
└────┬─────┘
     ↓
┌──────────┐
│INTERPRET │ → "What does this mean?"
└────┬─────┘
     ↓
┌──────────┐
│  DECIDE  │ → "What can I do next?"
└────┬─────┘
     ↓
┌──────────┐
│   ACT    │ → "How do I do it?"
└────┬─────┘
     ↓
┌──────────┐
│  LEARN   │ → "What happened? Was it successful?"
└────┬─────┘
     ↓
 (repeat)
```

---

#### Stage 1: Perceive

**User question:** "What am I seeing?"

**Design must provide:**
- [ ] Visibility of system status (current page, active state, what's happening)
- [ ] Clear visual hierarchy (where to look first)
- [ ] Salient critical elements (preattentive cues for important items)

**Common failures:**
- Hidden state (user doesn't know where they are)
- No visual hierarchy (don't know what's important)
- Loading without indicator (appears broken)

**Example:** ❌ No active filter indication → ✓ Active filters shown as visible chips at top

---

#### Stage 2: Interpret

**User question:** "What does this mean?"

**Design must provide:**
- [ ] Clear labels and context (explain what user is seeing)
- [ ] Familiar terminology (or definitions for specialized terms)
- [ ] Adequate context (why am I here, what are these options)
- [ ] Visual encoding that matches meaning (charts appropriate for data type)

**Common failures:**
- Jargon or abbreviations without explanation
- Missing context (chart without title/axes labels)
- Unclear purpose of page/screen

**Example:** ❌ No title, axes "X"/"Y", no units → ✓ "Q4 Sales by Region (thousands USD)", labeled axes, annotated events

---

#### Stage 3: Decide

**User question:** "What can I do next?"

**Design must provide:**
- [ ] Available actions obvious (clear CTAs)
- [ ] Choices not overwhelming (Hick's Law - limit options)
- [ ] Recommended/default option suggested when appropriate
- [ ] Consequences of choices clear (especially for destructive actions)

**Common failures:**
- Too many options causing paralysis
- No guidance on what to do next
- Unclear consequences ("Are you sure?" without explaining what happens)

**Example:** ❌ 10 unclear buttons → ✓ Primary "Continue" prominent, secondary "Save Draft" gray, "Cancel" text link

---

#### Stage 4: Act

**User question:** "How do I do it?"

**Design must provide:**
- [ ] Affordances clear (buttons look pressable, sliders look draggable)
- [ ] Controls accessible (appropriate size/position per Fitts's Law)
- [ ] Keyboard shortcuts available for power users
- [ ] Constraints prevent invalid actions (disabled states, input masking)

**Common failures:**
- Unclear what's clickable (flat design with no affordances)
- Tiny touch targets on mobile
- No keyboard access
- Allowing invalid inputs

**Example:** ❌ Flat text, no interactive cue → ✓ Raised appearance, hover state, cursor→pointer, focus ring

---

#### Stage 5: Learn

**User question:** "What happened? Was it successful?"

**Design must provide:**
- [ ] Immediate feedback for every action (< 100ms for responsiveness perception)
- [ ] Confirmations for successful actions (checkmarks, "Saved" message, state change)
- [ ] Clear error messages in context (next to problem, plain language)
- [ ] Updated system state visible (if filter applied, data updates)

**Common failures:**
- No feedback (user clicks button, nothing visible happens)
- Delayed feedback (loading without indication)
- Generic errors ("Error occurred" without explanation)
- Feedback hidden or dismissible before user sees it

**Example:** ❌ Click → nothing visible → page maybe changes → ✓ Click → spinner → success message → confirmed transition
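
A minimal sketch of the Learn stage wired into a click handler - spinner well under 100ms, success confirmation, and a contextual plain-language error. The element IDs are hypothetical:

```typescript
// Close the loop: every action gets immediate, visible feedback.
async function handleSave(saveFn: () => Promise<void>): Promise<void> {
  const button = document.querySelector<HTMLButtonElement>("#save")!;
  const status = document.querySelector<HTMLElement>("#save-status")!; // sits next to the button

  button.disabled = true;
  status.textContent = "Saving…"; // immediate feedback, not a silent wait

  try {
    await saveFn();
    status.textContent = "✓ Saved"; // success confirmation
  } catch {
    // Contextual, plain-language error next to the action, not top of page.
    status.textContent = "Couldn't save - check your connection and try again.";
  } finally {
    button.disabled = false;
  }
}
```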

---

#### Applying the Feedback Loop

**For each screen/interaction, ask:**

1. **Perceive:** Can user see current state and what's important?
2. **Interpret:** Will user understand what they're seeing?
3. **Decide:** Are next actions clear and not overwhelming?
4. **Act:** Are controls obvious and accessible?
5. **Learn:** Will user get immediate, clear feedback?

**Example: Login form**
- **Perceive:** Fields visible, labels clear
- **Interpret:** "Log in to your account" heading
- **Decide:** "Log in" button obvious, "Forgot password?" link
- **Act:** Focus states, clickable button, Enter submits
- **Learn:** Spinner on submit, success→redirect, error→message next to field with red border

---

## Why Three-Layer Visualization Model

### WHY This Matters

**Core insight:** Effective data visualization requires success at three distinct layers - accurate data, appropriate visual encoding, and correct user interpretation. Failure at any layer breaks the entire communication chain.

**Layers:** Data → Visual Encoding → Cognitive Interpretation

**Key principle:** You can have perfect data with wrong chart type (encoding failure) or perfect chart with user misunderstanding (interpretation failure). All three must succeed.

**Mental model:** Like a telephone game - message (data) must be transmitted (encoding) and received (interpretation) accurately or communication fails.

**Use when:**
- Creating any data visualization
- Diagnosing why visualization isn't working
- Choosing chart types
- Validating user understanding

---

### WHAT to Apply

#### Layer 1: Data

**Question:** Is the underlying data accurate, complete, and relevant?

**Checkpoints:**
- [ ] Data quality verified (no errors, outliers investigated)
- [ ] Complete dataset (not cherry-picked subset that misleads)
- [ ] Relevant to question being answered (not tangential data)
- [ ] Appropriate aggregation/granularity (not hiding or overwhelming with detail)
- [ ] Time period representative (not artificially truncated to show desired trend)

**Common failures:**
- Garbage data → garbage visualization
- Cherry-picked dates hiding broader context
- Outliers distorting scale
- Wrong metric for question

**Fix:** Validate data quality before designing visualization

**Example:** "Are sales improving?" ❌ Only last 3 months (hides 2-year decline) → ✓ 2-year trend with annotation: "Recent uptick after sustained decline"

---

#### Layer 2: Visual Encoding

**Question:** Are visualization choices appropriate for the data type, user task, and perceptual capabilities?

**Checkpoints:**
- [ ] Chart type matches task (compare → bar, trend → line, distribution → histogram)
- [ ] Encoding matches perceptual hierarchy (position > angle > area)
- [ ] Axes scaled appropriately (start at zero for bars, or clearly note truncation)
- [ ] Color usage correct (hue for categories, lightness for quantities)
- [ ] Labels clear and sufficient (title, axes, units, legend if needed)

**Common failures:**
- Pie chart when bar chart would be clearer
- Truncated axis exaggerating differences
- Rainbow color scale for quantitative data
- Missing units or context

**Fix:** Match encoding to task using Cleveland & McGill hierarchy

**Example:** Compare 6 regional sales. ❌ Pie chart (poor angle/area) → ✓ Bar chart (precise position/length)

---

#### Layer 3: Cognitive Interpretation

**Question:** Will users correctly understand the message and draw valid conclusions?

**Checkpoints:**
- [ ] Main insight obvious or annotated (don't require users to discover it)
- [ ] Context provided (baselines, comparisons, historical trends)
- [ ] Audience knowledge level accommodated (annotations for novices)
- [ ] Potential misinterpretations prevented (annotations clarifying what NOT to conclude)
- [ ] Self-contained (doesn't require remembering distant information)

**Common failures:**
- Heap of data without guidance to key insight
- Missing context (percentage without denominator, comparison without baseline)
- Assumes expert knowledge novices lack
- Spurious correlation without clarification

**Fix:** Add titles, annotations, context; test with target users

**Example:** Ice cream sales vs drowning deaths. ❌ No annotation (viewers assume causation) → ✓ "Both increase in summer (common cause), not causally related"

---

#### Applying the Three-Layer Model

**Process:**
1. **Validate data:** Check quality, completeness, relevance, outliers, time period, aggregation
2. **Choose encoding:** Identify task → select chart type matching task + perceptual hierarchy → design axes, colors, labels → maximize data-ink
3. **Support interpretation:** Add title conveying message → annotate insights → provide context → test with users → clarify misinterpretations
4. **Iterate:** Fix weak layers, re-validate

**Example: Sales dashboard**
- **Data:** Complete 2-year sales, verified quality, identified 2 outlier months (note in viz)
- **Encoding:** Task = trend + regional comparison. Line chart, distinct hue per region (limit 5), y-axis at zero "Sales (thousands USD)", time on x-axis
- **Interpretation:** Title "Regional Sales Trends 2023-2024: Overall Growth with West Leading", annotate outliers ("Holiday promo Nov 2023", "Launch June 2024"), show previous year dotted baseline, tested with sales team
- **Result:** Accurate data + appropriate encoding + correct interpretation = insight

---

## Choosing the Right Framework

**Use Cognitive Design Pyramid when:**
- Comprehensive multi-dimensional quality assessment needed
- Diagnosing which aspect of design is failing
- Prioritizing fixes (foundation → higher tiers)
- Evaluating entire user experience

**Use Design Feedback Loop when:**
- Designing interactive interfaces
- Ensuring each screen supports user questions
- Providing appropriate feedback
- Diagnosing where interaction breaks down

**Use Three-Layer Model when:**
- Creating data visualizations
- Choosing chart types
- Validating data quality through interpretation
- Diagnosing visualization failures

**Use multiple frameworks together for complete coverage**

300
skills/cognitive-design/resources/quick-reference.md
Normal file
@@ -0,0 +1,300 @@
# Quick Reference

This resource provides rapid access to core cognitive design principles and quick validation checks.

---

## Why Quick Reference

### WHY This Matters

Quick references enable:
- **Fast decision-making** during active design work
- **Rapid validation** without deep dive into theory
- **Team alignment** through shared heuristics
- **Design advocacy** with memorable, citable principles

**Mental model:** Like a cheat sheet or checklist for design decisions - quick lookups when you need them.

Without quick reference: slowed workflow, inconsistent application, forgetting key principles under time pressure.

---

## What to Use

### 20 Core Actionable Principles

#### Attention & Perception

1. **Selective Salience:** Use preattentive features (color, contrast, motion, size) sparingly for critical elements only; overuse overwhelms
2. **Visual Hierarchy:** Organize by importance using size, contrast, position, spacing; guide attention along F-pattern or Z-pattern
3. **Perceptual Grouping:** Use proximity (close = related), similarity (alike = grouped), continuity (aligned = connected), closure (mind fills gaps)
4. **Figure-Ground Distinction:** Ensure clear separation between foreground content and background

#### Memory & Cognition

5. **Working Memory Respect:** Limit concurrent elements to 4±1 chunks; group related items; show context rather than requiring memorization
6. **Recognition Over Recall:** Make options visible rather than requiring memory; show current state, breadcrumbs, available actions
7. **Chunking Strategy:** Group related information into meaningful units (phone: 555-123-4567; navigation: categories not flat list)
8. **Progressive Disclosure:** Reveal complexity gradually; show only what's immediately needed, provide details on demand

#### Encoding & Communication

9. **Encoding Hierarchy:** Use position/length for precise comparisons (bar charts), reserve angle/area/color for less critical distinctions
10. **Data-Ink Maximization:** Remove decorative elements that don't convey information; maximize proportion showing actual data
11. **Dual Coding:** Combine relevant visuals with text (two memory traces); use audio narration with complex visuals
12. **Spatial Contiguity:** Place labels/annotations adjacent to content they describe; prevent split-attention

#### Consistency & Standards

13. **Pattern Consistency:** Use predictable, familiar patterns (standard icons, conventional layouts, platform norms)
14. **Terminology Consistency:** Use same words for same concepts throughout
15. **Natural Mapping:** Align controls with effects intuitively (increase volume = move up; zoom in = pinch outward)

#### Feedback & Interaction

16. **Immediate Feedback:** Provide visible response to every action within milliseconds (loading states, confirmations, validation)
17. **Visible State:** Externalize system state to interface (show active filters, current page, progress); don't rely on user memory
18. **Error Prevention First:** Constrain inputs, disable invalid actions, provide guidance before errors; when errors occur, provide contextual recovery

#### Emotional & Behavioral

19. **Emotional Calibration:** Pleasant aesthetics improve problem-solving; frustration narrows attention and reduces working memory
20. **Behavioral Guidance:** Use visual emphasis, clear calls-to-action, ethical nudges to guide toward desired outcomes

---

### 3-Question Quick Check

**Use this for rapid validation:**

#### Question 1: Attention
**"Is it obvious what to look at first?"**
- [ ] Visual hierarchy is clear (primary vs secondary vs tertiary)
- [ ] Most important element is preattentively salient (but not overwhelming)
- [ ] Layout follows predictable scanning pattern (F or Z)

**If NO:** Increase size/contrast of primary elements, reduce visual weight of secondary items, establish clear entry point

---

#### Question 2: Memory
**"Is user required to remember anything that could be shown?"**
- [ ] Current state is visible (filters, progress, location)
- [ ] Options are presented, not requiring recall
- [ ] Concurrent elements fit in 4±1 chunks

**If NO:** Externalize state to interface, show don't tell, chunk information into groups

---

#### Question 3: Clarity
**"Can someone unfamiliar understand this in 5 seconds?"**
- [ ] Purpose/main message is immediately graspable
- [ ] No unnecessary decorative elements competing for attention
- [ ] Terminology is familiar or explained

**If NO:** Simplify, add labels/titles, remove extraneous elements, use annotations to guide interpretation

**If all three are YES → design is likely cognitively sound**

---

### Common Decision Rules

#### Chart Selection

**Task: Compare values**
→ **Use:** Bar chart (position/length encoding)
→ **Avoid:** Pie chart (angle/area less accurate)
→ **Why:** Cleveland & McGill hierarchy - position > angle

**Task: Show trend over time**
→ **Use:** Line chart with time on x-axis
→ **Avoid:** Stacked area (hard to compare non-bottom series)
→ **Why:** Continuous lines show temporal progression naturally

**Task: Show distribution**
→ **Use:** Histogram or box plot
→ **Avoid:** Multiple pie charts
→ **Why:** Enables shape perception (normal, skewed, bimodal)

**Task: Show part-to-whole**
→ **Use:** Stacked bar chart or treemap
→ **Avoid:** Pie chart with >5 slices
→ **Why:** Easier to compare bar lengths than angles

**Task: Find outliers**
→ **Use:** Scatterplot
→ **Avoid:** Table of numbers
→ **Why:** Visual pattern makes outliers pop out preattentively
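
These chart-selection rules are a straightforward lookup. A minimal sketch encoding them as a table:

```typescript
type Task = "compare" | "trend" | "distribution" | "part-to-whole" | "outliers";

// Encodes the decision rules above (per the Cleveland & McGill hierarchy).
const chartFor: Record<Task, { use: string; avoid: string }> = {
  compare:         { use: "bar chart",                    avoid: "pie chart" },
  trend:           { use: "line chart, time on x-axis",   avoid: "stacked area" },
  distribution:    { use: "histogram or box plot",        avoid: "multiple pie charts" },
  "part-to-whole": { use: "stacked bar chart or treemap", avoid: "pie chart with >5 slices" },
  outliers:        { use: "scatterplot",                  avoid: "table of numbers" },
};

console.log(chartFor["compare"].use); // "bar chart"
```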

---

#### Color Usage

**Categorical data (types, categories)**
→ **Use:** Distinct hues (red, blue, green - perceptually different)
→ **Avoid:** Shades of same hue
→ **Why:** Hue lacks inherent ordering; best for nominal categories

**Quantitative data (amounts, rankings)**
→ **Use:** Lightness/saturation gradient (light to dark)
→ **Avoid:** Rainbow spectrum or varied hues
→ **Why:** Lightness has natural perceptual ordering (more = darker)

**Alerts/errors**
→ **Use:** Red sparingly for threshold violations only
→ **Avoid:** Red for all negative values
→ **Why:** Overuse causes alert fatigue; preattentive salience needs restraint

**Accessible color**
→ **Use:** Redundant coding (color + icon/shape/text)
→ **Why:** 8% of males are colorblind; don't rely on color alone

---

#### Chunking Information

**Long lists (>7 items)**
→ **Action:** Group into 3-7 categories with visual separation
→ **Why:** Working memory limit; chunking fits capacity

**Multi-step processes**
→ **Action:** Break into 3-5 steps, show progress indicator
→ **Why:** Progressive disclosure reduces overwhelm, visible state reduces anxiety

**Form fields**
→ **Action:** 4-6 fields per screen; group related fields with proximity/backgrounds
→ **Why:** Fits working memory; Gestalt proximity shows relationships

**Navigation menus**
→ **Action:** 5-7 top-level categories max
→ **Why:** Decision time increases with choices (Hick's Law)

---

#### Error Handling

**Error message location**
→ **Action:** Next to problematic field, not top of page
→ **Why:** Gestalt proximity; spatial contiguity reduces cognitive load

**Error message language**
→ **Action:** Plain language ("Password must be 8+ characters") not error codes
→ **Why:** Reduces interpretation load, especially under stress

**Error prevention**
→ **Action:** Disable invalid actions, constrain inputs, validate inline
→ **Why:** Prevention > correction; immediate feedback enables learning

**Error recovery**
→ **Action:** Show what to fix, auto-focus to field, keep prior input visible
→ **Why:** Recognition over recall; reduce motor effort
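
A minimal sketch applying these rules to one field - inline validation, plain language, message adjacent to the field, prior input preserved. The element IDs are hypothetical:

```typescript
// Validate on input; place the message next to the field, not at the top of the page.
function attachPasswordValidation(): void {
  const field = document.querySelector<HTMLInputElement>("#password")!;
  const message = document.querySelector<HTMLElement>("#password-error")!; // sibling of the field

  field.addEventListener("input", () => {
    if (field.value.length > 0 && field.value.length < 8) {
      message.textContent = "Password must be 8+ characters"; // plain language, no error codes
      field.setAttribute("aria-invalid", "true");
    } else {
      message.textContent = ""; // clear as soon as the input recovers
      field.removeAttribute("aria-invalid");
    }
  });
}
```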

---

#### Typography & Layout

**Heading hierarchy**
→ **Action:** Use size + weight to distinguish levels (H1 > H2 > H3 > body)
→ **Why:** Visual hierarchy guides scanning, shows structure

**Line length**
→ **Action:** 50-75 characters per line for body text
→ **Why:** Longer lines cause eye strain; shorter disrupt reading rhythm

**Whitespace**
→ **Action:** Use to separate unrelated groups, create breathing room
→ **Why:** Gestalt principle - separated = unrelated; crowding increases cognitive load

**Alignment**
→ **Action:** Left-align text in Western contexts; align related elements
→ **Why:** Gestalt continuity; consistent starting point aids scanning

---

### Design Heuristics at a Glance

#### Tufte's Principles
- **Maximize data-ink ratio:** Remove chart junk, keep only ink showing data
- **Graphical integrity:** Visual representation matches data proportionally
- **Small multiples:** Repeated structure enables comparison across conditions

#### Norman's Principles
- **Visibility:** State and options should be visible
- **Affordances:** Controls suggest their use (buttons look pressable)
- **Feedback:** Every action gets immediate, visible response
- **Mapping:** Controls arranged to match effects spatially
- **Constraints:** Prevent errors by limiting invalid actions

#### Gestalt Principles (Quick)
- **Proximity:** Close = related
- **Similarity:** Alike = grouped
- **Continuity:** Aligned = connected
- **Closure:** Mind completes incomplete figures
- **Figure-Ground:** Foreground vs background separation

#### Cognitive Load Principles
- **Intrinsic load:** Task complexity (can't change)
- **Extraneous load:** Bad design (MINIMIZE THIS)
- **Germane load:** Meaningful learning effort (support this)

#### Mayer's Multimedia Principles (Quick)
- **Multimedia:** Words + pictures > words alone
- **Modality:** Audio + visual > text + visual (splits load)
- **Contiguity:** Place text near corresponding graphic
- **Coherence:** Exclude extraneous content
- **Segmenting:** Break into user-paced chunks

---

### When to Use Which Framework

**Cognitive Design Pyramid**
→ **Use when:** Comprehensive quality check needed, evaluating all dimensions of design
→ **Tiers:** Perception → Coherence → Emotion → Behavior

**Design Feedback Loop**
→ **Use when:** Designing interactive interfaces, ensuring each screen answers user questions
→ **Stages:** Perceive → Interpret → Decide → Act → Learn

**Three-Layer Visualization Model**
→ **Use when:** Creating data visualizations, checking data quality through to interpretation
→ **Layers:** Data → Encoding → Interpretation

---

### 5-Second Tests

**Dashboard:** Can user identify current status within 5 seconds?

**Form:** Can user determine what information is needed within 5 seconds?

**Visualization:** Can user grasp main insight within 5 seconds?

**Infographic:** Can user recall key message after 5 seconds viewing?

**Interface:** Can user identify primary action within 5 seconds?

**If NO to any → simplify, strengthen hierarchy, add annotations**

---

### Common Rationales for Advocacy

**"Why bar chart instead of pie chart?"**
→ "Cleveland & McGill's research shows position/length encoding is 5-10x more accurate than angle/area for human perception. Bar charts enable precise comparisons; pie charts make them difficult."

**"Why simplify the interface?"**
→ "Working memory holds only 4±1 chunks. Current design exceeds this, causing cognitive overload. Chunking into groups fits human capacity and improves task completion."

**"Why inline validation?"**
→ "Immediate feedback enables learning through the perception-action loop. Delayed feedback breaks the cognitive connection between action and outcome."

**"Why not use red for all negative values?"**
→ "Preattentive salience depends on contrast. If everything is red, nothing stands out (alert fatigue). Reserve red for true threshold violations requiring immediate action."

**"Why accessible color schemes?"**
→ "8% of males have color vision deficiency. Redundant coding (color + icon + text) ensures all users can perceive information, not just those with typical vision."

473
skills/cognitive-design/resources/storytelling-journalism.md
Normal file
@@ -0,0 +1,473 @@
# Storytelling & Journalism

This resource provides cognitive design principles for data journalism, presentations, infographics, and visual storytelling.

**Covered topics:**
1. Visual narrative structure
2. Annotation strategies
3. Scrollytelling techniques
4. Framing and context
5. Visual metaphors

---

## Why Storytelling Needs Cognitive Design

### WHY This Matters

**Core insight:** People naturally seek stories with cause-effect and chronology - structuring data as narrative aids comprehension and retention.

**Challenges of data storytelling:**
- Raw data is heap of facts (hard to process)
- Visualizations can be misinterpreted without guidance
- Readers skim, don't read thoroughly
- Need emotional engagement + factual accuracy

**How cognitive principles help:**
- **Narrative structure:** Context → problem → resolution (chunks information meaningfully)
- **Annotations:** Guide attention to key insights (prevent misinterpretation)
- **Self-contained graphics:** Include all context (recognition over recall, no split attention)
- **Emotional engagement:** Appropriate imagery improves retention (Norman's emotional design)
- **Progressive disclosure:** Scrollytelling reveals complexity gradually

**Mental model:** Data journalism is guided tour, not data dump - designer leads readers to insights while allowing exploration.

---

## What You'll Learn

**Five key areas:**

1. **Narrative Structure:** Organizing data stories with beginning, middle, end
2. **Annotation Strategies:** Guiding interpretation and preventing misreading
3. **Scrollytelling:** Progressive revelation as user scrolls
4. **Framing & Context:** Honest presentation avoiding misleading frames
5. **Visual Metaphors:** Leveraging existing knowledge for new concepts

---

## Why Narrative Structure Matters

### WHY This Matters

**Core insight:** Human brains are wired for stories - we naturally seek cause-effect relationships and temporal sequences.

**Benefits of narrative:**
- Easier to process (story arc vs unstructured facts)
- Better retention (stories are memorable)
- Emotional engagement (narratives activate empathy)
- Natural chunking (beginning/middle/end provides structure)

**Mental model:** Like journalism's inverted pyramid, but for visual data - lead with context, build to insight, resolve with implications.

---

### WHAT to Apply

#### Classic Narrative Arc for Data

**Structure:** Context → Problem/Question → Data/Evidence → Resolution/Insight

**Context (Set the stage):**
```
Purpose: Orient reader to situation
Elements:
- Title that frames the story: "How Climate Change is Affecting Crop Yields"
- Subtitle/intro: Brief background (2-3 sentences)
- Visuals: Overall trend or map showing scope

Example:
Title: "The Midwest Corn Belt is Shifting North"
Subtitle: "Rising temperatures are pushing viable growing regions 100 miles northward"
Visual: Map showing historical corn production zones vs current
```

**Problem/Question (Establish stakes):**
```
Purpose: Show why this matters, what's at stake
Elements:
- Specific question posed: "Will traditional farming regions remain viable?"
- Visual highlighting problem area
- Human impact stated (not just abstract data)

Example:
Question: "Can farmers adapt quickly enough?"
Visual: Chart showing yield declines in traditional zones
Impact: "1.5 million farming families affected"
```

**Data/Evidence (Show the findings):**
```
Purpose: Present data that answers question
Elements:
- Clear visualizations (chart type matched to message)
- Annotations highlighting key patterns
- Comparisons (before/after, with/without intervention)

Example:
Visual: Line chart showing yields 1980-2024 by region
Annotation: "Southern regions declining 15%, northern regions up 22%"
Comparison: States that adapted (Iowa) vs those that didn't
```

**Resolution/Insight (Deliver the takeaway):**
```
Purpose: Answer the question, provide conclusion
Elements:
- Main insight clearly stated
- Implications for future
- Call-to-action or next question (optional)

Example:
Insight: "Adaptation possible but requires 5-10 year transition"
Implication: "Without support, smaller farms will struggle to relocate/retool"
Next: "How can policy accelerate adaptation?"
```

---
|
||||
|
||||
#### Opening Strategies
|
||||
|
||||
**Lead with human impact:**
|
||||
```
|
||||
❌ "Agricultural productivity data shows regional variations"
|
||||
✓ "Sarah Miller's family has farmed Iowa corn for 4 generations. Now her yields are declining while her neighbor 200 miles north is thriving."
|
||||
|
||||
Why: Concrete human story engages emotions, makes abstract data personal
|
||||
```
|
||||
|
||||
**Lead with surprising finding:**
|
||||
```
|
||||
❌ "Unemployment rates changed over time"
|
||||
✓ "Despite recession fears, unemployment in Tech Hub cities fell 15% while national rates rose"
|
||||
|
||||
Why: Counterintuitive findings capture attention
|
||||
```
|
||||
|
||||
**Lead with visual:**
|
||||
```
|
||||
Strong opening image/chart that encapsulates story
|
||||
Followed by: "This is the story of..." text
|
||||
Why: Visual impact draws reader in
|
||||
```
|
||||
|
||||
---

## Why Annotation Strategies Matter

### WHY This Matters

**Core insight:** Readers scan rather than study - without guidance, they may miss key insights or misinterpret data.

**Benefits of annotations:**
- Guide attention to important patterns (preattentive cues)
- Prevent misinterpretation (clarify what data shows)
- Reduce cognitive load (don't make readers discover insight)
- Enable skimming (annotations convey story even without deep read)

**Mental model:** Annotations are like a tour guide pointing out important sights - "Look here, notice this pattern, here's why it matters."

---

### WHAT to Apply

#### Annotation Types

**Callout boxes:**
```
Purpose: Highlight main insight
Position: Near relevant data point, contrasting background
Content: 1-2 sentences max
Style: Larger font than body, bold or colored

Example on line chart:
Callout pointing to spike: "Sales increased 78% after campaign launch - highest growth in company history"
```

**Arrows and leader lines:**
```
Purpose: Connect explanation to specific data element
Use: Point from text annotation to exact point/region on chart
Style: Simple arrow, not decorative

Example:
Arrow from "Product launch" text to vertical line on timeline
Arrow from "Outlier explained by data error" to specific point
```

**Shaded regions:**
```
Purpose: Mark time periods or ranges of interest
Use: Highlight recessions, policy changes, events
Style: Subtle shading (10-20% opacity), doesn't obscure data

Example:
Gray shaded region labeled "COVID-19 lockdown March-May 2020"
Allows comparison of before/during/after in context
```

**Direct labels on data:**
```
Purpose: Eliminate legend lookups
Use: Label lines/bars directly instead of separate legend
Benefit: Immediate association, no cross-referencing

Example on multi-line chart:
Text "North Region" placed directly next to its line (not legend box)
```

**Contextual annotations:**
```
Purpose: Explain anomalies, provide necessary background
Use: Note data quirks, methodological notes, definitions

Example:
"*Data unavailable for Q2 2020 due to reporting disruptions"
"**Adjusted for inflation using 2024 dollars"
```
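
For web charts, these annotation layers usually come down to a few lines of drawing code. Below is a minimal sketch of the shaded-region pattern on an existing inline SVG chart, assuming an element with id `chart`; the coordinates, sizes, and label are illustrative, not tied to any particular charting library.

```typescript
// Minimal sketch: add a shaded region and a label to an existing SVG chart.
// Assumes the chart is inline SVG with id "chart"; coordinates are illustrative.
const SVG_NS = "http://www.w3.org/2000/svg";

function addShadedRegion(svg: SVGSVGElement, x: number, width: number, label: string): void {
  const rect = document.createElementNS(SVG_NS, "rect");
  rect.setAttribute("x", String(x));
  rect.setAttribute("y", "0");
  rect.setAttribute("width", String(width));
  rect.setAttribute("height", svg.getAttribute("height") ?? "300");
  rect.setAttribute("fill", "gray");
  rect.setAttribute("fill-opacity", "0.15"); // subtle 10-20% opacity, doesn't obscure data
  svg.prepend(rect); // insert first so the region sits behind the data marks

  const text = document.createElementNS(SVG_NS, "text");
  text.setAttribute("x", String(x + 4));
  text.setAttribute("y", "14");
  text.setAttribute("font-size", "11");
  text.textContent = label;
  svg.appendChild(text);
}

const chart = document.querySelector<SVGSVGElement>("#chart");
if (chart) {
  addShadedRegion(chart, 220, 60, "COVID-19 lockdown March-May 2020");
}
```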

---

#### Annotation Guidelines

**What to annotate:**
```
✓ Main insight (what should reader take away?)
✓ Unexpected patterns (outliers, inflection points)
✓ Important events (policy changes, launches, crises)
✓ Comparisons (how does this compare to baseline/benchmark?)
✓ Methodological notes (data sources, limitations)

❌ Don't annotate obvious patterns ("trend is increasing" when clearly visible)
❌ Don't over-annotate (too many annotations = visual noise)
```

**Annotation placement:**
```
✓ Near the data being explained (Gestalt proximity)
✓ Outside the chart area if possible (don't obscure data)
✓ Consistent positioning (all callouts top-right, or all left, etc.)
```

---

## Why Scrollytelling Matters

### WHY This Matters

**Core insight:** Complex data stories benefit from progressive revelation - scrollytelling maintains context while building understanding step-by-step.

**Benefits:**
- Progressive disclosure (fits working memory)
- Maintains context (chart stays visible throughout)
- Engaging (interactive vs passive reading)
- Guided exploration (designer controls sequence)

**Mental model:** Like flipping through graphic novel panels - each scroll reveals next part of story while maintaining continuity.

---

### WHAT to Apply

#### Basic Scrollytelling Pattern

**Structure:**
```
1. Sticky chart (stays visible as user scrolls)
2. Text sections (scroll past, trigger chart updates)
3. Smooth transitions (not jarring jumps)
4. User control (can scroll back up to review)
```

**Example implementation:**

**Section 1 (scroll 0%):**
```
Chart shows: Full trend line 2010-2024
Text visible: "Overall growth trajectory shows steady increase"
User sees: Big picture
```

**Section 2 (scroll 33%):**
```
Chart updates: Highlight 2015-2018 segment in color, rest faded
Text visible: "First phase: Rapid growth following policy change"
User sees: Specific period in context of whole
```

**Section 3 (scroll 66%):**
```
Chart updates: Highlight 2020 dip in red
Text visible: "COVID-19 impact caused temporary decline"
User sees: Anomaly explained
```

**Section 4 (scroll 100%):**
```
Chart updates: Full color restored, add projection (dotted)
Text visible: "Projected recovery to pre-2020 trend by 2026"
User sees: Complete story with future outlook
```
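
A minimal sketch of how these scroll triggers are commonly wired up with `IntersectionObserver`, assuming markup with a position-sticky chart and one `.step` element per text section; `updateChart` is a placeholder for your own rendering code.

```typescript
// Minimal scrollytelling sketch: each text step triggers a chart update when
// it scrolls into view. Assumes .step elements carrying a data-step attribute.
function updateChart(step: string): void {
  console.log(`render chart state for step ${step}`); // replace with real drawing logic
}

const observer = new IntersectionObserver(
  (entries) => {
    for (const entry of entries) {
      if (entry.isIntersecting) {
        const step = (entry.target as HTMLElement).dataset.step ?? "0";
        updateChart(step); // trigger is position-based, so scroll speed doesn't matter
      }
    }
  },
  { threshold: 0.5 } // fire when half the step is visible
);

document.querySelectorAll<HTMLElement>(".step").forEach((el) => observer.observe(el));
```

Because the observer fires on position rather than scroll velocity, scrolling back up replays earlier states, which satisfies the user-control guidelines below.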

---

#### Scrollytelling Best Practices

**Transitions:**
```
✓ Smooth animations (300-500ms transitions)
✓ Maintain reference points (axes don't jump)
✓ One change at a time (highlight region OR add annotation, not both simultaneously)
❌ Jarring jumps (disorienting)
```

**User control:**
```
✓ Can scroll back up to review
✓ Scroll speed doesn't affect outcome (trigger points based on position, not speed)
✓ "Skip to end" option for impatient users
✓ Pause/play if animations continue automatically
```

**Accessibility:**
```
✓ All content accessible without scrolling (fallback for screen readers)
✓ Keyboard navigation supported
✓ Works without JavaScript (progressive enhancement)
```

---

## Why Framing & Context Matter

### WHY This Matters

**Core insight:** Same data can support different conclusions based on framing - ethical journalism provides complete context to avoid misleading.

**Framing effects (Tversky & Kahneman):**
- 10% unemployment vs 90% employed (same data, different emotional impact)
- Deaths per 100k vs % survival (same mortality, different perception)

**Ethical obligation:** Provide enough context for accurate interpretation

---

### WHAT to Apply

#### Provide Baselines & Comparisons

**Always include:**
```
✓ Historical comparison (how does this compare to past?)
✓ Peer comparison (how does this compare to similar entities?)
✓ Benchmark (what's the standard/goal?)
✓ Absolute + relative (numbers + percentages both shown)

Example: "Unemployment rises to 5.2%"
Better: "Unemployment rises to 5.2% from 4.8% last quarter (historical average: 5.5%)"
Complete context: Reader can judge severity
```

**Avoid cherry-picking:**
```
❌ Show only favorable time period (Q4 2023 sales up! ...but down overall year)
✓ Show full relevant period + note any focus area

Example:
❌ "Satisfaction scores improved 15 points!" (cherry-picked one quarter)
✓ Chart showing full 2-year trend (overall declining with one uptick quarter)
```

---

#### Clarify Denominator

**Percentages need context:**
```
❌ "50% increase!" (50% of what?)
✓ "Increased from 10 to 15 users (50% increase)"
✓ "Increased 50 percentage points (from 20% to 70%)"

Confusion: Is it a 50 percentage point increase or a 50% relative increase?
Clarity: State both absolute numbers and percentage
```
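
A tiny sketch of the two readings, using the 20% → 70% example above - stating both numbers removes the ambiguity:

```typescript
// Shows why the two readings of "a 50% increase" differ.
// Values are the illustrative 20% -> 70% example from above.
function describeChange(before: number, after: number): string {
  const points = after - before; // percentage-point change
  const relative = ((after - before) / before) * 100; // relative % change
  return `${before}% -> ${after}%: +${points} percentage points, +${relative.toFixed(0)}% relative`;
}

console.log(describeChange(20, 70)); // "20% -> 70%: +50 percentage points, +250% relative"
```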

---

#### Note Limitations

**Methodological transparency:**
```
✓ Data source stated: "Source: U.S. Census Bureau"
✓ Sample size noted: "n=1,200 respondents"
✓ Margin of error: "±3% margin of error"
✓ Missing data: "State data unavailable 2020-2021"
✓ Selection criteria: "Only includes full-time employees"

Purpose: Reader can assess reliability and applicability
```

---

## Why Visual Metaphors Matter

### WHY This Matters

**Core insight:** Familiar metaphors leverage existing knowledge to explain new concepts - but only if metaphor resonates with audience.

**Benefits:**
- Faster comprehension (tap into existing schemas)
- Memorable (concrete imagery aids recall)
- Emotional connection (metaphors evoke feelings)

**Risks:**
- Misleading if metaphor doesn't fit
- Cultural assumptions (metaphor may not translate)
- Oversimplification (metaphor hides complexity)

---

### WHAT to Apply

#### Choosing Metaphors

**Effective metaphors:**
```
✓ Virus spread as fire spreading across map (leverages fire = spread schema)
✓ Data flow as river (volume, direction, obstacles)
✓ Economic inequality as wealth distribution pyramid
✓ Carbon footprint as actual footprint size

Why they work: Concrete, universally understood, structurally similar to concept
```

**Problematic metaphors:**
```
❌ Complex technical process as simple machine (oversimplifies)
❌ Culture-specific metaphors (e.g., sports metaphors for international audience)
❌ Metaphors that contradict data (e.g., "economy is healthy" shown as growing plant - what if it's not growing?)
```

---

#### Metaphor Guidelines

**Test your metaphor:**
```
1. Does it help understanding or just decorate?
2. Is it universally recognized by target audience?
3. Does it accurately represent the concept?
4. Does it oversimplify in misleading ways?
5. Could it be misinterpreted?

If any answer is problematic → reconsider metaphor
```

**Clarify limitations:**
```
When using metaphor, note where analogy breaks down:
"While virus spread resembles fire spread, unlike fire, viruses can have delayed effects..."

Prevents overgeneralizing from metaphor
```
272
skills/cognitive-design/resources/ux-product-design.md
Normal file

@@ -0,0 +1,272 @@
# UX & Product Design

This resource provides cognitive design principles for interactive software interfaces (web apps, mobile apps, desktop software).

**Covered topics:**
1. Learnability through familiar patterns
2. Task flow efficiency
3. Cognitive load management
4. Onboarding design
5. Error handling and prevention

---

## Why UX Needs Cognitive Design

### WHY This Matters

**Core insight:** Users approach interfaces with mental models from prior experiences - designs that violate expectations require re-learning and cause cognitive friction.

**Common problems:**
- New users abandon apps (steep learning curve)
- Task flows with too many steps/choices (Hick's Law impact)
- Complex features overwhelm users (cognitive overload)
- Confusing error messages
- Onboarding shows all features at once (memory overload)

**How cognitive principles help:** Leverage existing mental models (Jakob's Law), minimize steps/choices (Hick's/Fitts's Law), progressive disclosure, inline validation, onboarding focused on 3-4 key tasks (working memory limit).

---

## What You'll Learn

1. **Learnability:** Leverage familiar patterns for instant comprehension
2. **Task Flow Efficiency:** Minimize steps and optimize control placement
3. **Cognitive Load Management:** Progressive disclosure and memory aids
4. **Onboarding:** Teaching without overwhelming
5. **Error Handling:** Prevention first, then contextual recovery

---

## Why Learnability Matters

### WHY This Matters

Users spend most time on OTHER sites/apps (Jakob's Law) - they expect interfaces to work like what they already know.

**Benefits of familiar patterns:** Instant recognition (System 1), lower cognitive load, faster completion, reduced errors.

---

### WHAT to Apply

#### Standard UI Patterns

**Navigation:**
- Hamburger menu (☰) for mobile
- Magnifying glass (🔍) for search
- Logo top-left returns home
- User avatar top-right for account
- Breadcrumbs for hierarchy

**Actions:**
- Primary: Right-aligned button
- Destructive: Red, requires confirmation
- Secondary: Gray/outlined
- Disabled: Grayed out

**Forms:**
- Labels above/left of fields
- Required fields: Asterisk (*)
- Validation: Inline as user types
- Submit: Bottom-right

**Feedback:**
- Loading: Spinner for waits >1s
- Success: Green checkmark + message
- Error: Red + icon + message
- Confirmation: Modal for destructive actions

**Application rule:** Use standard patterns by default. Deviate only when standard fails AND provide onboarding. Test if users understand without help.

---

#### Affordances & Signifiers

Controls should signal function through appearance:

**Buttons:** Raised appearance, hover state, active state, focus state
**Links:** Underlined or distinct color, pointer cursor on hover
**Inputs:** Rectangular border, cursor on click, placeholder text, focus state
**Draggable:** Handle icon (≡≡), grab cursor, shadow on drag

**Anti-patterns:** Flat design with no cues, no hover states, buttons looking like labels, clickable areas smaller than visual target.

---

#### Platform Conventions

**iOS:** Back top-left, navigation bottom tabs, swipe gestures, share icon with up arrow
**Android:** System back button, hamburger menu top-left, three-dot overflow, FAB bottom-right
**Web:** Logo top-left to home, primary nav top horizontal, search top-right, footer links

**Rule:** Match platform norms. If cross-platform, adapt to each. Don't invent when standards exist.

---

## Why Task Flow Efficiency Matters

### WHY This Matters

Every decision point and step adds time and cognitive effort.

**Hick's Law:** Decision time increases logarithmically with the number of choices - roughly T = a + b·log₂(n + 1) for n options (2 choices = fast, 10 = slow/paralysis)
**Fitts's Law:** Time to acquire a target grows with distance and shrinks with target width - roughly T = a + b·log₂(2D/W) (large/nearby = fast, small/distant = slow)
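
A worked sketch of both formulas. The constants `a` and `b` are device- and user-specific and must be measured empirically; the values below are purely illustrative.

```typescript
// Worked sketch of Hick's and Fitts's laws; a and b are illustrative constants.
const a = 0.2; // base reaction time in seconds (illustrative)
const b = 0.15; // per-bit processing cost in seconds (illustrative)

// Hick's Law: decision time grows with log2(n + 1) for n equally likely choices.
const hickTime = (n: number): number => a + b * Math.log2(n + 1);

// Fitts's Law: movement time grows with the index of difficulty log2(2D/W),
// where D is distance to the target and W is its width.
const fittsTime = (distance: number, width: number): number =>
  a + b * Math.log2((2 * distance) / width);

console.log(hickTime(2).toFixed(2)); // 2 choices: ~0.44s
console.log(hickTime(10).toFixed(2)); // 10 choices: ~0.72s
console.log(fittsTime(400, 16).toFixed(2)); // small, distant target: ~1.05s
console.log(fittsTime(50, 44).toFixed(2)); // large, nearby target: ~0.38s
```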

---

### WHAT to Apply

#### Reduce Steps

**Audit method:**
1. Map current flow
2. Question each step: "Necessary? Can automate? Can merge?"
3. Eliminate unnecessary
4. Combine related
5. Pre-fill known info

**Example:** Checkout flow reduced from 8 steps to 4 (pre-fill email/shipping, combine review inline) = 50% fewer steps, higher completion.

---

#### Reduce Choices (Hick's Law)

**Progressive disclosure:** Show 5 most common filters, "More filters" reveals rest
**Smart defaults:** Highlight recommended option, show "Other options" link
**Contextual menus:** 5-7 actions relevant to current mode, not all 50

**Rule:** Common tasks ≤5 options. Advanced features behind "More". Personalize based on usage.

---

#### Optimize Control Placement (Fitts's Law)

Frequent actions = large and nearby. Infrequent = smaller and distant.

**Primary:** Large button (44×44px mobile, 32×32px desktop), bottom-right or natural flow
**Secondary:** Medium, near primary but distinct (outlined, gray)
**Tertiary/destructive:** Smaller, separated, requires confirmation

---

## Why Cognitive Load Management Matters

### WHY This Matters

Working memory holds 4±1 chunks - exceeding this causes confusion/abandonment.

**Load types:** Intrinsic (task complexity), Extraneous (poor design - MINIMIZE), Germane (meaningful learning - support)

---

### WHAT to Apply

#### Progressive Disclosure

Reveal complexity gradually, show only immediate needs.

**Wizards:** 4 steps × 6-8 fields (not 30 fields on one page) = fits working memory, visible progress
**Expandable sections:** Collapsed by default, expand on demand
**"Advanced" options:** Basic visible, "Show advanced" link reveals rest

---

#### Chunking & Grouping

Group related items, separate with whitespace.

**Forms:** Group by relationship (Personal Info, Shipping Address) = 2 chunks
**Navigation:** 5-7 categories × 3-5 items each (not 25 flat items)

---

#### Memory Aids (Recognition over Recall)

Show options, don't require memorization.

**Visible state:** Active filters as chips, current page highlighted, breadcrumbs, progress indicators
**Autocomplete:** Search suggestions, address autocomplete, date picker
**Recent history:** Recently opened files, search history, previous purchases

---

## Why Onboarding Matters

### WHY This Matters

First experience determines continuation/abandonment. Must teach key tasks without overwhelming.

**Failures:** All features upfront, passive tutorials, no contextual help
**Success:** 3-4 core tasks, interactive tutorials, contextual help when encountered

---

### WHAT to Apply

#### Focus on Core Tasks

Limit to 3-4 most important tasks, not comprehensive tour.

**Ask:** "What must users learn to get value?" NOT "What are all features?"

**Example:** Project management app - onboard on: create project, add task, assign to member, mark complete. Skip advanced filtering/custom fields/reporting (teach contextually later).

---

#### Interactive Learning

Users learn by doing, not reading.

**Guided interaction:** Highlight button, require click to proceed (active learning, muscle memory)
**Progressive completion:** Step 1 must complete before Step 2 unlocks = sense of accomplishment, ensures learning

---

#### Contextual Help

Advanced features taught when encountered, not upfront.

**First encounter tooltips:** One-time help when user navigates to new feature
**Empty states:** "No tasks yet! Click + to create first task" with illustrative graphic
**Gradual discovery:** After 1 week show tip, after 1 month show power tip (usage-based timing)

---

## Why Error Handling Matters

### WHY This Matters

Users make errors (slips/mistakes) - good design prevents and provides clear recovery.

**Goal:** Prevention > detection > recovery

---

### WHAT to Apply

#### Prevention (Best)

**Constrain inputs:** Date picker (not free text), numeric keyboard for phone, input masking
**Provide defaults:** Pre-select common option, suggest formats
**Confirm destructive:** Require confirmation modal, "Type DELETE to confirm", undo when possible
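
A minimal sketch of the "Type DELETE to confirm" pattern; `window.prompt` stands in for a styled modal to keep the example self-contained, and the resource name is illustrative:

```typescript
// Minimal sketch of a typed-confirmation guard for destructive actions.
function confirmDestructive(resourceName: string, onConfirm: () => void): void {
  const typed = window.prompt(
    `This permanently deletes "${resourceName}". Type DELETE to confirm.`
  );
  if (typed === "DELETE") {
    onConfirm();
  }
  // Anything else (including Cancel) is a no-op - erring on the safe side.
}

confirmDestructive("Q3 report", () => console.log("deleted"));
```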

---

#### Detection (Inline Validation)

Immediate feedback as user types/on blur, not after submit.

**Real-time validation:** Password strength meter while typing, email format on blur
**Positioning:** Error NEXT TO field (not top of page) - Gestalt proximity
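
A minimal sketch of on-blur inline validation with the error rendered next to the field, assuming an `<input id="email">` and a `#email-error` placeholder element; the selectors and regex are illustrative:

```typescript
// Minimal inline-validation sketch: check email format on blur and render the
// error beside the field (Gestalt proximity).
const email = document.querySelector<HTMLInputElement>("#email");
const error = document.querySelector<HTMLElement>("#email-error");

email?.addEventListener("blur", () => {
  if (!email || !error) return;
  const valid = /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email.value);
  error.textContent = valid ? "" : "Please enter a valid email, like name@example.com";
  email.setAttribute("aria-invalid", String(!valid)); // expose state to assistive tech
});
```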

---

#### Recovery (Clear Guidance)

Tell users what's wrong and how to fix it, in plain language.

**Bad:** "Error 402"
**Good:** "Password must be at least 8 characters"

**Visual:** Red + icon, auto-focus to error field, keep user input (don't clear), green checkmark when fixed

212
skills/communication-storytelling/SKILL.md
Normal file

@@ -0,0 +1,212 @@
---
name: communication-storytelling
description: Use when transforming analysis/data into persuasive narratives—presenting to executives, explaining technical concepts to non-technical audiences, creating customer-facing communications, writing investor updates, announcing changes, turning research into insights, or when user mentions "write this for", "explain to", "present findings", "make this compelling", "audience is". Invoke when information needs to become a story tailored to specific stakeholders.
---

# Communication Storytelling

## Table of Contents
- [Purpose](#purpose)
- [When to Use](#when-to-use)
- [What Is It?](#what-is-it)
- [Workflow](#workflow)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

Transform complex information, analysis, or data into clear, persuasive narratives tailored to specific audiences. This skill helps you craft compelling stories with a strong headline, key supporting points, and concrete evidence that drives understanding and action.

## When to Use

Use this skill when you need to:

**Audience Translation:**
- Present technical analysis to non-technical stakeholders
- Explain complex data to executives who need quick decisions
- Write customer-facing communications from internal analysis
- Translate research findings into actionable insights

**High-Stakes Communication:**
- Create board presentations or investor updates
- Announce organizational changes or difficult decisions
- Write crisis communications that build trust
- Present recommendations that need executive buy-in

**Narrative Crafting:**
- Turn A/B test results into product decisions
- Create compelling case studies from customer data
- Write product launch announcements from feature lists
- Transform postmortems into learning narratives

**When user says:**
- "How do I present this to [audience]?"
- "Make this compelling for [stakeholders]"
- "Explain [technical thing] to [non-technical audience]"
- "Write an announcement about [change]"
- "Turn this analysis into a narrative"

## What Is It?

Communication storytelling uses a structured approach to create narratives that inform, persuade, and inspire action. The core framework includes:

1. **Headline** - Single clear statement capturing the essence
2. **Key Points** - 3-5 supporting ideas with logical flow
3. **Proof** - Evidence, data, examples, stories that substantiate
4. **Call-to-Action** - What audience should think, feel, or do

**Quick example:**

**Bad (data dump):**
"Our Q2 revenue was $2.3M, up from $1.8M in Q1. Customer count went from 450 to 520. Churn decreased from 5% to 3.2%. NPS improved from 42 to 58. We launched 3 new features..."

**Good (storytelling):**
"We've reached product-market fit. Three signals prove it: (1) Revenue grew 28% while sales capacity stayed flat—customers are pulling product from us, not the other way around. (2) Churn dropped 36% as we focused on power users, with our top segment now at 1% monthly churn. (3) NPS jumped 16 points to 58, with customers specifically praising the three features we bet on. Recommendation: Double down on power user segment with premium tier."

## Workflow

Copy this checklist and track your progress:

```
Communication Storytelling Progress:
- [ ] Step 1: Gather inputs and clarify audience
- [ ] Step 2: Choose appropriate narrative structure
- [ ] Step 3: Craft the narrative
- [ ] Step 4: Validate quality and clarity
- [ ] Step 5: Deliver and adapt
```

**Step 1: Gather inputs and clarify audience**

Ask user for the message (analysis, data, information to communicate), audience (who will receive this), purpose (inform, persuade, inspire, build trust), context (situation, stakes, constraints), and tone (formal, casual, urgent, celebratory). Understanding audience deeply is critical—their expertise level, concerns, decision authority, and time constraints shape everything. See [resources/template.md](resources/template.md) for input questions.

**Step 2: Choose appropriate narrative structure**

For standard communications (announcements, updates, presentations) → Use [resources/template.md](resources/template.md) quick template. For complex multi-stakeholder communications requiring different versions → Study [resources/methodology.md](resources/methodology.md) for audience segmentation and narrative adaptation techniques. To see what good looks like → Review [resources/examples/](resources/examples/).

**Step 3: Craft the narrative**

Create `communication-storytelling.md` with: (1) Compelling headline that captures essence in one sentence, (2) 3-5 key points arranged in logical flow (chronological, problem-solution, importance-ranked), (3) Concrete proof for each point (data, examples, quotes, stories), (4) Clear call-to-action stating what audience should do next. Use storytelling techniques: specificity over generality, show don't tell, human stories over abstract concepts, tension/resolution arcs. See [Story Structure](#story-structure) for narrative patterns.

**Step 4: Validate quality and clarity**

Self-assess using [resources/evaluators/rubric_communication_storytelling.json](resources/evaluators/rubric_communication_storytelling.json). Check: headline is clear and compelling, key points are distinct and well-supported, proof is concrete and relevant, flow is logical, tone matches audience, jargon is appropriate for expertise level, call-to-action is clear and achievable, length matches time constraints. Read aloud to test clarity. Test with "so what?" question—does each point answer why audience should care? Minimum standard: Average score ≥ 3.5 before delivering.
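
A hypothetical sketch of the Step 4 averaging check; the criterion names match the rubric file above, while the scores are illustrative:

```typescript
// Hypothetical self-check: average per-criterion rubric scores against the
// 3.5 minimum before delivering. Scores below are illustrative.
interface RubricScores {
  [criterion: string]: number; // 1-5 per the rubric scale
}

function passesMinimum(scores: RubricScores, minimum = 3.5): boolean {
  const values = Object.values(scores);
  const average = values.reduce((sum, s) => sum + s, 0) / values.length;
  return average >= minimum;
}

const draftScores: RubricScores = {
  "Headline Clarity": 4,
  "Structure and Flow": 3,
  "Evidence Quality": 4,
  "Audience Fit": 4,
};
console.log(passesMinimum(draftScores)); // true (average 3.75)
```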

**Step 5: Deliver and adapt**

Present the completed `communication-storytelling.md` file. Highlight how narrative addresses audience's key concerns. Note storytelling techniques used (data humanized, tension-resolution, specificity). If user has feedback or needs adaptations for different audiences, use [resources/methodology.md](resources/methodology.md) for multi-version strategy.

## Story Structure

### The Hero's Journey (Transformation Story)

**When to use:** Major changes, pivots, overcoming challenges

**Structure:**
1. **Status Quo** - Where we were (comfort, but problem lurking)
2. **Call to Adventure** - Why we had to change (problem emerges)
3. **Trials** - What we tried, what we learned (struggle builds credibility)
4. **Victory** - What worked (resolution)
5. **Return with Knowledge** - What we do now (new normal, lessons learned)

**Example:** "We were growing 20% YoY, but churning 10% monthly—unsustainable. Data showed we were solving the wrong problem for the wrong users. We tested 5 hypotheses over 3 months, failing at 4. The one that worked: focusing on power users willing to pay 5x more. Churn dropped to 2%, growth hit 40% YoY. Now we're betting everything on premium tier."

### Problem-Solution-Benefit (Decision Story)

**When to use:** Recommendations, proposals, project updates

**Structure:**
1. **Problem** - Clearly defined issue with stakes (what happens if unaddressed)
2. **Solution** - Your recommendation with rationale (why this, not alternatives)
3. **Benefit** - Tangible outcomes (quantified impact)

**Example:** "We lose 30% of signups at checkout—$2M ARR left on table. Root cause: we ask for credit card before users see value. Proposal: 14-day trial, no card required, with onboarding emails showing ROI. Comparable companies saw 60% conversion lift. Expected impact: +$1.2M ARR with 4-week implementation."

### Before-After-Bridge (Contrast Story)

**When to use:** Product launches, feature announcements, process improvements

**Structure:**
1. **Before** - Current painful state (audience's lived experience)
2. **After** - Improved future state (what becomes possible)
3. **Bridge** - How to get there (your solution)

**Example:** "Before: Sales team spends 10 hours/week manually exporting data, cleaning it in spreadsheets, and copy-pasting into slide decks—error-prone and soul-crushing. After: One-click report generation with live data, auto-refreshing dashboards, 30 minutes per week. Bridge: We built sales analytics v2.0, launching Monday with training sessions."

### Situation-Complication-Resolution (Executive Story)

**When to use:** Executive communications, board updates, investor relations

**Structure:**
1. **Situation** - Context and baseline (set the stage)
2. **Complication** - What changed or what's at stake (creates tension)
3. **Resolution** - Your path forward (release tension)

**Example:** "Situation: We budgeted $5M for customer acquisition in 2024. Complication: iOS 17 privacy changes killed our primary ad channel—50% drop in conversion overnight. Resolution: Shifting $2M to content marketing (3-month ROI), $1M to partnerships (immediate distribution), keeping $2M in ads for testing new channels. Risk: content takes time to scale, but partnerships derisk timeline."

## Common Patterns

**Data-Heavy Communications:**
- Lead with insight, not data
- One number per point (too many = confusion)
- Humanize data with stories: "42% churn" → "We lose 12 customers every week—that's Sarah's entire cohort from January"
- Use comparisons for context: "200ms latency" → "2x slower than competitors, 3x slower than last year"

**Technical → Non-Technical:**
- Translate jargon: "distributed consensus algorithm" → "how servers agree on truth without a central authority"
- Use analogies from audience's domain: "Kubernetes is like airport air traffic control for containers"
- Focus on business impact, not technical implementation
- Anticipate "why does this matter?" and answer it explicitly

**Change Management:**
- Acknowledge the loss/pain (don't gloss over difficulty)
- Paint compelling future state (hope, not just fear)
- Show path from here to there (make it concrete)
- Address "what about me?" early (personal impact)

**Crisis Communications:**
- Lead with facts (what happened, when, impact)
- Take accountability (no blame-shifting or weasel words)
- State what you're doing (concrete actions with timeline)
- Commit to transparency (when they'll hear next)

## Guardrails

**Do:**
- ✅ Test headline clarity—can someone understand the essence in 10 seconds?
- ✅ Use concrete specifics over vague generalities
- ✅ Match sophistication level to audience (avoid talking up or down)
- ✅ Front-load conclusions (executives decide in first 30 seconds)
- ✅ Show your work for major claims (data sources, assumptions)
- ✅ Acknowledge limitations and risks (builds credibility)

**Don't:**
- ❌ Bury the lede (most important thing must be first)
- ❌ Use jargon your audience doesn't know (or define it)
- ❌ Make claims without proof (erodes trust)
- ❌ Assume audience cares—make them care by showing stakes
- ❌ Write walls of text (use bullets, headers, white space)
- ❌ Lie or mislead (including by omission)

**Red Flags:**
- 🚩 Your draft is mostly bullet points with no narrative arc
- 🚩 You can't summarize your message in one sentence
- 🚩 You use passive voice to avoid accountability ("mistakes were made")
- 🚩 You include data that doesn't support your points
- 🚩 Your call-to-action is vague ("be better," "work harder")

## Quick Reference

**Resources:**
- **[resources/template.md](resources/template.md)** - Quick-start template with headline, key points, proof structure
- **[resources/methodology.md](resources/methodology.md)** - Advanced techniques for multi-stakeholder communications, narrative frameworks, persuasion principles
- **[resources/examples/](resources/examples/)** - Worked examples showing different story structures and audiences
- **[resources/evaluators/rubric_communication_storytelling.json](resources/evaluators/rubric_communication_storytelling.json)** - 10-criteria quality rubric with audience-based thresholds

**When to use which resource:**
- Standard communication → Start with template.md
- Multiple audiences for same message → Study methodology.md multi-version strategy
- Complex persuasion (board pitch, investor update) → Study methodology.md persuasion frameworks
- Unsure what good looks like → Review examples/ for your scenario
- Before delivering → Validate with rubric (score ≥ 3.5 required)
335
skills/communication-storytelling/resources/evaluators/rubric_communication_storytelling.json
Normal file

@@ -0,0 +1,335 @@
{
  "criteria": [
    {
      "name": "Headline Clarity",
      "description": "Can audience understand the core message in 10 seconds?",
      "scoring": {
        "1": "No clear headline, or headline is vague/generic. Reader doesn't know what message is about.",
        "2": "Headline exists but is vague or buries key insight. Requires reading body to understand point.",
        "3": "Headline clearly states topic and general direction. Reader gets gist but not full insight.",
        "4": "Headline captures core message with specificity. Reader understands essence without reading body.",
        "5": "Compelling headline that captures essence, creates curiosity, and uses concrete specifics. Impossible to misunderstand."
      },
      "red_flags": [
        "Headline is generic ('Q3 Update', 'Project Status')",
        "Can't summarize message in one sentence",
        "Headline describes format not content ('Memo on X' vs 'We should do X because Y')",
        "Buries insight in paragraph 3 instead of headline"
      ]
    },
    {
      "name": "Structure and Flow",
      "description": "Is narrative easy to follow with logical progression?",
      "scoring": {
        "1": "No clear structure. Random collection of points without logical connection.",
        "2": "Some structure but jumps around. Reader confused about how points relate or what comes next.",
        "3": "Clear structure with distinct sections. Logical flow but transitions could be smoother.",
        "4": "Well-organized structure with smooth transitions. Each point builds on previous. Easy to follow.",
        "5": "Exemplary narrative arc. Points flow naturally, build tension/resolution, guide reader seamlessly from problem to action."
      },
      "red_flags": [
        "Jumps between topics without transitions",
        "Key points overlap or repeat",
        "No clear progression (flat list of facts)",
        "Conclusion doesn't follow from body (non sequitur)"
      ]
    },
    {
      "name": "Evidence Quality",
      "description": "Are claims backed by concrete, credible proof?",
      "scoring": {
        "1": "No evidence. Claims without support. Asks for blind trust.",
        "2": "Minimal evidence. Vague statements ('studies show', 'many customers') without specifics.",
        "3": "Adequate evidence. Some data/examples provided but missing sources or context.",
        "4": "Strong evidence. Specific data with sources, concrete examples, comparisons for context.",
        "5": "Comprehensive proof. Multiple evidence types (quantitative, qualitative, examples), sources cited, limitations acknowledged."
      },
      "red_flags": [
        "Unsourced claims ('experts agree', 'industry standard')",
        "Cherry-picked data without showing full picture",
        "Anecdotes presented as data ('one customer said' as proof of trend)",
        "No comparisons (data without context)"
      ]
    },
    {
      "name": "Audience Fit",
      "description": "Is message tailored to audience's expertise, concerns, and constraints?",
      "scoring": {
        "1": "Wrong audience fit. Jargon for non-experts, or dumbed-down for experts. Ignores their concerns.",
        "2": "Partial fit. Some mismatch in sophistication or doesn't address key concerns audience has.",
        "3": "Good fit. Appropriate sophistication level, addresses main concerns, reasonable length.",
        "4": "Excellent fit. Matches expertise, directly addresses concerns, appropriate tone and length, uses their language.",
        "5": "Perfect fit. Deeply understands audience, anticipates objections, uses analogies from their domain, feels personalized."
      },
      "red_flags": [
        "Technical jargon for non-technical audience",
        "Oversimplifies for expert audience",
        "Doesn't address audience's stated priorities",
        "Length mismatches time available (5-page email for busy executive)"
      ]
    },
    {
      "name": "Storytelling Techniques",
      "description": "Uses specificity, shows vs tells, humanizes data, builds tension?",
      "scoring": {
        "1": "No storytelling. Dry recitation of facts or feature list. No narrative arc.",
        "2": "Minimal storytelling. Mostly facts with occasional story elements. Tells more than shows.",
        "3": "Good storytelling. Uses some techniques (specifics, examples, data humanization). Shows and tells.",
        "4": "Strong storytelling. Consistently shows vs tells, uses specifics, humanizes data, builds some tension.",
        "5": "Masterful storytelling. Vivid specifics, shows throughout, data becomes human stories, tension and resolution arc."
      },
      "red_flags": [
        "Uses generalities instead of specifics ('many', 'significant', 'improved')",
        "Tells instead of shows ('this is great' vs concrete evidence it's great)",
        "Data dump without interpretation or humanization",
        "No narrative arc (flat delivery of information)"
      ]
    },
    {
      "name": "Accountability and Honesty",
      "description": "Takes ownership, acknowledges risks/limitations, no blame-shifting?",
      "scoring": {
        "1": "No accountability. Passive voice, blame-shifting, or hiding responsibility.",
        "2": "Weak accountability. Some ownership but uses weasel words or deflects.",
        "3": "Adequate accountability. Takes responsibility, but doesn't fully acknowledge limitations.",
        "4": "Strong accountability. Clear ownership, acknowledges risks and limitations honestly.",
        "5": "Exemplary accountability. Named ownership, vulnerable honesty about uncertainties, acknowledges past mistakes if relevant."
      },
      "red_flags": [
        "Passive voice hides actors ('mistakes were made' vs 'I made mistakes')",
        "Blame external factors without acknowledging internal role",
        "Overconfident claims without acknowledging uncertainties",
        "Misleading by omission (hiding risks or downsides)"
      ]
    },
    {
      "name": "Actionability (Call-to-Action)",
      "description": "Is CTA clear, specific, achievable, and time-bound?",
      "scoring": {
        "1": "No CTA, or completely vague ('think about it', 'be better').",
        "2": "CTA exists but vague or passive. Unclear what to do or who should do it.",
        "3": "Clear CTA but missing timeline, owner, or specifics on how to do it.",
        "4": "Strong CTA. Specific action, clear owner, deadline, achievable.",
        "5": "Perfect CTA. Specific, achievable, time-bound, clear owner, low-friction next step provided (link, meeting invite, etc)."
      },
      "red_flags": [
        "No action requested (information without purpose)",
        "Vague ask ('let's improve X')",
        "No timeline ('eventually', 'soon')",
        "No owner (unclear who should act)",
        "Too many asks (confusing priority)"
      ]
    },
    {
      "name": "Tone Appropriateness",
      "description": "Does tone match situation (crisis, celebration, persuasion, etc)?",
      "scoring": {
        "1": "Tone completely wrong. Casual for crisis, or somber for celebration.",
        "2": "Tone somewhat off. Misreads situation or audience expectations.",
        "3": "Tone mostly appropriate. Minor mismatches but generally fits.",
        "4": "Tone fits well. Appropriate formality, urgency, and emotion for situation.",
        "5": "Tone perfect. Nuanced match to situation, builds appropriate emotional connection, feels authentic."
      },
      "red_flags": [
        "Inappropriate levity in crisis",
        "Overly formal for casual announcement",
        "Defensive tone when accountability needed",
        "Hype/marketing speak for internal honest conversation"
      ]
    },
    {
      "name": "Transparency",
      "description": "Are assumptions, data sources, and limitations explicit?",
      "scoring": {
        "1": "Opaque. No visibility into how conclusions reached, what's assumed, or data sources.",
        "2": "Minimal transparency. Some info provided but key assumptions or limitations hidden.",
        "3": "Adequate transparency. Main assumptions and sources stated, but some gaps.",
        "4": "High transparency. Assumptions explicit, sources cited, limitations acknowledged.",
        "5": "Full transparency. Shows work, cites sources, states assumptions, acknowledges limitations, distinguishes facts from speculation."
      },
      "red_flags": [
        "Unsourced data ('research shows')",
        "Unstated assumptions (e.g., market stays stable)",
        "No acknowledgment of limitations or uncertainties",
        "Facts and speculation mixed without distinction"
      ]
    },
    {
      "name": "Credibility",
      "description": "Does narrative build trust through vulnerability, track record, or expert validation?",
      "scoring": {
        "1": "No credibility signals. Asks for trust without earning it.",
        "2": "Weak credibility. Some signals (e.g., 'trust me') but not substantiated.",
        "3": "Adequate credibility. Track record or external validation mentioned.",
        "4": "Strong credibility. Combines multiple signals: vulnerability, track record, expert validation, data transparency.",
        "5": "Exceptional credibility. Vulnerable honesty, demonstrated track record, expert validation, shows calibration (past predictions vs outcomes)."
      },
      "red_flags": [
        "No track record or validation provided",
        "Overconfident without acknowledging past errors",
        "Appeals to authority without substance ('experts agree')",
        "No vulnerability (appears infallible, reduces trust)"
      ]
    }
  ],
  "audience_guidance": {
    "Low-Stakes": {
      "description": "Routine updates, internal announcements, FYI communications",
      "target_score": 3.0,
      "focus_criteria": ["Headline Clarity", "Actionability", "Audience Fit"],
      "success_indicators": [
        "Message is clear and quickly understood",
        "Action needed (if any) is obvious",
        "Appropriate length for stakes"
      ],
      "acceptable_tradeoffs": [
        "Can be more concise at expense of storytelling",
        "Less extensive proof needed",
        "Lower formality acceptable"
      ]
    },
    "Medium-Stakes": {
      "description": "Product announcements, project updates, recommendations needing approval",
      "target_score": 3.5,
      "focus_criteria": ["Evidence Quality", "Structure and Flow", "Storytelling Techniques", "Actionability"],
      "success_indicators": [
        "Well-supported with concrete evidence",
        "Compelling narrative that engages audience",
        "Clear action and timeline",
        "Addresses likely objections"
      ],
      "acceptable_tradeoffs": [
        "Some complexity acceptable if audience has time",
        "Can be longer if stakes warrant detail"
      ]
    },
    "High-Stakes": {
      "description": "Executive decisions, crisis communications, investor updates, major changes",
      "target_score": 4.0,
      "focus_criteria": ["All criteria, especially Accountability, Evidence Quality, Credibility, Transparency"],
      "success_indicators": [
        "Comprehensive evidence from multiple sources",
        "Full transparency on assumptions and risks",
        "Clear accountability and ownership",
        "Builds credibility through vulnerability",
        "Anticipates and addresses objections",
        "Strong storytelling that builds trust"
      ],
      "acceptable_tradeoffs": [
        "Can be longer if needed for completeness",
        "Extra formality for gravitas"
      ]
    }
  },
  "communication_type_guidance": {
    "Technical to Non-Technical": {
      "description": "Explaining technical concepts/decisions to business stakeholders",
      "critical_criteria": ["Audience Fit", "Storytelling Techniques"],
      "key_patterns": [
        "Use analogies from audience's domain",
        "Focus on business impact, not technical implementation",
        "Translate jargon or define terms",
        "Show with concrete examples, not abstract concepts"
      ]
    },
    "Executive Communication": {
      "description": "Board updates, CEO memos, investor relations",
      "critical_criteria": ["Headline Clarity", "Evidence Quality", "Transparency"],
      "key_patterns": [
        "Front-load conclusions (BLUF - Bottom Line Up Front)",
        "Quantify everything (revenue, cost, time, risk)",
        "Show vs baseline/target/competitors",
        "Acknowledge risks explicitly"
      ]
    },
    "Customer-Facing": {
      "description": "Product announcements, incident communications, customer updates",
      "critical_criteria": ["Tone Appropriateness", "Accountability", "Actionability"],
      "key_patterns": [
        "Lead with customer impact (not internal process)",
        "Clear next steps for customer",
        "Empathy for pain points",
        "No jargon or internal acronyms"
      ]
    },
    "Change Management": {
      "description": "Org changes, process changes, difficult news",
      "critical_criteria": ["Tone Appropriateness", "Accountability", "Storytelling Techniques", "Transparency"],
      "key_patterns": [
        "Acknowledge loss/pain (don't gloss over difficulty)",
        "Paint compelling future state",
        "Show path from here to there",
        "Address 'what about me?' early"
      ]
    },
    "Crisis Communication": {
      "description": "Incidents, outages, mistakes, sensitive issues",
      "critical_criteria": ["Accountability", "Transparency", "Actionability", "Credibility"],
      "key_patterns": [
        "Lead with facts (what happened, when, impact)",
        "Take accountability (no passive voice or blame-shifting)",
        "State what you're doing (concrete actions with timeline)",
        "Commit to transparency (when they'll hear next)"
      ]
    }
  },
  "common_failure_modes": [
    {
      "failure_mode": "Burying the Lede",
      "symptoms": "Important insight appears in paragraph 3 or buried in middle of long text.",
      "consequences": "Busy audience never sees main point. Decisions delayed or made without full context.",
      "fix": "Move most important insight to headline and first sentence. Use inverted pyramid (most important first)."
    },
    {
      "failure_mode": "Death by Bullets",
      "symptoms": "Deck with 50 bullet points, no narrative thread connecting them.",
      "consequences": "Audience can't follow logic, points don't build, no memorable takeaway.",
      "fix": "Use bullets to support narrative, not replace it. Each slide should have one key point."
    },
    {
      "failure_mode": "Jargon Mismatch",
      "symptoms": "Technical terms for non-technical audience, or dumbed-down language for experts.",
      "consequences": "Audience either confused or insulted. Message doesn't land.",
      "fix": "Match sophistication to audience. Define jargon when needed. Use analogies from their domain."
    },
    {
      "failure_mode": "Data Dump Without Interpretation",
      "symptoms": "Lists metrics without context or insight. 'Churn is 3.2%, NPS is 58, CAC is $1,150.'",
      "consequences": "Audience doesn't know what data means or what to do with it.",
      "fix": "Lead with insight, support with data. 'We're retaining customers well (3.2% churn is top quartile) but they're expensive to acquire ($1,150 CAC = 18-month payback).'"
    },
    {
      "failure_mode": "Vague Call-to-Action",
      "symptoms": "CTA like 'Let's be more customer-focused' or 'Think about this'.",
      "consequences": "No one does anything. Message dies without action.",
      "fix": "Specific action, owner, timeline. 'Sarah, approve $50K budget by Friday' or 'Each team should interview 5 customers by month-end using guide [link].'"
    },
    {
      "failure_mode": "No Stakes",
      "symptoms": "Recommendation without showing cost of inaction. 'We should improve X.'",
      "consequences": "Audience doesn't prioritize. Request ignored in favor of urgent items.",
      "fix": "Show opportunity cost. 'Page load time is 2.5s, costing us 30% of conversions ($800K annually). Optimizing to 1s recovers $240K in year 1.'"
    },
    {
      "failure_mode": "Correlation as Causation",
      "symptoms": "Claims causal relationship based on correlation. 'Feature X increased, then revenue grew, so X caused growth.'",
      "consequences": "Wrong decisions based on spurious relationships. Waste resources on non-drivers.",
      "fix": "Be explicit about causation vs correlation. If claiming causation, show mechanism or use causal inference methods."
    },
    {
      "failure_mode": "Passive Voice Hiding Accountability",
      "symptoms": "'Mistakes were made', 'The decision was reached', 'It was determined that...'",
      "consequences": "Erodes trust. Audience doesn't know who's responsible or in control.",
      "fix": "Use active voice with named actors. 'I made mistakes', 'The exec team decided', 'Based on our analysis, I recommend...'"
    }
  ],
  "scale": 5,
  "minimum_average_score": 3.5,
  "interpretation": {
    "1.0-2.0": "Inadequate. Major issues with clarity, evidence, or audience fit. Do not deliver. Revise significantly.",
    "2.0-3.0": "Needs improvement. Basic structure present but weak evidence, poor storytelling, or mismatched tone. Acceptable only for low-stakes FYI messages.",
    "3.0-3.5": "Acceptable. Clear message with adequate evidence. Suitable for routine communications and internal updates.",
    "3.5-4.0": "Good. Compelling narrative with strong evidence and clear action. Suitable for medium-stakes product announcements, recommendations.",
    "4.0-5.0": "Excellent. Masterful storytelling, comprehensive evidence, builds credibility and trust. Suitable for high-stakes executive/crisis/customer communications."
  }
}
|
||||
@@ -0,0 +1,242 @@
|
||||
# Example: Product Launch Announcement
|
||||
|
||||
## Scenario
|
||||
|
||||
**Context:** Launching "Analytics Insights" - automated report generation tool that saves customer success teams 10-15 hours per week. Announcing to existing customers via email + blog post.
|
||||
|
||||
**Audience:** Customer success managers and directors at mid-size B2B SaaS companies
|
||||
- Expertise: Moderate technical literacy, very familiar with pain points
|
||||
- Concerns: Time savings, ease of adoption, cost, disruption to current workflow
|
||||
- Time available: 2-3 minutes to read email, 5-10 minutes for blog
|
||||
|
||||
**Purpose:** Drive adoption of new feature (target: 40% adoption in 30 days)
|
||||
|
||||
**Tone:** Empathetic (acknowledge pain), excited (benefits), supportive (easy transition)
|
||||
|
||||
---
|
||||
|
||||
## Story Structure Used
|
||||
|
||||
**Before-After-Bridge (BAB):**
|
||||
1. Before: Painful current state (audience's lived experience)
|
||||
2. After: Improved future state (what becomes possible)
|
||||
3. Bridge: The solution (how to get there)
|
||||
4. Call to Action: Next step to access the "after" state
|
||||
|
||||
---
|
||||
|
||||
## Draft: Communication
|
||||
|
||||
### Before (Weak - Feature Announcement)
|
||||
|
||||
> Subject: New Feature: Analytics Insights
|
||||
>
|
||||
> Hi there,
|
||||
>
|
||||
> We're excited to announce Analytics Insights, a new feature that automatically generates customer health reports. This feature uses machine learning to analyze usage patterns and create personalized insights for each customer.
|
||||
>
|
||||
> Key features:
|
||||
> - Automated report generation
|
||||
> - Customizable templates
|
||||
> - Email scheduling
|
||||
> - Data visualization
|
||||
> - 10+ pre-built metrics
|
||||
>
|
||||
> Analytics Insights is available now in the dashboard under "Reports." Check out our documentation for setup instructions.
|
||||
>
|
||||
> Let us know if you have any questions!
|
||||
|
||||
**Problems:**
|
||||
- ❌ Leads with feature ("Analytics Insights") not benefit
|
||||
- ❌ Doesn't acknowledge current pain (assumes audience knows why they need this)
|
||||
- ❌ Feature list without context (why do I care about "customizable templates"?)
|
||||
- ❌ No proof (claims time savings but doesn't quantify or show evidence)
|
||||
- ❌ Weak CTA ("check it out") without urgency or clear next step
|
||||
- ❌ No storytelling - just a list of features
|
||||
|
||||
### After (Strong - Storytelling)

> Subject: Spend 15 hours/week on customers, not spreadsheets
>
> # Before: Your Weekends Belong to Excel. After: They Belong to You.
>
> If you're a customer success manager, I don't have to tell you how your Fridays go.
>
> **Your current Friday afternoon:**
> - 2:00pm: Start pulling usage data for your top 20 accounts
> - 2:45pm: Export to CSV, clean up formatting issues (why does every export break?)
> - 3:30pm: Build pivot tables and charts in Google Sheets
> - 4:15pm: Copy-paste into your customer email template
> - 4:45pm: Personalize each email (change "Hey Sarah" to "Hey Michael", update account-specific context)
> - 6:30pm: Send last email, realize you spent 4.5 hours on reporting that should have taken 30 minutes
> - Saturday morning: Customer replies asking for clarification on a metric - back to your laptop
>
> You became a CS manager to build relationships and drive retention. Instead, you're a data janitor.
>
> We built Analytics Insights to fix this.
>
> ## After: What Your Friday Looks Like Now
>
> **Your new Friday afternoon:**
> - 2:00pm: Open Analytics Insights, click "Generate Reports"
> - 2:03pm: Review automatically generated insights for your 20 accounts (already personalized, already visualized, already in email format)
> - 2:15pm: Click "Send" or make minor tweaks if needed
> - 2:20pm: Done. You just freed up 4 hours.
>
> **What you do with those 4 hours:**
> - Proactively reach out to at-risk accounts (the system flagged 3 with declining usage)
> - Prepare for Monday's executive retention review (with confidence, not scrambling)
> - Actually leave at 5pm on Friday
>
> This isn't about automation for automation's sake. It's about spending time on what matters: your customers.
>
> ## How It Works (The Bridge)
>
> **Setup (5 minutes):**
> 1. Go to Reports → Analytics Insights
> 2. Choose your template (we have 5 pre-built, including "Executive Summary" and "Feature Adoption")
> 3. Select accounts and schedule (weekly Fridays at 2pm, or on-demand)
> 4. Done.
>
> **What Happens Next:**
> Every Friday at 2pm, Analytics Insights:
> - Pulls usage data for each account (sessions, feature adoption, health score trends)
> - Generates insights using the same patterns you'd manually identify ("Sarah's team adoption is up 40% this month, driven by the new workflow builder")
> - Creates beautiful visualizations (no more manual charting)
> - Drafts personalized emails (including customer name, account-specific context, relevant metrics)
> - Sends to you for review (or auto-sends if you enable it)
>
> **Customization:**
> - Add your own metrics (we start with 12 standard ones)
> - Edit templates (match your voice and brand)
> - Override insights (AI gets it 90% right, you perfect the last 10%)
>
> ## Why We Built This (Proof You Can Trust It)
>
> We didn't build this in a vacuum. Over the past 6 months:
> - We interviewed 40 CS managers like you
> - 38 of them mentioned "manual reporting" as their #1 time sink
> - Average time reported: 12 hours per week
>
> We piloted Analytics Insights with 12 beta customers for 8 weeks. Results:
> - **Time saved:** Average 10.6 hours per week (range: 8-15 hours depending on account size)
> - **Accuracy:** 94% of AI-generated insights matched what CS managers would have written manually
> - **Adoption:** 11 of 12 beta users now use it weekly (one is still tweaking template preferences)
>
> **What beta users said:**
>
> > "This is the feature I didn't know I needed until I had it. Now I can't imagine going back. I left at 4pm last Friday for the first time in a year." — Sarah Chen, CS Director, TechCorp (120 accounts)
>
> > "I was skeptical about AI writing my customer emails. But it nails the tone—I only change 1-2 sentences per email now. Total game changer." — Michael Rodriguez, CS Manager, GrowthCo (45 accounts)
>
> > "The ROI calculation is simple: I save 12 hours per week. That's 48 hours per month. If my time is worth $100/hr (conservative), this saves my company $4,800/month. It's already included in my plan. This is a no-brainer." — Jamie Lee, VP Customer Success, DataCo (8-person CS team)
>
> ## This Is Included (No Extra Cost)
>
> Analytics Insights is included in your current plan. No upgrade, no add-on, no surprise invoice.
>
> Why? Because your success is our success. If you spend less time on manual work and more time with customers, retention improves. When retention improves, we both win.
>
> ## Get Started Today (Your Friday Is Waiting)
>
> **Here's what to do next:**
>
> 1. **[Try it now - 5 minute setup](link)** → Set up your first automated report
> 2. **[Watch 3-minute demo video](link)** → See it in action before you commit
> 3. **[Join office hours this Thursday 2pm PT](link)** → Ask questions, see advanced tips
>
> **What to expect:**
> - **This week:** Set up and generate your first report (test with 5 accounts before full rollout)
> - **Next week:** Roll out to all accounts, customize templates to match your voice
> - **Week 3:** Sit back and enjoy your Fridays (seriously, that's the goal)
>
> **Need help?**
> Reply to this email and I'll personally help you set up. Or book time with our CS team [here](link).
>
> ## A Note from Our CEO
>
> "We started this company because we believe customer success teams are undervalued and overworked. You're expected to be data analysts, relationship managers, product experts, and therapists—all at once. This feature is our attempt to give you back time to focus on what you do best: building relationships that drive retention. I hope it makes your life a little easier."
>
> —CEO Name
>
> **P.S.** Set a reminder to try this before next Friday. Future you will thank present you when you leave work at 5pm instead of 7pm. [Set up Analytics Insights now →](link)

**Why This Works:**

✅ **Headline:** "Spend 15 hours/week on customers, not spreadsheets" - benefit-focused, quantified, relatable
✅ **Empathy:** Opens with the painfully specific current state the audience lives ("Friday 2pm, start pulling usage data...")
✅ **Show don't tell:** Detailed timeline of the current Friday (4.5 hours of manual work) vs the new Friday (20 minutes)
✅ **Specificity:** 10.6 hours saved (not "saves time"), 94% accuracy (not "highly accurate"), 11/12 adoption (not "popular")
✅ **Social proof:** 3 customer testimonials with names, companies, account sizes (credibility through specificity)
✅ **Proof:** Beta results (12 customers, 8 weeks, quantified outcomes), not just claims
✅ **Stakes:** Humanized ($4,800/month = 48 hours × $100/hr) and emotional (leave at 5pm on Friday)
✅ **No-risk:** Included in current plan (removes cost objection)
✅ **Actionability:** 3 clear next steps (try now, watch demo, join office hours) with timelines
✅ **Multiple CTAs:** Try now (for the action-oriented), watch demo (for the cautious), office hours (for question-askers)
✅ **Tone:** Empathetic ("I don't have to tell you..."), supportive ("I'll personally help"), excited but not over-the-top
✅ **Structure:** BAB (Before → After → Bridge) creates a clear transformation narrative

---

## Self-Assessment Using Rubric

**Headline Clarity (5/5):** "Spend 15 hours/week on customers, not spreadsheets" - crystal clear benefit
**Structure (5/5):** BAB (Before painful, After aspirational, Bridge actionable) - perfect fit for product launch
**Evidence Quality (5/5):** 12 beta customers, 8 weeks, 10.6 hours saved, 94% accuracy, 11/12 adoption, 3 named testimonials
**Audience Fit (5/5):** Deep empathy with CS manager pain points, appropriate detail level, addresses concerns (cost, accuracy, adoption)
**Storytelling (5/5):** Hyper-specific current state (Friday timeline), vivid future state (leave at 5pm), concrete bridge (5-minute setup)
**Accountability (4/5):** Acknowledges AI isn't perfect (90% right, you perfect the last 10%), CEO note shows commitment
**Actionability (5/5):** 3 tiered CTAs (try/watch/ask), weekly timeline, support offers (personal help, office hours)
**Tone (5/5):** Empathetic + excited + supportive - matches product launch for existing customers
**Transparency (5/5):** Shows beta results (not just cherry-picked wins), admits AI needs 10% human refinement
**Credibility (5/5):** Customer testimonials with full names/companies, quantified beta results, CEO commitment

**Average: 4.9/5** ✓ Production-ready (very strong)

---

## Key Techniques Demonstrated

1. **Empathy Opening:** Start with painful specificity the audience recognizes ("Your Friday afternoon: 2:00pm...")
2. **Transformation Narrative:** Contrast the current painful state (6:30pm still working) with the aspirational future (2:20pm done)
3. **Humanization:** Time saved → emotional benefit (leave at 5pm Friday for the first time in a year)
4. **Social Proof:** 3 testimonials from different seniority levels (director, manager, VP) with specific results
5. **Risk Removal:** Included in current plan (no cost), 5-minute setup (low effort), personal help offered (low barrier)
6. **Multiple CTAs:** Try/Watch/Ask - accommodates different audience personas (action-takers, cautious evaluators, question-askers)
7. **Proof Stack:** Interviews (40 CS managers) + beta (12 customers, 8 weeks) + testimonials (3 named) = comprehensive evidence
8. **Specificity:** Not "saves time" but "10.6 hours/week", not "accurate" but "94%", not "popular" but "11/12 beta users"
9. **CEO Voice:** Adds weight and shows company commitment (not just the product team shipping a feature)
10. **PS Technique:** Reinforces the CTA with an emotional hook (future you will thank present you)

---

## Alternative Version: Internal Announcement (to CS Team)

If announcing internally to your own CS team (not customers), adjust:

**Headline:** "Reclaim Your Fridays: New Auto-Reporting Tool Launching"

**Changes:**
- More emphasis on how to get support (training sessions, dedicated Slack channel)
- Call out change management (optional first month, required after pilot)
- Acknowledge concerns ("I know change is hard when you have a system that works")
- Add the metrics we're tracking (adoption rate, time saved, quality scores)
- Reframe the CEO note around supporting the team through the transition

**Tone shift:** Still empathetic, but more collaborative (we're in this together) vs selling (you should use this)

---

## Alternative Version: Blog Post (Public)

If publishing as a public blog post (not just to customers), adjust:

**Headline:** "Why We Built Analytics Insights: Giving CS Teams Their Time Back"

**Changes:**
- Add a "Why this matters for the industry" section (CS burnout crisis, the universal "data janitor" problem)
- Include more behind-the-scenes detail (how we built it, technical challenges overcome)
- Broaden appeal (useful for any CS tool provider, not just our customers)
- End with an industry call-to-action (other CS tools should solve this too)

**Tone shift:** Thought leadership (here's what we learned) vs product marketing (here's what you get)

@@ -0,0 +1,184 @@
# Example: Technical Incident Postmortem

## Scenario

**Context:** Major database outage affected 80% of customers for 4 hours on a Saturday. Need to communicate to the executive team on Monday.

**Audience:** CEO, CTO, VP Product, VP Customer Success
- Expertise: Mixed (CEO non-technical, others technical)
- Concerns: Customer impact, reputation, revenue risk, engineering capability
- Time available: 15 minutes in emergency exec meeting

**Purpose:** Build trust through accountability, explain root cause, commit to prevention

**Tone:** Serious but measured, accountable not defensive, solution-oriented

---

## Story Structure Used

**Crisis Communication Arc (Trust Building):**
1. Acknowledge problem honestly
2. Show accountability and action taken
3. Explain root cause and prevention
4. Commit to transparency and improvement

---

## Draft: Communication

### Before (Weak - Data Dump)

> Subject: Database Incident - Saturday Nov 2
>
> On Saturday November 2nd from 2:14am to 6:47am PT, our primary database experienced corrupted indexes causing query failures. 80% of customer requests returned errors. The incident was triggered by a combination of factors including a schema migration deployed on Friday evening and increased load from a batch job. We rolled back the migration at 5:30am, which partially restored service. Full recovery occurred at 6:47am after rebuilding indexes.
>
> Impact: 4h 33min outage, 847 customers affected, 12,400 failed requests. Root cause was insufficient testing of the migration under production load patterns. We are implementing better testing procedures.
>
> Let me know if you have questions.

**Problems:**
- ❌ Buries emotional acknowledgment (no empathy for customer impact)
- ❌ Technical jargon obscures key insights ("corrupted indexes" - what does that mean for the business?)
- ❌ Passive voice avoids accountability ("was triggered by" - who's responsible?)
- ❌ Vague prevention ("better testing" - what specifically?)
- ❌ No explicit commitment to transparency

### After (Strong - Storytelling)

> Subject: Saturday Outage: What Happened, Why, and What We're Doing About It
>
> # We Failed 80% of Our Customers on Saturday
>
> On Saturday at 2am, our main database went down, affecting 80% of customers for 4.5 hours. This is unacceptable. Our customers trust us to keep their businesses running, and we broke that trust. Here's what happened, why it happened, and how we're ensuring it never happens again.
>
> ## What Happened
>
> **Timeline:**
> - **2:14am:** Automated monitoring alerted on-call engineer (Sarah) to 80% error rate
> - **2:20am:** Sarah paged database team, identified corrupted database indexes causing all queries to fail
> - **2:45am:** CTO (me) joined incident call after Sarah escalated
> - **5:30am:** Rolled back Friday's schema migration, partially restored service (50% → 80% success rate)
> - **6:47am:** Fully recovered after manually rebuilding indexes
>
> **Customer Impact:**
> - 847 customers (80% of active base) affected
> - 12,400 failed requests (orders, login attempts, data syncs)
> - 23 support tickets filed, 8 customers escalated to executives
> - Estimated revenue impact: $15K in SLA credits
>
> **Our Response:**
> - 6 engineers worked through the night
> - We proactively emailed all affected customers by 8am Saturday with status and apology
> - We held customer office hours Sunday 2-6pm (47 customers attended)
> - We're issuing automatic SLA credits (no request needed)
>
> ## Why It Happened (Root Cause)
>
> **Immediate cause:** On Friday evening we deployed a database schema migration (adding an index to support a new feature). Under normal load, this worked fine in staging. But Saturday at 2am, a scheduled batch job ran that queries the same table. The combination of migration + batch job created a race condition that corrupted the index.
>
> **Underlying causes (honest reflection):**
> 1. **Insufficient testing:** We tested the migration, but not under realistic load patterns that include batch jobs
> 2. **Risky timing:** Deploying database changes Friday evening meant a skeleton crew if problems emerged
> 3. **Missing safeguards:** The batch job didn't have a circuit breaker to stop if the error rate spiked
>
> **I take responsibility.** As CTO, I approved the deployment plan that didn't account for batch job interaction. The engineering team followed our process—the process was inadequate.
>
> ## What We're Doing About It
>
> **Immediate (This Week):**
> 1. **Deployment freeze:** No schema changes until the new process is in place (unfreezes Nov 10)
> 2. **Load testing:** Adding batch job scenarios to staging environment (reproduces Saturday's load pattern)
> 3. **Circuit breakers:** Batch jobs now halt if error rate >5% (prevents cascading failures)
>
> **Short-term (Next 30 Days):**
> 1. **Deployment policy:** No risky changes on Fridays or weekends (lesson from Saturday)
> 2. **Runbook updates:** Document recovery procedures for index corruption (until now this was tribal knowledge)
> 3. **Customer communication template:** Pre-written incident comms for <30min response time
>
> **Long-term (Next Quarter):**
> 1. **Incident response training:** All engineers will complete tabletop exercises (scheduled for Dec)
> 2. **Database reliability investment:** Hiring a database SRE, implementing automated failover
> 3. **Proactive monitoring:** Detect schema migration risks before they reach production
>
> ## Transparency Commitment
>
> **Public postmortem:** I'm publishing a blameless postmortem on our engineering blog by Friday (I'll share a draft with you first for input). Customers deserve to know we're taking this seriously and learning from it.
>
> **Monthly reliability reports:** Starting this month, I'll share incident metrics at exec meetings (MTTR, incident count, severity breakdown) so you have visibility into trends.
>
> **Ask anything:** I know you'll have questions. I'm available all day Monday—no meeting too small, no question too uncomfortable.
>
> ## What I Need from You
>
> 1. **Approval for database SRE hire** (unplanned $180K/year) - this is our #1 reliability gap
> 2. **Customer outreach support** - If any of the 8 executive escalations need personal attention from CEO/CTO, let me know
> 3. **Patience** - We'll have fewer feature releases over the next 2 weeks while we prioritize reliability
>
> I'm deeply sorry for this failure. Our customers and our team deserved better. We're fixing it.
>
> —CTO Name
>
> **Office hours:** Monday 9am-6pm, my door is open

**Why This Works:**

✅ **Headline:** Acknowledges failure explicitly ("We Failed") - shows accountability, builds trust
✅ **Structure:** What/Why/What We're Doing - clear, logical flow
✅ **Specificity:** Exact numbers (847 customers, 4.5 hours, $15K), not vague ("many," "several")
✅ **Accountability:** "I take responsibility" (named CTO) vs passive "mistakes were made"
✅ **Show don't tell:** Timeline with timestamps shows urgency, not just "we responded quickly"
✅ **Humanization:** Named engineer (Sarah), personal language ("deeply sorry"), emotional honesty
✅ **Transparency:** Admits underlying causes (not just the immediate trigger), commits to public postmortem
✅ **Credibility:** Concrete actions with timelines (not vague "we'll do better")
✅ **Stakes:** Shows revenue impact ($15K SLA credits) and customer escalations (8 to executives)
✅ **Call-to-action:** Specific asks (SRE hire approval, customer outreach, patience on features)
✅ **Accessibility:** "Office hours Monday 9am-6pm" - invites conversation, not defensive

---

## Self-Assessment Using Rubric

**Headline Clarity (5/5):** "We Failed 80% of Our Customers" - impossible to misunderstand
**Structure (5/5):** What/Why/What We're Doing + Transparency Commitment - clear flow
**Evidence Quality (5/5):** Specific data (847 customers, timeline with timestamps, $15K impact)
**Audience Fit (5/5):** Mixed technical/non-technical with explanations, addresses exec concerns (customer impact, revenue, capability)
**Storytelling (5/5):** Shows (timeline, named people) vs tells, humanizes data (8 escalations to executives = serious)
**Accountability (5/5):** CTO takes responsibility explicitly, no passive voice or blame-shifting
**Actionability (5/5):** Concrete preventions with timelines, clear asks with budget impact
**Tone (5/5):** Serious, accountable, solution-oriented - matches crisis situation
**Transparency (5/5):** Admits underlying causes, commits to public postmortem, invites questions
**Credibility (5/5):** Vulnerable (admits inadequate process), shows work (root cause analysis), commits with specifics

**Average: 5.0/5** ✓ Production-ready

---

## Key Techniques Demonstrated

1. **Crisis Communication Pattern:** Acknowledge → Accountability → Action → Transparency
2. **Specificity:** 847 customers (not "many"), 4.5 hours (not "extended"), $15K (not "financial impact")
3. **Named Accountability:** "As CTO, I approved..." (not "the team" or "we")
4. **Timeline Storytelling:** Timestamps create urgency and show response speed
5. **Tiered Actions:** Immediate (this week) / Short-term (30 days) / Long-term (quarter) - shows comprehensive thinking
6. **Vulnerability:** "I take responsibility", "deeply sorry", "customers deserved better" - builds trust through honesty
7. **Stakeholder Addressing:** Customers (SLA credits, office hours), Team (supported through incident), Executives (asks for support)
8. **Open Communication:** "Ask anything", "no question too uncomfortable", "my door is open" - invites dialogue

---

## Alternative Version: External Customer Communication

If communicating to customers (not internal execs), use the Before-After-Bridge structure:

**Before:** "On Saturday morning, you may have experienced errors accessing our service. For 4.5 hours, 80% of requests failed."

**After:** "Service is fully restored. We've issued automatic SLA credits to affected accounts (no action needed), and we've implemented safeguards to prevent this specific failure."

**Bridge:** "Here's what happened and what we learned: [simplified root cause without technical jargon]. We're publishing a detailed postmortem on our blog Friday, and I'm personally available for questions: [email]."

**Key differences from internal version:**
- Less technical detail (no "corrupted indexes")
- More emphasis on customer impact and resolution
- Explicit next steps for customers (SLA credits automatic, email for questions)
- Still accountable and transparent, but focused on customer needs, not internal process

465
skills/communication-storytelling/resources/methodology.md
Normal file
@@ -0,0 +1,465 @@
# Communication Storytelling Methodology

Advanced techniques for complex multi-stakeholder communications, persuasion frameworks, and medium-specific adaptations.

## Workflow

Copy this checklist and track your progress:

```
Advanced Communication Progress:
- [ ] Step 1: Map stakeholders and create versions
- [ ] Step 2: Apply persuasion principles
- [ ] Step 3: Adapt to medium
- [ ] Step 4: Build credibility and trust
- [ ] Step 5: Test and iterate
```

**Step 1:** Map stakeholders by influence/interest. Create tailored versions. See [1. Multi-Stakeholder Communications](#1-multi-stakeholder-communications).
**Step 2:** Apply Cialdini principles and cognitive biases. See [2. Persuasion Frameworks](#2-persuasion-frameworks).
**Step 3:** Adapt narrative to email, slides, video, or written form. See [3. Medium-Specific Adaptations](#3-medium-specific-adaptations).
**Step 4:** Build credibility through vulnerability, data transparency, and accountability. See [4. Building Credibility](#4-building-credibility).
**Step 5:** Test with target audience segment, gather feedback, iterate. See [5. Testing and Iteration](#5-testing-and-iteration).

---

## 1. Multi-Stakeholder Communications

### Stakeholder Mapping

**When multiple audiences need different versions of the same message.**

**Power-Interest Matrix:**
```
                 High Interest
                      |
      Engage          |   Manage Closely
      Actively        |   (Key Players)
   -------------------|------------------
      Monitor         |   Keep Informed
      (Minimal        |   (Keep Satisfied)
       Effort)        |
                      |
                 Low Interest
       Low Power  →  High Power
```

**Mapping process:**
1. List all stakeholders (executives, employees, customers, investors, regulators, public)
2. Plot each on the power-interest matrix
3. Identify information needs for each quadrant
4. Create a communication strategy per quadrant (see the sketch below)

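To make the quadrant assignment concrete, here is a minimal sketch in Python. It is illustrative only: the stakeholder names, 0-10 scores, and the cutoff of 5 are hypothetical assumptions, not part of this methodology.

```python
# Power-interest quadrant sketch. Illustrative only: the stakeholder names,
# 0-10 scores, and the cutoff of 5 are hypothetical assumptions.

STRATEGY = {
    (False, True): "Engage Actively",                 # low power, high interest
    (True, True): "Manage Closely (Key Players)",     # high power, high interest
    (False, False): "Monitor (Minimal Effort)",       # low power, low interest
    (True, False): "Keep Informed (Keep Satisfied)",  # high power, low interest
}

def quadrant(power: float, interest: float, cutoff: float = 5.0) -> str:
    """Map one stakeholder's scores to a communication strategy."""
    return STRATEGY[(power > cutoff, interest > cutoff)]

stakeholders = {
    "CEO": (9, 8),
    "Affected customers": (3, 9),
    "General employees": (2, 3),
}

for name, (power, interest) in stakeholders.items():
    print(f"{name}: {quadrant(power, interest)}")
```

The value is less in the code than in the discipline it forces: assigning an explicit power and interest score to every stakeholder before drafting anything.
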
**Example: Product Sunset Announcement**
- **CEO (High Power, High Interest):** Full business case with financials, strategic rationale, risk mitigation, resource reallocation plan
- **Affected Customers (Low Power, High Interest):** Migration path, support timeline, feature parity in new product, testimonials from early migrators
- **Sales Team (Medium Power, High Interest):** Objection handling guide, competitive positioning, incentive structure for upselling to new product
- **General Employees (Low Power, Low Interest):** Company blog post with headline, 3 key points, link to FAQ

### Creating Tailored Versions

**Core message stays the same. Emphasis, detail, and proof vary by audience.**

**Technique: Message Map**
1. **Headline:** Same for all audiences (consistent message)
2. **Key Points:** Reorder by audience priorities (exec cares about ROI first, engineers care about technical approach first)
3. **Proof:** Swap evidence types (execs want financial data, customers want testimonials, engineers want benchmarks)
4. **CTA:** Tailor to authority level (exec approves, manager implements, IC adopts)

**Example: API Deprecation Announcement**

**Version A (Engineering Managers):**
- Headline: "We're deprecating API v1 on Dec 31 to focus resources on v2 (3x faster, 10x more scalable)"
- Key Points: (1) Technical improvements in v2, (2) Migration guide with code samples, (3) Support timeline and office hours
- Proof: Performance benchmarks, migration time estimates (2-4 hours per service)
- CTA: "Schedule migration for your team by Nov 15 using migration tool [link]"

**Version B (Executives):**
- Headline: "We're deprecating API v1 to reduce technical debt and accelerate product velocity"
- Key Points: (1) Cost savings ($500K annually in infrastructure), (2) Faster feature delivery (v2 reduces development time 40%), (3) Improved reliability (99.99% uptime vs 99.9%)
- Proof: Financial impact, customer satisfaction improvement, competitive positioning
- CTA: "Approve $50K migration support budget to ensure smooth transition by year-end"

**Version C (External Developers):**
- Headline: "API v1 is being sunset on Dec 31, 2024 - migrate to v2 for better performance and features"
- Key Points: (1) Why we're doing this (focus on modern architecture), (2) What you gain (3x faster, new capabilities), (3) How to migrate (step-by-step guide)
- Proof: v2 adoption stats (80% of new integrations), customer testimonials, performance comparisons
- CTA: "Start migration today using our automated migration tool [link] - we're here to help"

### Audience Segmentation Framework

**When you can't create individual versions for each person, segment by shared characteristics.**

**Segmentation dimensions:**
- **Expertise:** Novice, practitioner, expert (affects jargon, depth, proof type)
- **Decision role:** Recommender, influencer, approver, implementer, end-user (affects CTA, urgency)
- **Concern:** Risk-averse, cost-conscious, innovation-focused, status-quo-defender (affects framing)
- **Time:** 30 seconds (headline only), 5 minutes (executive summary), 30 minutes (full narrative)

**Technique: Create master document with expandable sections**
```markdown
# [Headline - same for all]

## Executive Summary (30 seconds - for approvers)
[3 sentences: problem, solution, ask]

## Full Story (5 minutes - for influencers)
[Complete narrative with key points and proof]

## Technical Deep Dive (30 minutes - for implementers)
[Detailed analysis, methodology, alternatives considered]

## FAQ (self-service - for end users)
[Anticipated questions with concise answers]
```

---

## 2. Persuasion Frameworks

### Cialdini's 6 Principles of Influence

**1. Reciprocity** - People feel obligated to return favors
- **Application:** Offer value first (free tool, helpful analysis, early access) before asking
- **Example:** "We've prepared a cost-benefit analysis for your team [attached]. After reviewing, would you be open to a 15-minute conversation about implementation?"

**2. Commitment & Consistency** - People want to act consistently with past commitments
- **Application:** Get a small agreement first, then build to the bigger ask
- **Example:** "You mentioned last quarter that improving customer satisfaction was a top priority. This initiative directly addresses the #1 complaint in our NPS surveys."

**3. Social Proof** - People look to others' behavior for guidance
- **Application:** Show that similar people/companies have adopted your recommendation
- **Example:** "3 of our top 5 competitors have already implemented this approach. Salesforce saw 40% improvement in metric X within 6 months."

**4. Authority** - People trust credible experts
- **Application:** Cite recognized experts, credentials, data sources
- **Example:** "Gartner's 2024 report identifies this as a 'must-have' capability. We've consulted with Dr. Smith, the leading researcher in this space, to design our approach."

**5. Liking** - People say yes to those they like and relate to
- **Application:** Find common ground, acknowledge their perspective, show empathy
- **Example:** "I know the engineering team has been stretched thin with the platform migration—I've felt that pain on my own team. That's why this proposal includes dedicated support resources so you're not doing it alone."

**6. Scarcity** - People want what's limited or exclusive
- **Application:** Highlight time constraints, limited availability, opportunity cost
- **Example:** "The vendor's discount expires Nov 30. If we don't decide by then, we'll pay 30% more in 2025, which likely means cutting scope or delaying launch."

### Cognitive Biases to Leverage (Ethically)

**Loss Aversion** - People fear losses more than they value equivalent gains
- ❌ Weak: "This will increase revenue by $500K"
- ✅ Strong: "Without this, we'll lose $500K in revenue to competitors who've already adopted it"

**Anchoring** - The first number mentioned sets the reference point
- ❌ Weak: "This costs $100K" (sounds expensive)
- ✅ Strong: "Industry standard is $500K. We've negotiated down to $100K through our existing vendor relationship" (sounds like a deal)

**Status Quo Bias** - People prefer the current state unless change is compelling
- **Counter with:** Show the status quo is unstable ("we're already losing ground") + paint a vivid future state + provide a clear transition path

**Availability Heuristic** - Recent vivid examples feel more probable
- **Leverage:** Use recent concrete examples ("just last week, Customer X churned citing this exact issue") rather than abstract statistics

**Framing Effect** - The same information presented differently drives different decisions
- ❌ Negative frame: "This approach has a 20% failure rate"
- ✅ Positive frame: "This approach succeeds 80% of the time"
- ⚠️ Use ethically: Don't hide risks, but choose the frame that emphasizes benefits

---

## 3. Medium-Specific Adaptations

### Email (Inbox → Action)

**Constraints:** Skimmed in 30 seconds, competing with 50 other emails, mobile-first

**Structure:**
1. **Subject line:** Specific + actionable + urgency if appropriate
   - ✅ "Decision needed by Friday: API v1 deprecation plan"
   - ❌ "API Update"
2. **First sentence:** Bottom line up front (BLUF)
   - "We should deprecate API v1 on Dec 31 to focus resources on v2 (3x faster, saves $500K annually)."
3. **Body:** 3-5 short paragraphs or bullets, white space between each
4. **CTA:** Bold, single action, deadline
   - **"Reply by Friday Nov 15 with approval or questions."**
5. **Optional:** "TL;DR" section at top for busy executives

**Best practices:**
- One CTA per email (not 5 different asks)
- Front-load: Most important info in first 2 sentences
- Mobile-friendly: Short paragraphs, no long sentences, use bullets
- Progressive disclosure: Link to full doc for details

### Slides (Presentation Support)

**Constraints:** Glance-able, visual-first, presenter provides narration

**Structure:**
1. **Title slide:** Headline + your name/date
2. **Situation slide:** Context in 3-4 bullets
3. **Complication slide:** Problem/opportunity with data
4. **Resolution slide:** Your recommendation
5. **Support slides:** One key point per slide with visual proof
6. **Next steps slide:** Clear CTA with timeline and owners

**Design principles:**
- **One message per slide:** Slide title = key takeaway
- **Signal-to-noise:** Remove everything that doesn't support the point
- **Visual hierarchy:** Big headline, supporting data smaller
- **Consistency:** Same fonts, colors, layouts across deck

**Data visualization:**
- Bar charts for comparisons
- Line charts for trends over time
- Pie charts for part-to-whole (use sparingly)
- Callout boxes for key numbers

**Anti-patterns:**
- ❌ Walls of text (a slide should be glance-able in 3 seconds)
- ❌ Tiny font (nothing below 18pt)
- ❌ Reading slides word-for-word (the presenter should add value)
- ❌ Clip art or decorative images (signal, not noise)

### Written Narrative (Memo/Blog/Article)

**Constraints:** Deep reading, need to sustain attention, competing with distraction

**Structure:**
1. **Headline:** Compelling + specific
2. **Lede:** 2-3 sentences capturing the essence (could stand alone)
3. **Body:** Narrative arc with clear sections, headers, transitions
4. **Conclusion:** Restate key takeaway + CTA

**Narrative devices:**
- **Scene-setting:** "It was 2am on a Saturday when the alerts started firing..."
- **Dialogue:** "As our CEO said in the all-hands, 'We have to make a choice: grow slower or raise prices.'"
- **Foreshadowing:** "We didn't know it yet, but this decision would reshape our entire roadmap."
- **Callbacks:** "Remember the $2M revenue gap from Q1? Here's how we closed it."

**Pacing:**
- **Hook:** First paragraph must grab attention
- **Vary sentence length:** Short for impact. Longer, more complex sentences for explanation and detail.
- **Section breaks:** Use headers and white space every 3-4 paragraphs
- **Pull quotes:** Highlight key insights in larger text/boxes

### Video Script (Speaking)

**Constraints:** Can't skim ahead, linear consumption, attention drops after 90 seconds

**Structure:**
1. **Hook (0-10s):** Provoke curiosity or state the benefit
   - "Most product launches fail. Here's why ours succeeded."
2. **Promise (10-20s):** What the viewer will learn
   - "In the next 3 minutes, I'll show you the 3 decisions that made the difference."
3. **Content (20s-2:30):** Deliver on the promise with clear segments
   - Pattern: Point → Proof → Example (repeat 3x)
4. **Recap (2:30-2:50):** Restate key takeaways
5. **CTA (2:50-3:00):** What to do next

**Speaking techniques:**
- **Conversational tone:** Write how you speak, not formal prose
- **Signposting:** "First... Second... Finally..." (helps the viewer follow the structure)
- **Emphasis:** Slow down and pause before key points
- **Visuals:** Show, don't just tell (charts, screenshots, demos)

**Time budgets:**
- 30-second: Headline + one proof point + CTA
- 2-minute: Headline + 3 key points with brief proof + CTA
- 5-minute: Full narrative with examples and Q&A preview
- 10+ minute: Deep dive with sections, detailed proof, objection handling

---

## 4. Building Credibility

### Vulnerability and Honesty

**Counter-intuitive:** Acknowledging weaknesses builds trust.

**Technique: Preemptive objection handling**
- ❌ Hide risks: "This will definitely work"
- ✅ Acknowledge risks: "This approach has risk X. Here's how we're mitigating it. If Y happens, we have fallback plan Z."

**Technique: Show your work**
- ❌ Opaque: "We should invest $2M in this"
- ✅ Transparent: "We should invest $2M based on: (1) $5M potential upside with 60% probability = $3M expected value, (2) $500K downside protection through phased rollout, (3) comparable to competitors' investments. See detailed model [link]."

**Technique: Admit what you don't know**
- ❌ Bluff: "We're confident this will work"
- ✅ Honest: "We're confident about X and Y based on [evidence]. We're less certain about Z—it depends on how customers respond. We'll know more after a 30-day pilot."

### Data Transparency

**Principle:** Show your data sources, assumptions, and limitations.

**Template:**
```markdown
## Data Sources
- Customer churn data (internal, last 6 months, n=450)
- Competitor pricing (public websites, verified Oct 2024)
- Market size (Gartner 2024 report, TAM methodology)

## Assumptions
- Customer behavior remains stable (reasonable given 2-year historical consistency)
- Competitors don't change pricing in next 6 months (risk: we'd need to adjust)
- Economic conditions don't deteriorate (sensitive: 10% GDP contraction would reduce TAM by 30%)

## Limitations
- This analysis doesn't include international markets (out of scope, could add 20% upside)
- Sample size for Segment B is small (n=40, wider confidence interval)
- Qualitative feedback from 12 interviews, not statistically representative
```

**Why this works:**
- Shows intellectual rigor (you've thought through limitations)
- Builds trust (you're not hiding weaknesses)
- Helps the audience assess reliability (they can judge whether the limitations matter for their decision)

### Accountability and Track Record

**Principle:** Show past predictions/recommendations and their outcomes.

**Technique: Own past mistakes**
- "Last year I recommended X, which underperformed. Here's what I learned: [lessons]. That's why this recommendation is different: [how you applied the lessons]."

**Technique: Show calibration**
- "In Q1, I forecasted 15-20% growth with 70% confidence. Actual: 18% growth (within range). In Q2, I forecasted 25-30% with 60% confidence. Actual: 22% growth (below range because of market headwinds we didn't anticipate). Here's my Q3 forecast and confidence level..."

**Technique: Commitments with skin in the game**
- ❌ No stakes: "I think this will work"
- ✅ Stakes: "I'm confident enough to commit: if this doesn't hit 80% of target by Q2, I'll personally lead the pivot plan"

---

## 5. Testing and Iteration

### Pre-Testing Your Narrative

**Before sending to the full audience, test with a representative sample.**

**Technique: "Stupid questions" test**
1. Find someone from the target audience who hasn't seen your draft
2. Ask them to read it quickly (how they'll actually consume it)
3. Ask: "What's the main point?" (tests headline clarity)
4. Ask: "What should I do next?" (tests CTA clarity)
5. Ask: "What questions do you have?" (reveals gaps)
6. Watch for confused expressions (signals unclear points)

**Technique: Read-aloud test**
- Read your draft aloud to yourself
- Mark anywhere you stumble (probably unclear writing)
- Mark anywhere you get bored (probably too long or off-topic)
- Fix those sections

**Technique: Time-constraint test**
- Give someone 30 seconds to read your email/memo
- Ask what they remember (should be the headline + one key point)
- If they can't recall your main point, your headline isn't clear enough

### Iteration Based on Feedback

**Common feedback patterns and fixes:**

**"I don't understand why this matters"**
- Fix: Add a stakes section showing the cost of inaction
- Fix: Connect to the audience's stated priorities more explicitly

**"This feels biased"**
- Fix: Acknowledge opposing viewpoints before countering them
- Fix: Show data that goes against your conclusion, then explain why you still recommend it

**"Too long, didn't read it all"**
- Fix: Add an executive summary at the top
- Fix: Cut 30% (be ruthless - most drafts have filler)
- Fix: Use more headers and bullet points for scannability

**"What about [objection]?"**
- Fix: Add an FAQ or objection-handling section
- Fix: Preemptively address the top 3 objections in the main narrative

**"I need more proof"**
- Fix: Add specific examples, data, or case studies
- Fix: Cite credible external sources (not just your opinion)

### A/B Testing Headlines and CTAs

**When communicating to large audiences (email campaigns, announcements, marketing), test variations.**

**Headlines to test:**
- Question vs statement: "Should we migrate to API v2?" vs "We're migrating to API v2 on Dec 31"
- Benefit vs loss: "Gain 3x performance" vs "Don't fall behind competitors"
- Specific vs general: "Save $500K annually" vs "Reduce costs significantly"

**CTAs to test:**
- Urgent vs no deadline: "Reply by Friday" vs "Reply when ready"
- Specific vs open: "Approve budget increase" vs "Share your thoughts"
- One ask vs multiple: "Click here to migrate" vs "Migrate, read FAQ, or contact support"

**Measurement:**
- Email: Open rate (headline test), click-through rate (CTA test), reply rate (overall effectiveness)
- Presentation: Questions asked (clarity test), decision made in meeting (persuasiveness)
- Written: Time on page (engagement), scroll depth (how far people read), CTA completion rate (see the significance-check sketch below)

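Before declaring a winning variant, it helps to check that the observed difference is larger than random noise. Below is a minimal two-proportion z-test sketch using only the Python standard library; the send and open counts are hypothetical, and for small samples a proper stats package is the safer choice:

```python
# Two-proportion z-test for an A/B subject-line test.
# Hypothetical counts; for small samples, prefer a proper stats package.
from math import erf, sqrt

def two_proportion_p_value(opens_a: int, sent_a: int,
                           opens_b: int, sent_b: int) -> float:
    """Two-sided p-value for a difference in open rates."""
    p_a, p_b = opens_a / sent_a, opens_b / sent_b
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    se = sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (p_a - p_b) / se
    # Convert |z| to a two-sided p-value via the normal CDF
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Headline A: 220 opens out of 1,000 sent; Headline B: 260 out of 1,000
p = two_proportion_p_value(220, 1000, 260, 1000)
print(f"p-value: {p:.3f}")  # roughly 0.037: unlikely to be noise at the 5% level
```

If the p-value is large, keep the test running or treat the variants as equivalent rather than crowning a winner.
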
---

## Advanced Narrative Techniques

### Metaphor Frameworks

**When explaining complex concepts to non-experts.**

**Requirements for good metaphors:**
1. From the audience's domain (don't explain the unfamiliar with the unfamiliar)
2. Structural similarity (matches key relationships, not just surface features)
3. Highlights the key insight (clarifies the main point, not just decorative)

**Example: Explaining Microservices**
- ❌ Poor metaphor: "Like Lego blocks" (doesn't explain communication complexity)
- ✅ Good metaphor: "Like specialized teams (frontend, backend, database) vs generalists. Teams can move faster independently, but now need coordination meetings (APIs) and shared understanding (contracts). When coordination breaks down (service outage), one team being down can block others."

**Framework: "It's like... but..."**
- "It's like [familiar thing], but [key difference that matters]"
- Example: "Technical debt is like financial debt—you borrow speed now, pay interest later. But unlike financial debt, technical debt's interest compounds faster and isn't visible on balance sheets."

### Narrative Arcs for Long-Form Content

**Three-Act Structure (Hollywood):**
1. **Act 1 (Setup - 25%):** Introduce status quo, characters, problem
2. **Act 2 (Conflict - 50%):** Obstacles, rising tension, attempts that fail
3. **Act 3 (Resolution - 25%):** Climax, solution, new equilibrium

**Example: Product Pivot Story**
- Act 1: "We launched in 2022 targeting small businesses with a $50/month SaaS tool. The first 6 months looked great: 200 customers, $10K MRR, good engagement."
- Act 2: "But at month 9, churn spiked to 15% monthly. We tried: (1) More features—churn stayed high. (2) Better support—churn stayed high. (3) Lower price—churn got worse. We were burning cash and running out of runway."
- Act 3: "We interviewed 50 churned customers. Breakthrough: Small businesses needed results in days, not months—they couldn't wait for ROI. We rebuilt as a done-for-you service, not self-serve SaaS. Churn dropped to 2%, price increased 5x, customers became advocates. We hit $500K ARR 6 months later."

**In-Medias-Res (Start in the Middle):**
- Begin with a high-tension moment, then flash back to explain how you got there
- Example: "It was 2am on Saturday when the CEO texted: 'How bad is it?' Our main database was corrupted, affecting 80% of customers. We had 6 hours before the Monday-morning east coast wakeup. Here's what happened and what we learned."

### Emotional Arcs

**Different situations require different emotional journeys.**

**Inspiration Arc (Hero's Journey):**
- Start: Ordinary world (relatability)
- Middle: Challenge and struggle (empathy)
- End: Triumph and transformation (hope)
- **Use for:** Change management, celebrating wins, motivating teams

**Trust Arc (Crisis Communication):**
- Start: Acknowledge the problem (honesty)
- Middle: Show accountability and action (responsibility)
- End: Commit to transparency and improvement (restoration)
- **Use for:** Incidents, mistakes, sensitive topics

**Persuasion Arc (Proposal):**
- Start: Shared problem (alignment)
- Middle: Solution with proof (logic)
- End: Clear path forward (confidence)
- **Use for:** Recommendations, project proposals, strategic plans

**Avoid emotional manipulation:**
- ✅ Authentic emotion from real stories
- ❌ Manufactured emotion through exaggeration or fear-mongering
- ✅ Hope grounded in evidence
- ❌ False hope that ignores real risks

260
skills/communication-storytelling/resources/template.md
Normal file
@@ -0,0 +1,260 @@
# Communication Storytelling Template

## Workflow

Copy this checklist and track your progress:

```
Communication Template Progress:
- [ ] Step 1: Gather audience intelligence
- [ ] Step 2: Choose story structure
- [ ] Step 3: Draft narrative elements
- [ ] Step 4: Refine and polish
- [ ] Step 5: Self-assess quality
```

**Step 1:** Ask the input questions below. Deep understanding of the audience is critical. See [Input Questions](#input-questions).
**Step 2:** Match structure to situation. See [Story Structures](#story-structures).
**Step 3:** Fill in the quick template. Write the headline first. See [Quick Template](#quick-template) and [Field Guidance](#field-guidance).
**Step 4:** Apply storytelling techniques. See [Storytelling Techniques](#storytelling-techniques).
**Step 5:** Check against quality criteria. See [Quality Checklist](#quality-checklist).

---

## Input Questions

**Required inputs:**

1. **Message** - What information/analysis/data needs to be communicated? What's the core insight, supporting evidence, backstory?
2. **Audience** - Who specifically (name/role)? Expertise level? What do they care about? Decision authority? Time available?
3. **Purpose** - What should the audience do: be informed, persuaded, inspired, or reassured (trust-building)?
4. **Context** - Stakes (low/medium/high)? Urgency? Sentiment (good/bad/neutral)? What do they already know?
5. **Tone** - Formal vs casual? Optimistic vs realistic? Urgent vs measured? Celebratory vs sobering?

**Audience intelligence:**
- What keeps them up at night?
- What jargon do they use vs not understand?
- What data/stories resonate with them?
- What's their default response (skeptical, supportive, risk-averse)?

---

## Quick Template

```markdown
# [Headline: One-sentence essence of your message]

## Key Points

### 1. [First key point - most important or foundational]
**Proof:** [Concrete evidence: data, example, quote, story]

### 2. [Second key point - builds on first]
**Proof:** [Concrete evidence with comparison or context]

### 3. [Third key point - completes the arc]
**Proof:** [Concrete evidence tied to audience's priorities]

## Call to Action
[One clear statement of what audience should do next]

---
## Context (optional)
- **Audience:** [Who this is for]
- **Background:** [What they need to know]
- **Stakes:** [Why this matters]
```

---

## Story Structures

### 1. Problem-Solution-Benefit (PSB)
**Best for:** Recommendations, proposals, project updates

**Structure:** (1) Problem - clearly defined issue with quantified stakes, (2) Solution - your recommendation with rationale, (3) Benefit - tangible outcomes with numbers, (4) Next Steps - concrete ask with timeline

**Example:** "We should switch to monthly subscription pricing because it solves our unpredictable revenue problem. Current annual contracts create feast-or-famine cashflow (Q1 $800K, Q2 $200K). Monthly subscriptions smooth revenue, reduce the sales cycle from 6 weeks to 1 week, and give us faster product feedback. Expected impact: +40% revenue predictability, -50% sales cycle time. Next step: 90-day pilot with new customers."

### 2. Hero's Journey (Transformation)
**Best for:** Major changes, pivots, overcoming challenges

**Structure:** (1) Where We Were - comfortable but problem lurking, (2) Why We Had to Change - problem emerges, (3) What We Tried - struggles that build credibility, (4) What Worked - the breakthrough with evidence, (5) What We Do Now - new normal with lessons, (6) Call to Action

**Example:** "We've transformed from product-first to customer-first. For 3 years, we built features our engineers loved but churned 15% monthly. Data showed customers didn't understand our value. We tested: (1) Better onboarding—5% improvement, (2) Simpler pricing—no change, (3) Talking to 100 churned customers—breakthrough. They needed workflow automation, not platform flexibility. We rebuilt around workflows, churn dropped to 3%, NPS jumped from 32 to 61. Now every feature starts with customer interviews. Recommendation: Every team should talk to 5 customers this quarter."

### 3. Before-After-Bridge (BAB)
**Best for:** Product launches, feature announcements, process improvements

**Structure:** (1) Before - painful current state (audience's lived experience), (2) After - improved future state (what becomes possible), (3) Bridge - your solution (the path forward), (4) Call to Action

**Example:** "Before: Customer success team spends 15 hours/week manually pulling data, creating charts, personalizing emails—error-prone and soul-crushing. After: One-click automated reports with live data, personalized insights, beautiful visuals—freeing 15 hours for high-value retention work. Bridge: We built CS Analytics Suite, launching March 1st with training and support. Sign up by Feb 15 for early access."

### 4. Situation-Complication-Resolution (SCR)
**Best for:** Executive communications, board updates, investor relations

**Structure:** (1) Situation - neutral baseline with context, (2) Complication - what changed or what's at stake, (3) Resolution - your strategy to address it, (4) Implications - impact on goals/priorities, (5) Call to Action

**Example:** "Situation: We planned $10M ARR by Q4 with 70% from enterprise. Complication: Our top 3 enterprise deals ($2M ARR) pushed to 2025 due to budget freezes—we'll miss the target by 20%. Resolution: Pivoting resources to mid-market (faster sales cycles). Reallocating 2 AEs from enterprise to mid-market, launching a self-serve tier for sub-$50K deals, extending runway by cutting non-essential hiring. Revised target: $8.5M ARR, 50% enterprise/50% mid-market. Risk: mid-market has higher churn, but faster iteration. Ask: Approve the budget reallocation and delay 3 engineering hires to Q1 2025."

---

## Field Guidance
|
||||
|
||||
### Crafting Headlines
|
||||
**Purpose:** Capture essence in one sentence. Reader understands your core message in 10 seconds.
|
||||
|
||||
**Formula options:**
|
||||
- Recommendation: "We should [do X] because [key reason]"
|
||||
- Insight: "[Counter-intuitive finding] means [implication]"
|
||||
- Achievement: "We've [accomplished X], proven by [key metric]"
|
||||
- Problem: "[Problem] is costing us [quantified impact]"
|
||||
|
||||
**Good:** ✅ "We've reached product-market fit (proven by 28% growth with flat sales capacity)"
|
||||
**Bad:** ❌ "Q3 Business Review" (too vague, no insight)
|
||||
|
||||
### Structuring Key Points
|
||||
**Purpose:** 3-5 supporting ideas with logical flow. Each point must be distinct, well-supported, and advance the narrative.
|
||||
|
||||
**Flow options:** Chronological (past→present→future) | Problem→Solution (pain→fix→benefit) | Importance-ranked (critical→supporting→implications)
|
||||
|
||||
**Each key point needs:**
|
||||
- Clear claim (one sentence)
|
||||
- Concrete proof (data, examples, quotes)
|
||||
- Connection to headline
|
||||
|
||||
**Testing:** Can you defend this with evidence? Is it distinct from other points? Does it pass "so what?" test? Would removing it weaken your case?
|
||||
|
||||
### Adding Proof
|
||||
**Purpose:** Evidence that substantiates claims. Without proof, you're asking blind trust.
|
||||
|
||||
**Types:**
|
||||
1. **Quantitative:** "Churn dropped from 8% to 3% monthly" | "2x faster than industry benchmark"
|
||||
2. **Qualitative:** Customer quotes | User stories | Expert opinions
|
||||
3. **Examples:** Case studies | Analogies | Scenarios
|
||||
4. **Logic:** First principles | Causal chains | Elimination ("tested 5, 4 failed")
|
||||
|
||||
**Best practices:** One key piece per point | Cite sources | Use comparisons for context | Humanize data with stories
|
||||
|
||||
### Writing Call-to-Action
**Purpose:** Clear statement of what audience should do next.

**Strong CTAs:** ✅ Specific ("Approve $500K budget"), Achievable ("Attend 30-min demo"), Time-bound ("Respond by Friday"), Clear owner
**Weak CTAs:** ❌ Vague ("Think about this"), Passive ("It would be great if..."), No timeline, Too many asks

**By purpose:**
- Inform: "No action needed, will update next week"
- Persuade: "Approve [decision] by [date]"
- Inspire: "Each team should [action] this quarter"
- Build trust: "Here's our plan, open to feedback by [date]"

---

## Storytelling Techniques

### Specificity Over Generality
❌ Vague: "We have many happy customers"
✅ Specific: "47 customers gave 5-star reviews in past month, 32 mentioning faster load times"

❌ Vague: "This will improve efficiency"
✅ Specific: "This reduces manual work from 15 hours/week to 2 hours, freeing 520 hours annually per team member"

### Show, Don't Tell
❌ Tell: "Our customers love the new feature"
✅ Show: "Within 2 weeks, 83% of active users tried the new feature, 67% use it daily. Acme Corp said it 'eliminated 3 hours of manual work each day.'"

❌ Tell: "The problem is urgent"
✅ Show: "We lose $50K in potential revenue every week this persists. Three major prospects ($1.5M ARR combined) cited this as their reason for choosing competitors."

### Humanize Data
❌ Raw: "Customer acquisition cost increased 23% in Q3"
✅ Humanized: "We're now spending $1,150 to acquire each customer, up from $935—that's the cost of hiring an intern for a month, per customer"

❌ Raw: "System latency is 450ms at p95"
✅ Humanized: "When customers click 'Submit Order,' they wait almost half a second—long enough to wonder if it worked and click again, creating duplicate orders"

### Build Tension and Resolution
❌ No tension: "We launched a new feature and it's doing well. Adoption is 75%."

✅ With tension: "After 6 months building feature X, we launched to crickets—only 12% adoption in week 1. We interviewed non-adopters and discovered they didn't understand it solved their problem. We rewrote in-app messaging to show use cases, and adoption jumped to 75% in 2 weeks."

**Pattern:** Set up expectation → Introduce complication → Show struggle → Reveal resolution → Extract lesson

### Use Analogies from Audience's Domain
- Technical→Executive: "Microservices are like specialized teams instead of generalists—faster iteration, but requires coordination overhead"
- Data→Business: "This metric is a leading indicator, like smoke before fire—by the time revenue drops, we've already lost the battle"

### Lead with Insight, Not Data
❌ Data-first: "Here are our Q3 numbers: $2.3M revenue, 520 customers, 3.2% churn, 58 NPS, 28% QoQ growth."

✅ Insight-first: "We've hit product-market fit. The proof: Revenue grew 28% while sales capacity stayed flat—customers are pulling product from us. Churn dropped 36% as we focused on power users. NPS jumped 16 points to 58, driven by the exact 3 features we bet on."

**Pattern:** Lead with conclusion → Support with 2-3 key data points → Explain what data means → State implications

---

## Quality Checklist

Before delivering, verify:

**Structure:**
- [ ] Headline captures essence in one sentence
- [ ] 3-5 key points (distinct, not overlapping)
- [ ] Logical flow (points build on each other)
- [ ] Clear call-to-action

**Proof:**
- [ ] Every claim has concrete evidence
- [ ] Data is specific (numbers, names, dates)
- [ ] Sources cited for major claims
- [ ] Comparisons provide context

**Audience Fit:**
- [ ] Jargon appropriate for expertise level
- [ ] Tone matches situation
- [ ] Length matches time constraints
- [ ] Addresses audience's key concerns
- [ ] Front-loads conclusions

**Storytelling:**
- [ ] Uses specifics over generalities
- [ ] Shows rather than tells
- [ ] Humanizes data with context/stories
- [ ] Builds tension and resolution where appropriate
- [ ] Uses analogies from audience's domain

**Clarity:**
- [ ] Can summarize in one sentence
- [ ] Each point passes "so what?" test
- [ ] No passive voice hiding accountability
- [ ] No walls of text (uses white space, bullets, headers)
- [ ] Reads aloud smoothly

**Integrity:**
- [ ] Acknowledges limitations and risks
- [ ] No misleading by omission
- [ ] States assumptions explicitly
- [ ] Distinguishes facts from speculation
- [ ] Takes accountability

**Impact:**
- [ ] Creates urgency if action needed
- [ ] Builds confidence if trust needed
- [ ] Provides clear next steps
- [ ] Memorable (one key takeaway sticks)

---

## Common Pitfalls

**Burying the lede:** ❌ "Let me give background... [500 words later] ...so here's what we should do" | ✅ "We should do X. Here's why: [background as support]"

**Death by bullets:** ❌ Slide deck with 50 bullets, no narrative thread | ✅ Narrative with bullets supporting each point

**Jargon mismatch:** ❌ Using terms executives don't know (or dumbing down for experts) | ✅ Matching sophistication to audience, defining terms when needed

**Data without interpretation:** ❌ "Churn is 3.2%, NPS is 58, CAC is $1,150" | ✅ "We're retaining customers well (3.2% churn is top quartile), but they're expensive to acquire ($1,150 CAC means 18-month payback)"

**Vague CTA:** ❌ "Let's be more customer-focused" | ✅ "Each team should interview 5 customers this quarter using the research guide [link]"

**No stakes:** ❌ "We should consider improving page load time" | ✅ "Page load time is 2.5 seconds, costing us 30% of potential conversions ($800K annually). Optimizing to 1 second recovers $240K in year 1."
171
skills/constraint-based-creativity/SKILL.md
Normal file
@@ -0,0 +1,171 @@
---
name: constraint-based-creativity
description: Use when brainstorming feels stuck or generates obvious ideas, need to break creative patterns, working with limited resources (budget/time/tools/materials), want unconventional solutions, designing with specific limitations, user mentions "think outside the box", "we're stuck", "same old ideas", "tight constraints", "limited budget/time", or seeking innovation through limitation rather than abundance.
---

# Constraint-Based Creativity

## Table of Contents
- [Purpose](#purpose)
- [When to Use](#when-to-use)
- [What Is It](#what-is-it)
- [Workflow](#workflow)
- [Constraint Types](#constraint-types)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

Turn limitations into creative fuel by strategically imposing constraints that force novel thinking, break habitual patterns, and reveal unexpected solutions.

## When to Use

**Invoke this skill when you observe:**
- Unconstrained brainstorming produces predictable, generic ideas
- Team defaulting to "same old approaches"
- Creative block despite ample resources
- Need to work within tight limitations (budget, time, materials, technical)
- Want to differentiate from competitors using similar unlimited resources
- Seeking simplicity and focus over feature bloat
- Innovation feels incremental rather than breakthrough

**Common trigger phrases:**
- "We keep coming up with the same ideas"
- "How do we innovate on a tight budget?"
- "Think outside the box"
- "We're stuck"
- "What if we could only use X?"
- "Design this with severe limitations"
- "Create something radically different"

## What Is It

**Constraint-based creativity** deliberately limits freedom (resources, rules, materials, format) to force creative problem-solving. Paradoxically, constraints often boost creativity by:

1. **Reducing decision paralysis** - Fewer options = clearer focus
2. **Breaking habitual patterns** - Can't use usual solutions
3. **Forcing novel combinations** - Must work with what's allowed
4. **Increasing psychological safety** - "We had to because of X"
5. **Creating memorable differentiation** - Constraints make solutions distinctive

**Quick example:** Twitter's original 140-character limit forced concise, punchy writing. Haiku's 5-7-5 syllable structure produces poetry. $10K budget forces guerrilla marketing over Super Bowl ads. Building with only CSS (no images) creates distinctive visual style.

## Workflow

Copy this checklist and track your progress:

```
Constraint-Based Creativity Progress:
- [ ] Step 1: Understand the problem and context
- [ ] Step 2: Choose or design strategic constraints
- [ ] Step 3: Generate ideas within constraints
- [ ] Step 4: Evaluate and refine solutions
- [ ] Step 5: Validate quality and deliver
```

**Step 1: Understand the problem and context**

Ask user for the creative challenge (what needs solving), current state (what's been tried, why it's not working), ideal outcome (success criteria), and any existing constraints (real limitations already in place). Understanding why ideas feel stale or stuck helps identify which constraints will unlock creativity. See [Constraint Types](#constraint-types) for strategic options.

**Step 2: Choose or design strategic constraints**

If user has existing constraints (tight budget, short timeline, limited materials) → Use [resources/template.md](resources/template.md) to work within them creatively. If no constraints exist and ideation is stuck → Study [resources/methodology.md](resources/methodology.md) to design strategic constraints that force new thinking patterns. Choose 1-3 constraints maximum to avoid over-constraining.

**Step 3: Generate ideas within constraints**

Apply chosen constraints rigorously during ideation. Create `constraint-based-creativity.md` file documenting: problem statement, active constraints (what's forbidden/required/limited), idea generation process, and all ideas produced (including "failed" attempts that revealed insights). Quantity matters - aim for 20+ ideas before evaluating. See [resources/template.md](resources/template.md) for structured generation process.

**Step 4: Evaluate and refine solutions**

Assess ideas using dual criteria: (1) Does it satisfy all constraints? (2) Does it solve the original problem? Select strongest 2-3 ideas. Refine by combining elements, removing unnecessary complexity, and strengthening the constraint-driven insight. Document why certain ideas stand out - often the constraint reveals an unexpected angle. See [resources/methodology.md](resources/methodology.md) for evaluation frameworks.

**Step 5: Validate quality and deliver**

Self-assess using [resources/evaluators/rubric_constraint_based_creativity.json](resources/evaluators/rubric_constraint_based_creativity.json). Verify: constraints were genuinely respected (not bent/broken), solutions are novel (not slight variations of existing), the constraint created the creativity (solution wouldn't exist without it), ideas are actionable (not just conceptual), and creative insight is explained (why this constraint unlocked this idea). Minimum standard: Average score ≥ 3.5. Present completed `constraint-based-creativity.md` file highlighting the constraint-driven breakthroughs.

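To make the scoring gate concrete, here is a minimal sketch of the average-score check (TypeScript; the criterion names and scores in the usage line are illustrative placeholders, not taken from the rubric file):

```typescript
// Minimal sketch of the Step 5 quality gate: the average rubric score
// must meet the threshold (3.5 for standard challenges).
type RubricScores = Record<string, number>; // criterion name -> score (1-5)

function passesGate(scores: RubricScores, threshold = 3.5): boolean {
  const values = Object.values(scores);
  const average = values.reduce((sum, s) => sum + s, 0) / values.length;
  return average >= threshold;
}

// Illustrative usage (placeholder criteria and scores):
passesGate({ "Constraint Integrity": 4, "Solution Novelty": 3, "Problem-Solution Fit": 4 });
```
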
## Constraint Types

Strategic constraints fall into categories. Choose based on what pattern you want to break:

**Resource Constraints** (force efficiency):
- Budget: "Design this campaign for $500" (vs typical $50K)
- Time: "Ship in 48 hours" (vs typical 2-week sprint)
- Team: "Solo project only" or "No engineers"
- Materials: "Found objects only" or "Recyclables only"

**Format/Medium Constraints** (force adaptation):
- Length: "Explain in 10 words" or "Story in 6 words"
- Medium: "Text only, no images" or "Visual only, no words"
- Platform: "Twitter thread only" (vs blog post)
- Dimensions: "Square format only" or "Vertical video"

**Rule-Based Constraints** (force creative workarounds):
- Forbidden elements: "No letter 'e'" or "No adjectives"
- Required elements: "Must include these 3 objects"
- Style rules: "Hemingway style only" or "As if Shakespeare"
- Process rules: "No editing, one-take only"

**Technical Constraints** (force optimization):
- Code: "100 lines maximum" or "No external libraries"
- Performance: "Load in <1 second" or "Run on 1990s hardware"
- Data: "No PII collection" or "Works offline"
- Compatibility: "Text-based only (ASCII art)"

**Audience/Perspective Constraints** (force reframing):
- Audience: "Explain to 5-year-olds" or "For experts only"
- Perspective: "First person only" or "Second person only"
- Tone: "No corporate speak" or "Casual only"
- Voice: "Write as [specific person/character]"

## Common Patterns

**Pattern: The Limitation Sprint**
When team is stuck, run 30-minute sprints with different constraints. Example: "10 ideas using only free tools" → "10 ideas in black/white only" → "10 ideas for $100 budget." Constraint rotation prevents pattern fixation.

**Pattern: The Subtraction Game**
Remove assumed "essentials" one at a time. Example: "App without login" → "App without UI" (voice only) → "App without internet" (offline-first). Forces questioning assumptions.

**Pattern: The Format Flip**
Change medium to force different thinking. Example: "Explain strategy as a recipe" or "Present roadmap as a movie trailer" or "Write documentation as a children's book."

**Pattern: The Resource Inversion**
Make the assumed limitation the focus. Example: "We have no budget" → "Build guerrilla marketing campaign showcasing zero-budget creativity" or "Only 2-person team" → "Sell the 'small team, big care' advantage."

**Pattern: The Historical Constraint**
Impose constraints from different eras. Example: "Design this as if it's 1995" (pre-smartphone) or "Build this with Victorian-era materials" or "Market this like 1960s Mad Men."

## Guardrails

**✓ Do:**
- Choose constraints that directly counter the creative block (stuck on complexity → simplicity constraint)
- Enforce constraints rigorously during ideation (no "bending" rules mid-process)
- Generate high volume before judging (quantity first, then quality)
- Document failed ideas - they often contain seeds of insight
- Explain how the constraint created the solution (causality matters)
- Use multiple different constraints in sequence (sprint pattern)

**✗ Don't:**
- Over-constrain (3+ simultaneous constraints usually paralyzes)
- Choose arbitrary constraints unrelated to the problem
- Abandon constraints when ideas get hard (that's when breakthroughs happen)
- Confuse constraint-based creativity with regular brainstorming
- Accept slight variations of existing ideas (constraint should force novelty)
- Skip the evaluation step (need to validate constraint drove the creativity)

## Quick Reference

**Resources:**
- `resources/template.md` - Structured process for generating ideas within constraints
- `resources/methodology.md` - Advanced techniques for designing strategic constraints, combining constraint types, and systematic exploration
- `resources/examples/` - Worked examples showing constraint-driven breakthroughs
- `resources/evaluators/rubric_constraint_based_creativity.json` - Quality assessment before delivery

**When to choose which resource:**
- Working with existing constraints (budget, time, technical) → Start with template
- No constraints but ideation is stuck → Study methodology to design constraints
- Need to see examples of breakthroughs → Review examples folder
- Before delivering to user → Always validate with rubric

**Expected deliverable:**
`constraint-based-creativity.md` file containing: problem statement, chosen constraints with rationale, idea generation process (including volume metrics), top 2-3 solutions with refinement notes, explanation of how constraints drove creativity, and next steps.
289
skills/constraint-based-creativity/resources/evaluators/rubric_constraint_based_creativity.json
Normal file
@@ -0,0 +1,289 @@
{
  "criteria": [
    {
      "name": "Constraint Integrity",
      "description": "Are constraints clearly defined, rigorously enforced, and genuinely respected throughout the process?",
      "scoring": {
        "1": "Constraints poorly defined or ignored. Ideas violate stated constraints. No evidence of enforcement.",
        "2": "Constraints defined but loosely enforced. Some ideas 'bend' rules. Inconsistent application.",
        "3": "Constraints clearly defined and mostly enforced. Occasional flexibility but generally respected.",
        "4": "Constraints rigorously enforced. All top solutions fully respect constraints. Clear rationale for each constraint.",
        "5": "Exemplary constraint discipline. Constraints treated as hard rules, not suggestions. Rationale explains WHY each constraint unlocks creativity. Enforcement explicit."
      }
    },
    {
      "name": "Constraint-Creativity Causality",
      "description": "Can you explain HOW the constraint drove the creative solution? Would this solution exist without the constraint?",
      "scoring": {
        "1": "No causality explained. Solution could exist in unconstrained brainstorming. Constraint seems arbitrary.",
        "2": "Weak causality. Constraint mentioned but connection to solution unclear. Seems like regular brainstorming with constraints added after.",
        "3": "Moderate causality. Can see how constraint influenced solution, but solution might exist without it.",
        "4": "Strong causality. Clear explanation of how constraint unlocked this specific solution. Unlikely to exist without constraint.",
        "5": "Exemplary causality. Detailed explanation of thinking pattern broken, unexpected angle revealed, and why solution is impossible in unconstrained brainstorming. Constraint-creativity link is explicit and compelling."
      }
    },
    {
      "name": "Idea Volume & Process Rigor",
      "description": "Was sufficient volume generated (20+ ideas)? Were 'failed' ideas documented? Was quantity prioritized before quality?",
      "scoring": {
        "1": "Minimal ideas (< 5). No documentation of process. Appears to have evaluated while generating (quality-first approach).",
        "2": "Low volume (5-10 ideas). Some process notes. Evaluation may have started too early.",
        "3": "Adequate volume (10-20 ideas). Process documented. Some failed ideas captured. Mostly quantity-first.",
        "4": "Good volume (20-30 ideas). Failed ideas documented with insights. Clear quantity-first approach. Process includes timestamps or rounds.",
        "5": "Exceptional volume (30+ ideas). Comprehensive documentation of all ideas including failures. Explicit insights from failed attempts. Timeboxed generation rounds. Volume metrics tracked."
      }
    },
    {
      "name": "Solution Novelty",
      "description": "Are solutions novel and differentiated, or incremental variations of existing approaches?",
      "scoring": {
        "1": "Derivative. Solutions are essentially same as existing approaches with minor tweaks.",
        "2": "Incremental. Solutions are improvements but not fundamentally different.",
        "3": "Fresh. Solutions have interesting twists that make them distinctive.",
        "4": "Novel. Solutions take unexpected angles that competitors aren't pursuing.",
        "5": "Breakthrough. Solutions are completely unexpected, highly differentiated, and clearly constraint-driven. Competitors would be surprised by these approaches."
      }
    },
    {
      "name": "Problem-Solution Fit",
      "description": "Do solutions actually solve the original creative challenge? Do they meet success criteria?",
      "scoring": {
        "1": "Poor fit. Solutions don't address the original problem or miss key success criteria.",
        "2": "Partial fit. Solutions address some aspects but miss others. Success criteria partially met.",
        "3": "Adequate fit. Solutions address the problem and meet most success criteria.",
        "4": "Strong fit. Solutions fully address problem and meet all stated success criteria.",
        "5": "Exceptional fit. Solutions exceed success criteria and address unstated needs. Problem-solution alignment is obvious and well-explained."
      }
    },
    {
      "name": "Actionability & Implementation Detail",
      "description": "Can solutions be executed? Are there concrete next steps, timelines, and resource requirements?",
      "scoring": {
        "1": "Purely conceptual. No implementation details. Cannot execute without extensive additional planning.",
        "2": "Vague actionability. Some implementation hints but missing key details (timeline, resources, steps).",
        "3": "Actionable. Solutions include basic implementation notes and next steps.",
        "4": "Highly actionable. Detailed implementation plans with timeline, resource allocation, and execution steps.",
        "5": "Immediately executable. Comprehensive implementation plans with timeline, budget breakdown, success metrics, risk mitigation, and first actions clearly defined. Could start executing today."
      }
    },
    {
      "name": "Strategic Constraint Design",
      "description": "Were constraints strategically chosen to counter the creative block? Is there clear rationale for each constraint?",
      "scoring": {
        "1": "Arbitrary constraints. No clear connection to the creative challenge. Constraints seem random.",
        "2": "Loosely related constraints. Some connection to problem but not strategic.",
        "3": "Relevant constraints. Clear connection to creative challenge. Rationale provided.",
        "4": "Strategic constraints. Deliberately chosen to counter diagnosed creative block. Rationale explains connection.",
        "5": "Exemplary constraint strategy. Diagnosed creative block type explicitly. Constraints specifically designed to counter that block. Rationale explains not just WHAT constraints but WHY these specific ones unlock creativity for this specific challenge."
      }
    },
    {
      "name": "Constraint Tightness (Sweet Spot)",
      "description": "Are constraints tight enough to force creativity but not so tight they cause paralysis?",
      "scoring": {
        "1": "Too loose (no creative tension) OR too tight (paralysis, zero ideas). Missed the sweet spot entirely.",
        "2": "Somewhat off. Either didn't force much creativity OR made ideation very difficult.",
        "3": "Adequate tightness. Constraints forced some rethinking. Generated ideas with effort.",
        "4": "Good tightness. Constraints created productive struggle. Breakthrough zone reached.",
        "5": "Perfect tightness. Constraints forced radical creativity without paralysis. Clear evidence of being in the 'breakthrough zone'. May have used constraint escalation to find sweet spot."
      }
    },
    {
      "name": "Risk Honesty & Limitation Acknowledgment",
      "description": "Are risks, limitations, and potential failures honestly acknowledged?",
      "scoring": {
        "1": "No risk acknowledgment. Solutions presented as perfect with no downsides.",
        "2": "Minimal risk acknowledgment. Obvious limitations ignored or glossed over.",
        "3": "Adequate risk honesty. Key risks identified. Limitations mentioned.",
        "4": "Strong risk honesty. Comprehensive risk assessment. Limitations clearly stated. Some mitigation strategies.",
        "5": "Exemplary honesty. Detailed risk analysis including: execution risks, assumption risks, constraint violation risks, and market risks. Limitations honestly acknowledged. Mitigation strategies for each major risk. Failure modes considered."
      }
    },
    {
      "name": "Documentation & Explanation Quality",
      "description": "Is the constraint-based-creativity.md file complete, well-organized, and clearly explained?",
      "scoring": {
        "1": "Incomplete documentation. Missing key sections. Unclear explanations.",
        "2": "Basic documentation. Most sections present but thin on details. Some unclear areas.",
        "3": "Good documentation. All sections completed. Clear explanations. Organized logically.",
        "4": "Excellent documentation. Comprehensive coverage of all sections. Clear explanations with examples. Well-organized.",
        "5": "Exemplary documentation. Complete, detailed, well-organized file. Includes all required sections: problem, context, constraints with rationale, all ideas generated, insights from failures, top solutions with causality explanations, evaluation, breakthrough explanation, next steps. Could serve as template for others."
      }
    }
  ],
  "constraint_type_guidance": {
    "Resource Constraints (budget, time, team)": {
      "target_score": 3.5,
      "focus_criteria": [
        "Constraint-Creativity Causality",
        "Solution Novelty",
        "Strategic Constraint Design"
      ],
      "key_patterns": [
        "Solutions should make limitation a feature, not just work within it",
        "Look for 'resource inversion' - turning scarcity into advantage",
        "Efficiency innovations often emerge from resource constraints"
      ]
    },
    "Format/Medium Constraints (length, platform, style)": {
      "target_score": 3.5,
      "focus_criteria": [
        "Constraint Integrity",
        "Idea Volume & Process Rigor",
        "Solution Novelty"
      ],
      "key_patterns": [
        "Format constraints should force different communication patterns",
        "High volume generation important (many ways to adapt to format)",
        "Look for creative use of the constrained medium"
      ]
    },
    "Rule-Based Constraints (forbidden/required elements)": {
      "target_score": 4.0,
      "focus_criteria": [
        "Constraint Integrity",
        "Constraint-Creativity Causality",
        "Constraint Tightness"
      ],
      "key_patterns": [
        "Rule constraints are easiest to 'cheat' - verify rigorous enforcement",
        "Should force creative workarounds (that's the point)",
        "Tightness is critical - too loose has no effect, too tight causes paralysis"
      ]
    },
    "Technical Constraints (performance, compatibility, dependencies)": {
      "target_score": 4.0,
      "focus_criteria": [
        "Actionability & Implementation Detail",
        "Problem-Solution Fit",
        "Risk Honesty"
      ],
      "key_patterns": [
        "Technical constraints often reveal elegant solutions",
        "Implementation details are critical for technical solutions",
        "Trade-offs must be explicitly acknowledged"
      ]
    },
    "Perspective/Audience Constraints (point of view, target audience)": {
      "target_score": 3.5,
      "focus_criteria": [
        "Solution Novelty",
        "Constraint-Creativity Causality",
        "Strategic Constraint Design"
      ],
      "key_patterns": [
        "Perspective shift should reveal new angles on problem",
        "Look for reframing, not just translation",
        "Audience constraints work best when forcing empathy shift"
      ]
    }
  },
  "challenge_complexity_guidance": {
    "Simple Challenge (clear problem, straightforward constraints)": {
      "target_score": 3.0,
      "acceptable_shortcuts": [
        "Lower idea volume acceptable (15+ instead of 20+)",
        "Single constraint sufficient",
        "Less detailed implementation plans"
      ],
      "key_quality_gates": [
        "Constraint integrity still required",
        "Causality must be clear",
        "Solution must be actionable"
      ]
    },
    "Standard Challenge (typical creative block, standard constraints)": {
      "target_score": 3.5,
      "required_elements": [
        "20+ ideas generated",
        "1-2 strategic constraints",
        "Clear causality explanation",
        "Implementation plan with timeline"
      ],
      "key_quality_gates": [
        "All 10 criteria evaluated",
        "Minimum score of 3 on each criterion",
        "Average ≥ 3.5"
      ]
    },
    "Complex Challenge (multi-faceted problem, multiple constraints)": {
      "target_score": 4.0,
      "required_elements": [
        "30+ ideas across constraint combinations",
        "2-3 strategic constraints with interaction effects",
        "Detailed causality explanation for each constraint",
        "Comprehensive implementation plan with risk mitigation",
        "Constraint escalation or combination documented"
      ],
      "key_quality_gates": [
        "All 10 criteria evaluated",
        "Minimum score of 3.5 on each criterion",
        "Average ≥ 4.0",
        "Exceptional scores (5) on Causality and Strategic Design"
      ]
    }
  },
  "common_failure_modes": {
    "1. Bending Constraints": {
      "symptom": "Ideas violate constraints with justifications like 'close enough' or 'in spirit'",
      "why_it_fails": "Defeats the purpose. Breakthrough ideas come from rigorous constraint enforcement.",
      "fix": "Reject any idea that violates constraints, even slightly. The difficulty is the point.",
      "related_criteria": ["Constraint Integrity", "Constraint Tightness"]
    },
    "2. Incremental Ideas": {
      "symptom": "Solutions are slight variations of existing approaches, not fundamentally different",
      "why_it_fails": "Constraint-based creativity should produce novel solutions, not optimizations.",
      "fix": "Use novelty assessment (5-point scale). If scoring < 4, keep generating. Ask: 'Would this exist in unconstrained brainstorming?'",
      "related_criteria": ["Solution Novelty", "Constraint-Creativity Causality"]
    },
    "3. Over-Constraining": {
      "symptom": "Zero ideas generated, complete paralysis, frustration",
      "why_it_fails": "Too many constraints or too-tight constraints prevent any ideation.",
      "fix": "Reduce to 1-2 constraints maximum. Use constraint escalation to find sweet spot (start loose, progressively tighten).",
      "related_criteria": ["Constraint Tightness", "Idea Volume & Process Rigor"]
    },
    "4. Arbitrary Constraints": {
      "symptom": "Constraints have no clear connection to the creative block or challenge",
      "why_it_fails": "Random constraints don't unlock creativity, they just add busywork.",
      "fix": "Diagnose creative block first (abundance paralysis, pattern fixation, etc.). Choose constraints that counter that specific block.",
      "related_criteria": ["Strategic Constraint Design", "Constraint-Creativity Causality"]
    },
    "5. Skipping Volume Phase": {
      "symptom": "Only 3-5 ideas generated before evaluating and refining",
      "why_it_fails": "Breakthrough ideas rarely appear in first few attempts. Need quantity to find quality.",
      "fix": "Force minimum 20 ideas before ANY evaluation. Set timer and don't stop early. Document all ideas including 'bad' ones.",
      "related_criteria": ["Idea Volume & Process Rigor", "Solution Novelty"]
    },
    "6. Missing Causality": {
      "symptom": "Cannot explain how constraint drove the creativity. Solution could exist without constraint.",
      "why_it_fails": "If constraint didn't drive the solution, it's not constraint-based creativity - it's regular brainstorming with constraints added after.",
      "fix": "For each top solution, explicitly answer: 'Would this exist in unconstrained brainstorming? Why/why not?' If answer is 'yes', keep generating.",
      "related_criteria": ["Constraint-Creativity Causality", "Strategic Constraint Design"]
    },
    "7. Conceptual Without Actionable": {
      "symptom": "Solutions are interesting ideas but lack implementation details, timeline, resources",
      "why_it_fails": "Creativity without execution is just daydreaming. Need actionable plans.",
      "fix": "Add implementation section to each top solution: What are the first 3 actions? Timeline? Budget? Success metrics? If can't answer, solution is too conceptual.",
      "related_criteria": ["Actionability & Implementation Detail", "Problem-Solution Fit"]
    },
    "8. Ignoring Risks": {
      "symptom": "Solutions presented as perfect with no acknowledged downsides or risks",
      "why_it_fails": "All solutions have trade-offs. Ignoring them suggests shallow thinking or overconfidence.",
      "fix": "For each top solution, explicitly list: execution risks, assumption risks, failure modes, and limitations. If can't identify any risks, look harder.",
      "related_criteria": ["Risk Honesty & Limitation Acknowledgment", "Actionability & Implementation Detail"]
    }
  },
  "scale": {
    "description": "Each criterion scored 1-5",
    "min_score": 1,
    "max_score": 5,
    "passing_threshold": 3.5,
    "excellence_threshold": 4.5
  },
  "usage_notes": {
    "when_to_score": "After generating all ideas and selecting top solutions, before delivering to user",
    "minimum_standard": "Average score ≥ 3.5 across all criteria (standard challenge). Simple challenges: ≥ 3.0. Complex challenges: ≥ 4.0.",
    "how_to_improve": "If scoring < threshold, identify lowest-scoring criteria and iterate on those specific areas before delivering",
    "self_assessment": "Score yourself honestly. This rubric is for quality assurance, not for impressing the user. Better to iterate now than deliver weak constraint-based creativity."
  }
}
@@ -0,0 +1,465 @@
# Constraint-Based Creativity: API Design with Minimalism Constraint

## Problem Statement

Design REST API for a task management service. Need to create clean, intuitive API that developers love using. Compete with established players (Todoist, Asana, etc.) by offering superior developer experience.

## Context

**What's been tried:** Initial unconstrained design produced typical REST API:
- 23 endpoints across 6 resources
- CRUD operations for everything (tasks, projects, users, tags, comments, attachments)
- Complex nested routes: `/api/v1/projects/{id}/tasks/{taskId}/comments/{commentId}/attachments`
- Auth tokens, pagination params, filtering, sorting on every endpoint
- 147-page API documentation needed

**Why we're stuck:** API feels bloated despite following "RESTful best practices." Developers in user testing were confused by too many options. Onboarding takes 2+ hours just to understand the available endpoints. Defaulting to "add more endpoints for more use cases" pattern.

**Success criteria:**
- New developer can make first successful API call in < 5 minutes
- Complete API documentation fits on single page (not 147 pages)
- Differentiated from competitor APIs (memorable, not just "another REST API")
- Supports all core use cases without sacrificing capability

## Active Constraints

**Constraint 1: Maximum 5 HTTP Endpoints**
- **Rationale:** Forces ruthless prioritization. Can't add endpoint for every use case. Must design for composability instead of proliferation.
- **Enforcement:** API spec cannot exceed 5 endpoint definitions. Any new endpoint requires removing an existing one.

**Constraint 2: Single-Page Documentation**
- **Rationale:** If docs don't fit on one page, API is too complex. Forces clarity and simplicity in design.
- **Enforcement:** Entire API reference (all endpoints, params, responses, examples) must render on single scrollable page.

**Constraint 3: No Nested Routes Beyond 2 Levels**
- **Rationale:** Prevents complexity creep from deeply nested resources. Forces flatter, more intuitive structure.
- **Enforcement:** Routes like `/projects/{id}/tasks` are allowed (2 levels). Routes like `/projects/{id}/tasks/{taskId}/comments` are forbidden (3 levels).

## Idea Generation Process

**Technique used:** Constraint Escalation - started with 10 endpoints, progressively reduced to find creative sweet spot

**Volume:** Generated 27 different API designs across 4 constraint levels

**Mindset:** Initial resistance ("5 endpoints is impossible!"), then breakthrough moment when team realized constraint forced better abstraction.

## All Ideas Generated

### Round 1: 10 endpoints (baseline - 50% reduction from 23)

1. Standard CRUD for tasks (4 endpoints)
2. Standard CRUD for projects (4 endpoints)
3. Authentication (1 endpoint)
4. Search (1 endpoint)

- Assessment: Still too many, documentation still multi-page

### Round 2: 7 endpoints (70% reduction)

5. Combine tasks + projects into "items" resource (4 endpoints for items)
6. Auth (1 endpoint)
7. Query (1 endpoint for search/filter)
8. Batch operations (1 endpoint)

- Assessment: Better, but "items" abstraction feels forced

### Round 3: 5 endpoints (breakthrough target - 78% reduction)

9. **Design A: Everything is a "node"**
   - POST /nodes (create)
   - GET /nodes (list/search)
   - GET /nodes/{id} (read)
   - PATCH /nodes/{id} (update)
   - DELETE /nodes/{id} (delete)
   - Assessment: ⭐ Clean but too generic, loses semantic meaning

10. **Design B: Action-oriented**
    - POST /create
    - GET /query
    - POST /update
    - POST /delete
    - GET /status
    - Assessment: RESTful purists would hate it, but simple

11. **Design C: Resource + actions**
    - POST /tasks (create)
    - GET /tasks (list all with query params)
    - GET /tasks/{id} (get specific)
    - PATCH /tasks/{id} (update)
    - POST /tasks/batch (batch operations)
    - Assessment: Can't handle projects, tags, users in 5 endpoints

12. **Design D: GraphQL-like but REST**
    - POST /query (get anything)
    - POST /mutate (change anything)
    - POST /auth (authentication)
    - GET /schema (API schema)
    - GET /health (health check)
    - Assessment: ⭐ Interesting, but not really REST anymore

13. **Design E: Hypermedia-driven**
    - GET / (entry point, returns links to everything)
    - POST /{resource} (create anything)
    - GET /{resource} (list anything)
    - GET /{resource}/{id} (get anything)
    - PATCH /{resource}/{id} (update anything)
    - Assessment: ⭐⭐ Generic but powerful, documentation points to root

### Round 4: 3 endpoints (extreme - 87% reduction)

14. **Design F: Commands pattern**
    - POST /command (send any command)
    - GET /query (query any data)
    - GET / (documentation + schema)
    - Assessment: ⭐⭐ Ultra-minimal, but loses REST semantics

15. **Design G: Single endpoint**
    - POST /api (everything goes here, JSON-RPC style)
    - Assessment: Too extreme, not REST, documentation nightmare

## Insights from "Failed" Ideas

- **Designs 1-8 (10-7 endpoints):** Constraint not tight enough, still thinking in traditional CRUD patterns
- **Design G (1 endpoint):** Over-constrained to point of paralysis, lost REST principles entirely
- **Breakthrough zone:** 5 endpoints forced abstraction without losing usability
- **Key insight:** Generic resource paths (`/{resource}`) + comprehensive query params = flexibility without endpoint proliferation

## Top Solutions (Refined)

### Solution 1: Hypermedia-Driven Minimalist API

**Description:**

Five-endpoint API that uses generic resource routing + HATEOAS (Hypermedia as the Engine of Application State) to provide full functionality while staying minimal.

**The 5 Endpoints:**

```
1. GET /
   - Entry point (root document)
   - Returns all available resources and links
   - Includes inline documentation
   - Response:
     {
       "resources": ["tasks", "projects", "users", "tags"],
       "actions": {
         "create": "POST /{resource}",
         "list": "GET /{resource}",
         "get": "GET /{resource}/{id}",
         "update": "PATCH /{resource}/{id}",
         "delete": "DELETE /{resource}/{id}"
       },
       "docs": "https://docs.example.com",
       "_links": {
         "tasks": {"href": "/tasks"},
         "projects": {"href": "/projects"}
       }
     }

2. POST /{resource}
   - Create any resource (tasks, projects, users, tags, etc.)
   - Resource type determined by URL path
   - Example: POST /tasks, POST /projects
   - Response includes _links for related actions

3. GET /{resource}
   - List all items of resource type
   - Query params: ?filter, ?sort, ?limit, ?offset
   - Example: GET /tasks?filter=completed:false&sort=dueDate
   - Response includes pagination links and available filters

4. GET /{resource}/{id}
   - Retrieve specific resource by ID
   - Response includes _links to related resources and actions
   - Example: GET /tasks/123 includes link to project, assigned user

5. PATCH /{resource}/{id}
   - Update specific resource
   - Partial updates supported (send only changed fields)
   - DELETE also uses this endpoint with {"deleted": true} flag
   - Atomic updates with version checking
```

**How constraints shaped it:**

The 5-endpoint limit forced us to stop thinking "one endpoint per use case" and start thinking "generic operations on any resource." We couldn't add `/tasks/batch` or `/projects/{id}/tasks` or `/search` - those would exceed 5 endpoints. Instead, batch operations go through PATCH with arrays, nested resources are discovered via hypermedia links, search uses query params on GET.

Single-page documentation constraint forced us to make the API self-documenting (GET / returns structure) rather than writing extensive docs for 23 different endpoints. The API documentation became the API itself.

No-nesting-beyond-2-levels constraint meant we couldn't do `/projects/{id}/tasks/{taskId}/comments` - instead, comments are queried via `GET /comments?taskId=123`, which is actually simpler for client code.

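To make "generic operations on any resource" concrete, here is a minimal server-side routing sketch (TypeScript, framework-free; the in-memory store, ID scheme, and response shapes are illustrative assumptions, not part of the design spec above):

```typescript
// Sketch: five generic handlers cover every resource type.
// Adding a new resource means adding a store entry, not an endpoint.
const store: Record<string, Map<string, Record<string, unknown>>> = {
  tasks: new Map(),
  projects: new Map(),
};
let nextId = 0;

function handle(method: string, path: string, body?: Record<string, unknown>) {
  const [resource, id] = path.split("/").filter(Boolean);
  if (!resource) {
    // 1. GET / — self-documenting entry point
    return { resources: Object.keys(store), actions: { create: "POST /{resource}" } };
  }
  const table = store[resource];
  if (!table) return { error: `unknown resource: ${resource}` };
  if (method === "POST" && !id) {
    // 2. POST /{resource} — create anything
    const newId = `${resource}-${++nextId}`;
    table.set(newId, { ...body, id: newId, _links: { self: { href: `/${resource}/${newId}` } } });
    return table.get(newId);
  }
  if (method === "GET" && !id) return { data: [...table.values()] };      // 3. list
  if (method === "GET" && id) return table.get(id) ?? { error: "not found" }; // 4. read
  if (method === "PATCH" && id) {
    // 5. PATCH /{resource}/{id} — partial update (delete rides on {"deleted": true})
    const current = table.get(id);
    if (!current) return { error: "not found" };
    table.set(id, { ...current, ...body });
    return table.get(id);
  }
  return { error: "unsupported" };
}

// Same code path for every resource:
handle("POST", "/tasks", { title: "Write API docs" });
handle("GET", "/projects");
```
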
**Strengths:**
- Extreme simplicity: 5 endpoints to learn (vs 23 in original design)
- Self-documenting: GET / explains the entire API
- Extensible: Add new resources without adding endpoints
- Consistent: Same pattern for all resources (POST to create, GET to list, etc.)
- Developer-friendly: First API call can happen in 2 minutes (just GET /)
- Documentation fits on single page (literally - root response + 5 endpoint patterns)
- Hypermedia enables discovery (clients follow links rather than construct URLs)

**Implementation notes:**

**Resource Definitions:**
```typescript
// All resources share the same interface.
// ResourceLink was not defined in the draft; a minimal shape is assumed here.
type ResourceLink = { data: { type: string; id: string } };

interface Resource {
  id: string;
  type: string; // "task" | "project" | "user" | "tag"
  attributes: Record<string, any>;
  relationships?: Record<string, ResourceLink>;
  _links: {
    self: { href: string };
    [key: string]: { href: string };
  };
}
```

**Example: Create task**
```bash
POST /tasks
{
  "title": "Write API docs",
  "completed": false,
  "projectId": "proj-123"
}

Response:
{
  "id": "task-456",
  "type": "task",
  "attributes": {
    "title": "Write API docs",
    "completed": false
  },
  "relationships": {
    "project": {"data": {"type": "project", "id": "proj-123"}}
  },
  "_links": {
    "self": {"href": "/tasks/task-456"},
    "project": {"href": "/projects/proj-123"},
    "update": {"href": "/tasks/task-456", "method": "PATCH"},
    "complete": {"href": "/tasks/task-456", "method": "PATCH", "body": {"completed": true}}
  }
}
```

**Example: Query with filters**
```bash
GET /tasks?filter=completed:false&filter=projectId:proj-123&sort=-dueDate&limit=20

Response:
{
  "data": [ /* array of task resources */ ],
  "meta": {
    "total": 45,
    "limit": 20,
    "offset": 0
  },
  "_links": {
    "self": {"href": "/tasks?filter=completed:false&filter=projectId:proj-123&sort=-dueDate&limit=20"},
    "next": {"href": "/tasks?filter=completed:false&filter=projectId:proj-123&sort=-dueDate&limit=20&offset=20"}
  }
}
```

**Example: Batch update**
```bash
# Batch updates ride the PATCH endpoint with an array body.
# (The draft addressed a single task URL; the collection path is assumed here
# so the body can reference multiple tasks.)
PATCH /tasks
{
  "updates": [
    {"id": "task-456", "completed": true},
    {"id": "task-789", "completed": true}
  ]
}
```

**Risks/Limitations:**
- Generic routing may feel less "RESTful" to purists
- Requires client to understand hypermedia (though _links help)
- Query param complexity could grow (mitigate with clear documentation)
- Initial learning curve for developers used to specific endpoints

### Solution 2: Command-Query API

**Description:**

Extreme minimalism using Command Query Responsibility Segregation (CQRS) pattern with only 3 endpoints.

**The 3 Endpoints:**

```
1. POST /command
   - Send any write operation (create, update, delete)
   - Body specifies command type and parameters
   - Example:
     {
       "command": "createTask",
       "data": {"title": "Write docs", "projectId": "proj-123"}
     }

2. POST /query
   - Retrieve any data (list, get, search, filter)
   - Body specifies query type and parameters
   - Example:
     {
       "query": "getTasks",
       "filters": {"completed": false, "projectId": "proj-123"},
       "sort": ["-dueDate"],
       "limit": 20
     }

3. GET /
   - API schema and available commands/queries
   - Self-documenting entry point
```

**How constraints shaped it:**

The 3-endpoint constraint forced us completely away from REST resource-based thinking toward command/query pattern. We couldn't map resources to endpoints, so we mapped *intentions* (commands/queries) to a single endpoint each. This wouldn't exist in unconstrained design because REST resource mapping is the default pattern.

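A minimal dispatch sketch for this pattern (TypeScript; the registry entries, handler bodies, and payload shapes are illustrative assumptions):

```typescript
// Sketch: commands and queries are registry entries, not endpoints.
// Adding a capability later means one new entry; the endpoint count stays at 3.
type Handler = (data: Record<string, unknown>) => unknown;

const commands: Record<string, Handler> = {
  createTask: (data) => ({ id: "task-1", ...data }),
  completeTask: (data) => ({ id: data.id, completed: true }),
};
const queries: Record<string, Handler> = {
  getTasks: (_filters) => [{ id: "task-1", completed: false }],
};

// POST /command body: { "command": "...", "data": {...} }
function postCommand(body: { command: string; data: Record<string, unknown> }) {
  const handler = commands[body.command];
  if (!handler) throw new Error(`unknown command: ${body.command}`);
  return handler(body.data);
}

// POST /query body: { "query": "...", ...params }
function postQuery(body: { query: string } & Record<string, unknown>) {
  const handler = queries[body.query];
  if (!handler) throw new Error(`unknown query: ${body.query}`);
  return handler(body);
}

postCommand({ command: "createTask", data: { title: "Write docs" } });
postQuery({ query: "getTasks", filters: { completed: false } });
```
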
**Strengths:**
- Ultimate minimalism: 3 endpoints total
- Clear separation of reads (queries) vs writes (commands)
- All commands versioned and auditable (event sourcing compatible)
- Extremely flexible (add new commands without new endpoints)

**Risks/Limitations:**
- Not REST (breaks HTTP verb semantics)
- POST for queries feels wrong to REST purists
- Loses HTTP caching benefits (GET query would cache better)
- Requires comprehensive command/query documentation

### Solution 3: Smart Defaults API

**Description:**

5 endpoints with "intelligent defaults" that make common use cases zero-config while allowing full customization.

**The 5 Endpoints:**

```
1. GET /
   - Entry point + documentation

2. POST /{resource}
   - Create with smart defaults
   - Example: POST /tasks with just {"title": "Write docs"}
   - Auto-assigns: current user, default project, due date (24h from now)

3. GET /{resource}
   - Defaults to useful view (not everything)
   - GET /tasks → uncompleted tasks for current user, sorted by due date
   - Full query: GET /tasks?view=all&user=*&completed=*

4. GET /{resource}/{id}
   - Retrieve specific item
   - Response includes related items intelligently
   - GET /tasks/123 → includes project, assigned user, recent comments (last 5)

5. POST /{resource}/{id}/action
   - Semantic actions instead of PATCH
   - POST /tasks/123/complete (vs PATCH with {"completed": true})
   - POST /tasks/123/assign?userId=456
   - POST /tasks/123/move?projectId=789
```

**How constraints shaped it:**

The 5-endpoint limit meant we couldn't have separate endpoints for common actions (complete task, assign task, move task, etc.). Instead of PATCH with various payloads, we created a semantic action endpoint that's more intuitive for developers. The constraint forced us to think: "What actions do developers actually want?" vs "What CRUD operations exist?"

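A sketch of how the create-time defaults could behave (TypeScript; the default values mirror the ones listed under endpoint 2, while the helper name and session parameters are assumptions):

```typescript
// Sketch: POST /tasks fills in anything the caller omits.
interface NewTask {
  title: string;
  assignee?: string;
  projectId?: string;
  dueDate?: string; // ISO timestamp
}

function createTaskWithDefaults(
  input: NewTask,
  currentUser = "user-1",       // assumed session context
  defaultProject = "proj-inbox" // assumed workspace default
) {
  const in24h = new Date(Date.now() + 24 * 60 * 60 * 1000).toISOString();
  return {
    title: input.title,
    assignee: input.assignee ?? currentUser,      // default: current user
    projectId: input.projectId ?? defaultProject, // default: default project
    dueDate: input.dueDate ?? in24h,              // default: 24h from now
  };
}

// POST /tasks with just {"title": "Write docs"} yields a fully populated task:
createTaskWithDefaults({ title: "Write docs" });
```
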
**Strengths:**
- Developer-friendly (semantic actions match mental model)
- Smart defaults reduce API calls (get task includes related data)
- Progressive disclosure (simple cases are simple, complex cases possible)

**Risks/Limitations:**
- "Actions" endpoint could grow complex
- Magic defaults might surprise users
- Less "pure REST" than Solution 1

## Evaluation

**Constraint compliance:** ✓ All solutions respect 5-endpoint max (Solution 2 uses only 3), single-page documentation, and no deep nesting

**Novelty assessment:** All solutions are novel (score: 5/5)
- Solution 1: Hypermedia-driven design is uncommon in modern APIs
- Solution 2: Command-Query pattern breaks REST entirely (radical)
- Solution 3: Semantic actions vs CRUD is differentiated
- None would exist with unconstrained design (would default to 23-endpoint CRUD API)

**Problem fit:** Solutions address original challenge
- **Developer onboarding < 5 min:** ✓ GET / self-documents, simple patterns
- **Single-page docs:** ✓ All solutions achievable with 1-page documentation
- **Differentiation:** ✓ All three approaches are memorable vs typical REST APIs
- **Supports use cases:** ✓ Generic patterns support all original use cases

**Actionability:** All three designs can be implemented immediately

## Creative Breakthrough Explanation

The constraint-driven breakthrough happened when we stopped asking "How do we fit 23 endpoints into 5?" and started asking "How do we design an API that only needs 5 endpoints?"

**Thinking pattern broken:**
- Old pattern: "Each use case needs an endpoint" (additive thinking)
- New pattern: "Each endpoint should handle multiple use cases" (multiplicative thinking)

**Unexpected angle revealed:**
Minimalism isn't about removing features - it's about better abstraction. Generic `/{resource}` pattern with query params provides MORE flexibility than specific endpoints, not less.

**Why wouldn't this exist in unconstrained brainstorming:**
With no constraints, we defaulted to REST "best practices" which led to endpoint proliferation. The 5-endpoint constraint forced us to question whether those "best practices" were actually best for developer experience. Turns out, simplicity beats completeness for DX.

**Real-world validation:**
- Stripe API: Uses resource patterns (minimal endpoints)
- GitHub API v3→v4: Moved from REST to GraphQL (single endpoint) for exactly this reason
- Twilio API: Consistent resource patterns across all products

The constraint helped us discover patterns that successful APIs already use.

## Next Steps

**Decision:** Implement Solution 1 (Hypermedia-Driven Minimalist API) as primary design

**Rationale:**
- Maintains REST principles (HTTP verbs matter)
- Self-documenting (GET / returns structure)
- Most familiar to developers (resource-based)
- Proven pattern (HAL, JSON:API specs exist)
- Best balance of minimalism and usability

**Immediate actions:**
1. Create OpenAPI spec for 5 endpoints (TODAY)
2. Build prototype implementation (THIS WEEK)
3. User test with 5 developers (NEXT WEEK)
4. Measure onboarding time (target: < 5 minutes to first successful call)
5. Write single-page documentation (NEXT WEEK)

**Success metrics:**
- Time to first API call (target: < 5 min)
- Documentation page count (target: 1 page)
- Developer satisfaction (NPS after onboarding)
- Comparison: Our 5 endpoints vs competitor's 20+ endpoints

## Self-Assessment (using rubric)

**Constraint Integrity (5/5):** Rigorous adherence. Solution 1 uses exactly 5 endpoints. Documentation will fit on single page (verified with draft). No nesting beyond 2 levels.

**Constraint-Creativity Causality (5/5):** Clear causality. Generic `/{resource}` pattern exists ONLY because the 5-endpoint limit forbade per-use-case endpoints. Hypermedia self-documentation exists because single-page constraint forced self-documenting design.

**Idea Volume & Quality (5/5):** Generated 15+ distinct API designs across 4 constraint levels. Top 3 solutions all score 5/5 novelty.

**Problem-Solution Fit (5/5):** All solutions hit success criteria: < 5 min onboarding, single-page docs, differentiation, full capability.

**Actionability (5/5):** Solution 1 includes OpenAPI spec, code examples, implementation notes, and testing plan. Can implement immediately.

**Technical Rigor (5/5):** Solutions are architecturally sound. Hypermedia pattern is proven (HAL/JSON:API specs). Resource abstraction is clean.

**Differentiation (5/5):** Design differentiates through minimalism (5 vs 23 endpoints), self-documentation (GET /), and developer experience focus.

**Risk Honesty (4/5):** Acknowledged risks (hypermedia learning curve, query param complexity). Could add more mitigation details.

**Documentation Quality (5/5):** Complete constraint-based-creativity.md file with full examples, code snippets, evaluation.

**Breakthrough Clarity (5/5):** Explicitly explained how constraints drove creativity. Pattern shift from additive (endpoint per use case) to multiplicative (endpoint handles multiple use cases) is clearly articulated.

**Overall Score: 4.9/5**

API design is ready for implementation. Constraint-driven approach produced significantly better developer experience than unconstrained "REST best practices" approach.
@@ -0,0 +1,404 @@
# Constraint-Based Creativity: Product Launch Guerrilla Marketing

## Problem Statement

Launch a new B2B SaaS product (team collaboration tool) to market with goal of 1,000 signups in first month. Need creative marketing campaign that gets attention in crowded market.

## Context

**What's been tried:** Traditional approaches considered
- Paid ads on LinkedIn ($20K/month budget quoted)
- Content marketing (would take 6+ months to build audience)
- Conference sponsorships ($15K-50K per event)
- PR agency ($10K/month retainer)

**Why we're stuck:** All conventional approaches require significant budget we don't have. Team defaulting to "we can't compete without money" mindset. Ideas feel defeated before starting.

**Success criteria:**
- Generate 1,000 signups in 30 days
- Cost per acquisition under $5
- Build brand awareness in target market (engineering/product teams at tech companies)
- Create memorable, differentiated campaign

## Active Constraints

**Constraint 1: Budget - $500 Maximum Total Spend**
- **Rationale:** Forces creativity beyond paid media. Can't outspend competitors, so must out-think them. Low budget eliminates expensive conventional tactics.
- **Enforcement:** All tactics must have documented costs under $500 total.

**Constraint 2: Platform - Twitter + GitHub Only**
- **Rationale:** Forces focus. Instead of spreading thin across all channels, dominate where the target audience (developers/PMs) already congregates.
- **Enforcement:** No tactics on LinkedIn, Facebook, Instagram, or other platforms. Twitter and GitHub are the entire playing field.

**Constraint 3: No Paid Promotion**
- **Rationale:** Organic virality only. Forces ideas to be inherently shareable/interesting rather than amplified by money.
- **Enforcement:** Zero ad spend. No boosted posts, no sponsored content, no influencer payments.

## Idea Generation Process

**Technique used:** Constraint Rotation Sprints (3 rounds of 10 minutes each)

**Volume:** Generated 32 ideas across 3 constraint-focus rounds

**Mindset:** Initial resistance ("this is impossible"), but by round 2 the team was energized by the challenge. Constraint difficulty unlocked creativity.

## All Ideas Generated

### Round 1: Focus on $500 budget constraint (10 ideas)

1. Free stickers ($200) mailed to anyone who requests
   - Compliance: ✓ | Assessment: Cute but low impact

2. $500 bounty for best integration built by community
   - Compliance: ✓ | Assessment: Interesting, engages devs

3. Sponsor a small hackathon ($500 venue/pizza)
   - Compliance: ✓ | Assessment: Limited reach, one-time

4. Create 100 GitHub repos demonstrating use cases ($0)
   - Compliance: ✓ | Assessment: SEO value, educational

5. Daily Twitter threads documenting "building in public" ($0)
   - Compliance: ✓ | Assessment: Authentic, takes time

6. "Tiny budget challenge": Document building company on $500
   - Compliance: ✓ | Assessment: Meta and interesting

7. Commission artwork ($500) from developer-artist, make it public domain
   - Compliance: ✓ | Assessment: Unique, ownable asset

8. Create CLI tool that's actually useful, includes subtle product promo ($0)
   - Compliance: ✓ | Assessment: Value-first approach

9. Host AMA on r/programming ($0) + Twitter Spaces ($0)
   - Compliance: ✓ | Assessment: Direct engagement, authentic

10. Create comprehensive benchmark comparison ($0, research time)
    - Compliance: ✓ | Assessment: SEO + authority building

### Round 2: Focus on Twitter + GitHub only (12 ideas)

11. 30-day Twitter thread series: "Things I learned building our product"
    - Compliance: ✓ | Assessment: Storytelling, serialized content

12. GitHub Actions workflow that developers can actually use (free distribution)
    - Compliance: ✓ | Assessment: Utility-first, viral potential

13. Live-code on Twitter (video) building common integration
    - Compliance: ✓ | Assessment: Educational, shows product value

14. Create "anti-roadmap" GitHub repo (what we WON'T build and why)
    - Compliance: ✓ | Assessment: Contrarian, honest, refreshing

15. Twitter bot that solves common developer problem, mentions product
    - Compliance: ✗ (might feel spammy) | Assessment: Risky

16. "Tiny tools" GitHub org: 50 micro-tools, all showcase product API
    - Compliance: ✓ | Assessment: SEO + utility, long-tail

17. GitHub issue templates that non-customers can use
    - Compliance: ✓ | Assessment: Generous, builds goodwill

18. Tweet source code snippets daily with "how it works" explanations
    - Compliance: ✓ | Assessment: Transparency, educational

19. Create GitHub "awesome list" for collaboration tools (include ours honestly)
    - Compliance: ✓ | Assessment: Authority, fair comparison

20. Run "Twitter office hours" - answer any dev question daily for 30 days
    - Compliance: ✓ | Assessment: Time-intensive but high engagement

21. Create memorable Twitter personality/voice (snarky? earnest? technical?)
    - Compliance: ✓ | Assessment: Differentiation through tone

22. GitHub repo: "Interview questions for team collaboration tools"
    - Compliance: ✓ | Assessment: Helps buyers evaluate ALL tools (generous)

### Round 3: Focus on organic virality (10 ideas)

23. "The $500 Company" - Document entire launch journey publicly
    - Compliance: ✓ | Assessment: ⭐ Meta-story, highly shareable

24. Create genuinely useful free tools first, product second
    - Compliance: ✓ | Assessment: ⭐ Value-first, builds trust

25. Controversial take: "Why most collaboration tools are built wrong"
    - Compliance: ✓ | Assessment: Attention-grabbing, risky

26. Show real revenue numbers publicly (radical transparency)
    - Compliance: ✓ | Assessment: Differentiated, bold

27. Create "Collaboration Tool Bingo" card (poke fun at industry clichés)
    - Compliance: ✓ | Assessment: ⭐ Funny, relatable, shareable

28. "Build your own [our product]" tutorial (gives away the secret sauce)
    - Compliance: ✓ | Assessment: ⭐ Generous, confident, educational

29. Rate competitors honestly on public spreadsheet (include ourselves)
    - Compliance: ✓ | Assessment: Brave, helpful, trustworthy

30. Create crisis/incident templates (free, useful, shows product in action)
    - Compliance: ✓ | Assessment: Timely, practical value

31. Start movement: "#NoMoreMeetings challenge" with our tool
    - Compliance: ✓ | Assessment: ⭐ Aligns with pain point, movement potential

32. Apologize publicly for something competitors do (but we don't)
    - Compliance: ✓ | Assessment: ⭐ Contrarian positioning

## Insights from "Failed" Ideas

- **Idea #15 (Twitter bot):** Too promotional; violates the "organic virality" spirit even if technically compliant
- **Constraint tension revealed:** The hardest challenge is being interesting enough to spread WITHOUT money for amplification
- **Pattern observed:** Ideas scoring ⭐ all share a "generosity" angle - give away something valuable for free

## Top Solutions (Refined)

### Solution 1: "The $500 Startup" Public Journey

**Description:**

Launch the product AND document the entire journey publicly as "The $500 Startup Challenge." Every dollar spent, every decision made, every failure encountered - shared in real time via Twitter threads and a GitHub repo.

- **Twitter:** Daily thread updates (wins, losses, learnings, numbers)
- **GitHub:** Public repo with all costs tracked, decisions documented, templates/tools shared
- **Deliverables:** 30-day chronicle becomes a case study, templates, and playbook others can use

**How constraints shaped it:**

This solution exists ONLY because of the constraints. With unlimited budget, we'd run conventional ads and never think to turn our limitation into our story. The $500 constraint became the entire narrative hook - it's not a bug, it's the feature. The platform constraint (Twitter + GitHub) forced us into storytelling formats (threads + repos) rather than polished landing pages. The organic-only constraint meant the story itself must be compelling enough to spread.

**Strengths:**
- Authentic and relatable (most startups are resource-constrained)
- Built-in narrative tension ("Can they succeed on $500?")
- Educational value (others learn from the journey)
- Differentiates through transparency (competitors hide their budgets)
- Meta-viral (the campaign IS the story)
- Generates daily content without feeling promotional

**Implementation notes:**

**Week 1:** Launch announcement
- Tweet: "We're launching a B2B SaaS product with a $500 budget. Total. Here's the plan: [thread]"
- GitHub repo: Create "500-dollar-startup" repo with budget tracker ($500 remaining); a tracker sketch follows the budget allocation below
- Commit: Daily updates to README with spend tracking

**Weeks 2-4:** Daily chronicle
- Daily Twitter thread (7am): Yesterday's progress, today's plan, $ spent, learning
- Every dollar spent gets a GitHub commit with rationale
- Weekend: Long-form recap threads (3-5 tweets)
- Engage authentically with every reply/question

**Week 4:** Results summary
- Final thread: Total signups, CAC, what worked/didn't, full financial breakdown
- GitHub: Release all templates, scripts, playbooks as MIT licensed
- Turn the journey into a permanent case study/resource

**Budget allocation:**
- Domain name: $12
- Hosting (1 month): $5
- Stickers (50): $100 (given to first 50 signups as "founding users")
- GitHub Actions runner time: $20 (for automated tools we build)
- Twitter Spaces hosting tools: $0 (free)
- Reserved for experiments: $363

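As a sketch of the spend-tracking commits described above, a tiny script can log each expense and refuse anything that would break the $500 cap. The file name and ledger format are assumptions, not part of the campaign plan.

```python
# budget_tracker.py - hypothetical helper for the "500-dollar-startup" repo.
# Appends a spend entry and fails loudly if the $500 cap would be exceeded,
# so every commit that logs spending also enforces the constraint.
import csv
import sys
from pathlib import Path

CAP = 500.00
LEDGER = Path("ledger.csv")  # columns: date, item, amount, rationale

def total_spent() -> float:
    if not LEDGER.exists():
        return 0.0
    with LEDGER.open() as f:
        return sum(float(row["amount"]) for row in csv.DictReader(f))

def record(date: str, item: str, amount: float, rationale: str) -> None:
    spent = total_spent()
    if spent + amount > CAP:
        sys.exit(f"REJECTED: ${spent + amount:.2f} would exceed the ${CAP:.0f} cap")
    new_file = not LEDGER.exists()
    with LEDGER.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["date", "item", "amount", "rationale"])
        if new_file:
            writer.writeheader()
        writer.writerow({"date": date, "item": item,
                         "amount": amount, "rationale": rationale})
    print(f"${CAP - spent - amount:.2f} remaining")

if __name__ == "__main__":
    # e.g. python budget_tracker.py 2024-05-01 "Domain name" 12.00 "campaign domain"
    record(sys.argv[1], sys.argv[2], float(sys.argv[3]), " ".join(sys.argv[4:]))
```
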
**Risks/Limitations:**
- Requires consistent daily effort (30 days straight)
- Vulnerability - public failure if we don't hit goals
- Could appear gimmicky if not executed authentically
- Copycat risk (others could run the same constraint challenge)

### Solution 2: "Generous Engineering" - Free Tools Arsenal

**Description:**

Build and open-source 10-15 genuinely useful dev tools/templates/workflows over 30 days, all showcasing product integration. Each tool solves a real problem for the target audience. Tools live on GitHub, announced on Twitter, zero promotion - just pure utility.

Tools include:
1. GitHub Actions workflow for team notifications
2. Incident response templates (runbooks)
3. Meeting cost calculator (Chrome extension; core logic sketched after this list)
4. Async standup bot (open source)
5. Team health check template
6. Collaboration metrics dashboard (open source)
7. "Decision log" template
8. Onboarding checklist generator
9. Team documentation starter kit
10. Integration examples repository

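To illustrate the "genuinely useful micro-tool" bar, here is a minimal sketch of the meeting cost calculator's core arithmetic (tool #3). The function name and the rates in the example are hypothetical.

```python
# meeting_cost.py - illustrative core of the "meeting cost calculator" idea:
# estimate what a recurring meeting costs the team per year.
def meeting_cost(attendees: int, avg_hourly_rate: float,
                 duration_hours: float, occurrences_per_year: int = 52) -> float:
    """Total yearly cost of a recurring meeting."""
    return attendees * avg_hourly_rate * duration_hours * occurrences_per_year

if __name__ == "__main__":
    # 8 people, $75/hour loaded cost, 1-hour weekly standup:
    print(f"${meeting_cost(8, 75, 1.0):,.0f} per year")  # -> $31,200 per year
```
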
**How constraints shaped it:**

The budget constraint eliminated paid distribution, forcing a "valuable enough to spread organically" approach. We can't buy attention, so we must earn it through utility. The platform constraint (GitHub + Twitter) matches where devs already are - distribute where they work. The organic-only constraint means the tools must be NO-STRINGS-ATTACHED valuable. Product mention is subtle (integration examples), not pushy.

**Strengths:**
- Builds genuine goodwill (helping before selling)
- SEO value (10-15 repos ranking for relevant searches)
- Demonstrates product capability (integration examples)
- Long-tail value (tools keep attracting users after 30 days)
- Community contribution potential (others fork/improve)
- Positions team as "engineers who build useful things"

**Implementation notes:**

**Pre-launch:** Build 5 tools completely, outline the next 5

**Days 1-5:** Launch 1 tool per day
- GitHub: Release with comprehensive README
- Twitter: Announce with demo GIF, use case, why we built it
- Zero product mention in the tool itself
- Links to product only in "built by" footer

**Days 6-15:** Continue tool releases + community engagement
- Respond to GitHub issues
- Merge community PRs
- Share user success stories

**Days 16-30:** Integration showcase
- Create "using these tools with [product]" examples
- Not required, but available for those interested
- Focus remains on tool utility

**Budget allocation:**
- Domain for tools site: $12
- Hosting (static site): $5
- GitHub Actions CI: $20
- Designer time for og:images (Fiverr): $50
- Reserved: $413

**Risks/Limitations:**
- Requires significant engineering time (building 10-15 tools)
- Tools must actually be useful (not vaporware)
- Delayed attribution (hard to track which tool drove which signup)
- Others could copy the tools without attribution

### Solution 3: "Collaboration Tool Bingo" + Movement

**Description:**

Create a shareable "Collaboration Tool Bingo" card that pokes fun at industry clichés ("Synergy!", "Seamless integration!", "Revolutionize teamwork!", "10x productivity!"). Make it funny, relatable, and slightly self-deprecating. Launch a #CollabToolBingo movement on Twitter.

Launch with a manifesto: "Most collaboration tools over-promise and under-deliver. Here's what we're building instead" + an anti-bingo commitment (what we WON'T do).

**How constraints shaped it:**

The organic-virality constraint forced an "inherently shareable" approach - bingo cards are easy to screenshot and share. The budget constraint means we can't buy attention, so we must create "social currency" through humor. The platform constraint (Twitter) is perfect for viral image-based content. This solution is pure constraint-driven creativity - it wouldn't exist with a paid media budget.

**Strengths:**
- Highly shareable (humor + relatable)
- Positions product through contrast (anti-cliché stance)
- Low effort to participate (#CollabToolBingo tweets)
- Starts conversation about industry problems
- Memorable and distinctive
- Can evolve into movement/community

**Implementation notes:**

**Day 1:** Launch bingo card
- Tweet bingo card image (16 clichés in 4x4 grid)
- Thread: "If collaboration tools were honest" with explanation
- Pin tweet for 30 days

**Days 2-7:** Encourage participation
- Retweet best bingo completions
- Add new clichés based on community suggestions
- Create "Bingo Hall of Fame" for funniest examples

**Days 8-15:** Launch "Anti-Bingo Manifesto"
- "We took the Bingo Card seriously. Here's what we're NOT building"
- Commit publicly to avoiding each cliché
- Explain how we're different

**Days 16-30:** Community content
- User-generated bingo cards
- "Before/After" bingo (what they had vs what they wanted)
- Expand into other SaaS bingo cards (community-driven)

**Budget allocation:**
- Designer for bingo card: $150 (Fiverr)
- Domain for bingo microsite: $12
- Hosting: $5
- Reserved: $333

**Risks/Limitations:**
- Humor could fall flat or offend
- Negative framing (criticizing competitors) could backfire
- Requires cultural awareness (clichés must be universally recognized)
- One-hit-wonder risk (viral moment doesn't translate to signups)

## Evaluation

**Constraint compliance:** ✓ All three solutions respect the $500 budget, Twitter+GitHub platforms, and organic-only distribution

**Novelty assessment:** All three solutions are novel (score: 5/5)
- None would exist with unlimited budget (we would run conventional ads instead)
- Constraints directly shaped the creative direction
- No competitors are running "The $500 Startup" or "Collaboration Bingo" campaigns

**Problem fit:** Solutions address the original challenge
- **1,000 signups:** Viral potential + daily engagement + educational value = traffic
- **Under $5 CAC:** All solutions spend well under $500; 1K signups would mean roughly $0.50 CAC
- **Brand awareness:** All three create memorable moments in the target community
- **Differentiation:** Transparency, generosity, humor = distinct from competitor marketing

**Actionability:** All three can be executed immediately with the existing team and budget

## Creative Breakthrough Explanation

The constraint-driven breakthroughs happened when we stopped viewing $500 as a limitation and started viewing it as:
1. **The story** (Solution 1: The $500 constraint became the campaign narrative)
2. **The focus** (Solution 2: Can't spread wide, so go deep with utility)
3. **The freedom** (Solution 3: No budget for safe/boring, so take creative risks)

**Thinking pattern broken:** "We need money to market" → "Our lack of money IS the marketing"

**Unexpected angle revealed:** Transparency and vulnerability as differentiation strategy. Competitors hide their budgets and processes; we make ours the whole campaign.

**Why this wouldn't exist in unconstrained brainstorming:** With a $50K budget, we'd run LinkedIn ads and content marketing - conventional but forgettable. The $500 constraint forced us to ask "What can ONLY we do with $500 that creates more attention than $50K in ads?" Answer: Turn the limitation into the story.

## Next Steps

**Decision:** Execute Solution 1 ("The $500 Startup") as the primary campaign

**Rationale:**
- Highest narrative tension (can they succeed?)
- Generates daily content (30 days of touchpoints)
- Most authentic (real constraints, real challenges)
- Educational value outlasts the 30 days (becomes a permanent resource)

**Immediate actions:**
1. Create GitHub repo "500-dollar-startup" with budget tracker (TODAY)
2. Draft launch tweet thread (TOMORROW)
3. Set up daily reminder for chronicle posts (TOMORROW)
4. Build in public starting Day 1 (THIS WEEK)

**Success metrics:**
- Twitter followers gained
- GitHub repo stars
- Website signups (tracked via /500 UTM)
- Media mentions
- Community engagement (replies, retweets, issues)

## Self-Assessment (using rubric)

**Constraint Integrity (5/5):** All constraints rigorously respected. No bending rules. $500 is the absolute limit. Twitter+GitHub only. Zero paid promotion.

**Constraint-Creativity Causality (5/5):** Clear causality. These solutions exist BECAUSE of constraints, not despite them. The limitation became the feature.

**Idea Volume & Quality (5/5):** Generated 32 ideas in 30 minutes. Top 3 solutions are all novel (score 5/5 on novelty). Documented "failed" ideas and learnings.

**Problem-Solution Fit (5/5):** All solutions address the 1,000 signup goal, CAC target, brand awareness, and differentiation.

**Actionability (5/5):** Solutions have detailed implementation plans, budget breakdowns, timeline, and next steps. Can execute immediately.

**Strategic Insight (5/5):** The core insight is "constraint as campaign" - turning limitation into narrative advantage. This is a replicable pattern for other resource-constrained launches.

**Differentiation (5/5):** None of these campaigns exist in the competitor space. Transparency and vulnerability as strategy is highly differentiated.

**Risk Honesty (4/5):** Acknowledged risks (public failure, consistency required, vulnerability, copycat risk). Could add more mitigation strategies.

**Documentation Quality (5/5):** Complete constraint-based-creativity.md file with all sections, metrics, and causality explanation.

**Breakthrough Clarity (5/5):** Explicitly explained how constraints drove creativity. Identified the thinking pattern broken: "need money" → "lack of money IS the story."

**Overall Score: 4.9/5**

Campaign is ready for execution. Constraint-driven creativity successfully unlocked a novel, differentiated, actionable marketing approach.

479
skills/constraint-based-creativity/resources/methodology.md
Normal file
@@ -0,0 +1,479 @@

# Constraint-Based Creativity: Advanced Methodology

## Workflow

Copy this checklist and track your progress:

```
Advanced Constraint-Based Creativity:
- [ ] Step 1: Diagnose the creative block systematically
- [ ] Step 2: Design strategic constraints using frameworks
- [ ] Step 3: Apply advanced generation techniques
- [ ] Step 4: Use constraint combinations and escalation
- [ ] Step 5: Troubleshoot and adapt
```

**Step 1: Diagnose the creative block systematically**

Use [1. Diagnosing Creative Blocks](#1-diagnosing-creative-blocks) to identify the root cause (abundance paralysis, pattern fixation, resource anxiety, etc.). Understanding the block type determines optimal constraint design.

**Step 2: Design strategic constraints using frameworks**

Apply [2. Strategic Constraint Design](#2-strategic-constraint-design) frameworks to create constraints that directly counter the diagnosed block. Use constraint design principles and psychology insights from [3. Psychology of Constraints](#3-psychology-of-constraints).

**Step 3: Apply advanced generation techniques**

Use methods from [4. Advanced Idea Generation](#4-advanced-idea-generation), including parallel constraint exploration, constraint rotation sprints, and meta-constraint thinking. These techniques maximize creative output.

**Step 4: Use constraint combinations and escalation**

Apply [5. Constraint Combination Patterns](#5-constraint-combination-patterns) to layer multiple constraints strategically. Use [6. Constraint Escalation](#6-constraint-escalation) to progressively tighten constraints for breakthrough moments.

**Step 5: Troubleshoot and adapt**

If constraints aren't working, use [7. Troubleshooting](#7-troubleshooting) to diagnose and fix. Adapt constraints based on what is (and isn't) working.

---

## 1. Diagnosing Creative Blocks

Before designing constraints, diagnose WHY creativity is stuck:

### Block Type 1: Abundance Paralysis
**Symptoms:** Too many options, can't decide, everything feels possible but nothing feels right
**Root cause:** Unlimited freedom creates decision anxiety
**Best constraint types:** Resource (force scarcity), Format (force specificity)
**Example constraint:** "Choose from exactly 3 options" or "$500 budget only"

### Block Type 2: Pattern Fixation
**Symptoms:** All ideas feel similar, defaulting to "what we always do", conventional thinking
**Root cause:** Mental habits, industry best practices dominating, lack of novelty pressure
**Best constraint types:** Rule-based (forbid the pattern), Historical (different era), Perspective (different audience)
**Example constraint:** "Cannot use the industry-standard approach" or "Design as if it's 1950"

### Block Type 3: Complexity Creep
**Symptoms:** Ideas keep getting more elaborate, feature bloat, "we need to add X and Y and Z"
**Root cause:** Mistaking complexity for sophistication, no forcing function for simplicity
**Best constraint types:** Format (force brevity), Resource (force efficiency), Technical (force optimization)
**Example constraint:** "Explain in 10 words" or "Must work on 1990s hardware"

### Block Type 4: Resource Anxiety
**Symptoms:** "We can't do anything with this budget/time/team", defeatism, giving up before ideating
**Root cause:** Viewing limitation as blocker rather than creative fuel
**Best constraint types:** Resource Inversion (make the limitation the feature), Success Stories (show constraint-driven wins)
**Example constraint:** "Create a marketing campaign that showcases our tiny budget as an advantage"

### Block Type 5: Incremental Thinking
**Symptoms:** Ideas are "better" but not "different", improvements without breakthroughs
**Root cause:** Optimization mindset, risk aversion, lack of permission for radical ideas
**Best constraint types:** Forced Connection (force novelty), Forbidden Element (force workarounds), Extreme (force dramatic shift)
**Example constraint:** "Combine your product with [random unrelated concept]" or "No gradual improvements allowed"

### Block Type 6: Stakeholder Gridlock
**Symptoms:** Every idea gets shot down, conflicting requirements, "we can't satisfy everyone"
**Root cause:** Trying to please all stakeholders creates bland compromises
**Best constraint types:** Audience (design for ONE stakeholder only), Perspective (extreme user), Polarity (embrace the conflict)
**Example constraint:** "Design exclusively for power users, ignore novices" or "Optimize for speed OR quality, not both"

---

## 2. Strategic Constraint Design

Frameworks for designing constraints that unlock creativity:

### Framework 1: The Counterfactual Principle
Design constraints that are the **opposite** of current assumptions.

**Current assumption → Constraint:**
- "We need more features" → "Maximum 3 features"
- "We need more time" → "Ship in 48 hours"
- "We need bigger budget" → "Spend $100 only"
- "We need expert team" → "Design for novice execution"
- "We need to add" → "Remove 50% of current"

### Framework 2: The Subtraction Cascade
Remove assumed "essentials" one at a time to reveal what's truly necessary.

**Process:**
1. List all assumed essentials (budget, time, team, features, marketing, etc.)
2. For each essential, ask: "What if we had zero X?"
3. Generate 5 ideas for the zero-X scenario
4. Identify which "essential" assumptions are actually optional

**Example:** E-commerce site
- What if zero budget for ads? → Viral/organic/referral strategies emerge
- What if zero product photos? → Text-driven, story-driven commerce emerges
- What if zero checkout process? → One-click, subscription, or auto-replenish models emerge

A code sketch of the prompt generation follows.

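A minimal sketch of the cascade, assuming an illustrative list of essentials — the point is that each assumed essential mechanically yields a "zero-X" ideation prompt:

```python
# subtraction_cascade.py - sketch of Framework 2: emit the "zero-X" prompt
# for each assumed essential, then ideate on each prompt in turn.
essentials = ["budget", "time", "team", "features", "marketing"]

def subtraction_prompts(essentials: list[str], ideas_per_round: int = 5) -> list[str]:
    return [
        f"What if we had zero {e}? Generate {ideas_per_round} ideas for that scenario."
        for e in essentials
    ]

for prompt in subtraction_prompts(essentials):
    print(prompt)
```
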
### Framework 3: The Constraint Ladder
Progressive constraint tightening to find the creative sweet spot.

**Level 1 (Gentle):** Constraint is noticeable but comfortable
**Level 2 (Challenging):** Constraint forces rethinking but is still manageable
**Level 3 (Extreme):** Constraint seems impossible, forces radical creativity
**Level 4 (Paralyzing):** Constraint too tight, generates zero ideas

**Example:** Content creation
- L1: "Write in 500 words" (comfortable)
- L2: "Write in 100 words" (challenging)
- L3: "Write in 10 words" (extreme, forces breakthroughs)
- L4: "Write in 3 words" (paralysis, too tight)

Aim for Level 2-3. If hitting Level 4, back off one level.

### Framework 4: The Medium-as-Constraint
Force creativity by changing the medium.

**Original medium → Constraint medium:**
- Strategy document → Strategy as recipe
- Product roadmap → Roadmap as movie trailer script
- Technical docs → Docs as children's book
- Business proposal → Proposal as Shakespearean sonnet
- Data presentation → Data as visual art installation

The medium shift breaks habitual communication patterns.

---

## 3. Psychology of Constraints

Why constraints boost creativity (science-backed principles):

### Principle 1: Cognitive Load Reduction
**Theory:** Unlimited options overwhelm working memory. Constraints reduce cognitive load, freeing mental energy for creativity.
**Application:** When the team is overwhelmed, add constraints to reduce the decision space.
**Example:** "Choose from 3 pre-selected options" vs "anything is possible"

### Principle 2: Breaking Automaticity
**Theory:** Brains default to habitual patterns (energy-efficient). Constraints force conscious, deliberate thinking.
**Application:** When stuck in patterns, add constraints that forbid the habit.
**Example:** "No bullet points" forces a different communication structure

### Principle 3: Psychological Reactance
**Theory:** Being told "you can't" triggers motivation to prove you can (within the rules).
**Application:** Frame constraints as challenges, not limitations.
**Example:** "Design without using any images" becomes a creative challenge

### Principle 4: Permission Through Limitation
**Theory:** Constraints provide an "excuse" for radical ideas ("we HAD to because of X").
**Application:** Use constraints to create safety for risky ideas.
**Example:** "$100 budget" gives permission for guerrilla marketing tactics

### Principle 5: Forced Combination
**Theory:** Constraints force novel combinations that wouldn't occur in unconstrained thinking.
**Application:** Use constraints that require merging unrelated concepts.
**Example:** "Explain technical architecture using cooking metaphors only"

---

## 4. Advanced Idea Generation

Techniques beyond basic listing:

### Technique 1: Parallel Constraint Exploration
Run multiple constraint sets simultaneously and compare results.

**Process:**
1. Choose 3 different constraint types
2. Set a timer for 10 minutes per constraint
3. Generate ideas for Constraint A (10 min)
4. Switch to Constraint B (10 min)
5. Switch to Constraint C (10 min)
6. Compare: which constraint produced the most novel ideas?
7. Double down on that constraint type

**Example:** Logo design
- Constraint A: 3 colors maximum → 8 ideas
- Constraint B: Circles only → 12 ideas
- Constraint C: Black/white only → 15 ideas (winner)
- → Continue with the black/white constraint

### Technique 2: Constraint Rotation Sprints
Rapid cycling through different constraints to prevent fixation.

**Process:**
1. Set a 5-minute timer
2. Generate ideas for Constraint 1
3. When the timer rings, IMMEDIATELY switch to Constraint 2
4. Generate for 5 minutes, switch to Constraint 3
5. After 3 constraints, review all ideas
6. Select the strongest, refine for 10 minutes

This prevents over-thinking any single constraint.

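A minimal sketch of a sprint timer for this technique — the constraints and durations are placeholders; the hard rule is the immediate switch when the timer rings:

```python
# rotation_sprint.py - sketch of Technique 2 (Constraint Rotation Sprints):
# cycle through constraints on a fixed timer to prevent fixation.
import time

constraints = ["3 colors maximum", "circles only", "black/white only"]  # placeholders
SPRINT_SECONDS = 5 * 60

for constraint in constraints:
    print(f"Generate ideas under constraint: {constraint}")
    time.sleep(SPRINT_SECONDS)  # ideate until the timer rings, then switch immediately

print("Review all ideas, pick the strongest, refine for 10 minutes.")
```
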
### Technique 3: Meta-Constraint Thinking
Apply constraints to the constraint-selection process itself.

**Meta-constraints:**
- "Must use a constraint type I've never tried"
- "Must combine 2 opposite constraints" (e.g., "minimal" + "maximal")
- "Must choose the constraint that scares me"
- "Must use a constraint from an unrelated domain" (e.g., music constraints for a business problem)

### Technique 4: The "Yes, But What If" Ladder
Progressive constraint tightening with idea building.

**Process:**
1. Start with an idea from a loose constraint
2. "Yes, but what if [tighter constraint]?" → Adapt the idea
3. "Yes, but what if [even tighter]?" → Adapt again
4. Continue until the idea breaks or becomes brilliant

**Example:** Marketing campaign
- Idea: Email newsletter
- Yes, but what if no images? → Text-only newsletter with strong copy
- Yes, but what if 50 words max? → Punchy, Hemingway-style newsletter
- Yes, but what if one sentence only? → Twitter-thread-style micro-newsletter
- Yes, but what if 6 words only? → Six-word story newsletter (breakthrough!)

### Technique 5: Constraint Archaeology
Mine history for proven constraint-driven successes and adapt them to the current challenge.

**Historical constraint successes:**
- Twitter's 140 characters → Brevity revolution
- Haiku's 5-7-5 syllables → Poetic concision
- Dr. Seuss's 50-word challenge (Green Eggs and Ham) → Children's lit classic
- Dogme 95 film rules → Cinema movement
- Helvetica font (limited character set) → Timeless design

**Process:**
1. Research historical constraint-driven successes in any domain
2. Extract the constraint principle
3. Adapt it to your challenge
4. Test whether the same principle unlocks creativity

---

## 5. Constraint Combination Patterns

Strategic ways to layer multiple constraints:

### Pattern 1: Complementary Pairing
Combine constraints that reinforce each other.

**Examples:**
- Resource + Format: "$100 budget" + "One-page proposal"
- Time + Technical: "48-hour deadline" + "Use existing tools only"
- Audience + Medium: "For 5-year-olds" + "Visual only, no text"

**Why it works:** Constraints push in the same direction, compounding the effect.

### Pattern 2: Tension Pairing
Combine constraints that conflict, forcing creative resolution.

**Examples:**
- "Minimal design" + "Maximum information density"
- "Professional tone" + "No corporate jargon"
- "Fast execution" + "Zero technical debt"

**Why it works:** Tension forces innovation to satisfy both.

### Pattern 3: Progressive Layering
Add constraints sequentially, not all at once.

**Process:**
1. Start with Constraint 1 → Generate 10 ideas
2. Add Constraint 2 → Adapt the best ideas or generate new ones
3. Add Constraint 3 → Further refinement

**Example:** Product launch
1. Constraint 1: "Organic channels only" → 10 ideas
2. Add Constraint 2: "+ $500 budget max" → 6 adapted ideas
3. Add Constraint 3: "+ 48-hour timeline" → 3 final ideas (highly constrained, highly creative)

### Pattern 4: Domain Transfer
Apply constraints from one domain to another.

**Examples:**
- Music constraints → Business (rhythm, harmony, tempo applied to workflow)
- Sports constraints → Product (rules, positions, scoring applied to features)
- Cooking constraints → Writing (ingredients, timing, presentation applied to content)

**Why it works:** Cross-domain constraints break industry-specific patterns.

---

## 6. Constraint Escalation

Systematically tightening constraints to find breakthrough moments:

### The Escalation Curve

```
Constraint tightness →   (creativity rises as you tighten, until it collapses)

Comfort Zone        → Moderate ideas
Productive Struggle → Interesting ideas
Breakthrough        → NOVEL ideas   ← Target this zone
Paralysis           → Zero ideas (too tight)
```

### Escalation Process

**Step 1: Establish Baseline**
- Start with a loose constraint
- Generate 5 ideas
- Assess novelty (probably low)

**Step 2: First Escalation (50% tighter)**
- Tighten the constraint by half
- Generate 5 ideas
- Assess novelty (probably moderate)

**Step 3: Second Escalation (75% tighter)**
- Tighten significantly
- Generate 5 ideas
- Assess novelty (should be high)
- **This is usually the breakthrough zone**

**Step 4: Third Escalation (90% tighter)**
- Tighten to near-impossibility
- Attempt to generate ideas
- If zero ideas → You've hit paralysis; back off to Step 3
- If ideas emerge → Exceptional breakthroughs

### Escalation Examples

**Budget escalation:**
- Baseline: $50K budget → Conventional ideas
- 50%: $25K budget → Efficient ideas
- 75%: $12.5K budget → Creative ideas
- 90%: $5K budget → **Breakthrough** guerrilla ideas
- 95%: $2.5K budget → Possible paralysis

**Time escalation:**
- Baseline: 4 weeks → Standard timeline
- 50%: 2 weeks → Aggressive timeline
- 75%: 1 week → **Breakthrough** rapid ideas
- 90%: 48 hours → Extreme ideas or paralysis
- 95%: 24 hours → Likely paralysis

**Feature escalation:**
- Baseline: 10 features → Feature bloat
- 50%: 5 features → Focused product
- 75%: 3 features → **Breakthrough** simplicity
- 90%: 1 feature → Single-purpose tool (possible brilliance)
- 95%: 0.5 features → Probably paralysis

The arithmetic behind these ladders is sketched below.

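A minimal sketch of the percentage arithmetic — an assumed helper, not a prescribed tool — that prints the constraint value at each escalation level:

```python
# escalation_schedule.py - sketch of the Section 6 escalation arithmetic:
# each level multiplies the baseline by (1 - tightening percentage).
def escalate(baseline: float, tighten_pcts=(0.50, 0.75, 0.90)) -> list[float]:
    """Constraint value at baseline, then at each escalation level (tighter = smaller)."""
    return [baseline] + [baseline * (1 - p) for p in tighten_pcts]

print(escalate(50_000))  # budget ($): [50000, 25000.0, 12500.0, 5000.0]
print(escalate(10))      # features:   [10, 5.0, 2.5, 1.0]
```
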
---

## 7. Troubleshooting

When constraints don't produce creativity:

### Problem 1: Constraint Too Loose
**Symptom:** Ideas feel conventional, no creative tension
**Diagnosis:** Constraint isn't actually constraining behavior
**Fix:** Escalate the constraint (see Section 6). Make it tighter until you feel resistance.

### Problem 2: Constraint Too Tight
**Symptom:** Zero ideas generated, complete paralysis, frustration
**Diagnosis:** Constraint exceeded the breakthrough zone into paralysis
**Fix:** Back off one level. Use the Constraint Ladder (Framework 3) to find the sweet spot.

### Problem 3: Wrong Constraint Type
**Symptom:** Lots of ideas but none address the original block
**Diagnosis:** Constraint doesn't counter the diagnosed creative block
**Fix:** Return to Section 1 (Diagnosing Creative Blocks). Match constraint type to block type.

### Problem 4: Constraint Not Enforced
**Symptom:** Ideas "bend" the constraint or ignore it entirely
**Diagnosis:** Treating the constraint as a suggestion rather than a rule
**Fix:** Make constraint enforcement explicit. Reject any idea that violates the constraint.

### Problem 5: Too Many Constraints
**Symptom:** Overwhelmed, don't know where to start, ideas satisfy some constraints but not others
**Diagnosis:** Over-constrained (4+ simultaneous constraints)
**Fix:** Reduce to 1-2 constraints maximum. Use Progressive Layering (Pattern 3) if multiple constraints are needed.

### Problem 6: Arbitrary Constraint
**Symptom:** Constraint feels random, no clear purpose
**Diagnosis:** Constraint wasn't strategically designed
**Fix:** Use the Strategic Constraint Design frameworks (Section 2). The constraint should counter a specific block.

### Problem 7: Evaluating Too Early
**Symptom:** Only 3-5 ideas generated before giving up
**Diagnosis:** Judging ideas before achieving volume
**Fix:** Force 20+ ideas minimum before any evaluation. Quantity first, quality later.

### Problem 8: Missing the Causality
**Symptom:** Solutions are good but could exist without the constraint
**Diagnosis:** Not truly constraint-driven creativity
**Fix:** Ask: "Would this idea exist in unconstrained brainstorming?" If yes, keep generating. If no, you've found constraint-driven creativity.

---

## 8. Advanced Patterns

### Pattern: Constraint-Driven Positioning
Use a constraint as a market differentiator.

**Example:** Basecamp's "No" list
- Constraint: No enterprise features, no unlimited plans, no customization
- Result: Positioned as "simple project management" vs complex competitors
- Outcome: Constraint became brand identity

### Pattern: The Constraint Manifesto
Publicly commit to constraints as values.

**Example:** Craigslist
- Constraint: No modern UI, no VC funding, no ads (except jobs/housing)
- Result: Authenticity, trustworthiness, community focus
- Outcome: Constraint-driven culture

### Pattern: Constraint as Filter
Use constraints to make decisions effortless.

**Example:** "If it's not a hell yes, it's a no" (Derek Sivers)
- Constraint: Binary decision (yes/no only, no maybe)
- Result: Clarity, focus, fewer regrets
- Outcome: Constraint simplifies complex decisions

### Pattern: Constraint Stacking
Layer constraints over time as you master each.

**Process:**
1. Month 1: Master Constraint A (e.g., "Ship weekly")
2. Month 2: Add Constraint B (e.g., "+ under 100 lines")
3. Month 3: Add Constraint C (e.g., "+ zero dependencies")
4. Result: Constraint-driven expertise

### Pattern: The Anti-Portfolio
Document what you're NOT doing (constraint as strategy).

**Example:** Bessemer Venture Partners' anti-portfolio
- Constraint: "We passed on X because [reason]"
- Result: Learning from constraint violations
- Outcome: Refined constraint strategy

---

## 9. Constraint Design Workshop

For teams stuck in creative blocks, run this structured workshop:

**Pre-Work (15 min):**
- Each person lists 5 ideas from unconstrained brainstorming
- Share: Notice how similar the ideas are

**Round 1: Diagnosis (15 min):**
- As a team, diagnose the creative block type (Section 1)
- Vote on a constraint type to try

**Round 2: Constraint Generation (30 min):**
- Split into 3 groups, each with a different constraint
- Each group generates 10 ideas in 10 minutes
- Share all ideas (30 total)

**Round 3: Constraint Escalation (20 min):**
- Choose the winning constraint from Round 2
- Tighten it (50% more restrictive)
- Generate 10 more ideas as a full team

**Round 4: Evaluation (20 min):**
- Identify the top 3 ideas
- Explain how the constraint drove the creativity
- Plan next steps

**Total time:** 100 minutes
**Expected outcome:** 40+ ideas, 3 constraint-driven breakthroughs

382
skills/constraint-based-creativity/resources/template.md
Normal file
@@ -0,0 +1,382 @@

# Constraint-Based Creativity Template

## Workflow

Copy this checklist and track your progress:

```
Constraint-Based Creativity Progress:
- [ ] Step 1: Gather inputs and clarify the creative challenge
- [ ] Step 2: Select or design 1-3 strategic constraints
- [ ] Step 3: Generate 20+ ideas within constraints
- [ ] Step 4: Evaluate and refine top solutions
- [ ] Step 5: Document and validate
```

**Step 1: Gather inputs and clarify the creative challenge**

Ask the user for the problem/creative challenge, context (what's been tried, why stuck), success criteria, existing constraints (real limitations), and preferred constraint types (if any). Use [Input Questions](#input-questions) to gather comprehensive context.

**Step 2: Select or design 1-3 strategic constraints**

If existing constraints → Work within them creatively. If no constraints → Design strategic ones using the [Constraint Selection Guide](#constraint-selection-guide). Maximum 3 constraints to avoid paralysis. Document the constraint rationale in the output file.

**Step 3: Generate 20+ ideas within constraints**

Use [Idea Generation Techniques](#idea-generation-techniques) to produce volume. Document ALL ideas including "failures" - they contain insights. Aim for 20+ ideas minimum before evaluating. Quality comes after quantity.

**Step 4: Evaluate and refine top solutions**

Apply the [Evaluation Framework](#evaluation-framework) to select the strongest 2-3 ideas. Refine by combining elements, removing complexity, and strengthening the constraint-driven insight. Document why these solutions stand out.

**Step 5: Document and validate**

Create the `constraint-based-creativity.md` file with complete documentation. Validate using the [Quality Checklist](#quality-checklist) before delivering. Ensure the constraint-creativity causality is explained.

---

## Input Questions

Ask the user to provide:

**1. Creative Challenge:**
- What needs solving, creating, or improving?
- What's the core problem or opportunity?

**2. Context:**
- What's been tried already? Why didn't it work?
- Why does ideation feel stuck or stale?
- What assumptions are currently in place?

**3. Success Criteria:**
- What does a good solution look like?
- How will you know if the constraint-based approach worked?
- Are there measurable goals (cost, time, engagement)?

**4. Existing Constraints (Real Limitations):**
- Budget limitations? (exact amount)
- Time constraints? (deadline)
- Technical limitations? (platform, tools, compatibility)
- Material/resource limitations? (team size, equipment)
- Regulatory/policy constraints? (legal, compliance)

**5. Constraint Preferences (Optional):**
- Are there specific constraint types that interest you? (resource, format, rule-based, technical, perspective)
- Any constraints you want to avoid?
- Preference for tight vs loose constraints?

---

## Quick Template

Create file: `constraint-based-creativity.md`

```markdown
# Constraint-Based Creativity: [Project Name]

## Problem Statement

[What creative challenge needs solving? Why is it important?]

## Context

**What's been tried:** [Previous approaches and why they didn't work]

**Why we're stuck:** [Pattern that needs breaking - e.g., "All ideas feel incremental" or "Default to expensive solutions"]

**Success criteria:** [What makes a solution successful? Measurable if possible]

## Active Constraints

**Constraint 1: [Type] - [Specific Limitation]**
- Rationale: [Why this constraint will unlock creativity]
- Enforcement: [How we'll ensure it's respected]

**Constraint 2: [Type] - [Specific Limitation]**
- Rationale: [Why this constraint matters]
- Enforcement: [How we'll check compliance]

**Constraint 3 (if applicable): [Type] - [Specific Limitation]**
- Rationale: [Strategic purpose]
- Enforcement: [Validation method]

## Idea Generation Process

**Technique used:** [e.g., Rapid listing, SCAMPER within constraints, Forced connections]

**Volume:** Generated [X] ideas in [Y] minutes

**Mindset:** [Notes on staying within constraints vs urge to bend rules]

## All Ideas Generated

1. [Idea 1 - brief description]
   - Constraint compliance: ✓/✗
   - Initial assessment: [Quick gut reaction]

2. [Idea 2]
   - Constraint compliance: ✓/✗
   - Initial assessment:

[Continue for all 20+ ideas...]

## Insight from "Failed" Ideas

[Document ideas that broke constraints or didn't work - what did they reveal?]

## Top Solutions (Refined)

### Solution 1: [Name/Title]

**Description:** [Detailed explanation of the solution]

**How constraints shaped it:** [Explain causality - this solution wouldn't exist without the constraints because...]

**Strengths:**
- [Strength 1]
- [Strength 2]
- [Strength 3]

**Implementation notes:** [How to execute this]

**Risks/Limitations:** [What could go wrong or where it falls short]

### Solution 2: [Name/Title]

**Description:** [Detailed explanation]

**How constraints shaped it:** [Constraint-creativity causality]

**Strengths:**
- [Strength 1]
- [Strength 2]

**Implementation notes:** [Execution plan]

**Risks/Limitations:** [Honest assessment]

### Solution 3 (if applicable): [Name/Title]

[Same structure as above]

## Evaluation

**Constraint compliance:** All top solutions fully respect the imposed limitations

**Novelty assessment:** These solutions are [novel/somewhat novel/incremental] because [reasoning]

**Problem fit:** Solutions address the original challenge by [explanation]

**Actionability:** [Can these be implemented? What resources are needed?]

## Creative Breakthrough Explanation

[Explain how the constraints drove the creativity. What thinking pattern did they break? What unexpected angle did they reveal? Why wouldn't these solutions exist in unconstrained brainstorming?]

## Next Steps

1. [Immediate action]
2. [Follow-up action]
3. [Testing/validation plan]

## Self-Assessment (using rubric)

[Score against rubric criteria before delivering to user]
```

---

## Constraint Selection Guide

**If user has existing constraints (real limitations):**

1. **Accept and amplify:** Make the constraint tighter to force more creativity
   - Budget is $5K → Challenge: "Design for $1K"
   - Timeline is 2 weeks → Challenge: "Ship in 3 days"

2. **Add complementary constraint:** Pair a resource constraint with a format constraint
   - Low budget + "No text, visuals only"
   - Short timeline + "Using existing tools only"

**If user has no constraints (brainstorming is just stuck):**

1. **Diagnose the stuck pattern:**
   - Ideas too complex? → Add simplicity constraint ("Maximum 3 features")
   - Ideas too conventional? → Add rule-based constraint ("Can't use industry standard approach")
   - Ideas too similar? → Add perspective constraint ("Design for opposite audience")

2. **Choose constraint type strategically:**

| Stuck Pattern | Recommended Constraint | Example |
|--------------|----------------------|---------|
| Too complex/feature-bloated | Resource or Format | "One-page explanation" or "$100 budget" |
| Too conventional | Rule-based | "Can't use competitor's approach" or "No best practices" |
| Too similar to each other | Technical or Medium | "Text-based only" or "Works offline" |
| Too vague/abstract | Format | "Explain in 6 words" or "Show with single image" |
| Too incremental | Historical or Audience | "Design as if it's 1990" or "For 5-year-olds" |

3. **Apply the "1-3 rule":**
   - 1 constraint: Safe, good for first-timers
   - 2 constraints: Sweet spot for most challenges
   - 3 constraints: Maximum before over-constraining
   - 4+ constraints: Usually paralyzes creativity (avoid)

---

## Idea Generation Techniques

**Technique 1: Rapid Constraint-Compliant Listing**
- Set a timer for 15 minutes
- List every idea that respects the constraints, no matter how wild
- Don't judge or refine - just capture volume
- Aim for 30+ ideas in the timeboxed session
- Good for: Getting unstuck quickly

**Technique 2: Constraint-Focused SCAMPER**
- Apply SCAMPER prompts while respecting constraints:
  - **S**ubstitute: What can replace X (within constraints)?
  - **C**ombine: What can merge (within constraints)?
  - **A**dapt: What can we adapt from elsewhere (within constraints)?
  - **M**odify: What can we change (within constraints)?
  - **P**ut to other use: Different purpose (within constraints)?
  - **E**liminate: What can we remove (the constraint might already do this)?
  - **R**everse: What can we flip (within constraints)?
- Good for: Systematic exploration

**Technique 3: Forced Connections**
- Pick 3 random elements (objects, concepts, brands)
- Force a connection between challenge + random element + constraint
- Example: "App redesign" + "Coffee shop" + "No images" = Text-based app with coffee shop naming metaphors
- Good for: Breaking patterns completely

**Technique 4: Constraint Escalation**
- Start with a mild constraint, generate 5 ideas
- Tighten the constraint, generate 5 more
- Tighten again, generate 5 more
- Example: "$10K budget" → "$1K budget" → "$100 budget"
- Good for: Finding the creative sweet spot

**Technique 5: The "Yes, And" Game**
- Build on each idea while adding a constraint layer
- Idea 1: "Simple landing page"
- Yes, and (constraint): "...with no images, text only"
- Yes, and: "...using only questions, no statements"
- Yes, and: "...in under 50 words"
- Good for: Progressive refinement

---

## Evaluation Framework

**Phase 1: Constraint Compliance Check**

For each idea, verify:
- [ ] Respects ALL imposed constraints (no "bending" or exceptions)
- [ ] Uses the constraint as a feature, not a workaround (embraces the limitation)
- [ ] Would be eliminated in unconstrained brainstorming (proves the constraint drove it)

Eliminate any ideas that fail these checks.

**Phase 2: Problem-Solution Fit**

For remaining ideas, assess:
- [ ] Addresses the original creative challenge
- [ ] Meets success criteria (if measurable)
- [ ] Is actionable with available resources
- [ ] Differentiates from existing approaches

Rank ideas by problem fit.

**Phase 3: Novelty Assessment**

For top-ranked ideas, evaluate:
- **Novel (5)**: Completely unexpected angle, wouldn't exist without the constraint
- **Fresh (4)**: Interesting twist on an existing concept, constraint made it distinctive
- **Improved (3)**: Better version of a known approach, constraint forced refinement
- **Incremental (2)**: Slight variation, constraint didn't add much
- **Derivative (1)**: Essentially the same as existing, constraint was superficial

Select ideas scoring 4-5 for refinement.

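Phase 3 reduces to a threshold filter; a minimal sketch, where the idea names and scores are placeholders:

```python
# novelty_filter.py - sketch of Phase 3: keep only ideas scoring Fresh (4) or Novel (5).
ideas = [
    ("Text-only newsletter", 3),        # Improved
    ("Six-word story newsletter", 5),   # Novel
    ("Slightly shorter newsletter", 2), # Incremental
]

def select_for_refinement(scored_ideas, threshold: int = 4):
    return [(name, score) for name, score in scored_ideas if score >= threshold]

print(select_for_refinement(ideas))  # [('Six-word story newsletter', 5)]
```
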
**Phase 4: Refinement**

For selected ideas:
1. **Combine elements:** Can you merge strengths from multiple ideas?
2. **Subtract complexity:** Remove anything non-essential
3. **Strengthen the constraint insight:** Make the constraint-creativity link more explicit
4. **Add implementation details:** How would this actually work?
5. **Acknowledge limitations:** Where does this solution fall short?

---

## Quality Checklist

Before delivering `constraint-based-creativity.md` to the user, verify:

**Constraint Integrity:**
- [ ] Constraints are clearly stated and rationalized
- [ ] All top solutions genuinely respect the constraints (no cheating)
- [ ] Constraint enforcement was rigorous during ideation
- [ ] Document includes 1-3 constraints (not over-constrained)

**Idea Volume:**
- [ ] Generated 20+ ideas minimum
- [ ] Documented "failed" ideas and insights
- [ ] Showed a quantity-before-quality approach
- [ ] Timeboxed generation to avoid perfectionism

**Solution Quality:**
- [ ] Selected 2-3 strongest solutions
- [ ] Solutions are novel (not incremental variations)
- [ ] Solutions solve the original problem
- [ ] Solutions are actionable (not just conceptual)
- [ ] Strengths and limitations are honestly assessed

**Creative Causality:**
- [ ] Explanation of HOW constraints drove creativity
- [ ] Clear link between limitation and breakthrough
- [ ] Solutions wouldn't exist in unconstrained brainstorming
- [ ] Identified what thinking pattern was broken

**Documentation:**
- [ ] Problem statement is clear
- [ ] Context explains why stuck/what's been tried
- [ ] Success criteria are stated
- [ ] All ideas documented (including volume metrics)
- [ ] Next steps are actionable

---

## Common Pitfalls
|
||||
|
||||
**Pitfall 1: Bending constraints mid-process**
|
||||
- **Symptom:** "This constraint is too hard, can we adjust it?"
|
||||
- **Fix:** Constraint difficulty is the point. Breakthroughs happen when you can't take the easy path.
|
||||
|
||||
**Pitfall 2: Accepting incremental ideas**
|
||||
- **Symptom:** Ideas that are slight variations of existing approaches
|
||||
- **Fix:** Use novelty assessment. If it scores < 4, keep generating.
|
||||
|
||||
**Pitfall 3: Over-constraining**
|
||||
- **Symptom:** Zero ideas generated, complete creative paralysis
|
||||
- **Fix:** Reduce to 1-2 constraints max. Add constraints progressively, not all at once.
|
||||
|
||||
**Pitfall 4: Arbitrary constraints**
|
||||
- **Symptom:** Constraint has no relationship to the creative block
|
||||
- **Fix:** Choose constraints strategically (see Constraint Selection Guide). Constraint should counter the stuck pattern.
|
||||
|
||||
**Pitfall 5: Skipping volume phase**
|
||||
- **Symptom:** Evaluating/refining ideas before generating quantity
|
||||
- **Fix:** Force 20+ ideas before any judgment. Set timer and don't stop early.
|
||||
|
||||
**Pitfall 6: Missing the causality**
|
||||
- **Symptom:** Can't explain how constraint drove the creativity
|
||||
- **Fix:** If solution could exist without constraint, it's not constraint-based creativity. Keep generating.
|
||||
|
||||
**Pitfall 7: Confusing constraint-based with regular brainstorming**
|
||||
- **Symptom:** Treating constraints as optional or as framing device only
|
||||
- **Fix:** Constraints must be enforced rigorously. They're not suggestions.
|
||||
|
||||
**Pitfall 8: Stopping at conceptual**
|
||||
- **Symptom:** Solutions are interesting but not actionable
|
||||
- **Fix:** Add implementation notes. Verify solution can actually be executed.
|
||||
332
skills/d3-visualization/SKILL.md
Normal file
@@ -0,0 +1,332 @@
|
||||
---
|
||||
name: d3-visualization
|
||||
description: Use when creating custom, interactive data visualizations with D3.js—building bar/line/scatter charts from scratch, creating network diagrams or geographic maps, binding changing data to visual elements, adding zoom/pan/brush interactions, animating chart transitions, or when chart libraries (Highcharts, Chart.js) don't support your specific visualization design and you need low-level control over data-driven DOM manipulation, scales, shapes, and layouts.
|
||||
---
|
||||
|
||||
# D3.js Data Visualization
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Read This First](#read-this-first)
|
||||
- [Workflows](#workflows)
|
||||
- [Create Basic Chart Workflow](#create-basic-chart-workflow)
|
||||
- [Update Visualization with New Data](#update-visualization-with-new-data)
|
||||
- [Create Advanced Layout Workflow](#create-advanced-layout-workflow)
|
||||
- [Path Selection Menu](#path-selection-menu)
|
||||
- [Quick Reference](#quick-reference)
|
||||
|
||||
---
|
||||
|
||||
## Read This First
|
||||
|
||||
### What This Skill Does
|
||||
|
||||
This skill helps you create custom, interactive data visualizations using D3.js (Data-Driven Documents). D3 provides low-level building blocks for data-driven DOM manipulation, visual encoding, layout algorithms, and interactions—enabling bespoke visualizations that chart libraries can't provide.
|
||||
|
||||
### When to Use D3
|
||||
|
||||
**Use D3 when:**
|
||||
- Chart libraries don't support your specific design
|
||||
- You need full customization control
|
||||
- Creating network graphs, hierarchies, or geographic maps
|
||||
- Building interactive dashboards with linked views
|
||||
- Animating data changes smoothly
|
||||
- Working with complex or unconventional data structures
|
||||
|
||||
**Don't use D3 when:**
|
||||
- Simple bar/line charts suffice (use Chart.js, Highcharts—easier)
|
||||
- You need 3D visualizations (use Three.js, WebGL)
|
||||
- Massive datasets >10K points without aggregation (performance limitations)
|
||||
- You're unfamiliar with JavaScript/SVG/CSS (prerequisites required)
|
||||
|
||||
### Core Concepts
|
||||
|
||||
**Data Joins**: Bind arrays to DOM elements, creating one-to-one correspondence
|
||||
**Scales**: Transform data values → visual values (pixels, colors, sizes)
|
||||
**Shapes**: Generate SVG paths for lines, areas, arcs from data
|
||||
**Layouts**: Calculate positions for complex visualizations (networks, trees, maps)
|
||||
**Transitions**: Animate smooth changes between states
|
||||
**Interactions**: Add zoom, pan, drag, brush selection behaviors
|
||||
|
||||
### Skill Structure
|
||||
|
||||
- **[Getting Started](resources/getting-started.md)**: Setup, prerequisites, first visualization
|
||||
- **[Selections & Data Joins](resources/selections-datajoins.md)**: DOM manipulation, data binding
|
||||
- **[Scales & Axes](resources/scales-axes.md)**: Data transformation, axis generation
|
||||
- **[Shapes & Layouts](resources/shapes-layouts.md)**: Path generators, basic layouts
|
||||
- **[Advanced Layouts](resources/advanced-layouts.md)**: Force simulation, hierarchies, geographic maps
|
||||
- **[Transitions & Interactions](resources/transitions-interactions.md)**: Animations, zoom/pan/drag/brush
|
||||
- **[Workflows](resources/workflows.md)**: Step-by-step guides for common chart types
|
||||
- **[Common Patterns](resources/common-patterns.md)**: Reusable code templates
|
||||
|
||||
---
|
||||
|
||||
## Workflows
|
||||
|
||||
Choose a workflow based on your current task:
|
||||
|
||||
### Create Basic Chart Workflow
|
||||
|
||||
**Use when:** Building bar, line, or scatter charts from scratch
|
||||
|
||||
**Time:** 1-2 hours
|
||||
|
||||
**Copy this checklist and track your progress:**
|
||||
|
||||
```
|
||||
Basic Chart Progress:
|
||||
- [ ] Step 1: Set up SVG container with margins
|
||||
- [ ] Step 2: Load and parse data
|
||||
- [ ] Step 3: Create scales (x, y)
|
||||
- [ ] Step 4: Generate axes
|
||||
- [ ] Step 5: Bind data and create visual elements
|
||||
- [ ] Step 6: Style and add interactivity
|
||||
```
|
||||
|
||||
**Step 1: Set up SVG container with margins**
|
||||
|
||||
Create SVG element with proper dimensions. Define margins for axes: `{top: 20, right: 20, bottom: 30, left: 40}`. Calculate inner width/height: `width - margin.left - margin.right`. See [Getting Started](resources/getting-started.md#setup-svg-container).
|
||||
|
||||
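A minimal sketch of the margin convention (dimensions and the `svg` selector are placeholder assumptions):

```javascript
// Outer size minus margins = inner drawing area
const margin = {top: 20, right: 20, bottom: 30, left: 40};
const width = 600 - margin.left - margin.right;
const height = 400 - margin.top - margin.bottom;

const g = d3.select('svg')
  .attr('width', width + margin.left + margin.right)
  .attr('height', height + margin.top + margin.bottom)
  .append('g') // all chart content goes into this shifted group
  .attr('transform', `translate(${margin.left}, ${margin.top})`);
```
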
**Step 2: Load and parse data**
|
||||
|
||||
Use `d3.csv('data.csv')` for external files or define data array directly. Parse dates with `d3.timeParse('%Y-%m-%d')` for time series. Convert strings to numbers for CSV data using conversion function. See [Getting Started](resources/getting-started.md#loading-data).
|
||||
|
||||
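As a sketch (file name and column names are placeholders; run inside an ES module or async function):

```javascript
const parseDate = d3.timeParse('%Y-%m-%d');

const data = await d3.csv('data.csv', d => ({
  date: parseDate(d.date), // string → Date
  value: +d.value          // string → number
}));
```
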
**Step 3: Create scales**
|
||||
|
||||
Choose scale types based on data: `scaleBand` (categorical), `scaleTime` (temporal), `scaleLinear` (quantitative). Set domains from data using `d3.extent()`, `d3.max()`, or manual ranges. Set ranges from SVG dimensions. See [Scales & Axes](resources/scales-axes.md#scale-types).
|
||||
|
||||
**Step 4: Generate axes**
|
||||
|
||||
Create axis generators: `d3.axisBottom(xScale)`, `d3.axisLeft(yScale)`. Append g elements positioned with transforms. Call axis generators: `.call(axis)`. Customize ticks with `.ticks()`, `.tickFormat()`. See [Scales & Axes](resources/scales-axes.md#creating-axes).
|
||||
|
||||
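A hedged sketch, assuming `g`, `xScale`, `yScale`, and `height` from the previous steps:

```javascript
// Axis groups are positioned with transforms; .call() runs the generator on the group
g.append('g')
  .attr('class', 'x-axis')
  .attr('transform', `translate(0, ${height})`)
  .call(d3.axisBottom(xScale).ticks(5));

g.append('g')
  .attr('class', 'y-axis')
  .call(d3.axisLeft(yScale).tickFormat(d3.format('.0f')));
```
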
**Step 5: Bind data and create visual elements**
|
||||
|
||||
Use data join pattern: `svg.selectAll(type).data(array).join(type)`. Set attributes using scales and accessor functions: `.attr('x', d => xScale(d.category))`. For line charts, use `d3.line()` generator. For scatter plots, create circles with `cx`, `cy`, `r` attributes. See [Selections & Data Joins](resources/selections-datajoins.md#data-join-pattern) and [Shapes & Layouts](resources/shapes-layouts.md).
|
||||
|
||||
**Step 6: Style and add interactivity**
|
||||
|
||||
Apply colors: `.attr('fill', ...)`, `.attr('stroke', ...)`. Add hover effects: `.on('mouseover', ...)` with tooltip. Add click handlers for drill-down. Apply transitions for initial animation (optional). See [Transitions & Interactions](resources/transitions-interactions.md#tooltips) and [Common Patterns](resources/common-patterns.md#tooltip-pattern).
|
||||
|
||||
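For example, a minimal hover highlight (colors are arbitrary):

```javascript
// `this` is the hovered DOM node inside a non-arrow handler
svg.selectAll('rect')
  .on('mouseover', function () { d3.select(this).attr('fill', 'orange'); })
  .on('mouseout', function () { d3.select(this).attr('fill', 'steelblue'); });
```
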
---
|
||||
|
||||
### Update Visualization with New Data
|
||||
|
||||
**Use when:** Refreshing charts with new data (real-time, filters, user interactions)
|
||||
|
||||
**Time:** 30 minutes - 1 hour
|
||||
|
||||
**Copy this checklist:**
|
||||
|
||||
```
|
||||
Update Progress:
|
||||
- [ ] Step 1: Encapsulate visualization in update function
|
||||
- [ ] Step 2: Update scale domains if needed
|
||||
- [ ] Step 3: Re-bind data with key function
|
||||
- [ ] Step 4: Add transitions to join
|
||||
- [ ] Step 5: Update attributes with new values
|
||||
- [ ] Step 6: Trigger update on data change
|
||||
```
|
||||
|
||||
**Step 1: Encapsulate visualization in update function**
|
||||
|
||||
Wrap steps 3-5 from basic chart workflow in `function update(newData) { ... }`. This makes visualization reusable for any dataset. See [Workflows](resources/workflows.md#update-pattern).
|
||||
|
||||
**Step 2: Update scale domains**
|
||||
|
||||
Recalculate domains when data range changes: `yScale.domain([0, d3.max(newData, d => d.value)])`. Update axes with transition: `svg.select('.y-axis').transition().duration(500).call(yAxis)`. See [Selections & Data Joins](resources/selections-datajoins.md#updating-scales).
|
||||
|
||||
**Step 3: Re-bind data with key function**
|
||||
|
||||
Use key function for object constancy: `.data(newData, d => d.id)`. Ensures elements track data items, not array positions. Critical for correct transitions. See [Selections & Data Joins](resources/selections-datajoins.md#key-functions).
|
||||
|
||||
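Sketch (assumes each datum carries a stable `id` field):

```javascript
// Without a key, data is matched to elements by array index;
// with a key, each element tracks the datum with the same id
svg.selectAll('rect')
  .data(newData, d => d.id)
  .join('rect');
```
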
**Step 4: Add transitions to join**
|
||||
|
||||
Insert `.transition().duration(500)` before attribute updates. Specify easing with `.ease(d3.easeCubicOut)`. For custom enter/exit effects, use enter/exit functions in `.join()`. See [Transitions & Interactions](resources/transitions-interactions.md#basic-transitions).
|
||||
|
||||
**Step 5: Update attributes with new values**
|
||||
|
||||
Set positions/sizes using updated scales: `.attr('y', d => yScale(d.value))`, `.attr('height', d => height - yScale(d.value))`. Transitions animate from old to new values. See [Common Patterns](resources/common-patterns.md#bar-chart-template).
|
||||
|
||||
**Step 6: Trigger update on data change**
|
||||
|
||||
Call `update(newData)` when data changes: button clicks, timers (`setInterval`), WebSocket messages, API responses. For real-time, use sliding window to limit data points. See [Workflows](resources/workflows.md#real-time-updates).
|
||||
|
||||
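A sketch of the sliding-window idea; `fetchLatestPoint` is a hypothetical stand-in for your data source:

```javascript
const buffer = [];

setInterval(async () => {
  const point = await fetchLatestPoint(); // placeholder: WebSocket/API/etc.
  buffer.push(point);
  if (buffer.length > 50) buffer.shift(); // keep the last 50 points
  update(buffer);
}, 2000);
```
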
---
|
||||
|
||||
### Create Advanced Layout Workflow
|
||||
|
||||
**Use when:** Building network graphs, hierarchies, or geographic maps
|
||||
|
||||
**Time:** 2-4 hours
|
||||
|
||||
**Copy this checklist:**
|
||||
|
||||
```
|
||||
Advanced Layout Progress:
|
||||
- [ ] Step 1: Choose appropriate layout type
|
||||
- [ ] Step 2: Prepare and structure data
|
||||
- [ ] Step 3: Create and configure layout
|
||||
- [ ] Step 4: Apply layout to data
|
||||
- [ ] Step 5: Bind computed properties to elements
|
||||
- [ ] Step 6: Add interactions (drag, zoom)
|
||||
```
|
||||
|
||||
**Step 1: Choose appropriate layout type**
|
||||
|
||||
**Force Simulation**: Network diagrams, organic clustering. **Hierarchies**: Tree, cluster (node-link), treemap, pack, partition (space-filling). **Geographic**: Maps with projections. **Chord**: Flow diagrams. See [Advanced Layouts](resources/advanced-layouts.md#choosing-layout) for decision guidance.
|
||||
|
||||
**Step 2: Prepare and structure data**
|
||||
|
||||
**Force**: `nodes = [{id, group}]`, `links = [{source, target}]`. **Hierarchy**: Nested objects with children arrays, convert with `d3.hierarchy(data)`. **Geographic**: GeoJSON features. See [Advanced Layouts](resources/advanced-layouts.md#data-structures).
|
||||
|
||||
**Step 3: Create and configure layout**
|
||||
|
||||
**Force**: `d3.forceSimulation(nodes).force('link', d3.forceLink(links)).force('charge', d3.forceManyBody())`. **Hierarchy**: `d3.treemap().size([width, height])`. **Geographic**: `d3.geoMercator().fitExtent([[0,0], [width,height]], geojson)`. See [Advanced Layouts](resources/advanced-layouts.md) for each layout type.
|
||||
|
||||
**Step 4: Apply layout to data**
|
||||
|
||||
**Force**: Simulation runs automatically, updating node positions on each tick. **Hierarchy**: Call the layout on the root: `treemap(root)`. **Geographic**: No separate application step; the projection is applied inside the path generator. See [Advanced Layouts](resources/advanced-layouts.md#applying-layouts).
|
||||
|
||||
**Step 5: Bind computed properties to elements**
|
||||
|
||||
**Force**: Update `cx`, `cy` in tick handler: `node.attr('cx', d => d.x)`. **Hierarchy**: Use `d.x0`, `d.x1`, `d.y0`, `d.y1` for rectangles. **Geographic**: Use `path(feature)` for `d` attribute. See [Workflows](resources/workflows.md) for layout-specific examples.
|
||||
|
||||
**Step 6: Add interactions**
|
||||
|
||||
**Drag** for force networks: `node.call(d3.drag().on('drag', dragHandler))`. **Zoom** for maps/large networks: `svg.call(d3.zoom().on('zoom', zoomHandler))`. **Click** for hierarchy drill-down. See [Transitions & Interactions](resources/transitions-interactions.md).
|
||||
|
||||
---
|
||||
|
||||
## Path Selection Menu
|
||||
|
||||
**What would you like to do?**
|
||||
|
||||
1. **[I'm new to D3](resources/getting-started.md)** - Setup environment, understand prerequisites, create first visualization
|
||||
|
||||
2. **[Build a basic chart](resources/workflows.md#basic-charts)** - Bar, line, or scatter plot step-by-step
|
||||
|
||||
3. **[Transform data with scales](resources/scales-axes.md)** - Map data values to visual properties (positions, colors, sizes)
|
||||
|
||||
4. **[Bind data to elements](resources/selections-datajoins.md)** - Connect arrays to DOM elements, handle dynamic updates
|
||||
|
||||
5. **[Create network/hierarchy/map](resources/advanced-layouts.md)** - Force-directed graphs, treemaps, geographic visualizations
|
||||
|
||||
6. **[Add animations](resources/transitions-interactions.md#transitions)** - Smooth transitions between chart states
|
||||
|
||||
7. **[Add interactivity](resources/transitions-interactions.md#interactions)** - Zoom, pan, drag, brush selection, tooltips
|
||||
|
||||
8. **[Update chart with new data](resources/workflows.md#update-pattern)** - Handle real-time data, filters, user interactions
|
||||
|
||||
9. **[Get code templates](resources/common-patterns.md)** - Copy-paste-modify templates for common patterns
|
||||
|
||||
10. **[Understand D3 concepts](resources/getting-started.md#core-concepts)** - Deep dive into data joins, scales, generators, layouts
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### Data Join Pattern (Core D3 Workflow)
|
||||
|
||||
```javascript
|
||||
// 1. Select container
|
||||
const svg = d3.select('svg');
|
||||
|
||||
// 2. Bind data
|
||||
svg.selectAll('circle')
|
||||
.data(dataArray)
|
||||
.join('circle') // Create/update/remove elements automatically
|
||||
.attr('cx', d => d.x) // Use accessor functions (d = datum, i = index)
|
||||
.attr('cy', d => d.y)
|
||||
.attr('r', 5);
|
||||
```
|
||||
|
||||
### Scale Creation (Data → Visual Transformation)
|
||||
|
||||
```javascript
|
||||
// For quantitative data
|
||||
const xScale = d3.scaleLinear()
|
||||
.domain([0, 100]) // Data range
|
||||
.range([0, 500]); // Pixel range
|
||||
|
||||
// For categorical data
|
||||
const xScale = d3.scaleBand()
|
||||
.domain(['A', 'B', 'C'])
|
||||
.range([0, 500])
|
||||
.padding(0.1);
|
||||
|
||||
// For temporal data
|
||||
const xScale = d3.scaleTime()
|
||||
.domain([new Date(2020, 0, 1), new Date(2020, 11, 31)])
|
||||
.range([0, 500]);
|
||||
```
|
||||
|
||||
### Shape Generators (SVG Path Creation)
|
||||
|
||||
```javascript
|
||||
// Line chart
|
||||
const line = d3.line()
|
||||
.x(d => xScale(d.date))
|
||||
.y(d => yScale(d.value));
|
||||
|
||||
svg.append('path')
|
||||
.datum(data) // Use .datum() for single data item
|
||||
.attr('d', line) // Line generator creates path data
|
||||
.attr('fill', 'none')
|
||||
.attr('stroke', 'steelblue');
|
||||
```
|
||||
|
||||
### Transitions (Animation)
|
||||
|
||||
```javascript
|
||||
// Add transition before attribute updates
|
||||
svg.selectAll('rect')
|
||||
.data(newData)
|
||||
.join('rect')
|
||||
.transition() // Everything after this is animated
|
||||
.duration(500) // Milliseconds
|
||||
.attr('height', d => yScale(d.value));
|
||||
```
|
||||
|
||||
### Common Scale Types
|
||||
|
||||
| Data Type | Task | Scale |
|
||||
|-----------|------|-------|
|
||||
| Quantitative (linear) | Position, size | `scaleLinear()` |
|
||||
| Quantitative (exponential) | Compress range | `scaleLog()`, `scalePow()` |
|
||||
| Quantitative → Circle area | Size circles | `scaleSqrt()` |
|
||||
| Categorical | Bars, groups | `scaleBand()`, `scalePoint()` |
|
||||
| Categorical → Colors | Color encoding | `scaleOrdinal()` |
|
||||
| Temporal | Time series | `scaleTime()` |
|
||||
| Quantitative → Colors | Heatmaps | `scaleSequential()` |
|
||||
|
||||
### D3 Module Imports (ES6)
|
||||
|
||||
```javascript
|
||||
// Specific functions
|
||||
import { select, selectAll } from 'd3-selection';
|
||||
import { scaleLinear, scaleBand } from 'd3-scale';
|
||||
import { line, area } from 'd3-shape';
|
||||
|
||||
// Entire D3 namespace
|
||||
import * as d3 from 'd3';
|
||||
```
|
||||
|
||||
### Key Resources by Task
|
||||
|
||||
- **Setup & First Chart**: [Getting Started](resources/getting-started.md)
|
||||
- **Data Binding**: [Selections & Data Joins](resources/selections-datajoins.md)
|
||||
- **Scales & Axes**: [Scales & Axes](resources/scales-axes.md)
|
||||
- **Chart Types**: [Workflows](resources/workflows.md) + [Common Patterns](resources/common-patterns.md)
|
||||
- **Networks & Trees**: [Advanced Layouts](resources/advanced-layouts.md#force-simulation) + [Advanced Layouts](resources/advanced-layouts.md#hierarchies)
|
||||
- **Maps**: [Advanced Layouts](resources/advanced-layouts.md#geographic-maps)
|
||||
- **Animation**: [Transitions & Interactions](resources/transitions-interactions.md#transitions)
|
||||
- **Interactivity**: [Transitions & Interactions](resources/transitions-interactions.md#interactions)
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **New to D3?** Start with [Getting Started](resources/getting-started.md)
|
||||
2. **Know basics?** Jump to [Workflows](resources/workflows.md) for specific chart types
|
||||
3. **Need reference?** Use [Common Patterns](resources/common-patterns.md) for templates
|
||||
4. **Build custom viz?** Explore [Advanced Layouts](resources/advanced-layouts.md)
|
||||
464
skills/d3-visualization/resources/advanced-layouts.md
Normal file
@@ -0,0 +1,464 @@
|
||||
# Advanced Layouts
|
||||
|
||||
## Why Layouts Matter
|
||||
|
||||
**The Problem**: Calculating positions for complex visualizations (network graphs, treemaps, maps) involves sophisticated algorithms—force-directed simulation, spatial partitioning, map projections—that are mathematically complex.
|
||||
|
||||
**D3's Solution**: Layout generators compute positions/sizes/angles automatically. You provide data, configure layout, receive computed coordinates.
|
||||
|
||||
**Key Principle**: "Layouts transform data, don't render." Layouts add properties (`x`, `y`, `width`, etc.) that you bind to visual elements.
|
||||
|
||||
---
|
||||
|
||||
## Force Simulation
|
||||
|
||||
### Why Force Layouts
|
||||
|
||||
**Use Case**: Network diagrams, organic clustering where fixed positions are unnatural.
|
||||
|
||||
**How It Works**: Physics simulation with forces (repulsion, attraction, gravity) that iteratively compute node positions.
|
||||
|
||||
---
|
||||
|
||||
### Basic Setup
|
||||
|
||||
```javascript
|
||||
const nodes = [
|
||||
{id: 'A', group: 1},
|
||||
{id: 'B', group: 1},
|
||||
{id: 'C', group: 2}
|
||||
];
|
||||
|
||||
const links = [
|
||||
{source: 'A', target: 'B'},
|
||||
{source: 'B', target: 'C'}
|
||||
];
|
||||
|
||||
const simulation = d3.forceSimulation(nodes)
|
||||
.force('link', d3.forceLink(links).id(d => d.id))
|
||||
.force('charge', d3.forceManyBody())
|
||||
.force('center', d3.forceCenter(width / 2, height / 2));
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Forces
|
||||
|
||||
**forceLink**: Maintains fixed distance between connected nodes
|
||||
```javascript
|
||||
.force('link', d3.forceLink(links)
|
||||
.id(d => d.id)
|
||||
.distance(50))
|
||||
```
|
||||
|
||||
**forceManyBody**: Repulsion (negative) or attraction (positive)
|
||||
```javascript
|
||||
.force('charge', d3.forceManyBody().strength(-100))
|
||||
```
|
||||
|
||||
**forceCenter**: Pulls nodes toward center point
|
||||
```javascript
|
||||
.force('center', d3.forceCenter(width / 2, height / 2))
|
||||
```
|
||||
|
||||
**forceCollide**: Prevents overlapping circles
|
||||
```javascript
|
||||
.force('collide', d3.forceCollide().radius(20))
|
||||
```
|
||||
|
||||
**forceX / forceY**: Attracts to specific coordinates
|
||||
```javascript
|
||||
.force('x', d3.forceX(width / 2).strength(0.1))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Rendering
|
||||
|
||||
```javascript
|
||||
const link = svg.selectAll('line')
|
||||
.data(links)
|
||||
.join('line')
|
||||
.attr('stroke', '#999');
|
||||
|
||||
const node = svg.selectAll('circle')
|
||||
.data(nodes)
|
||||
.join('circle')
|
||||
.attr('r', 10)
|
||||
.attr('fill', d => colorScale(d.group));
|
||||
|
||||
simulation.on('tick', () => {
|
||||
link
|
||||
.attr('x1', d => d.source.x)
|
||||
.attr('y1', d => d.source.y)
|
||||
.attr('x2', d => d.target.x)
|
||||
.attr('y2', d => d.target.y);
|
||||
|
||||
node
|
||||
.attr('cx', d => d.x)
|
||||
.attr('cy', d => d.y);
|
||||
});
|
||||
```
|
||||
|
||||
**Key**: Update positions in `tick` handler as simulation runs.
|
||||
|
||||
---
|
||||
|
||||
### Drag Behavior
|
||||
|
||||
```javascript
|
||||
function drag(simulation) {
|
||||
function dragstarted(event) {
|
||||
if (!event.active) simulation.alphaTarget(0.3).restart();
|
||||
event.subject.fx = event.subject.x;
|
||||
event.subject.fy = event.subject.y;
|
||||
}
|
||||
|
||||
function dragged(event) {
|
||||
event.subject.fx = event.x;
|
||||
event.subject.fy = event.y;
|
||||
}
|
||||
|
||||
function dragended(event) {
|
||||
if (!event.active) simulation.alphaTarget(0);
|
||||
event.subject.fx = null;
|
||||
event.subject.fy = null;
|
||||
}
|
||||
|
||||
return d3.drag()
|
||||
.on('start', dragstarted)
|
||||
.on('drag', dragged)
|
||||
.on('end', dragended);
|
||||
}
|
||||
|
||||
node.call(drag(simulation));
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Hierarchies
|
||||
|
||||
### Creating Hierarchies
|
||||
|
||||
```javascript
|
||||
const data = {
|
||||
name: 'root',
|
||||
children: [
|
||||
{name: 'child1', value: 10},
|
||||
{name: 'child2', value: 20, children: [{name: 'grandchild', value: 5}]}
|
||||
]
|
||||
};
|
||||
|
||||
const root = d3.hierarchy(data)
|
||||
.sum(d => d.value) // Aggregate values up tree
|
||||
.sort((a, b) => b.value - a.value);
|
||||
```
|
||||
|
||||
**Key Methods**:
|
||||
- `hierarchy(data)`: Creates hierarchy from nested object
|
||||
- `.sum(accessor)`: Computes values (leaf → root)
|
||||
- `.sort(comparator)`: Orders siblings
|
||||
- `.descendants()`: All nodes (breadth-first)
|
||||
- `.leaves()`: Leaf nodes only
|
||||
|
||||
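For instance, with the `root` built above:

```javascript
// Inspect the computed hierarchy
root.descendants().forEach(d => console.log(d.depth, d.data.name, d.value));
console.log(root.leaves().length); // number of leaf nodes
```
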
---
|
||||
|
||||
### Tree Layout
|
||||
|
||||
**Use**: Node-link diagrams (org charts, file systems)
|
||||
|
||||
```javascript
|
||||
const tree = d3.tree().size([width, height]);
|
||||
tree(root);
|
||||
|
||||
// Creates x, y properties on each node
|
||||
svg.selectAll('circle')
|
||||
.data(root.descendants())
|
||||
.join('circle')
|
||||
.attr('cx', d => d.x)
|
||||
.attr('cy', d => d.y)
|
||||
.attr('r', 5);
|
||||
|
||||
// Links
|
||||
svg.selectAll('line')
|
||||
.data(root.links())
|
||||
.join('line')
|
||||
.attr('x1', d => d.source.x)
|
||||
.attr('y1', d => d.source.y)
|
||||
.attr('x2', d => d.target.x)
|
||||
.attr('y2', d => d.target.y);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Treemap Layout
|
||||
|
||||
**Use**: Space-filling rectangles (disk usage, budget allocation)
|
||||
|
||||
```javascript
|
||||
const treemap = d3.treemap()
|
||||
.size([width, height])
|
||||
.padding(1);
|
||||
|
||||
treemap(root);
|
||||
|
||||
svg.selectAll('rect')
|
||||
.data(root.leaves())
|
||||
.join('rect')
|
||||
.attr('x', d => d.x0)
|
||||
.attr('y', d => d.y0)
|
||||
.attr('width', d => d.x1 - d.x0)
|
||||
.attr('height', d => d.y1 - d.y0)
|
||||
.attr('fill', d => colorScale(d.value));
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Pack Layout
|
||||
|
||||
**Use**: Circle packing (bubble charts)
|
||||
|
||||
```javascript
|
||||
const pack = d3.pack().size([width, height]).padding(3);
|
||||
pack(root);
|
||||
|
||||
svg.selectAll('circle')
|
||||
.data(root.descendants())
|
||||
.join('circle')
|
||||
.attr('cx', d => d.x)
|
||||
.attr('cy', d => d.y)
|
||||
.attr('r', d => d.r)
|
||||
.attr('fill', d => d.children ? '#ccc' : colorScale(d.value));
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Partition Layout
|
||||
|
||||
**Use**: Sunburst, icicle charts
|
||||
|
||||
```javascript
|
||||
const partition = d3.partition().size([2 * Math.PI, radius]);
|
||||
partition(root);
|
||||
|
||||
const arc = d3.arc()
|
||||
.startAngle(d => d.x0)
|
||||
.endAngle(d => d.x1)
|
||||
.innerRadius(d => d.y0)
|
||||
.outerRadius(d => d.y1);
|
||||
|
||||
svg.selectAll('path')
|
||||
.data(root.descendants())
|
||||
.join('path')
|
||||
.attr('d', arc)
|
||||
.attr('fill', d => colorScale(d.depth));
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Geographic Maps
|
||||
|
||||
### GeoJSON
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "FeatureCollection",
|
||||
"features": [
|
||||
{
|
||||
"type": "Feature",
|
||||
"geometry": {
|
||||
"type": "Polygon",
|
||||
"coordinates": [[[-100, 40], [-100, 50], [-90, 50], [-90, 40], [-100, 40]]]
|
||||
},
|
||||
"properties": {"name": "Region A"}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Projections
|
||||
|
||||
```javascript
|
||||
// Mercator (world maps)
|
||||
const projection = d3.geoMercator()
|
||||
.center([0, 0])
|
||||
.scale(150)
|
||||
.translate([width / 2, height / 2]);
|
||||
|
||||
// Albers (US maps)
|
||||
const projection = d3.geoAlbersUsa().scale(1000).translate([width / 2, height / 2]);
|
||||
|
||||
// Orthographic (globe)
|
||||
const projection = d3.geoOrthographic().scale(250).translate([width / 2, height / 2]);
|
||||
```
|
||||
|
||||
**Common Projections**: Mercator, Albers, Equirectangular, Orthographic, Azimuthal, Conic
|
||||
|
||||
---
|
||||
|
||||
### Path Generator
|
||||
|
||||
```javascript
|
||||
const path = d3.geoPath().projection(projection);
|
||||
|
||||
d3.json('countries.geojson').then(geojson => {
|
||||
svg.selectAll('path')
|
||||
.data(geojson.features)
|
||||
.join('path')
|
||||
.attr('d', path)
|
||||
.attr('fill', '#ccc')
|
||||
.attr('stroke', '#fff');
|
||||
});
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Auto-Fit
|
||||
|
||||
```javascript
|
||||
const projection = d3.geoMercator()
|
||||
.fitExtent([[0, 0], [width, height]], geojson);
|
||||
```
|
||||
|
||||
**fitExtent**: Automatically scales/centers projection to fit data in bounds.
|
||||
|
||||
---
|
||||
|
||||
### Choropleth
|
||||
|
||||
```javascript
|
||||
const colorScale = d3.scaleSequential(d3.interpolateBlues)
|
||||
.domain([0, d3.max(data, d => d.value)]);
|
||||
|
||||
svg.selectAll('path')
|
||||
.data(geojson.features)
|
||||
.join('path')
|
||||
.attr('d', path)
|
||||
.attr('fill', d => {
|
||||
const value = data.find(v => v.id === d.id)?.value;
|
||||
return value ? colorScale(value) : '#ccc';
|
||||
});
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Chord Diagrams
|
||||
|
||||
### Use Case
|
||||
Visualize flows/relationships between entities (migrations, trade, connections).
|
||||
|
||||
---
|
||||
|
||||
### Data Format
|
||||
|
||||
```javascript
|
||||
const matrix = [
|
||||
[0, 10, 20], // From A to: A, B, C
|
||||
[15, 0, 5], // From B to: A, B, C
|
||||
[25, 30, 0] // From C to: A, B, C
|
||||
];
|
||||
```
|
||||
|
||||
**Matrix[i][j]**: Flow from entity i to entity j.
|
||||
|
||||
---
|
||||
|
||||
### Creating Chord
|
||||
|
||||
```javascript
|
||||
const chord = d3.chord()
|
||||
.padAngle(0.05)
|
||||
.sortSubgroups(d3.descending);
|
||||
|
||||
const chords = chord(matrix);
|
||||
|
||||
const arc = d3.arc()
|
||||
.innerRadius(200)
|
||||
.outerRadius(220);
|
||||
|
||||
const ribbon = d3.ribbon()
|
||||
.radius(200);
|
||||
|
||||
// Outer arcs (groups)
|
||||
svg.selectAll('path.arc')
  .data(chords.groups)
  .join('path')
  .attr('class', 'arc')
  .attr('d', arc)
  .attr('fill', (d, i) => colorScale(i));
|
||||
|
||||
// Inner ribbons (connections)
|
||||
svg.selectAll('path.ribbon')
|
||||
.data(chords)
|
||||
.join('path')
|
||||
.attr('class', 'ribbon')
|
||||
.attr('d', ribbon)
|
||||
.attr('fill', d => colorScale(d.source.index))
|
||||
.attr('opacity', 0.7);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Choosing Layouts
|
||||
|
||||
| Visualization | Layout |
|
||||
|---------------|--------|
|
||||
| Network graph, organic clusters | Force simulation |
|
||||
| Org chart, file tree (node-link) | Tree layout |
|
||||
| Space-filling rectangles | Treemap |
|
||||
| Bubble chart, circle packing | Pack layout |
|
||||
| Sunburst, icicle chart | Partition layout |
|
||||
| World/regional maps | Geographic projection |
|
||||
| Flow diagram (migration, trade) | Chord diagram |
|
||||
|
||||
---
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
### Pitfall 1: Not Handling Tick Updates
|
||||
|
||||
```javascript
|
||||
// WRONG - positions never update
|
||||
const node = svg.selectAll('circle').data(nodes).join('circle');
|
||||
simulation.on('tick', () => {}); // Empty!
|
||||
|
||||
// CORRECT
|
||||
simulation.on('tick', () => {
|
||||
node.attr('cx', d => d.x).attr('cy', d => d.y);
|
||||
});
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Pitfall 2: Wrong Hierarchy Accessor
|
||||
|
||||
```javascript
|
||||
// WRONG - d3.hierarchy expects nested structure
|
||||
const root = d3.hierarchy(flatArray);
|
||||
|
||||
// CORRECT - nest data first or use stratify
|
||||
const root = d3.hierarchy(nestedObject);
|
||||
```
|
||||
|
||||
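A sketch of the `d3.stratify` route for flat parent-pointer tables (field names are placeholders):

```javascript
const flat = [
  {id: 'root', parent: null},
  {id: 'a', parent: 'root'},
  {id: 'b', parent: 'root'}
];

const root = d3.stratify()
  .id(d => d.id)
  .parentId(d => d.parent)(flat);
```
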
---
|
||||
|
||||
### Pitfall 3: Forgetting to Apply Layout
|
||||
|
||||
```javascript
|
||||
// WRONG - root has no x, y properties
|
||||
const root = d3.hierarchy(data);
|
||||
svg.selectAll('circle').data(root.descendants()).join('circle').attr('cx', d => d.x);
|
||||
|
||||
// CORRECT - apply layout first
|
||||
const tree = d3.tree().size([width, height]);
|
||||
tree(root); // Now root.descendants() have x, y
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- See complete workflows: [Workflows](workflows.md)
|
||||
- Add interactions: [Transitions & Interactions](transitions-interactions.md)
|
||||
- Use code templates: [Common Patterns](common-patterns.md)
|
||||
221
skills/d3-visualization/resources/common-patterns.md
Normal file
@@ -0,0 +1,221 @@
|
||||
# Common D3 Patterns - Code Templates
|
||||
|
||||
## Bar Chart Template
|
||||
|
||||
```javascript
|
||||
const data = [{label: 'A', value: 30}, {label: 'B', value: 80}, {label: 'C', value: 45}];
|
||||
|
||||
const xScale = d3.scaleBand()
|
||||
.domain(data.map(d => d.label))
|
||||
.range([0, width])
|
||||
.padding(0.1);
|
||||
|
||||
const yScale = d3.scaleLinear()
|
||||
.domain([0, d3.max(data, d => d.value)])
|
||||
.range([height, 0]);
|
||||
|
||||
svg.selectAll('rect')
|
||||
.data(data)
|
||||
.join('rect')
|
||||
.attr('x', d => xScale(d.label))
|
||||
.attr('y', d => yScale(d.value))
|
||||
.attr('width', xScale.bandwidth())
|
||||
.attr('height', d => height - yScale(d.value))
|
||||
.attr('fill', 'steelblue');
|
||||
|
||||
svg.append('g').attr('transform', `translate(0, ${height})`).call(d3.axisBottom(xScale));
|
||||
svg.append('g').call(d3.axisLeft(yScale));
|
||||
```
|
||||
|
||||
## Line Chart Template
|
||||
|
||||
```javascript
|
||||
const parseDate = d3.timeParse('%Y-%m-%d');
|
||||
data.forEach(d => { d.date = parseDate(d.date); });
|
||||
|
||||
const xScale = d3.scaleTime()
|
||||
.domain(d3.extent(data, d => d.date))
|
||||
.range([0, width]);
|
||||
|
||||
const yScale = d3.scaleLinear()
|
||||
.domain([0, d3.max(data, d => d.value)])
|
||||
.range([height, 0]);
|
||||
|
||||
const line = d3.line()
|
||||
.x(d => xScale(d.date))
|
||||
.y(d => yScale(d.value));
|
||||
|
||||
svg.append('path')
|
||||
.datum(data)
|
||||
.attr('d', line)
|
||||
.attr('fill', 'none')
|
||||
.attr('stroke', 'steelblue')
|
||||
.attr('stroke-width', 2);
|
||||
|
||||
svg.append('g').attr('transform', `translate(0, ${height})`).call(d3.axisBottom(xScale));
|
||||
svg.append('g').call(d3.axisLeft(yScale));
|
||||
```
|
||||
|
||||
## Scatter Plot Template
|
||||
|
||||
```javascript
|
||||
const xScale = d3.scaleLinear()
|
||||
.domain(d3.extent(data, d => d.x))
|
||||
.range([0, width]);
|
||||
|
||||
const yScale = d3.scaleLinear()
|
||||
.domain(d3.extent(data, d => d.y))
|
||||
.range([height, 0]);
|
||||
|
||||
const colorScale = d3.scaleOrdinal(d3.schemeCategory10);
|
||||
|
||||
svg.selectAll('circle')
|
||||
.data(data)
|
||||
.join('circle')
|
||||
.attr('cx', d => xScale(d.x))
|
||||
.attr('cy', d => yScale(d.y))
|
||||
.attr('r', 5)
|
||||
.attr('fill', d => colorScale(d.category))
|
||||
.attr('opacity', 0.7);
|
||||
```
|
||||
|
||||
## Network Graph Template
|
||||
|
||||
```javascript
|
||||
const simulation = d3.forceSimulation(nodes)
|
||||
.force('link', d3.forceLink(links).id(d => d.id))
|
||||
.force('charge', d3.forceManyBody().strength(-100))
|
||||
.force('center', d3.forceCenter(width / 2, height / 2));
|
||||
|
||||
const link = svg.selectAll('line')
|
||||
.data(links)
|
||||
.join('line')
|
||||
.attr('stroke', '#999');
|
||||
|
||||
const node = svg.selectAll('circle')
|
||||
.data(nodes)
|
||||
.join('circle')
|
||||
.attr('r', 10)
|
||||
.attr('fill', d => d3.schemeCategory10[d.group]);
|
||||
|
||||
simulation.on('tick', () => {
|
||||
link.attr('x1', d => d.source.x).attr('y1', d => d.source.y)
|
||||
.attr('x2', d => d.target.x).attr('y2', d => d.target.y);
|
||||
node.attr('cx', d => d.x).attr('cy', d => d.y);
|
||||
});
|
||||
```
|
||||
|
||||
## Treemap Template
|
||||
|
||||
```javascript
|
||||
const root = d3.hierarchy(data)
|
||||
.sum(d => d.value)
|
||||
.sort((a, b) => b.value - a.value);
|
||||
|
||||
const treemap = d3.treemap().size([width, height]).padding(1);
|
||||
treemap(root);
|
||||
|
||||
svg.selectAll('rect')
|
||||
.data(root.leaves())
|
||||
.join('rect')
|
||||
.attr('x', d => d.x0)
|
||||
.attr('y', d => d.y0)
|
||||
.attr('width', d => d.x1 - d.x0)
|
||||
.attr('height', d => d.y1 - d.y0)
|
||||
.attr('fill', d => d3.interpolateBlues(d.value / root.value));
|
||||
|
||||
svg.selectAll('text')
|
||||
.data(root.leaves())
|
||||
.join('text')
|
||||
.attr('x', d => d.x0 + 5)
|
||||
.attr('y', d => d.y0 + 15)
|
||||
.text(d => d.data.name);
|
||||
```
|
||||
|
||||
## Geographic Map Template
|
||||
|
||||
```javascript
|
||||
d3.json('world.geojson').then(geojson => {
|
||||
const projection = d3.geoMercator()
|
||||
.fitExtent([[0, 0], [width, height]], geojson);
|
||||
|
||||
const path = d3.geoPath().projection(projection);
|
||||
|
||||
svg.selectAll('path')
|
||||
.data(geojson.features)
|
||||
.join('path')
|
||||
.attr('d', path)
|
||||
.attr('fill', '#ccc')
|
||||
.attr('stroke', '#fff');
|
||||
});
|
||||
```
|
||||
|
||||
## Zoom and Pan Pattern
|
||||
|
||||
```javascript
|
||||
// Append a group that the zoom handler will transform, then draw into it
const g = svg.append('g');
g.selectAll('circle').data(data).join('circle')...

const zoom = d3.zoom()
  .scaleExtent([0.5, 5])
  .on('zoom', (event) => g.attr('transform', event.transform));

svg.call(zoom);
|
||||
```
|
||||
|
||||
## Brush Selection Pattern
|
||||
|
||||
```javascript
|
||||
const brush = d3.brush()
|
||||
.extent([[0, 0], [width, height]])
|
||||
.on('end', (event) => {
|
||||
if (!event.selection) return;
|
||||
const [[x0, y0], [x1, y1]] = event.selection;
|
||||
const selected = data.filter(d => {
|
||||
const x = xScale(d.x), y = yScale(d.y);
|
||||
return x >= x0 && x <= x1 && y >= y0 && y <= y1;
|
||||
});
|
||||
updateChart(selected);
|
||||
});
|
||||
|
||||
svg.append('g').attr('class', 'brush').call(brush);
|
||||
```
|
||||
|
||||
## Transition Pattern
|
||||
|
||||
```javascript
|
||||
function update(newData) {
|
||||
yScale.domain([0, d3.max(newData, d => d.value)]);
|
||||
|
||||
svg.selectAll('rect')
|
||||
.data(newData)
|
||||
.join('rect')
|
||||
.transition().duration(750).ease(d3.easeCubicOut)
|
||||
.attr('y', d => yScale(d.value))
|
||||
.attr('height', d => height - yScale(d.value));
|
||||
|
||||
svg.select('.y-axis').transition().duration(750).call(d3.axisLeft(yScale));
|
||||
}
|
||||
```
|
||||
|
||||
## Tooltip Pattern
|
||||
|
||||
```javascript
|
||||
const tooltip = d3.select('body').append('div')
|
||||
.attr('class', 'tooltip')
|
||||
.style('position', 'absolute')
|
||||
.style('visibility', 'hidden');
|
||||
|
||||
svg.selectAll('circle')
|
||||
.data(data)
|
||||
.join('circle')
|
||||
.attr('r', 5)
|
||||
.on('mouseover', (event, d) => {
|
||||
tooltip.style('visibility', 'visible').html(`Value: ${d.value}`);
|
||||
})
|
||||
.on('mousemove', (event) => {
|
||||
tooltip.style('top', (event.pageY - 10) + 'px')
|
||||
.style('left', (event.pageX + 10) + 'px');
|
||||
})
|
||||
.on('mouseout', () => tooltip.style('visibility', 'hidden'));
|
||||
```
|
||||
87
skills/d3-visualization/resources/evaluation-rubric.json
Normal file
@@ -0,0 +1,87 @@
|
||||
{
|
||||
"skill_name": "d3-visualization",
|
||||
"version": "1.0",
|
||||
"threshold": 3.5,
|
||||
"criteria": [
|
||||
{
|
||||
"name": "Completeness",
|
||||
"weight": 1.5,
|
||||
"description": "Covers full D3 workflow from setup to advanced layouts",
|
||||
"levels": {
|
||||
"5": "Complete coverage: selections, data joins, scales, shapes, layouts, transitions, interactions with examples",
|
||||
"3": "Core concepts covered but missing advanced topics (force layouts, geographic maps) or practical examples",
|
||||
"1": "Incomplete: missing critical concepts like scales or data joins"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Clarity",
|
||||
"weight": 1.4,
|
||||
"description": "Explanations are clear with WHY sections and code examples",
|
||||
"levels": {
|
||||
"5": "Every concept has WHY explanation, clear examples, and common pitfalls documented",
|
||||
"3": "Most concepts explained but some lack context or examples",
|
||||
"1": "Unclear explanations, missing examples, jargon without definitions"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Actionability",
|
||||
"weight": 1.5,
|
||||
"description": "Provides copy-paste templates and step-by-step workflows",
|
||||
"levels": {
|
||||
"5": "Complete code templates for all common charts, step-by-step workflows with checklists",
|
||||
"3": "Some templates and workflows but incomplete or missing key patterns",
|
||||
"1": "Theoretical only, no actionable code or workflows"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Structure",
|
||||
"weight": 1.3,
|
||||
"description": "Organized with clear navigation and progressive learning path",
|
||||
"levels": {
|
||||
"5": "Interactive hub, clear workflows, logical resource organization, all content linked",
|
||||
"3": "Organized but navigation unclear or some content orphaned",
|
||||
"1": "Disorganized, hard to navigate, no clear entry points"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Triggers",
|
||||
"weight": 1.4,
|
||||
"description": "YAML description clearly defines WHEN to use skill",
|
||||
"levels": {
|
||||
"5": "Trigger-focused description with specific use cases, anti-triggers documented",
|
||||
"3": "Some triggers mentioned but vague or missing anti-triggers",
|
||||
"1": "Generic description, unclear when to use"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Resource Quality",
|
||||
"weight": 1.3,
|
||||
"description": "Resources are self-contained, under 500 lines, with WHY/WHAT structure",
|
||||
"levels": {
|
||||
"5": "All resources <500 lines, WHY sections present, no orphaned content",
|
||||
"3": "Some resources over limit or missing WHY sections",
|
||||
"1": "Resources over 500 lines, missing structure"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "User Collaboration",
|
||||
"weight": 1.2,
|
||||
"description": "Skill creation involved user at decision points",
|
||||
"levels": {
|
||||
"5": "User approved at each major step, validated structure and scope",
|
||||
"3": "Some user input but limited collaboration",
|
||||
"1": "No user collaboration"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "File Size",
|
||||
"weight": 1.4,
|
||||
"description": "All files under size limits",
|
||||
"levels": {
|
||||
"5": "SKILL.md <500 lines, all resources <500 lines",
|
||||
"3": "Some files over limits",
|
||||
"1": "Multiple files significantly over limits"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
496
skills/d3-visualization/resources/getting-started.md
Normal file
@@ -0,0 +1,496 @@
|
||||
# Getting Started with D3.js
|
||||
|
||||
## Why Learn D3
|
||||
|
||||
**Problem D3 Solves**: Chart libraries (Chart.js, Highcharts, Plotly) offer pre-made chart types with limited customization. When you need bespoke visualizations—unusual chart types, specific interactions, custom layouts—you need low-level control. D3 provides primitive building blocks that compose into any visualization.
|
||||
|
||||
**Trade-off**: Steeper learning curve and more code than chart libraries, but unlimited customization and flexibility.
|
||||
|
||||
**When D3 is the Right Choice**:
|
||||
- Custom visualization designs not available in libraries
|
||||
- Complex interactions (linked views, custom brushing, coordinated highlighting)
|
||||
- Network graphs, force-directed layouts, geographic maps
|
||||
- Full control over visual encoding, transitions, animations
|
||||
- Integration with data pipelines for real-time dashboards
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
D3 assumes you know these web technologies:
|
||||
|
||||
**Required**:
|
||||
- **HTML**: Document structure, elements, attributes
|
||||
- **SVG**: Scalable Vector Graphics (`<svg>`, `<circle>`, `<rect>`, `<path>`, `viewBox`, transforms)
|
||||
- **CSS**: Styling, selectors, specificity
|
||||
- **JavaScript**: ES6 syntax, functions, arrays, objects, arrow functions, promises, async/await
|
||||
|
||||
**Helpful**:
|
||||
- Modern frameworks (React, Vue) for integration patterns
|
||||
- Data formats (CSV, JSON, GeoJSON)
|
||||
- Basic statistics (distributions, correlations) for visualization design
|
||||
|
||||
**If you lack prerequisites**: Learn HTML/SVG/CSS/JavaScript fundamentals first. D3 documentation assumes this knowledge.
|
||||
|
||||
---
|
||||
|
||||
## Setup
|
||||
|
||||
### Option 1: Local HTML File + CDN (Recommended for Learning)
|
||||
|
||||
**Quick start without tooling**:
|
||||
|
||||
1. **Create HTML file** (`index.html`):
|
||||
```html
|
||||
<!DOCTYPE html>
|
||||
<html>
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>D3 Visualization</title>
|
||||
<style>
|
||||
svg { border: 1px solid #ccc; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<svg width="600" height="400"></svg>
|
||||
<script type="module" src="index.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
```
|
||||
|
||||
2. **Create JavaScript file** (`index.js`):
|
||||
```javascript
|
||||
// Import from Skypack CDN (ESM)
|
||||
import * as d3 from 'https://cdn.skypack.dev/d3@7';
|
||||
|
||||
// Your D3 code here
|
||||
console.log(d3.version);
|
||||
```
|
||||
|
||||
3. **Open HTML in browser** or use Live Server extension in VS Code
|
||||
|
||||
---
|
||||
|
||||
### Option 2: Bundler (Vite, Webpack)
|
||||
|
||||
**For production apps**:
|
||||
|
||||
1. **Install D3**:
|
||||
```bash
|
||||
npm install d3
|
||||
```
|
||||
|
||||
2. **Import modules**:
|
||||
```javascript
|
||||
// Import specific modules (recommended - smaller bundle)
|
||||
import { select, selectAll } from 'd3-selection';
|
||||
import { scaleLinear, scaleBand } from 'd3-scale';
|
||||
import { axisBottom, axisLeft } from 'd3-axis';
|
||||
|
||||
// Or import entire D3 namespace
|
||||
import * as d3 from 'd3';
|
||||
```
|
||||
|
||||
3. **Use in your code**:
|
||||
```javascript
|
||||
const svg = d3.select('svg');
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Option 3: UMD Script Tag (Legacy)
|
||||
|
||||
```html
|
||||
<script src="https://d3js.org/d3.v7.min.js"></script>
|
||||
<script>
|
||||
// d3 available globally
|
||||
const svg = d3.select('svg');
|
||||
</script>
|
||||
```
|
||||
|
||||
**Note**: Modern approach uses ES modules (options 1-2).
|
||||
|
||||
---
|
||||
|
||||
## Core Concepts
|
||||
|
||||
### Concept 1: Selections
|
||||
|
||||
**What**: D3 selections are wrappers around DOM elements enabling method chaining for manipulation.
|
||||
|
||||
**Why**: Declarative syntax; modify multiple elements concisely.
|
||||
|
||||
**Basic Pattern**:
|
||||
```javascript
|
||||
// Select single element
|
||||
const svg = d3.select('svg');
|
||||
|
||||
// Select all matching elements
|
||||
const circles = d3.selectAll('circle');
|
||||
|
||||
// Modify attributes
|
||||
circles.attr('r', 10).attr('fill', 'steelblue');
|
||||
|
||||
// Add elements
|
||||
svg.append('circle').attr('cx', 50).attr('cy', 50).attr('r', 20);
|
||||
```
|
||||
|
||||
**Methods**:
|
||||
- `.select(selector)`: First matching element
|
||||
- `.selectAll(selector)`: All matching elements
|
||||
- `.attr(name, value)`: Set attribute
|
||||
- `.style(name, value)`: Set CSS style
|
||||
- `.text(value)`: Set text content
|
||||
- `.append(type)`: Create and append child
|
||||
- `.remove()`: Delete elements
|
||||
|
||||
---
|
||||
|
||||
### Concept 2: Data Joins
|
||||
|
||||
**What**: Binding data arrays to DOM elements, establishing one-to-one correspondence.
|
||||
|
||||
**Why**: Enables data-driven visualizations; DOM automatically reflects data changes.
|
||||
|
||||
**Pattern**:
|
||||
```javascript
|
||||
const data = [10, 20, 30, 40, 50];
|
||||
|
||||
svg.selectAll('circle')
|
||||
.data(data) // Bind array to selection
|
||||
.join('circle') // Create/update/remove elements to match data
|
||||
.attr('cx', (d, i) => i * 50 + 25) // d = datum, i = index
|
||||
.attr('cy', 50)
|
||||
.attr('r', d => d); // Radius = data value
|
||||
```
|
||||
|
||||
**What `.join()` does**:
|
||||
- **Enter**: Creates new elements for array items without elements
|
||||
- **Update**: Updates existing elements
|
||||
- **Exit**: Removes elements without corresponding array items
|
||||
|
||||
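The three cases can also be handled explicitly, for example:

```javascript
svg.selectAll('circle')
  .data(data)
  .join(
    enter => enter.append('circle').attr('r', 0), // new elements start small
    update => update.attr('fill', 'steelblue'),   // existing elements restyle
    exit => exit.remove()                         // leftovers are removed
  );
```
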
---
|
||||
|
||||
### Concept 3: Scales
|
||||
|
||||
**What**: Functions mapping data domain (input range) to visual range (output values).
|
||||
|
||||
**Why**: Data values (e.g., 0-1000) don't directly map to pixels. Scales handle transformation.
|
||||
|
||||
**Example**:
|
||||
```javascript
|
||||
const data = [{x: 0}, {x: 50}, {x: 100}];
|
||||
|
||||
// Create scale
|
||||
const xScale = d3.scaleLinear()
|
||||
.domain([0, 100]) // Data min/max
|
||||
.range([0, 500]); // Pixel min/max
|
||||
|
||||
// Use scale
|
||||
svg.selectAll('circle')
|
||||
.data(data)
|
||||
.join('circle')
|
||||
.attr('cx', d => xScale(d.x)); // 0→0px, 50→250px, 100→500px
|
||||
```
|
||||
|
||||
**Common scales**:
|
||||
- `scaleLinear()`: Quantitative, proportional
|
||||
- `scaleBand()`: Categorical, for bars
|
||||
- `scaleTime()`: Temporal data
|
||||
- `scaleOrdinal()`: Categorical → categories (colors)
|
||||
|
||||
---
|
||||
|
||||
### Concept 4: Method Chaining
|
||||
|
||||
**What**: D3 methods return the selection, enabling sequential calls.
|
||||
|
||||
**Why**: Concise, readable code that mirrors workflow steps.
|
||||
|
||||
**Example**:
|
||||
```javascript
|
||||
d3.select('svg')
|
||||
.append('g')
|
||||
.attr('transform', 'translate(50, 50)')
|
||||
.selectAll('rect')
|
||||
.data(data)
|
||||
.join('rect')
|
||||
.attr('x', (d, i) => i * 40)
|
||||
.attr('y', d => height - d.value)
|
||||
.attr('width', 30)
|
||||
.attr('height', d => d.value)
|
||||
.attr('fill', 'steelblue');
|
||||
```
|
||||
|
||||
Reads like: "Select SVG → append group → position group → select rects → bind data → create rects → set attributes"
|
||||
|
||||
---
|
||||
|
||||
## First Visualization: Simple Bar Chart
|
||||
|
||||
### Goal
|
||||
Create a vertical bar chart from array `[30, 80, 45, 60, 20, 90, 50]`.
|
||||
|
||||
### Setup SVG Container
|
||||
|
||||
```javascript
|
||||
// Dimensions
|
||||
const width = 500;
|
||||
const height = 300;
|
||||
const margin = {top: 20, right: 20, bottom: 30, left: 40};
|
||||
const innerWidth = width - margin.left - margin.right;
|
||||
const innerHeight = height - margin.top - margin.bottom;
|
||||
|
||||
// Create SVG
|
||||
const svg = d3.select('svg')
|
||||
.attr('width', width)
|
||||
.attr('height', height);
|
||||
|
||||
// Create inner group for margins
|
||||
const g = svg.append('g')
|
||||
.attr('transform', `translate(${margin.left}, ${margin.top})`);
|
||||
```
|
||||
|
||||
**Why margins?** Leave space for axes outside the data area.
|
||||
|
||||
---
|
||||
|
||||
### Create Data and Scales
|
||||
|
||||
```javascript
|
||||
// Data
|
||||
const data = [30, 80, 45, 60, 20, 90, 50];
|
||||
|
||||
// X scale (categorical - bar positions)
|
||||
const xScale = d3.scaleBand()
|
||||
.domain(d3.range(data.length)) // [0, 1, 2, 3, 4, 5, 6]
|
||||
.range([0, innerWidth])
|
||||
.padding(0.1); // 10% spacing between bars
|
||||
|
||||
// Y scale (quantitative - bar heights)
|
||||
const yScale = d3.scaleLinear()
|
||||
.domain([0, d3.max(data)]) // 0 to 90
|
||||
.range([innerHeight, 0]); // Inverted (SVG y increases downward)
|
||||
```
|
||||
|
||||
**Why inverted y-range?** SVG y-axis increases downward; we want high values at top.
|
||||
|
||||
---
|
||||
|
||||
### Create Bars
|
||||
|
||||
```javascript
|
||||
g.selectAll('rect')
|
||||
.data(data)
|
||||
.join('rect')
|
||||
.attr('x', (d, i) => xScale(i))
|
||||
.attr('y', d => yScale(d))
|
||||
.attr('width', xScale.bandwidth())
|
||||
.attr('height', d => innerHeight - yScale(d))
|
||||
.attr('fill', 'steelblue');
|
||||
```
|
||||
|
||||
**Breakdown**:
|
||||
- `x`: Bar position from xScale
|
||||
- `y`: Top of bar from yScale
|
||||
- `width`: Automatic from scaleBand
|
||||
- `height`: Distance from bar top to bottom (innerHeight - y)
|
||||
|
||||
---
|
||||
|
||||
### Add Axes
|
||||
|
||||
```javascript
|
||||
// X axis
|
||||
g.append('g')
|
||||
.attr('transform', `translate(0, ${innerHeight})`)
|
||||
.call(d3.axisBottom(xScale));
|
||||
|
||||
// Y axis
|
||||
g.append('g')
|
||||
.call(d3.axisLeft(yScale));
|
||||
```
|
||||
|
||||
**What `.call()` does**: Invokes function with selection as argument. `d3.axisBottom(xScale)` is a function that generates axis.
|
||||
|
||||
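In other words, these two lines are equivalent, but `.call()` returns the selection so chaining continues:

```javascript
d3.axisLeft(yScale)(g.append('g'));       // invoke the axis generator directly
g.append('g').call(d3.axisLeft(yScale)); // same effect, chain-friendly
```
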
---
|
||||
|
||||
### Complete Code
|
||||
|
||||
```javascript
|
||||
import * as d3 from 'https://cdn.skypack.dev/d3@7';
|
||||
|
||||
const width = 500, height = 300;
|
||||
const margin = {top: 20, right: 20, bottom: 30, left: 40};
|
||||
const innerWidth = width - margin.left - margin.right;
|
||||
const innerHeight = height - margin.top - margin.bottom;
|
||||
|
||||
const data = [30, 80, 45, 60, 20, 90, 50];
|
||||
|
||||
const svg = d3.select('svg').attr('width', width).attr('height', height);
|
||||
const g = svg.append('g').attr('transform', `translate(${margin.left}, ${margin.top})`);
|
||||
|
||||
const xScale = d3.scaleBand()
|
||||
.domain(d3.range(data.length))
|
||||
.range([0, innerWidth])
|
||||
.padding(0.1);
|
||||
|
||||
const yScale = d3.scaleLinear()
|
||||
.domain([0, d3.max(data)])
|
||||
.range([innerHeight, 0]);
|
||||
|
||||
g.selectAll('rect')
|
||||
.data(data)
|
||||
.join('rect')
|
||||
.attr('x', (d, i) => xScale(i))
|
||||
.attr('y', d => yScale(d))
|
||||
.attr('width', xScale.bandwidth())
|
||||
.attr('height', d => innerHeight - yScale(d))
|
||||
.attr('fill', 'steelblue');
|
||||
|
||||
g.append('g').attr('transform', `translate(0, ${innerHeight})`).call(d3.axisBottom(xScale));
|
||||
g.append('g').call(d3.axisLeft(yScale));
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Loading Data
|
||||
|
||||
### CSV Files
|
||||
|
||||
**File**: `data.csv`
|
||||
```
|
||||
name,value
|
||||
A,30
|
||||
B,80
|
||||
C,45
|
||||
```
|
||||
|
||||
**Load and Parse**:
|
||||
```javascript
|
||||
// Option 1: convert after loading
d3.csv('data.csv').then(data => {
  // data = [{name: 'A', value: '30'}, ...] (values are strings!)
  data.forEach(d => { d.value = +d.value; });
  createVisualization(data);
});

// Option 2: convert with a row conversion function
d3.csv('data.csv', d => ({
  name: d.name,
  value: +d.value
})).then(data => {
  // Now d.value is a number
  createVisualization(data);
});
|
||||
```
|
||||
|
||||
**Key Points**:
|
||||
- Returns Promise
|
||||
- Values are strings by default (CSV has no types)
|
||||
- Use `+` operator or `parseFloat()` for numbers
|
||||
- Use `d3.timeParse()` for dates
|
||||
|
||||
---
|
||||
|
||||
### JSON Files
|
||||
|
||||
**File**: `data.json`
|
||||
```json
|
||||
[
|
||||
{"name": "A", "value": 30},
|
||||
{"name": "B", "value": 80}
|
||||
]
|
||||
```
|
||||
|
||||
**Load**:
|
||||
```javascript
|
||||
d3.json('data.json').then(data => {
|
||||
// data already has correct types
|
||||
createVisualization(data);
|
||||
});
|
||||
```
|
||||
|
||||
**Advantage**: Types preserved (numbers are numbers, not strings).
|
||||
|
||||
---
|
||||
|
||||
### Async/Await Pattern
|
||||
|
||||
```javascript
|
||||
async function init() {
|
||||
const data = await d3.csv('data.csv', d => ({
|
||||
name: d.name,
|
||||
value: +d.value
|
||||
}));
|
||||
|
||||
createVisualization(data);
|
||||
}
|
||||
|
||||
init();
|
||||
```
|
||||
|
||||
---

## Common Pitfalls

### Pitfall 1: Forgetting Data Type Conversion

```javascript
// WRONG - CSV values are strings
d3.csv('data.csv').then(data => {
  const max = d3.max(data, d => d.value); // String comparison! '9' > '80'
});

// CORRECT
d3.csv('data.csv').then(data => {
  data.forEach(d => { d.value = +d.value; });
  const max = d3.max(data, d => d.value); // Numeric comparison
});
```

---

### Pitfall 2: Inverted Y-Axis

```javascript
// WRONG - bars upside down
const yScale = d3.scaleLinear().domain([0, 100]).range([0, height]);

// CORRECT - high values at top
const yScale = d3.scaleLinear().domain([0, 100]).range([height, 0]);
```

---

### Pitfall 3: Missing Margins

```javascript
// WRONG - axes overlap chart
svg.selectAll('rect').attr('x', d => xScale(d.x))...

// CORRECT - use margin convention
const g = svg.append('g').attr('transform', `translate(${margin.left}, ${margin.top})`);
g.selectAll('rect').attr('x', d => xScale(d.x))...
```

---

### Pitfall 4: Not Using .call() for Axes

```javascript
// WRONG - won't work
g.append('g').axisBottom(xScale);

// CORRECT
g.append('g').call(d3.axisBottom(xScale));
```

---

## Next Steps

- **Understand data joins**: [Selections & Data Joins](selections-datajoins.md)
- **Master scales**: [Scales & Axes](scales-axes.md)
- **Build more charts**: [Workflows](workflows.md)
- **Add interactions**: [Transitions & Interactions](transitions-interactions.md)
497
skills/d3-visualization/resources/scales-axes.md
Normal file
@@ -0,0 +1,497 @@
# Scales & Axes

## Why Scales Matter

**The Problem**: Data values (0-1000, dates, categories) don't directly map to visual properties (pixels 0-500, colors, positions). Hardcoding transformations (`x = value * 0.5`) breaks when data changes.

**D3's Solution**: Scale functions encapsulate domain-to-range mapping. Change data → update domain → visualization adapts automatically.

**Key Principle**: "Separate data space from visual space." Scales are the bridge.

---

## Scale Fundamentals

### Domain and Range

```javascript
const scale = d3.scaleLinear()
  .domain([0, 100])  // Data min/max (input)
  .range([0, 500]);  // Visual min/max (output)

scale(0);   // → 0px
scale(50);  // → 250px
scale(100); // → 500px
```

**Domain**: Input data extent
**Range**: Output visual extent
**Scale**: Function mapping domain → range

---

## Scale Types

### Continuous → Continuous

#### scaleLinear

**Use**: Quantitative data, proportional relationships

```javascript
const xScale = d3.scaleLinear()
  .domain([0, d3.max(data, d => d.value)])
  .range([0, width]);
```

**Math**: Linear interpolation `y = mx + b`

---

#### scaleSqrt

**Use**: Sizing circles by area (not radius)

```javascript
const radiusScale = d3.scaleSqrt()
  .domain([0, 1000000]) // Population
  .range([0, 50]);      // Max radius

// Circle area ∝ population (perceptually accurate)
```

**Why**: `Area = πr²`, so `r = √(Area/π)`. Radius must grow with the square root of the value for area to stay proportional to the data.

---

#### scalePow

**Use**: Custom exponents

```javascript
const scale = d3.scalePow()
  .exponent(2) // Quadratic
  .domain([0, 100])
  .range([0, 500]);
```

---

#### scaleLog

**Use**: Exponential data (orders of magnitude)

```javascript
const scale = d3.scaleLog()
  .domain([1, 1000]) // Don't include 0!
  .range([0, 500]);

scale(1);    // → 0
scale(10);   // → 167
scale(100);  // → 333
scale(1000); // → 500
```

**Note**: Log is undefined at 0. The domain must be strictly positive.

---

#### scaleTime

**Use**: Temporal data

```javascript
const xScale = d3.scaleTime()
  .domain([new Date(2020, 0, 1), new Date(2020, 11, 31)])
  .range([0, width]);
```

**Works with**: Date objects, timestamps

---

#### scaleSequential

**Use**: Continuous data → color gradients

```javascript
const colorScale = d3.scaleSequential(d3.interpolateBlues)
  .domain([0, 100]);

colorScale(0);   // → light blue
colorScale(50);  // → medium blue
colorScale(100); // → dark blue
```

**Interpolators**: `interpolateBlues`, `interpolateReds`, `interpolateViridis`, `interpolateRainbow` (avoid rainbow!)

---

### Continuous → Discrete

#### scaleQuantize

**Use**: Divide continuous domain into discrete bins

```javascript
const colorScale = d3.scaleQuantize()
  .domain([0, 100])
  .range(['green', 'yellow', 'orange', 'red']);

colorScale(20); // → 'green'  (0-25)
colorScale(40); // → 'yellow' (25-50)
colorScale(70); // → 'orange' (50-75)
colorScale(95); // → 'red'    (75-100)
```

**Equal bins**: Domain divided evenly by range length.

---

#### scaleQuantile

**Use**: Divide data into quantiles (equal-sized groups)

```javascript
const colorScale = d3.scaleQuantile()
  .domain(data.map(d => d.value)) // Actual data values
  .range(['green', 'yellow', 'orange', 'red']);

// Quartiles: 25% of data in each color
```

**Difference from quantize**: Bins have equal data counts, not equal domain width.

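The difference is easiest to see on skewed data. A small worked comparison (values are illustrative):

```javascript
const values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]; // Skewed: one large outlier

// Quantize splits the DOMAIN [1, 100] into equal widths
const quantize = d3.scaleQuantize()
  .domain([1, 100])
  .range(['low', 'mid', 'high']);
quantize(9); // → 'low' (9 sits in the first third of the domain)

// Quantile splits the DATA into equal counts
const quantile = d3.scaleQuantile()
  .domain(values)
  .range(['low', 'mid', 'high']);
quantile(9); // → 'high' (9 is in the top bin of the data values)
```
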
---

#### scaleThreshold

**Use**: Explicit breakpoints

```javascript
const colorScale = d3.scaleThreshold()
  .domain([40, 60, 80]) // Split points
  .range(['green', 'yellow', 'orange', 'red']);

colorScale(30); // → 'green'  (<40)
colorScale(50); // → 'yellow' (40-60)
colorScale(70); // → 'orange' (60-80)
colorScale(90); // → 'red'    (≥80)
```

**Use for**: Letter grades, risk levels, custom categories.

---

### Discrete → Discrete

#### scaleOrdinal

**Use**: Map categories to categories (often colors)

```javascript
const colorScale = d3.scaleOrdinal()
  .domain(['A', 'B', 'C'])
  .range(['red', 'green', 'blue']);

colorScale('A'); // → 'red'
colorScale('B'); // → 'green'
```

**Built-in schemes**: `d3.schemeCategory10`, `d3.schemeTableau10`

```javascript
const colorScale = d3.scaleOrdinal(d3.schemeCategory10);
```

---

#### scaleBand

**Use**: Position categorical bars

```javascript
const xScale = d3.scaleBand()
  .domain(['A', 'B', 'C', 'D'])
  .range([0, 500])
  .padding(0.1); // 10% spacing

xScale('A');        // → ~12.2 (inset by outer padding)
xScale('B');        // → ~134.1
xScale.bandwidth(); // → ~109.8 (width of each band)
```

**Key method**: `.bandwidth()` returns the width to use for bars/columns.

---

#### scalePoint

**Use**: Position categorical points (scatter, line chart categories)

```javascript
const xScale = d3.scalePoint()
  .domain(['A', 'B', 'C', 'D'])
  .range([0, 500])
  .padding(0.5);

xScale('A'); // → 62.5 (inset from the edge by the padding)
xScale('B'); // → 187.5
```

**Difference from band**: Points (no width), bands (width for bars).

---

## Scale Selection Guide

| Data | Scale |
| --- | --- |
| Quantitative, linear | `scaleLinear` |
| Exponential (orders of magnitude) | `scaleLog` |
| Circle sizing (by area) | `scaleSqrt` |
| Temporal | `scaleTime` |
| Categorical bars | `scaleBand` |
| Categorical points | `scalePoint` |
| Categorical colors | `scaleOrdinal` |
| Bins (equal domain width) | `scaleQuantize` |
| Bins (equal data counts) | `scaleQuantile` |
| Custom breakpoints | `scaleThreshold` |
| Color gradient | `scaleSequential` |

---

## Scale Methods

### Domain from Data

```javascript
// Extent (min, max)
const xScale = d3.scaleLinear()
  .domain(d3.extent(data, d => d.x)) // [min, max]
  .range([0, width]);

// Max only (0 to max)
const yScale = d3.scaleLinear()
  .domain([0, d3.max(data, d => d.y)])
  .range([height, 0]);

// Custom
const ySymmetric = d3.scaleLinear()
  .domain([-100, 100]) // Symmetric around 0
  .range([height, 0]);
```

---

### Inversion

```javascript
const xScale = d3.scaleLinear()
  .domain([0, 100])
  .range([0, 500]);

xScale(50);         // → 250 (data → visual)
xScale.invert(250); // → 50  (visual → data)
```

**Use**: Convert mouse position to data value.

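For example, mapping a pointer event back to a data value — a minimal sketch using `d3.pointer` (D3 v6+):

```javascript
svg.on('mousemove', (event) => {
  const [mx] = d3.pointer(event);      // Pixel x relative to the listening element
  const dataValue = xScale.invert(mx); // Pixel position → data value
  console.log('Cursor at data value', dataValue);
});
```
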
---

### Clamping

```javascript
const scale = d3.scaleLinear()
  .domain([0, 100])
  .range([0, 500])
  .clamp(true);

scale(-10); // → 0 (clamped to range min)
scale(150); // → 500 (clamped to range max)
```

**Without clamp**: `scale(150)` → 750 (extrapolates beyond range).

---

### Nice Domains

```javascript
const yScale = d3.scaleLinear()
  .domain([0.201, 0.967])
  .range([height, 0])
  .nice();

yScale.domain(); // → [0.2, 1.0] (rounded)
```

**Use**: Clean axis labels (0.2, 0.4, 0.6 instead of 0.201, 0.401...).

---

## Axes

### Creating Axes

```javascript
// Scales first
const xScale = d3.scaleBand().domain(categories).range([0, width]);
const yScale = d3.scaleLinear().domain([0, max]).range([height, 0]);

// Axis generators
const xAxis = d3.axisBottom(xScale);
const yAxis = d3.axisLeft(yScale);

// Render axes
svg.append('g')
  .attr('transform', `translate(0, ${height})`) // Position at bottom
  .call(xAxis);

svg.append('g')
  .call(yAxis);
```

**Axis orientations**:
- `axisBottom`: Ticks below, labels below
- `axisTop`: Ticks above, labels above
- `axisLeft`: Ticks left, labels left
- `axisRight`: Ticks right, labels right (positioning sketch below)

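The orientation only controls which side the ticks and labels render on; you still position the axis group yourself with a transform. A short sketch, assuming the usual `width` from the margin convention:

```javascript
// Right axis: move the group to the right edge
svg.append('g')
  .attr('transform', `translate(${width}, 0)`)
  .call(d3.axisRight(yScale));

// Top axis: origin is already top-left, so no translate is needed
svg.append('g')
  .call(d3.axisTop(xScale));
```
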
---

### Customizing Ticks

```javascript
// Number of ticks (approximate)
yAxis.ticks(5); // D3 chooses ~5 "nice" values

// Explicit tick values
xAxis.tickValues([0, 25, 50, 75, 100]);

// Tick format
yAxis.tickFormat(d => d + '%');        // Add percent sign
yAxis.tickFormat(d3.format('.2f'));    // 2 decimal places
yAxis.tickFormat(d3.timeFormat('%b')); // Month abbreviations
```

---

### Tick Styling

```javascript
yAxis.tickSize(10).tickPadding(5); // Tick length and label spacing
```

---

### Updating Axes

```javascript
function update(newData) {
  // Update scale domain
  yScale.domain([0, d3.max(newData, d => d.value)]);

  // Update axis with transition
  svg.select('.y-axis')
    .transition()
    .duration(500)
    .call(d3.axisLeft(yScale));
}
```

---

## Color Scales

### Categorical Colors

```javascript
// Built-in scheme
const color = d3.scaleOrdinal(d3.schemeCategory10);

// Custom
const customColor = d3.scaleOrdinal().domain(['low', 'medium', 'high']).range(['green', 'yellow', 'red']);
```

---

### Sequential & Diverging Colors

```javascript
// Sequential
const seqColor = d3.scaleSequential(d3.interpolateBlues).domain([0, 100]);
// Interpolators: Blues, Reds, Greens, Viridis, Plasma, Inferno (avoid Rainbow!)

// Diverging
const divColor = d3.scaleDiverging(d3.interpolateRdYlGn).domain([-100, 0, 100]);
```

---

## Common Patterns

```javascript
// Responsive: update range on resize (full sketch below)
const xScale = d3.scaleLinear().domain([0, 100]).range([0, width]);

// Multi-scale chart
const dateScale = d3.scaleTime().domain(dateExtent).range([0, width]);
const colorScale = d3.scaleOrdinal(d3.schemeCategory10);
const sizeScale = d3.scaleSqrt().domain([0, maxPop]).range([0, 50]);

// Symmetric domain
const max = d3.max(data, d => Math.abs(d.value));
const yScale = d3.scaleLinear().domain([-max, max]).range([height, 0]);
```

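To make the first pattern concrete: a hedged sketch of a resize handler that updates the range and redraws, assuming a `container` selection and a `render()` function that re-applies the scales to marks and axes (both names are illustrative):

```javascript
function resize() {
  const width = container.node().clientWidth; // New pixel width
  xScale.range([0, width]);                   // Domain unchanged, range updated
  svg.attr('width', width);
  render();                                   // Redraw marks and axes with the new scale
}

window.addEventListener('resize', resize);
```
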
---

## Common Pitfalls

### Pitfall 1: Forgetting to Invert Y Range

```javascript
// WRONG - high values at bottom
const yScale = d3.scaleLinear().domain([0, 100]).range([0, height]);

// CORRECT - high values at top
const yScale = d3.scaleLinear().domain([0, 100]).range([height, 0]);
```

---

### Pitfall 2: Log Scale with Zero

```javascript
// WRONG - log(0) undefined
const scale = d3.scaleLog().domain([0, 1000]);

// CORRECT - start at small positive number
const scale = d3.scaleLog().domain([1, 1000]);
```

---

### Pitfall 3: Not Updating Domain

```javascript
// WRONG - scale domain never changes
const yScale = d3.scaleLinear().domain([0, 100]).range([height, 0]);
update(newData); // New max might be 200!

// CORRECT
function update(newData) {
  yScale.domain([0, d3.max(newData, d => d.value)]);
  // Re-render with updated scale
}
```

---

### Pitfall 4: Using Band Scale for Quantitative Data

```javascript
// WRONG
const xScale = d3.scaleBand().domain([0, 10, 20, 30]); // Numbers!

// CORRECT - use linear for numbers
const xScale = d3.scaleLinear().domain([0, 30]).range([0, width]);
```

---

## Next Steps

- Use scales in charts: [Workflows](workflows.md)
- Combine with shape generators: [Shapes & Layouts](shapes-layouts.md)
- Add to interactive charts: [Transitions & Interactions](transitions-interactions.md)
494
skills/d3-visualization/resources/selections-datajoins.md
Normal file
@@ -0,0 +1,494 @@
# Selections & Data Joins

## Why Data Joins Matter

**The Problem**: Manually creating, updating, and removing DOM elements for dynamic data is error-prone. You must track which elements correspond to which data items, handle array length changes, and avoid orphaned elements.

**D3's Solution**: Data joins establish declarative one-to-one correspondence between data arrays and DOM elements. Specify the desired end state; D3 handles the lifecycle automatically.

**Key Principle**: "Join data to elements, then manipulate elements using data." Changes to data propagate to visuals through re-joining.

---

## Selections Fundamentals

### Creating Selections

```javascript
// Select first matching element
const svg = d3.select('svg');
const firstCircle = d3.select('circle');

// Select all matching elements
const allCircles = d3.selectAll('circle');
const allRects = d3.selectAll('rect');

// CSS selectors supported
const blueCircles = d3.selectAll('circle.blue');
const chartGroup = d3.select('#chart');
```

**Key Methods**:
- `select(selector)`: Returns selection with first match
- `selectAll(selector)`: Returns selection with all matches
- Empty selection if no matches (not null)

---

### Modifying Elements

```javascript
// Set attributes
selection.attr('r', 10);             // Set radius to 10
selection.attr('fill', 'steelblue'); // Set fill color

// Set styles
selection.style('opacity', 0.7);
selection.style('stroke-width', '2px');

// Set classes
selection.classed('active', true);  // Add class
selection.classed('hidden', false); // Remove class

// Set text
selection.text('Label');

// Set HTML
selection.html('<strong>Bold</strong>');

// Set properties (DOM properties, not attributes)
selection.property('checked', true); // For checkboxes, etc.
```

**Attribute vs Style**:
- Use `.attr()` for SVG attributes: `x`, `y`, `r`, `fill`, `stroke`
- Use `.style()` for CSS properties: `opacity`, `font-size`, `color`

---

### Creating and Removing Elements

```javascript
// Append child element
const g = svg.append('g'); // Returns new selection
g.attr('transform', 'translate(50, 50)');

// Insert before sibling
svg.insert('rect', 'circle'); // Insert rect before first circle

// Remove elements
selection.remove(); // Delete from DOM
```

---

### Method Chaining

All modification methods return a selection, enabling chains:

```javascript
d3.select('svg')
  .append('circle')
  .attr('cx', 100)
  .attr('cy', 100)
  .attr('r', 50)
  .attr('fill', 'steelblue')
  .style('opacity', 0.7)
  .on('click', handleClick);
```

**Why Chaining Works**: `.attr()` and `.style()` return the selection they were called on; `.append()` returns a new selection of the created elements, so subsequent calls apply to those.

---

## Data Join Pattern

### Basic Join

```javascript
const data = [10, 20, 30, 40, 50];

svg.selectAll('circle') // Select all circles (may be empty)
  .data(data)           // Bind data array
  .join('circle')       // Create/update/remove to match data
  .attr('r', d => d);   // Set attributes using data
```

**What `.join()` Does**:
1. **Enter**: Creates `<circle>` elements for data items without elements (5 circles created if none exist)
2. **Update**: Updates existing circles if some already exist
3. **Exit**: Removes circles if data shrinks (e.g., data becomes `[10, 20]`, 3 circles removed)

---

### Accessor Functions

Pass functions to `.attr()` and `.style()` receiving `(datum, index)` parameters:

```javascript
.attr('cx', (d, i) => i * 50 + 25) // d = data value, i = index
.attr('cy', 100)                   // Static value
.attr('r', d => d)                 // Radius = data value
```

**Pattern**: `(d, i) => expression`
- `d`: Datum (current array element)
- `i`: Index in array (0-based)

---

### Data with Objects

```javascript
const data = [
  {name: 'A', value: 30, color: 'red'},
  {name: 'B', value: 80, color: 'blue'},
  {name: 'C', value: 45, color: 'green'}
];

svg.selectAll('circle')
  .data(data)
  .join('circle')
  .attr('r', d => d.value) // Access object properties
  .attr('fill', d => d.color);
```

---

## Enter, Exit, Update Pattern

For fine-grained control (especially with custom transitions):

```javascript
svg.selectAll('circle')
  .data(data)
  .join(
    // Enter: new elements
    enter => enter.append('circle')
      .attr('r', 0) // Start small
      .call(enter => enter.transition().attr('r', d => d.value)),

    // Update: existing elements
    update => update
      .call(update => update.transition().attr('r', d => d.value)),

    // Exit: removing elements
    exit => exit
      .call(exit => exit.transition().attr('r', 0).remove())
  );
```

**When to Use**:
- Custom enter animations (fade in, slide in)
- Custom exit animations (fade out, fall down)
- Different behavior for enter vs update

**When NOT Needed**: Simple updates use `.join('circle').transition()...` after the join.

---

## Key Functions (Object Constancy)

**Problem**: By default, the data join matches by index. If data reorders, elements don't track items correctly.

```javascript
// Without key function
const data1 = [{id: 'A', value: 30}, {id: 'B', value: 80}];
const data2 = [{id: 'B', value: 90}, {id: 'A', value: 40}]; // Reordered!

// Element 0 bound to A (30), then B (90) - wrong!
// Element 1 bound to B (80), then A (40) - wrong!
```

**Solution**: A key function returns a unique identifier:

```javascript
svg.selectAll('circle')
  .data(data, d => d.id) // Key function
  .join('circle')
  .attr('r', d => d.value);

// Now each element tracks its data item by ID, not position
```

**Key Function Signature**: `(datum) => uniqueIdentifier`

**When to Use**:
- Data array reorders
- Data array filters (items removed/added)
- Transitions where element identity matters

---

## Updating Scales

When data changes, update scale domains:

```javascript
function update(newData) {
  // Recalculate domain
  yScale.domain([0, d3.max(newData, d => d.value)]);

  // Update axis
  svg.select('.y-axis')
    .transition()
    .duration(500)
    .call(d3.axisLeft(yScale));

  // Update bars
  svg.selectAll('rect')
    .data(newData, d => d.id) // Key function
    .join('rect')
    .transition()
    .duration(500)
    .attr('y', d => yScale(d.value))
    .attr('height', d => height - yScale(d.value));
}
```

---

## Event Handling

```javascript
selection.on('click', function(event, d) {
  // `this` = DOM element
  // `event` = mouse event object
  // `d` = bound datum

  console.log('Clicked', d);
  d3.select(this).attr('fill', 'red');
});
```

**Common Events**:
- Mouse: `click`, `mouseover`, `mouseout`, `mousemove`
- Touch: `touchstart`, `touchend`, `touchmove`
- Drag: Use `d3.drag()` behavior instead
- Zoom: Use `d3.zoom()` behavior instead

**Event Object Properties**:
- `event.pageX`, `event.pageY`: Mouse position
- `event.target`: DOM element
- `event.preventDefault()`: Prevent default behavior

---

## Selection Iteration

```javascript
// Apply function to each element
selection.each(function(d, i) {
  // `this` = DOM element
  // `d` = datum
  // `i` = index
  console.log(d);
});

// Call function with selection
function styleCircles(selection) {
  selection
    .attr('stroke', 'black')
    .attr('stroke-width', 2);
}

svg.selectAll('circle').call(styleCircles);
```

**Use `.call()` for**: Reusable functions, axes, behaviors (drag, zoom).

---

## Filtering and Sorting

```javascript
// Filter selection
const largeCircles = svg.selectAll('circle')
  .filter(d => d.value > 50);

largeCircles.attr('fill', 'red');

// Sort elements in DOM
svg.selectAll('circle')
  .sort((a, b) => a.value - b.value); // Ascending order
```

**Note**: `.sort()` reorders DOM elements, affecting document flow and rendering order (see the sketch below).

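Since SVG has no z-index, DOM order controls stacking. A small sketch, assuming you want a hovered circle drawn on top: `selection.raise()` re-appends each selected element as the last child of its parent.

```javascript
svg.selectAll('circle')
  .on('mouseover', function() {
    d3.select(this).raise(); // Moves to end of parent → renders on top
  });
```
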
---

## Nested Selections

```javascript
// Parent selection
const groups = svg.selectAll('g')
  .data(data)
  .join('g');

// Child selection within each group
groups.selectAll('circle')
  .data(d => [d, d, d]) // 3 circles per group
  .join('circle')
  .attr('r', 5);
```

**Pattern**: Each parent element gets its own child selection.

---

## Selection vs Element

```javascript
// Selection (D3 wrapper)
const selection = d3.select('circle');
selection.attr('r', 10); // D3 methods

// Raw DOM element
const element = selection.node();
element.setAttribute('r', 10); // Native DOM methods
```

**Use `.node()` to**:
- Access native DOM properties
- Use with non-D3 libraries
- Perform operations D3 doesn't support

---

## Common Patterns

### Pattern 1: Update Function

```javascript
function update(newData) {
  const circles = svg.selectAll('circle')
    .data(newData, d => d.id);

  circles.join('circle')
    .attr('cx', d => xScale(d.x))
    .attr('cy', d => yScale(d.y))
    .attr('r', 5);
}

// Call on data change
update(data);
```

---

### Pattern 2: Conditional Styling

```javascript
svg.selectAll('circle')
  .data(data)
  .join('circle')
  .attr('r', 5)
  .attr('fill', d => d.value > 50 ? 'red' : 'steelblue');
```

---

### Pattern 3: Data-Driven Classes

```javascript
svg.selectAll('rect')
  .data(data)
  .join('rect')
  .classed('high', d => d.value > 80)
  .classed('medium', d => d.value > 40 && d.value <= 80)
  .classed('low', d => d.value <= 40);
```

---

### Pattern 4: Multiple Attributes

```javascript
selection.attr('stroke', 'black').attr('stroke-width', 2).attr('fill', 'none');
```

---

## Debugging

```javascript
console.log(selection.size());  // Number of elements
console.log(selection.nodes()); // Array of DOM elements
console.log(selection.data());  // Array of bound data
```

---

## Common Pitfalls

### Pitfall 1: Forgetting to Return Accessor Value

```javascript
// WRONG - undefined attribute
.attr('r', d => { console.log(d); })

// CORRECT
.attr('r', d => {
  console.log(d);
  return d.value; // Explicit return
})

// Or use arrow function implicit return
.attr('r', d => d.value)
```

---

### Pitfall 2: Not Using Key Function for Dynamic Data

```javascript
// WRONG - elements don't track items
.data(dynamicData)

// CORRECT
.data(dynamicData, d => d.id)
```

---

### Pitfall 3: Selecting Before Elements Exist

```javascript
// WRONG - circles don't exist yet
const circles = d3.selectAll('circle');
svg.append('circle'); // This circle not in `circles` selection

// CORRECT - select after creating
svg.append('circle');
const circles = d3.selectAll('circle');

// Or use the join, which creates and returns the elements
const circles = svg.selectAll('circle')
  .data(data)
  .join('circle'); // Creates circles, returns selection
```

---

### Pitfall 4: Modifying Selection Doesn't Update Variable

```javascript
let circles = svg.selectAll('circle');
circles.attr('fill', 'red'); // OK - modifies elements

circles.data(newData).join('circle'); // Returns NEW selection
// `circles` variable still references OLD selection

// CORRECT - reassign (note `let`, not `const`)
circles = svg.selectAll('circle')
  .data(newData)
  .join('circle');
```

---

## Next Steps

- Learn scale functions: [Scales & Axes](scales-axes.md)
- See data join in action: [Workflows](workflows.md)
- Add transitions: [Transitions & Interactions](transitions-interactions.md)
490
skills/d3-visualization/resources/shapes-layouts.md
Normal file
@@ -0,0 +1,490 @@
# Shapes & Layouts

## Why Shape Generators Matter

**The Problem**: Creating SVG paths manually for lines, areas, and arcs requires complex math and string concatenation. `<path d="M0,50 L10,40 L20,60...">` is tedious and error-prone.

**D3's Solution**: Shape generators convert data arrays into SVG path strings automatically. You configure the generator, pass data, get a path string.

**Key Principle**: "Configure generator once, reuse for all data." Separation of shape logic from data.

---

## Line Generator

### Basic Usage

```javascript
const data = [
  {x: 0, y: 50},
  {x: 100, y: 80},
  {x: 200, y: 40},
  {x: 300, y: 90}
];

const line = d3.line()
  .x(d => d.x)
  .y(d => d.y);

svg.append('path')
  .datum(data)     // Use .datum() for single item
  .attr('d', line) // line(data) generates path string
  .attr('fill', 'none')
  .attr('stroke', 'steelblue')
  .attr('stroke-width', 2);
```

**Key Methods**:
- `.x(accessor)`: Function returning x coordinate
- `.y(accessor)`: Function returning y coordinate
- Returns path string when called with data

---

### With Scales

```javascript
const xScale = d3.scaleLinear().domain([0, 300]).range([0, width]);
const yScale = d3.scaleLinear().domain([0, 100]).range([height, 0]);

const line = d3.line()
  .x(d => xScale(d.x))
  .y(d => yScale(d.y));
```

---

### Curves

```javascript
const line = d3.line()
  .x(d => xScale(d.x))
  .y(d => yScale(d.y))
  .curve(d3.curveMonotoneX); // Smooth curve
```

**Curve Types**:
- `d3.curveLinear`: Straight lines (default)
- `d3.curveMonotoneX`: Smooth, preserves monotonicity
- `d3.curveCatmullRom`: Smooth, passes through points
- `d3.curveStep`: Step function
- `d3.curveBasis`: B-spline

---

### Missing Data

```javascript
const line = d3.line()
  .x(d => xScale(d.x))
  .y(d => yScale(d.y))
  .defined(d => d.y !== null); // Skip null values
```

---

## Area Generator

### Basic Usage

```javascript
const area = d3.area()
  .x(d => xScale(d.x))
  .y0(height)            // Baseline (bottom)
  .y1(d => yScale(d.y)); // Top line

svg.append('path')
  .datum(data)
  .attr('d', area)
  .attr('fill', 'steelblue')
  .attr('opacity', 0.5);
```

**Key Methods**:
- `.y0()`: Baseline (constant or function)
- `.y1()`: Top boundary (usually data-driven)

---

### Stacked Area Chart

```javascript
const stack = d3.stack()
  .keys(['series1', 'series2', 'series3']);

const series = stack(data); // Returns array of series

const area = d3.area()
  .x(d => xScale(d.data.x))
  .y0(d => yScale(d[0]))  // Lower bound
  .y1(d => yScale(d[1])); // Upper bound

svg.selectAll('path')
  .data(series)
  .join('path')
  .attr('d', area)
  .attr('fill', (d, i) => colorScale(i));
```

---

## Arc Generator

### Basic Usage

```javascript
const arc = d3.arc()
  .innerRadius(0) // 0 = pie, >0 = donut
  .outerRadius(100);

const arcData = {
  startAngle: 0,
  endAngle: Math.PI / 2 // 90 degrees
};

svg.append('path')
  .datum(arcData)
  .attr('d', arc)
  .attr('fill', 'steelblue');
```

**Angle Units**: Radians (0 to 2π)

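If you think in degrees, convert explicitly. A one-line helper (the name `toRadians` is illustrative):

```javascript
const toRadians = deg => deg * Math.PI / 180;

const arc = d3.arc()
  .innerRadius(0)
  .outerRadius(100)
  .startAngle(toRadians(0))
  .endAngle(toRadians(90)); // 90° → π/2 radians
```
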
---

### Donut Chart

```javascript
const arc = d3.arc()
  .innerRadius(50)
  .outerRadius(100);

const pie = d3.pie()
  .value(d => d.value);

const arcs = pie(data); // Generates angles

svg.selectAll('path')
  .data(arcs)
  .join('path')
  .attr('d', arc)
  .attr('fill', (d, i) => colorScale(i));
```

---

### Rounded Corners

```javascript
const arc = d3.arc()
  .innerRadius(50)
  .outerRadius(100)
  .cornerRadius(5); // Rounded edges
```

---

### Labels

```javascript
// Use centroid for label positioning
svg.selectAll('text')
  .data(arcs)
  .join('text')
  .attr('transform', d => `translate(${arc.centroid(d)})`)
  .attr('text-anchor', 'middle')
  .text(d => d.data.label);
```

---

## Pie Generator

### Basic Usage

```javascript
const data = [30, 80, 45, 60, 20];

const pie = d3.pie();

const arcs = pie(data);
// Returns one object per datum, in input order, with angles in radians:
// [{data: 30, startAngle: 4.95, endAngle: 5.75, ...}, ...]
// (slices are laid out in descending value order by default)
```

**What Pie Does**: Converts values → angles. Use with the arc generator for rendering.

---

### With Objects

```javascript
const data = [
  {name: 'A', value: 30},
  {name: 'B', value: 80},
  {name: 'C', value: 45}
];

const pie = d3.pie()
  .value(d => d.value);

const arcs = pie(data);
```

---

### Sorting

```javascript
const pie = d3.pie()
  .value(d => d.value)
  .sort((a, b) => b.value - a.value); // Descending
```

---

### Padding

```javascript
const pie = d3.pie()
  .value(d => d.value)
  .padAngle(0.02); // Gap between slices
```

---

## Stack Generator

### Basic Usage

```javascript
const data = [
  {month: 'Jan', apples: 30, oranges: 20, bananas: 40},
  {month: 'Feb', apples: 50, oranges: 30, bananas: 35},
  {month: 'Mar', apples: 40, oranges: 25, bananas: 45}
];

const stack = d3.stack()
  .keys(['apples', 'oranges', 'bananas']);

const series = stack(data);
// Returns: [[{0: 0, 1: 30, data: {month: 'Jan', ...}}, ...], [...], [...]]
```

**Output**: Array of series, each with `[lower, upper]` bounds for stacking.

---

### Stacked Bar Chart

```javascript
const xScale = d3.scaleBand()
  .domain(data.map(d => d.month))
  .range([0, width])
  .padding(0.1);

const yScale = d3.scaleLinear()
  .domain([0, d3.max(series, s => d3.max(s, d => d[1]))])
  .range([height, 0]);

svg.selectAll('g')
  .data(series)
  .join('g')
  .attr('fill', (d, i) => colorScale(i))
  .selectAll('rect')
  .data(d => d)
  .join('rect')
  .attr('x', d => xScale(d.data.month))
  .attr('y', d => yScale(d[1]))
  .attr('height', d => yScale(d[0]) - yScale(d[1]))
  .attr('width', xScale.bandwidth());
```

---

### Offsets

```javascript
// Default: stacked on top of each other
const stacked = d3.stack().keys(keys);

// Normalized (0-1)
const normalized = d3.stack().keys(keys).offset(d3.stackOffsetExpand);

// Centered (streamgraph)
const streamgraph = d3.stack().keys(keys).offset(d3.stackOffsetWiggle);
```

---

## Symbol Generator

### Basic Usage

```javascript
const symbol = d3.symbol()
  .type(d3.symbolCircle)
  .size(100); // Area in square pixels

svg.append('path')
  .attr('d', symbol())
  .attr('fill', 'steelblue');
```

**Symbol Types**:
- `symbolCircle`, `symbolCross`, `symbolDiamond`, `symbolSquare`, `symbolStar`, `symbolTriangle`, `symbolWye`

---

### In Scatter Plots

```javascript
const symbol = d3.symbol()
  .type(d => d3.symbolCircle)
  .size(d => sizeScale(d.value));

svg.selectAll('path')
  .data(data)
  .join('path')
  .attr('d', symbol)
  .attr('transform', d => `translate(${xScale(d.x)}, ${yScale(d.y)})`);
```

---

## Complete Examples

### Line Chart

```javascript
const xScale = d3.scaleTime().domain(d3.extent(data, d => d.date)).range([0, width]);
const yScale = d3.scaleLinear().domain([0, d3.max(data, d => d.value)]).range([height, 0]);

const line = d3.line()
  .x(d => xScale(d.date))
  .y(d => yScale(d.value))
  .curve(d3.curveMonotoneX);

svg.append('path')
  .datum(data)
  .attr('d', line)
  .attr('fill', 'none')
  .attr('stroke', 'steelblue')
  .attr('stroke-width', 2);

svg.append('g').attr('transform', `translate(0, ${height})`).call(d3.axisBottom(xScale));
svg.append('g').call(d3.axisLeft(yScale));
```

---

### Area Chart

```javascript
const area = d3.area()
  .x(d => xScale(d.date))
  .y0(height)
  .y1(d => yScale(d.value))
  .curve(d3.curveMonotoneX);

svg.append('path')
  .datum(data)
  .attr('d', area)
  .attr('fill', 'steelblue')
  .attr('opacity', 0.5);
```

---

### Donut Chart

```javascript
const pie = d3.pie().value(d => d.value);
const arc = d3.arc().innerRadius(50).outerRadius(100);

const arcs = pie(data);

svg.selectAll('path')
  .data(arcs)
  .join('path')
  .attr('d', arc)
  .attr('fill', (d, i) => colorScale(i))
  .attr('stroke', 'white')
  .attr('stroke-width', 2);

svg.selectAll('text')
  .data(arcs)
  .join('text')
  .attr('transform', d => `translate(${arc.centroid(d)})`)
  .attr('text-anchor', 'middle')
  .text(d => d.data.label);
```

---

### Stacked Bar Chart

```javascript
const stack = d3.stack().keys(['series1', 'series2', 'series3']);
const series = stack(data);

const xScale = d3.scaleBand().domain(data.map(d => d.category)).range([0, width]).padding(0.1);
const yScale = d3.scaleLinear().domain([0, d3.max(series, s => d3.max(s, d => d[1]))]).range([height, 0]);

svg.selectAll('g')
  .data(series)
  .join('g')
  .attr('fill', (d, i) => colorScale(i))
  .selectAll('rect')
  .data(d => d)
  .join('rect')
  .attr('x', d => xScale(d.data.category))
  .attr('y', d => yScale(d[1]))
  .attr('height', d => yScale(d[0]) - yScale(d[1]))
  .attr('width', xScale.bandwidth());
```

---

## Common Pitfalls

### Pitfall 1: Using .data() Instead of .datum()

```javascript
// WRONG - creates path per data point
svg.selectAll('path').data(data).join('path').attr('d', line);

// CORRECT - single path for entire dataset
svg.append('path').datum(data).attr('d', line);
```

---

### Pitfall 2: Forgetting fill="none" for Lines

```javascript
// WRONG - area filled by default
svg.append('path').datum(data).attr('d', line);

// CORRECT
svg.append('path').datum(data).attr('d', line).attr('fill', 'none').attr('stroke', 'steelblue');
```

---

### Pitfall 3: Wrong Angle Units

```javascript
// WRONG - degrees
arc.startAngle(90).endAngle(180);

// CORRECT - radians
arc.startAngle(Math.PI / 2).endAngle(Math.PI);
```

---

## Next Steps

- Apply to real data: [Workflows](workflows.md)
- Add transitions: [Transitions & Interactions](transitions-interactions.md)
- Combine with layouts: [Advanced Layouts](advanced-layouts.md)
415
skills/d3-visualization/resources/transitions-interactions.md
Normal file
@@ -0,0 +1,415 @@
# Transitions & Interactions

## Transitions

### Why Transitions Matter

**The Problem**: Instant changes are jarring. Users miss updates or lose track of which element changed.

**D3's Solution**: Smooth interpolation between old and new states over time. Movement helps users track element identity (object constancy).

**When to Use**: Data updates, not initial render. Transitions show *change*, so there must be a previous state.

---

### Basic Pattern

```javascript
// Without transition (instant)
selection.attr('r', 10);

// With transition (animated)
selection.transition().duration(500).attr('r', 10);
```

**Key**: Insert `.transition()` before the `.attr()` or `.style()` calls you want animated.

---

### Duration and Delay

```javascript
selection
  .transition()
  .duration(750) // Milliseconds
  .delay(100)    // Wait before starting
  .attr('r', 20);

// Staggered (delay per element)
selection
  .transition()
  .duration(500)
  .delay((d, i) => i * 50) // 0ms, 50ms, 100ms...
  .attr('opacity', 1);
```

---

### Easing Functions

```javascript
selection
  .transition()
  .duration(500)
  .ease(d3.easeCubicOut) // Fast start, slow finish
  .attr('r', 20);
```

**Common Easings**:
- `d3.easeLinear`: Constant speed
- `d3.easeCubicIn`: Slow start
- `d3.easeCubicOut`: Slow finish
- `d3.easeCubic`: Slow start and finish
- `d3.easeBounceOut`: Bouncing effect
- `d3.easeElasticOut`: Elastic spring

---

### Chained Transitions

```javascript
selection
  .transition()
  .duration(500)
  .attr('r', 20)
  .transition() // Second transition starts after first
  .duration(500)
  .attr('fill', 'red');
```

---

### Named Transitions

```javascript
// Interrupt previous transition with same name
selection
  .transition('resize')
  .duration(500)
  .attr('r', 20);

// Later (cancels previous 'resize' transition)
selection
  .transition('resize')
  .attr('r', 30);
```

---

### Update Pattern with Transitions

```javascript
function update(newData) {
  // Update scale
  yScale.domain([0, d3.max(newData, d => d.value)]);

  // Update bars
  svg.selectAll('rect')
    .data(newData, d => d.id) // Key function
    .join('rect')
    .transition()
    .duration(750)
    .attr('y', d => yScale(d.value))
    .attr('height', d => height - yScale(d.value));

  // Update axis
  svg.select('.y-axis')
    .transition()
    .duration(750)
    .call(d3.axisLeft(yScale));
}
```

---

### Enter/Exit Transitions

```javascript
svg.selectAll('circle')
  .data(data, d => d.id)
  .join(
    enter => enter.append('circle')
      .attr('r', 0)
      .call(enter => enter.transition().attr('r', 5)),

    update => update
      .call(update => update.transition().attr('fill', 'blue')),

    exit => exit
      .call(exit => exit.transition().attr('r', 0).remove())
  );
```

---

## Interactions

### Event Handling

```javascript
selection.on('click', function(event, d) {
  console.log('Clicked', d);
  d3.select(this).attr('fill', 'red');
});
```

**Common Events**: `click`, `mouseover`, `mouseout`, `mousemove`, `dblclick`

---

### Tooltips

```javascript
const tooltip = d3.select('body').append('div')
  .attr('class', 'tooltip')
  .style('position', 'absolute')
  .style('visibility', 'hidden')
  .style('background', '#fff')
  .style('padding', '5px')
  .style('border', '1px solid #ccc');

svg.selectAll('circle')
  .data(data)
  .join('circle')
  .attr('r', 5)
  .on('mouseover', (event, d) => {
    tooltip.style('visibility', 'visible')
      .html(`<strong>${d.label}</strong><br/>Value: ${d.value}`);
  })
  .on('mousemove', (event) => {
    tooltip.style('top', (event.pageY - 10) + 'px')
      .style('left', (event.pageX + 10) + 'px');
  })
  .on('mouseout', () => {
    tooltip.style('visibility', 'hidden');
  });
```

---

### Hover Effects

```javascript
svg.selectAll('rect')
  .data(data)
  .join('rect')
  .attr('fill', 'steelblue')
  .on('mouseover', function() {
    d3.select(this)
      .transition()
      .duration(200)
      .attr('fill', 'orange');
  })
  .on('mouseout', function() {
    d3.select(this)
      .transition()
      .duration(200)
      .attr('fill', 'steelblue');
  });
```

---

## Drag Behavior

### Basic Dragging

```javascript
const drag = d3.drag()
  .on('start', dragstarted)
  .on('drag', dragged)
  .on('end', dragended);

function dragstarted(event, d) {
  d3.select(this).raise().attr('stroke', 'black');
}

function dragged(event, d) {
  d3.select(this)
    .attr('cx', event.x)
    .attr('cy', event.y);
}

function dragended(event, d) {
  d3.select(this).attr('stroke', null);
}

svg.selectAll('circle').call(drag);
```

**Event Properties**:
- `event.x`, `event.y`: New position
- `event.dx`, `event.dy`: Change from last position
- `event.subject`: Bound datum

---

### Drag with Constraints

```javascript
function dragged(event, d) {
  d3.select(this)
    .attr('cx', Math.max(0, Math.min(width, event.x)))
    .attr('cy', Math.max(0, Math.min(height, event.y)));
}
```

---

## Zoom and Pan

### Basic Zoom

```javascript
const zoom = d3.zoom()
  .scaleExtent([0.5, 5]) // Min/max zoom
  .on('zoom', zoomed);

function zoomed(event) {
  g.attr('transform', event.transform);
}

svg.call(zoom);

// Container for zoomable content
const g = svg.append('g');
g.selectAll('circle').data(data).join('circle')...
```

**Key**: Apply the transform to a container group, not to individual elements.

---

### Programmatic Zoom

```javascript
// Zoom to specific level
svg.transition()
  .duration(750)
  .call(zoom.transform, d3.zoomIdentity.scale(2));

// Zoom to fit
svg.transition()
  .duration(750)
  .call(zoom.transform, d3.zoomIdentity.translate(100, 100).scale(1.5));
```

---

### Zoom with Constraints

```javascript
const zoom = d3.zoom()
  .scaleExtent([0.5, 5])
  .translateExtent([[0, 0], [width, height]]) // Pan limits
  .on('zoom', zoomed);
```

---

## Brush Selection

### 2D Brush

```javascript
const brush = d3.brush()
  .extent([[0, 0], [width, height]])
  .on('end', brushended);

function brushended(event) {
  if (!event.selection) return; // No selection
  const [[x0, y0], [x1, y1]] = event.selection;

  // Filter data
  const selected = data.filter(d => {
    const x = xScale(d.x), y = yScale(d.y);
    return x >= x0 && x <= x1 && y >= y0 && y <= y1;
  });

  console.log('Selected', selected);
}

svg.append('g').attr('class', 'brush').call(brush);
```

---

### 1D Brush

```javascript
// X-axis only
const brushX = d3.brushX()
  .extent([[0, 0], [width, height]])
  .on('end', brushended);

// Y-axis only
const brushY = d3.brushY()
  .extent([[0, 0], [width, height]])
  .on('end', brushended);
```

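A common follow-up is converting the pixel selection back into the data domain with `scale.invert`. A sketch, assuming a continuous `xScale` and an `x` field on the data (names as in earlier examples):

```javascript
function brushended(event) {
  if (!event.selection) return; // Brush cleared
  const [px0, px1] = event.selection;                        // Pixel coordinates
  const [x0, x1] = [xScale.invert(px0), xScale.invert(px1)]; // Back to data values

  const selected = data.filter(d => d.x >= x0 && d.x <= x1);
  console.log('Selected data range', x0, x1, selected);
}
```
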
---

### Clear Brush

```javascript
svg.select('.brush').call(brush.move, null); // Clear selection
```

---

### Programmatic Brush

```javascript
// Set selection programmatically
svg.select('.brush').call(brush.move, [[100, 100], [200, 200]]);
```

---

## Common Pitfalls

### Pitfall 1: Transition on Initial Render

```javascript
// WRONG - transition when no previous state
svg.selectAll('circle').data(data).join('circle').transition().attr('r', 5);

// CORRECT - transition only on updates
const circles = svg.selectAll('circle').data(data).join('circle').attr('r', 5);
circles.transition().attr('r', 10); // Later, on update
```

---

### Pitfall 2: Not Using Key Function

```javascript
// WRONG - elements don't track data items
.data(newData).join('circle').transition().attr('cx', d => xScale(d.x));

// CORRECT - use key function
.data(newData, d => d.id).join('circle').transition().attr('cx', d => xScale(d.x));
```

---

### Pitfall 3: Applying Zoom to Wrong Element

```javascript
// WRONG - zoom affects individual elements
svg.selectAll('circle').call(zoom);

// CORRECT - zoom affects container
const g = svg.append('g');
svg.call(zoom);
g.selectAll('circle')... // Elements in zoomable container
```

---

## Next Steps

- Apply to workflows: [Workflows](workflows.md)
- See complete examples: [Common Patterns](common-patterns.md)
- Combine with layouts: [Advanced Layouts](advanced-layouts.md)
499
skills/d3-visualization/resources/workflows.md
Normal file
@@ -0,0 +1,499 @@
# Workflows

## Bar Chart Workflow

### Problem
Display categorical data with quantitative values as vertical bars.

### Steps

1. **Prepare data**
   ```javascript
   const data = [
     {category: 'A', value: 30},
     {category: 'B', value: 80},
     {category: 'C', value: 45}
   ];
   ```

2. **Create SVG container**
   ```javascript
   const margin = {top: 20, right: 20, bottom: 30, left: 40};
   const width = 600 - margin.left - margin.right;
   const height = 400 - margin.top - margin.bottom;

   const svg = d3.select('#chart')
     .append('svg')
     .attr('width', width + margin.left + margin.right)
     .attr('height', height + margin.top + margin.bottom)
     .append('g')
     .attr('transform', `translate(${margin.left}, ${margin.top})`);
   ```

3. **Create scales**
   ```javascript
   const xScale = d3.scaleBand()
     .domain(data.map(d => d.category))
     .range([0, width])
     .padding(0.1);

   const yScale = d3.scaleLinear()
     .domain([0, d3.max(data, d => d.value)])
     .range([height, 0]);
   ```

4. **Create axes**
   ```javascript
   svg.append('g')
     .attr('class', 'x-axis')
     .attr('transform', `translate(0, ${height})`)
     .call(d3.axisBottom(xScale));

   svg.append('g')
     .attr('class', 'y-axis')
     .call(d3.axisLeft(yScale));
   ```

5. **Draw bars**
   ```javascript
   svg.selectAll('rect')
     .data(data)
     .join('rect')
     .attr('x', d => xScale(d.category))
     .attr('y', d => yScale(d.value))
     .attr('width', xScale.bandwidth())
     .attr('height', d => height - yScale(d.value))
     .attr('fill', 'steelblue');
   ```

### Key Concepts
- **scaleBand**: Positions categorical bars with automatic spacing
- **Inverted Y range**: `[height, 0]` puts the origin at bottom-left
- **Height calculation**: `height - yScale(d.value)` computes bar height

---

## Line Chart Workflow

### Problem
Show trends over time or continuous data.

### Steps

1. **Prepare temporal data**
   ```javascript
   const data = [
     {date: new Date(2020, 0, 1), value: 30},
     {date: new Date(2020, 1, 1), value: 80},
     {date: new Date(2020, 2, 1), value: 45}
   ];
   ```

2. **Create scales**
   ```javascript
   const xScale = d3.scaleTime()
     .domain(d3.extent(data, d => d.date))
     .range([0, width]);

   const yScale = d3.scaleLinear()
     .domain([0, d3.max(data, d => d.value)])
     .range([height, 0]);
   ```

3. **Create line generator**
   ```javascript
   const line = d3.line()
     .x(d => xScale(d.date))
     .y(d => yScale(d.value))
     .curve(d3.curveMonotoneX);
   ```

4. **Draw line**
   ```javascript
   svg.append('path')
     .datum(data)
     .attr('d', line)
     .attr('fill', 'none')
     .attr('stroke', 'steelblue')
     .attr('stroke-width', 2);
   ```

5. **Add axes**
   ```javascript
   svg.append('g')
     .attr('transform', `translate(0, ${height})`)
     .call(d3.axisBottom(xScale).ticks(5));

   svg.append('g')
     .call(d3.axisLeft(yScale));
   ```

### Key Concepts
- **scaleTime**: Handles Date objects automatically
- **d3.extent**: Returns `[min, max]` for the domain
- **.datum() vs .data()**: Use `.datum()` for a single path, `.data()` for multiple paths
- **fill='none'**: Lines need stroke, not fill

---

## Scatter Plot Workflow

### Problem
Visualize the relationship between two quantitative variables.

### Steps

```javascript
// 1. Prepare data and scales
const data = [{x: 10, y: 20}, {x: 40, y: 90}, {x: 80, y: 50}];

const xScale = d3.scaleLinear()
  .domain([0, d3.max(data, d => d.x)])
  .range([0, width]);

const yScale = d3.scaleLinear()
  .domain([0, d3.max(data, d => d.y)])
  .range([height, 0]);

// 2. Draw points
svg.selectAll('circle')
  .data(data)
  .join('circle')
  .attr('cx', d => xScale(d.x))
  .attr('cy', d => yScale(d.y))
  .attr('r', 5)
  .attr('fill', 'steelblue')
  .attr('opacity', 0.7);

// 3. Add axes
svg.append('g').attr('transform', `translate(0, ${height})`).call(d3.axisBottom(xScale));
svg.append('g').call(d3.axisLeft(yScale));
```

### Key Concepts
- **Both linear scales**: X and Y are quantitative
- **Opacity for overlaps**: Helps see density

---

## Update Pattern Workflow

### Problem
Dynamically update the visualization when data changes.

### Steps

1. **Initial render**
   ```javascript
   const svg = d3.select('#chart').append('svg')...

   function render(data) {
     // Update scale domains
     yScale.domain([0, d3.max(data, d => d.value)]);

     // Update axis
     svg.select('.y-axis')
       .transition()
       .duration(750)
       .call(d3.axisLeft(yScale));

     // Update bars
     svg.selectAll('rect')
       .data(data, d => d.id) // Key function!
       .join(
         enter => enter.append('rect')
           .attr('x', d => xScale(d.category))
           .attr('y', height)
           .attr('width', xScale.bandwidth())
           .attr('height', 0)
           .attr('fill', 'steelblue')
           .call(enter => enter.transition()
             .duration(750)
             .attr('y', d => yScale(d.value))
             .attr('height', d => height - yScale(d.value))),

         update => update
           .call(update => update.transition()
             .duration(750)
             .attr('y', d => yScale(d.value))
             .attr('height', d => height - yScale(d.value))),

         exit => exit
           .call(exit => exit.transition()
             .duration(750)
             .attr('y', height)
             .attr('height', 0)
             .remove())
       );
   }

   // Initial render
   render(initialData);
   ```

2. **Update on new data**
   ```javascript
   // Later, when data changes
   render(newData);
   ```

### Key Concepts
- **Encapsulate in a function**: Reusable render logic
- **Key function**: Tracks element identity across updates
- **Enter/Update/Exit**: Custom animations for each lifecycle
- **Update domain first**: Recalculate scales before rendering

---

## Network Visualization Workflow

### Problem
Display relationships between entities (nodes and links).

### Steps

1. **Prepare data**
   ```javascript
   const nodes = [
     {id: 'A', group: 1},
     {id: 'B', group: 1},
     {id: 'C', group: 2}
   ];

   const links = [
     {source: 'A', target: 'B'},
     {source: 'B', target: 'C'}
   ];
   ```

2. **Create force simulation**
   ```javascript
   const simulation = d3.forceSimulation(nodes)
     .force('link', d3.forceLink(links).id(d => d.id).distance(100))
     .force('charge', d3.forceManyBody().strength(-200))
     .force('center', d3.forceCenter(width / 2, height / 2))
     .force('collide', d3.forceCollide().radius(20));
   ```

3. **Draw links**
   ```javascript
   const link = svg.append('g')
     .selectAll('line')
     .data(links)
     .join('line')
     .attr('stroke', '#999')
     .attr('stroke-width', 2);
   ```

4. **Draw nodes**
   ```javascript
   const node = svg.append('g')
     .selectAll('circle')
     .data(nodes)
     .join('circle')
     .attr('r', 10)
     .attr('fill', d => d.group === 1 ? 'steelblue' : 'orange');
   ```

5. **Update positions on tick**
   ```javascript
   simulation.on('tick', () => {
     link
       .attr('x1', d => d.source.x)
       .attr('y1', d => d.source.y)
       .attr('x2', d => d.target.x)
       .attr('y2', d => d.target.y);

     node
       .attr('cx', d => d.x)
       .attr('cy', d => d.y);
   });
   ```

6. **Add drag behavior**
   ```javascript
   function drag(simulation) {
     function dragstarted(event) {
       if (!event.active) simulation.alphaTarget(0.3).restart();
       event.subject.fx = event.subject.x;
       event.subject.fy = event.subject.y;
|
||||
}
|
||||
|
||||
function dragged(event) {
|
||||
event.subject.fx = event.x;
|
||||
event.subject.fy = event.y;
|
||||
}
|
||||
|
||||
function dragended(event) {
|
||||
if (!event.active) simulation.alphaTarget(0);
|
||||
event.subject.fx = null;
|
||||
event.subject.fy = null;
|
||||
}
|
||||
|
||||
return d3.drag()
|
||||
.on('start', dragstarted)
|
||||
.on('drag', dragged)
|
||||
.on('end', dragended);
|
||||
}
|
||||
|
||||
node.call(drag(simulation));
|
||||
```
|
||||
|
||||
### Key Concepts
|
||||
- **forceSimulation**: Physics-based layout
|
||||
- **tick handler**: Updates positions every frame
|
||||
- **id accessor**: Links reference nodes by ID
|
||||
- **fx/fy**: Fixed positions during drag
|
||||
|
||||
---
|
||||
|
||||
## Hierarchy Visualization (Treemap) Workflow
|
||||
|
||||
### Problem
|
||||
Show hierarchical data with nested rectangles.
|
||||
|
||||
### Steps
|
||||
|
||||
```javascript
|
||||
// 1. Prepare and process hierarchy
|
||||
const data = {name: 'root', children: [{name: 'A', value: 100}, {name: 'B', value: 200}]};
|
||||
|
||||
const root = d3.hierarchy(data)
|
||||
.sum(d => d.value)
|
||||
.sort((a, b) => b.value - a.value);
|
||||
|
||||
// 2. Apply layout
|
||||
d3.treemap().size([width, height]).padding(1)(root);
|
||||
|
||||
// 3. Draw rectangles
|
||||
const colorScale = d3.scaleOrdinal(d3.schemeCategory10);
|
||||
|
||||
svg.selectAll('rect')
|
||||
.data(root.leaves())
|
||||
.join('rect')
|
||||
.attr('x', d => d.x0)
|
||||
.attr('y', d => d.y0)
|
||||
.attr('width', d => d.x1 - d.x0)
|
||||
.attr('height', d => d.y1 - d.y0)
|
||||
.attr('fill', d => colorScale(d.parent.data.name));
|
||||
```
|
||||
|
||||
### Key Concepts
|
||||
- **d3.hierarchy**: Creates hierarchy from nested object
|
||||
- **.sum()**: Aggregates values up tree
|
||||
- **x0, y0, x1, y1**: Layout computes rectangle bounds
|
||||
|
||||
---
|
||||
|
||||
## Geographic Map (Choropleth) Workflow
|
||||
|
||||
### Problem
|
||||
Visualize regional data on a map.
|
||||
|
||||
### Steps
|
||||
|
||||
```javascript
|
||||
// 1. Load data
|
||||
Promise.all([d3.json('countries.geojson'), d3.csv('data.csv')])
|
||||
.then(([geojson, csvData]) => {
|
||||
// 2. Setup projection and path
|
||||
const projection = d3.geoMercator().fitExtent([[0, 0], [width, height]], geojson);
|
||||
const path = d3.geoPath().projection(projection);
|
||||
|
||||
// 3. Create color scale and data lookup
|
||||
const colorScale = d3.scaleSequential(d3.interpolateBlues)
|
||||
.domain([0, d3.max(csvData, d => +d.value)]);
|
||||
const dataById = new Map(csvData.map(d => [d.id, +d.value]));
|
||||
|
||||
// 4. Draw map
|
||||
svg.selectAll('path')
|
||||
.data(geojson.features)
|
||||
.join('path')
|
||||
.attr('d', path)
|
||||
.attr('fill', d => dataById.has(d.id) ? colorScale(dataById.get(d.id)) : '#ccc')
|
||||
.attr('stroke', '#fff');
|
||||
});
|
||||
```
|
||||
|
||||
### Key Concepts
|
||||
- **fitExtent**: Auto-scales projection to fit bounds
|
||||
- **geoPath**: Converts GeoJSON to SVG paths
|
||||
- **Map for lookup**: Fast data join by ID
|
||||
|
||||
---
|
||||
|
||||
## Real-Time Updates Workflow
|
||||
|
||||
### Problem
|
||||
Continuously update visualization with streaming data.
|
||||
|
||||
### Steps
|
||||
|
||||
```javascript
|
||||
// 1. Setup sliding window
|
||||
const maxPoints = 50;
|
||||
let data = [];
|
||||
|
||||
function update(newValue) {
|
||||
data.push({time: new Date(), value: newValue});
|
||||
if (data.length > maxPoints) data.shift();
|
||||
|
||||
// 2. Update scales and render
|
||||
xScale.domain(d3.extent(data, d => d.time));
|
||||
yScale.domain([0, d3.max(data, d => d.value)]);
|
||||
|
||||
svg.select('.x-axis').transition().duration(200).call(d3.axisBottom(xScale));
|
||||
svg.select('.y-axis').transition().duration(200).call(d3.axisLeft(yScale));
|
||||
|
||||
const line = d3.line().x(d => xScale(d.time)).y(d => yScale(d.value));
|
||||
svg.select('.line').datum(data).transition().duration(200).attr('d', line);
|
||||
}
|
||||
|
||||
// 3. Poll data
|
||||
setInterval(() => update(Math.random() * 100), 1000);
|
||||
```
|
||||
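The update loop above assumes a chart skeleton built once up front. A minimal sketch of that assumed setup (the `.x-axis`, `.y-axis`, and `.line` class names match the selectors used in `update()`):

```javascript
// One-time skeleton assumed by the streaming update() above.
const svg = d3.select('#chart').append('svg')
  .attr('width', width)
  .attr('height', height);

// Ranges are fixed here; domains are reset on every update.
const xScale = d3.scaleTime().range([0, width]);
const yScale = d3.scaleLinear().range([height, 0]);

svg.append('g').attr('class', 'x-axis')
  .attr('transform', `translate(0, ${height})`);
svg.append('g').attr('class', 'y-axis');

svg.append('path').attr('class', 'line')
  .attr('fill', 'none')
  .attr('stroke', 'steelblue')
  .attr('stroke-width', 2);
```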
|
||||
### Key Concepts
|
||||
- **Sliding window**: Fixed-size buffer
|
||||
- **Short transitions**: 200ms feels responsive
|
||||
|
||||
---
|
||||
|
||||
## Linked Views Workflow
|
||||
|
||||
### Problem
|
||||
Coordinate highlighting across multiple charts.
|
||||
|
||||
### Steps
|
||||
|
||||
```javascript
|
||||
// 1. Shared event handlers
|
||||
function highlight(id) {
|
||||
svg1.selectAll('circle').attr('opacity', d => d.id === id ? 1 : 0.3);
|
||||
svg2.selectAll('rect').attr('opacity', d => d.id === id ? 1 : 0.3);
|
||||
}
|
||||
|
||||
function unhighlight() {
|
||||
svg1.selectAll('circle').attr('opacity', 1);
|
||||
svg2.selectAll('rect').attr('opacity', 1);
|
||||
}
|
||||
|
||||
// 2. Attach to elements
|
||||
svg1.selectAll('circle')
|
||||
.data(data).join('circle')
|
||||
.on('mouseover', (event, d) => highlight(d.id))
|
||||
.on('mouseout', unhighlight);
|
||||
|
||||
svg2.selectAll('rect')
|
||||
.data(data).join('rect')
|
||||
.on('mouseover', (event, d) => highlight(d.id))
|
||||
.on('mouseout', unhighlight);
|
||||
```
|
||||
|
||||
### Key Concepts
|
||||
- **Shared state**: Common ID across views
|
||||
- **Coordinated updates**: Single event updates all charts
|
||||
|
||||
## Next Steps
|
||||
|
||||
- [Getting Started](getting-started.md) | [Common Patterns](common-patterns.md) | [Selections](selections-datajoins.md) | [Scales](scales-axes.md) | [Shapes](shapes-layouts.md)
|
||||
193
skills/data-schema-knowledge-modeling/SKILL.md
Normal file
@@ -0,0 +1,193 @@
|
||||
---
|
||||
name: data-schema-knowledge-modeling
|
||||
description: Use when designing database schemas, need to model domain entities and relationships clearly, building knowledge graphs or ontologies, creating API data models, defining system boundaries and invariants, migrating between data models, establishing taxonomies or hierarchies, user mentions "schema", "data model", "entities", "relationships", "ontology", "knowledge graph", or when scattered/inconsistent data structures need formalization.
|
||||
---
|
||||
|
||||
# Data Schema & Knowledge Modeling
|
||||
|
||||
## Table of Contents
|
||||
- [Purpose](#purpose)
|
||||
- [When to Use](#when-to-use)
|
||||
- [What Is It](#what-is-it)
|
||||
- [Workflow](#workflow)
|
||||
- [Schema Types](#schema-types)
|
||||
- [Common Patterns](#common-patterns)
|
||||
- [Guardrails](#guardrails)
|
||||
- [Quick Reference](#quick-reference)
|
||||
|
||||
## Purpose
|
||||
|
||||
Create rigorous, validated models of entities, relationships, and constraints that enable correct system implementation, knowledge representation, and semantic reasoning.
|
||||
|
||||
## When to Use
|
||||
|
||||
**Invoke this skill when you need to:**
|
||||
- Design database schema (SQL, NoSQL, graph) for new application
|
||||
- Model complex domain with many entities and relationships
|
||||
- Build knowledge graph or ontology for semantic search/reasoning
|
||||
- Define API data models and contracts
|
||||
- Create taxonomies or classification hierarchies
|
||||
- Establish data governance and canonical models
|
||||
- Migrate legacy schemas to modern architectures
|
||||
- Resolve ambiguity in domain concepts and relationships
|
||||
- Enable data integration across systems
|
||||
- Document system invariants and business rules
|
||||
|
||||
**Common trigger phrases:**
|
||||
- "Design a schema for..."
|
||||
- "Model the entities and relationships"
|
||||
- "Create a knowledge graph"
|
||||
- "What's the data model?"
|
||||
- "Define the ontology"
|
||||
- "How should we structure this data?"
|
||||
- "Map relationships between..."
|
||||
- "Design the API data model"
|
||||
|
||||
## What Is It
|
||||
|
||||
**Data schema & knowledge modeling** is the process of formally defining:
|
||||
|
||||
1. **Entities** - Things that exist (User, Product, Order, Organization)
|
||||
2. **Attributes** - Properties of entities (name, price, status, createdAt)
|
||||
3. **Relationships** - Connections between entities (User owns Order, Product belongsTo Category)
|
||||
4. **Constraints** - Rules and invariants (unique email, price > 0, one primary address)
|
||||
5. **Cardinality** - How many of each (one-to-many, many-to-many)
|
||||
|
||||
**Quick example:** E-commerce schema:
|
||||
- **Entities**: User, Product, Order, Cart, Payment
|
||||
- **Relationships**: User has many Orders, Order contains many Products (via OrderItems), User has one Cart
|
||||
- **Constraints**: Email must be unique, Order total matches sum of OrderItems, Payment amount equals Order total
|
||||
- **Result**: Unambiguous model that prevents data inconsistencies
|
||||
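As a rough illustration, the quick example above might be sketched in SQL DDL like this (table and column names are assumed for the example):

```sql
-- Illustrative sketch of the e-commerce example (simplified)
CREATE TABLE Users (
  id    BIGINT PRIMARY KEY,
  email VARCHAR(255) NOT NULL UNIQUE              -- constraint: unique email
);

CREATE TABLE Products (
  id    BIGINT PRIMARY KEY,
  price DECIMAL(10,2) NOT NULL CHECK (price > 0)
);

CREATE TABLE Orders (
  id     BIGINT PRIMARY KEY,
  userId BIGINT NOT NULL REFERENCES Users(id),    -- User has many Orders
  total  DECIMAL(10,2) NOT NULL CHECK (total >= 0)
);

CREATE TABLE OrderItems (                         -- Order contains many Products
  orderId   BIGINT NOT NULL REFERENCES Orders(id),
  productId BIGINT NOT NULL REFERENCES Products(id),
  quantity  INT NOT NULL CHECK (quantity > 0),
  PRIMARY KEY (orderId, productId)
);
```

The multi-table invariants (Order total = sum of OrderItems, Payment amount = Order total) cannot be expressed as simple CHECK constraints; they need triggers or application logic.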
|
||||
## Workflow
|
||||
|
||||
Copy this checklist and track your progress:
|
||||
|
||||
```
|
||||
Data Schema & Knowledge Modeling Progress:
|
||||
- [ ] Step 1: Gather domain requirements and scope
|
||||
- [ ] Step 2: Identify entities and attributes
|
||||
- [ ] Step 3: Define relationships and cardinality
|
||||
- [ ] Step 4: Specify constraints and invariants
|
||||
- [ ] Step 5: Validate and document the model
|
||||
```
|
||||
|
||||
**Step 1: Gather domain requirements and scope**
|
||||
|
||||
Ask user for domain description, core use cases (what queries/operations will this support), existing data (if migration/integration), performance/scale requirements, and technology constraints (SQL vs NoSQL vs graph database). Understanding use cases shapes the model - OLTP vs OLAP vs graph traversal require different designs. See [Schema Types](#schema-types) for guidance.
|
||||
|
||||
**Step 2: Identify entities and attributes**
|
||||
|
||||
Extract nouns from requirements (those are candidate entities). For each entity, list attributes with types and nullability. Use [resources/template.md](resources/template.md) for systematic entity identification. Verify each entity represents a distinct concept with independent lifecycle. Document entity purpose and examples.
|
||||
|
||||
**Step 3: Define relationships and cardinality**
|
||||
|
||||
Map connections between entities (one-to-one, one-to-many, many-to-many). For many-to-many, identify junction tables/entities. Specify relationship directionality and optionality (can X exist without Y?). Use [resources/methodology.md](resources/methodology.md) for complex relationship patterns like hierarchies, polymorphic associations, and temporal relationships.
|
||||
|
||||
**Step 4: Specify constraints and invariants**
|
||||
|
||||
Define uniqueness constraints, foreign key relationships, check constraints, and business rules. Document domain invariants (rules that must ALWAYS be true). Identify derived/computed attributes vs stored. Use [resources/methodology.md](resources/methodology.md) for advanced constraint patterns and validation strategies.
|
||||
|
||||
**Step 5: Validate and document the model**
|
||||
|
||||
Create `data-schema-knowledge-modeling.md` file with complete schema definition. Validate against use cases - can the schema support required queries/operations? Check for normalization (eliminate redundancy) or denormalization (optimize for specific queries). Self-assess using [resources/evaluators/rubric_data_schema_knowledge_modeling.json](resources/evaluators/rubric_data_schema_knowledge_modeling.json). Minimum standard: Average score ≥ 3.5.
|
||||
|
||||
## Schema Types
|
||||
|
||||
Choose based on use case and technology:
|
||||
|
||||
**Relational (SQL) Schema**
|
||||
- **Best for:** Transactional systems (OLTP), strong consistency, complex queries with joins
|
||||
- **Pattern:** Normalized tables, foreign keys, ACID transactions
|
||||
- **Example use cases:** E-commerce orders, banking transactions, HR systems
|
||||
- **Key decision:** Normalization level (3NF for consistency vs denormalized for read performance)
|
||||
|
||||
**Document/NoSQL Schema**
|
||||
- **Best for:** Flexible/evolving structure, high write throughput, denormalized reads
|
||||
- **Pattern:** Nested documents, embedded relationships, no joins
|
||||
- **Example use cases:** Content management, user profiles, event logs
|
||||
- **Key decision:** Embed vs reference (embed for 1-to-few, reference for 1-to-many)
|
||||
|
||||
**Graph Schema (Ontology)**
|
||||
- **Best for:** Complex relationships, traversal queries, semantic reasoning, knowledge graphs
|
||||
- **Pattern:** Nodes (entities), edges (relationships), properties on both
|
||||
- **Example use cases:** Social networks, fraud detection, recommendation engines, scientific research
|
||||
- **Key decision:** Property graph vs RDF triples
|
||||
|
||||
**Event/Time-Series Schema**
|
||||
- **Best for:** Audit logs, metrics, IoT data, append-only data
|
||||
- **Pattern:** Immutable events, time-based partitioning, aggregation tables
|
||||
- **Example use cases:** User activity tracking, monitoring, financial transactions
|
||||
- **Key decision:** Raw events vs pre-aggregated summaries
|
||||
|
||||
**Dimensional (Data Warehouse) Schema**
|
||||
- **Best for:** Analytics (OLAP), aggregations, historical reporting
|
||||
- **Pattern:** Fact tables + dimension tables (star/snowflake schema)
|
||||
- **Example use cases:** Business intelligence, sales analytics, customer 360
|
||||
- **Key decision:** Star schema (denormalized) vs snowflake (normalized dimensions)
|
||||
|
||||
## Common Patterns
|
||||
|
||||
**Pattern: Entity Lifecycle Modeling**
|
||||
Track entity state changes explicitly. Example: Order (draft → pending → confirmed → shipped → delivered → completed/cancelled). Include status field, timestamps for each state, and transitions table if history needed.
|
||||
|
||||
**Pattern: Soft Deletes**
|
||||
Never physically delete records - add `deletedAt` timestamp. Allows data recovery, audit compliance, and referential integrity. Filter `WHERE deletedAt IS NULL` in queries.
|
||||
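A minimal sketch of the pattern (PostgreSQL/MySQL-style syntax assumed):

```sql
-- Soft delete: mark the row instead of removing it
ALTER TABLE Users ADD COLUMN deletedAt TIMESTAMP NULL;

UPDATE Users SET deletedAt = NOW() WHERE id = 42;  -- "delete" user 42
SELECT * FROM Users WHERE deletedAt IS NULL;       -- reads skip deleted rows
```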
|
||||
**Pattern: Polymorphic Associations**
|
||||
Entity relates to multiple types. Example: Comment can be on Post or Photo. Options: (1) separate nullable foreign keys per type, (2) type + ID columns (commentableType + commentableId), (3) single table inheritance.
|
||||
|
||||
**Pattern: Temporal/Historical Data**
|
||||
Track changes over time. Options: (1) Effective/expiry dates per record, (2) separate history table, (3) event sourcing (store all changes as events). Choose based on query patterns.
|
||||
|
||||
**Pattern: Multi-tenancy**
|
||||
Isolate data per customer. Options: (1) Separate databases (strong isolation), (2) Shared schema with tenantId column (efficient), (3) Separate schemas in same DB (balance). Add tenantId to all queries if shared.
|
||||
|
||||
**Pattern: Hierarchies**
|
||||
Model trees/nested structures. Options: (1) Adjacency list (parentId), (2) Nested sets (left/right values), (3) Path enumeration (materialized path), (4) Closure table (all ancestor-descendant pairs). Trade-offs between read/write performance.
|
||||
|
||||
## Guardrails
|
||||
|
||||
**✓ Do:**
|
||||
- Start with use cases - schema serves queries/operations
|
||||
- Normalize first, then denormalize for specific performance needs
|
||||
- Document all constraints and invariants explicitly
|
||||
- Use meaningful, consistent naming conventions
|
||||
- Consider future evolution - design for extensibility
|
||||
- Validate model against ALL required use cases
|
||||
- Model the real world accurately (don't force-fit the domain to the technology)
|
||||
|
||||
**✗ Don't:**
|
||||
- Design schema in isolation from use cases
|
||||
- Premature optimization (denormalize before measuring)
|
||||
- Skip constraint definitions (leads to data corruption)
|
||||
- Use generic names (data, value, thing) - be specific
|
||||
- Ignore cardinality and nullability
|
||||
- Model implementation details in domain entities
|
||||
- Forget about data migration path from existing systems
|
||||
- Create circular dependencies between entities
|
||||
|
||||
## Quick Reference
|
||||
|
||||
**Resources:**
|
||||
- `resources/template.md` - Structured process for entity identification, relationship mapping, and constraint definition
|
||||
- `resources/methodology.md` - Advanced patterns: temporal modeling, graph ontologies, schema evolution, normalization strategies
|
||||
- `resources/examples/` - Worked examples showing complete schema designs with validation
|
||||
- `resources/evaluators/rubric_data_schema_knowledge_modeling.json` - Quality assessment before delivery
|
||||
|
||||
**When to choose which resource:**
|
||||
- Simple domain (< 10 entities) → Start with template
|
||||
- Complex domain or graph/ontology → Study methodology for advanced patterns
|
||||
- Need to see examples → Review examples folder
|
||||
- Before delivering to user → Always validate with rubric
|
||||
|
||||
**Expected deliverable:**
|
||||
`data-schema-knowledge-modeling.md` file containing: domain description, complete entity definitions with attributes and types, relationship mappings with cardinality, constraint specifications, diagram (ERD/graph visualization), validation against use cases, and implementation notes.
|
||||
|
||||
**Common schema notations:**
|
||||
- **ERD** (Entity-Relationship Diagram): Visual representation of entities and relationships
|
||||
- **UML Class Diagram**: Object-oriented view with inheritance and associations
|
||||
- **Graph Diagram**: Nodes and edges for graph databases
|
||||
- **JSON Schema**: API/document structure with validation rules
|
||||
- **SQL DDL**: Executable CREATE TABLE statements
|
||||
- **Ontology (OWL/RDF)**: Semantic web knowledge representation
|
||||
282
skills/data-schema-knowledge-modeling/resources/evaluators/rubric_data_schema_knowledge_modeling.json
Normal file
@@ -0,0 +1,282 @@
|
||||
{
|
||||
"criteria": [
|
||||
{
|
||||
"name": "Entity Identification & Completeness",
|
||||
"description": "Are all domain entities identified? Each with clear purpose, distinct identity, and no redundancy?",
|
||||
"scoring": {
|
||||
"1": "Missing critical entities. Entities poorly defined or overlapping. No clear distinction between entities and attributes.",
|
||||
"2": "Some entities identified but gaps in coverage. Some entity purposes unclear. Minor redundancy.",
|
||||
"3": "Most entities identified with clear purposes. Reasonable coverage. Entities generally distinct.",
|
||||
"4": "All required entities identified with clear, documented purposes. Good examples provided. No redundancy.",
|
||||
"5": "Complete entity coverage validated against all use cases. Each entity has purpose, examples, lifecycle documented. Entity vs value object distinction clear. No overlap or redundancy."
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Attribute Definition Quality",
|
||||
"description": "Are attributes complete with appropriate data types, nullability, and constraints?",
|
||||
"scoring": {
|
||||
"1": "Attributes missing or poorly typed. Wrong data types (e.g., money as VARCHAR). Nullability ignored.",
|
||||
"2": "Basic attributes present but some types questionable. Nullability inconsistent. Some constraints missing.",
|
||||
"3": "Attributes defined with reasonable types. Nullability specified. Core constraints present.",
|
||||
"4": "All attributes well-typed (DECIMAL for money, proper VARCHAR lengths). Nullability correctly specified. Constraints documented.",
|
||||
"5": "Comprehensive attribute definitions with justification for types, nullability, defaults, and constraints. Audit fields (createdAt, updatedAt) included where appropriate. No technical debt."
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Relationship Modeling Accuracy",
|
||||
"description": "Are relationships correctly identified with proper cardinality, optionality, and implementation?",
|
||||
"scoring": {
|
||||
"1": "Relationships missing or incorrect. Cardinality wrong. M:N modeled without junction table.",
|
||||
"2": "Some relationships identified but cardinality questionable. Missing junction tables or unclear optionality.",
|
||||
"3": "Most relationships mapped with cardinality. Junction tables for M:N. Optionality specified.",
|
||||
"4": "All relationships correctly modeled. Proper cardinality (1:1, 1:N, M:N). Junction tables where needed. Clear optionality.",
|
||||
"5": "Comprehensive relationship documentation with bidirectional naming, implementation details (FKs, ON DELETE actions), and validation that relationships support all use cases. Complex patterns (polymorphic, hierarchical) correctly handled."
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Constraint & Invariant Specification",
|
||||
"description": "Are business rules enforced via constraints? Are domain invariants documented and validated?",
|
||||
"scoring": {
|
||||
"1": "No constraints beyond primary keys. Business rules not documented. Invariants missing.",
|
||||
"2": "Basic constraints (NOT NULL, UNIQUE) present but business rules not enforced. Invariants mentioned but not validated.",
|
||||
"3": "Good constraint coverage (PK, FK, UNIQUE, NOT NULL). Some business rules enforced. Invariants documented.",
|
||||
"4": "Comprehensive constraints including CHECK constraints for business rules. All invariants documented with enforcement strategy.",
|
||||
"5": "All constraints documented with rationale. Domain invariants clearly stated and enforced via DB constraints where possible, application logic where not. Validation strategy for complex multi-table invariants. Examples of enforcement code provided."
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Normalization & Data Integrity",
|
||||
"description": "Is schema properly normalized (or deliberately denormalized with rationale)?",
|
||||
"scoring": {
|
||||
"1": "Severe normalization violations. Redundant data. Update anomalies likely.",
|
||||
"2": "Some normalization but violations present (partial or transitive dependencies). Some redundancy.",
|
||||
"3": "Generally normalized to 2NF-3NF. Minimal redundancy. Rationale for exceptions provided.",
|
||||
"4": "Proper normalization to 3NF. Any denormalization documented with performance justification. No update anomalies.",
|
||||
"5": "Exemplary normalization with clear explanation of level achieved (1NF/2NF/3NF/BCNF). Strategic denormalization only where measured performance gains justify it. Trade-offs explicitly documented. No data integrity risks."
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Use Case Coverage & Validation",
|
||||
"description": "Does schema support all required use cases? Can all queries be answered?",
|
||||
"scoring": {
|
||||
"1": "Schema doesn't support core use cases. Critical queries impossible or require workarounds.",
|
||||
"2": "Supports some use cases but gaps exist. Some queries difficult or inefficient.",
|
||||
"3": "Supports most use cases. Required queries possible though some may be complex.",
|
||||
"4": "All use cases supported. Validation checklist shows each use case can be satisfied. Query paths identified.",
|
||||
"5": "Comprehensive validation against all use cases with example queries. Indexes planned for performance. Edge cases considered. Future use cases accommodated by design."
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Technology Appropriateness",
|
||||
"description": "Is the schema type (relational, document, graph) appropriate for the domain?",
|
||||
"scoring": {
|
||||
"1": "Wrong technology choice (e.g., relational for graph problem, or vice versa). Implementation doesn't match paradigm.",
|
||||
"2": "Technology choice questionable. Implementation awkward for chosen paradigm.",
|
||||
"3": "Reasonable technology choice. Implementation follows paradigm conventions.",
|
||||
"4": "Good technology choice with justification. Implementation leverages paradigm strengths.",
|
||||
"5": "Optimal technology choice with clear rationale comparing alternatives. Implementation exemplifies paradigm best practices. Schema leverages technology-specific features appropriately (e.g., JSONB in PostgreSQL, graph traversal in Neo4j)."
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Documentation Quality & Clarity",
|
||||
"description": "Is schema well-documented with ERD, implementation code, and clear explanations?",
|
||||
"scoring": {
|
||||
"1": "Minimal documentation. No diagram. Entity definitions incomplete.",
|
||||
"2": "Basic documentation present but gaps. Diagram missing or unclear. Some entities poorly explained.",
|
||||
"3": "Good documentation with most sections complete. Diagram present. Entities explained.",
|
||||
"4": "Comprehensive documentation following template. ERD clear. All entities, relationships, constraints documented. Implementation code provided.",
|
||||
"5": "Exemplary documentation that could serve as reference. ERD/diagram clear and complete. All sections filled thoroughly. Implementation code (SQL DDL / JSON Schema / Cypher) executable. Examples aid understanding. Could be handed to developer for immediate implementation."
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Evolution & Migration Strategy",
|
||||
"description": "Is there a plan for schema changes? Migration path from existing systems considered?",
|
||||
"scoring": {
|
||||
"1": "No evolution strategy. If migration, no plan for existing data.",
|
||||
"2": "Evolution mentioned but no concrete strategy. Migration path vague.",
|
||||
"3": "Basic evolution strategy (versioning or backward-compat approach). Migration considered if applicable.",
|
||||
"4": "Clear evolution strategy documented. Migration path defined with phases if migrating from legacy.",
|
||||
"5": "Comprehensive evolution strategy with versioning, backward-compatibility approach, and detailed migration plan if applicable. Rollback strategy considered. Zero-downtime deployment approach specified. Future extensibility designed in."
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Advanced Pattern Application",
|
||||
"description": "Are advanced patterns (temporal, hierarchies, polymorphic) correctly applied when needed?",
|
||||
"scoring": {
|
||||
"1": "Complex patterns needed but missing or incorrectly implemented.",
|
||||
"2": "Attempted advanced patterns but implementation flawed or overly complex.",
|
||||
"3": "Advanced patterns applied where needed with reasonable implementation.",
|
||||
"4": "Advanced patterns correctly implemented with good trade-off decisions (e.g., hierarchy approach chosen based on use case).",
|
||||
"5": "Sophisticated pattern usage with clear rationale for choices. Temporal modeling, hierarchies, polymorphic associations, or graph patterns implemented optimally for domain. Trade-offs explicit and justified."
|
||||
}
|
||||
}
|
||||
],
|
||||
"schema_type_guidance": {
|
||||
"Relational (SQL)": {
|
||||
"target_score": 3.5,
|
||||
"focus_criteria": [
|
||||
"Normalization & Data Integrity",
|
||||
"Constraint & Invariant Specification",
|
||||
"Relationship Modeling Accuracy"
|
||||
],
|
||||
"key_patterns": [
|
||||
"Proper normalization (3NF typical)",
|
||||
"Foreign key relationships with CASCADE/RESTRICT",
|
||||
"CHECK constraints for business rules",
|
||||
"Junction tables for M:N relationships"
|
||||
]
|
||||
},
|
||||
"Document/NoSQL": {
|
||||
"target_score": 3.5,
|
||||
"focus_criteria": [
|
||||
"Entity Identification & Completeness",
|
||||
"Use Case Coverage & Validation",
|
||||
"Technology Appropriateness"
|
||||
],
|
||||
"key_patterns": [
|
||||
"Embed vs reference decision documented",
|
||||
"Denormalization for read performance justified",
|
||||
"Document structure matches query patterns",
|
||||
"JSON schema validation if available"
|
||||
]
|
||||
},
|
||||
"Graph Database": {
|
||||
"target_score": 4.0,
|
||||
"focus_criteria": [
|
||||
"Relationship Modeling Accuracy",
|
||||
"Technology Appropriateness",
|
||||
"Advanced Pattern Application"
|
||||
],
|
||||
"key_patterns": [
|
||||
"Nodes for entities, edges for relationships",
|
||||
"Properties on edges for context",
|
||||
"Traversal patterns optimized (< 3 hops typical)",
|
||||
"Index on frequently filtered properties"
|
||||
]
|
||||
},
|
||||
"Data Warehouse (OLAP)": {
|
||||
"target_score": 3.5,
|
||||
"focus_criteria": [
|
||||
"Use Case Coverage & Validation",
|
||||
"Normalization & Data Integrity",
|
||||
"Technology Appropriateness"
|
||||
],
|
||||
"key_patterns": [
|
||||
"Star or snowflake schema",
|
||||
"Fact tables with foreign keys to dimensions",
|
||||
"Dimensional attributes denormalized",
|
||||
"Slowly changing dimensions handled"
|
||||
]
|
||||
}
|
||||
},
|
||||
"domain_complexity_guidance": {
|
||||
"Simple Domain (< 10 entities, straightforward relationships)": {
|
||||
"target_score": 3.0,
|
||||
"acceptable_shortcuts": [
|
||||
"ERD can be simple text diagram",
|
||||
"Fewer implementation details needed",
|
||||
"Basic constraints sufficient"
|
||||
],
|
||||
"key_quality_gates": [
|
||||
"All entities identified",
|
||||
"Relationships correct",
|
||||
"Supports use cases"
|
||||
]
|
||||
},
|
||||
"Standard Domain (10-30 entities, moderate complexity)": {
|
||||
"target_score": 3.5,
|
||||
"required_elements": [
|
||||
"Complete entity definitions",
|
||||
"ERD diagram",
|
||||
"All relationships mapped",
|
||||
"Constraints documented",
|
||||
"Implementation code (DDL/schema)"
|
||||
],
|
||||
"key_quality_gates": [
|
||||
"All 10 criteria evaluated",
|
||||
"Minimum score of 3 on each",
|
||||
"Average ≥ 3.5"
|
||||
]
|
||||
},
|
||||
"Complex Domain (30+ entities, hierarchies, temporal, polymorphic)": {
|
||||
"target_score": 4.0,
|
||||
"required_elements": [
|
||||
"Comprehensive documentation",
|
||||
"Multiple diagrams (ERD + detail views)",
|
||||
"Advanced pattern usage documented",
|
||||
"Migration strategy if applicable",
|
||||
"Performance considerations",
|
||||
"Example queries for complex patterns"
|
||||
],
|
||||
"key_quality_gates": [
|
||||
"All 10 criteria evaluated",
|
||||
"Minimum score of 3.5 on each",
|
||||
"Average ≥ 4.0",
|
||||
"Score 5 on Advanced Pattern Application"
|
||||
]
|
||||
}
|
||||
},
|
||||
"common_failure_modes": {
|
||||
"1. God Entities": {
|
||||
"symptom": "User table with 50+ attributes, or single entity handling multiple concerns",
|
||||
"why_it_fails": "Violates single responsibility, hard to query, update anomalies",
|
||||
"fix": "Extract related concerns into separate entities (UserProfile, UserPreferences, UserAddress)",
|
||||
"related_criteria": ["Entity Identification & Completeness", "Normalization & Data Integrity"]
|
||||
},
|
||||
"2. Missing Junction Tables": {
|
||||
"symptom": "Attempting M:N relationship with direct foreign keys or comma-separated IDs",
|
||||
"why_it_fails": "Can't properly model M:N, violates 1NF, query complexity",
|
||||
"fix": "Always use junction table with composite primary key for M:N relationships",
|
||||
"related_criteria": ["Relationship Modeling Accuracy", "Normalization & Data Integrity"]
|
||||
},
|
||||
"3. Wrong Data Types": {
|
||||
"symptom": "Money as FLOAT, dates as VARCHAR, booleans as CHAR(1)",
|
||||
"why_it_fails": "Precision loss (money), format inconsistency (dates), unclear semantics (booleans)",
|
||||
"fix": "Use DECIMAL for money, DATE/TIMESTAMP for dates, BOOLEAN for flags",
|
||||
"related_criteria": ["Attribute Definition Quality"]
|
||||
},
|
||||
"4. No Constraints": {
|
||||
"symptom": "Business rules in documentation but not enforced in schema",
|
||||
"why_it_fails": "Application bugs can corrupt data, no database-level guarantees",
|
||||
"fix": "Use CHECK constraints, NOT NULL, UNIQUE, FK constraints to enforce rules",
|
||||
"related_criteria": ["Constraint & Invariant Specification"]
|
||||
},
|
||||
"5. Premature Denormalization": {
|
||||
"symptom": "Duplicating data for \"performance\" without measuring",
|
||||
"why_it_fails": "Update anomalies, data inconsistency, wasted effort if not bottleneck",
|
||||
"fix": "Normalize first (3NF), denormalize only after profiling shows actual bottleneck",
|
||||
"related_criteria": ["Normalization & Data Integrity", "Use Case Coverage & Validation"]
|
||||
},
|
||||
"6. Ignoring Use Cases": {
|
||||
"symptom": "Schema designed in isolation, doesn't support required queries",
|
||||
"why_it_fails": "Schema can't answer business questions, requires redesign",
|
||||
"fix": "Validate schema against ALL use cases. Write example queries to verify.",
|
||||
"related_criteria": ["Use Case Coverage & Validation"]
|
||||
},
|
||||
"7. Modeling Implementation": {
|
||||
"symptom": "Entities like \"UserSession\", \"Cache\", \"Queue\" in domain model",
|
||||
"why_it_fails": "Confuses domain concepts with technical infrastructure",
|
||||
"fix": "Model real-world domain entities only. Infrastructure is separate concern.",
|
||||
"related_criteria": ["Entity Identification & Completeness", "Technology Appropriateness"]
|
||||
},
|
||||
"8. No Evolution Strategy": {
|
||||
"symptom": "Can't change schema without breaking production",
|
||||
"why_it_fails": "Schema ossifies, can't adapt to business changes",
|
||||
"fix": "Plan for evolution: versioning, backward-compat changes, or expand-contract migrations",
|
||||
"related_criteria": ["Evolution & Migration Strategy"]
|
||||
}
|
||||
},
|
||||
"scale": {
|
||||
"description": "Each criterion scored 1-5",
|
||||
"min_score": 1,
|
||||
"max_score": 5,
|
||||
"passing_threshold": 3.5,
|
||||
"excellence_threshold": 4.5
|
||||
},
|
||||
"usage_notes": {
|
||||
"when_to_score": "After completing schema design, before delivering to user",
|
||||
"minimum_standard": "Average score ≥ 3.5 across all criteria (standard domain). Simple domains: ≥ 3.0. Complex domains: ≥ 4.0.",
|
||||
"how_to_improve": "If scoring < threshold, identify lowest-scoring criteria and iterate. Common fixes: add missing entities, specify constraints, validate against use cases, improve documentation.",
|
||||
"self_assessment": "Score honestly. Schema flaws are expensive to fix in production. Better to iterate now."
|
||||
}
|
||||
}
|
||||
439
skills/data-schema-knowledge-modeling/resources/methodology.md
Normal file
@@ -0,0 +1,439 @@
|
||||
# Data Schema & Knowledge Modeling: Advanced Methodology
|
||||
|
||||
## Workflow
|
||||
|
||||
```
|
||||
Advanced Schema Modeling:
|
||||
- [ ] Step 1: Analyze complex domain patterns
|
||||
- [ ] Step 2: Design advanced relationship structures
|
||||
- [ ] Step 3: Apply normalization or strategic denormalization
|
||||
- [ ] Step 4: Model temporal/historical aspects
|
||||
- [ ] Step 5: Plan schema evolution strategy
|
||||
```
|
||||
|
||||
**Steps:** (1) Identify patterns in [Advanced Relationships](#1-advanced-relationship-patterns), (2) Apply [Hierarchy](#2-hierarchy-modeling) and [Polymorphic](#3-polymorphic-associations) patterns, (3) Use [Normalization](#5-normalization-levels) then [Denormalization](#6-strategic-denormalization), (4) Add [Temporal](#7-temporal--historical-modeling) if needed, (5) Plan [Evolution](#8-schema-evolution). For graph databases, see [Graph & Ontology Design](#4-graph--ontology-design).
|
||||
|
||||
---
|
||||
|
||||
## 1. Advanced Relationship Patterns
|
||||
|
||||
### Self-Referential
|
||||
|
||||
Entity relates to itself (org charts, categories, social networks).
|
||||
|
||||
```sql
|
||||
CREATE TABLE Employee (
|
||||
id BIGINT PRIMARY KEY,
|
||||
managerId BIGINT NULL REFERENCES Employee(id),
|
||||
CONSTRAINT no_self_ref CHECK (id != managerId)
|
||||
);
|
||||
```
|
||||
|
||||
Query with recursive CTE for full hierarchy.
|
||||
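For example, a sketch of such a query against the Employee table above (standard `WITH RECURSIVE`, walking up the reporting chain):

```sql
-- Reporting chain for employee 42, bottom to top
WITH RECURSIVE chain AS (
  SELECT id, managerId FROM Employee WHERE id = 42
  UNION ALL
  SELECT e.id, e.managerId
  FROM Employee e
  JOIN chain c ON e.id = c.managerId
)
SELECT * FROM chain;
```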
|
||||
### Conditional
|
||||
|
||||
Relationship exists only under conditions.
|
||||
|
||||
```sql
|
||||
CREATE TABLE Order (
|
||||
id BIGINT PRIMARY KEY,
|
||||
status VARCHAR(20),
|
||||
paymentId BIGINT NULL REFERENCES Payment(id),
|
||||
CONSTRAINT payment_when_paid CHECK (
|
||||
(status IN ('paid','completed') AND paymentId IS NOT NULL) OR
|
||||
(status NOT IN ('paid','completed'))
|
||||
)
|
||||
);
|
||||
```
|
||||
|
||||
### Multi-Parent
|
||||
|
||||
Entity has multiple parents (document in folders).
|
||||
|
||||
```sql
|
||||
CREATE TABLE DocumentFolder (
|
||||
documentId BIGINT REFERENCES Document(id),
|
||||
folderId BIGINT REFERENCES Folder(id),
|
||||
PRIMARY KEY (documentId, folderId)
|
||||
);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Hierarchy Modeling
|
||||
|
||||
Four approaches with trade-offs:
|
||||
|
||||
| Approach | Implementation | Read | Write | Best For |
|
||||
|----------|---------------|------|-------|----------|
|
||||
| **Adjacency List** | `parentId` column | Slow (recursive) | Fast | Shallow trees, frequent updates |
|
||||
| **Path Enumeration** | `path VARCHAR` ('/1/5/12/') | Fast | Medium | Read-heavy, moderate depth |
|
||||
| **Nested Sets** | `lft, rgt INT` | Fastest | Slow | Read-heavy, rare writes |
|
||||
| **Closure Table** | Separate ancestor/descendant table | Fastest | Medium | Complex queries, any depth |
|
||||
|
||||
**Adjacency List:**
|
||||
```sql
|
||||
CREATE TABLE Category (
|
||||
id BIGINT PRIMARY KEY,
|
||||
parentId BIGINT NULL REFERENCES Category(id)
|
||||
);
|
||||
```
|
||||
|
||||
**Closure Table:**
|
||||
```sql
|
||||
CREATE TABLE CategoryClosure (
|
||||
ancestor BIGINT,
|
||||
descendant BIGINT,
|
||||
depth INT, -- 0=self, 1=child, 2+=deeper
|
||||
PRIMARY KEY (ancestor, descendant)
|
||||
);
|
||||
```
|
||||
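A sketch of a typical closure-table read (every descendant of one node in a single indexed lookup, no recursion):

```sql
-- All descendants of category 5 (depth > 0 excludes the self-row)
SELECT descendant FROM CategoryClosure
WHERE ancestor = 5 AND depth > 0;
```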
|
||||
**Recommendation:** Adjacency for < 5 levels, Closure for complex queries.
|
||||
|
||||
---
|
||||
|
||||
## 3. Polymorphic Associations
|
||||
|
||||
Entity relates to multiple types (Comment on Post/Photo/Video).
|
||||
|
||||
### Approach 1: Separate FKs (Recommended for SQL)
|
||||
|
||||
```sql
|
||||
CREATE TABLE Comment (
|
||||
id BIGINT PRIMARY KEY,
|
||||
postId BIGINT NULL REFERENCES Post(id),
|
||||
photoId BIGINT NULL REFERENCES Photo(id),
|
||||
videoId BIGINT NULL REFERENCES Video(id),
|
||||
CONSTRAINT one_parent CHECK (
|
||||
(postId IS NOT NULL)::int +
|
||||
(photoId IS NOT NULL)::int +
|
||||
(videoId IS NOT NULL)::int = 1
|
||||
)
|
||||
);
|
||||
```
|
||||
|
||||
**Pros:** Type-safe, referential integrity
|
||||
**Cons:** Schema grows with types
|
||||
|
||||
### Approach 2: Supertype/Subtype
|
||||
|
||||
```sql
|
||||
CREATE TABLE Commentable (id BIGINT PRIMARY KEY, type VARCHAR(50));
|
||||
CREATE TABLE Post (id BIGINT PRIMARY KEY REFERENCES Commentable(id), ...);
|
||||
CREATE TABLE Photo (id BIGINT PRIMARY KEY REFERENCES Commentable(id), ...);
|
||||
CREATE TABLE Comment (commentableId BIGINT REFERENCES Commentable(id));
|
||||
```
|
||||
|
||||
**Use when:** Shared attributes across types.
|
||||
|
||||
---
|
||||
|
||||
## 4. Graph & Ontology Design
|
||||
|
||||
### Property Graph
|
||||
|
||||
**Nodes** = entities, **Edges** = relationships, both have properties.
|
||||
|
||||
```cypher
|
||||
CREATE (u:User {id: 1, name: 'Alice'})
|
||||
CREATE (p:Product {id: 100, name: 'Widget'})
|
||||
CREATE (u)-[:PURCHASED {date: '2024-01-15', quantity: 2}]->(p)
|
||||
```
|
||||
|
||||
**Schema:**
|
||||
```
|
||||
Nodes: User, Product, Category
|
||||
Edges: PURCHASED (User→Product, {date, quantity})
|
||||
REVIEWED (User→Product, {rating, comment})
|
||||
BELONGS_TO (Product→Category)
|
||||
```
|
||||
|
||||
**Design principles:**
|
||||
- Nodes for entities with identity
|
||||
- Edges for relationships
|
||||
- Properties on edges for context
|
||||
- Avoid deep traversals (< 3 hops)
|
||||
|
||||
### RDF Triples (Semantic Web)
|
||||
|
||||
Subject-Predicate-Object:
|
||||
```turtle
|
||||
ex:Alice rdf:type ex:User .
|
||||
ex:Alice ex:purchased ex:Widget .
|
||||
```
|
||||
|
||||
**Use RDF when:** Standards compliance, semantic reasoning, linked data
|
||||
**Use Property Graph when:** Performance, complex traversals
|
||||
|
||||
---
|
||||
|
||||
## 5. Normalization Levels
|
||||
|
||||
### 1NF: Atomic Values
|
||||
|
||||
**Violation:** Multiple phones in one column
|
||||
**Fix:** Separate UserPhone table
|
||||
|
||||
### 2NF: No Partial Dependencies
|
||||
|
||||
**Violation:** In OrderItem(orderId, productId, productName), productName depends only on productId
|
||||
**Fix:** productName lives in Product table
|
||||
|
||||
### 3NF: No Transitive Dependencies
|
||||
|
||||
**Violation:** In Address(id, zipCode, city, state), city/state depend on zipCode
|
||||
**Fix:** Separate ZipCode table
|
||||
|
||||
**When to normalize to 3NF:** OLTP, frequent updates, consistency required
|
||||
|
||||
---
|
||||
|
||||
## 6. Strategic Denormalization
|
||||
|
||||
**Only after profiling shows bottleneck.**
|
||||
|
||||
### Pattern 1: Computed Aggregates
|
||||
|
||||
Store `Order.total` instead of summing OrderItems on every query.
|
||||
|
||||
**Trade-off:** Faster reads, slower writes, consistency risk (use triggers/app logic)
|
||||
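As one possible enforcement sketch (PostgreSQL 11+ assumed; table and column names follow the examples in this document):

```sql
-- Keep Orders.total in sync with its OrderItems
CREATE OR REPLACE FUNCTION refresh_order_total() RETURNS trigger AS $$
DECLARE
  oid BIGINT;
BEGIN
  IF TG_OP = 'DELETE' THEN oid := OLD.orderId; ELSE oid := NEW.orderId; END IF;
  UPDATE Orders SET total = (
    SELECT COALESCE(SUM(quantity * price), 0)
    FROM OrderItems WHERE orderId = oid
  ) WHERE id = oid;
  RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_total_refresh
AFTER INSERT OR UPDATE OR DELETE ON OrderItems
FOR EACH ROW EXECUTE FUNCTION refresh_order_total();
```

The application-level alternative is to recompute the total inside the same transaction that modifies OrderItems.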
|
||||
### Pattern 2: Frequent Joins
|
||||
|
||||
Embed address fields in User table to avoid join.
|
||||
|
||||
**Trade-off:** No join, but updates must maintain both
|
||||
|
||||
### Pattern 3: Historical Snapshots
|
||||
|
||||
```sql
|
||||
CREATE TABLE OrderSnapshot (
|
||||
orderId BIGINT,
|
||||
snapshotDate DATE,
|
||||
userName VARCHAR(255), -- denormalized from User
|
||||
userEmail VARCHAR(255),
|
||||
PRIMARY KEY (orderId, snapshotDate)
|
||||
);
|
||||
```
|
||||
|
||||
**Use when:** Need point-in-time data (e.g., user's name at time of order)
|
||||
|
||||
---
|
||||
|
||||
## 7. Temporal & Historical Modeling
|
||||
|
||||
### Pattern 1: Effective Dating
|
||||
|
||||
```sql
|
||||
CREATE TABLE Price (
|
||||
productId BIGINT,
|
||||
price DECIMAL(10,2),
|
||||
effectiveFrom DATE NOT NULL,
|
||||
effectiveTo DATE NULL, -- NULL = current
|
||||
PRIMARY KEY (productId, effectiveFrom)
|
||||
);
|
||||
```
|
||||
|
||||
**Query current:** `WHERE effectiveFrom <= CURRENT_DATE AND (effectiveTo IS NULL OR effectiveTo > CURRENT_DATE)` (see the sketch below)
|
||||
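Written out as a runnable sketch (standard SQL `CURRENT_DATE`; the product id is assumed for illustration):

```sql
-- Current price for product 7
SELECT price FROM Price
WHERE productId = 7
  AND effectiveFrom <= CURRENT_DATE
  AND (effectiveTo IS NULL OR effectiveTo > CURRENT_DATE);
```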
|
||||
### Pattern 2: History Table
|
||||
|
||||
```sql
|
||||
CREATE TABLE UserHistory (
|
||||
id BIGINT AUTO_INCREMENT PRIMARY KEY,
|
||||
userId BIGINT,
|
||||
email VARCHAR(255),
|
||||
name VARCHAR(255),
|
||||
validFrom TIMESTAMP DEFAULT NOW(),
|
||||
validTo TIMESTAMP NULL,
|
||||
changeType VARCHAR(20) -- 'INSERT', 'UPDATE', 'DELETE'
|
||||
);
|
||||
```
|
||||
|
||||
A trigger on the User table inserts a row into UserHistory on each change.
|
||||
|
||||
### Pattern 3: Event Sourcing
|
||||
|
||||
```sql
|
||||
CREATE TABLE OrderEvent (
|
||||
id BIGINT AUTO_INCREMENT PRIMARY KEY,
|
||||
orderId BIGINT,
|
||||
eventType VARCHAR(50), -- 'CREATED', 'ITEM_ADDED', 'SHIPPED'
|
||||
eventData JSON,
|
||||
occurredAt TIMESTAMP DEFAULT NOW()
|
||||
);
|
||||
```
|
||||
|
||||
Reconstruct state by replaying events.
|
||||
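A sketch of the replay query (the application folds these events, in order, into the current order state):

```sql
-- All events for order 42 in occurrence order
SELECT eventType, eventData
FROM OrderEvent
WHERE orderId = 42
ORDER BY occurredAt, id;
```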
|
||||
**Trade-offs:**
|
||||
**Pros:** Complete audit, time travel
|
||||
**Cons:** Query complexity, storage
|
||||
|
||||
---
|
||||
|
||||
## 8. Schema Evolution
|
||||
|
||||
### Strategy 1: Backward-Compatible
|
||||
|
||||
Safe changes (no app changes):
|
||||
- Add nullable column
|
||||
- Add table (not referenced)
|
||||
- Add index
|
||||
- Widen column (VARCHAR(100) → VARCHAR(255))
|
||||
|
||||
```sql
|
||||
ALTER TABLE User ADD COLUMN phoneNumber VARCHAR(20) NULL;
|
||||
```
|
||||
|
||||
### Strategy 2: Expand-Contract
|
||||
|
||||
For breaking changes:
|
||||
|
||||
1. **Expand:** Add new alongside old
|
||||
```sql
|
||||
ALTER TABLE User ADD COLUMN newEmail VARCHAR(255) NULL;
|
||||
```
|
||||
|
||||
2. **Migrate:** Copy data
|
||||
```sql
|
||||
UPDATE User SET newEmail = email WHERE newEmail IS NULL;
|
||||
```
|
||||
|
||||
3. **Contract:** Remove old
|
||||
```sql
|
||||
ALTER TABLE User DROP COLUMN email;
|
||||
ALTER TABLE User RENAME COLUMN newEmail TO email;
|
||||
```
|
||||
|
||||
### Strategy 3: Versioned Schemas (NoSQL)
|
||||
|
||||
```json
|
||||
{"_schemaVersion": "2.0", "email": "alice@example.com"}
|
||||
```
|
||||
|
||||
App handles multiple versions.
|
||||
|
||||
### Strategy 4: Blue-Green
|
||||
|
||||
Run old and new schemas simultaneously, dual-write, migrate, switch reads, remove old.
|
||||
|
||||
**Best for:** Major redesigns, zero downtime
|
||||
|
||||
---
|
||||
|
||||
## 9. Multi-Tenancy
|
||||
|
||||
### Pattern 1: Separate Databases
|
||||
|
||||
```
|
||||
tenant1_db, tenant2_db, tenant3_db
|
||||
```
|
||||
|
||||
**Pros:** Strong isolation
|
||||
**Cons:** High overhead
|
||||
|
||||
### Pattern 2: Separate Schemas
|
||||
|
||||
```sql
|
||||
CREATE SCHEMA tenant1;
|
||||
CREATE TABLE tenant1.User (...);
|
||||
```
|
||||
|
||||
**Pros:** Logical isolation without per-database overhead
**Cons:** Schema management still scales with tenant count
|
||||
|
||||
### Pattern 3: Shared Schema + Tenant ID
|
||||
|
||||
```sql
|
||||
CREATE TABLE User (
|
||||
id BIGINT PRIMARY KEY,
|
||||
tenantId BIGINT NOT NULL,
|
||||
email VARCHAR(255),
|
||||
UNIQUE (tenantId, email)
|
||||
);
|
||||
```
|
||||
|
||||
**Pros:** Most efficient
|
||||
**Cons:** Must filter ALL queries by tenantId
|
||||
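One way to make that filter enforceable rather than merely conventional is row-level security; a PostgreSQL sketch (the `app.tenant_id` setting name is an assumption for illustration):

```sql
-- Every query on User is silently scoped to the connection's tenant
-- ("User" is reserved in PostgreSQL; quote it in real DDL)
ALTER TABLE User ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON User
  USING (tenantId = current_setting('app.tenant_id')::BIGINT);

-- Each connection declares its tenant once:
-- SET app.tenant_id = '42';
```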
|
||||
**Recommendation:** Pattern 3 for SaaS, Pattern 1 for regulated industries
|
||||
|
||||
---
|
||||
|
||||
## 10. Performance
|
||||
|
||||
### Indexes
|
||||
|
||||
**Covering index** (includes all query columns):
|
||||
```sql
|
||||
CREATE INDEX idx_user_status ON User(status) INCLUDE (name, email);
|
||||
```
|
||||
|
||||
**Composite index** (order matters):
|
||||
```sql
|
||||
-- Good for: WHERE tenantId = X AND createdAt > Y
|
||||
CREATE INDEX idx_tenant_date ON Order(tenantId, createdAt);
|
||||
```
|
||||
|
||||
**Partial index** (reduce size):
|
||||
```sql
|
||||
CREATE INDEX idx_active ON User(email) WHERE deletedAt IS NULL;
|
||||
```
|
||||
|
||||
### Partitioning
|
||||
|
||||
**Horizontal (sharding):**
|
||||
```sql
|
||||
CREATE TABLE Order (...) PARTITION BY RANGE (createdAt);
|
||||
CREATE TABLE Order_2024_Q1 PARTITION OF Order
|
||||
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
|
||||
```
|
||||
|
||||
**Vertical:** Split hot/cold data into separate tables.
|
||||
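A sketch of a vertical split (names assumed; cold, rarely-read columns move to a 1:1 side table so hot scans stay narrow):

```sql
-- Hot columns stay in User; cold profile data moves out
CREATE TABLE UserProfileDetail (
  userId   BIGINT PRIMARY KEY REFERENCES User(id),
  bio      TEXT,
  settings JSON
);
```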
|
||||
---
|
||||
|
||||
## 11. Common Advanced Patterns
|
||||
|
||||
### Soft Deletes
|
||||
|
||||
```sql
|
||||
ALTER TABLE User ADD COLUMN deletedAt TIMESTAMP NULL;
|
||||
-- Query: WHERE deletedAt IS NULL
|
||||
```
|
||||
|
||||
### Audit Columns
|
||||
|
||||
```sql
|
||||
createdAt TIMESTAMP DEFAULT NOW()
|
||||
updatedAt TIMESTAMP DEFAULT NOW() ON UPDATE NOW()
|
||||
createdBy BIGINT REFERENCES User(id)
|
||||
updatedBy BIGINT REFERENCES User(id)
|
||||
```
|
||||
|
||||
### State Machines
|
||||
|
||||
```sql
|
||||
CREATE TABLE OrderState (
|
||||
orderId BIGINT REFERENCES Order(id),
|
||||
state VARCHAR(20),
|
||||
transitionedAt TIMESTAMP DEFAULT NOW(),
|
||||
PRIMARY KEY (orderId, transitionedAt)
|
||||
);
|
||||
-- Track: draft → pending → confirmed → shipped → delivered
|
||||
```
|
||||
|
||||
### Idempotency Keys
|
||||
|
||||
```sql
|
||||
CREATE TABLE Request (
|
||||
idempotencyKey UUID PRIMARY KEY,
|
||||
payload JSON,
|
||||
result JSON,
|
||||
processedAt TIMESTAMP
|
||||
);
|
||||
-- Prevents duplicate processing
|
||||
```
|
||||
330
skills/data-schema-knowledge-modeling/resources/template.md
Normal file
@@ -0,0 +1,330 @@
|
||||
# Data Schema & Knowledge Modeling Template
|
||||
|
||||
## Workflow
|
||||
|
||||
Copy this checklist and track your progress:
|
||||
|
||||
```
|
||||
Data Schema & Knowledge Modeling Progress:
|
||||
- [ ] Step 1: Gather domain requirements and scope
|
||||
- [ ] Step 2: Identify entities and attributes systematically
|
||||
- [ ] Step 3: Define relationships and cardinality
|
||||
- [ ] Step 4: Specify constraints and invariants
|
||||
- [ ] Step 5: Validate against use cases and document
|
||||
```
|
||||
|
||||
**Step 1: Gather domain requirements and scope**
|
||||
|
||||
Ask user for domain description, core use cases, existing data sources, scale requirements, and technology stack. Use [Input Questions](#input-questions).
|
||||
|
||||
**Step 2: Identify entities and attributes systematically**
|
||||
|
||||
Extract entities from requirements using [Entity Identification](#entity-identification). Define attributes with types and nullability using [Attribute Guide](#attribute-guide).
|
||||
|
||||
**Step 3: Define relationships and cardinality**
|
||||
|
||||
Map entity connections using [Relationship Mapping](#relationship-mapping). Specify cardinality (1:1, 1:N, M:N) and optionality.
|
||||
|
||||
**Step 4: Specify constraints and invariants**
|
||||
|
||||
Define business rules and constraints using [Constraint Specification](#constraint-specification). Document domain invariants.
|
||||
|
||||
**Step 5: Validate against use cases and document**
|
||||
|
||||
Create `data-schema-knowledge-modeling.md` using [Template](#schema-documentation-template). Verify using [Validation Checklist](#validation-checklist).
|
||||
|
||||
---
|
||||
|
||||
## Input Questions
|
||||
|
||||
**Domain & Scope:**
|
||||
- What domain? (e-commerce, healthcare, social network)
|
||||
- Boundaries? In/out of scope?
|
||||
- Existing schemas to integrate/migrate from?
|
||||
|
||||
**Core Use Cases:**
|
||||
- Primary operations? (CRUD for which entities?)
|
||||
- Required queries/reports?
|
||||
- Access patterns? (read-heavy, write-heavy, mixed)
|
||||
|
||||
**Scale & Performance:**
|
||||
- Data volume? (rows per table, storage)
|
||||
- Growth rate? (daily/monthly)
|
||||
- Performance SLAs?
|
||||
|
||||
**Technology:**
|
||||
- Database? (PostgreSQL, MongoDB, Neo4j, etc.)
|
||||
- Compliance? (GDPR, HIPAA, SOC2)
|
||||
- Evolution needs? (schema versioning, migrations)
|
||||
|
||||
---
|
||||
|
||||
## Entity Identification
|
||||
|
||||
**Step 1: Extract nouns**
|
||||
|
||||
List nouns from requirements = candidate entities.
|
||||
|
||||
**Step 2: Validate**
|
||||
|
||||
For each, check:
|
||||
- [ ] Distinct identity? (can point to "this specific X")
|
||||
- [ ] Independent lifecycle?
|
||||
- [ ] Multiple attributes beyond name?
|
||||
- [ ] Track multiple instances?
|
||||
|
||||
**Keep** if yes to most. **Reject** if just an attribute.
|
||||
|
||||
**Step 3: Entity vs Value Object**
|
||||
|
||||
- **Entity**: Has ID, mutable (User, Order)
|
||||
- **Value Object**: No ID, immutable (Address, Money)
|
||||
|
||||
**Step 4: Document**
|
||||
|
||||
```markdown
|
||||
### Entity: [Name]
|
||||
**Purpose:** [What it represents]
|
||||
**Examples:** [2-3 concrete cases]
|
||||
**Lifecycle:** [Creation → deletion]
|
||||
**Invariants:** [Rules that must hold]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Attribute Guide
|
||||
|
||||
**Template:**
|
||||
```
|
||||
attributeName: DataType [NULL|NOT NULL] [DEFAULT value]
|
||||
- Description: [What it represents]
|
||||
- Validation: [Constraints]
|
||||
- Examples: [Sample values]
|
||||
```
|
||||
|
||||
**Standard attributes:**
|
||||
- `id`: Primary key (UUID/BIGINT)
|
||||
- `createdAt`: TIMESTAMP NOT NULL
|
||||
- `updatedAt`: TIMESTAMP NOT NULL
|
||||
- `deletedAt`: TIMESTAMP NULL (soft deletes)
|
||||
|
||||
**Data types:**
|
||||
|
||||
| Data | SQL | NoSQL | Notes |
|
||||
|------|-----|-------|-------|
|
||||
| Short text | VARCHAR(N) | String | Specify max |
|
||||
| Long text | TEXT | String | No limit |
|
||||
| Integer | INT/BIGINT | Number | Choose size |
|
||||
| Decimal | DECIMAL(P,S) | Number | Fixed precision |
|
||||
| Money | DECIMAL(19,4) | {amount,currency} | Never FLOAT |
|
||||
| Boolean | BOOLEAN | Boolean | Not nullable |
|
||||
| Date/Time | TIMESTAMP | ISODate | With timezone |
|
||||
| UUID | UUID/CHAR(36) | String | Distributed IDs |
|
||||
| JSON | JSON/JSONB | Object | Flexible |
|
||||
| Enum | ENUM/VARCHAR | String | Fixed values |
|
||||
|
||||
**Nullability:**
|
||||
- NOT NULL if required
|
||||
- NULL if optional/unknown at creation
|
||||
- Avoid NULL for booleans
|
||||
|
||||
---
|
||||
|
||||
## Relationship Mapping
|
||||
|
||||
**Cardinality:**
|
||||
|
||||
**1:1** - User has one Profile
|
||||
- SQL: `Profile.userId UNIQUE NOT NULL REFERENCES User(id)`
|
||||
|
||||
**1:N** - User has many Orders
|
||||
- SQL: `Order.userId NOT NULL REFERENCES User(id)`
|
||||
|
||||
**M:N** - Order contains Products
|
||||
- Junction table:
|
||||
```sql
|
||||
OrderItem (
  orderId BIGINT REFERENCES Order(id),
  productId BIGINT REFERENCES Product(id),
  quantity INT NOT NULL,
  PRIMARY KEY (orderId, productId)
)
|
||||
```
|
||||
|
||||
**Optionality:**
|
||||
- Required: NOT NULL
|
||||
- Optional: NULL
|
||||
|
||||
**Naming:**
|
||||
Use verbs: User **owns** Order, Product **belongs to** Category
|
||||
|
||||
---
|
||||
|
||||
## Constraint Specification
|
||||
|
||||
**Primary Keys:**
|
||||
```sql
|
||||
id BIGINT PRIMARY KEY AUTO_INCREMENT
|
||||
-- or --
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid()
|
||||
```
|
||||
|
||||
**Unique:**
|
||||
```sql
|
||||
email VARCHAR(255) UNIQUE NOT NULL
|
||||
UNIQUE (userId, productId) -- composite
|
||||
```
|
||||
|
||||
**Foreign Keys:**
|
||||
```sql
|
||||
userId BIGINT NOT NULL REFERENCES User(id) ON DELETE CASCADE
|
||||
-- Options: CASCADE, SET NULL, RESTRICT
|
||||
```
|
||||
|
||||
**Check Constraints:**
|
||||
```sql
|
||||
price DECIMAL(10,2) CHECK (price >= 0)
|
||||
status VARCHAR(20) CHECK (status IN ('draft','pending','completed'))
|
||||
```
|
||||
|
||||
**Domain Invariants:**
|
||||
|
||||
Document business rules:
|
||||
```markdown
|
||||
### Invariant: Order total = sum of items
|
||||
Order.total = SUM(OrderItem.quantity * OrderItem.price)
|
||||
|
||||
### Invariant: Unique email
|
||||
No duplicate emails (case-insensitive)
|
||||
```
|
||||
|
||||
Enforce via: DB constraints (preferred), application logic, or triggers.
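
An invariant that spans multiple rows (like the order-total rule above) can't be expressed as a simple CHECK, so application-level enforcement is common. A minimal Python sketch (hypothetical function and data shapes):

```python
from decimal import Decimal

def assert_order_total(order_total: Decimal, items: list[tuple[int, Decimal]]) -> None:
    """Enforce: Order.total = SUM(OrderItem.quantity * OrderItem.price)."""
    computed = sum((qty * price for qty, price in items), Decimal("0"))
    if computed != order_total:
        raise ValueError(f"Invariant violated: stored {order_total}, computed {computed}")

# Call before committing any write that touches an order or its items.
assert_order_total(Decimal("25.00"), [(2, Decimal("10.00")), (1, Decimal("5.00"))])
```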

---

## Schema Documentation Template

Create: `data-schema-knowledge-modeling.md`

**Required sections:**

1. **Domain Overview** - Purpose, scope, technology
2. **Use Cases** - Primary operations, query patterns
3. **Entity Definitions** - For each entity:
   - Purpose, examples, lifecycle
   - Attributes table (name, type, null, default, constraints, description)
   - Relationships (cardinality, FK, optionality)
   - Invariants
4. **ERD** - Visual/text diagram showing relationships
5. **Constraints** - DB constraints, domain invariants
6. **Normalization** - Level, denormalization decisions
7. **Implementation** - SQL DDL / JSON Schema / Graph schema as appropriate
8. **Validation** - Check each use case is supported
9. **Open Questions** - Unresolved decisions

**Example entity definition:**

```markdown
### Entity: Order

**Purpose:** Represents customer purchase transaction
**Examples:** Amazon order #123, Shopify order #456
**Lifecycle:** Created on checkout → Updated during fulfillment → Completed on delivery

#### Attributes

| Attribute | Type | Null? | Default | Constraints | Description |
|---|---|---|---|---|---|
| id | BIGINT | NO | auto | PK | Unique identifier |
| userId | BIGINT | NO | - | FK→User | Customer who placed order |
| status | VARCHAR(20) | NO | 'pending' | CHECK IN(...) | Order status |
| total | DECIMAL(10,2) | NO | - | CHECK >= 0 | Order total |

#### Relationships
- **belongs to:** 1:N with User (Order.userId → User.id)
- **contains:** 1:N with OrderItem junction table

#### Invariants
- total = SUM(OrderItem.quantity * OrderItem.price)
- status transitions: pending → confirmed → shipped → delivered
```

---

## Validation Checklist

**Completeness:**
- [ ] All entities identified
- [ ] All attributes defined (types, nullability)
- [ ] All relationships mapped (cardinality)
- [ ] All constraints specified
- [ ] All invariants documented

**Correctness:**
- [ ] Each entity has a distinct purpose
- [ ] No redundant entities
- [ ] Attributes in correct entities
- [ ] Cardinality reflects reality
- [ ] Constraints enforce rules

**Use Case Coverage:**
- [ ] Supports all CRUD operations
- [ ] All queries answerable
- [ ] Indexes planned
- [ ] No missing joins

**Normalization:**
- [ ] No partial dependencies (2NF)
- [ ] No transitive dependencies (3NF)
- [ ] Denormalization documented
- [ ] No update anomalies

**Technical Quality:**
- [ ] Consistent naming
- [ ] Appropriate data types
- [ ] Primary keys defined
- [ ] Foreign keys maintain integrity
- [ ] Soft delete strategy (if needed)
- [ ] Audit fields (if needed)

**Future-Proofing:**
- [ ] Schema extensible
- [ ] Migration path (if applicable)
- [ ] Versioning strategy
- [ ] No technical debt

---

## Common Pitfalls

**1. Modeling implementation, not domain**
- Symptom: Entities like "UserSession", "Cache"
- Fix: Model real-world concepts only

**2. God entities**
- Symptom: User with 50+ attributes
- Fix: Extract to separate entities

**3. Missing junction tables**
- Symptom: M:N relationship modeled with direct FKs
- Fix: Always use a junction table

**4. Nullable FKs without reason**
- Symptom: All relationships optional
- Fix: NOT NULL unless truly optional

**5. Not enforcing invariants**
- Symptom: Rules in docs only
- Fix: CHECK constraints, triggers, app validation

**6. Premature denormalization**
- Symptom: Duplicating data without measurement
- Fix: Normalize first, denormalize after profiling

**7. Wrong data types**
- Symptom: Money as VARCHAR
- Fix: DECIMAL for money, proper types for all

**8. No migration strategy**
- Symptom: Can't change schema
- Fix: Versioning, backward-compatible changes
182
skills/decision-matrix/SKILL.md
Normal file
@@ -0,0 +1,182 @@
---
name: decision-matrix
description: Use when comparing multiple named alternatives across several criteria, need transparent trade-off analysis, making group decisions requiring alignment, choosing between vendors/tools/strategies, stakeholders need to see decision rationale, balancing competing priorities (cost vs quality vs speed), user mentions "which option should we choose", "compare alternatives", "evaluate vendors", "trade-offs", or when decision needs to be defensible and data-driven.
---

# Decision Matrix

## What Is It?

A decision matrix is a structured tool for comparing multiple alternatives against weighted criteria to make transparent, defensible choices. It forces explicit trade-off analysis by scoring each option on each criterion, making subjective factors visible and comparable.

**Quick example:**

| Option | Cost (30%) | Speed (25%) | Quality (45%) | Weighted Score |
|--------|-----------|------------|---------------|----------------|
| Option A | 8 (2.4) | 6 (1.5) | 9 (4.05) | **7.95** ← Winner |
| Option B | 6 (1.8) | 9 (2.25) | 7 (3.15) | 7.20 |
| Option C | 9 (2.7) | 4 (1.0) | 6 (2.7) | 6.40 |

The numbers in parentheses show criterion score × weight. Option A wins despite not being fastest or cheapest because quality matters most (45% weight).
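
The arithmetic is mechanical enough to script. A minimal Python sketch reproducing the table above (same weights and scores; names are illustrative):

```python
weights = {"cost": 0.30, "speed": 0.25, "quality": 0.45}  # must sum to 1.0
scores = {
    "Option A": {"cost": 8, "speed": 6, "quality": 9},
    "Option B": {"cost": 6, "speed": 9, "quality": 7},
    "Option C": {"cost": 9, "speed": 4, "quality": 6},
}
assert abs(sum(weights.values()) - 1.0) < 1e-9  # catch weight typos early

totals = {
    option: sum(weights[c] * s for c, s in per_criterion.items())
    for option, per_criterion in scores.items()
}
for option, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{option}: {total:.2f}")  # Option A: 7.95, Option B: 7.20, Option C: 6.40
```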

## Workflow

Copy this checklist and track your progress:

```
Decision Matrix Progress:
- [ ] Step 1: Frame the decision and list alternatives
- [ ] Step 2: Identify and weight criteria
- [ ] Step 3: Score each alternative on each criterion
- [ ] Step 4: Calculate weighted scores and analyze results
- [ ] Step 5: Validate quality and deliver recommendation
```

**Step 1: Frame the decision and list alternatives**

Ask user for decision context (what are we choosing and why), list of alternatives (specific named options, not generic categories), constraints or dealbreakers (must-have requirements), and stakeholders (who needs to agree). Understanding must-haves helps filter options before scoring. See [Framing Questions](#framing-questions) for clarification prompts.

**Step 2: Identify and weight criteria**

Collaborate with user to identify criteria (what factors matter for this decision), determine weights (which criteria matter most, as percentages summing to 100%), and validate coverage (do criteria capture all important trade-offs). If user is unsure about weighting → Use [resources/template.md](resources/template.md) for weighting techniques. See [Criterion Types](#criterion-types) for common patterns.

**Step 3: Score each alternative on each criterion**

For each option, score on each criterion using a consistent scale (typically 1-10 where 10 = best). Ask user for scores or research objective data (cost, speed metrics) where available. Document assumptions and data sources. For complex scoring → See [resources/methodology.md](resources/methodology.md) for calibration techniques.

**Step 4: Calculate weighted scores and analyze results**

Calculate the weighted score for each option (sum of criterion score × weight). Rank options by total score. Identify close calls (options within 5% of each other). Check for sensitivity (would changing one weight flip the decision). See [Sensitivity Analysis](#sensitivity-analysis) for interpretation guidance.

**Step 5: Validate quality and deliver recommendation**

Self-assess using [resources/evaluators/rubric_decision_matrix.json](resources/evaluators/rubric_decision_matrix.json) (minimum score ≥ 3.5). Present the decision-matrix.md file with a clear recommendation, highlight key trade-offs revealed by the analysis, note sensitivity to assumptions, and suggest next steps (gather more data on close calls, validate with stakeholders).

## Framing Questions

**To clarify the decision:**
- What specific decision are we making? (Choose X from Y alternatives)
- What happens if we don't decide or choose wrong?
- When do we need to decide by?
- Can we choose multiple options or only one?

**To identify alternatives:**
- What are all the named options we're considering?
- Are there other alternatives we're ruling out immediately? Why?
- What's the "do nothing" or status quo option?

**To surface must-haves:**
- Are there absolute dealbreakers? (Budget cap, timeline requirement, compliance need)
- Which constraints are flexible vs rigid?

## Criterion Types

Common categories for criteria (adapt to your decision):

**Financial Criteria:**
- Upfront cost, ongoing cost, ROI, payback period, budget impact
- Typical weight: 20-40% (higher for cost-sensitive decisions)

**Performance Criteria:**
- Speed, quality, reliability, scalability, capacity, throughput
- Typical weight: 30-50% (higher for technical decisions)

**Risk Criteria:**
- Implementation risk, reversibility, vendor lock-in, technical debt, compliance risk
- Typical weight: 10-25% (higher for enterprise/regulated environments)

**Strategic Criteria:**
- Alignment with goals, future flexibility, competitive advantage, market positioning
- Typical weight: 15-30% (higher for long-term decisions)

**Operational Criteria:**
- Ease of use, maintenance burden, training required, integration complexity
- Typical weight: 10-20% (higher for internal tools)

**Stakeholder Criteria:**
- Team preference, user satisfaction, executive alignment, customer impact
- Typical weight: 5-15% (higher for change management contexts)

## Weighting Approaches

**Method 1: Direct Allocation (simplest)**
Stakeholders assign percentages totaling 100%. Quick but can be arbitrary.

**Method 2: Pairwise Comparison (more rigorous)**
Compare each criterion pair: "Is cost more important than speed?" Build a ranking, then assign weights.

**Method 3: Must-Have vs Nice-to-Have (filters first)**
Separate absolute requirements (pass/fail) from weighted criteria. Only evaluate options that pass the must-haves.

**Method 4: Stakeholder Averaging (group decisions)**
Each stakeholder assigns weights independently, then average. Reveals divergence in priorities.

See [resources/methodology.md](resources/methodology.md) for detailed facilitation techniques.

## Sensitivity Analysis

After calculating scores, check robustness:

**1. Close calls:** Options within 5-10% of winner → Need more data or a second opinion
**2. Dominant criteria:** One criterion driving the entire decision → Is its weight too high?
**3. Weight sensitivity:** Would swapping two criterion weights flip the winner? → Decision is fragile
**4. Score sensitivity:** Would adjusting one score by ±1 point flip the winner? → Decision is sensitive to that data point

**Red flags:**
- Winner changes with small weight adjustments → Need stakeholder alignment on priorities
- One option wins every criterion → Matrix is overkill, choice is obvious
- Scores are mostly guesses → Gather more data before deciding
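
Weight sensitivity (check 3 above) can be tested mechanically. A sketch reusing the `weights`/`scores` shapes from the earlier example:

```python
from itertools import combinations

def winner(weights: dict, scores: dict) -> str:
    totals = {o: sum(weights[c] * s for c, s in cs.items()) for o, cs in scores.items()}
    return max(totals, key=totals.get)

def fragile_weight_pairs(weights: dict, scores: dict) -> list:
    """Criterion pairs whose weight swap changes the winner (non-empty list = fragile)."""
    base = winner(weights, scores)
    flips = []
    for a, b in combinations(weights, 2):
        swapped = {**weights, a: weights[b], b: weights[a]}
        if winner(swapped, scores) != base:
            flips.append((a, b))
    return flips
```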

## Common Patterns

**Technology Selection:**
- Criteria: Cost, performance, ecosystem maturity, team familiarity, vendor support
- Weight: Performance and maturity typically 50%+

**Vendor Evaluation:**
- Criteria: Price, features, integration, support, reputation, contract terms
- Weight: Features and integration typically 40-50%

**Strategic Choices:**
- Criteria: Market opportunity, resource requirements, risk, alignment, timing
- Weight: Market opportunity and alignment typically 50%+

**Hiring Decisions:**
- Criteria: Experience, culture fit, growth potential, compensation expectations, availability
- Weight: Experience and culture fit typically 50%+

**Feature Prioritization:**
- Criteria: User impact, effort, strategic value, risk, dependencies
- Weight: User impact and strategic value typically 50%+

## When NOT to Use This Skill

**Skip decision matrix if:**
- Only one viable option (no real alternatives to compare)
- Decision is binary yes/no with single criterion (use simpler analysis)
- Options differ on only one dimension (just compare that dimension)
- Decision is urgent and stakes are low (analysis overhead not worth it)
- Criteria are impossible to define objectively (purely emotional/aesthetic choice)
- You already know the answer (using matrix to justify pre-made decision is waste)

**Use instead:**
- Single criterion → Simple ranking or threshold check
- Binary decision → Pro/con list or expected value calculation
- Highly uncertain → Scenario planning or decision tree
- Purely subjective → Gut check or user preference vote

## Quick Reference

**Process:**
1. Frame decision → List alternatives
2. Identify criteria → Assign weights (sum to 100%)
3. Score each option on each criterion (1-10 scale)
4. Calculate weighted scores → Rank options
5. Check sensitivity → Deliver recommendation

**Resources:**
- [resources/template.md](resources/template.md) - Structured matrix format and weighting techniques
- [resources/methodology.md](resources/methodology.md) - Advanced techniques (group facilitation, calibration, sensitivity analysis)
- [resources/evaluators/rubric_decision_matrix.json](resources/evaluators/rubric_decision_matrix.json) - Quality checklist before delivering

**Deliverable:** `decision-matrix.md` file with table, rationale, and recommendation
218
skills/decision-matrix/resources/evaluators/rubric_decision_matrix.json
Normal file
@@ -0,0 +1,218 @@
{
  "criteria": [
    {
      "name": "Decision Framing & Context",
      "description": "Is the decision clearly defined with all viable alternatives identified?",
      "scoring": {
        "1": "Decision is vague or ill-defined. Alternatives are incomplete or include non-comparable options. No stakeholder identification.",
        "3": "Decision is stated but lacks specificity. Most alternatives listed but may be missing key options. Stakeholders mentioned generally.",
        "5": "Exemplary framing. Decision is specific and unambiguous. All viable alternatives identified (including 'do nothing' if relevant). Must-have requirements separated from criteria. Stakeholders clearly identified with their priorities noted."
      }
    },
    {
      "name": "Criteria Quality & Coverage",
      "description": "Are criteria well-chosen, measurable, independent, and comprehensive?",
      "scoring": {
        "1": "Criteria are vague, redundant, or missing key factors. Too many (>10) or too few (<3). No clear definitions.",
        "3": "Criteria cover main factors but may have some redundancy or gaps. 4-8 criteria with basic definitions. Some differentiation between options.",
        "5": "Exemplary criteria selection. 4-7 criteria that are measurable, independent, relevant, and differentiate between options. Each criterion has clear definition and measurement approach. No redundancy. Captures all important trade-offs."
      }
    },
    {
      "name": "Weighting Appropriateness",
      "description": "Do criterion weights reflect true priorities and sum to 100%?",
      "scoring": {
        "1": "Weights don't sum to 100%, are arbitrary, or clearly misaligned with stated priorities. No rationale provided.",
        "3": "Weights sum to 100% and are reasonable but may lack explicit justification. Some alignment with priorities.",
        "5": "Exemplary weighting. Weights sum to 100%, clearly reflect stakeholder priorities, and have documented rationale (pairwise comparison, swing weighting, or stakeholder averaging). Weight distribution makes sense for decision type."
      }
    },
    {
      "name": "Scoring Rigor & Data Quality",
      "description": "Are scores based on data or defensible judgments with documented sources?",
      "scoring": {
        "1": "Scores appear to be wild guesses with no justification. No data sources. Inconsistent scale usage.",
        "3": "Mix of data-driven and subjective scores. Some sources documented. Mostly consistent 1-10 scale. Some assumptions noted.",
        "5": "Exemplary scoring rigor. Objective criteria backed by real data (quotes, benchmarks, measurements). Subjective criteria have clear anchors/definitions. All assumptions and data sources documented. Consistent 1-10 scale usage."
      }
    },
    {
      "name": "Calculation Accuracy",
      "description": "Are weighted scores calculated correctly and presented clearly?",
      "scoring": {
        "1": "Calculation errors present. Weights don't match stated percentages. Formula mistakes. Unclear presentation.",
        "3": "Calculations are mostly correct with minor issues. Weighted scores shown but presentation could be clearer.",
        "5": "Perfect calculations. Weighted scores = Σ(score × weight) for each option. Table clearly shows raw scores, weights (as percentages), weighted scores, and totals. Ranking is correct."
      }
    },
    {
      "name": "Sensitivity Analysis",
      "description": "Is decision robustness assessed (close calls, weight sensitivity, score uncertainty)?",
      "scoring": {
        "1": "No sensitivity analysis. Winner declared without checking if decision is robust.",
        "3": "Basic sensitivity noted (e.g., 'close call' mentioned) but not systematically analyzed.",
        "5": "Thorough sensitivity analysis. Identifies close calls (<10% margin). Tests weight sensitivity (would swapping weights flip decision?). Notes which scores are most uncertain. Assesses decision robustness and flags fragile decisions."
      }
    },
    {
      "name": "Recommendation Quality",
      "description": "Is recommendation clear with rationale, trade-offs, and confidence level?",
      "scoring": {
        "1": "No clear recommendation or just states winner without rationale. No trade-off discussion.",
        "3": "Recommendation stated with basic rationale. Some trade-offs mentioned. Confidence level implied but not stated.",
        "5": "Exemplary recommendation. Clear winner with score. Explains WHY winner prevails (which criteria drive decision). Acknowledges trade-offs (where winner scores lower). States confidence level based on margin and sensitivity. Suggests next steps."
      }
    },
    {
      "name": "Assumption & Limitation Documentation",
      "description": "Are key assumptions, uncertainties, and limitations explicitly stated?",
      "scoring": {
        "1": "No assumptions documented. Presents results as facts without acknowledging uncertainty or limitations.",
        "3": "Some assumptions mentioned. Acknowledges uncertainty exists but not comprehensive.",
        "5": "All key assumptions explicitly documented. Uncertainties flagged (which scores are guesses vs data). Limitations noted (e.g., 'cost estimates are preliminary', 'performance benchmarks unavailable'). Reader understands confidence bounds."
      }
    },
    {
      "name": "Stakeholder Alignment",
      "description": "For group decisions, are different stakeholder priorities surfaced and addressed?",
      "scoring": {
        "1": "Single set of weights/scores presented as if universal. No acknowledgment of stakeholder differences.",
        "3": "Stakeholder differences mentioned but not systematically addressed. Single averaged view presented.",
        "5": "Stakeholder differences explicitly surfaced. If priorities diverge, shows impact (e.g., 'Under engineering priorities, A wins; under sales priorities, B wins'). Facilitates alignment or escalates decision appropriately."
      }
    },
    {
      "name": "Communication & Presentation",
      "description": "Is matrix table clear, readable, and appropriately formatted?",
      "scoring": {
        "1": "Matrix is confusing, poorly formatted, or missing key elements (weights, totals). Hard to interpret.",
        "3": "Matrix is readable with minor formatting issues. Weights and totals shown but could be clearer.",
        "5": "Exemplary presentation. Table is clean and scannable. Column headers show criteria names AND weights (%). Weighted scores shown (not just raw scores). Winner visually highlighted. Assumptions and next steps clearly stated."
      }
    }
  ],
  "minimum_score": 3.5,
  "guidance_by_decision_type": {
    "Technology Selection (tools, platforms, vendors)": {
      "target_score": 4.0,
      "focus_criteria": [
        "Criteria Quality & Coverage",
        "Scoring Rigor & Data Quality",
        "Sensitivity Analysis"
      ],
      "common_pitfalls": [
        "Missing 'Total Cost of Ownership' as criterion (not just upfront cost)",
        "Ignoring integration complexity or vendor lock-in risk",
        "Not scoring 'do nothing / keep current solution' as baseline"
      ]
    },
    "Strategic Choices (market entry, partnerships, positioning)": {
      "target_score": 4.0,
      "focus_criteria": [
        "Decision Framing & Context",
        "Weighting Appropriateness",
        "Stakeholder Alignment"
      ],
      "common_pitfalls": [
        "Weighting short-term metrics too heavily over strategic fit",
        "Not including reversibility / optionality as criterion",
        "Ignoring stakeholder misalignment on priorities"
      ]
    },
    "Vendor / Supplier Evaluation": {
      "target_score": 3.8,
      "focus_criteria": [
        "Criteria Quality & Coverage",
        "Scoring Rigor & Data Quality",
        "Assumption & Limitation Documentation"
      ],
      "common_pitfalls": [
        "Relying on vendor-provided data without validation",
        "Not including 'vendor financial health' or 'support SLA' criteria",
        "Missing contract terms (pricing lock, exit clauses) as criterion"
      ]
    },
    "Feature Prioritization": {
      "target_score": 3.5,
      "focus_criteria": [
        "Weighting Appropriateness",
        "Scoring Rigor & Data Quality",
        "Sensitivity Analysis"
      ],
      "common_pitfalls": [
        "Not including 'effort' or 'technical risk' as criteria",
        "Scoring 'user impact' without user research data",
        "Ignoring dependencies between features"
      ]
    },
    "Hiring Decisions": {
      "target_score": 3.5,
      "focus_criteria": [
        "Criteria Quality & Coverage",
        "Scoring Rigor & Data Quality",
        "Assumption & Limitation Documentation"
      ],
      "common_pitfalls": [
        "Criteria too vague (e.g., 'culture fit' without definition)",
        "Interviewer bias in scores (need calibration)",
        "Not documenting what good vs poor looks like for each criterion"
      ]
    }
  },
  "guidance_by_complexity": {
    "Simple (3-4 alternatives, clear criteria, aligned stakeholders)": {
      "target_score": 3.5,
      "sufficient_rigor": "Basic weighting (direct allocation), data-driven scores where possible, simple sensitivity check (margin analysis)"
    },
    "Moderate (5-7 alternatives, some subjectivity, minor disagreement)": {
      "target_score": 3.8,
      "sufficient_rigor": "Structured weighting (rank-order or pairwise), documented scoring rationale, sensitivity analysis on close calls"
    },
    "Complex (8+ alternatives, high subjectivity, stakeholder conflict)": {
      "target_score": 4.2,
      "sufficient_rigor": "Advanced weighting (AHP, swing), score calibration/normalization, Monte Carlo or scenario sensitivity, stakeholder convergence process (Delphi, NGT)"
    }
  },
  "common_failure_modes": {
    "1. Post-Rationalization": {
      "symptom": "Weights or scores appear engineered to justify pre-made decision",
      "detection": "Oddly specific weights (37%), generous scores for preferred option, stakeholders admit 'we already know the answer'",
      "prevention": "Assign weights BEFORE scoring alternatives. Use blind facilitation. Ask: 'If matrix contradicts gut, do we trust it?'"
    },
    "2. Garbage In, Garbage Out": {
      "symptom": "All scores are guesses with no data backing",
      "detection": "Cannot answer 'where did this score come from?', scores assigned in <5 min, all round numbers (5, 7, 8)",
      "prevention": "Require data sources for objective criteria. Define scoring anchors for subjective criteria. Flag uncertainties."
    },
    "3. Analysis Paralysis": {
      "symptom": "Endless refinement, never deciding",
      "detection": ">10 criteria, winner changes 3+ times, 'just one more round' requests",
      "prevention": "Set decision deadline. Cap criteria at 5-7. Use satisficing rule: 'Any option >7.0 is acceptable.'"
    },
    "4. Criterion Soup": {
      "symptom": "Overlapping, redundant, or conflicting criteria",
      "detection": "Two criteria always score the same, scorer confusion ('how is this different?')",
      "prevention": "Independence test: Can option score high on A but low on B? If no, merge them. Write clear definitions."
    },
    "5. Ignoring Sensitivity": {
      "symptom": "Winner declared without robustness check",
      "detection": "No mention of margin, close calls, or what would flip decision",
      "prevention": "Always report margin. Test: 'If we swapped top 2 weights, does winner change?' Flag fragile decisions."
    },
    "6. Stakeholder Misalignment": {
      "symptom": "Different stakeholders have different priorities but single matrix presented",
      "detection": "Engineering wants A, sales wants B, but matrix 'proves' one is right",
      "prevention": "Surface weight differences. Show 'under X priorities, A wins; under Y priorities, B wins.' Escalate if needed."
    },
    "7. Missing 'Do Nothing'": {
      "symptom": "Only evaluating new alternatives, forgetting status quo is an option",
      "detection": "All alternatives are new changes, no baseline comparison",
      "prevention": "Always include current state / do nothing as an option to evaluate if change is worth it."
    },
    "8. False Precision": {
      "symptom": "Scores to 2 decimals when underlying data is rough guess",
      "detection": "Weighted total: 7.342 but scores are subjective estimates",
      "prevention": "Match precision to confidence. Rough guesses → round to 0.5. Data-driven → decimals OK."
    }
  }
}
398
skills/decision-matrix/resources/methodology.md
Normal file
@@ -0,0 +1,398 @@
# Decision Matrix: Advanced Methodology

## Workflow

Copy this checklist for complex decision scenarios:

```
Advanced Decision Matrix Progress:
- [ ] Step 1: Diagnose decision complexity
- [ ] Step 2: Apply advanced weighting techniques
- [ ] Step 3: Calibrate and normalize scores
- [ ] Step 4: Perform rigorous sensitivity analysis
- [ ] Step 5: Facilitate group convergence
```

**Step 1: Diagnose decision complexity** - Identify complexity factors (stakeholder disagreement, high uncertainty, strategic importance). See [1. Decision Complexity Assessment](#1-decision-complexity-assessment).

**Step 2: Apply advanced weighting techniques** - Use AHP or other rigorous methods for contentious decisions. See [2. Advanced Weighting Methods](#2-advanced-weighting-methods).

**Step 3: Calibrate and normalize scores** - Handle different scoring approaches and normalize across scorers. See [3. Score Calibration & Normalization](#3-score-calibration--normalization).

**Step 4: Perform rigorous sensitivity analysis** - Test decision robustness with Monte Carlo or scenario analysis. See [4. Advanced Sensitivity Analysis](#4-advanced-sensitivity-analysis).

**Step 5: Facilitate group convergence** - Use the Delphi method or consensus-building techniques. See [5. Group Decision Facilitation](#5-group-decision-facilitation).

---

## 1. Decision Complexity Assessment

### Complexity Indicators

**Low Complexity** (use basic template):
- Clear stakeholder alignment on priorities
- Objective criteria with available data
- Low stakes (reversible decision)
- 3-5 alternatives

**Medium Complexity** (use enhanced techniques):
- Moderate stakeholder disagreement
- Mix of objective and subjective criteria
- Moderate stakes (partially reversible)
- 5-8 alternatives

**High Complexity** (use full methodology):
- Significant stakeholder disagreement on priorities
- Mostly subjective criteria or high uncertainty
- High stakes (irreversible or strategic decision)
- >8 alternatives or multi-phase decision
- Regulatory or compliance implications

### Complexity Scoring

| Factor | Low (1) | Medium (2) | High (3) |
|--------|---------|------------|----------|
| **Stakeholder alignment** | Aligned priorities | Some disagreement | Conflicting priorities |
| **Criteria objectivity** | Mostly data-driven | Mix of data & judgment | Mostly subjective |
| **Decision stakes** | Reversible, low cost | Partially reversible | Irreversible, strategic |
| **Uncertainty level** | Low uncertainty | Moderate uncertainty | High uncertainty |
| **Number of alternatives** | 3-4 options | 5-7 options | 8+ options |

**Complexity Score = Sum of factors**
- **5-7 points:** Use basic template
- **8-11 points:** Use enhanced techniques (sections 2-3)
- **12-15 points:** Use full methodology (all sections)

---

## 2. Advanced Weighting Methods

### Analytic Hierarchy Process (AHP)

**When to use:** High-stakes decisions with contentious priorities, need rigorous justification

**Process:**

1. **Create pairwise comparison matrix:** For each pair, rate 1-9 (1=equal, 3=slightly more important, 5=moderately, 7=strongly, 9=extremely)
2. **Calculate weights:** Normalize columns, average rows
3. **Check consistency:** CR < 0.10 acceptable (use online AHP calculator: bpmsg.com/ahp/ahp-calc.php)

**Example:** Comparing Cost, Performance, Risk, Ease pairwise yields weights: Performance 55%, Risk 20%, Cost 15%, Ease 10%
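
A sketch of step 2 (column-normalize, then row-average) over an assumed 4×4 pairwise matrix; note the full consistency ratio in step 3 requires the principal eigenvalue, which the linked calculator handles:

```python
# Rows/cols: Performance, Risk, Cost, Ease. Entry [i][j] = importance of i over j
# on the 1-9 scale; reciprocals below the diagonal. Matrix values are assumptions.
M = [
    [1,   4,   5,   7],
    [1/4, 1,   2,   3],
    [1/5, 1/2, 1,   2],
    [1/7, 1/3, 1/2, 1],
]
n = len(M)
col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
normalized = [[M[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
weights = [sum(row) / n for row in normalized]
print([round(w, 2) for w in weights])  # ≈ [0.61, 0.20, 0.12, 0.07]
```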

**Advantage:** Rigorous, forces logical consistency in pairwise judgments.

### Swing Weighting

**When to use:** Need to justify weights based on value difference, not just importance

**Process:**

1. **Baseline:** Imagine all criteria at their worst level
2. **Swing:** For each criterion, ask "What value does moving from worst to best create?"
3. **Rank swings:** Which swing creates the most value?
4. **Assign points:** Give the highest swing 100 points, others relative to it
5. **Convert to weights:** Normalize points to percentages

**Example:**

| Criterion | Worst → Best Scenario | Value of Swing | Points | Weight |
|-----------|----------------------|----------------|--------|--------|
| Performance | 50ms → 5ms response | Huge value gain | 100 | 45% |
| Cost | $100K → $50K | Moderate value | 60 | 27% |
| Risk | High → Low risk | Significant value | 50 | 23% |
| Ease | Hard → Easy to use | Minor value | 10 | 5% |

**Total points:** 220 → **Weights:** 100/220=45%, 60/220=27%, 50/220=23%, 10/220=5%

**Advantage:** Focuses on marginal value, not abstract importance. Reveals if criteria with wide option variance should be weighted higher.

### Multi-Voting (Group Weighting)

**When to use:** Group of 5-15 stakeholders needs to converge on weights

**Process:**

1. **Round 1 - Individual allocation:** Each person assigns 100 points across criteria
2. **Reveal distribution:** Show average and variance for each criterion
3. **Discuss outliers:** Why did some assign 40% to Cost while others assigned 10%?
4. **Round 2 - Revised allocation:** Re-allocate with new information
5. **Converge:** Repeat until variance is acceptable or use the average

**Example:**

| Criterion | Round 1 Avg | Round 1 Variance | Round 2 Avg | Round 2 Variance |
|-----------|-------------|------------------|-------------|------------------|
| Cost | 25% | High (±15%) | 30% | Low (±5%) |
| Performance | 40% | Medium (±10%) | 38% | Low (±4%) |
| Risk | 20% | Low (±5%) | 20% | Low (±3%) |
| Ease | 15% | High (±12%) | 12% | Low (±4%) |

**Convergence achieved** when variance <±5% for all criteria.

---

## 3. Score Calibration & Normalization

### Handling Different Scorer Tendencies

**Problem:** Some scorers are "hard graders" (6-7 range), others are "easy graders" (8-9 range). This skews results.

**Solution: Z-score normalization**

**Step 1: Calculate each scorer's mean and standard deviation**

Scorer A: Gave scores [8, 9, 7, 8] → Mean=8, SD=0.8
Scorer B: Gave scores [5, 6, 4, 6] → Mean=5.25, SD=0.8

**Step 2: Normalize each score**

Z-score = (Raw Score - Scorer Mean) / Scorer SD

**Step 3: Re-scale to 1-10**

Normalized Score = 5.5 + (Z-score × 1.5)

**Result:** Scorers are calibrated to the same scale, eliminating grading bias.
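
A sketch of the three steps, assuming each scorer rated the same four options:

```python
from statistics import mean, pstdev

def recalibrate(raw: list[float]) -> list[float]:
    """Map one scorer's raw 1-10 scores onto a shared scale via z-scores."""
    mu, sd = mean(raw), pstdev(raw)
    if sd == 0:                      # scorer gave identical scores everywhere
        return [5.5] * len(raw)
    return [round(5.5 + ((s - mu) / sd) * 1.5, 2) for s in raw]

print(recalibrate([8, 9, 7, 8]))   # easy grader: [5.5, 7.62, 3.38, 5.5]
print(recalibrate([5, 6, 4, 6]))   # hard grader: [5.05, 6.86, 3.24, 6.86]
```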

### Dealing with Missing Data

**Scenario:** Some alternatives can't be scored on all criteria (e.g., vendor A won't share cost until later).

**Approach 1: Conditional matrix**

Score available criteria only, note which are missing. Once data arrives, re-run the matrix.

**Approach 2: Pessimistic/Optimistic bounds**

Assign worst-case and best-case scores for missing data. Run the matrix twice:
- Pessimistic scenario: Missing data gets a low score (e.g., 3)
- Optimistic scenario: Missing data gets a high score (e.g., 8)

If the same option wins both scenarios → Decision is robust. If different winners → Missing data is decision-critical, must obtain before deciding.

### Non-Linear Scoring Curves

**Problem:** Not all criteria are linear. E.g., the cost difference between $10K and $20K matters more than $110K vs $120K.

**Solution: Apply utility curves**

**Diminishing returns curve** (Cost, Time):
- Score = 10 × (1 - e^(-k × Cost Improvement))
- k = sensitivity parameter (higher k = faster diminishing returns)

**Threshold curve** (must meet a requirement):
- Score = 0 (or a steep penalty) if the requirement is not met
- Score scales linearly 1-10 once the requirement is met

**Example:** Load time criterion with a 2-second maximum (lower is better):
- Option A: 1.5s → Score = 10 (meets the requirement comfortably)
- Option B: 3s → Score = 5 (misses it; linear penalty)
- Option C: 5s → Score = 1 (misses it badly)
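
A sketch of both curves; the penalty slope in the threshold function is an assumption chosen to reproduce the example scores:

```python
import math

def diminishing_returns(improvement: float, k: float = 0.5) -> float:
    """Score 0-10 with diminishing returns (cost or time savings)."""
    return 10 * (1 - math.exp(-k * improvement))

def load_time_score(seconds: float, max_allowed: float = 2.0) -> float:
    """Full marks within the requirement; linear penalty (slope 5/sec) beyond it."""
    if seconds <= max_allowed:
        return 10.0
    return max(1.0, 10.0 - (seconds - max_allowed) * 5.0)

for t in (1.5, 3.0, 5.0):
    print(f"{t}s -> {load_time_score(t)}")   # 10.0, 5.0, 1.0 (matches the example)
```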

---

## 4. Advanced Sensitivity Analysis

### Monte Carlo Sensitivity

**When to use:** High uncertainty in scores, want to understand the probability distribution of outcomes

**Process:**

1. **Define uncertainty ranges** for each score
   - Option A Cost score: 6 ± 2 (could be 4-8)
   - Option A Performance: 9 ± 0.5 (could be 8.5-9.5)

2. **Run simulations** (1000+ iterations):
   - Randomly sample scores within uncertainty ranges
   - Calculate weighted total for each option
   - Record the winner

3. **Analyze results:**
   - Option A wins: 650/1000 = 65% probability
   - Option B wins: 300/1000 = 30% probability
   - Option C wins: 50/1000 = 5% probability

**Interpretation:**
- **>80% win rate:** High confidence in decision
- **50-80% win rate:** Moderate confidence, option is likely but not certain
- **<50% win rate:** Low confidence, gather more data or accept that the decision is a close call

**Tools:** Excel (=RANDBETWEEN or =NORM.INV), Python (numpy.random), R (rnorm)
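
A minimal numpy sketch (uniform sampling within stated ± ranges; the scores and spreads are assumed):

```python
import numpy as np

rng = np.random.default_rng(42)                 # seeded for reproducibility
weights = np.array([0.30, 0.25, 0.45])          # cost, speed, quality

# Central score estimate and ± uncertainty per criterion, per option
options = {
    "A": (np.array([6.0, 6.0, 9.0]), np.array([2.0, 1.0, 0.5])),
    "B": (np.array([8.0, 9.0, 7.0]), np.array([1.0, 0.5, 1.5])),
}

n_iter = 10_000
wins = {name: 0 for name in options}
for _ in range(n_iter):
    totals = {
        name: float(weights @ rng.uniform(mid - spread, mid + spread))
        for name, (mid, spread) in options.items()
    }
    wins[max(totals, key=totals.get)] += 1

for name, count in wins.items():
    print(f"Option {name} wins {count / n_iter:.0%} of simulations")
```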

### Scenario Analysis

**When to use:** Future is uncertain, decisions need to be robust across scenarios

**Process:**

1. **Define scenarios** (typically 3-4):
   - Best case: Favorable market conditions
   - Base case: Expected conditions
   - Worst case: Unfavorable conditions
   - Black swan: Unlikely but high-impact event

2. **Adjust criterion weights or scores per scenario:**

| Scenario | Cost Weight | Performance Weight | Risk Weight |
|----------|-------------|--------------------|-------------|
| Best case | 20% | 50% | 30% |
| Base case | 30% | 40% | 30% |
| Worst case | 40% | 20% | 40% |

3. **Run matrix for each scenario**, identify winner

4. **Evaluate robustness:**
   - **Dominant option:** Wins in all scenarios → Robust choice
   - **Scenario-dependent:** Different winners → Need to assess scenario likelihood
   - **Mixed:** Wins in base + one other → Moderately robust

### Threshold Analysis

**Question:** At what weight does the decision flip?

**Process:**

1. **Vary one criterion weight** from 0% to 100% (keeping others proportional)
2. **Plot total scores** for all options vs. weight
3. **Identify crossover point** where lines intersect (decision flips)

**Example:**

When Performance weight < 25% → Option B wins (cost-optimized)
When Performance weight > 25% → Option A wins (performance-optimized)

**Insight:** Current weight is 40% for Performance. Decision is robust unless Performance drops below 25% importance.

**Practical use:** Communicate to stakeholders: "Even if we reduce Performance priority to 25% (vs current 40%), Option A still wins. Decision is robust."
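
A sketch of the sweep (hypothetical scores; the crossover it prints depends on those assumptions, and the remaining weight is split evenly across the other criteria):

```python
def totals_at(perf_weight: float, scores: dict) -> dict:
    """Totals with a given Performance weight; remainder split evenly."""
    rest = (1.0 - perf_weight) / 2
    w = {"performance": perf_weight, "cost": rest, "risk": rest}
    return {o: sum(w[c] * s for c, s in cs.items()) for o, cs in scores.items()}

scores = {
    "A": {"performance": 9, "cost": 5, "risk": 6},   # performance-optimized
    "B": {"performance": 5, "cost": 9, "risk": 7},   # cost-optimized
}
prev = None
for step in range(0, 101, 5):
    pw = step / 100
    t = totals_at(pw, scores)
    leader = max(t, key=t.get)
    if prev is not None and leader != prev:
        print(f"Winner flips from {prev} to {leader} near Performance weight {pw:.0%}")
    prev = leader
```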

---

## 5. Group Decision Facilitation

### Delphi Method (Asynchronous Consensus)

**When to use:** Experts geographically distributed, want to avoid groupthink, need convergence without meetings

**Process:**

**Round 1:**
- Each expert scores options independently (no discussion)
- Facilitator compiles scores, calculates median and range

**Round 2:**
- Share Round 1 results (anonymous)
- Experts see median scores and outliers
- Ask experts to re-score, especially if they were outliers (optional: provide reasoning)

**Round 3:**
- Share Round 2 results
- Experts make final adjustments
- Converge on consensus scores (median or mean)

**Convergence criteria:** Standard deviation of scores <1.5 points per criterion
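
The convergence check is one line per criterion to script; a sketch over per-criterion score lists (sample data assumed):

```python
from statistics import stdev

# Latest-round expert scores, keyed by (option, criterion)
latest = {("A", "Cost"): [6, 7, 7, 7], ("A", "Performance"): [8, 6, 9, 5]}
converged = {k: stdev(v) < 1.5 for k, v in latest.items()}
print(converged)   # {('A', 'Cost'): True, ('A', 'Performance'): False}
```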

**Example:**

| Option | Criterion | R1 Scores | R1 Median | R2 Scores | R2 Median | R3 Scores | R3 Median |
|--------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| A | Cost | [5, 7, 9, 6] | 6.5 | [6, 7, 8, 6] | 6.5 | [6, 7, 7, 7] | **7** |

**Advantage:** Avoids dominance by the loudest voice, reduces groupthink, allows reflection time.

### Nominal Group Technique (Structured Meeting)

**When to use:** In-person or virtual meeting, need structured discussion to surface disagreements

**Process:**

1. **Silent generation (10 min):** Each person scores options independently
2. **Round-robin sharing (20 min):** Each person shares one score and rationale (no debate yet)
3. **Discussion (30 min):** Debate differences, especially outliers
4. **Re-vote (5 min):** Independent re-scoring after hearing perspectives
5. **Aggregation:** Calculate final scores (mean or median)

**Facilitation tips:**
- Enforce "no interruptions" during round-robin
- Time-box discussion to avoid analysis paralysis
- Focus debate on criteria with the widest score variance

### Handling Persistent Disagreement

**Scenario:** After multiple rounds, stakeholders still disagree on weights or scores.

**Options:**

**1. Separate matrices by stakeholder group:**

Run the matrix for Engineering priorities, Sales priorities, and Executive priorities separately. Present all three results. Highlight where recommendations align vs. differ.

**2. Escalate to decision-maker:**

Present divergence transparently: "Engineering weights Performance at 60%, Sales weights Cost at 50%. Under Engineering weights, Option A wins. Under Sales weights, Option B wins. Recommendation: [Decision-maker] must adjudicate the priority trade-off."

**3. Multi-criteria satisficing:**

Instead of optimizing the weighted sum, find options that meet minimum thresholds on all criteria. This avoids the weighting debate.

**Example:** Option must score ≥7 on Performance AND ≤$50K cost AND ≥6 on Ease of Use. Find options that satisfy all constraints.
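
A sketch of the constraint filter (hypothetical option data):

```python
options = {
    "A": {"performance": 8, "cost_k_usd": 45, "ease": 7},
    "B": {"performance": 9, "cost_k_usd": 60, "ease": 6},
    "C": {"performance": 6, "cost_k_usd": 40, "ease": 8},
}

def satisfices(o: dict) -> bool:
    return o["performance"] >= 7 and o["cost_k_usd"] <= 50 and o["ease"] >= 6

acceptable = [name for name, o in options.items() if satisfices(o)]
print(acceptable)   # ['A']: B breaks the cost cap, C misses the performance floor
```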

---

## 6. Matrix Variations & Extensions

### Weighted Pros/Cons Matrix
Hybrid: Add "Key Pros/Cons/Dealbreakers" columns to the matrix for qualitative context alongside quantitative scores.

### Multi-Phase Decision Matrix
**Phase 1:** High-level filter (simple criteria) → shortlist top 3
**Phase 2:** Deep-dive (detailed criteria) → select winner
Avoids analysis paralysis by not deep-diving on all options upfront.

### Risk-Adjusted Matrix
For uncertain scores, use the expected value: (Optimistic + 4×Most Likely + Pessimistic) / 6
E.g., scores of 9 (optimistic), 6 (most likely), 4 (pessimistic) → (9 + 4×6 + 4) / 6 ≈ 6.2.
Accounts for score uncertainty in the final weighted total.

---

## 7. Common Failure Modes & Recovery

| Failure Mode | Symptoms | Recovery |
|--------------|----------|----------|
| **Post-Rationalization** | Oddly specific weights, generous scores for preferred option | Assign weights BEFORE scoring, use third-party facilitator |
| **Analysis Paralysis** | >10 criteria, endless tweaking, winner changes repeatedly | Set deadline, time-box criteria (5 max), use satisficing rule |
| **Garbage In, Garbage Out** | Scores are guesses, no data sources, false confidence | Flag uncertainties, gather real data, acknowledge limits |
| **Criterion Soup** | Overlapping criteria, scorer confusion | Consolidate redundant criteria, define each clearly |
| **Spreadsheet Error** | Calculation mistakes, weights don't sum to 100% | Use templates with formulas, peer review calculations |

---

## 8. When to Abandon the Matrix

Despite best efforts, sometimes a decision matrix is not the right tool:

**Abandon if:**

1. **Purely emotional decision:** Choosing a baby name, selecting a wedding venue (no "right" answer)
   - **Use instead:** Gut feel, user preference vote

2. **Single dominant criterion:** Only cost matters, everything else is noise
   - **Use instead:** Simple cost comparison table

3. **Decision already made:** Political realities mean the decision is predetermined
   - **Use instead:** Document decision rationale (not fake analysis)

4. **Future is too uncertain:** Can't meaningfully score because context will change dramatically
   - **Use instead:** Scenario planning, real options analysis, reversible pilot

5. **Stakeholders distrust the process:** Matrix seen as "math washing" to impose a decision
   - **Use instead:** Deliberative dialog, voting, or delegated authority

**Recognize when structured analysis adds value vs. when it's theater.** Decision matrices work best when:
- Multiple alternatives genuinely exist
- Trade-offs are real and must be balanced
- Stakeholders benefit from transparency
- Data is available or can be gathered
- Decision is reversible if the matrix misleads

If these don't hold, consider alternative decision frameworks.