Initial commit

Zhongwei Li
2025-11-30 08:38:26 +08:00
commit 41d9f6b189
304 changed files with 98322 additions and 0 deletions


@@ -0,0 +1,200 @@
---
name: design-of-experiments
description: Use when optimizing multi-factor systems with limited experimental budget, screening many variables to find the vital few, discovering interactions between parameters, mapping response surfaces for peak performance, validating robustness to noise factors, or when users mention factorial designs, A/B/n testing, parameter tuning, process optimization, or experimental efficiency.
---
# Design of Experiments
## Table of Contents
- [Purpose](#purpose)
- [When to Use](#when-to-use)
- [What Is It?](#what-is-it)
- [Workflow](#workflow)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)
## Purpose
Design of Experiments (DOE) helps you systematically discover how multiple factors affect an outcome while minimizing the number of experimental runs. Instead of testing one variable at a time (inefficient) or guessing randomly (unreliable), DOE uses structured experimental designs to:
- **Screen** many factors to find the critical few
- **Optimize** factor settings to maximize/minimize a response
- **Discover interactions** where factors affect each other
- **Map response surfaces** to understand the full factor space
- **Validate robustness** against noise and environmental variation
## When to Use
Use this skill when:
- **Limited experimental budget**: You have constraints on time, cost, or resources for testing
- **Multiple factors**: 3+ controllable variables that could affect the outcome
- **Interaction suspicion**: Factors may interact (effect of A depends on level of B)
- **Optimization needed**: Finding best settings, not just "better than baseline"
- **Screening required**: Many candidate factors (10+), need to identify vital few
- **Response surface**: Need to map curvature, find peaks/valleys, understand tradeoffs
- **Robust design**: Must work well despite noise factors or environmental variation
- **Process improvement**: Manufacturing, chemical processes, software performance tuning
- **Product development**: Formulations, recipes, configurations with multiple parameters
- **A/B/n testing**: Web/app features with multiple variants and combinations
- **Machine learning**: Hyperparameter tuning for models with many parameters
Trigger phrases: "optimize", "tune parameters", "factorial test", "interaction effects", "response surface", "efficient experiments", "minimize runs", "robustness", "sensitivity analysis"
## What Is It?
Design of Experiments is a statistical framework for planning, executing, and analyzing experiments where you deliberately vary multiple input factors to observe effects on output responses.
**Quick example:**
You're optimizing a web signup flow with 3 factors:
- **Factor A**: Form layout (single-page vs multi-step)
- **Factor B**: CTA button color (blue vs green)
- **Factor C**: Social proof (testimonials vs user count)
**Naive approach**: Test one at a time = 6 runs (2 levels each × 3 factors)
- But you miss interactions! Maybe blue works better for single-page, green for multi-step.
**DOE approach**: 2³ factorial design = 8 runs
- Tests all combinations: (single/blue/testimonials), (single/blue/count), (single/green/testimonials), etc.
- Reveals main effects AND interactions
- Statistical power to detect differences
**Result**: You discover that layout and CTA color interact strongly: multi-step + green outperforms everything, and single-page + blue is a close second. Social proof has minimal effect. You can make a data-driven decision with confidence.
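For illustration, here is a minimal sketch (plain Python; the factor names are hypothetical) that enumerates the eight variants of this 2³ design:
```python
import itertools

# Hypothetical levels for the three signup-flow factors above
layouts = ["single-page", "multi-step"]        # Factor A
cta_colors = ["blue", "green"]                 # Factor B
social_proof = ["testimonials", "user-count"]  # Factor C

# Full 2^3 factorial: every combination becomes one test variant
for run, (a, b, c) in enumerate(itertools.product(layouts, cta_colors, social_proof), 1):
    print(f"Run {run}: layout={a}, cta={b}, proof={c}")
```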
## Workflow
Copy this checklist and track your progress:
```
Design of Experiments Progress:
- [ ] Step 1: Define objectives and constraints
- [ ] Step 2: Identify factors, levels, and responses
- [ ] Step 3: Choose experimental design
- [ ] Step 4: Plan execution details
- [ ] Step 5: Create experiment plan document
- [ ] Step 6: Validate quality
```
**Step 1: Define objectives and constraints**
Clarify the experiment goal (screening vs optimization), response metric(s), experimental budget (max runs), time/cost constraints, and success criteria. See [Common Patterns](#common-patterns) for typical objectives.
**Step 2: Identify factors, levels, and responses**
List all candidate factors (controllable inputs), specify levels for each factor (low/high or discrete values), categorize factors (control vs noise), and define response variables (measurable outputs). For screening many factors (8+), see [resources/methodology.md](resources/methodology.md#screening-designs) for Plackett-Burman and fractional factorial approaches.
**Step 3: Choose experimental design**
Based on objective and constraints:
- **For screening 5+ factors with limited runs** → Use [resources/methodology.md](resources/methodology.md#screening-designs) for fractional factorial or Plackett-Burman
- **For optimizing 2-5 factors** → Use [resources/template.md](resources/template.md#factorial-designs) for full or fractional factorial
- **For response surface mapping** → Use [resources/methodology.md](resources/methodology.md#response-surface-methodology) for central composite or Box-Behnken
- **For robust design against noise** → Use [resources/methodology.md](resources/methodology.md#taguchi-methods) for parameter vs noise factor arrays
**Step 4: Plan execution details**
Specify randomization order (eliminate time trends), blocking strategy (control nuisance variables), replication plan (estimate error), sample size justification (power analysis), and measurement protocols. See [Guardrails](#guardrails) for critical requirements.
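As a sketch of the randomization step (assuming a simple 8-run design; the fixed seed is an illustrative choice that makes the plan reproducible):
```python
import random

design_points = list(range(1, 9))   # e.g., the 8 runs of a 2^3 factorial
rng = random.Random(2024)           # fixed, documented seed for reproducibility
execution_order = rng.sample(design_points, k=len(design_points))
print(execution_order)              # record this order in the design matrix
```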
**Step 5: Create experiment plan document**
Create `design-of-experiments.md` with sections: objective, factors table, design matrix (run order with factor settings), response variables, execution protocol, and analysis plan. Use [resources/template.md](resources/template.md) for structure.
**Step 6: Validate quality**
Self-assess using [resources/evaluators/rubric_design_of_experiments.json](resources/evaluators/rubric_design_of_experiments.json). Check: objective clarity, factor completeness, design appropriateness, randomization plan, measurement protocol, statistical power, analysis plan, and deliverable quality. **Minimum standard**: Average score ≥ 3.5 before delivering.
## Common Patterns
**Pattern 1: Screening (many factors → vital few)**
- **Context**: 10-30 candidate factors, limited budget, want to identify 3-5 critical factors
- **Approach**: Plackett-Burman or fractional factorial (Resolution III/IV)
- **Output**: Pareto chart of effect sizes, shortlist for follow-up optimization
- **Example**: Software performance tuning with 15 configuration parameters
**Pattern 2: Optimization (find best settings)**
- **Context**: 2-5 factors already identified as important, want to find optimal levels
- **Approach**: Full factorial (2^k) or fractional factorial + steepest ascent
- **Output**: Main effects plot, interaction plots, recommended settings
- **Example**: Manufacturing process with temperature, pressure, time factors
**Pattern 3: Response Surface (map the landscape)**
- **Context**: Need to understand curvature, find maximum/minimum, quantify tradeoffs
- **Approach**: Central Composite Design (CCD) or Box-Behnken
- **Output**: Response surface equation, contour plots, optimal region
- **Example**: Chemical formulation with ingredient ratios
**Pattern 4: Robust Design (work despite noise)**
- **Context**: Product/process must perform well despite uncontrollable variation
- **Approach**: Taguchi inner-outer array (control × noise factors)
- **Output**: Settings that minimize sensitivity to noise factors
- **Example**: Consumer product that must work across temperature/humidity ranges
**Pattern 5: Sequential Experimentation (learn then refine)**
- **Context**: High uncertainty, want to learn iteratively with minimal waste
- **Approach**: Screening → Steepest ascent → Response surface → Confirmation
- **Output**: Progressively refined understanding and settings
- **Example**: New product development with unknown factor relationships
## Guardrails
**Critical requirements:**
1. **Randomize run order**: Eliminates time-order bias and confounding with lurking variables. Use a random number generator, not a "convenient" sequence.
2. **Replicate center points**: For designs with continuous factors, replicate center point runs (3-5 times) to estimate pure error and detect curvature.
3. **Avoid confounding critical interactions**: In fractional factorials, don't confound important 2-way interactions with main effects. Choose Resolution ≥ IV if interactions matter.
4. **Check design balance**: Ensure orthogonality (factors are uncorrelated in design matrix). Correlation > 0.3 reduces precision and interpretability.
5. **Define response precisely**: Use objective, quantitative, repeatable measurements. Avoid subjective scoring unless calibrated with multiple raters.
6. **Justify sample size**: Run power analysis to ensure design can detect meaningful effect sizes with acceptable Type II error risk (β ≤ 0.20).
7. **Document assumptions**: State expected effect magnitudes, interaction assumptions, noise variance estimates. Design validity depends on these.
8. **Plan for analysis before running**: Specify statistical tests, significance level (α), effect size metrics before data collection. Prevents p-hacking.
**Common pitfalls:**
- **One-factor-at-a-time (OFAT)**: Misses interactions, requires more runs than factorial designs
- **Ignoring blocking**: If runs span days/batches/operators, block accordingly or confound results with time trends
- **Too many levels**: Use 2-3 levels initially. More levels increase runs exponentially.
- **Unmeasured factors**: If an important factor isn't controlled/measured, it becomes noise
- **Changing protocols mid-experiment**: Breaks design structure. If necessary, restart or analyze separately.
## Quick Reference
**Key resources:**
- **[resources/template.md](resources/template.md)**: Quick-start templates for common designs (factorial, screening, response surface)
- **[resources/methodology.md](resources/methodology.md)**: Advanced techniques (optimal designs, Taguchi, mixture experiments, sequential strategies)
- **[resources/evaluators/rubric_design_of_experiments.json](resources/evaluators/rubric_design_of_experiments.json)**: Quality criteria for experiment plans
**Typical workflow time:**
- Simple factorial (2-4 factors): 15-30 minutes
- Screening design (8+ factors): 30-45 minutes
- Response surface design: 45-60 minutes
- Robust design (Taguchi): 60-90 minutes
**When to escalate:**
- User needs mixture experiments (factors must sum to 100%)
- Split-plot designs required (hard-to-change factors)
- Optimal designs for irregular constraints
- Bayesian adaptive designs
→ Use [resources/methodology.md](resources/methodology.md) for these advanced cases
**Inputs required:**
- **Process/System**: What you're experimenting on
- **Factors**: List of controllable inputs with candidate levels
- **Responses**: Measurable outputs (KPIs, metrics)
- **Constraints**: Budget (max runs), time, resources
- **Objective**: Screening, optimization, response surface, or robust design
**Outputs produced:**
- `design-of-experiments.md`: Complete experiment plan with design matrix, randomization, protocols, analysis approach


@@ -0,0 +1,307 @@
{
"criteria": [
{
"name": "Objective Definition & Context",
"description": "Is the experiment objective clearly defined with goal, success criteria, and constraints?",
"scoring": {
"1": "Vague objective. Goal unclear (not specified if screening/optimization/RSM/robust). Success criteria missing or unmeasurable. Constraints not documented. Insufficient context for experiment design.",
"3": "Objective stated but lacks specificity. Goal identified (screening/optimization/etc.) but success criteria qualitative. Some constraints mentioned (run budget, time) but not all. Context provided but gaps remain.",
"5": "Exemplary objective definition. Specific goal (screening X factors to Y critical ones, optimize for Z metric, map response surface, robust design against noise). Quantified success criteria (e.g., 'reduce defects < 2%'). All constraints documented (max runs, time, budget, resources). Clear context and rationale."
}
},
{
"name": "Factor Selection & Specification",
"description": "Are factors comprehensive, well-justified, with appropriate levels and ranges?",
"scoring": {
"1": "Incomplete factor list. Missing obvious important factors. No rationale for inclusion/exclusion. Levels not specified or inappropriate ranges (too narrow, outside feasible region). Factor types (control/noise) not distinguished.",
"3": "Factors identified but selection rationale brief. Levels specified but ranges may be suboptimal. Some justification for factor choice. Control vs noise distinction present but may be incomplete. Minor gaps in factor coverage.",
"5": "Comprehensive factor identification with explicit rationale for each. Levels span meaningful ranges based on domain knowledge, literature, or constraints. Control vs noise factors clearly distinguished. Excluded factors documented with reason. Factor table complete (name, type, levels, units, rationale)."
}
},
{
"name": "Response Variable Definition",
"description": "Are response variables objective, measurable, and aligned with experiment objective?",
"scoring": {
"1": "Response poorly defined. Measurement method unspecified or subjective. Target direction unclear (maximize/minimize/hit target). No justification for response choice. Multiple responses without tradeoff consideration.",
"3": "Response defined but measurement details limited. Method specified but reproducibility questionable. Target direction stated. Single response or multiple without explicit tradeoff strategy. Adequate for purpose.",
"5": "Precise response definition with objective, quantitative measurement protocol. Reproducible measurement method specified. Target clear (max/min/target value with tolerance). Multiple responses include tradeoff analysis or desirability function. Response choice well-justified relative to objective."
}
},
{
"name": "Design Type Selection & Appropriateness",
"description": "Is the experimental design appropriate for the objective, factor count, and constraints?",
"scoring": {
"1": "Design type missing or inappropriate. Full factorial for 8+ factors (wasteful). Plackett-Burman for optimization (ignores interactions). No justification for design choice. Design structure incorrect (not orthogonal, unbalanced).",
"3": "Design type appropriate but suboptimal. Reasonable for objective and factor count. Resolution adequate (e.g., Resolution IV for screening with some interactions). Minor inefficiencies. Justification brief. Design structure mostly correct.",
"5": "Optimal design selection with clear rationale. Efficient for objective: Plackett-Burman/fractional factorial for screening, full factorial/RSM for optimization, CCD/Box-Behnken for response surface, Taguchi for robust design. Resolution justified. Design structure correct (orthogonal, balanced, appropriate run count). Confounding documented."
}
},
{
"name": "Randomization & Blocking",
"description": "Is randomization properly planned? Is blocking used appropriately for nuisance variables?",
"scoring": {
"1": "No randomization plan or randomization ignored (runs in convenient order). Blocking needed but not used (runs span days/batches/operators without control). Time-order confounding risk. Method for randomization not specified.",
"3": "Randomization mentioned but method not detailed. Blocking used if obvious (e.g., runs span 2 days → 2 blocks) but may miss subtler nuisance variables. Partial randomization (e.g., constrained by hard-to-change factors without split-plot acknowledgment).",
"5": "Complete randomization plan with specific method (random number generator, software). Run order documented in design matrix. Blocking strategy addresses all major nuisance variables (day, batch, operator, machine). Split-plot design used if factors have different change difficulty. Randomization within blocks documented."
}
},
{
"name": "Replication & Center Points",
"description": "Is replication planned to estimate error? Are center points included to detect curvature?",
"scoring": {
"1": "No replication. No center points (for continuous factors). Cannot estimate pure error or detect curvature. Single run per design point with no variance estimation strategy.",
"3": "Some replication: center points present (2-3 replicates) OR partial design replication. Can estimate error but power may be limited. Replication adequate for basic analysis but not robust. Center points may be insufficient (< 3).",
"5": "Appropriate replication strategy: 3-5 center point replicates for continuous factors, plus optional full design replication (2-3x) for critical experiments. Replication justified by power analysis. Pure error estimate enables lack-of-fit test. Center points detect curvature for follow-up RSM."
}
},
{
"name": "Sample Size & Statistical Power",
"description": "Is the design adequately powered to detect meaningful effects?",
"scoring": {
"1": "No power analysis. Run count arbitrary or based solely on convenience. Underpowered (Type II error risk > 0.5). Insufficient runs to estimate all effects in model (degrees of freedom deficit). Effect size not specified.",
"3": "Informal power consideration (rule of thumb, pilot data). Run count reasonable for factor count. Likely adequate to detect large effects (> 1.5σ) but may miss smaller meaningful effects. Effect size and noise variance roughly estimated.",
"5": "Formal power analysis conducted. Minimum detectable effect size specified based on practical significance. Noise variance estimated from historical data, pilot runs, or domain knowledge. Run count justified to achieve power ≥ 0.80 (β ≤ 0.20) at α = 0.05. Degrees of freedom adequate for model estimation and error testing."
}
},
{
"name": "Execution Protocol & Measurement",
"description": "Is the execution protocol detailed, standardized, and reproducible?",
"scoring": {
"1": "No protocol or very high-level only. Factor settings not translated to actual units/procedures. Measurement method vague. No quality controls. Timeline missing. Protocol not reproducible by independent experimenter.",
"3": "Protocol present with key steps. Factor settings specified in actual units. Measurement method outlined but some details missing. Basic quality controls (calibration mentioned). Timeline present. Mostly reproducible but some ambiguity.",
"5": "Detailed step-by-step protocol. Factor settings precisely specified with units and tolerances. Measurement method fully detailed (instrument, procedure, recording). Quality controls comprehensive (calibration, stability checks, outlier handling). Realistic timeline with contingency. Protocol reproducible by independent party without clarification."
}
},
{
"name": "Analysis Plan & Decision Criteria",
"description": "Is the analysis approach pre-specified with clear decision criteria?",
"scoring": {
"1": "No analysis plan. Statistical methods not specified. Significance level not stated. Decision criteria vague or missing. No plan for residual diagnostics. Risk of p-hacking (data-driven analysis choices).",
"3": "Basic analysis plan: main effects, ANOVA mentioned. Significance level stated (α = 0.05). Decision criteria present but qualitative. Residual checks mentioned but not detailed. Some pre-specification but room for ad-hoc choices.",
"5": "Comprehensive pre-specified analysis plan. Methods detailed: effect estimation, ANOVA, regression model form, graphical analysis (main effects, interaction plots, Pareto charts). Significance level and decision criteria quantified. Residual diagnostics specified (normality, constant variance, independence tests). Follow-up strategy if assumptions violated (transformations, robust methods). Prevents p-hacking."
}
},
{
"name": "Assumptions, Limitations & Risk Mitigation",
"description": "Are key assumptions stated explicitly? Are limitations and risks acknowledged with mitigation?",
"scoring": {
"1": "Assumptions not documented. Limitations not acknowledged. Risks ignored. No contingency plans. Design presented as if no uncertainty. Sparsity-of-effects assumed without justification in screening designs.",
"3": "Key assumptions mentioned (linearity, interaction structure, variance homogeneity). Some limitations noted (design resolution, factor range). Risks identified but mitigation incomplete. Assumptions mostly reasonable but not fully justified.",
"5": "All critical assumptions explicitly stated and justified: effect linearity, interaction sparsity (if assumed), process stability, measurement precision, independence. Limitations clearly documented: confounding structure in fractional designs, extrapolation boundaries, measurement limits. Risks identified with mitigation strategies (e.g., confirmation runs, fold-over if confounding ambiguous). Assumptions testable via diagnostics."
}
}
],
"minimum_score": 3.5,
"guidance_by_experiment_type": {
"Screening (8+ factors)": {
"target_score": 4.0,
"focus_criteria": [
"Design Type Selection & Appropriateness",
"Factor Selection & Specification",
"Assumptions, Limitations & Risk Mitigation"
],
"recommended_designs": [
"Plackett-Burman (12, 16, 20 runs)",
"Fractional Factorial Resolution III-IV (2^(k-p) with k-p ≥ 4)",
"Definitive Screening Designs (3-column designs for k factors in 2k+1 runs)"
],
"common_pitfalls": [
"Using full factorial (2^k runs explode for k > 5)",
"Ignoring that main effects confounded with 2-way interactions (sparsity assumption critical)",
"Not planning fold-over or follow-up design if confounding becomes problematic",
"Insufficient factor coverage (missing important variables)"
],
"quality_indicators": {
"excellent": "Efficient design (12-24 runs for 8-15 factors), sparsity assumption justified, clear ranking of factors by effect size, shortlist for follow-up (top 3-5 factors identified)",
"sufficient": "Adequate design for factor count, main effects estimated, Pareto chart produced, factors ranked",
"insufficient": "Design inefficient (too many or too few runs), confounding not understood, no clear factor prioritization"
}
},
"Optimization (2-5 factors)": {
"target_score": 4.2,
"focus_criteria": [
"Design Type Selection & Appropriateness",
"Randomization & Blocking",
"Analysis Plan & Decision Criteria"
],
"recommended_designs": [
"Full Factorial 2^k (k ≤ 5)",
"Fractional Factorial Resolution V (2^(k-1) with k ≤ 6)",
"Add center points (3-5) to detect curvature for RSM follow-up"
],
"common_pitfalls": [
"Choosing Resolution III design (main effects confounded with 2-way interactions)",
"No center points → cannot detect curvature or estimate pure error",
"Ignoring interaction plots (may show strong interactions that change optimal settings)",
"Not randomizing run order (time trends confound with factor effects)"
],
"quality_indicators": {
"excellent": "Resolution V design, 3-5 center points, randomized, interactions estimated, optimal settings identified with confidence intervals, confirmation runs planned",
"sufficient": "Resolution IV design, center points present, main effects and some interactions clear, optimum estimated",
"insufficient": "Low resolution, no center points, interactions not estimable, optimum uncertain"
}
},
"Response Surface (curvature mapping)": {
"target_score": 4.5,
"focus_criteria": [
"Design Type Selection & Appropriateness",
"Replication & Center Points",
"Analysis Plan & Decision Criteria"
],
"recommended_designs": [
"Central Composite Design (CCD): 2^k + 2k + 3-5 center points",
"Box-Behnken Design (safer if extremes problematic)",
"Ensure rotatability (α = (2^k)^0.25 for CCD) or face-centered (α=1)"
],
"common_pitfalls": [
"Using factorial design only (cannot fit quadratic, misses curvature)",
"Insufficient center points (< 3) → poor pure error estimate",
"Not checking rotatability → prediction variance uneven across design space",
"Extrapolating beyond design region (local approximation only)"
],
"quality_indicators": {
"excellent": "CCD or Box-Behnken, 3-5 center points, quadratic model fitted, stationary point identified (max/min/saddle), contour plots, sensitivity analysis, confirmation runs at optimum",
"sufficient": "Appropriate RSM design, quadratic model, optimum estimated, contour plot",
"insufficient": "Linear model only, no curvature detection, optimum not characterized, no graphical visualization"
}
},
"Robust Design (Taguchi)": {
"target_score": 4.3,
"focus_criteria": [
"Factor Selection & Specification",
"Design Type Selection & Appropriateness",
"Analysis Plan & Decision Criteria"
],
"recommended_designs": [
"Inner-outer array: L8/L12/L16 inner (control factors) × L4 outer (noise factors)",
"Calculate SNR (signal-to-noise ratio) for each inner run",
"Two-step optimization: (1) maximize SNR, (2) adjust mean to target"
],
"common_pitfalls": [
"Not distinguishing control factors (settable in production) from noise factors (uncontrollable variation)",
"Using only mean response (ignores variance/robustness objective)",
"Choosing SNR metric that doesn't match objective (larger-better vs smaller-better vs target)",
"Too many noise factors (outer array size explodes)"
],
"quality_indicators": {
"excellent": "Control and noise factors clearly distinguished, appropriate SNR metric, inner-outer array crossed correctly, two-step optimization yields settings robust to noise, confirmation under varied noise conditions",
"sufficient": "Inner-outer array used, SNR calculated, robust settings identified, some confirmation",
"insufficient": "No noise factors considered, only mean optimization, robustness not validated, SNR metric wrong"
}
},
"Sequential Experimentation": {
"target_score": 4.0,
"focus_criteria": [
"Objective Definition & Context",
"Design Type Selection & Appropriateness",
"Analysis Plan & Decision Criteria"
],
"recommended_approach": [
"Stage 1: Screening (Plackett-Burman, 12-16 runs) → identify 3-5 factors",
"Stage 2: Steepest ascent (4-6 runs) → move toward optimal region",
"Stage 3: Factorial optimization (2^k, 8-16 runs) → estimate interactions",
"Stage 4: RSM refinement (CCD, 15-20 runs) → find true optimum",
"Stage 5: Confirmation (3-5 runs) → validate"
],
"common_pitfalls": [
"Trying one-shot full design (wasteful if many factors, high uncertainty)",
"Skipping steepest ascent (factorial centered at wrong region)",
"Not updating factor ranges between stages (RSM far from optimum)",
"No confirmation runs (model not validated)"
],
"quality_indicators": {
"excellent": "Multi-stage plan specified upfront, decision rules for progression (e.g., 'if curvature detected, add RSM'), factor ranges updated based on learnings, confirmation at end, total runs < 50% of one-shot approach",
"sufficient": "Sequential stages planned, some adaptivity, confirmation included",
"insufficient": "Single-stage only, no follow-up strategy, confirmation missing, inefficient run count"
}
}
},
"guidance_by_complexity": {
"Simple (2-4 factors, well-understood process)": {
"target_score": 3.8,
"sufficient_depth": "Full factorial or Resolution V fractional. Randomization and center points. ANOVA and main effects/interaction plots. Optimal settings with 90% CI. Confirmation runs.",
"key_requirements": [
"Complete factor table with levels and rationale",
"Design matrix with randomized run order",
"Analysis plan: ANOVA, interaction plots, optimal settings",
"3-5 center points for curvature detection",
"Confirmation runs (3+) at optimum"
]
},
"Moderate (5-8 factors, some uncertainty)": {
"target_score": 4.0,
"sufficient_depth": "Fractional factorial (Resolution IV-V) or screening design. Randomization and blocking if needed. Power analysis for run count. Potential follow-up RSM if curvature detected. Residual diagnostics.",
"key_requirements": [
"Power analysis justifying run count",
"Confounding structure documented (for fractional designs)",
"Randomization and blocking plan",
"Pre-specified analysis (effects, ANOVA, model form)",
"Residual diagnostics (normality, constant variance, independence)",
"Follow-up strategy (fold-over, RSM, confirmation)"
]
},
"Complex (8+ factors, high uncertainty, constraints)": {
"target_score": 4.2,
"sufficient_depth": "Multi-stage sequential strategy or optimal design (D-optimal) for constraints. Screening → optimization → RSM → confirmation. Comprehensive assumptions, limitations, risk mitigation. Advanced analysis (canonical, desirability functions, transformations).",
"key_requirements": [
"Sequential experimentation plan (screening → optimization → RSM)",
"Optimal design if irregular constraints (D-optimal, mixture designs, split-plot)",
"Power analysis at each stage",
"Comprehensive assumptions and limitations documented",
"Risk mitigation strategies (fold-over, blocking, replication)",
"Advanced analysis techniques (canonical analysis, response surface equations, multi-response optimization)",
"Confirmation and validation strategy"
]
}
},
"common_failure_modes": [
{
"failure": "One-Factor-At-a-Time (OFAT) approach",
"symptom": "Proposal to vary factors sequentially: test Factor A at low/high while others fixed, then Factor B, etc.",
"detection": "Look for phrases like 'test each factor individually', 'change one variable at a time', 'hold all others constant'",
"fix": "Explain factorial designs test multiple factors simultaneously with fewer runs and reveal interactions. Example: 3 factors OFAT = 6 runs (2 per factor), misses interactions. 2^3 factorial = 8 runs, estimates main effects + all interactions."
},
{
"failure": "Ignoring randomization",
"symptom": "Runs executed in 'convenient' order (all low levels first, then high) or grouped by factor level. No mention of randomization in protocol.",
"detection": "Design matrix lacks 'Run Order' column or run order = design point order (1,2,3,...). Phrase 'run in order listed' or 'group by factor A level'.",
"fix": "Emphasize randomization eliminates time-order bias, learning effects, drift. Provide method: assign random numbers to each run, sort by random number = execution order. Exception: hard-to-change factors require split-plot design."
},
{
"failure": "No center points or replication",
"symptom": "Design has single run per design point, no center (0,0,0) replicates. Cannot estimate pure error or detect curvature.",
"detection": "Design matrix for continuous factors has no runs at center point. No mention of replication strategy.",
"fix": "Always add 3-5 center point replicates for continuous factors. Enables pure error estimate (test lack-of-fit), detects curvature (signals need for RSM follow-up), improves power."
},
{
"failure": "Underpowered design",
"symptom": "Very few runs relative to factors. Risk of missing important effects (high Type II error). No power analysis or effect size justification.",
"detection": "Run count < 2*(# factors). No mention of minimum detectable effect. Noise variance unknown or ignored.",
"fix": "Conduct power analysis. Specify minimum meaningful effect (δ). Estimate noise (σ) from pilot data. Calculate required n for power ≥ 0.80. Use standard designs (Plackett-Burman for screening, 2^k factorial for optimization) rather than arbitrary small sample."
},
{
"failure": "Wrong design type for objective",
"symptom": "Screening with full factorial (wasteful), optimization with Plackett-Burman (ignores interactions), curvature with factorial only (cannot fit quadratic).",
"detection": "Check alignment: Screening → Plackett-Burman/fractional factorial. Optimization → full factorial/Resolution V. Response surface → CCD/Box-Behnken. Robust → Taguchi inner-outer.",
"fix": "Match design to objective. Screening: minimize runs, identify vital few (Plackett-Burman). Optimization: estimate interactions (full/fractional factorial). RSM: fit curvature (CCD/Box-Behnken). Robust: control vs noise factors (inner-outer array)."
},
{
"failure": "Confounding not understood",
"symptom": "Fractional factorial used but confounding structure not documented. Claim 'main effects estimated' without noting confounding with 2-way interactions (Resolution III).",
"detection": "Design resolution not stated. No defining relation or alias structure. Resolution III design used for optimization (interactions matter).",
"fix": "Document confounding. State defining relation (e.g., I=ABCD). List aliases (e.g., A confounded with BCD). Choose Resolution ≥ IV if interactions important. Plan fold-over if confounding becomes problematic."
},
{
"failure": "No analysis plan (risk of p-hacking)",
"symptom": "Analysis approach vague ('will analyze data'), no pre-specified model, no decision criteria. Statistical tests chosen after seeing data.",
"detection": "Analysis section missing or very brief. No significance level stated. Model form not specified. Phrases like 'explore data', 'see what's significant'.",
"fix": "Pre-specify analysis before data collection. State model form (linear: Y ~ A + B + AB, quadratic: Y ~ A + B + A^2 + B^2 + AB). Set α (typically 0.05). Define decision criteria (effects with p < 0.05 considered significant). Specify diagnostics (residual plots, normality test)."
},
{
"failure": "Extrapolating beyond design region",
"symptom": "Recommending factor settings outside tested ranges based on model predictions. Claiming optimum at edge or outside design space.",
"detection": "Optimal settings include factor values < low level or > high level tested. Phrases like 'model predicts even better results at [extreme value]'.",
"fix": "Response surface models are local approximations. Only trust predictions within tested region (interpolation). If optimum appears outside, run steepest ascent to move toward new region, then new RSM centered there. Do not extrapolate."
}
]
}


@@ -0,0 +1,413 @@
# Design of Experiments - Advanced Methodology
## Workflow
Copy this checklist for advanced DOE cases:
```
Advanced DOE Progress:
- [ ] Step 1: Assess complexity and choose advanced technique
- [ ] Step 2: Design experiment using specialized method
- [ ] Step 3: Plan execution with advanced considerations
- [ ] Step 4: Analyze with appropriate statistical methods
- [ ] Step 5: Iterate or confirm findings
```
**Step 1: Assess complexity**
Identify which advanced technique applies: screening 8+ factors, response surface with curvature, robust design, mixture constraints, hard-to-change factors, or irregular factor space. See technique selection criteria below.
**Step 2: Design experiment**
Apply specialized design method from sections: [Screening Designs](#1-screening-designs), [Response Surface Methodology](#2-response-surface-methodology), [Taguchi Methods](#3-taguchi-methods-robust-parameter-design), [Optimal Designs](#4-optimal-designs), [Mixture Experiments](#5-mixture-experiments), or [Split-Plot Designs](#6-split-plot-designs).
**Step 3: Plan execution**
Address advanced considerations: blocking for nuisance variables, replication for variance estimation, center points for curvature detection, and sequential strategies. See [Sequential Experimentation](#7-sequential-experimentation) for iterative approaches.
**Step 4: Analyze**
Use appropriate analysis for design type: effect estimation, ANOVA, regression modeling, response surface equations, contour plots, and residual diagnostics. See [Analysis Techniques](#8-analysis-techniques).
**Step 5: Iterate or confirm**
Based on findings, run confirmation experiments, refine factor ranges, add center/axial points for RSM, or screen additional factors. See [Sequential Experimentation](#7-sequential-experimentation).
---
## 1. Screening Designs
**When to use**: 8-30 candidate factors, limited experimental budget, goal is to identify 3-5 vital factors for follow-up optimization.
### Plackett-Burman Designs
**Structure**: Orthogonal two-level designs with run count N a multiple of 4. Screens up to N-1 factors in N runs.
**Standard designs**:
- 12 runs → screen up to 11 factors
- 16 runs → screen up to 15 factors
- 20 runs → screen up to 19 factors
- 24 runs → screen up to 23 factors
**Example: 12-run Plackett-Burman generator matrix**:
```
Run 1: + + - + + + - - - + -
Runs 2-11: cyclically shift the previous row one position to the left; Run 12: all minus signs
```
**Analysis**: Fit linear model Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ. Rank factors by |βᵢ|. Select top 3-5 for optimization. Pareto chart: cumulative % variance explained.
**Limitation**: Main effects confounded with 2-way interactions. Only valid if interactions negligible (sparsity-of-effects principle).
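A minimal sketch of this construction (NumPy; the generator row is the standard 12-run one shown above), with an orthogonality check:
```python
import numpy as np

# Standard generator row of the 12-run Plackett-Burman design (11 factors)
first = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])

rows = [np.roll(first, i) for i in range(11)]  # runs 1-11: cyclic shifts
rows.append(-np.ones(11, dtype=int))           # run 12: all minus
X = np.vstack(rows)                            # 12 x 11 design matrix

# Columns are pairwise orthogonal: X'X = 12 * I for a valid PB design
assert (X.T @ X == 12 * np.eye(11, dtype=int)).all()
```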
### Fractional Factorial Screening
**When to use**: 5-8 factors, need to estimate some 2-way interactions, Resolution IV or V required.
**Common designs**:
- **2⁵⁻¹ (Resolution V)**: 16 runs, 5 factors. Main effects and 2-way interactions clear. Generator: I = ABCDE.
- **2⁶⁻² (Resolution IV)**: 16 runs, 6 factors. Main effects clear, 2-way confounded with 2-way. Generators: I = ABCE, I = BCDF.
- **2⁷⁻³ (Resolution IV)**: 16 runs, 7 factors. Generators: I = ABCE, I = BCDF, I = ACDG.
**Confounding analysis**: Use defining relation to determine alias structure. Example for 2⁵⁻¹ with I = ABCDE:
- A aliased with BCDE
- AB aliased with CDE
- ABC aliased with DE
**Fold-over technique**: If screening reveals ambiguous confounding, run fold-over design (flip all signs) to de-alias. 16 runs + 16 fold-over = 32 runs = full 2⁵ factorial.
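A sketch of generating the 2⁵⁻¹ design and its fold-over from the defining relation I = ABCDE:
```python
import itertools
import numpy as np

# Base factors A-D span a full 2^4; generator E = ABCD gives 2^(5-1) with I = ABCDE
base = np.array(list(itertools.product([-1, 1], repeat=4)))  # 16 runs, columns A-D
E = base.prod(axis=1, keepdims=True)                         # E = A*B*C*D
design = np.hstack([base, E])                                # columns A, B, C, D, E

fold_over = -design                        # flip all signs to de-alias
combined = np.vstack([design, fold_over])  # 32 runs = the full 2^5 factorial
```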
---
## 2. Response Surface Methodology
**When to use**: 2-5 factors already identified as important, need to find optimum, expect curvature (quadratic relationship).
### Central Composite Design (CCD)
**Structure**: Factorial points + axial points + center points
**Components**:
- **Factorial points**: 2^k corner points (±1 for all factors)
- **Axial points**: 2k points on axes (±α, 0, 0, ...) where α determines rotatability
- **Center points**: 3-5 replicates at origin (0, 0, ..., 0)
**Total runs**: 2^k + 2k + nc (nc = number of center points)
**Example: CCD for 3 factors** (8 + 6 + 5 = 19 runs):
| Type | X₁ | X₂ | X₃ |
|------|----|----|----|
| Factorial (2³ = 8 corner points) | ±1 | ±1 | ±1 |
| Axial | -α | 0 | 0 |
| Axial | +α | 0 | 0 |
| Axial | 0 | -α | 0 |
| Axial | 0 | +α | 0 |
| Axial | 0 | 0 | -α |
| Axial | 0 | 0 | +α |
| Center (5 replicates) | 0 | 0 | 0 |
**Rotatability**: Choose α = (2^k)^(1/4) for equal prediction variance at equal distance from center. For 3 factors: α = 1.682.
**Model**: Fit quadratic: Y = β₀ + Σβᵢxᵢ + Σβᵢᵢxᵢ² + Σβᵢⱼxᵢxⱼ
**Analysis**: Canonical analysis to find stationary point (maximum, minimum, or saddle). Ridge analysis if optimum outside design region.
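A sketch constructing this rotatable CCD directly (NumPy only, so no dependence on a particular DOE library):
```python
import itertools
import numpy as np

k = 3
alpha = (2 ** k) ** 0.25   # rotatable axial distance: 1.682 for k = 3
n_center = 5

factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))  # 8 corners
axial = np.vstack([sign * alpha * np.eye(k)[i]
                   for i in range(k) for sign in (-1, 1)])             # 6 axial points
center = np.zeros((n_center, k))                                       # 5 replicates

ccd = np.vstack([factorial, axial, center])              # 8 + 6 + 5 = 19 runs
order = np.random.default_rng(1).permutation(len(ccd))   # randomized run order
```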
### Box-Behnken Design
**Structure**: 3-level design that avoids extreme corners (all factors at ±1 simultaneously).
**Advantages**: Fewer runs than CCD, safer when extreme combinations may damage equipment or produce out-of-spec product.
**Example: Box-Behnken for 3 factors** (12 + 3 = 15 runs):
| X₁ | X₂ | X₃ |
|----|----|----|
| -1 | -1 | 0 |
| +1 | -1 | 0 |
| -1 | +1 | 0 |
| +1 | +1 | 0 |
| -1 | 0 | -1 |
| +1 | 0 | -1 |
| -1 | 0 | +1 |
| +1 | 0 | +1 |
| 0 | -1 | -1 |
| 0 | +1 | -1 |
| 0 | -1 | +1 |
| 0 | +1 | +1 |
| 0 | 0 | 0 |

Center point (0, 0, 0) replicated 3 times.
**Model**: Same quadratic as CCD.
**Trade-off**: Slightly less efficient than CCD for prediction, but avoids extreme points.
---
## 3. Taguchi Methods (Robust Parameter Design)
**When to use**: Product/process must perform well despite uncontrollable variation (noise factors). Goal: Find control factor settings that minimize sensitivity to noise.
### Inner-Outer Array Structure
**Inner array**: Control factors (factors you can set in production)
**Outer array**: Noise factors (environmental conditions, material variation, user variation)
**Approach**: Cross inner array with outer array. Each inner array run is repeated at all outer array conditions.
**Example: L₈ inner × L₄ outer** (8 control combinations × 4 noise conditions = 32 runs):
**Inner array (control factors A, B, C)**:
| Run | A | B | C |
|-----|---|---|---|
| 1 | -1 | -1 | -1 |
| 2 | -1 | -1 | +1 |
| ... | ... | ... | ... |
| 8 | +1 | +1 | +1 |
**Outer array (noise factors N₁, N₂)**:
| Noise | N₁ | N₂ |
|-------|----|----|
| 1 | -1 | -1 |
| 2 | -1 | +1 |
| 3 | +1 | -1 |
| 4 | +1 | +1 |
**Data collection**: For each inner run, measure response Y at all 4 noise conditions. Calculate mean (Ȳ) and variance (s²) or signal-to-noise ratio (SNR).
**Signal-to-Noise Ratios**:
- **Larger-is-better**: SNR = -10 log₁₀(Σ(1/Y²)/n)
- **Smaller-is-better**: SNR = -10 log₁₀(ΣY²/n)
- **Target-is-best**: SNR = 10 log₁₀(Ȳ²/s²)
**Analysis**: Choose control factor settings that maximize SNR (robust to noise) while achieving target mean.
**Two-step optimization**:
1. Maximize SNR to reduce variability
2. Adjust mean to target using control factors that don't affect SNR
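A sketch of the SNR calculations on hypothetical data (one row per inner-array run, one column per outer-array noise condition):
```python
import numpy as np

def snr_larger(y):  return -10 * np.log10(np.mean(1.0 / np.square(y)))
def snr_smaller(y): return -10 * np.log10(np.mean(np.square(y)))
def snr_target(y):  return 10 * np.log10(np.mean(y) ** 2 / np.var(y, ddof=1))

# Hypothetical responses: 2 inner runs x 4 noise conditions
runs = np.array([[9.8, 10.1, 9.7, 10.2],    # near target, low spread -> high SNR
                 [8.5, 11.6, 7.9, 12.0]])   # similar mean, high spread -> low SNR
for i, y in enumerate(runs, 1):
    print(f"inner run {i}: mean={y.mean():.2f}, SNR_target={snr_target(y):.1f} dB")
```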
---
## 4. Optimal Designs
**When to use**: Irregular factor space (constraints, categorical factors, unequal ranges), custom run budget, or standard designs don't fit.
### D-Optimal Designs
**Criterion**: Minimize determinant of (X'X)⁻¹ (maximize information, minimize variance of coefficient estimates).
**Algorithm**: Computer-generated. Start with candidate set of all feasible runs, select subset that maximizes |X'X|.
**Use cases**:
- Mixture experiments with additional process variables
- Constrained factor spaces (e.g., temperature + pressure can't both be high)
- Irregular grids (e.g., existing data points + new runs)
- Unequal factor ranges
**Software**: Use R (AlgDesign package), JMP, Design-Expert, or Python (pyDOE).
### A-Optimal and G-Optimal
**A-optimal**: Minimize average variance of predictions (trace of (X'X)⁻¹).
**G-optimal**: Minimize maximum variance across design space (minimax criterion).
**Choice**: D-optimal for parameter estimation, G-optimal for prediction across entire space, A-optimal for average prediction quality.
---
## 5. Mixture Experiments
**When to use**: Factors are proportions that must sum to 100% (e.g., chemical formulations, blend compositions, budget allocations).
### Simplex-Lattice Designs
**Constraints**: x₁ + x₂ + ... + xₖ = 1, all xᵢ ≥ 0
**{q,m} designs**: q = number of components, m = lattice degree; each component takes m+1 equally spaced levels (0, 1/m, ..., 1)
**Example: {3,2} simplex-lattice** (3 components at 0%, 50%, 100%):
| Run | x₁ | x₂ | x₃ |
|-----|----|----|-----|
| 1 | 1.0 | 0.0 | 0.0 |
| 2 | 0.0 | 1.0 | 0.0 |
| 3 | 0.0 | 0.0 | 1.0 |
| 4 | 0.5 | 0.5 | 0.0 |
| 5 | 0.5 | 0.0 | 0.5 |
| 6 | 0.0 | 0.5 | 0.5 |
**Model**: Scheffé canonical polynomials. Linear: Y = β₁x₁ + β₂x₂ + β₃x₃. Quadratic: Y = Σβᵢxᵢ + Σβᵢⱼxᵢxⱼ.
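A sketch that enumerates the lattice points (shown for the {3,2} example above):
```python
import itertools
from fractions import Fraction

def simplex_lattice(q, m):
    """All q-component blends with proportions in {0, 1/m, ..., 1} summing to 1."""
    return [tuple(Fraction(c, m) for c in combo)
            for combo in itertools.product(range(m + 1), repeat=q)
            if sum(combo) == m]

print(simplex_lattice(3, 2))   # the 6 runs listed above
```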
### Simplex-Centroid Designs
**Structure**: Pure components + binary blends + ternary blends + overall centroid.
**Example: 3-component simplex-centroid** (7 runs):
| Run | x₁ | x₂ | x₃ | Blend |
|-----|----|----|-----|-------|
| 1 | 1.0 | 0 | 0 | pure component |
| 2 | 0 | 1.0 | 0 | pure component |
| 3 | 0 | 0 | 1.0 | pure component |
| 4 | 0.5 | 0.5 | 0 | binary blend |
| 5 | 0.5 | 0 | 0.5 | binary blend |
| 6 | 0 | 0.5 | 0.5 | binary blend |
| 7 | 0.33 | 0.33 | 0.33 | overall centroid |
**Constraints**: Add lower/upper bounds if components have minimum/maximum limits. Use D-optimal design for constrained mixture space.
---
## 6. Split-Plot Designs
**When to use**: Some factors are hard to change (e.g., temperature requires hours to stabilize), others are easy to change. Randomizing all factors fully is impractical or expensive.
### Structure
**Whole-plot factors**: Hard to change (temperature, batch, supplier)
**Subplot factors**: Easy to change (concentration, time, operator)
**Design**: Randomize whole-plot factors at top level, randomize subplot factors within each whole-plot level.
**Example: 2² split-plot** (Temperature = whole-plot, Time = subplot, 2 replicates):
| Whole-plot | Temp | Subplot | Time | Run order |
|------------|------|---------|------|-----------|
| 1 | Low | 1 | Short | 1 |
| 1 | Low | 2 | Long | 2 |
| 2 | High | 3 | Short | 4 |
| 2 | High | 4 | Long | 3 |
(Repeat the structure above for replicate block 2.)
**Analysis**: Mixed model with whole-plot error and subplot error terms. Whole-plot factors tested with lower precision (fewer degrees of freedom).
**Trade-off**: Allows practical execution when full randomization impossible, but reduces statistical power for hard-to-change factors.
---
## 7. Sequential Experimentation
**Philosophy**: Learn iteratively, adapt design based on results. Minimize total runs while maximizing information.
### Stage 1: Screening
**Objective**: Reduce 10-20 candidates to 3-5 critical factors.
**Design**: Plackett-Burman or 2^(k-p) fractional factorial (Resolution III-IV).
**Runs**: 12-20.
**Output**: Ranked factor list, effect sizes with uncertainty.
### Stage 2: Steepest Ascent/Descent
**Objective**: Move quickly toward optimal region using main effects from screening.
**Method**: Calculate path of steepest ascent (gradient = effect estimates). Run experiments along this path until response stops improving.
**Example**: If screening finds temp effect = +10, pressure effect = +5, move in direction (temp: +2, pressure: +1).
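A sketch of computing points along that path (coded units; the step scaling is an illustrative choice):
```python
effects = {"temp": 10.0, "pressure": 5.0}   # main effects from screening
step = 2.0 / effects["temp"]                # scale so temp advances +2 per step

for i in range(1, 5):
    point = {name: round(i * step * e, 2) for name, e in effects.items()}
    print(f"step {i}: {point}")   # temp: +2, +4, ...; pressure: +1, +2, ...
```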
### Stage 3: Factorial Optimization
**Objective**: Explore region around best settings from steepest ascent, estimate interactions.
**Design**: 2^k full factorial or Resolution V fractional factorial with 3-5 factors.
**Runs**: 16-32.
**Output**: Optimal settings, interaction effects, linear model.
### Stage 4: Response Surface Refinement
**Objective**: Fit curvature, find true optimum.
**Design**: CCD or Box-Behnken centered at best settings from Stage 3.
**Runs**: 15-20.
**Output**: Quadratic model, stationary point (optimum), contour plots.
### Stage 5: Confirmation
**Objective**: Validate predicted optimum.
**Design**: 3-5 replication runs at predicted optimal settings.
**Output**: Confidence interval for response at optimum. If prediction interval contains observed mean, model validated.
**Total runs example**: Screening (16) + Steepest ascent (4) + Factorial (16) + RSM (15) + Confirmation (3) = 54 runs. Compare to one-shot full factorial for 10 factors = 1024 runs.
---
## 8. Analysis Techniques
### Effect Estimation
**Factorial designs**: Estimate main effect of factor A as: Effect(A) = (Ȳ₊ - Ȳ₋) where Ȳ₊ = mean response when A is high, Ȳ₋ = mean when A is low.
**Interaction effect**: Effect(AB) = [(Ȳ₊₊ + Ȳ₋₋) - (Ȳ₊₋ + Ȳ₋₊)] / 2
**Standard error**: SE(effect) = 2σ/√n, where σ estimated from replicates or center points.
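A sketch of these estimates for a 2² design with hypothetical responses:
```python
import numpy as np

# 2^2 factorial in standard order: (-,-), (+,-), (-,+), (+,+)
A = np.array([-1, +1, -1, +1])
B = np.array([-1, -1, +1, +1])
Y = np.array([20.0, 30.0, 25.0, 45.0])   # hypothetical responses

effect_A  = Y[A == +1].mean() - Y[A == -1].mean()          # 15.0
effect_B  = Y[B == +1].mean() - Y[B == -1].mean()          # 10.0
effect_AB = Y[A * B == +1].mean() - Y[A * B == -1].mean()  # 5.0
```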
### ANOVA
**Purpose**: Test statistical significance of effects.
**Null hypothesis**: Effect = 0.
**Test statistic**: F = MS(effect) / MS(error), compare to F-distribution.
**Significance**: p < 0.05 (or chosen α level) → reject H₀, effect is significant.
### Regression Modeling
**Linear model**: Y = β₀ + β₁x₁ + β₂x₂ + β₁₂x₁x₂ + ε
**Quadratic model** (RSM): Y = β₀ + Σβᵢxᵢ + Σβᵢᵢxᵢ² + Σβᵢⱼxᵢxⱼ + ε
**Fit**: Least squares (minimize Σ(Yᵢ - Ŷᵢ)²).
**Assessment**: R², adjusted R², RMSE, residual plots.
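A sketch of the least-squares fit for the linear-plus-interaction model (NumPy only; data hypothetical):
```python
import numpy as np

x1 = np.array([-1, +1, -1, +1, 0, 0, 0], dtype=float)   # 2^2 plus 3 center points
x2 = np.array([-1, -1, +1, +1, 0, 0, 0], dtype=float)
Y  = np.array([20, 30, 25, 45, 29, 31, 30], dtype=float)

X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])  # b0 + b1*x1 + b2*x2 + b12*x1*x2
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
residuals = Y - X @ beta        # feed these into the diagnostics below
print(dict(zip(["b0", "b1", "b2", "b12"], beta.round(2))))
```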
### Residual Diagnostics
**Check assumptions**:
1. **Normal probability plot**: Residuals should fall on straight line. Non-normality indicates transformation needed.
2. **Residuals vs fitted**: Random scatter around zero. Funnel shape indicates non-constant variance (transform Y).
3. **Residuals vs run order**: Random. Trend indicates time drift, lack of randomization.
4. **Residuals vs factors**: Random. Pattern indicates missing interaction or curvature.
**Transformations**: Log(Y) for multiplicative effects, √Y for count data, 1/Y for rate data, Box-Cox for data-driven choice.
### Optimization
**Contour plots**: Visualize response surface, identify optimal region, assess tradeoffs.
**Desirability functions**: Multi-response optimization. Convert each response to 0-1 scale (0 = unacceptable, 1 = ideal). Maximize geometric mean of desirabilities.
**Canonical analysis**: Find stationary point (∂Y/∂xᵢ = 0), classify as maximum, minimum, or saddle point based on eigenvalues of Hessian matrix.
---
## 9. Sample Size and Power Analysis
**Before designing experiment, determine required runs**:
**Power**: Probability of detecting true effect if it exists (1 - β). Standard: power ≥ 0.80.
**Effect size (δ)**: Minimum meaningful difference. Example: "Must detect 10% yield improvement."
**Noise (σ)**: Process variability. Estimate from historical data, pilot runs, or engineering judgment.
**Formula for factorial designs**: n ≥ 2(Zα/2 + Zβ)²σ² / δ² per cell.
**Example**: Detect δ = 5 units, σ = 3 units, α = 0.05, power = 0.80.
- n ≥ 2(1.96 + 0.84)²(3²) / 5² = 2(7.84)(9) / 25 ≈ 6 replicates per factor level combination.
**For screening**: Use effect sparsity assumption. If testing 10 factors, expect 2-3 active. Size design to detect large effects (1-2σ).
**Software**: Use G*Power, R (pwr package), JMP, or online calculators.
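A sketch of the calculation above (SciPy for the normal quantiles):
```python
import math
from scipy.stats import norm

def replicates_per_cell(delta, sigma, alpha=0.05, power=0.80):
    """n >= 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2 (two-sided)."""
    z_a = norm.ppf(1 - alpha / 2)   # 1.96 at alpha = 0.05
    z_b = norm.ppf(power)           # 0.84 at power = 0.80
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

print(replicates_per_cell(delta=5, sigma=3))   # 6, matching the example above
```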
---
## 10. Common Pitfalls and Solutions
**Pitfall 1: Ignoring confounding in screening designs**
- **Problem**: Plackett-Burman confounds main effects with 2-way interactions. If interactions exist, main effect estimates are biased.
- **Solution**: Use only when sparsity-of-effects applies (most interactions negligible). Follow up ambiguous results with Resolution IV/V design or fold-over.
**Pitfall 2: Extrapolating beyond design region**
- **Problem**: Response surface models are local approximations. Predicting outside tested factor ranges is unreliable.
- **Solution**: Expand design if optimum appears outside current region. Run steepest ascent, then new RSM centered on improved region.
**Pitfall 3: Inadequate replication**
- **Problem**: Without replicates, cannot estimate pure error or test lack-of-fit.
- **Solution**: Always include 3-5 center point replicates. For critical experiments, replicate entire design (2-3 times).
**Pitfall 4: Changing protocols mid-experiment**
- **Problem**: Breaks orthogonality, confounds design structure with time.
- **Solution**: Complete design as planned. If protocol change necessary, analyze before/after separately or treat as blocking factor.
**Pitfall 5: Treating categorical factors as continuous**
- **Problem**: Assigning arbitrary numeric codes (-1, 0, +1) to unordered categories (e.g., Supplier A/B/C) implies ordering that doesn't exist.
- **Solution**: Use indicator variables (dummy coding) or separate experiments for each category level.


@@ -0,0 +1,395 @@
# Design of Experiments - Template
## Workflow
Copy this checklist and track your progress:
```
DOE Template Progress:
- [ ] Step 1: Define experiment objective
- [ ] Step 2: List factors and levels
- [ ] Step 3: Select design type
- [ ] Step 4: Generate design matrix
- [ ] Step 5: Randomize and document protocol
- [ ] Step 6: Finalize experiment plan
```
**Step 1: Define experiment objective**
Specify what you're trying to learn (screening, optimization, response surface, robust design), primary response metric(s), and success criteria. See [Objective Definition](#objective-definition) for examples.
**Step 2: List factors and levels**
Identify all factors (controllable inputs), specify levels for each (2-3 initially), distinguish control vs noise factors, and define measurable responses. See [Factor Table Template](#factor-table-template) for structure.
**Step 3: Select design type**
Based on objective:
- **2-5 factors, want all combinations** → [Full Factorial](#full-factorial-designs)
- **5+ factors, limited runs** → [Fractional Factorial](#fractional-factorial-designs)
- **Screening 8+ factors** → [Plackett-Burman](#plackett-burman-screening)
**Step 4: Generate design matrix**
Create run-by-run table with factor settings for each experimental run. See [Design Matrix Examples](#design-matrix-examples) for format.
**Step 5: Randomize and document protocol**
Randomize run order, specify blocking if needed, detail measurement procedures, and plan replication strategy. See [Execution Details](#execution-details) for guidance.
**Step 6: Finalize experiment plan**
Create complete `design-of-experiments.md` document using [Document Structure Template](#document-structure-template). Self-check with quality criteria in [Quality Checklist](#quality-checklist).
---
## Document Structure Template
Use this structure for the final `design-of-experiments.md` file:
```markdown
# Design of Experiments: [Experiment Name]
## 1. Objective
**Goal**: [Screening | Optimization | Response Surface | Robust Design]
**Context**: [1-2 sentences describing the system/process being studied]
**Success Criteria**: [What constitutes a successful experiment? Measurable outcomes.]
**Constraints**:
- Budget: [Maximum number of runs allowed]
- Time: [Deadline or duration per run]
- Resources: [Equipment, personnel, materials]
## 2. Factors and Levels
| Factor | Type | Low Level (-1) | High Level (+1) | Center (0) | Units | Rationale |
|--------|------|----------------|-----------------|------------|-------|-----------|
| A: [Name] | Control | [value] | [value] | [value] | [units] | [Why this factor?] |
| B: [Name] | Control | [value] | [value] | [value] | [units] | [Why this factor?] |
| C: [Name] | Noise | [value] | [value] | - | [units] | [Uncontrollable variation] |
**Factor Selection Rationale**: [Why these factors? Any excluded? Assumptions?]
## 3. Response Variables
| Response | Description | Measurement Method | Target | Units |
|----------|-------------|-------------------|---------|-------|
| Y1: [Name] | [What it measures] | [How measured] | [Maximize/Minimize/Target value] | [units] |
| Y2: [Name] | [What it measures] | [How measured] | [Maximize/Minimize/Target value] | [units] |
**Response Selection Rationale**: [Why these responses? Any tradeoffs?]
## 4. Experimental Design
**Design Type**: [Full Factorial 2^k | Fractional Factorial 2^(k-p) | Plackett-Burman | Central Composite | Box-Behnken]
**Resolution**: [For fractional factorials: III, IV, or V]
**Runs**:
- Design points: [number]
- Center points: [number of replicates at center]
- Total runs: [design + center]
**Design Rationale**: [Why this design? What can/can't it detect?]
## 5. Design Matrix
| Run | Order | Block | A | B | C | Y1 | Y2 | Notes |
|-----|-------|-------|---|---|---|----|----|-------|
| 1 | 5 | 1 | -1 | -1 | -1 | | | |
| 2 | 12 | 1 | +1 | -1 | -1 | | | |
| 3 | 3 | 1 | -1 | +1 | -1 | | | |
| 4 | 8 | 1 | +1 | +1 | -1 | | | |
| 5 | 1 | 2 | -1 | -1 | +1 | | | |
| ... | ... | ... | ... | ... | ... | | | |
**Randomization**: Run order randomized using [method]. Original design point order preserved in "Run" column.
**Blocking**: [If used] Runs blocked by [day/batch/operator/etc.] to control for [nuisance variable].
## 6. Execution Protocol
**Preparation**:
- [ ] [Equipment setup/calibration steps]
- [ ] [Material preparation]
- [ ] [Personnel training]
**Run Procedure**:
1. [Step-by-step protocol for each run]
2. [Factor settings to apply]
3. [Wait/equilibration time]
4. [Response measurement procedure]
5. [Recording method]
**Quality Controls**:
- [Measurement calibration checks]
- [Process stability verification]
- [Outlier detection procedure]
**Timeline**: [Start date, duration per run, expected completion]
## 7. Analysis Plan
**Primary Analysis**:
- Calculate main effects for factors A, B, C
- Calculate 2-way interaction effects (AB, AC, BC)
- Fit linear model: Y = β0 + β1·A + β2·B + β3·C + β12·AB + ...
- ANOVA to test significance (α = 0.05)
- Residual diagnostics (normality, constant variance, independence)
**Graphical Analysis**:
- Main effects plot
- Interaction plot
- Pareto chart of standardized effects
- Residual plots (normal probability, vs fitted, vs order)
**Decision Criteria**:
- Effects significant at p < 0.05 are considered important
- Interaction present if p(interaction) < 0.05
- Optimal settings chosen to [maximize/minimize] Y1 while [constraint on Y2]
**Follow-up**:
- If curvature detected → Run [response surface design]
- If additional factors identified → Run [screening design]
- Confirmation runs: [Number] at predicted optimum settings
## 8. Assumptions and Limitations
**Assumptions**:
- [Linear relationship between factors and response]
- [No strong higher-order interactions]
- [Homogeneous variance across factor space]
- [Errors are independent and normally distributed]
- [Process is stable during experiment]
**Limitations**:
- [Design resolution limits e.g., 2-way interactions confounded]
- [Factor range restrictions]
- [Measurement precision limits]
- [External validity generalization beyond tested region]
**Risks**:
- [What could invalidate results?]
- [Mitigation strategies]
## 9. Expected Outcomes
**If screening design**:
- Pareto chart identifying 3-5 critical factors from [N] candidates
- Effect size estimates with confidence intervals
- Shortlist for follow-up optimization experiment
**If optimization design**:
- Optimal factor settings: A = [value], B = [value], C = [value]
- Predicted response at optimum: Y1 = [value] ± [CI]
- Interaction insights: [Which factors interact? How?]
**If response surface**:
- Response surface equation: Y = [polynomial model]
- Contour/surface plots showing optimal region
- Sensitivity analysis showing robustness
**Deliverables**:
- This experiment plan document
- Completed design matrix with results (after execution)
- Analysis report with plots and recommendations
```
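The primary analysis in section 7 of the template takes only a few lines of code. Below is a minimal sketch assuming a completed 2³ factorial with coded factors A, B, C and one response Y1; the response values are illustrative placeholders, and statsmodels is just one library that can fit the model. The 3-way term is omitted so one degree of freedom remains for error.

```python
# Minimal sketch: fit the factorial model and run ANOVA on a completed 2^3
# design. Data values are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "A":  [-1,  1, -1,  1, -1,  1, -1,  1],
    "B":  [-1, -1,  1,  1, -1, -1,  1,  1],
    "C":  [-1, -1, -1, -1,  1,  1,  1,  1],
    "Y1": [62.1, 70.4, 64.8, 79.5, 60.3, 68.9, 63.2, 78.1],
})

# Y1 = b0 + b1*A + b2*B + b3*C + b12*AB + b13*AC + b23*BC
model = smf.ols("Y1 ~ A + B + C + A:B + A:C + B:C", data=df).fit()
print(anova_lm(model))  # significance tests per term

# In coded -1/+1 units, each main/interaction effect is twice its coefficient.
effects = 2 * model.params.drop("Intercept")
print(effects.sort_values(key=abs, ascending=False))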
---
## Objective Definition
**Screening**: Screen 12 software config parameters to identify 3-5 affecting API response time. Success: Reduce candidates 60%+. Constraint: Max 16 runs.
**Optimization**: Optimize injection molding (temp, pressure, time) to minimize defect rate while cycle time < 45s. Success: < 2% defects (currently 8%). Constraint: Max 20 runs, 2 days.
**Response Surface**: Map yield vs temperature/pH, find maximum, model curvature. Success: R² > 0.90 and an identified optimal region. Constraint: Max 15 runs.
---
## Factor Table Template
| Factor | Type | Low (-1) | High (+1) | Center (0) | Units | Rationale |
|--------|------|----------|-----------|------------|-------|-----------|
| A: Temperature | Control | 150°C | 200°C | 175°C | °C | Literature suggests 150-200 range optimal |
| B: Pressure | Control | 50 psi | 100 psi | 75 psi | psi | Equipment operates 50-100, nonlinear expected |
| C: Time | Control | 10 min | 30 min | 20 min | min | Longer times may improve but cost increases |
| D: Humidity | Noise | 30% | 70% | - | %RH | Uncontrollable environmental variation |
**Type definitions**:
- **Control**: Factors you can set deliberately in the experiment
- **Noise**: Factors that vary but can't be controlled (for robust design)
- **Held constant**: Factors fixed at one level (not in design)
**Level selection guidance**:
- **2 levels**: Start here for screening/optimization. Detects linear effects and interactions.
- **3 levels**: Add center point to detect curvature. Required for response surface designs.
- **Categorical**: Use coded values (-1, +1) for categories (e.g., Supplier A = -1, Supplier B = +1)
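For continuous factors, the coded and actual scales are linked by actual = center + coded × (high − low)/2. A small sketch of both conversions, using the temperature and pressure ranges from the example table above:

```python
# Convert between coded (-1..+1) and actual factor units. Ranges below mirror
# the illustrative factor table.
def coded_to_actual(coded: float, low: float, high: float) -> float:
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center + coded * half_range

def actual_to_coded(actual: float, low: float, high: float) -> float:
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (actual - center) / half_range

print(coded_to_actual(-1, 150, 200))  # 150.0 (low temperature level)
print(coded_to_actual(0, 150, 200))   # 175.0 (center point)
print(actual_to_coded(100, 50, 100))  # 1.0  (high pressure level)
```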
---
## Full Factorial Designs
**When to use**: 2-5 factors, want to estimate all main effects and interactions, budget allows 2^k runs.
**Design structure**: Test all combinations of factor levels.
**Example: 2³ factorial (3 factors, 2 levels each = 8 runs)**
| Run | A | B | C |
|-----|---|---|---|
| 1 | - | - | - |
| 2 | + | - | - |
| 3 | - | + | - |
| 4 | + | + | - |
| 5 | - | - | + |
| 6 | + | - | + |
| 7 | - | + | + |
| 8 | + | + | + |
**Advantages**:
- Estimates all main effects and 2-way/3-way interactions
- No confounding
- Maximum precision for effect estimates, since every run contributes to every estimate
**Limitations**:
- Runs grow exponentially: 2³ = 8, 2⁴ = 16, 2⁵ = 32, 2⁶ = 64
- Inefficient for screening (wastes runs on unimportant factors)
**Add center points**: Replicate 3-5 runs at center (0, 0, 0) to detect curvature and estimate pure error.
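Generating a full 2^k design is mechanical: enumerate all level combinations, then append center-point replicates. A minimal sketch in coded units, with placeholder factor names:

```python
# Enumerate all 2^k runs in coded -1/+1 units, then append center points
# (all factors at 0) for curvature detection and pure-error estimation.
from itertools import product

factors = ["A", "B", "C"]
design = [dict(zip(factors, levels))
          for levels in product([-1, +1], repeat=len(factors))]
design += [{f: 0 for f in factors} for _ in range(4)]  # 4 center replicates

for i, run in enumerate(design, start=1):
    print(i, run)
```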
---
## Fractional Factorial Designs
**When to use**: 5+ factors, limited budget, willing to sacrifice some interaction information.
**Design structure**: Test a fraction (1/2, 1/4, 1/8) of full factorial, deliberately confounding higher-order interactions.
**Example: 2⁵⁻¹ design (5 factors, 16 runs instead of 32)**
**Resolution V**: Main effects and 2-way interactions are clear of each other; 2-way interactions are aliased only with 3-way interactions (the defining word ABCDE has length 5 — see the confounding structure below).
| Run | A | B | C | D | E |
|-----|---|---|---|---|---|
| 1 | - | - | - | - | + |
| 2 | + | - | - | - | - |
| 3 | - | + | - | - | - |
| 4 | + | + | - | - | + |
| 5 | - | - | + | - | - |
| ... | ... | ... | ... | ... | ... |
**Generator**: E = ABCD (defining relation: I = ABCDE)
**Confounding structure**:
- A confounded with BCDE
- AB confounded with CDE
- ABC confounded with DE
**Resolution levels**:
- **Resolution III**: Main effects confounded with 2-way interactions. Use for screening only.
- **Resolution IV**: Main effects clear, 2-way confounded with 2-way. Good for screening + some optimization.
- **Resolution V**: Main effects and 2-way clear, 2-way confounded with 3-way. Preferred for optimization.
**Choosing fraction**: Use standard designs (tables available) or design software to ensure desired resolution.
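The construction itself is simple: build the full factorial in the base factors, then compute the generated column as a product. A sketch for the 2⁵⁻¹ design above, assuming numpy (row order may differ from the table, but the 16 runs are identical):

```python
# Build a 2^(5-1) design from its generator: full 2^4 in A-D, then E = ABCD
# as the elementwise product of the coded columns.
import numpy as np
from itertools import product

base = np.array(list(product([-1, 1], repeat=4)))  # full 2^4 in A, B, C, D
E = base.prod(axis=1, keepdims=True)               # generator E = ABCD
design = np.hstack([base, E])                      # 16 runs x 5 columns
print(design)
```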
---
## Plackett-Burman Screening
**When to use**: Screen 8-15 factors with minimal runs, only care about main effects.
**Design structure**: Orthogonal design with runs = next multiple of 4 above number of factors.
**Example: 12-run Plackett-Burman for up to 11 factors**
| Run | A | B | C | D | E | F | G | H | J | K | L |
|-----|---|---|---|---|---|---|---|---|---|---|---|
| 1 | + | + | - | + | + | + | - | - | - | + | - |
| 2 | + | - | + | + | + | - | - | - | + | - | + |
| 3 | - | + | + | + | - | - | - | + | - | + | + |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
**Advantages**:
- Very efficient: Screen 11 factors in 12 runs (vs 2048 for full factorial)
- Main effects estimated independently
**Limitations**:
- 2-way interactions are partially aliased with main effects (complex alias structure)
- Only use when interactions unlikely or unimportant
- Cannot estimate interactions
**Use case**: Early-stage screening to reduce 15 candidates to 4-5 for follow-up factorial design.
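The 12-run design can be generated from its standard generating row (row 1 of the table above) by cyclic shifts, plus a final all-minus row. A minimal sketch:

```python
# Generate the 12-run Plackett-Burman design: 11 cyclic shifts of the
# standard generating row, then an all-minus 12th run. Columns map to
# factors A-L as in the table above.
first_row = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]
design = [first_row[i:] + first_row[:i] for i in range(11)]  # cyclic shifts
design.append([-1] * 11)                                     # run 12: all low

for run in design:
    print(" ".join("+" if level > 0 else "-" for level in run))
```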
---
## Design Matrix Examples
**Format**: Each row = one run. **Columns**: Run (design point #), Order (randomized sequence), Block (if used), Factors (coded -1/0/+1 or actual values), Responses (blank until execution), Notes (observations).
| Run | Order | A: Temp (°C) | B: Press (psi) | C: Time (min) | Y1: Yield (%) | Y2: Cost ($) |
|-----|-------|--------------|----------------|---------------|---------------|--------------|
| 1 | 3 | 150 | 50 | 10 | | |
| 2 | 7 | 200 | 50 | 10 | | |
| 3 | 1 | 150 | 100 | 10 | | |
| 4 | 5 | 200 | 100 | 10 | | |
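A sketch assembling this matrix as a pandas DataFrame, with actual factor units and response columns left empty until execution (column names mirror the illustrative table):

```python
# Design matrix in actual units; responses stay NA until runs are executed.
import pandas as pd

matrix = pd.DataFrame({
    "Run":       [1, 2, 3, 4],
    "Order":     [3, 7, 1, 5],
    "Temp_C":    [150, 200, 150, 200],
    "Press_psi": [50, 50, 100, 100],
    "Time_min":  [10, 10, 10, 10],
})
matrix["Yield_pct"] = pd.NA  # Y1, recorded during execution
matrix["Cost_usd"] = pd.NA   # Y2, recorded during execution
print(matrix.sort_values("Order"))  # view in execution order
```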
---
## Execution Details
**Randomization**: Eliminates bias from time trends/drift. Method: (1) List runs, (2) Assign random numbers, (3) Sort by random number = execution order, (4) Document both orders. Exception: Don't randomize hard-to-change factors (use split-plot design, see methodology.md).
**Blocking**: Use when runs span days/batches/operators. Method: Divide into 2-4 balanced blocks, randomize within each, analyze with block as factor. Example: 16 runs over 2 days → 2 blocks of 8.
**Replication**: True replication (repeat entire run), repeated measures (multiple measurements per run), or center points (3-5 replicates at center for pure error). Guidance: Always include 3-5 center points for continuous factors.
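A minimal sketch of the randomization and blocking steps, using numpy. Note that in a real factorial, block membership is normally assigned by confounding a high-order interaction with blocks; the simple even split below is a placeholder.

```python
# Randomize run order within balanced blocks (e.g., 16 runs over 2 days).
import numpy as np

rng = np.random.default_rng(seed=42)       # fixed seed only for reproducibility
design_points = np.arange(1, 17)           # "Run" column for a 16-run design
blocks = np.array_split(design_points, 2)  # two balanced blocks of 8

for block_id, block in enumerate(blocks, start=1):
    order = rng.permutation(block)         # randomize within each block
    print(f"Block {block_id} execution order: {order.tolist()}")
```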
---
## Quality Checklist
Before finalizing the experiment plan, verify:
**Objective & Scope**:
- [ ] Goal clearly stated (screening | optimization | response surface | robust)
- [ ] Success criteria are measurable and realistic
- [ ] Constraints documented (runs, time, cost)
**Factors**:
- [ ] All important factors included
- [ ] Levels span meaningful range (not too narrow, not outside feasible region)
- [ ] Factor types identified (control vs noise)
- [ ] Rationale for each factor documented
**Responses**:
- [ ] Responses are objective and quantitative
- [ ] Measurement method specified and validated
- [ ] Target direction clear (maximize | minimize | hit target)
**Design**:
- [ ] Design type appropriate for objective and budget
- [ ] Design resolution adequate (e.g., Resolution IV+ if interactions matter)
- [ ] Run count justified (power analysis or practical limit)
- [ ] Design matrix correct (orthogonal, balanced)
**Execution**:
- [ ] Randomization method specified
- [ ] Blocking used if runs span nuisance variable levels
- [ ] Replication plan documented (center points, full replicates)
- [ ] Protocol detailed enough for independent execution
- [ ] Timeline realistic
**Analysis**:
- [ ] Analysis plan specified before data collection
- [ ] Significance level (α) stated
- [ ] Decision criteria clear
- [ ] Residual diagnostics planned
- [ ] Follow-up strategy identified
**Assumptions & Risks**:
- [ ] Key assumptions stated explicitly
- [ ] Limitations acknowledged (resolution, range, measurement)
- [ ] Risks identified with mitigation plans