---
name: power-analysis
description: "Calculate statistical power and required sample sizes for research studies. Use when: (1) Designing experiments to determine sample size, (2) Justifying sample size for grant proposals or protocols, (3) Evaluating adequacy of existing studies, (4) Meeting NIH rigor standards for pre-registration, (5) Conducting retrospective power analysis to interpret null results."
allowed-tools: Read, Write
version: 1.0.0
---
# Statistical Power Analysis Skill
## Purpose
Calculate statistical power and determine required sample sizes for research studies. Essential for experimental design, grant writing, and meeting NIH rigor and reproducibility standards.
## Core Concepts
### Statistical Power
**Definition:** Probability of detecting a true effect when it exists (1 - β)
**Standard:** Power ≥ 0.80 (80%) is typically required for NIH grants and pre-registration
### Key Parameters
1. **Effect Size (d, r, η²)** - Magnitude of the phenomenon
2. **Alpha (α)** - Type I error rate (typically 0.05)
3. **Power (1-β)** - Probability of detecting effect (typically 0.80)
4. **Sample Size (N)** - Number of participants/observations needed
### The Relationship
```
Power = f(Effect Size, Sample Size, Alpha, Test Type)
For given effect size and alpha:
↑ Sample Size → ↑ Power
↑ Effect Size → ↓ Sample Size needed
```
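This relationship can be sketched numerically in a few lines of standard-library Python (a minimal illustration using the normal approximation to the two-sample t-test; the helper name is made up for this example):

```python
from statistics import NormalDist

def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed independent-samples t-test
    (normal approximation; ignores the t-distribution's df)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # 1.96 for alpha = 0.05
    ncp = d * (n_per_group / 2) ** 0.5       # noncentrality of the test statistic
    return z.cdf(ncp - z_crit)

# Holding d = 0.5 and alpha = 0.05, power climbs with sample size:
for n in (20, 40, 64, 100):
    print(n, round(approx_power_two_sample(0.5, n), 2))  # 0.35, 0.61, 0.81, 0.94
```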
## When to Use This Skill
### Pre-Study (Prospective Power Analysis)
1. **Grant Proposals** - Justify requested sample size
2. **Study Design** - Determine recruitment needs
3. **Pre-Registration** - Document planned sample size with justification
4. **Resource Planning** - Estimate time and cost requirements
5. **Ethical Review** - Minimize participants while maintaining power
### Post-Study (Retrospective/Sensitivity Analysis)
1. **Null Results** - Was study adequately powered?
2. **Publication** - Report achieved power
3. **Meta-Analysis** - Assess individual study adequacy
4. **Study Critique** - Evaluate power of published work
## Common Study Designs
### 1. Independent Samples T-Test
**Use:** Compare two independent groups
**Formula:**
```
N per group = 2 * (z_α/2 + z_β)² / d²
Where:
- d = Cohen's d, the standardized mean difference (μ₁ - μ₂) / σ
- α = significance level (typically 0.05)
- β = Type II error rate (1 - power)
(If using the raw mean difference Δ instead of d, multiply the
numerator by σ², the pooled variance.)
```
**Example:**
```
Research Question: Does intervention improve test scores vs. control?
Effect Size: d = 0.5 (medium effect)
Alpha: 0.05
Power: 0.80
Result: N = 64 per group (128 total)
```
**Effect Size Guidelines (Cohen's d):**
- Small: d = 0.2
- Medium: d = 0.5
- Large: d = 0.8
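Inverting the power relationship gives required N directly. A minimal standard-library sketch (function name illustrative); note that the normal approximation comes in just under the exact t-based answer:

```python
from math import ceil
from statistics import NormalDist

def n_per_group_two_sample(d, alpha=0.05, power=0.80):
    """Per-group N for a two-tailed independent-samples t-test,
    normal approximation: n = 2 * (z_a/2 + z_b)^2 / d^2."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * z_sum ** 2 / d ** 2)

print(n_per_group_two_sample(0.5))  # 63; exact noncentral-t tools (G*Power) give 64
```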
### 2. Paired Samples T-Test
**Use:** Pre-post comparisons, matched pairs
**Formula:**
```
N = (z_α/2 + z_β)² * 2(1-ρ) / d²
Where ρ = correlation between measures
```
**Example:**
```
Research Question: Does training improve performance (pre-post)?
Effect Size: d = 0.6
Correlation: ρ = 0.5 (moderate test-retest reliability)
Alpha: 0.05
Power: 0.80
Result: N = 24 participants
```
**Key Insight:** Higher correlation → fewer participants needed
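The effect of ρ is easy to see in a quick sketch (normal approximation, illustrative helper name; exact t-based tools give slightly larger N, e.g., 24 rather than 22 for the example above):

```python
from math import ceil
from statistics import NormalDist

def n_paired(d, rho, alpha=0.05, power=0.80):
    """N for a two-tailed paired t-test via the normal approximation,
    with d the raw-score effect size and rho the pre-post correlation."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(z_sum ** 2 * 2 * (1 - rho) / d ** 2)

# Fixing d = 0.6: stronger test-retest correlation shrinks the sample:
for rho in (0.3, 0.5, 0.7):
    print(rho, n_paired(0.6, rho))  # 31, 22, 14
```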
### 3. One-Way ANOVA
**Use:** Compare 3+ independent groups
**Formula:**
```
N per group = λ / (k * f²)
Where:
- k = number of groups
- f = effect size (Cohen's f)
- λ = required noncentrality parameter for the F test, which depends
  on numerator df = k - 1 (from tables or software; λ ≈ 10.90 for
  df = 3, α = 0.05, power = 0.80)
```
**Example:**
```
Research Question: Compare 4 treatment conditions
Effect Size: f = 0.25 (medium)
Alpha: 0.05
Power: 0.80
Result: N = 45 per group (180 total)
```
**Effect Size Guidelines (Cohen's f):**
- Small: f = 0.10
- Medium: f = 0.25
- Large: f = 0.40
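A sketch of this calculation, assuming Cohen's tabled λ values for power = 0.80 at α = 0.05 (keyed by numerator df = k - 1; the table and helper name here are illustrative):

```python
from math import ceil

# Approximate required noncentrality (lambda) for power = 0.80 at
# alpha = 0.05, keyed by numerator df = k - 1 (Cohen, 1988 tables).
LAMBDA_80 = {1: 7.85, 2: 9.64, 3: 10.90, 4: 11.94}

def n_per_group_anova(k, f):
    """Approximate per-group N for a one-way ANOVA with k groups:
    n = lambda / (k * f^2)."""
    return ceil(LAMBDA_80[k - 1] / (k * f ** 2))

print(n_per_group_anova(4, 0.25))  # 44; G*Power's exact computation gives 45
```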
### 4. Chi-Square Test
**Use:** Association between categorical variables
**Formula:**
```
N = λ / w²
Where:
- w = effect size (Cohen's w)
- λ = required noncentrality for a chi-square test with the table's
  df (for df = 1, λ = (z_α/2 + z_β)²)
```
**Example:**
```
Research Question: Is treatment success related to group (2x2 table)?
Effect Size: w = 0.3 (medium)
Alpha: 0.05
Power: 0.80
df = 1
Result: N = 88 total participants
```
**Effect Size Guidelines (Cohen's w):**
- Small: w = 0.10
- Medium: w = 0.30
- Large: w = 0.50
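For a 2x2 table (df = 1) the required noncentrality reduces to (z_α/2 + z_β)², so the calculation fits in a few standard-library lines (helper name illustrative):

```python
from math import ceil
from statistics import NormalDist

def n_chi_square_df1(w, alpha=0.05, power=0.80):
    """Total N for a chi-square test of association with df = 1,
    where the required noncentrality is (z_a/2 + z_b)^2."""
    z = NormalDist()
    lam = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return ceil(lam / w ** 2)

print(n_chi_square_df1(0.3))  # 88, matching the example above
```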
### 5. Correlation
**Use:** Relationship between continuous variables
**Formula:**
```
N = (z_α/2 + z_β)² / [C(r)]² + 3
Where C(r) = 0.5 * ln((1+r)/(1-r)) [Fisher's z]
```
**Example:**
```
Research Question: Correlation between anxiety and performance
Expected r: 0.30
Alpha: 0.05
Power: 0.80
Result: N = 84 participants
```
**Effect Size Guidelines (Pearson's r):**
- Small: r = 0.10
- Medium: r = 0.30
- Large: r = 0.50
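A standard-library sketch of the Fisher-z calculation (helper name illustrative); the approximation lands within one participant of the exact answer:

```python
from math import ceil, log
from statistics import NormalDist

def n_correlation(r, alpha=0.05, power=0.80):
    """N to detect a correlation of r via Fisher's z transform."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    c = 0.5 * log((1 + r) / (1 - r))   # Fisher's z transform of r
    return ceil(z_sum ** 2 / c ** 2 + 3)

print(n_correlation(0.30))  # 85; exact methods (G*Power) give 84
```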
### 6. Multiple Regression
**Use:** Predict outcome from multiple predictors
**Formula:**
```
N = (L / f²) + k + 1
Where:
- L = tabled noncentrality value (Cohen's L; e.g., L ≈ 12.83 for
  k = 5, α = 0.05, power = 0.80)
- f² = effect size (Cohen's f²)
- k = number of predictors
```
**Example:**
```
Research Question: Predict depression from 5 variables
Effect Size: f² = 0.15 (medium)
Alpha: 0.05
Power: 0.80
Predictors: 5
Result: N = 92 participants
```
**Effect Size Guidelines (f²):**
- Small: f² = 0.02
- Medium: f² = 0.15
- Large: f² = 0.35
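Using Cohen's tabled L values (power = 0.80, α = 0.05, keyed by number of predictors; the table and helper name here are illustrative), the formula reproduces the example:

```python
from math import ceil

# Cohen's (1988) tabled L values for power = 0.80 at alpha = 0.05,
# keyed by number of predictors (numerator df).
L_80 = {1: 7.85, 2: 9.64, 3: 10.90, 4: 11.94, 5: 12.83}

def n_regression(k, f2):
    """Approximate N for multiple regression with k predictors:
    N = L / f^2 + k + 1."""
    return ceil(L_80[k] / f2 + k + 1)

print(n_regression(5, 0.15))  # 92, matching the example above
```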
## Choosing Effect Sizes
### Method 1: Previous Literature
**Best Practice:** Use meta-analytic estimates
```
Example: Meta-analysis shows d = 0.45 for CBT vs. control
Use: d = 0.45 for power calculation
```
### Method 2: Pilot Study
```
Run small pilot (N = 20-30)
Calculate observed effect size
Adjust for uncertainty (use 80% of observed)
```
### Method 3: Minimum Meaningful Effect
```
Question: What's the smallest effect worth detecting?
Clinical: Minimum clinically important difference (MCID)
Practical: Cost-benefit threshold
```
### Method 4: Cohen's Conventions
**Use only when no better information available:**
- Small effects: Often require N > 300
- Medium effects: Typically N = 50-100 per group
- Large effects: May need only N = 20-30 per group
**Warning:** Cohen's conventions are rough guidelines, not universal truths!
## NIH Rigor Standards
### Required Elements for NIH Grants
1. **Justified Effect Size**
- Cite source (literature, pilot, theory)
- Explain why this effect size is reasonable
- Consider range of plausible values
2. **Power Calculation**
- Show formula or software used
- Report all assumptions
- Calculate required N
3. **Sensitivity Analysis**
- Show power across range of effect sizes
- Demonstrate study is adequately powered
4. **Accounting for Attrition**
```
N_recruit = N_required / (1 - expected_attrition_rate)
Example:
N_required = 100
Expected attrition = 20%
N_recruit = 100 / 0.80 = 125
```
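A two-line helper covers the inflation step (name illustrative; `math.ceil` rounds up so the analyzable N is never undershot):

```python
from math import ceil

def n_to_recruit(n_required, attrition_rate):
    """Inflate the analyzable sample to cover expected dropout."""
    return ceil(n_required / (1 - attrition_rate))

print(n_to_recruit(100, 0.20))  # 125, as in the example above
print(n_to_recruit(128, 0.20))  # 160 (the t-test example at 20% attrition)
```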
### Sample Power Analysis Section (Grant)
```
Sample Size Justification
We require 128 participants (64 per group) to detect a medium
effect (d = 0.50) with 80% power at α = 0.05 using an independent
samples t-test. This effect size is based on our pilot study (N = 30),
which showed d = 0.62, and is consistent with meta-analytic estimates
for similar interventions (Smith et al., 2023; d = 0.48, 95% CI
[0.35, 0.61]).
We calculated the required sample size using G*Power 3.1, assuming
a two-tailed test. To account for anticipated 20% attrition based on
previous studies in this population, we will recruit 160 participants
(80 per condition).
Sensitivity analysis shows our analyzed sample (64 per group) provides:
- 92% power to detect d = 0.60
- 80% power to detect d = 0.50 (medium effect)
- 61% power to detect d = 0.40 (small-to-medium effect)
This ensures adequate power while minimizing participant burden and
research costs.
```
## Tools and Software
### G*Power (Free)
- **Use:** Most common power analysis tool
- **Pros:** User-friendly, comprehensive test coverage
- **Cons:** Desktop only, manual calculations
- **Download:** https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower
### R (pwr package)
```R
library(pwr)
# Independent t-test
pwr.t.test(d = 0.5, power = 0.80, sig.level = 0.05, type = "two.sample")
# ANOVA
pwr.anova.test(k = 4, f = 0.25, power = 0.80, sig.level = 0.05)
# Correlation
pwr.r.test(r = 0.30, power = 0.80, sig.level = 0.05)
```
### Python (statsmodels)
```python
from statsmodels.stats.power import ttest_power
# Calculate power
power = ttest_power(effect_size=0.5, nobs=64, alpha=0.05)
# Calculate required N
from statsmodels.stats.power import tt_ind_solve_power
n = tt_ind_solve_power(effect_size=0.5, power=0.8, alpha=0.05)
```
### Other Tools
- **PASS** (commercial software): https://www.ncss.com/software/pass/
- **PS: Power and Sample Size:** https://sph.umich.edu/biostat/power-and-sample-size.html
## Common Pitfalls
### Pitfall 1: Using Post-Hoc Power
❌ **Wrong:** Calculate power after null result using observed effect
**Problem:** Will always show low power for null results (circular reasoning)
✅ **Right:** Report sensitivity analysis (what effects were you powered to detect?)
### Pitfall 2: Underpowered Studies
❌ **Wrong:** "We had N=20, but it was a pilot study"
**Problem:** Even pilots should have defensible sample sizes
✅ **Right:** Use sequential design, specify stopping rules, or acknowledge limitation
### Pitfall 3: Overpowered Studies
❌ **Wrong:** N = 1000 to detect tiny effect (d = 0.1)
**Problem:** Wastes resources, detects trivial effects
✅ **Right:** Power for smallest meaningful effect, not just statistical significance
### Pitfall 4: Ignoring Multiple Comparisons
❌ **Wrong:** Power calculation for single test, but running 10 tests
**Problem:** Actual power much lower due to multiple testing correction
✅ **Right:** Adjust alpha or specify primary vs. secondary outcomes
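The cost of a correction is easy to quantify: with a Bonferroni-adjusted α = .05 / 10 = .005, the two-sample normal-approximation formula (helper name illustrative) asks for substantially more participants per group:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group N for a two-tailed two-sample test (normal approximation)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * z_sum ** 2 / d ** 2)

# Powering a single test vs. Bonferroni-correcting 10 tests (d = 0.5):
print(n_per_group(0.5, alpha=0.05))   # 63 per group
print(n_per_group(0.5, alpha=0.005))  # 107 per group
```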
### Pitfall 5: Wrong Test Type
❌ **Wrong:** Power for independent t-test, but actually paired
**Problem:** Wrong sample size (paired designs often need fewer)
✅ **Right:** Match power calculation to actual analysis plan
## Reporting Power Analysis
### In Pre-Registration
```
Sample Size Determination:
- Statistical test: Independent samples t-test (two-tailed)
- Effect size: d = 0.50 (based on [citation])
- Alpha: 0.05
- Power: 0.80
- Required N: 64 per group (128 total)
- Accounting for 15% attrition: 152 recruited (76 per group)
- Power analysis conducted using G*Power 3.1.9.7
```
### In Manuscript Methods
```
We determined our sample size a priori using power analysis. To detect
a medium effect (Cohen's d = 0.50) with 80% power at α = 0.05
(two-tailed), we required 64 participants per group (G*Power 3.1).
Accounting for anticipated 15% attrition, we recruited 152 participants (76 per group).
```
### In Results (for Null Findings)
```
Although we found no significant effect (p = .23), sensitivity analysis
shows our study had 80% power to detect effects of d ≥ 0.50 and 55%
power for d = 0.40. Our 95% CI for the effect size was d = -0.12 to 0.38,
excluding large effects but not ruling out small-to-medium effects.
```
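Sensitivity statements like the one above can be generated by solving the power formula for d instead of N (normal approximation; helper name illustrative):

```python
from statistics import NormalDist

def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized effect a two-sample design detects at the
    stated power (normal approximation, two-tailed)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_sum * (2 / n_per_group) ** 0.5

print(round(min_detectable_d(64), 2))  # 0.5: 64 per group has 80% power for d >= 0.50
```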
## Integration with Research Workflow
### With Experiment-Designer Agent
```
Agent uses power-analysis skill to:
1. Calculate required sample size
2. Justify N in protocol
3. Generate power analysis section for pre-registration
4. Create sensitivity analysis plots
```
### With NIH-Validator
```
Validator checks:
- Power ≥ 80%
- Effect size justified with citation
- Attrition accounted for
- Analysis plan matches power calculation
```
### With Statistical Analysis
```
After analysis:
1. Report achieved power or sensitivity analysis
2. Calculate confidence intervals around effect size
3. Interpret results in context of statistical power
```
## Examples
### Example 1: RCT Power Analysis
```
Design: Randomized controlled trial, two groups
Outcome: Depression scores (continuous)
Analysis: Independent samples t-test
Literature: Meta-analysis shows d = 0.55 for CBT vs. waitlist
Conservative estimate: d = 0.50
Alpha: 0.05 (two-tailed)
Power: 0.80
Calculation (G*Power):
Input: t-test, independent samples, d = 0.5, α = 0.05, power = 0.80
Output: N = 64 per group (128 total)
With 20% attrition: 160 recruited (80 per group)
```
### Example 2: Within-Subjects Design
```
Design: Pre-post intervention
Outcome: Anxiety scores
Analysis: Paired t-test
Pilot data: d = 0.70, r = 0.60
Alpha: 0.05
Power: 0.80
Calculation (converting to dz = d / √(2(1 - ρ)) = 0.70 / 0.89 ≈ 0.78):
N = 15 participants
With 15% attrition: 18 recruited
```
### Example 3: Factorial ANOVA
```
Design: 2x3 factorial (Treatment x Time)
Analysis: Two-way ANOVA
Effect of interest: Interaction (Treatment × Time)
Effect size: f = 0.25 (medium)
Alpha: 0.05
Power: 0.80
Calculation (interaction numerator df = (2-1)(3-1) = 2):
N = 27 per cell (162 total for 6 cells)
With 10% attrition: 180 recruited (30 per cell)
```
---
**Last Updated:** 2025-11-09
**Version:** 1.0.0