---
name: power-analysis
description: "Calculate statistical power and required sample sizes for research studies. Use when: (1) Designing experiments to determine sample size, (2) Justifying sample size for grant proposals or protocols, (3) Evaluating adequacy of existing studies, (4) Meeting NIH rigor standards for pre-registration, (5) Conducting retrospective power analysis to interpret null results."
allowed-tools: Read, Write
version: 1.0.0
---
# Statistical Power Analysis Skill
## Purpose
Calculate statistical power and determine required sample sizes for research studies. Essential for experimental design, grant writing, and meeting NIH rigor and reproducibility standards.
## Core Concepts
### Statistical Power
**Definition:** Probability of detecting a true effect when it exists (1 - β)
**Standard:** Power ≥ 0.80 (80%) is the conventional minimum expected for NIH grants and pre-registration
### Key Parameters
1. **Effect Size (d, r, η²)** - Magnitude of the phenomenon
2. **Alpha (α)** - Type I error rate (typically 0.05)
3. **Power (1-β)** - Probability of detecting effect (typically 0.80)
4. **Sample Size (N)** - Number of participants/observations needed
### The Relationship
```
Power = f(Effect Size, Sample Size, Alpha, Test Type)
For given effect size and alpha:
↑ Sample Size → ↑ Power
↑ Effect Size → ↓ Sample Size needed
```
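The relationship above can be sketched numerically. The following is a minimal illustration, assuming a two-tailed independent-samples comparison and a normal approximation to the t-test; the function name `approx_power_two_sample` is hypothetical and not part of any library.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate two-tailed power for comparing two independent groups.

    Normal approximation (assumption): the tiny chance of rejecting in the
    wrong direction is ignored, so results sit slightly above exact t-based power.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # critical value for the test
    shift = d * sqrt(n_per_group / 2)      # expected value of the test statistic
    return z.cdf(shift - z_crit)


# Power rises with sample size for a fixed effect size and alpha
for n in (20, 64, 100):
    print(n, round(approx_power_two_sample(0.5, n), 2))
```

Holding `n_per_group` fixed and increasing `d` or `alpha` raises the returned power in the same way.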
## When to Use This Skill
### Pre-Study (Prospective Power Analysis)
1. **Grant Proposals** - Justify requested sample size
2. **Study Design** - Determine recruitment needs
3. **Pre-Registration** - Document planned sample size with justification
4. **Resource Planning** - Estimate time and cost requirements
5. **Ethical Review** - Minimize participants while maintaining power
### Post-Study (Retrospective/Sensitivity Analysis)
1. **Null Results** - Was study adequately powered?
2. **Publication** - Report a sensitivity analysis (smallest effect detectable with the achieved N)
3. **Meta-Analysis** - Assess individual study adequacy
4. **Study Critique** - Evaluate power of published work
## Common Study Designs
### 1. Independent Samples T-Test
**Use:** Compare two independent groups
**Formula:**
```
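N per group = 2 * (z_α/2 + z_β)² / d²
Where:
- d = effect size (Cohen's d = mean difference / pooled SD)
- α = significance level (typ. 0.05)
- β = Type II error (1 - power)
- z = standard normal quantile (normal approximation; exact t-based methods add about 1 per group)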
```
**Example:**
```
Research Question: Does intervention improve test scores vs. control?
Effect Size: d = 0.5 (medium effect)
Alpha: 0.05
Power: 0.80
Result: N = 64 per group (128 total)
```
**Effect Size Guidelines (Cohen's d):**
- Small: d = 0.2
- Medium: d = 0.5
- Large: d = 0.8
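As a rough check on the numbers in this section, the sketch below computes the per-group sample size from standard normal quantiles. It assumes a two-tailed test with equal group sizes and adds a commonly used small-sample correction term (z_α/2² / 4) that is not part of the text above; `n_per_group_two_sample` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_sample(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-tailed independent-samples t-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n_normal = 2 * (z_alpha + z_beta) ** 2 / d ** 2   # normal approximation
    n_corrected = n_normal + z_alpha ** 2 / 4         # assumed t-distribution correction
    return ceil(n_corrected)


print(n_per_group_two_sample(0.5))  # 64
print(n_per_group_two_sample(0.2))  # 394
print(n_per_group_two_sample(0.8))  # 26
```

Dedicated tools (G*Power, `pwr`, `statsmodels`) solve the noncentral t distribution directly and should be preferred for anything that will be reported.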
### 2. Paired Samples T-Test
**Use:** Pre-post comparisons, matched pairs
**Formula:**
```
N = (z_α/2 + z_β)² * 2(1-ρ) / d²
Where:
- d = standardized mean difference (mean change / SD of the raw scores)
- ρ = correlation between measures
```
**Example:**
```
Research Question: Does training improve performance (pre-post)?
Effect Size: d = 0.6
Correlation: ρ = 0.5 (moderate test-retest reliability)
Alpha: 0.05
Power: 0.80
Result: N = 24 participants
```
**Key Insight:** Higher correlation → fewer participants needed
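The effect of the correlation can be illustrated with a short sketch. It assumes equal variances at both time points, converts d to the effect size of the difference scores (d_z = d / sqrt(2(1 - ρ))), and adds an assumed small-sample correction of z_α/2² / 2; `n_pairs` is a hypothetical helper name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_pairs(d, rho, alpha=0.05, power=0.80):
    """Approximate number of pairs for a two-tailed paired t-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    d_z = d / sqrt(2 * (1 - rho))                      # effect size of the differences
    n = (z_alpha + z_beta) ** 2 / d_z ** 2 + z_alpha ** 2 / 2
    return ceil(n)


print(n_pairs(0.6, 0.5))  # 24
# Higher correlation between the two measurements lowers the required N
for rho in (0.2, 0.5, 0.8):
    print(rho, n_pairs(0.6, rho))
```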
### 3. One-Way ANOVA
**Use:** Compare 3+ independent groups
**Formula:**
```
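λ = f² * k * n
Power = P(F'(k - 1, k(n - 1); λ) > F_crit(α; k - 1, k(n - 1)))
Where:
- k = number of groups
- n = sample size per group (solve for the smallest n that reaches the target power)
- f = effect size (Cohen's f)
- λ = noncentrality parameter of the noncentral F distribution F'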
```
**Example:**
```
Research Question: Compare 4 treatment conditions
Effect Size: f = 0.25 (medium)
Alpha: 0.05
Power: 0.80
Result: N = 45 per group (180 total)
```
**Effect Size Guidelines (Cohen's f):**
- Small: f = 0.10
- Medium: f = 0.25
- Large: f = 0.40
### 4. Chi-Square Test
**Use:** Association between categorical variables
**Formula:**
```
N = λ / w²
Where:
- w = effect size (Cohen's w)
- df = degrees of freedom
- λ = noncentrality parameter giving the target power for a noncentral χ²(df);
  for df = 1, λ = (z_α/2 + z_β)²
```
**Example:**
```
Research Question: Is treatment success related to group (2x2 table)?
Effect Size: w = 0.3 (medium)
Alpha: 0.05
Power: 0.80
df = 1
Result: N = 88 total participants
```
**Effect Size Guidelines (Cohen's w):**
- Small: w = 0.10
- Medium: w = 0.30
- Large: w = 0.50
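For the 2x2 case (df = 1) the calculation reduces to normal quantiles, which the sketch below uses. It covers only df = 1; larger tables need the noncentral chi-square distribution and are better handled by dedicated software. `n_total_chi_square_df1` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def n_total_chi_square_df1(w, alpha=0.05, power=0.80):
    """Approximate total N for a chi-square test with df = 1 (e.g. a 2x2 table)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    noncentrality = (z_alpha + z_beta) ** 2   # valid for df = 1 only
    return ceil(noncentrality / w ** 2)


print(n_total_chi_square_df1(0.3))  # 88
```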
### 5. Correlation
**Use:** Relationship between continuous variables
**Formula:**
```
N = ((z_α/2 + z_β) / C(r))² + 3
Where C(r) = 0.5 * ln((1+r)/(1-r)) [Fisher's z]
```
**Example:**
```
Research Question: Correlation between anxiety and performance
Expected r: 0.30
Alpha: 0.05
Power: 0.80
Result: N = 84 participants (exact calculation; the Fisher z approximation above gives 85)
```
**Effect Size Guidelines (Pearson's r):**
- Small: r = 0.10
- Medium: r = 0.30
- Large: r = 0.50
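The Fisher z calculation for a correlation is short enough to sketch directly. This assumes a two-tailed test of r against zero; `n_for_correlation` is a hypothetical helper name. For r = 0.30 the approximation rounds up to 85, one more than the value that exact routines typically report.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N to detect correlation r using Fisher's z transformation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    fisher_z = atanh(r)                     # same as 0.5 * ln((1 + r) / (1 - r))
    return ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


print(n_for_correlation(0.30))  # 85
```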
### 6. Multiple Regression
**Use:** Predict outcome from multiple predictors
**Formula:**
```
N = (L / f²) + k + 1
Where:
- L = non-centrality parameter (depends on α, power, and k; obtained from tables or software)
- f² = effect size (Cohen's f²)
- k = number of predictors
```
**Example:**
```
Research Question: Predict depression from 5 variables
Effect Size: f² = 0.15 (medium)
Alpha: 0.05
Power: 0.80
Predictors: 5
Result: N = 92 participants
```
**Effect Size Guidelines (f²):**
- Small: f² = 0.02
- Medium: f² = 0.15
- Large: f² = 0.35
## Choosing Effect Sizes
### Method 1: Previous Literature
**Best Practice:** Use meta-analytic estimates
```
Example: Meta-analysis shows d = 0.45 for CBT vs. control
Use: d = 0.45 for power calculation
```
### Method 2: Pilot Study
```
Run small pilot (N = 20-30)
Calculate observed effect size
Adjust for uncertainty (use 80% of observed)
```
### Method 3: Minimum Meaningful Effect
```
Question: What's the smallest effect worth detecting?
Clinical: Minimum clinically important difference (MCID)
Practical: Cost-benefit threshold
```
### Method 4: Cohen's Conventions
**Use only when no better information available:**
- Small effects: Often require N > 300 per group
- Medium effects: Typically N = 50-100 per group
- Large effects: May need only N = 20-30 per group
**Warning:** Cohen's conventions are rough guidelines, not universal truths!
## NIH Rigor Standards
### Required Elements for NIH Grants
1. **Justified Effect Size**
- Cite source (literature, pilot, theory)
- Explain why this effect size is reasonable
- Consider range of plausible values
2. **Power Calculation**
- Show formula or software used
- Report all assumptions
- Calculate required N
3. **Sensitivity Analysis**
- Show power across range of effect sizes
- Demonstrate study is adequately powered
4. **Accounting for Attrition**
```
N_recruit = N_required / (1 - expected_attrition_rate)
Example:
N_required = 100
Expected attrition = 20%
N_recruit = 100 / 0.80 = 125
```
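The attrition adjustment can be wrapped in a small helper. This is a sketch: `n_to_recruit` and its `groups` argument (used to round up to equal group sizes) are hypothetical names introduced here for illustration.

```python
from math import ceil


def n_to_recruit(n_required, attrition_rate, groups=1):
    """Inflate the required N for expected attrition, rounding up to equal groups."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in [0, 1)")
    raw = n_required / (1 - attrition_rate)
    n = ceil(round(raw, 9))              # round() guards against float noise such as 124.999...
    return ceil(n / groups) * groups     # next multiple of the number of groups


print(n_to_recruit(100, 0.20))            # 125
print(n_to_recruit(128, 0.15, groups=2))  # 152
```

Rounding always goes up, so the number of completers never falls below the required N when attrition matches the assumed rate.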
### Sample Power Analysis Section (Grant)
```
Sample Size Justification
We require 140 participants with complete data (70 per group) to detect
a medium effect (d = 0.50) with at least 80% power at α = 0.05 using an
independent samples t-test. This effect size is conservative relative to
our pilot study (N = 30), which showed d = 0.62, and is consistent with
meta-analytic estimates for similar interventions (Smith et al., 2023;
d = 0.48, 95% CI [0.35, 0.61]).
We calculated the required sample size using G*Power 3.1, assuming
a two-tailed test. To account for anticipated 20% attrition based on
previous studies in this population, we will recruit 175 participants
(rounded to 180 for equal groups: 90 per condition).
Sensitivity analysis shows that 70 completers per group provide approximately:
- 94% power to detect d = 0.60
- 84% power to detect d = 0.50 (medium effect)
- 65% power to detect d = 0.40 (small-to-medium effect)
This ensures adequate power while minimizing participant burden and
research costs.
```
## Tools and Software
### G*Power (Free)
- **Use:** Most common power analysis tool
- **Pros:** User-friendly, comprehensive test coverage
- **Cons:** Desktop only, manual calculations
- **Download:** https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower
### R (pwr package)
```R
library(pwr)
# Independent t-test
pwr.t.test(d = 0.5, power = 0.80, sig.level = 0.05, type = "two.sample")
# ANOVA
pwr.anova.test(k = 4, f = 0.25, power = 0.80, sig.level = 0.05)
# Correlation
pwr.r.test(r = 0.30, power = 0.80, sig.level = 0.05)
```
### Python (statsmodels)
```python
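from math import ceil
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()  # independent-samples (two-group) t-test
# Calculate power for 64 participants per group
power = analysis.power(effect_size=0.5, nobs1=64, alpha=0.05)
# Calculate required N per group (nobs1 is left unset, so it is solved for)
n = ceil(analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05))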
```
### Other Tools
- **PASS** (commercial): https://www.ncss.com/software/pass/
- **PS:** https://sph.umich.edu/biostat/power-and-sample-size.html
## Common Pitfalls
### Pitfall 1: Using Post-Hoc Power
❌ **Wrong:** Calculate power after null result using observed effect
**Problem:** Observed power is a direct function of the p-value, so a nonsignificant result always yields low power (circular reasoning)
✅ **Right:** Report sensitivity analysis (what effects were you powered to detect?)
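A sensitivity analysis of this kind asks which effect size the achieved sample could detect, independent of the observed result. The sketch below assumes a two-tailed independent-samples design and a normal approximation; `min_detectable_d` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest Cohen's d detectable with the given power (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2 / n_per_group)


# Depends only on the design (n, alpha, power), never on the observed effect
for n in (20, 64, 200):
    print(n, round(min_detectable_d(n), 2))
```

Because the observed effect never enters the calculation, the result avoids the circularity described above.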
### Pitfall 2: Underpowered Studies
❌ **Wrong:** "We had N=20, but it was a pilot study"
**Problem:** Even pilots should have defensible sample sizes
✅ **Right:** Use sequential design, specify stopping rules, or acknowledge limitation
### Pitfall 3: Overpowered Studies
❌ **Wrong:** Recruiting thousands of participants to chase a tiny effect (d = 0.1 needs about 1,570 per group for 80% power)
**Problem:** Wastes resources; trivial effects become statistically significant
✅ **Right:** Power for smallest meaningful effect, not just statistical significance
### Pitfall 4: Ignoring Multiple Comparisons
❌ **Wrong:** Power calculation for single test, but running 10 tests
**Problem:** Actual power much lower due to multiple testing correction
✅ **Right:** Adjust alpha or specify primary vs. secondary outcomes
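To see how much a correction costs, the sketch below recomputes the per-group sample size with a Bonferroni-adjusted alpha. Bonferroni is one possible correction, chosen here for simplicity; the calculation uses the plain normal approximation for a two-tailed independent-samples test, and `n_per_group_bonferroni` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_bonferroni(d, n_tests, alpha=0.05, power=0.80):
    """Approximate n per group when alpha is split across n_tests comparisons."""
    z = NormalDist()
    adjusted_alpha = alpha / n_tests              # Bonferroni-adjusted per-test alpha
    z_alpha = z.inv_cdf(1 - adjusted_alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Required n grows as the number of planned tests increases
for m in (1, 3, 10):
    print(m, n_per_group_bonferroni(0.5, m))
```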
### Pitfall 5: Wrong Test Type
❌ **Wrong:** Power for independent t-test, but actually paired
**Problem:** Wrong sample size (paired designs often need fewer)
✅ **Right:** Match power calculation to actual analysis plan
## Reporting Power Analysis
### In Pre-Registration
```
Sample Size Determination:
- Statistical test: Independent samples t-test (two-tailed)
- Effect size: d = 0.50 (based on [citation])
- Alpha: 0.05
- Power: 0.80
- Required N: 64 per group (128 total)
- Accounting for 15% attrition: 152 recruited (76 per group)
- Power analysis conducted using G*Power 3.1.9.7
```
### In Manuscript Methods
```
We determined our sample size a priori using power analysis. To detect
a medium effect (Cohen's d = 0.50) with 80% power at α = 0.05
(two-tailed), we required 64 participants per group (G*Power 3.1).
Accounting for anticipated 15% attrition, we recruited 152 participants.
```
### In Results (for Null Findings)
```
Although we found no significant effect (p = .23), sensitivity analysis
shows that our study (64 per group) had 80% power to detect effects of
d ≥ 0.50 and about 61% power for d = 0.40. Our 95% CI for the effect size
was d = -0.14 to 0.56, excluding large effects but not ruling out
small-to-medium effects.
```
## Integration with Research Workflow
### With Experiment-Designer Agent
```
Agent uses power-analysis skill to:
1. Calculate required sample size
2. Justify N in protocol
3. Generate power analysis section for pre-registration
4. Create sensitivity analysis plots
```
### With NIH-Validator
```
Validator checks:
- Power ≥ 80%
- Effect size justified with citation
- Attrition accounted for
- Analysis plan matches power calculation
```
### With Statistical Analysis
```
After analysis:
1. Report achieved power or sensitivity analysis
2. Calculate confidence intervals around effect size
3. Interpret results in context of statistical power
```
## Examples
### Example 1: RCT Power Analysis
```
Design: Randomized controlled trial, two groups
Outcome: Depression scores (continuous)
Analysis: Independent samples t-test
Literature: Meta-analysis shows d = 0.55 for CBT vs. waitlist
Conservative estimate: d = 0.50
Alpha: 0.05 (two-tailed)
Power: 0.80
Calculation (G*Power):
Input: t-test, independent samples, d = 0.5, α = 0.05, power = 0.80
Output: N = 64 per group (128 total)
With 20% attrition: 160 recruited (80 per group)
```
### Example 2: Within-Subjects Design
```
Design: Pre-post intervention
Outcome: Anxiety scores
Analysis: Paired t-test
Pilot data: d_z = 0.70 (standardized mean of the difference scores), r = 0.60
Alpha: 0.05
Power: 0.80
Calculation (d_z already reflects the pre-post correlation):
N = 19 participants
With 15% attrition: 23 recruited
```
### Example 3: Factorial ANOVA
```
Design: 2x3 factorial (Treatment x Time)
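Analysis: Two-way ANOVA (both factors between-subjects)
Effect of interest: Interaction (Treatment × Time), numerator df = 2
Effect size: f = 0.25 (medium)
Alpha: 0.05
Power: 0.80
Calculation (interaction test, noncentral F with numerator df = 2):
N ≈ 158 total, rounded up to 27 per cell (162 total for 6 cells)
With 10% attrition: 180 recruited (30 per cell)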
```
---
**Last Updated:** 2025-11-09
**Version:** 1.0.0