---
name: power-analysis
description: "Calculate statistical power and required sample sizes for research studies. Use when: (1) Designing experiments to determine sample size, (2) Justifying sample size for grant proposals or protocols, (3) Evaluating adequacy of existing studies, (4) Meeting NIH rigor standards for pre-registration, (5) Conducting retrospective power analysis to interpret null results."
allowed-tools: Read, Write
version: 1.0.0
---

# Statistical Power Analysis Skill

## Purpose

Calculate statistical power and determine required sample sizes for research studies. Essential for experimental design, grant writing, and meeting NIH rigor and reproducibility standards.

## Core Concepts

### Statistical Power
**Definition:** Probability of detecting a true effect when it exists (1 - β)

**Standard:** Power ≥ 0.80 (80%) is the conventional minimum that NIH reviewers and pre-registration templates typically expect

### Key Parameters
1. **Effect Size (d, r, η²)** - Magnitude of the phenomenon
2. **Alpha (α)** - Type I error rate (typically 0.05)
3. **Power (1-β)** - Probability of detecting effect (typically 0.80)
4. **Sample Size (N)** - Number of participants/observations needed

### The Relationship
```
Power = f(Effect Size, Sample Size, Alpha, Test Type)

For given effect size and alpha:
↑ Sample Size → ↑ Power
↑ Effect Size → ↓ Sample Size needed
```
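The relationship above can be illustrated with a minimal sketch for a two-group comparison. It uses a normal approximation rather than the exact noncentral t distribution, so values differ slightly from G*Power; the function name `approx_power_two_sample` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate two-sided power for comparing two independent group means.

    Normal approximation (an assumption of this sketch); exact tools use
    the noncentral t distribution and report slightly lower power.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    noncentrality = d * sqrt(n_per_group / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NormalDist().cdf(noncentrality - z_crit)


# Power rises with sample size (fixed d) and with effect size (fixed n).
for n in (20, 64, 100):
    print(f"d=0.5, n={n} per group: power ~ {approx_power_two_sample(0.5, n):.2f}")
for d in (0.2, 0.5, 0.8):
    print(f"d={d}, n=64 per group: power ~ {approx_power_two_sample(d, 64):.2f}")
```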

## When to Use This Skill

### Pre-Study (Prospective Power Analysis)
1. **Grant Proposals** - Justify requested sample size
2. **Study Design** - Determine recruitment needs
3. **Pre-Registration** - Document planned sample size with justification
4. **Resource Planning** - Estimate time and cost requirements
5. **Ethical Review** - Minimize participants while maintaining power

### Post-Study (Retrospective/Sensitivity Analysis)
1. **Null Results** - Was study adequately powered?
2. **Publication** - Report the smallest effect the study was powered to detect
3. **Meta-Analysis** - Assess individual study adequacy
4. **Study Critique** - Evaluate power of published work

## Common Study Designs

### 1. Independent Samples T-Test
**Use:** Compare two independent groups

**Formula:**
```
N per group = 2 * (z_α/2 + z_β)² / d²

Where:
- d = standardized effect size (Cohen's d = mean difference / pooled SD)
- α = significance level (typ. 0.05)
- β = Type II error (1 - power)
- Normal approximation; exact t-based software returns 1-2 more per group
```

**Example:**
```
Research Question: Does intervention improve test scores vs. control?
Effect Size: d = 0.5 (medium effect)
Alpha: 0.05
Power: 0.80

Result: N = 64 per group (128 total)
```

**Effect Size Guidelines (Cohen's d):**
- Small: d = 0.2
- Medium: d = 0.5
- Large: d = 0.8
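As a rough sketch, the normal-approximation sample size for this design can be computed with the standard library alone. The function name `n_per_group_two_sample` is hypothetical, and d is assumed to be standardized (Cohen's d). For d = 0.5 the approximation gives 63 per group; exact t-based tools such as G*Power report 64.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_sample(d, alpha=0.05, power=0.80):
    """Per-group N for a two-sided, two-sample comparison (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Cohen's benchmark effect sizes
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group_two_sample(d)} per group")
```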

### 2. Paired Samples T-Test
**Use:** Pre-post comparisons, matched pairs

**Formula:**
```
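N = (z_α/2 + z_β)² * 2(1-ρ) / d²

Where:
- d = mean change divided by the SD of a single measurement
- ρ = correlation between the paired measures
- Normal approximation; exact t-based software returns about 2 more participants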
```

**Example:**
```
Research Question: Does training improve performance (pre-post)?
Effect Size: d = 0.6
Correlation: ρ = 0.5 (moderate test-retest reliability)
Alpha: 0.05
Power: 0.80

Result: N = 24 participants
```

**Key Insight:** Higher correlation → fewer participants needed
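A minimal sketch of how the correlation drives the paired-design sample size, again under a normal approximation. The name `n_paired` is hypothetical, and d is assumed to be standardized by the SD of a single measurement.

```python
from math import ceil
from statistics import NormalDist


def n_paired(d, rho, alpha=0.05, power=0.80):
    """Number of pairs for a two-sided paired comparison (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # The variance of a difference score shrinks as the correlation rises.
    variance_factor = 2 * (1 - rho)
    return ceil((z_alpha + z_beta) ** 2 * variance_factor / d ** 2)


for rho in (0.2, 0.5, 0.8):
    print(f"rho = {rho}: about {n_paired(0.6, rho)} participants")
```

With d = 0.6 and ρ = 0.5 the approximation returns 22, slightly below the exact t-based value of 24.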

### 3. One-Way ANOVA
**Use:** Compare 3+ independent groups

**Formula:**
```
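λ = N_total * f²   (noncentrality parameter)

Choose the smallest N_total for which the noncentral F distribution
(df1 = k - 1, df2 = N_total - k, noncentrality λ) reaches the target power.
There is no closed-form solution; use software.

Where:
- k = number of groups
- f = effect size (Cohen's f)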
```

**Example:**
```
Research Question: Compare 4 treatment conditions
Effect Size: f = 0.25 (medium)
Alpha: 0.05
Power: 0.80

Result: N = 45 per group (180 total)
```

**Effect Size Guidelines (Cohen's f):**
- Small: f = 0.10
- Medium: f = 0.25
- Large: f = 0.40
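One-way ANOVA power is usually obtained from the noncentral F distribution rather than a closed-form expression. The sketch below assumes SciPy is available and searches for the smallest per-group N; the function names are hypothetical.

```python
from scipy import stats


def anova_power(n_per_group, k, f, alpha=0.05):
    """Power of a one-way ANOVA with k equal-sized groups and Cohen's f."""
    dfn = k - 1
    dfd = k * (n_per_group - 1)
    noncentrality = f ** 2 * k * n_per_group
    f_crit = stats.f.ppf(1 - alpha, dfn, dfd)
    return stats.ncf.sf(f_crit, dfn, dfd, noncentrality)


def anova_n_per_group(k, f, alpha=0.05, power=0.80):
    """Smallest per-group N that reaches the target power (simple linear search)."""
    n = 2
    while anova_power(n, k, f, alpha) < power:
        n += 1
    return n


print(anova_n_per_group(k=4, f=0.25))  # matches the worked example: 45 per group
```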

### 4. Chi-Square Test
**Use:** Association between categorical variables

**Formula:**
```
N = λ / w²

Where:
- w = effect size (Cohen's w)
- λ = noncentrality parameter needed for the target power (grows with df)
- df = degrees of freedom; for df = 1, λ = (z_α/2 + z_β)²
```

**Example:**
```
Research Question: Is treatment success related to group (2x2 table)?
Effect Size: w = 0.3 (medium)
Alpha: 0.05
Power: 0.80
df = 1

Result: N = 88 total participants
```

**Effect Size Guidelines (Cohen's w):**
- Small: w = 0.10
- Medium: w = 0.30
- Large: w = 0.50
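For tables with more than one degree of freedom, the required N comes from the noncentral chi-square distribution. This is a sketch assuming SciPy is installed; `chisq_total_n` is a hypothetical helper name.

```python
from scipy import stats


def chisq_total_n(w, df, alpha=0.05, power=0.80):
    """Smallest total N for a chi-square test with effect size w (Cohen's w)."""
    crit = stats.chi2.ppf(1 - alpha, df)
    n = 2
    # The noncentrality parameter is N * w^2; increase N until power is reached.
    while stats.ncx2.sf(crit, df, n * w ** 2) < power:
        n += 1
    return n


print(chisq_total_n(w=0.3, df=1))  # 2x2 table from the worked example
print(chisq_total_n(w=0.3, df=2))  # larger tables need more participants
```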

### 5. Correlation
**Use:** Relationship between continuous variables

**Formula:**
```
N = (z_α/2 + z_β)² / C(r)² + 3

Where C(r) = 0.5 * ln((1+r)/(1-r)) [Fisher's z]
```

**Example:**
```
Research Question: Correlation between anxiety and performance
Expected r: 0.30
Alpha: 0.05
Power: 0.80

Result: N = 84 participants (exact calculation; the Fisher z formula above gives 85)
```

**Effect Size Guidelines (Pearson's r):**
- Small: r = 0.10
- Medium: r = 0.30
- Large: r = 0.50
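The Fisher z formula can be sketched with the standard library. The function name `n_for_correlation` is hypothetical. Because this is an approximation, it returns 85 for r = 0.30, one more than exact routines report.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Total N to detect a correlation r (two-sided, Fisher z approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    fisher_z = atanh(r)  # identical to 0.5 * ln((1 + r) / (1 - r))
    return ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


for r in (0.1, 0.3, 0.5):
    print(f"r = {r}: about {n_for_correlation(r)} participants")
```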

### 6. Multiple Regression
**Use:** Predict outcome from multiple predictors

**Formula:**
```
N = (L / f²) + k + 1

Where:
- L = tabled noncentrality value for the chosen α, power, and k (12.83 for k = 5, α = 0.05, power = 0.80)
- f² = effect size (Cohen's f²)
- k = number of predictors
```

**Example:**
```
Research Question: Predict depression from 5 variables
Effect Size: f² = 0.15 (medium)
Alpha: 0.05
Power: 0.80
Predictors: 5

Result: N = 92 participants
```

**Effect Size Guidelines (f²):**
- Small: f² = 0.02
- Medium: f² = 0.15
- Large: f² = 0.35
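Instead of looking up L in a table, the overall F-test for the regression can be solved numerically. The sketch below assumes SciPy and uses the noncentrality λ = f² · N, the convention used by common power software; the function names are hypothetical.

```python
from scipy import stats


def regression_power(n, k, f2, alpha=0.05):
    """Power of the overall F-test (R² > 0) with k predictors and total N = n."""
    dfn = k
    dfd = n - k - 1
    f_crit = stats.f.ppf(1 - alpha, dfn, dfd)
    return stats.ncf.sf(f_crit, dfn, dfd, f2 * n)


def regression_total_n(k, f2, alpha=0.05, power=0.80):
    """Smallest total N that reaches the target power (simple linear search)."""
    n = k + 2  # smallest N that leaves one residual degree of freedom
    while regression_power(n, k, f2, alpha) < power:
        n += 1
    return n


print(regression_total_n(k=5, f2=0.15))  # matches the worked example: 92
```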

## Choosing Effect Sizes

### Method 1: Previous Literature
**Best Practice:** Use meta-analytic estimates

```
Example: Meta-analysis shows d = 0.45 for CBT vs. control
Use: d = 0.45 for power calculation
```

### Method 2: Pilot Study
```
Run small pilot (N = 20-30)
Calculate observed effect size
Adjust for uncertainty (use 80% of observed)
```

### Method 3: Minimum Meaningful Effect
```
Question: What's the smallest effect worth detecting?
Clinical: Minimum clinically important difference (MCID)
Practical: Cost-benefit threshold
```

### Method 4: Cohen's Conventions
**Use only when no better information available:**
- Small effects (d = 0.2): about 394 per group
- Medium effects (d = 0.5): about 64 per group
- Large effects (d = 0.8): about 26 per group

These figures assume a two-group t-test with α = 0.05 (two-tailed) and 80% power.

**Warning:** Cohen's conventions are rough guidelines, not universal truths!

## NIH Rigor Standards

### Required Elements for NIH Grants

1. **Justified Effect Size**
   - Cite source (literature, pilot, theory)
   - Explain why this effect size is reasonable
   - Consider range of plausible values

2. **Power Calculation**
   - Show formula or software used
   - Report all assumptions
   - Calculate required N

3. **Sensitivity Analysis**
   - Show power across range of effect sizes
   - Demonstrate study is adequately powered

4. **Accounting for Attrition**
   ```
   N_recruit = N_required / (1 - expected_attrition_rate)

   Example:
   N_required = 100
   Expected attrition = 20%
   N_recruit = 100 / 0.80 = 125
   ```
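The attrition adjustment is easy to get wrong by rounding down. A minimal sketch, with the hypothetical helper `n_to_recruit`, takes attrition as a whole-number percentage so the arithmetic stays exact:

```python
def n_to_recruit(n_required, attrition_pct):
    """Participants to recruit so that n_required remain after attrition.

    attrition_pct is a whole-number percentage (e.g. 20 for 20%).
    Integer ceiling division avoids floating-point rounding surprises.
    """
    if not 0 <= attrition_pct < 100:
        raise ValueError("attrition_pct must be in [0, 100)")
    retained_pct = 100 - attrition_pct
    return -(-n_required * 100 // retained_pct)


print(n_to_recruit(100, 20))  # 125
print(n_to_recruit(64, 15))   # 76 per group
```

Applying the adjustment per group and then summing keeps the groups equal in size.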

### Sample Power Analysis Section (Grant)

```
Sample Size Justification

We require 140 participants with complete data (70 per group). The
minimum needed to detect a medium effect (d = 0.50) with 80% power at
α = 0.05 using an independent samples t-test is 64 per group; we
target 70 per group as a buffer. This effect size is based on our pilot
study (N = 30) which showed d = 0.62, and is consistent with
meta-analytic estimates for similar interventions (Smith et al., 2023;
d = 0.48, 95% CI [0.35, 0.61]).

We calculated the required sample size using G*Power 3.1, assuming
a two-tailed test. To account for anticipated 20% attrition based on
previous studies in this population, we will recruit 175 participants
(rounded to 180 for equal groups: 90 per condition).

Sensitivity analysis shows that 70 completers per group provide:
- About 94% power to detect d = 0.60
- About 84% power to detect d = 0.50 (medium effect)
- About 65% power to detect d = 0.40 (small-to-medium effect)

This ensures adequate power while minimizing participant burden and
research costs.
```

## Tools and Software

### G*Power (Free)
- **Use:** Most common power analysis tool
- **Pros:** User-friendly, comprehensive test coverage
- **Cons:** Desktop only, manual calculations
- **Download:** https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower

### R (pwr package)
```R
library(pwr)

# Independent t-test
pwr.t.test(d = 0.5, power = 0.80, sig.level = 0.05, type = "two.sample")

# ANOVA
pwr.anova.test(k = 4, f = 0.25, power = 0.80, sig.level = 0.05)

# Correlation
pwr.r.test(r = 0.30, power = 0.80, sig.level = 0.05)
```

### Python (statsmodels)
```python
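from statsmodels.stats.power import TTestIndPower, tt_ind_solve_power

# Power of a two-sided independent-samples t-test with 64 per group.
# (ttest_power is for one-sample/paired tests, so it is not used here.)
power = TTestIndPower().power(effect_size=0.5, nobs1=64, alpha=0.05)

# Required N per group: nobs1 is left unset, so it is the value solved for
n = tt_ind_solve_power(effect_size=0.5, power=0.8, alpha=0.05)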
```

### Online Calculators
- **PASS (commercial desktop software):** https://www.ncss.com/software/pass/
- **PS:** https://sph.umich.edu/biostat/power-and-sample-size.html

## Common Pitfalls

### Pitfall 1: Using Post-Hoc Power
❌ **Wrong:** Calculate power after null result using observed effect
**Problem:** Observed power is a direct function of the p-value, so a non-significant result always yields low observed power (circular reasoning)

✅ **Right:** Report sensitivity analysis (what effects were you powered to detect?)
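A sensitivity analysis answers the question by fixing the sample size and solving for the smallest detectable effect. The sketch below does this for a two-group design under a normal approximation; `min_detectable_d` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest Cohen's d detectable with the given power (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2 / n_per_group)


for n in (20, 64, 200):
    print(f"n = {n} per group: detectable d ~ {min_detectable_d(n):.2f}")
```

Unlike observed power, this quantity depends only on the design (N, α, target power), not on the observed result.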

### Pitfall 2: Underpowered Studies
❌ **Wrong:** "We had N=20, but it was a pilot study"
**Problem:** Even pilots should have defensible sample sizes

✅ **Right:** Use sequential design, specify stopping rules, or acknowledge limitation

### Pitfall 3: Overpowered Studies
❌ **Wrong:** Recruiting thousands of participants to detect a trivially small effect (d = 0.1 needs about 1,570 per group)
**Problem:** Wastes resources, flags trivial effects as significant

✅ **Right:** Power for smallest meaningful effect, not just statistical significance

### Pitfall 4: Ignoring Multiple Comparisons
❌ **Wrong:** Power calculation for single test, but running 10 tests
**Problem:** Actual power much lower due to multiple testing correction

✅ **Right:** Adjust alpha or specify primary vs. secondary outcomes
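To see what adjusting alpha costs, the sketch below applies a Bonferroni correction (one common choice, assumed here) and recomputes the per-group N for a two-group comparison using a normal approximation. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_sample(d, alpha=0.05, power=0.80):
    """Per-group N for a two-sided, two-sample comparison (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


n_tests = 10
alpha_adjusted = 0.05 / n_tests  # Bonferroni: split alpha across the tests

print("single test:  ", n_per_group_two_sample(0.5, alpha=0.05))
print("10 tests each:", n_per_group_two_sample(0.5, alpha=alpha_adjusted))
```

With ten tests, the per-group requirement for d = 0.5 rises from about 63 to about 107 under this approximation.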

### Pitfall 5: Wrong Test Type
❌ **Wrong:** Power for independent t-test, but actually paired
**Problem:** Wrong sample size (paired designs often need fewer)

✅ **Right:** Match power calculation to actual analysis plan

## Reporting Power Analysis

### In Pre-Registration
```
Sample Size Determination:
- Statistical test: Independent samples t-test (two-tailed)
- Effect size: d = 0.50 (based on [citation])
- Alpha: 0.05
- Power: 0.80
- Required N: 64 per group (128 total)
- Accounting for 15% attrition: 152 recruited (76 per group)
- Power analysis conducted using G*Power 3.1.9.7
```

### In Manuscript Methods
```
We determined our sample size a priori using power analysis. To detect
a medium effect (Cohen's d = 0.50) with 80% power at α = 0.05
(two-tailed), we required 64 participants per group (G*Power 3.1).
Accounting for anticipated 15% attrition, we recruited 152 participants.
```

### In Results (for Null Findings)
```
Although we found no significant effect (p = .23), sensitivity analysis
shows our study had 80% power to detect effects of d ≥ 0.50 and about
61% power for d = 0.40. Our 95% CI for the effect size was d = -0.12 to 0.38,
excluding large effects but not ruling out small-to-medium effects.
```

## Integration with Research Workflow

### With Experiment-Designer Agent
```
Agent uses power-analysis skill to:
1. Calculate required sample size
2. Justify N in protocol
3. Generate power analysis section for pre-registration
4. Create sensitivity analysis plots
```

### With NIH-Validator
```
Validator checks:
- Power ≥ 80%
- Effect size justified with citation
- Attrition accounted for
- Analysis plan matches power calculation
```

### With Statistical Analysis
```
After analysis:
1. Report a sensitivity analysis (smallest effect detectable with 80% power)
2. Calculate confidence intervals around effect size
3. Interpret results in context of statistical power
```

## Examples

### Example 1: RCT Power Analysis
```
Design: Randomized controlled trial, two groups
Outcome: Depression scores (continuous)
Analysis: Independent samples t-test
Literature: Meta-analysis shows d = 0.55 for CBT vs. waitlist
Conservative estimate: d = 0.50
Alpha: 0.05 (two-tailed)
Power: 0.80

Calculation (G*Power):
Input: t-test, independent samples, d = 0.5, α = 0.05, power = 0.80
Output: N = 64 per group (128 total)

With 20% attrition: 160 recruited (80 per group)
```

### Example 2: Within-Subjects Design
```
Design: Pre-post intervention
Outcome: Anxiety scores
Analysis: Paired t-test
Pilot data: d_z = 0.70 (mean change / SD of change scores), r = 0.60
Alpha: 0.05
Power: 0.80

Calculation (d_z already reflects the pre-post correlation):
N = 19 participants

With 15% attrition: 23 recruited
```

### Example 3: Factorial ANOVA
```
Design: 2x3 factorial (Treatment x Time)
Analysis: Two-way ANOVA
Effect of interest: Interaction (Treatment × Time)
Effect size: f = 0.25 (medium)
Alpha: 0.05
Power: 0.80

Calculation (both factors between-subjects, interaction df = 2):
Minimum total N = 158, i.e. 27 per cell (162 total for 6 cells)

With 10% attrition: 180 recruited (30 per cell)
```

---

**Last Updated:** 2025-11-09
**Version:** 1.0.0