| name | description | allowed-tools | version |
|---|---|---|---|
| power-analysis | Calculate statistical power and required sample sizes for research studies. Use when: (1) Designing experiments to determine sample size, (2) Justifying sample size for grant proposals or protocols, (3) Evaluating adequacy of existing studies, (4) Meeting NIH rigor standards for pre-registration, (5) Conducting retrospective power analysis to interpret null results. | Read, Write | 1.0.0 |
# Statistical Power Analysis Skill

## Purpose
Calculate statistical power and determine required sample sizes for research studies. Essential for experimental design, grant writing, and meeting NIH rigor and reproducibility standards.
## Core Concepts

### Statistical Power

- Definition: Probability of detecting a true effect when it exists (1 - β)
- Standard: Power ≥ 0.80 (80%) is typically required for NIH grants and pre-registration

### Key Parameters
- Effect Size (d, r, η²) - Magnitude of the phenomenon
- Alpha (α) - Type I error rate (typically 0.05)
- Power (1-β) - Probability of detecting effect (typically 0.80)
- Sample Size (N) - Number of participants/observations needed
### The Relationship

Power = f(Effect Size, Sample Size, Alpha, Test Type)

For a given effect size and alpha:

- ↑ Sample Size → ↑ Power
- ↑ Effect Size → ↓ Sample Size needed
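To make the trade-off concrete, here is a minimal R sketch (using the `pwr` package covered under Tools and Software below) that shows how power climbs with per-group sample size when the effect size and alpha are held fixed:

```r
library(pwr)

# Power of a two-sample t-test at d = 0.5, alpha = .05, for several per-group sizes
for (n in c(20, 40, 64, 100)) {
  p <- pwr.t.test(n = n, d = 0.5, sig.level = 0.05, type = "two.sample")$power
  cat(sprintf("n = %3d per group -> power = %.2f\n", n, p))
}
```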
## When to Use This Skill

### Pre-Study (Prospective Power Analysis)
- Grant Proposals - Justify requested sample size
- Study Design - Determine recruitment needs
- Pre-Registration - Document planned sample size with justification
- Resource Planning - Estimate time and cost requirements
- Ethical Review - Minimize participants while maintaining power
### Post-Study (Retrospective/Sensitivity Analysis)
- Null Results - Was study adequately powered?
- Publication - Report achieved power
- Meta-Analysis - Assess individual study adequacy
- Study Critique - Evaluate power of published work
## Common Study Designs

### 1. Independent Samples T-Test

Use: Compare two independent groups

Formula: `N per group = 2 × (z_α/2 + z_β)² / d²`

Where:
- d = Cohen's d, the standardized mean difference (μ₁ − μ₂) / σ (pooled SD)
- α = significance level (typically 0.05)
- β = Type II error rate (1 - power)

Example:
- Research Question: Does the intervention improve test scores vs. control?
- Effect Size: d = 0.5 (medium effect)
- Alpha: 0.05
- Power: 0.80
- Result: N = 64 per group (128 total)
Effect Size Guidelines (Cohen's d):
- Small: d = 0.2
- Medium: d = 0.5
- Large: d = 0.8
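As a cross-check of the worked example above, the sketch below reproduces N ≈ 64 per group with the `pwr` package; the closed-form line uses the normal approximation from the formula, so it lands slightly below the exact noncentral-t answer.

```r
library(pwr)

# Normal-approximation formula: N per group = 2 * (z_alpha/2 + z_beta)^2 / d^2
d <- 0.5
z_sum <- qnorm(1 - 0.05 / 2) + qnorm(0.80)
n_approx <- 2 * z_sum^2 / d^2                     # ~62.8

# Exact calculation based on the noncentral t distribution
n_exact <- pwr.t.test(d = d, sig.level = 0.05, power = 0.80,
                      type = "two.sample")$n      # ~63.8 -> 64 per group
ceiling(c(approx = n_approx, exact = n_exact))
```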
### 2. Paired Samples T-Test

Use: Pre-post comparisons, matched pairs

Formula: `N = (z_α/2 + z_β)² × 2(1 − ρ) / d²`

Where ρ = correlation between the paired measures

Example:
- Research Question: Does training improve performance (pre-post)?
- Effect Size: d = 0.6
- Correlation: ρ = 0.5 (moderate test-retest reliability)
- Alpha: 0.05
- Power: 0.80
- Result: N = 24 participants
Key Insight: Higher correlation → fewer participants needed
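A minimal sketch of the paired case, converting the raw-score effect size and test-retest correlation from the example into an effect size on the difference scores before solving for N:

```r
library(pwr)

d   <- 0.6   # effect size on the raw score scale (from the example above)
rho <- 0.5   # correlation between the paired measures

d_z <- d / sqrt(2 * (1 - rho))   # effect size for the difference scores

# Solve for the number of pairs; ~24, matching the worked example
pwr.t.test(d = d_z, sig.level = 0.05, power = 0.80, type = "paired")$n
```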
### 3. One-Way ANOVA

Use: Compare 3+ independent groups

Formula: `N total = λ / f²`

Where:
- k = number of groups
- f = effect size (Cohen's f)
- λ = noncentrality parameter required for the target power, from the noncentral F distribution with df₁ = k − 1 and df₂ = N − k (obtained from software such as G*Power or the pwr package)

Example:
- Research Question: Compare 4 treatment conditions
- Effect Size: f = 0.25 (medium)
- Alpha: 0.05
- Power: 0.80
- Result: N = 45 per group (180 total)
Effect Size Guidelines (Cohen's f):
- Small: f = 0.10
- Medium: f = 0.25
- Large: f = 0.40
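The ANOVA example above can be reproduced directly with `pwr.anova.test`, which handles the noncentral F calculation for you:

```r
library(pwr)

# Four groups, medium effect (f = 0.25), alpha = .05, target power = .80
pwr.anova.test(k = 4, f = 0.25, sig.level = 0.05, power = 0.80)$n
# ~44.6 -> 45 per group (180 total), matching the worked example
```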
### 4. Chi-Square Test

Use: Association between categorical variables

Formula: `N = (z_α/2 + z_β)² / w²` (for df = 1)

Where:
- w = effect size (Cohen's w)
- df = degrees of freedom of the test; for df > 1, N = λ / w², with λ taken from the noncentral χ² distribution (use software)

Example:
- Research Question: Is treatment success related to group (2×2 table)?
- Effect Size: w = 0.3 (medium)
- Alpha: 0.05
- Power: 0.80
- df: 1
- Result: N = 88 total participants
Effect Size Guidelines (Cohen's w):
- Small: w = 0.10
- Medium: w = 0.30
- Large: w = 0.50
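A short `pwr` sketch for the chi-square example; `pwr.chisq.test` works for any df, so the df = 1 restriction of the closed-form formula above does not apply here:

```r
library(pwr)

# 2x2 association, medium effect (w = 0.3), df = 1
pwr.chisq.test(w = 0.3, df = 1, sig.level = 0.05, power = 0.80)$N
# ~87.2 -> 88 total participants, matching the worked example
```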
### 5. Correlation

Use: Relationship between continuous variables

Formula: `N = (z_α/2 + z_β)² / C(r)² + 3`

Where C(r) = 0.5 × ln((1 + r) / (1 − r)) (Fisher's z transformation)

Example:
- Research Question: Correlation between anxiety and performance
- Expected r: 0.30
- Alpha: 0.05
- Power: 0.80
- Result: N = 84 participants
Effect Size Guidelines (Pearson's r):
- Small: r = 0.10
- Medium: r = 0.30
- Large: r = 0.50
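Both the Fisher-z hand calculation and the `pwr.r.test` solution for the correlation example, as a sketch:

```r
library(pwr)

# Software solution for the total sample size (~84-85 participants)
pwr.r.test(r = 0.30, sig.level = 0.05, power = 0.80)$n

# Hand calculation with the Fisher-z approximation from the formula above
C_r <- 0.5 * log((1 + 0.30) / (1 - 0.30))
(qnorm(1 - 0.05 / 2) + qnorm(0.80))^2 / C_r^2 + 3   # ~85
```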
### 6. Multiple Regression

Use: Predict an outcome from multiple predictors

Formula: `N = (L / f²) + k + 1`

Where:
- L = required noncentrality parameter (depends on α, power, and the number of predictors; obtained from tables or software)
- f² = effect size (Cohen's f²)
- k = number of predictors

Example:
- Research Question: Predict depression from 5 variables
- Effect Size: f² = 0.15 (medium)
- Alpha: 0.05
- Power: 0.80
- Predictors: 5
- Result: N = 92 participants
Effect Size Guidelines (f²):
- Small: f² = 0.02
- Medium: f² = 0.15
- Large: f² = 0.35
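`pwr.f2.test` parameterizes the same calculation by the numerator and denominator degrees of freedom (u = number of predictors, v = N − k − 1), so the total N is recovered as v + u + 1; a sketch for the example above:

```r
library(pwr)

u   <- 5      # numerator df = number of predictors
res <- pwr.f2.test(u = u, f2 = 0.15, sig.level = 0.05, power = 0.80)

ceiling(res$v + u + 1)   # total N of roughly 92, matching the worked example
```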
## Choosing Effect Sizes

### Method 1: Previous Literature

Best practice: Use meta-analytic estimates.

Example: A meta-analysis shows d = 0.45 for CBT vs. control → use d = 0.45 for the power calculation.

### Method 2: Pilot Study

1. Run a small pilot (N = 20-30)
2. Calculate the observed effect size
3. Adjust for uncertainty (e.g., plan around 80% of the observed value), as sketched below
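A minimal sketch of the pilot-based approach, using the pilot estimate d = 0.62 that also appears in the grant example later in this document; the 0.80 multiplier is the conservative adjustment suggested above:

```r
library(pwr)

d_pilot <- 0.62            # observed effect in a small pilot
d_plan  <- 0.80 * d_pilot  # plan around 80% of the observed value

ceiling(pwr.t.test(d = d_plan, sig.level = 0.05, power = 0.80,
                   type = "two.sample")$n)   # per-group N (~65)
```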
### Method 3: Minimum Meaningful Effect

Question: What is the smallest effect worth detecting?

- Clinical: minimum clinically important difference (MCID)
- Practical: cost-benefit threshold

### Method 4: Cohen's Conventions

Use only when no better information is available:
- Small effects: Often require N > 300
- Medium effects: Typically N = 50-100 per group
- Large effects: May need only N = 20-30 per group
Warning: Cohen's conventions are rough guidelines, not universal truths!
## NIH Rigor Standards

### Required Elements for NIH Grants

1. **Justified Effect Size**
   - Cite the source (literature, pilot, theory)
   - Explain why this effect size is reasonable
   - Consider a range of plausible values

2. **Power Calculation**
   - Show the formula or software used
   - Report all assumptions
   - Calculate the required N

3. **Sensitivity Analysis**
   - Show power across a range of effect sizes
   - Demonstrate the study is adequately powered

4. **Accounting for Attrition**
   - `N_recruit = N_required / (1 - expected attrition rate)`
   - Example: N_required = 100 with 20% expected attrition → N_recruit = 100 / 0.80 = 125 (see the sketch after this list)
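The attrition adjustment in item 4 is a one-liner; a sketch with the example values:

```r
# Inflate the analyzable N for expected attrition (values from the example above)
n_required <- 100
attrition  <- 0.20
ceiling(n_required / (1 - attrition))   # 125 participants to recruit
```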
### Sample Power Analysis Section (Grant)

Sample Size Justification

A final sample of 140 participants (70 per group) is sufficient to detect a medium
effect (d = 0.50) with 80% power at α = 0.05 using an independent
samples t-test. This effect size is based on our pilot study (N = 30),
which showed d = 0.62, and is consistent with meta-analytic estimates
for similar interventions (Smith et al., 2023; d = 0.48, 95% CI
[0.35, 0.61]).

We calculated the required sample size using G*Power 3.1, assuming
a two-tailed test. To account for an anticipated 20% attrition rate based on
previous studies in this population, we will recruit 175 participants
(rounded to 180 for equal groups: 90 per condition).

Sensitivity analysis shows our design provides:

- 94% power to detect d = 0.60 (large effect)
- 84% power to detect d = 0.50 (medium effect)
- 65% power to detect d = 0.40 (small-to-medium effect)

This ensures adequate power while minimizing participant burden and
research costs.
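The sensitivity analysis in a justification like the one above can be generated with a short sweep over plausible effect sizes; this sketch assumes an analyzable sample of 70 per group, as in the sample text:

```r
library(pwr)

# Power across a range of plausible effect sizes for a fixed per-group n
n_per_group <- 70
d_grid <- seq(0.30, 0.70, by = 0.05)
power  <- sapply(d_grid, function(d)
  pwr.t.test(n = n_per_group, d = d, sig.level = 0.05, type = "two.sample")$power)
round(data.frame(d = d_grid, power = power), 2)
```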
## Tools and Software

### G*Power (Free)

- Use: The most widely used dedicated power analysis tool
- Pros: User-friendly, comprehensive test coverage
- Cons: Desktop only; point-and-click, so analyses are not easily scripted or reproduced
- Download: https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower
### R (pwr package)

```r
library(pwr)

# Independent-samples t-test: solve for n per group (~64 for d = 0.5)
pwr.t.test(d = 0.5, power = 0.80, sig.level = 0.05, type = "two.sample")

# One-way ANOVA: solve for n per group (~45 for k = 4, f = 0.25)
pwr.anova.test(k = 4, f = 0.25, power = 0.80, sig.level = 0.05)

# Correlation: solve for total n (~84 for r = 0.30)
pwr.r.test(r = 0.30, power = 0.80, sig.level = 0.05)
```
### Python (statsmodels)

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of an independent-samples t-test with 64 participants per group (~0.80)
power = analysis.power(effect_size=0.5, nobs1=64, alpha=0.05)

# Required N per group for 80% power (~63.8, round up to 64)
n_per_group = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05)
```
### Other Calculators

- PASS (commercial): https://www.ncss.com/software/pass/
- PS (Power and Sample Size): https://sph.umich.edu/biostat/power-and-sample-size.html
## Common Pitfalls

### Pitfall 1: Using Post-Hoc Power

❌ Wrong: Calculate power after a null result using the observed effect.

Problem: This will always show low power for null results (circular reasoning).

✅ Right: Report a sensitivity analysis (what effects were you powered to detect?).

### Pitfall 2: Underpowered Studies

❌ Wrong: "We had N = 20, but it was a pilot study."

Problem: Even pilots should have defensible sample sizes.

✅ Right: Use a sequential design, specify stopping rules, or acknowledge the limitation.

### Pitfall 3: Overpowered Studies

❌ Wrong: N = 1000 to detect a tiny effect (d = 0.1).

Problem: Wastes resources and detects trivial effects.

✅ Right: Power for the smallest meaningful effect, not just statistical significance.

### Pitfall 4: Ignoring Multiple Comparisons

❌ Wrong: Power calculation for a single test, but running 10 tests.

Problem: Actual power is much lower once multiple-testing corrections are applied.

✅ Right: Adjust alpha or specify primary vs. secondary outcomes (see the sketch below).
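As a sketch of why this matters, powering the same comparison at a Bonferroni-adjusted alpha (a hypothetical 10 planned tests) drives the required sample size up sharply:

```r
library(pwr)

alpha_adj <- 0.05 / 10   # Bonferroni adjustment for 10 planned tests

# Per-group N for d = 0.5 at the stricter alpha: well over 100,
# versus 64 per group at alpha = .05
ceiling(pwr.t.test(d = 0.5, sig.level = alpha_adj, power = 0.80,
                   type = "two.sample")$n)
```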
### Pitfall 5: Wrong Test Type

❌ Wrong: Power for an independent t-test when the design is actually paired.

Problem: Wrong sample size (paired designs often need fewer participants).

✅ Right: Match the power calculation to the actual analysis plan.
## Reporting Power Analysis

### In Pre-Registration
Sample Size Determination:
- Statistical test: Independent samples t-test (two-tailed)
- Effect size: d = 0.50 (based on [citation])
- Alpha: 0.05
- Power: 0.80
- Required N: 64 per group (128 total)
- Accounting for 15% attrition: 150 recruited (75 per group)
- Power analysis conducted using G*Power 3.1.9.7
### In Manuscript Methods
We determined our sample size a priori using power analysis. To detect
a medium effect (Cohen's d = 0.50) with 80% power at α = 0.05
(two-tailed), we required 64 participants per group (G*Power 3.1).
Accounting for anticipated 15% attrition, we recruited 150 participants.
### In Results (for Null Findings)
Although we found no significant effect (p = .23), sensitivity analysis
shows our study had 80% power to detect effects of d ≥ 0.50 and 55%
power for d = 0.40. Our 95% CI for the effect size was d = -0.12 to 0.38,
excluding large effects but not ruling out small-to-medium effects.
## Integration with Research Workflow

### With Experiment-Designer Agent
Agent uses power-analysis skill to:
1. Calculate required sample size
2. Justify N in protocol
3. Generate power analysis section for pre-registration
4. Create sensitivity analysis plots
### With NIH-Validator
Validator checks:
- Power ≥ 80%
- Effect size justified with citation
- Attrition accounted for
- Analysis plan matches power calculation
### With Statistical Analysis
After analysis:
1. Report achieved power or sensitivity analysis
2. Calculate confidence intervals around effect size
3. Interpret results in context of statistical power
## Examples

### Example 1: RCT Power Analysis

- Design: Randomized controlled trial, two groups
- Outcome: Depression scores (continuous)
- Analysis: Independent samples t-test
- Literature: Meta-analysis shows d = 0.55 for CBT vs. waitlist
- Conservative estimate: d = 0.50
- Alpha: 0.05 (two-tailed)
- Power: 0.80

Calculation (G*Power):
- Input: t-test, independent samples, d = 0.5, α = 0.05, power = 0.80
- Output: N = 64 per group (128 total)
- With 20% attrition: 160 recruited (80 per group)
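Example 1 end to end in the `pwr` package, including the attrition inflation (a sketch equivalent to the G*Power steps listed above):

```r
library(pwr)

n_per_group <- ceiling(pwr.t.test(d = 0.50, sig.level = 0.05, power = 0.80,
                                  type = "two.sample")$n)      # 64
n_recruit_per_group <- ceiling(n_per_group / (1 - 0.20))       # 80

c(analyzed_total = 2 * n_per_group,
  recruited_total = 2 * n_recruit_per_group)                   # 128 and 160
```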
### Example 2: Within-Subjects Design

- Design: Pre-post intervention
- Outcome: Anxiety scores
- Analysis: Paired t-test
- Pilot data: d = 0.70, r = 0.60
- Alpha: 0.05
- Power: 0.80

Calculation (accounting for the pre-post correlation):
- N = 19 participants
- With 15% attrition: 23 recruited
### Example 3: Factorial ANOVA

- Design: 2×3 factorial (Treatment × Time)
- Analysis: Two-way ANOVA
- Effect of interest: Treatment × Time interaction
- Effect size: f = 0.25 (medium)
- Alpha: 0.05
- Power: 0.80

Calculation:
- N = 50 per cell (300 total for 6 cells)
- With 10% attrition: 334 recruited (rounded to 336 for equal cells: 56 per cell)
Last Updated: 2025-11-09 | Version: 1.0.0