---
name: power-analysis
description: "Calculate statistical power and required sample sizes for research studies. Use when: (1) Designing experiments to determine sample size, (2) Justifying sample size for grant proposals or protocols, (3) Evaluating adequacy of existing studies, (4) Meeting NIH rigor standards for pre-registration, (5) Conducting retrospective power analysis to interpret null results."
allowed-tools: Read, Write
version: 1.0.0
---

# Statistical Power Analysis Skill

## Purpose

Calculate statistical power and determine required sample sizes for research studies. Essential for experimental design, grant writing, and meeting NIH rigor and reproducibility standards.

## Core Concepts

### Statistical Power

**Definition:** The probability of detecting a true effect when it exists (1 − β)

**Standard:** Power ≥ 0.80 (80%) is typically required for NIH grants and pre-registration

### Key Parameters

1. **Effect Size (d, r, η²)** - Magnitude of the phenomenon
2. **Alpha (α)** - Type I error rate (typically 0.05)
3. **Power (1 − β)** - Probability of detecting the effect (typically 0.80)
4. **Sample Size (N)** - Number of participants/observations needed

### The Relationship

```
Power = f(Effect Size, Sample Size, Alpha, Test Type)

For a given effect size and alpha:
↑ Sample Size → ↑ Power
↑ Effect Size → ↓ Sample Size needed
```

## When to Use This Skill

### Pre-Study (Prospective Power Analysis)

1. **Grant Proposals** - Justify the requested sample size
2. **Study Design** - Determine recruitment needs
3. **Pre-Registration** - Document the planned sample size with justification
4. **Resource Planning** - Estimate time and cost requirements
5. **Ethical Review** - Minimize participants while maintaining power

### Post-Study (Retrospective/Sensitivity Analysis)

1. **Null Results** - Was the study adequately powered?
2. **Publication** - Report achieved power
3. **Meta-Analysis** - Assess individual study adequacy
4. **Study Critique** - Evaluate the power of published work

## Common Study Designs

### 1. Independent Samples T-Test

**Use:** Compare two independent groups

**Formula:**

```
N per group = 2 * (z_α/2 + z_β)² / d²

Where:
- d = effect size (Cohen's d = (μ₁ - μ₂) / σ, with σ the pooled SD)
- α = significance level (typically 0.05)
- β = Type II error rate (1 - power)
```

Because d is already standardized by the pooled SD, the variance does not appear as a separate term.

**Example:**

```
Research Question: Does intervention improve test scores vs. control?
Effect Size: d = 0.5 (medium effect)
Alpha: 0.05
Power: 0.80

Result: N = 64 per group (128 total)
```

**Effect Size Guidelines (Cohen's d):**
- Small: d = 0.2
- Medium: d = 0.5
- Large: d = 0.8

### 2. Paired Samples T-Test

**Use:** Pre-post comparisons, matched pairs

**Formula:**

```
N = (z_α/2 + z_β)² * 2(1 - ρ) / d²

Where ρ = correlation between measures
```

**Example:**

```
Research Question: Does training improve performance (pre-post)?
Effect Size: d = 0.6
Correlation: ρ = 0.5 (moderate test-retest reliability)
Alpha: 0.05
Power: 0.80

Result: N = 24 participants
```

**Key Insight:** Higher correlation → fewer participants needed

### 3. One-Way ANOVA

**Use:** Compare 3+ independent groups

**Formula:**

```
N total = λ / f²

Where:
- k = number of groups
- f = effect size (Cohen's f)
- λ = noncentrality parameter required for the noncentral F
      distribution with df₁ = k - 1 at the chosen α and power
```

**Example:**

```
Research Question: Compare 4 treatment conditions
Effect Size: f = 0.25 (medium)
Alpha: 0.05
Power: 0.80

Result: N = 45 per group (180 total)
```

**Effect Size Guidelines (Cohen's f):**
- Small: f = 0.10
- Medium: f = 0.25
- Large: f = 0.40
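The three designs above can be reproduced in code. Below is a minimal sketch using statsmodels' power solvers; the effect sizes mirror the worked examples above, and the paired design is handled by converting d and ρ to d_z = d / √(2(1 − ρ)), per the Design 2 formula.

```python
from math import ceil

from statsmodels.stats.power import (
    FTestAnovaPower,     # one-way ANOVA
    tt_ind_solve_power,  # independent-samples t-test
    tt_solve_power,      # one-sample / paired t-test
)

# Design 1: independent t-test, d = 0.5, two-tailed alpha = .05, power = .80
n_per_group = tt_ind_solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(ceil(n_per_group))  # 64 per group (128 total)

# Design 2: paired t-test. Convert to d_z using the pre-post correlation,
# then solve as a one-sample test on the difference scores.
d, rho = 0.6, 0.5
d_z = d / (2 * (1 - rho)) ** 0.5
print(ceil(tt_solve_power(effect_size=d_z, alpha=0.05, power=0.80)))  # 24 pairs

# Design 3: one-way ANOVA, k = 4 groups, f = 0.25. solve_power returns the
# TOTAL sample size, which we round up to equal group sizes.
n_total = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.80, k_groups=4)
print(ceil(n_total))  # ≈ 179; use 180 for equal groups (45 per group)
```

These match the integer sample sizes quoted in the examples above.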
### 4. Chi-Square Test

**Use:** Association between categorical variables

**Formula:**

```
N = λ / w²

Where:
- w = effect size (Cohen's w)
- λ = noncentrality parameter required for the noncentral χ²
      distribution with the table's df at the chosen α and power
- For df = 1 this reduces to N = (z_α/2 + z_β)² / w²
```

**Example:**

```
Research Question: Is treatment success related to group (2x2 table)?
Effect Size: w = 0.3 (medium)
Alpha: 0.05
Power: 0.80
df = 1

Result: N = 88 total participants
```

**Effect Size Guidelines (Cohen's w):**
- Small: w = 0.10
- Medium: w = 0.30
- Large: w = 0.50

### 5. Correlation

**Use:** Relationship between continuous variables

**Formula:**

```
N = (z_α/2 + z_β)² / C(r)² + 3

Where C(r) = 0.5 * ln((1 + r) / (1 - r))   [Fisher's z transformation]
```

**Example:**

```
Research Question: Correlation between anxiety and performance
Expected r: 0.30
Alpha: 0.05
Power: 0.80

Result: N = 84 participants
```

**Effect Size Guidelines (Pearson's r):**
- Small: r = 0.10
- Medium: r = 0.30
- Large: r = 0.50

### 6. Multiple Regression

**Use:** Predict outcome from multiple predictors

**Formula:**

```
N = (L / f²) + k + 1

Where:
- L = tabled noncentrality parameter (depends on α, power, and k)
- f² = effect size (Cohen's f²) = R² / (1 - R²)
- k = number of predictors
```

**Example:**

```
Research Question: Predict depression from 5 variables
Effect Size: f² = 0.15 (medium)
Alpha: 0.05
Power: 0.80
Predictors: 5

Result: N = 92 participants
```

**Effect Size Guidelines (f²):**
- Small: f² = 0.02
- Medium: f² = 0.15
- Large: f² = 0.35
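Designs 4-6 can be scripted the same way. A sketch using statsmodels for the chi-square case, plus two illustrative helpers; `n_for_correlation` and `n_for_regression` are named here for demonstration only and are not library functions:

```python
from math import ceil

import numpy as np
from scipy.stats import f as f_dist
from scipy.stats import ncf, norm
from statsmodels.stats.power import GofChisquarePower

# Design 4: chi-square, w = 0.3, df = 1 (statsmodels sets df via n_bins = df + 1)
n_chi = GofChisquarePower().solve_power(effect_size=0.3, alpha=0.05,
                                        power=0.80, n_bins=2)
print(ceil(n_chi))  # 88 total

# Design 5: correlation, using the Fisher-z approximation from the formula above.
def n_for_correlation(r, alpha=0.05, power=0.80):
    c = 0.5 * np.log((1 + r) / (1 - r))            # Fisher's z of r
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)  # z_alpha/2 + z_beta
    return ceil((z / c) ** 2 + 3)

print(n_for_correlation(0.30))  # 85 (pwr.r.test's exact method gives 84)

# Design 6: multiple regression, f2 = 0.15, k = 5 predictors. Search for the
# smallest N whose noncentral-F power (lambda = f2 * N) reaches the target.
def n_for_regression(f2, k, alpha=0.05, power=0.80):
    n = k + 2
    while True:
        df2 = n - k - 1
        crit = f_dist.ppf(1 - alpha, k, df2)
        if 1 - ncf.cdf(crit, k, df2, f2 * n) >= power:
            return n
        n += 1

print(n_for_regression(0.15, 5))  # ≈ 92, matching the example above
```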
## Choosing Effect Sizes

### Method 1: Previous Literature

**Best Practice:** Use meta-analytic estimates

```
Example: Meta-analysis shows d = 0.45 for CBT vs. control
Use: d = 0.45 for power calculation
```

### Method 2: Pilot Study

```
Run a small pilot (N = 20-30)
Calculate the observed effect size
Adjust for uncertainty (e.g., power for 80% of the observed effect)
```

### Method 3: Minimum Meaningful Effect

```
Question: What's the smallest effect worth detecting?
Clinical: Minimum clinically important difference (MCID)
Practical: Cost-benefit threshold
```

### Method 4: Cohen's Conventions

**Use only when no better information is available:**
- Small effects: Often require N > 300
- Medium effects: Typically N = 50-100 per group
- Large effects: May need only N = 20-30 per group

**Warning:** Cohen's conventions are rough guidelines, not universal truths!

## NIH Rigor Standards

### Required Elements for NIH Grants

1. **Justified Effect Size**
   - Cite the source (literature, pilot, theory)
   - Explain why this effect size is reasonable
   - Consider a range of plausible values

2. **Power Calculation**
   - Show the formula or software used
   - Report all assumptions
   - Calculate the required N

3. **Sensitivity Analysis**
   - Show power across a range of effect sizes
   - Demonstrate the study is adequately powered

4. **Accounting for Attrition**

   ```
   N_recruit = N_required / (1 - expected_attrition_rate)

   Example:
   N_required = 100
   Expected attrition = 20%
   N_recruit = 100 / 0.80 = 125
   ```

### Sample Power Analysis Section (Grant)

```
Sample Size Justification

Our target final sample is 140 participants (70 per group), sufficient to
detect a medium effect (d = 0.50) with 80% power at α = 0.05 using an
independent samples t-test. This effect size is based on our pilot study
(N = 30), which showed d = 0.62, and is consistent with meta-analytic
estimates for similar interventions (Smith et al., 2023; d = 0.48, 95% CI
[0.35, 0.61]). We calculated the required sample size using G*Power 3.1,
assuming a two-tailed test.

To account for anticipated 20% attrition based on previous studies in this
population, we will recruit 175 participants (rounded to 180 for equal
groups: 90 per condition).

Sensitivity analysis shows the final sample of 70 per group provides:
- 94% power to detect d = 0.60 (medium-large effect)
- 84% power to detect d = 0.50 (medium effect)
- 65% power to detect d = 0.40 (small-to-medium effect)

This ensures adequate power while minimizing participant burden and
research costs.
```

## Tools and Software

### G*Power (Free)

- **Use:** Most widely used dedicated power analysis tool
- **Pros:** User-friendly, comprehensive test coverage
- **Cons:** Desktop only, point-and-click rather than scriptable
- **Download:** https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower

### R (pwr package)

```R
library(pwr)

# Independent t-test
pwr.t.test(d = 0.5, power = 0.80, sig.level = 0.05, type = "two.sample")

# One-way ANOVA
pwr.anova.test(k = 4, f = 0.25, power = 0.80, sig.level = 0.05)

# Correlation
pwr.r.test(r = 0.30, power = 0.80, sig.level = 0.05)
```

### Python (statsmodels)

```python
from statsmodels.stats.power import tt_ind_solve_power, ttest_power

# Power achieved by a one-sample/paired t-test with N = 64
power = ttest_power(effect_size=0.5, nobs=64, alpha=0.05)

# Required N per group for an independent-samples t-test
n = tt_ind_solve_power(effect_size=0.5, power=0.8, alpha=0.05)
```

### Other Tools

- **PASS** (commercial): https://www.ncss.com/software/pass/
- **PS** (free): https://sph.umich.edu/biostat/power-and-sample-size.html

## Common Pitfalls

### Pitfall 1: Using Post-Hoc Power

❌ **Wrong:** Calculating power after a null result using the observed effect
**Problem:** Will always show low power for null results (circular reasoning)
✅ **Right:** Report a sensitivity analysis (what effects were you powered to detect?)

### Pitfall 2: Underpowered Studies

❌ **Wrong:** "We had N = 20, but it was a pilot study"
**Problem:** Even pilots should have defensible sample sizes
✅ **Right:** Use a sequential design, specify stopping rules, or acknowledge the limitation

### Pitfall 3: Overpowered Studies

❌ **Wrong:** Recruiting thousands of participants to chase a trivial effect (d = 0.1)
**Problem:** Wastes resources and flags effects too small to matter
✅ **Right:** Power for the smallest meaningful effect, not just statistical significance

### Pitfall 4: Ignoring Multiple Comparisons

❌ **Wrong:** Power calculation for a single test while actually running 10 tests
**Problem:** Actual power is much lower once multiple-testing corrections are applied
✅ **Right:** Adjust alpha or specify primary vs. secondary outcomes

### Pitfall 5: Wrong Test Type

❌ **Wrong:** Power for an independent t-test when the design is actually paired
**Problem:** Wrong sample size (paired designs often need fewer participants)
✅ **Right:** Match the power calculation to the actual analysis plan

## Reporting Power Analysis

### In Pre-Registration

```
Sample Size Determination:
- Statistical test: Independent samples t-test (two-tailed)
- Effect size: d = 0.50 (based on [citation])
- Alpha: 0.05
- Power: 0.80
- Required N: 64 per group (128 total)
- Accounting for 15% attrition: 150 recruited (75 per group)
- Power analysis conducted using G*Power 3.1.9.7
```

### In Manuscript Methods

```
We determined our sample size a priori using power analysis. To detect a
medium effect (Cohen's d = 0.50) with 80% power at α = 0.05 (two-tailed),
we required 64 participants per group (G*Power 3.1). Accounting for
anticipated 15% attrition, we recruited 150 participants.
```

### In Results (for Null Findings)

```
Although we found no significant effect (p = .23), sensitivity analysis
shows our study had 80% power to detect effects of d ≥ 0.50 and 61% power
for d = 0.40. Our 95% CI for the effect size was d = -0.12 to 0.38,
excluding large effects but not ruling out small-to-medium effects.
```
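The sensitivity analyses quoted in these templates can be generated directly. A minimal sketch using statsmodels, assuming a hypothetical two-group design with 70 participants per group:

```python
from statsmodels.stats.power import tt_ind_solve_power

# Achieved power across plausible effect sizes for a hypothetical design with
# 70 participants per group (independent t-test, two-tailed alpha = .05).
# Leaving power unspecified tells solve_power to solve for it.
for d in (0.30, 0.40, 0.50, 0.60):
    achieved = tt_ind_solve_power(effect_size=d, nobs1=70, alpha=0.05)
    print(f"d = {d:.2f} -> power = {achieved:.2f}")
# Roughly: 0.42, 0.65, 0.84, 0.94
```

Reporting the full curve, rather than a single post-hoc power number, avoids the circularity described under Pitfall 1.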
```

## Integration with Research Workflow

### With Experiment-Designer Agent

```
Agent uses power-analysis skill to:
1. Calculate required sample size
2. Justify N in protocol
3. Generate power analysis section for pre-registration
4. Create sensitivity analysis plots
```

### With NIH-Validator

```
Validator checks:
- Power ≥ 80%
- Effect size justified with citation
- Attrition accounted for
- Analysis plan matches power calculation
```

### With Statistical Analysis

```
After analysis:
1. Report achieved power or sensitivity analysis
2. Calculate confidence intervals around the effect size
3. Interpret results in the context of statistical power
```

## Examples

### Example 1: RCT Power Analysis

```
Design: Randomized controlled trial, two groups
Outcome: Depression scores (continuous)
Analysis: Independent samples t-test

Literature: Meta-analysis shows d = 0.55 for CBT vs. waitlist
Conservative estimate: d = 0.50

Alpha: 0.05 (two-tailed)
Power: 0.80

Calculation (G*Power):
Input: t-test, independent samples, d = 0.5, α = 0.05, power = 0.80
Output: N = 64 per group (128 total)

With 20% attrition: 160 recruited (80 per group)
```

### Example 2: Within-Subjects Design

```
Design: Pre-post intervention
Outcome: Anxiety scores
Analysis: Paired t-test

Pilot data: d = 0.70, r = 0.60

Alpha: 0.05
Power: 0.80

Calculation (converting to d_z with the pre-post correlation):
d_z = 0.70 / √(2 × (1 - 0.60)) ≈ 0.78
N = 15 participants

With 15% attrition: 18 recruited
```

### Example 3: Factorial ANOVA

```
Design: 2x3 factorial (Treatment x Time)
Analysis: Two-way ANOVA
Effect of interest: Interaction (Treatment × Time)

Effect size: f = 0.25 (medium)
Alpha: 0.05
Power: 0.80

Calculation (noncentral F, numerator df = (2-1)(3-1) = 2, 6 cells):
N ≈ 158 total; round up to 162 for equal cells (27 per cell)

With 10% attrition: 180 recruited (30 per cell)
```

---

**Last Updated:** 2025-11-09
**Version:** 1.0.0