# Common Statistical Pitfalls

## P-Value Misinterpretations
### Pitfall 1: P-Value = Probability Hypothesis is True

**Misconception:** p = .05 means there is a 5% chance the null hypothesis is true.

**Reality:** The p-value is the probability of observing data this extreme (or more extreme) *if* the null hypothesis is true. It says nothing about the probability that the hypothesis is true.

**Correct interpretation:** "If there were truly no effect, we would observe data this extreme or more extreme only 5% of the time."

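A minimal simulation makes the distinction concrete (illustrative code, assuming NumPy and SciPy are available): when the null is true, p-values are uniformly distributed, so a small p measures how surprising the data are under H0, not how probable H0 is.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
pvals = []
for _ in range(10_000):
    a = rng.normal(0, 1, 30)  # both samples come from the same distribution,
    b = rng.normal(0, 1, 30)  # so the null hypothesis is true by construction
    pvals.append(stats.ttest_ind(a, b).pvalue)

# About 5% of p-values fall below .05 even though H0 is true in every run.
print(f"Fraction with p < .05: {np.mean(np.array(pvals) < 0.05):.3f}")
```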
### Pitfall 2: Non-Significant = No Effect

**Misconception:** p > .05 proves there's no effect.

**Reality:** Absence of evidence ≠ evidence of absence. Non-significant results may indicate:

- Insufficient statistical power
- True effect too small to detect
- High variability
- Small sample size

**Better approach:**

- Report confidence intervals
- Conduct power analysis
- Consider equivalence testing
### Pitfall 3: Significant = Important

**Misconception:** Statistical significance means practical importance.

**Reality:** With large samples, trivial effects become "significant." A statistically significant 0.1 IQ point difference is meaningless in practice.

**Better approach:**

- Report effect sizes
- Consider practical significance
- Use confidence intervals

### Pitfall 4: P = .049 vs. P = .051

**Misconception:** These are meaningfully different because one crosses the .05 threshold.

**Reality:** These represent nearly identical evidence. The .05 threshold is arbitrary.

**Better approach:**

- Treat p-values as continuous measures of evidence
- Report exact p-values
- Consider context and prior evidence

### Pitfall 5: One-Tailed Tests Without Justification

**Misconception:** One-tailed tests are free extra power.

**Reality:** One-tailed tests assume effects can only go in one direction, which is rarely true. They're often used to artificially boost significance.

**When appropriate:** Only when effects in one direction are theoretically impossible or equivalent to the null.
## Multiple Comparisons Problems

### Pitfall 6: Multiple Testing Without Correction

**Problem:** Testing 20 hypotheses at p < .05 gives a ~64% chance of at least one false positive (1 - 0.95²⁰ ≈ .64; see the sketch below).

**Examples:**

- Testing many outcomes
- Testing many subgroups
- Conducting multiple interim analyses
- Testing at multiple time points

**Solutions:**

- Bonferroni correction (divide α by the number of tests)
- False Discovery Rate (FDR) control
- Prespecify the primary outcome
- Treat exploratory analyses as hypothesis-generating

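A short sketch of the arithmetic and two standard corrections, using the `multipletests` helper from statsmodels (the p-values below are made up for illustration):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

m = 20
print(f"P(at least one false positive) = {1 - 0.95**m:.2f}")  # ~0.64

# Illustrative p-values from 8 hypothetical tests
pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.049, 0.21, 0.34, 0.62])
for method in ("bonferroni", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(method, reject.sum(), "rejections;", np.round(p_adj, 3))
```

Bonferroni controls the familywise error rate and is conservative; FDR (Benjamini-Hochberg) tolerates a controlled fraction of false discoveries and retains more power.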
### Pitfall 7: Subgroup Analysis Fishing

**Problem:** Testing many subgroups until finding significance.

**Why problematic:**

- Inflates the false positive rate
- Often reported without disclosure
- "The effect was significant in women" may be pure chance

**Solutions:**

- Prespecify subgroups
- Use interaction tests, not separate per-subgroup tests
- Require replication
- Correct for multiple comparisons
### Pitfall 8: Outcome Switching

**Problem:** Analyzing many outcomes, reporting only the significant ones.

**Detection signs:**

- Secondary outcomes emphasized
- Incomplete outcome reporting
- Discrepancy between registration and publication

**Solutions:**

- Preregister all outcomes
- Report all planned outcomes
- Distinguish primary from secondary
## Sample Size and Power Issues

### Pitfall 9: Underpowered Studies

**Problem:** Small samples have a low probability of detecting true effects.

**Consequences:**

- High false negative rate
- Significant results more likely to be false positives
- Overestimated effect sizes (when significant)

**Solutions:**

- Conduct a priori power analysis
- Aim for 80-90% power
- Consider effect size from prior research

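An a priori calculation is a one-liner with statsmodels; the inputs here (Cohen's d = 0.4, 80% power) are placeholders you would justify from prior research:

```python
from statsmodels.stats.power import TTestIndPower

# Solve for the per-group n needed to detect d = 0.4 at 80% power, two-sided α = .05
n_per_group = TTestIndPower().solve_power(effect_size=0.4, power=0.80, alpha=0.05)
print(f"Required n per group: {n_per_group:.0f}")  # ≈ 99
```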
### Pitfall 10: Post-Hoc Power Analysis

**Problem:** Calculating power after seeing the results is circular and uninformative.

**Why useless:**

- Non-significant results always have low "post-hoc power"
- It recapitulates the p-value without adding new information

**Better approach:**

- Calculate confidence intervals
- Plan replication with an adequate sample
- Conduct prospective power analysis for future studies
### Pitfall 11: Small Sample Fallacy

**Problem:** Trusting results from very small samples.

**Issues:**

- High sampling variability
- Outliers have large influence
- Assumption violations are hard to detect and matter more
- Confidence intervals very wide

**Guidelines:**

- Be skeptical of n < 30
- Check assumptions carefully
- Consider non-parametric tests
- Replicate findings
## Effect Size Misunderstandings

### Pitfall 12: Ignoring Effect Size

**Problem:** Focusing only on significance, not magnitude.

**Why problematic:**

- Significance ≠ importance
- Can't compare across studies
- Doesn't inform practical decisions

**Solutions:**

- Always report effect sizes
- Use standardized measures (Cohen's d, r, η²)
- Interpret using field conventions
- Consider the minimum clinically important difference

### Pitfall 13: Misinterpreting Standardized Effect Sizes

**Problem:** Treating Cohen's d = 0.5 as "medium" without context.

**Reality:**

- Field-specific norms vary
- Some fields have larger typical effects
- Real-world importance depends on context

**Better approach:**

- Compare to effects in the same domain
- Consider practical implications
- Look at raw effect sizes too

### Pitfall 14: Confusing Explained Variance with Importance

**Problem:** "Only explains 5% of variance" = unimportant.

**Reality:**

- Height explains only ~5% of the variation in NBA player salary, yet is crucial
- Complex phenomena have many small contributors
- Predictive accuracy ≠ causal importance

**Consideration:** Context matters more than the percentage alone.
## Correlation and Causation

### Pitfall 15: Correlation Implies Causation

**Problem:** Inferring causation from correlation.

**Alternative explanations:**

- Reverse causation (B causes A, not A causes B)
- Confounding (C causes both A and B)
- Coincidence
- Selection bias

**Criteria for causation:**

- Temporal precedence
- Covariation
- No plausible alternatives
- Ideally: experimental manipulation
### Pitfall 16: Ecological Fallacy

**Problem:** Inferring individual-level relationships from group-level data.

**Example:** Countries with higher chocolate consumption have more Nobel laureates, but that doesn't mean eating chocolate makes you win a Nobel.

**Why problematic:** Group-level correlations may not hold at the individual level.
### Pitfall 17: Simpson's Paradox

**Problem:** A trend appears in groups but reverses when they are combined (or vice versa).

**Example:** A treatment appears worse overall but better in every subgroup.

**Cause:** A confounding variable distributed differently across groups.

**Solution:** Consider confounders and look at the appropriate level of analysis.

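The classic reversal is easy to reproduce with made-up counts (illustrative numbers only; pandas assumed available):

```python
import pandas as pd

# Hypothetical counts: treatment A wins within each severity stratum...
df = pd.DataFrame({
    "severity":  ["mild", "mild", "severe", "severe"],
    "treatment": ["A", "B", "A", "B"],
    "successes": [81, 234, 192, 55],
    "patients":  [87, 270, 263, 80],
})
df["rate"] = df["successes"] / df["patients"]
print(df)  # A beats B in both the mild and the severe stratum

# ...but loses after pooling, because A was given mostly to severe cases.
pooled = df.groupby("treatment")[["successes", "patients"]].sum()
print(pooled["successes"] / pooled["patients"])
```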
## Regression and Modeling Pitfalls

### Pitfall 18: Overfitting

**Problem:** The model fits the sample data well but doesn't generalize.

**Causes:**

- Too many predictors relative to sample size
- Fitting noise rather than signal
- No cross-validation

**Solutions:**

- Use cross-validation
- Penalized regression (LASSO, ridge)
- Independent test set
- Simpler models

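A sketch of the symptom and the diagnosis with scikit-learn (synthetic noise data, so the honest R² should be near zero):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))  # 10 predictors, only 60 observations
y = rng.normal(size=60)        # the outcome is pure noise

model = DecisionTreeRegressor(random_state=0).fit(X, y)
print("Training R^2:", model.score(X, y))  # ~1.0: the tree memorized the data
print("5-fold CV R^2:", cross_val_score(model, X, y, cv=5).mean())  # ~0 or below
```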
### Pitfall 19: Extrapolation Beyond the Data Range

**Problem:** Predicting outside the range of observed data.

**Why dangerous:**

- Relationships may not hold outside the observed range
- Increased uncertainty is not reflected in the predictions

**Solution:** Only interpolate; avoid extrapolation.
### Pitfall 20: Ignoring Model Assumptions

**Problem:** Using statistical tests without checking their assumptions.

**Common violations:**

- Non-normality (for parametric tests)
- Heteroscedasticity (unequal variances)
- Non-independence
- Non-linearity
- Multicollinearity

**Solutions:**

- Check assumptions with diagnostics (see the sketch below)
- Use robust methods
- Transform data
- Use appropriate non-parametric alternatives

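A minimal diagnostic pass with SciPy before reaching for a plain t-test (illustrative data; the second group is deliberately skewed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(10, 2, 40)        # roughly normal group
b = rng.lognormal(2.3, 0.5, 40)  # skewed group with larger variance

print("Shapiro-Wilk a: p =", stats.shapiro(a).pvalue)   # normality plausible
print("Shapiro-Wilk b: p =", stats.shapiro(b).pvalue)   # typically small for skewed data
print("Levene:         p =", stats.levene(a, b).pvalue) # equal-variance check

# If assumptions fail: Welch's t-test or a rank-based alternative.
print(stats.ttest_ind(a, b, equal_var=False))
print(stats.mannwhitneyu(a, b))
```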
### Pitfall 21: Treating Non-Significant Covariates as Eliminating Confounding

**Problem:** "We controlled for X and it wasn't significant, so it's not a confounder."

**Reality:** Non-significant covariates can still be important confounders. Significance ≠ confounding.

**Solution:** Include theoretically important covariates regardless of significance.
### Pitfall 22: Collinearity Masking Effects

**Problem:** When predictors are highly correlated, true effects may appear non-significant.

**Manifestations:**

- Large standard errors
- Unstable coefficients
- Sign changes when adding/removing variables

**Detection:**

- Variance Inflation Factors (VIF)
- Correlation matrices

**Solutions:**

- Remove redundant predictors
- Combine correlated variables
- Use regularization methods

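VIFs are a one-liner per predictor in statsmodels; in this sketch `x2` is built to be nearly a copy of `x1`, so both should show very large VIFs:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# Rule of thumb: VIF above ~5-10 signals problematic collinearity.
for i, name in enumerate(["const", "x1", "x2", "x3"]):
    print(f"{name}: VIF = {variance_inflation_factor(X, i):.1f}")
```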
## Specific Test Misuses

### Pitfall 23: T-Test for Multiple Groups

**Problem:** Conducting multiple t-tests instead of ANOVA.

**Why wrong:** Inflates the Type I error rate dramatically.

**Correct approach:**

- Use ANOVA first
- Follow with planned comparisons or post-hoc tests with correction

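A sketch of that sequence, omnibus test first and corrected pairwise comparisons second (illustrative data; `pairwise_tukeyhsd` comes from statsmodels):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(3)
g1 = rng.normal(0.0, 1, 30)
g2 = rng.normal(0.2, 1, 30)
g3 = rng.normal(0.9, 1, 30)

print(stats.f_oneway(g1, g2, g3))  # omnibus test across all three groups

values = np.concatenate([g1, g2, g3])
labels = ["g1"] * 30 + ["g2"] * 30 + ["g3"] * 30
print(pairwise_tukeyhsd(values, labels))  # pairwise tests, familywise-corrected
```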
### Pitfall 24: Pearson Correlation for Non-Linear Relationships

**Problem:** Using Pearson's r for curved relationships.

**Why misleading:** r measures linear relationships only.

**Solutions:**

- Check scatterplots first
- Use Spearman's ρ for monotonic relationships
- Consider polynomial or non-linear models

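A monotonic but curved relationship shows the gap (illustrative data): Pearson's r understates the association, while Spearman's ρ, which assumes only monotonicity, stays near 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.uniform(0, 4, 200)
y = np.exp(x) + rng.normal(scale=1.0, size=200)  # curved but monotonic

r, _ = stats.pearsonr(x, y)     # attenuated by the curvature
rho, _ = stats.spearmanr(x, y)  # rank-based, so near 1 here
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```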
### Pitfall 25: Chi-Square with Small Expected Frequencies

**Problem:** Running a chi-square test with expected cell counts < 5.

**Why wrong:** Violates the test's assumptions, so p-values are inaccurate.

**Solutions:**

- Fisher's exact test
- Combine categories
- Increase sample size

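With a table this sparse, SciPy's exact test is the safer choice (illustrative counts; two of the four expected cells fall below 5):

```python
from scipy import stats

table = [[2, 8],
         [7, 3]]  # hypothetical 2x2 counts
odds_ratio, p = stats.fisher_exact(table)
print(f"Fisher's exact: OR = {odds_ratio:.2f}, p = {p:.3f}")
```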
### Pitfall 26: Paired vs. Independent Tests

**Problem:** Using an independent-samples test for paired data (or vice versa).

**Why wrong:**

- Wastes power (paired data analyzed as independent)
- Violates the independence assumption (independent data analyzed as paired)

**Solution:** Match the test to the design.

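The cost of the mismatch is easy to see in a sketch (illustrative before/after data with a small, consistent within-person gain):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
before = rng.normal(100, 15, 25)
after = before + rng.normal(2, 3, 25)  # small gain, consistent within person

# The independent test drowns the gain in between-person variability;
# the paired test works on within-person differences and detects it.
print("Independent (wrong): p =", stats.ttest_ind(before, after).pvalue)
print("Paired (correct):    p =", stats.ttest_rel(before, after).pvalue)
```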
## Confidence Interval Misinterpretations

### Pitfall 27: 95% CI = 95% Probability True Value Inside

**Misconception:** "There is a 95% chance the true value is in this interval."

**Reality:** The true value either is or isn't in this specific interval. If we repeated the study many times, 95% of the resulting intervals would contain the true value.

**Better interpretation:** "We're 95% confident this interval contains the true value," where the confidence describes the long-run reliability of the procedure, not a probability for this one interval (the simulation below illustrates this).

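A coverage simulation makes the "property of the procedure" reading concrete (illustrative code; the true mean is fixed at 10 throughout):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
true_mu, n, n_reps = 10.0, 40, 10_000
covered = 0
for _ in range(n_reps):
    sample = rng.normal(true_mu, 2, n)
    se = sample.std(ddof=1) / np.sqrt(n)
    lo, hi = stats.t.interval(0.95, df=n - 1, loc=sample.mean(), scale=se)
    covered += lo <= true_mu <= hi

# Each individual interval either contains 10 or it doesn't; it is the
# procedure that succeeds about 95% of the time.
print(f"Coverage: {covered / n_reps:.3f}")  # ~0.95
```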
### Pitfall 28: Overlapping CIs = No Difference

**Problem:** Assuming overlapping confidence intervals mean no significant difference.

**Reality:** Overlapping CIs are a less stringent criterion than a test of the difference. Two CIs can overlap while the difference between groups is significant.

**Guideline:** Whether one group's point estimate falls inside the other group's CI is more informative than whether the intervals overlap.
### Pitfall 29: Ignoring CI Width

**Problem:** Focusing only on whether the CI includes zero, not on precision.

**Why important:** Wide CIs indicate high uncertainty. "Significant" effects with huge CIs are less convincing.

**Consider:** Both significance and precision.
## Bayesian vs. Frequentist Confusions

### Pitfall 30: Mixing Bayesian and Frequentist Interpretations

**Problem:** Making Bayesian statements from frequentist analyses.

**Examples:**

- "Probability the hypothesis is true" (Bayesian) from a p-value (frequentist)
- "Evidence for the null" from a non-significant result (standard significance tests can't quantify support for the null)

**Solution:**

- Be clear about the framework
- Use Bayesian methods for Bayesian questions
- Use Bayes factors to compare hypotheses
### Pitfall 31: Ignoring Prior Probability

**Problem:** Treating all hypotheses as equally likely at the outset.

**Reality:** Extraordinary claims need extraordinary evidence. Prior plausibility matters.

**Consider:**

- Plausibility given existing knowledge
- Mechanism plausibility
- Base rates
## Data Transformation Issues

### Pitfall 32: Dichotomizing Continuous Variables

**Problem:** Splitting continuous variables at arbitrary cutoffs.

**Consequences:**

- Loss of information and power
- Arbitrary distinctions
- Discarding individual differences

**Exceptions:** Clinically meaningful cutoffs with strong justification.

**Better:** Keep variables continuous or use multiple categories.
### Pitfall 33: Trying Multiple Transformations

**Problem:** Testing many transformations until finding significance.

**Why problematic:** Inflates Type I error; it is a form of p-hacking.

**Better approach:**

- Prespecify transformations
- Use theory-driven transformations
- Correct for multiple testing if exploring
## Missing Data Problems

### Pitfall 34: Listwise Deletion by Default

**Problem:** Automatically deleting all cases with any missing data.

**Consequences:**

- Reduced power
- Potential bias if data are not missing completely at random (MCAR)

**Better approaches:**

- Multiple imputation
- Maximum likelihood methods
- Analyze missingness patterns

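A sketch of model-based imputation with scikit-learn's `IterativeImputer`. Note this produces a single imputed dataset; full multiple imputation would repeat the posterior draw several times and pool the analyses across the imputed datasets.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))
X[:, 2] += X[:, 0]  # correlated columns make model-based imputation useful
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan  # ~10% missing at random

# sample_posterior=True draws imputations rather than plugging in point estimates
imputer = IterativeImputer(sample_posterior=True, random_state=0)
X_imputed = imputer.fit_transform(X_missing)
print("Remaining NaNs:", np.isnan(X_imputed).sum())  # 0
```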
### Pitfall 35: Ignoring Missing Data Mechanisms

**Problem:** Not considering why data are missing.

**Types:**

- MCAR (Missing Completely at Random): deletion is unbiased, though it costs power
- MAR (Missing at Random): imputation using observed variables works
- MNAR (Missing Not at Random): results may be biased regardless of method

**Solution:** Analyze missingness patterns, use appropriate methods, and run sensitivity analyses.
## Publication and Reporting Issues

### Pitfall 36: Selective Reporting

**Problem:** Only reporting significant results or favorable analyses.

**Consequences:**

- The literature appears more consistent than reality
- Meta-analyses are biased
- Wasted research effort

**Solutions:**

- Preregistration
- Report all analyses
- Use reporting guidelines (CONSORT, PRISMA, etc.)
### Pitfall 37: Rounding to p < .05

**Problem:** Reporting p-values selectively near the threshold (e.g., reporting p = .049 exactly but rounding p = .051 down to "p = .05" or even "p < .05").

**Why problematic:** It obscures values near the threshold and makes p-hacking harder to detect.

**Better:** Always report exact p-values.
### Pitfall 38: No Data Sharing

**Problem:** Not making data available for verification or reanalysis.

**Consequences:**

- Results can't be verified
- Studies can't be included in meta-analyses
- Hinders scientific progress

**Best practice:** Share data unless privacy concerns prohibit it.
## Cross-Validation and Generalization

### Pitfall 39: No Cross-Validation

**Problem:** Testing a model on the same data used to build it.

**Consequence:** Overly optimistic performance estimates.

**Solutions:**

- Split data (train/test)
- K-fold cross-validation
- Independent validation sample
### Pitfall 40: Data Leakage

**Problem:** Information from the test set leaking into training.

**Examples:**

- Normalizing before splitting
- Feature selection on the full dataset
- Using information from the future to predict the past (temporal leakage)

**Consequence:** Inflated performance metrics.

**Prevention:** Make all preprocessing decisions using only the training data (see the sketch below).

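With scikit-learn, putting the preprocessing inside a `Pipeline` guarantees it is re-fit on the training folds only during cross-validation (synthetic data for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# WRONG: StandardScaler().fit_transform(X) before cross-validation lets the
# test folds influence the scaling parameters.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
print(cross_val_score(pipe, X, y, cv=5).mean())  # scaler refit inside each fold
```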
## Meta-Analysis Pitfalls

### Pitfall 41: Apples and Oranges

**Problem:** Combining studies with different designs, populations, or measures.

**Balance:** You need homogeneity but also comprehensiveness.

**Solutions:**

- Clear inclusion criteria
- Subgroup analyses
- Meta-regression for moderators
### Pitfall 42: Ignoring Publication Bias

**Problem:** Published studies overrepresent significant results.

**Consequence:** Overestimated effects in meta-analyses.

**Detection:**

- Funnel plots
- Trim-and-fill
- PET-PEESE
- P-curve analysis

**Solutions:**

- Include unpublished studies
- Register reviews
- Use bias-correction methods
## General Best Practices

1. **Preregister studies** - Distinguish confirmatory from exploratory
2. **Report transparently** - All analyses, not just significant ones
3. **Check assumptions** - Don't blindly apply tests
4. **Use appropriate tests** - Match test to data and design
5. **Report effect sizes** - Not just p-values
6. **Consider practical significance** - Not just statistical
7. **Replicate findings** - One study is rarely definitive
8. **Share data and code** - Enable verification
9. **Use confidence intervals** - Show uncertainty
10. **Reason about causation carefully** - Most research is correlational