370 lines
11 KiB
Markdown
370 lines
11 KiB
Markdown
# Statistical Assumptions and Diagnostic Procedures
|
||
|
||
This document provides comprehensive guidance on checking and validating statistical assumptions for various analyses.
|
||
|
||
## General Principles
|
||
|
||
1. **Always check assumptions before interpreting test results**
|
||
2. **Use multiple diagnostic methods** (visual + formal tests)
|
||
3. **Consider robustness**: Some tests are robust to violations under certain conditions
|
||
4. **Document all assumption checks** in analysis reports
|
||
5. **Report violations and remedial actions taken**
|
||
|
||
## Common Assumptions Across Tests
|
||
|
||
### 1. Independence of Observations
|
||
|
||
**What it means**: Each observation is independent; measurements on one subject do not influence measurements on another.
|
||
|
||
**How to check**:
|
||
- Review study design and data collection procedures
|
||
- For time series: Check autocorrelation (ACF/PACF plots, Durbin-Watson test)
|
||
- For clustered data: Consider intraclass correlation (ICC)
|
||
|
||
**What to do if violated**:
|
||
- Use mixed-effects models for clustered/hierarchical data
|
||
- Use time series methods for temporally dependent data
|
||
- Use generalized estimating equations (GEE) for correlated data
|
||
|
||
**Critical severity**: HIGH - violations can severely inflate Type I error
|
||
|
||
---
|
||
|
||
### 2. Normality
|
||
|
||
**What it means**: Data or residuals follow a normal (Gaussian) distribution.
|
||
|
||
**When required**:
|
||
- t-tests (for small samples; robust for n > 30 per group)
|
||
- ANOVA (for small samples; robust for n > 30 per group)
|
||
- Linear regression (for residuals)
|
||
- Some correlation tests (Pearson)
|
||
|
||
**How to check**:
|
||
|
||
**Visual methods** (primary):
|
||
- Q-Q (quantile-quantile) plot: Points should fall on diagonal line
|
||
- Histogram with normal curve overlay
|
||
- Kernel density plot
|
||
|
||
**Formal tests** (secondary):
|
||
- Shapiro-Wilk test (recommended for n < 50)
|
||
- Kolmogorov-Smirnov test
|
||
- Anderson-Darling test
|
||
|
||
**Python implementation**:
|
||
```python
|
||
from scipy import stats
|
||
import matplotlib.pyplot as plt
|
||
|
||
# Shapiro-Wilk test
|
||
statistic, p_value = stats.shapiro(data)
|
||
|
||
# Q-Q plot
|
||
stats.probplot(data, dist="norm", plot=plt)
|
||
```
|
||
|
||
**Interpretation guidance**:
|
||
- For n < 30: Both visual and formal tests important
|
||
- For 30 ≤ n < 100: Visual inspection primary, formal tests secondary
|
||
- For n ≥ 100: Formal tests overly sensitive; rely on visual inspection
|
||
- Look for severe skewness, outliers, or bimodality
|
||
|
||
**What to do if violated**:
|
||
- **Mild violations** (slight skewness): Proceed if n > 30 per group
|
||
- **Moderate violations**: Use non-parametric alternatives (Mann-Whitney, Kruskal-Wallis, Wilcoxon)
|
||
- **Severe violations**:
|
||
- Transform data (log, square root, Box-Cox)
|
||
- Use non-parametric methods
|
||
- Use robust regression methods
|
||
- Consider bootstrapping
|
||
|
||
**Critical severity**: MEDIUM - parametric tests are often robust to mild violations with adequate sample size
|
||
|
||
---
|
||
|
||
### 3. Homogeneity of Variance (Homoscedasticity)
|
||
|
||
**What it means**: Variances are equal across groups or across the range of predictors.
|
||
|
||
**When required**:
|
||
- Independent samples t-test
|
||
- ANOVA
|
||
- Linear regression (constant variance of residuals)
|
||
|
||
**How to check**:
|
||
|
||
**Visual methods** (primary):
|
||
- Box plots by group (for t-test/ANOVA)
|
||
- Residuals vs. fitted values plot (for regression) - should show random scatter
|
||
- Scale-location plot (square root of standardized residuals vs. fitted)
|
||
|
||
**Formal tests** (secondary):
|
||
- Levene's test (robust to non-normality)
|
||
- Bartlett's test (sensitive to non-normality, not recommended)
|
||
- Brown-Forsythe test (median-based version of Levene's)
|
||
- Breusch-Pagan test (for regression)
|
||
|
||
**Python implementation**:
|
||
```python
|
||
from scipy import stats
|
||
import pingouin as pg
|
||
|
||
# Levene's test
|
||
statistic, p_value = stats.levene(group1, group2, group3)
|
||
|
||
# For regression
|
||
# Breusch-Pagan test
|
||
from statsmodels.stats.diagnostic import het_breuschpagan
|
||
_, p_value, _, _ = het_breuschpagan(residuals, exog)
|
||
```
|
||
|
||
**Interpretation guidance**:
|
||
- Variance ratio (max/min) < 2-3: Generally acceptable
|
||
- For ANOVA: Test is robust if groups have equal sizes
|
||
- For regression: Look for funnel patterns in residual plots
|
||
|
||
**What to do if violated**:
|
||
- **t-test**: Use Welch's t-test (does not assume equal variances)
|
||
- **ANOVA**: Use Welch's ANOVA or Brown-Forsythe ANOVA
|
||
- **Regression**:
|
||
- Transform dependent variable (log, square root)
|
||
- Use weighted least squares (WLS)
|
||
- Use robust standard errors (HC3)
|
||
- Use generalized linear models (GLM) with appropriate variance function
|
||
|
||
**Critical severity**: MEDIUM - tests can be robust with equal sample sizes
|
||
|
||
---
|
||
|
||
## Test-Specific Assumptions
|
||
|
||
### T-Tests
|
||
|
||
**Assumptions**:
|
||
1. Independence of observations
|
||
2. Normality (each group for independent t-test; differences for paired t-test)
|
||
3. Homogeneity of variance (independent t-test only)
|
||
|
||
**Diagnostic workflow**:
|
||
```python
|
||
import scipy.stats as stats
|
||
import pingouin as pg
|
||
|
||
# Check normality for each group
|
||
stats.shapiro(group1)
|
||
stats.shapiro(group2)
|
||
|
||
# Check homogeneity of variance
|
||
stats.levene(group1, group2)
|
||
|
||
# If assumptions violated:
|
||
# Option 1: Welch's t-test (unequal variances)
|
||
pg.ttest(group1, group2, correction=False) # Welch's
|
||
|
||
# Option 2: Non-parametric alternative
|
||
pg.mwu(group1, group2) # Mann-Whitney U
|
||
```
|
||
|
||
---
|
||
|
||
### ANOVA
|
||
|
||
**Assumptions**:
|
||
1. Independence of observations within and between groups
|
||
2. Normality in each group
|
||
3. Homogeneity of variance across groups
|
||
|
||
**Additional considerations**:
|
||
- For repeated measures ANOVA: Sphericity assumption (Mauchly's test)
|
||
|
||
**Diagnostic workflow**:
|
||
```python
|
||
import pingouin as pg
|
||
|
||
# Check normality per group
|
||
for group in df['group'].unique():
|
||
data = df[df['group'] == group]['value']
|
||
stats.shapiro(data)
|
||
|
||
# Check homogeneity of variance
|
||
pg.homoscedasticity(df, dv='value', group='group')
|
||
|
||
# For repeated measures: Check sphericity
|
||
# Automatically tested in pingouin's rm_anova
|
||
```
|
||
|
||
**What to do if sphericity violated** (repeated measures):
|
||
- Greenhouse-Geisser correction (ε < 0.75)
|
||
- Huynh-Feldt correction (ε > 0.75)
|
||
- Use multivariate approach (MANOVA)
|
||
|
||
---
|
||
|
||
### Linear Regression
|
||
|
||
**Assumptions**:
|
||
1. **Linearity**: Relationship between X and Y is linear
|
||
2. **Independence**: Residuals are independent
|
||
3. **Homoscedasticity**: Constant variance of residuals
|
||
4. **Normality**: Residuals are normally distributed
|
||
5. **No multicollinearity**: Predictors are not highly correlated (multiple regression)
|
||
|
||
**Diagnostic workflow**:
|
||
|
||
**1. Linearity**:
|
||
```python
|
||
import matplotlib.pyplot as plt
|
||
import seaborn as sns
|
||
|
||
# Scatter plots of Y vs each X
|
||
# Residuals vs. fitted values (should be randomly scattered)
|
||
plt.scatter(fitted_values, residuals)
|
||
plt.axhline(y=0, color='r', linestyle='--')
|
||
```
|
||
|
||
**2. Independence**:
|
||
```python
|
||
from statsmodels.stats.stattools import durbin_watson
|
||
|
||
# Durbin-Watson test (for time series)
|
||
dw_statistic = durbin_watson(residuals)
|
||
# Values between 1.5-2.5 suggest independence
|
||
```
|
||
|
||
**3. Homoscedasticity**:
|
||
```python
|
||
# Breusch-Pagan test
|
||
from statsmodels.stats.diagnostic import het_breuschpagan
|
||
_, p_value, _, _ = het_breuschpagan(residuals, exog)
|
||
|
||
# Visual: Scale-location plot
|
||
plt.scatter(fitted_values, np.sqrt(np.abs(std_residuals)))
|
||
```
|
||
|
||
**4. Normality of residuals**:
|
||
```python
|
||
# Q-Q plot of residuals
|
||
stats.probplot(residuals, dist="norm", plot=plt)
|
||
|
||
# Shapiro-Wilk test
|
||
stats.shapiro(residuals)
|
||
```
|
||
|
||
**5. Multicollinearity**:
|
||
```python
|
||
from statsmodels.stats.outliers_influence import variance_inflation_factor
|
||
|
||
# Calculate VIF for each predictor
|
||
vif_data = pd.DataFrame()
|
||
vif_data["feature"] = X.columns
|
||
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(len(X.columns))]
|
||
|
||
# VIF > 10 indicates severe multicollinearity
|
||
# VIF > 5 indicates moderate multicollinearity
|
||
```
|
||
|
||
**What to do if violated**:
|
||
- **Non-linearity**: Add polynomial terms, use GAM, or transform variables
|
||
- **Heteroscedasticity**: Transform Y, use WLS, use robust SE
|
||
- **Non-normal residuals**: Transform Y, use robust methods, check for outliers
|
||
- **Multicollinearity**: Remove correlated predictors, use PCA, ridge regression
|
||
|
||
---
|
||
|
||
### Logistic Regression
|
||
|
||
**Assumptions**:
|
||
1. **Independence**: Observations are independent
|
||
2. **Linearity**: Linear relationship between log-odds and continuous predictors
|
||
3. **No perfect multicollinearity**: Predictors not perfectly correlated
|
||
4. **Large sample size**: At least 10-20 events per predictor
|
||
|
||
**Diagnostic workflow**:
|
||
|
||
**1. Linearity of logit**:
|
||
```python
|
||
# Box-Tidwell test: Add interaction with log of continuous predictor
|
||
# If interaction is significant, linearity violated
|
||
```
|
||
|
||
**2. Multicollinearity**:
|
||
```python
|
||
# Use VIF as in linear regression
|
||
```
|
||
|
||
**3. Influential observations**:
|
||
```python
|
||
# Cook's distance, DFBetas, leverage
|
||
from statsmodels.stats.outliers_influence import OLSInfluence
|
||
|
||
influence = OLSInfluence(model)
|
||
cooks_d = influence.cooks_distance
|
||
```
|
||
|
||
**4. Model fit**:
|
||
```python
|
||
# Hosmer-Lemeshow test
|
||
# Pseudo R-squared
|
||
# Classification metrics (accuracy, AUC-ROC)
|
||
```
|
||
|
||
---
|
||
|
||
## Outlier Detection
|
||
|
||
**Methods**:
|
||
1. **Visual**: Box plots, scatter plots
|
||
2. **Statistical**:
|
||
- Z-scores: |z| > 3 suggests outlier
|
||
- IQR method: Values < Q1 - 1.5×IQR or > Q3 + 1.5×IQR
|
||
- Modified Z-score using median absolute deviation (robust to outliers)
|
||
|
||
**For regression**:
|
||
- **Leverage**: High leverage points (hat values)
|
||
- **Influence**: Cook's distance > 4/n suggests influential point
|
||
- **Outliers**: Studentized residuals > ±3
|
||
|
||
**What to do**:
|
||
1. Investigate data entry errors
|
||
2. Consider if outliers are valid observations
|
||
3. Report sensitivity analysis (results with and without outliers)
|
||
4. Use robust methods if outliers are legitimate
|
||
|
||
---
|
||
|
||
## Sample Size Considerations
|
||
|
||
### Minimum Sample Sizes (Rules of Thumb)
|
||
|
||
- **T-test**: n ≥ 30 per group for robustness to non-normality
|
||
- **ANOVA**: n ≥ 30 per group
|
||
- **Correlation**: n ≥ 30 for adequate power
|
||
- **Simple regression**: n ≥ 50
|
||
- **Multiple regression**: n ≥ 10-20 per predictor (minimum 10 + k predictors)
|
||
- **Logistic regression**: n ≥ 10-20 events per predictor
|
||
|
||
### Small Sample Considerations
|
||
|
||
For small samples:
|
||
- Assumptions become more critical
|
||
- Use exact tests when available (Fisher's exact, exact logistic regression)
|
||
- Consider non-parametric alternatives
|
||
- Use permutation tests or bootstrap methods
|
||
- Be conservative with interpretation
|
||
|
||
---
|
||
|
||
## Reporting Assumption Checks
|
||
|
||
When reporting analyses, include:
|
||
|
||
1. **Statement of assumptions checked**: List all assumptions tested
|
||
2. **Methods used**: Describe visual and formal tests employed
|
||
3. **Results of diagnostic tests**: Report test statistics and p-values
|
||
4. **Assessment**: State whether assumptions were met or violated
|
||
5. **Actions taken**: If violated, describe remedial actions (transformations, alternative tests, robust methods)
|
||
|
||
**Example reporting statement**:
|
||
> "Normality was assessed using Shapiro-Wilk tests and Q-Q plots. Data for Group A (W = 0.97, p = .18) and Group B (W = 0.96, p = .12) showed no significant departure from normality. Homogeneity of variance was assessed using Levene's test, which was non-significant (F(1, 58) = 1.23, p = .27), indicating equal variances across groups. Therefore, assumptions for the independent samples t-test were satisfied."
|