# Statistical Tests and Diagnostics Reference

This document provides comprehensive guidance on statistical tests, diagnostics, and tools available in statsmodels.

## Overview

Statsmodels provides extensive statistical testing capabilities:
- Residual diagnostics and specification tests
- Hypothesis testing (parametric and non-parametric)
- Goodness-of-fit tests
- Multiple comparisons and post-hoc tests
- Power and sample size calculations
- Robust covariance matrices
- Influence and outlier detection
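
A minimal end-to-end sketch of how these pieces fit together (simulated data, purely for illustration): fit an OLS model, then pass its residuals to a few of the diagnostics covered in the sections below.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson, jarque_bera

# Simulated data purely for illustration
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=200)

results = sm.OLS(y, X).fit()
residuals = results.resid

# A few of the diagnostics described below
print("Durbin-Watson:", durbin_watson(residuals))
print("Jarque-Bera p-value:", jarque_bera(residuals)[1])
print("Breusch-Pagan p-value:", het_breuschpagan(residuals, results.model.exog)[1])
```
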
## Residual Diagnostics

### Autocorrelation Tests

**Ljung-Box Test**: Tests for autocorrelation in residuals

```python
from statsmodels.stats.diagnostic import acorr_ljungbox

# Test residuals for autocorrelation
lb_test = acorr_ljungbox(residuals, lags=10, return_df=True)
print(lb_test)

# H0: No autocorrelation up to lag k
# If p-value < 0.05, reject H0 (autocorrelation present)
```

**Durbin-Watson Test**: Tests for first-order autocorrelation

```python
from statsmodels.stats.stattools import durbin_watson

dw_stat = durbin_watson(residuals)
print(f"Durbin-Watson: {dw_stat:.4f}")

# DW ≈ 2: no autocorrelation
# DW < 2: positive autocorrelation
# DW > 2: negative autocorrelation
# Exact critical values depend on n and k
```

**Breusch-Godfrey Test**: A more general test for higher-order autocorrelation; unlike Durbin-Watson, it remains valid with lagged dependent variables

```python
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

bg_test = acorr_breusch_godfrey(results, nlags=5)
lm_stat, lm_pval, f_stat, f_pval = bg_test

print(f"LM statistic: {lm_stat:.4f}, p-value: {lm_pval:.4f}")
# H0: No autocorrelation up to lag k
```
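
As a visual complement to these tests, plotting the residual autocorrelation function often makes any remaining lag structure obvious. A minimal sketch, assuming `residuals` is the same residual series used above:

```python
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt

# ACF of residuals with approximate confidence bands
fig = plot_acf(residuals, lags=20)
plt.show()

# Spikes outside the shaded band suggest autocorrelation at that lag
```
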
### Heteroskedasticity Tests

**Breusch-Pagan Test**: Tests for heteroskedasticity

```python
from statsmodels.stats.diagnostic import het_breuschpagan

bp_test = het_breuschpagan(residuals, exog)
lm_stat, lm_pval, f_stat, f_pval = bp_test

print(f"Breusch-Pagan test p-value: {lm_pval:.4f}")
# H0: Homoskedasticity (constant variance)
# If p-value < 0.05, reject H0 (heteroskedasticity present)
```

**White Test**: More general test for heteroskedasticity

```python
from statsmodels.stats.diagnostic import het_white

white_test = het_white(residuals, exog)
lm_stat, lm_pval, f_stat, f_pval = white_test

print(f"White test p-value: {lm_pval:.4f}")
# H0: Homoskedasticity
```

**ARCH Test**: Tests for autoregressive conditional heteroskedasticity

```python
from statsmodels.stats.diagnostic import het_arch

arch_test = het_arch(residuals, nlags=5)
lm_stat, lm_pval, f_stat, f_pval = arch_test

print(f"ARCH test p-value: {lm_pval:.4f}")
# H0: No ARCH effects
# If significant, consider a GARCH model
```

### Normality Tests

**Jarque-Bera Test**: Tests for normality using skewness and kurtosis

```python
from statsmodels.stats.stattools import jarque_bera

jb_stat, jb_pval, skew, kurtosis = jarque_bera(residuals)

print(f"Jarque-Bera statistic: {jb_stat:.4f}")
print(f"p-value: {jb_pval:.4f}")
print(f"Skewness: {skew:.4f}")
print(f"Kurtosis: {kurtosis:.4f}")

# H0: Residuals are normally distributed
# Normal: skewness ≈ 0, kurtosis ≈ 3
```

**Omnibus Test**: Another normality test (also based on skewness/kurtosis)

```python
from statsmodels.stats.stattools import omni_normtest

omni_stat, omni_pval = omni_normtest(residuals)
print(f"Omnibus test p-value: {omni_pval:.4f}")
# H0: Normality
```

**Anderson-Darling Test**: Tests for normality using the empirical distribution function

```python
from statsmodels.stats.diagnostic import normal_ad

ad_stat, ad_pval = normal_ad(residuals)
print(f"Anderson-Darling test p-value: {ad_pval:.4f}")
```

**Lilliefors Test**: Modified Kolmogorov-Smirnov test

```python
from statsmodels.stats.diagnostic import lilliefors

lf_stat, lf_pval = lilliefors(residuals, dist='norm')
print(f"Lilliefors test p-value: {lf_pval:.4f}")
```
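
A Q-Q plot is a useful visual companion to these formal tests. A minimal sketch using statsmodels' `qqplot`, again assuming `residuals` holds the model residuals:

```python
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Quantile-quantile plot of standardized residuals against the normal distribution
fig = sm.qqplot(residuals, line='45', fit=True)
plt.show()

# Points close to the 45-degree line indicate approximate normality
```
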
### Linearity and Specification Tests

**Ramsey RESET Test**: Tests for functional form misspecification

```python
from statsmodels.stats.diagnostic import linear_reset

# linear_reset returns a test-results object rather than a tuple
reset_test = linear_reset(results, power=2, use_f=True)

print(f"RESET test p-value: {float(reset_test.pvalue):.4f}")
# H0: Model is correctly specified (linear)
# If rejected, may need polynomial terms or transformations
```

**Harvey-Collier Test**: Tests for linearity

```python
from statsmodels.stats.diagnostic import linear_harvey_collier

hc_stat, hc_pval = linear_harvey_collier(results)
print(f"Harvey-Collier test p-value: {hc_pval:.4f}")
# H0: Linear specification is correct
```

## Multicollinearity Detection

**Variance Inflation Factor (VIF)**:

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor
import pandas as pd

# Calculate VIF for each variable
vif_data = pd.DataFrame()
vif_data["Variable"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i)
                   for i in range(X.shape[1])]

print(vif_data.sort_values('VIF', ascending=False))

# Interpretation:
# VIF = 1: No correlation with other predictors
# VIF > 5: Moderate multicollinearity
# VIF > 10: Serious multicollinearity problem
# VIF > 20: Severe multicollinearity (consider removing variable)
```

**Condition Number**: From regression results

```python
print(f"Condition number: {results.condition_number:.2f}")

# Interpretation:
# < 10: No multicollinearity concern
# 10-30: Moderate multicollinearity
# > 30: Strong multicollinearity
# > 100: Severe multicollinearity
```

## Influence and Outlier Detection

### Leverage

High leverage points have extreme predictor values.

```python
import numpy as np
from statsmodels.stats.outliers_influence import OLSInfluence

influence = results.get_influence()

# Hat values (leverage)
leverage = influence.hat_matrix_diag

# Rule of thumb: leverage > 2*p/n or 3*p/n is high
# p = number of parameters, n = sample size
threshold = 2 * len(results.params) / len(y)
high_leverage = np.where(leverage > threshold)[0]

print(f"High leverage observations: {high_leverage}")
```

### Cook's Distance

Measures overall influence of each observation.

```python
# Cook's distance
cooks_d = influence.cooks_distance[0]

# Rule of thumb: Cook's D > 4/n is influential
threshold = 4 / len(y)
influential = np.where(cooks_d > threshold)[0]

print(f"Influential observations (Cook's D): {influential}")

# Plot
import matplotlib.pyplot as plt
plt.stem(range(len(cooks_d)), cooks_d)
plt.axhline(y=threshold, color='r', linestyle='--', label='Threshold (4/n)')
plt.xlabel('Observation')
plt.ylabel("Cook's Distance")
plt.legend()
plt.show()
```

### DFFITS

Measures each observation's influence on its own fitted value.

```python
# DFFITS
dffits = influence.dffits[0]

# Rule of thumb: |DFFITS| > 2*sqrt(p/n) is influential
p = len(results.params)
n = len(y)
threshold = 2 * np.sqrt(p / n)

influential_dffits = np.where(np.abs(dffits) > threshold)[0]
print(f"Influential observations (DFFITS): {influential_dffits}")
```

### DFBETAs

Measures each observation's influence on each estimated coefficient.

```python
# DFBETAs (one for each parameter)
dfbetas = influence.dfbetas

# Rule of thumb: |DFBETA| > 2/sqrt(n)
threshold = 2 / np.sqrt(n)

for i, param_name in enumerate(results.params.index):
    influential = np.where(np.abs(dfbetas[:, i]) > threshold)[0]
    if len(influential) > 0:
        print(f"Influential for {param_name}: {influential}")
```

### Influence Plot

```python
from statsmodels.graphics.regressionplots import influence_plot

fig, ax = plt.subplots(figsize=(12, 8))
influence_plot(results, ax=ax, criterion='cooks')
plt.show()

# Combines leverage, residuals, and Cook's distance
# Large bubbles = high Cook's distance
# Far from x=0 = high leverage
# Far from y=0 = large residual
```

### Studentized Residuals

```python
# Internally studentized residuals
student_resid = influence.resid_studentized_internal

# Externally studentized residuals (preferred for outlier detection)
student_resid_external = influence.resid_studentized_external

# Outliers: |studentized residual| > 3 (or > 2.5)
outliers = np.where(np.abs(student_resid_external) > 3)[0]
print(f"Outliers: {outliers}")
```

## Hypothesis Testing

### t-tests

**One-sample t-test**: Test if mean equals specific value

```python
from scipy import stats

# H0: population mean = mu_0
t_stat, p_value = stats.ttest_1samp(data, popmean=mu_0)

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```

**Two-sample t-test**: Compare means of two groups

```python
# H0: mean1 = mean2 (equal variances)
t_stat, p_value = stats.ttest_ind(group1, group2)

# Welch's t-test (unequal variances)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```

**Paired t-test**: Compare paired observations

```python
# H0: mean difference = 0
t_stat, p_value = stats.ttest_rel(before, after)

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```

### Proportion Tests

**One-proportion test**:

```python
from statsmodels.stats.proportion import proportions_ztest

# H0: proportion = p0
count = 45   # successes
nobs = 100   # total observations
p0 = 0.5     # hypothesized proportion

z_stat, p_value = proportions_ztest(count, nobs, value=p0)

print(f"z-statistic: {z_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
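
To report an interval estimate alongside the test (see Best Practices below), statsmodels provides `proportion_confint`. A short sketch reusing `count` and `nobs` from the example above:

```python
from statsmodels.stats.proportion import proportion_confint

# 95% confidence interval for the sample proportion (Wilson method)
ci_low, ci_high = proportion_confint(count, nobs, alpha=0.05, method='wilson')
print(f"95% CI for proportion: ({ci_low:.3f}, {ci_high:.3f})")
```
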
**Two-proportion test**:

```python
# H0: proportion1 = proportion2
counts = [45, 60]
nobs = [100, 120]

z_stat, p_value = proportions_ztest(counts, nobs)
print(f"z-statistic: {z_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```

### Chi-square Tests

**Chi-square test of independence**:

```python
from scipy.stats import chi2_contingency

# Contingency table
contingency_table = pd.crosstab(variable1, variable2)

chi2, p_value, dof, expected = chi2_contingency(contingency_table)

print(f"Chi-square statistic: {chi2:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"Degrees of freedom: {dof}")

# H0: Variables are independent
```

**Chi-square goodness-of-fit**:

```python
from scipy.stats import chisquare

# Observed frequencies
observed = [20, 30, 25, 25]

# Expected frequencies (if omitted, scipy assumes equal frequencies)
expected = [25, 25, 25, 25]

chi2, p_value = chisquare(observed, expected)

print(f"Chi-square statistic: {chi2:.4f}")
print(f"p-value: {p_value:.4f}")

# H0: Data follow the expected distribution
```

### Non-parametric Tests

**Mann-Whitney U test** (independent samples):

```python
from scipy.stats import mannwhitneyu

# H0: Distributions are equal
u_stat, p_value = mannwhitneyu(group1, group2, alternative='two-sided')

print(f"U statistic: {u_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```

**Wilcoxon signed-rank test** (paired samples):

```python
from scipy.stats import wilcoxon

# H0: Median difference = 0
w_stat, p_value = wilcoxon(before, after)

print(f"W statistic: {w_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```

**Kruskal-Wallis H test** (>2 groups):

```python
from scipy.stats import kruskal

# H0: All groups have same distribution
h_stat, p_value = kruskal(group1, group2, group3)

print(f"H statistic: {h_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```

**Sign test**:

```python
from statsmodels.stats.descriptivestats import sign_test

# H0: Median = m0
result = sign_test(data, m0=0)
print(result)
```

### ANOVA

**One-way ANOVA**:

```python
from scipy.stats import f_oneway

# H0: All group means are equal
f_stat, p_value = f_oneway(group1, group2, group3)

print(f"F-statistic: {f_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```

**Two-way ANOVA** (with statsmodels):

```python
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Fit model
model = ols('response ~ C(factor1) + C(factor2) + C(factor1):C(factor2)',
            data=df).fit()

# ANOVA table
anova_table = anova_lm(model, typ=2)
print(anova_table)
```

**Repeated measures ANOVA**:

```python
from statsmodels.stats.anova import AnovaRM

# Requires long-format data
aovrm = AnovaRM(df, depvar='score', subject='subject_id', within=['time'])
results = aovrm.fit()

print(results.summary())
```

## Multiple Comparisons

### Post-hoc Tests

**Tukey's HSD** (Honest Significant Difference):

```python
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Perform Tukey HSD test
tukey = pairwise_tukeyhsd(data, groups, alpha=0.05)

print(tukey.summary())

# Plot confidence intervals
tukey.plot_simultaneous()
plt.show()
```

**Bonferroni correction**:

```python
from statsmodels.stats.multitest import multipletests

# P-values from multiple tests
p_values = [0.01, 0.03, 0.04, 0.15, 0.001]

# Apply correction
reject, pvals_corrected, alphac_sidak, alphac_bonf = multipletests(
    p_values,
    alpha=0.05,
    method='bonferroni'
)

print("Rejected:", reject)
print("Corrected p-values:", pvals_corrected)
```

**False Discovery Rate (FDR)**:

```python
# FDR correction (less conservative than Bonferroni)
reject, pvals_corrected, alphac_sidak, alphac_bonf = multipletests(
    p_values,
    alpha=0.05,
    method='fdr_bh'  # Benjamini-Hochberg
)

print("Rejected:", reject)
print("Corrected p-values:", pvals_corrected)
```

## Robust Covariance Matrices

### Heteroskedasticity-Consistent (HC) Standard Errors

```python
import statsmodels.api as sm

# After fitting OLS
results = sm.OLS(y, X).fit()

# HC0 (White's heteroskedasticity-consistent SEs)
results_hc0 = results.get_robustcov_results(cov_type='HC0')

# HC1 (degrees of freedom adjustment)
results_hc1 = results.get_robustcov_results(cov_type='HC1')

# HC2 (leverage adjustment)
results_hc2 = results.get_robustcov_results(cov_type='HC2')

# HC3 (most conservative, recommended for small samples)
results_hc3 = results.get_robustcov_results(cov_type='HC3')

print("Standard OLS SEs:", results.bse)
print("Robust HC3 SEs:", results_hc3.bse)
```

### HAC (Heteroskedasticity and Autocorrelation Consistent)

**Newey-West standard errors**:

```python
# For time series with autocorrelation and heteroskedasticity
results_hac = results.get_robustcov_results(cov_type='HAC', maxlags=4)

print("HAC (Newey-West) SEs:", results_hac.bse)
print(results_hac.summary())
```

### Cluster-Robust Standard Errors

```python
# For clustered/grouped data
results_cluster = results.get_robustcov_results(
    cov_type='cluster',
    groups=cluster_ids
)

print("Cluster-robust SEs:", results_cluster.bse)
```

## Descriptive Statistics

**Basic descriptive statistics**:

```python
from statsmodels.stats.api import DescrStatsW

# Comprehensive descriptive stats
desc = DescrStatsW(data)

print("Mean:", desc.mean)
print("Std Dev:", desc.std)
print("Variance:", desc.var)
print("Confidence interval:", desc.tconfint_mean())

# Quantiles
print("Median:", desc.quantile(0.5))
print("Quartiles (25%, 75%):", desc.quantile([0.25, 0.75]))
```

**Weighted statistics**:

```python
# With weights
desc_weighted = DescrStatsW(data, weights=weights)

print("Weighted mean:", desc_weighted.mean)
print("Weighted std:", desc_weighted.std)
```

**Compare two groups**:

```python
from statsmodels.stats.weightstats import CompareMeans
from scipy.stats import levene

# Create comparison object
cm = CompareMeans(DescrStatsW(group1), DescrStatsW(group2))

# t-test
print("t-test:", cm.ttest_ind())

# Confidence interval for difference
print("CI for difference:", cm.tconfint_diff())

# Test for equal variances (Levene's test from scipy)
print("Equal variance test:", levene(group1, group2))
```

## Power Analysis and Sample Size

**Power for t-test**:

```python
from statsmodels.stats.power import tt_ind_solve_power

# Solve for sample size
effect_size = 0.5  # Cohen's d
alpha = 0.05
power = 0.8

n = tt_ind_solve_power(effect_size=effect_size,
                       alpha=alpha,
                       power=power,
                       alternative='two-sided')

print(f"Required sample size per group: {n:.0f}")

# Solve for power given n
power = tt_ind_solve_power(effect_size=0.5,
                           nobs1=50,
                           alpha=0.05,
                           alternative='two-sided')

print(f"Power: {power:.4f}")
```

**Power for proportion test**:

```python
from statsmodels.stats.power import zt_ind_solve_power

# For proportion tests (z-test)
effect_size = 0.3  # Standardized effect size (Cohen's h), not the raw difference in proportions
alpha = 0.05
power = 0.8

n = zt_ind_solve_power(effect_size=effect_size,
                       alpha=alpha,
                       power=power,
                       alternative='two-sided')

print(f"Required sample size per group: {n:.0f}")
```
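
The `effect_size` passed to `zt_ind_solve_power` is on a standardized scale. For two proportions, `proportion_effectsize` converts raw proportions to Cohen's h. A short sketch with hypothetical proportions 0.60 and 0.45:

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import zt_ind_solve_power

# Cohen's h for comparing proportions of 0.60 and 0.45
effect_size = proportion_effectsize(0.60, 0.45)
print(f"Cohen's h: {effect_size:.4f}")

n = zt_ind_solve_power(effect_size=effect_size, alpha=0.05, power=0.8,
                       alternative='two-sided')
print(f"Required sample size per group: {n:.0f}")
```
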
**Power curves**:

```python
from statsmodels.stats.power import TTestIndPower
import matplotlib.pyplot as plt

# Create power analysis object
analysis = TTestIndPower()

# Plot power curves for different sample sizes
sample_sizes = range(10, 200, 10)
effect_sizes = [0.2, 0.5, 0.8]  # Small, medium, large

fig, ax = plt.subplots(figsize=(10, 6))

for es in effect_sizes:
    power = [analysis.solve_power(effect_size=es, nobs1=n, alpha=0.05)
             for n in sample_sizes]
    ax.plot(sample_sizes, power, label=f'Effect size = {es}')

ax.axhline(y=0.8, color='r', linestyle='--', label='Power = 0.8')
ax.set_xlabel('Sample size per group')
ax.set_ylabel('Power')
ax.set_title('Power Curves for Two-Sample t-test')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
```

## Effect Sizes

**Cohen's d** (standardized mean difference):

```python
import numpy as np

def cohens_d(group1, group2):
    """Calculate Cohen's d for independent samples"""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)

    # Pooled standard deviation
    pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))

    # Cohen's d
    d = (np.mean(group1) - np.mean(group2)) / pooled_std

    return d

d = cohens_d(group1, group2)
print(f"Cohen's d: {d:.4f}")

# Interpretation:
# |d| < 0.2: negligible
# |d| ~ 0.2: small
# |d| ~ 0.5: medium
# |d| ~ 0.8: large
```

**Eta-squared** (for ANOVA):

```python
# From the ANOVA table
# η² = SS_between / SS_total

def eta_squared(anova_table):
    # One-way ANOVA: first row is the factor, remaining rows include the residual
    return anova_table['sum_sq'].iloc[0] / anova_table['sum_sq'].sum()

# After running ANOVA
eta_sq = eta_squared(anova_table)
print(f"Eta-squared: {eta_sq:.4f}")

# Interpretation:
# 0.01: small effect
# 0.06: medium effect
# 0.14: large effect
```

## Contingency Tables and Association

**McNemar's test** (paired binary data):

```python
from statsmodels.stats.contingency_tables import mcnemar

# 2x2 contingency table
table = [[a, b],
         [c, d]]

result = mcnemar(table, exact=True)  # or exact=False for large samples
print(f"p-value: {result.pvalue:.4f}")

# H0: Marginal probabilities are equal
```

**Cochran-Mantel-Haenszel test**:

```python
from statsmodels.stats.contingency_tables import StratifiedTable

# For stratified 2x2 tables
strat_table = StratifiedTable(tables_list)
result = strat_table.test_null_odds()

print(f"p-value: {result.pvalue:.4f}")
```

## Treatment Effects and Causal Inference

**Propensity score matching**:

```python
import statsmodels.api as sm

# Estimate propensity scores
ps_model = sm.Logit(treatment, X).fit()
propensity_scores = ps_model.predict(X)

# Use for matching or weighting
# (statsmodels does not include a matcher; see the sketch below)
```
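
Because statsmodels does not ship a matching routine, the following is a purely illustrative sketch of greedy 1:1 nearest-neighbor matching on the estimated propensity scores, reusing `treatment` and `propensity_scores` from the block above:

```python
import numpy as np

# Greedy 1:1 nearest-neighbor matching on the propensity score
ps = np.asarray(propensity_scores)
treated_idx = np.where(treatment == 1)[0]
control_idx = np.where(treatment == 0)[0]

available = set(control_idx)
matches = {}
for i in treated_idx:
    candidates = np.array(sorted(available))
    if candidates.size == 0:
        break
    # Pick the closest still-unmatched control unit
    j = candidates[np.argmin(np.abs(ps[candidates] - ps[i]))]
    matches[i] = j
    available.remove(j)

print(f"Matched {len(matches)} treated units to controls")
```
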
**Difference-in-differences**:

```python
from statsmodels.formula.api import ols

# DiD model: outcome ~ treatment + post + treatment:post
model = ols('outcome ~ treatment + post + treatment:post', data=df).fit()

# DiD estimate is the interaction coefficient
did_estimate = model.params['treatment:post']
print(f"DiD estimate: {did_estimate:.4f}")
```

## Best Practices

1. **Always check assumptions**: Test before interpreting results
2. **Report effect sizes**: Not just p-values
3. **Use appropriate tests**: Match test to data type and distribution
4. **Correct for multiple comparisons**: When conducting many tests
5. **Check sample size**: Ensure adequate power
6. **Visual inspection**: Plot data before testing
7. **Report confidence intervals**: Along with point estimates
8. **Consider alternatives**: Non-parametric when assumptions violated
9. **Robust standard errors**: Use when heteroskedasticity/autocorrelation present
10. **Document decisions**: Note which tests used and why

## Common Pitfalls

1. **Not checking test assumptions**: May invalidate results
2. **Multiple testing without correction**: Inflated Type I error
3. **Using parametric tests on non-normal data**: Consider non-parametric
4. **Ignoring heteroskedasticity**: Use robust SEs
5. **Confusing statistical and practical significance**: Check effect sizes
6. **Not reporting confidence intervals**: Only p-values insufficient
7. **Using wrong test**: Match test to research question
8. **Insufficient power**: Risk of Type II error (false negatives)
9. **p-hacking**: Testing many specifications until significant
10. **Overinterpreting p-values**: Remember the limitations of null-hypothesis significance testing (NHST)