Initial commit
Zhongwei Li · 2025-11-30 08:30:10 +08:00 · commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions

skills/statsmodels/SKILL.md
---
name: statsmodels
description: "Statistical modeling toolkit. OLS, GLM, logistic, ARIMA, time series, hypothesis tests, diagnostics, AIC/BIC, for rigorous statistical inference and econometric analysis."
---
# Statsmodels: Statistical Modeling and Econometrics
## Overview
Statsmodels is Python's premier library for statistical modeling, providing tools for estimation, inference, and diagnostics across a wide range of statistical methods. Apply this skill for rigorous statistical analysis, from simple linear regression to complex time series models and econometric analyses.
## When to Use This Skill
This skill should be used when:
- Fitting regression models (OLS, WLS, GLS, quantile regression)
- Performing generalized linear modeling (logistic, Poisson, Gamma, etc.)
- Analyzing discrete outcomes (binary, multinomial, count, ordinal)
- Conducting time series analysis (ARIMA, SARIMAX, VAR, forecasting)
- Running statistical tests and diagnostics
- Testing model assumptions (heteroskedasticity, autocorrelation, normality)
- Detecting outliers and influential observations
- Comparing models (AIC/BIC, likelihood ratio tests)
- Estimating causal effects
- Producing publication-ready statistical tables and inference
## Quick Start Guide
### Linear Regression (OLS)
```python
import statsmodels.api as sm
import numpy as np
import pandas as pd
# Prepare data - ALWAYS add constant for intercept
X = sm.add_constant(X_data)
# Fit OLS model
model = sm.OLS(y, X)
results = model.fit()
# View comprehensive results
print(results.summary())
# Key results
print(f"R-squared: {results.rsquared:.4f}")
print(f"Coefficients:\\n{results.params}")
print(f"P-values:\\n{results.pvalues}")
# Predictions with confidence intervals
predictions = results.get_prediction(X_new)
pred_summary = predictions.summary_frame()
print(pred_summary) # includes mean, CI, prediction intervals
# Diagnostics
from statsmodels.stats.diagnostic import het_breuschpagan
bp_test = het_breuschpagan(results.resid, X)
print(f"Breusch-Pagan p-value: {bp_test[1]:.4f}")
# Visualize residuals
import matplotlib.pyplot as plt
plt.scatter(results.fittedvalues, results.resid)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.show()
```
### Logistic Regression (Binary Outcomes)
```python
from statsmodels.discrete.discrete_model import Logit
# Add constant
X = sm.add_constant(X_data)
# Fit logit model
model = Logit(y_binary, X)
results = model.fit()
print(results.summary())
# Odds ratios
odds_ratios = np.exp(results.params)
print("Odds ratios:\\n", odds_ratios)
# Predicted probabilities
probs = results.predict(X)
# Binary predictions (0.5 threshold)
predictions = (probs > 0.5).astype(int)
# Model evaluation
from sklearn.metrics import classification_report, roc_auc_score
print(classification_report(y_binary, predictions))
print(f"AUC: {roc_auc_score(y_binary, probs):.4f}")
# Marginal effects
marginal = results.get_margeff()
print(marginal.summary())
```
### Time Series (ARIMA)
```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Check stationarity
from statsmodels.tsa.stattools import adfuller
adf_result = adfuller(y_series)
print(f"ADF p-value: {adf_result[1]:.4f}")
if adf_result[1] > 0.05:
# Series is non-stationary, difference it
y_diff = y_series.diff().dropna()
# Plot ACF/PACF to identify p, q
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
plot_acf(y_diff, lags=40, ax=ax1)
plot_pacf(y_diff, lags=40, ax=ax2)
plt.show()
# Fit ARIMA(p,d,q)
model = ARIMA(y_series, order=(1, 1, 1))
results = model.fit()
print(results.summary())
# Forecast
forecast = results.forecast(steps=10)
forecast_obj = results.get_forecast(steps=10)
forecast_df = forecast_obj.summary_frame()
print(forecast_df) # includes mean and confidence intervals
# Residual diagnostics
results.plot_diagnostics(figsize=(12, 8))
plt.show()
```
### Generalized Linear Models (GLM)
```python
import statsmodels.api as sm
# Poisson regression for count data
X = sm.add_constant(X_data)
model = sm.GLM(y_counts, X, family=sm.families.Poisson())
results = model.fit()
print(results.summary())
# Rate ratios (for Poisson with log link)
rate_ratios = np.exp(results.params)
print("Rate ratios:\\n", rate_ratios)
# Check overdispersion
overdispersion = results.pearson_chi2 / results.df_resid
print(f"Overdispersion: {overdispersion:.2f}")
if overdispersion > 1.5:
# Use Negative Binomial instead
from statsmodels.discrete.count_model import NegativeBinomial
nb_model = NegativeBinomial(y_counts, X)
nb_results = nb_model.fit()
print(nb_results.summary())
```
## Core Statistical Modeling Capabilities
### 1. Linear Regression Models
Comprehensive suite of linear models for continuous outcomes with various error structures.
**Available models:**
- **OLS**: Standard linear regression with i.i.d. errors
- **WLS**: Weighted least squares for heteroskedastic errors
- **GLS**: Generalized least squares for arbitrary covariance structure
- **GLSAR**: GLS with autoregressive errors for time series
- **Quantile Regression**: Conditional quantiles (robust to outliers)
- **Mixed Effects**: Hierarchical/multilevel models with random effects
- **Recursive/Rolling**: Time-varying parameter estimation
**Key features:**
- Comprehensive diagnostic tests
- Robust standard errors (HC, HAC, cluster-robust; see the sketch after this subsection)
- Influence statistics (Cook's distance, leverage, DFFITS)
- Hypothesis testing (F-tests, Wald tests)
- Model comparison (AIC, BIC, likelihood ratio tests)
- Prediction with confidence and prediction intervals
**When to use:** Continuous outcome variable, want inference on coefficients, need diagnostics
**Reference:** See `references/linear_models.md` for detailed guidance on model selection, diagnostics, and best practices.
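Robust covariance options are passed through `fit`; a minimal sketch, assuming `X` already includes a constant and `groups` holds cluster ids:
```python
import statsmodels.api as sm

ols = sm.OLS(y, X)
res_hc3 = ols.fit(cov_type='HC3')                           # heteroskedasticity-robust
res_hac = ols.fit(cov_type='HAC', cov_kwds={'maxlags': 4})  # Newey-West
res_clu = ols.fit(cov_type='cluster', cov_kwds={'groups': groups})
print(res_hc3.bse)  # robust standard errors
```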
### 2. Generalized Linear Models (GLM)
Flexible framework extending linear models to non-normal distributions.
**Distribution families:**
- **Binomial**: Binary outcomes or proportions (logistic regression)
- **Poisson**: Count data
- **Negative Binomial**: Overdispersed counts
- **Gamma**: Positive continuous, right-skewed data
- **Inverse Gaussian**: Positive continuous with specific variance structure
- **Gaussian**: Equivalent to OLS
- **Tweedie**: Flexible family for semi-continuous data
**Link functions:**
- Logit, Probit, Log, Identity, Inverse, Sqrt, CLogLog, Power
- Choose based on interpretation needs and model fit
**Key features:**
- Maximum likelihood estimation via IRLS
- Deviance and Pearson residuals
- Goodness-of-fit statistics
- Pseudo R-squared measures
- Robust standard errors
**When to use:** Non-normal outcomes, need flexible variance and link specifications
**Reference:** See `references/glm.md` for family selection, link functions, interpretation, and diagnostics.
### 3. Discrete Choice Models
Models for categorical and count outcomes.
**Binary models:**
- **Logit**: Logistic regression (odds ratios)
- **Probit**: Probit regression (normal distribution)
**Multinomial models:**
- **MNLogit**: Unordered categories (3+ levels)
- **Conditional Logit**: Choice models with alternative-specific variables
- **Ordered Model**: Ordinal outcomes (ordered categories)
**Count models:**
- **Poisson**: Standard count model
- **Negative Binomial**: Overdispersed counts
- **Zero-Inflated**: Excess zeros (ZIP, ZINB)
- **Hurdle Models**: Two-stage models for zero-heavy data
**Key features:**
- Maximum likelihood estimation
- Marginal effects at means or average marginal effects
- Model comparison via AIC/BIC
- Predicted probabilities and classification
- Goodness-of-fit tests
**When to use:** Binary, categorical, or count outcomes
**Reference:** See `references/discrete_choice.md` for model selection, interpretation, and evaluation.
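The ordinal case is not covered in the Quick Start; a minimal sketch, assuming `y_ordered` is coded 0, 1, 2, … — note that `OrderedModel` estimates its own cutpoints, so the design matrix must not contain a constant:
```python
from statsmodels.miscmodels.ordinal_model import OrderedModel

# X_data WITHOUT a constant: OrderedModel rejects an explicit intercept
ord_model = OrderedModel(y_ordered, X_data, distr='logit')
ord_results = ord_model.fit(method='bfgs', disp=False)
print(ord_results.summary())
```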
### 4. Time Series Analysis
Comprehensive time series modeling and forecasting capabilities.
**Univariate models:**
- **AutoReg (AR)**: Autoregressive models
- **ARIMA**: Autoregressive integrated moving average
- **SARIMAX**: Seasonal ARIMA with exogenous variables
- **Exponential Smoothing**: Simple, Holt, Holt-Winters
- **ETS**: Innovations state space models
**Multivariate models:**
- **VAR**: Vector autoregression
- **VARMAX**: VAR with MA and exogenous variables
- **Dynamic Factor Models**: Extract common factors
- **VECM**: Vector error correction models (cointegration)
**Advanced models:**
- **State Space**: Kalman filtering, custom specifications
- **Regime Switching**: Markov switching models
- **ARDL**: Autoregressive distributed lag
**Key features:**
- ACF/PACF analysis for model identification
- Stationarity tests (ADF, KPSS)
- Forecasting with prediction intervals
- Residual diagnostics (Ljung-Box, heteroskedasticity)
- Granger causality testing
- Impulse response functions (IRF)
- Forecast error variance decomposition (FEVD)
**When to use:** Time-ordered data, forecasting, understanding temporal dynamics
**Reference:** See `references/time_series.md` for model selection, diagnostics, and forecasting methods.
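As one concrete case, a seasonal model with an exogenous regressor; a minimal sketch, assuming a monthly series `y` with aligned regressors `x` and `x_future` (hypothetical names):
```python
import statsmodels.api as sm

model = sm.tsa.SARIMAX(y, exog=x, order=(1, 1, 1),
                       seasonal_order=(1, 1, 1, 12))
results = model.fit(disp=False)
print(results.summary())
# Out-of-sample forecasts need future exog values
forecast = results.get_forecast(steps=12, exog=x_future)
print(forecast.summary_frame())
```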
### 5. Statistical Tests and Diagnostics
Extensive testing and diagnostic capabilities for model validation.
**Residual diagnostics:**
- Autocorrelation tests (Ljung-Box, Durbin-Watson, Breusch-Godfrey)
- Heteroskedasticity tests (Breusch-Pagan, White, ARCH)
- Normality tests (Jarque-Bera, Omnibus, Anderson-Darling, Lilliefors)
- Specification tests (RESET, Harvey-Collier)
**Influence and outliers:**
- Leverage (hat values)
- Cook's distance
- DFFITS and DFBETAs
- Studentized residuals
- Influence plots
**Hypothesis testing:**
- t-tests (one-sample, two-sample, paired)
- Proportion tests
- Chi-square tests
- Non-parametric tests (Mann-Whitney, Wilcoxon, Kruskal-Wallis)
- ANOVA (one-way, two-way, repeated measures)
**Multiple comparisons:**
- Tukey's HSD
- Bonferroni correction
- False Discovery Rate (FDR)
**Effect sizes and power:**
- Cohen's d, eta-squared
- Power analysis for t-tests, proportions
- Sample size calculations
**Robust inference:**
- Heteroskedasticity-consistent SEs (HC0-HC3)
- HAC standard errors (Newey-West)
- Cluster-robust standard errors
**When to use:** Validating assumptions, detecting problems, ensuring robust inference
**Reference:** See `references/stats_diagnostics.md` for comprehensive testing and diagnostic procedures.
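Two of these utilities in action; a minimal sketch, assuming `results` is a fitted OLS results object and the p-values are hypothetical:
```python
import numpy as np
from statsmodels.stats.multitest import multipletests
from statsmodels.stats.diagnostic import acorr_ljungbox

# FDR (Benjamini-Hochberg) correction across several tests
pvals = np.array([0.001, 0.020, 0.040, 0.300])
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')
print(reject, p_adj)

# Ljung-Box test for residual autocorrelation up to lag 10
print(acorr_ljungbox(results.resid, lags=[10]))
```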
## Formula API (R-style)
Statsmodels supports R-style formulas for intuitive model specification:
```python
import statsmodels.formula.api as smf
# OLS with formula
results = smf.ols('y ~ x1 + x2 + x1:x2', data=df).fit()
# Categorical variables (automatic dummy coding)
results = smf.ols('y ~ x1 + C(category)', data=df).fit()
# Interactions
results = smf.ols('y ~ x1 * x2', data=df).fit() # x1 + x2 + x1:x2
# Polynomial terms
results = smf.ols('y ~ x + I(x**2)', data=df).fit()
# Logit
results = smf.logit('y ~ x1 + x2 + C(group)', data=df).fit()
# Poisson
results = smf.poisson('count ~ x1 + x2', data=df).fit()
# ARIMA (not available via formula, use regular API)
```
## Model Selection and Comparison
### Information Criteria
```python
# Compare models using AIC/BIC
models = {
'Model 1': model1_results,
'Model 2': model2_results,
'Model 3': model3_results
}
comparison = pd.DataFrame({
'AIC': {name: res.aic for name, res in models.items()},
'BIC': {name: res.bic for name, res in models.items()},
'Log-Likelihood': {name: res.llf for name, res in models.items()}
})
print(comparison.sort_values('AIC'))
# Lower AIC/BIC indicates better model
```
### Likelihood Ratio Test (Nested Models)
```python
# For nested models (one is subset of the other)
from scipy import stats
lr_stat = 2 * (full_model.llf - reduced_model.llf)
df = full_model.df_model - reduced_model.df_model
p_value = 1 - stats.chi2.cdf(lr_stat, df)
print(f"LR statistic: {lr_stat:.4f}")
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
print("Full model significantly better")
else:
print("Reduced model preferred (parsimony)")
```
### Cross-Validation
```python
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = []
for train_idx, val_idx in kf.split(X):
    X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
    y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
    # Fit model
    model = sm.OLS(y_train, X_train).fit()
    # Predict
    y_pred = model.predict(X_val)
    # Score
    rmse = np.sqrt(mean_squared_error(y_val, y_pred))
    cv_scores.append(rmse)
print(f"CV RMSE: {np.mean(cv_scores):.4f} ± {np.std(cv_scores):.4f}")
```
## Best Practices
### Data Preparation
1. **Always add constant**: Use `sm.add_constant()` unless excluding intercept
2. **Check for missing values**: Handle or impute before fitting
3. **Scale if needed**: Standardizing predictors can improve optimizer convergence and make coefficients easier to compare (statsmodels does not scale automatically)
4. **Encode categoricals**: Use formula API or manual dummy coding
### Model Building
1. **Start simple**: Begin with basic model, add complexity as needed
2. **Check assumptions**: Test residuals, heteroskedasticity, autocorrelation
3. **Use appropriate model**: Match model to outcome type (binary→Logit, count→Poisson)
4. **Consider alternatives**: If assumptions violated, use robust methods or different model
### Inference
1. **Report effect sizes**: Not just p-values
2. **Use robust SEs**: When heteroskedasticity or clustering present
3. **Multiple comparisons**: Correct when testing many hypotheses
4. **Confidence intervals**: Always report alongside point estimates
### Model Evaluation
1. **Check residuals**: Plot residuals vs fitted, Q-Q plot
2. **Influence diagnostics**: Identify and investigate influential observations
3. **Out-of-sample validation**: Test on holdout set or cross-validate
4. **Compare models**: Use AIC/BIC for non-nested, LR test for nested
### Reporting
1. **Comprehensive summary**: Use `.summary()` for detailed output
2. **Document decisions**: Note transformations, excluded observations
3. **Interpret carefully**: Account for link functions (e.g., exp(β) for log link)
4. **Visualize**: Plot predictions, confidence intervals, diagnostics
## Common Workflows
### Workflow 1: Linear Regression Analysis
1. Explore data (plots, descriptives)
2. Fit initial OLS model
3. Check residual diagnostics
4. Test for heteroskedasticity, autocorrelation
5. Check for multicollinearity (VIF; see the sketch after this list)
6. Identify influential observations
7. Refit with robust SEs if needed
8. Interpret coefficients and inference
9. Validate on holdout or via CV
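A minimal VIF sketch for step 5, assuming `X` is a pandas DataFrame that already includes the constant column:
```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns)
print(vif)  # VIF > 10 flags problematic multicollinearity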
### Workflow 2: Binary Classification
1. Fit logistic regression (Logit)
2. Check for convergence issues
3. Interpret odds ratios
4. Calculate marginal effects
5. Evaluate classification performance (AUC, confusion matrix)
6. Check for influential observations
7. Compare with alternative models (Probit)
8. Validate predictions on test set
### Workflow 3: Count Data Analysis
1. Fit Poisson regression
2. Check for overdispersion
3. If overdispersed, fit Negative Binomial
4. Check for excess zeros (consider ZIP/ZINB)
5. Interpret rate ratios
6. Assess goodness of fit
7. Compare models via AIC
8. Validate predictions
### Workflow 4: Time Series Forecasting
1. Plot series, check for trend/seasonality
2. Test for stationarity (ADF, KPSS)
3. Difference if non-stationary
4. Identify p, q from ACF/PACF
5. Fit ARIMA or SARIMAX
6. Check residual diagnostics (Ljung-Box)
7. Generate forecasts with confidence intervals
8. Evaluate forecast accuracy on test set
## Reference Documentation
This skill includes comprehensive reference files for detailed guidance:
### references/linear_models.md
Detailed coverage of linear regression models including:
- OLS, WLS, GLS, GLSAR, Quantile Regression
- Mixed effects models
- Recursive and rolling regression
- Comprehensive diagnostics (heteroskedasticity, autocorrelation, multicollinearity)
- Influence statistics and outlier detection
- Robust standard errors (HC, HAC, cluster)
- Hypothesis testing and model comparison
### references/glm.md
Complete guide to generalized linear models:
- All distribution families (Binomial, Poisson, Gamma, etc.)
- Link functions and when to use each
- Model fitting and interpretation
- Pseudo R-squared and goodness of fit
- Diagnostics and residual analysis
- Applications (logistic, Poisson, Gamma regression)
### references/discrete_choice.md
Comprehensive guide to discrete outcome models:
- Binary models (Logit, Probit)
- Multinomial models (MNLogit, Conditional Logit)
- Count models (Poisson, Negative Binomial, Zero-Inflated, Hurdle)
- Ordinal models
- Marginal effects and interpretation
- Model diagnostics and comparison
### references/time_series.md
In-depth time series analysis guidance:
- Univariate models (AR, ARIMA, SARIMAX, Exponential Smoothing)
- Multivariate models (VAR, VARMAX, Dynamic Factor)
- State space models
- Stationarity testing and diagnostics
- Forecasting methods and evaluation
- Granger causality, IRF, FEVD
### references/stats_diagnostics.md
Comprehensive statistical testing and diagnostics:
- Residual diagnostics (autocorrelation, heteroskedasticity, normality)
- Influence and outlier detection
- Hypothesis tests (parametric and non-parametric)
- ANOVA and post-hoc tests
- Multiple comparisons correction
- Robust covariance matrices
- Power analysis and effect sizes
**When to reference:**
- Need detailed parameter explanations
- Choosing between similar models
- Troubleshooting convergence or diagnostic issues
- Understanding specific test statistics
- Looking for code examples for advanced features
**Search patterns:**
```bash
# Find information about specific models
grep -r "Quantile Regression" references/
# Find diagnostic tests
grep -r "Breusch-Pagan" references/stats_diagnostics.md
# Find time series guidance
grep -r "SARIMAX" references/time_series.md
```
## Common Pitfalls to Avoid
1. **Forgetting constant term**: Always use `sm.add_constant()` unless no intercept desired
2. **Ignoring assumptions**: Check residuals, heteroskedasticity, autocorrelation
3. **Wrong model for outcome type**: Binary→Logit/Probit, Count→Poisson/NB, not OLS
4. **Not checking convergence**: Look for optimization warnings
5. **Misinterpreting coefficients**: Remember link functions (log, logit, etc.)
6. **Using Poisson with overdispersion**: Check dispersion, use Negative Binomial if needed
7. **Not using robust SEs**: When heteroskedasticity or clustering present
8. **Overfitting**: Too many parameters relative to sample size
9. **Data leakage**: Fitting on test data or using future information
10. **Not validating predictions**: Always check out-of-sample performance
11. **Comparing non-nested models**: Use AIC/BIC, not LR test
12. **Ignoring influential observations**: Check Cook's distance and leverage
13. **Multiple testing**: Correct p-values when testing many hypotheses
14. **Not differencing time series**: Fit ARIMA on non-stationary data
15. **Confusing prediction vs confidence intervals**: Prediction intervals are wider
## Getting Help
For detailed documentation and examples:
- Official docs: https://www.statsmodels.org/stable/
- User guide: https://www.statsmodels.org/stable/user-guide.html
- Examples: https://www.statsmodels.org/stable/examples/index.html
- API reference: https://www.statsmodels.org/stable/api.html

skills/statsmodels/references/discrete_choice.md
# Discrete Choice Models Reference
This document provides comprehensive guidance on discrete choice models in statsmodels, including binary, multinomial, count, and ordinal models.
## Overview
Discrete choice models handle outcomes that are:
- **Binary**: 0/1, success/failure
- **Multinomial**: Multiple unordered categories
- **Ordinal**: Ordered categories
- **Count**: Non-negative integers
All models use maximum likelihood estimation and assume independent observations.
## Binary Models
### Logit (Logistic Regression)
Uses logistic distribution for binary outcomes.
**When to use:**
- Binary classification (yes/no, success/failure)
- Probability estimation for binary outcomes
- Interpretable odds ratios
**Model**: P(Y=1|X) = 1 / (1 + exp(-Xβ))
```python
import statsmodels.api as sm
from statsmodels.discrete.discrete_model import Logit
# Prepare data
X = sm.add_constant(X_data)
# Fit model
model = Logit(y, X)
results = model.fit()
print(results.summary())
```
**Interpretation:**
```python
import numpy as np
# Odds ratios
odds_ratios = np.exp(results.params)
print("Odds ratios:", odds_ratios)
# For 1-unit increase in X, odds multiply by exp(β)
# OR > 1: increases odds of success
# OR < 1: decreases odds of success
# OR = 1: no effect
# Confidence intervals for odds ratios
odds_ci = np.exp(results.conf_int())
print("Odds ratio 95% CI:")
print(odds_ci)
```
**Marginal effects:**
```python
# Average marginal effects (AME) - the default, at='overall'
marginal_effects = results.get_margeff()
print(marginal_effects.summary())
# Marginal effects at the means (MEM)
marginal_effects_mem = results.get_margeff(at='mean')
print(marginal_effects_mem.summary())
# atexog can pin selected regressors to chosen values; see the
# get_margeff docstring for the expected format
```
**Predictions:**
```python
# Predicted probabilities
probs = results.predict(X)
# Binary predictions (0.5 threshold)
predictions = (probs > 0.5).astype(int)
# Custom threshold
threshold = 0.3
predictions_custom = (probs > threshold).astype(int)
# For new data
X_new = sm.add_constant(X_new_data)
new_probs = results.predict(X_new)
```
**Model evaluation:**
```python
from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score, roc_curve)
# Classification report
print(classification_report(y, predictions))
# Confusion matrix
print(confusion_matrix(y, predictions))
# AUC-ROC
auc = roc_auc_score(y, probs)
print(f"AUC: {auc:.4f}")
# Pseudo R-squared
print(f"McFadden's Pseudo R²: {results.prsquared:.4f}")
```
### Probit
Uses normal distribution for binary outcomes.
**When to use:**
- Binary outcomes
- Prefer normal distribution assumption
- Field convention (econometrics often uses probit)
**Model**: P(Y=1|X) = Φ(Xβ), where Φ is standard normal CDF
```python
from statsmodels.discrete.discrete_model import Probit
model = Probit(y, X)
results = model.fit()
print(results.summary())
```
**Comparison with Logit:**
- Probit and Logit usually give similar results
- Probit: symmetric, based on normal distribution
- Logit: slightly heavier tails, easier interpretation (odds ratios)
- Coefficients not directly comparable (scale difference)
```python
# Marginal effects are comparable
logit_me = logit_results.get_margeff().margeff
probit_me = probit_results.get_margeff().margeff
print("Logit marginal effects:", logit_me)
print("Probit marginal effects:", probit_me)
```
## Multinomial Models
### MNLogit (Multinomial Logit)
For unordered categorical outcomes with 3+ categories.
**When to use:**
- Multiple unordered categories (e.g., transportation mode, brand choice)
- No natural ordering among categories
- Need probabilities for each category
**Model**: P(Y=j|X) = exp(Xβⱼ) / Σₖ exp(Xβₖ)
```python
from statsmodels.discrete.discrete_model import MNLogit
# y should be integers 0, 1, 2, ... for categories
model = MNLogit(y, X)
results = model.fit()
print(results.summary())
```
**Interpretation:**
```python
# One category is reference (usually category 0)
# Coefficients represent log-odds relative to reference
# For category j vs reference:
# exp(β_j) = odds ratio of category j vs reference
# Predicted probabilities for each category
probs = results.predict(X) # Shape: (n_samples, n_categories)
# Most likely category
predicted_categories = probs.argmax(axis=1)
```
**Relative risk ratios:**
```python
# Exponentiate coefficients for relative risk ratios (RRR)
import numpy as np
# For MNLogit, results.params typically has one column per
# non-reference category
rrr = np.exp(results.params)
print("Relative risk ratios:\n", rrr)
```
### Conditional Logit
For choice models where alternatives have characteristics.
**When to use:**
- Alternative-specific regressors (vary across choices)
- Panel data with choices
- Discrete choice experiments
```python
from statsmodels.discrete.conditional_models import ConditionalLogit
# Data structure: long format with choice indicator
model = ConditionalLogit(y_choice, X_alternatives, groups=individual_id)
results = model.fit()
```
## Count Models
### Poisson
Standard model for count data.
**When to use:**
- Count outcomes (events, occurrences)
- Rare events
- Mean ≈ variance
**Model**: P(Y=k|X) = exp(-λ) λᵏ / k!, where log(λ) = Xβ
```python
from statsmodels.discrete.discrete_model import Poisson
model = Poisson(y_counts, X)
results = model.fit()
print(results.summary())
```
**Interpretation:**
```python
# Rate ratios (incident rate ratios)
rate_ratios = np.exp(results.params)
print("Rate ratios:", rate_ratios)
# For 1-unit increase in X, expected count multiplies by exp(β)
```
**Check overdispersion:**
```python
# Mean and variance should be similar for Poisson
print(f"Mean: {y_counts.mean():.2f}")
print(f"Variance: {y_counts.var():.2f}")
# Rule of thumb: variance/mean > 1.5 suggests overdispersion
overdispersion_ratio = y_counts.var() / y_counts.mean()
print(f"Variance/Mean: {overdispersion_ratio:.2f}")
if overdispersion_ratio > 1.5:
    print("Consider Negative Binomial model")
```
**With offset (for rates):**
```python
# When modeling rates with varying exposure
# log(λ) = log(exposure) + Xβ
model = Poisson(y_counts, X, offset=np.log(exposure))
results = model.fit()
```
### Negative Binomial
For overdispersed count data (variance > mean).
**When to use:**
- Count data with overdispersion
- Excess variance not explained by Poisson
- Heterogeneity in counts
**Model**: Adds dispersion parameter α to account for overdispersion
```python
from statsmodels.discrete.discrete_model import NegativeBinomial
model = NegativeBinomial(y_counts, X)
results = model.fit()
print(results.summary())
print(f"Dispersion parameter alpha: {results.params['alpha']:.4f}")
```
**Compare with Poisson:**
```python
# Fit both models
poisson_results = Poisson(y_counts, X).fit()
nb_results = NegativeBinomial(y_counts, X).fit()
# AIC comparison (lower is better)
print(f"Poisson AIC: {poisson_results.aic:.2f}")
print(f"Negative Binomial AIC: {nb_results.aic:.2f}")
# Likelihood ratio test (if NB is better)
from scipy import stats
lr_stat = 2 * (nb_results.llf - poisson_results.llf)
lr_pval = 1 - stats.chi2.cdf(lr_stat, df=1) # 1 extra parameter (alpha)
print(f"LR test p-value: {lr_pval:.4f}")
if lr_pval < 0.05:
print("Negative Binomial significantly better")
```
### Zero-Inflated Models
For count data with excess zeros.
**When to use:**
- More zeros than expected from Poisson/NB
- Two processes: one for zeros, one for counts
- Examples: number of doctor visits, insurance claims
**Models:**
- ZeroInflatedPoisson (ZIP)
- ZeroInflatedNegativeBinomialP (ZINB)
```python
from statsmodels.discrete.count_model import (ZeroInflatedPoisson,
                                              ZeroInflatedNegativeBinomialP)
# ZIP model
zip_model = ZeroInflatedPoisson(y_counts, X, exog_infl=X_inflation)
zip_results = zip_model.fit()
# ZINB model (for overdispersion + excess zeros)
zinb_model = ZeroInflatedNegativeBinomialP(y_counts, X, exog_infl=X_inflation)
zinb_results = zinb_model.fit()
print(zip_results.summary())
```
**Two parts of the model:**
```python
# 1. Inflation model: probability of a structural zero
# 2. Count model: distribution of counts
# Predicted probability of a zero outcome
zero_probs = zip_results.predict(X, which='prob-zero')
# Predicted mean counts
predicted_counts = zip_results.predict(X, which='mean')
```
### Hurdle Models
Two-stage model: whether any counts, then how many.
**When to use:**
- Excess zeros
- Different processes for zero vs positive counts
- Zeros structurally different from positive values
```python
from statsmodels.discrete.truncated_model import HurdleCountModel

# Available in statsmodels 0.14+; both stages use the same exog,
# and dist/zerodist choose the distributions for each stage
model = HurdleCountModel(y_counts, X, dist='poisson', zerodist='poisson')
results = model.fit()
print(results.summary())
```
## Ordinal Models
### Ordered Logit/Probit
For ordered categorical outcomes.
**When to use:**
- Ordered categories (e.g., low/medium/high, ratings 1-5)
- Natural ordering matters
- Want to respect ordinal structure
**Model**: Cumulative probability model with cutpoints
```python
from statsmodels.miscmodels.ordinal_model import OrderedModel
# y should be ordered integers: 0, 1, 2, ...
# X must NOT contain a constant: OrderedModel estimates cutpoints instead
model = OrderedModel(y_ordered, X, distr='logit')  # or 'probit'
results = model.fit(method='bfgs')
print(results.summary())
```
**Interpretation:**
```python
import numpy as np
# Threshold parameters come after the slopes; there are n_categories - 1
# (see model.transform_threshold_params for the cutpoints on the latent scale)
n_categories = len(np.unique(y_ordered))
cutpoints = results.params[-(n_categories - 1):]
print("Cutpoints:", cutpoints)
# Slope coefficients
coefficients = results.params[:-(n_categories - 1)]
print("Coefficients:", coefficients)
# Predicted probabilities for each category
probs = results.predict(X) # Shape: (n_samples, n_categories)
# Most likely category
predicted_categories = probs.argmax(axis=1)
```
**Proportional odds assumption:**
```python
# Test if coefficients are same across cutpoints
# (Brant test - implement manually or check residuals)
# Check: model each cutpoint separately and compare coefficients
```
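One rough check, sketched below under the assumption that `y_ordered` is coded 0..K: fit a separate binary logit at each cutpoint and compare the slopes by eye.
```python
import numpy as np
import statsmodels.api as sm

# Binary logit for each cutpoint "y > j"; similar slopes across j
# are consistent with proportional odds
for j in sorted(np.unique(y_ordered))[:-1]:
    y_bin = (y_ordered > j).astype(int)
    res_j = sm.Logit(y_bin, sm.add_constant(X)).fit(disp=False)
    print(f"y > {j}:", np.round(np.asarray(res_j.params), 3))
```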
## Model Diagnostics
### Goodness of Fit
```python
# Pseudo R-squared (McFadden)
print(f"Pseudo R²: {results.prsquared:.4f}")
# AIC/BIC for model comparison
print(f"AIC: {results.aic:.2f}")
print(f"BIC: {results.bic:.2f}")
# Log-likelihood
print(f"Log-likelihood: {results.llf:.2f}")
# Likelihood ratio test vs null model
lr_stat = 2 * (results.llf - results.llnull)
from scipy import stats
lr_pval = 1 - stats.chi2.cdf(lr_stat, results.df_model)
print(f"LR test p-value: {lr_pval}")
```
### Classification Metrics (Binary)
```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)
# Predictions
probs = results.predict(X)
predictions = (probs > 0.5).astype(int)
# Metrics
print(f"Accuracy: {accuracy_score(y, predictions):.4f}")
print(f"Precision: {precision_score(y, predictions):.4f}")
print(f"Recall: {recall_score(y, predictions):.4f}")
print(f"F1: {f1_score(y, predictions):.4f}")
print(f"AUC: {roc_auc_score(y, probs):.4f}")
```
### Classification Metrics (Multinomial)
```python
from sklearn.metrics import accuracy_score, classification_report, log_loss
# Predicted categories
probs = results.predict(X)
predictions = probs.argmax(axis=1)
# Accuracy
accuracy = accuracy_score(y, predictions)
print(f"Accuracy: {accuracy:.4f}")
# Classification report
print(classification_report(y, predictions))
# Log loss
logloss = log_loss(y, probs)
print(f"Log Loss: {logloss:.4f}")
```
### Count Model Diagnostics
```python
# Observed vs predicted frequencies
observed = pd.Series(y_counts).value_counts().sort_index()
predicted = results.predict(X)
predicted_counts = pd.Series(np.round(predicted)).value_counts().sort_index()
# Compare distributions
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
observed.plot(kind='bar', alpha=0.5, label='Observed', ax=ax)
predicted_counts.plot(kind='bar', alpha=0.5, label='Predicted', ax=ax)
ax.legend()
ax.set_xlabel('Count')
ax.set_ylabel('Frequency')
plt.show()
# Rootogram (better visualization): no built-in in statsmodels;
# a hanging-rootogram sketch follows this block
```
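A minimal hanging-rootogram sketch (not a statsmodels API), assuming a fitted Poisson-type model `results` and integer counts `y_counts`:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

max_k = int(np.max(y_counts))
ks = np.arange(max_k + 1)
observed = np.bincount(np.asarray(y_counts, dtype=int), minlength=max_k + 1)
mu = results.predict(X)
# Expected frequency of each count k under the fitted Poisson means
expected = np.array([stats.poisson.pmf(k, mu).sum() for k in ks])

plt.bar(ks, np.sqrt(observed), alpha=0.5, label='sqrt(observed)')
plt.plot(ks, np.sqrt(expected), 'r-', lw=2, label='sqrt(expected)')
plt.xlabel('Count')
plt.ylabel('sqrt(frequency)')
plt.legend()
plt.show()
```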
### Influence and Outliers
```python
# Standardized residuals
std_resid = (y - results.predict(X)) / np.sqrt(results.predict(X))
# Check for outliers (|std_resid| > 2)
outliers = np.where(np.abs(std_resid) > 2)[0]
print(f"Number of outliers: {len(outliers)}")
# Leverage (hat values) for logit/probit: see MLEInfluence in
# statsmodels.stats.outliers_influence (or results.get_influence() on GLM fits)
```
## Hypothesis Testing
```python
# Single parameter test (automatic in summary)
# Multiple parameters: Wald test
# Test H0: β₁ = β₂ = 0
R = [[0, 1, 0, 0], [0, 0, 1, 0]]
wald_test = results.wald_test(R)
print(wald_test)
# Likelihood ratio test for nested models
model_reduced = Logit(y, X_reduced).fit()
model_full = Logit(y, X_full).fit()
lr_stat = 2 * (model_full.llf - model_reduced.llf)
df = model_full.df_model - model_reduced.df_model
from scipy import stats
lr_pval = 1 - stats.chi2.cdf(lr_stat, df)
print(f"LR test p-value: {lr_pval:.4f}")
```
## Model Selection and Comparison
```python
# Fit multiple models
models = {
'Logit': Logit(y, X).fit(),
'Probit': Probit(y, X).fit(),
# Add more models
}
# Compare AIC/BIC
comparison = pd.DataFrame({
'AIC': {name: model.aic for name, model in models.items()},
'BIC': {name: model.bic for name, model in models.items()},
'Pseudo R²': {name: model.prsquared for name, model in models.items()}
})
print(comparison.sort_values('AIC'))
# Cross-validation for predictive performance: statsmodels has no
# built-in CV; a manual K-fold sketch follows this block
```
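Since statsmodels has no CV utilities of its own, a minimal manual K-fold sketch (assuming numpy arrays `X`, `y` with the constant already added):
```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score
from statsmodels.discrete.discrete_model import Logit

aucs = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True,
                                random_state=0).split(X):
    res = Logit(y[train_idx], X[train_idx]).fit(disp=False)
    aucs.append(roc_auc_score(y[val_idx], res.predict(X[val_idx])))
print(f"CV AUC: {np.mean(aucs):.3f} ± {np.std(aucs):.3f}")
```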
## Formula API
Use R-style formulas for easier specification.
```python
import statsmodels.formula.api as smf
# Logit with formula
formula = 'y ~ x1 + x2 + C(category) + x1:x2'
results = smf.logit(formula, data=df).fit()
# MNLogit with formula
results = smf.mnlogit(formula, data=df).fit()
# Poisson with formula
results = smf.poisson(formula, data=df).fit()
# Negative Binomial with formula
results = smf.negativebinomial(formula, data=df).fit()
```
## Common Applications
### Binary Classification (Marketing Response)
```python
# Predict customer purchase probability
X = sm.add_constant(customer_features)
model = Logit(purchased, X)
results = model.fit()
# Targeting: select top 20% likely to purchase
probs = results.predict(X)
top_20_pct_idx = np.argsort(probs)[-int(0.2*len(probs)):]
```
### Multinomial Choice (Transportation Mode)
```python
# Predict transportation mode choice
model = MNLogit(mode_choice, X)
results = model.fit()
# Predicted mode for new commuter
new_commuter = sm.add_constant(new_features)
mode_probs = results.predict(new_commuter)
predicted_mode = mode_probs.argmax(axis=1)
```
### Count Data (Number of Doctor Visits)
```python
# Model healthcare utilization
model = NegativeBinomial(num_visits, X)
results = model.fit()
# Expected visits for new patient
expected_visits = results.predict(new_patient_X)
```
### Zero-Inflated (Insurance Claims)
```python
# Many people have zero claims
# Zero-inflation: some never claim
# Count process: those who might claim
zip_model = ZeroInflatedPoisson(claims, X_count, exog_infl=X_inflation)
results = zip_model.fit()
# P(never file claim)
never_claim_prob = results.predict(X, which='prob-zero')
# Expected claims
expected_claims = results.predict(X, which='mean')
```
## Best Practices
1. **Check data type**: Ensure response matches model (binary, counts, categories)
2. **Add constant**: Always use `sm.add_constant()` unless no intercept desired
3. **Scale continuous predictors**: For better convergence and interpretation
4. **Check convergence**: Look for convergence warnings
5. **Use formula API**: For categorical variables and interactions
6. **Marginal effects**: Report marginal effects, not just coefficients
7. **Model comparison**: Use AIC/BIC and cross-validation
8. **Validate**: Holdout set or cross-validation for predictive models
9. **Check overdispersion**: For count models, test Poisson assumption
10. **Consider alternatives**: Zero-inflation, hurdle models for excess zeros
## Common Pitfalls
1. **Forgetting constant**: No intercept term
2. **Perfect separation**: Logit/probit may not converge
3. **Using Poisson with overdispersion**: Check and use Negative Binomial
4. **Misinterpreting coefficients**: Remember they're on log-odds/log scale
5. **Not checking convergence**: Optimization may fail silently
6. **Wrong distribution**: Match model to data type (binary/count/categorical)
7. **Ignoring excess zeros**: Use ZIP/ZINB when appropriate
8. **Not validating predictions**: Always check out-of-sample performance
9. **Comparing non-nested models**: Use AIC/BIC, not likelihood ratio test
10. **Ordinal as nominal**: Use OrderedModel for ordered categories

skills/statsmodels/references/glm.md
# Generalized Linear Models (GLM) Reference
This document provides comprehensive guidance on generalized linear models in statsmodels, including families, link functions, and applications.
## Overview
GLMs extend linear regression to non-normal response distributions through:
1. **Distribution family**: Specifies the conditional distribution of the response
2. **Link function**: Transforms the linear predictor to the scale of the mean
3. **Variance function**: Relates variance to the mean
**General form**: g(μ) = Xβ, where g is the link function and μ = E(Y|X)
## When to Use GLM
- **Binary outcomes**: Logistic regression (Binomial family with logit link)
- **Count data**: Poisson or Negative Binomial regression
- **Positive continuous data**: Gamma or Inverse Gaussian
- **Non-normal distributions**: When OLS assumptions violated
- **Link functions**: Need non-linear relationship between predictors and response scale
## Distribution Families
### Binomial Family
For binary outcomes (0/1) or proportions (k/n).
**When to use:**
- Binary classification
- Success/failure outcomes
- Proportions or rates
**Common links:**
- Logit (default): log(μ/(1-μ))
- Probit: Φ⁻¹(μ)
- Log: log(μ)
```python
import statsmodels.api as sm
import statsmodels.formula.api as smf
# Binary logistic regression
model = sm.GLM(y, X, family=sm.families.Binomial())
results = model.fit()
# Formula API
results = smf.glm('success ~ x1 + x2', data=df,
                  family=sm.families.Binomial()).fit()
# Access predictions (probabilities)
probs = results.predict(X_new)
# Classification (0.5 threshold)
predictions = (probs > 0.5).astype(int)
```
**Interpretation:**
```python
import numpy as np
# Odds ratios (for logit link)
odds_ratios = np.exp(results.params)
print("Odds ratios:", odds_ratios)
# For 1-unit increase in x, odds multiply by exp(beta)
```
### Poisson Family
For count data (non-negative integers).
**When to use:**
- Count outcomes (number of events)
- Rare events
- Rate modeling (with offset)
**Common links:**
- Log (default): log(μ)
- Identity: μ
- Sqrt: √μ
```python
# Poisson regression
model = sm.GLM(y, X, family=sm.families.Poisson())
results = model.fit()
# With exposure/offset for rates
# If modeling rate = counts/exposure
model = sm.GLM(y, X, family=sm.families.Poisson(),
               offset=np.log(exposure))
results = model.fit()
# Interpretation: exp(beta) = multiplicative effect on expected count
import numpy as np
rate_ratios = np.exp(results.params)
print("Rate ratios:", rate_ratios)
```
**Overdispersion check:**
```python
# Deviance / df should be ~1 for Poisson
overdispersion = results.deviance / results.df_resid
print(f"Overdispersion: {overdispersion}")
# If >> 1, consider Negative Binomial
if overdispersion > 1.5:
print("Consider Negative Binomial model for overdispersion")
```
### Negative Binomial Family
For overdispersed count data.
**When to use:**
- Count data with variance > mean
- Excess zeros or large variance
- Poisson model shows overdispersion
```python
# Negative Binomial GLM
model = sm.GLM(y, X, family=sm.families.NegativeBinomial())
results = model.fit()
# Alternative: use discrete choice model with alpha estimation
from statsmodels.discrete.discrete_model import NegativeBinomial
nb_model = NegativeBinomial(y, X)
nb_results = nb_model.fit()
print(f"Dispersion parameter alpha: {nb_results.params[-1]}")
```
### Gaussian Family
Equivalent to OLS but fit via IRLS (Iteratively Reweighted Least Squares).
**When to use:**
- Want GLM framework for consistency
- Need robust standard errors
- Comparing with other GLMs
**Common links:**
- Identity (default): μ
- Log: log(μ)
- Inverse: 1/μ
```python
# Gaussian GLM (equivalent to OLS)
model = sm.GLM(y, X, family=sm.families.Gaussian())
results = model.fit()
# Verify equivalence with OLS
ols_results = sm.OLS(y, X).fit()
print("Parameters close:", np.allclose(results.params, ols_results.params))
```
### Gamma Family
For positive continuous data, often right-skewed.
**When to use:**
- Positive outcomes (insurance claims, survival times)
- Right-skewed distributions
- Variance proportional to mean²
**Common links:**
- Inverse (default): 1/μ
- Log: log(μ)
- Identity: μ
```python
# Gamma regression (common for cost data)
model = sm.GLM(y, X, family=sm.families.Gamma())
results = model.fit()
# Log link often preferred for interpretation
model = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log()))
results = model.fit()
# With log link, exp(beta) = multiplicative effect
import numpy as np
effects = np.exp(results.params)
```
### Inverse Gaussian Family
For positive continuous data with specific variance structure.
**When to use:**
- Positive skewed outcomes
- Variance proportional to mean³
- Alternative to Gamma
**Common links:**
- Inverse squared (default): 1/μ²
- Log: log(μ)
```python
model = sm.GLM(y, X, family=sm.families.InverseGaussian())
results = model.fit()
```
### Tweedie Family
Flexible family covering multiple distributions.
**When to use:**
- Insurance claims (mixture of zeros and continuous)
- Semi-continuous data
- Need flexible variance function
**Special cases (power parameter p):**
- p=0: Normal
- p=1: Poisson
- p=2: Gamma
- p=3: Inverse Gaussian
- 1<p<2: Compound Poisson-Gamma (common for insurance)
```python
# Tweedie with power=1.5
model = sm.GLM(y, X, family=sm.families.Tweedie(link=sm.families.links.Log(),
                                                var_power=1.5))
results = model.fit()
```
## Link Functions
Link functions connect the linear predictor to the mean of the response.
### Available Links
```python
from statsmodels.genmod import families
# Identity: g(μ) = μ
link = families.links.Identity()
# Log: g(μ) = log(μ)
link = families.links.Log()
# Logit: g(μ) = log(μ/(1-μ))
link = families.links.Logit()
# Probit: g(μ) = Φ⁻¹(μ)
link = families.links.Probit()
# Complementary log-log: g(μ) = log(-log(1-μ))
link = families.links.CLogLog()
# Inverse: g(μ) = 1/μ
link = families.links.InversePower()
# Inverse squared: g(μ) = 1/μ²
link = families.links.InverseSquared()
# Square root: g(μ) = √μ
link = families.links.Sqrt()
# Power: g(μ) = μ^p
link = families.links.Power(power=2)
```
### Choosing Link Functions
**Canonical links** (default for each family):
- Binomial → Logit
- Poisson → Log
- Gamma → Inverse
- Gaussian → Identity
- Inverse Gaussian → Inverse squared
**When to use non-canonical:**
- **Log link with Binomial**: Risk ratios instead of odds ratios
- **Identity link**: Direct additive effects (when sensible)
- **Probit vs Logit**: Similar results, preference based on field
- **CLogLog**: Asymmetric relationship, common in survival analysis
```python
# Example: Risk ratios with log-binomial model
model = sm.GLM(y, X, family=sm.families.Binomial(link=sm.families.links.Log()))
results = model.fit()
# exp(beta) now gives risk ratios, not odds ratios
risk_ratios = np.exp(results.params)
```
## Model Fitting and Results
### Basic Workflow
```python
import statsmodels.api as sm
# Add constant
X = sm.add_constant(X_data)
# Specify family and link
family = sm.families.Poisson(link=sm.families.links.Log())
# Fit model using IRLS
model = sm.GLM(y, X, family=family)
results = model.fit()
# Summary
print(results.summary())
```
### Results Attributes
```python
# Parameters and inference
results.params # Coefficients
results.bse # Standard errors
results.tvalues # Z-statistics
results.pvalues # P-values
results.conf_int() # Confidence intervals
# Predictions
results.fittedvalues # Fitted values (μ)
results.predict(X_new) # Predictions for new data
# Model fit statistics
results.aic # Akaike Information Criterion
results.bic # Bayesian Information Criterion
results.deviance # Deviance
results.null_deviance # Null model deviance
results.pearson_chi2 # Pearson chi-squared statistic
results.df_resid # Residual degrees of freedom
results.llf # Log-likelihood
# Residuals
results.resid_response # Response residuals (y - μ)
results.resid_pearson # Pearson residuals
results.resid_deviance # Deviance residuals
results.resid_anscombe # Anscombe residuals
results.resid_working # Working residuals
```
### Pseudo R-squared
```python
# Deviance-based pseudo R-squared (explained deviance); McFadden's
# version uses log-likelihoods: 1 - results.llf / results.llnull
pseudo_r2 = 1 - (results.deviance / results.null_deviance)
print(f"Pseudo R²: {pseudo_r2:.4f}")
# Adjusted pseudo R-squared
n = len(y)
k = len(results.params)
adj_pseudo_r2 = 1 - ((n-1)/(n-k)) * (results.deviance / results.null_deviance)
print(f"Adjusted Pseudo R²: {adj_pseudo_r2:.4f}")
```
## Diagnostics
### Goodness of Fit
```python
# Deviance should be approximately χ² with df_resid degrees of freedom
from scipy import stats
deviance_pval = 1 - stats.chi2.cdf(results.deviance, results.df_resid)
print(f"Deviance test p-value: {deviance_pval}")
# Pearson chi-squared test
pearson_pval = 1 - stats.chi2.cdf(results.pearson_chi2, results.df_resid)
print(f"Pearson chi² test p-value: {pearson_pval}")
# Check for overdispersion/underdispersion
dispersion = results.pearson_chi2 / results.df_resid
print(f"Dispersion: {dispersion}")
# Should be ~1; >1 suggests overdispersion, <1 underdispersion
```
### Residual Analysis
```python
import matplotlib.pyplot as plt
# Deviance residuals vs fitted
plt.figure(figsize=(10, 6))
plt.scatter(results.fittedvalues, results.resid_deviance, alpha=0.5)
plt.xlabel('Fitted values')
plt.ylabel('Deviance residuals')
plt.axhline(y=0, color='r', linestyle='--')
plt.title('Deviance Residuals vs Fitted')
plt.show()
# Q-Q plot of deviance residuals
from statsmodels.graphics.gofplots import qqplot
qqplot(results.resid_deviance, line='s')
plt.title('Q-Q Plot of Deviance Residuals')
plt.show()
# For binary outcomes, a binned residual plot is more informative:
# group observations by predicted probability and average the residuals
# within each bin (no built-in; custom implementation needed)
```
### Influence and Outliers
```python
from statsmodels.stats.outliers_influence import GLMInfluence
influence = GLMInfluence(results)
# Leverage
leverage = influence.hat_matrix_diag
# Cook's distance
cooks_d = influence.cooks_distance[0]
# DFFITS
dffits = influence.dffits[0]
# Find influential observations
influential = np.where(cooks_d > 4/len(y))[0]
print(f"Influential observations: {influential}")
```
## Hypothesis Testing
```python
# Wald test for single parameter (automatically in summary)
# Likelihood ratio test for nested models
# Fit reduced model
model_reduced = sm.GLM(y, X_reduced, family=family).fit()
model_full = sm.GLM(y, X_full, family=family).fit()
# LR statistic
lr_stat = 2 * (model_full.llf - model_reduced.llf)
df = model_full.df_model - model_reduced.df_model
from scipy import stats
lr_pval = 1 - stats.chi2.cdf(lr_stat, df)
print(f"LR test p-value: {lr_pval}")
# Wald test for multiple parameters
# Test beta_1 = beta_2 = 0
R = [[0, 1, 0, 0], [0, 0, 1, 0]]
wald_test = results.wald_test(R)
print(wald_test)
```
## Robust Standard Errors
```python
# Heteroscedasticity-robust (sandwich estimator)
results_robust = results.get_robustcov_results(cov_type='HC0')
# Cluster-robust
results_cluster = results.get_robustcov_results(cov_type='cluster',
                                                groups=cluster_ids)
# Compare standard errors
print("Regular SE:", results.bse)
print("Robust SE:", results_robust.bse)
```
## Model Comparison
```python
# AIC/BIC for non-nested models
models = [model1_results, model2_results, model3_results]
for i, res in enumerate(models, 1):
    print(f"Model {i}: AIC={res.aic:.2f}, BIC={res.bic:.2f}")
# Likelihood ratio test for nested models (as shown above)
# Cross-validation for predictive performance
from sklearn.model_selection import KFold
from sklearn.metrics import log_loss
kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = []
for train_idx, val_idx in kf.split(X):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    model_cv = sm.GLM(y_train, X_train, family=family).fit()
    pred_probs = model_cv.predict(X_val)
    score = log_loss(y_val, pred_probs)
    cv_scores.append(score)
print(f"CV Log Loss: {np.mean(cv_scores):.4f} ± {np.std(cv_scores):.4f}")
```
## Prediction
```python
# Point predictions
predictions = results.predict(X_new)
# For classification: get probabilities and convert
if isinstance(family, sm.families.Binomial):
    probs = predictions
    class_predictions = (probs > 0.5).astype(int)
# For counts: predictions are expected counts
if isinstance(family, sm.families.Poisson):
    expected_counts = predictions
# Prediction intervals via bootstrap
n_boot = 1000
boot_preds = np.zeros((n_boot, len(X_new)))
for i in range(n_boot):
    # Bootstrap resample
    boot_idx = np.random.choice(len(y), size=len(y), replace=True)
    X_boot, y_boot = X[boot_idx], y[boot_idx]
    # Fit and predict
    boot_model = sm.GLM(y_boot, X_boot, family=family).fit()
    boot_preds[i] = boot_model.predict(X_new)
# 95% prediction intervals
pred_lower = np.percentile(boot_preds, 2.5, axis=0)
pred_upper = np.percentile(boot_preds, 97.5, axis=0)
```
## Common Applications
### Logistic Regression (Binary Classification)
```python
import statsmodels.api as sm
# Fit logistic regression
X = sm.add_constant(X_data)
model = sm.GLM(y, X, family=sm.families.Binomial())
results = model.fit()
# Odds ratios
odds_ratios = np.exp(results.params)
odds_ci = np.exp(results.conf_int())
# Classification metrics
from sklearn.metrics import classification_report, roc_auc_score
probs = results.predict(X)
predictions = (probs > 0.5).astype(int)
print(classification_report(y, predictions))
print(f"AUC: {roc_auc_score(y, probs):.4f}")
# ROC curve
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
fpr, tpr, thresholds = roc_curve(y, probs)
plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()
```
### Poisson Regression (Count Data)
```python
# Fit Poisson model
X = sm.add_constant(X_data)
model = sm.GLM(y_counts, X, family=sm.families.Poisson())
results = model.fit()
# Rate ratios
rate_ratios = np.exp(results.params)
print("Rate ratios:", rate_ratios)
# Check overdispersion
dispersion = results.pearson_chi2 / results.df_resid
if dispersion > 1.5:
print(f"Overdispersion detected ({dispersion:.2f}). Consider Negative Binomial.")
```
### Gamma Regression (Cost/Duration Data)
```python
# Fit Gamma model with log link
X = sm.add_constant(X_data)
model = sm.GLM(y_cost, X,
               family=sm.families.Gamma(link=sm.families.links.Log()))
results = model.fit()
# Multiplicative effects
effects = np.exp(results.params)
print("Multiplicative effects on mean:", effects)
```
## Best Practices
1. **Check distribution assumptions**: Plot histograms and Q-Q plots of response
2. **Verify link function**: Use canonical links unless there's a reason not to
3. **Examine residuals**: Deviance residuals should be approximately normal
4. **Test for overdispersion**: Especially for Poisson models
5. **Use offsets appropriately**: For rate modeling with varying exposure
6. **Consider robust SEs**: When variance assumptions questionable
7. **Compare models**: Use AIC/BIC for non-nested, LR test for nested
8. **Interpret on original scale**: Transform coefficients (e.g., exp for log link)
9. **Check influential observations**: Use Cook's distance
10. **Validate predictions**: Use cross-validation or holdout set
## Common Pitfalls
1. **Forgetting to add constant**: No intercept term
2. **Using wrong family**: Check distribution of response
3. **Ignoring overdispersion**: Use Negative Binomial instead of Poisson
4. **Misinterpreting coefficients**: Remember link function transformation
5. **Not checking convergence**: IRLS may not converge; check warnings
6. **Complete separation in logistic**: Some categories perfectly predict outcome
7. **Using identity link with bounded outcomes**: May predict outside valid range
8. **Comparing models with different samples**: Use same observations
9. **Forgetting offset in rate models**: Must use log(exposure) as offset
10. **Not considering alternatives**: Mixed models, zero-inflation for complex data

skills/statsmodels/references/linear_models.md
# Linear Regression Models Reference
This document provides detailed guidance on linear regression models in statsmodels, including OLS, GLS, WLS, quantile regression, and specialized variants.
## Core Model Classes
### OLS (Ordinary Least Squares)
Assumes independent, identically distributed errors (Σ=I). Best for standard regression with homoscedastic errors.
**When to use:**
- Standard regression analysis
- Errors are independent and have constant variance
- No autocorrelation or heteroscedasticity
- Most common starting point
**Basic usage:**
```python
import statsmodels.api as sm
import numpy as np
# Prepare data - ALWAYS add constant for intercept
X = sm.add_constant(X_data) # Adds column of 1s for intercept
# Fit model
model = sm.OLS(y, X)
results = model.fit()
# View results
print(results.summary())
```
**Key results attributes:**
```python
results.params # Coefficients
results.bse # Standard errors
results.tvalues # T-statistics
results.pvalues # P-values
results.rsquared # R-squared
results.rsquared_adj # Adjusted R-squared
results.fittedvalues # Fitted values (predictions on training data)
results.resid # Residuals
results.conf_int() # Confidence intervals for parameters
```
**Prediction with confidence/prediction intervals:**
```python
# For in-sample predictions
pred = results.get_prediction(X)
pred_summary = pred.summary_frame()
print(pred_summary) # Contains mean, std, confidence intervals
# For out-of-sample predictions
X_new = sm.add_constant(X_new_data)
pred_new = results.get_prediction(X_new)
pred_summary = pred_new.summary_frame()
# Access intervals
mean_ci_lower = pred_summary["mean_ci_lower"]
mean_ci_upper = pred_summary["mean_ci_upper"]
obs_ci_lower = pred_summary["obs_ci_lower"] # Prediction intervals
obs_ci_upper = pred_summary["obs_ci_upper"]
```
**Formula API (R-style):**
```python
import statsmodels.formula.api as smf
# Automatic handling of categorical variables and interactions
formula = 'y ~ x1 + x2 + C(category) + x1:x2'
results = smf.ols(formula, data=df).fit()
```
### WLS (Weighted Least Squares)
Handles heteroscedastic errors (diagonal Σ) where variance differs across observations.
**When to use:**
- Known heteroscedasticity (non-constant error variance)
- Different observations have different reliability
- Weights are known or can be estimated
**Usage:**
```python
# If you know the weights (inverse variance)
weights = 1 / error_variance
model = sm.WLS(y, X, weights=weights)
results = model.fit()
# Common weight patterns:
# - 1/variance: when variance is known
# - n_i: sample size for grouped data
# - 1/x: when variance proportional to x
```
**Feasible WLS (estimating weights):**
```python
# Step 1: Fit OLS
ols_results = sm.OLS(y, X).fit()
# Step 2: Model squared residuals to estimate variance
abs_resid = np.abs(ols_results.resid)
variance_model = sm.OLS(np.log(abs_resid**2), X).fit()
# Step 3: Use estimated variance as weights
weights = 1 / np.exp(variance_model.fittedvalues)
wls_results = sm.WLS(y, X, weights=weights).fit()
```
### GLS (Generalized Least Squares)
Handles arbitrary covariance structure (Σ). Superclass for other regression methods.
**When to use:**
- Known covariance structure
- Correlated errors
- More general than WLS
**Usage:**
```python
# Specify covariance structure
# Sigma should be (n x n) covariance matrix
model = sm.GLS(y, X, sigma=Sigma)
results = model.fit()
```
### GLSAR (GLS with Autoregressive Errors)
Feasible generalized least squares with AR(p) errors for time series data.
**When to use:**
- Time series regression with autocorrelated errors
- Need to account for serial correlation
- Violations of error independence
**Usage:**
```python
# AR(1) errors
model = sm.GLSAR(y, X, rho=1) # rho=1 for AR(1), rho=2 for AR(2), etc.
results = model.iterative_fit() # Iteratively estimates AR parameters
print(results.summary())
print(f"Estimated rho: {results.model.rho}")
```
### RLS (Recursive Least Squares)
Sequential parameter estimation, useful for adaptive or online learning.
**When to use:**
- Parameters change over time
- Online/streaming data
- Want to see parameter evolution
**Usage:**
```python
from statsmodels.regression.recursive_ls import RecursiveLS
model = RecursiveLS(y, X)
results = model.fit()
# Access time-varying parameters
params_over_time = results.recursive_coefficients
cusum = results.cusum # CUSUM statistic for structural breaks
```
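The results object also provides built-in stability plots; a brief sketch:
```python
import matplotlib.pyplot as plt

fig = results.plot_cusum(alpha=0.05)  # CUSUM with significance bands
fig2 = results.plot_recursive_coefficient(range(X.shape[1]))
plt.show()
```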
### Rolling Regressions
Compute estimates across moving windows for time-varying parameter detection.
**When to use:**
- Parameters vary over time
- Want to detect structural changes
- Time series with evolving relationships
**Usage:**
```python
from statsmodels.regression.rolling import RollingOLS, RollingWLS
# Rolling OLS with 60-period window
rolling_model = RollingOLS(y, X, window=60)
rolling_results = rolling_model.fit()
# Extract time-varying parameters
rolling_params = rolling_results.params # DataFrame with parameters over time
rolling_rsquared = rolling_results.rsquared
# Plot parameter evolution
import matplotlib.pyplot as plt
rolling_params.plot()
plt.title('Time-Varying Coefficients')
plt.show()
```
### Quantile Regression
Analyzes conditional quantiles rather than conditional mean.
**When to use:**
- Interest in quantiles (median, 90th percentile, etc.)
- Robust to outliers (median regression)
- Distributional effects across quantiles
- Heterogeneous effects
**Usage:**
```python
from statsmodels.regression.quantile_regression import QuantReg
# Median regression (50th percentile)
model = QuantReg(y, X)
results_median = model.fit(q=0.5)
# Multiple quantiles
quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]
results_dict = {}
for q in quantiles:
results_dict[q] = model.fit(q=q)
# Plot quantile-varying effects
import matplotlib.pyplot as plt
coef_dict = {q: res.params for q, res in results_dict.items()}
coef_df = pd.DataFrame(coef_dict).T
coef_df.plot()
plt.xlabel('Quantile')
plt.ylabel('Coefficient')
plt.show()
```
## Mixed Effects Models
For hierarchical/nested data with random effects.
**When to use:**
- Clustered/grouped data (students in schools, patients in hospitals)
- Repeated measures
- Need random effects to account for grouping
**Usage:**
```python
from statsmodels.regression.mixed_linear_model import MixedLM
# Random intercept model
model = MixedLM(y, X, groups=group_ids)
results = model.fit()
# Random intercept and slope
model = MixedLM(y, X, groups=group_ids, exog_re=X_random)
results = model.fit()
print(results.summary())
```
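With a pandas DataFrame, the formula interface is often more convenient. A sketch assuming columns `y`, `x`, and `group` in `df`:
```python
import statsmodels.formula.api as smf

# Random intercept per group plus a random slope on x
model = smf.mixedlm("y ~ x", df, groups=df["group"], re_formula="~x")
results = model.fit()
print(results.summary())
```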
## Diagnostics and Model Assessment
### Residual Analysis
```python
# Basic residual plots
import matplotlib.pyplot as plt
# Residuals vs fitted
plt.scatter(results.fittedvalues, results.resid)
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.axhline(y=0, color='r', linestyle='--')
plt.title('Residuals vs Fitted')
plt.show()
# Q-Q plot for normality
from statsmodels.graphics.gofplots import qqplot
qqplot(results.resid, line='s')
plt.show()
# Histogram of residuals
plt.hist(results.resid, bins=30, edgecolor='black')
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Distribution of Residuals')
plt.show()
```
### Specification Tests
```python
from statsmodels.stats.diagnostic import het_breuschpagan, het_white
from statsmodels.stats.stattools import durbin_watson, jarque_bera
# Heteroscedasticity tests
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(results.resid, X)
print(f"Breusch-Pagan test p-value: {lm_pval}")
# White test
white_test = het_white(results.resid, X)
print(f"White test p-value: {white_test[1]}")
# Autocorrelation
dw_stat = durbin_watson(results.resid)
print(f"Durbin-Watson statistic: {dw_stat}")
# DW ~ 2 indicates no autocorrelation
# DW < 2 suggests positive autocorrelation
# DW > 2 suggests negative autocorrelation
# Normality test
jb_stat, jb_pval, skew, kurtosis = jarque_bera(results.resid)
print(f"Jarque-Bera test p-value: {jb_pval}")
```
### Multicollinearity
```python
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Calculate VIF for each variable
vif_data = pd.DataFrame()
vif_data["Variable"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(vif_data)
# VIF > 10 indicates problematic multicollinearity
# VIF > 5 suggests moderate multicollinearity
# Condition number (from summary)
print(f"Condition number: {results.condition_number}")
# Condition number > 20 suggests multicollinearity
# Condition number > 30 indicates serious problems
```
### Influence Statistics
```python
from statsmodels.stats.outliers_influence import OLSInfluence
influence = results.get_influence()
# Leverage (hat values)
leverage = influence.hat_matrix_diag
# High leverage: > 2*p/n (p=predictors, n=observations)
# Cook's distance
cooks_d = influence.cooks_distance[0]
# Influential if Cook's D > 4/n
# DFFITS
dffits = influence.dffits[0]
# Influential if |DFFITS| > 2*sqrt(p/n)
# Create influence plot
from statsmodels.graphics.regressionplots import influence_plot
fig, ax = plt.subplots(figsize=(12, 8))
influence_plot(results, ax=ax)
plt.show()
```
### Hypothesis Testing
```python
# Test single coefficient
# H0: beta_i = 0 (automatically in summary)
# Test multiple restrictions using F-test
# Example: Test beta_1 = beta_2 = 0
R = [[0, 1, 0, 0], [0, 0, 1, 0]] # Restriction matrix
f_test = results.f_test(R)
print(f_test)
# Formula-based hypothesis testing
f_test = results.f_test("x1 = x2 = 0")
print(f_test)
# Test linear combination: beta_1 + beta_2 = 1
r_matrix = [[0, 1, 1, 0]]
q_matrix = [1] # RHS value
f_test = results.f_test((r_matrix, q_matrix))
print(f_test)
# Wald test (equivalent to F-test for linear restrictions)
wald_test = results.wald_test(R)
print(wald_test)
```
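`f_test` handles joint restrictions; for a single linear hypothesis with a non-zero right-hand side, `t_test` is the natural complement (this assumes a regressor named `x1`, i.e. X is a DataFrame with named columns):
```python
# H0: coefficient on x1 equals 0.5 (two-sided t-test)
single_test = results.t_test("x1 = 0.5")
print(single_test)
```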
## Model Comparison
```python
# Compare nested models using likelihood ratio test (if using MLE)
from statsmodels.stats.anova import anova_lm
# Fit restricted and unrestricted models
model_restricted = sm.OLS(y, X_restricted).fit()
model_full = sm.OLS(y, X_full).fit()
# ANOVA table for model comparison
anova_results = anova_lm(model_restricted, model_full)
print(anova_results)
# AIC/BIC for non-nested model comparison
print(f"Model 1 AIC: {model1.aic}, BIC: {model1.bic}")
print(f"Model 2 AIC: {model2.aic}, BIC: {model2.bic}")
# Lower AIC/BIC indicates better model
```
## Robust Standard Errors
Handle heteroscedasticity or clustering without reweighting.
```python
# Heteroscedasticity-robust (HC) standard errors
results_hc = results.get_robustcov_results(cov_type='HC0') # White's
results_hc1 = results.get_robustcov_results(cov_type='HC1')
results_hc2 = results.get_robustcov_results(cov_type='HC2')
results_hc3 = results.get_robustcov_results(cov_type='HC3') # Most conservative
# Newey-West HAC (Heteroscedasticity and Autocorrelation Consistent)
results_hac = results.get_robustcov_results(cov_type='HAC', maxlags=4)
# Cluster-robust standard errors
results_cluster = results.get_robustcov_results(cov_type='cluster',
groups=cluster_ids)
# View robust results
print(results_hc3.summary())
```
## Best Practices
1. **Always add constant**: Use `sm.add_constant()` unless you specifically want to exclude the intercept
2. **Check assumptions**: Run diagnostic tests (heteroscedasticity, autocorrelation, normality)
3. **Use formula API for categorical variables**: `smf.ols()` handles categorical variables automatically
4. **Robust standard errors**: Use when heteroscedasticity detected but model specification is correct
5. **Model selection**: Use AIC/BIC for non-nested models, F-test/likelihood ratio for nested models
6. **Outliers and influence**: Always check Cook's distance and leverage
7. **Multicollinearity**: Check VIF and condition number before interpretation
8. **Time series**: Use `GLSAR` or robust HAC standard errors for autocorrelated errors
9. **Grouped data**: Consider mixed effects models or cluster-robust standard errors
10. **Quantile regression**: Use for robust estimation or when interested in distributional effects
## Common Pitfalls
1. **Forgetting to add constant**: Results in no-intercept model
2. **Ignoring heteroscedasticity**: Use WLS or robust standard errors
3. **Using OLS with autocorrelated errors**: Use GLSAR or HAC standard errors
4. **Over-interpreting with multicollinearity**: Check VIF first
5. **Not checking residuals**: Always plot residuals vs fitted values
6. **Using t-SNE/PCA residuals**: Residuals should be from original space
7. **Confusing prediction vs confidence intervals**: Prediction intervals are wider
8. **Not handling categorical variables properly**: Use formula API or manual dummy coding
9. **Comparing models with different sample sizes**: Ensure same observations used
10. **Ignoring influential observations**: Check Cook's distance and DFFITS
# Statistical Tests and Diagnostics Reference
This document provides comprehensive guidance on statistical tests, diagnostics, and tools available in statsmodels.
## Overview
Statsmodels provides extensive statistical testing capabilities:
- Residual diagnostics and specification tests
- Hypothesis testing (parametric and non-parametric)
- Goodness-of-fit tests
- Multiple comparisons and post-hoc tests
- Power and sample size calculations
- Robust covariance matrices
- Influence and outlier detection
## Residual Diagnostics
### Autocorrelation Tests
**Ljung-Box Test**: Tests for autocorrelation in residuals
```python
from statsmodels.stats.diagnostic import acorr_ljungbox
# Test residuals for autocorrelation
lb_test = acorr_ljungbox(residuals, lags=10, return_df=True)
print(lb_test)
# H0: No autocorrelation up to lag k
# If p-value < 0.05, reject H0 (autocorrelation present)
```
**Durbin-Watson Test**: Tests for first-order autocorrelation
```python
from statsmodels.stats.stattools import durbin_watson
dw_stat = durbin_watson(residuals)
print(f"Durbin-Watson: {dw_stat:.4f}")
# DW ≈ 2: no autocorrelation
# DW < 2: positive autocorrelation
# DW > 2: negative autocorrelation
# Exact critical values depend on n and k
```
**Breusch-Godfrey Test**: More general test for autocorrelation
```python
from statsmodels.stats.diagnostic import acorr_breusch_godfrey
bg_test = acorr_breusch_godfrey(results, nlags=5)
lm_stat, lm_pval, f_stat, f_pval = bg_test
print(f"LM statistic: {lm_stat:.4f}, p-value: {lm_pval:.4f}")
# H0: No autocorrelation up to lag k
```
### Heteroskedasticity Tests
**Breusch-Pagan Test**: Tests for heteroskedasticity
```python
from statsmodels.stats.diagnostic import het_breuschpagan
bp_test = het_breuschpagan(residuals, exog)
lm_stat, lm_pval, f_stat, f_pval = bp_test
print(f"Breusch-Pagan test p-value: {lm_pval:.4f}")
# H0: Homoskedasticity (constant variance)
# If p-value < 0.05, reject H0 (heteroskedasticity present)
```
**White Test**: More general test for heteroskedasticity
```python
from statsmodels.stats.diagnostic import het_white
white_test = het_white(residuals, exog)
lm_stat, lm_pval, f_stat, f_pval = white_test
print(f"White test p-value: {lm_pval:.4f}")
# H0: Homoskedasticity
```
**ARCH Test**: Tests for autoregressive conditional heteroskedasticity
```python
from statsmodels.stats.diagnostic import het_arch
arch_test = het_arch(residuals, nlags=5)
lm_stat, lm_pval, f_stat, f_pval = arch_test
print(f"ARCH test p-value: {lm_pval:.4f}")
# H0: No ARCH effects
# If significant, consider GARCH model
```
### Normality Tests
**Jarque-Bera Test**: Tests for normality using skewness and kurtosis
```python
from statsmodels.stats.stattools import jarque_bera
jb_stat, jb_pval, skew, kurtosis = jarque_bera(residuals)
print(f"Jarque-Bera statistic: {jb_stat:.4f}")
print(f"p-value: {jb_pval:.4f}")
print(f"Skewness: {skew:.4f}")
print(f"Kurtosis: {kurtosis:.4f}")
# H0: Residuals are normally distributed
# Normal: skewness ≈ 0, kurtosis ≈ 3
```
**Omnibus Test**: Another normality test (also based on skewness/kurtosis)
```python
from statsmodels.stats.stattools import omni_normtest
omni_stat, omni_pval = omni_normtest(residuals)
print(f"Omnibus test p-value: {omni_pval:.4f}")
# H0: Normality
```
**Anderson-Darling Test**: Distribution fit test
```python
from statsmodels.stats.diagnostic import normal_ad
ad_stat, ad_pval = normal_ad(residuals)
print(f"Anderson-Darling test p-value: {ad_pval:.4f}")
```
**Lilliefors Test**: Modified Kolmogorov-Smirnov test
```python
from statsmodels.stats.diagnostic import lilliefors
lf_stat, lf_pval = lilliefors(residuals, dist='norm')
print(f"Lilliefors test p-value: {lf_pval:.4f}")
```
### Linearity and Specification Tests
**Ramsey RESET Test**: Tests for functional form misspecification
```python
from statsmodels.stats.diagnostic import linear_reset
# linear_reset returns a test-result object, not a tuple
reset_test = linear_reset(results, power=2, use_f=True)
print(f"RESET test p-value: {float(reset_test.pvalue):.4f}")
# H0: Model is correctly specified (linear)
# If rejected, may need polynomial terms or transformations
```
**Harvey-Collier Test**: Tests for linearity
```python
from statsmodels.stats.diagnostic import linear_harvey_collier
hc_stat, hc_pval = linear_harvey_collier(results)
print(f"Harvey-Collier test p-value: {hc_pval:.4f}")
# H0: Linear specification is correct
```
## Multicollinearity Detection
**Variance Inflation Factor (VIF)**:
```python
from statsmodels.stats.outliers_influence import variance_inflation_factor
import pandas as pd
# Calculate VIF for each variable
vif_data = pd.DataFrame()
vif_data["Variable"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i)
for i in range(X.shape[1])]
print(vif_data.sort_values('VIF', ascending=False))
# Interpretation:
# VIF = 1: No correlation with other predictors
# VIF > 5: Moderate multicollinearity
# VIF > 10: Serious multicollinearity problem
# VIF > 20: Severe multicollinearity (consider removing variable)
```
**Condition Number**: From regression results
```python
print(f"Condition number: {results.condition_number:.2f}")
# Interpretation:
# < 10: No multicollinearity concern
# 10-30: Moderate multicollinearity
# > 30: Strong multicollinearity
# > 100: Severe multicollinearity
```
## Influence and Outlier Detection
### Leverage
High leverage points have extreme predictor values.
```python
from statsmodels.stats.outliers_influence import OLSInfluence
influence = results.get_influence()
# Hat values (leverage)
leverage = influence.hat_matrix_diag
# Rule of thumb: leverage > 2*p/n or 3*p/n is high
# p = number of parameters, n = sample size
threshold = 2 * len(results.params) / len(y)
high_leverage = np.where(leverage > threshold)[0]
print(f"High leverage observations: {high_leverage}")
```
### Cook's Distance
Measures overall influence of each observation.
```python
# Cook's distance
cooks_d = influence.cooks_distance[0]
# Rule of thumb: Cook's D > 4/n is influential
threshold = 4 / len(y)
influential = np.where(cooks_d > threshold)[0]
print(f"Influential observations (Cook's D): {influential}")
# Plot
import matplotlib.pyplot as plt
plt.stem(range(len(cooks_d)), cooks_d)
plt.axhline(y=threshold, color='r', linestyle='--', label=f'Threshold (4/n)')
plt.xlabel('Observation')
plt.ylabel("Cook's Distance")
plt.legend()
plt.show()
```
### DFFITS
Measures influence on fitted value.
```python
# DFFITS
dffits = influence.dffits[0]
# Rule of thumb: |DFFITS| > 2*sqrt(p/n) is influential
p = len(results.params)
n = len(y)
threshold = 2 * np.sqrt(p / n)
influential_dffits = np.where(np.abs(dffits) > threshold)[0]
print(f"Influential observations (DFFITS): {influential_dffits}")
```
### DFBETAs
Measures influence on each coefficient.
```python
# DFBETAs (one for each parameter)
dfbetas = influence.dfbetas
# Rule of thumb: |DFBETA| > 2/sqrt(n)
threshold = 2 / np.sqrt(n)
for i, param_name in enumerate(results.params.index):
influential = np.where(np.abs(dfbetas[:, i]) > threshold)[0]
if len(influential) > 0:
print(f"Influential for {param_name}: {influential}")
```
### Influence Plot
```python
from statsmodels.graphics.regressionplots import influence_plot
fig, ax = plt.subplots(figsize=(12, 8))
influence_plot(results, ax=ax, criterion='cooks')
plt.show()
# Combines leverage, residuals, and Cook's distance
# Large bubbles = high Cook's distance
# Far from x=0 = high leverage
# Far from y=0 = large residual
```
### Studentized Residuals
```python
# Studentized residuals (outliers)
student_resid = influence.resid_studentized_internal
# External studentized residuals (more conservative)
student_resid_external = influence.resid_studentized_external
# Outliers: |studentized residual| > 3 (or > 2.5)
outliers = np.where(np.abs(student_resid_external) > 3)[0]
print(f"Outliers: {outliers}")
```
## Hypothesis Testing
### t-tests
**One-sample t-test**: Test if mean equals specific value
```python
from scipy import stats
# H0: population mean = mu_0
t_stat, p_value = stats.ttest_1samp(data, popmean=mu_0)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
**Two-sample t-test**: Compare means of two groups
```python
# H0: mean1 = mean2 (equal variances)
t_stat, p_value = stats.ttest_ind(group1, group2)
# Welch's t-test (unequal variances)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
**Paired t-test**: Compare paired observations
```python
# H0: mean difference = 0
t_stat, p_value = stats.ttest_rel(before, after)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
### Proportion Tests
**One-proportion test**:
```python
from statsmodels.stats.proportion import proportions_ztest
# H0: proportion = p0
count = 45 # successes
nobs = 100 # total observations
p0 = 0.5 # hypothesized proportion
z_stat, p_value = proportions_ztest(count, nobs, value=p0)
print(f"z-statistic: {z_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
**Two-proportion test**:
```python
# H0: proportion1 = proportion2
counts = [45, 60]
nobs = [100, 120]
z_stat, p_value = proportions_ztest(counts, nobs)
print(f"z-statistic: {z_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
### Chi-square Tests
**Chi-square test of independence**:
```python
from scipy.stats import chi2_contingency
# Contingency table
contingency_table = pd.crosstab(variable1, variable2)
chi2, p_value, dof, expected = chi2_contingency(contingency_table)
print(f"Chi-square statistic: {chi2:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"Degrees of freedom: {dof}")
# H0: Variables are independent
```
**Chi-square goodness-of-fit**:
```python
from scipy.stats import chisquare
# Observed frequencies
observed = [20, 30, 25, 25]
# Expected frequencies (equal by default)
expected = [25, 25, 25, 25]
chi2, p_value = chisquare(observed, expected)
print(f"Chi-square statistic: {chi2:.4f}")
print(f"p-value: {p_value:.4f}")
# H0: Data follow the expected distribution
```
### Non-parametric Tests
**Mann-Whitney U test** (independent samples):
```python
from scipy.stats import mannwhitneyu
# H0: Distributions are equal
u_stat, p_value = mannwhitneyu(group1, group2, alternative='two-sided')
print(f"U statistic: {u_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
**Wilcoxon signed-rank test** (paired samples):
```python
from scipy.stats import wilcoxon
# H0: Median difference = 0
w_stat, p_value = wilcoxon(before, after)
print(f"W statistic: {w_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
**Kruskal-Wallis H test** (>2 groups):
```python
from scipy.stats import kruskal
# H0: All groups have same distribution
h_stat, p_value = kruskal(group1, group2, group3)
print(f"H statistic: {h_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
**Sign test**:
```python
from statsmodels.stats.descriptivestats import sign_test
# H0: Median = m0
result = sign_test(data, m0=0)
print(result)
```
### ANOVA
**One-way ANOVA**:
```python
from scipy.stats import f_oneway
# H0: All group means are equal
f_stat, p_value = f_oneway(group1, group2, group3)
print(f"F-statistic: {f_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```
**Two-way ANOVA** (with statsmodels):
```python
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
# Fit model
model = ols('response ~ C(factor1) + C(factor2) + C(factor1):C(factor2)',
data=df).fit()
# ANOVA table
anova_table = anova_lm(model, typ=2)
print(anova_table)
```
**Repeated measures ANOVA**:
```python
from statsmodels.stats.anova import AnovaRM
# Requires long-format data
aovrm = AnovaRM(df, depvar='score', subject='subject_id', within=['time'])
results = aovrm.fit()
print(results.summary())
```
## Multiple Comparisons
### Post-hoc Tests
**Tukey's HSD** (Honest Significant Difference):
```python
from statsmodels.stats.multicomp import pairwise_tukeyhsd
# Perform Tukey HSD test
tukey = pairwise_tukeyhsd(data, groups, alpha=0.05)
print(tukey.summary())
# Plot confidence intervals
tukey.plot_simultaneous()
plt.show()
```
**Bonferroni correction**:
```python
from statsmodels.stats.multitest import multipletests
# P-values from multiple tests
p_values = [0.01, 0.03, 0.04, 0.15, 0.001]
# Apply correction
reject, pvals_corrected, alphac_sidak, alphac_bonf = multipletests(
p_values,
alpha=0.05,
method='bonferroni'
)
print("Rejected:", reject)
print("Corrected p-values:", pvals_corrected)
```
**False Discovery Rate (FDR)**:
```python
# FDR correction (less conservative than Bonferroni)
reject, pvals_corrected, alphac_sidak, alphac_bonf = multipletests(
p_values,
alpha=0.05,
method='fdr_bh' # Benjamini-Hochberg
)
print("Rejected:", reject)
print("Corrected p-values:", pvals_corrected)
```
## Robust Covariance Matrices
### Heteroskedasticity-Consistent (HC) Standard Errors
```python
# After fitting OLS
results = sm.OLS(y, X).fit()
# HC0 (White's heteroskedasticity-consistent SEs)
results_hc0 = results.get_robustcov_results(cov_type='HC0')
# HC1 (degrees of freedom adjustment)
results_hc1 = results.get_robustcov_results(cov_type='HC1')
# HC2 (leverage adjustment)
results_hc2 = results.get_robustcov_results(cov_type='HC2')
# HC3 (most conservative, recommended for small samples)
results_hc3 = results.get_robustcov_results(cov_type='HC3')
print("Standard OLS SEs:", results.bse)
print("Robust HC3 SEs:", results_hc3.bse)
```
### HAC (Heteroskedasticity and Autocorrelation Consistent)
**Newey-West standard errors**:
```python
# For time series with autocorrelation and heteroskedasticity
results_hac = results.get_robustcov_results(cov_type='HAC', maxlags=4)
print("HAC (Newey-West) SEs:", results_hac.bse)
print(results_hac.summary())
```
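The `maxlags` choice matters for HAC inference. A common plug-in rule of thumb (a heuristic, not a statsmodels default) is ⌊4·(n/100)^(2/9)⌋:
```python
import numpy as np

# Rule-of-thumb HAC bandwidth; treat as a starting point, not a prescription
n = int(results.nobs)
maxlags = int(np.floor(4 * (n / 100) ** (2 / 9)))
results_hac = results.get_robustcov_results(cov_type='HAC', maxlags=maxlags)
print(f"Using maxlags = {maxlags}")
```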
### Cluster-Robust Standard Errors
```python
# For clustered/grouped data
results_cluster = results.get_robustcov_results(
cov_type='cluster',
groups=cluster_ids
)
print("Cluster-robust SEs:", results_cluster.bse)
```
## Descriptive Statistics
**Basic descriptive statistics**:
```python
from statsmodels.stats.api import DescrStatsW
# Comprehensive descriptive stats
desc = DescrStatsW(data)
print("Mean:", desc.mean)
print("Std Dev:", desc.std)
print("Variance:", desc.var)
print("Confidence interval:", desc.tconfint_mean())
# Quantiles
print("Median:", desc.quantile(0.5))
print("IQR:", desc.quantile([0.25, 0.75]))
```
**Weighted statistics**:
```python
# With weights
desc_weighted = DescrStatsW(data, weights=weights)
print("Weighted mean:", desc_weighted.mean)
print("Weighted std:", desc_weighted.std)
```
**Compare two groups**:
```python
from statsmodels.stats.weightstats import CompareMeans
# Create comparison object
cm = CompareMeans(DescrStatsW(group1), DescrStatsW(group2))
# t-test
print("t-test:", cm.ttest_ind())
# Confidence interval for difference
print("CI for difference:", cm.tconfint_diff())
# Test for equal variances (CompareMeans has no built-in test; use scipy's Levene test)
from scipy import stats
print("Levene test:", stats.levene(group1, group2))
```
## Power Analysis and Sample Size
**Power for t-test**:
```python
from statsmodels.stats.power import tt_ind_solve_power
# Solve for sample size
effect_size = 0.5 # Cohen's d
alpha = 0.05
power = 0.8
n = tt_ind_solve_power(effect_size=effect_size,
alpha=alpha,
power=power,
alternative='two-sided')
print(f"Required sample size per group: {n:.0f}")
# Solve for power given n
power = tt_ind_solve_power(effect_size=0.5,
nobs1=50,
alpha=0.05,
alternative='two-sided')
print(f"Power: {power:.4f}")
```
**Power for proportion test**:
```python
from statsmodels.stats.power import zt_ind_solve_power
# For proportion tests (z-test)
effect_size = 0.3  # Standardized effect size (e.g., Cohen's h), not the raw difference in proportions
alpha = 0.05
power = 0.8
n = zt_ind_solve_power(effect_size=effect_size,
alpha=alpha,
power=power,
alternative='two-sided')
print(f"Required sample size per group: {n:.0f}")
```
**Power curves**:
```python
from statsmodels.stats.power import TTestIndPower
import matplotlib.pyplot as plt
# Create power analysis object
analysis = TTestIndPower()
# Plot power curves for different sample sizes
sample_sizes = range(10, 200, 10)
effect_sizes = [0.2, 0.5, 0.8] # Small, medium, large
fig, ax = plt.subplots(figsize=(10, 6))
for es in effect_sizes:
power = [analysis.solve_power(effect_size=es, nobs1=n, alpha=0.05)
for n in sample_sizes]
ax.plot(sample_sizes, power, label=f'Effect size = {es}')
ax.axhline(y=0.8, color='r', linestyle='--', label='Power = 0.8')
ax.set_xlabel('Sample size per group')
ax.set_ylabel('Power')
ax.set_title('Power Curves for Two-Sample t-test')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
```
## Effect Sizes
**Cohen's d** (standardized mean difference):
```python
def cohens_d(group1, group2):
    """Calculate Cohen's d for independent samples."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    # Pooled standard deviation
    pooled_std = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    # Cohen's d
    return (np.mean(group1) - np.mean(group2)) / pooled_std
d = cohens_d(group1, group2)
print(f"Cohen's d: {d:.4f}")
# Interpretation:
# |d| < 0.2: negligible
# |d| ~ 0.2: small
# |d| ~ 0.5: medium
# |d| ~ 0.8: large
```
**Eta-squared** (for ANOVA):
```python
# From ANOVA table
# η² = SS_between / SS_total
def eta_squared(anova_table):
    return anova_table['sum_sq'].iloc[0] / anova_table['sum_sq'].sum()
# After running ANOVA
eta_sq = eta_squared(anova_table)
print(f"Eta-squared: {eta_sq:.4f}")
# Interpretation:
# 0.01: small effect
# 0.06: medium effect
# 0.14: large effect
```
## Contingency Tables and Association
**McNemar's test** (paired binary data):
```python
from statsmodels.stats.contingency_tables import mcnemar
# 2x2 contingency table
# 2x2 table of paired outcomes; b and c count the discordant pairs
table = [[a, b],   # a, b, c, d are the observed cell counts
         [c, d]]
result = mcnemar(table, exact=True)  # exact=False uses the chi-square approximation for large samples
print(f"p-value: {result.pvalue:.4f}")
# H0: Marginal probabilities are equal
```
**Cochran-Mantel-Haenszel test**:
```python
from statsmodels.stats.contingency_tables import StratifiedTable
# For stratified 2x2 tables
strat_table = StratifiedTable(tables_list)
result = strat_table.test_null_odds()
print(f"p-value: {result.pvalue:.4f}")
```
## Treatment Effects and Causal Inference
**Propensity score matching**:
```python
import statsmodels.api as sm

# Estimate propensity scores with a logistic regression
ps_model = sm.Logit(treatment, X).fit()
propensity_scores = ps_model.predict(X)
# Use the scores for matching or inverse-probability weighting;
# statsmodels does not ship a matcher, so matching is implemented manually
```
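Since the matching step is manual, here is a minimal 1:1 nearest-neighbor sketch. It assumes `treatment` is a binary array and `y` the outcome, and borrows scikit-learn for the neighbor search:
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

ps = np.asarray(propensity_scores).reshape(-1, 1)
treated = np.flatnonzero(np.asarray(treatment) == 1)
control = np.flatnonzero(np.asarray(treatment) == 0)

# For each treated unit, find the closest control on the propensity score
nn = NearestNeighbors(n_neighbors=1).fit(ps[control])
_, match_idx = nn.kneighbors(ps[treated])
matched_controls = control[match_idx.ravel()]

# Naive ATT estimate from the matched sample (no variance adjustment)
y_arr = np.asarray(y)
att = y_arr[treated].mean() - y_arr[matched_controls].mean()
print(f"ATT estimate: {att:.4f}")
```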
**Difference-in-differences**:
```python
# DiD specification: outcome ~ treatment * post (expands to main effects + interaction)
model = ols('outcome ~ treatment + post + treatment:post', data=df).fit()
# DiD estimate is the interaction coefficient
did_estimate = model.params['treatment:post']
print(f"DiD estimate: {did_estimate:.4f}")
```
## Best Practices
1. **Always check assumptions**: Test before interpreting results
2. **Report effect sizes**: Not just p-values
3. **Use appropriate tests**: Match test to data type and distribution
4. **Correct for multiple comparisons**: When conducting many tests
5. **Check sample size**: Ensure adequate power
6. **Visual inspection**: Plot data before testing
7. **Report confidence intervals**: Along with point estimates
8. **Consider alternatives**: Non-parametric when assumptions violated
9. **Robust standard errors**: Use when heteroskedasticity/autocorrelation present
10. **Document decisions**: Note which tests used and why
## Common Pitfalls
1. **Not checking test assumptions**: May invalidate results
2. **Multiple testing without correction**: Inflated Type I error
3. **Using parametric tests on non-normal data**: Consider non-parametric
4. **Ignoring heteroskedasticity**: Use robust SEs
5. **Confusing statistical and practical significance**: Check effect sizes
6. **Not reporting confidence intervals**: Only p-values insufficient
7. **Using wrong test**: Match test to research question
8. **Insufficient power**: Risk of Type II error (false negatives)
9. **p-hacking**: Testing many specifications until significant
10. **Overinterpreting p-values**: Remember limitations of NHST
# Time Series Analysis Reference
This document provides comprehensive guidance on time series models in statsmodels, including ARIMA, state space models, VAR, exponential smoothing, and forecasting methods.
## Overview
Statsmodels offers extensive time series capabilities:
- **Univariate models**: AR, ARIMA, SARIMAX, Exponential Smoothing
- **Multivariate models**: VAR, VARMAX, Dynamic Factor Models
- **State space framework**: Custom models, Kalman filtering
- **Diagnostic tools**: ACF, PACF, stationarity tests, residual analysis
- **Forecasting**: Point forecasts and prediction intervals
## Univariate Time Series Models
### AutoReg (AR Model)
Autoregressive model: current value depends on past values.
**When to use:**
- Univariate time series
- Past values predict future
- Stationary series
**Model**: yₜ = c + φ₁yₜ₋₁ + φ₂yₜ₋₂ + ... + φₚyₜ₋ₚ + εₜ
```python
from statsmodels.tsa.ar_model import AutoReg
import pandas as pd
# Fit AR(p) model
model = AutoReg(y, lags=5) # AR(5)
results = model.fit()
print(results.summary())
```
**With exogenous regressors:**
```python
# AR with exogenous variables (ARX)
model = AutoReg(y, lags=5, exog=X_exog)
results = model.fit()
```
**Seasonal AR:**
```python
# Seasonal dummy variables (e.g., monthly data with yearly seasonality)
model = AutoReg(y, lags=12, seasonal=True, period=12)
results = model.fit()
```
### ARIMA (Autoregressive Integrated Moving Average)
Combines AR, differencing (I), and MA components.
**When to use:**
- Non-stationary time series (needs differencing)
- Past values and errors predict future
- Flexible model for many time series
**Model**: ARIMA(p,d,q)
- p: AR order (lags)
- d: differencing order (to achieve stationarity)
- q: MA order (lagged forecast errors)
```python
from statsmodels.tsa.arima.model import ARIMA
# Fit ARIMA(p,d,q)
model = ARIMA(y, order=(1, 1, 1)) # ARIMA(1,1,1)
results = model.fit()
print(results.summary())
```
**Choosing p, d, q:**
1. **Determine d (differencing order)**:
```python
from statsmodels.tsa.stattools import adfuller
# ADF test for stationarity
def check_stationarity(series):
result = adfuller(series)
print(f"ADF Statistic: {result[0]:.4f}")
print(f"p-value: {result[1]:.4f}")
if result[1] <= 0.05:
print("Series is stationary")
return True
else:
print("Series is non-stationary, needs differencing")
return False
# Test original series
if not check_stationarity(y):
# Difference once
y_diff = y.diff().dropna()
if not check_stationarity(y_diff):
# Difference again
y_diff2 = y_diff.diff().dropna()
check_stationarity(y_diff2)
```
2. **Determine p and q (ACF/PACF)**:
```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
# After differencing to stationarity
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
# ACF: helps determine q (MA order)
plot_acf(y_stationary, lags=40, ax=ax1)
ax1.set_title('Autocorrelation Function (ACF)')
# PACF: helps determine p (AR order)
plot_pacf(y_stationary, lags=40, ax=ax2)
ax2.set_title('Partial Autocorrelation Function (PACF)')
plt.tight_layout()
plt.show()
# Rules of thumb:
# - PACF cuts off at lag p → AR(p)
# - ACF cuts off at lag q → MA(q)
# - Both decay → ARMA(p,q)
```
3. **Model selection (AIC/BIC)**:
```python
# Grid search for best (p,q) given d
import numpy as np
best_aic = np.inf
best_order = None
for p in range(5):
    for q in range(5):
        try:
            model = ARIMA(y, order=(p, d, q))
            results = model.fit()
            if results.aic < best_aic:
                best_aic = results.aic
                best_order = (p, d, q)
        except Exception:  # skip orders that fail to estimate
            continue
print(f"Best order: {best_order} with AIC: {best_aic:.2f}")
```
### SARIMAX (Seasonal ARIMA with Exogenous Variables)
Extends ARIMA with seasonality and exogenous regressors.
**When to use:**
- Seasonal patterns (monthly, quarterly data)
- External variables influence series
- Most flexible univariate model
**Model**: SARIMAX(p,d,q)(P,D,Q,s)
- (p,d,q): Non-seasonal ARIMA
- (P,D,Q,s): Seasonal ARIMA with period s
```python
from statsmodels.tsa.statespace.sarimax import SARIMAX
# Seasonal ARIMA for monthly data (s=12)
model = SARIMAX(y,
order=(1, 1, 1), # (p,d,q)
seasonal_order=(1, 1, 1, 12)) # (P,D,Q,s)
results = model.fit()
print(results.summary())
```
**With exogenous variables:**
```python
# SARIMAX with external predictors
model = SARIMAX(y,
exog=X_exog,
order=(1, 1, 1),
seasonal_order=(1, 1, 1, 12))
results = model.fit()
```
**Example: Monthly sales with trend and seasonality**
```python
# Typical for monthly data: (p,d,q)(P,D,Q,12)
# Start with (1,1,1)(1,1,1,12) or (0,1,1)(0,1,1,12)
model = SARIMAX(monthly_sales,
order=(0, 1, 1),
seasonal_order=(0, 1, 1, 12),
enforce_stationarity=False,
enforce_invertibility=False)
results = model.fit()
```
### Exponential Smoothing
Weighted averages of past observations with exponentially decreasing weights.
**When to use:**
- Simple, interpretable forecasts
- Trend and/or seasonality present
- No need for explicit model specification
**Types:**
- Simple Exponential Smoothing: no trend, no seasonality
- Holt's method: with trend
- Holt-Winters: with trend and seasonality
```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# Simple exponential smoothing
model = ExponentialSmoothing(y, trend=None, seasonal=None)
results = model.fit()
# Holt's method (with trend)
model = ExponentialSmoothing(y, trend='add', seasonal=None)
results = model.fit()
# Holt-Winters (trend + seasonality)
model = ExponentialSmoothing(y,
trend='add', # 'add' or 'mul'
seasonal='add', # 'add' or 'mul'
seasonal_periods=12) # e.g., 12 for monthly
results = model.fit()
print(results.summary())
```
**Additive vs Multiplicative:**
```python
# Additive: constant seasonal variation
# yₜ = Level + Trend + Seasonal + Error
# Multiplicative: proportional seasonal variation
# yₜ = Level × Trend × Seasonal × Error
# Choose based on data:
# - Additive: seasonal variation constant over time
# - Multiplicative: seasonal variation increases with level
```
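When a plot doesn't settle the choice, one pragmatic heuristic is to fit both variants and compare information criteria. A sketch assuming monthly data in `y` (the multiplicative form requires strictly positive values):
```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

fits = {}
for mode in ['add', 'mul']:
    model = ExponentialSmoothing(y, trend=mode, seasonal=mode,
                                 seasonal_periods=12)
    fits[mode] = model.fit()
    print(f"{mode}: AIC = {fits[mode].aic:.2f}")
# The lower-AIC form is the better candidate, subject to residual checks
```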
**Innovations state space (ETS):**
```python
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
# More robust, state space formulation
model = ETSModel(y,
error='add', # 'add' or 'mul'
trend='add', # 'add', 'mul', or None
seasonal='add', # 'add', 'mul', or None
seasonal_periods=12)
results = model.fit()
```
## Multivariate Time Series
### VAR (Vector Autoregression)
System of equations where each variable depends on past values of all variables.
**When to use:**
- Multiple interrelated time series
- Bidirectional relationships
- Granger causality testing
**Model**: Each variable is AR on all variables:
- y₁ₜ = c₁ + φ₁₁y₁ₜ₋₁ + φ₁₂y₂ₜ₋₁ + ... + ε₁ₜ
- y₂ₜ = c₂ + φ₂₁y₁ₜ₋₁ + φ₂₂y₂ₜ₋₁ + ... + ε₂ₜ
```python
from statsmodels.tsa.api import VAR
import pandas as pd
# Data should be DataFrame with multiple columns
# Each column is a time series
df_multivariate = pd.DataFrame({'series1': y1, 'series2': y2, 'series3': y3})
# Fit VAR
model = VAR(df_multivariate)
# Select lag order using AIC/BIC
lag_order_results = model.select_order(maxlags=15)
print(lag_order_results.summary())
# Fit with optimal lags
results = model.fit(maxlags=5, ic='aic')
print(results.summary())
```
**Granger causality testing:**
```python
# Test if series1 Granger-causes series2
from statsmodels.tsa.stattools import grangercausalitytests
# Requires 2D array [series2, series1]
test_data = df_multivariate[['series2', 'series1']]
# Test up to max_lag
max_lag = 5
results = grangercausalitytests(test_data, max_lag, verbose=True)
# P-values for each lag
for lag in range(1, max_lag + 1):
p_value = results[lag][0]['ssr_ftest'][1]
print(f"Lag {lag}: p-value = {p_value:.4f}")
```
**Impulse Response Functions (IRF):**
```python
# Trace effect of shock through system
irf = results.irf(10) # 10 periods ahead
# Plot IRFs
irf.plot(orth=True) # Orthogonalized (Cholesky decomposition)
plt.show()
# Cumulative effects
irf.plot_cum_effects(orth=True)
plt.show()
```
**Forecast Error Variance Decomposition:**
```python
# Contribution of each variable to forecast error variance
fevd = results.fevd(10) # 10 periods ahead
fevd.plot()
plt.show()
```
### VARMAX (VAR with Moving Average and Exogenous Variables)
Extends VAR with MA component and external regressors.
**When to use:**
- VAR inadequate (MA component needed)
- External variables affect system
- More flexible multivariate model
```python
from statsmodels.tsa.statespace.varmax import VARMAX
# VARMAX(p, q) with exogenous variables
model = VARMAX(df_multivariate,
order=(1, 1), # (p, q)
exog=X_exog)
results = model.fit()
print(results.summary())
```
## State Space Models
Flexible framework for custom time series models.
**When to use:**
- Custom model specification
- Unobserved components
- Kalman filtering/smoothing
- Missing data
```python
from statsmodels.tsa.statespace.mlemodel import MLEModel
# Extend MLEModel for custom state space models
# Example: Local level model (random walk + noise)
```
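As a concrete illustration, a minimal local level model (random walk level plus observation noise) can be written by subclassing `MLEModel`. This follows the pattern in statsmodels' custom state space documentation; the parameter names are chosen here for illustration:
```python
import numpy as np
from statsmodels.tsa.statespace.mlemodel import MLEModel

class LocalLevel(MLEModel):
    """Local level model: y_t = mu_t + e_t,  mu_t = mu_{t-1} + w_t."""

    def __init__(self, endog):
        # One unobserved state (the level) with diffuse initialization
        super().__init__(endog, k_states=1, initialization='diffuse')
        self['design', 0, 0] = 1.0      # y_t loads directly on the level
        self['transition', 0, 0] = 1.0  # level follows a random walk
        self['selection', 0, 0] = 1.0

    @property
    def param_names(self):
        return ['sigma2.measurement', 'sigma2.level']

    @property
    def start_params(self):
        return [np.nanvar(self.endog), 0.1 * np.nanvar(self.endog)]

    def transform_params(self, unconstrained):
        return unconstrained ** 2   # keep both variances positive

    def untransform_params(self, constrained):
        return constrained ** 0.5

    def update(self, params, **kwargs):
        params = super().update(params, **kwargs)
        self['obs_cov', 0, 0] = params[0]    # measurement noise variance
        self['state_cov', 0, 0] = params[1]  # level innovation variance

results = LocalLevel(y).fit()
print(results.summary())
```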
**Dynamic Factor Models:**
```python
from statsmodels.tsa.statespace.dynamic_factor import DynamicFactor
# Extract common factors from multiple time series
model = DynamicFactor(df_multivariate,
k_factors=2, # Number of factors
factor_order=2) # AR order of factors
results = model.fit()
# Estimated factors
factors = results.factors.filtered
```
## Forecasting
### Point Forecasts
```python
# ARIMA forecasting
model = ARIMA(y, order=(1, 1, 1))
results = model.fit()
# Forecast h steps ahead
h = 10
forecast = results.forecast(steps=h)
# With exogenous variables (SARIMAX)
model = SARIMAX(y, exog=X, order=(1, 1, 1))
results = model.fit()
# Need future exogenous values
forecast = results.forecast(steps=h, exog=X_future)
```
### Prediction Intervals
```python
# Get forecast with confidence intervals
forecast_obj = results.get_forecast(steps=h)
forecast_df = forecast_obj.summary_frame()
print(forecast_df)
# Contains: mean, mean_se, mean_ci_lower, mean_ci_upper
# Extract components
forecast_mean = forecast_df['mean']
forecast_ci_lower = forecast_df['mean_ci_lower']
forecast_ci_upper = forecast_df['mean_ci_upper']
# Plot
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
plt.plot(y.index, y, label='Historical')
plt.plot(forecast_df.index, forecast_mean, label='Forecast', color='red')
plt.fill_between(forecast_df.index,
forecast_ci_lower,
forecast_ci_upper,
alpha=0.3, color='red', label='95% CI')
plt.legend()
plt.title('Forecast with Prediction Intervals')
plt.show()
```
### Dynamic vs Static Forecasts
```python
# Static (one-step-ahead, using actual values)
static_forecast = results.get_prediction(start=split_point, end=len(y)-1)
# Dynamic (multi-step, using predicted values)
dynamic_forecast = results.get_prediction(start=split_point,
end=len(y)-1,
dynamic=True)
# Plot comparison
fig, ax = plt.subplots(figsize=(12, 6))
y.plot(ax=ax, label='Actual')
static_forecast.predicted_mean.plot(ax=ax, label='Static forecast')
dynamic_forecast.predicted_mean.plot(ax=ax, label='Dynamic forecast')
ax.legend()
plt.show()
```
## Diagnostic Tests
### Stationarity Tests
```python
from statsmodels.tsa.stattools import adfuller, kpss
# Augmented Dickey-Fuller (ADF) test
# H0: unit root (non-stationary)
adf_result = adfuller(y, autolag='AIC')
print(f"ADF Statistic: {adf_result[0]:.4f}")
print(f"p-value: {adf_result[1]:.4f}")
if adf_result[1] <= 0.05:
print("Reject H0: Series is stationary")
else:
print("Fail to reject H0: Series is non-stationary")
# KPSS test
# H0: stationary (opposite of ADF)
kpss_result = kpss(y, regression='c', nlags='auto')
print(f"KPSS Statistic: {kpss_result[0]:.4f}")
print(f"p-value: {kpss_result[1]:.4f}")
if kpss_result[1] <= 0.05:
print("Reject H0: Series is non-stationary")
else:
print("Fail to reject H0: Series is stationary")
```
### Residual Diagnostics
```python
# Ljung-Box test for autocorrelation in residuals
from statsmodels.stats.diagnostic import acorr_ljungbox
lb_test = acorr_ljungbox(results.resid, lags=10, return_df=True)
print(lb_test)
# P-values > 0.05 indicate no significant autocorrelation (good)
# Plot residual diagnostics
results.plot_diagnostics(figsize=(12, 8))
plt.show()
# Components:
# 1. Standardized residuals over time
# 2. Histogram + KDE of residuals
# 3. Q-Q plot for normality
# 4. Correlogram (ACF of residuals)
```
### Heteroskedasticity Tests
```python
from statsmodels.stats.diagnostic import het_arch
# ARCH test for heteroskedasticity
arch_test = het_arch(results.resid, nlags=10)
print(f"ARCH test statistic: {arch_test[0]:.4f}")
print(f"p-value: {arch_test[1]:.4f}")
# If significant, consider GARCH model
```
## Seasonal Decomposition
```python
from statsmodels.tsa.seasonal import seasonal_decompose
# Decompose into trend, seasonal, residual
decomposition = seasonal_decompose(y,
model='additive', # or 'multiplicative'
period=12) # seasonal period
# Plot components
fig = decomposition.plot()
fig.set_size_inches(12, 8)
plt.show()
# Access components
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
# STL decomposition (more robust)
from statsmodels.tsa.seasonal import STL
stl = STL(y, seasonal=13) # seasonal must be odd
stl_result = stl.fit()
fig = stl_result.plot()
plt.show()
```
## Model Evaluation
### In-Sample Metrics
```python
# From results object
print(f"AIC: {results.aic:.2f}")
print(f"BIC: {results.bic:.2f}")
print(f"Log-likelihood: {results.llf:.2f}")
# MSE on training data
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y, results.fittedvalues)
rmse = np.sqrt(mse)
print(f"RMSE: {rmse:.4f}")
# MAE
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y, results.fittedvalues)
print(f"MAE: {mae:.4f}")
```
### Out-of-Sample Evaluation
```python
# Train-test split for time series (no shuffle!)
train_size = int(0.8 * len(y))
y_train = y[:train_size]
y_test = y[train_size:]
# Fit on training data
model = ARIMA(y_train, order=(1, 1, 1))
results = model.fit()
# Forecast test period
forecast = results.forecast(steps=len(y_test))
# Metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error
rmse = np.sqrt(mean_squared_error(y_test, forecast))
mae = mean_absolute_error(y_test, forecast)
mape = np.mean(np.abs((y_test - forecast) / y_test)) * 100
print(f"Test RMSE: {rmse:.4f}")
print(f"Test MAE: {mae:.4f}")
print(f"Test MAPE: {mape:.2f}%")
```
### Rolling Forecast
```python
# More realistic evaluation: rolling one-step-ahead forecasts
forecasts = []
for t in range(len(y_test)):
    # Refit with all data observed so far (expanding window)
    y_current = y[:train_size + t]
    fit = ARIMA(y_current, order=(1, 1, 1)).fit()
    # One-step-ahead forecast (use .iloc: the forecast Series is labeled
    # with future index values, so positional access is needed)
    fc = fit.forecast(steps=1).iloc[0]
    forecasts.append(fc)
forecasts = np.array(forecasts)
rmse = np.sqrt(mean_squared_error(y_test, forecasts))
print(f"Rolling forecast RMSE: {rmse:.4f}")
```
### Cross-Validation
```python
# Time series cross-validation (expanding window)
from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=5)
rmse_scores = []
for train_idx, test_idx in tscv.split(y):
y_train_cv = y.iloc[train_idx]
y_test_cv = y.iloc[test_idx]
model = ARIMA(y_train_cv, order=(1, 1, 1))
results = model.fit()
forecast = results.forecast(steps=len(test_idx))
rmse = np.sqrt(mean_squared_error(y_test_cv, forecast))
rmse_scores.append(rmse)
print(f"CV RMSE: {np.mean(rmse_scores):.4f} ± {np.std(rmse_scores):.4f}")
```
## Advanced Topics
### ARDL (Autoregressive Distributed Lag)
Models y as a function of its own lags and distributed lags of exogenous regressors, bridging univariate and multivariate approaches.
```python
from statsmodels.tsa.ardl import ARDL
# ARDL(p, q) model
# y depends on its own lags and lags of X
model = ARDL(y, lags=2, exog=X, order=2)  # 'order' sets the exog lag length
results = model.fit()
```
### Error Correction Models
For cointegrated series.
```python
from statsmodels.tsa.vector_ar.vecm import coint_johansen
# Test for cointegration
johansen_test = coint_johansen(df_multivariate, det_order=0, k_ar_diff=1)
# Fit VECM if cointegrated
from statsmodels.tsa.vector_ar.vecm import VECM
model = VECM(df_multivariate, k_ar_diff=1, coint_rank=1)
results = model.fit()
```
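To choose the cointegration rank, compare each trace statistic (`lr1`) against its critical values (`cvt`, with columns for the 90/95/99% levels):
```python
# Column 1 of cvt holds the 5% critical values
for r in range(len(johansen_test.lr1)):
    trace_stat = johansen_test.lr1[r]
    crit_5 = johansen_test.cvt[r, 1]
    decision = "reject" if trace_stat > crit_5 else "fail to reject"
    print(f"H0: rank <= {r}: trace = {trace_stat:.2f}, "
          f"5% critical value = {crit_5:.2f} -> {decision}")
```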
### Regime Switching Models
For structural breaks and regime changes.
```python
from statsmodels.tsa.regime_switching.markov_regression import MarkovRegression
# Markov switching model
model = MarkovRegression(y, k_regimes=2, order=1)
results = model.fit()
# Smoothed probabilities of regimes
regime_probs = results.smoothed_marginal_probabilities
```
## Best Practices
1. **Check stationarity**: Difference if needed, verify with ADF/KPSS tests
2. **Plot data**: Always visualize before modeling
3. **Identify seasonality**: Use appropriate seasonal models (SARIMAX, Holt-Winters)
4. **Model selection**: Use AIC/BIC and out-of-sample validation
5. **Residual diagnostics**: Check for autocorrelation, normality, heteroskedasticity
6. **Forecast evaluation**: Use rolling forecasts and proper time series CV
7. **Avoid overfitting**: Prefer simpler models, use information criteria
8. **Document assumptions**: Note any data transformations (log, differencing)
9. **Prediction intervals**: Always provide uncertainty estimates
10. **Refit regularly**: Update models as new data arrives
## Common Pitfalls
1. **Not checking stationarity**: Fit ARIMA on non-stationary data
2. **Data leakage**: Using future data in transformations
3. **Wrong seasonal period**: S=4 for quarterly, S=12 for monthly
4. **Overfitting**: Too many parameters relative to data
5. **Ignoring residual autocorrelation**: Model inadequate
6. **Using inappropriate metrics**: MAPE fails with zeros or negatives (see the sMAPE sketch after this list)
7. **Not handling missing data**: Affects model estimation
8. **Extrapolating exogenous variables**: Need future X values for SARIMAX
9. **Confusing static vs dynamic forecasts**: Dynamic more realistic for multi-step
10. **Not validating forecasts**: Always check out-of-sample performance
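On pitfall 6: when the test series can contain zeros, a symmetric variant such as sMAPE degrades more gracefully (though it is still undefined when actual and forecast are both zero). A minimal sketch:
```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE in percent; tolerates zeros in one series at a time."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2
    return np.mean(np.abs(actual - forecast) / denom) * 100

print(f"Test sMAPE: {smape(y_test, forecasts):.2f}%")
```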