# Bayesian Statistical Analysis

This document provides guidance on conducting and interpreting Bayesian statistical analyses, which offer an alternative framework to frequentist (classical) statistics.

## Bayesian vs. Frequentist Philosophy

### Fundamental Differences

| Aspect | Frequentist | Bayesian |
|--------|-------------|----------|
| **Probability interpretation** | Long-run frequency of events | Degree of belief/uncertainty |
| **Parameters** | Fixed but unknown | Random variables with distributions |
| **Inference** | Based on sampling distributions | Based on posterior distributions |
| **Primary output** | p-values, confidence intervals | Posterior probabilities, credible intervals |
| **Prior information** | Not formally incorporated | Explicitly incorporated via priors |
| **Hypothesis testing** | Reject/fail to reject null | Probability of hypotheses given data |
| **Sample size** | Often requires minimum | Can work with any sample size |
| **Interpretation** | Indirect (probability of data given H₀) | Direct (probability of hypothesis given data) |

### Key Question Difference

**Frequentist**: "If the null hypothesis is true, what is the probability of observing data this extreme or more extreme?"

**Bayesian**: "Given the observed data, what is the probability that the hypothesis is true?"

The Bayesian question is more intuitive and directly addresses what researchers usually want to know.

---

## Bayes' Theorem

**Formula**:

```
P(θ|D) = P(D|θ) × P(θ) / P(D)
```

**In words**:

```
Posterior = Likelihood × Prior / Evidence
```

Where:

- **θ (theta)**: Parameter of interest (e.g., mean difference, correlation)
- **D**: Observed data
- **P(θ|D)**: Posterior distribution (belief about θ after seeing the data)
- **P(D|θ)**: Likelihood (probability of the data given θ)
- **P(θ)**: Prior distribution (belief about θ before seeing the data)
- **P(D)**: Marginal likelihood/evidence (normalizing constant)

---

## Prior Distributions

### Types of Priors

#### 1. Informative Priors

**When to use**: When you have substantial prior knowledge from:

- Previous studies
- Expert knowledge
- Theory
- Pilot data

**Example**: A meta-analysis shows effect size d ≈ 0.40, SD = 0.15

- Prior: Normal(0.40, 0.15)

**Advantages**:

- Incorporates existing knowledge
- More efficient (smaller samples needed)
- Can stabilize estimates with small datasets

**Disadvantages**:

- Subjective (though subjectivity can be a strength)
- Must be justified and transparent
- May be controversial if a strong prior conflicts with the data

---

#### 2. Weakly Informative Priors

**When to use**: Default choice for most applications

**Characteristics**:

- Regularizes estimates (prevents extreme values)
- Has minimal influence on the posterior with moderate data
- Prevents computational issues

**Example priors**:

- Effect size: Normal(0, 1) or Cauchy(0, 0.707)
- Variance: Half-Cauchy(0, 1)
- Correlation: Uniform(-1, 1) or Beta(2, 2)

**Advantages**:

- Balances objectivity and regularization
- Computationally stable
- Broadly acceptable

---

#### 3. Non-Informative (Flat/Uniform) Priors

**When to use**: When attempting to be "objective"

**Example**: Uniform(-∞, ∞) over the whole real line (an improper prior)

**⚠️ Caution**:

- Can lead to improper posteriors
- May produce nonsensical results
- Not truly "non-informative" (still makes assumptions)
- Often not recommended in modern Bayesian practice

**Better alternative**: Use weakly informative priors

---

### Prior Sensitivity Analysis

**Always conduct**: Test how results change with different priors

**Process**:

1. Fit the model with the default/planned prior
2. Fit the model with a more diffuse prior
3. Fit the model with a more concentrated prior
4. Compare the posterior distributions

**Reporting**:

- If results are similar: the evidence is robust
- If results differ substantially: the data are not strong enough to overwhelm the prior

**Python example**:

```python
import pymc as pm

# Candidate priors for the effect, as (label, mu, sigma)
priors = [
    ('weakly_informative', 0.0, 1.0),
    ('diffuse', 0.0, 10.0),
    ('informative', 0.5, 0.3),
]

results = {}
for name, mu, sigma in priors:
    with pm.Model():
        effect = pm.Normal('effect', mu=mu, sigma=sigma)
        # ... rest of model
        trace = pm.sample()
    results[name] = trace
```

---

## Bayesian Hypothesis Testing

### Bayes Factor (BF)

**What it is**: Ratio of the evidence for two competing hypotheses

**Formula**:

```
BF₁₀ = P(D|H₁) / P(D|H₀)
```

**Interpretation**:

| BF₁₀ | Evidence |
|------|----------|
| >100 | Decisive for H₁ |
| 30-100 | Very strong for H₁ |
| 10-30 | Strong for H₁ |
| 3-10 | Moderate for H₁ |
| 1-3 | Anecdotal for H₁ |
| 1 | No evidence |
| 1/3-1 | Anecdotal for H₀ |
| 1/10-1/3 | Moderate for H₀ |
| 1/30-1/10 | Strong for H₀ |
| 1/100-1/30 | Very strong for H₀ |
| <1/100 | Decisive for H₀ |

**Advantages over p-values**:

1. Can provide evidence for the null hypothesis
2. Not dependent on sampling intentions (no "peeking" problem)
3. Directly quantifies evidence
4. Can be updated with more data

**Python calculation**:

```python
# Note: Bayes Factor support in Python is limited; for production use,
# prefer the R package BayesFactor or JASP software. The function below
# computes the Jeffreys-Zellner-Siow (JZS) Bayes factor for an
# independent-samples t-test by numerical integration (Rouder et al., 2009).
import numpy as np
from scipy import integrate

def bf_from_t(t, n1, n2, r_scale=0.707):
    """
    BF₁₀ from a t-statistic, using a Cauchy(0, r_scale) prior on the
    standardized effect size (default 0.707 for a "medium" effect).
    """
    df = n1 + n2 - 2               # degrees of freedom
    n_eff = n1 * n2 / (n1 + n2)    # effective sample size

    # Marginal likelihood under H0 (up to a constant shared with H1)
    m0 = (1 + t**2 / df) ** (-(df + 1) / 2)

    # Under H1, integrate over g, the mixing variance that generates
    # the Cauchy prior on the effect size
    def integrand(g):
        a = 1 + n_eff * g * r_scale**2
        return (a ** -0.5
                * (1 + t**2 / (a * df)) ** (-(df + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))

    m1, _ = integrate.quad(integrand, 0, np.inf)
    return m1 / m0  # BF₁₀: evidence for H₁ relative to H₀
```

---

### Region of Practical Equivalence (ROPE)

**Purpose**: Define a range of negligible effect sizes

**Process**:

1. Define the ROPE (e.g., d ∈ [-0.1, 0.1] for negligible effects)
2. Calculate the % of the posterior inside the ROPE
3. Make a decision:
   - >95% in ROPE: Accept practical equivalence
   - >95% outside ROPE: Reject equivalence
   - Otherwise: Inconclusive

**Advantage**: Directly tests for practical significance

**Python example**:

```python
import numpy as np

# posterior_samples: 1-D array of posterior draws for the effect size
rope_lower, rope_upper = -0.1, 0.1

# Proportion of the posterior inside the ROPE
in_rope = np.mean((posterior_samples > rope_lower) &
                  (posterior_samples < rope_upper))
print(f"{in_rope*100:.1f}% of posterior in ROPE")
```

---

## Bayesian Estimation

### Credible Intervals

**What it is**: An interval containing the parameter with X% probability

**95% credible interval interpretation**:

> "There is a 95% probability that the true parameter lies in this interval."

**This is what people often THINK confidence intervals mean** (but they don't, in the frequentist framework).

**Types**:

#### Equal-Tailed Interval (ETI)

- 2.5th to 97.5th percentile
- Simple to calculate
- May not include the mode for skewed distributions

#### Highest Density Interval (HDI)

- Narrowest interval containing 95% of the distribution
- Always includes the mode
- Better for skewed distributions

**Python calculation**:

```python
import numpy as np
import arviz as az

# Equal-tailed interval
eti = np.percentile(posterior_samples, [2.5, 97.5])

# Highest density interval
hdi = az.hdi(posterior_samples, hdi_prob=0.95)
```

---

### Posterior Distributions

**Interpreting posterior distributions**:

1. **Central tendency**:
   - Mean: Average posterior value
   - Median: 50th percentile
   - Mode: Most probable value (MAP, maximum a posteriori)
2. **Uncertainty**:
   - SD: Spread of the posterior
   - Credible intervals: Quantify uncertainty
3. **Shape**:
   - Symmetric: Similar to normal
   - Skewed: Asymmetric uncertainty
   - Multimodal: Multiple plausible values

**Visualization**:

```python
import matplotlib.pyplot as plt
import arviz as az

# Posterior plot with HDI
az.plot_posterior(trace, hdi_prob=0.95)

# Trace plot (check convergence)
az.plot_trace(trace)

# Forest plot (multiple parameters)
az.plot_forest(trace)
```

---

## Common Bayesian Analyses

### Bayesian T-Test

**Purpose**: Compare two groups (Bayesian alternative to the t-test)

**Outputs**:

1. Posterior distribution of the mean difference
2. 95% credible interval
3. Bayes Factor (BF₁₀)
4. Probability of a directional hypothesis (e.g., P(μ₁ > μ₂))

**Python implementation**:

```python
import numpy as np
import pymc as pm
import arviz as az

# Bayesian independent-samples t-test
with pm.Model() as model:
    # Priors for group means
    mu1 = pm.Normal('mu1', mu=0, sigma=10)
    mu2 = pm.Normal('mu2', mu=0, sigma=10)

    # Prior for the pooled standard deviation
    sigma = pm.HalfNormal('sigma', sigma=10)

    # Likelihood
    y1 = pm.Normal('y1', mu=mu1, sigma=sigma, observed=group1)
    y2 = pm.Normal('y2', mu=mu2, sigma=sigma, observed=group2)

    # Derived quantity: mean difference
    diff = pm.Deterministic('diff', mu1 - mu2)

    # Sample the posterior
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

# Analyze results
print(az.summary(trace, var_names=['mu1', 'mu2', 'diff']))

# Probability that group 1's mean exceeds group 2's
prob_greater = np.mean(trace.posterior['diff'].values > 0)
print(f"P(μ₁ > μ₂) = {prob_greater:.3f}")

# Plot the posterior
az.plot_posterior(trace, var_names=['diff'], ref_val=0)
```

---

### Bayesian ANOVA

**Purpose**: Compare three or more groups

**Model**:

```python
import pymc as pm

with pm.Model() as anova_model:
    # Hyperpriors
    mu_global = pm.Normal('mu_global', mu=0, sigma=10)
    sigma_between = pm.HalfNormal('sigma_between', sigma=5)
    sigma_within = pm.HalfNormal('sigma_within', sigma=5)

    # Group means (hierarchical)
    group_means = pm.Normal('group_means', mu=mu_global,
                            sigma=sigma_between, shape=n_groups)

    # Likelihood
    y = pm.Normal('y', mu=group_means[group_idx],
                  sigma=sigma_within, observed=data)

    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

# Posterior contrast between groups 1 and 2
contrast_1_2 = (trace.posterior['group_means'][:, :, 0]
                - trace.posterior['group_means'][:, :, 1])
```

---

### Bayesian Correlation

**Purpose**: Estimate the correlation between two variables

**Advantage**: Provides a full distribution of correlation values

**Python implementation**:

```python
import numpy as np
import pymc as pm
import arviz as az

with pm.Model() as corr_model:
    # Prior on the correlation
    rho = pm.Uniform('rho', lower=-1, upper=1)

    # Convert to a covariance matrix (assumes standardized data)
    cov_matrix = pm.math.stack([[1, rho], [rho, 1]])

    # Likelihood (bivariate normal)
    obs = pm.MvNormal('obs', mu=[0, 0], cov=cov_matrix,
                      observed=np.column_stack([x, y]))

    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

# Summarize the correlation
print(az.summary(trace, var_names=['rho']))

# Probability that the correlation is positive
prob_positive = np.mean(trace.posterior['rho'].values > 0)
```

---

### Bayesian Linear Regression

**Purpose**: Model the relationship between predictors and an outcome

**Advantages**:

- Uncertainty in all parameters
- Natural regularization (via priors)
- Can incorporate prior knowledge
- Credible intervals for predictions

**Python implementation**:

```python
import pymc as pm
import arviz as az

with pm.Model() as regression_model:
    # Register the design matrix as data so it can be swapped
    # out later for out-of-sample predictions
    X_data = pm.Data('X', X)

    # Priors for coefficients
    alpha = pm.Normal('alpha', mu=0, sigma=10)  # Intercept
    beta = pm.Normal('beta', mu=0, sigma=10, shape=n_predictors)
    sigma = pm.HalfNormal('sigma', sigma=10)

    # Expected value
    mu = alpha + pm.math.dot(X_data, beta)

    # Likelihood (shape tied to mu so predictions can differ in length)
    y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma,
                      observed=y, shape=mu.shape)

    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

# Posterior predictive checks
with regression_model:
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)
az.plot_ppc(trace)

# Predictions with uncertainty
with regression_model:
    pm.set_data({'X': X_new})
    posterior_pred = pm.sample_posterior_predictive(trace)
```

---

## Hierarchical (Multilevel) Models

**When to use**:

- Nested/clustered data (students within schools)
- Repeated measures
- Meta-analysis
- Varying effects across groups

**Key concept**: Partial pooling

- Complete pooling: Ignore groups (biased)
- No pooling: Analyze groups separately (high variance)
- Partial pooling: Borrow strength across groups (the Bayesian hierarchical approach)

**Example: Varying intercepts**:

```python
import pymc as pm

with pm.Model() as hierarchical_model:
    # Hyperpriors
    mu_global = pm.Normal('mu_global', mu=0, sigma=10)
    sigma_between = pm.HalfNormal('sigma_between', sigma=5)
    sigma_within = pm.HalfNormal('sigma_within', sigma=5)

    # Group-level intercepts
    alpha = pm.Normal('alpha', mu=mu_global,
                      sigma=sigma_between, shape=n_groups)

    # Likelihood
    y_obs = pm.Normal('y_obs', mu=alpha[group_idx],
                      sigma=sigma_within, observed=y)

    trace = pm.sample()
```

---

## Model Comparison

### Methods

#### 1. Bayes Factor

- Directly compares model evidence
- Sensitive to prior specification
- Can be computationally intensive

#### 2. Information Criteria

**WAIC (Widely Applicable Information Criterion)**:

- Bayesian analog of AIC
- Lower is better on the deviance scale; note that ArviZ reports the ELPD scale, where higher is better
- Accounts for the effective number of parameters

**LOO (Leave-One-Out Cross-Validation)**:

- Estimates out-of-sample prediction error
- Higher ELPD is better (as reported by ArviZ)
- Generally more robust than WAIC

**Python calculation**:

```python
import arviz as az

# Requires pointwise log-likelihood in the trace; with PyMC, sample
# using idata_kwargs={"log_likelihood": True}
waic = az.waic(trace)
loo = az.loo(trace)

print(f"WAIC (ELPD): {waic.elpd_waic:.2f}")
print(f"LOO (ELPD): {loo.elpd_loo:.2f}")

# Compare multiple models (ranked by ELPD)
comparison = az.compare({
    'model1': trace1,
    'model2': trace2,
    'model3': trace3
})
print(comparison)
```

---

## Checking Bayesian Models

### 1. Convergence Diagnostics

**R-hat (Gelman-Rubin statistic)**:

- Compares within-chain and between-chain variance
- Values close to 1.0 indicate convergence
- R-hat < 1.01: Good
- R-hat > 1.05: Poor convergence

**Effective Sample Size (ESS)**:

- Number of effectively independent samples
- Higher is better
- ESS > 400 is a common rule of thumb

**Trace plots**:

- Should look like a "fuzzy caterpillar"
- No trends, no stuck chains

**Python checking**:

```python
import arviz as az

# Automatic summary with diagnostics (r_hat, ess_bulk, ess_tail)
print(az.summary(trace, var_names=['parameter']))

# Visual diagnostics
az.plot_trace(trace)
az.plot_rank(trace)  # Rank plots
```

---

### 2. Posterior Predictive Checks

**Purpose**: Does the model generate data similar to the observed data?

**Process**:

1. Generate predictions from the posterior
2. Compare them to the actual data
3. Look for systematic discrepancies

**Python implementation**:

```python
import numpy as np
import pymc as pm
import arviz as az

with model:
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)

# Visual check
az.plot_ppc(trace, num_pp_samples=100)

# Quantitative check: Bayesian (posterior predictive) p-value
obs_mean = np.mean(observed_data)
y_rep = trace.posterior_predictive['y_obs'].values  # (chain, draw, obs)
pred_means = y_rep.mean(axis=-1).ravel()
p_value = np.mean(pred_means >= obs_mean)
```

---

## Reporting Bayesian Results

### Example T-Test Report

> "A Bayesian independent-samples t-test was conducted to compare groups A and B. Weakly informative priors were used: Normal(0, 1) for the mean difference and Half-Cauchy(0, 1) for the pooled standard deviation. The posterior distribution of the mean difference had a mean of 5.2 (95% credible interval [2.3, 8.1]), indicating that Group A scored higher than Group B. The Bayes Factor BF₁₀ = 23.5 provided strong evidence for a difference between groups, and there was a 99.7% probability that Group A's mean exceeded Group B's mean."

### Example Regression Report

> "A Bayesian linear regression was fitted with weakly informative priors (Normal(0, 10) for coefficients, Half-Cauchy(0, 5) for the residual SD). The model explained substantial variance (R² = 0.47, 95% credible interval [0.38, 0.55]).
> Study hours (β = 0.52, 95% CrI [0.38, 0.66]) and prior GPA (β = 0.31, 95% CrI [0.17, 0.45]) were credible predictors (95% credible intervals excluded zero). Posterior predictive checks showed good model fit. Convergence diagnostics were satisfactory (all R-hat < 1.01, ESS > 1000)."

---

## Advantages and Limitations

### Advantages

1. **Intuitive interpretation**: Direct probability statements about parameters
2. **Incorporates prior knowledge**: Uses all available information
3. **Flexible**: Handles complex models naturally
4. **Sequential analysis**: Evidence can be monitored as data arrive (optional stopping is less problematic than with p-values)
5. **Quantifies uncertainty**: Full posterior distribution
6. **Small samples**: Works with any sample size

### Limitations

1. **Computational cost**: Requires MCMC sampling (can be slow)
2. **Prior specification**: Requires thought and justification
3. **Complexity**: Steeper learning curve
4. **Software**: Fewer tools than for frequentist methods
5. **Communication**: May need to educate reviewers/readers

---

## Key Python Packages

- **PyMC**: Full Bayesian modeling framework
- **ArviZ**: Visualization and diagnostics
- **Bambi**: High-level interface for regression models
- **PyStan**: Python interface to Stan
- **TensorFlow Probability**: Bayesian inference with TensorFlow

---

## When to Use Bayesian Methods

**Use Bayesian methods when**:

- You have prior information to incorporate
- You want direct probability statements
- The sample size is small
- The model is complex (hierarchical, missing data, etc.)
- You want to update the analysis as data arrive

**Frequentist methods may be sufficient when**:

- It is a standard analysis with a large sample
- There is no prior information
- Computational resources are limited
- Reviewers are unfamiliar with Bayesian methods
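---

## Appendix: A Closed-Form Worked Example

The machinery above does not always require MCMC: for conjugate models the posterior is available in closed form. The sketch below ties together several ideas from this document (posterior updating, an equal-tailed credible interval, a direct probability statement, and a ROPE check) for a Beta-Binomial model using only SciPy. The counts are made-up illustrative numbers, not real data.

```python
from scipy import stats

# Hypothetical data: 34 successes in 50 trials; is the rate above 0.5?
successes, trials = 34, 50

# Beta(2, 2) prior on the rate (weakly informative, centered on 0.5).
# Beta is conjugate to the binomial, so the posterior is also a Beta:
a_post = 2 + successes
b_post = 2 + (trials - successes)
posterior = stats.beta(a_post, b_post)

# Posterior summaries
post_mean = posterior.mean()
eti = posterior.ppf([0.025, 0.975])       # 95% equal-tailed interval

# Direct probability statement: P(rate > 0.5 | data)
p_above_half = 1 - posterior.cdf(0.5)

# ROPE check: posterior mass within 0.5 ± 0.05
in_rope = posterior.cdf(0.55) - posterior.cdf(0.45)

print(f"Posterior mean: {post_mean:.3f}")
print(f"95% ETI: [{eti[0]:.3f}, {eti[1]:.3f}]")
print(f"P(rate > 0.5) = {p_above_half:.3f}")
print(f"Mass in ROPE [0.45, 0.55]: {in_rope:.3f}")
```

Conjugate updates like this are also a quick way to sanity-check an MCMC implementation of the same model: the sampled posterior should match the analytic Beta posterior.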