# Bayesian Statistical Analysis

This document provides guidance on conducting and interpreting Bayesian statistical analyses, which offer an alternative framework to frequentist (classical) statistics.

## Bayesian vs. Frequentist Philosophy

### Fundamental Differences

| Aspect | Frequentist | Bayesian |
|--------|-------------|----------|
| **Probability interpretation** | Long-run frequency of events | Degree of belief/uncertainty |
| **Parameters** | Fixed but unknown | Random variables with distributions |
| **Inference** | Based on sampling distributions | Based on posterior distributions |
| **Primary output** | p-values, confidence intervals | Posterior probabilities, credible intervals |
| **Prior information** | Not formally incorporated | Explicitly incorporated via priors |
| **Hypothesis testing** | Reject/fail to reject null | Probability of hypotheses given data |
| **Sample size** | Often requires minimum | Can work with any sample size |
| **Interpretation** | Indirect (probability of data given H₀) | Direct (probability of hypothesis given data) |

### Key Question Difference

**Frequentist**: "If the null hypothesis is true, what is the probability of observing data this extreme or more extreme?"

**Bayesian**: "Given the observed data, what is the probability that the hypothesis is true?"

The Bayesian question is more intuitive and directly addresses what researchers want to know.

---

## Bayes' Theorem

**Formula**:

```
P(θ|D) = P(D|θ) × P(θ) / P(D)
```

**In words**:

```
Posterior = Likelihood × Prior / Evidence
```

Where:

- **θ (theta)**: Parameter of interest (e.g., mean difference, correlation)
- **D**: Observed data
- **P(θ|D)**: Posterior distribution (belief about θ after seeing data)
- **P(D|θ)**: Likelihood (probability of data given θ)
- **P(θ)**: Prior distribution (belief about θ before seeing data)
- **P(D)**: Marginal likelihood/evidence (normalizing constant)

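A minimal worked example using the conjugate Beta-Binomial model (the prior and data values here are illustrative): with a Beta(2, 2) prior on a success probability and 7 successes in 10 trials, the posterior is available in closed form, so no sampling is needed.

```python
from scipy import stats

# Beta(2, 2) prior on success probability theta; observe 7 successes, 3 failures.
# Conjugacy: posterior = Beta(2 + 7, 2 + 3) = Beta(9, 5)
a_prior, b_prior = 2, 2
successes, failures = 7, 3

posterior = stats.beta(a_prior + successes, b_prior + failures)
print(f"Posterior mean: {posterior.mean():.3f}")                # ~0.643
print(f"95% equal-tailed interval: {posterior.interval(0.95)}")
```
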
---

## Prior Distributions

### Types of Priors

#### 1. Informative Priors

**When to use**: When you have substantial prior knowledge from:
- Previous studies
- Expert knowledge
- Theory
- Pilot data

**Example**: A meta-analysis shows effect size d ≈ 0.40 with SD = 0.15
- Prior: Normal(0.40, 0.15)

**Advantages**:
- Incorporates existing knowledge
- More efficient (smaller samples needed)
- Can stabilize estimates with small data

**Disadvantages**:
- Subjective (though subjectivity made explicit can be a strength)
- Must be justified and transparent
- May be controversial if a strong prior conflicts with the data

---

#### 2. Weakly Informative Priors

**When to use**: Default choice for most applications

**Characteristics**:
- Regularizes estimates (prevents extreme values)
- Has minimal influence on the posterior once the data are moderately informative
- Prevents computational issues

**Example priors**:
- Effect size: Normal(0, 1) or Cauchy(0, 0.707)
- Variance: Half-Cauchy(0, 1)
- Correlation: Uniform(-1, 1) or Beta(2, 2) rescaled to (-1, 1)

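A quick sketch of how these example priors might be declared in PyMC (variable names are illustrative):

```python
import pymc as pm

with pm.Model():
    effect = pm.Normal('effect', mu=0, sigma=1)   # effect size
    sigma = pm.HalfCauchy('sigma', beta=1)        # scale parameter
    rho = pm.Uniform('rho', lower=-1, upper=1)    # correlation
```
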
**Advantages**:
- Balances objectivity and regularization
- Computationally stable
- Broadly acceptable

---

#### 3. Non-Informative (Flat/Uniform) Priors

**When to use**: When attempting to be "objective"

**Example**: A flat Uniform(-∞, ∞) prior over the whole real line (an improper prior)

**⚠️ Caution**:
- Can lead to improper posteriors
- May produce non-sensible results
- Not truly "non-informative" (still makes assumptions; flatness is not preserved under reparameterization)
- Often not recommended in modern Bayesian practice

**Better alternative**: Use weakly informative priors

---

### Prior Sensitivity Analysis

**Always conduct**: Test how results change with different priors

**Process**:
1. Fit the model with the default/planned prior
2. Fit the model with a more diffuse prior
3. Fit the model with a more concentrated prior
4. Compare the posterior distributions

**Reporting**:
- If results are similar: the evidence is robust to the prior
- If results differ substantially: the data are not strong enough to overwhelm the prior

**Python example**:

```python
import pymc as pm

# Candidate priors for the effect: (label, mu, sigma)
priors = [
    ('weakly_informative', 0.0, 1.0),
    ('diffuse', 0.0, 10.0),
    ('informative', 0.5, 0.3),
]

results = {}
for name, mu, sigma in priors:
    with pm.Model():
        effect = pm.Normal('effect', mu=mu, sigma=sigma)
        # ... rest of model (likelihood etc.)
        trace = pm.sample()
    results[name] = trace
```
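
The resulting posteriors can then be compared side by side, for example with an ArviZ forest plot (a sketch, reusing the `results` dictionary above):

```python
import arviz as az

# One interval per prior choice; large shifts indicate prior sensitivity
az.plot_forest(list(results.values()),
               model_names=list(results.keys()),
               var_names=['effect'])
```
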

---

## Bayesian Hypothesis Testing

### Bayes Factor (BF)

**What it is**: The ratio of the evidence (marginal likelihood) for two competing hypotheses

**Formula**:

```
BF₁₀ = P(D|H₁) / P(D|H₀)
```

**Interpretation**:

| BF₁₀ | Evidence |
|------|----------|
| >100 | Decisive for H₁ |
| 30-100 | Very strong for H₁ |
| 10-30 | Strong for H₁ |
| 3-10 | Moderate for H₁ |
| 1-3 | Anecdotal for H₁ |
| 1 | No evidence |
| 1/3-1 | Anecdotal for H₀ |
| 1/10-1/3 | Moderate for H₀ |
| 1/30-1/10 | Strong for H₀ |
| 1/100-1/30 | Very strong for H₀ |
| <1/100 | Decisive for H₀ |

**Advantages over p-values**:
1. Can provide evidence for the null hypothesis
2. Not dependent on sampling intentions (no "peeking" problem)
3. Directly quantifies evidence
4. Can be updated with more data

**Python calculation**:

```python
import pingouin as pg
from scipy import stats

# pingouin computes the JZS (Jeffreys-Zellner-Siow) Bayes Factor from a
# t-statistic, using a Cauchy prior on effect size
# (scale r = 0.707 corresponds to a "medium" effect)
t_stat, _ = stats.ttest_ind(group1, group2)
bf10 = pg.bayesfactor_ttest(t_stat, nx=len(group1), ny=len(group2), r=0.707)
print(f"BF10 = {bf10:.3f}")

# For a wider range of Bayes Factor analyses, see the R package
# BayesFactor or the JASP software
```
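
When a model has already been fitted with MCMC, an alternative is the Savage-Dickey density ratio: for a point null H₀: θ = 0 nested in H₁, BF₀₁ equals the posterior density at zero divided by the prior density at zero. A minimal sketch, assuming a Normal(0, 1) prior and an array `posterior_samples` of MCMC draws:

```python
import numpy as np
from scipy import stats

# Savage-Dickey: BF01 = posterior density at 0 / prior density at 0
prior_at_zero = stats.norm(0, 1).pdf(0.0)
posterior_at_zero = stats.gaussian_kde(posterior_samples).evaluate(0.0)[0]

bf01 = posterior_at_zero / prior_at_zero
bf10 = 1.0 / bf01
print(f"BF10 ~ {bf10:.2f}")
```
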

---

### Region of Practical Equivalence (ROPE)

**Purpose**: Define a range of effect sizes that are negligible in practice

**Process**:
1. Define the ROPE (e.g., d ∈ [-0.1, 0.1] for negligible effects)
2. Calculate the % of the posterior inside the ROPE
3. Make a decision:
   - >95% in ROPE: Accept practical equivalence
   - >95% outside ROPE: Reject equivalence
   - Otherwise: Inconclusive

**Advantage**: Directly tests for practical significance

**Python example**:

```python
import numpy as np

# posterior_samples: 1-D array of MCMC draws for the effect

# Define ROPE
rope_lower, rope_upper = -0.1, 0.1

# Calculate % of posterior in ROPE
in_rope = np.mean((posterior_samples > rope_lower) &
                  (posterior_samples < rope_upper))

print(f"{in_rope*100:.1f}% of posterior in ROPE")
```
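
ArviZ can also display the ROPE directly on a posterior plot (a sketch, reusing the bounds above):

```python
import arviz as az

# Shades the ROPE region and reports the share of the posterior inside it
az.plot_posterior(posterior_samples, rope=(rope_lower, rope_upper))
```
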

---

## Bayesian Estimation

### Credible Intervals

**What it is**: An interval containing the parameter with X% probability

**95% Credible Interval interpretation**:

> "There is a 95% probability that the true parameter lies in this interval."

**This is what people THINK confidence intervals mean** (but they don't mean this in the frequentist framework)

**Types**:

#### Equal-Tailed Interval (ETI)
- 2.5th to 97.5th percentile
- Simple to calculate
- May not include the mode for skewed distributions

#### Highest Density Interval (HDI)
- Narrowest interval containing 95% of the distribution
- Always includes the mode
- Better for skewed distributions

**Python calculation**:

```python
import numpy as np
import arviz as az

# posterior_samples: 1-D array of MCMC draws

# Equal-tailed interval
eti = np.percentile(posterior_samples, [2.5, 97.5])

# HDI
hdi = az.hdi(posterior_samples, hdi_prob=0.95)
```
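
The two intervals can differ noticeably for skewed posteriors. A small demonstration on synthetic draws (illustrative data only):

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(42)
skewed = rng.gamma(shape=2.0, scale=1.0, size=10_000)  # right-skewed "posterior"

eti = np.percentile(skewed, [2.5, 97.5])
hdi = az.hdi(skewed, hdi_prob=0.95)
print(f"ETI: [{eti[0]:.2f}, {eti[1]:.2f}]")  # pushed right by the long tail
print(f"HDI: [{hdi[0]:.2f}, {hdi[1]:.2f}]")  # shifted toward the mode
```
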

---

### Posterior Distributions

**Interpreting posterior distributions**:

1. **Central tendency**:
   - Mean: Average posterior value
   - Median: 50th percentile
   - Mode: Most probable value (MAP - Maximum A Posteriori)

2. **Uncertainty**:
   - SD: Spread of the posterior
   - Credible intervals: Quantify uncertainty

3. **Shape**:
   - Symmetric: Similar to normal
   - Skewed: Asymmetric uncertainty
   - Multimodal: Multiple plausible values

**Visualization**:

```python
import matplotlib.pyplot as plt
import arviz as az

# Posterior plot with HDI
az.plot_posterior(trace, hdi_prob=0.95)

# Trace plot (check convergence)
az.plot_trace(trace)

# Forest plot (multiple parameters)
az.plot_forest(trace)
```

---

## Common Bayesian Analyses

### Bayesian T-Test

**Purpose**: Compare two groups (Bayesian alternative to the t-test)

**Outputs**:
1. Posterior distribution of the mean difference
2. 95% credible interval
3. Bayes Factor (BF₁₀)
4. Probability of a directional hypothesis (e.g., P(μ₁ > μ₂))

**Python implementation**:

```python
import numpy as np
import pymc as pm
import arviz as az

# Bayesian independent samples t-test (group1, group2: 1-D data arrays)
with pm.Model() as model:
    # Priors for group means
    mu1 = pm.Normal('mu1', mu=0, sigma=10)
    mu2 = pm.Normal('mu2', mu=0, sigma=10)

    # Prior for pooled standard deviation
    sigma = pm.HalfNormal('sigma', sigma=10)

    # Likelihood
    y1 = pm.Normal('y1', mu=mu1, sigma=sigma, observed=group1)
    y2 = pm.Normal('y2', mu=mu2, sigma=sigma, observed=group2)

    # Derived quantity: mean difference
    diff = pm.Deterministic('diff', mu1 - mu2)

    # Sample posterior
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

# Analyze results
print(az.summary(trace, var_names=['mu1', 'mu2', 'diff']))

# Probability that group1 > group2
prob_greater = np.mean(trace.posterior['diff'].values > 0)
print(f"P(μ₁ > μ₂) = {prob_greater:.3f}")

# Plot posterior
az.plot_posterior(trace, var_names=['diff'], ref_val=0)
```

---

### Bayesian ANOVA

**Purpose**: Compare three or more groups

**Model**:

```python
import pymc as pm

# data: 1-D outcome array; group_idx: integer group label per observation;
# n_groups: number of groups
with pm.Model() as anova_model:
    # Hyperpriors
    mu_global = pm.Normal('mu_global', mu=0, sigma=10)
    sigma_between = pm.HalfNormal('sigma_between', sigma=5)
    sigma_within = pm.HalfNormal('sigma_within', sigma=5)

    # Group means (hierarchical)
    group_means = pm.Normal('group_means',
                            mu=mu_global,
                            sigma=sigma_between,
                            shape=n_groups)

    # Likelihood
    y = pm.Normal('y',
                  mu=group_means[group_idx],
                  sigma=sigma_within,
                  observed=data)

    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

# Posterior contrast between the first two groups
contrast_1_2 = (trace.posterior['group_means'][:, :, 0]
                - trace.posterior['group_means'][:, :, 1])
```
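
As with the t-test, the contrast's posterior supports direct probability statements, for example:

```python
import numpy as np

# Posterior probability that group 1's mean exceeds group 2's
prob_1_gt_2 = np.mean(contrast_1_2.values > 0)
print(f"P(μ₁ > μ₂) = {prob_1_gt_2:.3f}")
```
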

---

### Bayesian Correlation

**Purpose**: Estimate the correlation between two variables

**Advantage**: Provides a full posterior distribution of the correlation, not just a point estimate

**Python implementation**:

```python
import numpy as np
import pymc as pm
import arviz as az

# x and y should be standardized (mean 0, SD 1) so that the
# unit-variance bivariate normal below is appropriate
with pm.Model() as corr_model:
    # Prior on correlation
    rho = pm.Uniform('rho', lower=-1, upper=1)

    # Convert to covariance matrix
    cov_matrix = pm.math.stack([[1, rho],
                                [rho, 1]])

    # Likelihood (bivariate normal)
    obs = pm.MvNormal('obs',
                      mu=[0, 0],
                      cov=cov_matrix,
                      observed=np.column_stack([x, y]))

    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

# Summarize correlation
print(az.summary(trace, var_names=['rho']))

# Probability that correlation is positive
prob_positive = np.mean(trace.posterior['rho'].values > 0)
```
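
If only a quick Bayes Factor for a correlation is needed, pingouin can compute one from the sample r and n (a sketch, assuming the same arrays `x` and `y`):

```python
import numpy as np
import pingouin as pg

r = np.corrcoef(x, y)[0, 1]
bf10 = pg.bayesfactor_pearson(r, n=len(x))
print(f"BF10 = {bf10:.3f}")
```
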

---

### Bayesian Linear Regression

**Purpose**: Model the relationship between predictors and an outcome

**Advantages**:
- Uncertainty in all parameters
- Natural regularization (via priors)
- Can incorporate prior knowledge
- Credible intervals for predictions

**Python implementation**:

```python
import pymc as pm
import arviz as az

with pm.Model() as regression_model:
    # Data container so predictors can be swapped out for prediction later
    # (on older PyMC versions, pass mutable=True)
    X_data = pm.Data('X', X)

    # Priors for coefficients
    alpha = pm.Normal('alpha', mu=0, sigma=10)  # Intercept
    beta = pm.Normal('beta', mu=0, sigma=10, shape=n_predictors)
    sigma = pm.HalfNormal('sigma', sigma=10)

    # Expected value
    mu = alpha + pm.math.dot(X_data, beta)

    # Likelihood
    y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)

    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

# Posterior predictive checks (stored alongside the trace for plotting)
with regression_model:
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)

az.plot_ppc(trace)

# Predictions with uncertainty (X_new must have the same number of rows
# as X unless the observed shape is also made dynamic)
with regression_model:
    pm.set_data({'X': X_new})
    posterior_pred = pm.sample_posterior_predictive(trace, predictions=True)
```

---

## Hierarchical (Multilevel) Models

**When to use**:
- Nested/clustered data (students within schools)
- Repeated measures
- Meta-analysis
- Varying effects across groups

**Key concept**: Partial pooling
- Complete pooling: Ignore groups (biased)
- No pooling: Analyze groups separately (high variance)
- Partial pooling: Borrow strength across groups (the Bayesian hierarchical approach)

**Example: Varying intercepts**:

```python
import pymc as pm

# y: outcome array; group_idx: integer group label per observation;
# n_groups: number of groups
with pm.Model() as hierarchical_model:
    # Hyperpriors
    mu_global = pm.Normal('mu_global', mu=0, sigma=10)
    sigma_between = pm.HalfNormal('sigma_between', sigma=5)
    sigma_within = pm.HalfNormal('sigma_within', sigma=5)

    # Group-level intercepts
    alpha = pm.Normal('alpha',
                      mu=mu_global,
                      sigma=sigma_between,
                      shape=n_groups)

    # Likelihood
    y_obs = pm.Normal('y_obs',
                      mu=alpha[group_idx],
                      sigma=sigma_within,
                      observed=y)

    trace = pm.sample()
```
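
One practical note: when groups are small or `sigma_between` is poorly identified, the centered parameterization above can sample badly (divergences). A common remedy is the non-centered parameterization, sketched here under the same data assumptions:

```python
import pymc as pm

with pm.Model() as noncentered_model:
    mu_global = pm.Normal('mu_global', mu=0, sigma=10)
    sigma_between = pm.HalfNormal('sigma_between', sigma=5)
    sigma_within = pm.HalfNormal('sigma_within', sigma=5)

    # Sample standardized offsets, then scale and shift:
    # alpha = mu_global + sigma_between * z
    z = pm.Normal('z', mu=0, sigma=1, shape=n_groups)
    alpha = pm.Deterministic('alpha', mu_global + sigma_between * z)

    y_obs = pm.Normal('y_obs', mu=alpha[group_idx],
                      sigma=sigma_within, observed=y)

    trace = pm.sample()
```
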

---

## Model Comparison

### Methods

#### 1. Bayes Factor
- Directly compares model evidence
- Sensitive to prior specification
- Can be computationally intensive

#### 2. Information Criteria

**WAIC (Widely Applicable Information Criterion)**:
- Bayesian analog of AIC
- On the deviance scale lower is better; ArviZ reports the elpd scale, where higher is better
- Accounts for the effective number of parameters

**LOO (Leave-One-Out Cross-Validation)**:
- Estimates out-of-sample prediction error
- Reported on the same elpd scale as WAIC (higher is better)
- Generally more robust than WAIC

**Python calculation**:

```python
import arviz as az

# Note: the traces must contain pointwise log-likelihoods, e.g. sampled
# with pm.sample(..., idata_kwargs={'log_likelihood': True})

# Calculate WAIC and LOO (elpd scale: higher is better)
waic = az.waic(trace)
loo = az.loo(trace)

print(f"WAIC (elpd): {waic.elpd_waic:.2f}")
print(f"LOO (elpd): {loo.elpd_loo:.2f}")

# Compare multiple models
comparison = az.compare({
    'model1': trace1,
    'model2': trace2,
    'model3': trace3
})
print(comparison)
```
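
The comparison table can also be visualized (a one-liner, using the `comparison` object above):

```python
# Dot-and-error-bar plot of elpd differences between models
az.plot_compare(comparison)
```
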

---

## Checking Bayesian Models

### 1. Convergence Diagnostics

**R-hat (Gelman-Rubin statistic)**:
- Compares within-chain and between-chain variance
- Values close to 1.0 indicate convergence
- R-hat < 1.01: Good
- R-hat > 1.05: Poor convergence

**Effective Sample Size (ESS)**:
- Estimated number of independent samples in the chains
- Higher is better
- Bulk ESS > 400 (totalled across chains) is a common rule of thumb

**Trace plots**:
- Should look like a "fuzzy caterpillar"
- No trends, no stuck chains

**Python checking**:

```python
import arviz as az

# Automatic summary with diagnostics (includes r_hat and ESS columns)
print(az.summary(trace, var_names=['parameter']))

# Visual diagnostics
az.plot_trace(trace)
az.plot_rank(trace)  # Rank plots
```
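
R-hat and ESS can also be computed directly (same `trace` as above):

```python
# Per-parameter convergence diagnostics
print(az.rhat(trace))
print(az.ess(trace))
```
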

---

### 2. Posterior Predictive Checks

**Purpose**: Does the model generate data similar to the observed data?

**Process**:
1. Generate predictions from the posterior
2. Compare them to the actual data
3. Look for systematic discrepancies

**Python implementation**:

```python
import numpy as np
import pymc as pm
import arviz as az

with model:
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)

# Visual check
az.plot_ppc(trace, num_pp_samples=100)

# Quantitative check: compare the observed mean to the predictive draws
obs_mean = np.mean(observed_data)
pp = trace.posterior_predictive['y_obs'].values          # (chain, draw, n_obs)
pred_means = pp.reshape(-1, pp.shape[-1]).mean(axis=1)   # one mean per draw
p_value = np.mean(pred_means >= obs_mean)  # posterior predictive p-value
```

---

## Reporting Bayesian Results

### Example T-Test Report

> "A Bayesian independent samples t-test was conducted to compare groups A and B. Weakly informative priors were used: Normal(0, 1) for the mean difference and Half-Cauchy(0, 1) for the pooled standard deviation. The posterior distribution of the mean difference had a mean of 5.2 (95% credible interval [2.3, 8.1]), indicating that Group A scored higher than Group B. The Bayes Factor BF₁₀ = 23.5 provided strong evidence for a difference between groups, and there was a 99.7% probability that Group A's mean exceeded Group B's mean."

### Example Regression Report

> "A Bayesian linear regression was fitted with weakly informative priors (Normal(0, 10) for coefficients, Half-Cauchy(0, 5) for the residual SD). The model explained substantial variance (R² = 0.47, 95% credible interval [0.38, 0.55]). Study hours (β = 0.52, 95% credible interval [0.38, 0.66]) and prior GPA (β = 0.31, 95% credible interval [0.17, 0.45]) were credible predictors (their 95% credible intervals excluded zero). Posterior predictive checks showed good model fit. Convergence diagnostics were satisfactory (all R-hat < 1.01, ESS > 1000)."

---

## Advantages and Limitations

### Advantages

1. **Intuitive interpretation**: Direct probability statements about parameters
2. **Incorporates prior knowledge**: Uses all available information
3. **Flexible**: Handles complex models naturally
4. **Sequential analysis**: Evidence can be monitored as data arrive, without the frequentist "peeking" penalty
5. **Quantifies uncertainty**: Full posterior distribution
6. **Small samples**: Works with any sample size (though the prior matters more when data are scarce)

### Limitations

1. **Computational**: Requires MCMC sampling (can be slow)
2. **Prior specification**: Requires thought and justification
3. **Complexity**: Steeper learning curve
4. **Software**: Fewer tools than for frequentist methods
5. **Communication**: May need to educate reviewers/readers

---

## Key Python Packages

- **PyMC**: Full Bayesian modeling framework
- **ArviZ**: Visualization and diagnostics
- **Bambi**: High-level interface for regression models
- **PyStan**: Python interface to Stan
- **TensorFlow Probability**: Bayesian inference with TensorFlow

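For example, Bambi reduces a regression like the one above to a formula interface (a sketch; `df` is assumed to be a pandas DataFrame with the named columns):

```python
import bambi as bmb

# Formula-based Bayesian regression with sensible default priors
model = bmb.Model("score ~ study_hours + prior_gpa", data=df)
results = model.fit(draws=2000, tune=1000)
```
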
---

## When to Use Bayesian Methods

**Use Bayesian when**:
- You have prior information to incorporate
- You want direct probability statements
- The sample size is small
- The model is complex (hierarchical, missing data, etc.)
- You want to update the analysis as data arrive

**Frequentist may be sufficient when**:
- It is a standard analysis with a large sample
- There is no prior information
- Computational resources are limited
- Reviewers are unfamiliar with Bayesian methods