Files
gh-k-dense-ai-claude-scient…/skills/statistical-analysis/references/bayesian_statistics.md
2025-11-30 08:30:10 +08:00

662 lines
17 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Bayesian Statistical Analysis
This document provides guidance on conducting and interpreting Bayesian statistical analyses, which offer an alternative framework to frequentist (classical) statistics.
## Bayesian vs. Frequentist Philosophy
### Fundamental Differences
| Aspect | Frequentist | Bayesian |
|--------|-------------|----------|
| **Probability interpretation** | Long-run frequency of events | Degree of belief/uncertainty |
| **Parameters** | Fixed but unknown | Random variables with distributions |
| **Inference** | Based on sampling distributions | Based on posterior distributions |
| **Primary output** | p-values, confidence intervals | Posterior probabilities, credible intervals |
| **Prior information** | Not formally incorporated | Explicitly incorporated via priors |
| **Hypothesis testing** | Reject/fail to reject null | Probability of hypotheses given data |
| **Sample size** | Often requires minimum | Can work with any sample size |
| **Interpretation** | Indirect (probability of data given H₀) | Direct (probability of hypothesis given data) |
### Key Question Difference
**Frequentist**: "If the null hypothesis is true, what is the probability of observing data this extreme or more extreme?"
**Bayesian**: "Given the observed data, what is the probability that the hypothesis is true?"
The Bayesian question is more intuitive and directly addresses what researchers want to know.
---
## Bayes' Theorem
**Formula**:
```
P(θ|D) = P(D|θ) × P(θ) / P(D)
```
**In words**:
```
Posterior = Likelihood × Prior / Evidence
```
Where:
- **θ (theta)**: Parameter of interest (e.g., mean difference, correlation)
- **D**: Observed data
- **P(θ|D)**: Posterior distribution (belief about θ after seeing data)
- **P(D|θ)**: Likelihood (probability of data given θ)
- **P(θ)**: Prior distribution (belief about θ before seeing data)
- **P(D)**: Marginal likelihood/evidence (normalizing constant)
---
## Prior Distributions
### Types of Priors
#### 1. Informative Priors
**When to use**: When you have substantial prior knowledge from:
- Previous studies
- Expert knowledge
- Theory
- Pilot data
**Example**: Meta-analysis shows effect size d ≈ 0.40, SD = 0.15
- Prior: Normal(0.40, 0.15)
**Advantages**:
- Incorporates existing knowledge
- More efficient (smaller samples needed)
- Can stabilize estimates with small data
**Disadvantages**:
- Subjective (but subjectivity can be strength)
- Must be justified and transparent
- May be controversial if strong prior conflicts with data
---
#### 2. Weakly Informative Priors
**When to use**: Default choice for most applications
**Characteristics**:
- Regularizes estimates (prevents extreme values)
- Has minimal influence on posterior with moderate data
- Prevents computational issues
**Example priors**:
- Effect size: Normal(0, 1) or Cauchy(0, 0.707)
- Variance: Half-Cauchy(0, 1)
- Correlation: Uniform(-1, 1) or Beta(2, 2)
**Advantages**:
- Balances objectivity and regularization
- Computationally stable
- Broadly acceptable
---
#### 3. Non-Informative (Flat/Uniform) Priors
**When to use**: When attempting to be "objective"
**Example**: Uniform(-∞, ∞) for any value
**⚠️ Caution**:
- Can lead to improper posteriors
- May produce non-sensible results
- Not truly "non-informative" (still makes assumptions)
- Often not recommended in modern Bayesian practice
**Better alternative**: Use weakly informative priors
---
### Prior Sensitivity Analysis
**Always conduct**: Test how results change with different priors
**Process**:
1. Fit model with default/planned prior
2. Fit model with more diffuse prior
3. Fit model with more concentrated prior
4. Compare posterior distributions
**Reporting**:
- If results are similar: Evidence is robust
- If results differ substantially: Data are not strong enough to overwhelm prior
**Python example**:
```python
import pymc as pm
# Model with different priors
priors = [
('weakly_informative', pm.Normal.dist(0, 1)),
('diffuse', pm.Normal.dist(0, 10)),
('informative', pm.Normal.dist(0.5, 0.3))
]
results = {}
for name, prior in priors:
with pm.Model():
effect = pm.Normal('effect', mu=prior.mu, sigma=prior.sigma)
# ... rest of model
trace = pm.sample()
results[name] = trace
```
---
## Bayesian Hypothesis Testing
### Bayes Factor (BF)
**What it is**: Ratio of evidence for two competing hypotheses
**Formula**:
```
BF₁₀ = P(D|H₁) / P(D|H₀)
```
**Interpretation**:
| BF₁₀ | Evidence |
|------|----------|
| >100 | Decisive for H₁ |
| 30-100 | Very strong for H₁ |
| 10-30 | Strong for H₁ |
| 3-10 | Moderate for H₁ |
| 1-3 | Anecdotal for H₁ |
| 1 | No evidence |
| 1/3-1 | Anecdotal for H₀ |
| 1/10-1/3 | Moderate for H₀ |
| 1/30-1/10 | Strong for H₀ |
| 1/100-1/30 | Very strong for H₀ |
| <1/100 | Decisive for H₀ |
**Advantages over p-values**:
1. Can provide evidence for null hypothesis
2. Not dependent on sampling intentions (no "peeking" problem)
3. Directly quantifies evidence
4. Can be updated with more data
**Python calculation**:
```python
import pingouin as pg
# Note: Limited BF support in Python
# Better options: R packages (BayesFactor), JASP software
# Approximate BF from t-statistic
# Using Jeffreys-Zellner-Siow prior
from scipy import stats
def bf_from_t(t, n1, n2, r_scale=0.707):
"""
Approximate Bayes Factor from t-statistic
r_scale: Cauchy prior scale (default 0.707 for medium effect)
"""
# This is simplified; use dedicated packages for accurate calculation
df = n1 + n2 - 2
# Implementation requires numerical integration
pass
```
---
### Region of Practical Equivalence (ROPE)
**Purpose**: Define range of negligible effect sizes
**Process**:
1. Define ROPE (e.g., d ∈ [-0.1, 0.1] for negligible effects)
2. Calculate % of posterior inside ROPE
3. Make decision:
- >95% in ROPE: Accept practical equivalence
- >95% outside ROPE: Reject equivalence
- Otherwise: Inconclusive
**Advantage**: Directly tests for practical significance
**Python example**:
```python
# Define ROPE
rope_lower, rope_upper = -0.1, 0.1
# Calculate % of posterior in ROPE
in_rope = np.mean((posterior_samples > rope_lower) &
(posterior_samples < rope_upper))
print(f"{in_rope*100:.1f}% of posterior in ROPE")
```
---
## Bayesian Estimation
### Credible Intervals
**What it is**: Interval containing parameter with X% probability
**95% Credible Interval interpretation**:
> "There is a 95% probability that the true parameter lies in this interval."
**This is what people THINK confidence intervals mean** (but don't in frequentist framework)
**Types**:
#### Equal-Tailed Interval (ETI)
- 2.5th to 97.5th percentile
- Simple to calculate
- May not include mode for skewed distributions
#### Highest Density Interval (HDI)
- Narrowest interval containing 95% of distribution
- Always includes mode
- Better for skewed distributions
**Python calculation**:
```python
import arviz as az
# Equal-tailed interval
eti = np.percentile(posterior_samples, [2.5, 97.5])
# HDI
hdi = az.hdi(posterior_samples, hdi_prob=0.95)
```
---
### Posterior Distributions
**Interpreting posterior distributions**:
1. **Central tendency**:
- Mean: Average posterior value
- Median: 50th percentile
- Mode: Most probable value (MAP - Maximum A Posteriori)
2. **Uncertainty**:
- SD: Spread of posterior
- Credible intervals: Quantify uncertainty
3. **Shape**:
- Symmetric: Similar to normal
- Skewed: Asymmetric uncertainty
- Multimodal: Multiple plausible values
**Visualization**:
```python
import matplotlib.pyplot as plt
import arviz as az
# Posterior plot with HDI
az.plot_posterior(trace, hdi_prob=0.95)
# Trace plot (check convergence)
az.plot_trace(trace)
# Forest plot (multiple parameters)
az.plot_forest(trace)
```
---
## Common Bayesian Analyses
### Bayesian T-Test
**Purpose**: Compare two groups (Bayesian alternative to t-test)
**Outputs**:
1. Posterior distribution of mean difference
2. 95% credible interval
3. Bayes Factor (BF₁₀)
4. Probability of directional hypothesis (e.g., P(μ₁ > μ₂))
**Python implementation**:
```python
import pymc as pm
import arviz as az
# Bayesian independent samples t-test
with pm.Model() as model:
# Priors for group means
mu1 = pm.Normal('mu1', mu=0, sigma=10)
mu2 = pm.Normal('mu2', mu=0, sigma=10)
# Prior for pooled standard deviation
sigma = pm.HalfNormal('sigma', sigma=10)
# Likelihood
y1 = pm.Normal('y1', mu=mu1, sigma=sigma, observed=group1)
y2 = pm.Normal('y2', mu=mu2, sigma=sigma, observed=group2)
# Derived quantity: mean difference
diff = pm.Deterministic('diff', mu1 - mu2)
# Sample posterior
trace = pm.sample(2000, tune=1000, return_inferencedata=True)
# Analyze results
print(az.summary(trace, var_names=['mu1', 'mu2', 'diff']))
# Probability that group1 > group2
prob_greater = np.mean(trace.posterior['diff'].values > 0)
print(f"P(μ₁ > μ₂) = {prob_greater:.3f}")
# Plot posterior
az.plot_posterior(trace, var_names=['diff'], ref_val=0)
```
---
### Bayesian ANOVA
**Purpose**: Compare three or more groups
**Model**:
```python
import pymc as pm
with pm.Model() as anova_model:
# Hyperpriors
mu_global = pm.Normal('mu_global', mu=0, sigma=10)
sigma_between = pm.HalfNormal('sigma_between', sigma=5)
sigma_within = pm.HalfNormal('sigma_within', sigma=5)
# Group means (hierarchical)
group_means = pm.Normal('group_means',
mu=mu_global,
sigma=sigma_between,
shape=n_groups)
# Likelihood
y = pm.Normal('y',
mu=group_means[group_idx],
sigma=sigma_within,
observed=data)
trace = pm.sample(2000, tune=1000, return_inferencedata=True)
# Posterior contrasts
contrast_1_2 = trace.posterior['group_means'][:,:,0] - trace.posterior['group_means'][:,:,1]
```
---
### Bayesian Correlation
**Purpose**: Estimate correlation between two variables
**Advantage**: Provides distribution of correlation values
**Python implementation**:
```python
import pymc as pm
with pm.Model() as corr_model:
# Prior on correlation
rho = pm.Uniform('rho', lower=-1, upper=1)
# Convert to covariance matrix
cov_matrix = pm.math.stack([[1, rho],
[rho, 1]])
# Likelihood (bivariate normal)
obs = pm.MvNormal('obs',
mu=[0, 0],
cov=cov_matrix,
observed=np.column_stack([x, y]))
trace = pm.sample(2000, tune=1000, return_inferencedata=True)
# Summarize correlation
print(az.summary(trace, var_names=['rho']))
# Probability that correlation is positive
prob_positive = np.mean(trace.posterior['rho'].values > 0)
```
---
### Bayesian Linear Regression
**Purpose**: Model relationship between predictors and outcome
**Advantages**:
- Uncertainty in all parameters
- Natural regularization (via priors)
- Can incorporate prior knowledge
- Credible intervals for predictions
**Python implementation**:
```python
import pymc as pm
with pm.Model() as regression_model:
# Priors for coefficients
alpha = pm.Normal('alpha', mu=0, sigma=10) # Intercept
beta = pm.Normal('beta', mu=0, sigma=10, shape=n_predictors)
sigma = pm.HalfNormal('sigma', sigma=10)
# Expected value
mu = alpha + pm.math.dot(X, beta)
# Likelihood
y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)
trace = pm.sample(2000, tune=1000, return_inferencedata=True)
# Posterior predictive checks
with regression_model:
ppc = pm.sample_posterior_predictive(trace)
az.plot_ppc(ppc)
# Predictions with uncertainty
with regression_model:
pm.set_data({'X': X_new})
posterior_pred = pm.sample_posterior_predictive(trace)
```
---
## Hierarchical (Multilevel) Models
**When to use**:
- Nested/clustered data (students within schools)
- Repeated measures
- Meta-analysis
- Varying effects across groups
**Key concept**: Partial pooling
- Complete pooling: Ignore groups (biased)
- No pooling: Analyze groups separately (high variance)
- Partial pooling: Borrow strength across groups (Bayesian)
**Example: Varying intercepts**:
```python
with pm.Model() as hierarchical_model:
# Hyperpriors
mu_global = pm.Normal('mu_global', mu=0, sigma=10)
sigma_between = pm.HalfNormal('sigma_between', sigma=5)
sigma_within = pm.HalfNormal('sigma_within', sigma=5)
# Group-level intercepts
alpha = pm.Normal('alpha',
mu=mu_global,
sigma=sigma_between,
shape=n_groups)
# Likelihood
y_obs = pm.Normal('y_obs',
mu=alpha[group_idx],
sigma=sigma_within,
observed=y)
trace = pm.sample()
```
---
## Model Comparison
### Methods
#### 1. Bayes Factor
- Directly compares model evidence
- Sensitive to prior specification
- Can be computationally intensive
#### 2. Information Criteria
**WAIC (Widely Applicable Information Criterion)**:
- Bayesian analog of AIC
- Lower is better
- Accounts for effective number of parameters
**LOO (Leave-One-Out Cross-Validation)**:
- Estimates out-of-sample prediction error
- Lower is better
- More robust than WAIC
**Python calculation**:
```python
import arviz as az
# Calculate WAIC and LOO
waic = az.waic(trace)
loo = az.loo(trace)
print(f"WAIC: {waic.elpd_waic:.2f}")
print(f"LOO: {loo.elpd_loo:.2f}")
# Compare multiple models
comparison = az.compare({
'model1': trace1,
'model2': trace2,
'model3': trace3
})
print(comparison)
```
---
## Checking Bayesian Models
### 1. Convergence Diagnostics
**R-hat (Gelman-Rubin statistic)**:
- Compares within-chain and between-chain variance
- Values close to 1.0 indicate convergence
- R-hat < 1.01: Good
- R-hat > 1.05: Poor convergence
**Effective Sample Size (ESS)**:
- Number of independent samples
- Higher is better
- ESS > 400 per chain recommended
**Trace plots**:
- Should look like "fuzzy caterpillar"
- No trends, no stuck chains
**Python checking**:
```python
# Automatic summary with diagnostics
print(az.summary(trace, var_names=['parameter']))
# Visual diagnostics
az.plot_trace(trace)
az.plot_rank(trace) # Rank plots
```
---
### 2. Posterior Predictive Checks
**Purpose**: Does model generate data similar to observed data?
**Process**:
1. Generate predictions from posterior
2. Compare to actual data
3. Look for systematic discrepancies
**Python implementation**:
```python
with model:
ppc = pm.sample_posterior_predictive(trace)
# Visual check
az.plot_ppc(ppc, num_pp_samples=100)
# Quantitative checks
obs_mean = np.mean(observed_data)
pred_means = [np.mean(sample) for sample in ppc.posterior_predictive['y_obs']]
p_value = np.mean(pred_means >= obs_mean) # Bayesian p-value
```
---
## Reporting Bayesian Results
### Example T-Test Report
> "A Bayesian independent samples t-test was conducted to compare groups A and B. Weakly informative priors were used: Normal(0, 1) for the mean difference and Half-Cauchy(0, 1) for the pooled standard deviation. The posterior distribution of the mean difference had a mean of 5.2 (95% CI [2.3, 8.1]), indicating that Group A scored higher than Group B. The Bayes Factor BF₁₀ = 23.5 provided strong evidence for a difference between groups, and there was a 99.7% probability that Group A's mean exceeded Group B's mean."
### Example Regression Report
> "A Bayesian linear regression was fitted with weakly informative priors (Normal(0, 10) for coefficients, Half-Cauchy(0, 5) for residual SD). The model explained substantial variance (R² = 0.47, 95% CI [0.38, 0.55]). Study hours (β = 0.52, 95% CI [0.38, 0.66]) and prior GPA (β = 0.31, 95% CI [0.17, 0.45]) were credible predictors (95% CIs excluded zero). Posterior predictive checks showed good model fit. Convergence diagnostics were satisfactory (all R-hat < 1.01, ESS > 1000)."
---
## Advantages and Limitations
### Advantages
1. **Intuitive interpretation**: Direct probability statements about parameters
2. **Incorporates prior knowledge**: Uses all available information
3. **Flexible**: Handles complex models easily
4. **No p-hacking**: Can look at data as it arrives
5. **Quantifies uncertainty**: Full posterior distribution
6. **Small samples**: Works with any sample size
### Limitations
1. **Computational**: Requires MCMC sampling (can be slow)
2. **Prior specification**: Requires thought and justification
3. **Complexity**: Steeper learning curve
4. **Software**: Fewer tools than frequentist methods
5. **Communication**: May need to educate reviewers/readers
---
## Key Python Packages
- **PyMC**: Full Bayesian modeling framework
- **ArviZ**: Visualization and diagnostics
- **Bambi**: High-level interface for regression models
- **PyStan**: Python interface to Stan
- **TensorFlow Probability**: Bayesian inference with TensorFlow
---
## When to Use Bayesian Methods
**Use Bayesian when**:
- You have prior information to incorporate
- You want direct probability statements
- Sample size is small
- Model is complex (hierarchical, missing data, etc.)
- You want to update analysis as data arrives
**Frequentist may be sufficient when**:
- Standard analysis with large sample
- No prior information
- Computational resources limited
- Reviewers unfamiliar with Bayesian methods