# PyMC Sampling and Inference Methods

This reference covers the sampling algorithms and inference methods available in PyMC for posterior inference.

## MCMC Sampling Methods

### Primary Sampling Function

**`pm.sample(draws=1000, tune=1000, chains=4, **kwargs)`**

The main interface for MCMC sampling in PyMC.

**Key Parameters:**

- `draws`: Number of samples to draw per chain (default: 1000)
- `tune`: Number of tuning/warmup samples (default: 1000, discarded)
- `chains`: Number of parallel chains (default: 4)
- `cores`: Number of CPU cores to use (default: all available)
- `target_accept`: Target acceptance rate for step size tuning (default: 0.8; increase to 0.9-0.95 for difficult posteriors)
- `random_seed`: Random seed for reproducibility
- `return_inferencedata`: Return ArviZ InferenceData object (default: True)
- `idata_kwargs`: Additional kwargs for InferenceData creation (e.g., `{"log_likelihood": True}` for model comparison)

**Returns:** InferenceData object containing posterior samples, sampling statistics, and diagnostics

**Example:**

```python
with pm.Model() as model:
    # ... define model ...
    idata = pm.sample(draws=2000, tune=1000, chains=4, target_accept=0.9)
```

### Sampling Algorithms

PyMC automatically selects appropriate samplers based on model structure, but you can specify algorithms manually.

#### NUTS (No-U-Turn Sampler)

**Default algorithm** for continuous parameters. Highly efficient Hamiltonian Monte Carlo variant.

- Automatically tunes step size and mass matrix
- Adaptive: explores posterior geometry during tuning
- Best for smooth, continuous posteriors
- Can struggle with high correlation or multimodality

**Manual specification:**

```python
with model:
    idata = pm.sample(step=pm.NUTS(target_accept=0.95))
```

**When to adjust:**

- Increase `target_accept` (0.9-0.99) if seeing divergences
- Use `init='jitter+adapt_diag'` (the default) for robust initialization: it jitters the starting point and adapts a diagonal mass matrix
- Use `init='adapt_diag'` to skip the jitter if jittered starting points cause initialization failures
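
A minimal sketch combining these knobs (assuming `model` is already defined; both options are standard `pm.sample` arguments):

```python
with model:
    # Tighter step-size adaptation plus the default jittered initialization
    idata = pm.sample(target_accept=0.95, init="jitter+adapt_diag")
```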

#### Metropolis

General-purpose Metropolis-Hastings sampler.

- Works for both continuous and discrete variables
- Less efficient than NUTS for smooth continuous posteriors
- Useful for discrete parameters or non-differentiable models
- Requires manual tuning

**Example:**

```python
with model:
    idata = pm.sample(step=pm.Metropolis())
```

#### Slice Sampler

Slice sampling for univariate distributions.

- No tuning required
- Good for difficult univariate posteriors
- Can be slow for high dimensions

**Example:**

```python
with model:
    idata = pm.sample(step=pm.Slice())
```

#### CompoundStep

Combine different samplers for different parameters.

**Example:**

```python
with model:
    # Use NUTS for continuous params, Metropolis for discrete
    step1 = pm.NUTS([continuous_var1, continuous_var2])
    step2 = pm.Metropolis([discrete_var])
    idata = pm.sample(step=[step1, step2])
```

### Sampling Diagnostics

PyMC automatically computes diagnostics. Check these before trusting results:

#### Effective Sample Size (ESS)

Measures independent information in correlated samples.

- **Rule of thumb**: total ESS > 400 (roughly 100 per chain with 4 chains)
- Low ESS indicates high autocorrelation
- Access via: `az.ess(idata)`

#### R-hat (Gelman-Rubin statistic)

Measures convergence across chains.

- **Rule of thumb**: R-hat < 1.01 for all parameters
- R-hat > 1.01 indicates non-convergence
- Access via: `az.rhat(idata)`

#### Divergences

Indicate regions where NUTS struggled.

- **Rule of thumb**: 0 divergences (or very few)
- Divergences suggest biased samples
- **Fix**: Increase `target_accept`, reparameterize, or use stronger priors
- Access via: `idata.sample_stats.diverging.sum()`

#### Energy Plot

Visualizes Hamiltonian Monte Carlo energy transitions.

```python
az.plot_energy(idata)
```

Good overlap between the marginal energy distribution and the energy transition distribution indicates healthy sampling; poor overlap suggests the sampler is struggling to explore the posterior.
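
A minimal sketch of a programmatic pass over these diagnostics (assuming `idata` was returned by `pm.sample` and ArviZ is installed):

```python
import arviz as az

# Convergence: R-hat should be below about 1.01 for every parameter
print(az.rhat(idata).max())

# Effective sample size: bulk ESS should be comfortably large
print(az.ess(idata).min())

# Divergent transitions: ideally zero
n_div = int(idata.sample_stats["diverging"].sum())
print(f"{n_div} divergent transitions")
```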

### Handling Sampling Issues

#### Divergences

```python
# Increase target acceptance rate
idata = pm.sample(target_accept=0.95)

# Or switch to a non-centered parameterization for hierarchical models

# Centered (can cause divergences):
mu = pm.Normal('mu', 0, 1)
sigma = pm.HalfNormal('sigma', 1)
theta = pm.Normal('theta', mu, sigma, shape=n_groups)

# Non-centered (usually samples better):
mu = pm.Normal('mu', 0, 1)
sigma = pm.HalfNormal('sigma', 1)
theta_offset = pm.Normal('theta_offset', 0, 1, shape=n_groups)
theta = pm.Deterministic('theta', mu + sigma * theta_offset)
```

#### Slow Sampling

```python
# Use fewer tuning steps if model is simple
idata = pm.sample(tune=500)

# Increase cores for parallelization
idata = pm.sample(cores=8, chains=8)

# Use variational inference (ADVI) to initialize NUTS
with model:
    idata = pm.sample(init="advi+adapt_diag")
```

#### High Autocorrelation

```python
# Increase draws
idata = pm.sample(draws=5000)

# Reparameterize to reduce correlation
# Consider using QR decomposition for regression models
```
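
To see which parameters are affected, ArviZ can plot per-parameter autocorrelation (a quick check, assuming `idata` comes from `pm.sample`):

```python
import arviz as az

# Slowly decaying autocorrelation curves indicate low effective sample size
az.plot_autocorr(idata, combined=True)
```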

## Variational Inference

Faster approximate inference for large models or quick exploration.

### ADVI (Automatic Differentiation Variational Inference)

**`pm.fit(n=10000, method='advi', **kwargs)`**

Approximates posterior with simpler distribution (typically mean-field Gaussian).

**Key Parameters:**

- `n`: Number of iterations (default: 10000)
- `method`: VI algorithm ('advi', 'fullrank_advi', 'svgd')
- `random_seed`: Random seed

**Returns:** Approximation object for sampling and analysis

**Example:**

```python
with model:
    approx = pm.fit(n=50000)

    # Draw samples from the approximation
    idata = approx.sample(1000)

    # Or draw a single point to initialize MCMC (pass via `initvals=`)
    start = approx.sample(return_inferencedata=False)[0]
```

**Trade-offs:**

- **Pros**: Much faster than MCMC, scales to large data
- **Cons**: Approximate, may miss posterior structure, underestimates uncertainty

### Full-Rank ADVI

Captures correlations between parameters.

```python
with model:
    approx = pm.fit(method='fullrank_advi')
```

More accurate than mean-field but slower.

### SVGD (Stein Variational Gradient Descent)

Non-parametric variational inference.

```python
with model:
    approx = pm.fit(method='svgd', n=20000)
```

Better captures multimodality but more computationally expensive.

## Prior and Posterior Predictive Sampling

### Prior Predictive Sampling

Sample from the prior distribution (before seeing data).

**`pm.sample_prior_predictive(draws=500, **kwargs)`**

**Purpose:**

- Validate priors are reasonable
- Check implied predictions before fitting
- Ensure model generates plausible data

**Example:**

```python
with model:
    prior_pred = pm.sample_prior_predictive(draws=1000)

# Visualize prior predictions
az.plot_ppc(prior_pred, group='prior')
```

### Posterior Predictive Sampling

Sample from posterior predictive distribution (after fitting).

**`pm.sample_posterior_predictive(trace, **kwargs)`**

**Purpose:**

- Model validation via posterior predictive checks
- Generate predictions for new data
- Assess goodness-of-fit

**Example:**

```python
with model:
    # After sampling
    idata = pm.sample()

    # Add posterior predictive samples
    pm.sample_posterior_predictive(idata, extend_inferencedata=True)

# Posterior predictive check
az.plot_ppc(idata)
```

### Predictions for New Data

Update the data container and sample the predictive distribution:

```python
with model:
    # Original model fit
    idata = pm.sample()

    # Update with new predictor values
    pm.set_data({'X': X_new})

    # Sample predictions for the new inputs
    post_pred_new = pm.sample_posterior_predictive(
        idata,
        var_names=['y_pred'],
        predictions=True,
    )
```
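
For `pm.set_data` to work, the predictors must be registered as named data containers when the model is built. A minimal sketch (the names `X_train`/`y_train` are placeholders; older PyMC versions spell the container `pm.MutableData`):

```python
with pm.Model() as model:
    # Named, swappable data container
    X = pm.Data('X', X_train)

    beta = pm.Normal('beta', 0, 1, shape=X_train.shape[1])
    sigma = pm.HalfNormal('sigma', 1)

    # shape=X.shape[0] lets the observed node resize when X is swapped out
    y_pred = pm.Normal('y_pred', pm.math.dot(X, beta), sigma,
                       observed=y_train, shape=X.shape[0])
```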

## Maximum A Posteriori (MAP) Estimation

Find posterior mode (point estimate).

**`pm.find_MAP(start=None, method='L-BFGS-B', **kwargs)`**

**When to use:**

- Quick point estimates
- Initialization for MCMC
- When full posterior not needed

**Example:**

```python
with model:
    map_estimate = pm.find_MAP()
    print(map_estimate)
```

**Limitations:**

- Doesn't quantify uncertainty
- Can find local optima in multimodal posteriors
- Sensitive to prior specification

## Inference Recommendations

### Standard Workflow

1. **Start with ADVI** for quick exploration:

   ```python
   approx = pm.fit(n=20000)
   ```

2. **Run MCMC** for full inference:

   ```python
   idata = pm.sample(draws=2000, tune=1000)
   ```

3. **Check diagnostics**:

   ```python
   az.summary(idata)  # check the r_hat, ess_bulk, and ess_tail columns
   ```

4. **Sample posterior predictive**:

   ```python
   pm.sample_posterior_predictive(idata, extend_inferencedata=True)
   ```

### Choosing Inference Method

| Scenario | Recommended Method |
|----------|-------------------|
| Small-medium models, need full uncertainty | MCMC with NUTS |
| Large models, initial exploration | ADVI |
| Discrete parameters | Metropolis or marginalize |
| Hierarchical models with divergences | Non-centered parameterization + NUTS |
| Very large data | Minibatch ADVI (see sketch below) |
| Quick point estimates | MAP or ADVI |
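
A minimal sketch of minibatch ADVI for large datasets (assumes `X` and `y` are NumPy arrays; the regression structure is only illustrative):

```python
# Random minibatches of the data, redrawn at every iteration
batch_X, batch_y = pm.Minibatch(X, y, batch_size=128)

with pm.Model() as minibatch_model:
    beta = pm.Normal('beta', 0, 1, shape=X.shape[1])
    sigma = pm.HalfNormal('sigma', 1)
    mu = pm.math.dot(batch_X, beta)

    # total_size rescales the minibatch likelihood to the full dataset
    pm.Normal('y', mu, sigma, observed=batch_y, total_size=len(y))

    approx = pm.fit(n=20000)  # ADVI on minibatches
```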

### Reparameterization Tricks

**Non-centered parameterization** for hierarchical models:

```python
# Centered (can cause divergences):
mu = pm.Normal('mu', 0, 10)
sigma = pm.HalfNormal('sigma', 1)
theta = pm.Normal('theta', mu, sigma, shape=n_groups)

# Non-centered (better sampling):
mu = pm.Normal('mu', 0, 10)
sigma = pm.HalfNormal('sigma', 1)
theta_offset = pm.Normal('theta_offset', 0, 1, shape=n_groups)
theta = pm.Deterministic('theta', mu + sigma * theta_offset)
```

**QR decomposition** for correlated predictors:

```python
import numpy as np

# Thin QR decomposition of the design matrix X (shape n x p)
Q, R = np.linalg.qr(X)

with pm.Model():
    # Uncorrelated coefficients on the Q scale
    beta_tilde = pm.Normal('beta_tilde', 0, 1, shape=p)

    mu = pm.math.dot(Q, beta_tilde)
    sigma = pm.HalfNormal('sigma', 1)
    y = pm.Normal('y', mu, sigma, observed=y_obs)

    idata = pm.sample()

# Transform back to the original scale after sampling: beta = R^{-1} beta_tilde
beta_samples = np.linalg.solve(
    R, idata.posterior['beta_tilde'].values[..., None]
)[..., 0]
```

## Advanced Sampling

### Sequential Monte Carlo (SMC)

For complex posteriors or model evidence estimation:

```python
with model:
    idata = pm.sample_smc(draws=2000, chains=4)
```

Good for multimodal posteriors or when NUTS struggles.
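
If the goal is evidence estimation, recent PyMC versions store the SMC log marginal likelihood estimate in the sampling statistics; the exact variable name can differ across versions, so treat this as a sketch:

```python
# Per-chain estimate of the log marginal likelihood (model evidence)
print(idata.sample_stats["log_marginal_likelihood"])
```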

### Custom Initialization

Provide starting values via `initvals` (older PyMC versions used `start`):

```python
initvals = {'mu': 0, 'sigma': 1}
with model:
    idata = pm.sample(initvals=initvals)
```

Or use the MAP estimate:

```python
with model:
    start = pm.find_MAP()
    idata = pm.sample(initvals=start)
```