2025-11-30 08:30:10 +08:00
# PyMC Distributions Reference
This reference provides a comprehensive catalog of probability distributions available in PyMC, organized by category. Use this to select appropriate distributions for priors and likelihoods when building Bayesian models.
## Continuous Distributions
Continuous distributions define probability densities over real-valued domains.
### Common Continuous Distributions
**`pm.Normal(name, mu, sigma)`**
- Normal (Gaussian) distribution
- Parameters: `mu` (mean), `sigma` (standard deviation)
- Support: (-∞, ∞)
- Common uses: Default prior for unbounded parameters, likelihood for continuous data with additive noise
**`pm.HalfNormal(name, sigma)`**
- Half-normal distribution (positive half of normal)
- Parameters: `sigma` (standard deviation)
- Support: [0, ∞)
- Common uses: Prior for scale/standard deviation parameters
**`pm.Uniform(name, lower, upper)`**
- Uniform distribution
- Parameters: `lower`, `upper` (bounds)
- Support: [lower, upper]
- Common uses: Weakly informative prior when parameter must be bounded
**`pm.Beta(name, alpha, beta)`**
- Beta distribution
- Parameters: `alpha`, `beta` (shape parameters)
- Support: [0, 1]
- Common uses: Prior for probabilities and proportions
**`pm.Gamma(name, alpha, beta)`**
- Gamma distribution
- Parameters: `alpha` (shape), `beta` (rate)
- Support: (0, ∞)
- Common uses: Prior for positive parameters, rate parameters
**`pm.Exponential(name, lam)`**
- Exponential distribution
- Parameters: `lam` (rate parameter)
- Support: [0, ∞)
- Common uses: Prior for scale parameters, waiting times
**`pm.LogNormal(name, mu, sigma)`**
- Log-normal distribution
- Parameters: `mu`, `sigma` (parameters of underlying normal)
- Support: (0, ∞)
- Common uses: Prior for positive parameters with multiplicative effects
**`pm.StudentT(name, nu, mu, sigma)`**
- Student's t-distribution
- Parameters: `nu` (degrees of freedom), `mu` (location), `sigma` (scale)
- Support: (-∞, ∞)
- Common uses: Robust alternative to normal for outlier-resistant models
**`pm.Cauchy(name, alpha, beta)`**
- Cauchy distribution
- Parameters: `alpha` (location), `beta` (scale)
- Support: (-∞, ∞)
- Common uses: Heavy-tailed alternative to normal
### Specialized Continuous Distributions
**`pm.Laplace(name, mu, b)`** - Laplace (double exponential) distribution
**`pm.AsymmetricLaplace(name, kappa, mu, b)`** - Asymmetric Laplace distribution
**`pm.InverseGamma(name, alpha, beta)`** - Inverse gamma distribution
**`pm.Weibull(name, alpha, beta)`** - Weibull distribution for reliability analysis
**`pm.Logistic(name, mu, s)`** - Logistic distribution
**`pm.LogitNormal(name, mu, sigma)`** - Logit-normal distribution for (0,1) support
**`pm.Pareto(name, alpha, m)`** - Pareto distribution for power-law phenomena
**`pm.ChiSquared(name, nu)`** - Chi-squared distribution
**`pm.ExGaussian(name, mu, sigma, nu)`** - Exponentially modified Gaussian
**`pm.VonMises(name, mu, kappa)`** - Von Mises (circular normal) distribution
**`pm.SkewNormal(name, mu, sigma, alpha)`** - Skew-normal distribution
**`pm.Triangular(name, lower, c, upper)`** - Triangular distribution
**`pm.Gumbel(name, mu, beta)`** - Gumbel distribution for extreme values
**`pm.Rice(name, nu, sigma)`** - Rice (Rician) distribution
**`pm.Moyal(name, mu, sigma)`** - Moyal distribution
**`pm.Kumaraswamy(name, a, b)`** - Kumaraswamy distribution (Beta alternative)
**`pm.Interpolated(name, x_points, pdf_points)`** - Custom distribution from interpolation
## Discrete Distributions
Discrete distributions define probabilities over integer-valued domains.
### Common Discrete Distributions
**`pm.Bernoulli(name, p)`**
- Bernoulli distribution (binary outcome)
- Parameters: `p` (success probability)
- Support: {0, 1}
- Common uses: Binary classification, coin flips
**`pm.Binomial(name, n, p)`**
- Binomial distribution
- Parameters: `n` (number of trials), `p` (success probability)
- Support: {0, 1, ..., n}
- Common uses: Number of successes in fixed trials
**`pm.Poisson(name, mu)`**
- Poisson distribution
- Parameters: `mu` (rate parameter)
- Support: {0, 1, 2, ...}
- Common uses: Count data, rates, occurrences
**`pm.Categorical(name, p)`**
- Categorical distribution
- Parameters: `p` (probability vector)
- Support: {0, 1, ..., K-1}
- Common uses: Multi-class classification
**`pm.DiscreteUniform(name, lower, upper)`**
- Discrete uniform distribution
- Parameters: `lower`, `upper` (bounds)
- Support: {lower, ..., upper}
- Common uses: Uniform prior over finite integers
**`pm.NegativeBinomial(name, mu, alpha)`**
- Negative binomial distribution
- Parameters: `mu` (mean), `alpha` (dispersion)
- Support: {0, 1, 2, ...}
- Common uses: Overdispersed count data
**`pm.Geometric(name, p)`**
- Geometric distribution
- Parameters: `p` (success probability)
- Support: {1, 2, 3, ...}
- Common uses: Number of trials up to and including the first success
### Specialized Discrete Distributions
**`pm.BetaBinomial(name, alpha, beta, n)`** - Beta-binomial (overdispersed binomial)
**`pm.HyperGeometric(name, N, k, n)`** - Hypergeometric distribution
**`pm.DiscreteWeibull(name, q, beta)`** - Discrete Weibull distribution
**`pm.OrderedLogistic(name, eta, cutpoints)`** - Ordered logistic for ordinal data
**`pm.OrderedProbit(name, eta, cutpoints)`** - Ordered probit for ordinal data
## Multivariate Distributions
Multivariate distributions define joint probability distributions over vector-valued random variables.
### Common Multivariate Distributions
**`pm.MvNormal(name, mu, cov)`**
- Multivariate normal distribution
- Parameters: `mu` (mean vector), `cov` (covariance matrix)
- Common uses: Correlated continuous variables, Gaussian processes
**`pm.Dirichlet(name, a)`**
- Dirichlet distribution
- Parameters: `a` (concentration parameters)
- Support: Simplex (sums to 1)
- Common uses: Prior for probability vectors, topic modeling
**`pm.Multinomial(name, n, p)`**
- Multinomial distribution
- Parameters: `n` (number of trials), `p` (probability vector)
- Common uses: Count data across multiple categories
**`pm.MvStudentT(name, nu, mu, cov)`**
- Multivariate Student's t-distribution
- Parameters: `nu` (degrees of freedom), `mu` (location), `cov` (scale matrix)
- Common uses: Robust multivariate modeling
### Specialized Multivariate Distributions
**`pm.LKJCorr(name, n, eta)`** - LKJ correlation matrix prior (for correlation matrices)
**`pm.LKJCholeskyCov(name, eta, n, sd_dist)`** - LKJ prior on the Cholesky factor of a covariance matrix (pass `n` and `eta` by keyword to avoid ordering mistakes)
**`pm.Wishart(name, nu, V)`** - Wishart distribution (for covariance matrices)
**`pm.InverseWishart(name, nu, V)`** - Inverse Wishart distribution
**`pm.MatrixNormal(name, mu, rowcov, colcov)`** - Matrix normal distribution
**`pm.KroneckerNormal(name, mu, covs, sigma)`** - Kronecker-structured normal
**`pm.CAR(name, mu, W, alpha, tau)`** - Conditional autoregressive (spatial)
**`pm.ICAR(name, W, sigma)`** - Intrinsic conditional autoregressive (spatial)
## Mixture Distributions
Mixture distributions combine multiple component distributions.
**`pm.Mixture(name, w, comp_dists)`**
- General mixture distribution
- Parameters: `w` (weights), `comp_dists` (component distributions)
- Common uses: Clustering, multi-modal data
**`pm.NormalMixture(name, w, mu, sigma)`**
- Mixture of normal distributions
- Common uses: Mixture of Gaussians clustering
### Zero-Inflated and Hurdle Models
**`pm.ZeroInflatedPoisson(name, psi, mu)`** - Excess zeros in count data
**`pm.ZeroInflatedBinomial(name, psi, n, p)`** - Zero-inflated binomial
**`pm.ZeroInflatedNegativeBinomial(name, psi, mu, alpha)`** - Zero-inflated negative binomial
**`pm.HurdlePoisson(name, psi, mu)`** - Hurdle Poisson (two-part model)
**`pm.HurdleGamma(name, psi, alpha, beta)`** - Hurdle gamma
**`pm.HurdleLogNormal(name, psi, mu, sigma)`** - Hurdle log-normal
## Time Series Distributions
Distributions designed for temporal data and sequential modeling.
**`pm.AR(name, rho, sigma, init_dist)`**
- Autoregressive process
- Parameters: `rho` (AR coefficients), `sigma` (innovation std), `init_dist` (initial distribution)
- Common uses: Time series modeling, sequential data
**`pm.GaussianRandomWalk(name, mu, sigma, init_dist)`**
- Gaussian random walk
- Parameters: `mu` (drift), `sigma` (step size), `init_dist` (initial value)
- Common uses: Cumulative processes, random walk priors
**`pm.MvGaussianRandomWalk(name, mu, cov, init_dist)`**
- Multivariate Gaussian random walk
**`pm.GARCH11(name, omega, alpha_1, beta_1, initial_vol)`**
- GARCH(1,1) volatility model
- Common uses: Financial time series, volatility modeling
**`pm.EulerMaruyama(name, dt, sde_fn, sde_pars, init_dist)`**
- Stochastic differential equation via Euler-Maruyama discretization
- Common uses: Continuous-time processes
## Special Distributions
**`pm.Deterministic(name, var)`**
- Deterministic transformation (not a random variable)
- Use for computed quantities derived from other variables
**`pm.Potential(name, logp)`**
- Add arbitrary log-probability contribution
- Use for custom likelihood components or constraints
**`pm.Flat(name)`**
- Improper flat prior (constant density)
- Use sparingly; can cause sampling issues
**`pm.HalfFlat(name)`**
- Improper flat prior on positive reals
- Use sparingly; can cause sampling issues
## Distribution Modifiers
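**`pm.Truncated(name, dist, lower, upper)`**
- Truncate a distribution to the specified bounds; `dist` is an unnamed distribution created with `.dist()`, e.g. `pm.Normal.dist(0, 1)`
**`pm.Censored(name, dist, lower, upper)`**
- Handle censored observations (values beyond a bound are recorded at the bound, not at their exact value); `dist` is likewise created with `.dist()`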
**`pm.CustomDist(name, ..., logp, random)`**
- Define custom distributions with user-specified log-probability and random sampling functions
**`pm.Simulator(name, fn, params, ...)`**
- Custom distributions via simulation (for likelihood-free inference)
## Usage Tips
### Choosing Priors
1. **Scale parameters** (σ, τ): Use `HalfNormal`, `HalfCauchy`, `Exponential`, or `Gamma`
2. **Probabilities**: Use `Beta` or `Uniform(0, 1)`
3. **Unbounded parameters**: Use `Normal` or `StudentT` (for robustness)
4. **Positive parameters**: Use `LogNormal`, `Gamma`, or `Exponential`
5. **Correlation matrices**: Use `LKJCorr`
6. **Count data**: Use `Poisson` or `NegativeBinomial` (for overdispersion)
### Shape Broadcasting
PyMC distributions support NumPy-style broadcasting. Use the `shape` parameter to create vectors or arrays of random variables:
```python
import pymc as pm

with pm.Model():
    # Vector of 5 independent normals
    beta = pm.Normal('beta', mu=0, sigma=1, shape=5)
    # 3x4 matrix of independent gammas
    tau = pm.Gamma('tau', alpha=2, beta=1, shape=(3, 4))
```
### Using dims for Named Dimensions
Instead of shape, use `dims` for more readable models:
```python
import pymc as pm

with pm.Model(coords={'predictors': ['age', 'income', 'education']}) as model:
    beta = pm.Normal('beta', mu=0, sigma=1, dims='predictors')
```