SHAP Theoretical Foundation
This document explains the theoretical foundations of SHAP (SHapley Additive exPlanations), including Shapley values from game theory, the principles that make SHAP unique, and connections to other explanation methods.
Game Theory Origins
Shapley Values
SHAP is grounded in Shapley values, a solution concept from cooperative game theory developed by Lloyd Shapley in the early 1950s.
Core Concept: In cooperative game theory, players collaborate to achieve a total payoff, and the question is: how should this payoff be fairly distributed among players?
Mapping to Machine Learning:
- Players → Input features
- Game → Model prediction task
- Payoff → Model output (prediction value)
- Coalition → Subset of features with known values
- Fair Distribution → Attributing prediction to features
The Shapley Value Formula
For a feature i, its Shapley value \phi_i is:
\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!(|F|-|S|-1)!}{|F|!} [f(S \cup \{i\}) - f(S)]
Where:
- F is the set of all features
- S is a subset of features not including i
- f(S) is the model's expected output given only the features in S
- |S| is the size of subset S
Interpretation:
The Shapley value averages the marginal contribution of feature i across all possible feature coalitions (subsets). The contribution is weighted by how likely each coalition is to occur.
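As a concrete illustration, the formula can be evaluated directly for a small game by enumerating all coalitions. The sketch below uses a hypothetical three-player payoff function (the player names and values are made up) and also checks the efficiency property described next:
from itertools import combinations
from math import factorial

players = ["A", "B", "C"]

def payoff(coalition):
    # Hypothetical cooperative game: the value of each coalition.
    values = {(): 0, ("A",): 10, ("B",): 20, ("C",): 5,
              ("A", "B"): 40, ("A", "C"): 20, ("B", "C"): 30,
              ("A", "B", "C"): 60}
    return values[tuple(sorted(coalition))]

def shapley_value(i):
    n = len(players)
    others = [p for p in players if p != i]
    total = 0.0
    for size in range(len(others) + 1):
        for S in combinations(others, size):
            # Weight |S|!(n-|S|-1)!/n! times the marginal contribution of i to S.
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (payoff(S + (i,)) - payoff(S))
    return total

phi = {p: shapley_value(p) for p in players}
print(phi)                                               # per-player attributions
print(sum(phi.values()), payoff(players) - payoff(()))   # efficiency: the two numbers match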
Key Properties of Shapley Values
1. Efficiency (Additivity):
\sum_{i=1}^{n} \phi_i = f(x) - f(\emptyset)
The sum of all SHAP values equals the difference between the model's prediction for the instance and the expected value (baseline).
This is why SHAP waterfall plots always sum to the total prediction change.
2. Symmetry:
If two features i and j contribute equally to all coalitions, then \phi_i = \phi_j.
Features with identical effects receive identical attribution.
3. Dummy:
If a feature i doesn't change the model output for any coalition, then \phi_i = 0.
Irrelevant features receive zero attribution.
4. Monotonicity: If a feature's marginal contribution to every coalition increases (or stays the same) when the game changes, its Shapley value does not decrease.
From Game Theory to Machine Learning
The Challenge
Computing exact Shapley values requires evaluating the model on all possible feature coalitions:
- For n features, there are 2^n possible coalitions
- For 50 features, this is over 1 quadrillion evaluations
This exponential complexity makes exact computation intractable for most real-world models.
SHAP's Solution: Additive Feature Attribution
SHAP connects Shapley values to additive feature attribution methods, enabling efficient computation.
Additive Feature Attribution Model:
g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i
Where:
- g is the explanation model
- z' \in \{0,1\}^M indicates feature presence/absence
- \phi_i is the attribution to feature i
- \phi_0 is the baseline (expected value)
SHAP proves that Shapley values are the only attribution values satisfying three desirable properties: local accuracy, missingness, and consistency.
SHAP Properties and Guarantees
Local Accuracy
Property: The explanation matches the model's output:
f(x) = g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i x'_i
Interpretation: SHAP values exactly account for the model's prediction. This enables waterfall plots to precisely decompose predictions.
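A minimal sketch of this check, assuming the shap and scikit-learn packages are installed (the random forest and the synthetic data are illustrative choices):
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])              # attributions for one instance

print(explainer.expected_value + shap_values[0].sum())  # baseline plus attributions
print(model.predict(X[:1])[0])                          # actual prediction (should match up to float error)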
Missingness
Property: If a feature is missing (not observed), its attribution is zero:
x'_i = 0 \Rightarrow \phi_i = 0
Interpretation: Only features that are present contribute to explanations.
Consistency
Property: If a model changes so a feature's marginal contribution increases (or stays the same) for all inputs, that feature's attribution should not decrease.
Interpretation: If a feature becomes more important to the model, its SHAP value reflects this. This enables meaningful model comparisons.
SHAP as a Unified Framework
SHAP unifies several existing explanation methods by showing they're special cases of Shapley values under specific assumptions.
LIME (Local Interpretable Model-agnostic Explanations)
LIME's Approach: Fit a local linear model around a prediction using perturbed samples.
Connection to SHAP: LIME approximates Shapley values but with heuristic sample weighting. Kernel SHAP uses the theoretically optimal weights given by the Shapley kernel.
Key Difference: LIME's loss function and sampling don't guarantee consistency or exact additivity; SHAP does.
DeepLIFT
DeepLIFT's Approach: Backpropagate contributions through neural networks by comparing to reference activations.
Connection to SHAP: DeepExplainer uses DeepLIFT but averages over multiple reference samples to approximate conditional expectations, yielding Shapley values.
Layer-Wise Relevance Propagation (LRP)
LRP's Approach: Decompose neural network predictions by propagating relevance scores backward through layers.
Connection to SHAP: LRP is a special case of SHAP with specific propagation rules. SHAP generalizes these rules with Shapley value theory.
Integrated Gradients
Integrated Gradients' Approach: Integrate gradients along path from baseline to input.
Connection to SHAP: When using a single reference point, Integrated Gradients approximates SHAP values for smooth models.
SHAP Computation Methods
Different SHAP explainers use specialized algorithms to compute Shapley values efficiently for specific model types.
Tree SHAP (TreeExplainer)
Innovation: Exploits tree structure to compute exact Shapley values in polynomial time instead of exponential.
Algorithm:
- Traverses each tree path from root to leaf
- Computes feature contributions using tree splits and weights
- Aggregates across all trees in ensemble
Complexity: O(TLD^2) where T = number of trees, L = max leaves, D = max depth
Key Advantage: Exact Shapley values computed efficiently for tree-based models (XGBoost, LightGBM, Random Forest, etc.)
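A typical usage sketch, assuming the shap and xgboost packages are installed; any supported ensemble (LightGBM, scikit-learn forests, etc.) follows the same pattern:
import shap
import xgboost
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, random_state=0)
model = xgboost.XGBRegressor(n_estimators=100).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # exact Shapley values for every row
print(shap_values.shape)                 # (500, 10): one attribution per sample and feature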
Kernel SHAP (KernelExplainer)
Innovation: Uses weighted linear regression to estimate Shapley values for any model.
Algorithm:
- Samples coalitions (feature subsets) according to Shapley kernel weights
- Evaluates model on each coalition (missing features replaced by background values)
- Fits weighted linear model to estimate feature attributions
Complexity: O(n \cdot 2^M) in the exact case; in practice, Shapley values are approximated from a much smaller sample of coalitions
Key Advantage: Model-agnostic; works with any prediction function
Trade-off: Slower than specialized explainers; approximate rather than exact
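A model-agnostic sketch, assuming shap and scikit-learn are installed; the SVR model, the k-means background summary, and the nsamples budget are illustrative choices:
import shap
from sklearn.datasets import make_regression
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=8, random_state=0)
model = SVR().fit(X, y)

background = shap.kmeans(X, 10)                           # summarize the background set to keep runtime manageable
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X[:5], nsamples=200)  # approximate attributions for 5 instances
print(shap_values.shape)                                  # (5, 8)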
Deep SHAP (DeepExplainer)
Innovation: Combines DeepLIFT with Shapley value sampling.
Algorithm:
- Computes DeepLIFT attributions for each reference sample
- Averages attributions across multiple reference samples
- Approximates conditional expectations E[f(x) | x_S]
Complexity: O(n \cdot m), where n = number of instances explained and m = number of reference samples
Key Advantage: Efficiently approximates Shapley values for deep neural networks
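A sketch with a small PyTorch network, assuming the torch and shap packages are installed; the architecture, tensor shapes, and reference sample count are illustrative, and the exact return type of shap_values varies slightly across shap versions:
import torch
import torch.nn as nn
import shap

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
model.eval()

background = torch.randn(100, 10)    # reference samples used to approximate conditional expectations
test_points = torch.randn(3, 10)

explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(test_points)
print(shap_values[0].shape if isinstance(shap_values, list) else shap_values.shape)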
Linear SHAP (LinearExplainer)
Innovation: Closed-form Shapley values for linear models.
Algorithm:
- For independent features: \phi_i = w_i \cdot (x_i - E[x_i])
- For correlated features: Adjusts for feature covariance
Complexity: O(n) - nearly instantaneous
Key Advantage: Exact Shapley values with minimal computation
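A minimal sketch, assuming shap and scikit-learn are installed; with the default interventional perturbation, the result should match the closed-form expression above:
import shap
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=500, n_features=6, random_state=0)
model = LinearRegression().fit(X, y)

explainer = shap.LinearExplainer(model, X)       # default interventional perturbation ignores correlations
shap_values = explainer.shap_values(X)

manual = model.coef_ * (X - X.mean(axis=0))      # w_i * (x_i - E[x_i])
print(abs(shap_values - manual).max())           # difference should be near zero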
Understanding Conditional Expectations
The Core Challenge
Computing f(S) (model output given only features in S) requires handling missing features.
Question: How should we represent "missing" features when the model requires all features as input?
Two Approaches
1. Interventional (Marginal) Approach:
- Replace missing features with values from background dataset
- Estimates E[f(x) | x_S] by marginalizing over x_{\bar{S}}
- Interpretation: "What would the model predict if we didn't know the features in \bar{S}?"
2. Observational (Conditional) Approach:
- Use the conditional distribution: E[f(x) | x_S = x_S^*]
- Accounts for feature dependencies
- Interpretation: "What would the model predict for similar instances with features S = x_S^*?"
Trade-offs:
- Interventional: Simpler, assumes feature independence, matches causal interpretation
- Observational: More accurate for correlated features, requires conditional distribution estimation
TreeExplainer supports both via feature_perturbation parameter.
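A sketch of both modes, assuming `model` is a fitted tree ensemble and `X_background` is a representative sample of the training data (both names are placeholders):
import shap

# Interventional (marginal): break dependencies by mixing in background rows.
explainer_int = shap.TreeExplainer(
    model, data=X_background, feature_perturbation="interventional"
)

# Path-dependent: follow the trees' own cover statistics instead of a background dataset.
explainer_obs = shap.TreeExplainer(model, feature_perturbation="tree_path_dependent")

shap_int = explainer_int.shap_values(X_background[:10])
shap_obs = explainer_obs.shap_values(X_background[:10])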
Baseline (Expected Value) Selection
The baseline \phi_0 = E[f(x)] represents the model's average prediction.
Computing the Baseline
For TreeExplainer:
- With background data: Average prediction on background dataset
- With tree_path_dependent: Weighted average using tree leaf distributions
For DeepExplainer / KernelExplainer:
- Average prediction on background samples
Importance of Baseline
- SHAP values measure deviation from baseline
- Different baselines → different SHAP values (but still sum correctly)
- Choose baseline representative of "typical" or "neutral" input
- Common choices: Training set mean, median, or mode
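A quick way to see this, assuming a fitted tree model `model` and background data `X_background` (placeholder names), is to compare the explainer's reported expected value with the mean prediction on the background set:
import numpy as np
import shap

explainer = shap.TreeExplainer(model, data=X_background,
                               feature_perturbation="interventional")
print(explainer.expected_value)               # phi_0 reported by the explainer
print(np.mean(model.predict(X_background)))   # mean prediction over the background data
# For a regressor these agree; for models explained in log-odds units, compare
# against the corresponding raw (margin) output instead.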
Interpreting SHAP Values
Units and Scale
SHAP values have the same units as the model output:
- Regression: Same units as target variable (dollars, temperature, etc.)
- Classification (log-odds): Log-odds units
- Classification (probability): Probability units (if model output transformed)
Magnitude: Higher absolute SHAP value = stronger feature impact
Sign:
- Positive SHAP value = Feature pushes prediction higher
- Negative SHAP value = Feature pushes prediction lower
Additive Decomposition
For a prediction f(x):
f(x) = E[f(X)] + \sum_{i=1}^{n} \phi_i(x)
Example:
- Expected value (baseline): 0.3
- SHAP values: {Age: +0.15, Income: +0.10, Education: -0.05}
- Prediction: 0.3 + 0.15 + 0.10 - 0.05 = 0.50
Global vs. Local Importance
Local (Instance-level):
- SHAP values for a single prediction: \phi_i(x)
- Explains: "Why did the model predict f(x) for this instance?"
- Visualization: Waterfall, force plots
Global (Dataset-level):
- Average absolute SHAP values: E[|\phi_i(x)|]
- Explains: "Which features are most important overall?"
- Visualization: Beeswarm, bar plots
Key Insight: Global importance is the aggregation of local importances, maintaining consistency between instance and dataset explanations.
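A sketch of this aggregation, assuming `shap_values` is an (n_samples, n_features) array already computed by an explainer and `feature_names` is a matching list (placeholder names):
import numpy as np

global_importance = np.abs(shap_values).mean(axis=0)   # E[|phi_i(x)|] per feature
ranking = np.argsort(global_importance)[::-1]
for idx in ranking:
    print(f"{feature_names[idx]}: {global_importance[idx]:.4f}")

# Equivalent built-in visualizations:
# shap.summary_plot(shap_values, X)                     # beeswarm
# shap.summary_plot(shap_values, X, plot_type="bar")    # bar chart of mean |SHAP|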
SHAP vs. Other Feature Importance Methods
Comparison with Permutation Importance
Permutation Importance:
- Shuffles a feature and measures accuracy drop
- Global metric only (no instance-level explanations)
- Can be misleading with correlated features
SHAP:
- Provides both local and global importance
- Handles feature correlations through coalitional averaging
- Consistent: Additive property guarantees sum to prediction
Comparison with Feature Coefficients (Linear Models)
Feature Coefficients (w_i):
- Measure impact per unit change in feature
- Don't account for feature scale or distribution
SHAP for Linear Models:
\phi_i = w_i \cdot (x_i - E[x_i])
- Accounts for feature value relative to average
- More interpretable for comparing features with different units/scales
Comparison with Tree Feature Importance (Gini/Split-based)
Gini/Split Importance:
- Based on training process (purity gain or frequency of splits)
- Biased toward high-cardinality features
- No instance-level explanations
- Can be misleading (importance ≠ predictive power)
SHAP (Tree SHAP):
- Based on model output (prediction behavior)
- Fair attribution through Shapley values
- Provides instance-level explanations
- Consistent and theoretically grounded
Interactions and Higher-Order Effects
SHAP Interaction Values
Standard SHAP captures main effects. SHAP interaction values capture pairwise interactions.
Formula for Interaction:
\phi_{i,j} = \sum_{S \subseteq F \setminus \{i,j\}} \frac{|S|!(|F|-|S|-2)!}{2(|F|-1)!} \Delta_{ij}(S)
Where \Delta_{ij}(S) is the interaction effect of features i and j given coalition S.
Interpretation:
- \phi_{i,i}: Main effect of feature i
- \phi_{i,j} (i \neq j): Interaction effect between features i and j
Property:
\phi_i = \phi_{i,i} + \sum_{j \neq i} \phi_{i,j}
The SHAP value of feature i equals its main effect plus the sum of its pairwise interaction values; because each total interaction is split evenly between \phi_{i,j} and \phi_{j,i}, feature i receives half of every interaction it participates in.
Computing Interactions
TreeExplainer supports exact interaction computation:
explainer = shap.TreeExplainer(model)
# Returns an array of shape (n_samples, n_features, n_features) for a single-output model
shap_interaction_values = explainer.shap_interaction_values(X)
Limitation: Exponentially complex for other explainers (only practical for tree models)
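The decomposition property above can be checked numerically; the sketch below assumes a fitted single-output tree model `model` and feature matrix `X` (placeholder names):
import numpy as np
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)                       # (n_samples, n_features)
interaction_values = explainer.shap_interaction_values(X)    # (n_samples, n_features, n_features)

# Summing each interaction row recovers the ordinary SHAP values.
print(np.abs(interaction_values.sum(axis=2) - shap_values).max())   # ~0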
Theoretical Limitations and Considerations
Computational Complexity
Exact Computation: O(2^n) - intractable for large n
Specialized Algorithms:
- Tree SHAP: O(TLD^2), efficient for trees
- Deep SHAP, Kernel SHAP: Approximations required
Implication: For non-tree models with many features, explanations may be approximate.
Feature Independence Assumption
Kernel SHAP and Basic Implementation: Assume features can be independently manipulated
Challenge: Real features are often correlated (e.g., height and weight)
Solutions:
- Use observational approach (conditional expectations)
- TreeExplainer with correlation-aware perturbation
- Feature grouping for highly correlated features
Out-of-Distribution Samples
Issue: Creating coalitions by replacing features may create unrealistic samples (outside training distribution)
Example: Setting "Age=5" and "Has PhD=Yes" simultaneously
Implication: SHAP values reflect model behavior on potentially unrealistic inputs
Mitigation: Use observational approach or carefully selected background data
Causality
SHAP measures association, not causation
SHAP answers: "How does the model's prediction change with this feature?" SHAP does NOT answer: "What would happen if we changed this feature in reality?"
Example:
- SHAP: "Hospital stay length increases prediction of mortality" (association)
- Causality: "Longer hospital stays cause higher mortality" (incorrect!)
Implication: Use domain knowledge to interpret SHAP causally; SHAP alone doesn't establish causation.
Advanced Theoretical Topics
SHAP as Optimal Credit Allocation
SHAP is the unique attribution method satisfying:
- Local accuracy: Explanation matches model
- Missingness: Absent features have zero attribution
- Consistency: Attribution reflects feature importance changes
Proof: Lundberg & Lee (2017) showed Shapley values are the only solution satisfying these axioms.
Connection to Functional ANOVA
SHAP values correspond to first-order terms in functional ANOVA decomposition:
f(x) = f_0 + \sum_i f_i(x_i) + \sum_{i,j} f_{ij}(x_i, x_j) + ...
Where f_i(x_i) captures main effect of feature i, and \phi_i \approx f_i(x_i).
Relationship to Sensitivity Analysis
SHAP generalizes sensitivity analysis:
- Sensitivity Analysis: \frac{\partial f}{\partial x_i} (local gradient)
- SHAP: Integrated sensitivity over the feature coalition space
Gradient-based methods (GradientExplainer, Integrated Gradients) approximate SHAP using derivatives.
Practical Implications of Theory
Why Use SHAP?
- Theoretical Guarantees: Only method with consistency, local accuracy, and missingness
- Unified Framework: Connects and generalizes multiple explanation methods
- Additive Decomposition: Predictions precisely decompose into feature contributions
- Model Comparison: Consistency enables comparing feature importance across models
- Versatility: Works with any model type (with appropriate explainer)
When to Be Cautious
- Computational Cost: May be slow for complex models without specialized explainers
- Feature Correlation: Standard approaches may create unrealistic samples
- Interpretation: Requires understanding baseline, units, and assumptions
- Causality: SHAP doesn't imply causation; use domain knowledge
- Approximations: Non-tree methods use approximations; understand accuracy trade-offs
References and Further Reading
Foundational Papers:
- Shapley, L. S. (1953). "A Value for n-Person Games"
- Lundberg, S. M., & Lee, S. I. (2017). "A Unified Approach to Interpreting Model Predictions" (NeurIPS)
- Lundberg, S. M., et al. (2020). "From local explanations to global understanding with explainable AI for trees" (Nature Machine Intelligence)
Key Concepts:
- Cooperative game theory and Shapley values
- Additive feature attribution methods
- Conditional expectation estimation
- Tree SHAP algorithm and polynomial-time computation
This theoretical foundation explains why SHAP is a principled, versatile, and powerful tool for model interpretation.