On the (In)fidelity and Sensitivity of Explanations
Motivation
- Evaluation of explanation techniques is often subjective ("does this make sense to a human?")
- Objective measures would be desirable
  - Sounder theoretical foundation
  - Would enable systematic evaluation and improvement
Notation
- \mathbf{f}: model to be explained
- \mathbf{x}: input datapoint
- \Phi(\mathbf{f}, \mathbf{x}): saliency explanation for \mathbf{f} around \mathbf{x}
- \mathbf{I}: random variable describing input perturbations
- \Phi^*: optimal explanation function w.r.t. the proposed infidelity measure and a given \mathbf{I}
- \mu_\mathbf{I}: probability measure of \mathbf{I}
- \mathbf{e}_i: i-th coordinate basis vector
Summary
- Proposes objective measures to evaluate two desirable properties of saliency explanations
  - Saliency explanations predict model behavior under input perturbation
  - Infidelity: divergence between predicted and actual model behavior
  - Sensitivity: instability of the explanation under input perturbation
- Proposes a precise mathematical definition for these measures
  - Infidelity: parameterized by a random distribution of input perturbations \mathbf{I}
  - Sensitivity: parameterized by the radius of a hypersphere around the input
- Derives the optimal explanation function w.r.t. infidelity and a given \mathbf{I}
- Relates infidelity to existing explanation techniques
  - Shows they have optimal infidelity for specific choices of \mathbf{I}
- Proposes new explanation techniques based on optimal infidelity
- Shows that smoothing can (under specific circumstances) reduce both sensitivity and infidelity
- Experiments
Infidelity
- Probabilistic formulation of the squared difference between predicted behavior \mathbf{I}^T\Phi(\mathbf{f}, \mathbf{x}) and actual behavior \mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x} - \mathbf{I}):
  \text{INFD}(\Phi, \mathbf{f}, \mathbf{x}) = \mathbb{E}_{\mathbf{I} \sim \mu_\mathbf{I}}\left[\left(\mathbf{I}^T\Phi(\mathbf{f}, \mathbf{x}) - (\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x} - \mathbf{I}))\right)^2\right]
- Behavior depends on the choice of \mathbf{I}, which could be
  - deterministic or random
  - related to a baseline \mathbf{x}_0 or not
  - anything, really (which perhaps makes it less objective than advertised)
- The optimal explanation \Phi^* w.r.t. infidelity and a fixed \mathbf{I} is derived
  - not directly calculable, but can be estimated with Monte Carlo sampling
- Various common explainers are \Phi^* w.r.t. some \mathbf{I}
  - When \mathbf{I} = \epsilon \cdot \mathbf{e}_i, then \lim_{\epsilon \rightarrow 0} \Phi^*(\mathbf{f}, \mathbf{x}) = \nabla \mathbf{f}(\mathbf{x}) is the simple input gradient
  - When \mathbf{I} = \mathbf{e}_i \odot \mathbf{x}, then \Phi^*(\mathbf{f}, \mathbf{x}) \odot \mathbf{x} is the occlusion-1 explanation (i.e., change under removal of single pixels)
  - When \mathbf{I} = \mathbf{z} \odot \mathbf{x}, where \mathbf{z} is a random vector of zeroes and ones, then \Phi^*(\mathbf{f}, \mathbf{x}) \odot \mathbf{x} is the Shapley value
- The newly proposed explainers are \Phi^* w.r.t. different choices of \mathbf{I}
  - Noisy baseline: \mathbf{I} = \mathbf{x} - (\mathbf{x}_0 + \mathbf{\epsilon}) with noise vector \mathbf{\epsilon}
  - Square removal (for images): essentially Shapley values, but computed on square patches of predefined side length instead of single pixels
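The infidelity definition lends itself directly to Monte Carlo estimation. Below is a minimal sketch (the function name and the perturbation-sampling interface are my own, not from the paper): it averages, over sampled perturbations, the squared error between the explanation's predicted output change \mathbf{I}^T\Phi and the model's actual change \mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x} - \mathbf{I}).

```python
import numpy as np

def infidelity(model, explanation, x, sample_perturbation, n_samples=1000, rng=None):
    """Monte Carlo estimate of INFD(Phi, f, x).

    model:               f, maps an input vector to a scalar output
    explanation:         Phi(f, x), attribution vector with the same shape as x
    sample_perturbation: callable drawing one perturbation I ~ mu_I
    """
    rng = rng or np.random.default_rng(0)
    fx = model(x)
    errs = []
    for _ in range(n_samples):
        I = sample_perturbation(rng)
        predicted = I @ explanation   # I^T Phi(f, x): predicted output change
        actual = fx - model(x - I)    # f(x) - f(x - I): actual output change
        errs.append((predicted - actual) ** 2)
    return float(np.mean(errs))
```

Sanity check: for a linear model f(x) = w^T x explained by its gradient w, the predicted and actual changes coincide for every I, so the estimate is (numerically) zero regardless of the perturbation distribution.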
Explanation Sensitivity
- Sensitivity is the maximal change of the explanation vector within a hypersphere of radius r around \mathbf{x}:
  \text{SENS}_\text{MAX}(\Phi, \mathbf{f}, \mathbf{x}, r) = \max_{\|\mathbf{y} - \mathbf{x}\| \leq r} \left\|\Phi(\mathbf{f}, \mathbf{y}) - \Phi(\mathbf{f}, \mathbf{x})\right\|
- Can be robustly estimated through Monte Carlo sampling
- Bounded even if the explanation is not locally Lipschitz-continuous (e.g., the simple gradient in ReLU-activated networks)
- Sensitivity is reduced by smoothing of the explanation
  - Smoothed explanation \Phi_k(\mathbf{f}, \mathbf{x}) = \int_\mathbf{z} \Phi(\mathbf{f}, \mathbf{z}) k(\mathbf{x}, \mathbf{z}) d\mathbf{z} w.r.t. a kernel function k(\mathbf{x}, \mathbf{z}) (e.g., a Gaussian pdf)
  - \text{SENS}_\text{MAX}(\Phi_k, \mathbf{f}, \mathbf{x}, r) \leq \int_\mathbf{z} \text{SENS}_\text{MAX}(\Phi, \mathbf{f}, \mathbf{z}, r) k(\mathbf{x}, \mathbf{z}) d\mathbf{z}
  - If there are only a few hot spots where the unsmoothed explanation changes quickly, this kind of smoothing can reduce sensitivity dramatically
  - With a Gaussian kernel, \Phi_k is SmoothGrad
- Under specific circumstances, smoothing can also reduce infidelity (Theorem 4.2 in the paper; equations too long to reproduce here)
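The max-sensitivity measure can likewise be approximated by sampling. A minimal sketch (names and the ball-sampling scheme are my own, not from the paper): sample points uniformly from the ball \|\mathbf{y} - \mathbf{x}\| \leq r and keep the largest observed explanation change; since the max is taken over finitely many samples, this is a lower-bound estimate of the true SENS_MAX.

```python
import numpy as np

def sens_max(explainer, model, x, radius, n_samples=1000, rng=None):
    """Monte Carlo (lower-bound) estimate of SENS_MAX(Phi, f, x, r).

    explainer: Phi, maps (model, input) to an attribution vector
    """
    rng = rng or np.random.default_rng(0)
    base = explainer(model, x)
    worst = 0.0
    for _ in range(n_samples):
        # Draw y uniformly from the ball of the given radius around x:
        # random direction, then radius scaled by u^(1/d).
        d = rng.normal(size=x.shape)
        d = d / np.linalg.norm(d) * radius * rng.uniform() ** (1.0 / x.size)
        worst = max(worst, float(np.linalg.norm(explainer(model, x + d) - base)))
    return worst
```

For an explanation that is constant in a neighborhood of \mathbf{x} (e.g., the gradient of a linear model), the estimate is exactly zero.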
Experiments
- Experiments measure infidelity and sensitivity for common saliency explanations and the proposed new variants
- The noisy baseline shows particularly low infidelity and competitive sensitivity
- Square removal looks extremely successful w.r.t. infidelity, outdoing SHAP
- SmoothGrad reliably reduces both infidelity and sensitivity, albeit not impressively everywhere
  - Infidelity reduction is largely stable on ImageNet
  - Sensitivity reduction is somewhat stable on CIFAR-10
- How comparable are these values? With a different \mathbf{I}, INFD is a different quantity for each setup