# Motivation

* Evaluation of explanation techniques is often subjective ("does this make sense to a human?")
* Objective measures would be desirable:
  * they would give the field a sounder theoretical foundation
  * they would enable systematic evaluation and improvement

# Notation

* $`\mathbf{f}`$: model to be explained
* $`\mathbf{x}`$: input datapoint
* $`\Phi(\mathbf{f}, \mathbf{x})`$: saliency explanation for $`\mathbf{f}`$ around $`\mathbf{x}`$
* $`\mathbf{I}`$: random variable describing input perturbations
* $`\mu_\mathbf{I}`$: probability measure for $`\mathbf{I}`$
* $`\Phi^*`$: optimal explanation function w.r.t. the proposed infidelity measure and a given $`\mathbf{I}`$
* $`\mathbf{e}_i`$: $`i`$-th coordinate basis vector

# Summary

* Proposes objective measures to evaluate two desirable properties of saliency explanations:
  * Premise: saliency explanations predict model behavior under input perturbation
  * Infidelity: divergence between the predicted and the actual model behavior
  * Sensitivity: instability of the explanation under input perturbation
* Proposes precise mathematical definitions for these measures:
  * Infidelity is parameterized by a random distribution of input perturbations ($`\mathbf{I}`$)
  * Sensitivity is parameterized by the radius of a hypersphere around the input
* Derives the optimal explanation function w.r.t. infidelity and a given $`\mathbf{I}`$
* Relates infidelity to existing explanation techniques:
  * shows they have optimal infidelity for specific choices of $`\mathbf{I}`$
* Proposes new explanation techniques based on optimal infidelity:
  * each chooses a different $`\mathbf{I}`$
* Shows that smoothing can (under specific circumstances) reduce both sensitivity and infidelity
* Experiments

# Infidelity

* Probabilistic formulation of the difference between the predicted behavior $`\mathbf{I}^T\Phi(\mathbf{f}, \mathbf{x})`$ and the actual behavior $`\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x} - \mathbf{I})`$:
  * $`\text{INFD}(\Phi, \mathbf{f}, \mathbf{x}) = \mathbb{E}_{\mathbf{I} \sim \mu_\mathbf{I}}\left[\left(\mathbf{I}^T\Phi(\mathbf{f}, \mathbf{x}) - (\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x} - \mathbf{I}))\right)^2\right]`$
* Behavior depends on the choice of $`\mathbf{I}`$, which can be
  * deterministic or random,
  * related to a baseline $`\mathbf{x}_0`$ or not,
  * anything, really (which perhaps makes the measure less objective than advertised)

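The expectation above is straightforward to estimate numerically. A minimal numpy sketch (the linear toy model `f`, the explanation `phi`, and the Gaussian choice of `I` are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def infidelity(f, x, phi, sample_I, n_samples=10_000, seed=0):
    """Monte Carlo estimate of INFD(phi, f, x) for a scalar-valued model f."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_samples):
        I = sample_I(rng)
        predicted = I @ phi            # predicted change: I^T Phi(f, x)
        actual = f(x) - f(x - I)       # actual change of the model output
        errs.append((predicted - actual) ** 2)
    return float(np.mean(errs))

# For a linear model f(x) = w^T x, the gradient w predicts the effect of any
# perturbation perfectly, so its infidelity is (numerically) zero.
w = np.array([1.0, -2.0, 3.0])
f = lambda x: w @ x
x = np.array([0.5, 0.1, -0.3])
gaussian_I = lambda rng: rng.normal(size=x.shape)

print(infidelity(f, x, phi=w, sample_I=gaussian_I))            # ~0.0
print(infidelity(f, x, phi=np.zeros(3), sample_I=gaussian_I))  # ~||w||^2 = 14
```

Different choices of `sample_I` encode different measures $`\mu_\mathbf{I}`$, and thus different notions of fidelity.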

* The optimal explanation $`\Phi^*`$ w.r.t. infidelity and a fixed $`\mathbf{I}`$ is derived
  * it is not directly computable, but can be estimated with Monte Carlo sampling
* Various common explainers are $`\Phi^*`$ w.r.t. some $`\mathbf{I}`$:
  * When $`\mathbf{I} = \epsilon \cdot \mathbf{e}_i`$, then $`\lim_{\epsilon \rightarrow 0} \Phi^*(\mathbf{f}, \mathbf{x}) = \nabla \mathbf{f}(\mathbf{x})`$, the simple input gradient
  * When $`\mathbf{I} = \mathbf{e}_i \odot \mathbf{x}`$, then $`\Phi^*(\mathbf{f}, \mathbf{x}) \odot \mathbf{x}`$ is the occlusion-1 explanation (i.e., the output change under removal of single pixels)
  * When $`\mathbf{I} = \mathbf{z} \odot \mathbf{x}`$, where $`\mathbf{z}`$ is a random vector of zeroes and ones, then $`\Phi^*(\mathbf{f}, \mathbf{x}) \odot \mathbf{x}`$ is the Shapley value

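To make the occlusion-1 case concrete: with $`\mathbf{I} = \mathbf{e}_i \odot \mathbf{x}`$, the perturbed input $`\mathbf{x} - \mathbf{I}`$ is simply $`\mathbf{x}`$ with feature $`i`$ zeroed out. A small sketch (the linear toy model is an assumption for demonstration):

```python
import numpy as np

def occlusion_1(f, x):
    """Occlusion-1 attribution: the output change when each feature is removed.

    Setting I = e_i * x turns x - I into a copy of x with feature i zeroed,
    so the attribution for feature i is f(x) - f(x with x_i = 0).
    """
    attr = np.empty_like(x, dtype=float)
    for i in range(x.size):
        x_occluded = x.copy()
        x_occluded[i] = 0.0
        attr[i] = f(x) - f(x_occluded)
    return attr

# For a linear model f(x) = w^T x, the attribution is exactly w_i * x_i.
w = np.array([2.0, -1.0, 0.5])
f = lambda x: w @ x
x = np.array([1.0, 3.0, -2.0])
print(occlusion_1(f, x))   # [ 2. -3. -1.]
```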

* The newly proposed explainers are $`\Phi^*`$ w.r.t. different choices of $`\mathbf{I}`$:
  * Noisy baseline: $`\mathbf{I} = \mathbf{x} - (\mathbf{x}_0 + \mathbf{\epsilon})`$ with a noise vector $`\mathbf{\epsilon}`$
  * Square removal (for images): essentially Shapley values, but computed on square patches of predefined side length instead of single pixels

# Explanation Sensitivity

* Sensitivity is the maximal change of the explanation vector within a hypersphere of radius $`r`$ around $`\mathbf{x}`$:
  * $`\text{SENS}_\text{MAX}(\Phi, \mathbf{f}, \mathbf{x}, r) = \max_{\left\|\mathbf{y} - \mathbf{x}\right\| \leq r} \left\|\Phi(\mathbf{f}, \mathbf{y}) - \Phi(\mathbf{f}, \mathbf{x})\right\|`$
  * can be robustly estimated through Monte Carlo sampling
  * remains bounded even if the explanation is not locally Lipschitz continuous (e.g., the simple gradient in ReLU-activated networks)

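A sketch of that Monte Carlo estimate, which lower-bounds the max by sampling the ball of radius $`r`$ uniformly (the gradient-of-ReLU explainer below is an illustrative stand-in for a real saliency method):

```python
import numpy as np

def sens_max(explain, x, r, n_samples=1_000, seed=0):
    """Monte Carlo estimate (a lower bound) of SENS_MAX for explainer `explain`:
    the largest observed ||Phi(y) - Phi(x)|| over samples y with ||y - x|| <= r."""
    rng = np.random.default_rng(seed)
    phi_x = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        u = rng.normal(size=x.shape)                                 # random direction
        u *= r * rng.uniform() ** (1 / x.size) / np.linalg.norm(u)   # uniform in the r-ball
        worst = max(worst, np.linalg.norm(explain(x + u) - phi_x))
    return worst

# Gradient explanation of f(x) = sum(relu(x)): it flips between 0 and 1 at a
# kink, so near x_1 = 0 the sensitivity jumps to 1 even for a small radius.
grad_relu = lambda x: (x > 0).astype(float)
print(sens_max(grad_relu, np.array([0.05, 1.0]), r=0.1))  # 1.0 (kink inside the ball)
print(sens_max(grad_relu, np.array([1.0, 1.0]), r=0.1))   # 0.0 (no kink nearby)
```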

* Sensitivity is reduced by smoothing the explanation:
  * Smoothed explanation $`\Phi_k(\mathbf{f}, \mathbf{x}) = \int_\mathbf{z} \Phi(\mathbf{f}, \mathbf{z}) \, k(\mathbf{x}, \mathbf{z}) \, d\mathbf{z}`$ w.r.t. a kernel function $`k(\mathbf{x}, \mathbf{z})`$ (e.g., a Gaussian pdf); note that $`\Phi`$ is evaluated at $`\mathbf{z}`$, not at $`\mathbf{x}`$
  * $`\text{SENS}_\text{MAX}(\Phi_k, \mathbf{f}, \mathbf{x}, r) \leq \int_\mathbf{z} \text{SENS}_\text{MAX}(\Phi, \mathbf{f}, \mathbf{z}, r) \, k(\mathbf{x}, \mathbf{z}) \, d\mathbf{z}`$, i.e., a kernel-weighted average of the sensitivities around $`\mathbf{x}`$
  * if the unsmoothed explanation changes quickly only in a few hot spots, this kind of smoothing can reduce sensitivity dramatically
  * with a Gaussian kernel, $`\Phi_k`$ is Smooth-Grad
* Under specific circumstances, smoothing can also reduce infidelity (Theorem 4.2 in the paper; the equations are too long to reproduce here)

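A Monte Carlo sketch of the Gaussian-kernel smoothing (i.e., a Smooth-Grad-style average; the ReLU-gradient explainer is again an illustrative stand-in):

```python
import numpy as np

def smooth(explain, sigma=0.5, n_samples=200, seed=0):
    """Gaussian-kernel smoothing of an explanation, estimated by Monte Carlo:
    Phi_k(x) is approximated as the average of Phi over N(x, sigma^2) samples."""
    def smoothed(x):
        rng = np.random.default_rng(seed)
        samples = [explain(x + rng.normal(scale=sigma, size=x.shape))
                   for _ in range(n_samples)]
        return np.mean(samples, axis=0)
    return smoothed

# The raw gradient of relu jumps from 0 to 1 at the kink; the smoothed version
# interpolates gently, so its sensitivity near the kink is much lower.
grad_relu = lambda x: (x > 0).astype(float)
sg = smooth(grad_relu)
print(grad_relu(np.array([0.01])))  # [1.]  (hard jump just right of the kink)
print(sg(np.array([0.01])))         # roughly [0.5], the average over the kernel
```

Fixing the seed inside `smoothed` makes the smoothed explanation a deterministic function of `x`, which matters when its own sensitivity is estimated afterwards.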
# Experiments

* The experiments measure infidelity and sensitivity for common saliency explanations and the proposed new variants:
  * Noisy baseline shows particularly low infidelity and competitive sensitivity
  * Square removal looks extremely successful w.r.t. infidelity, outdoing SHAP
  * Smooth-Grad reliably reduces both infidelity and sensitivity, albeit not impressively everywhere
  * Infidelity is largely stable on ImageNet
  * Sensitivity is somewhat stable on Cifar-10
* How comparable are these values? With a different $`\mathbf{I}`$ per setup, INFD measures something different in each case