

# Motivation






* Evaluation of explanation techniques is often subjective ("does this make sense to a human?")



* Objective measures would be desirable



* Sounder theoretical foundation



* Would enable systematic evaluation, improvement






# Notation






* $`\mathbf{f}`$: Model to be explained



* $`\mathbf{x}`$: Input datapoint



* $`\Phi(\mathbf{f}, \mathbf{x})`$: saliency explanation for $`\mathbf{f}`$ around $`\mathbf{x}`$



* $`\mathbf{I}`$: Random variable describing input perturbations



* $`\Phi^*`$: Optimal explanation function w.r.t. the proposed infidelity measure and a given $`\mathbf{I}`$



* $`\mu_\mathbf{I}`$: Probability measure for $`\mathbf{I}`$



* $`\mathbf{e}_i`$: coordinate basis vector






# Summary






* Proposes objective measures to evaluate two desirable properties of saliency explanations



  * Saliency explanations predict model behavior under input perturbation

  * Infidelity: Divergence between predicted and actual model behavior

  * Sensitivity: Instability of the explanation under input perturbation



* Proposes precise mathematical definitions for these measures

  * Infidelity: parameterized by a random distribution of input perturbations (**I**)

  * Sensitivity: parameterized by the radius of a hypersphere around the input



* Derives the optimal explanation function w.r.t. infidelity and a given **I**



* Relates infidelity to existing explanation techniques



  * Shows they have optimal infidelity for specific choices of **I**



* Proposes new explanation techniques based on optimal infidelity



  * each obtained by choosing a different **I**



* Shows that smoothing can (under specific circumstances) reduce both sensitivity and infidelity



* Experiments






# Infidelity






* Probabilistic formulation of the difference between predicted behavior $`\mathbf{I}^T\Phi(\mathbf{f}, \mathbf{x})`$ and actual behavior $`\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x} - \mathbf{I})`$



* $`\text{INFD}(\Phi, \mathbf{f}, \mathbf{x}) = \mathbb{E}_{\mathbf{I} \sim \mu_\mathbf{I}}\left[\left(\mathbf{I}^T\Phi(\mathbf{f}, \mathbf{x}) - (\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x} - \mathbf{I}))\right)^2\right]`$



* Behavior depends on choice of $`\mathbf{I}`$, which could be



  * deterministic or random

  * related to a baseline $`\mathbf{x}_0`$ or not

  * anything, really (which perhaps makes it less objective than advertised)



* Optimal explanation $`\Phi^*`$ w.r.t. infidelity and fixed $`\mathbf{I}`$ is derived



  * not directly calculable, but can be estimated with Monte Carlo sampling



* Various common explainers are $`\Phi^*`$ w.r.t. some $`\mathbf{I}`$



  * When $`\mathbf{I} = \epsilon \cdot \mathbf{e}_i`$, then $`\lim_{\epsilon \rightarrow 0} \Phi^*(\mathbf{f}, \mathbf{x}) = \nabla \mathbf{f}(\mathbf{x})`$ is the simple input gradient

  * When $`\mathbf{I} = \mathbf{e}_i \odot \mathbf{x}`$, then $`\Phi^*(\mathbf{f}, \mathbf{x}) \odot \mathbf{x}`$ is the occlusion-1 explanation (i.e., the prediction change under removal of single pixels)

  * When $`\mathbf{I} = \mathbf{z} \odot \mathbf{x}`$ where $`\mathbf{z}`$ is a random vector of zeroes and ones, then $`\Phi^*(\mathbf{f}, \mathbf{x}) \odot \mathbf{x}`$ is the Shapley value



* New proposed explainers are $`\Phi^*`$ w.r.t. different $`\mathbf{I}`$



  * Noisy baseline: $`\mathbf{I} = \mathbf{x} - (\mathbf{x}_0 + \mathbf{\epsilon})`$ with noise vector $`\mathbf{\epsilon}`$

  * Square removal (for images): essentially Shapley values, but on square patches of predefined side length instead of single pixels
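As a concrete reading of the INFD definition, here is a minimal Monte Carlo sketch in numpy (the names `infidelity` and `sample_I` are mine, not the paper's; the toy linear model just illustrates that the input gradient is a zero-infidelity explanation in that case):

```python
import numpy as np

def infidelity(explanation, f, x, sample_I, n_samples=1000):
    """Monte Carlo estimate of INFD(Phi, f, x) for a scalar-valued model f.

    explanation: saliency vector Phi(f, x), shape (d,)
    sample_I:    callable drawing one perturbation I ~ mu_I, shape (d,)
    """
    errs = []
    for _ in range(n_samples):
        I = sample_I()
        predicted = I @ explanation   # I^T Phi(f, x)
        actual = f(x) - f(x - I)      # actual model change under perturbation
        errs.append((predicted - actual) ** 2)
    return float(np.mean(errs))

# Toy check: for a linear model f(x) = w^T x, the input gradient w is a
# zero-infidelity explanation under any perturbation distribution.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
f = lambda x: w @ x
x = rng.normal(size=5)
gaussian_I = lambda: rng.normal(scale=0.1, size=5)
print(infidelity(w, f, x, gaussian_I))  # ~0 up to floating-point error
```

For a nonlinear model the same estimator returns a strictly positive value for any linear explanation, which is the quantity the paper's experiments compare.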






# Explanation Sensitivity






* Sensitivity is the maximal change of the explanation vector within a hypersphere around $`\mathbf{x}`$



* $`\text{SENS}_\text{MAX}(\Phi, \mathbf{f}, \mathbf{x}, r) = \max_{\left\|\mathbf{y} - \mathbf{x}\right\| \leq r} \left\|\Phi(\mathbf{f}, \mathbf{y}) - \Phi(\mathbf{f}, \mathbf{x})\right\|`$



* Can be robustly estimated through Monte Carlo sampling



* Bounded even if the explanation is not locally Lipschitz-continuous (e.g. the simple gradient in ReLU-activated networks)



* Sensitivity is reduced by smoothing of the explanation



  * Smoothed explanation $`\Phi_k(\mathbf{f}, \mathbf{x}) = \int_\mathbf{z} \Phi(\mathbf{f}, \mathbf{z}) k(\mathbf{x}, \mathbf{z}) d\mathbf{z}`$ w.r.t. a kernel function $`k(\mathbf{x}, \mathbf{z})`$ (e.g. a Gaussian pdf)

  * $`\text{SENS}_\text{MAX}(\Phi_k, \mathbf{f}, \mathbf{x}, r) \leq \int_\mathbf{z} \text{SENS}_\text{MAX}(\Phi, \mathbf{f}, \mathbf{z}, r) k(\mathbf{x}, \mathbf{z}) d\mathbf{z}`$

  * If the unsmoothed explanation changes quickly only in a few hot spots, this kind of smoothing can reduce sensitivity dramatically

  * With a Gaussian kernel, $`\Phi_k`$ is SmoothGrad



* Under specific circumstances, smoothing can also reduce infidelity (Theorem 4.2 in the paper; equations too long to put here)
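On the sensitivity side, a hypothetical numpy sketch of the Monte Carlo estimate (the sampling scheme and names are my own choices, not the paper's; the toy check uses a quadratic model whose gradient explanation has $`\text{SENS}_\text{MAX}`$ exactly $`r`$):

```python
import numpy as np

def sens_max(phi, f, x, r, n_samples=500, seed=0):
    """Monte Carlo (lower-bound) estimate of SENS_MAX(Phi, f, x, r):
    sample y uniformly from the radius-r ball around x and keep the
    largest change ||Phi(f, y) - Phi(f, x)||."""
    rng = np.random.default_rng(seed)
    base = phi(f, x)
    worst = 0.0
    for _ in range(n_samples):
        d = rng.normal(size=x.shape)  # random direction
        # rescale so y = x + d is uniform in the radius-r ball
        d *= r * rng.uniform() ** (1.0 / x.size) / np.linalg.norm(d)
        worst = max(worst, float(np.linalg.norm(phi(f, x + d) - base)))
    return worst

# Toy check: for f(x) = ||x||^2 / 2 the gradient explanation is Phi = x,
# so the explanation moves exactly as far as the input does and
# SENS_MAX(r) = r; the estimate approaches r from below.
grad = lambda f, x: x
est = sens_max(grad, None, np.ones(3), r=0.5)
print(est)  # slightly below 0.5
```

Since the maximum is only sampled, this always underestimates the true supremum; this is what makes the measure usable even for explanations without a local Lipschitz bound.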






# Experiments






* Experiments measure infidelity, sensitivity for common saliency explanations and the proposed new variants



* Noisy baseline shows particularly low infidelity, competitive sensitivity



* Square Removal looks extremely successful w.r.t. infidelity, outdoing SHAP



* SmoothGrad reliably reduces both infidelity and sensitivity, albeit not impressively everywhere



  * Infidelity largely stable on ImageNet

  * Sensitivity somewhat stable on CIFAR-10



* How comparable are these values? With a different $`\mathbf{I}`$, INFD measures a different quantity in each setup