# Motivation

* Evaluation of explanation techniques is often subjective ("does this make sense to a human?")
* Objective measures would be desirable:
  * they would give the field a sounder theoretical foundation
  * they would enable systematic evaluation and improvement

# Notation

* $`\mathbf{f}`$: model to be explained
* $`\mathbf{x}`$: input datapoint
* $`\Phi(\mathbf{f}, \mathbf{x})`$: saliency explanation for $`\mathbf{f}`$ around $`\mathbf{x}`$
* $`\mathbf{I}`$: random variable describing input perturbations
* $`\mu_\mathbf{I}`$: probability measure for $`\mathbf{I}`$
* $`\Phi^*`$: optimal explanation function w.r.t. the proposed infidelity measure and a given $`\mathbf{I}`$
* $`\mathbf{e}_i`$: $`i`$-th coordinate basis vector

# Summary

* Proposes objective measures to evaluate two desirable properties of saliency explanations:
  * Premise: saliency explanations predict model behavior under input perturbation
  * Infidelity: divergence between the predicted and the actual model behavior
  * Sensitivity: instability of the explanation under input perturbation
* Proposes precise mathematical definitions for these measures:
  * Infidelity is parameterized by a random distribution of input perturbations ($`\mathbf{I}`$)
  * Sensitivity is parameterized by the radius of a hypersphere around the input
* Derives the optimal explanation function w.r.t. infidelity and a given $`\mathbf{I}`$
* Relates infidelity to existing explanation techniques:
  * shows they have optimal infidelity for specific choices of $`\mathbf{I}`$
* Proposes new explanation techniques based on optimal infidelity:
  * each chooses a different $`\mathbf{I}`$
* Shows that smoothing can (under specific circumstances) reduce both sensitivity and infidelity
* Experiments

# Infidelity

* Probabilistic formulation of the difference between the predicted behavior $`\mathbf{I}^T\Phi(\mathbf{f}, \mathbf{x})`$ and the actual behavior $`\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x} - \mathbf{I})`$:
  * $`\text{INFD}(\Phi, \mathbf{f}, \mathbf{x}) = \mathbb{E}_{\mathbf{I} \sim \mu_\mathbf{I}}\left[\left(\mathbf{I}^T\Phi(\mathbf{f}, \mathbf{x}) - (\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x} - \mathbf{I}))\right)^2\right]`$
* Behavior depends on the choice of $`\mathbf{I}`$, which can be
  * deterministic or random,
  * related to a baseline $`\mathbf{x}_0`$ or not,
  * anything, really (which perhaps makes the measure less objective than advertised)

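The expectation above is straightforward to estimate numerically. A minimal numpy sketch (the linear toy model `f`, the explanation `phi`, and the Gaussian choice of `I` are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def infidelity(f, x, phi, sample_I, n_samples=10_000, seed=0):
    """Monte Carlo estimate of INFD(phi, f, x) for a scalar-valued model f."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_samples):
        I = sample_I(rng)
        predicted = I @ phi            # predicted change: I^T Phi(f, x)
        actual = f(x) - f(x - I)       # actual change of the model output
        errs.append((predicted - actual) ** 2)
    return float(np.mean(errs))

# For a linear model f(x) = w^T x, the gradient w predicts the effect of any
# perturbation perfectly, so its infidelity is (numerically) zero.
w = np.array([1.0, -2.0, 3.0])
f = lambda x: w @ x
x = np.array([0.5, 0.1, -0.3])
gaussian_I = lambda rng: rng.normal(size=x.shape)

print(infidelity(f, x, phi=w, sample_I=gaussian_I))            # ~0.0
print(infidelity(f, x, phi=np.zeros(3), sample_I=gaussian_I))  # ~||w||^2 = 14
```

Different choices of `sample_I` encode different measures $`\mu_\mathbf{I}`$, and thus different notions of fidelity.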

* The optimal explanation $`\Phi^*`$ w.r.t. infidelity and a fixed $`\mathbf{I}`$ is derived
  * it is not directly computable, but can be estimated with Monte Carlo sampling
* Various common explainers are $`\Phi^*`$ w.r.t. some $`\mathbf{I}`$:
  * When $`\mathbf{I} = \epsilon \cdot \mathbf{e}_i`$, then $`\lim_{\epsilon \rightarrow 0} \Phi^*(\mathbf{f}, \mathbf{x}) = \nabla \mathbf{f}(\mathbf{x})`$, the simple input gradient
  * When $`\mathbf{I} = \mathbf{e}_i \odot \mathbf{x}`$, then $`\Phi^*(\mathbf{f}, \mathbf{x}) \odot \mathbf{x}`$ is the occlusion-1 explanation (i.e., the output change under removal of single pixels)
  * When $`\mathbf{I} = \mathbf{z} \odot \mathbf{x}`$, where $`\mathbf{z}`$ is a random vector of zeroes and ones, then $`\Phi^*(\mathbf{f}, \mathbf{x}) \odot \mathbf{x}`$ is the Shapley value

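To make the occlusion-1 case concrete: with $`\mathbf{I} = \mathbf{e}_i \odot \mathbf{x}`$, the perturbed input $`\mathbf{x} - \mathbf{I}`$ is simply $`\mathbf{x}`$ with feature $`i`$ zeroed out. A small sketch (the linear toy model is an assumption for demonstration):

```python
import numpy as np

def occlusion_1(f, x):
    """Occlusion-1 attribution: the output change when each feature is removed.

    Setting I = e_i * x turns x - I into a copy of x with feature i zeroed,
    so the attribution for feature i is f(x) - f(x with x_i = 0).
    """
    attr = np.empty_like(x, dtype=float)
    for i in range(x.size):
        x_occluded = x.copy()
        x_occluded[i] = 0.0
        attr[i] = f(x) - f(x_occluded)
    return attr

# For a linear model f(x) = w^T x, the attribution is exactly w_i * x_i.
w = np.array([2.0, -1.0, 0.5])
f = lambda x: w @ x
x = np.array([1.0, 3.0, -2.0])
print(occlusion_1(f, x))   # [ 2. -3. -1.]
```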

* The newly proposed explainers are $`\Phi^*`$ w.r.t. different choices of $`\mathbf{I}`$:
  * Noisy baseline: $`\mathbf{I} = \mathbf{x} - (\mathbf{x}_0 + \mathbf{\epsilon})`$ with a noise vector $`\mathbf{\epsilon}`$
  * Square removal (for images): essentially Shapley values, but computed on square patches of predefined side length instead of single pixels

# Explanation Sensitivity

* Sensitivity is the maximal change of the explanation vector within a hypersphere of radius $`r`$ around $`\mathbf{x}`$:
  * $`\text{SENS}_\text{MAX}(\Phi, \mathbf{f}, \mathbf{x}, r) = \max_{\left\|\mathbf{y} - \mathbf{x}\right\| \leq r} \left\|\Phi(\mathbf{f}, \mathbf{y}) - \Phi(\mathbf{f}, \mathbf{x})\right\|`$
  * can be robustly estimated through Monte Carlo sampling
  * remains bounded even if the explanation is not locally Lipschitz continuous (e.g., the simple gradient in ReLU-activated networks)

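A sketch of that Monte Carlo estimate, which lower-bounds the max by sampling the ball of radius $`r`$ uniformly (the gradient-of-ReLU explainer below is an illustrative stand-in for a real saliency method):

```python
import numpy as np

def sens_max(explain, x, r, n_samples=1_000, seed=0):
    """Monte Carlo estimate (a lower bound) of SENS_MAX for explainer `explain`:
    the largest observed ||Phi(y) - Phi(x)|| over samples y with ||y - x|| <= r."""
    rng = np.random.default_rng(seed)
    phi_x = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        u = rng.normal(size=x.shape)                                 # random direction
        u *= r * rng.uniform() ** (1 / x.size) / np.linalg.norm(u)   # uniform in the r-ball
        worst = max(worst, np.linalg.norm(explain(x + u) - phi_x))
    return worst

# Gradient explanation of f(x) = sum(relu(x)): it flips between 0 and 1 at a
# kink, so near x_1 = 0 the sensitivity jumps to 1 even for a small radius.
grad_relu = lambda x: (x > 0).astype(float)
print(sens_max(grad_relu, np.array([0.05, 1.0]), r=0.1))  # 1.0 (kink inside the ball)
print(sens_max(grad_relu, np.array([1.0, 1.0]), r=0.1))   # 0.0 (no kink nearby)
```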

* Sensitivity is reduced by smoothing the explanation:
  * Smoothed explanation $`\Phi_k(\mathbf{f}, \mathbf{x}) = \int_\mathbf{z} \Phi(\mathbf{f}, \mathbf{z}) \, k(\mathbf{x}, \mathbf{z}) \, d\mathbf{z}`$ w.r.t. a kernel function $`k(\mathbf{x}, \mathbf{z})`$ (e.g., a Gaussian pdf); note that $`\Phi`$ is evaluated at $`\mathbf{z}`$, not at $`\mathbf{x}`$
  * $`\text{SENS}_\text{MAX}(\Phi_k, \mathbf{f}, \mathbf{x}, r) \leq \int_\mathbf{z} \text{SENS}_\text{MAX}(\Phi, \mathbf{f}, \mathbf{z}, r) \, k(\mathbf{x}, \mathbf{z}) \, d\mathbf{z}`$, i.e., a kernel-weighted average of the sensitivities around $`\mathbf{x}`$
  * if the unsmoothed explanation changes quickly only in a few hot spots, this kind of smoothing can reduce sensitivity dramatically
  * with a Gaussian kernel, $`\Phi_k`$ is Smooth-Grad
* Under specific circumstances, smoothing can also reduce infidelity (Theorem 4.2 in the paper; the equations are too long to reproduce here)

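A Monte Carlo sketch of the Gaussian-kernel smoothing (i.e., a Smooth-Grad-style average; the ReLU-gradient explainer is again an illustrative stand-in):

```python
import numpy as np

def smooth(explain, sigma=0.5, n_samples=200, seed=0):
    """Gaussian-kernel smoothing of an explanation, estimated by Monte Carlo:
    Phi_k(x) is approximated as the average of Phi over N(x, sigma^2) samples."""
    def smoothed(x):
        rng = np.random.default_rng(seed)
        samples = [explain(x + rng.normal(scale=sigma, size=x.shape))
                   for _ in range(n_samples)]
        return np.mean(samples, axis=0)
    return smoothed

# The raw gradient of relu jumps from 0 to 1 at the kink; the smoothed version
# interpolates gently, so its sensitivity near the kink is much lower.
grad_relu = lambda x: (x > 0).astype(float)
sg = smooth(grad_relu)
print(grad_relu(np.array([0.01])))  # [1.]  (hard jump just right of the kink)
print(sg(np.array([0.01])))         # roughly [0.5], the average over the kernel
```

Fixing the seed inside `smoothed` makes the smoothed explanation a deterministic function of `x`, which matters when its own sensitivity is estimated afterwards.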
# Experiments

* The experiments measure infidelity and sensitivity for common saliency explanations and the proposed new variants:
  * Noisy baseline shows particularly low infidelity and competitive sensitivity
  * Square removal looks extremely successful w.r.t. infidelity, outdoing SHAP
  * Smooth-Grad reliably reduces both infidelity and sensitivity, albeit not impressively everywhere
  * Infidelity is largely stable on ImageNet
  * Sensitivity is somewhat stable on Cifar-10
* How comparable are these values? With a different $`\mathbf{I}`$ per setup, INFD measures something different in each case