On the (In)fidelity and Sensitivity of Explanations

Motivation

  • Evaluation of explanation techniques is often subjective ("does this make sense to a human?")
  • Objective measures would be desirable
    • Sounder theoretical foundation
    • Would enable systematic evaluation and improvement

Notation

  • \mathbf{f}: Model to be explained
  • \mathbf{x}: Input datapoint
  • \Phi(\mathbf{f}, \mathbf{x}): Saliency explanation for \mathbf{f} around \mathbf{x}
  • \mathbf{I}: Random variable describing input perturbations
  • \Phi^*: Optimal explanation function w.r.t. the proposed infidelity measure and a given \mathbf{I}
  • \mu_\mathbf{I}: Probability measure for \mathbf{I}
  • \mathbf{e}_i: Coordinate basis vector

Summary

  • Proposes objective measures to evaluate two desirable properties of saliency explanations
    • Saliency explanations predict model behavior under input perturbation
    • Infidelity: Divergence between predicted and actual model behavior
    • Sensitivity: Instability of the explanation under input perturbation
  • Proposes precise mathematical definitions for these measures
    • Infidelity: parameterized by a random distribution of input perturbations \mathbf{I}
    • Sensitivity: parameterized by the radius of a hypersphere around the input
  • Derives the optimal explanation function w.r.t. infidelity and a given \mathbf{I}
  • Relates infidelity to existing explanation techniques
    • Shows they have optimal infidelity for specific choices of \mathbf{I}
  • Proposes new explanation techniques based on optimal infidelity
    • Obtained by choosing a different \mathbf{I}
  • Shows that smoothing can (under specific circumstances) reduce both sensitivity and infidelity
  • Experiments

Infidelity

  • Probabilistic formulation of the gap between the output change predicted by the explanation, \mathbf{I}^T\Phi(\mathbf{f}, \mathbf{x}), and the actual change in model output, \mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x} - \mathbf{I}) (a Monte Carlo estimation sketch follows this list)
  • \text{INFD}(\Phi, \mathbf{f}, \mathbf{x}) = \mathbb{E}_{\mathbf{I} \sim \mu_\mathbf{I}}\left[\left(\mathbf{I}^T\Phi(\mathbf{f}, \mathbf{x}) - (\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x} - \mathbf{I}))\right)^2\right]
  • Behavior depends on choice of \mathbf{I}, which could be
    • deterministic or random
    • related to a baseline \mathbf{x}_0 or not
    • anything, really (which perhaps makes it less objective than advertised)
  • Optimal explanation \Phi^* w.r.t. infidelity and fixed \mathbf{I} is derived
    • not directly calculable but can be estimated with Monte Carlo sampling
  • Various common explainers are \Phi^* w.r.t. some \mathbf{I}
    • When \mathbf{I} = \epsilon \cdot \mathbf{e}_i, then \lim_{\epsilon \rightarrow 0} \Phi^*(\mathbf{f}, \mathbf{x}) = \nabla \mathbf{f}(\mathbf{x}) is the simple input gradient
    • When \mathbf{I} = \mathbf{e}_i \odot \mathbf{x}, then \Phi^*(\mathbf{f}, \mathbf{x}) \odot \mathbf{x} is the occlusion-1 explanation (i.e., change under removal of single pixels)
    • When \mathbf{I} = \mathbf{z} \odot \mathbf{x} where \mathbf{z} is a random vector of zeroes and ones, then \Phi^*(\mathbf{f}, \mathbf{x}) \odot \mathbf{x} is the Shapley value
  • New proposed explainers are \Phi^* w.r.t. different \mathbf{I}
    • Noisy baseline: \mathbf{I} = \mathbf{x} - (\mathbf{x}_0 + \mathbf{\epsilon}) with noise vector \mathbf{\epsilon}
    • Square removal (for images): essentially Shapley values but on square patches of predefined side length instead of single pixels
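
A minimal sketch of the Monte Carlo estimate referenced above, assuming NumPy and a scalar-valued model callable; the names (`estimate_infidelity`, `sample_perturbation`, `noisy_baseline_sampler`) are illustrative and not taken from the paper's code:

```python
import numpy as np

def estimate_infidelity(model, explanation, x, sample_perturbation, n_samples=1000):
    """Monte Carlo estimate of INFD(Phi, f, x).

    model: callable returning a scalar, e.g. the logit of the explained class (assumed interface)
    explanation: saliency vector Phi(f, x), same shape as x
    sample_perturbation: callable returning one draw I ~ mu_I, same shape as x
    """
    fx = model(x)
    errors = []
    for _ in range(n_samples):
        I = sample_perturbation()
        predicted = np.dot(I.ravel(), explanation.ravel())  # I^T Phi(f, x)
        actual = fx - model(x - I)                          # f(x) - f(x - I)
        errors.append((predicted - actual) ** 2)
    return float(np.mean(errors))

# Example perturbation for the "noisy baseline" variant: I = x - (x_0 + eps), eps Gaussian
def noisy_baseline_sampler(x, x0, sigma=0.1):
    return lambda: x - (x0 + sigma * np.random.randn(*x.shape))
```

Passing `noisy_baseline_sampler(x, x0)` as `sample_perturbation` evaluates infidelity under the noisy-baseline perturbation described above; other choices of \mathbf{I} are plugged in the same way.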

Explanation Sensitivity

  • Sensitivity is the maximal change of the explanation vector within a hypersphere of radius r around \mathbf{x}
  • \text{SENS}_\text{MAX}(\Phi, \mathbf{f}, \mathbf{x}, r) = \max_{\left\|\mathbf{y} - \mathbf{x}\right\| \leq r} \left\|\Phi(\mathbf{f}, \mathbf{y}) - \Phi(\mathbf{f}, \mathbf{x})\right\|
  • Can be robustly estimated through Monte Carlo sampling (see the sketch after this list)
  • Bounded even if the explanation is not locally Lipschitz-continuous (e.g. simple gradient in ReLU-activated networks)
  • Sensitivity is reduced by smoothing of the explanation
    • Smoothed explanation \Phi_k(\mathbf{f}, \mathbf{x}) = \int_\mathbf{z} \Phi(\mathbf{f}, \mathbf{z}) k(\mathbf{x}, \mathbf{z}) d\mathbf{z} w.r.t. a kernel function k(\mathbf{x}, \mathbf{z}) (e.g. Gaussian pdf)
    • \text{SENS}_\text{MAX}(\Phi_k, \mathbf{f}, \mathbf{x}, r) \leq \int_\mathbf{z} \text{SENS}_\text{MAX}(\Phi, \mathbf{f}, \mathbf{z}, r) k(\mathbf{x}, \mathbf{z}) d\mathbf{z}
    • If there are only some hot spots where the unsmoothed explanation changes quickly, this kind of smoothing could reduce sensitivity dramatically
    • With a Gaussian kernel (and the plain input gradient as the base explanation), \Phi_k is Smooth-Grad
  • Under specific circumstances, smoothing can also reduce infidelity (Theorem 4.2 in the paper; equations too long to put here)
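
Likewise, a minimal sketch of a Monte Carlo estimate of \text{SENS}_\text{MAX} (a lower bound, since the max is taken over sampled points only) and of Gaussian-kernel smoothing of an explainer; the `explainer(model, x)` interface is an assumption for illustration, not the paper's code:

```python
import numpy as np

def estimate_sensitivity_max(explainer, model, x, radius, n_samples=100):
    """Monte Carlo lower bound on SENS_MAX(Phi, f, x, r): sample points y with
    ||y - x|| <= r and track the largest change of the explanation."""
    phi_x = explainer(model, x)
    worst = 0.0
    for _ in range(n_samples):
        direction = np.random.randn(*x.shape)
        direction /= np.linalg.norm(direction)
        y = x + np.random.uniform(0.0, radius) * direction
        worst = max(worst, np.linalg.norm(explainer(model, y) - phi_x))
    return worst

def smooth_explainer(explainer, sigma=0.1, n_samples=50):
    """Gaussian-kernel smoothing Phi_k: average the explanation over z ~ N(x, sigma^2 I).
    With the plain input gradient as the base explainer this corresponds to Smooth-Grad."""
    def phi_k(model, x):
        samples = [explainer(model, x + sigma * np.random.randn(*x.shape))
                   for _ in range(n_samples)]
        return np.mean(samples, axis=0)
    return phi_k
```

Comparing `estimate_sensitivity_max` for an explainer and its smoothed version illustrates the bound above: where the unsmoothed explanation changes quickly only in a few spots, the smoothed version is markedly less sensitive.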

Experiments

  • Experiments measure infidelity and sensitivity for common saliency explanations and the proposed new variants
  • Noisy baseline shows particularly low infidelity, competitive sensitivity
  • Square Removal looks extremely successful w.r.t. infidelity, outdoing SHAP
  • Smooth-Grad reliably reduces both infidelity and sensitivity, albeit not impressively everywhere
    • Infidelity largely stable on ImageNet
    • Sensitivity somewhat stable on CIFAR-10
  • How comparable are these values? With a different \mathbf{I} per setup, INFD measures something different in each case