bam

Benchmarking Attribution Methods with Ground Truth (Relative Feature Importance)

Paper by Yang and Kim (Google Research). A short version appeared at the HCML workshop at NeurIPS 2019; the full version is under review at AISTATS 2020.

  • output of interpretability methods often assessed by humans
    • qualitative assessment vulnerable to bias and subjectivity
    • just because an explanation makes sense to a human does not mean it is correct
  • Assessment metrics should capture the mismatch between the interpretation and the model's rationale behind its prediction
  • Here: focus on false positive set of explanations (feature importances)
    • set of features attributed as important, with ground-truth that they are not

Key Idea:

  • Build a semi-natural dataset with pixel-wise labels for feature importance and train a set of models on it
    • here: MSCOCO objects pasted into MiniPlaces scene images (Fig. 1); a minimal pasting sketch follows below
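
A minimal sketch of the pasting step, assuming each object crop comes with a binary segmentation mask and that the crop fits inside the scene; `paste_object` and its signature are illustrative, not the paper's released code:

```python
import numpy as np


def paste_object(scene, obj, obj_mask, top_left):
    """Paste an object crop into a scene image and return the composite
    plus a pixel-wise ground-truth mask marking the pasted object.

    scene:    (H, W, 3) uint8 scene image (e.g. a MiniPlaces image)
    obj:      (h, w, 3) uint8 object crop (e.g. cut from MSCOCO)
    obj_mask: (h, w) bool segmentation mask of the object crop
    top_left: (row, col) where the crop is placed inside the scene
    """
    composite = scene.copy()
    ground_truth = np.zeros(scene.shape[:2], dtype=bool)

    r, c = top_left
    h, w = obj_mask.shape
    region = composite[r:r + h, c:c + w]
    region[obj_mask] = obj[obj_mask]            # overwrite scene pixels with object pixels
    ground_truth[r:r + h, c:c + w] = obj_mask   # pasted pixels are the "object" region

    return composite, ground_truth
```

Attaching object labels to the composites makes the pasted pixels important to an object classifier; attaching scene labels makes the same pixels (ideally) unimportant, which is what the contrast metrics below exploit.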

Quantify the extent to which a method incorrectly attributes unimportant features, using metrics that contrast attributions

  • between models (model dependence)
  • between inputs (input dependence and input independence)

Model contrast score (MCS):

  • measure differences in concept attributions between models

Input dependence rate (IDR):

  • measure the percentage of correctly classified images where an object is falsely attributed as more important than the scene region it replaces

Input independence rate (IIR):

  • measure the percentage of images whose attributions stay (near) unchanged when a functionally unimportant patch is added to the input

(toy computation sketch below)
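
A rough sketch of how MCS and IDR could be computed from saliency maps and the pixel-wise ground-truth masks; aggregating by mean absolute attribution inside a region and the function names are assumptions, not the paper's reference implementation (IIR is omitted here):

```python
import numpy as np


def concept_attribution(attr, mask):
    """Mean absolute attribution inside a region (one way to aggregate)."""
    return np.abs(attr)[mask].mean()


def model_contrast_score(attrs_object_model, attrs_scene_model, obj_masks):
    """MCS: how much more attribution the pasted object receives from a model
    that needs it (object classifier) than from one that should ignore it
    (scene classifier). Larger is better."""
    diffs = [
        concept_attribution(a_obj, m) - concept_attribution(a_scene, m)
        for a_obj, a_scene, m in zip(attrs_object_model, attrs_scene_model, obj_masks)
    ]
    return float(np.mean(diffs))


def input_dependence_rate(attrs_with_object, attrs_scene_only, obj_masks, correct):
    """IDR: fraction of correctly classified images where the (functionally
    unimportant) pasted object is attributed as more important than the scene
    region it replaced in the object-free image. Lower is better."""
    false_positives = [
        concept_attribution(a_with, m) > concept_attribution(a_without, m)
        for a_with, a_without, m, ok in zip(attrs_with_object, attrs_scene_only, obj_masks, correct)
        if ok
    ]
    return float(np.mean(false_positives))
```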

Testing for false positives

One way:

  • identify unimportant features and expect their attributions to be zero.

In reality, we do not know the absolute feature importance, but we can control the relative feature importance

  • relative between models, i.e. how important a feature is to one model relative to another
  • by changing the frequency with which certain features occur in the dataset (see the sketch below)
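
One way to realize that control, sketched under the assumption that we build several training splits that differ only in how often a chosen object is pasted in; `build_split` is hypothetical and reuses `paste_object` from the sketch above:

```python
import random

import numpy as np

# paste_object: see the dataset-construction sketch earlier on this page.


def build_split(scenes, obj, obj_mask, paste_fraction, seed=0):
    """Build one training split in which the object appears in a controlled
    fraction of the images (placement kept fixed here for brevity).

    Training models on splits with different fractions (e.g. 1.0 vs 0.0)
    yields models for which the object's importance differs in a known,
    relative way, even though its absolute importance is unknown."""
    rng = random.Random(seed)
    split = []
    for scene in scenes:
        if rng.random() < paste_fraction:
            image, gt = paste_object(scene, obj, obj_mask, top_left=(0, 0))
        else:
            image, gt = scene, np.zeros(scene.shape[:2], dtype=bool)
        split.append((image, gt))
    return split
```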