|
|
# Benchmarking Attribution Methods with Ground Truth (Relative Feature Importance)
|
|
|
Paper by Yang and Kim, Google Research. Short version at the HCML workshop at NeurIPS 2019; under review at AISTATS 2020
|
|
|
|
|
|
* output of interpretability methods often assessed by humans
|
|
|
* qualitative assessment vulnerable to bias and subjectivity
|
|
|
* just because an explanation makes sense to a human does not mean it is correct
|
|
|
* Assessment metrics should capture the mismatch between the interpretation and the model's rationale behind its prediction
|
|
|
* Here: focus on the false-positive set of explanations (feature importances)
|
|
|
* i.e. the set of features attributed as important although the ground truth says they are not
|
|
|
|
|
|
**Key Idea**:
|
|
|
* Build a semi-natural dataset with pixel-wise labels for (relative) feature importance and train a set of models on it
|
|
|
* here: MSCOCO objects pasted into MiniPlaces scene images (construction sketched after the figure)
|
|
|
![fig1](uploads/bam_fig1.png)
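
A minimal sketch (not the authors' code) of how one such composite image could be assembled: an MSCOCO object crop is pasted onto a scene image, and the object's segmentation mask doubles as the pixel-wise ground-truth mask for relative feature importance. Function name and arguments are illustrative.

```python
import numpy as np

def paste_object(scene, obj, obj_mask, top_left=(0, 0)):
    """scene: HxWx3 scene image; obj: hxwx3 object crop;
    obj_mask: hxw binary segmentation mask of the object."""
    out = scene.copy()
    importance = np.zeros(scene.shape[:2], dtype=np.uint8)
    y, x = top_left
    h, w = obj_mask.shape
    region = out[y:y + h, x:x + w]            # view into the output image
    region[obj_mask > 0] = obj[obj_mask > 0]  # overwrite scene pixels with object pixels
    importance[y:y + h, x:x + w] = obj_mask   # pixel-wise "object" mask = importance label
    return out, importance

# Each composite image carries two labels (object class, scene class), so one
# model can be trained on object labels and another on scene labels.
```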
|
|
|
|
|
|
Quantify the extent to which a method incorrectly attributes unimportant features, using metrics that contrast attributions
|
|
|
* between models (model dependence)
|
|
|
* between inputs (input dependence and input independence)
|
|
|
|
|
|
Model contrast score (MCS):
|
|
|
* measures the difference in concept attributions between models
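
A rough sketch of how MCS could be computed, assuming attribution maps from two models trained on the same composite images, one on object labels (f_o) and one on scene labels (f_s); the aggregation used here (fraction of absolute attribution on object pixels) is a simplification, not necessarily the paper's exact definition.

```python
import numpy as np

def concept_attribution(attr_map, obj_mask):
    """Fraction of total absolute attribution that falls on object pixels."""
    attr = np.abs(attr_map)
    return attr[obj_mask > 0].sum() / (attr.sum() + 1e-12)

def model_contrast_score(attrs_f_o, attrs_f_s, obj_masks):
    """Average gap in object attribution between the model that needs the
    object (f_o) and the model that ignores it (f_s); larger is better."""
    gaps = [concept_attribution(a_o, m) - concept_attribution(a_s, m)
            for a_o, a_s, m in zip(attrs_f_o, attrs_f_s, obj_masks)]
    return float(np.mean(gaps))
```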
|
|
|
Input dependence rate (IDR):
|
|
|
* measures the percentage of correctly classified images where an object is falsely attributed as more important than the scene region it replaces
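
A sketch of the IDR computation under the note's definition, assuming attribution maps for each image with and without the pasted object plus a flag for correct classification; the names and the mean-over-mask aggregation are assumptions.

```python
import numpy as np

def input_dependence_rate(attrs_with_obj, attrs_without_obj, obj_masks, correct):
    """Over correctly classified images, count how often the pasted object
    region receives MORE attribution than the scene region it replaced --
    a false positive for a model that should ignore the object."""
    false_pos, n = 0, 0
    for a_obj, a_scene, mask, ok in zip(attrs_with_obj, attrs_without_obj, obj_masks, correct):
        if not ok:
            continue
        n += 1
        if np.abs(a_obj)[mask > 0].mean() > np.abs(a_scene)[mask > 0].mean():
            false_pos += 1
    return false_pos / max(n, 1)
```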
|
|
|
Input independence rate (IIR):
|
|
|
* measures the percentage of images whose attributions stay (near-)unchanged when a functionally unimportant patch, i.e. one the model's output is invariant to, is added to the input
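
A corresponding sketch for IIR, assuming attribution maps for the original image and for the same image with a functionally unimportant patch added (one that leaves the model output unchanged); the distance measure and threshold below are arbitrary choices, not the paper's.

```python
import numpy as np

def input_independence_rate(attrs_orig, attrs_patched, threshold=0.1):
    """Fraction of images whose attribution map stays (near-)unchanged when a
    patch the model is invariant to is added to the input."""
    unchanged = [np.abs(a - b).mean() / (np.abs(a).mean() + 1e-12) < threshold
                 for a, b in zip(attrs_orig, attrs_patched)]
    return float(np.mean(unchanged))
```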
|
|
|
|
|
|
|
|
|
## Testing for false positives
|
|
|
One way:
|
|
|
* identify unimportant features and expect their attributions to be zero.
|
|
|
|
|
|
In reality, we do not know the **absolute** feature importance, but we can control the **relative** feature importance
|
|
|
* relative between models, i.e. how important a feature is to one model relative to another model
|
|
|
* by changing the frequency with which certain features occur in the dataset (and how they co-occur with the labels)
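
A toy sketch of this frequency idea: the same composite images are labeled either by scene class (the pasted object occurs equally often in every class, so it carries no signal) or by object class (the object carries all the signal). Function names below are illustrative only.

```python
def assign_labels(composites, label_by="scene"):
    """composites: list of dicts with keys 'image', 'object_class', 'scene_class'.
    Labeling by scene class makes the pasted object unimportant; labeling by
    object class makes it essential."""
    key = "scene_class" if label_by == "scene" else "object_class"
    return [(c["image"], c[key]) for c in composites]

# f_s = train(assign_labels(composites, "scene"))   # object pixels carry no label signal
# f_o = train(assign_labels(composites, "object"))  # object pixels determine the label
# The same attribution method should then attribute the object region
# differently for f_s and f_o -- this is the controlled *relative* importance.
```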
|
|
|
|
|
|
|