# A Benchmark for Interpretability Methods in Deep Neural Networks

Hooker et al., google-research

Keys:
* A method to measure the approximate accuracy of feature importance estimates in DNNs
* Popular interpretability methods for feature importance estimation are no better than a random baseline; only certain ensemble-based ones are (VarGrad, SmoothGrad-Squared).
* Some ensembles are not better than their underlying base method $`\rightarrow`$ ensembling alone does not guarantee better estimates

**ROAR** (**R**em**O**ve **A**nd **R**etrain)
* An approach to evaluate interpretability methods by verifying how much the accuracy of a retrained model degrades as the features estimated to be important are removed.
* Given a trained CNN:
  * For each image (train + test):
    * sort features (pixels) by their estimated importance
    * replace the top-$`t`$ fraction with a fixed uninformative value (the per-channel mean); see the sketch after this list
  * Train a new model on the modified dataset
* If the feature importance estimator is more accurate, removing the features it ranks highest causes a larger degradation.

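A minimal sketch of the per-image masking step, assuming NumPy arrays; the function name, argument layout, and the `(H, W, C)` image shape are illustrative choices, not the paper's code:

```python
import numpy as np

def roar_mask(image, saliency, t, channel_mean):
    """Replace the top-t fraction of pixels (ranked by estimated importance)
    with a fixed per-channel mean.

    image:        (H, W, C) float array
    saliency:     (H, W) per-pixel importance estimate for this image
    t:            fraction of pixels to remove, e.g. 0.1 for 10%
    channel_mean: (C,) per-channel mean computed over the training set

    ROAR applies this to every train and test image before retraining
    a model from scratch on the masked data.
    """
    h, w, c = image.shape
    k = int(t * h * w)                        # number of pixels to remove
    if k == 0:
        return image.copy()
    # indices of the k most important pixels in raveled (H*W) order
    top = np.argsort(saliency.ravel())[::-1][:k]
    masked = image.reshape(-1, c).copy()
    masked[top] = channel_mean                # overwrite with "uninformative" value
    return masked.reshape(h, w, c)
```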
![figure1](uploads/roar_fig1.png)

Experiments:

* three large-scale image datasets (incl. ImageNet)
* Baselines: random feature importance estimates as well as Sobel edge filters; both are independent of the model.
  * *Is the interpretability method more accurate than a random guess as to which features are important?*
* Base feature importance estimators: Gradient Heatmap (GRAD), Integrated Gradients (IG), Guided Backprop (GB)
* Ensembling feature importance methods: SmoothGrad, SmoothGrad-Squared and VarGrad (standard definitions are sketched after this list)
* Generate modified datasets with different degradation levels $`t = 0, 10, \ldots, 100`$ (percent of features removed) for each method; see the protocol sketch below.
* Train models on these datasets and evaluate their performance.

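For reference (not spelled out in these notes), the ensembling estimators aggregate a base estimate $`g(x)`$, e.g. the gradient heatmap, over $`J`$ noisy copies of the input; squaring and variance are taken elementwise:

```math
\text{SmoothGrad: } \frac{1}{J}\sum_{j=1}^{J} g(x+\epsilon_j), \qquad
\text{SmoothGrad-SQ: } \frac{1}{J}\sum_{j=1}^{J} g(x+\epsilon_j)^{2}, \qquad
\text{VarGrad: } \operatorname{Var}_{j}\big[g(x+\epsilon_j)\big], \qquad
\epsilon_j \sim \mathcal{N}(0,\sigma^{2})
```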
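And a hedged sketch of the overall protocol, one retrained model per (estimator, $`t`$) pair; `train_model`, `evaluate`, the `saliency_*` lookups and the dataset objects are placeholders, and `roar_mask` is the sketch above, not the paper's actual pipeline:

```python
# Illustrative ROAR protocol: retrain once per (estimator, degradation level).
estimators = ["random", "sobel", "GRAD", "IG", "GB",
              "SmoothGrad", "SmoothGrad-SQ", "VarGrad"]
fractions = [i / 10 for i in range(11)]            # t = 0%, 10%, ..., 100%

results = {}
for name in estimators:
    for t in fractions:
        # Mask the top-t fraction of pixels in every image according to
        # this estimator's per-image importance ranking (placeholder lookups).
        masked_train = [(roar_mask(x, saliency_train[name][i], t, channel_mean), y)
                        for i, (x, y) in enumerate(train_set)]
        masked_test = [(roar_mask(x, saliency_test[name][i], t, channel_mean), y)
                       for i, (x, y) in enumerate(test_set)]
        # Retrain from scratch on the masked data and record test accuracy;
        # an accurate estimator should give a steeper accuracy drop as t grows.
        model = train_model(masked_train)
        results[(name, t)] = evaluate(model, masked_test)
```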
Findings:
* Model performance is robust to random input feature removal $`\rightarrow`$ a small subset of features is sufficient for decision making
  * Even with 90% of ImageNet input features randomly removed, the model still reaches ~63% test accuracy.
* The base methods are no better than the random baseline.
* SmoothGrad-Squared and VarGrad far outperform both the base methods and the random baseline.
* SmoothGrad performs worse than its single base method while being computationally more costly.

Why retraining from scratch?
* Otherwise the reason for a model's degradation in performance is unclear. It could be
  * due to artifacts introduced by the replacement value (there is no inherently "uninformative" value if the network did not learn to treat it as such),
  * due to the approximate accuracy of the estimator, or
  * due to a shift in the input distribution.

Evaluating the right aspect?
* The model used for evaluation is a newly trained one, different from the original model
* Possible outcomes under ROAR:
  * Remove inputs and accuracy drops: it is likely that the removed inputs were informative to the original model
  * Remove inputs and accuracy remains high: either the removed features were uninformative or there is redundancy in the inputs

Note
* L1 regularization (or any algorithm that performs feature selection) might corrupt ROAR if unused features are not masked during retraining
\ No newline at end of file |