# A Benchmark for Interpretability Methods in Deep Neural Networks

Hooker et al., google-research

Keys:
* A method to measure the approximate accuracy of feature importance estimates in DNNs
* Popular interpretability methods for feature importance estimation are no better than a random baseline; only certain ensemble-based ones are (VarGrad, SmoothGrad-Squared).
* Some ensembles are not better than their underlying base method $`\rightarrow`$ ensembling alone does not guarantee better estimates

**ROAR** (**R**em**O**ve **A**nd **R**etrain)
* An approach to evaluate interpretability methods by verifying how much the accuracy of a retrained model degrades as the features estimated to be important are removed.
* Given a trained CNN:
  * For each image (train + test):
    * sort features (pixels) by their estimated importance
    * replace the top-$`t`$ fraction with a fixed uninformative value (the per-channel mean); see the sketch after this list
  * Train a new model on the modified dataset
* If the feature importance estimator is more accurate, removing the features it ranks highest causes a larger degradation.

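A minimal sketch of the per-image masking step, assuming NumPy arrays; the function name, argument layout, and the `(H, W, C)` image shape are illustrative choices, not the paper's code:

```python
import numpy as np

def roar_mask(image, saliency, t, channel_mean):
    """Replace the top-t fraction of pixels (ranked by estimated importance)
    with a fixed per-channel mean.

    image:        (H, W, C) float array
    saliency:     (H, W) per-pixel importance estimate for this image
    t:            fraction of pixels to remove, e.g. 0.1 for 10%
    channel_mean: (C,) per-channel mean computed over the training set

    ROAR applies this to every train and test image before retraining
    a model from scratch on the masked data.
    """
    h, w, c = image.shape
    k = int(t * h * w)                        # number of pixels to remove
    if k == 0:
        return image.copy()
    # indices of the k most important pixels in raveled (H*W) order
    top = np.argsort(saliency.ravel())[::-1][:k]
    masked = image.reshape(-1, c).copy()
    masked[top] = channel_mean                # overwrite with "uninformative" value
    return masked.reshape(h, w, c)
```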
![figure1](uploads/roar_fig1.png)

Experiments:

* three large-scale image datasets (incl. ImageNet)
* Baselines: random feature importance estimates as well as Sobel edge filters; both are independent of the model.
  * *Is the interpretability method more accurate than a random guess as to which features are important?*
* Base feature importance estimators: Gradient Heatmap (GRAD), Integrated Gradients (IG), Guided Backprop (GB)
* Ensembling feature importance methods: SmoothGrad, SmoothGrad-Squared and VarGrad (standard definitions are sketched after this list)
* Generate modified datasets with different degradation levels $`t = 0, 10, \ldots, 100`$ (percent of features removed) for each method; see the protocol sketch below.
* Train models on these datasets and evaluate their performance.

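For reference (not spelled out in these notes), the ensembling estimators aggregate a base estimate $`g(x)`$, e.g. the gradient heatmap, over $`J`$ noisy copies of the input; squaring and variance are taken elementwise:

```math
\text{SmoothGrad: } \frac{1}{J}\sum_{j=1}^{J} g(x+\epsilon_j), \qquad
\text{SmoothGrad-SQ: } \frac{1}{J}\sum_{j=1}^{J} g(x+\epsilon_j)^{2}, \qquad
\text{VarGrad: } \operatorname{Var}_{j}\big[g(x+\epsilon_j)\big], \qquad
\epsilon_j \sim \mathcal{N}(0,\sigma^{2})
```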
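And a hedged sketch of the overall protocol, one retrained model per (estimator, $`t`$) pair; `train_model`, `evaluate`, the `saliency_*` lookups and the dataset objects are placeholders, and `roar_mask` is the sketch above, not the paper's actual pipeline:

```python
# Illustrative ROAR protocol: retrain once per (estimator, degradation level).
estimators = ["random", "sobel", "GRAD", "IG", "GB",
              "SmoothGrad", "SmoothGrad-SQ", "VarGrad"]
fractions = [i / 10 for i in range(11)]            # t = 0%, 10%, ..., 100%

results = {}
for name in estimators:
    for t in fractions:
        # Mask the top-t fraction of pixels in every image according to
        # this estimator's per-image importance ranking (placeholder lookups).
        masked_train = [(roar_mask(x, saliency_train[name][i], t, channel_mean), y)
                        for i, (x, y) in enumerate(train_set)]
        masked_test = [(roar_mask(x, saliency_test[name][i], t, channel_mean), y)
                       for i, (x, y) in enumerate(test_set)]
        # Retrain from scratch on the masked data and record test accuracy;
        # an accurate estimator should give a steeper accuracy drop as t grows.
        model = train_model(masked_train)
        results[(name, t)] = evaluate(model, masked_test)
```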
Findings:
* Model performance is robust to random input feature removal $`\rightarrow`$ a small subset of features is sufficient for decision making
  * Even with 90% of ImageNet input features randomly removed, the model still reaches ~63% test accuracy.
* The base methods are no better than the random baseline.
* SmoothGrad-Squared and VarGrad far outperform both the base methods and the random baseline.
* SmoothGrad performs worse than its single base method while being computationally more costly.

Why retraining from scratch?
* Otherwise the reason for a model's degradation in performance is unclear. It could be
  * due to artifacts introduced by the replacement value (there is no inherently "uninformative" value if the network did not learn to treat it as such),
  * due to the approximate accuracy of the estimator, or
  * due to a shift in the input distribution.

Evaluating the right aspect?
* The model used for evaluation is a newly trained one, different from the original model
* Possible outcomes under ROAR:
  * Remove inputs and accuracy drops: it is likely that the removed inputs were informative to the original model
  * Remove inputs and accuracy remains high: either the removed features were uninformative or there is redundancy in the inputs

Note
* L1 regularization (or any algorithm that performs feature selection) might corrupt ROAR if unused features are not masked during retraining
\ No newline at end of file |