# Explanations can be manipulated and geometry is to blame


## Keys:


* Adversarial attacks on explanation maps:
  * they can be changed to an arbitrary target map by applying a visually hardly perceptible input perturbation
  * the perturbation does not change the output of the network for that input
* principal curvatures
* geodesic distance between original input and manipulated counterpart
* Using this insight to limit the possible ways of manipulation $`\rightarrow`$ enhance the resilience of explanation methods

## Manipulation of explanations

* Used explanation methods: gradient-based and propagation-based

### Manipulation Method

Given:

* a neural net $`g : \mathbb{R}^d \rightarrow \mathbb{R}^K`$ with ReLU non-linearities; for an input $`x \in \mathbb{R}^d`$ the predicted class $`k \in \{1, \dots, K\}`$ is given by $`k = \arg\max_i g(x)_i`$
* an explanation map $`h : \mathbb{R}^d \rightarrow \mathbb{R}^d`$ associating each pixel with a relevance score
* a target map $`h^t \in \mathbb{R}^d`$

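
For the gradient-based case, a minimal PyTorch sketch of such an explanation map (the `model`, the input `x` with a leading batch dimension, and the class index `k` are illustrative placeholders, not the paper's code):

```python
import torch

def gradient_explanation(model, x, k=None):
    """Gradient ('saliency') explanation map: h(x) = d g(x)_k / d x."""
    x = x.clone().detach().requires_grad_(True)   # x has shape (1, C, H, W)
    logits = model(x)                             # g(x), shape (1, K)
    if k is None:
        k = logits.argmax(dim=1).item()           # predicted class k = argmax_i g(x)_i
    grad, = torch.autograd.grad(logits[0, k], x)  # relevance of every input pixel
    return grad.detach()
```
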
For the manipulated image $`x_{adv} = x + \delta x`$:

1. the output of the net stays constant: $`g(x_{adv}) \approx g(x)`$
2. the explanation is close to the target: $`h(x_{adv}) \approx h^t`$
3. the norm of the perturbation $`\delta x`$ is small: $`||\delta x|| = ||x_{adv} - x|| \ll 1`$

Loss function: optimize $`\mathcal{L} = ||h(x_{adv}) - h^t||^2 + \gamma ||g(x_{adv}) - g(x)||^2`$ w.r.t. $`x_{adv}`$, where $`\gamma \in \mathbb{R}_+`$ is a hyperparameter (see the sketch below).

* This requires computing the gradient of the explanation w.r.t. the input, $`\nabla h(x)`$ (if the explanation map is itself a first-order gradient, one needs second-order gradients to optimize it)
* ReLU has a vanishing second derivative $`\rightarrow`$ replace ReLU with softplus, which approximates ReLU for large $`\beta`$ but has a non-vanishing second derivative

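
A minimal sketch of the resulting attack in PyTorch, assuming images scaled to $`[0, 1]`$; the function names, the softplus $`\beta`$, and the optimizer settings are illustrative choices, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

def replace_relu_with_softplus(module, beta=10.0):
    """Recursively swap ReLU for softplus (large beta approximates ReLU) so that
    the second derivatives needed by the attack no longer vanish."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.Softplus(beta=beta))
        else:
            replace_relu_with_softplus(child, beta)

def manipulate_explanation(model, x, h_target, gamma=1e6, lr=1e-3, steps=1500):
    """Optimize x_adv = x + dx so that the gradient explanation of x_adv matches
    h_target while the network output stays approximately unchanged.
    h_target must have the same shape as x."""
    x = x.detach()
    with torch.no_grad():
        logits_orig = model(x)
        k = logits_orig.argmax(dim=1).item()     # keep the originally predicted class fixed
    x_adv = x.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits_adv = model(x_adv)
        # h(x_adv): gradient of the class score, kept differentiable w.r.t. x_adv
        h_adv, = torch.autograd.grad(logits_adv[0, k], x_adv, create_graph=True)
        loss = ((h_adv - h_target) ** 2).sum() + gamma * ((logits_adv - logits_orig) ** 2).sum()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            x_adv.clamp_(0.0, 1.0)               # keep the perturbed image in the valid range
    return x_adv.detach()
```

Note that `replace_relu_with_softplus` modifies the model in place, so it is best applied to a copy; the perturbation found on the smoothed network can then be evaluated on the original ReLU model.
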
### Experiments

* Qualitative analysis: the target is closely emulated, the perturbation is small.
* Quantitative analysis: measure SSIM, PCC, and MSE between the target and the manipulated explanation map, as well as between the original and the perturbed image.

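
A minimal sketch of these similarity measures with NumPy and scikit-image (the array names are placeholders; explanation maps are assumed to be 2D arrays):

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def explanation_similarity(h_target, h_adv):
    """SSIM, Pearson correlation coefficient (PCC) and MSE between two maps."""
    h_t = np.asarray(h_target, dtype=float)
    h_a = np.asarray(h_adv, dtype=float)
    data_range = max(h_t.max() - h_t.min(), h_a.max() - h_a.min())
    return {
        "SSIM": ssim(h_t, h_a, data_range=data_range),
        "PCC": np.corrcoef(h_t.ravel(), h_a.ravel())[0, 1],
        "MSE": float(np.mean((h_t - h_a) ** 2)),
    }
```

The same measures applied to the original and perturbed images quantify how small $`\delta x`$ is.
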
<img src="uploads/expl_manipulated_fig2.png" width="800">