

# Explanations can be manipulated and geometry is to blame






## Keys:



* Adversarial attacks on explanation maps:



* Explanation maps can be changed to an arbitrary target map by applying a visually hardly perceptible perturbation to the input



* The perturbation leaves the network output for that input approximately unchanged



* We can derive a bound on the degree of possible manipulation. The bound is proportional to two differential geometric quantities:



* principal curvatures



* geodesic distance between original input and manipulated counterpart



* This insight can be used to limit the possible manipulations $`\rightarrow`$ enhanced resilience of explanation methods











## Manipulation of explanations



* Explanation methods used: gradient-based and propagation-based



### Manipulation Method



Given:



* neural network $`g : \mathbb{R}^d \rightarrow \mathbb{R}^K`$ with ReLU nonlinearities which, for an input $`x \in \mathbb{R}^d`$, predicts the class $`k = \argmax_i g(x)_i`$ with $`k \in \{1, \dots, K\}`$



* explanation map $`h : \mathbb{R}^d \rightarrow \mathbb{R}^d`$ that associates each pixel with a relevance score



* target map $`h^t \in \mathbb{R}^d`$






The manipulated image $`x_{adv} = x + \delta x`$ should satisfy:



1. the output of the network stays approximately constant: $`g(x_{adv}) \approx g(x)`$



2. the explanation is close to target: $`h(x_{adv}) \approx h^t`$



3. the norm of the perturbation $`\delta x`$ is small: $`\|\delta x\| = \|x_{adv} - x\| \ll 1`$






Loss function: minimize $`\mathcal{L} = \|h(x_{adv}) - h^t\|^2 + \gamma \|g(x_{adv}) - g(x)\|^2`$ w.r.t. $`x_{adv}`$, where $`\gamma \in \mathbb{R}_+`$ is a hyperparameter
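The loss above can be sketched on a toy problem. This is a minimal, dependency-free illustration and not the authors' implementation: the two-layer network, its random weights, `BETA`, the step size, and the helper names (`make_net`, `manipulate`, ...) are all assumptions made for this example. It uses softplus units and a gradient explanation in closed form, and approximates the gradient of the loss by finite differences instead of automatic differentiation.

```python
import math
import random

BETA = 5.0  # softplus sharpness (hypothetical choice for this toy example)

def softplus(z, beta=BETA):
    """Smooth ReLU surrogate log(1 + exp(beta*z)) / beta, in a stable form."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(beta * z))) / beta

def sigmoid(z):
    """Logistic function written via tanh to avoid overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def make_net(d, hidden, n_classes, seed=0):
    """Random two-layer toy network (weights are illustrative only)."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(hidden)]
    W2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(n_classes)]
    return W1, W2

def pre_activations(net, x):
    W1, _ = net
    return [sum(w * xi for w, xi in zip(row, x)) for row in W1]

def g(net, x):
    """Network output g(x) in R^K."""
    _, W2 = net
    act = [softplus(a) for a in pre_activations(net, x)]
    return [sum(w * s for w, s in zip(row, act)) for row in W2]

def h(net, x, k):
    """Gradient explanation h(x) = d g(x)_k / d x, in closed form."""
    W1, W2 = net
    gate = [sigmoid(BETA * a) for a in pre_activations(net, x)]  # softplus'
    return [sum(W2[k][j] * gate[j] * W1[j][i] for j in range(len(W1)))
            for i in range(len(x))]

def loss(net, x_adv, x, k, h_target, gamma):
    """L = ||h(x_adv) - h^t||^2 + gamma * ||g(x_adv) - g(x)||^2."""
    expl = sum((a - b) ** 2 for a, b in zip(h(net, x_adv, k), h_target))
    out = sum((a - b) ** 2 for a, b in zip(g(net, x_adv), g(net, x)))
    return expl + gamma * out

def manipulate(net, x, k, h_target, gamma=10.0, steps=200, lr=0.1, eps=1e-5):
    """Gradient descent on L w.r.t. x_adv with a backtracking step size."""
    x_adv = list(x)
    current = loss(net, x_adv, x, k, h_target, gamma)
    for _ in range(steps):
        grad = []
        for i in range(len(x_adv)):  # central finite differences
            up, down = list(x_adv), list(x_adv)
            up[i] += eps
            down[i] -= eps
            grad.append((loss(net, up, x, k, h_target, gamma)
                         - loss(net, down, x, k, h_target, gamma)) / (2 * eps))
        cand = [xi - lr * gi for xi, gi in zip(x_adv, grad)]
        cand_loss = loss(net, cand, x, k, h_target, gamma)
        if cand_loss < current:
            x_adv, current = cand, cand_loss
        else:
            lr *= 0.5  # step too large: shrink it and retry
    return x_adv

if __name__ == "__main__":
    net = make_net(d=4, hidden=8, n_classes=3)
    x = [0.5, -0.2, 0.1, 0.8]
    out = g(net, x)
    k = max(range(len(out)), key=lambda c: out[c])  # predicted class
    h_target = h(net, [-0.7, 0.4, 0.9, -0.3], k)    # explanation of another input
    x_adv = manipulate(net, x, k, h_target)
    print("loss before:", loss(net, x, x, k, h_target, 10.0))
    print("loss after: ", loss(net, x_adv, x, k, h_target, 10.0))
```

For image-sized inputs, finite differences are far too slow; an automatic differentiation framework would be the practical choice there.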






* The optimization requires the gradient $`\nabla h(x)`$ of the explanation w.r.t. the input (if the explanation map is itself a first-order gradient, optimizing it requires second-order gradients)



* ReLU has a vanishing second derivative $`\rightarrow`$ replace ReLU with softplus
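The vanishing second derivative can be checked numerically. The sketch below compares ReLU with softplus using a central second difference. It assumes the common parameterization $`\mathrm{softplus}_\beta(z) = \frac{1}{\beta}\log(1 + e^{\beta z})`$; the `beta` argument and the helper names are illustrative choices, not taken from these notes.

```python
import math

def relu(z):
    return max(z, 0.0)

def softplus(z, beta=1.0):
    """log(1 + exp(beta*z)) / beta, written in a numerically stable form."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(beta * z))) / beta

def second_derivative(f, z, eps=1e-3):
    """Central second difference approximation of f''(z)."""
    return (f(z + eps) - 2.0 * f(z) + f(z - eps)) / eps ** 2

def softplus_curvature(z, beta=1.0):
    """Analytic second derivative: beta * s * (1 - s), s = sigmoid(beta*z)."""
    s = 0.5 * (1.0 + math.tanh(0.5 * beta * z))
    return beta * s * (1.0 - s)

if __name__ == "__main__":
    for z in (-0.5, 0.5):
        print(z, second_derivative(relu, z), second_derivative(softplus, z))
```

Away from the kink at zero, the ReLU column is zero up to rounding, so a gradient explanation of a ReLU network gives no useful signal to optimize. The softplus column is strictly positive.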






### Experiments



* Qualitative Analysis: The target is closely emulated and the perturbation is small.



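* Quantitative Analysis: measure the structural similarity index (SSIM), the Pearson correlation coefficient (PCC), and the mean squared error (MSE), both between the target and the manipulated explanation map and between the original and the perturbed image.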
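A rough sketch of the three metrics on flattened maps (lists of floats) follows. The function names are hypothetical, and `global_ssim` is a simplification: it evaluates the SSIM formula once over the whole map, whereas standard SSIM averages it over local windows. It is therefore not interchangeable with library implementations. The constants `k1` and `k2` use the conventional defaults.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def _moments(a, b):
    """Means, variances and covariance of two maps (population form)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    return ma, mb, va, vb, cov

def pcc(a, b):
    """Pearson correlation coefficient (undefined for constant maps)."""
    _, _, va, vb, cov = _moments(a, b)
    return cov / math.sqrt(va * vb)

def global_ssim(a, b, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM formula evaluated on a single global window (simplified)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    ma, mb, va, vb, cov = _moments(a, b)
    return ((2 * ma * mb + c1) * (2 * cov + c2)) / (
        (ma * ma + mb * mb + c1) * (va + vb + c2))

if __name__ == "__main__":
    target = [0.1, 0.4, 0.35, 0.8]
    manipulated = [0.12, 0.38, 0.36, 0.79]
    print(mse(target, manipulated), pcc(target, manipulated),
          global_ssim(target, manipulated))
```

High SSIM and PCC together with low MSE between target and manipulated explanation indicate a successful manipulation. The same values between original and perturbed image indicate that the perturbation is hard to see.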









<img src="uploads/expl_manipulated_fig2.png" width="800"> 

