# Explanations can be manipulated and geometry is to blame

## Keys

* Adversarial attacks on explanation maps:
  * the explanation map can be changed to an arbitrary target map by applying a visually hardly perceptible perturbation to the input
  * the perturbation does not change the output of the network for that input
* We can derive a bound on the degree of possible manipulation (see the schematic form after this list). The bound is proportional to two differential geometric quantities:
  * the principal curvatures
  * the geodesic distance between the original input and its manipulated counterpart
* Using this insight to limit possible manipulations $`\rightarrow`$ enhance the resilience of explanation methods
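
A schematic reading of that proportionality (my paraphrase of the stated relationship, not the paper's exact theorem; $`\kappa_i`$ are the principal curvatures of the relevant hypersurface and $`d_g`$ is the geodesic distance):

```math
\|h(x_{adv}) - h(x)\| \;\lesssim\; \max_i |\kappa_i| \cdot d_g(x, x_{adv})
```
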
## Manipulation of explanations

* Explanation methods used: gradient-based and propagation-based (a minimal gradient-based sketch follows below)
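
As an illustration, a minimal sketch of a gradient (saliency) explanation map in PyTorch, assuming a hypothetical classifier `model`; `create_graph=True` is needed once the map itself is optimized, as in the attack below:

```python
import torch

def gradient_explanation(model, x, create_graph=False):
    # Gradient (saliency) map: d g(x)_k / d x for the predicted class k.
    x = x.requires_grad_(True)
    logits = model(x)                      # shape (1, K)
    k = logits.argmax(dim=1).item()        # predicted class
    (grad,) = torch.autograd.grad(logits[0, k], x, create_graph=create_graph)
    return grad                            # relevance scores, same shape as x
```
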
### Manipulation Method

Given:

* a neural net $`g : \mathbb{R}^d \rightarrow \mathbb{R}^K`$ with ReLU non-linearities; for an input $`x \in \mathbb{R}^d`$, the predicted class is $`k = \argmax_i g(x)_i`$
* an explanation map $`h : \mathbb{R}^d \rightarrow \mathbb{R}^d`$ associating each pixel with a relevance score
* a target map $`h^t \in \mathbb{R}^d`$

For the manipulated image $`x_{adv} = x + \delta x`$ we require:

1. the output of the net stays approximately constant: $`g(x_{adv}) \approx g(x)`$
2. the explanation is close to the target: $`h(x_{adv}) \approx h^t`$
3. the norm of the perturbation $`\delta x`$ is small: $`||\delta x|| = ||x_{adv} - x|| \ll 1`$

Loss function: optimize $`\mathcal{L} = ||h(x_{adv}) - h^t||^2 + \gamma ||g(x_{adv}) - g(x)||^2`$ w.r.t. $`x_{adv}`$, where $`\gamma \in \mathbb{R}_+`$ is a hyperparameter balancing the two terms (a minimal optimization sketch follows below).
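
A minimal sketch of this optimization in PyTorch; the choice of optimizer, `lr`, `steps`, `gamma`, and the clamping to $`[0, 1]`$ are placeholder assumptions, and `explain` must build a differentiable graph (e.g. `gradient_explanation` with `create_graph=True`):

```python
import torch

def manipulate(model, explain, x, h_target, gamma=1.0, lr=1e-3, steps=500):
    # Minimize L = ||h(x_adv) - h^t||^2 + gamma * ||g(x_adv) - g(x)||^2 w.r.t. x_adv.
    g_x = model(x).detach()                        # original output, held fixed
    x_adv = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        h_adv = explain(model, x_adv)              # differentiable explanation map
        loss = ((h_adv - h_target) ** 2).sum() \
             + gamma * ((model(x_adv) - g_x) ** 2).sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_adv.clamp_(0, 1)                     # stay in a valid image range (assumption)
    return x_adv.detach()
```

For example, with the gradient map above: `manipulate(model, lambda m, z: gradient_explanation(m, z, create_graph=True), x, h_target)`.
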
* Requires computing the gradient of the explanation w.r.t. the input, $`\nabla_x h(x)`$ (if the explanation map is itself a first-order gradient, second-order derivatives are needed to optimize it)
* ReLU has a vanishing second derivative $`\rightarrow`$ replace ReLU with softplus for the attack (a sketch follows below)
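
A minimal sketch of the ReLU-to-softplus swap for a PyTorch module; the `beta` value is an assumption (softplus approaches ReLU as `beta` grows):

```python
import torch.nn as nn

def relu_to_softplus(module, beta=10.0):
    # Recursively replace every nn.ReLU with nn.Softplus(beta).
    # softplus_beta(x) = (1 / beta) * log(1 + exp(beta * x)) -> ReLU(x) as beta -> inf.
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.Softplus(beta=beta))
        else:
            relu_to_softplus(child, beta)
    return module
```
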
### Experiments

* Qualitative analysis: the target map is closely emulated while the perturbation stays visually hardly perceptible.
* Quantitative analysis: measure SSIM, PCC, and MSE between the target and the manipulated explanation map, as well as between the original and the perturbed image (a metric sketch follows below).
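
A minimal sketch of these three metrics with `scikit-image` and `scipy`; the shared `data_range` handling is an assumption:

```python
import numpy as np
from scipy.stats import pearsonr
from skimage.metrics import structural_similarity as ssim

def similarity_metrics(a, b):
    # SSIM, Pearson correlation coefficient (PCC), and MSE between two 2-D maps.
    data_range = max(a.max(), b.max()) - min(a.min(), b.min())
    return {
        "SSIM": ssim(a, b, data_range=data_range),
        "PCC": pearsonr(a.ravel(), b.ravel())[0],
        "MSE": float(np.mean((a - b) ** 2)),
    }
```
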
<img src="uploads/expl_manipulated_fig2.png" width="800">