# Explanations can be manipulated and geometry is to blame

## Key points

- Adversarial attacks on explanation maps:
- This phenomenon is related to the geometry of the network's output manifold
- We can derive a bound on the degree of possible manipulation. The bound is proportional to two differential-geometric quantities:
  - principal curvatures
  - geodesic distance between the original input and its manipulated counterpart

- inputs that are close to each other in L2 distance can have drastically different explanations, since their geodesic distance on the output manifold can be substantially greater than their L2 distance
- Using softplus with small `\beta` makes explanations more robust against manipulation
- Large curvature of the NN's decision function is responsible for the vulnerability
- Softplus leads to reduced maximal curvature (it smooths the kinks of ReLU)

- Softplus can be used **only** for the explanation generation, leaving the original network as is
- Doing this is much faster than the SmoothGrad method
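The softplus-only-for-explanations idea can be sketched on a toy numpy MLP: the forward pass keeps ReLU, but when back-propagating the gradient explanation, the ReLU derivative (a hard 0/1 step) is swapped for the smooth softplus derivative. All names and the tiny architecture here are illustrative, not the paper's setup.

```python
import numpy as np

def softplus(z, beta=2.0):
    # Numerically stable softplus: (1/beta) * log(1 + exp(beta*z)).
    # Approaches ReLU as beta -> infinity; small beta means a smoother function.
    return np.maximum(z, 0) + np.log1p(np.exp(-np.abs(beta * z))) / beta

def softplus_grad(z, beta=2.0):
    # Derivative of softplus is the sigmoid of beta*z.
    return 1.0 / (1.0 + np.exp(-beta * z))

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(1, 8))
x = rng.normal(size=4)

pre = W1 @ x
y = W2 @ np.maximum(pre, 0)           # prediction uses the original ReLU net
y_smooth = W2 @ softplus(pre)         # softplus output, used only for explaining

# Gradient explanations: d(output)/d(input), differing only in the
# activation derivative used on the hidden layer.
relu_expl = W2 @ (np.maximum(np.sign(pre), 0)[:, None] * W1)
softplus_expl = W2 @ (softplus_grad(pre)[:, None] * W1)
```

Because only the backward pass of the explanation changes, the network's predictions are untouched, and no extra forward passes are needed (unlike SmoothGrad's averaging over many noisy samples).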

## Manipulation of explanations

- Explanation methods used: gradient-based and propagation-based

### Manipulation Method

Given:

- a neural net `g : \mathbb{R}^d \rightarrow \mathbb{R}^K` with ReLU non-linearities, an input `x \in \mathbb{R}^d`, and the predicted class `k = \argmax_i g(x)_i`
- an explanation map `h : \mathbb{R}^d \rightarrow \mathbb{R}^d` associating each pixel with a relevance score
- a target explanation map `h^t \in \mathbb{R}^d`

For the manipulated image `x_{adv} = x + \delta x`:

- the network output stays approximately constant: `g(x_{adv}) \approx g(x)`
- the explanation is close to the target: `h(x_{adv}) \approx h^t`
- the norm of the perturbation `\delta x` is small: `||\delta x|| = ||x_{adv} - x|| \ll 1`

Loss function: optimize `\mathcal{L} = ||h(x_{adv}) - h^t||^2 + \gamma ||g(x_{adv}) - g(x)||^2` w.r.t. `x_{adv}`, where `\gamma \in \mathbb{R}_+` is a hyperparameter.

- Requires computing the gradients of both the network output and the generated explanation map w.r.t. the input
- If the explanation map is based on the first-order gradient, one needs second-order gradients to optimize it
- ReLU has a vanishing second derivative `\rightarrow` replace ReLU with softplus during the optimization

### Experiments

- Qualitative analysis: the target is closely emulated, and the perturbation is small.
- Quantitative analysis: measure SSIM, PCC, and MSE both between the target and the manipulated explanation map and between the original and the perturbed image.
- ReLU vs. softplus (with different `\beta` values)
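Two of the three similarity metrics are simple to compute directly in numpy; a minimal sketch on made-up 2x2 "explanation maps" (SSIM is windowed and is typically taken from an image library such as scikit-image, so it is omitted here):

```python
import numpy as np

def mse(a, b):
    # Mean squared error between two maps; 0 means identical.
    return np.mean((a - b) ** 2)

def pcc(a, b):
    # Pearson correlation coefficient between the flattened maps;
    # values near 1 indicate strong linear agreement.
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

# Illustrative stand-ins for a target and a manipulated explanation map.
h_target = np.array([[0.0, 1.0], [2.0, 3.0]])
h_manip = np.array([[0.1, 0.9], [2.2, 2.8]])

print(mse(h_target, h_manip))  # low MSE -> the maps are close
print(pcc(h_target, h_manip))  # PCC near 1 -> the attack reproduced the target well
```

For the attack to count as successful, the target-vs-manipulated pair should score high similarity (high SSIM/PCC, low MSE), and so should the original-vs-perturbed image pair.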