These blocks are made by starting out with a dark grey image and then backpropagating through the pre-trained network to the image, applying a negative epsilon in order to minimise loss for the target class. A more negative epsilon will not necessarily give a better result: the target class probability follows a bell curve over epsilon, so epsilon is optimized by looking for the local maximum of the target class probability in the domain `[lower_limit, 0)`.
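The epsilon search described above can be sketched as a simple sweep. This is a minimal sketch, not the project's actual code: `best_epsilon`, `toy_target_probability`, the grid of 50 steps, and the bell-shaped stand-in function are all assumptions made for illustration. In practice `target_probability` would apply the negative-epsilon step to the grey image and return the network's probability for the target class.

```python
import math


def best_epsilon(target_probability, lower_limit, steps=50):
    """Grid-search epsilon in [lower_limit, 0) for the highest target-class probability."""
    step = -lower_limit / steps
    # i runs from 0 to steps - 1, so lower_limit is included and 0 is excluded.
    candidates = [lower_limit + i * step for i in range(steps)]
    return max(candidates, key=target_probability)


def toy_target_probability(epsilon):
    """Hypothetical stand-in for the network: a bell curve peaking at epsilon = -0.4."""
    return math.exp(-((epsilon + 0.4) ** 2) / 0.02)


eps = best_epsilon(toy_target_probability, lower_limit=-1.0)
print(f"best epsilon on the grid: {eps:.2f}")
```

A plain grid search is used here only because it is easy to read; any one-dimensional search over the same interval would serve the same purpose.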
These adversarial blocks can be generated for any animal class.
- `sign(data_gradients)` gives the element-wise signs of the data gradient
- `epsilon` defines the "strength" of the perturbation of the image
In a nutshell, instead of optimizing the model to reduce the loss, we're un-optimizing the input image to maximise loss.
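As a rough illustration of that update, here is a framework-free sketch of a single FGSM step on a flattened image. The helper names `sign` and `fgsm_step`, the pixel range `[0, 1]`, and the sample gradient values are assumptions; a real pipeline would take `data_gradients` from backpropagation through the network.

```python
def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_step(image, data_gradients, epsilon):
    """Shift each pixel by epsilon along the sign of its gradient, clamped to [0, 1]."""
    return [
        min(1.0, max(0.0, pixel + epsilon * sign(grad)))
        for pixel, grad in zip(image, data_gradients)
    ]


# Made-up values: four pixels and the loss gradient at each of them.
image = [0.2, 0.5, 0.9, 0.0]
data_gradients = [0.03, -1.7, 0.4, 0.0]

perturbed = fgsm_step(image, data_gradients, epsilon=0.1)
print(perturbed)  # roughly [0.3, 0.4, 1.0, 0.0]
```

Only the sign of each gradient matters, so the tiny gradient `0.03` and the large gradient `-1.7` both move their pixel by the same amount, `epsilon`.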
- This works primarily because of the piecewise linear nature of deep neural networks. For example, ReLU and maxout functions are piecewise linear, and even a carefully tuned sigmoid is approximately linear when taken piecewise.
- With varying values of epsilon, we will see an approximately linear relationship between "confidence" and epsilon.
- This can be used to turn one animal into another specific animal in the eyes of the deep neural network.
- The key here is to understand how FGSM actually works.
In FGSM, we tampered with each pixel according to the sign of its gradient, adding `epsilon * sign(gradient)` to it. This pushed the image further and further away from the class it actually belongs to, maximising loss in the process. Note that this was done with a positive epsilon value.
But for our current objective, we will try to "optimize" the image to a different class. This can be done by:
- Doing a forward pass with an image of class `x` and with a label of `y`, where `y` is the class to which we want to convert our image.
- Performing a backpropagation on the network and extracting the gradients on the input image.
- Now, instead of trying to maximise loss using FGSM, we'll reduce the loss with a negative-epsilon FGSM.
- This will help reduce the loss of the image with respect to the target class `y`, and with a sufficiently negative epsilon value, the image gets misclassified as the target class.
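
The steps above can be sketched end to end on a toy model. This is only a sketch under stated assumptions: the 3-class linear softmax classifier, its weights, the 4-pixel image, and all function names are hypothetical stand-ins for the pre-trained network, and the input gradient is written out by hand instead of coming from a framework's autograd.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, image):
    """Class probabilities of a linear softmax classifier (stand-in for the network)."""
    return softmax([sum(w * p for w, p in zip(row, image)) for row in weights])


def input_gradient(weights, image, label):
    """Gradient of the cross-entropy loss for `label` with respect to each pixel."""
    probs = forward(weights, image)
    # dL/dx_j = sum_k (p_k - [k == label]) * W[k][j]
    return [
        sum((probs[k] - (k == label)) * weights[k][j] for k in range(len(weights)))
        for j in range(len(image))
    ]


def sign(value):
    return (value > 0) - (value < 0)


def targeted_fgsm(weights, image, target, epsilon):
    """FGSM step using the target label; a negative epsilon lowers the target loss."""
    grads = input_gradient(weights, image, target)
    return [min(1.0, max(0.0, p + epsilon * sign(g))) for p, g in zip(image, grads)]


# Hypothetical 3-class model on a 4-pixel image.
weights = [
    [2.0, -1.0, 0.5, 0.0],
    [-1.0, 2.0, 0.0, 0.5],
    [0.0, 0.5, -1.0, 1.0],
]
image = [0.8, 0.3, 0.5, 0.2]
target = 1  # the class y we want the image to be converted to

before = forward(weights, image)
adversarial = targeted_fgsm(weights, image, target, epsilon=-0.3)
after = forward(weights, adversarial)

print("predicted class before:", before.index(max(before)))
print("predicted class after: ", after.index(max(after)))
```

With these made-up numbers, the image starts out as class 0 and is classified as the target class 1 after a single step with `epsilon = -0.3`.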
If you didn't read the boring stuff above, just remember that
- A positive epsilon value will un-optimize the image
- A negative epsilon value will optimize the image for the given label class