# Introduction to adversarial robustness


This lesson is from Adversarial Robustness - Theory and Practice

# Introduction

- Adversarial robustness
  - Developing classifiers that are robust to perturbations of their inputs made by an adversary intent on fooling the classifier

- Image classification in PyTorch
  - Standard preprocessing transforms the image to approximately zero mean and unit variance
  - The perturbation is added to the original (unnormalised) image, so normalisation is applied as a separate layer rather than inside the preprocessing transform
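A minimal sketch of the last point, assuming an image tensor with values in $[0, 1]$ and the usual ImageNet channel statistics. The random tensor `x` and the constant perturbation `delta` are placeholders for illustration, not data from this lesson. Because normalisation is applied after the perturbation is added, a change of `delta` in pixel space shows up as `delta / std` in the normalised space the network sees.

```python
import torch

# ImageNet channel statistics, shaped to broadcast over (N, C, H, W).
mean = torch.tensor([0.485, 0.456, 0.406]).reshape(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).reshape(1, 3, 1, 1)


def normalize(x):
    return (x - mean) / std


torch.manual_seed(0)
x = torch.rand(1, 3, 8, 8)          # placeholder "image" in [0, 1]
delta = 0.01 * torch.ones_like(x)   # hypothetical small perturbation

# Perturb in pixel space first, then normalise.
shift = normalize(x + delta) - normalize(x)
print(torch.allclose(shift, delta / std, atol=1e-5))
```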

## ImageNet

```
import json

import PIL.Image
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

# Map ImageNet class indices to human-readable names.
with open("data/imagenet_class_index.json") as fp:
    imagenet = json.load(fp)
imagenet_classes = {int(i): x[1] for i, x in imagenet.items()}

mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]


class Normalize(nn.Module):
    """Normalisation as a module, so perturbations can be added to the raw image."""

    def __init__(self, mean, std):
        super().__init__()
        # Shape (1, 3, 1, 1) broadcasts over a batch of RGB images.
        self.mean = torch.tensor(mean).reshape(1, 3, 1, 1)
        self.std = torch.tensor(std).reshape(1, 3, 1, 1)

    def forward(self, x):
        return (x - self.mean) / self.std


norm = Normalize(mean=mean, std=std)
# Resize and convert to a tensor in [0, 1]; normalisation is applied separately.
preprocess = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])

# `pretrained=True` is deprecated in recent torchvision releases in favour of `weights=...`.
model = torchvision.models.resnet50(pretrained=True)
model.eval()

img = PIL.Image.open("data/imgs/pig.jpg")
img_tensor = norm(preprocess(img).unsqueeze(dim=0))

pred = model(img_tensor).max(dim=1)[1].item()
print(pred, imagenet_classes[pred])  # 341 hog
```

## Notations

### Model or Hypothesis function

- $ h_\theta : \mathcal{X} \rightarrow \mathbb{R}^k $
  - A mapping from the input space (a 3D tensor) to the output space (a $k$-dimensional vector)
  - $k$ is the number of classes being predicted
  - For the PyTorch ResNet, the output is a vector of logits, so its entries can be positive or negative real numbers
- $\theta$ represents the parameters defining the model
  - Convolutional filters, fully-connected layer weight matrices, biases, etc.
  - These are the parameters learned during training

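To make the notation concrete, here is a small sketch with a toy stand-in for $h_\theta$. The single linear layer, the $3 \times 32 \times 32$ input size and `k = 10` are assumptions chosen for illustration; they are not the ResNet used in this lesson.

```python
import torch
import torch.nn as nn

k = 10  # hypothetical number of classes

# Toy h_theta: flatten a 3 x 32 x 32 image and apply one linear layer.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, k))

x = torch.rand(1, 3, 32, 32)   # a batch containing one 3D image tensor
logits = toy_model(x)          # shape (1, k); entries are unconstrained reals

# theta is the collection of all weights and biases.
num_params = sum(p.numel() for p in toy_model.parameters())
print(tuple(logits.shape), num_params)
```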

### Loss Function

$ \ell: \mathbb{R}^k \times \mathbb{Z}_+ \rightarrow \mathbb{R}_+ $

A mapping from the model predictions and true labels to a non-negative number.

$\mathbb{R}^k$ is the model output, i.e. the logits, whose entries can be positive or negative.

$\mathbb{Z}_+$ is the index of the true class, i.e. a number from $1$ to $k$.

- The loss the classifier achieves on input $x$ with true class $y$ is
  - $\ell(h_\theta(x), y)$
  - $x \in \mathcal{X}$ is the input
  - $y \in \mathbb{Z}$ is the true class

Cross Entropy Loss (or softmax loss)

This is the most common loss for classification.

- \[\ell (h_\theta (x), y) = \log \left ( \sum_{j=1}^k \exp(h_\theta (x)_j) \right ) - h_\theta (x)_y\]
  where $h_\theta(x)_j$ denotes the $j^{th}$ element of the vector $h_\theta(x)$
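The formula can be checked numerically. The sketch below uses made-up logits for a hypothetical $k = 3$ problem and compares the expression written out by hand with PyTorch's built-in cross-entropy. Note that PyTorch class indices are 0-based, whereas the notation here counts classes from $1$ to $k$.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0, 0.5]])  # hypothetical h_theta(x), k = 3
y = torch.tensor([0])                      # true class index (0-based)

# log(sum_j exp(h_j)) - h_y, written out directly.
manual = torch.logsumexp(logits, dim=1) - logits[0, y]
builtin = F.cross_entropy(logits, y)
print(manual.item(), builtin.item())
```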

This comes from softmax activation

Softmax Operator

- $\sigma : \mathbb{R}^k \rightarrow \mathbb{R}^k$
  - A mapping from the class logits returned by $h_\theta$ to a probability distribution
  - The goal of training a neural network is to maximize the probability of the true class
  - $\sigma(z)_i = \frac{\exp(z_i)}{\sum_{j=1}^{k}\exp(z_{j})}$

Since probabilities can get vanishingly small, it is common to maximize the log of the probability of the true class instead.
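A short sketch of the softmax operator and of why the log is preferred. The logit values are arbitrary illustrations. Subtracting the maximum before exponentiating is a common numerical-stability trick and is an addition here, not part of the definition. With a large gap between logits, the probability of the unlikely class underflows to zero in 32-bit floats, while its log-probability stays finite.

```python
import torch
import torch.nn.functional as F


def softmax(z):
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    e = torch.exp(z - z.max())
    return e / e.sum()


z = torch.tensor([1.0, 2.0, 3.0])
p = softmax(z)  # non-negative entries that sum to 1

z_extreme = torch.tensor([0.0, 200.0])
p_extreme = softmax(z_extreme)                    # first entry underflows to 0
log_p_extreme = F.log_softmax(z_extreme, dim=0)   # first entry is about -200
print(p, p_extreme[0].item(), log_p_extreme[0].item())
```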

- Now, $h_\theta(x)$ is a logit vector with $y$ as the true class
- The probability vector is $\sigma(h_\theta(x))$
- The predicted probability for the true class is $\sigma(h_\theta(x))_y$
- The $\log$ of the predicted probability, which is to be maximized, is
- \[\log \sigma(h_\theta(x))_y = \log \left(\frac{\exp(h_\theta(x)_y)}{\sum_{j=1}^{k}\exp(h_\theta(x)_{j})} \right) = h_\theta(x)_y - \log \left (\sum_{j=1}^{k}\exp(h_\theta(x)_{j}) \right )\]

Since the convention is to minimize a loss rather than maximize a probability, we use the negation of this quantity as our loss function.

```
loss = nn.CrossEntropyLoss()(model(img_tensor), target=torch.LongTensor([341]))
loss = loss.item()
print(loss)  # 0.003882253309711814
```

- A small loss corresponds to a high predicted probability for the true class: a loss of about $0.0039$ corresponds to a probability of $e^{-0.0039} \approx 0.996$
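Because the loss is the negative log-probability of the true class, exponentiating its negation recovers that probability. A quick check using the loss value printed above:

```python
import math

loss = 0.003882253309711814          # loss value reported above
prob_true_class = math.exp(-loss)    # probability assigned to the true class
print(round(prob_true_class, 4))     # 0.9961
```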

### Creating an Adversarial Example

- Training approach
  - The goal is to optimize the parameters $\theta$ so as to minimize the average loss over the training set $\{x_i \in \mathcal{X}, y_i \in \mathbb{Z}\}$, $i=1,\ldots,m$
  - Average loss $ = \frac{1}{m} \sum\limits_{i=1}^m \ell(h_\theta(x_i), y_i) $

- Thus, the optimization problem is
  - $ \min\limits_\theta \frac{1}{m} \sum\limits_{i=1}^m \ell(h_\theta(x_i), y_i) $

- We solve this optimization problem by (stochastic) gradient descent over minibatches $\mathcal{B} \subseteq \{1,\ldots,m\}$
  - We compute the gradient of the loss with respect to $\theta$ and make a small adjustment to $\theta$ in the negative gradient direction
  - The loss for each example is $ \ell(h_\theta(x_i), y_i) $ for $i \in \mathcal{B}$
  - Its gradient is $ \nabla_\theta \ell(h_\theta(x_i), y_i) $ for $i \in \mathcal{B}$
  - The minibatch gradient is the average $ \frac{1}{|\mathcal{B}|} \sum\limits_{i \in \mathcal{B}} \nabla_\theta \ell(h_\theta(x_i), y_i) $
  - The update is
    \[\theta := \theta - \frac{\alpha}{|\mathcal{B}|} \sum\limits_{i \in \mathcal{B}} \nabla_\theta \ell(h_\theta(x_i), y_i)\]
    where $\alpha$ is the step size

- We repeat the process for different minibatches covering the entire training set, until the parameters converge.
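The update rule can be sketched as a plain training loop. Everything concrete below is an assumption made for illustration: the synthetic data, the linear model standing in for $h_\theta$, the step size `alpha = 0.1`, the batch size and the number of epochs. The update is written by hand rather than with an optimizer class so that it mirrors the formula term by term.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m, d, k = 256, 20, 3                     # hypothetical dataset size, input dim, classes
X = torch.randn(m, d)
y = (X @ torch.randn(d, k)).argmax(dim=1)  # synthetic labels from a random linear rule

model = nn.Linear(d, k)                  # toy h_theta
loss_fn = nn.CrossEntropyLoss()          # averages the loss over the minibatch
alpha, batch_size = 0.1, 32


def average_loss():
    with torch.no_grad():
        return loss_fn(model(X), y).item()


initial_loss = average_loss()
for epoch in range(20):
    perm = torch.randperm(m)             # fresh minibatches each pass over the data
    for start in range(0, m, batch_size):
        idx = perm[start:start + batch_size]
        loss = loss_fn(model(X[idx]), y[idx])
        model.zero_grad()
        loss.backward()                  # averaged gradient with respect to theta
        with torch.no_grad():
            for p in model.parameters():
                p -= alpha * p.grad      # theta := theta - alpha * minibatch gradient
final_loss = average_loss()
print(initial_loss, final_loss)
```

In practice `torch.optim.SGD` performs the same parameter update; the explicit loop is only meant to show where each symbol of the formula appears.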
