# Introduction to adversarial robustness

This lesson is from Adversarial Robustness - Theory and Practice

# Introduction

• The goal is to develop classifiers that are robust to perturbations of their inputs
• made by an adversary intent on fooling the classifier
• The running example is image classification in PyTorch
• Preprocessing transforms the image to approximately zero mean and unit variance
• The perturbation is added to the original (unnormalized) image, before normalization
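
To make the last two points concrete, here is a minimal sketch in plain Python rather than PyTorch. It assumes the channel means and standard deviations commonly used with torchvision's ImageNet models; whether the lesson's `utils.normalize` uses exactly these values is an assumption, and the function names `normalize_pixel` and `perturb_then_normalize` are hypothetical.

```python
# ImageNet channel statistics commonly used with torchvision models
# (assumed here; the lesson's utils.normalize may differ).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Map one RGB pixel in [0, 1] to roughly zero mean and unit variance."""
    return tuple((c - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))

def perturb_then_normalize(rgb, delta):
    """Add the perturbation in pixel space first, then normalize."""
    perturbed = tuple(c + d for c, d in zip(rgb, delta))
    return normalize_pixel(perturbed)

pixel = (0.5, 0.5, 0.5)            # hypothetical mid-gray pixel
delta = (2 / 255, -2 / 255, 0.0)   # hypothetical perturbation in pixel units

clean = normalize_pixel(pixel)
adv = perturb_then_normalize(pixel, delta)

# In normalized space the same perturbation is rescaled by 1 / std per channel.
shift = tuple(a - c for a, c in zip(adv, clean))
print(shift)
```

Keeping the perturbation in pixel space means a bound such as 2/255 has a direct interpretation as a change in pixel intensity.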

## ImageNet

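import PIL.Image
import torchvision

import utils  # assumed: the lesson's helper module (preprocess, normalize, pred_class)

# Load a pretrained ResNet-50 and switch it to inference mode
model = torchvision.models.resnet50(pretrained=True)
model.eval()

# Load the image, convert it to a tensor, and add a batch dimension
img = "data/imgs/pig.jpg"
img = PIL.Image.open(img)
img_tensor = utils.preprocess(img).unsqueeze(dim=0)
# Normalize to approximately zero mean and unit variance
img_tensor = utils.normalize(img_tensor)

# Forward pass: pred holds the class logits
pred = model(img_tensor)
label, name = utils.pred_class(pred)

print(label, name) # 341 hog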


## Notations

### Model or Hypothesis function

• $h_\theta : \mathcal{X} \rightarrow \mathbb{R}^k$
• a mapping from the input space (here a 3D image tensor) to the output space, a $k$-dimensional vector
• $k$ is the number of classes being predicted
• For the PyTorch ResNet used here, the output is a vector of logits, so its entries may be positive or negative real numbers
• $\theta$ represents the parameters defining the model
• convolutional filters, fully-connected layer weight matrices, biases, etc.
• trained parameters

### Loss Function

• $\ell: \mathbb{R}^k \times \mathbb{Z}_+ \rightarrow \mathbb{R}_+$

• mapping from the model predictions and true labels to a non-negative number

• $\mathbb{R}^k$ is the model output, i.e. the logits, whose entries can be positive or negative

• $\mathbb{Z}_+$ is the index of true class i.e. number from $1$ to $k$

• The loss the classifier achieves on input $x$ with true class $y$ is
• $\ell(h_\theta(x), y)$
• $x \in \mathcal{X}$ is the input
• $y \in \mathbb{Z}_+$ is the true class
• Cross-entropy loss (or softmax loss)

• the most common loss for classification

• $\ell (h_\theta (x), y) = \log \left ( \sum_{j=1}^k \exp(h_\theta (x)_j) \right ) - h_\theta (x)_y$
• where $h_\theta(x)_j$ denotes the $j^{th}$ element of the vector $h_\theta(x)$

• This comes from softmax activation

• Softmax Operator

• $\sigma : \mathbb{R}^k \rightarrow \mathbb{R}^k$
• is a mapping from the class logits returned by $h_\theta$ to a probability distribution
• the goal of training a neural network is to maximize the probability of the true class
• $\sigma(z)_i = \frac{\exp(z_i)}{\sum_{j=1}^{k}\exp(z_{j})}$
• Since probabilities can get vanishingly small, it is common to maximize the log of the probability of the true class

• Now, $h_\theta(x)$ is the logit vector and $y$ is the true class
• The probability vector is $\sigma(h_\theta(x))$
• The predicted probability of the true class is $\sigma(h_\theta(x))_y$
• The $\log$ of the predicted probability, which is to be maximized, is
• $\log \sigma(h_\theta(x))_y = \log \left(\frac{\exp(h_\theta(x)_y)}{\sum_{j=1}^{k}\exp(h_\theta(x)_{j})} \right) = h_\theta(x)_y - \log \left (\sum_{j=1}^{k}\exp(h_\theta(x)_{j}) \right )$
• Since the convention is to minimize a loss rather than maximize a probability, we use the negation of this quantity as our loss function
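
The derivation above can be checked numerically. The following is a minimal sketch in plain Python, not the PyTorch implementation; the logits and the true class are hypothetical, and class indices are 0-based here whereas the notation above counts from $1$ to $k$.

```python
import math

def log_sum_exp(logits):
    """Numerically stable log(sum_j exp(z_j))."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))

def softmax(logits):
    """sigma(z)_i = exp(z_i) / sum_j exp(z_j)."""
    lse = log_sum_exp(logits)
    return [math.exp(z - lse) for z in logits]

def cross_entropy(logits, y):
    """Loss = log(sum_j exp(z_j)) - z_y, with y a 0-based class index."""
    return log_sum_exp(logits) - logits[y]

logits = [2.0, -1.0, 0.5]   # hypothetical logits for k = 3 classes
y = 0                       # hypothetical true class

probs = softmax(logits)
loss = cross_entropy(logits, y)

# The loss equals the negative log-probability of the true class.
print(loss, -math.log(probs[y]))
```

Subtracting the maximum logit before exponentiating does not change the result but avoids overflow for large logits.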

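import torch
import torch.nn as nn

# Cross-entropy loss of the prediction against the true class (index 341, hog)
loss = nn.CrossEntropyLoss()(model(img_tensor), target=torch.LongTensor([341]))
loss = loss.item()
print(loss) # 0.003882253309711814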

• A small loss corresponds to a high probability for the true class: a loss of $0.0039$ corresponds to a probability of $e^{-0.0039} \approx 0.996$
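
Because the loss is the negative log-probability of the true class, the conversion is a single exponential. A small sketch using the loss value reported in the notes:

```python
import math

loss = 0.003882253309711814       # loss value reported in the notes
prob_true_class = math.exp(-loss)  # invert loss = -log(probability)
print(round(prob_true_class, 4))   # 0.9961
```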

### Creating an Adversarial Example

• Training Approach
• is to optimize the parameters $\theta$ so as to minimize the average loss over the training set $\{x_i \in \mathcal{X}, y_i \in \mathbb{Z}\}$, $i=1,\ldots,m$
• Average loss $= \frac{1}{m} \sum\limits_{i=1}^m \ell(h_\theta(x_i), y_i)$
• Thus, the optimization problem is
• $\min\limits_\theta \frac{1}{m} \sum\limits_{i=1}^m \ell(h_\theta(x_i), y_i)$
• We solve this optimization problem by (stochastic) gradient descent: for some minibatch $\mathcal{B} \subseteq \{1,\ldots,m\}$
• we compute the gradient of the loss with respect to $\theta$ and make a small adjustment to $\theta$ in the direction of the negative gradient
• The loss function is $\ell(h_\theta(x_i), y_i)$ for $i \in \mathcal{B}$
• The gradient of the loss function is $\nabla_\theta \ell(h_\theta(x_i), y_i)$ for $i \in \mathcal{B}$
• Averaged over the minibatch, the gradient is
• $\frac{1}{|\mathcal{B}|} \sum\limits_{i \in \mathcal{B}} \nabla_\theta \ell(h_\theta(x_i), y_i)$

• $\theta := \theta - \frac{\alpha}{|\mathcal{B}|} \sum\limits_{i \in \mathcal{B}} \nabla_\theta \ell(h_\theta(x_i), y_i)$
• where $\alpha$ is the step size
• We repeat the process for different minibatches covering the entire training set, until the parameters converge
• $\nabla_\theta \ell(h_\theta(x_i), y_i)$

• measures how a small adjustment to each of the parameters $\theta$ affects the loss function
• is computed by backpropagation
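
The minibatch update rule can be sketched on a deliberately tiny problem. The example below is an illustration only: it uses a hypothetical one-parameter least-squares model with a hand-written gradient instead of a neural network and backpropagation, and the names `grad` and `sgd` are made up for this sketch.

```python
import random

def grad(theta, x, y):
    """Gradient w.r.t. theta of the per-example loss 0.5 * (theta * x - y) ** 2."""
    return (theta * x - y) * x

def sgd(data, alpha=0.1, batch_size=4, epochs=50, seed=0):
    """theta := theta - alpha / |B| * sum of per-example gradients over B."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # a fresh set of minibatches covering the data
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            g = sum(grad(theta, *data[i]) for i in batch) / len(batch)
            theta -= alpha * g  # step in the direction of the negative gradient
    return theta

# Hypothetical noiseless data generated with a true parameter of 3.0
data = [(x / 10, 3.0 * x / 10) for x in range(1, 21)]
theta = sgd(data)
print(theta)
```

With noiseless data and this step size, each update shrinks the distance to the true parameter, so the estimate settles very close to 3.0.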

• Gradient of the loss with respect to the input $x_i$

• measures how small changes to the image affect the loss function
• the image is adjusted to maximize the loss, instead of adjusting $\theta$ to minimize it
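
For intuition, the input gradient can be written in closed form for a linear model. The sketch below assumes a hypothetical 2-class model with logits $Wx$, for which the gradient of the cross-entropy loss with respect to $x$ is $W^\top(\sigma(Wx) - e_y)$; a real network would obtain the same quantity by backpropagation. It takes one step up the gradient and confirms that the loss increases.

```python
import math

# Hypothetical 2-class linear model: logits = W @ x
W = [[1.0, -1.0],
     [-1.0, 1.0]]

def logits(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss(x, y):
    """Cross-entropy loss: log-sum-exp of the logits minus the true-class logit."""
    z = logits(x)
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z)) - z[y]

def grad_wrt_input(x, y):
    """d loss / d x = W^T (softmax(W x) - onehot(y)) for a linear model."""
    p = softmax(logits(x))
    err = [p_j - (1.0 if j == y else 0.0) for j, p_j in enumerate(p)]
    return [sum(W[j][i] * err[j] for j in range(len(W))) for i in range(len(x))]

x, y = [1.0, 0.0], 0                             # hypothetical input and true class
g = grad_wrt_input(x, y)
x_adv = [xi + 0.5 * gi for xi, gi in zip(x, g)]  # step *up* the gradient

print(loss(x, y), loss(x_adv, y))  # the second value is larger
```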

• Thus $$\min\limits_\theta \frac{1}{m} \sum\limits_{i=1}^m \ell(h_\theta(x_i), y_i) \quad \text{becomes} \quad \operatorname*{maximize}_{\hat{x}} \; \ell(h_\theta(\hat{x}), y)$$

• $\hat{x}$ denotes the adversarial example that maximizes the loss
• To keep $\hat{x}$ close to $x$
• we optimize over the perturbation to $x$, denoted by $\delta$, rather than over $\hat{x}$ directly
• $\operatorname*{maximize}_{\delta \in \Delta} \ell(h_\theta(x +\delta), y)$
• where $\Delta$ represents the allowable set of perturbations
• A common perturbation set is the $\ell_\infty$ ball, defined by $\Delta = \{\delta : \|\delta\|_\infty \leq \epsilon\}$
• The $\ell_\infty$ norm of a vector $z$ is defined as
• $\|z\|_\infty = \max\limits_{i} |z_i|$
• e.g. the $\ell_\infty$ norm of the vector $z = [-6, 4, 2]$ is $6$
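
As a small illustration of the perturbation set, the sketch below computes the $\ell_\infty$ norm and projects a vector onto the $\ell_\infty$ ball of radius $\epsilon$ by clipping each entry. The helper names `linf_norm` and `project_linf` and the example perturbation are hypothetical.

```python
def linf_norm(z):
    """Largest absolute entry of z."""
    return max(abs(v) for v in z)

def project_linf(delta, epsilon):
    """Clip each entry to [-epsilon, epsilon]: projection onto the l-infinity ball."""
    return [max(-epsilon, min(epsilon, d)) for d in delta]

print(linf_norm([-6, 4, 2]))  # 6

epsilon = 2 / 255
delta = project_linf([0.05, -0.001, -0.02], epsilon)  # hypothetical perturbation
print(delta, linf_norm(delta) <= epsilon)
```

Entries already inside the bound are left unchanged; only those that exceed $\epsilon$ in magnitude are clipped.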
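
import PIL.Image
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision

import utils  # assumed: the lesson's helper module (preprocess, normalize, pred_class)

model = torchvision.models.resnet50(pretrained=True)
model.eval()

epsilon = 2./255  # l-infinity bound on the perturbation, in pixel units
target = 341      # index of the true class (hog)

img = "data/imgs/pig.jpg"
img = PIL.Image.open(img)
img_tensor = utils.preprocess(img).unsqueeze(dim=0)
target_tensor = torch.LongTensor([target])

# The perturbation is the only variable being optimized
delta_tensor = torch.zeros_like(img_tensor, requires_grad=True)
opt = optim.SGD([delta_tensor], lr=1e-1) # optimizer on delta

for t in range(30):
    # Add delta in pixel space, then normalize
    norm_tensor = utils.normalize(img_tensor + delta_tensor)
    pred = model(norm_tensor)

    # Minimizing the negated loss maximizes the cross-entropy loss
    loss = nn.CrossEntropyLoss()(pred, target_tensor)
    loss = -loss

    if t % 5 == 0:
        print(t, loss.item())

    opt.zero_grad()  # clear the gradient left over from the previous step
    loss.backward()
    opt.step() # will update delta_tensor

    # Project delta back onto the l-infinity ball of radius epsilon
    delta_tensor.data.clamp_(-epsilon, epsilon)

# 0 -0.003882253309711814
# 5 -0.006934622768312693
# 10 -0.015804270282387733
# 15 -0.08014067262411118
# 20 -11.92103385925293
# 25 -13.965073585510254

label, name, prob = utils.pred_class(pred)
print(label, name, prob) # 106 wombat 0.999923586845398

# Probability the model now assigns to the true class
prob = nn.Softmax(dim=1)(pred)[0][341].item()
print(prob) # 1.3545100046030711e-06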
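
The structure of this loop (gradient ascent on $\delta$ followed by clipping to $[-\epsilon, \epsilon]$) can be sanity-checked without a pretrained network. The sketch below is a toy stand-in, not the lesson's ResNet attack: it assumes a hypothetical 2-class linear model whose input gradient is available in closed form, and the function name `attack` and all numbers are made up for illustration.

```python
import math

# Hypothetical 2-class linear model: logits = W @ x (not the lesson's ResNet)
W = [[1.0, -1.0],
     [-1.0, 1.0]]

def logits(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def grad_wrt_input(x, y):
    """Closed-form gradient of the cross-entropy loss w.r.t. x for logits = W @ x."""
    p = softmax(logits(x))
    err = [p_j - (1.0 if j == y else 0.0) for j, p_j in enumerate(p)]
    return [sum(W[j][i] * err[j] for j in range(len(W))) for i in range(len(x))]

def attack(x, y, epsilon, lr=0.1, steps=30):
    """Maximize the loss over delta subject to an l-infinity bound of epsilon."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        g = grad_wrt_input(x_adv, y)
        # Gradient ascent on the loss, then projection onto the l-infinity ball
        delta = [di + lr * gi for di, gi in zip(delta, g)]
        delta = [max(-epsilon, min(epsilon, di)) for di in delta]
    return delta

def predict(x):
    z = logits(x)
    return max(range(len(z)), key=lambda j: z[j])

x, y = [0.55, 0.45], 0   # clean input, correctly classified as class 0
delta = attack(x, y, epsilon=0.1)
x_adv = [xi + di for xi, di in zip(x, delta)]

pred_clean, pred_adv = predict(x), predict(x_adv)
print(pred_clean, pred_adv)  # 0 1
```

Here the perturbation saturates at the bound in every coordinate, which is typical when the loss keeps increasing in the same direction.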