Introduction to adversarial robustness
This lesson is from Adversarial Robustness - Theory and Practice
Introduction
- Adversarial robustness
  - developing classifiers that are robust to perturbations of their inputs
  - made by an adversary intent on fooling the classifier
- Image Classification in PyTorch
  - the preprocessing transforms the image to approximately zero mean and unit variance
  - the perturbation is added to the original (unnormalised) image, not to the normalised one (see the sketch below)
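A minimal sketch of what such preprocessing helpers could look like (the actual utils.preprocess and utils.normalize used in these notes may differ; the mean and standard deviation are the standard torchvision ImageNet values):

import torch
import torchvision.transforms as transforms

# resize/crop the image and convert it to a [0, 1] tensor, without normalising yet
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

imagenet_mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
imagenet_std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def normalize(x):
    # map a [0, 1] image tensor to approximately zero mean and unit variance per channel
    return (x - imagenet_mean) / imagenet_std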
ImageNet
import PIL.Image
import torchvision
import utils  # helper module accompanying these notes (preprocess, normalize, pred_class)

model = torchvision.models.resnet50(pretrained=True)
model.eval()

img = PIL.Image.open("data/imgs/pig.jpg")
img_tensor = utils.preprocess(img).unsqueeze(dim=0)  # add batch dimension
img_tensor = utils.normalize(img_tensor)             # zero mean, unit variance per channel
pred = model(img_tensor)                             # logits, one score per ImageNet class
label, name = utils.pred_class(pred)
print(label, name)  # 341 hog
Notations
Model or Hypothesis function
- $ h_\theta : \mathcal{X} \rightarrow \mathbb{R}^k $
- a mapping from the input space (e.g. a 3D image tensor) to the output space, which is a $k$-dimensional vector
- $k$ is the number of classes being predicted
- For the PyTorch ResNet used here, the outputs are logits, so they can be positive or negative real numbers
- $\theta$ represents the trained parameters defining the model
  - convolutional filters, fully-connected layer weight matrices, biases, etc.
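For the ImageNet ResNet-50 above, $k = 1000$; continuing the earlier snippet, this can be checked from the shape of the logits:

# the hypothesis function returns a batch of k-dimensional logit vectors
print(pred.shape)  # torch.Size([1, 1000]) -> k = 1000 ImageNet classes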
Loss Function
$ \ell: \mathbb{R}^k \times \mathbb{Z}_+ \rightarrow \mathbb{R}_+ $
mapping from the model predictions and true labels to a non-negative number
$\mathbb{R}^k$ - the model output, i.e. the logits, which can be positive or negative
$\mathbb{Z}_+$ - the index of the true class, i.e. a number from $1$ to $k$
- The loss the classifier achieves with input $x$ and true class $y$
- $\ell(h_\theta(x), y)$
- $x \in \mathcal{X}$ as input
- $y \in \mathbb{Z}_+$ is the true class
Cross Entropy Loss (or softmax loss)
The most common loss used for classification
- \[\ell (h_\theta (x), y) = \log \left ( \sum_{j=1}^k \exp(h_\theta (x)_j) \right ) - h_\theta (x)_y\]
where $h_\theta(x)_j$ denotes the $j^{th}$ element of the vector $h_\theta(x)$
This comes from softmax activation
Softmax Operator
- $\sigma : \mathbb{R}^k \rightarrow \mathbb{R}^k$
- a mapping from the class logits returned by $h_\theta$ to a probability distribution
- the goal of training a neural network is to maximize the probability of the true class
- $\sigma(z)_i = \frac{\exp(z_i)}{\sum_{j=1}^{k}\exp(z_{j})}$
Since probabilities get vanishingly small, it is common to maximize the log of the probability of the true class
- Now, $h_\theta(x)$ is the logit vector and $y$ is the true class
- The probability vector is $\sigma(h_\theta(x))$
- The predicted probability for the true class is $\sigma(h_\theta(x))_y$
- The $\log$ of the predicted probability, which is to be maximized, is
- \[\log \sigma(h_\theta(x))_y = \log \left(\frac{\exp(h_\theta(x)_y)}{\sum_{j=1}^{k}\exp(h_\theta(x)_{j})} \right) = h_\theta(x)_y - \log \left (\sum_{j=1}^{k}\exp(h_\theta(x)_{j}) \right )\]
Since the convention is to minimize the loss rather than maximize the probability, we use the negation of this quantity as our loss function
import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()(model(img_tensor), target=torch.LongTensor([341]))
loss = loss.item()
print(loss)  # 0.003882253309711814
- If the loss is small, e.g. $0.0039$, it corresponds to a predicted probability of $e^{-0.0039} \approx 0.996$ for the true class
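A quick sanity check of the cross-entropy formula above, computed directly from the logits (this assumes the variables from the previous snippet are still in scope; small floating-point differences are expected):

# cross-entropy from the definition: log-sum-exp of the logits minus the true-class logit
logits = model(img_tensor)[0]                      # shape (1000,)
manual_loss = torch.logsumexp(logits, dim=0) - logits[341]
print(manual_loss.item())                          # ~0.0039, matches nn.CrossEntropyLoss

# predicted probability of the true class, via softmax
print(torch.softmax(logits, dim=0)[341].item())    # ~0.996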
Creating an Adversarial Example
- Training Approach
- is to optimize the parameters $\theta$ so as to minimize the average loss over the training set $\{x_i \in \mathcal{X}, y_i \in \mathbb{Z}\}$, $i = 1, \ldots, m$
- Average Loss $ = \frac{1}{m} \sum\limits_{i=1}^m \ell(h_\theta(x_i), y_i) $
- Thus, Optimization Problem is
- $ \min\limits_\theta \frac{1}{m} \sum\limits_{i=1}^m \ell(h_\theta(x_i), y_i) $
- We solve the optimization problem by (stochastic) gradient descent for some minibatch $\mathcal{B} \subseteq \{1,\ldots,m\}$
- We compute gradient of loss with respect to $\theta$ and make small adjustment to $\theta$ in the negative direction
- Loss Function $ \ell(h_\theta(x_i), y_i) $ for $i \in \mathcal{B}$
- Gradient of Loss Function is $ \nabla_\theta \ell(h_\theta(x_i), y_i) $ for $i \in \mathcal{B}$
- Mini Batch
$ \frac{1}{\mid \mathcal{B} \mid} \sum\limits_{i \in \mathcal{B}} \nabla_\theta \ell(h_\theta(x_i), y_i) $
- \[\theta := \theta - \frac{\alpha}{|\mathcal{B}|} \sum\limits_{i \in \mathcal{B}} \nabla_\theta \ell(h_\theta(x_i), y_i)\]
- where $\alpha$ is step size
- We repeat the process for different mini-batches covering the entire training set, until the parameters converge.
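A minimal sketch of this minibatch SGD loop in PyTorch (illustrative only; model, train_loader and num_epochs are assumed placeholders rather than objects defined in these notes):

import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)    # lr is the step size alpha

for epoch in range(num_epochs):
    for x_batch, y_batch in train_loader:            # a minibatch B of (x_i, y_i)
        optimizer.zero_grad()
        loss = criterion(model(x_batch), y_batch)    # average loss over the minibatch
        loss.backward()                              # gradient w.r.t. theta via backpropagation
        optimizer.step()                             # theta := theta - (alpha/|B|) * sum of gradients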
- Gradient $\nabla_\theta \ell(h_\theta(x_i), y_i)$
  - computes how a small adjustment to each of the parameters $\theta$ will affect the loss function
  - computed by backpropagation
Adversarial
Gradient of the loss with respect to the input $x_i$
- computes how small changes to the image affect the loss function
The image is adjusted to maximize the loss
Thus, the training objective
$$ \min\limits_\theta \frac{1}{m} \sum\limits_{i=1}^m \ell(h_\theta(x_i), y_i) $$
becomes
$$ \operatorname*{maximize}_{\hat{x}} \; \ell(h_\theta(\hat{x}), y) $$
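A minimal sketch of computing the gradient of the loss with respect to the input rather than the parameters, reusing model, img_tensor and the true class 341 from the earlier snippets:

# track gradients on the input image itself
x = img_tensor.clone().requires_grad_(True)
loss = nn.CrossEntropyLoss()(model(x), torch.LongTensor([341]))
loss.backward()
print(x.grad.shape)  # same shape as the image: how each pixel affects the loss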
- $\hat{x}$ denotes the adversarial example, i.e. the input that maximizes the loss
- In order to keep $\hat{x}$ close to $x$
- we write $\hat{x} = x + \delta$ and optimize over the perturbation $\delta$
- $\operatorname*{maximize}_{\delta \in \Delta} \ell(h_\theta(x + \delta), y)$
- where $\Delta$ represents allowable set of perturbations
- A common perturbation set to use is the $\ell_\infty$ ball, defined by $\Delta = \{\delta : \|\delta\|_\infty \leq \epsilon\}$
- The $\ell_\infty$ norm of a vector $z$ is defined as
  - $\|z\|_\infty = \max\limits_{i} |z_i|$
  - e.g. the $\ell_\infty$ norm of the vector $X = [-6, 4, 2]$ is $6$
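For example, this norm is easy to compute in PyTorch (a small illustrative check):

import torch

z = torch.tensor([-6.0, 4.0, 2.0])
print(z.abs().max().item())  # 6.0, the l-infinity norm of z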
import torch.optim as optim

model = torchvision.models.resnet50(pretrained=True)
model.eval()

epsilon = 2./255           # maximum l-infinity perturbation, in the original [0, 1] pixel scale
target = 341               # true class: hog

img = PIL.Image.open("data/imgs/pig.jpg")
img_tensor = utils.preprocess(img).unsqueeze(dim=0)   # unnormalised image; the perturbation is added here
target_tensor = torch.LongTensor([target])

delta_tensor = torch.zeros_like(img_tensor, requires_grad=True)  # the perturbation delta
opt = optim.SGD([delta_tensor], lr=1e-1)  # optimizer over delta, not the model weights
for t in range(30):
    # perturb in the original image space, then normalise before the forward pass
    norm_tensor = utils.normalize(img_tensor + delta_tensor)
    pred = model(norm_tensor)
    loss = -nn.CrossEntropyLoss()(pred, target_tensor)  # negated: SGD minimises, but we want to maximise the loss
    if t % 5 == 0:
        print(t, loss.item())
    opt.zero_grad()
    loss.backward()
    opt.step()                                     # updates delta_tensor
    delta_tensor.data.clamp_(-epsilon, epsilon)    # project delta back into the l-infinity ball
# 0 -0.003882253309711814
# 5 -0.006934622768312693
# 10 -0.015804270282387733
# 15 -0.08014067262411118
# 20 -11.92103385925293
# 25 -13.965073585510254
# the perturbed image is now confidently classified as "wombat"
label, name, prob = utils.pred_class(pred)
print(label, name, prob)  # 106 wombat 0.999923586845398

# the probability assigned to the true class "hog" has collapsed
prob = nn.Softmax(dim=1)(pred)[0][341].item()
print(prob)  # 1.3545100046030711e-06
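To inspect the adversarial image itself, the perturbed tensor can be clamped to the valid pixel range and converted back to a PIL image (a sketch, assuming utils.preprocess returns values in [0, 1]; the output path is just illustrative):

import torchvision.transforms as transforms

adv_tensor = (img_tensor + delta_tensor).detach().clamp(0, 1)  # keep pixels in the valid range
adv_img = transforms.ToPILImage()(adv_tensor[0])               # drop the batch dimension
adv_img.save("data/imgs/pig_adversarial.jpg")                  # visually indistinguishable from the original pig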