Introduction to adversarial robustness
This lesson is from Adversarial Robustness - Theory and Practice
Introduction
- Adversarial robustness
- Developing classifiers that are robust to perturbations of their inputs
- by an adversary intent on fooling the classifier
- Image Classification in PyTorch
- normalise the image to approximately zero mean and unit variance (per channel)
- the perturbation is added to the original, unnormalised image; normalisation is applied afterwards (a sketch of such helpers is given below)
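The utils.preprocess and utils.normalize helpers used in the code below are not shown in these notes; the following is a minimal sketch of what they might look like, assuming a standard torchvision resize-and-crop pipeline and the usual ImageNet channel statistics (the exact helper implementations are an assumption, not the tutorial's code).

import torch
import torchvision.transforms as transforms

# Assumed sketch of the notes' helpers: convert a PIL image to a CHW float tensor
# in [0, 1]; normalisation is kept separate so the adversarial perturbation can be
# added to the unnormalised image first.
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

imagenet_mean = torch.tensor([0.485, 0.456, 0.406])
imagenet_std = torch.tensor([0.229, 0.224, 0.225])

def normalize(x):
    # x: (B, 3, H, W) tensor in [0, 1]; subtract mean and divide by std per channel
    return (x - imagenet_mean[None, :, None, None]) / imagenet_std[None, :, None, None]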
ImageNet
import torch
import torch.nn as nn
import torchvision
import PIL.Image
import utils   # the notes' helper functions: preprocess, normalize, pred_class

model = torchvision.models.resnet50(pretrained=True)
model.eval()   # inference mode: fixes batch-norm statistics, disables dropout

img = "data/imgs/pig.jpg"
img = PIL.Image.open(img)
img_tensor = utils.preprocess(img).unsqueeze(dim=0)   # add batch dimension
img_tensor = utils.normalize(img_tensor)              # ~zero mean, unit variance per channel

pred = model(img_tensor)                # logits over the 1000 ImageNet classes
label, name = utils.pred_class(pred)
print(label, name)  # 341 hog
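utils.pred_class is not shown either; presumably it takes the argmax over the 1000 logits and maps the index to the human-readable ImageNet class name. The argmax itself can be done directly:

label = pred.argmax(dim=1).item()   # index of the largest logit
print(label)  # 341, the index of "hog" in the ImageNet class list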
Notations
Model or Hypothesis function
- h_\theta : \mathcal{X} \rightarrow \mathbb{R}^k
- mapping from the input space (here, a 3D image tensor) to the output space, a k-dimensional vector
- k is the number of classes being predicted
- for the PyTorch ResNet-50 above, the outputs are logits, so they can be any positive or negative real numbers (see the quick shape check after this list)
- \theta represents the parameters defining the model, i.e. the trained parameters
- convolutional filters, fully-connected layer weight matrices, biases, etc.
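A quick shape check (continuing the snippet above) makes the mapping concrete: a batched image tensor goes in, and a k = 1000 dimensional logit vector comes out.

print(img_tensor.shape)  # e.g. torch.Size([1, 3, 224, 224]) -- batched image tensor
print(pred.shape)        # torch.Size([1, 1000]) -- one logit per ImageNet class
print(pred.min().item(), pred.max().item())  # logits are unbounded real numbers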
Loss Function
\ell : \mathbb{R}^k \times \mathbb{Z}_+ \rightarrow \mathbb{R}_+
mapping from the model predictions and true labels to a non-negative number
\mathbb{R}^k is the model output, i.e. the logits, which can be positive or negative
\mathbb{Z}_+ is the index of the true class, i.e. a number from 1 to k
- Loss the classifier achieves with input x and true class y
- \ell(h_\theta(x), y)
- where x \in \mathcal{X} is the input
- and y \in \mathbb{Z} is the true class
Cross Entropy Loss (or softmax loss)
most common loss
- \ell(h_\theta(x), y) = \log \left( \sum_{j=1}^{k} \exp(h_\theta(x)_j) \right) - h_\theta(x)_y
where h_\theta(x)_j denotes the j^{th} element of the vector h_\theta(x)
This comes from the softmax activation
Softmax Operator
- \sigma : \mathbb{R}^k \rightarrow \mathbb{R}^k
- a mapping from the class logits returned by h_\theta to a probability distribution
- the goal of training a neural network is to maximize the probability of the true class
- $\sigma(z)_i = \frac{\exp(z_i)}{\sum_{j=1}^{k}\exp(z_{j})}$
Since probabilities can become vanishingly small, it is common to maximize the log of the probability of the true class instead
- Now, h_\theta(x) is the logit vector and y is the true class
- the probability vector is \sigma(h_\theta(x))
- the predicted probability of the true class is \sigma(h_\theta(x))_y
- the log of the predicted probability, which we want to maximize, is
- \log \sigma(h_\theta(x))_y = \log \left(\frac{\exp(h_\theta(x)_y)}{\sum_{j=1}^{k}\exp(h_\theta(x)_{j})} \right) = h_\theta(x)_y - \log \left (\sum_{j=1}^{k}\exp(h_\theta(x)_{j}) \right )
Since the convention is to minimize the loss rather than maximize the probability, we use the negation of this quantity as our loss function
loss = nn.CrossEntropyLoss()(model(img_tensor), target=torch.LongTensor([341]))
loss = loss.item()
print(loss)  # 0.003882253309711814
- If the loss is small, e.g. \approx 0.004, it corresponds to e^{-0.004} \approx 0.996 probability assigned to the true class
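The identity above is easy to verify numerically against nn.CrossEntropyLoss, reusing the pred logits computed earlier; a small sanity check:

# cross-entropy = logsumexp(logits) - logit of the true class
manual_loss = torch.logsumexp(pred, dim=1) - pred[0, 341]
print(manual_loss.item())               # ~0.0039, matches nn.CrossEntropyLoss
print(torch.exp(-manual_loss).item())   # ~0.996, the softmax probability of "hog"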
Creating Adversarial Example
- Training Approach
- is to optimize the parameters \theta so as to minimize the average loss over the training set \{x_i \in \mathcal{X}, y_i \in \mathbb{Z}\},\; i=1,\ldots,m
- Average Loss = \frac{1}{m} \sum\limits_{i=1}^m \ell(h_\theta(x_i), y_i)
- Thus, Optimization Problem is
- \min\limits_\theta \frac{1}{m} \sum\limits_{i=1}^m \ell(h_\theta(x_i), y_i)
- We solve the optimization problem by (stochastic) gradient descent over minibatches \mathcal{B} \subseteq \{1,\ldots,m\}
- We compute gradient of loss with respect to \theta and make small adjustment to \theta in the negative direction
- Loss Function \ell(h_\theta(x_i), y_i) for i \in \mathcal{B}
- Gradient of Loss Function is \nabla_\theta \ell(h_\theta(x_i), y_i) for i \in \mathcal{B}
- Mini-batch gradient: \frac{1}{\mid \mathcal{B} \mid} \sum\limits_{i \in \mathcal{B}} \nabla_\theta \ell(h_\theta(x_i), y_i)
- \theta := \theta - \frac{\alpha}{|\mathcal{B}|} \sum\limits_{i \in \mathcal{B}} \nabla_\theta \ell(h_\theta(x_i), y_i)
- where \alpha is step size
- We repeat the process for different mini-batches covering the entire training set, until the parameters converge.
- Gradient \nabla_\theta \ell(h_\theta(x_i), y_i)
- computes how a small adjustment to each of the parameters \theta will affect the loss function
- computed by backpropagation
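A minimal sketch of this minibatch SGD loop in PyTorch, assuming a generic model and a train_loader that yields (x, y) batches (both are placeholders, not defined in these notes); the optimizer performs the \theta := \theta - \alpha \cdot \nabla update for us:

import torch.nn as nn
import torch.optim as optim

loss_fn = nn.CrossEntropyLoss()              # averages the loss over the minibatch
opt = optim.SGD(model.parameters(), lr=0.1)  # step size alpha = 0.1

for x_batch, y_batch in train_loader:        # minibatch B of the training set
    loss = loss_fn(model(x_batch), y_batch)  # (1/|B|) * sum_i l(h_theta(x_i), y_i)
    opt.zero_grad()
    loss.backward()                          # backpropagation: gradient wrt theta
    opt.step()                               # theta := theta - alpha * gradient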
Adversarial
Gradient of the loss with respect to the input x_i
- computes how small changes to the image itself affect the loss function
The image is then adjusted to maximize the loss
Thus
$$\min\limits_\theta \frac{1}{m} \sum\limits_{i=1}^m \ell(h_\theta(x_i), y_i)$$
becomes
$$\DeclareMathOperator*{\maximize}{maximize} \maximize\limits_{\hat{x}} \ell(h_\theta(\hat{x}), y)$$
- \hat{x} denotes the adversarial example, i.e. the input that maximizes the loss
- In order to keep \hat{x} close to x
- we optimize over a perturbation to x, denoted by \delta
- \maximize\limits_{\delta \in \Delta} \ell(h_\theta(x +\delta), y)
- where \Delta represents the allowable set of perturbations
- A common perturbation set to use is the \ell_\infty ball, defined by \Delta = \{\delta : \|\delta\|_\infty \leq \epsilon\}
- \ell_\infty norm of a vector z is defined as
- $\|z\|_\infty = \max\limits_{i} \mid z_i \mid$
- e.g. the \ell_\infty norm of the vector z = [-6, 4, 2] is 6
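The \ell_\infty norm example above can be checked directly in PyTorch, for instance:

z = torch.tensor([-6., 4., 2.])
print(z.abs().max().item())                  # 6.0
print(torch.norm(z, p=float('inf')).item())  # 6.0, the same l_inf norm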
from torch import optim

model = torchvision.models.resnet50(pretrained=True)
model.eval()

epsilon = 2./255                  # maximum l_inf perturbation
img = "data/imgs/pig.jpg"
target = 341                      # true ImageNet class ("hog")

img = PIL.Image.open(img)
img_tensor = utils.preprocess(img).unsqueeze(dim=0)
target_tensor = torch.LongTensor([target])

delta_tensor = torch.zeros_like(img_tensor, requires_grad=True)
opt = optim.SGD([delta_tensor], lr=1e-1)  # optimizer over delta, not the model weights

for t in range(30):
    # perturbation is added to the unnormalised image, then normalised
    norm_tensor = utils.normalize(img_tensor + delta_tensor)
    pred = model(norm_tensor)
    loss = nn.CrossEntropyLoss()(pred, target_tensor)
    loss = -loss                  # negate: descending on -loss maximizes the loss
    if t % 5 == 0:
        print(t, loss.item())

    opt.zero_grad()
    loss.backward()
    opt.step()                    # updates delta_tensor
    delta_tensor.data.clamp_(-epsilon, epsilon)  # project back into the l_inf ball
# 0 -0.003882253309711814
# 5 -0.006934622768312693
# 10 -0.015804270282387733
# 15 -0.08014067262411118
# 20 -11.92103385925293
# 25 -13.965073585510254
# predicted class for the adversarial image x + delta
label, name, prob = utils.pred_class(pred)
print(label, name, prob)  # 106 wombat 0.999923586845398

# probability the model now assigns to the true class (341, hog)
prob = nn.Softmax(dim=1)(pred)[0][341].item()
print(prob)  # 1.3545100046030711e-06
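As a sanity check, the final perturbation should lie inside the allowed set \Delta, i.e. every entry of delta_tensor should have magnitude at most \epsilon = 2/255:

print(delta_tensor.abs().max().item())               # at most 2/255 ~= 0.0078
print((delta_tensor.abs() <= epsilon).all().item())  # True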