This post covers the paper “Opportunities and Challenges in Deep Learning Adversarial Robustness: A Survey”.
- DNN models are susceptible to small input perturbations, in most cases imperceptible to the human eye.
- Small, targeted additive noise on the input image makes models misclassify objects that were previously identified with 99.99% confidence.
- The models still report high confidence in these wrong predictions. Perturbations that fool a trained model in this way are known as adversarial attacks.
- Explainability may help address the problem but does not by itself improve the model.
- The goal is to generate models that are robust against adversarial attacks.
- Defenses introduce robustness into the models’ layers so that the models are not fooled by out-of-distribution examples, by known or unknown attacks, or by targeted or untargeted attacks.
- Robustness against adversarial attacks is a dual optimization problem: the attacker tries to maximize the loss, while the defender tries to minimize the chance of the model being fooled by the attacker.
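As an added illustration (not taken from the paper, and using notation the post itself does not introduce), the dual optimization above is usually written as a saddle-point problem. Assume a model $f_\theta$ with parameters $\theta$, a loss $\mathcal{L}$, training data $(x, y)$ drawn from $\mathcal{D}$, and a perturbation $\delta$ whose $\ell_p$ norm is bounded by a budget $\varepsilon$ (the "imperceptible" constraint):

$$
\min_{\theta}\ \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\ \max_{\|\delta\|_p \le \varepsilon}\ \mathcal{L}\big(f_\theta(x+\delta),\ y\big)\ \right]
$$

The inner $\max$ is the attacker's problem: for a fixed model, find the perturbation inside the budget that most increases the loss (an untargeted attack). The outer $\min$ is the defender's problem: choose parameters that keep the loss low even under the worst-case perturbation, which is what adversarial training optimizes directly. A targeted attacker replaces the inner problem with $\min_{\|\delta\|_p \le \varepsilon} \mathcal{L}\big(f_\theta(x+\delta),\ y_t\big)$ for a chosen wrong label $y_t \neq y$, so the defender's outer objective must hold for both variants.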