Producing and Defending Against Targeted Adversarial Examples
School Name
Heathwood Hall Episcopal School
Grade Level
11th Grade
Presentation Topic
Computer Science
Presentation Type
Non-Mentored
Abstract
Recently, much attention in the literature has been given to "adversarial examples", input data crafted specifically to drive a neural network's prediction decisively toward a particular class. However, these methods have mostly focused on adversarial perturbation, in which valid, human-understandable input data is slightly modified to produce low probabilities for all classes, or a high probability for an arbitrary class. Our research focuses on the natural complement to this: synthesizing adversarial examples from scratch to target a specific class with high probability. We introduce a method, modeled on the Fast Gradient Sign Method of Goodfellow et al. (2014b), for carrying out such targeted attacks, and show that state-of-the-art image classification networks are completely vulnerable to it. Furthermore, we introduce an original method for defending against these adversarial attacks, which is shown to improve significantly on current adversarial training methods.
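
To make the attack concrete, the sketch below shows one plausible reading of such a targeted, FGSM-style synthesis: starting from random noise and repeatedly stepping against the sign of the loss gradient toward a chosen target class. This is a minimal illustration assuming a PyTorch image classifier that returns logits; the function name, step count, and step size are illustrative and are not taken from the presentation itself.

    import torch
    import torch.nn.functional as F

    def synthesize_targeted(model, target_class, shape=(1, 3, 224, 224),
                            epsilon=0.01, steps=100):
        # Start from uniform random noise rather than a real image,
        # since the example is synthesized from scratch.
        x = torch.rand(shape, requires_grad=True)
        target = torch.tensor([target_class])
        for _ in range(steps):
            # Cross-entropy of the model's logits against the chosen class.
            loss = F.cross_entropy(model(x), target)
            loss.backward()
            with torch.no_grad():
                # Step *against* the gradient sign to raise the target
                # class probability (the targeted flip of FGSM's update).
                x -= epsilon * x.grad.sign()
                x.clamp_(0.0, 1.0)  # stay in the valid image range
            x.grad.zero_()
        return x.detach()

Because each update follows only the sign of the gradient, as in the original Fast Gradient Sign Method, every pixel moves by the same fixed amount per step, which keeps the procedure cheap and numerically stable.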
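For reference, the defense is evaluated against current adversarial training methods; the standard baseline (Goodfellow et al., 2014b) mixes the loss on clean inputs with the loss on FGSM-perturbed copies of the same batch, roughly as sketched below. Again, this is an assumed, illustrative implementation of the baseline, not the original defense described in the presentation.

    def adversarial_training_step(model, optimizer, x, y,
                                  epsilon=0.03, alpha=0.5):
        # Craft FGSM perturbations of the current batch (x, y).
        x_pert = x.detach().clone().requires_grad_(True)
        grad = torch.autograd.grad(
            F.cross_entropy(model(x_pert), y), x_pert)[0]
        x_adv = (x + epsilon * grad.sign()).clamp(0.0, 1.0)
        # Optimize a weighted mix of clean and adversarial loss.
        optimizer.zero_grad()
        loss = (alpha * F.cross_entropy(model(x), y)
                + (1 - alpha) * F.cross_entropy(model(x_adv), y))
        loss.backward()
        optimizer.step()
        return loss.item()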
Recommended Citation
Adamo, Nico, "Producing and Defending Against Targeted Adversarial Examples" (2020). South Carolina Junior Academy of Science. 159.
https://scholarexchange.furman.edu/scjas/2020/all/159
Location
Furman Hall 109
Start Date
3-28-2020 8:45 AM
Presentation Format
Oral and Written
Group Project
No