Producing and Defending Against Targeted Adversarial Examples

Author(s)

Nico Adamo

School Name

Heathwood Hall Episcopal School

Grade Level

11th Grade

Presentation Topic

Computer Science

Presentation Type

Non-Mentored

Abstract

Recently, much attention in the literature has been given to "adversarial examples": input data crafted specifically to incline a neural network catastrophically toward a particular class. However, these methods have mostly focused on adversarial perturbation, in which valid, human-understandable input data is slightly modified to produce low probabilities for all classes, or high probabilities for an arbitrary class. Our research focuses on the natural complement to this: synthesizing adversarial examples from scratch that target a specific class with high probability. We introduce a method, modeled on the Fast Gradient Sign Method of Goodfellow et al. (2014b), for carrying out such targeted attacks, and show that state-of-the-art image classification networks are completely vulnerable to it. Furthermore, we introduce an original method for defending against these attacks, which shows significant improvement over current adversarial training methods.
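
To illustrate the core update behind this kind of targeted attack, the following is a minimal PyTorch sketch of FGSM-style synthesis from scratch; it is not the method evaluated in this work. The function name synthesize_targeted, the uniform-noise starting point, and the step size and iteration count are assumptions made only for this example.

    import torch
    import torch.nn.functional as F

    def synthesize_targeted(model, target_class, shape=(1, 3, 224, 224),
                            step=0.01, iters=100):
        """Synthesize an input from scratch that the model assigns to
        target_class with high probability, using FGSM-style
        sign-of-gradient steps on the input."""
        x = torch.rand(shape, requires_grad=True)   # start from uniform noise (assumption)
        target = torch.tensor([target_class])
        for _ in range(iters):
            loss = F.cross_entropy(model(x), target)
            loss.backward()
            with torch.no_grad():
                # Step against the gradient sign to lower the target-class
                # loss, i.e. raise the target-class probability.
                x -= step * x.grad.sign()
                x.clamp_(0.0, 1.0)                  # keep a valid image range
            x.grad.zero_()
        return x.detach()

Applied to a pretrained classifier in evaluation mode, iterative sign-of-gradient ascent of this kind generally drives the target-class confidence very high while producing inputs with no human-recognizable content; the particular hyperparameters above are illustrative only.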

Location

Furman Hall 109

Start Date

3-28-2020 8:45 AM

Presentation Format

Oral and Written

Group Project

No
