International Business Machines Corporation
Detecting adversary attacks on a deep neural network (DNN)
Abstract:
A method, apparatus, and computer program product to protect a deep neural network (DNN) having a plurality of layers, including one or more intermediate layers. In this approach, a training data set is received. During training of the DNN with the received training data set, a representation of the activations associated with an intermediate layer is recorded. For at least one or more of these representations, a separate classifier (model) is trained. The classifiers, collectively, are then used to train an outlier detection model. Following training, the outlier detection model is used to detect an adversarial input to the deep neural network. The outlier detection model generates a prediction, together with an indicator of whether a given input is adversarial. According to a further aspect, an action is taken to protect a deployed system associated with the DNN in response to detection of the adversarial input.
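The abstract describes a pipeline of recording intermediate-layer activations, training one classifier per recorded representation, and combining those classifiers to train an outlier detector that flags adversarial inputs. The sketch below illustrates that flow under stated assumptions: a toy PyTorch MLP, per-layer logistic-regression classifiers, and an IsolationForest as the outlier model. The layer choices, feature encoding, and detector type are illustrative assumptions, not the patented implementation.

    # Minimal sketch (assumptions: PyTorch MLP, per-layer LogisticRegression,
    # IsolationForest as the outlier detector; none of these are specified
    # by the patent abstract itself).
    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import IsolationForest

    torch.manual_seed(0)

    # Toy DNN with two intermediate layers whose activations are recorded.
    class ToyDNN(nn.Module):
        def __init__(self, d_in=20, d_hidden=32, n_classes=3):
            super().__init__()
            self.fc1 = nn.Linear(d_in, d_hidden)
            self.fc2 = nn.Linear(d_hidden, d_hidden)
            self.out = nn.Linear(d_hidden, n_classes)

        def forward(self, x):
            h1 = torch.relu(self.fc1(x))
            h2 = torch.relu(self.fc2(h1))
            return self.out(h2), (h1, h2)  # logits plus intermediate activations

    # Synthetic data standing in for the received training data set.
    X = torch.randn(500, 20)
    y = torch.randint(0, 3, (500,))

    model = ToyDNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    # Train the DNN.
    for _ in range(50):
        logits, _ = model(X)
        loss = loss_fn(logits, y)
        opt.zero_grad(); loss.backward(); opt.step()

    # Record representations of the intermediate-layer activations.
    with torch.no_grad():
        _, (h1, h2) = model(X)
    acts = [h1.numpy(), h2.numpy()]

    # Train a separate classifier for each recorded representation.
    layer_clfs = [LogisticRegression(max_iter=1000).fit(a, y.numpy()) for a in acts]

    def layer_features(x_batch):
        """Concatenate each layer classifier's class-probability vector."""
        with torch.no_grad():
            _, hs = model(x_batch)
        return np.hstack([clf.predict_proba(h.numpy())
                          for clf, h in zip(layer_clfs, hs)])

    # The classifiers, collectively, train the outlier detection model.
    detector = IsolationForest(contamination=0.05, random_state=0).fit(layer_features(X))

    # At inference: a prediction plus an indicator of whether the input is adversarial.
    x_new = torch.randn(5, 20)
    pred = model(x_new)[0].argmax(dim=1)
    is_adversarial = detector.predict(layer_features(x_new)) == -1  # True -> flagged
    print(pred.tolist(), is_adversarial.tolist())

In this sketch, a flagged input (`is_adversarial == True`) is the point at which a deployed system could take a protective action, such as rejecting the input or logging it for review.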
Utility
17 Nov 2020
19 May 2022