International Business Machines Corporation
Interpretability-aware adversarial attack and defense method for deep learnings

Last updated:

Abstract:

Embodiments relate to a system, program product, and method to support a convolutional neural network (CNN). A class-specific discriminative image region is localized to interpret a prediction of a CNN and to apply a class activation map (CAM) function to received input data. First and second attacks are generated on the CNN with respect to the received input data. The first attack generates first perturbed data and a corresponding first CAM, and the second attack generates second perturbed data and a corresponding second CAM. An interpretability discrepancy is measured to quantify one or more differences between the first CAM and the second CAM. The measured interpretability discrepancy is applied to the CNN. The application is a response to an inconsistency between the first CAM and the second CAM and functions to strengthen the CNN against an adversarial attack.

Status:
Grant
Type:

Utility

Filling date:

14 Jan 2020

Issue date:

26 Jul 2022