International Business Machines Corporation
Root cause analysis of vulnerability of neural networks to adversarial examples

Abstract:

An illustrative embodiment includes a method for protecting a machine learning model. The method includes: determining concept-level interpretability of respective units within the model; determining sensitivity of the respective units within the model to an adversarial attack; identifying units within the model which are both interpretable and sensitive to the adversarial attack; and enhancing defense against the adversarial attack by masking at least a portion of the units identified as both interpretable and sensitive to the adversarial attack.
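
To make the claimed method concrete, below is a minimal sketch in PyTorch. The patent does not specify particular metrics, so everything here is an illustrative assumption: the per-unit interpretability scores are a random stand-in (in practice they might come from a network-dissection-style concept analysis), sensitivity is scored as the per-unit activation shift under a one-step FGSM perturbation, and the function names, model, and thresholds are all hypothetical.

import torch
import torch.nn as nn


def unit_sensitivity(model, layer, x, y, eps=0.03):
    """Score each unit (conv channel) of `layer` by how much its
    activation shifts when the input gets one FGSM perturbation step.
    This metric is an assumption; the patent does not prescribe it."""
    acts = {}
    handle = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    try:
        x = x.clone().requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x), y)
        clean = acts["a"].detach()
        (grad,) = torch.autograd.grad(loss, x)
        model((x + eps * grad.sign()).detach())  # forward pass on adversarial input
        adv = acts["a"].detach()
        # Average the shift over batch and spatial dims: one score per unit.
        return (adv - clean).abs().mean(dim=(0, 2, 3))
    finally:
        handle.remove()


def mask_units(layer, unit_ids):
    """Zero out the identified units' activations at inference time."""
    def mask(module, inputs, output):
        output = output.clone()
        output[:, unit_ids] = 0.0
        return output
    layer.register_forward_hook(mask)


# Toy model and data, purely for illustration.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
layer = model[0]
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

interpretability = torch.rand(16)  # stand-in per-unit concept scores
sensitivity = unit_sensitivity(model, layer, images, labels)

# Mask units that are both interpretable and sensitive (thresholds arbitrary).
both = (interpretability > 0.5) & (sensitivity > sensitivity.median())
mask_units(layer, both.nonzero(as_tuple=True)[0])

Masking via a forward hook, as sketched here, leaves the trained weights intact and simply suppresses the identified units at inference, which matches the abstract's framing of masking "at least a portion" of the flagged units rather than retraining the model.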

Status:
Grant

Type:
Utility

Filing date:
3 Sep 2019

Issue date:
13 Sep 2022