International Business Machines Corporation
Root cause analysis of vulnerability of neural networks to adversarial examples
Abstract:
An illustrative embodiment includes a method for protecting a machine learning model. The method includes: determining concept-level interpretability of respective units within the model; determining sensitivity of the respective units within the model to an adversarial attack; identifying units within the model which are both interpretable and sensitive to the adversarial attack; and enhancing defense against the adversarial attack by masking at least a portion of the units identified as both interpretable and sensitive to the adversarial attack.
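The abstract describes four steps: score each unit for concept-level interpretability, score each unit for sensitivity to an adversarial attack, intersect the two sets, and mask the intersection. The patent does not publish its scoring methods, so the following is only a minimal PyTorch sketch under stated assumptions: convolutional channels stand in for "units", the sensitivity metric is the mean activation shift under an FGSM-style perturbation, and the interpretability scores are random placeholders (in practice they might come from a concept-alignment method such as network dissection).

```python
# Hypothetical sketch of the claimed method: score each unit (channel) for
# adversarial sensitivity, combine with (placeholder) interpretability
# scores, and mask the units that rank high on both. All scoring choices
# here are assumptions, not the patent's actual method.

import torch
import torch.nn as nn

def unit_sensitivity(model, layer, x, eps=0.03):
    """Per-channel sensitivity of `layer`: mean absolute change in its
    activation when the input gets an FGSM-style perturbation."""
    acts = {}
    handle = layer.register_forward_hook(lambda m, i, out: acts.update(a=out))

    x = x.clone().requires_grad_(True)
    logits = model(x)
    clean = acts["a"].detach()
    logits.max(dim=1).values.sum().backward()   # illustrative attack objective
    x_adv = (x + eps * x.grad.sign()).detach()

    with torch.no_grad():
        model(x_adv)
        adv = acts["a"]
    handle.remove()
    return (adv - clean).abs().mean(dim=(0, 2, 3))  # one score per channel

def mask_units(layer, unit_idx):
    """Zero the selected channels' outputs via a forward hook (the 'mask')."""
    idx = torch.as_tensor(unit_idx, dtype=torch.long)
    def hook(module, inputs, out):
        out[:, idx] = 0.0
        return out
    return layer.register_forward_hook(hook)

# Usage sketch on a toy network.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
layer = model[0]
x = torch.randn(4, 3, 32, 32)

sens = unit_sensitivity(model, layer, x)
interp = torch.rand(8)   # placeholder for a real interpretability score
both = ((sens > sens.median()) & (interp > interp.median())).nonzero().flatten()
handle = mask_units(layer, both.tolist())   # defense: mask those units
```

Whether a median threshold, a fixed top-k, or a learned cutoff selects the masked units is a design choice the abstract leaves open; the key idea is that only units which are simultaneously interpretable and attack-sensitive are suppressed, so accuracy on clean inputs is largely preserved.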
Status:
Grant
Type:
Utility
Filing date:
3 Sep 2019
Issue date:
13 Sep 2022