International Business Machines Corporation
Detecting trojan neural networks
Last updated:
Abstract:
One or more computer processors generate a plurality of adversarial perturbations associated with a model, wherein the plurality of adversarial perturbations comprises a universal perturbation and one or more per-sample perturbations. The one or more computer processors identify a plurality of neuron activations associated with the model and the plurality of generated adversarial perturbations. The one or more computer processors maximize the identified plurality of neuron activations. The one or more computer processors determine the model is a Trojan model by leveraging one or more similarities associated with the maximized neuron activations and the generated adversarial perturbations.
Status:
Grant
Type:
Utility
Filing date:
17 Aug 2020
Issue date:
19 Jul 2022
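
Illustrative sketch:
The abstract describes four steps: generating a universal perturbation and per-sample perturbations, identifying the neuron activations they induce, maximizing those activations, and comparing the results to decide whether the model is a Trojan model. The NumPy sketch below shows one way such a pipeline might look on a toy model; it is not the patented method. Everything in it is an assumption made for illustration: the two-layer ReLU network with random weights, the choice of the target-class logit as the quantity to maximize, the L2 budget `eps`, the step size and step count, the cosine-similarity score, and all function names (`craft_perturbations`, `activation_similarity`, and so on).

```python
import numpy as np

def hidden(W1, x):
    """Hidden-layer ReLU activations of the toy network (assumed architecture)."""
    return np.maximum(x @ W1, 0.0)

def target_logit_grad(W1, W2, x, target):
    """Gradient of the target-class logit with respect to each input row."""
    mask = (x @ W1 > 0).astype(x.dtype)       # ReLU derivative, shape (n, h)
    return (mask * W2[:, target]) @ W1.T      # shape (n, d)

def project_l2(delta, eps):
    """Scale each perturbation back into an L2 ball of radius eps."""
    norm = np.linalg.norm(delta, axis=-1, keepdims=True)
    return delta * np.minimum(1.0, eps / np.maximum(norm, 1e-12))

def craft_perturbations(W1, W2, X, target, eps=1.0, lr=0.1, steps=50):
    """Projected gradient ascent on the target logit (an assumed objective).

    Returns one universal perturbation shared by all samples and one
    perturbation per sample.
    """
    universal = np.zeros(X.shape[1])
    per_sample = np.zeros_like(X)
    for _ in range(steps):
        # Universal: average the gradient over all samples.
        g_u = target_logit_grad(W1, W2, X + universal, target).mean(axis=0)
        universal = project_l2(universal + lr * g_u, eps)
        # Per-sample: each row follows its own gradient.
        g_s = target_logit_grad(W1, W2, X + per_sample, target)
        per_sample = project_l2(per_sample + lr * g_s, eps)
    return universal, per_sample

def activation_similarity(W1, X, universal, per_sample):
    """Mean cosine similarity between activations under the two perturbation types."""
    a_u = hidden(W1, X + universal)
    a_s = hidden(W1, X + per_sample)
    num = (a_u * a_s).sum(axis=1)
    den = np.linalg.norm(a_u, axis=1) * np.linalg.norm(a_s, axis=1)
    return float(np.mean(num / np.maximum(den, 1e-12)))

# Toy data and weights; a real check would load the model under inspection.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 3))
X = rng.normal(size=(32, 8))

universal, per_sample = craft_perturbations(W1, W2, X, target=0)
score = activation_similarity(W1, X, universal, per_sample)
print(f"activation similarity for target class 0: {score:.3f}")
```

In this sketch, a higher score means the universal and per-sample perturbations drive the hidden layer in similar directions. How such a score would be thresholded, or compared across candidate target classes, is not stated in the abstract and is left open here.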