Oracle Corporation
EXTRACTED MODEL ADVERSARIES FOR IMPROVED BLACK BOX ATTACKS
Abstract:
Techniques are described for identifying successful adversarial attacks on a black box reading comprehension model using an extracted white box reading comprehension model. The system trains a white box reading comprehension model that behaves similarly to the black box reading comprehension model, using a set of queries and the corresponding responses from the black box reading comprehension model as training data. The system tests adversarial attacks, which involve modified informational content against which queries are executed, on the trained white box reading comprehension model. Queries used for successful attacks on the white box model may then be applied to the black box model itself as part of a black box improvement process.
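The workflow in the abstract can be sketched as a toy Python example. This is only an illustration under stated assumptions, not the method claimed in the filing: the `black_box` function, the `WhiteBox` class, the perceptron-style update rule, the hidden stopword list, and all sample passages and distractor sentences are hypothetical stand-ins invented here. Both models simply return the passage sentence that best overlaps the question.

```python
from collections import defaultdict

# Hypothetical stopword list, hidden inside the toy black box.
HIDDEN_STOPWORDS = {"what", "is", "the", "of", "a", "in"}

def sentences(context):
    return [s.strip() for s in context.split(".") if s.strip()]

def words(text):
    return text.lower().replace("?", "").replace(".", "").split()

def black_box(context, question):
    """Toy black box: callers see only its inputs and outputs."""
    q = set(words(question)) - HIDDEN_STOPWORDS
    return max(sentences(context), key=lambda s: len(q & set(words(s))))

class WhiteBox:
    """Toy extracted model: one inspectable weight per question word."""
    def __init__(self):
        self.weight = defaultdict(lambda: 1.0)

    def score(self, sentence, question):
        shared = set(words(question)) & set(words(sentence))
        return sum(self.weight[w] for w in shared)

    def answer(self, context, question):
        return max(sentences(context), key=lambda s: self.score(s, question))

    def fit(self, examples, epochs=10, lr=0.5):
        # Perceptron-style imitation of the black box's recorded responses.
        for _ in range(epochs):
            for context, question, target in examples:
                predicted = self.answer(context, question)
                if predicted == target:
                    continue
                q = set(words(question))
                for w in q & set(words(target)):
                    self.weight[w] += lr
                for w in q & set(words(predicted)):
                    self.weight[w] -= lr

def find_attacks(white_box, target_model, context, question, distractors):
    """Try each modified passage on the white box; replay successes on the target."""
    clean_white = white_box.answer(context, question)
    clean_target = target_model(context, question)
    white_hits, transferred = [], []
    for d in distractors:
        attacked = context + " " + d  # modified informational content
        if white_box.answer(attacked, question) == clean_white:
            continue  # attack failed on the white box, so skip it
        white_hits.append(d)
        if target_model(attacked, question) != clean_target:
            transferred.append(d)
    return white_hits, transferred

# 1. Query the black box and record its responses as training data.
capitals = ("Paris is the capital of France. Berlin is the capital of Germany. "
            "Rome is the capital of Italy.")
mill = "The mill is in the north of the town. Copper was mined nearby."
queries = [
    (capitals, "What is the capital of France?"),
    (capitals, "What is the capital of Germany?"),
    (capitals, "What is the capital of Italy?"),
    (mill, "What is the metal mined in the hills of the area?"),
]
training_data = [(c, q, black_box(c, q)) for c, q in queries]

# 2. Train the white box to imitate the black box.
extracted = WhiteBox()
extracted.fit(training_data)

# 3. Test attacks on the white box, then apply the successful ones to the black box.
distractors = [
    "Zorg is the capital city of New France.",  # fictional distractor
    "Madrid is a large city.",
    "What a city France has.",
]
white_hits, transferred = find_attacks(
    extracted, black_box, capitals,
    "What is the capital city of France?", distractors)
print("succeeded on white box:", white_hits)
print("also changed black box answer:", transferred)
```

In this sketch the black box is queried only once per candidate that already succeeded on the white box. An attack that fools the extracted model does not necessarily fool the original, so the final replay against the black box remains necessary.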
Utility
9 Dec 2020
17 Feb 2022