Oracle Corporation
AUGMENTED TRAINING SET OR TEST SET FOR IMPROVED CLASSIFICATION MODEL ROBUSTNESS
Abstract:
A target set of texts, used for training and/or evaluating a text classification model, is augmented using insertions into a base text within the original target set. In an embodiment, an expanded text, which includes the base text and an insertion word, must satisfy one or more inclusion criteria in order to be added to the target set. The inclusion criteria may require that the expanded text constitutes a successful attack on the classification model, has a satisfactory perplexity score, and/or is verified as being valid. In an embodiment, if the number of expanded texts added to the target set is below a threshold number, insertions are made into an expanded text (which was itself generated from the base text) to produce a doubly-expanded text. The inclusion criteria are then evaluated against the doubly-expanded text to determine whether to add it to the target set.
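The flow described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The names `insertions`, `augment`, `classify`, `perplexity`, `is_valid`, `max_perplexity`, and `min_added` are hypothetical. The sketch also assumes that texts are tokenized on whitespace, that a "successful attack" means the model's prediction differs from the base text's label, that all three inclusion criteria are required together (the abstract allows any combination), and that the second round inserts into every first-round expanded text.

```python
from typing import Callable, List, Sequence


def insertions(text: str, words: Sequence[str]) -> List[str]:
    """Return every text formed by inserting one word at one token boundary."""
    tokens = text.split()  # assumption: whitespace tokenization
    return [
        " ".join(tokens[:i] + [word] + tokens[i:])
        for word in words
        for i in range(len(tokens) + 1)
    ]


def augment(
    base_text: str,
    label: str,
    classify: Callable[[str], str],      # hypothetical model wrapper
    perplexity: Callable[[str], float],  # hypothetical language-model scorer
    is_valid: Callable[[str], bool],     # hypothetical validity check
    words: Sequence[str],
    max_perplexity: float,
    min_added: int,
) -> List[str]:
    """Collect expanded texts that satisfy the inclusion criteria."""

    def accept(candidate: str) -> bool:
        # Assumption: all three criteria must hold at once.
        return (
            classify(candidate) != label  # prediction flipped: successful attack
            and perplexity(candidate) <= max_perplexity
            and is_valid(candidate)
        )

    added: List[str] = []

    def consider(candidates: List[str]) -> None:
        for candidate in candidates:
            if candidate not in added and accept(candidate):
                added.append(candidate)

    # First round: single insertions into the base text.
    expanded = insertions(base_text, words)
    consider(expanded)

    # Second round: if too few texts were added, insert into the expanded
    # texts and evaluate the doubly-expanded results against the same criteria.
    if len(added) < min_added:
        for text in expanded:
            consider(insertions(text, words))

    return added
```

In practice, `classify` would wrap the text classification model under test and `perplexity` would wrap a language model; the texts returned by `augment` would then be appended to the training or test set with the base text's label.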
Utility
3 Aug 2021
4 Aug 2022