International Business Machines Corporation
DATA AUGMENTATION BY DYNAMIC WORD REPLACEMENT

Last updated:

Abstract:

A computer-implemented method is provided for data augmentation. The method includes calculating, by a hardware processor, a word replacement probability for each word in text data based on that word's occurrence frequency in the text data, wherein the word replacement probability decreases as the word occurrence frequency increases. The method additionally includes selectively replacing, based on the word replacement probability, at least one of the words in the text data with words predicted therefor by a Bidirectional Neural Network Language Model (BiNNLM), to generate augmented text data.
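
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract only states that the replacement probability decreases with word frequency, so the inverse-count schedule `max_prob / count`, the `max_prob` parameter, and the function names are all assumptions made for illustration. The BiNNLM is stood in for by a hypothetical `placeholder_predict` callable that returns a fixed token; a real system would return a word predicted from the left and right context.

```python
import random
from collections import Counter
from typing import Callable, Dict, List


def replacement_probabilities(tokens: List[str], max_prob: float = 0.5) -> Dict[str, float]:
    """Map each word to a replacement probability that falls as its count rises.

    Assumed schedule (not from the source): p(w) = max_prob / count(w).
    """
    counts = Counter(tokens)
    return {word: max_prob / count for word, count in counts.items()}


def augment(
    tokens: List[str],
    predict: Callable[[List[str], int], str],
    probs: Dict[str, float],
    rng: random.Random,
) -> List[str]:
    """Replace each token with predict(tokens, position) with probability probs[token]."""
    augmented = []
    for position, word in enumerate(tokens):
        if rng.random() < probs[word]:
            augmented.append(predict(tokens, position))
        else:
            augmented.append(word)
    return augmented


def placeholder_predict(tokens: List[str], position: int) -> str:
    """Hypothetical stand-in for a BiNNLM; always returns the same marker token."""
    return "<predicted>"


tokens = "the cat sat on the mat near the door".split()
probs = replacement_probabilities(tokens)
augmented = augment(tokens, placeholder_predict, probs, random.Random(0))
```

Under this assumed schedule, "the" (three occurrences) is replaced far less often than "cat" (one occurrence), so rare words are the ones most often swapped for model predictions.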

Status:

Application

Type:

Utility

Filing date:

20 Mar 2020

Issue date:

23 Sep 2021