International Business Machines Corporation
GENERATION OF MATCHED CORPUS FOR LANGUAGE MODEL TRAINING
Abstract:
A computer-implemented method for generating a text is disclosed. The method includes obtaining a first text collection matched with a target domain and a second text collection including a plurality of samples, each of which describes rewriting between a first text and a second text that has a style different from the first text. The method also includes training a text generation model with the first text collection and the second text collection, in which the text generation model has, in a vocabulary, one or more operation tokens indicating rewriting. The method further includes outputting a plurality of texts obtained from the text generation model.
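As a rough illustration of how the two text collections might be combined into training data for a model whose vocabulary contains operation tokens, consider the minimal sketch below. The abstract does not name the tokens or describe the data format, so the token strings `<rewrite>` and `<eos>`, the whitespace tokenization, and the function names `build_training_sequences` and `build_vocabulary` are all assumptions made for illustration, not the method claimed in the application.

```python
from typing import Dict, List, Tuple

# Hypothetical token strings: the abstract only says the vocabulary holds
# "one or more operation tokens indicating rewriting" and does not name them.
REWRITE_TOKEN = "<rewrite>"
END_TOKEN = "<eos>"


def build_training_sequences(
    domain_texts: List[str],
    rewrite_pairs: List[Tuple[str, str]],
) -> List[List[str]]:
    """Turn both collections into token sequences for one shared model.

    domain_texts:  first collection, texts matched with the target domain.
    rewrite_pairs: second collection, (first text, second text) samples in
                   which the second text has a different style.
    """
    sequences: List[List[str]] = []
    # Domain texts are kept as plain sequences (whitespace tokenization is
    # an assumption made to keep the sketch short).
    for text in domain_texts:
        sequences.append(text.split() + [END_TOKEN])
    # Each rewriting sample joins its two texts with the operation token.
    for first, second in rewrite_pairs:
        sequences.append(
            first.split() + [REWRITE_TOKEN] + second.split() + [END_TOKEN]
        )
    return sequences


def build_vocabulary(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign integer ids, reserving the first ids for the special tokens."""
    vocab: Dict[str, int] = {REWRITE_TOKEN: 0, END_TOKEN: 1}
    for sequence in sequences:
        for token in sequence:
            vocab.setdefault(token, len(vocab))
    return vocab


if __name__ == "__main__":
    seqs = build_training_sequences(
        ["the patient reports mild pain"],
        [("i am gonna go", "I am going to go")],
    )
    vocab = build_vocabulary(seqs)
    for seq in seqs:
        print(" ".join(seq))
    print(len(vocab), "vocabulary entries")
```

In this sketch the operation token is just one more vocabulary entry, so any sequence model trained on the output would see domain text and rewriting samples in one shared token space. How the application actually trains the model and samples the output texts is not specified in the abstract.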
Status:
Application
Type:
Utility
Filing date:
6 Feb 2020
Issue date:
12 Aug 2021