International Business Machines Corporation
GENERATION OF MATCHED CORPUS FOR LANGUAGE MODEL TRAINING

Abstract:

A computer-implemented method for generating a text is disclosed. The method includes obtaining a first text collection matched with a target domain and a second text collection including a plurality of samples, each of which describes rewriting between a first text and a second text that has a style different from the first text. The method also includes training a text generation model with the first text collection and the second text collection, in which the text generation model has, in a vocabulary, one or more operation tokens indicating rewriting. The method further includes outputting a plurality of texts obtained from the text generation model.
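
The data preparation the abstract describes can be sketched as follows. This is only an illustrative sketch, not the claimed implementation: the token names `<rewrite>` and `<eos>`, the whitespace tokenization, and the helper names `build_training_sequences` and `build_vocabulary` are all assumptions introduced here, since the abstract does not specify them. It shows how the two text collections might be merged into one set of training sequences whose vocabulary contains an operation token indicating rewriting.

```python
from typing import Dict, List, Tuple

# Hypothetical operation tokens; the abstract does not name them.
REWRITE_TOKEN = "<rewrite>"  # marks the boundary between first and second text
END_TOKEN = "<eos>"          # marks the end of a training sequence


def build_training_sequences(
    domain_texts: List[str],
    rewrite_pairs: List[Tuple[str, str]],
) -> List[List[str]]:
    """Merge both collections into token sequences for one model."""
    sequences: List[List[str]] = []
    # First collection: texts matched with the target domain.
    for text in domain_texts:
        sequences.append(text.split() + [END_TOKEN])
    # Second collection: each sample describes rewriting from a first
    # text to a second text that has a different style.
    for first, second in rewrite_pairs:
        sequences.append(
            first.split() + [REWRITE_TOKEN] + second.split() + [END_TOKEN]
        )
    return sequences


def build_vocabulary(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign ids, reserving the first ids for the operation tokens."""
    vocab: Dict[str, int] = {REWRITE_TOKEN: 0, END_TOKEN: 1}
    for sequence in sequences:
        for token in sequence:
            vocab.setdefault(token, len(vocab))
    return vocab
```

Under these assumptions, a model trained on such sequences could be sampled to produce domain-matched text in one style, followed by the operation token and a rewritten version in the other style. The training and sampling steps themselves are left out because the abstract gives no detail about the model.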

Status:

Application

Type:

Utility

Filing date:

6 Feb 2020

Issue date:

12 Aug 2021