International Business Machines Corporation
Generation of matched corpus for language model training

Abstract:

A computer-implemented method for generating a text is disclosed. The method includes obtaining a first text collection matched with a target domain and a second text collection including a plurality of samples, each of which describes rewriting between a first text and a second text that has a style different from the first text. The method also includes training a text generation model with the first text collection and the second text collection, in which the text generation model has, in a vocabulary, one or more operation tokens indicating rewriting. The method further includes outputting a plurality of texts obtained from the text generation model.
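As a rough illustration of the data preparation the abstract implies, the sketch below merges in-domain texts and rewrite samples into one training set and reserves an operation token in the vocabulary. The token names `<rewrite>` and `<eos>`, the whitespace tokenization, the function names, and the example sentences are all assumptions made for this illustration; the abstract does not specify them, and the model training and generation steps are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical token names: the abstract only states that the vocabulary
# contains one or more operation tokens indicating rewriting.
REWRITE_TOKEN = "<rewrite>"
EOS_TOKEN = "<eos>"


def serialize_rewrite_sample(first_text: str, second_text: str) -> List[str]:
    """Encode one rewrite sample as: first text, operation token, second text."""
    return first_text.split() + [REWRITE_TOKEN] + second_text.split() + [EOS_TOKEN]


def build_training_sequences(
    domain_texts: List[str],
    rewrite_samples: List[Tuple[str, str]],
) -> List[List[str]]:
    """Combine the domain-matched collection and the rewrite collection."""
    sequences = [text.split() + [EOS_TOKEN] for text in domain_texts]
    sequences += [serialize_rewrite_sample(a, b) for a, b in rewrite_samples]
    return sequences


def build_vocabulary(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign ids to tokens, reserving the first ids for the special tokens."""
    vocab = {REWRITE_TOKEN: 0, EOS_TOKEN: 1}
    for sequence in sequences:
        for token in sequence:
            vocab.setdefault(token, len(vocab))
    return vocab


# Made-up example data: one in-domain text and one (first text, second text) pair.
domain_texts = ["please reset my password"]
rewrite_samples = [("I would like to reset my password.", "reset my password")]

sequences = build_training_sequences(domain_texts, rewrite_samples)
vocab = build_vocabulary(sequences)
```

A text generation model trained on `sequences` would see the operation token in context, which is one plausible way for it to learn the rewriting behavior the abstract describes.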

Status:

Grant

Type:

Utility

Filing date:

6 Feb 2020

Issue date:

15 Mar 2022