SAP SE
GENERATING CORPUS FOR TRAINING AND VALIDATING MACHINE LEARNING MODEL FOR NATURAL LANGUAGE PROCESSING

Last updated:

Abstract:

A method may include generating, based on a context-free grammar, a sample forming a corpus. The context-free grammar may include production rules for replacing a first nonterminal symbol with a second nonterminal symbol and/or a terminal symbol. The sample may be generated by recursively rewriting a first text string to form a second text string associated with the sample. The first text string may be rewritten by applying the production rules to replace nonterminal symbols included in the first text string until no nonterminal symbols remain in the first text string. A machine learning model may be trained, based on the corpus, to process a natural language. Related methods and articles of manufacture are also disclosed.
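
The rewriting procedure described in the abstract can be illustrated with a minimal sketch. It is not the implementation disclosed in the application. The toy grammar, the angle-bracket naming of nonterminal symbols, and the helper names rewrite and generate_corpus are all assumptions made for illustration, and random selection among alternative production rules is likewise an assumed strategy.

```python
import random

# Hypothetical toy grammar (not taken from the application). Each key is a
# nonterminal symbol mapped to its alternative right-hand sides; any symbol
# that is not a key is treated as a terminal symbol.
GRAMMAR = {
    "<query>": [["<verb>", "<object>"], ["please", "<verb>", "<object>"]],
    "<verb>": [["show"], ["list"]],
    "<object>": [["<adjective>", "<noun>"], ["<noun>"]],
    "<adjective>": [["open"], ["overdue"]],
    "<noun>": [["invoices"], ["orders"]],
}


def rewrite(symbols, grammar, rng):
    """Recursively replace nonterminal symbols until none remain."""
    for i, symbol in enumerate(symbols):
        if symbol in grammar:
            # Apply one production rule to the leftmost nonterminal symbol,
            # then rewrite the resulting string again.
            replacement = rng.choice(grammar[symbol])
            return rewrite(symbols[:i] + replacement + symbols[i + 1:], grammar, rng)
    # Only terminal symbols are left, so the sample is complete.
    return symbols


def generate_corpus(grammar, start, size, seed=0):
    """Generate `size` samples, each derived from the start symbol."""
    rng = random.Random(seed)  # seeded so that a run is reproducible
    return [" ".join(rewrite([start], grammar, rng)) for _ in range(size)]


corpus = generate_corpus(GRAMMAR, "<query>", 5)
for sample in corpus:
    print(sample)
```

Because this toy grammar contains no recursive rules, every derivation terminates after a bounded number of rewrites. A grammar with recursive rules would need a depth limit or weighted rule selection to guarantee termination. The resulting list of strings could then be split into training and validation sets for a natural language processing model.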

Status:

Application

Type:

Utility

Filing date:

27 Sep 2019

Issue date:

1 Apr 2021