Oracle Corporation
HYBRID IN-DOMAIN AND OUT-OF-DOMAIN DOCUMENT PROCESSING FOR NON-VOCABULARY TOKENS OF ELECTRONIC DOCUMENTS

Last updated:

Abstract:

Techniques are described herein for training and evaluating machine learning (ML) models for document processing computing applications based on in-domain and out-of-domain characteristics. In some embodiments, an ML system is configured to form feature vectors by mapping unknown tokens to known tokens within a domain based, at least in part, on out-of-domain characteristics. In other embodiments, the ML system is configured to map the unknown tokens to an aggregate vector representation based on the out-of-domain characteristics. The ML system may use the feature vectors to train ML models and/or estimate unknown labels for the new documents.
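
As an illustration only, the two mappings named in the abstract might be sketched as follows. This is not the claimed implementation: the vocabulary, the toy embedding values, the use of cosine similarity, the bag-of-words feature layout, and the names `VOCAB`, `OOD`, `ranked_known`, and `featurize` are all assumptions introduced here. With `k=1` an unknown token is counted as its most similar known token; with `k>1` it is spread evenly over its `k` most similar known tokens, which is one possible reading of an "aggregate vector representation".

```python
import math

# Hypothetical in-domain vocabulary (tokens seen in the training documents).
VOCAB = ["invoice", "payment", "contract"]

# Hypothetical out-of-domain embeddings: toy values standing in for vectors
# learned on a large general-purpose corpus. Not taken from the source.
OOD = {
    "invoice":   [0.9, 0.1, 0.0],
    "payment":   [0.7, 0.6, 0.1],
    "contract":  [0.0, 0.2, 0.9],
    "bill":      [0.95, 0.15, 0.0],
    "agreement": [0.1, 0.1, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ranked_known(token):
    """Known tokens ordered by out-of-domain similarity to `token`."""
    if token not in OOD:
        return []  # no out-of-domain information is available for this token
    scored = [(cosine(OOD[token], OOD[known]), known)
              for known in VOCAB if known in OOD]
    return [known for _, known in sorted(scored, reverse=True)]


def featurize(tokens, k=1):
    """Build a bag-of-words feature vector over VOCAB.

    k=1: an unknown token is mapped to its nearest known token.
    k>1: an unknown token is spread evenly over its k nearest known tokens
         (an assumed form of aggregation).
    """
    vec = [0.0] * len(VOCAB)
    for token in tokens:
        targets = [token] if token in VOCAB else ranked_known(token)[:k]
        for target in targets:
            vec[VOCAB.index(target)] += 1.0 / len(targets)
    return vec


print(featurize(["bill", "payment"]))  # "bill" is counted as "invoice"
print(featurize(["bill"], k=2))        # "bill" split over "invoice" and "payment"
```

In this sketch, tokens with no out-of-domain embedding are dropped. The resulting vectors could then be passed to any classifier for training or for estimating labels on new documents.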

Status:

Application

Type:

Utility

Filing date:

27 Nov 2019

Issue date:

27 May 2021