Oracle Corporation
HYBRID IN-DOMAIN AND OUT-OF-DOMAIN DOCUMENT PROCESSING FOR NON-VOCABULARY TOKENS OF ELECTRONIC DOCUMENTS

Last updated: 29 Sep 2021

Abstract:

Techniques are described herein for training and evaluating machine learning (ML) models for document processing computing applications based on in-domain and out-of-domain characteristics. In some embodiments, an ML system is configured to form feature vectors by mapping unknown tokens to known tokens within a domain based, at least in part, on out-of-domain characteristics. In other embodiments, the ML system is configured to map the unknown tokens to an aggregate vector representation based on the out-of-domain characteristics. The ML system may use the feature vectors to train ML models and/or estimate unknown labels for new documents.
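
The two mappings named in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the application. It assumes that "out-of-domain characteristics" means embeddings trained on a general corpus, that similarity is measured by cosine similarity, and that the aggregate representation is a plain average. The embedding values, the vocabulary, and the function names (map_to_known, count_features, aggregate_features) are all hypothetical.

```python
import math

# Hypothetical out-of-domain embeddings, e.g. learned from a general-purpose corpus.
OUT_OF_DOMAIN = {
    "error":   [0.9, 0.1, 0.0],
    "failure": [0.8, 0.2, 0.1],
    "disk":    [0.1, 0.9, 0.0],
    "drive":   [0.2, 0.8, 0.1],
    "login":   [0.0, 0.1, 0.9],
}

# Hypothetical in-domain vocabulary seen in the training documents.
# Assumption: every in-domain token also has an out-of-domain embedding.
IN_DOMAIN_VOCAB = ["error", "disk", "login"]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def map_to_known(token):
    """Map a token to the most similar in-domain token, or None if it cannot be placed."""
    if token in IN_DOMAIN_VOCAB:
        return token
    vec = OUT_OF_DOMAIN.get(token)
    if vec is None:
        return None  # unknown both in-domain and out-of-domain
    return max(IN_DOMAIN_VOCAB, key=lambda k: cosine(vec, OUT_OF_DOMAIN[k]))


def count_features(tokens):
    """First variant: count vector over the in-domain vocabulary after mapping."""
    counts = [0] * len(IN_DOMAIN_VOCAB)
    for token in tokens:
        known = map_to_known(token)
        if known is not None:
            counts[IN_DOMAIN_VOCAB.index(known)] += 1
    return counts


def aggregate_features(tokens):
    """Second variant: average the out-of-domain embeddings of the tokens."""
    vecs = [OUT_OF_DOMAIN[t] for t in tokens if t in OUT_OF_DOMAIN]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(component) / len(vecs) for component in zip(*vecs)]
```

For example, count_features(["disk", "drive", "failure"]) counts "drive" toward "disk" and "failure" toward "error", so a model trained only on the in-domain vocabulary still receives a signal from tokens it has never seen.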

Status:

Application

Type:

Utility

Filing date:

13 Jan 2020

Issue date:

27 May 2021

Full patent description

Patent application document

Abstract: