Oracle Corporation
HYBRID IN-DOMAIN AND OUT-OF-DOMAIN DOCUMENT PROCESSING FOR NON-VOCABULARY TOKENS OF ELECTRONIC DOCUMENTS
Last updated:
Abstract:
Techniques are described herein for training and evaluating machine learning (ML) models for document processing computing applications based on in-domain and out-of-domain characteristics. In some embodiments, an ML system is configured to form feature vectors by mapping unknown tokens to known tokens within a domain based, at least in part, on out-of-domain characteristics. In other embodiments, the ML system is configured to map the unknown tokens to an aggregate vector representation based on the out-of-domain characteristics. The ML system may use the feature vectors to train ML models and/or estimate unknown labels for new documents.
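The two embodiments in the abstract can be illustrated with a minimal sketch. It is not the claimed method: the abstract does not say what the out-of-domain characteristics are or how similarity is measured, so this sketch assumes they are general-purpose word vectors compared by cosine similarity. The vectors, the vocabulary, and the names `map_to_known_token` and `aggregate_vector` are all hypothetical.

```python
import math

# Assumption: out-of-domain characteristics are word vectors learned on a
# general corpus. These toy values are invented for illustration only.
OUT_OF_DOMAIN_VECTORS = {
    "invoice": [0.9, 0.1, 0.0],
    "payment": [0.7, 0.3, 0.1],
    "server": [0.0, 0.2, 0.9],
    "receipt": [0.85, 0.15, 0.05],  # unknown to the in-domain vocabulary
}

# Tokens the in-domain model was trained on (hypothetical).
IN_DOMAIN_VOCAB = ["invoice", "payment", "server"]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ranked_known_tokens(token):
    """In-domain tokens sorted by out-of-domain similarity to `token`."""
    vec = OUT_OF_DOMAIN_VECTORS.get(token)
    if vec is None:
        return []  # no out-of-domain information is available either
    return sorted(
        IN_DOMAIN_VOCAB,
        key=lambda known: cosine(vec, OUT_OF_DOMAIN_VECTORS[known]),
        reverse=True,
    )


def map_to_known_token(token):
    """First embodiment (sketch): replace an unknown token with the most
    similar known in-domain token."""
    if token in IN_DOMAIN_VOCAB:
        return token
    ranked = ranked_known_tokens(token)
    return ranked[0] if ranked else None


def aggregate_vector(token, k=2):
    """Second embodiment (sketch): represent an unknown token by the mean
    vector of its k most similar known tokens."""
    nearest = ranked_known_tokens(token)[:k]
    if not nearest:
        return None
    dims = len(OUT_OF_DOMAIN_VECTORS[nearest[0]])
    return [
        sum(OUT_OF_DOMAIN_VECTORS[t][i] for t in nearest) / len(nearest)
        for i in range(dims)
    ]
```

With these toy values, `map_to_known_token("receipt")` selects `"invoice"`, and `aggregate_vector("receipt")` averages the vectors for `"invoice"` and `"payment"`. A token with no out-of-domain vector yields `None` in both cases, so a real system would need a further fallback.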
Status:
Application
Type:
Utility
Filing date:
13 Jan 2020
Issue date:
27 May 2021