Microsoft Corporation
DOCUMENT SAMPLING USING PREFETCHING AND PRECOMPUTING

Abstract:

A system to facilitate document sampling may include a sampling service engine coupled to a document data store that contains a set of unlabeled documents. The sampling service engine may include local storage and a prefetching component to download a subset of the documents from the document data store before completion of an executing Machine Learning ("ML") model training process. The prefetching component may also store the subset of the documents in the local storage. A precomputing component may execute a sampling algorithm on the stored subset of the documents and select viable documents for user-provided labels based on ML model prediction scores and at least one sub-sampling type. The document data store and the sampling service engine might, in some embodiments, execute in a cloud computing environment.
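The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. The class name `SamplingService`, the methods `prefetch` and `precompute`, the sub-sampling type names ("uncertainty", "top", "random"), and the toy scores are all assumptions made for illustration; the abstract does not name any of them. A plain dictionary stands in for the document data store, and a lookup table stands in for ML model prediction scores.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SamplingService:
    """Hypothetical sampling service engine with local storage."""
    document_store: dict                      # stand-in for the remote store: id -> text
    local_storage: dict = field(default_factory=dict)

    def prefetch(self, doc_ids):
        """Copy a subset of unlabeled documents into local storage.

        In the described system this step would run before the model
        training process completes; here it is a simple synchronous copy.
        """
        for doc_id in doc_ids:
            self.local_storage[doc_id] = self.document_store[doc_id]

    def precompute(self, score_fn, k, sub_sampling="uncertainty", seed=0):
        """Pick k locally stored documents to present for user labels."""
        scores = {d: score_fn(text) for d, text in self.local_storage.items()}
        ids = sorted(scores)                  # deterministic starting order
        if sub_sampling == "uncertainty":     # scores closest to 0.5 first
            ids.sort(key=lambda d: abs(scores[d] - 0.5))
        elif sub_sampling == "top":           # highest scores first
            ids.sort(key=lambda d: -scores[d])
        elif sub_sampling == "random":        # seeded shuffle
            random.Random(seed).shuffle(ids)
        else:
            raise ValueError(f"unknown sub-sampling type: {sub_sampling}")
        return ids[:k]


# Toy data: four documents, three of which are prefetched.
store = {"a": "invoice", "b": "memo", "c": "contract", "d": "email"}
toy_scores = {"invoice": 0.95, "memo": 0.48, "contract": 0.10, "email": 0.50}

service = SamplingService(store)
service.prefetch(["a", "b", "c"])

uncertain = service.precompute(toy_scores.get, k=2)                    # ['b', 'c']
top = service.precompute(toy_scores.get, k=2, sub_sampling="top")      # ['a', 'b']
```

Document "d" is never selected because it was not prefetched: selection only considers what is already in local storage, which is the point of separating the download step from the scoring step.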

Status:

Application

Type:

Utility

Filing date:

22 Jan 2021

Issue date:

28 Jul 2022