Microsoft Corporation
Scalable and Resource-Efficient Extraction of Data from Network-Accessible Documents

Abstract:

A technique is described herein for processing network-accessible documents in a scalable and resource-efficient manner. A model-generating process provided by the technique includes three phases. A first phase generates a set of sample documents associated with a particular class of documents, a second phase applies labels to the sample documents to produce a set of labeled documents, and a third phase generates at least one data-extraction model based on the set of labeled documents. The data-extraction model includes data-extracting logic for extracting at least one specified data item from new documents that match the class of documents. In a data-extracting process, the technique identifies a data-extraction model that applies to a new document and then applies that model to the document.
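The two processes in the abstract can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not the patented method. Every name is hypothetical: `Document`, `document_class`, `sample_documents`, `label_documents`, `ExtractionModel`, `generate_model`, and `extract_from`. A "class of documents" is assumed to be pages sharing a host and first URL path segment. Labels are assumed to be supplied by a caller-provided function. The data-extraction model is assumed to be a simple wrapper-induction rule that keeps the text context shared by every labeled occurrence of the data item.

```python
import os
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlparse


@dataclass
class Document:
    url: str
    html: str


def document_class(url: str) -> str:
    """Hypothetical class key: host plus first path segment (e.g. shop.example.com/item)."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    return f"{parsed.netloc}/{segments[0] if segments else ''}"


# Phase 1: gather sample documents that belong to one class.
def sample_documents(corpus: List[Document], class_key: str, limit: int = 10) -> List[Document]:
    return [d for d in corpus if document_class(d.url) == class_key][:limit]


# Phase 2: attach a label (here, the known value of the data item) to each sample.
def label_documents(samples: List[Document],
                    labeler: Callable[[Document], str]) -> List[Tuple[Document, str]]:
    return [(d, labeler(d)) for d in samples]


@dataclass
class ExtractionModel:
    class_key: str
    prefix: str  # text expected immediately before the data item
    suffix: str  # text expected immediately after the data item

    def extract(self, doc: Document) -> Optional[str]:
        pattern = re.escape(self.prefix) + r"(.*?)" + re.escape(self.suffix)
        match = re.search(pattern, doc.html, re.DOTALL)
        return match.group(1) if match else None


# Phase 3: induce a model from the labeled samples (simple wrapper induction):
# keep the longest left/right context shared by every labeled occurrence.
def generate_model(class_key: str, labeled: List[Tuple[Document, str]],
                   context: int = 20) -> ExtractionModel:
    lefts, rights = [], []
    for doc, value in labeled:
        start = doc.html.find(value)
        if start < 0:
            continue  # label not found verbatim; skip this sample
        end = start + len(value)
        lefts.append(doc.html[max(0, start - context):start][::-1])  # reversed
        rights.append(doc.html[end:end + context])
    if not lefts:
        raise ValueError("no labeled value was found in its document")
    prefix = os.path.commonprefix(lefts)[::-1]  # common suffix of left contexts
    suffix = os.path.commonprefix(rights)       # common prefix of right contexts
    return ExtractionModel(class_key, prefix, suffix)


# Data-extracting process: find the model for the new document's class, then apply it.
def extract_from(doc: Document, models: Dict[str, ExtractionModel]) -> Optional[str]:
    model = models.get(document_class(doc.url))
    return model.extract(doc) if model else None


# Toy usage with made-up pages and labels.
corpus = [
    Document("https://shop.example.com/item/1", "<html><span class='price'>$10</span></html>"),
    Document("https://shop.example.com/item/2", "<html><span class='price'>$25</span></html>"),
]
known_prices = {d.url: v for d, v in zip(corpus, ["$10", "$25"])}

key = document_class(corpus[0].url)
labeled = label_documents(sample_documents(corpus, key), lambda d: known_prices[d.url])
models = {key: generate_model(key, labeled)}

new_doc = Document("https://shop.example.com/item/3", "<html><span class='price'>$99</span></html>")
print(extract_from(new_doc, models))  # prints $99
```

Keying models by document class lets the extraction step pick a model with a dictionary lookup instead of trying every model on every page. A document whose class has no model simply yields `None`. The context-matching rule is deliberately simple and would break on pages whose layout varies within a class. It stands in for whatever learned model a real system would use.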

Status:
Application
Type:

Utility

Filing date:

13 Dec 2019

Issue date:

17 Jun 2021