Coupa Software Incorporated
AUTOMATIC SELECTION OF TEMPLATES FOR EXTRACTION OF DATA FROM ELECTRONIC DOCUMENTS
Abstract:
A computer-implemented method for automatic template selection for extracting data from an input electronic document is provided. The method includes receiving a first set of candidate templates and an input electronic document. For each candidate template, a template similarity ratio value is calculated that represents the similarity of the candidate template to the input electronic document. The first set of candidate templates is ranked according to the template similarity ratios, and each candidate template is then matched to the input electronic document, generating a normalized similarity score for each candidate template. Differences between the normalized similarity scores of successive pairs of the ranked candidate templates are determined, and a breaking point is established. A second set of candidate templates is formed by selecting the candidate templates that are ranked above the breaking point. Data from the input electronic document is extracted using the second set of candidate templates.
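The ranking and breaking-point steps described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not say how the similarity ratio is computed, how scores are normalized, or how the breaking point is chosen. The sketch assumes a text similarity from Python's `difflib.SequenceMatcher`, normalization by dividing by the top score, and a breaking point at the largest drop between successive scores. The function names `similarity_ratio`, `normalize`, `split_at_breaking_point`, and `select_templates` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_ratio(template_text: str, document_text: str) -> float:
    """Assumed similarity measure: difflib's ratio, a value in [0, 1]."""
    return SequenceMatcher(None, template_text, document_text).ratio()


def normalize(scores: list[float]) -> list[float]:
    """Assumed normalization: scale so that the best score becomes 1.0."""
    top = max(scores)
    return [s / top if top > 0 else 0.0 for s in scores]


def split_at_breaking_point(ranked: list[str], scores: list[float]) -> list[str]:
    """Keep the templates ranked above the largest drop between neighbours.

    `ranked` and `scores` are parallel lists sorted best-first. Taking the
    largest successive difference as the breaking point is an assumption.
    """
    if len(ranked) < 2:
        return list(ranked)
    drops = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    break_index = max(range(len(drops)), key=drops.__getitem__)
    return ranked[: break_index + 1]


def select_templates(templates: dict[str, str], document_text: str) -> list[str]:
    """Rank the first set of templates and return the second (reduced) set."""
    if not templates:
        return []
    ratios = {
        name: similarity_ratio(text, document_text)
        for name, text in templates.items()
    }
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    scores = normalize([ratios[name] for name in ranked])
    return split_at_breaking_point(ranked, scores)
```

In this sketch the same ratio drives both the ranking and the normalized score, whereas the abstract describes a separate matching step between them; a real system would presumably substitute its own matcher at that point.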
Utility
20 Nov 2020
27 Jan 2022