Datawatch Corporation
Systems and methods for generating tables from print-ready digital source documents
Abstract:
Systems and methods are provided for generating tables from print-ready digital source documents. A document is received and one or more text fragments are identified on a rendered page of the document. A wrapping region collection is generated, comprising one or more wrapping regions. A tabular score, a narrative score, and a label score are generated for each wrapping region. A block type is assigned to each wrapping region based on the scores. A wrapping region group and a block set are generated. One or more tables are generated based on text fragments corresponding to one of the one or more blocks. The text fragments are organized into corresponding fields of the one or more tables.
Status:
Grant
Type:
Utility
Filing date:
9 May 2019
Issue date:
15 Dec 2020
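Illustrative sketch:
The scoring and block-typing steps named in the abstract can be pictured with a small Python sketch. The abstract does not say how the scores are computed, so everything here is an assumption made for illustration: the TextFragment class, the three heuristics in score_region, the "highest score wins" rule in assign_block_type, and the row grouping in fragments_to_rows are all hypothetical and are not the patented method.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextFragment:
    """Hypothetical text fragment located on a rendered page."""
    text: str
    x: float  # left edge of the fragment
    y: float  # baseline of the fragment


def score_region(fragments: List[TextFragment]) -> Dict[str, float]:
    """Assumed heuristics: each score is the share of fragments matching a pattern."""
    n = len(fragments) or 1  # avoid division by zero for an empty region
    # Numeric-looking fragments suggest tabular data.
    tabular = sum(
        f.text.replace(",", "").replace(".", "").isdigit() for f in fragments
    ) / n
    # Fragments of five or more words suggest narrative text.
    narrative = sum(len(f.text.split()) >= 5 for f in fragments) / n
    # Fragments ending in a colon suggest labels.
    label = sum(f.text.rstrip().endswith(":") for f in fragments) / n
    return {"tabular": tabular, "narrative": narrative, "label": label}


def assign_block_type(scores: Dict[str, float]) -> str:
    """Assumed rule: the block type is the name of the highest score."""
    return max(scores, key=scores.get)


def fragments_to_rows(fragments: List[TextFragment]) -> List[List[str]]:
    """Group fragments sharing a baseline into rows, ordered left to right."""
    rows: Dict[float, List[TextFragment]] = {}
    for f in fragments:
        rows.setdefault(f.y, []).append(f)
    return [
        [f.text for f in sorted(rows[y], key=lambda f: f.x)]
        for y in sorted(rows)
    ]


# A made-up wrapping region: one header line and two data lines.
region = [
    TextFragment("Qty", 0, 0), TextFragment("Price", 50, 0),
    TextFragment("3", 0, 10), TextFragment("1,250.00", 50, 10),
    TextFragment("7", 0, 20), TextFragment("99.50", 50, 20),
]
block_type = assign_block_type(score_region(region))
table = fragments_to_rows(region)
```

For this made-up region the tabular score is the largest of the three, so the region is typed as tabular and its fragments are arranged into rows whose positions correspond to table fields.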