International Business Machines Corporation
Data structure generation for tabular information in scanned images
Abstract:
Computer-implemented methods are provided for generating a data structure representing tabular information in a scanned image. Such a method can include storing image data representing a scanned image of a table, processing the image data to identify positions of characters and lines in the image, and mapping locations in the image of information cells, each containing a set of the characters, in dependence on said positions. The method can also include, for each cell, determining cell attribute values, dependent on the cell locations, for a predefined set of cell attributes, and supplying the attribute values as inputs to a machine-learning model trained to pre-classify cells as header cells or data cells in dependence on cell attribute values.
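The last two steps of the abstract (deriving per-cell attribute values from cell locations, then passing them to a classifier) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `Cell` type, the four attributes chosen, and the `stub_model` rule, which merely stands in for a trained machine-learning model. Character and line detection, and the mapping of cell locations, are assumed to have already happened.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Cell:
    """Hypothetical information cell: its text plus a bounding box in pixels."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def _overlap(a0: float, a1: float, b0: float, b1: float) -> bool:
    # True when the two 1-D intervals share more than a boundary point.
    return min(a1, b1) > max(a0, b0)


def cell_attributes(cell: Cell, cells: Sequence[Cell],
                    width: float, height: float) -> List[float]:
    """Compute an illustrative, location-based attribute vector for one cell.

    Attributes (all assumptions): normalized centre x, normalized centre y,
    number of cells above in the same column, number of cells to the left
    in the same row.
    """
    cx = (cell.x0 + cell.x1) / 2 / width
    cy = (cell.y0 + cell.y1) / 2 / height
    above = sum(
        1 for c in cells
        if c is not cell and c.y1 <= cell.y0
        and _overlap(c.x0, c.x1, cell.x0, cell.x1)
    )
    left = sum(
        1 for c in cells
        if c is not cell and c.x1 <= cell.x0
        and _overlap(c.y0, c.y1, cell.y0, cell.y1)
    )
    return [cx, cy, float(above), float(left)]


def stub_model(attrs: Sequence[float]) -> str:
    # Placeholder for a trained classifier: treats any cell with nothing
    # above it as a header. A real model would learn this from data.
    return "header" if attrs[2] == 0 else "data"


def pre_classify(cells: Sequence[Cell], width: float, height: float,
                 model: Callable[[Sequence[float]], str]) -> List[str]:
    """Feed each cell's attribute vector to the model and collect labels."""
    return [model(cell_attributes(c, cells, width, height)) for c in cells]


# A 2x2 table on a 100x20 pixel image.
cells = [
    Cell("Name", 0, 0, 50, 10), Cell("Age", 50, 0, 100, 10),
    Cell("Ann", 0, 10, 50, 20), Cell("31", 50, 10, 100, 20),
]
labels = pre_classify(cells, width=100, height=20, model=stub_model)
```

Passing the model in as a callable keeps the attribute extraction independent of the classifier, so the stub could be swapped for any trained model that accepts the same attribute vector.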
Utility
24 Jun 2019
13 Jul 2021