Adobe Inc.
Detecting the bounds of borderless tables in fixed-format structured documents using machine learning

Abstract:

Techniques are disclosed for detecting the bounds of borderless open tables in fixed-format structured documents, such as PDF documents, and grouping text lines into predicted borderless tables. A target document comprises a set of text lines, each having a respective vertical and horizontal position in the target document. A sorted list of the text lines is generated based on the vertical and horizontal position of each text line in the target document. For each text line in the sorted list, a respective probability that the text line belongs to a borderless table is then determined. According to one embodiment, the probability may be determined using a classifier that may employ a logistic regression algorithm.
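
The two steps named in the abstract (sorting text lines by position, then scoring each line with a logistic-regression-style classifier) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the `TextLine` type, the top-to-bottom then left-to-right sort order, the two features (number of other lines on the same row, and word count), the row tolerance, and the hard-coded weights, which stand in for parameters that a trained classifier would supply.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TextLine:
    """Hypothetical text-line record; field names are illustrative."""
    text: str
    x: float  # horizontal position of the left edge (assumed unit: points)
    y: float  # vertical position measured from the top of the page (assumed)


def sort_text_lines(lines):
    """Order lines top to bottom, then left to right (an assumed ordering)."""
    return sorted(lines, key=lambda ln: (ln.y, ln.x))


def features(sorted_lines, i, row_tolerance=1.0):
    """Compute two hypothetical features for the line at index i."""
    line = sorted_lines[i]
    # Count other lines that share roughly the same vertical position.
    same_row = sum(
        1
        for j, other in enumerate(sorted_lines)
        if j != i and abs(other.y - line.y) < row_tolerance
    )
    word_count = len(line.text.split())
    return [float(same_row), float(word_count)]


# Made-up parameters; a real classifier would learn these from labeled data.
WEIGHTS = [1.5, -0.4]
BIAS = -0.5


def table_probability(feats):
    """Logistic function applied to a linear combination of the features."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, feats))
    return 1.0 / (1.0 + math.exp(-z))


def score_lines(lines):
    """Return (line, probability) pairs in sorted order."""
    ordered = sort_text_lines(lines)
    return [
        (ln, table_probability(features(ordered, i)))
        for i, ln in enumerate(ordered)
    ]


if __name__ == "__main__":
    page = [
        TextLine("Total", 50.0, 200.0),
        TextLine("Quarterly report for shareholders and staff", 50.0, 100.0),
        TextLine("42", 300.0, 200.0),
    ]
    for ln, p in score_lines(page):
        print(f"{p:.2f}  {ln.text}")
```

With these illustrative weights, short lines that share a row with other lines score higher than a long line standing alone, which is the kind of signal a learned model might pick up; the actual features and training procedure are not stated in the abstract.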

Status:
Grant
Type:

Utility

Filing date:

22 May 2019

Issue date:

7 Sep 2021