Adobe Inc.
MACHINE LEARNING TECHNIQUES FOR IDENTIFYING LOGICAL SECTIONS IN UNSTRUCTURED DATA

Last updated:

Abstract:

The methods and systems disclosed herein relate generally to using machine-learning techniques to generate section identifiers for one or more sections of unstructured or unformatted text data. A document-processing application identifies, with a feature-prediction layer of a machine-learning model, a feature representation that represents the semantic structure of a text section within an unformatted and unstructured document. The document-processing application then generates, with a sequence-prediction layer of the machine-learning model, a section identifier (e.g., heading, body, list) for the corresponding text section by applying the sequence-prediction layer to the feature representation and using contextual information from neighboring text sections.
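The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the claimed model: the abstract does not specify the layers, so the feature-prediction layer is stood in for by a few hand-written cues, and the sequence-prediction layer by Viterbi decoding over hypothetical emission and transition scores. The names `features`, `EMISSION`, `TRANSITION`, and `label_sections`, and all numeric weights, are assumptions made for illustration only.

```python
from typing import Dict, List

LABELS = ["heading", "body", "list"]


def features(section: str) -> Dict[str, float]:
    """Stand-in for a feature-prediction layer (hand-written cues, not learned)."""
    stripped = section.strip()
    return {
        "short": 1.0 if len(stripped.split()) <= 6 else 0.0,
        "bullet": 1.0 if stripped.startswith(("-", "*", "•")) else 0.0,
        "ends_with_period": 1.0 if stripped.endswith(".") else 0.0,
    }


# Hypothetical weights: how strongly each feature supports each label.
EMISSION = {
    "heading": {"short": 2.0, "bullet": -2.0, "ends_with_period": -1.5},
    "body": {"short": -1.0, "bullet": -2.0, "ends_with_period": 1.5},
    "list": {"short": 0.5, "bullet": 3.0, "ends_with_period": 0.0},
}

# Hypothetical scores for (previous label, current label); this is where
# the context of neighboring sections enters the decision.
TRANSITION = {
    ("heading", "heading"): -1.0, ("heading", "body"): 1.0, ("heading", "list"): 0.5,
    ("body", "heading"): 0.5, ("body", "body"): 0.5, ("body", "list"): 0.5,
    ("list", "heading"): 0.0, ("list", "body"): 0.0, ("list", "list"): 1.0,
}


def emit(label: str, feats: Dict[str, float]) -> float:
    """Score one label against one section's feature representation."""
    return sum(EMISSION[label][name] * value for name, value in feats.items())


def label_sections(sections: List[str]) -> List[str]:
    """Stand-in for a sequence-prediction layer: Viterbi decoding of the best label path."""
    if not sections:
        return []
    feats = [features(s) for s in sections]
    scores = [{label: emit(label, feats[0]) for label in LABELS}]
    backpointers = []
    for f in feats[1:]:
        current, pointers = {}, {}
        for label in LABELS:
            # Best previous label given the transition into `label`.
            prev = max(LABELS, key=lambda p: scores[-1][p] + TRANSITION[(p, label)])
            current[label] = scores[-1][prev] + TRANSITION[(prev, label)] + emit(label, f)
            pointers[label] = prev
        scores.append(current)
        backpointers.append(pointers)
    # Trace the highest-scoring path back from the final section.
    path = [max(LABELS, key=lambda label: scores[-1][label])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]
```

Under these assumed weights, `label_sections(["Introduction", "This document describes the system in detail and explains its parts.", "- first item", "- second item"])` returns `["heading", "body", "list", "list"]`. A learned model would replace both the hand-written cues and the fixed score tables with trained parameters.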

Status:

Application

Type:

Utility

Filing date:

18 Nov 2020

Issue date:

19 May 2022