Adobe Inc.
MACHINE LEARNING TECHNIQUES FOR IDENTIFYING LOGICAL SECTIONS IN UNSTRUCTURED DATA
Last updated:
Abstract:
Methods and systems disclosed herein relate generally to using machine-learning techniques to generate section identifiers for one or more sections of unstructured or unformatted text data. A document-processing application identifies, with a feature-prediction layer of a machine-learning model, a feature representation that represents a semantic structure of a text section within an unformatted and unstructured document. The document-processing application generates, with a sequence-prediction layer of the machine-learning model, a section identifier (e.g., heading, body, list) for the corresponding text section by applying the sequence-prediction layer to the feature representation and using contextual information from neighboring text sections.
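The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the disclosed model: the abstract does not specify the layers, so hand-written heuristics stand in for the feature-prediction layer and Viterbi decoding over a small transition table stands in for the sequence-prediction layer. All names (`section_features`, `label_sections`, `TRANSITIONS`) and all score values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

LABELS = ["heading", "body", "list"]

# Hypothetical context scores: how plausible is label B right after label A?
# Pairs not listed here score 0.0.
TRANSITIONS = {
    ("heading", "heading"): -1.0,  # two headings in a row are unlikely
    ("heading", "body"): 0.5,      # body text usually follows a heading
    ("list", "list"): 0.5,         # list items tend to come in runs
}


def section_features(text: str) -> Dict[str, float]:
    """Stand-in for the feature-prediction layer: one score per label."""
    stripped = text.strip()
    words = stripped.split()
    is_bullet = stripped.startswith(("-", "*", "•")) or (
        bool(words) and words[0].rstrip(".)").isdigit()
    )
    is_short = len(words) <= 6
    ends_with_period = stripped.endswith(".")
    return {
        "heading": 2.0 if is_short and not ends_with_period and not is_bullet else 0.0,
        "body": 1.0 + (1.0 if ends_with_period and not is_bullet else 0.0),
        "list": 2.0 if is_bullet else 0.0,
    }


def label_sections(sections: List[str]) -> List[str]:
    """Stand-in for the sequence-prediction layer: Viterbi decoding that
    combines per-section scores with scores for neighboring labels."""
    if not sections:
        return []
    emissions = [section_features(s) for s in sections]
    best = [dict(emissions[0])]  # best path score ending in each label
    back = []                    # back-pointers, one dict per later section
    for em in emissions[1:]:
        scores, pointers = {}, {}
        for cur in LABELS:
            prev = max(
                LABELS,
                key=lambda p: best[-1][p] + TRANSITIONS.get((p, cur), 0.0),
            )
            scores[cur] = best[-1][prev] + TRANSITIONS.get((prev, cur), 0.0) + em[cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Walk the back-pointers from the best final label to recover the path.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


document = [
    "Overview",
    "This tool labels each section of a plain-text document.",
    "- headings",
    "- body text",
]
print(label_sections(document))  # ['heading', 'body', 'list', 'list']
```

In this sketch the neighboring-section context matters: `label_sections(["Overview", "Quick start"])` returns `['heading', 'body']` even though both lines look like headings in isolation, because the transition table penalizes consecutive headings. A learned model would replace both the heuristics and the fixed table with trained parameters.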
Utility
18 Nov 2020
19 May 2022