The Boeing Company
Natural Language Processing (NLP) Pipeline for Automated Attribute Extraction
Abstract:
A method for training a filter-based text recognition system for cataloging image portions associated with files using text from the image portions, the method comprising: receiving a first set of text represented in a first image portion associated with a first file; classifying the first image portion into a predetermined group, wherein the classifying is based at least in part on the first set of text; extracting a first set of features from the first set of text; harmonizing existing data in the predetermined group with the first set of text to modify the first set of features; categorizing the first set of text; and determining analytics-based rules based at least in part on the first set of features.
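The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the keyword filters in GROUP_KEYWORDS, the "Key: value" feature format, the alias table, and every function name are assumptions made for illustration only. The sketch covers classifying text into a predetermined group, extracting features, harmonizing them against names the group already uses, and deriving simple rules from the result; the separate categorizing step is left out.

```python
import re
from collections import Counter

# Hypothetical keyword filters: group name -> trigger words (assumed, not from the patent).
GROUP_KEYWORDS = {
    "drawing": {"drawing", "rev", "scale"},
    "invoice": {"invoice", "total", "qty"},
}


def classify(text):
    """Pick the group whose keywords occur most often in the text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {g: sum(words[k] for k in kws) for g, kws in GROUP_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def extract_features(text):
    """Collect 'Key: value' pairs, one per line, as raw features."""
    pattern = r"^[ \t]*([A-Za-z][A-Za-z .]*?)[ \t]*:[ \t]*(.+?)[ \t]*$"
    return {k.lower(): v for k, v in re.findall(pattern, text, flags=re.M)}


def harmonize(features, aliases):
    """Rename feature keys to the canonical names already used by the group."""
    return {aliases.get(k, k): v for k, v in features.items()}


def derive_rules(features):
    """Infer a coarse value-type rule for each feature."""
    return {
        k: "numeric" if re.fullmatch(r"[\d.,]+", v) else "text"
        for k, v in features.items()
    }


def process(text, group_aliases):
    """Run the sketched steps in order and return the combined result."""
    group = classify(text)
    features = harmonize(extract_features(text), group_aliases.get(group, {}))
    return {"group": group, "features": features, "rules": derive_rules(features)}


# Example input standing in for text recognized from one image portion.
sample_text = "Drawing No.: 4521\nRev: B\nScale: 1.5\n"
sample_aliases = {"drawing": {"drawing no.": "drawing_number", "rev": "revision"}}
result = process(sample_text, sample_aliases)
```

Here the alias table stands in for "existing data in the predetermined group": harmonizing modifies the extracted features so that later records in the same group share one set of attribute names, and the derived rules are computed from those harmonized features.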
Utility
12 Dec 2019
17 Jun 2021