Oracle Corporation
Character-based attribute value extraction system

Last updated:

Abstract:

A system is provided that extracts attribute values. The system receives data including unstructured text from a data store. The system further tokenizes the unstructured text into tokens, where a token is a character of the unstructured text. The system further annotates the tokens with attribute labels, where an attribute label for a token is determined, in least in part, based on a word that the token originates from within the unstructured text. The system further groups the tokens into text segments based on the attribute labels, where a set of tokens that are annotated with an identical attribute label are grouped into a text segment, and where the text segments define attribute values. The system further stores the attribute labels and the attribute values within the data store.

Status:
Grant
Type:

Utility

Filling date:

30 Apr 2015

Issue date:

18 May 2021