Microsoft Corporation
Automated Structured Textual Content Categorization Accuracy With Neural Networks
Last updated:
Abstract:
To provide automated categorization of structured textual content, individual nodes of textual content, from a document object model encapsulation of the structured textual content, have a multidimensional vector associated with them. The values of the various dimensions of each multidimensional vector are based on the textual content in the corresponding node, the visual features applied to or associated with that textual content, and positional information of that textual content. The multidimensional vectors are input to a neighbor-imbuing neural network. The enhanced multidimensional vectors output by the neighbor-imbuing neural network are then provided to a categorization neural network. The resulting output can be in the form of multidimensional vectors whose dimensionality is proportional to the number of categories into which the structured textual content is to be categorized. A weighted merge takes into account multiple nodes that are grouped together.
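The data flow described in the abstract can be illustrated with a minimal sketch. The abstract does not specify either network's architecture, so both are replaced here with simple stand-ins: neighbor averaging in place of the neighbor-imbuing neural network, and a single linear layer with softmax in place of the categorization neural network. All function names, feature values, weights, and the one-dimension-per-category output are hypothetical and chosen only for illustration.

```python
import math

def node_vector(text_feats, visual_feats, position_feats):
    """Concatenate text, visual, and positional features into one vector per node."""
    return list(text_feats) + list(visual_feats) + list(position_feats)

def imbue_neighbors(vectors, neighbors, self_weight=0.5):
    """Stand-in for the neighbor-imbuing network (assumption: plain averaging).

    Each node's vector is blended with the mean of its neighbors' vectors.
    """
    enhanced = []
    for i, vec in enumerate(vectors):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            enhanced.append(list(vec))
            continue
        mean = [sum(vectors[j][d] for j in nbrs) / len(nbrs) for d in range(len(vec))]
        enhanced.append([self_weight * v + (1 - self_weight) * m for v, m in zip(vec, mean)])
    return enhanced

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def categorize(vec, weights):
    """Stand-in for the categorization network (assumption: linear layer + softmax).

    `weights` holds one row per category, so the output has one value per category.
    """
    scores = [sum(w * v for w, v in zip(row, vec)) for row in weights]
    return softmax(scores)

def weighted_merge(distributions, node_weights):
    """Combine the category vectors of grouped nodes using per-node weights."""
    total = sum(node_weights)
    n_categories = len(distributions[0])
    return [
        sum(w * dist[c] for w, dist in zip(node_weights, distributions)) / total
        for c in range(n_categories)
    ]

# Three hypothetical DOM nodes: 2 text dims, 1 visual dim, 1 positional dim.
nodes = [
    node_vector([1.0, 0.0], [1.0], [0.0]),
    node_vector([0.0, 1.0], [0.0], [0.5]),
    node_vector([0.0, 1.0], [0.0], [1.0]),
]
neighbors = {0: [1], 1: [0, 2], 2: [1]}          # hypothetical adjacency in the DOM
weights = [[1.0, 0.0, 1.0, 0.0],                 # made-up weights, category 0
           [0.0, 1.0, 0.0, 1.0]]                 # made-up weights, category 1

enhanced = imbue_neighbors(nodes, neighbors)
per_node = [categorize(v, weights) for v in enhanced]
merged = weighted_merge(per_node[1:], node_weights=[2.0, 1.0])  # nodes 1 and 2 grouped
```

In this sketch `merged` is a single category vector for the group formed by nodes 1 and 2. The merge weights are arbitrary here; how weights are actually derived is not stated in the abstract.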
Utility
19 Jun 2020
23 Dec 2021