Wipro Limited
Method and system for determining structural blocks of a document
Abstract:
This disclosure relates generally to document processing, and more particularly to a method and system for determining structural blocks of a document. In one embodiment, the method may include extracting text from the document, the text including text lines. The method may further include generating a feature vector for each of the text lines, the feature vector for a given text line including a set of feature values for a set of corresponding features in that text line. The method may further include creating an input matrix for each of the text lines, the input matrix for a given text line including the feature vector of that text line together with the feature vectors of a set of neighboring text lines. The method may further include determining, using a machine learning model, a structural block tag for each of the text lines based on the corresponding input matrix.
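The feature-vector and input-matrix steps described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the disclosed implementation: the five features in `line_features`, the window of one neighboring line on each side, the zero-padding at document edges, and the function names themselves are all hypothetical. The abstract does not say which features, window size, or padding scheme are used.

```python
from typing import List


def line_features(line: str) -> List[float]:
    """Hypothetical per-line features; the source does not list the actual ones."""
    stripped = line.strip()
    words = stripped.split()
    letters = [c for c in stripped if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return [
        float(len(stripped)),                      # character count
        float(len(words)),                         # word count
        float(len(line) - len(line.lstrip(" "))),  # leading indentation in spaces
        upper_ratio,                               # share of letters that are uppercase
        1.0 if stripped[:1].isdigit() else 0.0,    # line starts with a digit
    ]


def input_matrix(vectors: List[List[float]], index: int, window: int = 1) -> List[List[float]]:
    """Stack the vector at `index` with `window` neighbors on each side.

    Positions outside the document are zero-padded (an assumed convention).
    """
    width = len(vectors[0])
    rows = []
    for i in range(index - window, index + window + 1):
        if 0 <= i < len(vectors):
            rows.append(list(vectors[i]))
        else:
            rows.append([0.0] * width)
    return rows


def build_inputs(text_lines: List[str], window: int = 1) -> List[List[List[float]]]:
    """Return one input matrix per text line."""
    vectors = [line_features(line) for line in text_lines]
    return [input_matrix(vectors, i, window) for i in range(len(vectors))]


if __name__ == "__main__":
    sample = ["1. INTRODUCTION", "  This is body text.", ""]
    for line, matrix in zip(sample, build_inputs(sample)):
        print(repr(line), matrix)
```

Each resulting matrix has `2 * window + 1` rows, so a classifier sees a line in the context of its neighbors rather than in isolation. In the method described above, each matrix would then be passed to a machine learning model that outputs a structural block tag for the line; that model is left out of the sketch because the abstract does not specify it.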
Utility
31 Mar 2018
18 Feb 2020