International Business Machines Corporation
Semantic header detection using pre-trained embeddings
Abstract:
A method, a computer system, and a computer program product for detecting one or more semantic headers in one or more tabular structures by utilizing a custom pre-trained embeddings model are provided. The present invention may include receiving the custom pre-trained embeddings model. The present invention may also include computing one or more dot product values associated with the one or more tabular structures from one or more documents, based on the context of each cell in those tabular structures. The present invention may then include generating one or more similarity feature values based on the computed one or more dot product values. The present invention may further include detecting the one or more semantic headers associated with the one or more tabular structures based on the one or more similarity feature values.
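The steps named in the abstract (dot products over cell embeddings, similarity features, header detection) can be illustrated with a minimal Python sketch. It is not the claimed method: the abstract does not define "context", the feature formula, or the decision rule, so everything specific here is an assumption. The sketch assumes a cell's context is the other cells in its column, uses the mean dot product per row as the similarity feature, and flags rows whose feature falls below a hypothetical threshold. The `EMBEDDINGS` table, the function names, and the threshold value are all invented for illustration; a real system would load an actual pre-trained embeddings model.

```python
from statistics import mean

# Hypothetical toy embeddings standing in for a custom pre-trained model.
EMBEDDINGS = {
    "city": [1.0, 0.0, 0.0],
    "population": [0.9, 0.1, 0.0],
    "paris": [0.0, 1.0, 0.0],
    "london": [0.0, 0.9, 0.1],
    "2.1m": [0.0, 0.0, 1.0],
    "8.9m": [0.0, 0.1, 0.9],
}


def embed(cell):
    """Look up a cell's vector; unknown cells map to the zero vector."""
    return EMBEDDINGS.get(cell.lower(), [0.0, 0.0, 0.0])


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def column_similarity_features(table):
    """Return one feature per row: the mean dot product between each cell
    in the row and every other cell in the same column.

    Assumption: all rows have the same number of cells and the table has
    at least two rows.
    """
    features = []
    for r, row in enumerate(table):
        sims = [
            dot(embed(cell), embed(other_row[c]))
            for c, cell in enumerate(row)
            for k, other_row in enumerate(table)
            if k != r
        ]
        features.append(mean(sims))
    return features


def detect_header_rows(table, threshold=0.2):
    """Flag rows whose cells are dissimilar to the rest of their columns.

    Assumption: header cells name a column rather than belong to it, so
    they score low against the data cells beneath them.
    """
    features = column_similarity_features(table)
    return [r for r, value in enumerate(features) if value < threshold]


table = [
    ["City", "Population"],
    ["Paris", "2.1M"],
    ["London", "8.9M"],
]
header_rows = detect_header_rows(table)
```

In this toy table the first row scores far lower than the data rows, because "City" and "Population" share little with the values below them, so it is the only row flagged. The threshold is tuned to the toy data and would need to be chosen or learned for real embeddings.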
Utility
10 Oct 2019
19 Apr 2022