International Business Machines Corporation
Searching multilingual documents based on document structure extraction

Abstract:

An approach is provided for searching multilingual documents. A first classification is determined that includes a first document and other document(s) by minimizing a first distance between a first numerical fixed length vector for the first document and other numerical fixed length vector(s) for other document(s). Based on a query and a natural language detected in the query, a second document is selected. A second stream modeling the second document is encoded as a second numerical fixed length vector. Based on a distance between the first and second numerical fixed length vectors being less than a threshold, the first classification is identified as including the second document. Documents in the first classification are ranked and presented as having content matching the second document's content. At least one of the ranked documents is expressed in a natural language different from the natural language of the second document.
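
The matching step described in the abstract can be illustrated with a minimal sketch. It assumes the documents have already been encoded as fixed-length vectors; the Euclidean metric, the threshold value, the document identifiers, and the function names `euclidean`, `matches_classification`, and `rank_classification` are all hypothetical choices made for illustration and are not taken from the patent.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Distance between two fixed-length vectors (metric is an assumption)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same fixed length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matches_classification(second_vec: Vector, first_vec: Vector,
                           threshold: float) -> bool:
    """True when the second document falls inside the first classification."""
    return euclidean(second_vec, first_vec) < threshold


def rank_classification(
    second_vec: Vector, members: Dict[str, Tuple[str, Vector]]
) -> List[Tuple[str, str, float]]:
    """Rank classification members by distance to the second document.

    `members` maps a document id to (language code, vector).
    """
    scored = [
        (doc_id, lang, euclidean(second_vec, vec))
        for doc_id, (lang, vec) in members.items()
    ]
    return sorted(scored, key=lambda item: item[2])


# Hypothetical data: one classification holding documents in three languages.
members = {
    "doc_en": ("en", [0.10, 0.90, 0.30]),
    "doc_de": ("de", [0.12, 0.88, 0.31]),
    "doc_fr": ("fr", [0.20, 0.80, 0.40]),
}
first_vec = members["doc_en"][1]
second_vec = [0.11, 0.89, 0.30]  # vector for the document selected by the query
THRESHOLD = 0.05                 # illustrative value only

ranked = []
if matches_classification(second_vec, first_vec, THRESHOLD):
    ranked = rank_classification(second_vec, members)

for doc_id, lang, dist in ranked:
    print(f"{doc_id} ({lang}): {dist:.4f}")
```

Because ranking relies only on vector distance, documents in languages other than that of the query document can appear in the results, which is the cross-language behavior the abstract describes.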

Status:

Grant

Type:

Utility

Filing date:

5 May 2020

Issue date:

11 Jan 2022