Wipro Limited
System and method for retrieving one or more documents

Abstract:

This disclosure relates generally to information retrieval technology, and more particularly to the creation of a taxonomy to facilitate subsequent search and retrieval of information. In one embodiment, an information retrieval device is disclosed that comprises a processor and a memory that stores instructions which, on execution, cause the processor to receive an input corpus. Thereafter, input document clusters are generated from top input n-grams associated with the input corpus. Further, top-ranked input n-grams are determined from the top input n-grams. Thereafter, an external corpus is identified based on the top-ranked input n-grams. An enriched corpus, comprising the external corpus and the input corpus, is clustered based on top enriched n-grams associated with the enriched corpus to generate enriched document clusters. Further, for each n-gram of the enriched corpus, corresponding n-gram clusters are determined. Finally, a taxonomy is created based on the input document clusters, the enriched document clusters, the n-gram clusters, and the top-ranked input n-grams.
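
The first two steps of the abstract (selecting top input n-grams and grouping input documents around them) can be illustrated with a minimal sketch. This is not the patented method: the abstract does not state how n-grams are ranked or how clusters are formed, so the sketch assumes ranking by document frequency and assigns each document to the first top n-gram it contains. The function names `ngrams`, `top_ngrams`, and `cluster_by_top_ngrams` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def ngrams(text, n):
    """Return the word n-grams of a text as space-joined strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(corpus, n=2, k=3):
    """Pick the k n-grams that occur in the most documents.

    Assumption: document frequency is the ranking signal. Ties are
    broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for doc in corpus:
        counts.update(set(ngrams(doc, n)))  # count each n-gram once per document
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:k]]


def cluster_by_top_ngrams(corpus, top, n=2):
    """Group document indices under the first top n-gram each document contains.

    Assumption: a document belongs to exactly one cluster; documents that
    contain none of the top n-grams are left unclustered.
    """
    clusters = defaultdict(list)
    for index, doc in enumerate(corpus):
        grams = set(ngrams(doc, n))
        for gram in top:
            if gram in grams:
                clusters[gram].append(index)
                break
    return dict(clusters)


if __name__ == "__main__":
    docs = [
        "neural network training loss",
        "deep neural network model",
        "stock market price index",
        "global stock market crash",
    ]
    top = top_ngrams(docs, n=2, k=2)
    print(top)
    print(cluster_by_top_ngrams(docs, top, n=2))
```

In the disclosed flow, the top-ranked n-grams would then be used to identify an external corpus, and the same kind of clustering would be repeated on the enriched corpus before the taxonomy is built. Those later steps are omitted here because the abstract gives no detail on how they are carried out.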

Status:

Grant

Type:

Utility

Filing date:

20 Nov 2018

Issue date:

22 Mar 2022