International Business Machines Corporation
Matching strings in a large relational database

Last updated:

Abstract:

A computer-implemented method identifies strings of data from a database. One or more processors receive data as an input string. The processor(s) generate a first binary code using a binary locality sensitive hashing of k-grams in the input string, where the binary locality sensitive hashing on the k-grams in the input string is derived from a first set of bi-grams in the input string, a second set of bi-grams in the input string, and a quantity of intersecting bi-grams from the first set of bi-grams and the second set of bi-grams. In response to receiving a search request for a particular string, the processor(s) generate a second binary code using a binary locality sensitive hashing on the particular string, and search a database in a query process. The processor(s) then rank and return a set of similar strings found in the database.
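
The general idea in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not specify the hash construction, so a SimHash-style vote over character bi-grams stands in as the binary locality sensitive hash, and an in-memory list stands in for the database. All function names, the 64-bit code length, and the Hamming-distance ranking are assumptions made for illustration.

```python
import hashlib

BITS = 64  # assumed code length; the abstract does not state one


def bigrams(s):
    """Return the set of character bi-grams (2-grams) of s, lowercased."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Similarity from the two bi-gram sets and their intersection count."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def binary_code(s):
    """SimHash-style binary code (an assumed stand-in for the patent's hash).

    Each bi-gram is hashed to 64 bits and votes +1 or -1 on every bit
    position; the sign of each tally becomes one bit of the code.
    """
    votes = [0] * BITS
    for gram in bigrams(s):
        digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    code = 0
    for i, v in enumerate(votes):
        if v > 0:
            code |= 1 << i
    return code


def hamming(x, y):
    """Number of differing bits between two binary codes."""
    return bin(x ^ y).count("1")


def search(query, database, top_n=3):
    """Rank database strings by Hamming distance to the query's code."""
    q = binary_code(query)
    ranked = sorted(database, key=lambda s: (hamming(q, binary_code(s)), s))
    return ranked[:top_n]


if __name__ == "__main__":
    db = [
        "International Business Machines",
        "Intl Business Machines Corp",
        "Matching strings",
        "Relational database",
    ]
    print(search("International Business Machine", db, top_n=2))
```

In a real system the codes would be computed once and stored alongside the strings, so that a query only hashes the search string and compares binary codes rather than rehashing every row.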

Status:
Grant
Type:

Utility

Filing date:

21 Oct 2019

Issue date:

1 Feb 2022