International Business Machines Corporation
Similarity matching systems and methods for record linkage

Abstract:

A given query entity of a query database and a set of reference entities of a master database are accessed. Each accessed entity corresponds to an entry in its respective database, and each entry is mapped to a set of words that are decomposed into tokens. For each reference entity, and for each token of the given query entity, the closest token in that reference entity is identified using a given string metric. Each closest token is thus associated with the highest similarity score between a token of the query entity and the tokens of that reference entity. An entity similarity score is computed from these highest scores. The reference entity of the master database that is closest to the given query entity is identified based on the entity similarity score. Records of the given query entity are then linked to records of the master database based on the closest reference entity identified.
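
The matching procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation; several choices here are assumptions made only for illustration: tokens are lowercase whitespace-separated words, the string metric is the ratio from the standard library's `difflib.SequenceMatcher`, and the entity similarity score is the mean of the per-token highest scores. The function names (`tokenize`, `token_similarity`, `entity_similarity`, `closest_reference`) are hypothetical.

```python
from difflib import SequenceMatcher


def tokenize(entry):
    # Assumption: tokens are lowercase whitespace-separated words.
    return entry.lower().split()


def token_similarity(a, b):
    # Assumption: SequenceMatcher.ratio() stands in for the "given string metric".
    return SequenceMatcher(None, a, b).ratio()


def entity_similarity(query_tokens, reference_tokens):
    # For each query token, keep the score of its closest reference token,
    # then aggregate. Assumption: the aggregate is the arithmetic mean.
    if not query_tokens or not reference_tokens:
        return 0.0
    best_scores = [
        max(token_similarity(q, r) for r in reference_tokens)
        for q in query_tokens
    ]
    return sum(best_scores) / len(best_scores)


def closest_reference(query_entry, reference_entries):
    # Return (score, entry) for the reference entry most similar to the query.
    if not reference_entries:
        raise ValueError("reference_entries must not be empty")
    query_tokens = tokenize(query_entry)
    scored = [
        (entity_similarity(query_tokens, tokenize(ref)), ref)
        for ref in reference_entries
    ]
    return max(scored, key=lambda pair: pair[0])


master = [
    "International Business Machines Corporation",
    "Intel Corporation",
    "Microsoft Corporation",
]
score, match = closest_reference("Intl Business Machines Corp", master)
print(match, round(score, 2))
```

Once the closest reference entry is found, the records of the query entity would be linked to the corresponding master record; a real system would likely also apply a minimum score threshold before linking, which this sketch omits.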

Status:

Grant

Type:

Utility

Filing date:

15 May 2018

Issue date:

23 Nov 2021