Fair Isaac Corporation
SIMILARITY SHARDING

Abstract:

Computer-implemented systems and methods are provided for efficiently searching large data volumes for one or more items with a definable degree of similarity. The systems and methods may include functionality directed to selecting at least one token from one or more tokens in a target item, the token including an identifiable character string defining, fully or partially, at least one of a name, an address, an entity, or other identifier associated with the target item; extracting a character from the identifiable character string after the character string is standardized to a known common version of the character string; responsive to a character distribution lookup, determining that the extracted character corresponds to a first shard from among a plurality of discrete shards; and grouping the target item into the first shard, the character distribution lookup being adjustable over time to provide for a balanced distribution of items across the plurality of discrete shards.
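
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names (standardize, build_lookup, shard_for), the choice of the first character as the extracted character, the uppercase alphanumeric standardization, and the greedy balancing rule are all assumptions made for illustration only.

```python
from collections import Counter


def standardize(token: str) -> str:
    """Hypothetical standardization: uppercase, keep only letters and digits."""
    return "".join(ch for ch in token.upper() if ch.isalnum())


def build_lookup(tokens, num_shards):
    """Build a character-to-shard lookup from observed tokens.

    Assumed balancing rule: visit first characters from most to least
    frequent and assign each one to the shard with the lowest load so far.
    """
    firsts = (standardize(t)[:1] for t in tokens)
    counts = Counter(ch for ch in firsts if ch)
    loads = [0] * num_shards
    lookup = {}
    for ch, n in counts.most_common():
        shard = loads.index(min(loads))  # lightest shard at this point
        lookup[ch] = shard
        loads[shard] += n
    return lookup


def shard_for(token, lookup, default_shard=0):
    """Return the shard for a token; unseen characters go to default_shard."""
    std = standardize(token)
    if not std:
        return default_shard
    return lookup.get(std[0], default_shard)


# Example data (invented for illustration).
names = ["Smith", "smythe", "Sanders", "Jones",
         "Johnson", "Brown", "Baker", "Adams"]
lookup = build_lookup(names, num_shards=2)
per_shard = Counter(shard_for(n, lookup) for n in names)
```

In this sketch, a similarity search for "Smyth" would only need to scan the shard that holds tokens starting with "S". Calling build_lookup again on fresher data is one simple way to model a lookup that is adjustable over time, although items whose character moves to a different shard would then have to be regrouped.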

Status:

Application

Type:

Utility

Filing date:

14 May 2021

Issue date:

2 Sep 2021