Amazon.com, Inc.
Scaling record linkage via elimination of highly overlapped blocks

Last updated:

Abstract:

Techniques for scaling record linkage via elimination of highly overlapped blocks are described. A method for scaling record linkage via elimination of highly overlapped blocks includes identifying a first plurality of blocks based at least on a plurality of records stored in a storage service of a provider network, identifying a plurality of sets of matching blocks from the first plurality of blocks, deleting the plurality of sets of matching blocks except for a first block from each set from the plurality of sets of matching blocks, and iteratively performing dynamic blocking based at least on the first block to generate subsequent pluralities of blocks until the subsequent pluralities of blocks are below a threshold size.
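The steps named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation and rests on several assumptions the abstract does not confirm: records are held in memory as dictionaries rather than in a storage service, two blocks "match" only when they contain exactly the same record ids, and an oversized block is refined by combining its key with one additional blocking key. The names build_blocks, drop_matching_blocks, dynamic_blocking, and max_block_size are hypothetical.

```python
from collections import defaultdict


def build_blocks(records, key_names):
    """Group record ids by (field, value); each block key is a tuple of such pairs."""
    blocks = defaultdict(set)
    for rid, rec in records.items():
        for name in key_names:
            value = rec.get(name)
            if value is not None:
                blocks[((name, value),)].add(rid)
    return {key: frozenset(ids) for key, ids in blocks.items()}


def drop_matching_blocks(blocks):
    """Assumption: blocks match when their record sets are identical.
    Keep the first block of each matching set and delete the rest."""
    seen = set()
    kept = {}
    for key, ids in blocks.items():
        if ids not in seen:
            seen.add(ids)
            kept[key] = ids
    return kept


def dynamic_blocking(records, key_names, max_block_size):
    """Refine oversized blocks until every surviving block fits the size limit."""
    blocks = drop_matching_blocks(build_blocks(records, key_names))
    accepted = {}
    while blocks:
        oversized = {}
        for key, ids in blocks.items():
            if len(ids) <= max_block_size:
                accepted[key] = ids
            else:
                oversized[key] = ids
        # Split each oversized block on every field its key does not use yet.
        refined = defaultdict(set)
        for key, ids in oversized.items():
            used = {name for name, _ in key}
            for name in key_names:
                if name in used:
                    continue
                for rid in ids:
                    value = records[rid].get(name)
                    if value is not None:
                        new_key = tuple(sorted(key + ((name, value),)))
                        refined[new_key].add(rid)
        # Remove matching blocks before the next pass so duplicated work is not repeated.
        blocks = drop_matching_blocks(
            {key: frozenset(ids) for key, ids in refined.items()}
        )
    # Blocks accepted in different passes may still coincide; keep one of each.
    return drop_matching_blocks(accepted)


# Hypothetical example data, not taken from the patent.
records = {
    1: {"city": "Seattle", "zip": "98101", "last": "Smith"},
    2: {"city": "Seattle", "zip": "98101", "last": "Smith"},
    3: {"city": "Seattle", "zip": "98109", "last": "Jones"},
    4: {"city": "Seattle", "zip": "98109", "last": "Smith"},
    5: {"city": "Portland", "zip": "97201", "last": "Jones"},
}
result = dynamic_blocking(records, ["city", "zip", "last"], max_block_size=2)
```

In this sketch an oversized block that cannot be split any further, because every field is already part of its key, is simply dropped. A production system would more likely apply an overlap threshold than exact equality when deciding which blocks match.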

Status:

Grant

Type:

Utility

Filing date:

30 Sep 2019

Issue date:

7 Sep 2021