Amazon.com, Inc.
Real-time detection of duplicate data records

Abstract:

Disclosed are various embodiments for real-time detection of duplicate data records. A duplicate detection application generates a set of clusters from a set of data records by grouping each data record in the set of data records according to similarity to respective centroid data records of the set of clusters. The duplicate detection application determines whether a particular data record has a potential duplicate in the set of data records by first comparing the particular data record to the respective centroid data records to identify a most similar cluster in the set of clusters. The duplicate detection application then compares the particular data record to each data record in the most similar cluster.
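The abstract describes a two-stage lookup. Records are first grouped around centroid records. A new record is then compared only against those centroids, and after that only against the members of the most similar cluster. The Python sketch below illustrates that idea under stated assumptions that do not come from the patent. Records are plain strings. Similarity is Jaccard overlap of lowercase whitespace tokens. The caller supplies the centroid records. A fixed threshold of 0.8 marks a potential duplicate. The names `tokens`, `jaccard`, `build_clusters`, and `find_duplicates`, and the threshold value, are all hypothetical.

```python
from typing import Dict, List, Sequence


def tokens(record: str) -> frozenset:
    """Hypothetical featurization (assumption): lowercase whitespace tokens."""
    return frozenset(record.lower().split())


def jaccard(a: str, b: str) -> float:
    """Assumed similarity measure: Jaccard overlap of token sets, 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def build_clusters(records: Sequence[str],
                   centroids: Sequence[str]) -> Dict[str, List[str]]:
    """Group every record under the centroid record it is most similar to."""
    clusters: Dict[str, List[str]] = {c: [] for c in centroids}
    for record in records:
        best = max(centroids, key=lambda c: jaccard(record, c))
        clusters[best].append(record)
    return clusters


def find_duplicates(record: str,
                    clusters: Dict[str, List[str]],
                    threshold: float = 0.8) -> List[str]:
    """Stage 1: pick the most similar centroid.
    Stage 2: scan only that cluster's members."""
    nearest = max(clusters, key=lambda c: jaccard(record, c))
    return [r for r in clusters[nearest] if jaccard(record, r) >= threshold]


records = [
    "Acme Widget Blue Large",
    "acme widget blue large",
    "Acme Widget Red Small",
    "Globex Gadget Steel",
    "Globex Gadget Steel 2-pack",
]
clusters = build_clusters(records, centroids=[records[0], records[3]])
print(find_duplicates("ACME widget blue LARGE", clusters))
# prints ['Acme Widget Blue Large', 'acme widget blue large']
```

In this sketch, each query costs one comparison per centroid plus one per member of the nearest cluster, rather than one comparison per record in the whole set. The trade-off is that a true duplicate assigned to a different cluster would be missed.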

Status:

Grant
Type:

Utility

Filing date:

19 Jun 2019

Issue date:

24 May 2022