eBay Inc.
CLUSTER-BASED NEAR-DUPLICATE DOCUMENT DETECTION

Last updated:

Abstract:

Technologies are shown for near-duplicate detection in which a message is received and a fingerprint is generated for some or all of its content. A distance metric is determined between the received message fingerprint and the fingerprints in a cluster of other messages. If the message fingerprint matches a fingerprint in a cluster, the received message is added to the matching cluster. A risk value associated with the matching cluster can be determined. If the risk value is greater than a risk threshold, the received message fingerprint can be added to a risk list, or an alert, notification, or block indication can be generated. A fingerprint can also be determined for an inquiry message and, if the inquiry message fingerprint matches a fingerprint in the risk list, an alert can be generated. The distance metric between fingerprints correlates to the similarity between the message content corresponding to the fingerprints.
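The flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name a fingerprinting scheme, a distance metric, or a way of computing the risk value, so everything concrete here is an assumption made for illustration: a SimHash-style 64-bit fingerprint over word tokens, Hamming distance as the metric, a hypothetical `match_distance` cutoff for deciding what counts as a match, a caller-supplied `risk_increment` standing in for whatever signal drives the cluster risk, and the choice to start a new cluster when nothing matches.

```python
import hashlib
from dataclasses import dataclass, field

BITS = 64  # fingerprint width (assumed, not stated in the abstract)


def fingerprint(text: str) -> int:
    """SimHash-style fingerprint over word tokens (an assumed scheme)."""
    weights = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(BITS):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(BITS) if weights[bit] > 0)


def distance(a: int, b: int) -> int:
    """Hamming distance: fewer differing bits suggests more similar content."""
    return bin(a ^ b).count("1")


@dataclass
class Cluster:
    fingerprints: list = field(default_factory=list)
    risk: float = 0.0


class Detector:
    """Hypothetical wrapper for the receive / cluster / risk-list flow."""

    def __init__(self, match_distance: int = 3, risk_threshold: float = 0.5):
        self.match_distance = match_distance
        self.risk_threshold = risk_threshold
        self.clusters = []
        self.risk_list = set()

    def _matches(self, fp: int, candidates) -> bool:
        return any(distance(fp, other) <= self.match_distance
                   for other in candidates)

    def receive(self, text: str, risk_increment: float = 0.0) -> bool:
        """Cluster a received message; return True if an alert is raised."""
        fp = fingerprint(text)
        cluster = next(
            (c for c in self.clusters if self._matches(fp, c.fingerprints)),
            None,
        )
        if cluster is None:
            # Assumption: an unmatched message starts a new cluster.
            cluster = Cluster()
            self.clusters.append(cluster)
        cluster.fingerprints.append(fp)
        cluster.risk += risk_increment  # assumed risk model
        if cluster.risk > self.risk_threshold:
            self.risk_list.add(fp)
            return True
        return False

    def inquire(self, text: str) -> bool:
        """Return True if an inquiry message matches the risk list."""
        return self._matches(fingerprint(text), self.risk_list)
```

In this sketch a caller would invoke `receive` for each incoming message and `inquire` for each inquiry message. A linear scan over clusters keeps the example short; a production system would more plausibly index fingerprints so that candidate clusters can be found without comparing against every stored fingerprint.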

Status:

Application

Type:

Utility

Filing date:

15 May 2020

Issue date:

18 Nov 2021