Adobe Inc.
Generating overlap estimations between high-volume digital data sets based on multiple sketch vector similarity estimators

Abstract:

The present disclosure relates to systems, methods, and non-transitory computer-readable media that estimate the overlap between sets of data samples. In particular, in one or more embodiments, the disclosed systems utilize a sketch-based sampling routine and a flexible, accurate estimator to determine the overlap (e.g., the intersection) between sets of data samples. For example, in some implementations, the disclosed systems generate a sketch vector, such as a one permutation hashing vector, for each set of data samples. The disclosed systems further compare the sketch vectors to determine an equal bin similarity estimator, a lesser bin similarity estimator, and a greater bin similarity estimator. The disclosed systems utilize one or more of the determined similarity estimators in generating an overlap estimation for the sets of data samples.
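The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented method: the function names (`oph_sketch`, `compare_sketches`, `estimate_overlap`), the SHA-256 hash, the bin count and bin width, and the treatment of empty bins are all hypothetical choices. The abstract does not say how the three bin counts are combined into an overlap estimation, so the final step uses only the equal-bin fraction as a textbook Jaccard estimate.

```python
import hashlib

EMPTY = None  # marker for a bin that received no sample


def oph_sketch(samples, num_bins=64, bin_width=2**20):
    """One permutation hashing: hash each sample once, split the hash
    range into num_bins bins, and keep the smallest offset seen per bin."""
    sketch = [EMPTY] * num_bins
    space = num_bins * bin_width
    for s in samples:
        digest = hashlib.sha256(str(s).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big") % space
        b, offset = divmod(h, bin_width)
        if sketch[b] is EMPTY or offset < sketch[b]:
            sketch[b] = offset
    return sketch


def compare_sketches(a, b):
    """Count bins where a's value equals, is less than, or is greater
    than b's value. Jointly empty bins are skipped; a one-sided empty
    bin is treated as +infinity (an assumption of this sketch)."""
    equal = lesser = greater = 0
    for x, y in zip(a, b):
        if x is EMPTY and y is EMPTY:
            continue
        if x == y:
            equal += 1
        elif y is EMPTY or (x is not EMPTY and x < y):
            lesser += 1
        else:
            greater += 1
    return equal, lesser, greater


def estimate_overlap(a, b, size_a, size_b):
    """Estimate |A ∩ B| from the equal-bin fraction, used as a Jaccard
    estimate J, via |A ∩ B| = J / (1 + J) * (|A| + |B|)."""
    equal, lesser, greater = compare_sketches(a, b)
    used = equal + lesser + greater
    if used == 0:
        return 0.0
    j = equal / used
    return j / (1 + j) * (size_a + size_b)
```

For example, `estimate_overlap(oph_sketch(set_a), oph_sketch(set_b), len(set_a), len(set_b))` returns an approximate intersection size for two hypothetical collections `set_a` and `set_b`. The lesser and greater counts are returned so that a richer estimator could use them; this sketch leaves them unused in the final step.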

Status:

Grant

Type:

Utility

Filing date:

5 Nov 2020

Issue date:

20 Sep 2022