Oracle Corporation
ESTIMATING NUMBER OF DISTINCT VALUES IN A DATA SET USING MACHINE LEARNING

Abstract:

Techniques for estimating the number of distinct values in a data set using machine learning are provided. In one technique, a sample of a data set is retrieved where the sample is a strict subset of the data set. The sample is analyzed to identify feature values of multiple features of the sample. The feature values are inserted into a machine-learned model that computes a prediction regarding a number of distinct values in the data set. An estimated number of distinct values that is based on the prediction is stored in association with the data set.
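The flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the method claimed in the application: the abstract does not name the features or the model, so the three features here (sampling rate, distinct values in the sample, fraction of values seen exactly once), the one-variable least-squares model, the synthetic uniform training data, and all function names are hypothetical choices made for the example.

```python
import math
import random
from collections import Counter


def extract_features(sample, table_size):
    """Summarize a sample. The feature set is an illustrative assumption."""
    n = len(sample)
    counts = Counter(sample)                  # value -> occurrences in sample
    freq_of_freq = Counter(counts.values())   # occurrences -> number of values
    return {
        "sampling_rate": n / table_size,
        "sample_ndv": len(counts),
        "singleton_frac": freq_of_freq[1] / n,  # values seen exactly once
    }


def fit_line(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def train(rng, table_size=10_000, sample_size=500, n_sets=40):
    """Learn log(true NDV / sample NDV) from synthetic data sets (assumed)."""
    xs, ys = [], []
    for _ in range(n_sets):
        domain = rng.randint(10, 5000)
        data = [rng.randrange(domain) for _ in range(table_size)]
        feats = extract_features(rng.sample(data, sample_size), table_size)
        xs.append(feats["singleton_frac"])
        ys.append(math.log(len(set(data)) / feats["sample_ndv"]))
    return fit_line(xs, ys)


def estimate_ndv(sample, table_size, model):
    """Turn the model's prediction into an estimated number of distinct values."""
    feats = extract_features(sample, table_size)
    intercept, slope = model
    ratio = math.exp(intercept + slope * feats["singleton_frac"])
    estimate = feats["sample_ndv"] * ratio
    # Clamp to what is logically possible: no fewer distinct values than
    # were seen in the sample, no more than there are rows.
    return int(round(min(max(estimate, feats["sample_ndv"]), table_size)))


if __name__ == "__main__":
    rng = random.Random(0)
    model = train(rng)
    column = [rng.randrange(1000) for _ in range(10_000)]
    sample = rng.sample(column, 500)          # strict subset of the data set
    # Store the estimate in association with the data set (hypothetical key).
    catalog = {"orders.customer_id": estimate_ndv(sample, len(column), model)}
    print(catalog, "actual:", len(set(column)))
```

Predicting the ratio of true to sampled distinct values, rather than the count itself, keeps the target on a similar scale across data sets of different cardinality; the clamp then guarantees the stored estimate is never smaller than what the sample already proves.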

Status:

Application

Type:

Utility

Filing date:

19 May 2020

Issue date:

25 Nov 2021