Oracle Corporation
ESTIMATING NUMBER OF DISTINCT VALUES IN A DATA SET USING MACHINE LEARNING
Abstract:
Techniques for estimating the number of distinct values in a data set using machine learning are provided. In one technique, a sample of a data set is retrieved, where the sample is a strict subset of the data set. The sample is analyzed to identify feature values of multiple features of the sample. The feature values are inserted into a machine-learned model that computes a prediction regarding a number of distinct values in the data set. An estimated number of distinct values that is based on the prediction is stored in association with the data set.
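The flow described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the application: the two features (distinct ratio and singleton ratio of the sample), the k-nearest-neighbour regressor standing in for the machine-learned model, the synthetic training data, and all names such as `draw_sample`, `extract_features`, `estimate_ndv`, and `example_table.col`.

```python
import random
from collections import Counter


def draw_sample(data, fraction, rng):
    """Return a random sample that is a strict subset of `data` (needs len(data) >= 2)."""
    k = max(1, min(len(data) - 1, int(len(data) * fraction)))
    return rng.sample(data, k)


def extract_features(sample):
    """Assumed features: distinct ratio and singleton ratio of the sample."""
    counts = Counter(sample)
    n = len(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return (len(counts) / n, singletons / n)


def make_training_set(rng, n_sets=200, rows=2000, fraction=0.1):
    """Build (features, target) pairs from synthetic data sets (an assumption).

    The target is the true number of distinct values divided by the row count.
    """
    examples = []
    for _ in range(n_sets):
        domain = rng.randint(2, rows)
        data = [rng.randrange(domain) for _ in range(rows)]
        sample = draw_sample(data, fraction, rng)
        examples.append((extract_features(sample), len(set(data)) / rows))
    return examples


def predict(examples, features, k=5):
    """k-NN regression: mean target of the k closest training examples."""
    def squared_distance(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], features))

    nearest = sorted(examples, key=squared_distance)[:k]
    return sum(target for _, target in nearest) / len(nearest)


def estimate_ndv(data, examples, fraction, rng):
    """Sample the data, compute features, predict, and derive an estimate."""
    sample = draw_sample(data, fraction, rng)
    prediction = predict(examples, extract_features(sample))
    estimate = round(prediction * len(data))
    # Clamp: at least the distinct values seen in the sample, at most the row count.
    return max(len(set(sample)), min(len(data), estimate))


rng = random.Random(0)
examples = make_training_set(rng)
data = [rng.randrange(500) for _ in range(2000)]

# Store the estimate in association with the data set (here, a plain dict).
ndv_statistics = {"example_table.col": estimate_ndv(data, examples, 0.1, rng)}
```

The prediction here is a ratio that is scaled by the row count and clamped, which is one possible way of deriving an estimate "based on the prediction"; a real system could use different features, a different model, and a different storage location for the statistic.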
Status:
Application
Type:
Utility
Filing date:
19 May 2020
Issue date:
25 Nov 2021