International Business Machines Corporation
Predictive data distribution for parallel databases to optimize storage and query performance

Abstract:

A computer-implemented method for balancing storage utilization and query processing in a distributed database in which a plurality of data tables is stored across a plurality of nodes. In one embodiment, the method receives a set of queries to perform on the database; determines, for each column of each data table in the database, a uniqueness score and a usage score based on the set of queries; normalizes the usage score and the uniqueness score to generate a normalized usage score and a normalized uniqueness score; multiplies the normalized uniqueness score by a first weight factor to produce a weighted uniqueness score; multiplies the normalized usage score by a second weight factor to produce a weighted usage score; combines the weighted uniqueness score and the weighted usage score to generate a combined column score; selects the column having the highest combined column score; and recreates the plurality of data tables of the database on the plurality of nodes using that column as the new distribution key.
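The scoring and selection steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not define how the raw scores are computed, so this sketch assumes (hypothetically) that the uniqueness score is the ratio of distinct values to rows, that the usage score is the number of queries in the set that reference the column, that normalization is min-max scaling across all candidate columns, and that "combines" means summing the two weighted scores. The names `ColumnStats`, `min_max_normalize`, `select_distribution_key`, `uniqueness_weight` (first weight factor) and `usage_weight` (second weight factor) are all hypothetical. Recreating the tables on the nodes is not shown.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical per-column inputs; field names are illustrative only."""
    table: str
    column: str
    distinct_values: int   # number of distinct values stored in the column
    row_count: int         # number of rows in the column's table
    query_references: int  # how many queries in the set reference the column


def min_max_normalize(values: list[float]) -> list[float]:
    """Assumed normalization: scale values to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_distribution_key(
    stats: list[ColumnStats],
    uniqueness_weight: float = 0.5,  # hypothetical "first weight factor"
    usage_weight: float = 0.5,       # hypothetical "second weight factor"
) -> tuple[ColumnStats, list[float]]:
    # Assumed uniqueness score: fraction of rows holding distinct values.
    uniqueness = [
        s.distinct_values / s.row_count if s.row_count else 0.0 for s in stats
    ]
    # Assumed usage score: raw count of queries that reference the column.
    usage = [float(s.query_references) for s in stats]

    norm_uniqueness = min_max_normalize(uniqueness)
    norm_usage = min_max_normalize(usage)

    # Weight each normalized score, then sum them into a combined column score.
    combined = [
        uniqueness_weight * u + usage_weight * q
        for u, q in zip(norm_uniqueness, norm_usage)
    ]

    # Pick the column with the highest combined score (first one wins on ties).
    best_index = max(range(len(stats)), key=lambda i: combined[i])
    return stats[best_index], combined


if __name__ == "__main__":
    candidates = [
        ColumnStats("orders", "order_id", 1000, 1000, 2),
        ColumnStats("orders", "customer_id", 200, 1000, 9),
        ColumnStats("orders", "status", 4, 1000, 3),
    ]
    best, scores = select_distribution_key(candidates)
    print(f"{best.table}.{best.column}")  # prints orders.customer_id
```

With equal weights, `customer_id` wins because its heavy query usage outweighs its lower uniqueness; raising `uniqueness_weight` to 0.8 and lowering `usage_weight` to 0.2 shifts the choice to the fully unique `order_id`. This illustrates how the two weight factors trade even data spread (storage balance) against query locality.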

Status:
Grant
Type:

Utility

Filing date:

1 Jun 2018

Issue date:

2 Nov 2021