International Business Machines Corporation
Predictive data distribution for parallel databases to optimize storage and query performance

Abstract:

A computer-implemented method for balancing storage utilization and query processing in a distributed database. In one embodiment, the method receives a set of queries to perform on a database that is distributed among a plurality of nodes. The database includes a plurality of data tables, each of which includes a plurality of columns and a plurality of rows. Based on the set of queries, the method determines a uniqueness score and a join score for each column of each data table in the database. From these two scores, the method determines a new distribution key. The method then recreates the plurality of data tables on the plurality of nodes using the new distribution key, so that the set of queries executes against the redistributed tables.
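
The scoring step described in the abstract can be sketched as follows. The abstract does not say how either score is computed or how the two are combined, so everything here is an assumption made for illustration: the uniqueness score is taken as distinct values divided by row count, the join score as the fraction of queries that join on the column, and the two are combined with a hypothetical weighted sum (`join_weight`). The names `ColumnStats`, `uniqueness_score`, `join_score`, and `choose_distribution_keys` are likewise hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Per-column statistics (hypothetical structure, not from the patent)."""
    table: str
    column: str
    distinct_values: int
    row_count: int


def uniqueness_score(stats: ColumnStats) -> float:
    # Assumed definition: share of rows that carry a distinct value.
    # A high value suggests rows would spread evenly across nodes.
    if stats.row_count == 0:
        return 0.0
    return stats.distinct_values / stats.row_count


def join_score(table: str, column: str, join_columns_per_query) -> float:
    # Assumed definition: share of queries whose join conditions use
    # this column. Each query is a set of (table, column) pairs.
    if not join_columns_per_query:
        return 0.0
    hits = sum(1 for cols in join_columns_per_query if (table, column) in cols)
    return hits / len(join_columns_per_query)


def choose_distribution_keys(stats_list, join_columns_per_query, join_weight=0.5):
    # Assumed combination rule: weighted sum of the two scores.
    # Returns the highest-scoring column per table.
    best = {}
    for stats in stats_list:
        u = uniqueness_score(stats)
        j = join_score(stats.table, stats.column, join_columns_per_query)
        combined = (1.0 - join_weight) * u + join_weight * j
        current = best.get(stats.table)
        if current is None or combined > current[1]:
            best[stats.table] = (stats.column, combined)
    return {table: column for table, (column, _) in best.items()}


# Illustrative data only: every query joins orders to customers.
stats_list = [
    ColumnStats("orders", "order_id", 1000, 1000),
    ColumnStats("orders", "customer_id", 200, 1000),
    ColumnStats("orders", "status", 3, 1000),
    ColumnStats("customers", "id", 200, 200),
]
queries = [{("orders", "customer_id"), ("customers", "id")}] * 4
keys = choose_distribution_keys(stats_list, queries)
```

With this made-up workload, `orders.order_id` is perfectly unique but never joined on, while `orders.customer_id` is less unique but used in every join, so the equal weighting favors `customer_id`. This is the kind of trade-off between even storage and co-located joins that the abstract describes; a real system would tune or replace the combination rule.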

Status:

Grant

Type:

Utility

Filing date:

1 Jun 2018

Issue date:

26 Oct 2021