International Business Machines Corporation
Predictive data distribution for parallel databases to optimize storage and query performance
Abstract:
A computer-implemented method balances storage utilization and query processing in a distributed database. In one embodiment, the method receives a set of queries to perform on a database that is distributed among a plurality of nodes. The database includes a plurality of data tables, each of which includes a plurality of columns and a plurality of rows. Based on the set of queries, the method determines a uniqueness score and a join score for each column of each data table in the database, and from those two scores it determines a new distribution key. The method then recreates the plurality of data tables of the database on the plurality of nodes using the new distribution key for execution of the set of queries.
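The scoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how either score is computed or how the two are combined, so everything concrete here is an assumption made for illustration: the uniqueness score is taken as the ratio of distinct values to rows, the join score as the fraction of queries that join on the column, and the two are blended with a hypothetical `join_weight`. The names `Column`, `uniqueness_score`, `join_score`, and `choose_distribution_keys` are likewise invented for this example and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    table: str
    name: str
    values: tuple  # sampled values of this column (illustrative data only)


def uniqueness_score(column):
    """Assumed definition: distinct values divided by row count, in [0, 1]."""
    if not column.values:
        return 0.0
    return len(set(column.values)) / len(column.values)


def join_score(column, join_columns_per_query):
    """Assumed definition: fraction of queries that join on this column."""
    if not join_columns_per_query:
        return 0.0
    key = (column.table, column.name)
    hits = sum(1 for joins in join_columns_per_query if key in joins)
    return hits / len(join_columns_per_query)


def choose_distribution_keys(columns, join_columns_per_query, join_weight=0.5):
    """Pick, per table, the column with the highest weighted score.

    join_weight is a hypothetical tuning knob: higher values favor columns
    that are joined often, lower values favor columns that spread rows evenly.
    """
    best = {}
    for col in columns:
        score = (join_weight * join_score(col, join_columns_per_query)
                 + (1.0 - join_weight) * uniqueness_score(col))
        if col.table not in best or score > best[col.table][1]:
            best[col.table] = (col.name, score)
    return {table: name for table, (name, _) in best.items()}


# Each query is represented only by the set of (table, column) pairs it joins on.
columns = [
    Column("orders", "order_id", (1, 2, 3, 4)),
    Column("orders", "customer_id", (10, 10, 11, 12)),
    Column("orders", "status", ("open", "open", "open", "closed")),
]
queries = [{("orders", "customer_id")}, {("orders", "customer_id")}, set()]
keys = choose_distribution_keys(columns, queries)
print(keys)
```

In this toy workload `customer_id` is chosen over the fully unique `order_id` because it is joined in two of the three queries. That reflects the trade-off the abstract names: a highly unique key spreads rows evenly across nodes, while a frequently joined key keeps matching rows on the same node.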
Utility
1 Jun 2018
26 Oct 2021