International Business Machines Corporation
Single-pass distributed sampling from block-partitioned matrices

Abstract:

A computer-implemented method is provided that includes identifying an input dataset formatted as an input matrix, the input matrix including a plurality of rows and a plurality of columns. The computer-implemented method also includes dividing the input matrix into a plurality of input matrix blocks. Further, the computer-implemented method includes distributing the input matrix blocks to a plurality of different machines across a distributed filesystem, and sampling, by at least two of the different machines in parallel, at least two of the input matrix blocks. Finally, the computer-implemented method includes generating at least one sample matrix based on the sampling of the at least two of the input matrix blocks.
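The steps in the abstract (divide into blocks, distribute, sample blocks in parallel, assemble a sample matrix) can be illustrated with a minimal single-process sketch. It is not the patented method. The abstract does not specify the sampling scheme, so independent per-row Bernoulli sampling is assumed here, worker threads stand in for the different machines, and no distributed filesystem is involved. All function and parameter names (`split_into_blocks`, `sample_block`, `sample_matrix`, `block_rows`, `fraction`, `seed`) are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(matrix, block_rows):
    """Divide the input matrix (a list of rows) into consecutive row blocks."""
    return [matrix[i:i + block_rows] for i in range(0, len(matrix), block_rows)]


def sample_block(task):
    """Sample rows from one block, independently of every other block.

    Assumption: each row is kept with probability `fraction` (Bernoulli
    sampling). The abstract does not name a sampling scheme.
    """
    block_id, block, fraction, seed = task
    # Derive a per-block generator so results are reproducible and do not
    # depend on the order in which workers happen to run.
    rng = random.Random(seed * 1_000_003 + block_id)
    return [row for row in block if rng.random() < fraction]


def sample_matrix(matrix, block_rows=2, fraction=0.5, seed=0, workers=4):
    """Build a sample matrix from per-block samples drawn in parallel."""
    blocks = split_into_blocks(matrix, block_rows)
    tasks = [(i, block, fraction, seed) for i, block in enumerate(blocks)]
    # Threads are a stand-in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sampled_blocks = list(pool.map(sample_block, tasks))
    # pool.map preserves task order, so rows keep their original ordering.
    return [row for part in sampled_blocks for row in part]


if __name__ == "__main__":
    data = [[r * 3 + c for c in range(3)] for r in range(10)]
    for row in sample_matrix(data, block_rows=3, fraction=0.5, seed=7):
        print(row)
```

Because each block is read once and sampled without reference to any other block, the work needs only a single pass over the data and no coordination between workers beyond the final concatenation.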

Status:

Grant

Type:

Utility

Filing date:

8 Feb 2019

Issue date:

7 Dec 2021