Snowflake Inc.
SYSTEMS, METHODS, AND DEVICES FOR MANAGING DATA SKEW IN A JOIN OPERATION

Last updated:

Abstract:

Systems, methods, and devices for managing data skew during a join operation are disclosed. A method includes computing a hash value for a join operation and detecting data skew on a probe side of the join operation at a runtime of the join operation using a lightweight sketch data structure. The method includes identifying a frequent probe-side join key on the probe side of the join operation during a probe phase of the join operation. The method includes identifying a frequent build-side row having a build-side join key corresponding with the frequent probe-side join key. The method includes asynchronously distributing the frequent build-side row to one or more remote servers.
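
The steps named in the abstract can be illustrated with a minimal single-process sketch. The abstract does not say which sketch data structure is used, so the Misra-Gries frequent-items summary below is a stand-in chosen for illustration, not the disclosed method. The names `MisraGries` and `find_skewed_build_rows`, and the parameters `k` and `threshold`, are hypothetical. The asynchronous distribution to remote servers is not modeled; the function only returns the rows that would be candidates for distribution.

```python
from collections import defaultdict


class MisraGries:
    """Frequent-items summary that keeps at most k - 1 counters.

    Assumption: stands in for the unspecified "lightweight sketch".
    Counters never exceed the true frequency of a key.
    """

    def __init__(self, k):
        self.k = k
        self.counters = {}

    def add(self, key):
        if key in self.counters:
            self.counters[key] += 1
        elif len(self.counters) < self.k - 1:
            self.counters[key] = 1
        else:
            # No free counter: decrement every tracked key, drop zeros.
            for tracked in list(self.counters):
                self.counters[tracked] -= 1
                if self.counters[tracked] == 0:
                    del self.counters[tracked]

    def candidates(self, threshold):
        """Keys whose (under-)estimated count reaches the threshold."""
        return {key for key, n in self.counters.items() if n >= threshold}


def find_skewed_build_rows(build_rows, probe_keys, k=4, threshold=3):
    """Return (frequent probe-side keys, matching build-side rows)."""
    # Build phase: hash table from join key to build-side rows.
    build_table = defaultdict(list)
    for key, payload in build_rows:
        build_table[key].append((key, payload))

    # Probe phase: feed each probe-side join key into the sketch.
    sketch = MisraGries(k)
    for key in probe_keys:
        sketch.add(key)

    # Frequent probe-side keys select the build-side rows to replicate.
    frequent = sketch.candidates(threshold)
    rows = [row for key in frequent for row in build_table.get(key, [])]
    return frequent, rows


build_rows = [(1, "a"), (2, "b"), (3, "c")]
probe_keys = [1, 1, 1, 1, 2, 1, 3, 1, 1]
frequent_keys, skewed_rows = find_skewed_build_rows(build_rows, probe_keys)
```

Because Misra-Gries counters only underestimate, any key reported here really does occur at least `threshold` times on the probe side, although some frequent keys can be missed when `k` is small.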

Status:

Application

Type:

Utility

Filing date:

12 Mar 2021

Issue date:

1 Jul 2021