Snowflake Inc.
DETECTING DATA SKEW IN A JOIN OPERATION

Last updated:

Abstract:

Systems, methods, and devices for managing data skew during a join operation are disclosed. A method includes computing a hash value for a join operation and detecting data skew on a probe side of the join operation at a runtime of the join operation using a lightweight sketch data structure. The method includes identifying a frequent probe-side join key on the probe side of the join operation during a probe phase of the join operation. The method includes identifying a frequent build-side row having a build-side join key corresponding with the frequent probe-side join key. The method includes asynchronously distributing the frequent build-side row to one or more remote servers.
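The steps in the abstract can be illustrated with a minimal sketch. The abstract does not say which sketch data structure is used, so the Misra-Gries frequent-items summary below is an assumption chosen for illustration, not the patented method. The names `MisraGries`, `find_skewed_build_rows`, `k`, and `min_share` are hypothetical, and the asynchronous distribution to remote servers is left out: the function only returns the build-side rows that would be candidates for distribution.

```python
from collections import defaultdict


class MisraGries:
    """Frequent-items summary holding at most k - 1 counters (assumed sketch)."""

    def __init__(self, k):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.counters = {}
        self.n = 0  # total number of keys observed

    def add(self, key):
        self.n += 1
        if key in self.counters:
            self.counters[key] += 1
        elif len(self.counters) < self.k - 1:
            self.counters[key] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for tracked in list(self.counters):
                self.counters[tracked] -= 1
                if self.counters[tracked] == 0:
                    del self.counters[tracked]

    def frequent(self, min_share):
        """Keys whose estimated count is at least min_share of all keys seen.

        Counters never overestimate, so every returned key truly meets the
        share; keys close to the threshold may be missed.
        """
        threshold = min_share * self.n
        return {key for key, count in self.counters.items() if count >= threshold}


def find_skewed_build_rows(build_rows, probe_keys, k=8, min_share=0.25):
    """Return (frequent probe-side keys, matching build-side rows).

    build_rows is an iterable of (join_key, payload) pairs; probe_keys is the
    stream of join keys seen during the probe phase.
    """
    # Build phase: index build-side rows by join key.
    build_index = defaultdict(list)
    for key, payload in build_rows:
        build_index[key].append((key, payload))

    # Probe phase: feed each probe-side join key into the sketch.
    sketch = MisraGries(k)
    for key in probe_keys:
        sketch.add(key)

    hot_keys = sketch.frequent(min_share)
    # Build-side rows whose join key matches a frequent probe-side key.
    hot_rows = [row for key in sorted(hot_keys) for row in build_index.get(key, [])]
    return hot_keys, hot_rows
```

In this sketch the memory used for detection is bounded by `k - 1` counters regardless of how many distinct join keys the probe side contains, which is the property that makes a summary of this kind lightweight enough to run during the probe phase.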

Status:

Application

Type:

Utility

Filing date:

15 Oct 2021

Issue date:

3 Feb 2022