NetApp, Inc.
Methods for optimized variable-size deduplication using two stage content-defined chunking and devices thereof

Last updated:

Abstract:

Methods, non-transitory machine readable media, and computing devices that compare a hash value to a predefined value for sliding windows in parallel for segments partitioned from an input data stream. A bit array is parsed according to minimum and maximum chunk sizes to identify chunk boundaries for the input data stream. The bit array is populated based on a result of the comparison and portions of the bit array are parsed in parallel. Unique chunks of the input data stream defined by the chunk boundaries are stored in a storage device. Accordingly, this technology utilizes parallel processing in two stages. In a first stage, rolling window based hashing is performed concurrently to identify potential chunk boundaries. In a second stage, actual chunk boundaries are selected based on minimum and maximum chunk size constraints. This technology advantageously facilitates significant deduplication ratio improvement as well as improved parallel chunking performance.

Status:
Grant
Type:

Utility

Filling date:

14 Jan 2019

Issue date:

15 Dec 2020