International Business Machines Corporation
Determining chunk boundaries for deduplication of storage objects

Last updated:

Abstract:

Described are a method, system, and computer program product for deduplicating a storage object. A hash of a window of data of a storage object is determined, and a determination is made as to whether the window of data corresponds to a chunk boundary. A determination is also made as to whether the hash of the object matches one of the pseudo fingerprints in a list of at least one pseudo fingerprint. A storage object chunk boundary based on the window of data is stored in response to the window of data corresponding to the chunk boundary or in response to determining that the hash of the object matches one of the pseudo fingerprints. When the window of data is not at the end of the data of the storage object, a new window of data following the current window is determined.
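
The flow summarized in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. The names `WINDOW`, `MASK`, `window_hash`, and `find_boundaries` are hypothetical, the truncated SHA-256 is a stand-in for whatever hash is actually used, and the "low bits are zero" test is a common content-defined chunking convention assumed here for the boundary check. The sketch also assumes the hash compared against the pseudo fingerprint list is the hash of the current window.

```python
import hashlib

WINDOW = 8    # assumed window size in bytes (illustrative only)
MASK = 0x0F   # assumed boundary rule: low 4 bits of the window hash are zero


def window_hash(window: bytes) -> int:
    """Stand-in hash of one window: first 4 bytes of SHA-256 as an integer."""
    return int.from_bytes(hashlib.sha256(window).digest()[:4], "big")


def find_boundaries(data: bytes, pseudo_fingerprints: set,
                    window: int = WINDOW, mask: int = MASK) -> list:
    """Return chunk-boundary offsets (end positions) for `data`.

    A boundary is recorded when the window satisfies the assumed boundary
    rule OR when its hash matches one of the pseudo fingerprints.
    """
    boundaries = []
    pos = 0
    # Slide the window one byte at a time until the end of the data.
    while pos + window <= len(data):
        h = window_hash(data[pos:pos + window])
        if (h & mask) == 0 or h in pseudo_fingerprints:
            boundaries.append(pos + window)
        pos += 1
    # Close the final chunk at the end of the data if not already closed.
    if data and (not boundaries or boundaries[-1] != len(data)):
        boundaries.append(len(data))
    return boundaries


if __name__ == "__main__":
    data = bytes(range(64))
    # Force a boundary by listing the hash of one specific window.
    forced = window_hash(data[10:18])
    print(find_boundaries(data, {forced}))
```

In this sketch a pseudo fingerprint can only add boundaries. Any offset selected by the ordinary boundary rule is still reported when the list is non-empty.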

Status:

Grant

Type:

Utility

Filing date:

3 Sep 2019

Issue date:

16 Aug 2022