International Business Machines Corporation
Parallel deduplication using automatic chunk sizing

Abstract:

An approach for parallel deduplication using automatic chunk sizing is provided. A dynamic chunk deduplicator receives a request to perform data deduplication, where the request includes an identification of a dataset. The dynamic chunk deduplicator analyzes file-level usage for one or more data files that make up the dataset to associate a deduplication chunk size with the one or more data files. The dynamic chunk deduplicator creates a collection of data segments from the dataset, based on the deduplication chunk size associated with the one or more data files. The dynamic chunk deduplicator then creates a deduplication data chunk size plan, which includes deduplication actions for the collection of data segments, and outputs the plan.
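
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `choose_chunk_size`, `split_segments`, `build_plan` and `Action` are hypothetical, the usage-to-chunk-size thresholds are invented for illustration, and SHA-256 matching of identical segments is an assumed way of deciding deduplication actions. The sketch also runs sequentially, whereas the title refers to parallel deduplication.

```python
import hashlib
from dataclasses import dataclass


def choose_chunk_size(access_count: int) -> int:
    """Map file-level usage to a chunk size in bytes.

    Assumption: the thresholds and sizes are invented for this sketch;
    the abstract does not say how usage determines chunk size.
    """
    if access_count >= 100:
        return 4
    if access_count >= 10:
        return 8
    return 16


@dataclass(frozen=True)
class Action:
    """One entry of the (hypothetical) chunk size plan."""
    file: str
    offset: int
    chunk_size: int
    digest: str
    action: str  # "store" for a new segment, "reference" for a duplicate


def split_segments(data: bytes, chunk_size: int):
    """Yield (offset, segment) pairs of at most chunk_size bytes."""
    for offset in range(0, len(data), chunk_size):
        yield offset, data[offset:offset + chunk_size]


def build_plan(dataset: dict, usage: dict) -> list:
    """Build a plan of deduplication actions for every segment.

    dataset maps file name -> file contents (bytes).
    usage maps file name -> access count (assumed usage metric).
    """
    seen = set()  # digests of segments already marked "store"
    plan = []
    for name in sorted(dataset):
        size = choose_chunk_size(usage.get(name, 0))
        for offset, segment in split_segments(dataset[name], size):
            digest = hashlib.sha256(segment).hexdigest()
            action = "reference" if digest in seen else "store"
            seen.add(digest)
            plan.append(Action(name, offset, size, digest, action))
    return plan


if __name__ == "__main__":
    dataset = {"a.bin": b"ABCDABCD", "b.bin": b"ABCDABCDXXXXXXXX"}
    usage = {"a.bin": 150, "b.bin": 0}
    for entry in build_plan(dataset, usage):
        print(entry.file, entry.offset, entry.chunk_size, entry.action)
```

The byte-sized chunks are only there to keep the example readable; a real system would presumably use far larger sizes. The sketch only outputs a plan and does not modify any data, which mirrors the abstract's final step of outputting the deduplication data chunk size plan.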

Status:

Grant

Type:

Utility

Filing date:

15 Oct 2019

Issue date:

26 Oct 2021