Commvault Systems, Inc.
OPTIMIZED DEDUPLICATION BASED ON BACKUP FREQUENCY IN A DISTRIBUTED DATA STORAGE SYSTEM

Last updated:

Abstract:

Disclosed deduplication techniques at a distributed data storage system guarantee that space reclamation will not affect deduplicated data integrity even without perfect synchronization between components. By understanding certain "behavioral" characteristics and schedule cadences of backup operations that generate backup copies received at the distributed data storage system, data blocks that are not re-written by subsequent backup copies are proactively aged, while promoting continued retention of data blocks that are re-written. An expiry scheme operates with block-level granularity. Each unique deduplicated data block is given an expiry timeframe based on the block's arrival time at the distributed data storage system (i.e., when a backup copy supplies the block) and further based on backup frequencies of the various virtual disks referencing a unique system-wide identifier of the block, which is based on the block's hash value. Communications between components are kept to an as-needed basis. Cloud-based and multi-cloud configurations are disclosed.
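The block-level expiry scheme described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the application: the names `BlockRecord`, `ExpiryIndex`, and `safety_factor` are hypothetical, SHA-256 stands in for the unspecified hash, and the expiry rule (arrival time plus a multiple of the longest backup interval among referencing virtual disks) is only one plausible reading of "based on backup frequencies".

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BlockRecord:
    # Hypothetical record: time the block last arrived in a backup copy,
    # and the virtual disks that reference the block's identifier.
    arrival_time: float
    vdisks: set = field(default_factory=set)


class ExpiryIndex:
    """Illustrative sketch only; not the method claimed in the application."""

    def __init__(self, backup_interval_by_vdisk, safety_factor=2.0):
        # Assumed input: seconds between scheduled backups for each virtual disk.
        self.intervals = backup_interval_by_vdisk
        # Assumed margin so a single late backup does not expire live blocks.
        self.safety_factor = safety_factor
        self.blocks = {}

    @staticmethod
    def block_id(data: bytes) -> str:
        # System-wide identifier derived from the block's hash value
        # (SHA-256 is an assumption).
        return hashlib.sha256(data).hexdigest()

    def write(self, data: bytes, vdisk: str, now: float) -> str:
        bid = self.block_id(data)
        rec = self.blocks.get(bid)
        if rec is None:
            rec = self.blocks[bid] = BlockRecord(arrival_time=now)
        else:
            # A re-write by a later backup copy refreshes the arrival time,
            # which promotes continued retention of the block.
            rec.arrival_time = now
        rec.vdisks.add(vdisk)
        return bid

    def expiry(self, bid: str) -> float:
        rec = self.blocks[bid]
        # Use the least frequent backup schedule among referencing disks.
        longest = max(self.intervals[v] for v in rec.vdisks)
        return rec.arrival_time + self.safety_factor * longest

    def reclaim(self, now: float) -> list:
        # Blocks that were not re-written before their expiry age out.
        expired = [b for b in self.blocks if self.expiry(b) <= now]
        for b in expired:
            del self.blocks[b]
        return expired
```

In this sketch, a block that a daily backup keeps supplying never reaches its expiry, while a block that stops appearing is reclaimed after two missed cycles, with no reference counting or cross-component synchronization required.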

Status:

Application

Type:

Utility

Filing date:

31 Mar 2022

Issue date:

14 Jul 2022