VMware, Inc.
Global deduplication on distributed storage using segment usage tables
Last updated:
Abstract:
Solutions are disclosed for blocks in a multi-writer log-structured file system. Solutions include selecting candidate segments in a storage medium; reading blocks of the candidate segments; determining whether any blocks are duplicates; updating a reference count for the duplicate blocks; identifying unique blocks; writing at least a portion of the unique blocks to a log; determining whether the log has accumulated a full segment of data; based at least on determining that the log has accumulated a full segment of data, writing the full segment to the storage medium; updating a segment usage table (SUT) to mark the candidate segments as free; and updating the SUT to mark a segment of the storage medium as no longer free. Some examples identify a window start time and stop time, because older segments have been deduped and younger segments may be volatile. Some examples adjust the window to improve performance.
Utility
24 Apr 2020
17 Aug 2021