Microsoft Corporation
Snapshot-based data corruption detection

Last updated:

Abstract:

Embodiments described herein detect data corruption in a distributed data set system. For example, a system comprises node(s) for processing queries with respect to a distributed data set comprising a plurality of storage segments. A write transaction resulting from a query with respect to a particular storage segment is logged in a log record that describes a modification to the storage segment. A log service provides the log record to a data server managing a portion of the distributed data set in which the storage segment is included, which performs the write transaction with respect to the storage segment. For redundancy purposes, the data server has replica(s) that manage respective replicas of the portion of the distributed data set managed thereby. For backup purposes, snapshots of the replica(s) are periodically generated. To determine a data corruption, a snapshot of one replica is cross-validated with a snapshot of another replica.

Status:
Grant
Type:

Utility

Filling date:

22 Apr 2021

Issue date:

15 Feb 2022