Nutanix, Inc.
TECHNIQUE FOR REPLICATING OPLOG INDEX AMONG NODES OF A CLUSTER
Abstract:
A technique replicates an index of an operations log (oplog) from a primary node to a secondary node of a cluster in the event of a failure of the primary node. The oplog functions as a staging area to coalesce random write operations directed to a virtual disk (vdisk) stored on a backend storage tier organized as an extent store. The oplog temporarily caches data associated with the random write operations (i.e., write data) as well as metadata describing the write data. The metadata includes descriptors to the write data corresponding to virtual address regions, i.e., offset ranges, of the vdisk; the descriptors are used to identify the offset ranges of write data for the vdisk that are cached in the oplog. To facilitate fast lookup operations of the offset ranges when determining whether write data is cached in the oplog, an oplog index provides a state of the latest data for offset ranges of the vdisk. The technique enables fast failover of metadata used to construct the oplog index in memory of a node, such as the secondary node, without downtime or significant metadata replay.
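As an illustration only, the following Python sketch shows one way an in-memory index could track the latest write for each offset range of a vdisk and be reconstructed from replicated descriptors. It is not the claimed implementation: the names `Descriptor`, `OplogIndex`, `apply`, `lookup`, and `rebuild_index` are hypothetical, and the sketch assumes that each descriptor carries an offset, a length, and an identifier of the oplog record holding the write data, and that descriptors arrive in write order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical metadata descriptor for one cached write."""
    offset: int     # starting offset of the write within the vdisk
    length: int     # number of bytes written
    record_id: int  # assumed identifier of the oplog record holding the data


class OplogIndex:
    """Illustrative index mapping vdisk offset ranges to the latest write."""

    def __init__(self):
        # Sorted, non-overlapping (start, end, record_id) tuples, end exclusive.
        self._ranges = []

    def apply(self, d):
        """Record a write; newer data replaces any overlapping older ranges."""
        start, end = d.offset, d.offset + d.length
        kept = []
        for s, e, rid in self._ranges:
            if e <= start or s >= end:
                kept.append((s, e, rid))      # no overlap, keep unchanged
                continue
            if s < start:
                kept.append((s, start, rid))  # keep the part before the new write
            if e > end:
                kept.append((end, e, rid))    # keep the part after the new write
        kept.append((start, end, d.record_id))
        kept.sort()
        self._ranges = kept

    def lookup(self, offset, length):
        """Return the cached sub-ranges that intersect [offset, offset + length)."""
        start, end = offset, offset + length
        return [
            (max(s, start), min(e, end), rid)
            for s, e, rid in self._ranges
            if s < end and e > start
        ]


def rebuild_index(descriptors):
    """Construct an index on another node from descriptors given in write order."""
    index = OplogIndex()
    for d in descriptors:
        index.apply(d)
    return index


replicated = [Descriptor(0, 100, 1), Descriptor(50, 100, 2), Descriptor(300, 10, 3)]
secondary_index = rebuild_index(replicated)
```

In this sketch, a lookup that returns an empty list indicates that no write data for the requested range is cached, so a read would be served from the extent store; a linear scan is used for brevity where a production index would likely use an ordered structure.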
Utility
31 Mar 2021
4 Aug 2022