Twitter, Inc. (delisted)
Distributed dataset modification, retention, and replication

Abstract:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data retention and modification. One of the methods includes dividing partitions into a set of generations according to a retention policy; accumulating modification and deletion events that define changes to be applied to data of the distributed dataset; and when a triggering event occurs for a triggered generation in the set of generations, rolling an oldest partition out of the triggered generation, the rolling comprising: if the oldest partition has reached the end of a retention period for the dataset, marking the oldest partition for deletion in the triggered generation; otherwise: creating a new partition corresponding to the data of the oldest partition, wherein the data is cleaned using a scrubbing process; adding the new partition to a next generation in the set of generations; and marking the oldest partition for deletion in the triggered generation.
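
The rolling step described in the abstract can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the patented implementation: the names `Partition`, `Generation`, `scrub`, and `roll_oldest` are hypothetical, partition age is assumed to be measured in whole days, and accumulated events are assumed to be a set of deleted record ids plus a mapping of record id to updated fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Partition:
    created_day: int                  # assumed: day number when the data was written
    records: List[dict]
    marked_for_deletion: bool = False


@dataclass
class Generation:
    partitions: List[Partition] = field(default_factory=list)


def scrub(records: List[dict], deleted_ids: Set[int],
          updates: Dict[int, dict]) -> List[dict]:
    """Apply accumulated deletion and modification events to a record list."""
    cleaned = []
    for record in records:
        if record["id"] in deleted_ids:
            continue  # deletion event: drop the record
        # modification event: overlay the updated fields, if any
        cleaned.append({**record, **updates.get(record["id"], {})})
    return cleaned


def roll_oldest(generations: List[Generation], index: int, today: int,
                retention_days: int, deleted_ids: Set[int],
                updates: Dict[int, dict]) -> Optional[Partition]:
    """Roll the oldest live partition out of generations[index].

    Returns the scrubbed partition added to the next generation, or None
    if the oldest partition had reached the end of the retention period
    (or the generation had no live partitions).
    """
    live = [p for p in generations[index].partitions
            if not p.marked_for_deletion]
    if not live:
        return None
    oldest = min(live, key=lambda p: p.created_day)

    if today - oldest.created_day >= retention_days:
        # End of retention: nothing is carried forward.
        oldest.marked_for_deletion = True
        return None

    if index + 1 >= len(generations):
        # Assumption: the last generation only rolls at end of retention.
        raise ValueError("no next generation for an unexpired partition")

    # Otherwise: copy the data, scrubbed, into the next generation.
    new_partition = Partition(oldest.created_day,
                              scrub(oldest.records, deleted_ids, updates))
    generations[index + 1].partitions.append(new_partition)
    oldest.marked_for_deletion = True
    return new_partition
```

In this sketch the old partition is only marked, not physically removed, which mirrors the abstract's wording ("marking the oldest partition for deletion"); when and how marked partitions are reclaimed is left open here.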

Status:

Grant

Type:

Utility

Filing date:

15 Jan 2019

Issue date:

13 Apr 2021