Box, Inc.
SYSTEMS AND METHODS FOR SHARDING BASED ON DISTRIBUTED INVERTED INDEXES
Abstract:
According to one embodiment, distributing data across a plurality of storage shards can comprise generating a file key for each file of a plurality of files stored in a plurality of physical shards, each physical shard maintained by a node of a plurality of nodes in one or more clusters. The file key can comprise a hash of an enterprise identifier for an entity of which the creator of the file is a member, a hash of a folder identifier for a location in which the file is stored, and a hash of a file identifier uniquely identifying the file. The generated file keys can be sorted into an ordered list, and the ordered list can be logically partitioned into a plurality of logical shards. Each logical shard of the plurality of logical shards can then be mapped to one of the plurality of physical shards.
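The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not name a hash function, a way of combining the three hashes, a partitioning rule, or a mapping policy. The sketch assumes truncated SHA-256 hashes concatenated in enterprise, folder, file order, contiguous partitions of near-equal size, and a round-robin mapping. All names (`FileRef`, `file_key`, `partition`, `map_to_physical`) are hypothetical.

```python
import hashlib
from typing import Dict, List, NamedTuple


class FileRef(NamedTuple):
    """Identifiers the abstract says the file key is built from."""
    enterprise_id: str
    folder_id: str
    file_id: str


def _h(value: str) -> str:
    # Assumption: first 8 hex characters of SHA-256. The abstract only says "a hash".
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]


def file_key(f: FileRef) -> str:
    # Assumption: the three hashes are concatenated, enterprise hash first.
    return _h(f.enterprise_id) + _h(f.folder_id) + _h(f.file_id)


def partition(sorted_keys: List[str], num_logical: int) -> List[List[str]]:
    # Assumption: split the ordered list into contiguous, near-equal ranges.
    base, extra = divmod(len(sorted_keys), num_logical)
    shards, start = [], 0
    for i in range(num_logical):
        size = base + (1 if i < extra else 0)
        shards.append(sorted_keys[start:start + size])
        start += size
    return shards


def map_to_physical(num_logical: int, physical: List[str]) -> Dict[int, str]:
    # Assumption: round-robin assignment of logical shards to physical shards.
    return {i: physical[i % len(physical)] for i in range(num_logical)}


files = [
    FileRef("enterprise-1", "folder-1", "file-a"),
    FileRef("enterprise-1", "folder-1", "file-b"),
    FileRef("enterprise-2", "folder-9", "file-c"),
]
keys = sorted(file_key(f) for f in files)          # ordered list of file keys
logical_shards = partition(keys, num_logical=2)    # logical partitioning
placement = map_to_physical(2, ["node-a", "node-b"])
```

Under these assumptions, keys for files in the same enterprise and folder share a common prefix, so they sit next to each other in the ordered list and tend to fall into the same logical shard.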
Utility
11 Oct 2019
16 Apr 2020