Amazon.com, Inc.
Dynamic distributed data clustering using multi-level hash trees
Abstract:
Techniques are described for clustering data at the point of ingestion for storage using scalable storage resources. To cluster data at the point of ingestion, a data ingestion and query service uses an index based on a multilevel hash tree (MLHT) to map the hierarchy of attribute values associated with each data element onto a point of an MLHT, which itself conceptually maps onto a continuous range of values. The total range of the MLHT is divided into one or more data partitions, each of which is mapped to one or more physical storage resources. A mapping algorithm uses the hierarchy of attribute values to calculate the position of each ingested data element and, from that position, the physical storage resource at which to store the data element.
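The abstract does not specify the hash function, the fan-out of each tree level, or how partition boundaries are chosen, so the following is only a minimal sketch under stated assumptions. It takes the total range to be the integers in [0, 2**64), assumes every level splits its parent's interval into 256 equal sub-intervals selected by a SHA-256 hash of that level's attribute value, and assumes partitions are contiguous sub-ranges found by binary search. The names `mlht_position`, `PartitionMap`, `storage_for`, and the storage identifiers are hypothetical. The sketch illustrates the clustering property: elements that share a prefix of attribute values land in the same sub-range, and therefore on the same storage resource unless a partition boundary falls inside that sub-range.

```python
import bisect
import hashlib

# Hypothetical parameters (not from the source): total range [0, 2**64),
# and each tree level splits its parent's interval into FANOUT equal parts.
TOTAL_RANGE = 2**64
FANOUT = 256  # with these values, up to 8 levels resolve distinct sub-ranges


def _bucket(value: str) -> int:
    """Hash one attribute value to a child index in [0, FANOUT)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % FANOUT


def mlht_position(attributes: list[str]) -> int:
    """Map an ordered attribute hierarchy to a point in [0, TOTAL_RANGE).

    Each level narrows the interval chosen by the level above it, so data
    elements sharing a prefix of attribute values fall in one sub-range.
    """
    low, width = 0, TOTAL_RANGE
    for value in attributes:
        width //= FANOUT
        low += _bucket(value) * width
    return low


class PartitionMap:
    """Hypothetical partition table: contiguous ranges -> storage resources."""

    def __init__(self, boundaries: list[int], resources: list[list[str]]):
        # boundaries[i] is the inclusive start of partition i, in ascending order.
        if not boundaries or boundaries[0] != 0:
            raise ValueError("first partition must start at 0")
        if len(boundaries) != len(resources):
            raise ValueError("need one resource list per partition")
        self.boundaries = boundaries
        self.resources = resources

    def lookup(self, position: int) -> list[str]:
        # Index of the last boundary <= position is the owning partition.
        idx = bisect.bisect_right(self.boundaries, position) - 1
        return self.resources[idx]


def storage_for(attributes: list[str], pmap: PartitionMap) -> list[str]:
    """Route a data element to storage from its attribute hierarchy."""
    return pmap.lookup(mlht_position(attributes))


# Two partitions split at a level-1 boundary, so every sub-range for a
# given first-level attribute stays within a single partition.
pmap = PartitionMap(
    boundaries=[0, TOTAL_RANGE // 2],
    resources=[["store-a"], ["store-b", "store-b-replica"]],
)
```

For example, `storage_for(["tenant-1", "device-7", "cpu"], pmap)` and `storage_for(["tenant-1", "device-9", "mem"], pmap)` return the same resource list, because both elements share the first-level value `"tenant-1"`. Aligning partition boundaries to tree-level boundaries is one way to keep related data together. Splitting a hot partition would then mean moving a boundary to a deeper level rather than rehashing every element.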
Utility
28 Jun 2018
14 Sep 2021