Rapid7, Inc.
PROGRAMMABLE FRAMEWORK FOR DISTRIBUTED COMPUTATION OF STATISTICAL FUNCTIONS OVER TIME-BASED DATA
Abstract:
Systems and methods are disclosed to implement a distributed query execution system that performs statistical operations on specified time windows over time-based datasets. In embodiments, the query system splits a statistical function into a set of parallel accumulator tasks that correspond to different portions of the dataset and/or function time windows. The accumulator tasks are executed in parallel by individual accumulator nodes to generate individual statistical result structures. The structures are then combined by an aggregator node to produce an aggregate result structure that indicates the results of the statistical function over the time windows. In embodiments, the accumulator and aggregator tasks are implemented and executed using a programmable task execution framework that allows developers to define custom accumulator and aggregator tasks. Advantageously, the query system allows queries with time-windowed statistical functions to be parallelized across a group of worker nodes and scaled to very large datasets.
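The accumulator/aggregator split described in the abstract can be illustrated with a minimal sketch. It assumes tumbling (non-overlapping) time windows, records given as `(timestamp_seconds, value)` pairs, and a statistic that decomposes into mergeable parts (count, sum, min, max, and therefore mean). Threads stand in for accumulator nodes. The names `WindowStats`, `accumulate`, `aggregate`, and `run_query` are hypothetical and do not come from the disclosure, and this is not the programmable task framework itself.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (timestamp in seconds, numeric value).
Record = Tuple[float, float]


@dataclass
class WindowStats:
    """Hypothetical statistical result structure for one time window."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        # Fold a single observation into this window's partial result.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "WindowStats") -> None:
        # Combine another partial result for the same window.
        self.count += other.count
        self.total += other.total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


def accumulate(records: Iterable[Record], window_seconds: float) -> Dict[int, WindowStats]:
    """Accumulator task: build partial per-window results for one data partition."""
    partial: Dict[int, WindowStats] = {}
    for ts, value in records:
        window = int(ts // window_seconds)  # index of the tumbling window
        partial.setdefault(window, WindowStats()).add(value)
    return partial


def aggregate(partials: Iterable[Dict[int, WindowStats]]) -> Dict[int, WindowStats]:
    """Aggregator task: merge partial results into one aggregate result structure."""
    result: Dict[int, WindowStats] = {}
    for partial in partials:
        for window, stats in partial.items():
            # Merge into a fresh structure so the partials are not mutated.
            result.setdefault(window, WindowStats()).merge(stats)
    return result


def run_query(partitions: List[List[Record]], window_seconds: float,
              workers: int = 4) -> Dict[int, WindowStats]:
    """Run accumulator tasks in parallel, then combine them in one aggregator step."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda p: accumulate(p, window_seconds), partitions))
    return aggregate(partials)


if __name__ == "__main__":
    # Two partitions of the same dataset, bucketed into 60-second windows.
    partitions = [[(0, 1.0), (30, 3.0), (61, 10.0)], [(45, 2.0), (90, 4.0)]]
    for window, stats in sorted(run_query(partitions, 60.0, workers=2).items()):
        print(f"{window}: count={stats.count} mean={stats.mean}")
    # prints "0: count=3 mean=2.0" then "1: count=2 mean=7.0"
```

Because each partial structure merges associatively and commutatively, the aggregate result does not depend on how the data is partitioned or on the order in which accumulator tasks finish. Statistics that do not decompose this way, such as exact percentiles, would need a different result structure (for example, an approximate quantile sketch).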
Utility
24 May 2022
8 Sep 2022