Walmart Inc.
TRAVERSING A LARGE CONNECTED COMPONENT ON A DISTRIBUTED FILE-BASED DATA STRUCTURE
Last updated:
Abstract:
A distributed system including multiple processing nodes. The distributed system can perform certain acts. The acts can include receiving a set of input nodes and a set of criteria. The acts can include obtaining an adjacency list representing a large connected component. The large connected component can include nodes, edges, and edge metadata. A quantity of the nodes of the large connected component can exceed 1 billion. The adjacency list can be distributed across the multiple processing nodes. The nodes of the large connected component can include the input nodes. The acts also can include performing one or more iterations of traversing the large connected component until a stopping condition is satisfied. Each iteration can include processing a set of input nodes at the multiple processing nodes using the set of criteria to generate first data at the multiple processing nodes, determining a set of output nodes such that each output node of the set of output nodes is one hop from a respective input node of the set of input nodes, consolidating the first data from the multiple processing nodes to a first processing node of the multiple processing nodes, processing the first data at the first processing node; and assigning the set of input nodes for a subsequent iteration of the one or more iterations based on the set of output nodes when the stopping condition is not satisfied. The acts further can include outputting second data based on the first data received and processed at the first processing node during the one or more iterations. Other embodiments are disclosed.
Utility
30 Jan 2020
5 Aug 2021