Intel Corporation
ALGORITHMS FOR OPTIMIZING SMALL MESSAGE COLLECTIVES WITH HARDWARE SUPPORTED TRIGGERED OPERATIONS
Abstract:
Algorithms for optimizing small message collectives with hardware supported triggered operations and associated methods, apparatus, and systems. The algorithms are implemented in a distributed compute environment comprising a plurality of ranks including a root, a plurality of intermediate nodes, and a plurality of leaf nodes, where each of the plurality of ranks comprises a compute platform having a communication interface including embedded logic for implementing the algorithms. Collectives are employed to transfer data between parent ranks and child ranks. In connection with the collectives, control messages are sent from the children of a collective to the parent of the collective, informing the parent that the children have free buffers ready to receive data. Before sending data to the children in the collective, the parent employs a counter to determine that a control message has been received from each of its children, indicating that each child has a free buffer.
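The counter-gated handshake described in the abstract can be illustrated with a minimal software sketch. It is a simulation under stated assumptions, not the claimed hardware implementation: the `Rank` class and all of its methods and fields are hypothetical names introduced here, each rank is assumed to hold a single buffer, and ordinary method calls stand in for control and data messages that a communication interface would handle with triggered operations.

```python
class Rank:
    """One rank in the tree. All names here are hypothetical, for illustration."""

    def __init__(self, rank_id):
        self.rank_id = rank_id
        self.parent = None
        self.children = []
        self.ready_count = 0   # control messages received since the last send
        self.pending = None    # data held in this rank's buffer, waiting to go out
        self.received = []     # everything this rank has been handed by its parent

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def signal_free_buffer(self):
        """Control message to the parent: this rank's buffer is free."""
        if self.parent is not None:
            self.parent.ready_count += 1
            self.parent._try_send()

    def post_data(self, data):
        """Arm a send; it fires only once every child has signalled."""
        assert self.pending is None, "buffer still occupied"
        self.pending = data
        self._try_send()

    def on_data(self, data):
        """Data arriving from the parent occupies the buffer until forwarded."""
        self.received.append(data)
        self.post_data(data)

    def _try_send(self):
        # Threshold check: one control message per child (zero for a leaf).
        if self.pending is None or self.ready_count < len(self.children):
            return
        data, self.pending = self.pending, None
        self.ready_count = 0   # children must signal again before the next send
        for child in self.children:
            child.on_data(data)
        self.signal_free_buffer()  # buffer drained; tell our own parent
```

In this sketch a root calls `post_data`, and nothing moves until each child has called `signal_free_buffer`. An intermediate rank forwards to its own children under the same rule and signals its parent only after forwarding, so a parent never overwrites a buffer that is still in use. A leaf has no children, so its threshold is zero and it signals its parent as soon as it has consumed the data.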
Utility
23 Dec 2020
2 Sep 2021