International Business Machines Corporation
DYNAMIC NETWORK BANDWIDTH IN DISTRIBUTED DEEP LEARNING TRAINING

Abstract:

Embodiments of a method are disclosed. The method includes performing distributed deep learning training on a batch of training data using multiple learners. The method also includes determining, for each learner, a training time representing the amount of time between a beginning batch time and an end batch time. Further, in response to a centralized parameter server determining, based on the training times, that a learner is a communication straggler, the method includes modifying a communication aspect of the communication straggler to reduce a future network communication time for the communication straggler to send a future result of the distributed deep learning training on a new batch of training data.
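The sketch below is a minimal Python illustration of the flow the abstract describes: a centralized parameter server records per-learner batch training times, flags a communication straggler, and raises that learner's bandwidth allocation so its next gradient upload finishes sooner. The class names (ParameterServer, LearnerStats), the mean-based straggler threshold, and the bandwidth-multiplier adjustment are assumptions for illustration only, not the patented implementation.

# Hypothetical sketch of straggler detection and bandwidth adjustment.
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class LearnerStats:
    bandwidth_mbps: float                          # current allocation for gradient uploads
    batch_times: list = field(default_factory=list)


class ParameterServer:
    def __init__(self, straggler_factor=1.2, bandwidth_boost=1.5):
        self.learners = {}
        self.straggler_factor = straggler_factor   # assumed detection threshold
        self.bandwidth_boost = bandwidth_boost     # assumed adjustment policy

    def register(self, learner_id, bandwidth_mbps):
        self.learners[learner_id] = LearnerStats(bandwidth_mbps)

    def report_batch(self, learner_id, begin_time, end_time):
        # Training time = end batch time minus beginning batch time.
        self.learners[learner_id].batch_times.append(end_time - begin_time)

    def adjust_stragglers(self):
        # A learner whose latest batch time exceeds the mean by the factor is
        # treated as a communication straggler; its bandwidth allocation is
        # raised to shrink the communication share of the next batch.
        latest = {lid: s.batch_times[-1]
                  for lid, s in self.learners.items() if s.batch_times}
        if not latest:
            return []
        avg = mean(latest.values())
        stragglers = [lid for lid, t in latest.items()
                      if t > self.straggler_factor * avg]
        for lid in stragglers:
            self.learners[lid].bandwidth_mbps *= self.bandwidth_boost
        return stragglers


if __name__ == "__main__":
    ps = ParameterServer()
    for lid in ("learner-0", "learner-1", "learner-2"):
        ps.register(lid, bandwidth_mbps=100.0)
    # Simulated begin/end batch timestamps in seconds; learner-2 lags.
    ps.report_batch("learner-0", 0.0, 1.0)
    ps.report_batch("learner-1", 0.0, 1.1)
    ps.report_batch("learner-2", 0.0, 2.0)
    print("stragglers:", ps.adjust_stragglers())
    print("learner-2 bandwidth:", ps.learners["learner-2"].bandwidth_mbps)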

Status: Application

Type: Utility

Filing date: 9 Jul 2020

Issue date: 13 Jan 2022