International Business Machines Corporation
DYNAMIC NETWORK BANDWIDTH IN DISTRIBUTED DEEP LEARNING TRAINING
Abstract:
Embodiments of a method are disclosed. The method includes performing distributed deep learning training on a batch of training data. The method also includes determining training times representing the amount of time between a beginning batch time and an end batch time. Further, in response to a centralized parameter server determining that a learner is a communication straggler, the method includes modifying a communication aspect of the communication straggler to reduce a future network communication time for the communication straggler to send a future result of the distributed deep learning training on a new batch of training data.
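The abstract describes a centralized parameter server that records per-batch training times, identifies communication stragglers among the learners, and adjusts a communication aspect of a straggler so its next result arrives sooner. The Python sketch below illustrates one possible reading of that flow; the class names (ParameterServer, LearnerStats), the median-based straggler threshold, and the comm_priority knob are illustrative assumptions, not the patented implementation.

```python
import time
from dataclasses import dataclass, field
from statistics import median


@dataclass
class LearnerStats:
    """Per-learner timing collected by the parameter server (illustrative)."""
    learner_id: int
    batch_times: list = field(default_factory=list)  # end_batch - begin_batch, seconds
    comm_priority: int = 0                            # hypothetical "communication aspect"


class ParameterServer:
    """Minimal sketch: flag communication stragglers from per-batch training
    times and adjust a communication setting for them."""

    def __init__(self, straggler_factor: float = 1.5):
        self.learners = {}
        self.straggler_factor = straggler_factor  # assumed heuristic threshold

    def record_batch(self, learner_id: int, begin_batch: float, end_batch: float):
        stats = self.learners.setdefault(learner_id, LearnerStats(learner_id))
        stats.batch_times.append(end_batch - begin_batch)

    def find_communication_stragglers(self):
        """Treat a learner as a straggler if its latest batch time greatly
        exceeds the median across learners (assumed criterion)."""
        latest = {lid: s.batch_times[-1]
                  for lid, s in self.learners.items() if s.batch_times}
        if not latest:
            return []
        med = median(latest.values())
        return [lid for lid, t in latest.items()
                if t > self.straggler_factor * med]

    def modify_communication_aspect(self, learner_id: int):
        """Stand-in for raising the straggler's network priority or bandwidth
        so its next gradient push completes sooner."""
        self.learners[learner_id].comm_priority += 1


# Usage example with synthetic timings (three learners, one slow).
ps = ParameterServer()
now = time.time()
for lid, duration in [(0, 1.0), (1, 1.1), (2, 2.4)]:
    ps.record_batch(lid, begin_batch=now, end_batch=now + duration)

for straggler in ps.find_communication_stragglers():
    ps.modify_communication_aspect(straggler)
    print(f"learner {straggler} flagged; comm_priority -> "
          f"{ps.learners[straggler].comm_priority}")
```

In this sketch only the straggler's communication setting is changed; the other learners are left untouched, matching the abstract's focus on reducing the straggler's future network communication time.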
Status:
Application
Type:
Utility
Filing date:
9 Jul 2020
Issue date:
13 Jan 2022