International Business Machines Corporation
DYNAMIC COMPUTATION IN DECENTRALIZED DISTRIBUTED DEEP LEARNING TRAINING

Abstract:

Embodiments of a method are disclosed. The method includes performing, by a learner, decentralized distributed deep learning training on a batch of training data. Additionally, the method includes determining a training time during which the learner performs the decentralized distributed deep learning training on the batch of training data. Further, the method includes generating a table having the training time and other processing times for corresponding other learners performing the decentralized distributed deep learning training on corresponding other batches of other training data. The method also includes determining that the learner is a straggler based on the table and a threshold for the training time. Additionally, in response to determining that the learner is the straggler, the method includes modifying a processing aspect of the straggler to reduce a future training time of the straggler for performing the decentralized distributed deep learning training on a new batch of training data.
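
The flow in the abstract (collect per-learner times in a table, compare against a threshold, adjust the straggler) can be illustrated with a minimal Python sketch. The abstract does not specify the threshold rule or which processing aspect is modified, so everything concrete here is an assumption made for illustration: the median-based cutoff, the `threshold_ratio` parameter, the choice to shrink the straggler's next batch, and all function and learner names are hypothetical.

```python
from statistics import median


def build_time_table(training_times):
    """Map each learner id to its measured training time (seconds) for the last batch."""
    return dict(training_times)


def find_stragglers(table, threshold_ratio=1.5):
    """Assumed rule: a learner is a straggler if its time exceeds
    threshold_ratio times the median time in the table."""
    cutoff = threshold_ratio * median(table.values())
    return [learner for learner, t in table.items() if t > cutoff]


def shrink_batch(batch_sizes, stragglers, factor=0.5, min_size=1):
    """Assumed 'processing aspect': give each straggler a smaller next batch."""
    return {
        learner: max(min_size, int(size * factor)) if learner in stragglers else size
        for learner, size in batch_sizes.items()
    }


# Hypothetical measurements for four learners on their latest batches.
table = build_time_table(
    {"learner0": 1.0, "learner1": 1.1, "learner2": 2.4, "learner3": 0.9}
)
stragglers = find_stragglers(table)
next_batches = shrink_batch({name: 64 for name in table}, stragglers)
```

With these made-up timings, only `learner2` exceeds the cutoff, so only its next batch is reduced. Other adjustments (for example, changing how much work is offloaded or how often the learner synchronizes) would fit the same structure; the source does not say which is used.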

Status:

Application

Type:

Utility

Filing date:

9 Jul 2020

Issue date:

13 Jan 2022