Alibaba Group Holding Limited
METHOD AND SYSTEM FOR DISTRIBUTED NEURAL NETWORK TRAINING
Abstract:
The present disclosure provides a system and method for distributed neural network training. The method includes: computing, by a plurality of heterogeneous computation units (HCUs) in a neural network processing system, a first plurality of gradients from a first plurality of samples; aggregating the first plurality of gradients to generate an aggregated gradient; computing, by the plurality of HCUs, a second plurality of gradients from a second plurality of samples; aggregating, at each of the plurality of HCUs, the aggregated gradient with a corresponding gradient of the second plurality of gradients to generate a local gradient update; and updating, at each of the plurality of HCUs, a local copy of a neural network with the local gradient update.
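The five steps recited in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. The abstract does not specify the model, the aggregation rule, how the aggregated gradient is combined with each local gradient, or the learning rate, so everything below is an assumption made for illustration: a one-parameter linear model with squared-error loss, mean aggregation across HCUs, summation as the per-HCU combination, and the hypothetical names `gradient`, `aggregate`, `training_step`, and `lr`. Each HCU is modeled as one entry in a list of local weights, and both gradient passes are assumed to use the weights held before the update.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def gradient(w: float, x: float, y: float) -> float:
    """Gradient of the loss 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x


def aggregate(grads: List[float]) -> float:
    """Assumed aggregation rule: arithmetic mean across all HCUs."""
    return sum(grads) / len(grads)


def training_step(
    weights: List[float],
    first_samples: List[Sample],
    second_samples: List[Sample],
    lr: float = 0.1,
) -> List[float]:
    """Run one round of the abstract's five steps; weights[i] is HCU i's local copy."""
    # 1. Each HCU computes a gradient from its sample in the first batch.
    first_grads = [gradient(w, x, y) for w, (x, y) in zip(weights, first_samples)]

    # 2. The first gradients are aggregated into a single shared gradient.
    aggregated = aggregate(first_grads)

    # 3. Each HCU computes a gradient from its sample in the second batch.
    second_grads = [gradient(w, x, y) for w, (x, y) in zip(weights, second_samples)]

    # 4. Each HCU combines the shared gradient with its own second gradient
    #    (assumed combination: plain sum) to form a local gradient update.
    local_updates = [aggregated + g for g in second_grads]

    # 5. Each HCU applies its local update to its local copy of the model.
    return [w - lr * u for w, u in zip(weights, local_updates)]


if __name__ == "__main__":
    new_weights = training_step(
        weights=[0.0, 0.0],
        first_samples=[(1.0, 2.0), (2.0, 4.0)],
        second_samples=[(1.0, 1.0), (3.0, 3.0)],
    )
    print(new_weights)
```

In this sketch the local copies drift apart after the step, because each HCU adds its own second gradient to the shared one. How, or whether, the copies are later reconciled is not stated in the abstract and is left out here.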
Utility
24 Oct 2019
29 Apr 2021