Alibaba Group Holding Limited
METHOD AND SYSTEM FOR DISTRIBUTED NEURAL NETWORK TRAINING

Abstract:

The present disclosure provides a system and method for distributed neural network training. The method includes: computing, by a plurality of heterogeneous computation units (HCUs) in a neural network processing system, a first plurality of gradients from a first plurality of samples; aggregating the first plurality of gradients to generate an aggregated gradient; computing, by the plurality of HCUs, a second plurality of gradients from a second plurality of samples; aggregating, at each of the plurality of HCUs, the aggregated gradient with a corresponding gradient of the second plurality of gradients to generate a local gradient update; and updating, at each of the plurality of HCUs, a local copy of a neural network with the local gradient update.
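
The steps recited in the abstract can be illustrated with a minimal single-process sketch. It is not the claimed implementation: the HCUs are simulated as entries in a Python list, the network is a linear model with a squared-error loss, the aggregation is assumed to be a mean, and the combination of the aggregated gradient with each local gradient is assumed to be a sum. The names `local_gradient`, `train_step`, and `lr` are hypothetical and do not appear in the disclosure.

```python
import numpy as np


def local_gradient(weights, x, y):
    """Gradient of the mean squared error of a linear model y ~ x @ weights."""
    residual = x @ weights - y
    return 2.0 * x.T @ residual / len(y)


def train_step(local_weights, first_batches, second_batches, lr=0.1):
    """One round of the abstract's method, with simulated HCUs.

    local_weights: list of per-HCU weight vectors (the local network copies).
    first_batches, second_batches: lists of (x, y) pairs, one pair per HCU.
    """
    # 1. Each HCU computes a gradient from its share of the first samples.
    first_grads = [
        local_gradient(w, x, y)
        for w, (x, y) in zip(local_weights, first_batches)
    ]

    # 2. Aggregate the first gradients (assumption: element-wise mean).
    aggregated = np.mean(first_grads, axis=0)

    # 3. Each HCU computes a gradient from its share of the second samples.
    second_grads = [
        local_gradient(w, x, y)
        for w, (x, y) in zip(local_weights, second_batches)
    ]

    # 4. At each HCU, combine the aggregated gradient with that HCU's own
    #    second gradient to form a local update (assumption: plain sum).
    local_updates = [aggregated + g for g in second_grads]

    # 5. Each HCU updates its local copy of the network with its local update.
    return [w - lr * u for w, u in zip(local_weights, local_updates)]


# Two simulated HCUs, each holding a two-parameter local copy.
weights = [np.zeros(2), np.zeros(2)]
batch_a = (np.array([[1.0, 0.0]]), np.array([1.0]))
batch_b = (np.array([[0.0, 1.0]]), np.array([1.0]))
new_weights = train_step(weights, [batch_a, batch_b], [batch_a, batch_b])
```

Because each HCU adds its own second gradient to the shared aggregated gradient, the local copies in this sketch differ from one another after the update.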

Status:

Application

Type:

Utility

Filing date:

24 Oct 2019

Issue date:

29 Apr 2021