Microsoft Corporation
DUAL-MOMENTUM GRADIENT OPTIMIZATION WITH REDUCED MEMORY REQUIREMENTS
Abstract:
Systems and methods for dual-momentum gradient optimization with reduced memory requirements are described. An example method operates in a system comprising a gradient optimizer and a memory configured to store momentum values associated with a neural network model comprising L layers. The method includes retrieving from the memory a first set of momentum values and a second set of momentum values, both corresponding to a layer of the neural network model and both having a selected storage format. The method further includes converting the first set of momentum values to a third set of momentum values and converting the second set of momentum values to a fourth set of momentum values, where the third set and the fourth set each have a training format associated with the gradient optimizer. The method further includes performing gradient optimization using the third set of momentum values and the fourth set of momentum values.
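The retrieve, convert, and optimize sequence described in the abstract can be illustrated with a minimal sketch. The abstract does not name a specific optimizer or number formats, so the following are assumptions made only for illustration: an Adam-style update as the dual-momentum optimizer, 16-bit floating point as the storage format, and 32-bit floating point as the training format. The function name `adam_step_layer` and all hyperparameter defaults are hypothetical.

```python
import numpy as np

# Assumed formats; the abstract says only "selected storage format"
# and "training format associated with the gradient optimizer".
STORAGE_DTYPE = np.float16
TRAINING_DTYPE = np.float32


def adam_step_layer(weights, grads, m_stored, v_stored, step,
                    lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update for a single layer (hypothetical helper).

    m_stored and v_stored are the first and second sets of momentum
    values, held in memory in the storage format.
    """
    # Convert the first and second sets (storage format) into the
    # third and fourth sets (training format).
    m = m_stored.astype(TRAINING_DTYPE)
    v = v_stored.astype(TRAINING_DTYPE)

    # Perform gradient optimization using the converted momentum values.
    m = beta1 * m + (1.0 - beta1) * grads
    v = beta2 * v + (1.0 - beta2) * grads * grads
    m_hat = m / (1.0 - beta1 ** step)   # bias correction, step starts at 1
    v_hat = v / (1.0 - beta2 ** step)
    new_weights = weights - lr * m_hat / (np.sqrt(v_hat) + eps)

    # Convert the updated momentum values back to the storage format
    # before writing them to memory (an assumed round trip).
    return new_weights, m.astype(STORAGE_DTYPE), v.astype(STORAGE_DTYPE)
```

Under these assumptions, each layer's two momentum buffers occupy half the memory they would at 32 bits, at the cost of a conversion on every read and write. A per-layer loop over all L layers would call this helper once per layer with that layer's stored buffers.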
Utility
17 Apr 2020
21 Oct 2021