International Business Machines Corporation
LEARNING WITH MOMENT ESTIMATION USING DIFFERENT TIME CONSTANTS
Abstract:
A technique for training a model includes obtaining a training example for a model having model parameters stored on one or more computer-readable storage media operably coupled to a hardware processor. The training example includes an outcome and features to explain the outcome. A gradient is calculated with respect to the model parameters of the model using the training example. Two estimates of a moment of the gradient, having two different time constants, are computed for the same type of moment using the gradient. Using the hardware processor, the model parameters of the model are updated using the two estimates of the moment with the two different time constants, to reduce errors while calculating the two estimates of the moment of the gradient.
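The steps named in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation. The linear model, the squared-error loss, the decay rates `beta_slow` and `beta_fast`, the bias correction, and especially the `combine` rule (no update when the two estimates disagree in sign, otherwise the smaller magnitude) are all assumptions chosen for illustration; the abstract does not say how the two estimates are combined. Every name in the code is hypothetical.

```python
import math


def predict(weights, features):
    """Linear model (assumed for illustration): dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def gradient(weights, features, outcome):
    """Gradient of 0.5 * (prediction - outcome)**2 with respect to the weights."""
    err = predict(weights, features) - outcome
    return [err * x for x in features]


def combine(m_slow, m_fast):
    """Assumed combination rule, not taken from the abstract:
    return 0 if the estimates disagree in sign (or either is zero),
    otherwise the estimate with the smaller magnitude."""
    if m_slow * m_fast <= 0:
        return 0.0
    return math.copysign(min(abs(m_slow), abs(m_fast)), m_slow)


class TwoTimeConstantOptimizer:
    """Keeps two exponential moving averages of the first moment of the
    gradient, one with a long time constant and one with a short one."""

    def __init__(self, n_params, lr=0.05, beta_slow=0.99, beta_fast=0.9):
        self.lr = lr
        self.beta_slow = beta_slow  # long time constant (assumed value)
        self.beta_fast = beta_fast  # short time constant (assumed value)
        self.m_slow = [0.0] * n_params
        self.m_fast = [0.0] * n_params
        self.t = 0

    def step(self, weights, grad):
        """Update both moment estimates from grad and return new weights."""
        self.t += 1
        new_weights = []
        for i, g in enumerate(grad):
            self.m_slow[i] = self.beta_slow * self.m_slow[i] + (1 - self.beta_slow) * g
            self.m_fast[i] = self.beta_fast * self.m_fast[i] + (1 - self.beta_fast) * g
            # Bias correction for the zero-initialised averages.
            slow_hat = self.m_slow[i] / (1 - self.beta_slow ** self.t)
            fast_hat = self.m_fast[i] / (1 - self.beta_fast ** self.t)
            new_weights.append(weights[i] - self.lr * combine(slow_hat, fast_hat))
        return new_weights


def train(weights, examples, optimizer, epochs=1):
    """Run one optimizer step per (features, outcome) training example."""
    for _ in range(epochs):
        for features, outcome in examples:
            weights = optimizer.step(weights, gradient(weights, features, outcome))
    return weights
```

A call such as `train([0.0, 0.0], [([1.0, 0.0], 2.0)], TwoTimeConstantOptimizer(2), epochs=10)` runs ten updates on a single example. Under the assumed `combine` rule, a parameter moves only while the long-term and short-term estimates point in the same direction, which is one conservative way to use both estimates.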
Utility
11 Feb 2020
12 Aug 2021