Visa Inc.
Method, System, and Computer Program Product for Training Distributed Machine Learning Models
Abstract:
Provided is a method for training distributed machine learning models. The method may include initializing a distributed machine learning model on a plurality of computing devices. Training data associated with a plurality of samples may be received. Each sample may be forward propagated through the distributed machine learning model to generate an output, and a loss for each sample may be determined based on that output. The loss for each sample may be backward propagated to each computing device. The parameter(s) of each computational node may be updated asynchronously based on the loss as it is backward propagated and/or while at least one of the samples is still being forward propagated. The updated parameter(s) may be stored and/or communicated to the other computing devices of the plurality, each of which may store the parameter(s). A system and computer program product are also disclosed.
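The training flow in the abstract can be illustrated with a minimal single-process sketch. It is not the patented implementation; it assumes that each computing device holds one scalar linear stage (y = w * x + b), that the loss is squared error, and that a simple interleaved pipeline schedule advances the previous sample's backward pass by one device for each forward step of the current sample. The names `Device`, `broadcast`, `backward_step`, and `train`, the learning rate, and the "weight stashing" choice (borrowed from pipeline-parallel schemes such as PipeDream) are all hypothetical. Each device updates its parameters as soon as its gradient arrives, with no global barrier, and then stores and communicates the new values to every device.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical computing device holding one scalar stage: y = w * x + b."""
    name: str
    w: float
    b: float
    lr: float = 0.05
    # Per-sample (input, weight) saved during forward so backward reuses the same weight.
    stash: dict = field(default_factory=dict)
    # Stored copies of each device's parameters, keyed by device name.
    replica: dict = field(default_factory=dict)

    def forward(self, sample_id, x):
        self.stash[sample_id] = (x, self.w)
        return self.w * x + self.b

    def backward(self, sample_id, grad_out):
        x, w_used = self.stash.pop(sample_id)
        grad_in = grad_out * w_used  # gradient passed to the previous device
        # Local update applied as soon as this device's gradient arrives:
        # no barrier waits for the other devices or the other samples.
        self.w -= self.lr * grad_out * x
        self.b -= self.lr * grad_out
        return grad_in


def broadcast(src, devices):
    """Stand-in for network communication: every device stores src's parameters."""
    for dev in devices:
        dev.replica[src.name] = (src.w, src.b)


def backward_step(devices, pending):
    """Advance a pending backward pass by one device; return the new pending state."""
    sample_id, k, grad = pending
    grad = devices[k].backward(sample_id, grad)
    broadcast(devices[k], devices)
    return (sample_id, k - 1, grad) if k > 0 else None


def train(devices, samples):
    """Interleave each sample's forward pass with the previous sample's backward pass."""
    losses = []
    pending = None  # (sample_id, device index, dLoss/dActivation) or None
    for sample_id, (x, target) in enumerate(samples):
        act = x
        for dev in devices:
            act = dev.forward(sample_id, act)
            # The previous sample updates parameters while this one is mid-forward.
            if pending is not None:
                pending = backward_step(devices, pending)
        while pending is not None:  # defensive drain; unused when stage counts match
            pending = backward_step(devices, pending)
        losses.append(0.5 * (act - target) ** 2)  # squared-error loss
        pending = (sample_id, len(devices) - 1, act - target)
    while pending is not None:  # finish the last sample's backward pass
        pending = backward_step(devices, pending)
    return losses


# Usage: two devices jointly learn y = 2x.
devices = [Device("device-0", w=0.5, b=0.0), Device("device-1", w=0.5, b=0.0)]
samples = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5)] * 200
losses = train(devices, samples)
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Stashing the weight used in each sample's forward pass keeps that sample's backward gradient consistent even though the device may have updated its live weight in the meantime; a real system would replace `broadcast` with actual inter-device communication.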
Utility
17 Nov 2020
19 May 2022