NVIDIA Corporation
MEMORY EFFICIENT NEURAL NETWORKS

Abstract:

One embodiment of a method includes performing one or more activation functions in a neural network using weights that have been quantized from floating point values to values that are represented using fewer bits than the floating point values. The method further includes performing a first quantization of the weights from the floating point values to the values that are represented using fewer bits than the floating point values after the floating point values are updated using a first number of forward-backward passes of the neural network using training data. The method further includes performing a second quantization of the weights from the floating point values to the values that are represented using fewer bits than the floating point values after the floating point values are updated using a second number of forward-backward passes of the neural network following the first quantization of the weights.
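The schedule described in the abstract can be illustrated with a short sketch. This is not the implementation claimed in the application; it is a minimal example under stated assumptions: a single linear layer trained with plain gradient descent on a squared-error loss, uniform symmetric 8-bit quantization, and full-precision weights used in the forward pass until the first quantization has happened. The names `quantize`, `train`, `first_n`, and `second_n` are hypothetical and chosen only for illustration.

```python
import numpy as np


def quantize(w, num_bits=8):
    """Uniform symmetric quantization (an assumed scheme, num_bits <= 8)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale


def train(x, y, first_n=5, second_n=10, lr=0.05):
    """Keep float master weights; re-quantize them on a two-stage schedule."""
    rng = np.random.default_rng(0)
    w_float = rng.normal(size=x.shape[1])  # floating point master weights
    q, scale = None, None                  # no quantized copy exists yet
    quantization_steps = []
    passes_since_quant = 0
    interval = first_n                     # passes before the first quantization

    for step in range(1, first_n + second_n + 1):
        # Forward pass: use the quantized weights once they exist
        # (assumption: float weights are used before the first quantization).
        w_used = w_float if q is None else q.astype(np.float64) * scale
        pred = x @ w_used

        # Backward pass: the gradient updates the floating point weights.
        grad = 2.0 * x.T @ (pred - y) / len(y)
        w_float -= lr * grad
        passes_since_quant += 1

        if passes_since_quant == interval:
            q, scale = quantize(w_float)
            quantization_steps.append(step)
            passes_since_quant = 0
            interval = second_n            # passes before the second quantization

    return w_float, q, scale, quantization_steps
```

With the default arguments, the weights are quantized once after 5 forward-backward passes and again 10 passes later, while the floating point copy continues to receive every update in between.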

Status:

Application

Type:

Utility

Filing date:

2 Apr 2019

Issue date:

12 Mar 2020