Advanced Micro Devices, Inc.
EFFICIENT WEIGHT CLIPPING FOR NEURAL NETWORKS
Abstract:
Systems, apparatuses, and methods for implementing one-sided per-kernel clipping and weight transformation for neural networks are disclosed. Various parameters of a neural network are quantized from higher-bit representations to lower-bit representations to reduce memory utilization and power consumption. To exploit the effective range of quantized representations, positively biased weights are clipped and negated before convolution, and the results are rescaled back after convolution. A one-sided clipping technique is used to transform weights so that the quantization range is exploited effectively, with the clipped side being the biased side. This technique uses a global clipping strategy that does not require expert hand-tuning. This approach allows the system to retain as much information as possible and avoid unnecessary loss of accuracy when quantizing parameters from higher-bit representations to lower-bit representations.
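The idea in the abstract can be sketched in NumPy. The function names and the bias heuristic (kernel weight sum) below are illustrative assumptions, not the patented implementation: a signed n-bit integer range is asymmetric (e.g., int8 spans -128..127, with one extra step on the negative side), so a positively biased kernel is negated before quantization to place its dominant side on the larger negative range, and the convolution output is rescaled and negated back afterwards.

```python
import numpy as np

def quantize_kernel_one_sided(w, num_bits=8):
    """Hypothetical sketch of one-sided per-kernel clipping with negation.

    If a kernel's weights are positively biased, negate them so the
    larger negative half of the signed integer range holds the biased
    side; the convolution result is negated back after dequantization.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    negated = bool(w.sum() > 0)          # assumed bias test: sign of the weight sum
    w_t = -w if negated else w
    scale = np.abs(w_t).max() / -qmin    # map the dominant side onto qmin
    q = np.clip(np.round(w_t / scale), qmin, qmax).astype(np.int8)
    return q, scale, negated

def rescale_output(conv_out, scale, negated):
    # Rescale the integer convolution result and undo the negation.
    out = conv_out * scale
    return -out if negated else out
```

Per-kernel scales and negation flags would be stored alongside the quantized weights, so the post-convolution rescaling is a cheap per-output-channel multiply.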
Utility
25 Sep 2020
30 Dec 2021