Advanced Micro Devices, Inc.
EFFICIENT WEIGHT CLIPPING FOR NEURAL NETWORKS

Abstract:

Systems, apparatuses, and methods for implementing one-sided per-kernel clipping and weight transformation for neural networks are disclosed. Various parameters of a neural network are quantized from higher-bit representations to lower-bit representations to reduce memory utilization and power consumption. To exploit the effective range of the quantized representation, positively biased weights are clipped and negated before convolution, and the results are rescaled back after convolution. A one-sided clipping technique transforms the weights to exploit the quantization range effectively, with the clipped side chosen to be the biased side. The technique applies a global clipping strategy that does not require expert, per-layer tuning. This approach allows the system to retain as much information as possible when quantizing parameters from higher-bit representations to lower-bit representations, without unnecessary loss of accuracy.
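The abstract's scheme of per-kernel one-sided clipping, negation before convolution, and rescaling afterward can be sketched as follows. This is a minimal NumPy illustration under assumed details: the function names, the fixed `clip_ratio` heuristic, and the per-kernel scaling rule are illustrative assumptions, not the patent's claimed method. The sketch exploits the fact that a signed b-bit range [-2^(b-1), 2^(b-1)-1] has one extra value on the negative side, so positively biased kernels are negated before quantization.

```python
import numpy as np

def quantize_one_sided(weights, num_bits=8, clip_ratio=0.95):
    """Hypothetical sketch of one-sided per-kernel clipping.

    weights: array of shape (num_kernels, kernel_size), one row per
    output kernel. Returns integer codes, per-kernel scales, and
    per-kernel sign flips to undo the negation after convolution.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    n_kernels = weights.shape[0]
    q = np.empty(weights.shape, dtype=np.int32)
    scales = np.empty(n_kernels)
    flips = np.ones(n_kernels)
    for k in range(n_kernels):
        w = weights[k].astype(np.float64)
        if w.max() > -w.min():       # kernel is positively biased:
            w, flips[k] = -w, -1.0   # negate so the bias sits on the
                                     # (larger) negative side
        lo = clip_ratio * w.min()    # one-sided clip: shrink only the
        w = np.clip(w, lo, None)     # biased extreme, leave the other
        scale = max(lo / qmin, max(w.max(), 0.0) / qmax)
        scales[k] = scale if scale > 0 else 1.0
        q[k] = np.clip(np.round(w / scales[k]), qmin, qmax)
    return q, scales, flips

def dequantize(q, scales, flips):
    # Rescale after convolution and undo the per-kernel negation.
    return flips[:, None] * scales[:, None] * q.astype(np.float64)
```

For example, a kernel like `[0.1, 0.5, -0.2]` is detected as positively biased, negated, clipped on its (now negative) biased side, and quantized; multiplying the dequantized result by the stored flip of -1 recovers an approximation of the original weights.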

Status: Application
Type: Utility
Filing date: 25 Sep 2020
Issue date: 30 Dec 2021