Microsoft Corporation
NEURAL NETWORK COMPRESSION BASED ON BANK-BALANCED SPARSITY

Last updated:

Abstract:

In embodiments of the present disclosure, there is provided an approach for neural network model compression based on bank-balanced sparsity. A set of weight parameters, such as a weight matrix, in a neural network is divided into a plurality of equal-sized banks (in terms of number of elements), and all of the banks are then pruned to the same sparsity level. In this way, each pruned bank has the same number of non-zero elements, which is well suited to hardware speedup. Moreover, since each bank is pruned independently at a fine granularity, model accuracy can be maintained. Thus, according to embodiments of the present disclosure, the neural network compression method based on bank-balanced sparsity can achieve both high model accuracy and high hardware speedup.
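The bank-balanced pruning described above can be sketched as follows. This is a minimal illustrative implementation, not the patented method itself: the function name, its parameters, and the magnitude-based selection of survivors are assumptions for the sake of the example.

```python
import numpy as np

def bank_balanced_prune(weights, bank_size, sparsity):
    """Illustrative sketch of bank-balanced pruning: split each row into
    equal-sized banks and keep the same number of largest-magnitude
    elements in every bank, so all banks share one sparsity level."""
    rows, cols = weights.shape
    assert cols % bank_size == 0, "row length must divide evenly into banks"
    keep = bank_size - int(bank_size * sparsity)  # non-zeros kept per bank
    pruned = np.zeros_like(weights)
    for r in range(rows):
        for start in range(0, cols, bank_size):
            bank = weights[r, start:start + bank_size]
            # indices of the `keep` largest-magnitude elements in this bank
            top = np.argsort(np.abs(bank))[-keep:]
            pruned[r, start + top] = bank[top]
    return pruned

# Example: a 2x8 matrix with banks of 4 at 50% sparsity
# leaves exactly 2 non-zero elements in every bank.
w = np.arange(1, 17, dtype=float).reshape(2, 8)
p = bank_balanced_prune(w, bank_size=4, sparsity=0.5)
print(p)
```

Because every bank ends up with an identical non-zero count, a hardware accelerator can statically allocate the same storage and compute per bank, which is the source of the speedup claimed in the abstract.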

Status: Application
Type: Utility
Filing date: 15 Nov 2019
Issue date: 20 May 2021