Alibaba Group Holding Limited
COMPUTATION UNIT, RELATED APPARATUS, AND METHOD
Abstract:
This disclosure provides a computation unit, a related apparatus, and a method. The computation unit includes: a weight buffer adapted to store a row vector fetched from an M × Kα sparsified weight matrix, where M and K are respectively a number of rows and a number of columns of the weight matrix before being sparsified, and α is a sparsity coefficient; an excitation buffer adapted to store a K × N excitation matrix; an index selector adapted to store a selection index corresponding to the row vector, and select a row of the excitation matrix based on the selection index, to obtain a Kα × N selected excitation matrix; and a dot product computation unit adapted to multiply the row vector by the selected excitation matrix. This disclosure implements a manner of running a DNN on hardware. In such a manner, structured sparsity of a DNN can be fully utilized, so that inference efficiency is improved; moreover, a register file occupies relatively small bandwidth, and a timing constraint is weak.
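The data flow described in the abstract can be illustrated with a small software sketch. It is not the claimed hardware, only a plain Python model of the arithmetic: the selection index picks Kα rows out of the K × N excitation matrix, and the Kα-element row vector is multiplied by that selection. The function names, the example sizes (K = 4, N = 2, α = 0.5), and the interpretation of the selection index as the column positions of the retained weights are assumptions made for illustration.

```python
def select_rows(excitation, selection_index):
    """Model of the index selector: keep the rows of the K x N
    excitation matrix named by the selection index."""
    return [excitation[k] for k in selection_index]


def dot_row(row_vector, matrix):
    """Model of the dot product unit: multiply a 1 x R row vector
    by an R x N matrix, giving a list of N values."""
    n_cols = len(matrix[0])
    return [
        sum(w * row[j] for w, row in zip(row_vector, matrix))
        for j in range(n_cols)
    ]


# Assumed example sizes: K = 4, N = 2, alpha = 0.5, so K * alpha = 2.
dense_row = [0.0, 3.0, 0.0, -1.0]  # one weight row before sparsification
selection_index = [1, 3]           # assumed: positions of the kept weights
sparse_row = [3.0, -1.0]           # what the weight buffer would hold

excitation = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
    [7.0, 8.0],
]

selected = select_rows(excitation, selection_index)  # (K * alpha) x N
sparse_result = dot_row(sparse_row, selected)
dense_result = dot_row(dense_row, excitation)        # reference result
```

Under these assumptions the sparse path and the dense reference give the same output row, while the sparse path performs only Kα multiplications per output element instead of K.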
Utility
25 Oct 2021
12 May 2022