Alibaba Group Holding Limited
Structured Pruning for Machine Learning Model
Abstract:
An input weight pattern of a machine learning model may be received. The input weight pattern may be pruned to produce an output weight pattern based on a predetermined pruning algorithm. The pruning algorithm may include partitioning the input weight pattern into a plurality of sub-patterns, such that each row of the input weight pattern includes sub-rows of a first number of sub-patterns and each column of the input weight pattern includes sub-columns of a second number of sub-patterns; and pruning sub-columns and sub-rows from the plurality of sub-patterns to achieve predetermined column and row sparsities, respectively, with a constraint that at least one sub-row in each row of the input weight pattern is not pruned. The output weight pattern may further be compressed to produce a compact weight pattern, which has lower memory and computational overhead than the input weight pattern for the machine learning model.
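The partition-and-prune procedure described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the abstract does not say how sub-columns and sub-rows are ranked, so the sketch assumes the smallest L2 norm is pruned first. It also assumes the row constraint is enforced by restoring the strongest pruned sub-row of any row that would otherwise be empty. The function name `structured_prune` and all of its parameters are hypothetical, and the compression step is not shown.

```python
import math


def structured_prune(weights, block_rows, block_cols, col_sparsity, row_sparsity):
    """Prune a 2-D list of weights block by block; return (pruned, mask).

    Assumptions (not stated in the abstract): L2-norm ranking, and a repair
    pass that restores one sub-row for any row left with none.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    if n_rows % block_rows or n_cols % block_cols:
        raise ValueError("matrix shape must be a multiple of the block shape")
    if not (0 <= col_sparsity < 1 and 0 <= row_sparsity < 1):
        raise ValueError("sparsities must lie in [0, 1)")

    n_col_prune = int(col_sparsity * block_cols)  # sub-columns dropped per block
    n_row_prune = int(row_sparsity * block_rows)  # sub-rows dropped per block
    mask = [[True] * n_cols for _ in range(n_rows)]

    def norm(values):
        return math.sqrt(sum(v * v for v in values))

    for r0 in range(0, n_rows, block_rows):
        rows = range(r0, r0 + block_rows)
        kept_count = {r: 0 for r in rows}   # surviving sub-rows per full row
        dropped = {r: [] for r in rows}     # (score, columns) of pruned sub-rows

        for c0 in range(0, n_cols, block_cols):
            cols = range(c0, c0 + block_cols)

            # Step 1: prune the weakest sub-columns of this sub-pattern.
            by_norm = sorted(cols, key=lambda c: norm(weights[r][c] for r in rows))
            for c in by_norm[:n_col_prune]:
                for r in rows:
                    mask[r][c] = False
            kept_cols = sorted(by_norm[n_col_prune:])

            # Step 2: prune the weakest sub-rows, scored on surviving columns.
            score = {r: norm(weights[r][c] for c in kept_cols) for r in rows}
            weakest = set(sorted(rows, key=score.get)[:n_row_prune])
            for r in rows:
                if r in weakest:
                    for c in kept_cols:
                        mask[r][c] = False
                    dropped[r].append((score[r], kept_cols))
                else:
                    kept_count[r] += 1

        # Constraint: every full row keeps at least one sub-row.
        for r in rows:
            if kept_count[r] == 0:
                _, restore_cols = max(dropped[r], key=lambda item: item[0])
                for c in restore_cols:
                    mask[r][c] = True

    pruned = [[w if m else 0.0 for w, m in zip(w_row, m_row)]
              for w_row, m_row in zip(weights, mask)]
    return pruned, mask


if __name__ == "__main__":
    # 8x8 matrix of ones whose first row is uniformly weak.
    demo = [[1.0] * 8 for _ in range(8)]
    demo[0] = [0.01] * 8
    _, demo_mask = structured_prune(demo, 4, 4, 0.5, 0.5)
    for mask_row in demo_mask:
        print("".join("x" if kept else "." for kept in mask_row))
```

Because the repair pass restores sub-rows, the achieved row sparsity can fall slightly below the requested value. In this sketch the constraint takes priority over the sparsity target.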
Utility
25 Oct 2019
29 Apr 2021