Alibaba Group Holding Limited
Structured Pruning for Machine Learning Model

Last updated:

Abstract:

An input weight pattern of a machine learning model may be received. The input weight pattern may be pruned to produce an output weight pattern based on a predetermined pruning algorithm. The pruning algorithm may include partitioning the input weight pattern into a plurality of sub-patterns, each row of the input weight pattern including sub-rows of a first number of sub-patterns, and each column of the input weight pattern including sub-columns of a second number of sub-patterns; and pruning sub-columns and sub-rows from the plurality of sub-patterns to achieve predetermined column and row sparsities respectively, with a constraint that at least one sub-row in each row of the input weight pattern is not pruned. The output weight pattern may further be compressed to produce a compact weight pattern. The compact weight pattern has lower memory and computational overheads as compared to the input weight pattern for the machine learning model.

Status:
Application
Type:

Utility

Filling date:

25 Oct 2019

Issue date:

29 Apr 2021