International Business Machines Corporation
COMPRESSED WEIGHT DISTRIBUTION IN NETWORKS OF NEURAL PROCESSORS
Abstract:
A neural inference chip includes a global weight memory; a neural core; and a network connecting the global weight memory to the neural core. The neural core comprises a local weight memory. The local weight memory comprises a plurality of memory banks, each of which is uniquely addressable by at least one index. The neural inference chip is adapted to store in the global weight memory a compressed weight block comprising at least one compressed weight matrix, and to transmit the compressed weight block from the global weight memory to the core via the network. The core is adapted to decode the at least one compressed weight matrix into a decoded weight matrix and store the decoded weight matrix in its local weight memory. The core is adapted to apply the decoded weight matrix to a plurality of input activations to produce a plurality of output activations.
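
For illustration only, the following minimal Python sketch models the flow the abstract describes: a compressed weight block held in a global memory, transmitted to a core, decoded into a banked local weight memory, and applied to input activations. All names here (NeuralCore, decode_and_store, infer) are hypothetical, and zlib stands in for whatever weight-compression scheme the chip actually uses; this is a software analogy, not the patented hardware design.

    # Sketch of the compressed-weight distribution flow (assumed names).
    import zlib
    import numpy as np

    class NeuralCore:
        """A core whose local weight memory is a set of banks, each addressed by index."""
        def __init__(self, num_banks: int):
            # Local weight memory: one slot per uniquely addressable bank.
            self.local_weight_memory = {bank: None for bank in range(num_banks)}

        def decode_and_store(self, compressed_block: bytes, shape: tuple, bank: int):
            # Decode the compressed weight matrix, then store it in the addressed bank.
            raw = zlib.decompress(compressed_block)
            self.local_weight_memory[bank] = np.frombuffer(raw, dtype=np.float32).reshape(shape)

        def infer(self, bank: int, activations: np.ndarray) -> np.ndarray:
            # Apply the decoded weight matrix to input activations to get output activations.
            return self.local_weight_memory[bank] @ activations

    # Global weight memory holds the compressed weight block for distribution.
    weights = np.random.rand(4, 8).astype(np.float32)
    global_weight_memory = zlib.compress(weights.tobytes())

    # "Transmit" the block over the network, decode it at the core, and run inference.
    core = NeuralCore(num_banks=2)
    core.decode_and_store(global_weight_memory, shape=(4, 8), bank=0)
    inputs = np.random.rand(8).astype(np.float32)
    outputs = core.infer(bank=0, activations=inputs)
    print(outputs.shape)  # (4,)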
Type: Utility
Filed: 3 Jan 2020
Published: 8 Jul 2021