Amazon.com, Inc.
Low latency neural network model loading

Last updated:

Abstract:

An integrated circuit device implementing a neural network accelerator may have a peripheral bus interface to interface with a host memory, and neural network models can be loaded from the host memory onto the state buffer of the neural network accelerator for execution by the array of processing elements. The neural network accelerator may also have a memory interface to interface with a local memory. The local memory may store neural network models from the host memory, and the models can be loaded from the local memory into the state buffer with reduced latency as compared to loading from the host memory. In systems with multiple accelerators, the models in the local memory can also be shared amongst different accelerators.

Status:
Grant
Type:

Utility

Filling date:

27 Nov 2019

Issue date:

23 Nov 2021