International Business Machines Corporation
Overlapping CNN cache reuse in high resolution and streaming-based deep learning inference engines

Last updated:

Abstract:

A method optimizes Convolutional Neural Network (CNN) inference time for full resolution images. One or more processors divide a full resolution image into a plurality of partially overlapping sub-images. The processor(s) select, from the plurality of partially overlapping sub-images, a first sub-image and a second sub-image that overlap one another in an overlapping area. The processor(s) feed the first sub-image, including the overlapping area, into a Convolutional Neural Network (CNN) in order to create a first inference result for the first sub-image, where the CNN has been trained at a fine resolution. The processor(s) cache an inference result from the CNN for the overlapping area, and then utilize the cached inference result when inferring the second sub-image in the CNN. The processor(s) then identify a specific object in the full resolution image based on inferring the first sub-image and the second sub-image.

Status:
Grant
Type:

Utility

Filling date:

26 Sep 2018

Issue date:

16 Nov 2021