Microsoft Corporation
DYNAMIC CACHE MANAGEMENT IN BEAM SEARCH
Last updated:
Abstract:
Systems and methods for dynamically modifying a cache associated with a neural network model of a natural language generator are described. In examples, a neural network model employs a beam search algorithm at a decoder when decoding output and generating predicted output candidates. The decoder utilizes caching techniques to improve the speed at which the neural network operates. When an amount of memory utilized by one or more caches of the neural network model is determined to exceed a threshold memory size, a layer-specific portion of a cache associated with a layer of the neural network model is identified. The identified layer-specific portion of the cache can then be deleted. In examples, data in the cache is deduplicated and/or deleted.
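The threshold check, layer-specific deletion, and deduplication described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the filing: the `LayerCache` class, the `deduplicate` and `enforce_threshold` functions, the use of tuples as stand-ins for cached decoder tensors, the element count as a proxy for memory size, and the "largest layer first" eviction order are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LayerCache:
    """Hypothetical per-layer cache: beam index -> cached values (tuples stand in for tensors)."""
    entries: dict = field(default_factory=dict)

    def size(self) -> int:
        # Count each distinct stored object once, so entries shared
        # between beams are not counted twice.
        unique = {id(v): len(v) for v in self.entries.values()}
        return sum(unique.values())


def deduplicate(cache: LayerCache) -> None:
    """Make beams whose cached content is equal share a single stored object."""
    canonical = {}
    for beam, value in list(cache.entries.items()):
        cache.entries[beam] = canonical.setdefault(value, value)


def enforce_threshold(caches: dict, threshold: int) -> list:
    """Deduplicate every layer cache, then delete layer-specific portions
    until total usage no longer exceeds the threshold.

    Returns the indices of the layers whose cache portion was deleted.
    The eviction order (largest layer first) is an assumed policy.
    """
    for cache in caches.values():
        deduplicate(cache)

    def total() -> int:
        return sum(c.size() for c in caches.values())

    evicted = []
    while total() > threshold and any(c.entries for c in caches.values()):
        layer = max(caches, key=lambda idx: caches[idx].size())
        caches[layer].entries.clear()
        evicted.append(layer)
    return evicted


if __name__ == "__main__":
    caches = {
        0: LayerCache({0: tuple([1, 2, 3]), 1: tuple([1, 2, 3])}),  # two beams, same content
        1: LayerCache({0: tuple([4, 5]), 1: tuple([6, 7])}),
    }
    print("evicted layers:", enforce_threshold(caches, threshold=5))
```

In this sketch deduplication runs first because it may bring usage under the threshold without discarding anything; deletion of a layer's portion is the fallback. A deleted portion would have to be recomputed by the decoder when next needed, which is the speed-for-memory trade the abstract implies.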
Utility
18 Feb 2021
31 Mar 2022