Apple Inc.
EXECUTION OF SEGMENTED MACHINE LEARNING MODELS
Last updated:
Abstract:
A device implementing a system to execute machine learning models from memory includes at least one processor configured to receive a request to provide an input to one or more machine learning (ML) models arranged into a graph of connected layers, the one or more ML models stored in the first type of memory. The at least one processor is further configured to divide the graph of connected layers into a plurality of segments such that at least two of the plurality of segments concurrently fits within allocated space of the second type of memory. The at least one processor is further configured to cause the input to be processed through the first segment of the plurality of segments using the second type of memory while a second segment of the plurality of segments is concurrently loaded from the first type of memory into the second type of memory.
Utility
14 Jun 2021
23 Dec 2021