Amazon.com, Inc.
Processing requests using compressed and complete machine learning models

Abstract:

A machine learning-based service processes requests using compressed and complete models to provide faster response times when servicing requests to process data. Initially, a host processes data using a compressed model that is stored in the host's memory, and then switches to a larger, more accurate complete model once that model is loaded into the host's memory. A host of the machine learning-based service may receive one or more requests to process data. In response, the host uses a compressed version of a model to begin processing the data. The host starts loading the complete version of the model into the host's memory. When the complete version of the model is loaded into memory, the host switches to processing a remaining portion of the data using the complete version of the model.
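The switching behavior in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not Amazon's implementation. Models are plain callables, the complete model is loaded on a background thread, and the host re-checks before each item whether loading has finished. The names `SwitchingHost`, `load_complete`, and `wait_until_loaded` are hypothetical.

```python
import threading
from typing import Callable, Iterable, List, Optional, Tuple

# Assumption: a "model" is any callable mapping an input to a prediction.
Model = Callable[[float], float]


class SwitchingHost:
    """Hypothetical host: serves with a compressed model until the complete one is loaded."""

    def __init__(self, compressed: Model, load_complete: Callable[[], Model]) -> None:
        self._compressed = compressed            # already resident in memory
        self._complete: Optional[Model] = None   # set by the background loader
        self._lock = threading.Lock()
        # Start loading the complete model without blocking request processing.
        self._loader = threading.Thread(
            target=self._load, args=(load_complete,), daemon=True
        )
        self._loader.start()

    def _load(self, load_complete: Callable[[], Model]) -> None:
        model = load_complete()  # e.g. reading full weights from storage (slow)
        with self._lock:
            self._complete = model

    def wait_until_loaded(self, timeout: Optional[float] = None) -> bool:
        """Block until the loader finishes; return True if the complete model is loaded."""
        self._loader.join(timeout)
        with self._lock:
            return self._complete is not None

    def _pick_model(self) -> Tuple[str, Model]:
        # Prefer the complete model as soon as it is available.
        with self._lock:
            if self._complete is not None:
                return "complete", self._complete
        return "compressed", self._compressed

    def process(self, items: Iterable[float]) -> List[Tuple[str, float]]:
        """Process items in order, re-checking per item which model to use."""
        results = []
        for item in items:
            name, model = self._pick_model()
            results.append((name, model(item)))
        return results


# Usage: gate the hypothetical loader with an Event so the switch point is deterministic.
gate = threading.Event()


def load_complete() -> Model:
    gate.wait()               # simulates a slow load into memory
    return lambda x: x * 2.0  # stands in for the larger, more accurate model


host = SwitchingHost(compressed=lambda x: x * 1.9, load_complete=load_complete)
early = host.process([1.0, 2.0])  # complete model not loaded yet
gate.set()
host.wait_until_loaded()
late = host.process([3.0])        # now served by the complete model
print(early, late)
```

Because the check happens per item rather than per request, a single long request can switch to the complete model partway through. This mirrors the abstract's description of processing "a remaining portion of the data" with the complete version. Load failures and memory limits are not handled in this sketch.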

Status: Grant
Type: Utility
Filing date: 5 Mar 2019
Issue date: 13 Sep 2022