SAP SE
Machine Learning Performance and Workload Management

Last updated:

Abstract:

Systems and methods are described herein for reducing resource consumption of a database system and a machine learning (ML) system. Data is received from an ML application of a database system. The data includes a first inference call requesting a predicted response to the received data; an inference call is a request for an ML model to generate one or more predictions for which a response is unknown. The ML model generates, using the received data, an output comprising the predicted response and provides the output to the ML application. The output is cached in an inference cache so that future inference calls can bypass the ML model. When a second inference call containing the same data as the first inference call is received, the cached output is retrieved from the inference cache, bypassing the ML model.
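
The abstract describes, at a high level, an inference cache that returns a previously computed model output when the same data is submitted again. The following Python sketch is purely illustrative and not taken from the patent; the InferenceCache class, its hash-based key, and the stand-in model callable are assumptions used only to show the cache-hit/cache-miss flow.

```python
# Illustrative sketch (not from the patent): an inference cache keyed by a hash
# of the request payload. On a cache hit the ML model is bypassed; on a miss
# the model runs and its output is stored for future inference calls.
import hashlib
import json
from typing import Any, Callable, Dict


class InferenceCache:
    """Caches model outputs so repeated inference calls skip the ML model."""

    def __init__(self, model: Callable[[Dict[str, Any]], Any]) -> None:
        self._model = model                    # hypothetical ML model callable
        self._cache: Dict[str, Any] = {}

    def _key(self, data: Dict[str, Any]) -> str:
        # Deterministic key derived from the inference payload.
        return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()

    def infer(self, data: Dict[str, Any]) -> Any:
        key = self._key(data)
        if key in self._cache:
            # Later inference calls with the same data: return the cached
            # output and bypass the model entirely.
            return self._cache[key]
        # First inference call: run the model and cache its output.
        output = self._model(data)
        self._cache[key] = output
        return output


if __name__ == "__main__":
    cache = InferenceCache(model=lambda d: {"prediction": len(d["text"])})
    first = cache.infer({"text": "hello"})    # runs the model, caches the output
    second = cache.infer({"text": "hello"})   # cache hit, model bypassed
    assert first == second
```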

Status:

Application

Type:

Utility

Filing date:

2 Jul 2019

Issue date:

7 Jan 2021