Uber Technologies, Inc.
MODEL BASED REINFORCEMENT LEARNING BASED ON GENERALIZED HIDDEN PARAMETER MARKOV DECISION PROCESSES

Last updated: 28 Jul 2021

Abstract:

A machine learning model for reinforcement learning uses parameterized families of Markov decision processes (MDPs) with latent variables. The system uses the latent variables to improve the ability of models to transfer knowledge and generalize to new tasks. As a result, trained machine learning models can operate in unseen environments, or under combinations of conditions or factors on which they were never trained. For example, robots or self-driving vehicles that use these models are robust to changing goals: they can flexibly adapt to novel reward functions or tasks while transferring knowledge about environments and agents to the new tasks.
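To make "a parameterized family of MDPs with latent variables" concrete, here is a minimal toy sketch. It is not the method claimed in the application. It assumes two independent latent factors, one that changes the dynamics (a hypothetical `mass`) and one that changes the reward (a hypothetical `goal`), in a made-up 1-D point-mass environment. `LatentParams`, `PointMassFamily`, and `infer_mass` are illustrative names only. Each latent value selects one MDP from the family. Because the factors are sampled independently, a new task can pair a dynamics setting with a goal that never appeared alongside it during training.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LatentParams:
    """Hypothetical latent variables that index one member of the MDP family."""
    mass: float  # dynamics latent: affects transitions only
    goal: float  # reward latent: affects rewards only


class PointMassFamily:
    """Toy family of 1-D MDPs (illustrative assumption, not the patent's model)."""

    def __init__(self, mass_range=(0.5, 2.0), goal_range=(-5.0, 5.0)):
        self.mass_range = mass_range
        self.goal_range = goal_range

    def sample_latent(self, rng: random.Random) -> LatentParams:
        # Sample each latent factor independently, so that unseen
        # (mass, goal) combinations can arise at test time.
        return LatentParams(mass=rng.uniform(*self.mass_range),
                            goal=rng.uniform(*self.goal_range))

    @staticmethod
    def step(state: float, action: float, z: LatentParams) -> Tuple[float, float]:
        # Clip the action to [-1, 1]; the transition depends on the dynamics latent.
        a = max(-1.0, min(1.0, action))
        next_state = state + a / z.mass
        # The reward depends on the reward latent (distance to the goal).
        reward = -abs(next_state - z.goal)
        return next_state, reward


def infer_mass(state: float, action: float, next_state: float) -> float:
    """Recover the dynamics latent from one noise-free transition (|action| <= 1)."""
    return action / (next_state - state)


# Same state and action, two different members of the family:
print(PointMassFamily.step(0.0, 1.0, LatentParams(mass=1.0, goal=2.0)))   # (1.0, -1.0)
print(PointMassFamily.step(0.0, 1.0, LatentParams(mass=2.0, goal=-1.0)))  # (0.5, -1.5)
# A single observed transition identifies the dynamics latent in this toy case:
print(infer_mass(0.0, 1.0, 0.5))  # 2.0
```

In a real model-based system, the latent variables would be inferred probabilistically from data, using a learned dynamics model rather than a closed-form inverse. The closed-form `infer_mass` only illustrates why conditioning on a latent lets one model cover many tasks.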

Status:

Application

Type:

Utility

Filing date:

22 May 2020

Issue date:

26 Nov 2020

Full patent description
