Uber Technologies, Inc.
MODEL BASED REINFORCEMENT LEARNING BASED ON GENERALIZED HIDDEN PARAMETER MARKOV DECISION PROCESSES

Last updated:

Abstract:

A machine learning model for reinforcement learning uses parameterized families of Markov decision processes (MDPs) with latent variables. The system uses the latent variables to improve the ability of models to transfer knowledge and generalize to new tasks. Accordingly, trained machine-learning-based models are able to work in unseen environments or under combinations of conditions or factors on which they were never trained. For example, robots or self-driving vehicles based on these models are robust to changing goals and can adapt flexibly to novel reward functions or tasks while transferring knowledge about environments and agents to new tasks.
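
As a rough illustration of the idea in the abstract, the sketch below builds a toy family of one-dimensional MDPs indexed by two latent variables, one affecting the dynamics and one affecting the reward. Every name in it (LatentTask, friction, goal, step, greedy_action, rollout) is hypothetical and chosen for illustration; none comes from the application, and the latents are set by hand here rather than inferred from data. It only shows how a model conditioned on latent variables can be evaluated on a combination of factors that never appeared together during training.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LatentTask:
    # Hypothetical latent variables indexing one member of the MDP family.
    friction: float  # latent that changes the dynamics
    goal: float      # latent that changes the reward


def step(state: float, action: float, task: LatentTask) -> Tuple[float, float]:
    """One transition of a toy 1-D MDP selected by `task`."""
    next_state = state + (1.0 - task.friction) * action
    reward = -abs(next_state - task.goal)
    return next_state, reward


def greedy_action(state: float, task: LatentTask,
                  candidates=(-1.0, 0.0, 1.0)) -> float:
    """Pick the candidate action with the best one-step reward under the model."""
    return max(candidates, key=lambda a: step(state, a, task)[1])


def rollout(task: LatentTask, horizon: int = 20,
            start: float = 0.0) -> Tuple[float, float]:
    """Run the greedy planner for `horizon` steps; return final state and return."""
    state, total = start, 0.0
    for _ in range(horizon):
        action = greedy_action(state, task)
        state, reward = step(state, action, task)
        total += reward
    return state, total


# Combinations assumed to have been seen during training.
train_tasks = [LatentTask(friction=0.0, goal=2.0),
               LatentTask(friction=0.5, goal=-2.0)]

# A new combination: each latent value was seen, but never together.
unseen_task = LatentTask(friction=0.5, goal=2.0)
final_state, total_reward = rollout(unseen_task)
```

Because the dynamics and reward latents are kept separate in this sketch, the planner can reuse the same transition and reward functions for the unseen combination without retraining. A learned system would have to infer these latents from observed transitions instead.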

Status:

Application

Type:

Utility

Filing date:

22 May 2020

Issue date:

26 Nov 2020