Adobe Inc.
REINFORCEMENT LEARNING WITH A STOCHASTIC ACTION SET

Abstract:

Systems and methods are described for a decision-making process that includes actions characterized by stochastic availability. The systems and methods provide a Markov decision process (MDP) model that includes a stochastic action set based on the decision-making process, compute a policy function for the MDP model using a policy gradient based at least in part on a function representing the stochasticity of the stochastic action set, identify a probability distribution for one or more actions available at a time period using the policy function, and select an action for the time period based on the probability distribution.
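
The last two steps of the abstract (identifying a probability distribution over the actions available at a time period, then selecting an action from it) can be illustrated with a minimal Python sketch. This is not the method claimed in the application: the softmax parameterization, the independent per-action availability probabilities, and the names `masked_softmax`, `sample_available_actions`, and `select_action` are all assumptions introduced here for illustration, and the policy-gradient training step is not shown.

```python
import math
import random


def masked_softmax(preferences, available):
    """Softmax over the available actions only (assumed parameterization).

    `preferences` maps every action to a real-valued score; actions that
    are not in `available` receive no probability mass.
    """
    if not available:
        raise ValueError("at least one action must be available")
    # Subtract the maximum score for numerical stability.
    top = max(preferences[a] for a in available)
    weights = {a: math.exp(preferences[a] - top) for a in available}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def sample_available_actions(availability_prob, rng):
    """Draw the action set for one time period.

    Assumption: each action is available independently with its own
    probability. Redraw until the set is non-empty.
    """
    while True:
        available = [a for a, p in availability_prob.items() if rng.random() < p]
        if available:
            return available


def select_action(preferences, available, rng):
    """Sample one action from the distribution over available actions."""
    probs = masked_softmax(preferences, available)
    actions = list(probs)
    chosen = rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
    return chosen, probs


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical scores and availability probabilities.
    preferences = {"a": 1.0, "b": 2.0, "c": 0.5}
    availability = {"a": 0.9, "b": 0.5, "c": 0.1}
    available = sample_available_actions(availability, rng)
    action, probs = select_action(preferences, available, rng)
    print(available, probs, action)
```

Restricting the softmax to the sampled action set keeps the distribution normalized over whatever actions happen to be available in a given time period, which is the property the abstract's "probability distribution for one or more actions available at a time period" relies on.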

Status:

Application

Type:

Utility

Filing date:

23 Sep 2019

Issue date:

25 Mar 2021