Electronic Arts Inc.
Reinforcement Learning for Concurrent Actions
Last updated:
Abstract:
A computer-implemented method comprises instantiating a policy function approximator. The policy function approximator is configured to calculate a plurality of estimated action probabilities in dependence on a given state of the environment. Each of the plurality of estimated action probabilities corresponds to a respective one of a plurality of discrete actions performable by the reinforcement learning agent within the environment. An initial plurality of estimated action probabilities in dependence on a first state of the environment are calculated. Two or more of the plurality of discrete actions are concurrently performed within the environment when the environment is in the first state. In response to the concurrent performance, a reward value is received. In response to the received reward value being greater than a baseline reward value, the policy function approximator is updated, such that it is configured to calculate an updated plurality of estimated action probabilities.
Utility
12 Nov 2018
19 Sep 2019