Microsoft Corporation
REINFORCEMENT LEARNING WITH QUANTUM ORACLE
Abstract:
A computing device is provided, including a processor configured to transmit, to a quantum coprocessor, instructions to encode a Markov decision process (MDP) model as a quantum oracle. The processor may be further configured to train a reinforcement learning model at least in part by transmitting a plurality of superposition queries to the quantum oracle encoded at the quantum coprocessor. Training the reinforcement learning model may further include receiving, from the quantum coprocessor, one or more measurement results in response to the plurality of superposition queries. Training the reinforcement learning model may further include updating a policy function of the reinforcement learning model based at least in part on the one or more measurement results.
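The training loop in the abstract can be illustrated with a small classical simulation. This is a hypothetical sketch, not the claimed implementation, and it rests on several assumptions:

- It uses a two-state, two-action toy MDP.
- It assumes the quantum oracle takes a common transition-sampling form, |s,a⟩|0⟩ → |s,a⟩ Σ_{s'} √P(s'|s,a) |s'⟩.
- It queries that oracle with a uniform superposition over all state-action pairs.
- It simulates computational-basis measurement classically with Born-rule probabilities.

The policy update is one illustrative choice, since the abstract does not specify the update rule. It estimates the transition model from measurement counts and then runs value iteration, with rewards assumed known. All function names (query_oracle_in_superposition, measure, estimate_model, update_policy) are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy MDP (not from the source): P[s][a][s'] and rewards R[s][a].
N_STATES, N_ACTIONS = 2, 2
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 switches
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 switches
]
R = [[0.0, 0.0], [1.0, 1.0]]   # reward 1 for acting from state 1 (assumed known)


def query_oracle_in_superposition():
    """Classically simulate one superposition query to an assumed oracle.

    Input register: uniform superposition over all (s, a).
    Oracle: |s,a>|0> -> |s,a> sum_{s'} sqrt(P(s'|s,a)) |s'>.
    Returns the amplitude of every basis state |s,a,s'>.
    """
    norm = 1.0 / math.sqrt(N_STATES * N_ACTIONS)
    return {
        (s, a, s2): norm * math.sqrt(P[s][a][s2])
        for s in range(N_STATES)
        for a in range(N_ACTIONS)
        for s2 in range(N_STATES)
    }


def measure(amplitudes, shots, rng):
    """Sample basis outcomes with probability |amplitude|^2, once per shot."""
    outcomes = list(amplitudes)
    weights = [abs(amp) ** 2 for amp in amplitudes.values()]
    return Counter(rng.choices(outcomes, weights=weights, k=shots))


def estimate_model(counts):
    """Turn measurement counts into an estimated transition model P_hat."""
    p_hat = [[[0.0] * N_STATES for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            total = sum(counts[(s, a, s2)] for s2 in range(N_STATES))
            for s2 in range(N_STATES):
                # Fall back to a uniform guess if (s, a) was never measured.
                p_hat[s][a][s2] = counts[(s, a, s2)] / total if total else 1.0 / N_STATES
    return p_hat


def update_policy(p_hat, gamma=0.9, sweeps=200):
    """Run value iteration on P_hat and return the greedy action per state."""
    v = [0.0] * N_STATES

    def q(s, a):
        return R[s][a] + gamma * sum(p_hat[s][a][s2] * v[s2] for s2 in range(N_STATES))

    for _ in range(sweeps):
        v = [max(q(s, a) for a in range(N_ACTIONS)) for s in range(N_STATES)]
    return [max(range(N_ACTIONS), key=lambda a: q(s, a)) for s in range(N_STATES)]


rng = random.Random(0)
amps = query_oracle_in_superposition()
counts = measure(amps, shots=400, rng=rng)
policy = update_policy(estimate_model(counts))
print(policy)  # [1, 0]: switch out of state 0, stay in state 1
```

Because a real measurement collapses the quantum state, each shot here stands for a fresh superposition query followed by a measurement. Drawing repeated samples from one amplitude table is equivalent to that only in this classical simulation.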
Utility
27 Jan 2021
11 Aug 2022