International Business Machines Corporation
LEVERAGING DYNAMICAL PRIORS FOR SYMBOLIC MAPPINGS IN SAFE REINFORCEMENT LEARNING
Abstract:
Embodiments of the disclosure provide a reinforcement learning model configured to receive state data (e.g., image state data) and determine candidate actions (e.g., environment navigation actions, environment modification actions, etc.) based on the received state data. Embodiments of the disclosure further provide an object detector configured to generate symbolic state data (e.g., safety-relevant data) from the state data. Accordingly, as described herein, a safety system can update a dynamical safety constraint based on the symbolic state data, filter the actions determined by the reinforcement learning model, and select an action to be executed based on the dynamical safety constraint. For instance, the safety system classifies each candidate action determined by the reinforcement learning model, in each symbolic state, as either "safe" or "not safe" based on the dynamical safety constraint, and a safe action may then be selected and executed.
Utility
18 Feb 2021
18 Aug 2022
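
The filtering step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the disclosure: the grid-world symbolic state (object label mapped to a cell), the `MOVES` action set, the `SafetySystem` class, and the specific constraint (the agent must avoid each hazard's current cell and its predicted next cell, with hazard velocity estimated from consecutive symbolic states). The reinforcement learning model and the object detector are not modeled; their outputs are passed in as a ranked list of candidate actions and a symbolic state.

```python
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]
SymbolicState = Dict[str, Position]  # hypothetical: object label -> grid cell

# Hypothetical action set and its effect on the agent's position.
MOVES: Dict[str, Position] = {
    "left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1), "stay": (0, 0),
}


class SafetySystem:
    """Toy safety filter; the constraint form is an assumption, not the patent's."""

    def __init__(self) -> None:
        self.previous: SymbolicState = {}
        self.velocity: Dict[str, Position] = {}

    def update(self, state: SymbolicState) -> None:
        """Update the dynamical constraint by estimating each hazard's velocity
        from its position in the previous symbolic state."""
        for label, (x, y) in state.items():
            if label.startswith("hazard") and label in self.previous:
                px, py = self.previous[label]
                self.velocity[label] = (x - px, y - py)
        self.previous = dict(state)

    def is_safe(self, state: SymbolicState, action: str) -> bool:
        """Classify an action as safe if the agent's next cell is neither a
        hazard's current cell nor its predicted next cell."""
        ax, ay = state["agent"]
        dx, dy = MOVES[action]
        next_cell = (ax + dx, ay + dy)
        for label, (hx, hy) in state.items():
            if not label.startswith("hazard"):
                continue
            vx, vy = self.velocity.get(label, (0, 0))
            if next_cell in {(hx, hy), (hx + vx, hy + vy)}:
                return False
        return True

    def select(self, state: SymbolicState, ranked_actions: List[str]) -> Optional[str]:
        """Update the constraint, then return the highest-ranked candidate
        action classified as safe, or None if every candidate is unsafe."""
        self.update(state)
        return next((a for a in ranked_actions if self.is_safe(state, a)), None)
```

In this sketch, `ranked_actions` stands in for the candidate actions produced by the reinforcement learning model, ordered by preference, and `state` stands in for the object detector's symbolic output. Returning `None` when no candidate is safe is one possible design choice; a real system would need a defined fallback behavior.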