Apple Inc.
Incorporating user feedback into text prediction models via joint reward planning
Abstract:
An example process includes:
1. obtaining input token(s);
2. determining, using a joint prediction model and based on the input token(s): (a) a first predicted token following the input token(s) and a second predicted token following the first predicted token; and (b) a first user action to be performed on the first predicted token, where determining the first user action includes determining a first reward value for performing the first user action based on a first current reward value for performing the first user action and a second reward value for performing a second user action on the second predicted token;
3. outputting the first predicted token;
4. detecting a user action performed on the first predicted token; and
5. in accordance with a determination that the detected user action does not match the first user action, causing parameters of the joint prediction model to be updated, the parameters being configured to determine the first user action.
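The joint reward planning step in the abstract can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not the patented method. Every name in it (`JointPredictionModel`, `ACTIONS`, `gamma`, `lr`, the bigram "predictor", the reward table) is hypothetical. It also assumes that the first reward value is the first token's current reward plus a discounted best reward for an action on the second token, and that the second token is only offered when the first is accepted. Parameters are updated only when the detected user action differs from the predicted one.

```python
from dataclasses import dataclass, field

# Hypothetical set of user actions that can be performed on a predicted token.
ACTIONS = ("accept", "reject", "ignore")


@dataclass
class JointPredictionModel:
    """Toy stand-in for a joint prediction model (all names are hypothetical)."""
    bigrams: dict                                 # token -> most likely next token
    rewards: dict = field(default_factory=dict)   # (token, action) -> current reward
    gamma: float = 0.9                            # assumed weight on the second token's reward
    lr: float = 0.5                               # assumed update step size

    def predict_tokens(self, input_tokens):
        """Predict a first token after the input and a second token after the first."""
        first = self.bigrams[input_tokens[-1]]
        second = self.bigrams[first]
        return first, second

    def current_reward(self, token, action):
        # Unseen (token, action) pairs start with a reward of 0.
        return self.rewards.get((token, action), 0.0)

    def second_reward(self, first_action, second_token):
        # Assumption: the second token is only offered if the first is accepted,
        # so any other first action forfeits the second token's reward.
        if first_action != "accept":
            return 0.0
        return max(self.current_reward(second_token, a) for a in ACTIONS)

    def first_reward(self, first_action, first_token, second_token):
        # Assumed combination: current reward plus discounted second-token reward.
        return (self.current_reward(first_token, first_action)
                + self.gamma * self.second_reward(first_action, second_token))

    def choose_action(self, first_token, second_token):
        # Pick the first user action with the highest joint reward;
        # ties resolve to the earliest action in ACTIONS.
        return max(ACTIONS, key=lambda a: self.first_reward(a, first_token, second_token))

    def feedback(self, first_token, predicted_action, detected_action):
        """Update action rewards only if the detected action differs from the prediction."""
        if detected_action == predicted_action:
            return False
        # Move the detected action's reward toward 1 and the predicted one's toward 0.
        for action, target in ((detected_action, 1.0), (predicted_action, 0.0)):
            r = self.current_reward(first_token, action)
            self.rewards[(first_token, action)] = r + self.lr * (target - r)
        return True


model = JointPredictionModel(bigrams={"see": "you", "you": "soon"})
t1, t2 = model.predict_tokens(["see"])       # ("you", "soon")
predicted = model.choose_action(t1, t2)      # "accept": all rewards start at 0
model.feedback(t1, predicted, "reject")      # mismatch, so rewards are updated
print(model.choose_action(t1, t2))           # "reject"
```

Under the toy assumption above, the second token's reward matters. If `rewards[("soon", "accept")]` is set to `1.0`, the joint reward for accepting "you" becomes 0.9, which exceeds the 0.5 learned for rejecting it, so the model plans to offer "you" again. A real system would replace the reward table with learned model parameters.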
Utility
31 Aug 2020
23 Nov 2021