Microsoft Corporation
COMPRESSING TOKENS BASED ON POSITIONS FOR TRANSFORMER MODELS

Abstract:

Embodiments of the present disclosure include systems and methods for compressing tokens based on positions in training data that is used to train transformer models. In some embodiments, a set of input data for training a transformer model is received. The set of input data comprises a set of tokens and a set of position values. A first token in the set of tokens that is the same as a second token in the set of tokens is identified. The position value representing the first token is combined with the position value representing the second token. The set of tokens is modified by removing the first token from the set of tokens. A set of training data is generated to comprise the modified set of tokens and the set of position values. The transformer model is trained using the set of training data.
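The compression step described in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the patent's actual method: the function name, data structures, and the choice to combine positions by grouping them into a list per surviving token are all assumptions made for clarity.

```python
# Hypothetical sketch of the token compression described in the abstract:
# a duplicate token is removed from the token set, and its position value
# is combined with the position value of the surviving identical token.
# All names here are illustrative assumptions.

def compress_tokens(tokens, positions):
    """Collapse repeated tokens, combining their position values.

    tokens    -- list of tokens, e.g. ["the", "cat", "sat", "the"]
    positions -- list of position values, one per token
    Returns (compressed_tokens, grouped_positions), where each entry of
    grouped_positions holds every position at which that token occurred.
    """
    seen = {}          # token -> index in the compressed output
    compressed = []
    grouped = []
    for tok, pos in zip(tokens, positions):
        if tok in seen:
            # Duplicate token: combine its position with the earlier
            # occurrence's positions and drop the token itself.
            grouped[seen[tok]].append(pos)
        else:
            seen[tok] = len(compressed)
            compressed.append(tok)
            grouped.append([pos])
    return compressed, grouped

toks, grp = compress_tokens(["the", "cat", "sat", "the"], [0, 1, 2, 3])
# toks -> ["the", "cat", "sat"]
# grp  -> [[0, 3], [1], [2]]
```

The duplicate "the" is removed from the token set, while its position (3) is retained alongside the first occurrence's position (0), so no positional information is lost even though the token list shrinks.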

Status: Application
Type: Utility
Filing date: 21 Jul 2020
Issue date: 27 Jan 2022