Microsoft Corporation
FORCING WEIGHTS OF TRANSFORMER MODEL LAYERS
Last updated:
Abstract:
Embodiments of the present disclosure include systems and methods for forcing weights of transformer model layers when training a transformer model. In some embodiments, input data is received at a first layer included in a transformer model. The input data is processed through the first layer of the transformer model to produce a first output data. The first output data is processed through the first layer of the transformer model to produce a second output data. The first output data is processed through a second layer included in the transformer model to produce a third output data. A difference is calculated between the second output data and the third output data. Weights included in the first layer of the transformer model are adjusted based on the calculated difference.
Utility
9 Sep 2020
10 Mar 2022