Microsoft Corporation
DECIMATING HIDDEN LAYERS FOR TRAINING TRANSFORMER MODELS
Abstract:
Embodiments of the present disclosure include systems and methods for decimating hidden layers for training transformer models. In some embodiments, input data for training a transformer model is received at a transformer layer included in the transformer model. The transformer layer comprises a hidden layer. The hidden layer comprises a set of neurons configured to process training data. A subset of the set of neurons of the hidden layer is selected. Only the subset of the set of neurons of the hidden layer is used to train the transformer model with the input data.
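As a rough illustration of the idea in the abstract, the following minimal Python sketch evaluates a hidden layer using only a selected subset of its neurons. The abstract does not say how the subset is chosen, so the uniform random selection here is an assumption, as are the names `select_neuron_subset`, `hidden_forward`, and `keep_fraction`, the ReLU activation, and the layer sizes. It is not the claimed method, only one possible reading of it.

```python
import random


def select_neuron_subset(num_neurons, keep_fraction, rng):
    """Pick a sorted subset of neuron indices.

    Assumption: uniform random selection; the abstract does not
    specify the selection rule.
    """
    keep = max(1, int(num_neurons * keep_fraction))
    return sorted(rng.sample(range(num_neurons), keep))


def hidden_forward(x, weights, biases, active):
    """Evaluate a ReLU hidden layer on the active neurons only.

    weights[j] holds the input weights of neuron j. Neurons outside
    `active` are skipped entirely and contribute an output of 0.0.
    """
    out = [0.0] * len(weights)
    for j in active:
        pre = sum(w * xi for w, xi in zip(weights[j], x)) + biases[j]
        out[j] = max(0.0, pre)
    return out


# Hypothetical sizes and data, for illustration only.
rng = random.Random(0)
num_inputs, num_neurons = 4, 8
weights = [[rng.uniform(-1, 1) for _ in range(num_inputs)]
           for _ in range(num_neurons)]
biases = [0.0] * num_neurons
x = [0.5, -0.2, 0.1, 0.9]

active = select_neuron_subset(num_neurons, keep_fraction=0.5, rng=rng)
h = hidden_forward(x, weights, biases, active)
```

In a training loop built on this sketch, a weight update would likewise touch only the rows of `weights` listed in `active`, which is where the reduction in compute would come from.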
Status:
Application
Type:
Utility
Filing date:
1 Oct 2020
Issue date:
7 Apr 2022