The Toronto-Dominion Bank
Initialization of Parameters for Machine-Learned Transformer Neural Network Architectures
Abstract:
An online system trains a transformer architecture using an initialization method that allows the transformer architecture to be trained without normalization layers or learning rate warmup, resulting in significant improvements in computational efficiency. Specifically, an attention block included in an encoder or a decoder of the transformer architecture generates the set of attention representations by applying a key matrix to the input key, a query matrix to the input query, and a value matrix to the input value to generate an output, and then applying an output matrix to the output to generate the set of attention representations. The initialization method may be performed by scaling the parameters of the value matrix and the output matrix by a factor that is inversely related to the number of encoders in the set of encoders or the number of decoders in the set of decoders.
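As an illustrative sketch only (not the claimed method verbatim), the following PyTorch code shows one way the described initialization could look: the key, query, value, and output projections of an attention block receive a standard initialization, and then only the value and output matrices are down-scaled by a factor that depends inversely on the number of encoder (or decoder) layers. The class name, the single-head simplification, and the specific exponent -0.5 are assumptions; the abstract specifies only an inverse dependence on the layer count.

```python
import math
import torch
import torch.nn as nn

class ScaledInitAttention(nn.Module):
    """Single-head attention block whose value and output projections are
    down-scaled at initialization by a factor inverse in the number of
    encoder/decoder layers, so the stack can train without normalization
    layers or learning rate warmup (per the abstract)."""

    def __init__(self, d_model: int, num_layers: int):
        super().__init__()
        self.w_q = nn.Linear(d_model, d_model, bias=False)  # query matrix
        self.w_k = nn.Linear(d_model, d_model, bias=False)  # key matrix
        self.w_v = nn.Linear(d_model, d_model, bias=False)  # value matrix
        self.w_o = nn.Linear(d_model, d_model, bias=False)  # output matrix
        self.d_model = d_model

        # Standard (Xavier) initialization for all four projections.
        for proj in (self.w_q, self.w_k, self.w_v, self.w_o):
            nn.init.xavier_uniform_(proj.weight)

        # Scale only the value and output matrices by a factor that is
        # inversely related to the number of layers. The exponent -0.5 is
        # an assumed choice; the abstract does not fix the exact factor.
        scale = num_layers ** -0.5
        with torch.no_grad():
            self.w_v.weight.mul_(scale)
            self.w_o.weight.mul_(scale)

    def forward(self, query, key, value):
        # Apply key/query/value matrices to the input key/query/value.
        q, k, v = self.w_q(query), self.w_k(key), self.w_v(value)
        attn = torch.softmax(
            q @ k.transpose(-2, -1) / math.sqrt(self.d_model), dim=-1
        )
        # Apply the output matrix to produce the attention representations.
        return self.w_o(attn @ v)
```

For example, `ScaledInitAttention(d_model=512, num_layers=6)` would shrink the initial value and output weights by about 6^{-0.5} ≈ 0.41, limiting how much each of the six residual blocks perturbs the signal at the start of training.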
Application type: Utility
Dates: 5 Feb 2021; 19 Aug 2021