Microsoft Corporation
Transformer-Based Neural Network including a Mask Attention Network

Last updated:

Abstract:

A transformer-based neural network includes at least one mask attention network (MAN). The MAN computes an original attention data structure that expresses influence between pairs of data items in a sequence of data items. The MAN then modifies the original data structure by mask values in a mask data structure, to produce a modified attention data structure. Compared to the original attention data structure, the modified attention data structure better accounts for the influence of neighboring data items in the sequence of data items, given a particular data item under consideration. The mask data structure used by the MAN can have static and/or machine-trained mask values. In one implementation, the transformer-based neural network includes at least one MAN in combination with at least one other attention network that does not use a mask data structure, and at least one feed-forward neural network.

Status:
Application
Type:

Utility

Filling date:

27 Aug 2020

Issue date:

3 Mar 2022