Advanced Micro Devices, Inc.
MATRIX DATA BROADCAST ARCHITECTURE

Abstract:

Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. The processor core executes a software application with matrix operations. The processor core supports the broadcast of shared data to multiple compute units of the processor core. A compiler or other code assigns thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read accesses to a memory subsystem for the shared data, the processor core generates a single access request. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted by the processor core.
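The request-merging idea in the abstract can be illustrated with a minimal software sketch. It is not the patented hardware design: the names `BroadcastRequest`, `coalesce_reads`, and `targets`, and the use of a bitmask to identify the receiving compute units, are assumptions made for illustration only. The abstract says only that the single request "includes information to identify the multiple compute units".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BroadcastRequest:
    """Hypothetical model: one memory read plus the compute units that receive it."""
    address: int
    target_mask: int  # assumption: bit i set means compute unit i receives the data


def coalesce_reads(reads):
    """Merge per-compute-unit reads of the same address into single requests.

    `reads` is an iterable of (compute_unit_id, address) pairs.
    Returns one BroadcastRequest per distinct address, sorted by address.
    """
    masks = defaultdict(int)
    for cu_id, address in reads:
        masks[address] |= 1 << cu_id
    return [BroadcastRequest(addr, mask) for addr, mask in sorted(masks.items())]


def targets(request):
    """Decode the bitmask back into a sorted list of compute unit ids."""
    return [i for i in range(request.target_mask.bit_length())
            if (request.target_mask >> i) & 1]


# Four compute units all need the same matrix tile at 0x1000;
# two of them also read private data at other addresses.
reads = [(0, 0x1000), (1, 0x1000), (2, 0x1000), (3, 0x1000),
         (0, 0x2000), (1, 0x2040)]
requests = coalesce_reads(reads)
for req in requests:
    print(hex(req.address), targets(req))
```

In this sketch, six individual reads collapse into three memory requests, and the shared tile is fetched once and delivered to all four compute units. How a real processor encodes the targets and routes the broadcast is not specified in this record.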

Status:

Application

Type:

Utility

Filing date:

30 Dec 2019

Issue date:

24 Jun 2021