Advanced Micro Devices, Inc.
MATRIX DATA BROADCAST ARCHITECTURE
Last updated:
Abstract:
Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. The processor core executes a software application with matrix operations. The processor core supports the broadcast of shared data to multiple compute units of the processor core. A compiler or other code assigns thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read accesses to a memory subsystem for the shared data, the processor core generates a single access request. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted by the processor core.
Utility
30 Dec 2019
24 Jun 2021