NVIDIA Corporation
TECHNIQUES FOR MEMORY ERROR ISOLATION

Abstract:

Apparatuses, systems, and techniques to detect memory errors and to isolate or migrate partitions on a parallel processing unit, using an application programming interface that facilitates parallel computing, such as CUDA. In at least one embodiment, interrupts indicating a memory error for one or more partitions are intercepted and processed on a graphics processing unit, and a policy is applied to isolate that memory error from other partitions.
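
The flow summarized in the abstract can be pictured with a small host-side simulation. This is only an illustrative sketch, not the claimed implementation: the names `Partition`, `Policy`, and `handle_memory_error` are hypothetical, the "interrupt" is modeled as a plain list of faulting partition IDs, and no real GPU driver or CUDA API is involved.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    # Hypothetical policy names, chosen for illustration only.
    ISOLATE = "isolate"   # fence off the faulting partition
    MIGRATE = "migrate"   # also move its work to a healthy partition


@dataclass
class Partition:
    pid: int
    healthy: bool = True
    workloads: list = field(default_factory=list)


def handle_memory_error(partitions, faulty_ids, policy):
    """Apply `policy` to every partition named by a simulated interrupt."""
    for pid in faulty_ids:
        part = partitions[pid]
        # Mark the partition unhealthy so the error stays contained
        # and other partitions keep running untouched.
        part.healthy = False
        if policy is Policy.MIGRATE:
            # Pick the first partition still marked healthy as the target.
            target = next((p for p in partitions.values() if p.healthy), None)
            if target is not None:
                target.workloads.extend(part.workloads)
                part.workloads.clear()
    return partitions


# Three partitions, one job each; the simulated interrupt names partition 1.
partitions = {i: Partition(i, workloads=[f"job{i}"]) for i in range(3)}
handle_memory_error(partitions, faulty_ids=[1], policy=Policy.MIGRATE)
print([(p.pid, p.healthy, p.workloads) for p in partitions.values()])
```

In this sketch only the faulting partition changes state; with `Policy.ISOLATE` its workloads would simply stay parked on the isolated partition instead of being moved.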

Status:

Application

Type:

Utility

Filing date:

20 Mar 2020

Issue date:

23 Sep 2021