Alibaba Group Holding Limited
Detecting error in executing computation graph on heterogeneous computing devices

Abstract:

The present disclosure relates to a method for detecting errors in executing a computation graph on heterogeneous computing devices. The method comprises: receiving, from a reference device included in the heterogeneous computing devices, a first reference value as the reference device's execution result for a first node of the computation graph; receiving, from a target device included in the heterogeneous computing devices, a first target value as the target device's execution result for the same first node; comparing the first reference value with the first target value; and determining, based on that comparison, whether the first target value is in error. The method can further comprise generating multiple execution contexts for executing the computation graph on the heterogeneous computing devices.
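The per-node comparison described in the abstract can be illustrated with a small sketch. It is not the patented implementation. Everything here is an assumption made for illustration: the devices are simulated as plain Python functions, the node format and the helper names (`reference_device`, `make_faulty_device`, `detect_errors`) are hypothetical, and "in error" is taken to mean "not equal within a tolerance" using `math.isclose`. One design choice is also assumed: the target device evaluates each node on the reference device's inputs, so a mismatch is attributed to the node that produces it instead of propagating to every downstream node.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical node representation: (name, op, names of input values).
Node = Tuple[str, Callable[..., float], List[str]]
# A simulated "device" applies a node's op to its argument values.
Device = Callable[[str, Callable[..., float], List[float]], float]


def reference_device(name: str, op: Callable[..., float], args: List[float]) -> float:
    # Simulated reference device: evaluate the op exactly as written.
    return op(*args)


def make_faulty_device(bad_node: str, offset: float) -> Device:
    # Hypothetical target device that mis-computes one node by a fixed offset.
    def device(name: str, op: Callable[..., float], args: List[float]) -> float:
        result = op(*args)
        return result + offset if name == bad_node else result
    return device


def detect_errors(nodes: List[Node], feeds: Dict[str, float], target: Device,
                  rel_tol: float = 1e-5, abs_tol: float = 1e-8) -> List[str]:
    """Return names of nodes whose target value disagrees with the reference value."""
    ref_values = dict(feeds)
    errors = []
    for name, op, inputs in nodes:  # nodes are assumed to be topologically sorted
        args = [ref_values[i] for i in inputs]
        ref = reference_device(name, op, args)  # reference value for this node
        tgt = target(name, op, args)            # target value, computed on the same inputs
        ref_values[name] = ref
        # Assumed error criterion: values differ beyond the given tolerances.
        if not math.isclose(ref, tgt, rel_tol=rel_tol, abs_tol=abs_tol):
            errors.append(name)
    return errors


graph: List[Node] = [
    ("add", lambda x, y: x + y, ["x", "y"]),
    ("mul", lambda a, y: a * y, ["add", "y"]),
    ("sq", lambda m: m * m, ["mul"]),
]
feeds = {"x": 1.5, "y": 2.0}

print(detect_errors(graph, feeds, make_faulty_device("mul", 0.25)))  # ['mul']
print(detect_errors(graph, feeds, reference_device))                 # []
```

Because `sq` receives the reference value of `mul` rather than the faulty one, only `mul` is flagged. Feeding the target its own outputs instead would flag `sq` as well, which detects that something went wrong but makes it harder to locate the first faulty node.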

Status:
Grant
Type:

Utility

Filing date:

20 May 2019

Issue date:

7 Sep 2021