NVIDIA Corporation
FULLY-FUSED NEURAL NETWORK EXECUTION

Last updated: 14 Sep 2022

Abstract:

A fully-connected neural network may be configured for execution by a processor as a fully-fused neural network by limiting slow global memory accesses to reading the inputs to, and writing the outputs from, the fully-connected neural network. The computational cost of a fully-connected neural network scales quadratically with its width, whereas its memory traffic scales linearly. Modern graphics processing units typically have much greater computational throughput than memory bandwidth, so for narrow fully-connected neural networks the linear memory traffic is the bottleneck. The key to improving the performance of the fully-connected neural network is to minimize traffic to slow "global" memory (off-chip memory and high-level caches) and to fully utilize fast on-chip memory (low-level caches, "shared" memory, and registers), which the fully-fused approach achieves. A real-time neural radiance caching technique for path-traced global illumination is implemented using the fully-fused neural network to cache the scattered radiance components of global illumination.
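The scaling argument in the abstract can be made concrete with a back-of-envelope cost model. The sketch below compares arithmetic intensity (FLOPs per byte of global memory traffic) for layer-by-layer ("unfused") execution against fully-fused execution. It is illustrative only and rests on assumptions not taken from the patent: every layer is width x width, the network's input and output are also width values wide, activations are stored as fp16 (2 bytes), and weight traffic is ignored because a narrow network's weights are small and loaded once per batch. All function names are hypothetical.

```python
"""Illustrative cost model: unfused vs fully-fused MLP execution.

Assumptions (hypothetical, not from the patent): square width x width
layers, input/output also `width` values wide, fp16 activations, and
weight traffic ignored.
"""

BYTES_PER_VALUE = 2  # assumed fp16 activations


def flops_per_sample(width: int, n_layers: int) -> int:
    # A width x width matrix-vector product is width**2 multiply-adds,
    # i.e. 2 * width**2 floating-point operations per layer.
    return 2 * width * width * n_layers


def unfused_global_bytes_per_sample(width: int, n_layers: int) -> int:
    # Layer-by-layer execution reads each layer's input activations from
    # global memory and writes its output activations back to it.
    return 2 * width * n_layers * BYTES_PER_VALUE


def fused_global_bytes_per_sample(width: int) -> int:
    # Fully fused: only the network input and final output touch global
    # memory; intermediate activations stay in registers/shared memory.
    return 2 * width * BYTES_PER_VALUE


def arithmetic_intensity(flops: int, nbytes: int) -> float:
    # FLOPs performed per byte moved to or from global memory.
    return flops / nbytes


if __name__ == "__main__":
    n_layers = 4
    for width in (32, 64, 128):
        f = flops_per_sample(width, n_layers)
        u = unfused_global_bytes_per_sample(width, n_layers)
        g = fused_global_bytes_per_sample(width)
        print(f"width={width:4d}  FLOPs={f:8d}  "
              f"unfused AI={arithmetic_intensity(f, u):6.1f}  "
              f"fused AI={arithmetic_intensity(f, g):6.1f} FLOP/byte")
```

Under these assumptions the unfused intensity is width/2 regardless of depth, so a narrow network stays memory bound on hardware whose compute throughput far exceeds its bandwidth. Fusing the layers multiplies the intensity by the number of layers (width x layers / 2), which is the effect the abstract attributes to keeping intermediate results in fast on-chip memory.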

Status:

Application

Type:

Utility

Filing date:

7 Jun 2021

Issue date:

8 Sep 2022

Full patent description
