B Wu, F Iandola, PH Jin… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
Object detection is a crucial task for autonomous driving. In addition to requiring high accuracy to ensure safety, object detection for autonomous driving also requires real-time …
D Foley, J Danskin - IEEE Micro, 2017 - ieeexplore.ieee.org
This article introduces Nvidia's high-performance Pascal GPU. GP100 features in-package high-bandwidth memory, support for efficient FP16 operations, unified memory, and …
S Mittal - IEEE Transactions on Parallel and Distributed …, 2016 - ieeexplore.ieee.org
To support their massively-multithreaded architecture, GPUs use very large register file (RF) which has a capacity higher than even L1 and L2 caches. In total contrast, traditional CPUs …
X Chen, LW Chang, CI Rodrigues, J Lv… - 2014 47th Annual …, 2014 - ieeexplore.ieee.org
With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many applications that have regular memory access patterns. To support …
Modern scientific discovery is driven by an insatiable demand for computing performance. The HPC community is targeting development of supercomputers able to sustain 1 ExaFlops …
The massive parallel architecture enables graphics processing units (GPUs) to boost performance for a wide range of applications. Initially, GPUs only employ scratchpad …
Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and …
A Sethia, DA Jamshidi, S Mahlke - 2015 IEEE 21st International …, 2015 - ieeexplore.ieee.org
With the prevalence of GPUs as throughput engines for data parallel workloads, the landscape of GPU computing is changing significantly. Non-graphics workloads with high …
This paper introduces a new resource virtualization framework, Zorua, that decouples the programmer-specified resource usage of a GPU application from the actual allocation in the …