A survey of methods for analyzing and improving GPU energy efficiency

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
Recent years have witnessed phenomenal growth in the computational capabilities and
applications of GPUs. However, this trend has also led to a dramatic increase in their power …

SqueezeDet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving

B Wu, F Iandola, PH Jin… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
Object detection is a crucial task for autonomous driving. In addition to requiring high
accuracy to ensure safety, object detection for autonomous driving also requires real-time …

Ultra-performance Pascal GPU and NVLink interconnect

D Foley, J Danskin - IEEE Micro, 2017 - ieeexplore.ieee.org
This article introduces Nvidia's high-performance Pascal GPU. GP100 features in-package
high-bandwidth memory, support for efficient FP16 operations, unified memory, and …

A survey of techniques for architecting and managing GPU register file

S Mittal - IEEE Transactions on Parallel and Distributed …, 2016 - ieeexplore.ieee.org
To support their massively-multithreaded architecture, GPUs use a very large register file (RF)
which has a capacity higher than even L1 and L2 caches. In total contrast, traditional CPUs …

Adaptive cache management for energy-efficient GPU computing

X Chen, LW Chang, CI Rodrigues, J Lv… - 2014 47th Annual …, 2014 - ieeexplore.ieee.org
With the SIMT execution model, GPUs can hide memory latency through massive
multithreading for many applications that have regular memory access patterns. To support …

Scaling the power wall: a path to exascale

O Villa, DR Johnson, M O'Connor… - SC'14: Proceedings …, 2014 - ieeexplore.ieee.org
Modern scientific discovery is driven by an insatiable demand for computing performance.
The HPC community is targeting development of supercomputers able to sustain 1 ExaFlops …

Coordinated static and dynamic cache bypassing for GPUs

X Xie, Y Liang, Y Wang, G Sun… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org
The massive parallel architecture enables graphics processing units (GPUs) to boost
performance for a wide range of applications. Initially, GPUs only employ scratchpad …

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps

N Vijaykumar, G Pekhimenko, A Jog… - ACM SIGARCH …, 2015 - dl.acm.org
Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent
execution of thousands of threads. Unfortunately, different bottlenecks during execution and …

Mascar: Speeding up GPU warps by reducing memory pitstops

A Sethia, DA Jamshidi, S Mahlke - 2015 IEEE 21st International …, 2015 - ieeexplore.ieee.org
With the prevalence of GPUs as throughput engines for data parallel workloads, the
landscape of GPU computing is changing significantly. Non-graphics workloads with high …

Zorua: A holistic approach to resource virtualization in GPUs

N Vijaykumar, K Hsieh, G Pekhimenko… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org
This paper introduces a new resource virtualization framework, Zorua, that decouples the
programmer-specified resource usage of a GPU application from the actual allocation in the …