A survey of methods for analyzing and improving GPU energy efficiency

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
Recent years have witnessed phenomenal growth in the computational capabilities and
applications of GPUs. However, this trend has also led to a dramatic increase in their power …

A survey of architectural techniques for managing process variation

S Mittal - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
Process variation—deviation in parameters from their nominal specifications—threatens to
slow down and even pause technological scaling, and mitigation of it is the way to continue …

Warped-compression: Enabling power efficient GPUs through register compression

S Lee, K Kim, G Koo, H Jeon, WW Ro… - ACM SIGARCH …, 2015 - dl.acm.org
This paper presents Warped-Compression, a warp-level register compression scheme for
reducing GPU power consumption. This work is motivated by the observation that the …

Exploration of GPGPU register file architecture using domain-wall-shift-write based racetrack memory

M Mao, W Wen, Y Zhang, Y Chen, H Li - Proceedings of the 51st Annual …, 2014 - dl.acm.org
SRAM-based register file (RF) is one of the major factors limiting the scaling of GPGPU. In
this work, we propose to use the emerging nonvolatile domain-wall-shift-write-based …

LTRF: Enabling high-capacity register files for GPUs via hardware/software cooperative register prefetching

M Sadrosadati, A Mirhosseini, SB Ehsani… - ACM SIGPLAN …, 2018 - dl.acm.org
Graphics Processing Units (GPUs) employ large register files to accommodate all active
threads and accelerate context switching. Unfortunately, register files are a scalability …

Logics with aggregate operators

L Hella, L Libkin, J Nurmonen, L Wong - Journal of the ACM (JACM), 2001 - dl.acm.org
We study adding aggregate operators, such as summing up elements of a column of a
relation, to logics with counting mechanisms. The primary motivation comes from database …

Warped-preexecution: A GPU pre-execution approach for improving latency hiding

K Kim, S Lee, MK Yoon, G Koo, WW Ro… - … Symposium on High …, 2016 - ieeexplore.ieee.org
This paper presents a pre-execution approach for improving GPU performance, called
P-mode (pre-execution mode). GPUs utilize a number of concurrent threads for hiding …

A survey of techniques for architecting and managing GPU register file

S Mittal - IEEE Transactions on Parallel and Distributed …, 2016 - ieeexplore.ieee.org
To support their massively multithreaded architecture, GPUs use a very large register file (RF)
which has a capacity higher than even L1 and L2 caches. In total contrast, traditional CPUs …

CryoCache: A fast, large, and cost-effective cache architecture for cryogenic computing

D Min, I Byun, GH Lee, S Na, J Kim - Proceedings of the Twenty-Fifth …, 2020 - dl.acm.org
Cryogenic computing, which is to run a computer at extremely low temperatures (e.g., 77K), is
a highly promising solution to dramatically improve the computer's performance and power …

Regless: Just-in-time operand staging for GPUs

J Kloosterman, J Beaumont, DA Jamshidi… - Proceedings of the 50th …, 2017 - dl.acm.org
The register file is one of the largest and most power-hungry structures in a Graphics
Processing Unit (GPU), because massive multithreading requires all the register state for …