Zorua: A holistic approach to resource virtualization in GPUs

N Vijaykumar, K Hsieh, G Pekhimenko… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org
This paper introduces a new resource virtualization framework, Zorua, that decouples the
programmer-specified resource usage of a GPU application from the actual allocation in the …

Improving GPGPU energy-efficiency through concurrent kernel execution and DVFS

Q Jiao, M Lu, HP Huynh, T Mitra - 2015 IEEE/ACM International …, 2015 - ieeexplore.ieee.org
Current generation GPUs can accelerate high-performance, compute-intensive applications
by exploiting massive thread-level parallelism. The high performance, however, comes at …

Virtual thread: Maximizing thread-level parallelism beyond GPU scheduling limit

MK Yoon, K Kim, S Lee, WW Ro… - ACM SIGARCH Computer …, 2016 - dl.acm.org
Modern GPUs require tens of thousands of concurrent threads to fully utilize the massive
amount of processing resources. However, thread concurrency in GPUs can be diminished …

Warped-preexecution: A GPU pre-execution approach for improving latency hiding

K Kim, S Lee, MK Yoon, G Koo, WW Ro… - … Symposium on High …, 2016 - ieeexplore.ieee.org
This paper presents a pre-execution approach for improving GPU performance, called
P-mode (pre-execution mode). GPUs utilize a number of concurrent threads for hiding …

RegLess: Just-in-time operand staging for GPUs

J Kloosterman, J Beaumont, DA Jamshidi… - Proceedings of the 50th …, 2017 - dl.acm.org
The register file is one of the largest and most power-hungry structures in a Graphics
Processing Unit (GPU), because massive multithreading requires all the register state for …

Unified on-chip memory allocation for SIMT architecture

AB Hayes, EZ Zhang - Proceedings of the 28th ACM international …, 2014 - dl.acm.org
The popularity of general purpose Graphic Processing Unit (GPU) is largely attributed to the
tremendous concurrency enabled by its underlying architecture--single instruction multiple …

A stall-aware warp scheduling for dynamically optimizing thread-level parallelism in GPGPUs

Y Yu, W Xiao, X He, H Guo, Y Wang… - Proceedings of the 29th …, 2015 - dl.acm.org
General-Purpose Graphic Processing Units (GPGPU) have been widely used in high
performance computing as application accelerators due to their massive parallelism and …

Phase aware warp scheduling: Mitigating effects of phase behavior in GPGPU applications

M Awatramani, X Zhu, J Zambreno… - … Conference on Parallel …, 2015 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) have been widely adopted as accelerators for high
performance computing due to the immense amount of computational throughput they offer …

[BOOK][B] Decoupled vector-fetch architecture with a scalarizing compiler

Y Lee - 2016 - search.proquest.com
As we approach the end of conventional technology scaling, computer architects are forced
to incorporate specialized and heterogeneous accelerators into general-purpose processors …

GPU NTC process variation compensation with voltage stacking

RT Possignolo, E Ebrahimi… - … Transactions on Very …, 2018 - ieeexplore.ieee.org
Near-threshold computing (NTC) has the potential to significantly improve efficiency in high
throughput architectures, such as general-purpose computing on graphic processing unit …