A hierarchical thread scheduler and register file for energy-efficient throughput processors

N Vijaykumar, K Hsieh, G Pekhimenko… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org

This paper introduces a new resource virtualization framework, Zorua, that decouples the
programmer-specified resource usage of a GPU application from the actual allocation in the …

Cited by 84 · Related articles · All 27 versions

Improving GPGPU energy-efficiency through concurrent kernel execution and DVFS

Q Jiao, M Lu, HP Huynh, T Mitra - 2015 IEEE/ACM International …, 2015 - ieeexplore.ieee.org

Current generation GPUs can accelerate high-performance, compute-intensive applications
by exploiting massive thread-level parallelism. The high performance, however, comes at …

Cited by 84 · Related articles · All 7 versions

Virtual thread: Maximizing thread-level parallelism beyond GPU scheduling limit

MK Yoon, K Kim, S Lee, WW Ro… - ACM SIGARCH Computer …, 2016 - dl.acm.org

Modern GPUs require tens of thousands of concurrent threads to fully utilize the massive
amount of processing resources. However, thread concurrency in GPUs can be diminished …

Cited by 68 · Related articles · All 6 versions

Warped-preexecution: A GPU pre-execution approach for improving latency hiding

K Kim, S Lee, MK Yoon, G Koo, WW Ro… - … Symposium on High …, 2016 - ieeexplore.ieee.org

This paper presents a pre-execution approach for improving GPU performance, called P-mode
(pre-execution mode). GPUs utilize a number of concurrent threads for hiding …

Cited by 61 · Related articles · All 6 versions

Regless: Just-in-time operand staging for GPUs

J Kloosterman, J Beaumont, DA Jamshidi… - Proceedings of the 50th …, 2017 - dl.acm.org

The register file is one of the largest and most power-hungry structures in a Graphics
Processing Unit (GPU), because massive multithreading requires all the register state for …

Cited by 45 · Related articles · All 4 versions

Unified on-chip memory allocation for SIMT architecture

AB Hayes, EZ Zhang - Proceedings of the 28th ACM international …, 2014 - dl.acm.org

The popularity of general purpose Graphic Processing Unit (GPU) is largely attributed to the
tremendous concurrency enabled by its underlying architecture--single instruction multiple …

Cited by 37 · Related articles · All 4 versions

A stall-aware warp scheduling for dynamically optimizing thread-level parallelism in GPGPUs

Y Yu, W Xiao, X He, H Guo, Y Wang… - Proceedings of the 29th …, 2015 - dl.acm.org

General-Purpose Graphic Processing Units (GPGPU) have been widely used in high
performance computing as application accelerators due to their massive parallelism and …

Cited by 30 · Related articles

Phase aware warp scheduling: Mitigating effects of phase behavior in GPGPU applications

M Awatramani, X Zhu, J Zambreno… - … Conference on Parallel …, 2015 - ieeexplore.ieee.org

Graphics Processing Units (GPUs) have been widely adopted as accelerators for high
performance computing due to the immense amount of computational throughput they offer …

Cited by 21 · Related articles · All 6 versions

[BOOK] Decoupled vector-fetch architecture with a scalarizing compiler

Y Lee - 2016 - search.proquest.com

As we approach the end of conventional technology scaling, computer architects are forced
to incorporate specialized and heterogeneous accelerators into general-purpose processors …

Cited by 19 · Related articles · All 7 versions

GPU NTC process variation compensation with voltage stacking

RT Possignolo, E Ebrahimi… - … Transactions on Very …, 2018 - ieeexplore.ieee.org

Near-threshold computing (NTC) has the potential to significantly improve efficiency in high
throughput architectures, such as general-purpose computing on graphic processing unit …

Cited by 16 · Related articles · All 8 versions