Analyzing CUDA workloads using a detailed GPU simulator

A Bakhoda, GL Yuan, WWL Fung… - … analysis of systems …, 2009 - ieeexplore.ieee.org
Modern Graphic Processing Units (GPUs) provide sufficiently flexible programming models
that understanding their performance can provide insight in designing tomorrow's manycore …

Thread block compaction for efficient SIMT control flow

WWL Fung, TM Aamodt - 2011 IEEE 17th international …, 2011 - ieeexplore.ieee.org
Manycore accelerators such as graphics processor units (GPUs) organize processing units
into single-instruction, multiple data “cores” to improve throughput per unit hardware cost …

Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators

Y Lee, R Avizienis, A Bishara, R Xia… - Proceedings of the 38th …, 2011 - dl.acm.org
We present a taxonomy and modular implementation approach for data-parallel
accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) …

Polymorphic pipeline array: a flexible multicore accelerator with virtualized execution for mobile multimedia applications

H Park, Y Park, S Mahlke - Proceedings of the 42nd Annual IEEE/ACM …, 2009 - dl.acm.org
Mobile computing in the form of smart phones, netbooks, and personal digital assistants has
become an integral part of our everyday lives. Moving ahead to the next generation of …

Rigel: An architecture and scalable programming interface for a 1000-core accelerator

JH Kelm, DR Johnson, MR Johnson… - Proceedings of the 36th …, 2009 - dl.acm.org
This paper considers Rigel, a programmable accelerator architecture for a broad class of
data- and task-parallel computation. Rigel comprises 1000+ hierarchically-organized cores …

PULP: A ultra-low power parallel accelerator for energy-efficient and flexible embedded vision

F Conti, D Rossi, A Pullini, I Loi, L Benini - Journal of Signal Processing …, 2016 - Springer
Novel pervasive devices such as smart surveillance cameras and autonomous micro-UAVs
could greatly benefit from the availability of a computing device supporting embedded …

SGRT: A mobile GPU architecture for real-time ray tracing

WJ Lee, Y Shin, J Lee, JW Kim, JH Nah… - Proceedings of the 5th …, 2013 - dl.acm.org
Recently, with the increasing demand for photorealistic graphics and the rapid advances in
desktop CPUs/GPUs, real-time ray tracing has attracted considerable attention …

RayCore: A ray-tracing hardware architecture for mobile devices

JH Nah, HJ Kwon, DS Kim, CH Jeong, J Park… - ACM Transactions on …, 2014 - dl.acm.org
We present RayCore, a mobile ray-tracing hardware architecture. RayCore facilitates
high-quality rendering effects, such as reflection, refraction, and shadows, on mobile devices by …

A variable warp size architecture

TG Rogers, DR Johnson, M O'Connor… - ACM SIGARCH …, 2015 - dl.acm.org
This paper studies the effect of warp sizing and scheduling on performance and efficiency in
GPUs. We propose Variable Warp Sizing (VWS) which improves the performance of …

T&I engine: Traversal and intersection engine for hardware accelerated ray tracing

JH Nah, JS Park, C Park, JW Kim, YH Jung… - Proceedings of the …, 2011 - dl.acm.org
Ray tracing naturally supports high-quality global illumination effects, but it is
computationally costly. Traversal and intersection operations dominate the computation of …