Simultaneous branch and warp interweaving for sustained GPU performance

N Brunie, C Collange, G Diamos - ACM SIGARCH Computer Architecture …, 2012 - dl.acm.org
Single-Instruction Multiple-Thread (SIMT) micro-architectures implemented in Graphics
Processing Units (GPUs) run fine-grained threads in lockstep by grouping them into units …

R2D2: Removing ReDunDancy Utilizing Linearity of Address Generation in GPUs

D Ha, Y Oh, WW Ro - Proceedings of the 50th Annual International …, 2023 - dl.acm.org
A generally used GPU programming methodology is that adjacent threads access data in
neighbor or specific-stride memory addresses and perform computations with the fetched …

Rhythm: Harnessing data parallel hardware for server workloads

SR Agrawal, V Pistol, J Pang, J Tran, D Tarjan… - ACM SIGPLAN …, 2014 - dl.acm.org
Trends in increasing web traffic demand an increase in server throughput while preserving
energy efficiency and total cost of ownership. Present work in optimizing data center …

Compile-time function memoization

A Suresh, E Rohou, A Seznec - … of the 26th International Conference on …, 2017 - dl.acm.org
Memoization is the technique of saving the results of computations so that future executions
can be omitted when the same inputs repeat. Recent work showed that memoization can be …

Temporal SIMT execution optimization through elimination of redundant operations

RM Krashinsky - US Patent 9,830,156, 2017 - Google Patents
One embodiment of the present invention sets forth a technique for optimizing parallel
thread execution in a temporal single-instruction multiple thread (SIMT) architecture. When …

Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement

P Xiang, Y Yang, M Mantor, N Rubin, LR Hsu… - Proceedings of the 27th …, 2013 - dl.acm.org
State-of-art graphics processing units (GPUs) employ the single-instruction multiple-data
(SIMD) style execution to achieve both high computational throughput and energy efficiency …

Microarchitectural mechanisms to exploit value structure in SIMT architectures

J Kim, C Torng, S Srinath, D Lockhart… - Proceedings of the 40th …, 2013 - dl.acm.org
SIMT architectures improve performance and efficiency by exploiting control and memory-
access structure across data-parallel threads. Value structure occurs when multiple threads …

Dynamic inter-thread vectorization architecture: extracting DLP from TLP

S Kalathingal, C Collange, BN Swamy… - … Architecture and High …, 2016 - ieeexplore.ieee.org
Threads of Single-Program Multiple-Data (SPMD) applications often execute the same
instructions on different data. We propose the Dynamic Inter-Thread Vectorization …

big.VLITTLE: On-demand data-parallel acceleration for mobile systems on chip

T Ta, K Al-Hawaj, N Cebry, Y Ou, E Hall… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Single-ISA heterogeneous multi-core architectures offer a compelling high-performance and
high-efficiency solution to executing task-parallel workloads in mobile systems on chip …

SIMT-X: Extending single-instruction multi-threading to out-of-order cores

A Tino, C Collange, A Seznec - ACM Transactions on Architecture and …, 2020 - dl.acm.org
This work introduces Single Instruction Multi-Thread Express (SIMT-X), a general-purpose
Central Processing Unit (CPU) microarchitecture that enables Graphics Processing Units …