R2D2: Removing ReDunDancy Utilizing Linearity of Address Generation in GPUs

D Ha, Y Oh, WW Ro - Proceedings of the 50th Annual International …, 2023 - dl.acm.org
A commonly used GPU programming methodology is for adjacent threads to access data at
neighboring or specific-stride memory addresses and perform computations with the fetched …
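
As context for the access pattern the snippet describes (not R2D2's redundancy-removal mechanism itself), a minimal CUDA sketch of linear, thread-indexed address generation is shown below; the kernel and variable names are hypothetical, and stride 1 corresponds to the fully coalesced "neighboring address" case.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread derives its address linearly from its block and thread indices,
// so adjacent threads touch adjacent (or fixed-stride) elements.
__global__ void scale_strided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // linear address generation
    int idx = i * stride;                            // stride 1 => fully coalesced
    if (idx < n)
        out[idx] = 2.0f * in[idx];
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = (float)i;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_strided<<<blocks, threads>>>(in, out, n, /*stride=*/1);
    cudaDeviceSynchronize();

    printf("out[42] = %f\n", out[42]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```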

Gated-CNN: Combating NBTI and HCI aging effects in on-chip activation memories of Convolutional Neural Network accelerators

NL Muñoz, A Valero, RG Tejero, D Zoni - Journal of Systems Architecture, 2022 - Elsevier
Negative Bias Temperature Instability (NBTI) and Hot Carrier Injection (HCI) are two
of the main reliability threats in current technology nodes. These aging phenomena degrade …

WIR: Warp instruction reuse to minimize repeated computations in GPUs

K Kim, WW Ro - 2018 IEEE International Symposium on High …, 2018 - ieeexplore.ieee.org
Warp instructions that perform an identical arithmetic operation on the same input values produce
identical computation results. This paper proposes warp instruction reuse to allow such …
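
To illustrate the redundancy the snippet refers to (an identical operation on identical inputs yields identical results), a minimal CUDA sketch follows in which one lane computes a warp-uniform result and the other lanes reuse it via __shfl_sync. This is only a software analogue under assumed names, not the paper's hardware reuse mechanism.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for a costly arithmetic operation (hypothetical).
__device__ float expensive_op(float x) {
    float acc = x;
    for (int i = 0; i < 100; ++i)
        acc = acc * 1.000001f + 0.5f;
    return acc;
}

// All 32 lanes of a warp hold the same operand here, so the operation would
// produce 32 identical results. Lane 0 computes it once and the result is
// broadcast to the rest of the warp instead of being recomputed.
__global__ void reuse_demo(const float *uniform_in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int lane = threadIdx.x & 31;
    float x = uniform_in[i >> 5];           // same per-warp input for every lane

    float r = 0.0f;
    if (lane == 0)
        r = expensive_op(x);                // compute once per warp
    r = __shfl_sync(0xffffffffu, r, 0);     // reuse the result in all other lanes

    out[i] = r;
}

int main() {
    const int n = 1024;                     // 32 warps of 32 threads
    float *in, *out;
    cudaMallocManaged(&in, (n / 32) * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int w = 0; w < n / 32; ++w) in[w] = (float)w;

    reuse_demo<<<n / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("out[0] = %f, out[31] = %f\n", out[0], out[31]);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```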

Hi-End: Hierarchical, endurance-aware STT-MRAM-based register file for energy-efficient GPUs

W Jeon, JH Park, Y Kim, G Koo, WW Ro - IEEE Access, 2020 - ieeexplore.ieee.org
Modern Graphics Processing Units (GPUs) require large hardware resources for massively
parallel thread execution. In particular, modern GPUs have a large register file composed …

CASH-RF: A compiler-assisted hierarchical register file in GPUs

Y Oh, I Jeong, WW Ro, MK Yoon - IEEE Embedded Systems …, 2022 - ieeexplore.ieee.org
Spin-transfer torque magnetic random-access memory (STT-MRAM) is an emerging
nonvolatile memory technology that has received significant attention due to its higher …

Conflict-aware compiler for hierarchical register file on GPUs

E Jeong, ES Park, G Koo, Y Oh, MK Yoon - Journal of Systems Architecture, 2024 - Elsevier
Modern graphics processing units (GPUs) leverage a high degree of thread-level
parallelism, necessitating large-sized register files for storing numerous thread contexts. To …

TEA-RC: Thread Context-Aware Register Cache for GPUs

I Jeong, Y Oh, WW Ro, MK Yoon - IEEE Access, 2022 - ieeexplore.ieee.org
Graphics processing units (GPUs) achieve high throughput by exploiting a high degree of
thread-level parallelism (TLP). To support such high TLP, GPUs have a large-sized register …

Energy-Aware Query Processing: A Case Study on Join Reordering

L Bellatreche, F Djellali, W Macyna… - … Conference on Big …, 2023 - ieeexplore.ieee.org
Analytic processing systems have traditionally been designed to optimize time performance,
leaving energy as a secondary aspect. Over the past decade, however, there has …

MBZip: Multiblock data compression

R Kanakagiri, B Panda, M Mutyam - ACM Transactions on Architecture …, 2017 - dl.acm.org
Compression techniques at the last-level cache and in DRAM play an important role in
improving system performance by increasing their effective capacities. A compressed block …

An aging-aware GPU register file design based on data redundancy

A Valero, F Candel, D Suárez-Gracia… - IEEE Transactions …, 2018 - ieeexplore.ieee.org
Nowadays, GPUs sit at the forefront of high-performance computing thanks to their massive
computational capabilities. Internally, thousands of functional units, architected to be fed by …