R2D2: Removing ReDunDancy Utilizing Linearity of Address Generation in GPUs

D Ha, Y Oh, WW Ro - Proceedings of the 50th Annual International …, 2023 - dl.acm.org
A commonly used GPU programming methodology is for adjacent threads to access data at
neighboring or specific-stride memory addresses and perform computations with the fetched …
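
As context for the access pattern the snippet describes (not R2D2's redundancy-removal mechanism itself), a minimal CUDA sketch of linear, thread-indexed address generation is shown below; the kernel and variable names are hypothetical, and stride 1 corresponds to the fully coalesced "neighboring address" case.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread derives its address linearly from its block and thread indices,
// so adjacent threads touch adjacent (or fixed-stride) elements.
__global__ void scale_strided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // linear address generation
    int idx = i * stride;                            // stride 1 => fully coalesced
    if (idx < n)
        out[idx] = 2.0f * in[idx];
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = (float)i;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_strided<<<blocks, threads>>>(in, out, n, /*stride=*/1);
    cudaDeviceSynchronize();

    printf("out[42] = %f\n", out[42]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```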

Gated-CNN: Combating NBTI and HCI aging effects in on-chip activation memories of Convolutional Neural Network accelerators

NL Muñoz, A Valero, RG Tejero, D Zoni - Journal of Systems Architecture, 2022 - Elsevier
Negative Bias Temperature Instability (NBTI) and Hot Carrier Injection (HCI) are two
of the main reliability threats in current technology nodes. These aging phenomena degrade …

WIR: Warp instruction reuse to minimize repeated computations in GPUs

K Kim, WW Ro - 2018 IEEE International Symposium on High …, 2018 - ieeexplore.ieee.org
Warp instructions that perform an identical arithmetic operation on the same input values produce
identical computation results. This paper proposes warp instruction reuse to allow such …
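
To illustrate the redundancy the snippet refers to (an identical operation on identical inputs yields identical results), a minimal CUDA sketch follows in which one lane computes a warp-uniform result and the other lanes reuse it via __shfl_sync. This is only a software analogue under assumed names, not the paper's hardware reuse mechanism.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for a costly arithmetic operation (hypothetical).
__device__ float expensive_op(float x) {
    float acc = x;
    for (int i = 0; i < 100; ++i)
        acc = acc * 1.000001f + 0.5f;
    return acc;
}

// All 32 lanes of a warp hold the same operand here, so the operation would
// produce 32 identical results. Lane 0 computes it once and the result is
// broadcast to the rest of the warp instead of being recomputed.
__global__ void reuse_demo(const float *uniform_in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int lane = threadIdx.x & 31;
    float x = uniform_in[i >> 5];           // same per-warp input for every lane

    float r = 0.0f;
    if (lane == 0)
        r = expensive_op(x);                // compute once per warp
    r = __shfl_sync(0xffffffffu, r, 0);     // reuse the result in all other lanes

    out[i] = r;
}

int main() {
    const int n = 1024;                     // 32 warps of 32 threads
    float *in, *out;
    cudaMallocManaged(&in, (n / 32) * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int w = 0; w < n / 32; ++w) in[w] = (float)w;

    reuse_demo<<<n / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("out[0] = %f, out[31] = %f\n", out[0], out[31]);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```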

Hi-End: Hierarchical, endurance-aware STT-MRAM-based register file for energy-efficient GPUs

W Jeon, JH Park, Y Kim, G Koo, WW Ro - IEEE Access, 2020 - ieeexplore.ieee.org
Modern Graphics Processing Units (GPUs) require large hardware resources for massively
parallel thread execution. In particular, modern GPUs have a large register file composed …

CASH-RF: A compiler-assisted hierarchical register file in GPUs

Y Oh, I Jeong, WW Ro, MK Yoon - IEEE Embedded Systems …, 2022 - ieeexplore.ieee.org
Spin-transfer torque magnetic random-access memory (STT-MRAM) is an emerging
nonvolatile memory technology that has received significant attention due to its higher …

Conflict-aware compiler for hierarchical register file on GPUs

E Jeong, ES Park, G Koo, Y Oh, MK Yoon - Journal of Systems Architecture, 2024 - Elsevier
Modern graphics processing units (GPUs) leverage a high degree of thread-level
parallelism, necessitating large-sized register files for storing numerous thread contexts. To …

TEA-RC: Thread Context-Aware Register Cache for GPUs

I Jeong, Y Oh, WW Ro, MK Yoon - IEEE Access, 2022 - ieeexplore.ieee.org
Graphics processing units (GPUs) achieve high throughput by exploiting a high degree of
thread-level parallelism (TLP). To support such high TLP, GPUs have a large-sized register …

Energy-Aware Query Processing: A Case Study on Join Reordering

L Bellatreche, F Djellali, W Macyna… - … Conference on Big …, 2023 - ieeexplore.ieee.org
Analytic processing systems have traditionally been designed to optimize time performance,
leaving energy as a secondary aspect. Over the past decade, however, there has …

MBZip: Multiblock data compression

R Kanakagiri, B Panda, M Mutyam - ACM Transactions on Architecture …, 2017 - dl.acm.org
Compression techniques at the last-level cache and in DRAM play an important role in
improving system performance by increasing their effective capacities. A compressed block …

An aging-aware GPU register file design based on data redundancy

A Valero, F Candel, D Suárez-Gracia… - IEEE Transactions …, 2018 - ieeexplore.ieee.org
Nowadays, GPUs sit at the forefront of high-performance computing thanks to their massive
computational capabilities. Internally, thousands of functional units, architected to be fed by …