BlockMaestro: Enabling programmer-transparent task-based execution in GPU systems

AA Abdolrashidi, HA Esfeden… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
As modern GPU workloads grow in size and complexity, there is an ever-increasing demand
for GPU computational power. Emerging workloads contain hundreds or thousands of GPU …

BOW: Breathing operand windows to exploit bypassing in GPUs

HA Esfeden, A Abdolrashidi, S Rahman… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
The Register File (RF) is a critical structure in Graphics Processing Units (GPUs) responsible
for a large portion of the area and power. To simplify the architecture of the RF, it is …

Slumber: static-power management for GPGPU register files

D Tripathy, H Zamani, D Sahoo, LN Bhuyan… - Proceedings of the …, 2020 - dl.acm.org
The leakage power dissipation has become one of the major concerns with technology
scaling. The GPGPU register file has grown in size over the last decade in order to support the …

SHREG: Mitigating register redundancy in GPUs

S Jin, H Lee, J Lee, J Kim, WW Ro - Journal of Systems Architecture, 2024 - Elsevier
Graphics Processing Units (GPUs) have become dominant accelerators for
Machine Learning (ML) and High-Performance Computing (HPC) applications due to their …

Highly concurrent latency-tolerant register files for GPUs

M Sadrosadati, A Mirhosseini, A Hajiabadi… - ACM Transactions on …, 2021 - dl.acm.org
Graphics Processing Units (GPUs) employ large register files to accommodate all active
threads and accelerate context switching. Unfortunately, register files are a scalability …

A survey on recent hardware data prefetching approaches with an emphasis on servers

M Bakhshalipour, M Shakerinava, F Golshan… - arXiv preprint arXiv …, 2020 - arxiv.org
Data prefetching, i.e., the act of predicting application's future memory accesses and fetching
those that are not in the on-chip caches, is a well-known and widely-used approach to hide …

GPU Architecture

H Jeon - Handbook of Computer Architecture, 2023 - Springer
The graphics processing unit (GPU) became an undoubtedly important computing engine
for high-performance computing. With massive parallelism and easy programmability, GPU …

[BOOK][B] Improving Performance and Energy Efficiency of GPUs through Locality Analysis

D Tripathy - 2021 - search.proquest.com
The massive parallelism provided by general-purpose GPUs (GPGPUs) possessing
numerous compute threads in their streaming multiprocessors (SMs) and enormous memory …

[BOOK][B] Improving Data-Dependent Parallelism in GPUs Through Programmer-Transparent Architectural Support

AA Abdolrashidi - 2021 - search.proquest.com
As modern GPU workloads become larger and more complex, there is an ever-increasing
demand for GPU computational power. Traditionally, GPUs have lacked generalized data …

[BOOK][B] Enhanced Register Data-Flow Techniques for High-Performance, Energy-Efficient GPUs

HA Esfeden - 2021 - search.proquest.com
To avoid immoderate power consumption, the chip industry has shifted away from high
performance single threaded designs to high throughput multi-threaded designs. Graphic …