Energy-efficient mechanisms for managing thread context in throughput processors

M Gebhart, DR Johnson, D Tarjan, SW Keckler… - Proceedings of the 38th …, 2011 - dl.acm.org
Modern graphics processing units (GPUs) use a large number of hardware threads to hide
both function unit and memory access latency. Extreme multithreading requires a …

A survey of techniques for designing and managing CPU register file

S Mittal - Concurrency and Computation: Practice and …, 2017 - Wiley Online Library
Processor register file (RF) is an important microarchitectural component used for storing
operands and results of instructions. The design and operation of RF have crucial impact on …

LTRF: Enabling high-capacity register files for GPUs via hardware/software cooperative register prefetching

M Sadrosadati, A Mirhosseini, SB Ehsani… - ACM SIGPLAN …, 2018 - dl.acm.org
Graphics Processing Units (GPUs) employ large register files to accommodate all active
threads and accelerate context switching. Unfortunately, register files are a scalability …

A front-end execution architecture for high energy efficiency

R Shioya, M Goshima, H Ando - 2014 47th Annual IEEE/ACM …, 2014 - ieeexplore.ieee.org
Smart phones and tablets have recently become widespread and dominant in the computer
market. Users require that these mobile devices provide a high-quality experience and an …

A hierarchical thread scheduler and register file for energy-efficient throughput processors

M Gebhart, DR Johnson, D Tarjan, SW Keckler… - ACM Transactions on …, 2012 - dl.acm.org
Modern graphics processing units (GPUs) employ a large number of hardware threads to
hide both function unit and memory access latency. Extreme multithreading requires a …

Memento: An Adaptive, Compiler-Assisted Register File Cache for GPUs

MA Shoushtary, JM Arnau, JT Murgadas… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Modern GPUs require an enormous register file (RF) to store the context of thousands of
active threads. It consumes considerable energy and contains multiple large banks to …

Clockhands: Rename-free Instruction Set Architecture for Out-of-order Processors

T Koizumi, R Shioya, S Sugita, T Amano… - Proceedings of the 56th …, 2023 - dl.acm.org
Out-of-order superscalar processors are currently the only architecture that speeds up
irregular programs, but they suffer from poor power efficiency. To tackle this issue, we …

Selective register-file cache: an energy saving technique for embedded processor architecture

S Gudaparthi, R Shrestha - Design Automation for Embedded Systems, 2022 - Springer
Embedded system applications of present-day scenario consume profound energy in
execution and its significant fraction is due to an intensive register-file access in the …

Compiling and optimizing real-world programs for STRAIGHT ISA

T Koizumi, S Sugita, R Shioya… - 2021 IEEE 39th …, 2021 - ieeexplore.ieee.org
The renaming unit of a superscalar processor is a very expensive module. It consumes large
amounts of power and limits the front-end bandwidth. To overcome this problem, an …

Highly concurrent latency-tolerant register files for GPUs

M Sadrosadati, A Mirhosseini, A Hajiabadi… - ACM Transactions on …, 2021 - dl.acm.org
Graphics Processing Units (GPUs) employ large register files to accommodate all active
threads and accelerate context switching. Unfortunately, register files are a scalability …