Warped-slicer: Efficient intra-SM slicing through dynamic resource partitioning for GPU multiprogramming

Q Xu, H Jeon, K Kim, WW Ro… - ACM SIGARCH Computer …, 2016 - dl.acm.org
As technology scales, GPUs are forecasted to incorporate an ever-increasing amount of
computing resources to support thread-level parallelism. But even with the best effort …

Warped-compression: Enabling power efficient GPUs through register compression

S Lee, K Kim, G Koo, H Jeon, WW Ro… - ACM SIGARCH …, 2015 - dl.acm.org
This paper presents Warped-Compression, a warp-level register compression scheme for
reducing GPU power consumption. This work is motivated by the observation that the …

GPUGuard: Mitigating contention based side and covert channel attacks on GPUs

Q Xu, H Naghibijouybari, S Wang… - Proceedings of the …, 2019 - dl.acm.org
Graphics processing units (GPUs) are moving towards supporting concurrent kernel
execution where multiple kernels may be co-executed on the same GPU and even on the …

GreenMM: energy efficient GPU matrix multiplication through undervolting

H Zamani, Y Liu, D Tripathy, L Bhuyan… - Proceedings of the ACM …, 2019 - dl.acm.org
The current trend of ever-increasing performance in scientific applications comes with
tremendous growth in energy consumption. In this paper, we present GreenMM framework …

Warped-preexecution: A GPU pre-execution approach for improving latency hiding

K Kim, S Lee, MK Yoon, G Koo, WW Ro… - … Symposium on High …, 2016 - ieeexplore.ieee.org
This paper presents a pre-execution approach for improving GPU performance, called P-mode
(pre-execution mode). GPUs utilize a number of concurrent threads for hiding …

Approximating warps with intra-warp operand value similarity

D Wong, NS Kim, M Annavaram - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Value locality, the recurrence of a previously-seen value, has been the enabler of myriad
optimization techniques in traditional processors. Value similarity relaxes the constraint of …

ITAP: Idle-time-aware power management for GPU execution units

M Sadrosadati, SB Ehsani, H Falahati… - ACM Transactions on …, 2019 - dl.acm.org
Graphics Processing Units (GPUs) are widely used as the accelerator of choice for
applications with massively data-parallel tasks. However, recent studies show that GPUs …

Cross-core Data Sharing for Energy-efficient GPUs

H Falahati, M Sadrosadati, Q Xu… - ACM Transactions on …, 2024 - dl.acm.org
Graphics Processing Units (GPUs) are the accelerator of choice in a variety of application
domains, because they can accelerate massively parallel workloads and can be easily …

GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction

I Chaturvedi, BR Godala, Y Wu, Z Xu… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) use massive multi-threading coupled with static
scheduling to hide instruction latencies. Despite this, memory instructions pose a challenge …

Aging-aware workload management on embedded GPU under process variation

H Lee, M Shafique… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) have been employed in embedded systems to handle
increased amounts of computation and to satisfy the timing requirement. Due to the small …