Approximate communication: Techniques for reducing communication bottlenecks in large-scale parallel systems

F Betzel, K Khatamifard, H Suresh, DJ Lilja… - ACM Computing …, 2018 - dl.acm.org
Approximate computing has gained research attention recently as a way to increase energy
efficiency and/or performance by exploiting some applications' intrinsic error resiliency …

DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org
Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …

Transparent offloading and mapping (TOM) enabling programmer-transparent near-data processing in GPU systems

K Hsieh, E Ebrahimi, G Kim, N Chatterjee… - ACM SIGARCH …, 2016 - dl.acm.org
Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited
off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to …

Compressing DMA engine: Leveraging activation sparsity for training deep neural networks

M Rhu, M O'Connor, N Chatterjee… - … Symposium on High …, 2018 - ieeexplore.ieee.org
Popular deep learning frameworks require users to fine-tune their memory usage so that the
training data of a deep neural network (DNN) fits within the GPU physical memory. Prior …

Mosaic: a GPU memory manager with application-transparent support for multiple page sizes

R Ausavarungnirun, J Landgraf, V Miller… - Proceedings of the 50th …, 2017 - dl.acm.org
Contemporary discrete GPUs support rich memory management features such as virtual
memory and demand paging. These features simplify GPU programming by providing a …

What your DRAM power models are not telling you: Lessons from a detailed experimental study

S Ghose, AG Yaglikçi, R Gupta, D Lee… - Proceedings of the …, 2018 - dl.acm.org
Main memory (DRAM) consumes as much as half of the total system power in a computer
today, due to the increasing demand for memory capacity and bandwidth. There is a …

A framework for memory oversubscription management in graphics processing units

C Li, R Ausavarungnirun, CJ Rossbach… - Proceedings of the …, 2019 - dl.acm.org
Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …

MASK: Redesigning the GPU memory hierarchy to support multi-application concurrency

R Ausavarungnirun, V Miller, J Landgraf… - ACM SIGPLAN …, 2018 - dl.acm.org
Graphics Processing Units (GPUs) exploit large amounts of thread-level parallelism to
provide high instruction throughput and to efficiently hide long-latency stalls. The resulting …

A survey on PCM lifetime enhancement schemes

S Rashidi, M Jalili, H Sarbazi-Azad - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Phase Change Memory (PCM) is an emerging memory technology that has the capability to
address the growing demand for memory capacity and bridge the gap between the main …

Buddy compression: Enabling larger memory for deep learning and HPC workloads on GPUs

E Choukse, MB Sullivan, M O'Connor… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
GPUs accelerate high-throughput applications, which require orders-of-magnitude higher
memory bandwidth than traditional CPU-only systems. However, the capacity of such high …