A survey of recent prefetching techniques for processor caches

S Mittal - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
As the trends of process scaling make memory systems an even more crucial bottleneck, the
importance of latency hiding techniques such as prefetching grows further. However, naively …

A survey of architectural approaches for data compression in cache and main memory systems

S Mittal, JS Vetter - IEEE Transactions on Parallel and …, 2015 - ieeexplore.ieee.org
As the number of cores on a chip increases and key applications become even more data-intensive, memory systems in modern processors have to deal with increasingly large …

C-pack: A high-performance microprocessor cache compression algorithm

X Chen, L Yang, RP Dick, L Shang… - IEEE Transactions on …, 2009 - ieeexplore.ieee.org
Microprocessor designers have been torn between tight constraints on the amount of on-chip cache memory and the high latency of off-chip memory, such as dynamic random …

SC2: A statistical compression cache scheme

A Arelakis, P Stenstrom - ACM SIGARCH Computer Architecture News, 2014 - dl.acm.org
Low utilization of on-chip cache capacity limits performance and wastes energy because of
the long latency, limited bandwidth, and energy consumption associated with off-chip …

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps

N Vijaykumar, G Pekhimenko, A Jog… - ACM SIGARCH …, 2015 - dl.acm.org
Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent
execution of thousands of threads. Unfortunately, different bottlenecks during execution and …

PACMan: prefetch-aware cache management for high performance caching

CJ Wu, A Jaleel, M Martonosi, SC Steely Jr… - Proceedings of the 44th …, 2011 - dl.acm.org
Hardware prefetching and last-level cache (LLC) management are two independent
mechanisms to mitigate the growing latency to memory. However, the interaction between …

Daemon: Architectural support for efficient data movement in fully disaggregated systems

C Giannoula, K Huang, J Tang, N Koziris… - Proceedings of the …, 2023 - dl.acm.org
Resource disaggregation offers a cost effective solution to resource scaling, utilization, and
failure-handling in data centers by physically separating hardware devices in a server …

Understanding and improving the latency of DRAM-based memory systems

KK Chang - 2017 - search.proquest.com
Over the past two decades, the storage capacity and access bandwidth of main memory
have improved tremendously, by 128x and 20x, respectively. These improvements are …

Approximate communication: Techniques for reducing communication bottlenecks in large-scale parallel systems

F Betzel, K Khatamifard, H Suresh, DJ Lilja… - ACM Computing …, 2018 - dl.acm.org
Approximate computing has gained research attention recently as a way to increase energy
efficiency and/or performance by exploiting some applications' intrinsic error resiliency …

A case for toggle-aware compression for GPU systems

G Pekhimenko, E Bolotin, N Vijaykumar… - … Symposium on High …, 2016 - ieeexplore.ieee.org
Data compression can be an effective method to achieve higher system performance and
energy efficiency in modern data-intensive applications by exploiting redundancy and data …