Pythia: A customizable hardware prefetching framework using online reinforcement learning

R Bera, K Kanellopoulos, A Nori, T Shahroodi… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Past research has proposed numerous hardware prefetching techniques, most of which rely
on exploiting one specific type of program context information (e.g., program counter …

Continuous runahead: Transparent hardware acceleration for memory intensive workloads

M Hashemi, O Mutlu, YN Patt - 2016 49th Annual IEEE/ACM …, 2016 - ieeexplore.ieee.org
Runahead execution pre-executes the application's own code to generate new cache
misses. This pre-execution results in prefetch requests that are overwhelmingly accurate …

Decoupled vector runahead

A Naithani, J Roelandts, S Ainsworth… - Proceedings of the 56th …, 2023 - dl.acm.org
We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …

Hermes: Accelerating long-latency load requests via perceptron-based off-chip load prediction

R Bera, K Kanellopoulos… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Long-latency load requests continue to limit the performance of modern high-performance
processors. To increase the latency tolerance of a processor, architects have primarily relied …

A case for memory content-based detection and mitigation of data-dependent failures in DRAM

S Khan, C Wilkerson, D Lee… - IEEE Computer …, 2016 - ieeexplore.ieee.org
DRAM cells in close proximity can fail depending on the data content in neighboring cells.
These failures are called data-dependent failures. Detecting and mitigating these failures …

Minnow: Lightweight offload engines for worklist management and worklist-directed prefetching

D Zhang, X Ma, M Thomson, D Chiou - ACM SIGPLAN Notices, 2018 - dl.acm.org
The importance of irregular applications such as graph analytics is rapidly growing with the
rise of Big Data. However, parallel graph workloads tend to perform poorly on general …

Branch runahead: An alternative to branch prediction for impossible to predict branches

S Pruett, Y Patt - MICRO-54: 54th Annual IEEE/ACM International …, 2021 - dl.acm.org
High performance microprocessors require high levels of instruction supply. Branch
prediction has been the most important driver of this for nearly 30 years. Unfortunately …

Precise runahead execution

A Naithani, J Feliu, A Adileh… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
Runahead execution improves processor performance by accurately prefetching long-latency
memory accesses. When a long-latency load causes the instruction window to fill up …

Vector runahead

A Naithani, S Ainsworth, TM Jones… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
The memory wall places a significant limit on performance for many modern workloads.
These applications feature complex chains of dependent, indirect memory accesses, which …

Freeway: Maximizing MLP for slice-out-of-order execution

R Kumar, M Alipour… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
Exploiting memory level parallelism (MLP) is crucial to hide long memory and last level
cache access latencies. While out-of-order (OoO) cores, and techniques building on them …