M Hashemi, O Mutlu, YN Patt - 2016 49th Annual IEEE/ACM …, 2016 - ieeexplore.ieee.org
Runahead execution pre-executes the application's own code to generate new cache misses. This pre-execution results in prefetch requests that are overwhelmingly accurate …
We present Decoupled Vector Runahead (DVR), an in-core prefetching technique, executing separately to the main application thread, that exploits massive amounts of …
Long-latency load requests continue to limit the performance of modern high-performance processors. To increase the latency tolerance of a processor, architects have primarily relied …
DRAM cells in close proximity can fail depending on the data content in neighboring cells. These failures are called data-dependent failures. Detecting and mitigating these failures …
The importance of irregular applications such as graph analytics is rapidly growing with the rise of Big Data. However, parallel graph workloads tend to perform poorly on general …
S Pruett, Y Patt - MICRO-54: 54th Annual IEEE/ACM International …, 2021 - dl.acm.org
High performance microprocessors require high levels of instruction supply. Branch prediction has been the most important driver of this for nearly 30 years. Unfortunately …
A Naithani, J Feliu, A Adileh… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
Runahead execution improves processor performance by accurately prefetching long-latency memory accesses. When a long-latency load causes the instruction window to fill up …
The memory wall places a significant limit on performance for many modern workloads. These applications feature complex chains of dependent, indirect memory accesses, which …
R Kumar, M Alipour… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
Exploiting memory level parallelism (MLP) is crucial to hide long memory and last level cache access latencies. While out-of-order (OoO) cores, and techniques building on them …