Filtered runahead execution with a runahead buffer

R Bera, K Kanellopoulos, A Nori, T Shahroodi… - MICRO-54: 54th Annual …, 2021 - dl.acm.org

Past research has proposed numerous hardware prefetching techniques, most of which rely
on exploiting one specific type of program context information (e.g., program counter …

Cited by 75 · Related articles · All 7 versions

[PDF] cmu.edu

Continuous runahead: Transparent hardware acceleration for memory intensive workloads

M Hashemi, O Mutlu, YN Patt - 2016 49th Annual IEEE/ACM …, 2016 - ieeexplore.ieee.org

Runahead execution pre-executes the application's own code to generate new cache
misses. This pre-execution results in prefetch requests that are overwhelmingly accurate …

Cited by 124 · Related articles · All 14 versions

[PDF] cam.ac.uk

Decoupled vector runahead

A Naithani, J Roelandts, S Ainsworth… - Proceedings of the 56th …, 2023 - dl.acm.org

We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …

Cited by 9 · Related articles · All 9 versions

[PDF] arxiv.org

Hermes: Accelerating long-latency load requests via perceptron-based off-chip load prediction

R Bera, K Kanellopoulos… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org

Long-latency load requests continue to limit the performance of modern high-performance
processors. To increase the latency tolerance of a processor, architects have primarily relied …

Cited by 21 · Related articles · All 7 versions

[PDF] ieee.org

A case for memory content-based detection and mitigation of data-dependent failures in DRAM

S Khan, C Wilkerson, D Lee… - IEEE Computer …, 2016 - ieeexplore.ieee.org

DRAM cells in close proximity can fail depending on the data content in neighboring cells.
These failures are called data-dependent failures. Detecting and mitigating these failures …

Cited by 85 · Related articles · All 14 versions

[PDF] acm.org

Minnow: Lightweight offload engines for worklist management and worklist-directed prefetching

D Zhang, X Ma, M Thomson, D Chiou - ACM SIGPLAN Notices, 2018 - dl.acm.org

The importance of irregular applications such as graph analytics is rapidly growing with the
rise of Big Data. However, parallel graph workloads tend to perform poorly on general …

Cited by 57 · Related articles · All 4 versions

[PDF] acm.org

Branch runahead: An alternative to branch prediction for impossible to predict branches

S Pruett, Y Patt - MICRO-54: 54th Annual IEEE/ACM International …, 2021 - dl.acm.org

High performance microprocessors require high levels of instruction supply. Branch
prediction has been the most important driver of this for nearly 30 years. Unfortunately …

Cited by 18 · Related articles · All 4 versions

[PDF] ugent.be

Precise runahead execution

A Naithani, J Feliu, A Adileh… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org

Runahead execution improves processor performance by accurately prefetching long-
latency memory accesses. When a long-latency load causes the instruction window to fill up …

Cited by 28 · Related articles · All 21 versions

[PDF] cam.ac.uk

Vector runahead

A Naithani, S Ainsworth, TM Jones… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org

The memory wall places a significant limit on performance for many modern workloads.
These applications feature complex chains of dependent, indirect memory accesses, which …

Cited by 14 · Related articles · All 14 versions

[PDF] diva-portal.org

Freeway: Maximizing MLP for slice-out-of-order execution

R Kumar, M Alipour… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org

Exploiting memory level parallelism (MLP) is crucial to hide long memory and last level
cache access latencies. While out-of-order (OoO) cores, and techniques building on them …

Cited by 26 · Related articles · All 7 versions
