The load slice core microarchitecture

K Barber, A Bacha, L Zhou, Y Zhang… - 2019 28th …, 2019 - ieeexplore.ieee.org

Hardware security has recently re-surfaced as a first-order concern to the confidentiality
protections of computing systems. Meltdown and Spectre introduced a new class of …

Cited by 99 · Related articles · All 7 versions

CRISP: critical slice prefetching

H Litz, G Ayers, P Ranganathan - Proceedings of the 27th ACM …, 2022 - dl.acm.org

The high access latency of DRAM continues to be a performance challenge for
contemporary microprocessor systems. Prefetching is a well-established technique to …

Cited by 23 · Related articles · All 4 versions

Branch runahead: An alternative to branch prediction for impossible to predict branches

S Pruett, Y Patt - MICRO-54: 54th Annual IEEE/ACM International …, 2021 - dl.acm.org

High performance microprocessors require high levels of instruction supply. Branch
prediction has been the most important driver of this for nearly 30 years. Unfortunately …

Cited by 22 · Related articles · All 4 versions

Harvesting Memory-bound CPU Stall Cycles in Software with MSH

Z Luo, S Son, S Ratnasamy, S Shenker - 18th USENIX Symposium on …, 2024 - usenix.org

Memory-bound stalls account for a significant portion of CPU cycles in datacenter
workloads, which makes harvesting them to execute other useful work highly valuable …

Precise runahead execution

A Naithani, J Feliu, A Adileh… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org

Runahead execution improves processor performance by accurately prefetching
long-latency memory accesses. When a long-latency load causes the instruction window to fill up …

Cited by 32 · Related articles · All 21 versions

Vector runahead

A Naithani, S Ainsworth, TM Jones… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org

The memory wall places a significant limit on performance for many modern workloads.
These applications feature complex chains of dependent, indirect memory accesses, which …

Cited by 17 · Related articles · All 14 versions

Long term parking (LTP): Criticality-aware resource allocation in OoO processors

A Sembrant, T Carlson, E Hagersten… - Proceedings of the 48th …, 2015 - dl.acm.org

Modern processors employ large structures (IQ, LSQ, register file, etc.) to expose
instruction-level parallelism (ILP) and memory-level parallelism (MLP). These resources are typically …

Cited by 39 · Related articles · All 14 versions

The forward slice core microarchitecture

K Lakshminarasimhan, A Naithani, J Feliu… - Proceedings of the …, 2020 - dl.acm.org

Superscalar out-of-order cores deliver high performance at the cost of increased complexity
and power budget. In-order cores, in contrast, are less complex and have a smaller power …

Cited by 17 · Related articles · All 8 versions

Freeway: Maximizing MLP for slice-out-of-order execution

R Kumar, M Alipour… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org

Exploiting memory level parallelism (MLP) is crucial to hide long memory and last level
cache access latencies. While out-of-order (OoO) cores, and techniques building on them …

Cited by 27 · Related articles · All 7 versions

Delay and bypass: Ready and criticality aware instruction scheduling in out-of-order processors

M Alipour, S Kaxiras, D Black-Schaffer… - … Symposium on High …, 2020 - ieeexplore.ieee.org

Flexible instruction scheduling is essential for performance in out-of-order processors. This
is typically achieved by using CAM-based Instruction Queues (IQs) that provide complete …

Cited by 21 · Related articles · All 5 versions
