SpecShield: Shielding speculative data from microarchitectural covert channels

K Barber, A Bacha, L Zhou, Y Zhang… - 2019 28th …, 2019 - ieeexplore.ieee.org
Hardware security has recently re-surfaced as a first-order concern to the confidentiality
protections of computing systems. Meltdown and Spectre introduced a new class of …

CRISP: critical slice prefetching

H Litz, G Ayers, P Ranganathan - Proceedings of the 27th ACM …, 2022 - dl.acm.org
The high access latency of DRAM continues to be a performance challenge for
contemporary microprocessor systems. Prefetching is a well-established technique to …

Branch runahead: An alternative to branch prediction for impossible to predict branches

S Pruett, Y Patt - MICRO-54: 54th Annual IEEE/ACM International …, 2021 - dl.acm.org
High performance microprocessors require high levels of instruction supply. Branch
prediction has been the most important driver of this for nearly 30 years. Unfortunately …

Harvesting Memory-bound CPU Stall Cycles in Software with MSH

Z Luo, S Son, S Ratnasamy, S Shenker - 18th USENIX Symposium on …, 2024 - usenix.org
Memory-bound stalls account for a significant portion of CPU cycles in datacenter
workloads, which makes harvesting them to execute other useful work highly valuable …

Precise runahead execution

A Naithani, J Feliu, A Adileh… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
Runahead execution improves processor performance by accurately prefetching long-
latency memory accesses. When a long-latency load causes the instruction window to fill up …

Vector runahead

A Naithani, S Ainsworth, TM Jones… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
The memory wall places a significant limit on performance for many modern workloads.
These applications feature complex chains of dependent, indirect memory accesses, which …

Long term parking (LTP): criticality-aware resource allocation in OOO processors

A Sembrant, T Carlson, E Hagersten… - Proceedings of the 48th …, 2015 - dl.acm.org
Modern processors employ large structures (IQ, LSQ, register file, etc.) to expose instruction-
level parallelism (ILP) and memory-level parallelism (MLP). These resources are typically …

The forward slice core microarchitecture

K Lakshminarasimhan, A Naithani, J Feliu… - Proceedings of the …, 2020 - dl.acm.org
Superscalar out-of-order cores deliver high performance at the cost of increased complexity
and power budget. In-order cores, in contrast, are less complex and have a smaller power …

Freeway: Maximizing MLP for slice-out-of-order execution

R Kumar, M Alipour… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
Exploiting memory level parallelism (MLP) is crucial to hide long memory and last level
cache access latencies. While out-of-order (OoO) cores, and techniques building on them …

Delay and bypass: Ready and criticality aware instruction scheduling in out-of-order processors

M Alipour, S Kaxiras, D Black-Schaffer… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Flexible instruction scheduling is essential for performance in out-of-order processors. This
is typically achieved by using CAM-based Instruction Queues (IQs) that provide complete …