[Book] A primer on hardware prefetching

B Falsafi, TF Wenisch - 2022 - books.google.com
Since the 1970s, microprocessor-based digital platforms have been riding Moore's law,
allowing for doubling of density for the same area roughly every two years. However …

WatchdogLite: Hardware-accelerated compiler-based pointer checking

S Nagarakatte, MMK Martin, S Zdancewic - Proceedings of Annual IEEE …, 2014 - dl.acm.org
Lack of memory safety in C is the root cause of a multitude of serious bugs and security
vulnerabilities. Numerous software-only and hardware-based schemes have been proposed …

Mascar: Speeding up GPU warps by reducing memory pitstops

A Sethia, DA Jamshidi, S Mahlke - 2015 IEEE 21st International …, 2015 - ieeexplore.ieee.org
With the prevalence of GPUs as throughput engines for data parallel workloads, the
landscape of GPU computing is changing significantly. Non-graphics workloads with high …

The load slice core microarchitecture

TE Carlson, W Heirman, O Allam, S Kaxiras… - Proceedings of the …, 2015 - dl.acm.org
Driven by the motivation to expose instruction-level parallelism (ILP), microprocessor cores
have evolved from simple, in-order pipelines into complex, superscalar out-of-order designs …

Warped-preexecution: A GPU pre-execution approach for improving latency hiding

K Kim, S Lee, MK Yoon, G Koo, WW Ro… - … Symposium on High …, 2016 - ieeexplore.ieee.org
This paper presents a pre-execution approach for improving GPU performance, called P-mode
(pre-execution mode). GPUs utilize a number of concurrent threads for hiding …

Idempotent processor architecture

M De Kruijf, K Sankaralingam - Proceedings of the 44th Annual IEEE …, 2011 - dl.acm.org
Improving architectural energy efficiency is important to address diminishing energy
efficiency gains from technology scaling. At the same time, limiting hardware complexity is …

Discerning the dominant out-of-order performance advantage: Is it speculation or dynamism?

DS McFarlin, C Tucker, C Zilles - ACM SIGARCH Computer Architecture …, 2013 - dl.acm.org
In this paper, we set out to study the performance advantages of an Out-of-Order (OOO)
processor relative to in-order processors with similar execution resources. In particular, we …

Issue control for multithreaded processing

A Sethia, S Mahlke - US Patent 9,898,409, 2018 - Google Patents
A multithreaded data processing system performs processing using resource circuitry which
is a finite resource. A saturation signal is generated to indicate when the resource circuitry is …

OUTRIDER: Efficient memory latency tolerance with decoupled strands

NC Crago, SJ Patel - Proceedings of the 38th annual international …, 2011 - dl.acm.org
We present OUTRIDER, an architecture for throughput-oriented processors that provides
memory latency tolerance to improve performance on highly threaded workloads …

RETCON: transactional repair without replay

C Blundell, A Raghavan, MMK Martin - ACM SIGARCH Computer …, 2010 - dl.acm.org
Over the past decade there has been a surge of academic and industrial interest in
optimistic concurrency, i.e., the speculative parallel execution of code regions that have the …