Breaking the memory wall in MonetDB

PA Boncz, ML Kersten, S Manegold - Communications of the ACM, 2008 - dl.acm.org
In the past decades, advances in speed of commodity CPUs have far outpaced advances in
RAM latency. Main-memory access has therefore become a performance bottleneck for …

Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors

D Tam, R Azimi, M Stumm - ACM SIGOPS Operating Systems Review, 2007 - dl.acm.org
The major chip manufacturers have all introduced chip multiprocessing (CMP) and
simultaneous multithreading (SMT) technology into their processing units. As a result, even …

Database architecture evolution: Mammals flourished long before dinosaurs became extinct

S Manegold, ML Kersten, P Boncz - Proceedings of the VLDB …, 2009 - dl.acm.org
The holy grail for database architecture research is to find a solution that is Scalable &
Speedy, to run on anything from small ARM processors up to globally distributed compute …

I-SPY: Context-driven conditional instruction prefetching with coalescing

TA Khan, A Sriraman, J Devietti… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Modern data center applications have rapidly expanding instruction footprints that lead to
frequent instruction cache misses, increasing cost and degrading data center performance …

Database servers on chip multiprocessors: Limitations and opportunities

N Hardavellas, I Pandis, R Johnson… - Proceedings of the …, 2007 - infoscience.epfl.ch
Prior research shows that database system performance is dominated by off-chip data stalls,
resulting in a concerted effort to bring data into on-chip caches. At the same time, high levels …

Proactive instruction fetch

M Ferdman, C Kaynak, B Falsafi - Proceedings of the 44th Annual IEEE …, 2011 - dl.acm.org
Fast access requirements preclude building L1 instruction caches large enough to capture
the working set of server workloads. Efforts exist to mitigate limited L1 instruction cache …

Twig: Profile-guided BTB prefetching for data center applications

TA Khan, N Brown, A Sriraman… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Modern data center applications have deep software stacks, with instruction footprints that
are orders of magnitude larger than typical instruction cache (I-cache) sizes. To efficiently …

Temporal instruction fetch streaming

M Ferdman, TF Wenisch, A Ailamaki… - 2008 41st IEEE/ACM …, 2008 - ieeexplore.ieee.org
L1 instruction-cache misses pose a critical performance bottleneck in commercial server
workloads. Cache access latency constraints preclude L1 instruction caches large enough …

RDIP: Return-address-stack directed instruction prefetching

A Kolli, A Saidi, TF Wenisch - Proceedings of the 46th Annual IEEE/ACM …, 2013 - dl.acm.org
L1 instruction fetch misses remain a critical performance bottleneck, accounting for up to
40% slowdowns in server applications. Whereas instruction footprints typically fit within last …

Thermometer: profile-guided BTB replacement for data center applications

S Song, TA Khan, SM Shahri, A Sriraman… - Proceedings of the 49th …, 2022 - dl.acm.org
Modern processors employ a decoupled frontend with Fetch Directed Instruction Prefetching
(FDIP) to avoid frontend stalls in data center applications. However, the large branch …