Reactive NUCA: near-optimal block placement and replication in distributed caches

N Hardavellas, M Ferdman, B Falsafi… - Proceedings of the 36th …, 2009 - dl.acm.org
Increases in on-chip communication delay and the large working sets of server and scientific
workloads complicate the design of the on-chip last-level cache for multicore processors …

Sampling dead block prediction for last-level caches

SM Khan, Y Tian, DA Jiménez - 2010 43rd Annual IEEE/ACM …, 2010 - ieeexplore.ieee.org
Last-level caches (LLCs) are large structures with significant power requirements. They can
be quite inefficient. On average, a cache block in a 2MB LRU-managed LLC is dead 86% of …

Perceptron learning for reuse prediction

E Teran, Z Wang, DA Jiménez - 2016 49th Annual IEEE/ACM …, 2016 - ieeexplore.ieee.org
The disparity between last-level cache and memory latencies motivates the search for
efficient cache management policies. Recent work in predicting reuse of cache blocks …

Data-oriented transaction execution

I Pandis, R Johnson, N Hardavellas… - Proceedings of the …, 2010 - infoscience.epfl.ch
While hardware technology has undergone major advancements over the past decade,
transaction processing systems have remained largely unchanged. The number of cores on …

Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency

H Liu, M Ferdman, J Huh… - 2008 41st IEEE/ACM …, 2008 - ieeexplore.ieee.org
Data caches in general-purpose microprocessors often contain mostly dead blocks and are
thus used inefficiently. To improve cache efficiency, dead blocks should be identified and …

RegionScout: Exploiting coarse grain sharing in snoop-based coherence

A Moshovos - … Symposium on Computer Architecture (ISCA'05), 2005 - ieeexplore.ieee.org
It has been shown that many requests miss in all remote nodes in shared memory
multiprocessors. We are motivated by the observation that this behavior extends to much …

Temporal streaming of shared memory

TF Wenisch, S Somogyi, N Hardavellas… - 32nd International …, 2005 - ieeexplore.ieee.org
Coherent read misses in shared-memory multiprocessors account for a substantial fraction
of execution time in many important scientific and commercial workloads. We propose …

Multiperspective reuse prediction

DA Jiménez, E Teran - Proceedings of the 50th Annual IEEE/ACM …, 2017 - dl.acm.org
The disparity between last-level cache and memory latencies motivates the search for
efficient cache management policies. Recent work in predicting reuse of cache blocks …

PLP: page latch-free shared-everything OLTP

I Pandis, P Tözün, FR Johnson… - Proceedings of the …, 2011 - infoscience.epfl.ch
Scaling the performance of shared-everything transaction processing systems to highly-
parallel multicore hardware remains a challenge for database system designers. Recent …

ESKIMO: Energy savings using Semantic Knowledge of Inconsequential Memory Occupancy for DRAM subsystem

C Isen, L John - Proceedings of the 42nd Annual IEEE/ACM …, 2009 - dl.acm.org
Dynamic Random Access Memory (DRAM) is used as the bulk of the main memory in most
computing systems and its energy and power consumption has become a first-class design …