Cache decay: Exploiting generational behavior to reduce cache leakage power

S Kaxiras, Z Hu, M Martonosi - Proceedings of the 28th annual …, 2001 - dl.acm.org
Power dissipation is increasingly important in CPUs ranging from those intended for mobile
use, all the way up to high-performance processors for high-end servers. While the bulk of …

Token coherence: Decoupling performance and correctness

MMK Martin, MD Hill, DA Wood - ACM SIGARCH Computer Architecture …, 2003 - dl.acm.org
Many future shared-memory multiprocessor servers will both target commercial workloads
and use highly-integrated "glueless" designs. Implementing low-latency cache coherence in …

RegionScout: Exploiting coarse grain sharing in snoop-based coherence

A Moshovos - … Symposium on Computer Architecture (ISCA'05), 2005 - ieeexplore.ieee.org
It has been shown that many requests miss in all remote nodes in shared memory
multiprocessors. We are motivated by the observation that this behavior extends to much …

Temporal streaming of shared memory

TF Wenisch, S Somogyi, N Hardavellas… - 32nd International …, 2005 - ieeexplore.ieee.org
Coherent read misses in shared-memory multiprocessors account for a substantial fraction
of execution time in many important scientific and commercial workloads. We propose …

Circuit-switched coherence

NDE Jerger, LS Peh, MH Lipasti - Second ACM/IEEE …, 2008 - ieeexplore.ieee.org
Our characterization of a suite of commercial and scientific workloads on a 16-core
cache-coherent chip multiprocessor (CMP) shows that overall system performance is sensitive to …

Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors

MMK Martin, PJ Harper, DJ Sorin, MD Hill… - Proceedings of the 30th …, 2003 - dl.acm.org
Destination-set prediction can improve the latency/bandwidth tradeoff in shared-memory
multiprocessors. The destination set is the collection of processors that receive a particular …

Studying software evolution information by visualizing the change history

F Van Rysselberghe, S Demeyer - 20th IEEE International …, 2004 - ieeexplore.ieee.org
Before re-engineering a large and complex software system, it is wise to study its change
history in order to identify the most valuable and problematic parts. Unfortunately, typical …

SARC coherence: Scaling directory cache coherence in performance and power

S Kaxiras, G Keramidas - IEEE micro, 2010 - ieeexplore.ieee.org
The SARC project seeks to improve power scalability of shared-memory chip
multiprocessors (CMPs) by making directory coherence more efficient in both power and …

Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture

ME Acacio, J González, JM García… - SC'02: Proceedings of …, 2002 - ieeexplore.ieee.org
Cache misses for which data must be obtained from a remote cache (cache-to-cache
transfer misses) account for an important fraction of the total miss rate. Unfortunately, cc …

POPS: Coherence protocol optimization for both private and shared data

H Hossain, S Dwarkadas… - … Conference on Parallel …, 2011 - ieeexplore.ieee.org
As the number of cores in a chip multiprocessor (CMP) increases, the need for larger
on-chip caches also increases in order to avoid creating a bottleneck at the off-chip …