Cooperative caching for chip multiprocessors

J Chang, GS Sohi - ACM SIGARCH Computer Architecture News, 2006 - dl.acm.org
This paper presents CMP Cooperative Caching, a unified framework to manage a CMP's
aggregate on-chip cache resources. Cooperative caching combines the strengths of private …

Transactional lock-free execution of lock-based programs

R Rajwar, JR Goodman - ACM SIGOPS Operating Systems Review, 2002 - dl.acm.org
This paper is motivated by the difficulty in writing correct high-performance programs. Writing
shared-memory multi-threaded programs imposes a complex trade-off between …

Token coherence: Decoupling performance and correctness

MMK Martin, MD Hill, DA Wood - ACM SIGARCH Computer Architecture …, 2003 - dl.acm.org
Many future shared-memory multiprocessor servers will both target commercial workloads
and use highly-integrated "glueless" designs. Implementing low-latency cache coherence in …

Performance pathologies in hardware transactional memory

J Bobba, KE Moore, H Volos, L Yen, MD Hill… - ACM SIGARCH …, 2007 - dl.acm.org
Hardware Transactional Memory (HTM) systems reflect choices from three key design
dimensions: conflict detection, version management, and conflict resolution. Previously …

Selective, accurate, and timely self-invalidation using last-touch prediction

AC Lai, B Falsafi - ACM SIGARCH Computer Architecture News, 2000 - dl.acm.org
Communication in cache-coherent distributed shared memory (DSM) often requires
invalidating (or writing back) cached copies of a memory block, incurring high overheads …

[BOOK][B] A primer on hardware prefetching

B Falsafi, TF Wenisch - 2022 - books.google.com
Since the 1970s, microprocessor-based digital platforms have been riding Moore's law,
allowing for doubling of density for the same area roughly every two years. However …

Accurate and complexity-effective spatial pattern prediction

CF Chen, SH Yang, B Falsafi… - … Symposium on High …, 2004 - ieeexplore.ieee.org
Recent research suggests that there are large variations in a cache's spatial usage, both
within and across programs. Unfortunately, conventional caches typically employ fixed …

Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors

MMK Martin, PJ Harper, DJ Sorin, MD Hill… - Proceedings of the 30th …, 2003 - dl.acm.org
Destination-set prediction can improve the latency/bandwidth tradeoff in shared-memory
multiprocessors. The destination set is the collection of processors that receive a particular …

Methods to perform disk writes in a distributed shared disk system needing consistency across failures

S Chandrasekaran, RJ Bamford, WH Bridge… - US Patent …, 2007 - Google Patents
Techniques are provided for managing caches in a system with multiple caches that may
contain different copies of the same data item. Specifically, techniques are provided for …

SARC coherence: Scaling directory cache coherence in performance and power

S Kaxiras, G Keramidas - IEEE Micro, 2010 - ieeexplore.ieee.org
The SARC project seeks to improve power scalability of shared-memory chip
multiprocessors (CMPs) by making directory coherence more efficient in both power and …