The gem5 simulator: Version 20.0+

J Lowe-Power, AM Ahmad, A Akram, M Alian… - arXiv preprint arXiv …, 2020 - arxiv.org
The open-source and community-supported gem5 simulator is one of the most popular tools
for computer architecture research. This simulation infrastructure allows researchers to …

Transactional memory coherence and consistency

L Hammond, V Wong, M Chen, BD Carlstrom… - ACM SIGARCH …, 2004 - dl.acm.org
In this paper, we propose a new shared memory model: Transactional memory Coherence
and Consistency (TCC). TCC provides a model in which atomic transactions are always the …

Reactive NUCA: near-optimal block placement and replication in distributed caches

N Hardavellas, M Ferdman, B Falsafi… - Proceedings of the 36th …, 2009 - dl.acm.org
Increases in on-chip communication delay and the large working sets of server and scientific
workloads complicate the design of the on-chip last-level cache for multicore processors …

System-on-chip: Reuse and integration

R Saleh, S Wilton, S Mirabbasi, A Hu… - Proceedings of the …, 2006 - ieeexplore.ieee.org
Over the past ten years, as integrated circuits became increasingly more complex and
expensive, the industry began to embrace new design and reuse methodologies that are …

[BOOK][B] A primer on memory consistency and cache coherence

V Nagarajan, DJ Sorin, MD Hill, DA Wood - 2020 - library.oapen.org
Many modern computer systems, including homogeneous and heterogeneous architectures,
support shared memory in hardware. In a shared memory system, each of the processor …

Cooperative caching for chip multiprocessors

J Chang, GS Sohi - ACM SIGARCH Computer Architecture News, 2006 - dl.acm.org
This paper presents CMP Cooperative Caching, a unified framework to manage a CMP's
aggregate on-chip cache resources. Cooperative caching combines the strengths of private …

Transient-Execution Attacks: A Computer Architect Perspective

L Fiolhais, L Sousa - ACM Computing Surveys, 2023 - dl.acm.org
Computer architects employ a series of performance optimizations at the micro-architecture
level. These optimizations are meant to be invisible to the programmer but they are implicitly …

DeNovo: Rethinking the memory hierarchy for disciplined parallelism

B Choi, R Komuravelli, H Sung… - 2011 International …, 2011 - ieeexplore.ieee.org
For parallelism to become tractable for mass programmers, shared-memory languages and
environments must evolve to enforce disciplined practices that ban "wild shared-memory …

Cache coherence for GPU architectures

I Singh, A Shriraman, WWL Fung… - 2013 IEEE 19th …, 2013 - ieeexplore.ieee.org
While scalable coherence has been extensively studied in the context of general purpose
chip multiprocessors (CMPs), GPU architectures present a new set of challenges …

Virtual circuit tree multicasting: A case for on-chip hardware multicast support

NE Jerger, LS Peh, M Lipasti - ACM SIGARCH Computer Architecture …, 2008 - dl.acm.org
Current state-of-the-art on-chip networks provide efficiency, high throughput, and low latency
for one-to-one (unicast) traffic. The presence of one-to-many (multicast) or one-to-all …