Flattened butterfly topology for on-chip networks

J Kim, J Balfour, W Dally - 40th Annual IEEE/ACM International …, 2007 - ieeexplore.ieee.org
With the trend towards increasing number of cores in chip multiprocessors, the on-chip
interconnect that connects the cores needs to scale efficiently. In this work, we propose the …

An effective hybrid transactional memory system with strong isolation guarantees

CC Minh, M Trautmann, JW Chung… - Proceedings of the 34th …, 2007 - dl.acm.org
We propose signature-accelerated transactional memory (SigTM), a hybrid TM system that
reduces the overhead of software transactions. SigTM uses hardware signatures to track the …

User-level implementations of read-copy update

M Desnoyers, PE McKenney, AS Stern… - … on Parallel and …, 2011 - ieeexplore.ieee.org
Read-copy update (RCU) is a synchronization technique that often replaces reader-writer
locking because RCU's read-side primitives are both wait-free and an order of magnitude …

BulkSC: Bulk enforcement of sequential consistency

L Ceze, J Tuck, P Montesinos, J Torrellas - Proceedings of the 34th …, 2007 - dl.acm.org
While Sequential Consistency (SC) is the most intuitive memory consistency model and the
one most programmers likely assume, current multiprocessors do not support it. Instead …

Delorean: Recording and deterministically replaying shared-memory multiprocessor execution efficiently

P Montesinos, L Ceze, J Torrellas - ACM SIGARCH Computer …, 2008 - dl.acm.org
Support for deterministic replay of multithreaded execution can greatly help in finding
concurrency bugs. For highest effectiveness, replay schemes should (i) record at production …

Performance pathologies in hardware transactional memory

J Bobba, KE Moore, H Volos, L Yen, MD Hill… - ACM SIGARCH …, 2007 - dl.acm.org
Hardware Transactional Memory (HTM) systems reflect choices from three key design
dimensions: conflict detection, version management, and conflict resolution. Previously …

Low-cost router microarchitecture for on-chip networks

J Kim - Proceedings of the 42nd annual IEEE/ACM …, 2009 - dl.acm.org
On-chip networks are critical to the scaling of future multi-core processors. The challenge for
on-chip network is to reduce the cost including power consumption and area while providing …

[BOOK][B] General-purpose graphics processor architectures

Originally developed to support video games, graphics processor units (GPUs) are now
increasingly used for general-purpose (non-graphics) applications ranging from machine …

Flexible decoupled transactional memory support

A Shriraman, S Dwarkadas, ML Scott - ACM SIGARCH Computer …, 2008 - dl.acm.org
A high-concurrency transactional memory (TM) implementation needs to track concurrent
accesses, buffer speculative updates, and manage conflicts. We present a system, FlexTM …

Hardware transactional memory for GPU architectures

WWL Fung, I Singh, A Brownsword… - Proceedings of the 44th …, 2011 - dl.acm.org
Graphics processor units (GPUs) are designed to efficiently exploit thread level parallelism
(TLP), multiplexing execution of 1000s of concurrent threads on a relatively smaller set of …