Throughput-effective on-chip networks for manycore accelerators

A Bakhoda, J Kim, TM Aamodt - 2010 43rd Annual IEEE/ACM …, 2010 - ieeexplore.ieee.org
As the number of cores and threads in manycore compute accelerators such as Graphics
Processing Units (GPU) increases, so does the importance of on-chip interconnection …

Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators

Y Lee, R Avizienis, A Bishara, R Xia… - Proceedings of the 38th …, 2011 - dl.acm.org
We present a taxonomy and modular implementation approach for data-parallel
accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) …

WAYPOINT: scaling coherence to thousand-core architectures

JH Kelm, MR Johnson, SS Lumetta… - Proceedings of the 19th …, 2010 - dl.acm.org
In this paper, we evaluate a set of coherence architectures in the context of a 1024-core chip
multiprocessor (CMP) tailored to throughput-oriented parallel workloads. Based on our …

Cohesion: a hybrid memory model for accelerators

JH Kelm, DR Johnson, W Tuohy, SS Lumetta… - Proceedings of the 37th …, 2010 - dl.acm.org
Two broad classes of memory models are available today: models with hardware cache
coherence, used in conventional chip multiprocessors, and models that rely upon software …

Rigel: A 1,024-core single-chip accelerator architecture

D Johnson, M Johnson, J Kelm, W Tuohy… - IEEE Micro, 2011 - ieeexplore.ieee.org
Rigel is a single-chip accelerator architecture with 1,024 independent processing cores
targeted at a broad class of data- and task-parallel computation. This article discusses …

Simplified vector-thread architectures for flexible and efficient data-parallel accelerators

CF Batten - 2010 - dspace.mit.edu
This thesis explores a new approach to building data-parallel accelerators that is based on
simplifying the instruction set, microarchitecture, and programming methodology for a vector …

Cohesion: An adaptive hybrid memory model for accelerators

JH Kelm, DR Johnson, W Tuohy, SS Lumetta… - IEEE Micro, 2011 - ieeexplore.ieee.org
Cohesion is a hybrid memory model that enables fine-grained temporal data reassignment
between hardware- and software-managed coherence domains, allowing systems to support …

Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators

Y Lee, R Avizienis, A Bishara, R Xia… - ACM Transactions on …, 2013 - dl.acm.org
We present a taxonomy and modular implementation approach for data-parallel
accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) …

[BOOK][B] Efficient embedded computing

JD Balfour - 2010 - search.proquest.com
This dissertation describes Elm, an efficient programmable system for high-performance
embedded applications. Elm is significantly more efficient than conventional embedded …

Designing on-chip networks for throughput accelerators

A Bakhoda, J Kim, TM Aamodt - ACM Transactions on Architecture and …, 2013 - dl.acm.org
As the number of cores and threads in throughput accelerators such as Graphics Processing
Units (GPU) increases, so does the importance of on-chip interconnection network design …