Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators

Y Lee, R Avizienis, A Bishara, R Xia… - Proceedings of the 38th …, 2011 - dl.acm.org
We present a taxonomy and modular implementation approach for data-parallel
accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) …

Temporal SIMT execution optimization through elimination of redundant operations

RM Krashinsky - US Patent 9,830,156, 2017 - Google Patents
One embodiment of the present invention sets forth a technique for optimizing parallel
thread execution in a temporal single-instruction multiple thread (SIMT) architecture. When …

SIMD divergence optimization through intra-warp compaction

AS Vaidya, A Shayesteh, DH Woo, R Saharoy… - Proceedings of the 40th …, 2013 - dl.acm.org
SIMD execution units in GPUs are increasingly used for high performance and energy
efficient acceleration of general purpose applications. However, SIMD control flow …

Efficient execution of memory access phases using dataflow specialization

CH Ho, SJ Kim, K Sankaralingam - Proceedings of the 42nd annual …, 2015 - dl.acm.org
This paper identifies a new opportunity for improving the efficiency of a processor core:
memory access phases of programs. These are dynamic regions of programs where most of …

[BOOK][B] iRODS primer 2: integrated rule-oriented data system

H Xu, T Russell, J Coposky, A Rajasekar, RW Moore… - 2017 - Springer
Policy-based data management enables the creation of community-specific collections.
Every collection is created for a purpose. The purpose defines the set of properties that will …

[PDF][PDF] The Hwacha vector-fetch architecture manual, version 3.8.1

Y Lee, C Schmidt, A Ou… - … Berkeley, Tech. Rep …, 2015 - aspire.eecs.berkeley.edu
This work-in-progress document outlines the fourth version of the Hwacha vector-fetch
architecture. Inspired by traditional vector machines from the 1970s and 1980s such as the …

[BOOK][B] Decoupled vector-fetch architecture with a scalarizing compiler

Y Lee - 2016 - search.proquest.com
As we approach the end of conventional technology scaling, computer architects are forced
to incorporate specialized and heterogeneous accelerators into general-purpose processors …

Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators

Y Lee, R Avizienis, A Bishara, R Xia… - ACM Transactions on …, 2013 - dl.acm.org
We present a taxonomy and modular implementation approach for data-parallel
accelerators, including the MIMD, vector-SIMD, subword-SIMD, SIMT, and vector-thread (VT) …

An integrated vector-scalar design on an in-order ARM core

M Stanic, O Palomar, T Hayes, I Ratkovic… - ACM Transactions on …, 2017 - dl.acm.org
In the low-end mobile processor market, power, energy, and area budgets are significantly
lower than in the server/desktop/laptop/high-end mobile markets. It has been shown that …

[BOOK][B] Customizable computing

YT Chen, J Cong, M Gill, G Reinman, B Xiao - 2015 - books.google.com
Since the end of Dennard scaling in the early 2000s, improving the energy efficiency of
computation has been the main concern of the research community and industry. The large …