Performance improvement methodology for ClearSpeed's CSX600

J Meng, D Tarjan, K Skadron - Proceedings of the 37th annual …, 2010 - dl.acm.org

SIMD organizations amortize the area and power of fetch, decode, and issue logic across
multiple processing units in order to maximize throughput for a given area and power …

Cited by 371 - Related articles - All 12 versions

Mixing multi-core CPUs and GPUs for scientific simulation software

KA Hawick, A Leist, DP Playne - 2010 - mro.massey.ac.nz

Recent technological and economic developments have led to widespread availability of
multi-core CPUs and specialist accelerator processors such as graphical processing units …

Cited by 25 - Related articles - All 4 versions

Data-parallel techniques for simulating a mega-scale agent-based model of systemic inflammatory response syndrome on graphics processing units

S Alberts, MK Keenan, RM D'Souza, G An - Simulation, 2012 - journals.sagepub.com

Agent-based modeling is increasingly being used for computer simulation of complex
biological systems. An agent-based model (ABM) is a bottom-up simulation where the bulk …

Cited by 18 - Related articles - All 5 versions

Data mining analysis to validate performance tuning practices for HPL

TZ Tan, RSM Goh, V March… - 2009 IEEE international …, 2009 - ieeexplore.ieee.org

Applications performance is a criterion for system evaluation, and hence performance tuning
for these applications is of great interest. One such benchmark application is High …

Cited by 14 - Related articles - All 7 versions

Implementation and evaluation of parallel FFT on Engineering and Scientific Computation Accelerator (ESCA) architecture

D Wu, X Zou, K Dai, J Rao, P Chen, Z Zheng - Journal of Zhejiang …, 2011 - Springer

The fast Fourier transform (FFT) is a fundamental kernel of many computation-intensive
scientific applications. This paper deals with an implementation of the FFT on the accelerator …

Cited by 12 - Related articles - All 8 versions

Accelerating BLAS on custom architecture through algorithm-architecture co-design

F Merchant, T Vatwani, A Chattopadhyay… - arXiv preprint arXiv …, 2016 - arxiv.org

Basic Linear Algebra Subprograms (BLAS) play a key role in high performance and scientific
computing applications. Experimentally, yesteryear multicore and General Purpose …

Cited by 4 - Related articles - All 2 versions

[PDF] Leveraging Memory Level Parallelism Using Dynamic Warp Subdivision

J Meng, D Tarjan, K Skadron - … , University of Virginia, Tech. Rep. CS …, 2009 - academia.edu

SIMD organizations have been shown to allow high throughput for data-parallel applications.
They can operate on multiple datapaths under the same instruction sequencer, with its set of …

Cited by 6 - Related articles - All 5 versions

Algorithm/architecture codesign of low power and high performance linear algebra compute fabrics

A Pedram - 2013 IEEE International Symposium on Parallel & …, 2013 - ieeexplore.ieee.org

We show the design of specialized compute fabrics that maintain the efficiency of full custom
hardware while providing enough flexibility to execute a whole class of coarse-grain linear …

Cited by 5 - Related articles - All 12 versions

[BOOK] A finite domain constraint approach for placement and routing of coarse-grained reconfigurable architectures

R Saraswat - 2010 - search.proquest.com

Scheduling, placement, and routing are important steps in Very Large Scale Integration
(VLSI) design. Researchers have developed numerous techniques to solve placement and …

Cited by 4 - Related articles - All 5 versions

A chemical reactor benchmark for parallel adaptive control using feedforward neural networks

CO Cajueiro, EM Hemerly - Proceedings. Vol. 1. Sixth Brazilian …, 2000 - ieeexplore.ieee.org

This paper applies a parallel scheme for adaptive control that uses only one neural network
to a CSTR (continuous stirred tank reactor). Convergence of the identification error is …

Cited by 5 - Related articles - All 3 versions
