Polly—performing polyhedral optimizations on a low-level intermediate representation

T Grosser, A Groesslinger, C Lengauer - Parallel Processing Letters, 2012 - World Scientific
The polyhedral model for loop parallelization has proved to be an effective tool for advanced
optimization and automatic parallelization of programs in higher-level languages. Yet, to …
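The central object here is a static control part: a loop nest whose bounds and array subscripts are affine functions of the loop counters, which lets the compiler treat every iteration as an integer point in a polyhedron. A minimal C sketch of such a region (function and array names are illustrative, not taken from the paper):

    /* Affine loop nest: bounds and subscripts are affine in i and j, so a
     * polyhedral optimizer such as Polly can legally tile, interchange, or
     * parallelize the iterations after dependence analysis. */
    #define N 1024

    void smooth(float A[N][N], float B[N][N])
    {
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                B[i][j] = 0.25f * (A[i - 1][j] + A[i + 1][j]
                                 + A[i][j - 1] + A[i][j + 1]);
    }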

Polyhedral parallel code generation for CUDA

S Verdoolaege, J Carlos Juega, A Cohen… - ACM Transactions on …, 2013 - dl.acm.org
This article addresses the compilation of a sequential program for parallel execution on a
modern GPU. To this end, we present a novel source-to-source compiler called PPCG …

Polly-ACC: transparent compilation to heterogeneous hardware

T Grosser, T Hoefler - Proceedings of the 2016 International Conference …, 2016 - dl.acm.org
Programming today's increasingly complex heterogeneous hardware is difficult, as it
commonly requires the use of data-parallel languages, pragma annotations, specialized …

Introducing 'Bones': a parallelizing source-to-source compiler based on algorithmic skeletons

C Nugteren, H Corporaal - Proceedings of the 5th Annual Workshop on …, 2012 - dl.acm.org
Recent advances in multi-core and many-core processors require programmers to exploit
an increasing amount of parallelism from their applications. Data parallel languages such as …

Bones: An automatic skeleton-based C-to-CUDA compiler for GPUs

C Nugteren, H Corporaal - ACM Transactions on Architecture and Code …, 2014 - dl.acm.org
The shift toward parallel processor architectures has made programming and code
generation increasingly challenging. To address this programmability challenge, this article …
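The skeleton approach rests on classifying loops into algorithmic classes and instantiating a matching parallel skeleton (for example, a CUDA kernel) for each class. A hedged C sketch of the kind of loop such a classifier treats as an element-wise map, assuming independent iterations (illustrative only; this is not the Bones annotation syntax):

    /* Element-wise "map": every iteration reads in[i] and writes out[i]
     * independently, so a skeleton-based compiler can lower the loop to a
     * data-parallel GPU skeleton. Illustrative example, not from the paper. */
    void scale(const float *in, float *out, int n, float alpha)
    {
        for (int i = 0; i < n; i++)
            out[i] = alpha * in[i];
    }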

TC-CIM: Empowering tensor comprehensions for computing-in-memory

A Drebes, L Chelini, O Zinenko, A Cohen… - 10th International …, 2020 - research.tue.nl
Memristor-based, non-von Neumann architectures performing tensor operations
directly in memory are a promising approach to address the ever-increasing demand for …

Automatic parallelization of tiled loop nests with enhanced fine-grained parallelism on GPUs

P Di, D Ye, Y Su, Y Sui, J Xue - 2012 41st International …, 2012 - ieeexplore.ieee.org
Automatically parallelizing loop nests into CUDA kernels must exploit the full potential of
GPUs to obtain high performance. One state-of-the-art approach makes use of the …
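Tiling is the transformation at issue: the iteration space is partitioned into fixed-size blocks, and on a GPU each tile typically maps to a thread block while the intra-tile iterations map to threads. A small C sketch of a tiled loop nest (tile size and array names are assumptions, not values from the paper):

    /* Loop tiling: the i/j iteration space is split into TILE x TILE blocks.
     * On a GPU, each tile can be mapped to a thread block and the intra-tile
     * loops to threads. Illustrative sketch, not code from the paper. */
    #define N    4096
    #define TILE 32

    void tiled_copy(float A[N][N], float B[N][N])
    {
        for (int ii = 0; ii < N; ii += TILE)
            for (int jj = 0; jj < N; jj += TILE)
                for (int i = ii; i < ii + TILE; i++)
                    for (int j = jj; j < jj + TILE; j++)
                        B[i][j] = A[i][j];
    }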

Automatic CPU/GPU generation of multi-versioned OpenCL kernels for C++ scientific applications

R Sotomayor, LM Sanchez, J Garcia Blas… - International Journal of …, 2017 - Springer
Parallelism has become one of the most extended paradigms used to improve performance.
However, it forces software developers to adapt applications and coding mechanisms to …

An interactive tool based on Polly for detection and parallelization of loops

D Göhringer, J Tepelmann - … of Workshop on Parallel Programming and …, 2014 - dl.acm.org
In many applications, such as signal and image processing, most computation time is spent
within loops. Therefore, these loops are ideal candidates for performance increase when …

Transitioning spiking neural network simulators to heterogeneous hardware

QAP Nguyen, P Andelfinger, WJ Tan, W Cai… - ACM Transactions on …, 2021 - dl.acm.org
Spiking neural networks (SNN) are among the most computationally intensive types of
simulation models, with node counts on the order of up to 10^11. Currently, there is intensive …