Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions

N Vasilache, O Zinenko, T Theodoridis, P Goyal… - arXiv preprint arXiv …, 2018 - arxiv.org
Deep learning models with convolutional and recurrent networks are now ubiquitous and
analyze massive amounts of audio, image, video, text and graph data, with applications in …

Polly—performing polyhedral optimizations on a low-level intermediate representation

T Grosser, A Groesslinger, C Lengauer - Parallel Processing Letters, 2012 - World Scientific
The polyhedral model for loop parallelization has proved to be an effective tool for advanced
optimization and automatic parallelization of programs in higher-level languages. Yet, to …

Memory-centric accelerator design for convolutional neural networks

M Peemen, AAA Setio, B Mesman… - 2013 IEEE 31st …, 2013 - ieeexplore.ieee.org
In the near future, cameras will be used everywhere as flexible sensors for numerous
applications. For mobility and privacy reasons, the required image processing should be …

Polyhedral parallel code generation for CUDA

S Verdoolaege, J Carlos Juega, A Cohen… - ACM Transactions on …, 2013 - dl.acm.org
This article addresses the compilation of a sequential program for parallel execution on a
modern GPU. To this end, we present a novel source-to-source compiler called PPCG …

Pencil: A platform-neutral compute intermediate language for accelerator programming

R Baghdadi, U Beaugnon, A Cohen… - 2015 International …, 2015 - ieeexplore.ieee.org
Programming accelerators such as GPUs with low-level APIs and languages such as
OpenCL and CUDA is difficult, error-prone, and not performance-portable. Automatic …

Hybrid hexagonal/classical tiling for GPUs

T Grosser, A Cohen, J Holewinski… - Proceedings of Annual …, 2014 - dl.acm.org
Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical
hyper-rectangular tiles cannot be used due to the combination of backward and forward …

AN5D: automated stencil framework for high-degree temporal blocking on GPUs

K Matsumura, HR Zohouri, M Wahib, T Endo… - Proceedings of the 18th …, 2020 - dl.acm.org
Stencil computation is one of the most widely-used compute patterns in high performance
computing applications. Spatial and temporal blocking have been proposed to overcome the …

Polyhedral AST generation is more than scanning polyhedra

T Grosser, S Verdoolaege, A Cohen - ACM Transactions on …, 2015 - dl.acm.org
Abstract mathematical representations such as integer polyhedra have been shown to be
useful to precisely analyze computational kernels and to express complex loop …

Split tiling for GPUs: automatic parallelization using trapezoidal tiles

T Grosser, A Cohen, PHJ Kelly, J Ramanujam… - Proceedings of the 6th …, 2013 - dl.acm.org
Tiling is a key technique to enhance data reuse. For computations structured as one
sequential outer "time" loop enclosing a set of parallel inner loops, tiling only the parallel …

Diamond tiling: Tiling techniques to maximize parallelism for stencil computations

U Bondhugula, V Bandishti… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Most stencil computations allow tile-wise concurrent start, i.e., there always exists a face of
the iteration space and a set of tiling directions such that all tiles along that face can be …