A stencil compiler for short-vector SIMD architectures

T Henretty, R Veras, F Franchetti, LN Pouchet… - Proceedings of the 27th …, 2013 - dl.acm.org
Stencil computations are an integral component of applications in a number of scientific
computing domains. Short-vector SIMD instruction sets are ubiquitous on modern …

Data layout transformation for stencil computations on short-vector SIMD architectures

T Henretty, K Stock, LN Pouchet, F Franchetti… - … CC 2011, Held as Part of …, 2011 - Springer
Stencil computations are at the core of applications in many domains such as computational
electromagnetics, image processing, and partial differential equation solvers used in a …

Predictive modeling in a polyhedral optimization space

E Park, J Cavazos, LN Pouchet, C Bastoul… - International journal of …, 2013 - Springer
High-level program optimizations, such as loop transformations, are critical for high
performance on multi-core targets. However, complex sequences of loop transformations …

Multicore-optimized wavefront diamond blocking for optimizing stencil updates

T Malas, G Hager, H Ltaief, H Stengel, G Wellein… - SIAM Journal on …, 2015 - SIAM
The importance of stencil-based algorithms in computational science has focused attention
on optimized parallel implementations for multilevel cache-based processors. Temporal …

Diamond tiling: Tiling techniques to maximize parallelism for stencil computations

U Bondhugula, V Bandishti… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Most stencil computations allow tile-wise concurrent start, i.e., there always exists a face of
the iteration space and a set of tiling directions such that all tiles along that face can be …

An overview on loop tiling techniques for code generation

E Hammami, Y Slama - 2017 IEEE/ACS 14th International …, 2017 - ieeexplore.ieee.org
Loop tiling is a well-known compiler transformation for both sequential and parallel
programs optimization. It focuses on the efficient execution of loop nests in order to generate …

Cache accurate time skewing in iterative stencil computations

R Strzodka, M Shaheen, D Pajak… - … Conference on Parallel …, 2011 - ieeexplore.ieee.org
We present a time skewing algorithm that breaks the memory wall for certain iterative stencil
computations. A stencil computation, even with constant weights, is a completely memory …

Cache oblivious parallelograms in iterative stencil computations

R Strzodka, M Shaheen, D Pajak… - Proceedings of the 24th …, 2010 - dl.acm.org
We present a new cache oblivious scheme for iterative stencil computations that performs
beyond system bandwidth limitations as though gigabytes of data could reside in an …

Multidimensional intratile parallelization for memory-starved stencil computations

TM Malas, G Hager, H Ltaief, DE Keyes - ACM Transactions on Parallel …, 2017 - dl.acm.org
Optimizing the performance of stencil algorithms has been the subject of intense research
over the last two decades. Since many stencil schemes have low arithmetic intensity, most …

Pencil: A pipelined algorithm for distributed stencils

H Wang… - … Conference for High …, 2020 - ieeexplore.ieee.org
Stencil computations are at the core of various Computational Fluid Dynamics (CFD)
applications and have been well-studied for several decades. Typically they're highly …