Data layout transformation for stencil computations on short-vector SIMD architectures

T Henretty, K Stock, LN Pouchet, F Franchetti… - … CC 2011, Held as Part of …, 2011 - Springer
Stencil computations are at the core of applications in many domains such as computational
electromagnetics, image processing, and partial differential equation solvers used in a …

Register optimizations for stencils on GPUs

PS Rawat, F Rastello, A Sukumaran-Rajam… - Proceedings of the 23rd …, 2018 - dl.acm.org
The recent advent of compute-intensive GPU architecture has allowed application
developers to explore high-order 3D stencils for better computational accuracy. A common …

A framework for enhancing data reuse via associative reordering

K Stock, M Kong, T Grosser, LN Pouchet… - Proceedings of the 35th …, 2014 - dl.acm.org
The freedom to reorder computations involving associative operators has been widely
recognized and exploited in designing parallel algorithms and to a more limited extent in …

SPARTA: spatial acceleration for efficient and scalable horizontal diffusion weather stencil computation

G Singh, A Khodamoradi, K Denolf, J Lo… - Proceedings of the 37th …, 2023 - dl.acm.org
Fast and accurate climate simulations and weather predictions are critical for understanding
and preparing for the impact of climate change. Real-world climate and weather simulations …

Assessing accelerator-based HPC reverse time migration

M Araya-Polo, J Cabezas, M Hanzich… - … on Parallel and …, 2010 - ieeexplore.ieee.org
Oil and gas companies trust Reverse Time Migration (RTM), the most advanced seismic
imaging technique, with crucial decisions on drilling investments. The economic value of the …

Model-based optimization of EULAG kernel on Intel Xeon Phi through load imbalancing

A Lastovetsky, L Szustak… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Load balancing is a widely accepted technique for performance optimization of scientific
applications on parallel architectures. Indeed, balanced applications do not waste processor …

Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs

T Zhao, P Basu, S Williams, M Hall… - Proceedings of the …, 2019 - dl.acm.org
Stencil computations in real-world scientific applications may contain multiple interrelated
stencils, have multiple input grids, and use higher order discretizations with high arithmetic …

Bricks: A high-performance portability layer for computations on block-structured grids

M Lakshminarasimhan, O Antepara… - … Journal of High …, 2024 - journals.sagepub.com
From partial differential equations to the convolutional neural networks in deep learning, to
matrix operations in dense linear algebra, computations on structured grids dominate high …

Accelerating high-order stencils on GPUs

R Sai, J Mellor-Crummey, X Meng… - 2020 IEEE/ACM …, 2020 - ieeexplore.ieee.org
While implementation strategies for low-order stencils on GPUs have been well-studied in
the literature, not all of the techniques work well for high-order stencils, such as those used …

Compiler-directed transformation for higher-order stencils

P Basu, M Hall, S Williams… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
As the cost of data movement increasingly dominates performance, developers of
finite-volume and finite-difference solutions for partial differential equations (PDEs) are exploring …