High performance stencil code generation with Lift

B Hagedorn, L Stoltzfus, M Steuwer… - Proceedings of the …, 2018 - dl.acm.org
Stencil computations are widely used from physical simulations to machine-learning. They
are embarrassingly parallel and perfectly fit modern hardware such as Graphic Processing …

Autotuning OpenCL workgroup size for stencil patterns

C Cummins, P Petoumenos, M Steuwer… - arXiv preprint arXiv …, 2015 - arxiv.org
Selecting an appropriate workgroup size is critical for the performance of OpenCL kernels,
and requires knowledge of the underlying hardware, the data being operated on, and the …

Acceleration for 2D time-domain elastic full waveform inversion using a single GPU card

J Jiang, P Zhu - Journal of Applied Geophysics, 2018 - Elsevier
Full waveform inversion (FWI) is a challenging procedure due to the high computational cost
related to the modeling, especially for the elastic case. The graphics processing unit (GPU) …

Tiling Optimizations for Stencil Computations Using Rewrite Rules in Lift

L Stoltzfus, B Hagedorn, M Steuwer… - ACM Transactions on …, 2019 - dl.acm.org
Stencil computations are a widely used type of algorithm, found in applications from physical
simulations to machine learning. Stencils are embarrassingly parallel, therefore fit on …

Node-aware stencil communication for heterogeneous supercomputers

C Pearson, M Hidayetoğlu, M Almasri… - 2020 IEEE …, 2020 - ieeexplore.ieee.org
High-performance distributed computing systems increasingly feature nodes that have
multiple CPU sockets and multiple GPUs. The communication bandwidth between these …

HLSF: A high-level; C++-based framework for stencil computations on accelerators

F Dütsch, K Djelassi, M Haidl, S Gorlatch - Proceedings of the second …, 2014 - dl.acm.org
The development of programs for modern systems with GPUs and other accelerators is a
complex and error-prone task. The popular GPU programming approaches like CUDA and …

EPSILOD: efficient parallel skeleton for generic iterative stencil computations in distributed GPUs

M de Castro, I Santamaria-Valenzuela, Y Torres… - The Journal of …, 2023 - Springer
Iterative stencil computations are widely used in numerical simulations. They present a high
degree of parallelism, high locality and mostly-coalesced memory access patterns …

Code Generation for Room Acoustics Simulations with Complex Boundary Conditions

L Stoltzfus, B Hamilton, M Steuwer, L Li… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
The software and hardware landscape of high performance computing is expanding faster
than computational scientists can take advantage of new frameworks and platforms. In an …

Towards Collaborative Performance Tuning of Algorithmic Skeletons

C Cummins, P Petoumenos, M Steuwer… - … Level Programming for …, 2016 - research.ed.ac.uk
The physical limitations of microprocessor design have forced the industry towards
increasingly heterogeneous designs to extract performance. This trend has not been …

Collection skeletons: declarative abstractions for data collections

Z Li - 2024 - era.ed.ac.uk
Modern programming languages provide programmers with rich abstractions for data
collections as part of their standard libraries, e.g., Containers in the C++ STL, the Java …