High performance stencil code generation with Lift

B Hagedorn, L Stoltzfus, M Steuwer… - Proceedings of the …, 2018 - dl.acm.org
Stencil computations are widely used from physical simulations to machine-learning. They
are embarrassingly parallel and perfectly fit modern hardware such as Graphic Processing …

Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies

B Hagedorn, J Lenfers, T Koehler, X Qin… - Proceedings of the …, 2020 - dl.acm.org
Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for
many applications. The predominantly used imperative languages, like C or OpenCL, force …

A theoretical model for global optimization of parallel algorithms

J Miller, L Trümper, C Terboven, MS Müller - Mathematics, 2021 - mdpi.com
With the quickly evolving hardware landscape of high-performance computing (HPC) and its
increasing specialization, the implementation of efficient software applications becomes …

Efficient multi-GPU shared memory via automatic optimization of fine-grained transfers

H Muthukrishnan, D Nellans, D Lustig… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Despite continuing research into inter-GPU communication mechanisms, extracting
performance from multi-GPU systems remains a significant challenge. Inter-GPU …

Generating portable high-performance code via multi-dimensional homomorphisms

A Rasch, R Schulze, S Gorlatch - 2019 28th International …, 2019 - ieeexplore.ieee.org
We address a key challenge in programming high-performance applications: achieving
portable performance, i.e., the same source code achieves a consistent, high level of …

High-level synthesis of functional patterns with Lift

M Kristien, B Bodin, M Steuwer, C Dubach - Proceedings of the 6th ACM …, 2019 - dl.acm.org
High-level languages are commonly seen as a good fit to tackle the problem of performance
portability across parallel architectures. The Lift framework is a recent approach which …

Matching linear algebra and tensor code to specialized hardware accelerators

PA Martínez, J Woodruff, J Armengol-Estapé… - Proceedings of the …, 2023 - dl.acm.org
Dedicated tensor accelerators demonstrate the importance of linear algebra in modern
applications. Such accelerators have the potential for impressive performance gains, but …

Achieving High Performance the Functional Way: Expressing High-Performance Optimizations as Rewrite Strategies

B Hagedorn, J Lenfers, T Koehler, X Qin… - Communications of the …, 2023 - dl.acm.org
Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for
many applications. The predominantly used imperative languages force the programmer to …

A language for describing optimization strategies

B Hagedorn, J Lenfers, T Koehler, S Gorlatch… - arXiv preprint arXiv …, 2020 - arxiv.org
Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for
many applications. The predominantly used imperative languages, like C or OpenCL, force …

Report of the workshop on program synthesis for scientific computing

H Finkel, I Laguna - arXiv preprint arXiv:2102.01687, 2021 - arxiv.org
Program synthesis is an active research field in academia, national labs, and industry. Yet,
work directly applicable to scientific computing, while having some impressive successes …