The Droplet Search Algorithm for Kernel Scheduling

M Canesche, V Rosário, E Borin… - ACM Transactions on …, 2024 - dl.acm.org
Kernel scheduling is the problem of finding the most efficient implementation for a
computational kernel. Identifying this implementation involves experimenting with the …

Guided rewriting and constraint satisfaction for parallel GPU code generation

N Mogers - 2023 - core.ac.uk
Graphics Processing Units (GPUs) are notoriously hard to optimise for manually
due to their scheduling and memory hierarchies. What is needed are good automatic code …

Multi-level functional IR with rewrites for higher-level synthesis of accelerators

C Schlaak - 2023 - era.ed.ac.uk
Specialised accelerators deliver orders of magnitude higher energy-efficiency than general-
purpose processors. Field Programmable Gate Arrays (FPGAs) have become the substrate …

Zero-cost abstractions for irregular data shapes in a high-performance parallel language

F Pizzuti - 2023 - era.ed.ac.uk
Modern parallel accelerators offer an unprecedented degree of performance, and are used
pervasively in important application domains, such as High Performance Computing and …

A domain-extensible compiler with controllable automation of optimisations

T Koehler - arXiv preprint arXiv:2212.12035, 2022 - arxiv.org
In high performance domains like image processing, physics simulation or machine
learning, program performance is critical. Programmers called performance engineers are …