Stateful dataflow multigraphs: A data-centric model for performance portability on heterogeneous architectures

T Ben-Nun, J de Fine Licht, AN Ziogas… - Proceedings of the …, 2019 - dl.acm.org
The ubiquity of accelerators in high-performance computing has driven programming
complexity beyond the skill-set of the average domain scientist. To maintain performance …

LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers

M Merouani, KA Boudaoud, IN Aouadj… - arXiv preprint arXiv …, 2024 - arxiv.org
While polyhedral compilers have shown success in implementing advanced code
transformations, they still have challenges in selecting the most profitable transformations …

Schedule synthesis for Halide pipelines on GPUs

S Sioutas, S Stuijk, T Basten, H Corporaal… - ACM Transactions on …, 2020 - dl.acm.org
The Halide DSL and compiler have enabled high-performance code generation for image
processing pipelines targeting heterogeneous architectures through the separation of …

TIRAMISU: A polyhedral compiler for dense and sparse deep learning

R Baghdadi, AN Debbagh, K Abdous… - arXiv preprint arXiv …, 2020 - arxiv.org
In this paper, we demonstrate a compiler that can optimize sparse and recurrent neural
networks, both of which are currently outside of the scope of existing neural network …

Tiramisu: A code optimization framework for high performance systems

R Baghdadi, J Ray, MB Romdhane… - arXiv preprint arXiv …, 2018 - andreask.cs.illinois.edu

[PDF] Technical Report about Tiramisu: a Three-Layered Abstraction for Hiding Hardware Complexity from DSL Compilers

R Baghdadi, J Ray, MB Romdhane… - arXiv preprint arXiv …, 2018 - groups.csail.mit.edu
High-performance DSL developers work hard to take advantage of modern hardware. The
DSL compilers have to build their own complex middle-ends before they can target a …

[CITATION] Towards Image Processing on Embedded Hardware with Lift