Learning to optimize Halide with tree search and random programs

A Adams, K Ma, L Anderson, R Baghdadi… - ACM Transactions on …, 2019 - dl.acm.org
We present a new algorithm to automatically schedule Halide programs for high-
performance image processing and deep learning. We significantly improve upon the …

A deep learning based cost model for automatic code optimization

R Baghdadi, M Merouani… - Proceedings of …, 2021 - proceedings.mlsys.org
Enabling compilers to automatically optimize code has been a longstanding goal for the
compiler community. Efficiently solving this problem requires using precise cost models …

A multi-objective auto-tuning framework for parallel codes

H Jordan, P Thoman, JJ Durillo… - SC'12: Proceedings …, 2012 - ieeexplore.ieee.org
In this paper we introduce a multi-objective autotuning framework comprising compiler and
runtime components. Focusing on individual code regions, our compiler uses a novel search …

Applying static analysis to large-scale, multi-threaded Java programs

C Artho, A Biere - Proceedings 2001 Australian Software …, 2001 - ieeexplore.ieee.org
Static analysis is a tremendous help when trying to find faults in complex software. Writing
multi-threaded programs is difficult, because the thread scheduling increases the program …

Absinthe: Learning an analytical performance model to fuse and tile stencil codes in one shot

T Gysi, T Grosser, T Hoefler - 2019 28th International …, 2019 - ieeexplore.ieee.org
Expensive data movement makes the optimal target-specific selection of data-locality
transformations essential. Loop fusion and tiling are the most important data-locality …

Bandwidth-aware loop tiling for DMA-supported scratchpad memory

M Wu, Y Liu, H Cui, Q Wei, Q Li, L Li, F Lv… - Proceedings of the …, 2020 - dl.acm.org
Scratchpad Memory (SPM) is widely used in emerging domain-specific architectures and
accelerators for improving energy efficiency and time predictability. Typically, SPM-based …

TurboTiling: Leveraging prefetching to boost performance of tiled codes

S Mehta, R Garg, N Trivedi, PC Yew - Proceedings of the 2016 …, 2016 - dl.acm.org
Loop tiling or blocking improves temporal locality by dividing the problem domain into tiles
and then repeatedly accessing the data within a tile. While this improves reuse, it also leads …

A deep learning based cost model for automatic code optimization in Tiramisu

M Merouani, MH Leghettas, R Baghdadi, T Arbaoui… - 2020 - researchgate.net
Programmers spend a lot of time and effort optimizing their code to make it run faster; this
has led compiler researchers to focus on developing automatic optimization techniques that …

From single- to multi-objective auto-tuning of programs: Advantages and implications

J Durillo, T Fahringer - Scientific programming, 2014 - content.iospress.com
Automatic tuning (auto-tuning) of software has emerged in recent years as a promising
method that tries to automatically adapt the behaviour of a program to attain different …

A mixed method of parallel software auto-tuning using statistical modeling and machine learning

A Doroshenko, P Ivanenko, O Novak… - … Technologies in Education …, 2019 - Springer
A mixed method combining formal and auto-tuning approaches and aimed at maximizing
efficiency of parallel programs (in terms of execution time) is proposed. The formal approach …