The deep learning compiler: A comprehensive survey

M Li, Y Liu, X Liu, Q Sun, X You, H Yang… - … on Parallel and …, 2020 - ieeexplore.ieee.org
The difficulty of deploying various deep learning (DL) models on diverse DL hardware has
boosted the research and development of DL compilers in the community. Several DL …

MLIR: Scaling compiler infrastructure for domain specific computation

C Lattner, M Amini, U Bondhugula… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
This work presents MLIR, a novel approach to building reusable and extensible compiler
infrastructure. MLIR addresses software fragmentation, compilation for heterogeneous …

MLIR: A compiler infrastructure for the end of Moore's law

C Lattner, M Amini, U Bondhugula, A Cohen… - arXiv preprint arXiv …, 2020 - arxiv.org
This work presents MLIR, a novel approach to building reusable and extensible compiler
infrastructure. MLIR aims to address software fragmentation, improve compilation for …

The sparse polyhedral framework: Composing compiler-generated inspector-executor code

MM Strout, M Hall, C Olschanowsky - Proceedings of the IEEE, 2018 - ieeexplore.ieee.org
Irregular applications such as big graph analysis, material simulations, molecular dynamics
simulations, and finite element analysis have performance problems due to their use of …

Learning to optimize tensor programs

T Chen, L Zheng, E Yan, Z Jiang… - Advances in …, 2018 - proceedings.neurips.cc
We introduce a learning-based framework to optimize tensor programs for deep learning
workloads. Efficient implementations of tensor operators, such as matrix multiplication and …

Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions

N Vasilache, O Zinenko, T Theodoridis, P Goyal… - arXiv preprint arXiv …, 2018 - arxiv.org
Deep learning models with convolutional and recurrent networks are now ubiquitous and
analyze massive amounts of audio, image, video, text and graph data, with applications in …

Ansor: Generating high-performance tensor programs for deep learning

L Zheng, C Jia, M Sun, Z Wu, CH Yu, A Haj-Ali… - … USENIX symposium on …, 2020 - usenix.org
High-performance tensor programs are crucial to guarantee efficient execution of deep
neural networks. However, obtaining performant tensor programs for different operators on …

AutoSA: A polyhedral compiler for high-performance systolic arrays on FPGA

J Wang, L Guo, J Cong - The 2021 ACM/SIGDA International Symposium …, 2021 - dl.acm.org
While systolic array architectures have the potential to deliver tremendous performance, it is
notoriously challenging to customize an efficient systolic array processor for a target …

TVM: end-to-end optimization stack for deep learning

T Chen, T Moreau, Z Jiang, H Shen… - arXiv preprint arXiv …, 2018 - dada.cs.washington.edu
Scalable frameworks, such as TensorFlow, MXNet, Caffe, and PyTorch drive the current
popularity and utility of deep learning. However, these frameworks are optimized for a …

Futhark: purely functional GPU-programming with nested parallelism and in-place array updates

T Henriksen, NGW Serup, M Elsman… - Proceedings of the 38th …, 2017 - dl.acm.org
Futhark is a purely functional data-parallel array language that offers a machine-neutral
programming model and an optimising compiler that generates OpenCL code for GPUs …