TVM: An automated end-to-end optimizing compiler for deep learning

T Chen, T Moreau, Z Jiang, L Zheng, E Yan… - … USENIX Symposium on …, 2018 - usenix.org
There is an increasing need to bring machine learning to a wide diversity of hardware
devices. Current frameworks rely on vendor-specific operator libraries and optimize for a …

Learning to optimize tensor programs

T Chen, L Zheng, E Yan, Z Jiang… - Advances in …, 2018 - proceedings.neurips.cc
We introduce a learning-based framework to optimize tensor programs for deep learning
workloads. Efficient implementations of tensor operators, such as matrix multiplication and …

Tiramisu: A polyhedral compiler for expressing fast and portable code

R Baghdadi, J Ray, MB Romdhane… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
This paper introduces Tiramisu, a polyhedral framework designed to generate high
performance code for multiple platforms including multicores, GPUs, and distributed …

TVM: End-to-end optimization stack for deep learning

T Chen, T Moreau, Z Jiang, H Shen… - arXiv preprint arXiv …, 2018 - dada.cs.washington.edu
Scalable frameworks, such as TensorFlow, MXNet, Caffe, and PyTorch drive the current
popularity and utility of deep learning. However, these frameworks are optimized for a …

GAMMA: Automating the HW mapping of DNN models on accelerators via genetic algorithm

SC Kao, T Krishna - Proceedings of the 39th International Conference on …, 2020 - dl.acm.org
DNN layers are multi-dimensional loops that can be ordered, tiled, and scheduled in myriad
ways across space and time on DNN accelerators. Each of these choices is called a …

Data movement is all you need: A case study on optimizing transformers

A Ivanov, N Dryden, T Ben-Nun, S Li… - … of Machine Learning …, 2021 - proceedings.mlsys.org
Transformers are one of the most important machine learning workloads today. Training one
is a very compute-intensive task, often taking days or weeks, and significant attention has …

ConfuciuX: Autonomous hardware resource assignment for DNN accelerators using reinforcement learning

SC Kao, G Jeong, T Krishna - 2020 53rd Annual IEEE/ACM …, 2020 - ieeexplore.ieee.org
DNN accelerators provide efficiency by leveraging reuse of activations/weights/outputs
during the DNN computations to reduce data movement from DRAM to the chip. The reuse is …

Futhark: purely functional GPU-programming with nested parallelism and in-place array updates

T Henriksen, NGW Serup, M Elsman… - Proceedings of the 38th …, 2017 - dl.acm.org
Futhark is a purely functional data-parallel array language that offers a machine-neutral
programming model and an optimising compiler that generates OpenCL code for GPUs …

Graph IRs for impure higher-order languages: making aggressive optimizations affordable with precise effect dependencies

O Bračevac, G Wei, S Jia, S Abeysinghe… - Proceedings of the …, 2023 - dl.acm.org
Graph-based intermediate representations (IRs) are widely used for powerful compiler
optimizations, either interprocedurally in pure functional languages, or intraprocedurally in …

Simple and efficient GPU accelerated topology optimisation: Codes and applications

EA Träff, A Rydahl, S Karlsson, O Sigmund… - Computer Methods in …, 2023 - Elsevier
This work presents topology optimisation implementations for linear elastic compliance
minimisation in three dimensions, accelerated using Graphics Processing Units (GPUs) …