TVM: An automated end-to-end optimizing compiler for deep learning

T Chen, T Moreau, Z Jiang, L Zheng, E Yan… - … USENIX Symposium on …, 2018 - usenix.org
There is an increasing need to bring machine learning to a wide diversity of hardware
devices. Current frameworks rely on vendor-specific operator libraries and optimize for a …

Learning to optimize tensor programs

T Chen, L Zheng, E Yan, Z Jiang… - Advances in …, 2018 - proceedings.neurips.cc
We introduce a learning-based framework to optimize tensor programs for deep learning
workloads. Efficient implementations of tensor operators, such as matrix multiplication and …

Tiramisu: A polyhedral compiler for expressing fast and portable code

R Baghdadi, J Ray, MB Romdhane… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
This paper introduces Tiramisu, a polyhedral framework designed to generate high
performance code for multiple platforms including multicores, GPUs, and distributed …

TVM: End-to-end optimization stack for deep learning

T Chen, T Moreau, Z Jiang, H Shen… - arXiv preprint arXiv …, 2018 - dada.cs.washington.edu
Scalable frameworks, such as TensorFlow, MXNet, Caffe, and PyTorch drive the current
popularity and utility of deep learning. However, these frameworks are optimized for a …

GAMMA: Automating the HW mapping of DNN models on accelerators via genetic algorithm

SC Kao, T Krishna - Proceedings of the 39th International Conference on …, 2020 - dl.acm.org
DNN layers are multi-dimensional loops that can be ordered, tiled, and scheduled in myriad
ways across space and time on DNN accelerators. Each of these choices is called a …

Data movement is all you need: A case study on optimizing transformers

A Ivanov, N Dryden, T Ben-Nun, S Li… - … of Machine Learning …, 2021 - proceedings.mlsys.org
Transformers are one of the most important machine learning workloads today. Training one
is a very compute-intensive task, often taking days or weeks, and significant attention has …

ConfuciuX: Autonomous hardware resource assignment for DNN accelerators using reinforcement learning

SC Kao, G Jeong, T Krishna - 2020 53rd Annual IEEE/ACM …, 2020 - ieeexplore.ieee.org
DNN accelerators provide efficiency by leveraging reuse of activations/weights/outputs
during the DNN computations to reduce data movement from DRAM to the chip. The reuse is …

Futhark: purely functional GPU-programming with nested parallelism and in-place array updates

T Henriksen, NGW Serup, M Elsman… - Proceedings of the 38th …, 2017 - dl.acm.org
Futhark is a purely functional data-parallel array language that offers a machine-neutral
programming model and an optimising compiler that generates OpenCL code for GPUs …

Graph IRs for impure higher-order languages: making aggressive optimizations affordable with precise effect dependencies

O Bračevac, G Wei, S Jia, S Abeysinghe… - Proceedings of the …, 2023 - dl.acm.org
Graph-based intermediate representations (IRs) are widely used for powerful compiler
optimizations, either interprocedurally in pure functional languages, or intraprocedurally in …

Simple and efficient GPU accelerated topology optimisation: Codes and applications

EA Träff, A Rydahl, S Karlsson, O Sigmund… - Computer Methods in …, 2023 - Elsevier
This work presents topology optimisation implementations for linear elastic compliance
minimisation in three dimensions, accelerated using Graphics Processing Units (GPUs) …