The deep learning compiler: A comprehensive survey

M Li, Y Liu, X Liu, Q Sun, X You, H Yang… - … on Parallel and …, 2020 - ieeexplore.ieee.org
The difficulty of deploying various deep learning (DL) models on diverse DL hardware has
boosted the research and development of DL compilers in the community. Several DL …

MLIR: Scaling compiler infrastructure for domain specific computation

C Lattner, M Amini, U Bondhugula… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
This work presents MLIR, a novel approach to building reusable and extensible compiler
infrastructure. MLIR addresses software fragmentation, compilation for heterogeneous …

MLIR: A compiler infrastructure for the end of Moore's law

C Lattner, M Amini, U Bondhugula, A Cohen… - arXiv preprint arXiv …, 2020 - arxiv.org
This work presents MLIR, a novel approach to building reusable and extensible compiler
infrastructure. MLIR aims to address software fragmentation, improve compilation for …

The sparse polyhedral framework: Composing compiler-generated inspector-executor code

MM Strout, M Hall, C Olschanowsky - Proceedings of the IEEE, 2018 - ieeexplore.ieee.org
Irregular applications such as big graph analysis, material simulations, molecular dynamics
simulations, and finite element analysis have performance problems due to their use of …

Learning to optimize tensor programs

T Chen, L Zheng, E Yan, Z Jiang… - Advances in …, 2018 - proceedings.neurips.cc
We introduce a learning-based framework to optimize tensor programs for deep learning
workloads. Efficient implementations of tensor operators, such as matrix multiplication and …

Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions

N Vasilache, O Zinenko, T Theodoridis, P Goyal… - arXiv preprint arXiv …, 2018 - arxiv.org
Deep learning models with convolutional and recurrent networks are now ubiquitous and
analyze massive amounts of audio, image, video, text and graph data, with applications in …

Ansor: Generating high-performance tensor programs for deep learning

L Zheng, C Jia, M Sun, Z Wu, CH Yu, A Haj-Ali… - … USENIX symposium on …, 2020 - usenix.org
High-performance tensor programs are crucial to guarantee efficient execution of deep
neural networks. However, obtaining performant tensor programs for different operators on …

AutoSA: A polyhedral compiler for high-performance systolic arrays on FPGA

J Wang, L Guo, J Cong - The 2021 ACM/SIGDA International Symposium …, 2021 - dl.acm.org
While systolic array architectures have the potential to deliver tremendous performance, it is
notoriously challenging to customize an efficient systolic array processor for a target …

TVM: end-to-end optimization stack for deep learning

T Chen, T Moreau, Z Jiang, H Shen… - arXiv preprint arXiv …, 2018 - dada.cs.washington.edu
Scalable frameworks, such as TensorFlow, MXNet, Caffe, and PyTorch drive the current
popularity and utility of deep learning. However, these frameworks are optimized for a …

Futhark: purely functional GPU-programming with nested parallelism and in-place array updates

T Henriksen, NGW Serup, M Elsman… - Proceedings of the 38th …, 2017 - dl.acm.org
Futhark is a purely functional data-parallel array language that offers a machine-neutral
programming model and an optimising compiler that generates OpenCL code for GPUs …