PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation

J Ansel, E Yang, H He, N Gimelshein, A Jain… - Proceedings of the 29th …, 2024 - dl.acm.org
This paper introduces two extensions to the popular PyTorch machine learning framework,
TorchDynamo and TorchInductor, which implement the torch.compile feature released in …

TensorIR: An abstraction for automatic tensorized program optimization

S Feng, B Hou, H Jin, W Lin, J Shao, R Lai… - Proceedings of the 28th …, 2023 - dl.acm.org
Deploying deep learning models on various devices has become an important topic. The
wave of hardware specialization brings a diverse set of acceleration primitives for multi …

TileFlow: A framework for modeling fusion dataflow via tree-based analysis

S Zheng, S Chen, S Gao, L Jia, G Sun… - Proceedings of the 56th …, 2023 - dl.acm.org
With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …

Chimera: An analytical optimizing framework for effective compute-intensive operators fusion

S Zheng, S Chen, P Song, R Chen, X Li… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Machine learning models with various tensor operators are becoming ubiquitous in recent
years. There are two types of operators in machine learning: compute-intensive operators …

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation

L Wang, L Ma, S Cao, Q Zhang, J Xue, Y Shi… - … USENIX Symposium on …, 2024 - usenix.org
The increasing demand for improving deep learning model performance has led to a
paradigm shift in supporting low-precision computation to harness the robustness of deep …

Graphene: An IR for optimized tensor computations on GPUs

B Hagedorn, B Fan, H Chen, C Cecka… - Proceedings of the 28th …, 2023 - dl.acm.org
Modern GPUs accelerate computations and data movements of multi-dimensional tensors in
hardware. However, expressing optimized tensor computations in software is extremely …

Mosaic: An interoperable compiler for tensor algebra

M Bansal, O Hsu, K Olukotun, F Kjolstad - Proceedings of the ACM on …, 2023 - dl.acm.org
We introduce Mosaic, a sparse tensor algebra compiler that can bind tensor expressions to
external functions of other tensor algebra libraries and compilers. Users can extend Mosaic …

DyCL: Dynamic neural network compilation via program rewriting and graph optimization

S Chen, S Wei, C Liu, W Yang - Proceedings of the 32nd ACM SIGSOFT …, 2023 - dl.acm.org
The deep learning (DL) compiler serves as a vital infrastructure component to enable the
deployment of deep neural networks on diverse hardware platforms such as mobile devices …

Memory and computation coordinated mapping of DNNs onto complex heterogeneous SoC

S Zheng, S Chen, Y Liang - 2023 60th ACM/IEEE Design …, 2023 - ieeexplore.ieee.org
The DNN models are now pervasively used for various applications. Meanwhile, the
computing hardware has shifted towards heterogeneous system composed of various …

Hidet: Task-mapping programming paradigm for deep learning tensor programs

Y Ding, CH Yu, B Zheng, Y Liu, Y Wang… - Proceedings of the 28th …, 2023 - dl.acm.org
As deep learning models nowadays are widely adopted by both cloud services and edge
devices, reducing the latency of deep learning model inferences becomes crucial to provide …