AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware...

PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation

J Ansel, E Yang, H He, N Gimelshein, A Jain… - Proceedings of the 29th …, 2024 - dl.acm.org

This paper introduces two extensions to the popular PyTorch machine learning framework,
TorchDynamo and TorchInductor, which implement the torch.compile feature released in …

Cited by 82 · Related articles · All 4 versions

TensorIR: An abstraction for automatic tensorized program optimization

S Feng, B Hou, H Jin, W Lin, J Shao, R Lai… - Proceedings of the 28th …, 2023 - dl.acm.org

Deploying deep learning models on various devices has become an important topic. The
wave of hardware specialization brings a diverse set of acceleration primitives for multi …

Cited by 49 · Related articles · All 4 versions

TileFlow: A framework for modeling fusion dataflow via tree-based analysis

S Zheng, S Chen, S Gao, L Jia, G Sun… - Proceedings of the 56th …, 2023 - dl.acm.org

With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …

Cited by 7 · Related articles · All 5 versions

Chimera: An analytical optimizing framework for effective compute-intensive operators fusion

S Zheng, S Chen, P Song, R Chen, X Li… - … Symposium on High …, 2023 - ieeexplore.ieee.org

Machine learning models with various tensor operators are becoming ubiquitous in recent
years. There are two types of operators in machine learning: compute-intensive operators …

Cited by 16 · Related articles · All 4 versions

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation

L Wang, L Ma, S Cao, Q Zhang, J Xue, Y Shi… - … USENIX Symposium on …, 2024 - usenix.org

The increasing demand for improving deep learning model performance has led to a
paradigm shift in supporting low-precision computation to harness the robustness of deep …

Cited by 1 · Related articles

Graphene: An IR for optimized tensor computations on GPUs

B Hagedorn, B Fan, H Chen, C Cecka… - Proceedings of the 28th …, 2023 - dl.acm.org

Modern GPUs accelerate computations and data movements of multi-dimensional tensors in
hardware. However, expressing optimized tensor computations in software is extremely …

Cited by 12 · Related articles

Mosaic: An interoperable compiler for tensor algebra

M Bansal, O Hsu, K Olukotun, F Kjolstad - Proceedings of the ACM on …, 2023 - dl.acm.org

We introduce Mosaic, a sparse tensor algebra compiler that can bind tensor expressions to
external functions of other tensor algebra libraries and compilers. Users can extend Mosaic …

Cited by 9 · Related articles · All 8 versions

DyCL: Dynamic neural network compilation via program rewriting and graph optimization

S Chen, S Wei, C Liu, W Yang - Proceedings of the 32nd ACM SIGSOFT …, 2023 - dl.acm.org

The deep learning (DL) compiler serves as a vital infrastructure component to enable the
deployment of deep neural networks on diverse hardware platforms such as mobile devices …

Cited by 3 · Related articles · All 5 versions

Memory and computation coordinated mapping of DNNs onto complex heterogeneous SoC

S Zheng, S Chen, Y Liang - 2023 60th ACM/IEEE Design …, 2023 - ieeexplore.ieee.org

The DNN models are now pervasively used for various applications. Meanwhile, the
computing hardware has shifted towards heterogeneous system composed of various …

Cited by 6 · Related articles · All 2 versions

Hidet: Task-mapping programming paradigm for deep learning tensor programs

Y Ding, CH Yu, B Zheng, Y Liu, Y Wang… - Proceedings of the 28th …, 2023 - dl.acm.org

As deep learning models nowadays are widely adopted by both cloud services and edge
devices, reducing the latency of deep learning model inferences becomes crucial to provide …

Cited by 13 · Related articles · All 5 versions
